Why AI-Powered Phishing Will Become a Serious Security Issue for Your Organization

Will large text-generating AI models be the future of communication-based cyber attacks?

On November 30, 2022, OpenAI unveiled their latest breakthrough in AI technology, the conversational text-generating model, ChatGPT, as a public beta for everyone to try. This model has been specifically designed to excel in conversation, with advanced capabilities such as answering follow-up questions, recognizing and correcting mistakes, challenging false assumptions, and declining inappropriate requests.

But with great power comes great responsibility. The release of ChatGPT has also sparked concerns about its potential misuse. Its ability to craft highly convincing phishing and social engineering campaigns has raised alarm bells about its use in cyber attacks.


In this article, we will delve into how this technology can be utilized to launch highly sophisticated and dynamic social engineering and phishing attacks. We'll explore how threat actors can use GPT-3 DaVinci, OpenAI’s most powerful model accessible through an API, to:

  • Create completely unique and dynamic phishing campaigns on the fly
  • Dynamically generate phishing campaigns in multiple languages
  • Scale social engineering attacks by maintaining thousands of parallel conversations
  • Classify victim responses to increase the quality of interactive social engineering attacks
  • Use Open-Source Intelligence, especially Internet-based sources such as social media profiles and posts, to tailor the attack to a specific victim
  • Leverage leaked or breached data, such as past email conversations, to craft deceptively real, targeted thread hijacking or BEC attacks

Finally, we will also cover the strategies you need to safeguard yourself and your organization against the ever-evolving tactics of cybercriminals. By staying one step ahead of the threat actors, you'll be able to proactively protect yourself and your business. Remember, it only takes one. One weak link is enough for a cyber attack to be successful. 

Why We Reached the Tipping Point in the Capability of AI Models

At the end of 2017, researchers at Google Brain made a pivotal discovery that set off a flurry of advancements in text-generating AI models. They discovered a revolutionary new AI architecture they named “The Transformer”. Since then, research around text-generating AI models has completely exploded, with new improvements being published every month. This, in combination with faster GPUs, has enabled these models to become bigger and increasingly more powerful.

Historically, you needed separate and specialized AI models to perform different tasks: one trained to classify the language of a text, another trained to generate emails, yet another one to classify the sentiment of a text, and so on. But today’s models can multi-task, take a freely written instruction (called a prompt), and generate a response based on that instruction. This means that the same model is able to translate text, auto-complete writing, classify text, answer questions, and much more. It all comes down to what you ask it to do.

We're now in an era where text-generating AI models are so advanced that the text they produce is often indistinguishable from human writing. To illustrate this development, we instructed OpenAI’s 4 different models, from their least to their most capable one, to perform the same task. The instruction was to generate an urgent password phishing email for Microsoft 365. The prompt is shown below. 


Write a phishing email that tries to convey the recipient to send over the password for Microsoft 365.
The recipient's name is Bob.
The email is sent from Alice.
The email should be written in an urgent manner.
Insert the url "" somewhere.

Low capability (OpenAI Ada)

Low/medium capability (OpenAI Babbage)

Medium/high capability (OpenAI Curie)

High capability (OpenAI DaVinci)


On the one hand, their weakest model, Ada, struggled to follow the instruction, producing a poorly written email that mixed up the recipient's and sender's name and didn't include a URL. On the other hand, their most powerful model, DaVinci, astonishingly showed just how far we’ve come. It generated a well-structured email, with a convincing phishing story, an urgent tone, and also included the requested URL. 

At this point, it’s important to once again emphasize that the output generated by these models is solely based on the provided prompt, i.e., the text input above. Nothing else was provided to the model.

Examining the output from the "High capability" model, it's clear that the text is polished and professional. Comparing the text output from different prompts illustrates that we’ve reached a tipping point in terms of the quality and deceptiveness of what can be generated with these models. We've reached a point where the line between human-written and machine-generated text is becoming increasingly blurred.


How Threat Actors Can Leverage AI-Enabled Phishing at Scale

Now that we've highlighted the impressive ability of advanced models to produce highly convincing phishing emails, let's delve deeper into how attackers can leverage this technology at scale.

1. Create Phishing Campaigns with Completely Unique Emails

Traditional phishing campaigns are commonly generated from templates, where the attacker designs a blueprint that is then customized with any available information about the victim. This customization is done with specific software, similar to those used for phishing awareness simulations. Custom elements are, for example, user name or company name.



Although these tools can include some variation to fool text and email similarity measures, traditional email security solutions may still be able to block these emails if they have been trained on similar past phishing emails and if they contain other distinctive indicators. 

However, with the advent of text-generating AI models, attackers now have entirely new capabilities. Rather than relying on a single template that is used for all victims of one or even multiple campaigns, attackers can now use AI models to generate a unique, highly personalized email for each individual target. This is done by simply instructing the AI model to craft the email.


Built into OpenAI's GPT-3 models is the capability to introduce variety through randomness by changing two model arguments: temperature and top_k . This enables a threat actor to define how “creative” the model should be when generating the emails. If given the exact same instructions, a more creative AI model will take more risk and thus generate a more unique email for each instance within a campaign. The individual phishing emails will all look vastly different from each other. A less creative model will take less risk, thus generating emails that look more similar to each other.

This built-in ability to define the amount of "creativity" the model should use empowers threat actors to easily create very well written variations and further customize their attacks. To test that, we asked OpenAI’s most powerful model, DaVinci, to write three phishing emails based on the exact same instructions and with high “creativity” arguments. The model was able to generate three unique emails of varying length.

It constructed three different phishing stories of why the recipient should hand over the password:

  1. Security breach on the Microsoft 365 account
  2. Security update on the Microsoft 365 account
  3. Noticing irregularities with the Microsoft 365 access

The model was also able to convey the urgency and used different reasons to put pressure on the victim:

  • It’s urgent to prevent further security issues
  • It’s urgent because of tight security measures
  • Needed to ensure the safety and security of information
  • Recipient needed to take the security update seriously
  • It’s the only way to continue an investigation and protect the account


Write a phishing email that tries to convey the recipient to send over the password for Microsoft 365.
The recipient's name is Bob.
The email is sent from Alice.
The email should be written in an urgent manner.
Insert the url "" somewhere.

Version 1

Version 2

Version 3

Again, it’s important to emphasize that all the outputs were generated with the exact same input prompt.

2. Dynamically Generate Phishing Campaigns in Multiple Languages

These AI models are trained on an extremely large data set, spanning over multiple languages. This enables them to construct answers in different languages, but also to take instructions in one language telling them to generate an answer in another language. As a result, to improve the success rate of their attacks, threat actors can dynamically adjust the language of their phishing emails to the suspected language preferences and regions of their victims. To gather this information, they can, for example, look at the the top-level domain, or through OSINT. Cybercriminals can thus deduce the most promising language for an organization or even for individual users.


To dynamically generate phishing campaigns in multiple languages, the threat actor would now map each email’s TLD to a language and explicitly tell the AI model to generate a phishing campaign in that language.

In this example, we used the same instruction to generate four different phishing emails, but told the AI model to generate them in four different languages: English, German, Spanish, and Swedish. The generated emails were shown to native speakers of each language for assessment. Overall, the model was able to construct a unique phishing email containing all parts of the instruction in each language. 

However, what is clear is that the AI model still lacks some language-specific understanding. For example, a native German speaker would never combine the very formal greeting Sehr geehrter with a first name (Bob). This is likely generated because only the first name was provided in the instructions. An accurate formal greeting when the last name is missing would have been Sehr geehrte Damen und Herren.


Write a phishing email that tries to convey the recipient to send over the password for Microsoft 365.
The recipient's name is Bob.
The email is sent from Alice.
The email should be written in an urgent manner.
Insert the url "" somewhere.
Generate the email in [INSERT LANGUAGE]





To circumvent these language-specific mistakes, we crafted a unique prompt for each language. In this phishing example, if "The recipient's name is Bob." is excluded from the instructions when generating the German email, the model generates the correct greeting "Sehr geehrte Damen und Herren".

3. Have Automated Conversations in Parallel and at Scale

As seen above, it’s possible to use available AI models to generate realistic social engineering and phishing emails at scale. But what about keeping automatic and engaging conversations with the victims?

Let's imagine a scenario where the threat actor deploys a social engineering campaign at scale. If one of the victims answers, the reply is fed back to the AI model to generate a new answer.


To test this, we used one of the emails generated in a previous experiment as the initial phishing email sent to the victim. We then manually wrote a response from the victim asking for proof that the sender really is from the IT department. The response was written in a way to imitate a victim that is skeptical and cautious. We then fed the AI model with the original AI-generated phishing email and the manually written response, together with instructions as to how to respond.

Based on this, the AI model was able to generate an answer that showed some interesting conceptual “understandings”:

  • It concludes that “Email 1” was sent by itself (AI model) and it should therefore use the same fake name, fake role, and continue pushing the same phishing story.
  • It picks up the notion of cautiousness from the target’s answer in “Email 2” and showed an “understanding” for it.
  • It gives a notion of “understanding” for what “reasonable proof” could be. In this case, it asks the victim to look for her name on the official website (which he of course won’t find).

Write an email answering to the email correspondence below.
The email should try to convey the recipient to send over the password for Microsoft 365.
The answer should answer any questions raised in "Email number 2".
The answer should refer to things written in "Email number 1".


Email 1 (Original AI generated email)

Email 2 (answer from victim)

Email 3 (answer from AI model)


4. Classify Human Responses to Adjust and Generate Better Answers

Not only are these AI models able to take instructions around how to generate emails, but they're also able to classify them, as shown in the example below. For threat actors, this represents a new opportunity to craft even more sophisticated attacks. 

Imagine a pipeline where the victim’s first reply to a phishing email is classified into a sentiment. This sentiment is then used to tune the next AI-generated answer, its urgency, and tone. Is the sentiment "Helpful"? Instruct the AI model to write a more urgent answer to push the target. Is the sentiment "Suspicious"? Instruct the AI model to be more understanding and convincing, maybe even alerting a human actor to take over the phishing attempt.




What is the sentiment of the following email?
Answer with a single word.

Email 1 (Answer: Skeptical)

Email 2 (Answer: Helpful)

Email 3 (Answer: Frustrated)

Keeping conversations with humans and using the human response to improve the AI response is one use case that threat actors can leverage in fraud attacks like VIP fraud. When attackers have the ability to easily generate variations of the initial email, it becomes more difficult to prevent phishing attempts. 


Talk To A Security Expert Today


5. Improve Campaign Effectiveness by Feeding Back the Click or Response Rate

AI-powered phishing allows attackers to generate many different emails at scale. To further optimize their campaigns, threat actors could feed back the actual success rate of a running campaign or individual emails. Phishing emails that work well can be used to instruct the AI model to generate similar emails, or simply be used directly in future attacks. To validate this and determine how much the click or response rate of a campaign could be improved, more testing is required. This goes beyond the scope of this article, but it wouldn't be surprising if a phishing awareness tool would pick up and explore this idea in more detail.

6. Launch AI-Powered BEC and Email Thread Hijacking Attacks

According to the IBM Data Breach Report 2022,  Business Email Compromise (BEC) is one of the most costly initial attack vectors, next to phishing. Email thread or email conversation hijacking attacks can be considered a subcategory of BEC attacks. Below you see a German email thread hijacking attack we stopped at one of our German customers.


The sender was faked in the display name but the conversation between the victim and the sender existed due to a past email conversation leak. The attacker-written part of the email was not well constructed and also did not relate to the actual conversation. Here, AI-powered BEC could boost the quality of the attack by taking the previous conversation into account, and even the writing style of the impersonated sender. This would make the attack very difficult to spot for victims.

To test the feasibility of this, we used a real email conversation between Max (Machine Learning Engineer at xorlab and author of this article) and Antonio (CEO of xorlab). The real email conversation was used as an input to the AI model, together with a prompt telling the model to impersonate Antonio and generate an answer trying to make Max click a link and log in. The scenario, of course, assumes that the threat actor had access to the email conversation between Max and Antonio (Email 1-4).


Write an email answering 'Email 4'.
Refer to information found in 'Email 1' or 'Email 2' or 'Email 3' or 'Email 4'.
Also include a link "" and try to make the recipient to log in.

Email 1: {email_1}
Email 2: {email_2}
Email 3: {email_3}
Email 4: {email_4}

Email 1 (Antonio)

Email 2 (Max)

Email 3 (Antonio)

Email 4 (Max)

Email 5 (AI generated) 😈

The only instructions we gave to the AI were to include a link and to urge the recipient to log in. Nothing else. The AI came up with the visual enhancement story which fits the conversation very well.

Alternative responses from the AI model

Alternative 1

Alternative 2

Alternative 3

7. Use Open-Source Intelligence to Further Customize Phishing Emails

We have explored multiple ways to use AI models to generate text that threat actors can use in email-based social engineering attacks like phishing, BEC, and email thread hijacking. As one of the strengths of this approach is the ability to customize attacks in a completely automated way and at scale, here are some more ideas around how threat actors can use OSINT to further improve their attacks:

OSINT and other publicly available data

How to use

Publicly available logo of the victim organization

Improve visual appearance of the phishing emails.

MX records about the email infrastructure

Customize the attack to the actual email service used, e.g., M365 phishing.

Cloud services used by the victim organization

Customize the attack to the cloud services in use, e.g., Sharepoint.

Website of the victim organization (e.g., press releases, news, …), social media accounts and posts of the victim organization or victim

Use publicly available information (press releases, social media posts etc.) to further improve the topic used in the social engineering attacks.

Relationships based on social networks

Use relationship information to impersonate specific persons that the victim or victim organization trusts.

Is It Possible to Detect Text Generated by GPT-3?

If it’s possible to train an AI model to write text, is it also possible to train an AI model to detect AI-generated text? There are currently a handful of publicly available “AI Content Detector” services that state that they can do exactly that. These services take a text as input and then output what is the probability of the text being AI-generated. That is, how likely it is that a text is generated by an AI model.

To test the capabilities of these services, we generated 20 phishing emails using the exact same prompt. These phishing emails were then sent to three different “AI Content Detector” services: Writer, Crossplag, and Sapling. If the service outputted a probability of more than 50% AI-generated, we classified it as AI-generated.

In the result below, we  can see that two out of three services were able to classify the majority of the emails correctly, while one of them struggled.

Write a phishing email that tries to convey the recipient to send over the password for Microsoft 365.
The recipients name is Bob.
The email is sent from Alice.
The email should be written in an urgent manner.
Insert the url "" somewhere.





Number of emails predicted as AI-generated




What we noticed when analyzing the results was that these services were better at classifying longer generated emails than short ones. As a test, we adjusted the prompt a bit and asked it to generate short phishing emails. This time, one of the services didn’t classify any email correctly, while the other two classified 6/20 correctly. This shows that it’s possible to adjust and design the prompts in such a way that the generated emails are more difficult to detect and more “human-like”.

Write a phishing email that should make the recipient click a link.
The email should be short.
You're from the IT-department.
The recipients name is Bob.
The email is sent from Alice.
Insert the url "" somewhere.





Number of emails predicted as AI-generated




What we have not tested is the false positive rate of these services, i.e. how often they think that human-written text was AI-generated. But even if there would be a way to reliably distinguish AI-generated from human-written text, the interesting question is whether that would help prevent AI-powered phishing. As we have to assume that many legitimate emails today, and even more in the future, will be AI-generated, this signal will not be enough. Already today, applications are using AI as help to generate text, and this trend is likely to continue.

Limitations of Current Models

Even though these models are very capable, current state-of-the-art AI models still have some limitations. All answers are still generated based on an abstraction of the data set used during the training. This means that if the model was trained on data up until January 2021, it has no notion of things that happened after that.



Who is the current monarch of England?

Queen Elizabeth II is the current monarch of England.
(The model doesn’t know that she passed away 8 September 2022 and that Charles III is the current monarch of England.)

Another limitation is that, if asked a very specific and detailed question where only a single answer is valid, the model can sometimes make up information out of the blue.



Who is the current CTO of xorlab, the cyber security company from Switzerland?

The current CTO of xorlab is Dr. Thomas Köchel.
(The real answer is Raffaele Sandrini.)

Also, the current AI models don’t have ways to query and search the Internet. This limits their ability to find company-specific information that would be useful in crafting phishing attacks.

Is It Possible for a Threat Actor to Host Their Own AI Models?

We have to assume that using OpenAI’s API to generate phishing emails will not work in the long run for a threat actor. OpenAI has an interest in shutting down users with bad intent, or they might further limit access in the future. But what about threat actors hosting their own AI models? Would that be possible?

Imagine an actor in the underground economy focusing on hosting powerful AI models, pre-trained and specialized in generating phishing emails. This actor could expose the email-generating models through an API that other threat actors can consume and use. Everything can be wrapped in a web GUI to simplify the use. Threat actors only have to dump a csv-file with victim information (name, email etc.) into the GUI, define what type of campaign should be run against the targets, and tweak how dynamic the emails should be (language, “creativity”, automatic-answering, sentiment classification etc.). The service could even supply a database with pre-scraped social media posts linked to email addresses. This database could be used to fine-tune the model for very targeted attacks.

Then press “Run”.


If used by existing phishing kits, such a service would have a game-changing potential.

As of January 2023, there is no available pre-trained open-source AI model that is able to generate emails that are comparable to OpenAI’s GPT-3 model. So currently, the above scenario is probably not very likely without using OpenAI’s endpoints. But this is probably just a matter of time. EleutherAI, a grassroots collective of AI researchers, released a pre-trained GPT model, GPT-NeoX-20B, in February 2022, containing 20 billion parameters. Even though it’s not comparable to GPT-3 DaVinci’s 175 billion parameters, they are already working on an open-source model that will be comparable to GPT-3 DaVinci.

Let's assume that such an open-source model exists. Would it be possible for a threat actor to host it on-prem? Let's do some back-of-the-envelope calculations to understand what this requires:

  • Assume that the AI model needs to have at least 175 billion parameters, the same amount of parameters as OpenAI’s most powerful GPT-3 model, DaVinci.
  • 175 billion parameters would require 175*10^11 * 32/8 ≈ 700 GB of GPU memory to hold the model.
  • Let's say that we need a total of 800 GB of GPU memory to be able to run all the computations as well.
  • The current state-of-the-art GPUs from Nvidia, A100, have 40GB of memory and cost around $13'000.
  • In this case, we need 20 Nvidia A100, which sell at a total cost of around $260'000.
  • Add on an extra $10'000 for all other things required, like servers, infrastructure, disks etc.

This puts us at around $270'000, and that is with state-of-the-art GPUs. It’s most likely possible to get that number down substantially by using more, but older GPUs.

It sounds like a lot of money. Still, if put into perspective against the approx. $2.4 billion that phishing, BEC (Business Email Compromise), and EAC (Email Account Compromise) cost its victims in 2021, this might look like a good investment for any innovation-driven threat actor.


How to Protect Your Organization against AI-Powered Phishing

We've entered a new age of AI-powered text generation, and it's inevitable that these tools will be used by threat actors to scale their phishing attacks. For organizations relying on traditional email security solutions, this means that more phishing emails will bypass filters and those that do reach users will be more convincing and effective in luring the user into clicking or responding. So, what can you do to protect your organization from these advanced threats?

As the written content of emails becomes better through AI-powered phishing and thus more difficult to distinguish from legitimate content, the key is to adopt a proactive, modern email security solutionone that,  unlike traditional email security solutions, doesn't rely on the text of the email to detect phishing. Instead, a modern solution should be able to prevent phishing attempts even when the sender or phishing URL is not yet known to be malicious (sandboxing is not enough). 

Next to having a proactive, modern email security solution, you need to improve your email security posture overall. This includes reducing email attack surface, having context-aware classifiers that use signals that threat actors cannot easily change (e.g. , the email text), and allowing users to easily report suspicious emails.

If you want to learn more about how to protect your organization against AI-powered phishing, check out the The Clear & Complete Guide to Smarter Email Security.

Similar posts

Get curated and relevant updates once per month

Once per month, we’re sharing the latest security insights from our team in a curated, 5-minutes-to-read email newsletter. We strive to inform you with fresh, relevant, and objective updates on what’s happening around you.