Harnessing Large Language Models for Detection of AI-Generated Attacks
Large language models have led to the emergence of AI tools that threat actors use to craft sophisticated attacks. AI-powered exploitation toolkits such as WolfGPT, EscapeGPT, XXXGPT, Evil-GPT, FraudGPT, WormGPT, GhostGPT, and Dark LLMs are proliferating at an alarming rate. With the arrival of low-cost models such as DeepSeek, malicious AI toolkits will increasingly be weaponized by threat actors to amplify the sophistication, scale, and success rate of their attacks, posing a significant and growing threat to global cybersecurity.
Email-delivered threats continue to pose a significant risk to organizations, leveraging vectors such as malware-laden attachments, phishing URLs, and conversational payloads including Business Email Compromise (BEC).
The advent of generative AI has exacerbated this risk by enabling the rapid creation of diverse malicious payloads, landing URLs, and evasion techniques (redirect chains, QR codes, CAPTCHAs, file-sharing platforms, and cloud-hosted infrastructure) that obscure the exploitation stage and bypass traditional detection mechanisms. Additionally, generative AI can produce numerous semantically varied BEC messages, challenging the robustness of binary and multi-class neural classifiers such as those built on BERT. These classifiers are prone to false negatives when not trained on the full range of semantic variations, highlighting the need for more adaptive and semantically aware detection strategies.
To address the challenge of detecting AI-generated malicious attachments and URLs, we present the detailed design of an algorithm built from first principles. The algorithm analyzes the semantic and thematic elements embedded in the body of an email and in the text of its attachments to derive the email's purpose, i.e., its intent, treating intent as the primary feature rather than relying on downstream indicators such as the final landing URL or features from the exploitation stage. By decoupling detection from exploitation-stage artifacts, this approach is inherently more resilient to evasion techniques.
We will first dive into the semantics and topics commonly used by threat actors to deliver malicious attachments and URLs, which lay the foundation for our approach. We employ BERTopic, an unsupervised topic modeling technique, to identify common semantic patterns and themes in emails used to deliver malicious attachments and call-to-action URLs. Text from email bodies and attachments spanning the past decade was extracted and sanitized to ensure consistency. Multilingual embedding models, such as BGE-M3, generate dense vector representations, which are then clustered with density-based algorithms such as OPTICS, aided by dimensionality reduction through UMAP. Phi-3-Mini-4K-Instruct derives the semantics of each cluster, while hierarchical topic modeling with hLDA uncovers deeper thematic structures and threat actor patterns within the clusters.
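The sketch below illustrates this offline topic-mining stage, assuming the bertopic, sentence-transformers, umap-learn, and scikit-learn packages; the corpus loader, model choices, and hyperparameters are placeholders rather than the production configuration.

```python
# A minimal sketch of the offline topic-mining stage. The corpus loader,
# model names, and hyperparameters below are illustrative, not the production setup.
from sentence_transformers import SentenceTransformer
from bertopic import BERTopic
from umap import UMAP
from sklearn.cluster import OPTICS

# Hypothetical loader returning sanitized body/attachment text from the corpus.
email_texts = load_sanitized_corpus()

# Dense multilingual embeddings (BGE-M3) for every email/attachment text.
embedder = SentenceTransformer("BAAI/bge-m3")
embeddings = embedder.encode(email_texts, show_progress_bar=True)

# Dimensionality reduction ahead of density-based clustering.
umap_model = UMAP(n_neighbors=15, n_components=5, metric="cosine", random_state=42)

# BERTopic accepts any clusterer exposing fit()/labels_, so OPTICS can stand in
# for the default HDBSCAN.
optics_model = OPTICS(min_samples=10, metric="euclidean")

topic_model = BERTopic(
    embedding_model=embedder,
    umap_model=umap_model,
    hdbscan_model=optics_model,
    calculate_probabilities=False,
)
topics, _ = topic_model.fit_transform(email_texts, embeddings)

# Cluster keywords and representative documents are then summarized with an
# instruction-tuned LLM (e.g., Phi-3-Mini-4K-Instruct) and passed to hierarchical
# topic modeling (hLDA) to surface deeper thematic structure.
print(topic_model.get_topic_info().head())
```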
Each incoming email undergoes zero-shot semantic classification using prompt engineering to derive the semantics of its body and of the text in its attachments. Additionally, we perform cosine similarity analysis by comparing the email's embeddings against a repository of pre-stored embeddings, which determines whether the email carries semantics that threat actors have previously used to deliver malicious URLs and attachments. Once such semantics are detected, we apply phrase-level and hierarchical topic modeling to uncover hidden relationships among topics, revealing the email's underlying thematic structure.
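A minimal sketch of these per-email checks is shown below, assuming sentence-transformers and numpy; the stored threat semantics, the prompt, the label set, and the llm_classify helper are hypothetical placeholders rather than the production versions.

```python
# Per-email semantic checks: cosine similarity against stored threat semantics and
# zero-shot intent classification via prompt engineering. Examples are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-m3")

# Pre-stored embeddings of semantics previously used to deliver malicious
# URLs and attachments (illustrative examples).
threat_semantics = [
    "your mailbox password expires today, verify now to keep access",
    "invoice attached, remit payment to the updated bank account",
    "a new voicemail is waiting, open the attachment to listen",
]
threat_embeddings = embedder.encode(threat_semantics, normalize_embeddings=True)

def semantic_match(email_text: str, threshold: float = 0.75):
    """Cosine-similarity lookup of an incoming email against known threat semantics."""
    query = embedder.encode(email_text, normalize_embeddings=True)
    scores = threat_embeddings @ query  # cosine similarity (embeddings are normalized)
    best = int(np.argmax(scores))
    return threat_semantics[best], float(scores[best]), bool(scores[best] >= threshold)

# Zero-shot intent classification via prompt engineering; llm_classify is a
# hypothetical wrapper around whichever LLM answers the prompt.
ZERO_SHOT_PROMPT = (
    "Classify the intent of the following email body and attachment text as one of: "
    "credential_harvesting, payment_fraud, malware_delivery, benign.\n\nEmail:\n{email}"
)

def classify_intent(email_text: str) -> str:
    return llm_classify(ZERO_SHOT_PROMPT.format(email=email_text))
```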
Semantics extracted from emails (derived using prompt engineering with LLMs, similarity analysis, and cross-encoder-based semantic re-ranking), topics and inter-topic relationships identified through hierarchical topic modeling, email headers, auxiliary data from call-to-action URLs (WHOIS information, certificates, URL structure, hosting platform details, etc.), and deep file parsing results are all fed into an expert system. The system analyzes the contextual relationships among the intent (semantics and embedded themes), SMTP headers, and the auxiliary information from URLs and attachments to determine whether an attachment or URL is malicious or benign. This comprehensive approach enables the detection of phishing URLs and attachments without relying on the final landing URL, exploitation-stage features, or the malicious payloads of attachments.
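The sketch below gives a flavor of how such an expert system might correlate intent with the auxiliary signals listed above; the dataclass fields, thresholds, and rules are illustrative assumptions, not the actual production rule set.

```python
# Illustrative correlation of intent with auxiliary signals; fields and rules
# are assumptions for the sketch, not the production expert system.
from dataclasses import dataclass, field

@dataclass
class EmailContext:
    intent: str                          # e.g., "credential_harvesting", "payment_fraud", "benign"
    themes: list[str] = field(default_factory=list)  # topics from hierarchical topic modeling
    sender_domain_age_days: int = 3650   # WHOIS age of the header-derived sender domain
    url_domain_age_days: int = 3650      # WHOIS age of the call-to-action URL domain
    url_on_file_sharing_host: bool = False
    attachment_has_active_content: bool = False  # from deep file parsing (scripts, macros, forms)

def verdict(ctx: EmailContext) -> str:
    # Malicious intent delivered via newly registered or abused infrastructure.
    if ctx.intent in {"credential_harvesting", "malware_delivery"}:
        if ctx.url_domain_age_days < 30 or ctx.url_on_file_sharing_host:
            return "malicious"
        if ctx.attachment_has_active_content:
            return "malicious"
    # Payment-fraud intent originating from a recently registered sender domain.
    if ctx.intent == "payment_fraud" and ctx.sender_domain_age_days < 90:
        return "suspicious"
    return "benign"

print(verdict(EmailContext(intent="credential_harvesting", url_domain_age_days=5)))
```

Keeping the correlation logic in explicit rules of this kind also keeps every verdict auditable, since the contributing intent and auxiliary signals can be reported alongside the decision.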
The presentation will dive into how the ability to understand the deeper meaning of an email, i.e., its intent, further enables the system to detect complex BEC cases, while the zero-shot semantic classification layer ensures detection of AI-driven variations in conversational payloads.
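As one illustration of how AI-paraphrased conversational payloads can still be mapped to a known intent, the sketch below pairs bi-encoder retrieval with cross-encoder re-ranking; the model names, candidate store, and scoring are assumptions rather than the production configuration.

```python
# Bi-encoder retrieval plus cross-encoder re-ranking of BEC intents; all model
# names and examples here are illustrative stand-ins.
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

bi_encoder = SentenceTransformer("BAAI/bge-m3")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # stand-in re-ranker

known_bec_semantics = [
    "urgent request from an executive to purchase gift cards and share the codes",
    "vendor announces that banking details have changed for the next invoice",
    "request to process a confidential wire transfer before end of day",
]
known_embeddings = bi_encoder.encode(known_bec_semantics, normalize_embeddings=True)

def match_bec_intent(email_text: str, top_k: int = 2):
    # Stage 1: fast bi-encoder retrieval of candidate intents.
    query = bi_encoder.encode(email_text, normalize_embeddings=True)
    candidates = np.argsort(known_embeddings @ query)[::-1][:top_k]
    # Stage 2: cross-encoder re-ranks the candidates against the full email text.
    pairs = [(email_text, known_bec_semantics[i]) for i in candidates]
    rerank_scores = reranker.predict(pairs)
    best = int(np.argmax(rerank_scores))
    return known_bec_semantics[candidates[best]], float(rerank_scores[best])

# An AI-generated paraphrase still resolves to the vendor bank-change intent;
# a calibrated threshold on the score would gate the final decision.
print(match_bec_intent(
    "Quick note from accounts: our remittance account was updated, please use "
    "the new IBAN on the attached invoice."
))
```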
The presentation will share empirical findings from live production traffic, including a detailed breakdown of evasive attack categories. Notably, in real-world environments, 93% of the evasive malicious attachments and URLs (across formats such as SVG, HTML, and DOCX) detected by our algorithm were missed by 96% of antivirus engines, as reported in VirusTotal telemetry. The system also accurately identified complex BEC attacks, including customer and vendor impersonation and invoice fraud, with high precision and minimal false positives. These findings underscore the algorithm's robustness in detecting stealthy, low-detection-rate threats across both controlled benchmarks and operational deployments.

Abhishek Singh – InceptionCyber.ai
Abhishek Singh is the Founder & CTO of InceptionCyber.ai, with 15+ years of expertise in AI and Cybersecurity. He has a proven track record of driving AI and Cybersecurity research and engineering innovations at Cisco, FireEye, and Microsoft, leading to cutting-edge technologies and revenue growth.
Holding 44 patents, he has authored 17 research papers, seven white papers, and contributed to three books. His research has been presented at prestigious conferences, including CAMLIS, CARO, Virus Bulletin, Black Hat, RSA, CanSecWest, AVAR, and ACSAC.
His contributions have earned notable recognitions:
- 2025 SC Awards: Top 5 Innovator (Executive or Practitioner) of the Year
- 2019 Reboot Leadership Award (Innovators Category) – SC Media
- Nominee for Virus Bulletin 2018 Péter Szőr Award
- Cybersecurity Professional of the Year (Silver) – North America, Cyber Security Excellence Awards 2020
He holds an MS in Information Security & CS from Georgia Tech, a B.Tech in Electrical Engineering from IIT-BHU, a Master of Engineering Leadership (ELPP+) from UC Berkeley, and a Postgraduate Certificate in AI & Deep Learning from IIT-Guwahati.

Kalpesh Mantri – InceptionCyber.ai
Kalpesh Mantri is the Founding Principal Research Engineer at InceptionCyber.ai, with 12+ years of expertise in Cybersecurity R&D. He specializes in email-borne threats, phishing, and malspam, and has led the development of innovative, patented detection solutions.
Previously, he worked as a Senior Security Engineer, focusing on malware reverse engineering and APT investigations, including uncovering operations such as SideCopy and HoneyTrap targeting defense sectors.
A regular speaker at global security conferences, his work has been presented at Black Hat MEA, Virus Bulletin, AVAR, and the CARO Workshop. He holds Bachelor’s and Master’s degrees in Computer Science and a Professional Certificate in Applied Data Science & ML from IIM Kozhikode.
LinkedIn: linkedin.com/in/kalpeshmantri

Shray Kapoor – InceptionCyber.ai
Shray Kapoor is Principal Research Scientist Engineer at InceptionCyber, bringing over 15 years of cybersecurity expertise to his work. Prior to joining InceptionCyber, he held positions at Cisco Systems (Talos), Didi Research America, FireEye, and SecureWorks. Shray has a proven track record of innovation, with multiple patents granted in the area of cybersecurity. He holds an MS in Information Security from Georgia Tech.
