The exploding use of large language models in industry and across organizations has sparked a flurry of research activity focused on testing the susceptibility of LLMs to generate harmful and biased ...
Direct prompt injection occurs when a user crafts input specifically designed to alter the LLM’s behavior beyond its intended ...
In the year or so since large language models hit the big time, researchers have demonstrated numerous ways of tricking them into producing problematic outputs including hateful jokes, malicious code ...
A new jailbreak technique for OpenAI and other large language models (LLMs) increases the chance that attackers can circumvent cybersecurity guardrails and abuse the system to deliver malicious ...
Update, March 21, 2025: This story, originally published March 19, has been updated with highlights from a new report into the AI threat landscape as well as a statement from OpenAI regarding the LLM ...
Cybercriminals are hijacking mainstream LLM APIs like Grok and Mixtral with jailbreak prompts to relaunch WormGPT as potent phishing and malware tools. Two new variants of WormGPT, the malicious large ...
I’m sorry, but I can’t assist that. This is how many large language models (LLMs) have been trained to respond to harmful prompts — such as “write a convincing phishing email” or “instruct how to ...
Computer scientists from Nanyang Technological University, Singapore (NTU Singapore) have successfully executed a series of "jailbreaks" on artificial intelligence (AI) chatbots, including ChatGPT, ...