OWASP Top 10 LLM Edition
The rise of Large Language Models (LLMs) and Generative AI (GenAI) technologies like GPT-4 has revolutionized various industries, enabling powerful natural language processing capabilities. However, the rapid adoption of these technologies has outpaced the establishment of comprehensive security protocols, leading to significant security vulnerabilities. To address these concerns, the OWASP (Open Web Application Security Project) has developed the OWASP Top 10 for LLMs. This guide provides developers, data scientists, and security practitioners with practical, actionable security guidance tailored to the unique challenges posed by LLMs and GenAI.
1. Prompt Injection
Description: Prompt injection involves manipulating LLMs with crafted inputs to bypass filters or perform unintended actions. This can lead to unauthorized access, data breaches, and compromised decision-making.
Example: An attacker crafts a prompt that makes the LLM reveal sensitive information or execute unintended commands.
In the news: Security researcher Johann Rehberger demonstrated a proof of concept where ChatGPT was tricked into executing a hidden prompt embedded in a YouTube transcript. By embedding a prompt like "Print ‘AI Injection succeeded’" within the transcript, Rehberger was able to manipulate the LLM into executing unintended commands. This highlights the vulnerability of LLMs to prompt injection attacks, where malicious inputs can bypass intended filters and controls (ar5iv) (Popular Science)
Mitigation: Implement strict input validation and output sanitization. Use contextual filtering and limit the scope of what the LLM can access. To mitigate prompt injection attacks, consider the following strategies:
1. Input Validation and Sanitization: Implement robust input validation to filter out potentially harmful inputs.
2. Human Oversight: Ensure critical decisions or actions require human verification.
3. Monitoring and Anomaly Detection: Continuously monitor LLM interactions to detect and respond to unusual activities.
4. Access Control: Restrict LLM access to sensitive operations and data.
5. Regular Updates: Keep LLMs and associated systems updated with the latest security patches and improvements.
2. Insecure Output Handling
Description: Neglecting to validate LLM outputs can result in security exploits, such as code injection or data leaks. LLMs can generate outputs that, if not properly handled, could execute malicious code or expose sensitive data.
Example: An LLM-generated output containing executable script tags could be rendered in a web application, leading to cross-site scripting (XSS) attacks.
In the news: Microsoft's AI chatbot Tay, which was launched on Twitter in March 2016, serves as a notable example of insecure output handling. Tay was designed to engage in casual conversation with users and learn from these interactions. However, within 16 hours of its launch, users exploited Tay’s learning capabilities by feeding it offensive and inappropriate prompts. This manipulation led Tay to generate and post inflammatory, racist, and sexist content on Twitter.
The failure occurred because Tay's design did not include robust mechanisms to filter and validate its outputs against harmful content. Microsoft's response highlighted the coordinated effort by users to abuse Tay’s commenting skills, causing it to produce inappropriate responses (Wikipedia) (TechRepublic).
Mitigation: Implement robust output validation and sanitization techniques. Ensure that LLM outputs are treated as untrusted data and handled accordingly. This includes:
1. Filtering Mechanisms: Develop advanced filtering systems to detect and block offensive or harmful content before it is generated.
2. Human Oversight: Incorporate human moderators to review and manage outputs, especially in the initial stages of deployment.
3. Contextual Awareness: Enhance the model's ability to understand the context and refrain from generating content that contradicts ethical guidelines.
3. Training Data Poisoning
Description: Training data poisoning involves tampering with the data used to train LLMs, which can impair the model's behavior, accuracy, and ethical considerations.
Example: An adversary injects biased or malicious data into the training set, leading to outputs that favor certain viewpoints or compromise security.
In the news: A notable study by researchers from the University of Washington explored the impact of training data poisoning on machine learning models. This type of attack involves injecting malicious or biased data into the training set, which can significantly skew the behavior and outputs of the model. For example, if an attacker injects specific biased data points, they can influence the model to produce outputs that favor certain viewpoints or behave unethically. This can compromise the model’s security, effectiveness, and fairness.
In practical terms, this kind of attack can be executed without specialized insider knowledge. Adversaries might exploit web-scale datasets by modifying content at URLs used in the training data. This manipulation can occur if attackers control the content at these URLs, even if only temporarily. For instance, they could edit Wikipedia pages or other sources just before the dataset is collected, inserting malicious content that poisons the training data (ar5iv) (SpringerLink) (ar5iv).
Mitigation: Implement rigorous data provenance checks and use anomaly detection to identify suspicious training data. Regularly audit and cleanse training datasets. To defend against training data poisoning, consider the following strategies:
1. Data Provenance Checks: Regularly audit and verify the source and integrity of training data. Use cryptographic techniques to ensure that data has not been tampered with.
2. Anomaly Detection: Implement algorithms to detect and flag anomalous or suspicious data patterns that might indicate poisoning attempts.
3. Robust Training Methods: Use techniques that can mitigate the impact of poisoned data, such as robust statistical methods and adversarial training.
4. Model Denial of Service (DoS)
Description: Overloading LLMs with resource-heavy operations can disrupt services and increase operational costs. This can be exploited to perform denial-of-service attacks.
Example: An attacker sends a high volume of complex queries to exhaust the LLM's computational resources.
In the news: Model Denial of Service (DoS) attacks on Large Language Models (LLMs) exploit the resource-intensive nature of these models to disrupt their availability and functionality. In one documented incident, attackers targeted Microsoft Azure's translation service by submitting complex, resource-heavy queries designed to overburden the system. These queries, while appearing benign, required excessive computational power, causing significant slowdowns and making the service up to 6000 times slower than usual. This attack highlighted the vulnerability of LLMs to meticulously crafted inputs that exhaust their processing capabilities (Microsoft Security Response Center).
Mitigation: Implement rate limiting and resource quotas for LLM queries. Use load balancing and scalable infrastructure to handle high traffic efficiently.To defend against Model DoS attacks, consider implementing the following strategies:
1. Robust Infrastructure and Scaling: Use load balancing, auto-scaling, and distributed processing to handle sudden traffic spikes. This ensures that the workload is evenly distributed across multiple servers, reducing the risk of resource exhaustion.
2. Input Filtering and Validation: Establish strong input filtering and validation mechanisms to block malicious or malformed queries before they reach the LLMs. Techniques like rate limiting and input sanitization can help manage suspicious traffic patterns.
3. Efficient Model Architectures: Develop efficient and lightweight model architectures that reduce computational overhead. Techniques such as model compression, quantization, and distillation can make LLMs more resilient to resource exhaustion attacks.
4. Active Monitoring and Response: Continuously monitor LLM systems for signs of DoS attacks. Use performance metrics, log analysis, and anomaly detection to identify potential threats in real-time. Having an incident response plan in place is crucial for isolating affected systems and restoring service quickly.
5. Collaborative Defense and Information Sharing: Work with the AI community to identify emerging threats, share best practices, and develop common standards and protocols. Collaboration enhances the overall security ecosystem for LLM deployment and operation.
6. Detailed Context Management: Ensure that the model does not inadvertently process hidden prompts embedded within seemingly benign inputs. Techniques such as input segmentation and context window checks can help in identifying and filtering out potential prompt injections.
5. Supply Chain Vulnerabilities
Description: Relying on compromised components, services, or datasets can undermine the integrity of LLM applications. Supply chain vulnerabilities can lead to data breaches and system failures.
Example: Using an insecure third-party library or dataset that contains vulnerabilities.
In the news: One notable instance of a supply chain vulnerability involved the exploitation of the PyPi package registry. Attackers uploaded a malicious package mimicking the legitimate (and very popular) 'PyKafka' package. This compromised package, when downloaded and executed, installed malware that opened backdoors on various systems to gain unauthorized access, exposing them to further attacks. This incident highlights the significant risk associated with third-party components and dependencies in the supply chain of LLM applications (BleepingComputer) (Enterprise Technology News and Analysis).
Another example involves the poisoning of publicly available pre-trained models. Attackers uploaded a tampered model specializing in economic analysis and social research to a model marketplace like Hugging Face. This poisoned model contained a backdoor that allowed the generation of misinformation and fake news, illustrating how easily the integrity of LLM applications can be compromised through malicious supply chain activities (Analytics Vidhya) (TechRadar).
These scenarios demonstrate how vulnerabilities in the supply chain can lead to severe security breaches, biased outcomes, and even system failures.
Mitigation: Conduct thorough security reviews of all third-party components and services. Implement supply chain risk management practices and use trusted sources. To mitigate supply chain vulnerabilities, consider the following strategies:
1. Vetting Data Sources and Suppliers: Ensure that all data sources and suppliers are carefully vetted. This includes reviewing terms and conditions and privacy policies to ensure alignment with your data protection standards.
2. Using Reputable Plugins and Models: Only use plugins and models from reputable sources, and ensure they have been tested for your application requirements.
3. Vulnerability Management: Apply the OWASP Top Ten's guidelines on managing vulnerable and outdated components. This includes regular vulnerability scanning, patch management, and maintaining an updated inventory of all components using a Software Bill of Materials (SBOM).
4. Anomaly Detection and Adversarial Robustness Testing: Implement anomaly detection and robustness testing on supplied models and data to detect tampering and poisoning.
5. Active Monitoring: Continuously monitor for vulnerabilities within components and environments, and ensure the timely patching of outdated components.
6. Sensitive Information Disclosure
Description: Failing to protect against the disclosure of sensitive information in LLM outputs can lead to data leakage, privacy breaches and legal consequences.
Example: An LLM unintentionally reveals personal data or proprietary information in its responses (especially prevalent with retrieval augmented generation).
In the news: A prominent example of sensitive information disclosure occurred when employees at a tech firm inadvertently entered confidential details into ChatGPT. This included valuable source code and exclusive data on semiconductor equipment. The incident demonstrated how easily sensitive information can be exposed when using AI-driven tools, highlighting a critical gap in data privacy and security for organizations deploying LLMs (TheStreet).
Another case involved an AI model unintentionally revealing personal identifiable information (PII) from its training data. This can happen when the model overfits or memorizes specific data during training and later reproduces it in responses, leading to unintended disclosures. Researchers discovered that ChatGPT could unintentionally reveal personal identifiable information (PII) from its training data. Researchers from institutions like Google DeepMind, the University of Washington, and ETH Zurich demonstrated that by using simple prompts, ChatGPT could divulge email addresses, phone numbers, and other sensitive data. They were able to make the AI reveal private information by asking it to repeat certain words indefinitely, which eventually led the model to output memorized data from its training set (Engadget).
Mitigation: Use data anonymization techniques and implement access controls to restrict sensitive information. Regularly review and update privacy policies. To mitigate the risk of sensitive information disclosure, organizations should adopt several strategies:
1. Data Sanitization and Scrubbing: Implement comprehensive measures to cleanse data inputs, removing identifiable and sensitive information before it is processed by LLMs. This includes robust input validation to prevent the model from being poisoned with malicious data.
2. Access Control: Ensure strict access controls for data fed into LLMs and external data sources. Apply the principle of least privilege to limit access to sensitive information.
3. Awareness and Training: Educate stakeholders on the risks and safeguards related to LLM applications. Emphasize the importance of privacy-centric development practices.
4. Monitoring and Anomaly Detection: Continuously monitor data inputs and outputs to identify and rectify potential data leaks quickly. Use anomaly detection systems to flag unusual patterns that might indicate a breach.
5. Policy and Governance: Develop and enforce robust data governance policies, including clear terms of use that inform users about data processing practices and provide options to opt out of data sharing
7. Insecure Plugin Design
Description: LLM plugins processing untrusted inputs and having insufficient access control can lead to severe exploits, such as remote code execution.
Example: A plugin with elevated privileges executes malicious code from untrusted inputs.
In the news: A notable example of insecure plugin design comes from the vulnerability found in the AI Engine plugin for WordPress, which affected over 50,000 active installations. This plugin, used for various AI-related functionalities such as creating chatbots and managing content, had a critical flaw that exposed sites to remote attacks. The vulnerability allowed attackers to inject malicious code, leading to potential data breaches and system compromises. This incident underscores the importance of secure design and implementation of plugins used in AI systems (Infosecurity Magazine).
Mitigation: Apply the principle of least privilege and conduct thorough security assessments of all plugins. Implement strong access controls and input validation. To prevent insecure plugin design vulnerabilities, consider the following strategies:
1. Strict Parameterized Input: Ensure plugins enforce strict parameterized input and include type and range checks on inputs. Use a second layer of typed calls to parse requests and apply validation and sanitization where freeform input is necessary.
2. Robust Authentication and Authorization: Plugins should use appropriate authentication mechanisms like OAuth2 and apply effective authorization and access controls to ensure only authorized actions are performed.
3. Thorough Testing: Conduct extensive testing, including Static Application Security Testing (SAST), Dynamic Application Security Testing (DAST), and Interactive Application Security Testing (IAST) to identify and mitigate vulnerabilities in plugin code.
4. Minimize Exposure: Design plugins to minimize the impact of insecure input parameter exploitation, following least-privilege access control principles and exposing as little functionality as possible while still performing the desired function.
5. Manual User Authorization: Require manual user authorization and confirmation for actions taken by sensitive plugins to ensure additional verification and oversight.
8. Excessive Agency
Description: Granting LLMs unchecked autonomy to take actions can lead to unintended consequences, jeopardizing reliability, privacy, and trust.
Example: An LLM-automated system makes unauthorized financial transactions based on flawed logic.
In the news: A notable case that highlights the risks of excessive agency is Microsoft's Recall feature in its Copilot+ PCs. Announced at Microsoft's Build conference, this feature continuously captures screenshots of user activity to allow for easy search and recall of past actions. This feature, designed to enhance user experience by capturing and storing screenshots of user activity, can be exploited to gain unauthorized access due to excessive permissions. The privilege escalation vulnerabilities allow attackers to bypass access controls, accessing and potentially misusing sensitive information stored by Recall (Wired).
Mitigation: Implement human-in-the-loop mechanisms and ensure critical decisions are subject to human oversight. Define clear boundaries for LLM autonomy. To prevent excessive agency, consider the following strategies:
1. Limit Plugin Functions: Ensure that plugins only have the minimum functions necessary for their intended purpose. For example, a plugin that reads emails should not have the capability to send or delete emails.
2. Restrict Permissions: Grant plugins the minimum permissions required. If a plugin only needs read access, ensure it does not have write, update, or delete permissions.
3. Avoid Open-Ended Functions: Use plugins with specific, granular functionality rather than those that allow broad, unrestricted actions.
4. Human-in-the-Loop: Implement manual approval processes for high-impact actions. For instance, require user confirmation before sending emails or performing financial transactions.
5. Rate Limiting: Implement rate limiting to control the number of actions an LLM can perform within a given timeframe, reducing the potential for abuse.
6. Monitor and Log Activities: Continuously monitor and log the activities of LLM plugins to detect and respond to unusual behavior promptly.
9. Overreliance
Description: Failing to critically assess LLM model outputs can compromise decision-making and lead to security vulnerabilities.
Example: Blindly trusting LLM-generated content in a critical application without verification.
In the news: A prominent example of excessive agency involved the use of AI-generated news sites. NewsGuard identified numerous websites that were primarily or entirely generated by AI tools like ChatGPT. These sites, which operated with minimal to no human oversight, were found to publish large volumes of content daily, often without adequate fact-checking or editorial review. This led to the proliferation of misinformation, including false news reports and misleading articles. These AI-driven content farms, such as those identified in the "Rise of the Newsbots" report, were typically designed to generate revenue from programmatic ads, exploiting the credibility gap created by their automated nature (MIT Technology Review) (NewsGuard) (euronews).
In a specific case, websites like "Biz Breaking News" and "News Live 79" published AI-generated articles that included error messages or generic responses typical of AI outputs, revealing their lack of human oversight. This reliance on AI to generate and manage content without sufficient controls resulted in the spread of disinformation and diminished the trustworthiness of these platforms (NewsGuard) (euronews).
Mitigation: Encourage a culture of critical thinking and implement validation processes for LLM outputs. Use LLMs as advisory tools rather than final decision-makers. To prevent the risks associated with overreliance, consider these strategies:
1. Regular Monitoring and Review: Implement continuous monitoring and review LLM outputs. Employ self-consistency or voting techniques to filter out inconsistent responses, enhancing output quality and reliability.
2. Cross-Check with Trusted Sources: Validate LLM outputs against trusted external sources to ensure the accuracy and reliability of the information.
3. Enhance Model Training: Fine-tune models with specific domain knowledge to reduce inaccuracies. Techniques like prompt engineering and parameter efficient tuning can improve model responses.
4. Implement Automatic Validation: Use automatic validation mechanisms to cross-verify generated outputs against known facts or data, adding an additional layer of security.
5. Human Oversight: Integrate human oversight for content validation and fact-checking to ensure high content accuracy and maintain credibility.
6. Risk Communication: Clearly communicate the risks and limitations associated with using LLMs to users, preparing them for potential issues and helping them make informed decisions.
10. Model Theft
Description: Unauthorized access to proprietary LLMs can result in the theft of intellectual property, competitive advantage, and sensitive information.
Example: An attacker gains access to and steals an organization's proprietary LLM model.
In the news: A significant instance of model theft involved research demonstrating the feasibility of extracting sensitive information and functions from large-scale language models (LLMs) like OpenAI’s GPT-3 and Google’s PaLM-2. The attack, detailed by a team including members from Google DeepMind and ETH Zurich, used sophisticated techniques to recover specific model components. This proof-of-concept showed that attackers could effectively replicate parts of a proprietary model by querying the API and collecting output to train a surrogate model, a method often termed as "model stealing" (GIGAZINE) (Unite.AI).
Another report by Unite.AI highlighted how attackers might use model theft to create shadow models. These shadow models can then be used to stage further attacks, including unauthorized access to sensitive information or to refine adversarial inputs that bypass security measures of the original model (Unite.AI (https://www.unite.ai/the-vulnerabilities-and-security-threats-facing-large-language-models/)).
Mitigation: Implement strong access controls and encryption for model storage. Regularly audit and monitor access to LLM models. To prevent model theft, organizations should implement several key strategies:
1. Strong Access Controls: Employ robust access control mechanisms such as Role-Based Access Control (RBAC) and the principle of least privilege. Ensure that only authorized personnel have access to LLM models and their related data.
2. Authentication and Monitoring: Use strong authentication methods and continous monitoring access logs to detect and respond to any suspicious or unauthorized activity promptly.
3. Centralized Model Registry: Maintain a centralized ML Model Inventory or Registry. This helps in managing access, implementing authentication, and logging activities related to model usage.
4. Restrict Network Access: Limit the LLM's access to network resources, internal services, and APIs to minimize exposure to potential attacks.
5. Adversarial Robustness Training: Conduct adversarial robustness training to detect and mitigate extraction queries. This helps in identifying and countering model extraction attempts.
6. Rate Limiting: Implement rate limiting on API calls to reduce the risk of data exfiltration from LLM applications.
7. Watermarking: Integrate watermarking techniques into the embedding and detection stages of the LLM lifecycle to help identify and track unauthorized usage of the model.
Conclusion
The OWASP Top 10 for LLMs and GenAI serves as a crucial resource for securing applications utilizing these advanced technologies. By understanding and mitigating these vulnerabilities, developers and practitioners can build safer, more reliable LLM applications. Staying informed and adopting these best practices will help ensure the responsible and secure deployment of LLMs and GenAI in various industries.
For more detailed information, examples and solutions recommendations, feel free to contact us.