Google Security Blog

Mitigating prompt injection attacks with a layered defense strategy
With the rapid adoption of generative AI, a new wave of threats is emerging across the industry with the aim of manipulating the AI systems themselves. One such emerging attack vector is indirect prompt injections. Unlike direct prompt injections, where an attacker directly inputs malicious commands into a prompt, indirect prompt injections involve hidden malicious instructions within external data sources. These may include emails, documents, or calendar invites that instruct AI to exfiltrate user data or execute other rogue actions. As more governments, businesses, and individuals adopt generative AI to get more done, this subtle yet potentially potent attack becomes increasingly pertinent across the industry, demanding immediate attention and robust security measures.
At Google, our teams have a longstanding precedent of investing in a defense-in-depth strategy, including robust evaluation, threat analysis, AI security best practices, AI red-teaming, adversarial training, and model hardening for generative AI tools. This approach enables safer adoption of Gemini in Google Workspace and the Gemini app (we refer to both in this blog as “Gemini” for simplicity). Below we describe our prompt injection mitigation product strategy based on extensive research, development, and deployment of improved security mitigations.
A layered security approach
Google has taken a layered security approach introducing security measures designed for each stage of the prompt lifecycle. From Gemini 2.5 model hardening, to purpose-built machine learning (ML) models detecting malicious instructions, to system-level safeguards, we are meaningfully elevating the difficulty, expense, and complexity faced by an attacker. This approach compels adversaries to resort to methods that are either more easily identified or demand greater resources.
Our model training with adversarial data significantly enhanced our defenses against indirect prompt injection attacks in Gemini 2.5 models (technical details). This inherent model resilience is augmented with additional defenses that we built directly into Gemini, including:
Prompt injection content classifiers
Security thought reinforcement
Markdown sanitization and suspicious URL redaction
User confirmation framework
End-user security mitigation notifications
This layered approach to our security strategy strengthens the overall security framework for Gemini – throughout the prompt lifecycle and across diverse attack techniques.
1. Prompt injection content classifiers
Through collaboration with leading AI security researchers via Google's AI Vulnerability Reward Program (VRP), we've curated one of the world’s most advanced catalogs of generative AI vulnerabilities and adversarial data. Utilizing this resource, we built and are in the process of rolling out proprietary machine learning models that can detect malicious prompts and instructions within various formats, such as emails and files, drawing from real-world examples. Consequently, when users query Workspace data with Gemini, the content classifiers filter out harmful data containing malicious instructions, helping to ensure a secure end-to-end user experience by retaining only safe content. For example, if a user receives an email in Gmail that includes malicious instructions, our content classifiers help to detect and disregard malicious instructions, then generate a safe response for the user. This is in addition to built-in defenses in Gmail that automatically block more than 99.9% of spam, phishing attempts, and malware.
A diagram of Gemini’s actions based on the detection of the malicious instructions by content classifiers.
2. Security thought reinforcement
This technique adds targeted security instructions surrounding the prompt content to remind the large language model (LLM) to perform the user-directed task and ignore any adversarial instructions that could be present in the content. With this approach, we steer the LLM to stay focused on the task and ignore harmful or malicious requests added by a threat actor to execute indirect prompt injection attacks.
A diagram of Gemini’s actions based on additional protection provided by the security thought reinforcement technique.
3. Markdown sanitization and suspicious URL redaction
Our markdown sanitizer identifies external image URLs and will not render them, making the “EchoLeak” 0-click image rendering exfiltration vulnerability not applicable to Gemini. From there, a key protection against prompt injection and data exfiltration attacks occurs at the URL level. With external data containing dynamic URLs, users may encounter unknown risks as these URLs may be designed for indirect prompt injections and data exfiltration attacks. Malicious instructions executed on a user's behalf may also generate harmful URLs. With Gemini, our defense system includes suspicious URL detection based on Google Safe Browsing to differentiate between safe and unsafe links, providing a secure experience by helping to prevent URL-based attacks. For example, if a document contains malicious URLs and a user is summarizing the content with Gemini, the suspicious URLs will be redacted in Gemini’s response.
Gemini in Gmail provides a summary of an email thread. In the summary, there is an unsafe URL. That URL is redacted in the response and is replaced with the text “suspicious link removed”.
4. User confirmation framework
Gemini also features a contextual user confirmation system. This framework enables Gemini to require user confirmation for certain actions, also known as “Human-In-The-Loop” (HITL), using these responses to bolster security and streamline the user experience. For example, potentially risky operations like deleting a calendar event may trigger an explicit user confirmation request, thereby helping to prevent undetected or immediate execution of the operation.
The Gemini app with instructions to delete all events on Saturday. Gemini responds with the events found on Google Calendar and asks the user to confirm this action.
5. End-user security mitigation notifications
A key aspect to keeping our users safe is sharing details on attacks that we’ve stopped so users can watch out for similar attacks in the future. To that end, when security issues are mitigated with our built-in defenses, end users are provided with contextual information allowing them to learn more via dedicated help center articles. For example, if Gemini summarizes a file containing malicious instructions and one of Google’s prompt injection defenses mitigates the situation, a security notification with a “Learn more” link will be displayed for the user. Users are encouraged to become more familiar with our prompt injection defenses by reading the Help Center article.
Gemini in Docs with instructions to provide a summary of a file. Suspicious content was detected and a response was not provided. There is a yellow security notification banner for the user and a statement that Gemini’s response has been removed, with a “Learn more” link to a relevant Help Center article.
Moving forwardOur comprehensive prompt injection security strategy strengthens the overall security framework for Gemini. Beyond the techniques described above, it also involves rigorous testing through manual and automated red teams, generative AI security BugSWAT events, strong security standards like our Secure AI Framework (SAIF), and partnerships with both external researchers via the Google AI Vulnerability Reward Program (VRP) and industry peers via the Coalition for Secure AI (CoSAI). Our commitment to trust includes collaboration with the security community to responsibly disclose AI security vulnerabilities, share our latest threat intelligence on ways we see bad actors trying to leverage AI, and offering insights into our work to build stronger prompt injection defenses.
Working closely with industry partners is crucial to building stronger protections for all of our users. To that end, we’re fortunate to have strong collaborative partnerships with numerous researchers, such as Ben Nassi (Confidentiality), Stav Cohen (Technion), and Or Yair (SafeBreach), as well as other AI Security researchers participating in our BugSWAT events and AI VRP program. We appreciate the work of these researchers and others in the community to help us red team and refine our defenses.
We continue working to make upcoming Gemini models inherently more resilient and add additional prompt injection defenses directly into Gemini later this year. To learn more about Google’s progress and research on generative AI threat actors, attack techniques, and vulnerabilities, take a look at the following resources:
Beyond Speculation: Data-Driven Insights into AI and Cybersecurity (RSAC 2025 conference keynote) from Google’s Threat Intelligence Group (GTIG)
Adversarial Misuse of Generative AI (blog post) from Google’s Threat Intelligence Group (GTIG)
Google's Approach for Secure AI Agents (white paper) from Google’s Secure AI Framework (SAIF) team
Advancing Gemini's security safeguards (blog post) from Google’s DeepMind team
Lessons from Defending Gemini Against Indirect Prompt Injections (white paper) from Google’s DeepMind team
Sustaining Digital Certificate Security - Upcoming Changes to the Chrome Root Store
Note: Google Chrome communicated its removal of default trust of Chunghwa Telecom and Netlock in the public forum on May 30, 2025.
The Chrome Root Program Policy states that Certification Authority (CA) certificates included in the Chrome Root Store must provide value to Chrome end users that exceeds the risk of their continued inclusion. It also describes many of the factors we consider significant when CA Owners disclose and respond to incidents. When things don’t go right, we expect CA Owners to commit to meaningful and demonstrable change resulting in evidenced continuous improvement.
Chrome's confidence in the reliability of Chunghwa Telecom and Netlock as CA Owners included in the Chrome Root Store has diminished due to patterns of concerning behavior observed over the past year. These patterns represent a loss of integrity and fall short of expectations, eroding trust in these CA Owners as publicly-trusted certificate issuers trusted by default in Chrome. To safeguard Chrome’s users, and preserve the integrity of the Chrome Root Store, we are taking the following action.
Upcoming change in Chrome 139 and higher:
- Transport Layer Security (TLS) server authentication certificates validating to the following root CA certificates whose earliest Signed Certificate Timestamp (SCT) is dated after July 31, 2025 11:59:59 PM UTC, will no longer be trusted by default.
- OU=ePKI Root Certification Authority,O=Chunghwa Telecom Co., Ltd.,C=TW
- CN=HiPKI Root CA - G1,O=Chunghwa Telecom Co., Ltd.,C=TW
- CN=NetLock Arany (Class Gold) Főtanúsítvány,OU=Tanúsítványkiadók (Certification Services),O=NetLock Kft.,L=Budapest,C=HU
- TLS server authentication certificates validating to the above set of roots whose earliest SCT is on or before July 31, 2025 11:59:59 PM UTC, will be unaffected by this change.
This approach attempts to minimize disruption to existing subscribers using a previously announced Chrome feature to remove default trust based on the SCTs in certificates.
Additionally, should a Chrome user or enterprise explicitly trust any of the above certificates on a platform and version of Chrome relying on the Chrome Root Store (e.g., explicit trust is conveyed through a Group Policy Object on Windows), the SCT-based constraints described above will be overridden and certificates will function as they do today.
To further minimize risk of disruption, website operators are encouraged to review the “Frequently Asked Questions" listed below.
Why is Chrome taking action?CAs serve a privileged and trusted role on the internet that underpin encrypted connections between browsers and websites. With this tremendous responsibility comes an expectation of adhering to reasonable and consensus-driven security and compliance expectations, including those defined by the CA/Browser Forum TLS Baseline Requirements.
Over the past several months and years, we have observed a pattern of compliance failures, unmet improvement commitments, and the absence of tangible, measurable progress in response to publicly disclosed incident reports. When these factors are considered in aggregate and considered against the inherent risk each publicly-trusted CA poses to the internet, continued public trust is no longer justified.
When will this action happen?The action of Chrome, by default, no longer trusting new TLS certificates issued by these CAs will begin on approximately August 1, 2025, affecting certificates issued at that point or later.
This action will occur in Versions of Chrome 139 and greater on Windows, macOS, ChromeOS, Android, and Linux. Apple policies prevent the Chrome Certificate Verifier and corresponding Chrome Root Store from being used on Chrome for iOS.
What is the user impact of this action?By default, Chrome users in the above populations who navigate to a website serving a certificate from Chunghwa Telecom or Netlock issued after July 31, 2025 will see a full page interstitial similar to this one.
Certificates issued by other CAs are not impacted by this action.
How can a website operator tell if their website is affected?Website operators can determine if they are affected by this action by using the Chrome Certificate Viewer.
Use the Chrome Certificate Viewer
- Navigate to a website (e.g., https://www.google.com)
- Click the “Tune" icon
- Click “Connection is Secure"
- Click “Certificate is Valid" (the Chrome Certificate Viewer will open)
- Website owner action is not required, if the “Organization (O)” field listed beneath the “Issued By" heading does not contain “Chunghwa Telecom" , “行政院” , “NETLOCK Ltd.”, or “NETLOCK Kft.”
- Website owner action is required, if the “Organization (O)” field listed beneath the “Issued By" heading contains “Chunghwa Telecom" , “行政院” , “NETLOCK Ltd.”, or “NETLOCK Kft.”
We recommend that affected website operators transition to a new publicly-trusted CA Owner as soon as reasonably possible. To avoid adverse website user impact, action must be completed before the existing certificate(s) expire if expiry is planned to take place after July 31, 2025.
While website operators could delay the impact of blocking action by choosing to collect and install a new TLS certificate issued from Chunghwa Telecom or Netlock before Chrome’s blocking action begins on August 1, 2025, website operators will inevitably need to collect and install a new TLS certificate from one of the many other CAs included in the Chrome Root Store.
Can I test these changes before they take effect?Yes.
A command-line flag was added beginning in Chrome 128 that allows administrators and power users to simulate the effect of an SCTNotAfter distrust constraint as described in this blog post.
How to: Simulate an SCTNotAfter distrust
1. Close all open versions of Chrome
2. Start Chrome using the following command-line flag, substituting variables described below with actual values
--test-crs-constraints=$[Comma Separated List of Trust Anchor Certificate SHA256 Hashes]:sctnotafter=$[epoch_timestamp]
3. Evaluate the effects of the flag with test websites
Learn more about command-line flags here.
I use affected certificates for my internal enterprise network, do I need to do anything?Beginning in Chrome 127, enterprises can override Chrome Root Store constraints like those described in this blog post by installing the corresponding root CA certificate as a locally-trusted root on the platform Chrome is running (e.g., installed in the Microsoft Certificate Store as a Trusted Root CA).
How do enterprises add a CA as locally-trusted?Customer organizations should use this enterprise policy or defer to platform provider guidance for trusting root CA certificates.
What about other Google products?Other Google product team updates may be made available in the future.
Tracking the Cost of Quantum Factoring
Google Quantum AI's mission is to build best in class quantum computing for otherwise unsolvable problems. For decades the quantum and security communities have also known that large-scale quantum computers will at some point in the future likely be able to break many of today’s secure public key cryptography algorithms, such as Rivest–Shamir–Adleman (RSA). Google has long worked with the U.S. National Institute of Standards and Technology (NIST) and others in government, industry, and academia to develop and transition to post-quantum cryptography (PQC), which is expected to be resistant to quantum computing attacks. As quantum computing technology continues to advance, ongoing multi-stakeholder collaboration and action on PQC is critical.
In order to plan for the transition from today’s cryptosystems to an era of PQC, it's important the size and performance of a future quantum computer that could likely break current cryptography algorithms is carefully characterized. Yesterday, we published a preprint demonstrating that 2048-bit RSA encryption could theoretically be broken by a quantum computer with 1 million noisy qubits running for one week. This is a 20-fold decrease in the number of qubits from our previous estimate, published in 2019. Notably, quantum computers with relevant error rates currently have on the order of only 100 to 1000 qubits, and the National Institute of Standards and Technology (NIST) recently released standard PQC algorithms that are expected to be resistant to future large-scale quantum computers. However, this new result does underscore the importance of migrating to these standards in line with NIST recommended timelines.
Estimated resources for factoring have been steadily decreasing
Quantum computers break RSA by factoring numbers, using Shor’s algorithm. Since Peter Shor published this algorithm in 1994, the estimated number of qubits needed to run it has steadily decreased. For example, in 2012, it was estimated that a 2048-bit RSA key could be broken by a quantum computer with a billion physical qubits. In 2019, using the same physical assumptions – which consider qubits with a slightly lower error rate than Google Quantum AI’s current quantum computers – the estimate was lowered to 20 million physical qubits.
Historical estimates of the number of physical qubits needed to factor 2048-bit RSA integers.
This result represents a 20-fold decrease compared to our estimate from 2019
The reduction in physical qubit count comes from two sources: better algorithms and better error correction – whereby qubits used by the algorithm ("logical qubits") are redundantly encoded across many physical qubits, so that errors can be detected and corrected.
On the algorithmic side, the key change is to compute an approximate modular exponentiation rather than an exact one. An algorithm for doing this, while using only small work registers, was discovered in 2024 by Chevignard and Fouque and Schrottenloher. Their algorithm used 1000x more operations than prior work, but we found ways to reduce that overhead down to 2x.
On the error correction side, the key change is tripling the storage density of idle logical qubits by adding a second layer of error correction. Normally more error correction layers means more overhead, but a good combination was discovered by the Google Quantum AI team in 2023. Another notable error correction improvement is using "magic state cultivation", proposed by the Google Quantum AI team in 2024, to reduce the workspace required for certain basic quantum operations. These error correction improvements aren't specific to factoring and also reduce the required resources for other quantum computations like in chemistry and materials simulation.
Security implications
NIST recently concluded a PQC competition that resulted in the first set of PQC standards. These algorithms can already be deployed to defend against quantum computers well before a working cryptographically relevant quantum computer is built.
To assess the security implications of quantum computers, however, it’s instructive to additionally take a closer look at the affected algorithms (see here for a detailed look): RSA and Elliptic Curve Diffie-Hellman. As asymmetric algorithms, they are used for encryption in transit, including encryption for messaging services, as well as digital signatures (widely used to prove the authenticity of documents or software, e.g. the identity of websites). For asymmetric encryption, in particular encryption in transit, the motivation to migrate to PQC is made more urgent due to the fact that an adversary can collect ciphertexts, and later decrypt them once a quantum computer is available, known as a “store now, decrypt later” attack. Google has therefore been encrypting traffic both in Chrome and internally, switching to the standardized version of ML-KEM once it became available. Notably not affected is symmetric cryptography, which is primarily deployed in encryption at rest, and to enable some stateless services.
For signatures, things are more complex. Some signature use cases are similarly urgent, e.g., when public keys are fixed in hardware. In general, the landscape for signatures is mostly remarkable due to the higher complexity of the transition, since signature keys are used in many different places, and since these keys tend to be longer lived than the usually ephemeral encryption keys. Signature keys are therefore harder to replace and much more attractive targets to attack, especially when compute time on a quantum computer is a limited resource. This complexity likewise motivates moving earlier rather than later. To enable this, we have added PQC signature schemes in public preview in Cloud KMS.
The initial public draft of the NIST internal report on the transition to post-quantum cryptography standards states that vulnerable systems should be deprecated after 2030 and disallowed after 2035. Our work highlights the importance of adhering to this recommended timeline.
More from Google on PQC: https://cloud.google.com/security/resources/post-quantum-cryptography?e=48754805