
Introducing Virgo Network, Google’s scale-out AI data center fabric

Wed, 22 Apr 2026 12:00:00 +0000

The AI era requires a fundamental rethink of physical cloud architecture, and of networking in particular. With foundation model parameter counts growing exponentially, traditional general-purpose networks are reaching their breaking points. To fuel the next decade of machine learning, Google designed Virgo Network, a new megascale AI data center fabric that embraces a "campus-as-a-computer" philosophy and underpins our AI Hypercomputer.

Legacy network designs cannot keep up with several constraints of modern AI:

Massive scale: Training demands now exceed the power and space of a single data center, requiring unified, multi-data-center domains.
Explosive bandwidth growth: Because foundation model training is heavily network-bound, the required bandwidth per accelerator has surged over the last few years, creating throughput and congestion bottlenecks for older architectures.
Synchronized bursts: Intense traffic spikes at millisecond and sub-millisecond timescales (figure 1) put immense pressure on network buffers. Because synchronous training advances only as fast as its slowest participant, even a single "straggler" node can throttle the entire cluster’s performance.
Low latency: ML serving requires fast, consistent response times to deliver real-time inference, making strict latency control a critical architectural constraint.

Figure 1: Sub-millisecond line-rate bursts of an AI training workload
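The straggler effect can be made concrete with a short Python sketch. It assumes a fully synchronous data-parallel step that ends at a collective barrier (such as an all-reduce), so the step lasts as long as the slowest worker. The worker timings are hypothetical numbers chosen for illustration, not measurements from Virgo Network.

```python
from statistics import mean


def synchronous_step_time(worker_times_ms: list[float]) -> float:
    """A synchronous step ends only when the slowest worker reaches the barrier."""
    return max(worker_times_ms)


def cluster_efficiency(worker_times_ms: list[float]) -> float:
    """Average useful work per worker divided by the step time everyone waits for."""
    return mean(worker_times_ms) / synchronous_step_time(worker_times_ms)


# Hypothetical cluster: 1,023 workers finish in 100 ms, one straggler takes 150 ms.
times = [100.0] * 1023 + [150.0]
print(synchronous_step_time(times))         # 150.0
print(round(cluster_efficiency(times), 3))  # 0.667
```

In this toy cluster, a single worker running 50% slow costs every other worker about a third of its step time, which is why tail behavior matters as much as average throughput.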

Reimagining the data center network

Meeting the demands of the AI era requires a fundamental shift away from general-purpose network design towards a specialized flat, low-latency network architecture. To address the unique scale and latency constraints, we leverage our proven Jupiter network for north-south traffic and are introducing a new fabric for east-west communication. The resulting architecture consists of three distinct and specialized layers that operate as one unified compute domain:

Scale-up domain: A high-bandwidth, low-latency interconnect fabric designed for tightly coupled communication between accelerators within a single pod.
Scale-out accelerator fabric (east-west): A dedicated accelerator-to-accelerator remote direct memory access (RDMA) fabric optimized for massive horizontal scale across pods. This layer is engineered for deterministic latency and maximum resilience, to provide high “goodput” for the ML workload.
Jupiter front-end network (north-south): A high-capacity fabric that provides fast, reliable access to distributed storage and general-purpose compute resources. It ensures that data access does not become a bottleneck for training and serving workloads, and it is also used to scale across multiple sites for very large training runs.

This architectural decoupling provides key strategic advantages:

Independent evolution: We can evolve and upgrade each network domain independently, preventing system-wide disruptions while accelerating the innovation cycle.
Dedicated scale-out bandwidth: A non-blocking network delivers massive bisection bandwidth to accelerators for critical training tasks.
ML and network co-design: The network is built in lockstep with each new generation of ML accelerators, helping ensure the fabric is matched to the hardware it supports.

Figure 2: Data center network architecture

Introducing Virgo Network: Megascale data center fabric

Virgo Network is a scale-out fabric designed for the extreme requirements of modern AI workloads. Built on high-radix switches that reduce network layers by allowing more ports per switch, it employs a flat, two-layer non-blocking topology. Compared with traditional data center networks, this significantly reduces latency by minimizing network tiers. It features a multi-planar design with independent control domains to connect accelerators (figure 3). The accelerator racks also connect with the Jupiter north-south fabric to access compute and storage services. Together, this streamlined architecture delivers the massive bisection bandwidth and deterministic low latency necessary for both distributed training and serving workloads.

Figure 3: Megascale data center fabric (Virgo Network)
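Why high-radix switches flatten the network can be seen from the textbook fat-tree (folded Clos) model. The Python sketch below assumes each lower-tier switch splits its ports evenly between downlinks and uplinks, the top tier points all ports down, and each endpoint uses one port per plane. The radix values and the 100,000-endpoint target are illustrative assumptions; the post does not state Virgo Network's switch radix.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Endpoints a non-blocking fat tree with `tiers` switch tiers can attach.

    Lower-tier switches split ports evenly between downlinks and uplinks; the
    top tier points every port down, giving 2 * (radix / 2) ** tiers endpoints.
    """
    if radix < 4 or radix % 2:
        raise ValueError("radix must be an even number >= 4")
    return 2 * (radix // 2) ** tiers


def min_tiers(endpoints: int, radix: int) -> int:
    """Fewest switch tiers that attach `endpoints` at full bisection bandwidth."""
    tiers = 1
    while max_endpoints(radix, tiers) < endpoints:
        tiers += 1
    return tiers


# Hypothetical target of 100,000 endpoints per plane.
for radix in (64, 512):
    print(radix, min_tiers(100_000, radix))
# 64 4
# 512 2
```

Under these assumptions, 512-port switches reach the target in two tiers while 64-port switches need four, and each extra tier adds switch hops to the worst-case path. In a multi-planar design, each plane is a separate fabric of this kind.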

Virgo Network is the foundation of our next-generation accelerator designs and delivers the following advantages:

Massive fabric scale: Virgo Network can link 134,000 chips (TPU 8t) with up to 47 petabits/sec of non-blocking bisection bandwidth in a single fabric.
Generational performance leap: With up to 4x the bandwidth per accelerator (TPU 8t) over the previous generation, Virgo Network delivers the bandwidth needed to get the full power out of every chip.
Predictable low latency: Virgo Network delivers 40% lower unloaded fabric latency for TPUs compared to the previous generation, leading to more predictable performance for latency-sensitive AI workloads.
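As a rough sanity check on the figures above, the quoted aggregate bandwidth can be spread evenly across the chips, and the latency reduction expressed as a ratio. This Python sketch uses only the published numbers plus one hypothetical baseline latency. The per-chip result is an average share, not a per-port link speed, because the post does not say whether the bisection figure counts one direction or both.

```python
# Published figures from the list above (approximate).
CHIPS = 134_000
BISECTION_BPS = 47e15          # 47 petabits per second
LATENCY_REDUCTION = 0.40       # 40% lower unloaded fabric latency

# Average share of the aggregate figure per chip (not a per-port link speed).
per_chip_gbps = BISECTION_BPS / CHIPS / 1e9
print(f"{per_chip_gbps:.1f} Gb/s per chip")   # 350.7 Gb/s per chip


def new_latency(previous_latency_us: float) -> float:
    """A 40% reduction leaves 60% of the previous generation's latency."""
    return previous_latency_us * (1 - LATENCY_REDUCTION)


print(f"{new_latency(10.0):.1f} us")   # 6.0 us for a hypothetical 10 us baseline
```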

Improving reliability at scale

In a system supporting hundreds of thousands of chips, hardware failures are inevitable. Because a single faulty component can disrupt a synchronized training job, reliability at scale is a primary focus. To maximize workload goodput, we designed the Virgo Network architecture around fault isolation, deep observability, and the rapid mitigation of hangs and stragglers.

At this scale, system-wide resilience requires a solid network foundation. Virgo Network integrates independent switching planes that provide robust fault isolation, protecting cluster-wide goodput from being degraded by localized hardware failures.

Figure 4: How fail-stop and fail-slow impact MTTR

Building on this foundation, we optimize the software and orchestration stack to maximize mean time between interruptions (MTBI) and minimize mean time to recovery (MTTR) in two primary areas:

Observability: Reliability at scale requires high-fidelity visibility. We use sub-millisecond telemetry to monitor network systems. This deep visibility allows us to detect transient congestion, optimize buffer management, and pinpoint the root causes of slowdowns across the hardware and software stack.
Identifying stragglers and hangs: Proactive monitoring is critical for identifying nodes that are experiencing performance degradation (stragglers) or that have stopped responding completely (hangs). By rapidly localizing these bottlenecks with automated straggler detection and newly added hang detection, we keep the training job moving and protect it from localized slowdowns.
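The straggler and hang detection described above can be approximated with simple rules. The Python sketch below is a hypothetical illustration, not Google's detector: it flags a worker as hung when it has reported no progress for longer than a timeout, and as a straggler when its last step took noticeably longer than the median of the responsive workers. The `WorkerStatus` type, the 1.2x ratio, and the 60-second timeout are all assumed values.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class WorkerStatus:
    name: str
    last_step_ms: float      # duration of the worker's most recent training step
    last_heartbeat_s: float  # wall-clock time of the worker's last progress report


def classify_workers(workers: list[WorkerStatus], now_s: float,
                     straggler_ratio: float = 1.2,
                     hang_timeout_s: float = 60.0) -> tuple[list[str], list[str]]:
    """Return (stragglers, hangs) under hypothetical thresholds."""
    # Hang: no progress reported within the timeout.
    hangs = [w.name for w in workers if now_s - w.last_heartbeat_s > hang_timeout_s]
    hung = set(hangs)
    live = [w for w in workers if w.name not in hung]
    if not live:
        return [], hangs
    # Straggler: still responsive, but much slower than the typical live worker.
    typical = median(w.last_step_ms for w in live)
    stragglers = [w.name for w in live if w.last_step_ms > straggler_ratio * typical]
    return stragglers, hangs


workers = [
    WorkerStatus("w0", 100.0, 999.0),
    WorkerStatus("w1", 102.0, 998.0),
    WorkerStatus("w2", 160.0, 999.0),   # responsive but slow
    WorkerStatus("w3", 100.0, 900.0),   # silent for 100 s
]
print(classify_workers(workers, now_s=1000.0))   # (['w2'], ['w3'])
```

Comparing against the median of responsive workers, rather than a fixed target, keeps the straggler rule meaningful as step times change across models and batch sizes.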

The foundation of the AI Hypercomputer

Virgo Network is a reimagined scale-out data center network custom-built for the stringent demands of modern AI workloads. This flat, multi-planar architecture unifies accelerators across pods into a single compute domain, addressing the bandwidth and scale limitations of traditional networks. By providing robust fault isolation directly at the hardware level, Virgo Network serves as the foundation for system-wide resilience, protecting synchronized workloads from localized hardware faults.

Ultimately, Virgo Network delivers the scale, predictable latency, and reliability necessary to accelerate the agentic AI era. To learn more about how we are building infrastructure for the future of AI, visit our AI infrastructure solutions page, explore the technical documentation, or attend the dedicated breakout session at Google Cloud Next.

Next ’26: Redefining security for the AI era with Google Cloud and Wiz

Wed, 22 Apr 2026 12:00:00 +0000


The AI era demands a new era of security. Organizations face the dual challenge of harnessing the potential of AI while defending against its malicious use, and Google Cloud can help you adapt and thrive.

The latest research from Google Cloud shows that adversaries are using AI to increase the speed, scale, and sophistication of attacks. Meanwhile, M-Trends 2026 showed that, over the last three years, increased coordination among threat actors has cut the time to hand off access from an initial-access actor to a secondary threat actor from eight hours to 22 seconds.

Today at Google Cloud Next, we are showcasing how Google Cloud can help you defend against increasingly sophisticated threats at machine speed, protect AI and multicloud environments, and secure cloud workloads at scale.

Delivering agentic defense

Our full-stack AI approach, from chips to models, gives you a competitive advantage, with tighter integration and faster velocity to help protect customers. Google can act on insights from the world’s largest threat observatory and Mandiant frontline experts, and we also bring cutting-edge research and breakthroughs from Google DeepMind to help make your platforms more secure.

Today we are introducing three new agents in Google Security Operations to help you defend at the speed of AI.

Threat Hunting agent, now in preview, can help teams proactively hunt for novel attack patterns and stealthy adversary behaviors that bypass traditional defenses.
Detection Engineering agent, now in preview, can identify coverage gaps and create new detections for threat scenarios, reducing toil and transforming detection creation from a manual craft into an automated science.
Third-Party Context agent, coming soon to preview, can enrich your workflows with contextual data from third-party sources.

Initiating a threat hunt with the Threat Hunting agent

Our Triage and Investigation agent processed over 5 million alerts in the last year, reducing a typical 30-minute manual analysis to 60 seconds with Gemini.

“Operational resilience and cybersecurity are the bedrock of customer trust at BBVA. By integrating advanced artificial intelligence, such as the Triage and Investigation agent, we are able to scale in new ways,” said Diego Martinez Blanco, head of Security Technology, BBVA.

“It handles the initial heavy lifting and filters out false positives so we can prioritize issues that require human attention. The agent's transparent explanations allow our team to understand recommendations and ultimately dedicate our resources to more complex investigations,” he said.

You can build your own security agents with remote Google Cloud Model Context Protocol (MCP) server support for Google Security Operations, now generally available. To make it even easier, you can also access the MCP server client directly from the Google Security Operations chat interface, available in preview.
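MCP messages are JSON-RPC 2.0 requests, and the MCP specification defines a `tools/call` method whose parameters carry the tool `name` and its `arguments`. The Python sketch below only builds such a request. The tool name and arguments are hypothetical (a real client would discover the available tools with `tools/list`), and transport, session initialization, and authentication to the remote Google Security Operations server are deliberately left out.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids must be unique within a session


def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request invoking an MCP tool via `tools/call`."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool name and arguments, for illustration only.
print(mcp_tool_call("search_security_events", {"query": "failed logins", "hours": 24}))
```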


Findings report created by the Threat Hunting agent

Security teams can also automate response actions with agentic automation in Google Security Operations. To further move teams from manual triage to agentic defense, we introduced dark web intelligence in Google Threat Intelligence, now in preview. Internal tests show it can analyze millions of daily external events with 98% accuracy to elevate threats that truly matter.

"IDC found that organizations experienced measurable operational gains, including substantial reductions in mean time to detect and mean time to respond, fewer false positives, and higher analyst productivity with AI-powered context and automation. These operational improvements translate into significant business outcomes, such as shorter disruption periods, lower incident-related costs, and improved executive confidence in security posture and decision-making," said Christopher Kissel, research vice president, IDC. "Organizations leveraging an intelligence-led, AI-augmented approach to modern security operations with Google Cloud's agentic defense can realize a strong ROI."

New partner-supported workflows for Google Security Operations

Today, we are also announcing a robust cohort of new partner integrations for Google Security Operations. Designed to deliver high-fidelity security workflows right out of the box, our latest participating Google Cloud Security integration ecosystem partners include Darktrace, Gigamon, and SAP.

Protecting AI and cloud applications across any infrastructure

AI and cloud applications are built across multiple platforms and models. To protect them end-to-end, we want to make it easier and faster to mitigate risk, regardless of where and how you build. Our coverage includes major cloud environments like Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud; software-as-a-service (SaaS) environments like OpenAI; and even custom hosted environments.

Wiz, now a part of Google Cloud, expands and deepens our ability to protect the apps you build and run. Wiz empowers you to quickly and securely adopt AI, while also helping protect the AI development lifecycle.

Wiz announced its AI-Application Protection Platform (AI-APP) at the RSA Conference, providing deep visibility, risk posture, and runtime analysis for your AI applications. Wiz also announced Wiz Security Agents and Wiz Workflows, helping you identify and respond to risks and threats at machine speed.

Today, we’re taking our commitment to secure customers in any cloud, platform, and AI environment further. Wiz now supports Databricks as well as new agent studios like AWS Agentcore, Gemini Enterprise Agent Platform, Microsoft Azure Copilot Studio, and Salesforce Agentforce, so customers gain visibility however their teams choose to build.

In addition, Wiz continues to support security ecosystems with integrations to the outer layer of the cloud, including Google Cloud Apigee, Cloudflare AI Security for Apps, and the Vercel platform, further extending the power of the Wiz Security Graph. We’ve also updated how we integrate security detections from Wiz Defend with Google Security Operations and Mandiant Threat Defense to help analysts more easily configure automatic threat information forwarding.

Wiz is also announcing new capabilities designed to secure the AI-native development lifecycle, helping teams to innovate faster and more securely:

Secure vibe-coded applications: Wiz is announcing a new integration, generally available in May, that runs Wiz security scanning directly inside the Lovable platform so vulnerabilities, secrets, and misconfigurations caught by Wiz surface in Lovable's built-in security view, right where teams are already building.
Secure AI-generated code: Wiz removes risks from AI-generated code the moment it is created. Inline AI security hooks integrate directly into IDEs and agent workflows to evaluate prompts and scan AI-generated output instantly, injecting security guardrails before the code is ever committed.
Agent-based remediation: Wiz Skills equip coding agents and AI-native IDEs with full code-to-cloud context and validated attack surface findings from the Wiz Security Graph. These capabilities enable teams to trigger automated, agent-driven remediation workflows either locally from the developer's individual IDE or globally at the repository and pull request level within your version control system.
Eliminate shadow AI: Wiz’s dynamic AI-Bill of Materials (AI-BOM) automatically inventories all AI frameworks, models, and IDE extensions across your environment. This provides complete visibility into what is writing code across your stack, allowing you to track sanctioned corporate tools like Gemini Code Assist and GitHub Copilot while simultaneously uncovering unapproved shadow AI plugins.

You can learn more about the Wiz announcements here.

Securing your agents and the agentic web

In addition to securing your cloud and AI workloads, Google Cloud’s secure-by-design foundation can help you innovate at the speed of AI — from agents to fraud defense to the web.

Securing and governing agents with the Gemini Enterprise Agent Platform
To help you build, orchestrate, govern, and optimize agents, today we are announcing Gemini Enterprise Agent Platform, including:

Agent Identity, which enables access management and AI governance at scale. This new capability gives each agent a unique identity so it can operate autonomously with specific authentication flows and scoped human delegation.
Agent Gateway, which enables policy enforcement for all agent-to-agent and agent-to-tool connections. It governs your enterprise agent traffic and understands agent protocols like MCP and Agent2Agent (A2A) to inspect and secure every agent interaction.
Model Armor, our runtime protection for model and agent interactions, now integrates with Agent Gateway, Agent Runtime, and LangChain (in preview) and with Firebase (generally available), helping developers add inline enforcement and sanitization of agent traffic and interactions without changing code. These integrations extend Model Armor's protection against runtime risks such as prompt injection, tool poisoning, and sensitive data leakage across Google Cloud services and our AI portfolio.
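Policy enforcement for agent-to-agent and agent-to-tool traffic comes down to a default-deny check on who is calling what, over which protocol. The Python sketch below is a hypothetical model of that idea, not Agent Gateway's actual policy language, which the post does not describe; the `Rule` shape, agent identities, and tool names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    caller: str    # identity of the calling agent
    target: str    # agent or tool being called
    protocol: str  # e.g. "mcp" for agent-to-tool, "a2a" for agent-to-agent


# Hypothetical policy: any (caller, target, protocol) not listed is denied.
ALLOWED = {
    Rule("agents/support-triage", "tools/ticket-lookup", "mcp"),
    Rule("agents/support-triage", "agents/refund-approver", "a2a"),
}


def is_allowed(caller: str, target: str, protocol: str) -> bool:
    """Default-deny check: permit only explicitly allowlisted interactions."""
    return Rule(caller, target, protocol) in ALLOWED


print(is_allowed("agents/support-triage", "tools/ticket-lookup", "mcp"))   # True
print(is_allowed("agents/support-triage", "tools/payments-api", "mcp"))    # False
```

Keying rules on the caller's agent identity, rather than on a network address, is one way per-agent identities and gateway enforcement can fit together.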

Securing the agentic web with Google Cloud Fraud Defense and Chrome Enterprise
Today, we are evolving reCAPTCHA with the launch of Google Cloud Fraud Defense, generally available. This comprehensive platform is designed to discern the legitimacy and authorization of bots, humans, and agents. Using the same scale and signals that protect Google’s own ecosystem, Fraud Defense will soon offer, in preview, agent-specific capabilities that can help secure the digital commerce journey for both human users and AI agents, from account creation and login to payment and checkout.

Our commitment to securing AI extends to the browser, a vital endpoint for interacting with AI. Chrome Enterprise provides comprehensive data protection for the AI era with the visibility and controls needed to embrace AI safely without compromising corporate data:

AI-aware extension threat detections, now in preview, can surface advanced extension telemetry that helps security teams detect and respond to anomalous AI agent activity.
New shadow AI reporting, generally available soon, can help you gain visibility into the shadow AI landscape by flagging employee use of unsanctioned web-based AI and SaaS applications.

What’s new in Trusted Cloud

We continue to offer new security controls and enhance capabilities across identity, data, and networking on our cloud platform to help you secure your environments. Today we’re announcing the following updates:

Simplifying permissions with modern IAM
To help you achieve least privilege quickly and simply, we’ve streamlined our predefined roles catalog around easy-to-use administrator, editor, and viewer roles, and added supporting capabilities such as the IAM role picker and re-authentication for sensitive actions.
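Least privilege means granting the smallest role that still covers the permissions a principal needs. The Python sketch below illustrates that selection rule over a tiny role catalog. The permission names follow Cloud Storage's naming style, but the role-to-permission mappings are hypothetical and do not reflect Google Cloud's actual predefined roles or how the IAM role picker works internally.

```python
from typing import Optional

# Hypothetical catalog; real predefined roles contain different permission sets.
ROLES = {
    "viewer": {"storage.objects.get", "storage.objects.list"},
    "editor": {"storage.objects.get", "storage.objects.list",
               "storage.objects.create", "storage.objects.delete"},
    "administrator": {"storage.objects.get", "storage.objects.list",
                      "storage.objects.create", "storage.objects.delete",
                      "storage.buckets.setIamPolicy"},
}


def pick_least_privileged_role(needed: set[str]) -> Optional[str]:
    """Return the role with the fewest permissions that still covers `needed`."""
    candidates = [(len(perms), name) for name, perms in ROLES.items() if needed <= perms]
    return min(candidates)[1] if candidates else None


print(pick_least_privileged_role({"storage.objects.get"}))      # viewer
print(pick_least_privileged_role({"storage.objects.create"}))   # editor
```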

Data security
We are announcing several new capabilities for our cloud platform data security portfolio to help protect your most sensitive data and accelerate AI transformation.

Confidential Computing: In partnership with NVIDIA, today we’re announcing Confidential G4 VMs on Google Compute Engine (GCE), featuring NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs and available in preview globally, to help strengthen confidentiality and integrity for a wide spectrum of sensitive AI workloads. In partnership with Intel, we’re also introducing the preview of C4 Confidential VMs, bringing Intel TDX to 6th Gen Xeon processors to help protect diverse AI and analytics workloads while providing industry-leading compute density and performance.
Cloud Key Management Service (KMS): We are announcing the new Confidential External Key Manager (cEKM) in preview, giving you the flexibility to host and protect external keys in any region and maintain verifiable control within a confidential environment.
Post-quantum cryptography (PQC): We are introducing KMS Quantum Safe Key Imports, available in preview, to help you bring your own keys with quantum-safe algorithms.
Secret Manager: To help prevent password leaks and mitigate prompt injection risks, we are announcing the general availability of the native integration of our Secret Manager with Agent Development Kit.

Network security
Google Cloud’s Cross-Cloud Network security products offer several new capabilities:

Cloud NGFW: We’re announcing the Cloud NGFW advanced malware sandbox, in preview later this year, to help defend against highly evasive zero-day threats. This capability is powered by Palo Alto Networks Advanced Wildfire, trained on data from more than 70,000 Palo Alto Networks customers to stop 99% of known and unknown malware.
Cloud Armor: We have released new Cloud Armor managed rules, powered by Thales Imperva and available in preview, to detect Layer 7 application attacks and zero-day CVEs (like React2Shell).

Advancing Google Cloud security with SCC
As our Google Cloud-native security solution, Security Command Center (SCC) establishes a cloud security baseline to protect both your traditional and AI applications on Google Cloud:

SCC helps secure AI agents, models, and MCP servers through continuous discovery and comprehensive risk analysis that identifies threats, vulnerabilities, and misconfigurations.
SCC will add deep runtime visibility to uncover shadow AI in your Google Cloud workloads. Coming soon in preview, SCC will automatically discover unmanaged agentic workloads, including agents and MCP servers hosted on Cloud Run and GKE and inference endpoints running on GKE, and surface them as posture findings in SCC.
Our enhanced Security Command Center Standard tier provides data security posture management, compliance, vulnerability management, and risk analysis to help any Google Cloud customer establish strong security, compliance, and risk coverage from the start at no additional cost.

Take the next step

When you make Google part of your security team, you gain the power of an intelligence-driven, AI-native defense; the freedom of an open cloud that’s secure-by-design; and the industry's most battle-tested experts as an extension of your organization.

For more on these new innovations and how you can secure what’s next, tune in to watch our security spotlight. And be sure to check out the many great security breakout sessions — live and on-demand — to learn more about all of our Next ‘26 announcements.

Cross-cloud infrastructure innovation for the agentic enterprise

Wed, 22 Apr 2026 12:00:00 +0000

The era of agentic AI is accelerating from human- to machine-speed operations, while also creating profound stress on legacy technology infrastructure. This new reality pushes foundational systems to their limits: agents generate thousands of internal messages and complex queries, spawning more agents, all of which can rapidly overwhelm traditional networks and databases, and expose new security vulnerabilities.

Unlocking AI's full potential in the era of agents requires a secure, adaptive foundation. We call it cross-cloud infrastructure for the agentic enterprise – and at Google Cloud Next ‘26, we’re launching a powerful set of new innovations across four areas:

What’s new:

Fluid compute: Google Compute Engine and Kubernetes services work together to enable cost-effective, high-speed AI agents and enterprise workloads with new compute and orchestration capabilities.
Secure cross-cloud connectivity: Agent Gateway, Cloud Armor, and other tools deliver a secure, governed, and simplified networking foundation for AI agents, including observability of agentic traffic across clouds.
Unified data layer: Smart Storage, Knowledge Catalog, and other innovations transform passive data archives into dynamic reasoning engines, giving AI agents the context they need to execute.
Digital sovereignty: Confidential External Key Management and new features in Google Distributed Cloud bring Google’s leading models and AI enablers to wherever your data lives.

Let’s take a closer look at all the news for each of these four areas.

Fluid compute

Agentic workloads are dynamic and unpredictable, affecting both traditional enterprise applications and the AI agents themselves. Fluid compute is enabled by Google Compute Engine and Google Kubernetes services working together to dynamically adapt and shift resources in real time, enabling cost-effective, high-speed AI agents and operational enterprise workloads for all customers.

While our AI Hypercomputer delivers raw power for large-scale AI model training, fluid compute addresses the needs of operational workloads and agents. As agents move toward reasoning and reinforcement learning, CPUs are reclaiming a central role, excelling at the "branchy" logic, complex control flows, and secure execution sandboxes (like those for agentic orchestration, RL, SLM inference, and RAG) that agent workflows demand. CPUs also provide the critical isolation needed for secure agent execution, complementing the parallel processing strength of GPUs and TPUs used in training.

We are introducing new CPU families, GKE capabilities, and Hyperdisk block storage capabilities to run traditional workloads and AI agents securely at scale, including:

Google C4N Series: These VMs help ensure your enterprise workloads don't slow down under the demands of agentic AI by processing up to 95 million packets per second, up to 40% faster than other leading hyperscalers. This eliminates I/O bottlenecks for demanding workloads like security appliances, streaming media, and open source databases, even when utilizing smaller instance sizes.
Google M4N Series with Hyperdisk Extreme: M4N removes data pipeline bottlenecks and eliminates overprovisioning, delivering the industry-leading per-core IOPS and throughput required to handle massive data I/O from agents, analytics, and mission-critical databases. M4N provides 26.57 GB of RAM per vCPU, allowing you to scale mission-critical workloads cost-effectively on fewer cores. For example, M4N with Hyperdisk Extreme reduces Oracle workload total cost of ownership by over 20% compared to leading hyperscale clouds.
GKE Agent Sandbox: This solution secures agents with trusted gVisor isolation and handles demand spikes, launching up to 300 sandboxes per second, per cluster. Backed by the only managed sandbox technology available among leading hyperscale clouds, it achieves up to 30% better price-performance than competitors when running AI agents on GKE Agent Sandbox with Google Axion N4A.
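The quoted M4N memory ratio and GKE Agent Sandbox launch rate lend themselves to quick sizing arithmetic. The Python sketch below is back-of-envelope only: the 1 TB working set, the 8 GB-per-vCPU comparison shape, and the 10,000-sandbox burst are hypothetical, and real deployments are constrained by fixed machine shapes and by how long the peak launch rate can be sustained.

```python
import math

M4N_GB_PER_VCPU = 26.57        # memory-to-vCPU ratio quoted for M4N
SANDBOX_LAUNCHES_PER_S = 300   # peak GKE Agent Sandbox launch rate quoted per cluster


def vcpus_for_memory(memory_gb: float, gb_per_vcpu: float = M4N_GB_PER_VCPU) -> int:
    """Smallest vCPU count whose memory allotment covers `memory_gb`."""
    return math.ceil(memory_gb / gb_per_vcpu)


def seconds_to_launch(sandboxes: int, rate_per_s: float = SANDBOX_LAUNCHES_PER_S) -> float:
    """Lower bound on the time to start a burst of sandboxes at the peak rate."""
    return sandboxes / rate_per_s


print(vcpus_for_memory(1024))                   # 39 vCPUs for a hypothetical 1 TB working set
print(vcpus_for_memory(1024, gb_per_vcpu=8.0))  # 128 vCPUs on a hypothetical 8 GB/vCPU shape
print(round(seconds_to_launch(10_000), 1))      # 33.3 s for a hypothetical 10,000-sandbox burst
```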

“Wayfair's AI strategy is built on years of systematic infrastructure modernization on Google Cloud — migrating our core eCommerce engine and databases off legacy systems, decomposing monolithic services into cloud-native architecture, and unifying our data and analytics platform. That foundation is what makes everything else possible. Today, Gemini Enterprise Agent Platform is powering everything from catalog enrichment to generative shopping experiences that help customers create a home that's just right for them — and it's the same foundation preparing us for the agentic era, where AI doesn't just assist but actively drives discovery, personalization, and commerce across every customer touchpoint and across our business.” - Fiona Tan, Chief Technology Officer, Wayfair

Explore all our latest compute innovations in this blog.

Secure cross-cloud connectivity

Agentic AI replaces predictable human requests with autonomous “reasoning loops,” in which agents call other agents that, in turn, call LLMs, triggering massive, sudden surges in compute and machine-to-machine traffic. This shift creates unique challenges for network predictability and for securing non-human identities. Optimized for agentic AI, our Cross-Cloud Network moves data across diverse environments, connecting employees, customers, and agents with visibility and security. New in Cross-Cloud Network are:

Agent Gateway: Governs and orchestrates your enterprise agentic traffic as the “air traffic controller” in Gemini Enterprise Agent Platform. It natively understands agent protocols like MCP and A2A to inspect and govern every agent interaction. By integrating with Google and third-party identity and AI safety services, it enables deep inspection to verify access, block attacks, and protect sensitive data, maintaining compliance across your core business.
Cloud Network Insights: Delivers broad visibility across your hybrid and multi-cloud infrastructure to drive faster troubleshooting and resolution. It continuously monitors end-to-end agent, network, and web performance across Google Cloud, AWS, Azure, data centers, internet applications, and agentic workloads. Using synthetic traffic analytics, Cloud Network Insights provides hop-by-hop network path visibility to help you pinpoint the source of degradations, and it is coupled with AI-powered insights from Gemini Cloud Assist to deliver more autonomous operations.
Enhanced Cloud Next Generation Firewall (NGFW) and Cloud Armor: Provides machine-speed, AI-powered protection to combat the rapid explosion of AI-generated polymorphic malware and zero-day exploits. Cloud NGFW advanced malware sandbox delivers real-time inline prevention of AI-generated threats, while Cloud Armor managed rules provide automated protection against both known and unknown Common Vulnerabilities and Exposures (CVEs). Together with Model Armor, these services analyze the intent and content of AI agent communications.
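Reasoning loops of the kind described in this section grow traffic geometrically. In the simplified Python model below, which assumes every agent at each level makes the same number of downstream calls (to agents, tools, or LLMs), a single request with a fan-out of 3 and a depth of 4 triggers 120 calls. The fan-out and depth values are hypothetical.

```python
def total_calls(fan_out: int, depth: int) -> int:
    """Downstream calls from one request when each agent at every level makes
    `fan_out` calls to further agents, tools, or LLMs, `depth` levels deep."""
    return sum(fan_out ** level for level in range(1, depth + 1))


print(total_calls(3, 4))   # 120  (3 + 9 + 27 + 81)
print(total_calls(5, 4))   # 780  (5 + 25 + 125 + 625)
```

Raising the fan-out from 3 to 5 multiplies the load by 6.5x at the same depth, which illustrates how small changes in agent behavior can produce sudden traffic surges.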

Discover more about how we optimized networking for AI in and outside of the data center.

Unified data layer

AI agents are only as powerful as the data they can access and the context they’re given. More applications and platforms are using structured and unstructured data, but it can be difficult to catalog, find, and act on that data at scale, leading to less effective agent interactions. To close the gap, your agents need all of your data brought together into a cohesive, queryable knowledge engine, or unified data layer. This way, your agents can identify and access accurate sources. At Next ‘26, we’re enhancing the unified data layer with:

Smart Storage: This solution transforms dark data into a powerful knowledge asset for AI agents and training by embedding new semantic intelligence directly into your data objects. With new Google Cloud Storage capabilities like automated annotation, entity extraction, and semantic search, your agents can instantly find and use the specific data they need — whether it's hidden in spreadsheets, PDFs, or other unstructured formats across your entire organization. This significantly speeds up the development and deployment of your AI solutions. Learn more about storage innovations to accelerate your AI workloads.
Knowledge Catalog: Knowledge Catalog maps business meaning across your entire data estate, providing a grounded source of truth so agents can deliver the most accurate results. This foundation enables AI training and inferencing and doesn’t require you to migrate your data; your agents interact with it directly, wherever it lives, with full context and governance, making modernization easier.
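Semantic search over annotated objects can be illustrated in miniature. The Python sketch below ranks objects by cosine similarity between a query and each object's annotations. It substitutes simple word-count vectors for the learned embeddings a production semantic search would use, and the object names and annotations are hypothetical stand-ins for what automated annotation might produce; this is not Smart Storage's API.

```python
import math
from collections import Counter

# Hypothetical annotations attached to objects by an automated annotation step.
ANNOTATIONS = {
    "invoices/2025-q4.pdf": "invoice vendor payment total due date",
    "hr/handbook.pdf": "employee policy leave benefits handbook",
    "sales/pipeline.xlsx": "customer deal revenue forecast quarter",
}


def _vector(text: str) -> Counter:
    """Word-count vector (a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, top_k: int = 1) -> list[str]:
    """Return the `top_k` object names whose annotations best match the query."""
    q = _vector(query)
    ranked = sorted(ANNOTATIONS, key=lambda obj: cosine(q, _vector(ANNOTATIONS[obj])),
                    reverse=True)
    return ranked[:top_k]


print(search("revenue forecast for the quarter"))   # ['sales/pipeline.xlsx']
```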

Part of our Agentic Data Cloud, Smart Storage and Knowledge Catalog can turn your data from a passive archive into a dynamic reasoning engine.

“AI is critical to making our customers’ smart home and security solutions more intelligent and convenient. By leveraging Google Cloud’s Smart Storage, we auto-annotate rich metadata delivered in BigQuery. We’ve scaled and accelerated our data discovery and curation efforts, speeding up our AI development process from months to weeks, continuously delivering innovations that build trust and enhance the overall home experience.” - Brandon Bunker, VP of Product, AI, Vivint

Digital sovereignty

In the agentic era, digital sovereignty is a fundamental requirement for public sector and enterprise customers looking to accelerate innovation — without sacrificing control. There’s no one-size-fits-all solution, which is why we’ve designed a comprehensive set of offerings to meet different sovereign AI needs anywhere: public cloud, on-premises, or hybrid. New capabilities in our sovereign AI portfolio include:

Confidential External Key Management: With Confidential External Key Management, you maintain complete possession, custody, and control of your encryption keys and the policies that govern them. It uses Confidential Computing to host the key management endpoint in a tamper-proof environment within Google Cloud. You are in control and determine where your keys are stored, who can access them, and under what circumstances. Even highly privileged Google administrators cannot access your keys without authorization, which you can revoke at any time. Your data, your control.
Gemini on Google Distributed Cloud: With Gemini on GDC, companies can securely deploy Gemini in sensitive environments, while meeting data sovereignty needs. Your choice of deployment models includes managed software on your connected hardware or a fully disconnected, air-gapped solution. You can now scale with Google's leading AI capabilities even in the most restricted, high-security environments — from powerful Gemini models to advanced coding, search, and other agentic capabilities.

In addition, Google Distributed Cloud supports an end-to-end AI stack, combining our latest-generation AI infrastructure with Gemini models to accelerate and enhance all your sovereign AI workloads. This stack includes:

NVIDIA Blackwell GPUs: NVIDIA Blackwell (NVIDIA HGX B200) and NVIDIA Blackwell Ultra (NVIDIA HGX B300) GPU platforms accelerate AI performance, using fifth-generation NVIDIA NVLink to deliver data-center-scale bandwidth directly to your environment.
New VM families: New A4 family offerings can handle the most demanding inference tasks, delivering a 2.25x increase in peak compute. Memory-optimized M2 and M3 VMs bring the high memory-to-vCPU ratios needed for massive ERP and data analytics workloads on-premises.
Enhanced storage: Eliminate storage bottlenecks with 6x storage capacity per zone and a 10x performance boost, giving you the ability to do AI reasoning on-premises. Now, your data infrastructure moves at the speed of AI reasoning.

"Our customers demand high-performance, private AI inference without the risks of multi-tenancy. Google Distributed Cloud allows us to provide dedicated, low-latency environments that meet strict sensitive data requirements. With the ability to run Gemini on B200s and B300s, we can significantly increase inference speeds and provide the token throughput our clients need to scale." - Dave Driggers, CEO & Co-founder, Cirrascale Cloud Services

Transforming vision into reality

When these product areas converge, your infrastructure evolves into a high-performing, secure, adaptive foundation for the agentic era. We're not just offering tools; we're providing the architectural blueprint to enable enterprises and the public sector to rapidly embrace the full power of AI and agents with confidence.

To learn more about key industry trends for AI Infrastructure, read our State of Infrastructure in the Agentic AI Era report.

What’s new with the Cross-Cloud Network at Next ’26

Wed, 22 Apr 2026 12:00:00 +0000

While generative AI sparked a revolution, the true paradigm shift is the rapid evolution from standalone AI models to multi-agent autonomous systems. In this new era, the network transcends basic connectivity to become the critical integration layer for your agentic enterprise.

As AI agents and services surge, your core applications remain as vital as ever. To thrive in this rapidly evolving landscape, you need a planet-scale network to connect, protect, govern, deliver, and secure all your users, data, agents, AI services, and core applications across clouds and on-premises.

Google Cloud's Cross-Cloud Network provides this unified foundation, and is now used by 65% of the Fortune 100 and handles up to 27 exabytes of data per month. At Google Cloud Next, we are introducing networking innovations to accelerate your AI infrastructure, strengthen security, and simplify operations.

Optimized networking infrastructure for AI

As we move toward an agentic world, the network must support massive-scale inference paired with reinforcement learning. At Google, we’ve spent years refining this cycle to power our own global AI services. Today, we’re announcing AI infrastructure network innovations that bring this same architecture directly to your workloads, across agents, inference, training, and beyond.

Networking for agents

The Gemini Enterprise Agent Platform is a comprehensive enterprise environment designed to build, scale, govern, and optimize the next generation of autonomous agents. Key innovations being announced in preview include:

Agent Gateway: Air-traffic control for agentic traffic

Agent Gateway understands MCP and A2A agentic protocols and provides an open, extensible, scalable way to enforce centralized governance policies to securely connect agents, models, and tools across runtimes.
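To make "centralized governance policies" more concrete, the Python sketch below shows a default-deny allowlist that a gateway could evaluate for each MCP or A2A call. The `Rule` type, the agent and tool names, and the matching logic are hypothetical illustrations, not Agent Gateway's configuration format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: which caller may reach which target over which protocol.
    caller: str    # agent identity, e.g. a service account name
    protocol: str  # "mcp" or "a2a"
    target: str    # tool, model, or peer agent; "*" matches any target


def is_allowed(rules: list[Rule], caller: str, protocol: str, target: str) -> bool:
    """Default deny: a call passes only if some rule explicitly allows it."""
    return any(
        r.caller == caller and r.protocol == protocol and r.target in ("*", target)
        for r in rules
    )


POLICY = [
    Rule("support-agent", "mcp", "crm-lookup"),
    Rule("support-agent", "a2a", "billing-agent"),
    Rule("ops-agent", "mcp", "*"),
]

if __name__ == "__main__":
    print(is_allowed(POLICY, "support-agent", "mcp", "crm-lookup"))   # True
    print(is_allowed(POLICY, "support-agent", "mcp", "delete-user"))  # False
```

Defaulting to deny means a newly deployed tool stays unreachable until someone adds a rule for it, which is usually the safer failure mode for autonomous agents.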

Ambient networking: A seismic shift in service-to-service connectivity

Ambient networking, a new integrated data plane for Google Kubernetes Engine (GKE) and Cloud Run, provides service discovery, zero-trust access, and traffic management without the need for complex and resource-heavy sidecar proxies. It reduces operational overhead and enables up to a 10x reduction in GKE resource usage for layer 4 (L4) mesh capabilities.

Ambient networking underpins two new capabilities:

Service bindings automatically establish service-to-service connectivity, allowing developers to move faster to build and scale their agentic applications and services.
Network Services Monitoring bridges application and network observability gaps, resulting in faster root-cause analysis and simplified troubleshooting.

Rich partner integrations and customizations

With the help of Service Extensions, we are developing solutions for identity, governance, and AI security for agent-to-anywhere traffic. Coming soon in preview to Agent Gateway are:

Identity and governance administration: Offering delegated authorization to Cloud IAM and partner services from Okta, Ping, Saviynt, and Silverfort to enforce real-time, contextual governance policies based on application and business context.
Runtime security: Acting as a universal enforcement point, Agent Gateway integrates with Google Cloud’s Model Armor and partner solutions from Broadcom, Check Point, Cisco, CrowdStrike, Exabeam, F5, Netskope, Palo Alto Networks, Thales, and Zscaler. Together, these can help secure agentic communications against emerging AI attack vectors.

These innovations are built on an open foundation including Envoy and Kubernetes, providing strong, integrated governance in multicloud environments using standard Kubernetes Gateway APIs.

Networking for inference

At Google, we run inference at scale with optimized use of distributed GPU and TPU resources, automatic failover between regions for high availability, and optimized global request routing for fast end-user performance. GKE Inference Gateway delivers these capabilities to our cloud customers, including the following new innovations:

Multi-region support allows scaling inference services across regions, enabling cross-regional failover, optimized utilization, and reduced global latency (preview).
Predictive latency boost improves utilization with intelligent request routing based on predefined performance targets (preview).
Disaggregated serving leverages llm-d’s SGLang support, offering the flexibility to choose between vLLM and SGLang for model serving (GA).
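As a rough illustration of latency-target routing with cross-region failover, here is a minimal Python sketch. The linear latency predictor, the `Backend` fields, and the preference order are our assumptions for illustration; this is not GKE Inference Gateway's actual routing logic.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    region: str
    healthy: bool
    queue_depth: int       # requests currently waiting on this model server
    base_ttft_ms: float    # hypothetical idle time to first token
    per_request_ms: float  # hypothetical TTFT added per queued request

    def predicted_ttft_ms(self) -> float:
        # Toy linear predictor; a real system would learn this from telemetry.
        return self.base_ttft_ms + self.queue_depth * self.per_request_ms


def pick_backend(backends: list[Backend], target_ttft_ms: float, home_region: str) -> Backend:
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends in any region")
    on_target = [b for b in healthy if b.predicted_ttft_ms() <= target_ttft_ms]
    local = [b for b in on_target if b.region == home_region]
    # Preference order: local and on target, then any region on target
    # (cross-region failover), then whichever backend is predicted fastest.
    pool = local or on_target or healthy
    return min(pool, key=lambda b: b.predicted_ttft_ms())


if __name__ == "__main__":
    fleet = [
        Backend("us-a", "us-central1", True, 10, 100.0, 20.0),  # predicted 300 ms
        Backend("us-b", "us-central1", True, 2, 100.0, 20.0),   # predicted 140 ms
        Backend("eu-a", "europe-west4", True, 0, 150.0, 20.0),  # predicted 150 ms
    ]
    print(pick_backend(fleet, 200.0, "us-central1").name)  # us-b
    fleet[1].healthy = False
    print(pick_backend(fleet, 200.0, "us-central1").name)  # eu-a (failover)
```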

Gemini Enterprise Agent Platform reduced Time to First Token (TTFT) latency by over 35% for Qwen3-Coder by using GKE Inference Gateway.

“Before GKE Inference Gateway, managing our inference stack with Ray Serve created a complex, dual-orchestration layer that was a significant burden on our small operations team. Moving to the Inference Gateway and native Kubernetes deployments was the 'North Star' architecture we needed to simplify management and achieve robust production stability with a GKE-native batteries-included solution.” - Mikhail Lubinets, Lead HPC Engineer, Technology Innovation Institute

Networking for training

At Google, we build and run the largest AI models in the world — and we built a network to support that. Some of the new enhancements are:

Massive scale with Virgo Network

This new non-blocking data center fabric removes latency barriers:

Virgo can link up to 134,000 chips with 47 petabits/sec of non-blocking bisection bandwidth to deliver 1.7K exaflops of compute.
With enhancements in Pathways and JAX, you can further connect these Virgo fabrics to scale to over 1 million TPU chips in a single training cluster.
We are also making Virgo Network available on NVIDIA Vera Rubin NVL72, supporting up to 960,000 GPUs.
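A quick back-of-envelope check on the Virgo figures above, assuming decimal units (1 Pb/s = 10^15 bits/s), a bisection bandwidth spread evenly across every chip, and full-size fabrics. Real per-chip throughput depends on topology and traffic pattern, so treat these as averages only.

```python
import math

# Headline figures quoted above for a single Virgo fabric.
CHIPS_PER_FABRIC = 134_000
BISECTION_BPS = 47e15  # 47 Pb/s of non-blocking bisection bandwidth


def avg_bisection_gbps_per_chip(chips: int = CHIPS_PER_FABRIC,
                                bisection_bps: float = BISECTION_BPS) -> float:
    """Bisection bandwidth divided evenly across every chip, in Gb/s."""
    return bisection_bps / chips / 1e9


def min_fabrics(target_chips: int, chips_per_fabric: int = CHIPS_PER_FABRIC) -> int:
    """Fewest full-size fabrics whose combined chip count reaches target_chips."""
    return math.ceil(target_chips / chips_per_fabric)


print(f"{avg_bisection_gbps_per_chip():.1f} Gb/s per chip")  # 350.7 Gb/s per chip
print(min_fabrics(1_000_000))                                # 8
```

So reaching the "over 1 million TPU chips" figure requires connecting at least eight fabrics of the maximum size.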

For more on Virgo Network, check out this blog.

Accelerator network profiles

It’s easier than ever to handle the complex networking prerequisites for accelerator-equipped GKE node pools with DRANET, which improves bandwidth for distributed AI/ML workloads by up to 60% (GA).

AI-native Cloud Interconnect

SLA-backed and optimized for efficiency, Cloud Interconnect supports petabit-scale data transfers and is available with a fixed price option. Cloud Interconnect now supports:

400 Gbps circuits with up to 3.2 Tbps in a single connection (GA)
Partner Cross-Cloud Interconnect for AWS (GA), CoreWeave (in preview soon), and Lumen (in preview soon)
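To put the interconnect numbers in perspective, the sketch below computes how many 400 Gbps circuits make up a 3.2 Tbps connection and how long 1 PB (taken here as 10^15 bytes) would take at that line rate. The `efficiency` parameter is a hypothetical knob for protocol overhead; achieved throughput will be lower than line rate.

```python
import math


def circuits_needed(total_gbps: float, circuit_gbps: float = 400.0) -> int:
    """Number of equal-speed circuits needed to reach total_gbps."""
    return math.ceil(total_gbps / circuit_gbps)


def transfer_seconds(data_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Time to move data_bytes over a link; efficiency < 1 models protocol overhead."""
    return data_bytes * 8 / (link_bps * efficiency)


print(circuits_needed(3_200))                                 # 8
print(f"{transfer_seconds(1e15, 3.2e12) / 60:.1f} min")        # 41.7 min for 1 PB at line rate
```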

Cross-Cloud Network for AI and core applications

The Cross-Cloud Network helps ensure you can securely connect users, data, locations, applications, services, and infrastructure anywhere in the world, at planetary scale. We designed our global multi-shard network to scale horizontally to meet the demands of the AI era and enable us to accommodate our 10x WAN traffic growth from 2020 to 2025.
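A 10x increase from 2020 to 2025 corresponds to a compound annual growth rate that is easy to compute. This sketch assumes five evenly compounded yearly intervals, which real traffic growth will not follow exactly.

```python
import math


def annual_growth_rate(growth_factor: float, years: float) -> float:
    """Constant yearly rate that compounds to growth_factor over `years`."""
    return growth_factor ** (1 / years) - 1


def doubling_time_years(rate: float) -> float:
    """Years needed to double at a constant compound rate."""
    return math.log(2) / math.log(1 + rate)


r = annual_growth_rate(10, 2025 - 2020)
print(f"{r:.1%} per year")                    # 58.5% per year
print(f"{doubling_time_years(r):.1f} years")  # 1.5 years
```

In other words, the quoted growth is roughly equivalent to WAN traffic doubling every year and a half.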

These are some of the improvements we’re making to the Cross-Cloud Network:

Ultra Low Latency Solution for financial exchanges

In partnership with CME Group, we are bringing the world's leading derivatives marketplace to Google Cloud. To support CME Group’s performance requirements, we developed an ultra low latency (ULL) networking and compute solution. This fully managed cloud environment will allow CME Group and its clients to migrate their core trading systems to Google Cloud.

Now in preview, the solution is designed to meet the unique and exacting requirements of running financial exchanges in the cloud. It includes several new technologies:

Deterministic high-performance compute powered by ULL networking, with bare metal and VM form factors, delivers a comprehensive portfolio for your trading compute needs.
Scalable multicast data distribution with hardware-based ultra-low latency enables reliable one-to-many market data sharing.
Nanosecond-level clock sync enabled by Firefly, a novel clock synchronization system. Firefly achieves sub-10ns NIC-to-NIC synchronization to support high-frequency trading.
Advanced network observability with 64-bit nanosecond timestamps, support for multiple traffic-mirroring destinations and multicast traffic, and support for auditing and regulatory requirements.
Low-latency inference allowing exchange participants to connect their AI-driven services to the exchange’s infrastructure.
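Firefly's own techniques are not described here, but the basic arithmetic behind two-way clock synchronization is standard. The sketch below shows the classic NTP/PTP-style offset and delay calculation on integer nanosecond timestamps, which assumes symmetric path delay, plus the time span a 64-bit nanosecond counter can cover. It is not Firefly's algorithm.

```python
def offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> tuple[float, int]:
    """Classic two-way time transfer (as in NTP/PTP), all times in nanoseconds.

    t1: request sent (client clock)    t2: request received (server clock)
    t3: reply sent (server clock)      t4: reply received (client clock)
    Returns (server clock minus client clock, round-trip network delay).
    Assumes the outbound and return paths have equal delay.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def years_representable(bits: int = 64) -> float:
    """Span covered by an unsigned nanosecond counter of the given width."""
    seconds = 2 ** bits / 1e9
    return seconds / (365.25 * 24 * 3600)


# Server clock runs 5 ns ahead; each direction takes 100 ns; server spends 20 ns.
print(offset_and_delay(1_000, 1_105, 1_125, 1_220))  # (5.0, 200)
print(f"{years_representable():.0f} years")          # 585 years
```

Any asymmetry between the two path delays shows up directly as offset error, which is why nanosecond-level systems work hard to measure or eliminate it.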

“The Google Cloud Ultra Low Latency Solution provides the level of performance necessary for CME Group futures and options markets to run in the cloud, expanding access to clients worldwide.” - Sunil Cutinho, CIO, CME Group

Cross-cloud observability for networks, applications, and agents

Whether you’re running core applications or new AI agents, you need visibility into your network infrastructure. Cloud Network Insights, now in preview, offers network performance monitoring (NPM) and digital experience monitoring (DEM) to dramatically reduce the mean time to detect and mitigate network-related agent, application, and API issues.

Cloud Network Insights is enabled by technologies from Broadcom’s AppNeta and is powered by AI, enabling natural language queries through Gemini Cloud Assist.

"In an environment as complex and high-scale as Sabre’s, total visibility isn't just a luxury — it's a requirement for operational resilience. Cloud Network Insights will enable us to further shift our posture from reactive troubleshooting to proactive optimization. By providing granular, real-time telemetry across our global cloud footprint, it helps eliminate the traditional 'black box' of the network, allowing our teams to resolve bottlenecks before they impact the traveler experience." - Alfredo Rodriguez, VP Cloud Platform Infrastructure, Sabre Corporation

“Cloud Network Insights closes the 'visibility gap' between the private corporate network and the public cloud, empowering our joint customers to pinpoint performance bottlenecks in seconds rather than hours.” - Alan Davidson, CIO, Broadcom

Cross-Cloud Network for distributed applications

Multicloud and hybrid networks require secure, reliable, and high-performance connectivity. New enhancements for our foundational networking services and tools include:

Private Service Connect

Private Service Connect traffic volume grew 4x in 2025, and Private Service Connect now supports 40+ Google and third-party published services, enabling secure private global access to your managed services.
Private Service Connect endpoint-based security allows for granular authorization policies for producer-to-consumer service communications (preview).
Gemini Cloud Assist for Private Service Connect provides automated troubleshooting (preview).

Cloud-native IP address management (IPAM)

Cloud Number Registry is an IPAM solution powered by agentic technologies. Network admins can easily find free IP ranges, track utilization, and allocate resources (preview). It also integrates with Infoblox Universal DDI for Cross-Cloud Network IPAM discovery and enforcement.
Hybrid Subnets allow you to migrate legacy workloads from on-premises to a VPC without needing to change hard-coded IP addresses (GA).
Cloud NAT allows you to connect your IPv6-only workloads to private IPv4 destinations using the combined power of DNS64 and private NAT64 (in preview soon).
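Finding free IP ranges, the first task listed for Cloud Number Registry, boils down to subtracting allocated CIDR blocks from a parent range. The sketch below uses Python's standard `ipaddress` module; it is not the Cloud Number Registry API, and the example ranges are made up.

```python
import ipaddress


def free_blocks(parent: str, allocated: list[str]) -> list:
    """CIDR blocks inside `parent` not covered by any allocated range."""
    free = [ipaddress.ip_network(parent)]
    for used in map(ipaddress.ip_network, allocated):
        remaining = []
        for block in free:
            if used.subnet_of(block):
                # Split the block around the used range (yields nothing if equal).
                remaining.extend(block.address_exclude(used))
            elif not block.subnet_of(used):
                remaining.append(block)  # disjoint: keep it whole
            # else: block lies entirely inside `used`, so drop it
        free = remaining
    return sorted(free)


def first_fit(parent: str, allocated: list[str], prefixlen: int):
    """Lowest free, aligned subnet of the requested size, or None."""
    for block in free_blocks(parent, allocated):
        if block.prefixlen <= prefixlen:
            return next(block.subnets(new_prefix=prefixlen))
    return None


def utilization(parent: str, allocated: list[str]) -> float:
    """Share of the parent's addresses already allocated (assumes no overlaps)."""
    total = ipaddress.ip_network(parent).num_addresses
    return sum(ipaddress.ip_network(a).num_addresses for a in allocated) / total


used = ["10.0.0.0/24", "10.0.1.0/24", "10.0.4.0/22"]
print(first_fit("10.0.0.0/16", used, 24))          # 10.0.2.0/24
print(first_fit("10.0.0.0/16", used, 22))          # 10.0.8.0/22
print(f"{utilization('10.0.0.0/16', used):.1%}")   # 2.3%
```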

Network Connectivity Center (NCC)

Partner Cross-Cloud Interconnect for AWS is available as a connectivity type in NCC (preview).
Support for static routes using an internal load balancer as the next hop allows the integration of Secure Web Proxy and third-party network security virtual appliances (GA).
Support for privately used public IP (PUPI) allows the exchange of PUPI IPv4 addresses with VPC spokes and producer VPC spokes (GA).

Granular networking charge visibility

Cost Explorer and the new App Optimize API now provide attribution of associated Data Transfer costs to the originating resources for Google Cloud products (in preview soon).

Cross-Cloud Network for internet-facing services

As part of Cross-Cloud Network, the Global Front End simplifies how you deliver, scale, and protect web, API, and AI workloads. New capabilities include:

Global Front End Enterprise delivers simplified consumption by combining capabilities from global Cloud Load Balancing, Google Cloud Armor, Cloud CDN, and Service Extensions with up to 15% lower TCO (in preview soon).
Post quantum cryptography (PQC) helps secure your workloads with industry-standard algorithms that provide a layered defense against both classical and quantum adversaries.
Google tag gateway enables advertisers to serve tags from their own domain, which can significantly improve the accuracy and resilience of measurement signals (GA soon).

In addition, Cloud CDN, an important part of the Global Front End, now offers:

Built-in image optimization to help you deliver content that best fits your end users’ screens and saves on bandwidth costs (in preview soon).
GKE Gateway support so you can enable and manage caching services using GKE APIs (GA).

Cross-Cloud Network’s Cloud WAN for global enterprises

Cloud WAN is a fully managed, reliable global backbone to connect your enterprise. New capabilities include:

Expanded geographic reach: Our network spans more than 10 million kilometers of terrestrial and subsea fiber, and Network Connectivity Center’s site-to-site data transfer is now available in over 25 countries.
NCC Gateway enables third-party secure service edge (SSE) integrations from Palo Alto Networks (GA soon) and Symantec (preview).
The Verified Peering Provider program, which offers highly reliable internet connectivity to Google, now has dramatically expanded availability through 175+ providers worldwide.
Last mile connectivity: Provision site-to-cloud private connectivity in minutes with preferred partners from the Google Cloud console (in preview soon).

“Cloud WAN enables Dun & Bradstreet to evolve our global network via composable, cloud-native constructs. Leveraging NCC, we’ve built a resilient, high-performance platform that simplifies operations and optimizes costs. This foundation supports continued modernization and AI-driven workloads. We expect to extend this architecture as new patterns emerge, maintaining our blueprints-first approach.” - Josh Barry, VP, Network Engineering, Dun & Bradstreet

AI-powered security against evolving threats

The threat landscape, including AI-driven attacks, is evolving faster than ever, and staying ahead requires the latest defenses. Cross-Cloud Network relies on Cloud NGFW and Cloud Armor for advanced security capabilities. Here’s the latest on those offerings.

Cloud NGFW

Advanced malware sandbox uses AI models trained on data from 70k+ customers to stop 99% of known and unknown malware, including evasive zero-days. Advanced malware sandbox is powered by Palo Alto Networks Advanced Wildfire (in preview soon).
Internal Application and proxy Network Load Balancer support helps to enforce consistent, service-centric security for abstracted services like GKE, Cloud Run, and Private Service Connect traffic (preview).
Project-level policies allow for creating and managing Cloud NGFW endpoints, security profiles, and security profile groups at the project level (in preview soon).

Cloud Armor

Managed rules, built-in rulesets across 15 threat categories, deliver automated threat protection against a broad set of attacks and zero-day CVEs. This is powered by Thales Imperva, based on visibility into 1.5 trillion web requests each month (in preview soon).
Google Cloud Fraud Defense integration helps to discern the legitimacy and authorization of bots, humans, and agents. Fraud Defense is the evolution of reCAPTCHA, which protects over 14 million domains from fraud and abuse.
Adaptive protection for Network Load Balancers & VMs brings advanced machine learning to L3/L4 traffic, to detect and mitigate volumetric DDoS attacks (in preview soon).
A simplified user experience with a visual rule builder makes custom rule creation easier (in preview soon).

AI-powered network operations

Finally, new AI-powered specialist agents in Gemini Cloud Assist can help automate manual tasks, ease troubleshooting, predict reliability issues, improve security, and optimize your network, reducing toil and improving reliability. These include:

A network security agent that streamlines network security operations by assisting with policy generation, recommendations, and impact analysis (in preview soon).
A network agent that optimizes workload placement for performance and reliability, and also provides advanced cost estimation for observability services (in preview soon).

Additionally, to let customers and partners build their own agents, we are releasing Network observability MCP tools and agent skills. Their agents can use these to run connectivity tests and to query VPC Flow Logs in natural language (both in preview).

The network that scales with you

We built our Cross-Cloud Network on the same global infrastructure that powers Google’s largest AI and internet services. This provides you with a blazing-fast, planet-scale foundation that is both secure by design and open by principle, allowing you to integrate your trusted partners across any environment.

As we move into the agentic era, our flexible, future-proof solutions ensure you can quickly adopt the latest AI technologies while maintaining the reliability of your core applications.

Whatever comes next, we’ve built the network to help you lead it. Attend our networking sessions at Next ’26 to learn more, or learn more about the Cross-Cloud Network!

Evolving Media CDN for the world’s most demanding broadcast and streaming workloads

Fri, 17 Apr 2026 17:30:00 +0000

Editor’s note: In this post, we share joint insights from Raj Gulani, Director of Product Management for Network Experiences, and Dan Rayburn, an industry analyst with 30-plus years of experience covering streaming media.

In our combined experience observing and building within the media industry, one truth remains constant: the landscape is always evolving. Audience expectations for flawless, broadcast-quality streaming have become the undisputed baseline, while the scale of global live events continues to push the technical boundaries of content delivery.

From our shared perspective, the most successful platforms are no longer defined solely by their ability to handle massive scale. Instead, they are distinguished by their evolution — how they adapt to solve the complex operational and financial challenges that broadcasters and streaming services face every day. This post offers a joint look at some of these key industry demands and how platforms are innovating to meet them.

The need for scale, flexibility, and efficiency

The need to support massive audiences during live global events like the Super Bowl, FIFA World Cup, and IPL is a given. In response to this clear industry trend, content delivery networks (CDNs) must continuously scale their infrastructure to support peak traffic demand. We’ve seen this firsthand with Google Cloud’s Media CDN, which shares infrastructure with YouTube and has had to respond actively to customer capacity needs by building infrastructure presence in relevant regions, especially for live events.

Beyond raw capacity, however, a more nuanced story is unfolding around the need for greater architectural flexibility and more predictable cost models. We believe the focus has rightly shifted to providing smarter tools that help manage traffic, improve performance, and control costs. Here are a few examples of this:

Flexible caching architectures: One of the key challenges in global delivery is minimizing latency and cost. The introduction of features like flexible shielding – supported today in South Africa, the Middle East, and the US – is a direct answer to this. Such features allow traffic to be managed within a region, avoiding the performance and cost penalties of fetching content from a distant origin.
Solving for interoperability: As workflows become more complex, platforms need to be better integrated. We have seen a focus on addressing common origin compatibility issues through tactical engineering solutions. Examples include adding support for HEAD requests, increasing maximum segment sizes to 25 MiB to accommodate 4K/8K content, and enabling multi-part range requests. These kinds of updates are crucial for ensuring a platform works with a customer’s existing infrastructure, not against it.
The shift to predictable cost models: In a maturing industry, operators need financial predictability. The move toward offering monthly savings plans, which provide TCO benefits for a committed level of use, is an important step beyond pure pay-as-you-go pricing models.
The critical need for broadcast-grade visibility: In our analysis of streaming operations, a lack of real-time visibility is a recurring point of failure. For a major live event, customers cannot wait for next business day response times and require more immediate intervention to ensure the live event runs flawlessly — it’s a fundamental requirement. The use of tools like monitoring as a service (MaaS) during major live events highlights the industry's shift toward proactive, data-driven operations. By providing a "broadcast operating center" view into everything from origin health to end-user quality of service, such tools empower engineering teams to identify and mitigate potential problems before they impact the audience.
A shared outlook on the future: The evolution of content delivery platforms is a clear indicator of the media industry's priorities. The focus is increasingly on providing data-driven scaling, sophisticated operational tooling, and tangible architectural and financial benefits. This move toward solving specific, complex challenges demonstrates a maturing market, and it’s a direction we both believe is critical for the future of broadcasting and streaming.
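Flexible shielding is easiest to see with a toy model. The sketch below puts several edge caches behind one regional shield cache, so only the first request for an object reaches the origin. The edge and shield names are hypothetical, and the model deliberately omits eviction, TTLs, and request coalescing; it is not how Media CDN is implemented.

```python
from typing import Callable


class ShieldedCdn:
    """Two-tier cache: edge caches fall back to a regional shield, then origin.

    No eviction, TTLs, or request coalescing; this only counts origin fetches.
    """

    def __init__(self, shield_for_edge: dict[str, str], fetch_origin: Callable[[str], bytes]):
        self.shield_for_edge = shield_for_edge
        self.fetch_origin = fetch_origin
        self.edge_cache: dict[str, dict[str, bytes]] = {e: {} for e in shield_for_edge}
        self.shield_cache: dict[str, dict[str, bytes]] = {}
        self.origin_fetches = 0

    def get(self, edge: str, key: str) -> tuple[bytes, str]:
        local = self.edge_cache[edge]
        if key in local:
            return local[key], "edge-hit"
        shield = self.shield_cache.setdefault(self.shield_for_edge[edge], {})
        if key in shield:
            local[key] = shield[key]  # fill the edge from the in-region shield
            return local[key], "shield-hit"
        self.origin_fetches += 1      # only a shield miss reaches the distant origin
        body = self.fetch_origin(key)
        shield[key] = body
        local[key] = body
        return body, "origin"


# Hypothetical edges, all shielded by one in-region cache.
cdn = ShieldedCdn(
    {"jnb-1": "za-shield", "jnb-2": "za-shield", "cpt-1": "za-shield"},
    fetch_origin=lambda key: f"segment {key}".encode(),
)
for edge in ("jnb-1", "jnb-2", "cpt-1", "jnb-1"):
    print(edge, cdn.get(edge, "live/seg-001.ts")[1])
print("origin fetches:", cdn.origin_fetches)  # origin fetches: 1
```

Without the shield tier, each of the three edges would have fetched the segment from origin separately.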

For technical leaders looking to benchmark their current infrastructure against these trends, exploring modern edge architectures is a logical next step. You can learn more about implementing flexible caching and broadcast grade visibility by visiting the Media CDN documentation.

Migrating to Google Cloud’s Application Load Balancer: A practical guide

Fri, 10 Apr 2026 16:00:00 +0000

Migrating your existing application load balancer infrastructure from an on-premises hardware solution to Cloud Load Balancing offers substantial advantages in scalability, cost-efficiency, and tight integration within the Google Cloud ecosystem. Yet, a fundamental question often arises: "What about our current load balancer configurations?"

Existing on-premises load balancer configurations often contain years of business-critical logic for traffic manipulation. The good news is that not only can you fully migrate existing functionalities, but this migration also presents a significant opportunity to modernize and simplify your traffic management.

This guide outlines a practical approach for migrating your existing load balancer to Google Cloud’s Application Load Balancer. It addresses common functionalities, leveraging both its declarative configurations and the innovative, event-driven Service Extensions edge compute capability.

A simple, phased approach to migration

Transitioning from an imperative, script-based system to a cloud-native, declarative-first model requires a structured plan. We recommend a straightforward, four-phase approach.

Phase 1: Discovery and mapping

Before commencing any migration, you must understand what you have. Analyze and categorize your current load balancer configurations. What is each rule's intent? Is it performing a simple HTTP-to-HTTPS redirect? Is it engaged in HTTP header manipulation (addition or removal)? Or is it handling complex, custom authentication logic?

Most configurations typically fall into two primary categories:

Common patterns: Logic that is common to most web applications, such as redirects, URL rewrites, basic header manipulation, and IP-based access control lists (ACLs).
Bespoke business logic: Complex logic unique to your application, like custom proprietary token authentication, advanced header extraction / replacement, dynamic backend selection based on HTTP attributes, or HTTP response body manipulation.

Phase 2: Choose your Google Cloud equivalent

Once your rules are categorized, the next step involves mapping them to the appropriate Google Cloud feature. This is not a one-to-one replacement; it's a strategic choice.

Option 1: The declarative path (for ~80% of rules)
For the majority of common patterns, leveraging the Application Load Balancer's built-in declarative features is usually the best approach. Instead of a script, you define the desired state in a configuration file. This is simpler to manage, version-control, and scale.

Common patterns to declarative feature mapping:

Redirects/rewrites -> Application Load Balancer URL maps
ACLs/throttling -> Google Cloud Armor security policies
Session persistence -> backend service configuration
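The Phase 1 inventory and the mapping above can be captured in a small planning script. This is a hypothetical sketch: the category labels and rule names are made up, and routing basic header edits to URL map header actions is our assumption rather than part of the mapping above, so confirm it against the URL map documentation for your load balancer type.

```python
# Declarative targets from the mapping above, plus URL map header actions for
# basic header edits (an assumption; verify for your load balancer type).
DECLARATIVE_TARGETS = {
    "redirect": "URL map",
    "rewrite": "URL map",
    "header_basic": "URL map (header actions)",
    "acl": "Cloud Armor security policy",
    "throttling": "Cloud Armor security policy",
    "session_persistence": "backend service configuration",
}
PROGRAMMATIC_TARGET = "Service Extensions"


def plan_migration(rules: list[tuple[str, str]]) -> dict[str, str]:
    """Map each (rule_name, category) pair from the Phase 1 inventory to a target."""
    return {name: DECLARATIVE_TARGETS.get(category, PROGRAMMATIC_TARGET)
            for name, category in rules}


def declarative_share(plan: dict[str, str]) -> float:
    """Fraction of rules that land on a declarative feature."""
    return sum(t != PROGRAMMATIC_TARGET for t in plan.values()) / len(plan)


inventory = [
    ("force-https", "redirect"),
    ("legacy-paths", "rewrite"),
    ("block-bad-ips", "acl"),
    ("sticky-cart", "session_persistence"),
    ("partner-token-check", "custom_auth"),  # bespoke logic
]
plan = plan_migration(inventory)
print(plan["partner-token-check"])       # Service Extensions
print(f"{declarative_share(plan):.0%}")  # 80%
```

Anything the lookup does not recognize falls through to Service Extensions, which keeps the default aligned with the "prefer declarative, reserve code for bespoke logic" guidance.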

Option 2: The programmatic path (for complex, bespoke rules)
When dealing with complex, bespoke business logic, you have a programmatic equivalent: Service Extensions, a powerful edge compute capability that allows you to inject custom code (written in Rust, C++ or Go) directly into the load balancer's data path. This approach gives you flexibility in a modern, managed, and high-performance framework.

This flowchart helps you decide the appropriate Google Cloud feature for each configuration

Phase 3: Test and validate

Once you’ve chosen the appropriate path for your configurations, you are ready to deploy your new Application Load Balancer configuration in a staging environment that mirrors your production setup. Thoroughly test all application functionality, paying close attention to the migrated logic. Use a combination of automated testing and manual QA to validate that the redirects, security policies, and custom Service Extensions logic behave as expected.

Phase 4: Phased cutover (canary deployment)

Don't flip a single switch for all your traffic; instead, implement a phased migration strategy. Start the transition by routing a small percentage of production traffic (e.g., 5-10%) to your new Google Cloud load balancer. During this initial period, monitor key metrics like latency, error rates, and application performance. As you gain confidence, progressively increase the percentage of traffic routed to the Application Load Balancer. Always have a clear rollback plan to revert to the legacy infrastructure if you encounter critical issues.
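The ramp-and-rollback logic described above can be sketched as follows. The ramp schedule, the error-rate and p99 latency thresholds, and the target labels are hypothetical; how traffic is actually split between the legacy and new load balancers (for example, weighted DNS records) is outside this sketch.

```python
import random

RAMP_PCT = [5, 10, 25, 50, 100]  # hypothetical schedule, starting in the 5-10% band


def choose_target(new_pct: float, rng: random.Random) -> str:
    """Weighted split: send roughly new_pct% of requests to the new load balancer."""
    return "application-lb" if rng.random() * 100 < new_pct else "legacy"


def next_pct(current_pct: int, error_rate: float, p99_ms: float,
             max_error_rate: float = 0.01, max_p99_ms: float = 300.0) -> int:
    """Advance one ramp step if health gates pass; otherwise roll back to 0%."""
    if error_rate > max_error_rate or p99_ms > max_p99_ms:
        return 0  # rollback plan: all traffic back to legacy
    later = [p for p in RAMP_PCT if p > current_pct]
    return later[0] if later else current_pct


rng = random.Random(42)
sample = [choose_target(10, rng) for _ in range(10_000)]
print(sample.count("application-lb") / len(sample))  # roughly 0.1
print(next_pct(5, error_rate=0.002, p99_ms=180))      # 10
print(next_pct(25, error_rate=0.03, p99_ms=180))      # 0
```

Gating each step on explicit thresholds turns "as you gain confidence" into a repeatable decision, and returning 0% on any breach makes the rollback path the default rather than an afterthought.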

Best practices for a smooth migration

Drawing from our practical experience, we have compiled the following recommendations to assist you in planning your load balancer migrations.

Analyze first, migrate second: A thorough analysis of your existing configurations is the most critical step. Don't "lift and shift" logic that is no longer needed.
Prefer declarative: Always default to Google Cloud's managed, declarative features (URL Maps, Cloud Armor) first. They are simpler, more scalable, and require less maintenance.
Use Service Extensions strategically: Reserve Service Extensions for the complex, bespoke business logic that declarative features cannot handle.
Monitor everything: Continuously monitor both your existing load balancers and Google Cloud load balancers during the migration. Watch key metrics like traffic volume, latency, and error rates to detect and address issues quickly.
Train your team: Ensure your team is trained on Cloud Load Balancing concepts. This will empower them to effectively operate and maintain the new infrastructure.

Migrating from existing on-premises load balancer infrastructure is more than a technical task; it's an opportunity to modernize your application delivery. By thoughtfully mapping your current load balancing configurations and capabilities to either declarative Application Load Balancer features or programmatic Service Extensions, you can build a more scalable, resilient, and cost-effective infrastructure ready for future demands.

To get started, review the Application Load Balancer and Service Extensions features and advanced capabilities to come up with the right design for your application. For more guidance and complex use cases, contact your Google Cloud team.

Experimenting with GPUs: GKE managed DRANET and Inference Gateway AI Deployment

Wed, 08 Apr 2026 10:05:00 +0000

Building and serving models is a strong use case for businesses, and in Google Cloud you can design your AI infrastructure to suit your workloads. Recently, I experimented with Google Kubernetes Engine (GKE) managed DRANET while deploying a model for inference with NVIDIA B200 GPUs on GKE. In this blog, we will walk through this setup in easy-to-follow steps.

What is DRANET

Dynamic Resource Allocation (DRA) is a Kubernetes feature that lets you request and share resources among Pods. DRANET lets you request and allocate networking resources for your Pods, including network interfaces that support TPUs and Remote Direct Memory Access (RDMA). In my case, those were the RDMA-capable interfaces used by high-end GPUs.

How GPU RDMA VPC works

The RDMA network is set up as an isolated VPC, which is regional and assigned a network profile type; in this case, the network profile type is RoCEv2. This VPC is dedicated to GPU-to-GPU communication. The GPU VM families have RDMA-capable NICs that connect to the RDMA VPC, and GPUs on different nodes communicate over this low-latency, high-speed, rail-aligned network.

Design pattern example

Our aim was to deploy an LLM (DeepSeek) onto a GKE cluster with A4 nodes, each with 8 B200 GPUs, and serve it privately through GKE Inference Gateway. To set up an AI Hypercomputer GKE cluster, you can use the Cluster Toolkit, but in my case, I wanted to test how GKE managed DRANET dynamically sets up the networking that supports RDMA for GPU communication.

This design utilizes the following services to provide an end-to-end solution:

VPC: Three VPCs in total. One is created manually; the other two are created automatically by GKE managed DRANET, one standard VPC and one RDMA VPC.
GKE: To deploy the workload.
GKE Inference Gateway: To expose the workload internally through a regional internal Application Load Balancer (GatewayClass gke-l7-rilb).
A4 VMs: These support RoCEv2 with NVIDIA B200 GPUs.

Putting it together

To get access to A4 VMs, I used a future reservation, which is tied to a specific zone.

Begin: Set up the environment

Create a standard VPC with firewall rules and a subnet in the same region as the reservation's zone.
Create a proxy-only subnet, which is used by the regional internal Application Load Balancer attached to the GKE Inference Gateway.

Next: Create a Standard GKE cluster with a default node pool.

```shell
gcloud container clusters create $CLUSTER_NAME \
  --location=$ZONE \
  --num-nodes=1 \
  --machine-type=e2-standard-16 \
  --network=${GVNIC_NETWORK_PREFIX}-main \
  --subnetwork=${GVNIC_NETWORK_PREFIX}-sub \
  --release-channel rapid \
  --enable-dataplane-v2 \
  --enable-ip-alias \
  --addons=HttpLoadBalancing,RayOperator \
  --gateway-api=standard \
  --enable-ray-cluster-logging \
  --enable-ray-cluster-monitoring \
  --enable-managed-prometheus \
  --enable-dataplane-v2-metrics \
  --monitoring=SYSTEM
```

Once that is complete, connect to your cluster:

```shell
gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE --project $PROJECT
```

Create a GPU node pool (this example uses A4 VMs from a reservation) with the following additional flags:

--accelerator-network-profile=auto (GKE automatically adds the gke.networks.io/accelerator-network-profile: auto label to the nodes)

--node-labels=cloud.google.com/gke-networking-dra-driver=true (Enables DRA for high-performance networking)

```shell
gcloud beta container node-pools create $NODE_POOL_NAME \
  --cluster $CLUSTER_NAME \
  --location $ZONE \
  --node-locations $ZONE \
  --machine-type a4-highgpu-8g \
  --accelerator type=nvidia-b200,count=8,gpu-driver-version=latest \
  --enable-autoscaling --num-nodes=1 --total-min-nodes=1 --total-max-nodes=3 \
  --reservation-affinity=specific \
  --reservation=projects/$PROJECT/reservations/$RESERVATION_NAME/reservationBlocks/$BLOCK_NAME \
  --accelerator-network-profile=auto \
  --node-labels=cloud.google.com/gke-networking-dra-driver=true
```

Next: Create a ResourceClaimTemplate, which will be used to attach the networking resources to your deployments. The deviceClassName: mrdma.google.com is used for GPU workloads:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: all-mrdma
spec:
  spec:
    devices:
      requests:
      - name: req-mrdma
        exactly:
          deviceClassName: mrdma.google.com
          allocationMode: All
```

Deploy model and inference

Now that the cluster and node pool are set up, we can deploy a model and serve it through GKE Inference Gateway. In my experiment I used DeepSeek, but this could be any model.

Deploy model and services

The nodeSelector gke.networks.io/accelerator-network-profile: auto schedules the Pod onto the GPU nodes.
The resourceClaims field attaches the networking resources defined in the all-mrdma ResourceClaimTemplate.

Create a secret for your Hugging Face token:

```shell
kubectl create secret generic hf-secret \
  --from-literal=hf_token=${HF_TOKEN}
```

Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-v3-1-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepseek-v3-1
  template:
    metadata:
      labels:
        app: deepseek-v3-1
        ai.gke.io/model: deepseek-v3-1
        ai.gke.io/inference-server: vllm
        examples.ai.gke.io/source: user-guide
    spec:
      containers:
      - name: vllm-inference
        image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20250819_0916_RC01
        resources:
          requests:
            cpu: "190"
            memory: "1800Gi"
            ephemeral-storage: "1Ti"
            nvidia.com/gpu: "8"
          limits:
            cpu: "190"
            memory: "1800Gi"
            ephemeral-storage: "1Ti"
            nvidia.com/gpu: "8"
          claims:
          - name: rdma-claim
        command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
        args:
        - --model=$(MODEL_ID)
        - --tensor-parallel-size=8
        - --host=0.0.0.0
        - --port=8000
        - --max-model-len=32768
        - --max-num-seqs=32
        - --gpu-memory-utilization=0.90
        - --enable-chunked-prefill
        - --enforce-eager
        - --trust-remote-code
        env:
        - name: MODEL_ID
          value: deepseek-ai/DeepSeek-V3.1
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: hf_token
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 1800
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 1800
          periodSeconds: 5
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      nodeSelector:
        gke.networks.io/accelerator-network-profile: auto
      resourceClaims:
      - name: rdma-claim
        resourceClaimTemplateName: all-mrdma
---
apiVersion: v1
kind: Service
metadata:
  name: deepseek-v3-1-service
spec:
  selector:
    app: deepseek-v3-1
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: deepseek-v3-1-monitoring
spec:
  selector:
    matchLabels:
      app: deepseek-v3-1
  endpoints:
  - port: 8000
    path: /metrics
    interval: 30s
```

Deploy GKE Inference Gateway

Install the required Custom Resource Definitions (CRDs) in your GKE cluster:

For GKE versions 1.34.0-gke.1626000 or later, install only the alpha InferenceObjective CRD:

```shell
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/v1.0.0/config/crd/bases/inference.networking.x-k8s.io_inferenceobjectives.yaml
```

Create the InferencePool

```shell
helm install deepseek-v3-pool \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool \
  --version v1.0.1 \
  --set inferencePool.modelServers.matchLabels.app=deepseek-v3-1 \
  --set provider.name=gke \
  --set inferenceExtension.monitoring.gke.enabled=true
```

Create the Gateway, HTTPRoute and InferenceObjective

```yaml
# 1. The Regional Internal Gateway (ILB)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: deepseek-v3-gateway
  namespace: default
spec:
  gatewayClassName: gke-l7-rilb
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: Same
---
# 2. The HTTPRoute (Routing to the Pool)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: deepseek-v3-route
  namespace: default
spec:
  parentRefs:
  - name: deepseek-v3-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: deepseek-v3-pool
---
# 3. The Inference Objective (Performance Logic)
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: deepseek-v3-objective
  namespace: default
spec:
  poolRef:
    name: deepseek-v3-pool
```

Once complete, create a test VM in your main VPC and send a request to the IP address of the GKE Inference Gateway:

```shell
curl -N -s -X POST "http://$GATEWAY_IP/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3.1",
    "messages": [{"role": "user", "content": "Box A: red. Box B: blue. Box C: empty. Move A to C, Move B to A, Swap B and C. Where is red?"}],
    "stream": true
  }' | stdbuf -oL grep "data: " | sed -u 's/^data: //' | grep -v "\[DONE\]" | \
  jq --unbuffered -rj '.choices[0].delta | (.reasoning_content // .reasoning // .content // empty)'
```
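The grep/sed/jq stages in the command above turn the server-sent events (SSE) stream into plain text. The following minimal Python sketch does the same filtering, assuming the OpenAI-compatible chunk format that vLLM emits. The `reasoning_content` and `reasoning` field names simply mirror the jq filter and may not appear in every server version, and `stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines, like the grep/sed/jq pipeline."""
    for raw in lines:
        line = raw.strip()
        # Only "data: " lines carry payloads; blank lines separate events.
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        # The stream ends with a literal [DONE] sentinel that is not JSON.
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        # Same precedence as jq's `//`: the first field that is present and non-null wins.
        for key in ("reasoning_content", "reasoning", "content"):
            if delta.get(key) is not None:
                yield delta[key]
                break
```

To use it in place of the shell stages, you could iterate over `sys.stdin` in a small script, pipe the `curl -N -s ...` output into it, and print each yielded piece with `flush=True` so tokens appear as they stream.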

Next Steps

To take a deeper dive into GKE managed DRANET and GKE Inference Gateway, review the following:

Blog: DRA: A new era of Kubernetes device management with Dynamic Resource Allocation
Document set: DRANET
Documentation: AI Hypercomputer

Want to ask a question, find out more or share a thought? Please connect with me on LinkedIn.

See beyond the IP and secure URLs with Google Cloud NGFW

Tue, 07 Apr 2026 17:30:00 +0000

In a cloud-first world, traditional IP-based defenses are no longer enough to protect your perimeter. As services migrate to shared infrastructure and content delivery networks, relying on static IP addresses and FQDNs can create security gaps.

Because a single IP address can host multiple services, and IP addresses can change frequently, we are introducing domain filtering with a wildcard capability in Cloud Next Generation Firewall (NGFW) Enterprise. This new capability provides increased security and granular policy controls.

Why domain and SNI filtering matters

The Cloud NGFW URL filtering service performs deep inspections of HTTP payloads to secure workloads against threats from both public and internal networks. This service elevates security controls to the application layer and helps restrict access to malicious domains.

Key use cases include:

Granular egress control: This capability lets you precisely allow or block connections based on domain names and SNI information found in egress HTTP(S) messages. By inspecting Layer 7 (L7) headers, it offers significantly finer control than traditional filtering based solely on IP addresses and FQDNs, which can be inefficient when a single IP hosts multiple services.
Control access without decrypting: For organizations that prefer not to perform full TLS decryption on their traffic, Cloud NGFW can still enforce security policies by controlling traffic based on SNI headers provided during the TLS handshake. This allows for effective domain-level filtering while maintaining end-to-end encryption for privacy or compliance reasons.
Reduced operational overhead: Implementing domain-based filtering helps reduce the constant maintenance typically required to track frequently changing IP addresses and DNS records. By focusing on stable domain identities rather than dynamic network attributes, security teams can minimize the manual effort involved in updating firewall rulebases.
Flexible matching: The service utilizes matcher strings within URL lists, supporting limited wildcard domains to define criteria for both domains and subdomains. For example, using a wildcard like *.example.com allows a single filter to cover all associated subdomains, providing a more scalable solution than defining thousands of individual FQDN entries.
Improved security: URL filtering significantly enhances security posture by protecting against attacks such as SNI header spoofing. By evaluating L7 headers before allowing access to an application, Cloud NGFW ensures that attackers cannot bypass security controls simply by spoofing lower-layer identifiers.
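To make the "control access without decrypting" point concrete: in TLS 1.2 and 1.3 without Encrypted Client Hello, the server name is sent in cleartext in the ClientHello's `server_name` extension (RFC 6066), so a firewall can read it without terminating TLS. The minimal sketch below walks that structure, assuming the ClientHello arrives in a single, unfragmented TLS record. It illustrates the wire format only and is not how Cloud NGFW is implemented; the hypothetical `build_client_hello` helper produces a deliberately minimal demo record, not a handshake a real server would accept.

```python
import struct
from typing import Optional, Tuple


def _read_vec(buf: bytes, pos: int, len_bytes: int) -> Tuple[bytes, int]:
    """Read a vector prefixed by a 1- or 2-byte big-endian length."""
    (n,) = struct.unpack_from(">B" if len_bytes == 1 else ">H", buf, pos)
    start = pos + len_bytes
    if start + n > len(buf):
        raise ValueError("truncated ClientHello")
    return buf[start:start + n], start + n


def extract_sni(record: bytes) -> Optional[str]:
    """Return the host_name from a TLS ClientHello record, or None if it has no SNI."""
    # Record header: content type 22 (handshake), 2-byte version, 2-byte length.
    if len(record) < 5 or record[0] != 0x16:
        raise ValueError("not a TLS handshake record")
    body = record[5:]
    # Handshake header: type 1 (ClientHello) followed by a 3-byte length.
    if len(body) < 4 or body[0] != 0x01:
        raise ValueError("not a ClientHello")
    pos = 4 + 2 + 32                      # skip header, legacy_version, random
    _, pos = _read_vec(body, pos, 1)      # legacy_session_id
    _, pos = _read_vec(body, pos, 2)      # cipher_suites
    _, pos = _read_vec(body, pos, 1)      # legacy_compression_methods
    if pos >= len(body):
        return None                       # no extensions block
    exts, _ = _read_vec(body, pos, 2)
    i = 0
    while i + 4 <= len(exts):
        ext_type, ext_len = struct.unpack_from(">HH", exts, i)
        data = exts[i + 4:i + 4 + ext_len]
        i += 4 + ext_len
        if ext_type != 0:                 # 0 = server_name (RFC 6066)
            continue
        names, _ = _read_vec(data, 0, 2)  # ServerNameList
        j = 0
        while j + 3 <= len(names):
            name_type = names[j]
            name, j = _read_vec(names, j + 1, 2)
            if name_type == 0:            # 0 = host_name
                return name.decode("ascii")
    return None


def build_client_hello(host: str) -> bytes:
    """Build a minimal demo ClientHello that carries only an SNI extension."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack(">H", len(name)) + name   # host_name entry
    sni = struct.pack(">H", len(entry)) + entry             # ServerNameList
    ext = struct.pack(">HH", 0, len(sni)) + sni             # server_name extension
    hello = (
        b"\x03\x03" + bytes(32)                             # legacy_version, random
        + b"\x00"                                           # empty session ID
        + struct.pack(">H", 2) + b"\x13\x01"                # one cipher suite
        + b"\x01\x00"                                       # null compression only
        + struct.pack(">H", len(ext)) + ext                 # extensions block
    )
    handshake = b"\x01" + len(hello).to_bytes(3, "big") + hello
    return b"\x16\x03\x01" + struct.pack(">H", len(handshake)) + handshake


print(extract_sni(build_client_hello("api.example.com")))  # api.example.com
```

A production parser would also need to reassemble ClientHellos that are split across multiple TLS records.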

How Cloud NGFW URL filtering works

The URL filtering service functions by inspecting traffic at L7 using a distributed architecture.

Cloud NGFW URL filtering service

You can get started with URL filtering in three simple steps.

Deploy Cloud NGFW endpoints:

The first step is to create and deploy a Cloud NGFW endpoint in a zone. The NGFW endpoint is an organization-level resource, so make sure you have the required permissions before deploying it.
Once the endpoint is deployed, you can associate it with one or more VPCs of your choice.

Create security profiles and security profile groups:

The URL filtering security profile holds the URL filters with matcher strings and an action (allow or deny).
The security profile group acts as a container for these security profiles and is then referenced by a firewall policy rule. Create URL filtering security profiles with the desired URLs and wildcard FQDNs, and add them to a security profile group.
Once the security profile group is created, you will need to reference the security profile group in firewall policies.
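The following minimal sketch shows how a security profile's URL filters, each with matcher strings and an allow or deny action, might be evaluated against a domain or SNI value. The `UrlFilter` type, the priority ordering (lowest number first), the exact-or-`*.` wildcard semantics, and the explicit fallback action are all assumptions for illustration; the Cloud NGFW documentation defines the service's actual matching rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UrlFilter:
    """Hypothetical URL filter: matcher strings plus an action and a priority."""
    priority: int              # assumption: lower numbers are evaluated first
    action: str                # "allow" or "deny"
    matchers: Tuple[str, ...]


def matches(matcher: str, host: str) -> bool:
    """Exact match, or `*.suffix` matching any subdomain (not the apex) of suffix."""
    host = host.lower().rstrip(".")
    matcher = matcher.lower().rstrip(".")
    if matcher.startswith("*."):
        suffix = matcher[1:]   # keeps the leading dot, e.g. ".example.com"
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == matcher


def evaluate(filters: List[UrlFilter], host: str, default: str) -> str:
    """Return the action of the first matching filter in priority order."""
    for f in sorted(filters, key=lambda f: f.priority):
        if any(matches(m, host) for m in f.matchers):
            return f.action
    return default


# Example: allow one API host, deny the rest of example.com (apex listed separately).
profile = [
    UrlFilter(100, "allow", ("api.example.com",)),
    UrlFilter(200, "deny", ("*.example.com", "example.com")),
]
print(evaluate(profile, "api.example.com", default="allow"))  # allow
print(evaluate(profile, "www.example.com", default="allow"))  # deny
```

In this sketch `*.example.com` matches `api.example.com` and `a.b.example.com` but not the apex `example.com` or look-alikes such as `badexample.com`, which is why the example profile lists the apex separately.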

Policy enforcement:

You enable the service by configuring a hierarchical or global network firewall policy rule using the apply_security_profile_group action, specifying the name of your security profile group.

For more information about configuring a firewall policy rule, see the following:

Getting started

Get started with Cloud NGFW URL filtering by visiting our documentation and codelab.

Envoy: A future-ready foundation for agentic AI networking

Fri, 03 Apr 2026 16:00:00 +0000

In today's agentic AI environments, the network has a new set of responsibilities.

In a traditional application stack, the network mainly moves requests between services. But as discussed in a recent white paper, Cloud Infrastructure in the Agent-Native Era, in an agentic system the network sits in the middle of model calls, tool invocations, agent-to-agent interactions, and policy decisions that can shape what an agent is allowed to do. The rapid proliferation of agents, often built on diverse frameworks, requires consistent governance and security enforcement across all agentic paths at scale. To achieve this, the enforcement layer must shift from the application level to the underlying infrastructure. That means the network can no longer operate as a blind transport layer. It has to understand more, enforce better, and adapt faster. This shift is precisely where Envoy comes in.

As a high-performance distributed proxy and universal data plane, Envoy is built for massive scale. Trusted by demanding enterprise environments, including Google Cloud, it supports everything from single-service deployments to complex service meshes using Ingress, Egress, and Sidecar patterns. Because of its deep extensibility, robust policy integration, and operational maturity, Envoy is uniquely suited for an era where protocols change quickly and the cost of weak control is steep. For teams building agentic AI, Envoy is more than a concept: it's a practical, production-ready foundation.

Agentic AI changes the networking problem

Agentic workloads still often use HTTP as a transport, but they break some of the assumptions that traditional HTTP intermediaries rely on. Protocols such as Model Context Protocol (MCP) and Agent2Agent (A2A) use JSON-RPC or gRPC over HTTP, adding protocol-level phases such as MCP initialization, where client and server exchange their capabilities, on top of standard HTTP request/response semantics. The key aspects of agentic systems that require intermediaries to adapt include:

Diverse enterprise governance imperatives. The primary challenge is satisfying the wide spectrum of non-negotiable enterprise requirements for safety, security, data privacy, and regulatory compliance. These needs often go beyond standard network policies and require deep integration with internal systems, custom logic, and the ability to rapidly adapt to new organizational rules or external regulations. This demands a highly extensible framework where enterprises can plug in their specific governance models.
Policy attributes live inside message bodies, not headers. Unlike traditional web traffic where policy inputs like paths and headers are readily accessible, agentic protocols frequently bury critical attributes (e.g., model names, tool calls, resource IDs) deep within JSON-RPC or gRPC payloads. This shift requires intermediaries to be able to parse and understand message contents in order to apply context-aware policies.
Handling diverse and evolving protocol characteristics. Agentic protocols are not uniform. Some, like MCP with Streamable HTTP, can introduce stateful interactions requiring session management across distributed proxies (e.g., using Mcp-Session-Id). The need to support such varied behaviors, along with future protocol innovations, reinforces the necessity of an inherently adaptable and extensible networking foundation.
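To illustrate why body-borne attributes matter, consider an MCP `tools/call` request: the tool name sits inside the JSON-RPC `params` object, where no header-based rule can see it. This minimal sketch, assuming the whole body is already buffered, extracts the attributes a policy would need; `deframe_mcp` and its output keys are hypothetical and are not the metadata layout of Envoy's MCP filter.

```python
import json
from typing import Any, Dict


def deframe_mcp(body: bytes) -> Dict[str, Any]:
    """Extract policy-relevant attributes from a fully buffered JSON-RPC 2.0 MCP message."""
    msg = json.loads(body)
    # JSON-RPC batches (arrays) are out of scope for this sketch.
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
        raise ValueError("expected a single JSON-RPC 2.0 object")
    attrs: Dict[str, Any] = {
        "method": msg.get("method"),
        # Requests carry an id; notifications such as notifications/initialized do not.
        "is_notification": "id" not in msg,
    }
    params = msg.get("params")
    if msg.get("method") == "tools/call" and isinstance(params, dict):
        # For tools/call, MCP puts the tool name in params.name.
        attrs["tool_name"] = params.get("name")
    return attrs


body = b'{"jsonrpc":"2.0","id":7,"method":"tools/call","params":{"name":"get_issue","arguments":{"issue_number":42}}}'
print(deframe_mcp(body))
# {'method': 'tools/call', 'is_notification': False, 'tool_name': 'get_issue'}
```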

These factors mean enterprises need more than just connectivity. The network must now serve as a central point for enforcing the crucial governance needs mentioned earlier. This includes providing capabilities like centralized security, comprehensive auditability, fine-grained policy enforcement, and dynamic guardrails, all while keeping pace with the rapid evolution of protocols and agent behaviors. Put simply, agentic AI transforms the network from a mere transit path into a critical control point.

Why Envoy fits this shift

Envoy is a strong fit for agentic AI networking for three reasons. Envoy is:

Battle-tested. Enterprises already rely on Envoy in high-scale, security-sensitive environments, making it a credible platform to anchor a new generation of traffic management and policy enforcement.
Extensible. Envoy can be extended through native filters, Rust modules, WebAssembly (Wasm) modules, and external processing patterns. That gives platform teams room to adopt new protocols without having to rebuild their networking layer every time the ecosystem changes.
Operationally useful today. Envoy already acts as a gateway, enforcement point, observability layer, and integration surface for control planes. That makes it a practical choice for organizations that need to move now, not after the standards settle.

Building on these core strengths, Envoy has introduced specific architectural advancements to meet the unique demands of agentic networking:

1. Envoy understands agent traffic

The first requirement for agentic networking is simple: The gateway needs to understand what the agent is actually trying to do.

That’s harder than it sounds. In protocols such as MCP, A2A, and OpenAI-style APIs, important policy signals may live inside the request body. Traditional HTTP proxies are optimized to treat bodies as opaque byte streams. That design is efficient, but it limits what the proxy can enforce. For protocols that use JSON messages, a proxy may need to buffer the entire request body to locate attribute values needed for policy application — especially when those attributes appear at the end of the JSON message. Business logic specific to gen AI protocols, such as rate limiting based on consumed tokens, may also require parsing server responses.
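As an example of logic that depends on the server response, the sketch below enforces a per-client budget of LLM tokens using the `usage.total_tokens` field of a buffered, non-streaming OpenAI-style chat completion. It is a minimal token-bucket sketch under stated assumptions: the `TokenBudget` class, its capacity and refill rate, and the charge-after-response policy are hypothetical, and streamed responses (where usage, if reported at all, arrives in a final chunk) are not handled.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBudget:
    """Per-client LLM token budget: a token bucket charged after each response."""
    capacity: float                          # maximum burst, in LLM tokens
    rate: float                              # refill rate, in LLM tokens per second
    clock: Callable[[], float] = time.monotonic
    level: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.level = self.capacity
        self.updated = self.clock()

    def _refill(self) -> None:
        now = self.clock()
        self.level = min(self.capacity, self.level + (now - self.updated) * self.rate)
        self.updated = now

    def admit(self) -> bool:
        """Admit a request while the balance is positive; its cost is unknown until it completes."""
        self._refill()
        return self.level > 0

    def charge(self, response_body: bytes) -> int:
        """Deduct usage.total_tokens from a buffered response; the balance may go negative."""
        self._refill()
        usage = json.loads(response_body).get("usage") or {}
        used = int(usage.get("total_tokens", 0))
        self.level -= used
        return used
```

Because the cost of a request is only known once the response arrives, this design admits requests while the balance is positive and lets a large response overdraw it; the refill must repay the overdraft before further requests are admitted.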

Envoy addresses this by deframing protocol messages carried over HTTP and exposing useful attributes to the rest of the filter chain. The extensibility model for gen AI protocols was guided by two goals:

Easy reuse of existing HTTP extensions that work with gen AI protocols out of the box, such as RBAC or tracers.
Easy access to deframed messages for gen-AI-specific extensions, so that developers can focus on gen AI business logic without needing to deal with HTTP or JSON envelopes.

Based on these goals, new extensions for gen AI protocols are still built as HTTP extensions and configured in the HTTP filter chain. This provides flexibility to mix HTTP-native business logic, such as OAuth or mTLS authorization, with gen AI protocol logic in a single chain. A deframing extension parses the protocol messages carried by HTTP and provides an ambient context with extracted attributes, or even the entirety of parsed messages, to downstream extensions via well-known filter state and metadata values.

Instead of forcing every policy component to parse JSON envelopes or protocol-specific message formats on its own, Envoy makes those attributes available as structured metadata. Once the gateway has deframed protocol messages, existing Envoy extensions such as ext_authz or RBAC can read protocol properties to evaluate policies using protocol-specific attributes such as tool names for MCP, message attributes for A2A, or model names for OpenAI.

Access logs can include message attributes for enhanced monitoring and auditing. The protocol attributes are also available to the Common Expression Language (CEL) runtime, simplifying creation of complex policy expressions in RBAC or composite extensions.

Buffering and memory management
Envoy is designed to use as little memory as possible when proxying HTTP requests. However, parsing agentic protocols may require an arbitrary amount of buffer space, especially when extensions require the entire message to be in memory. The flexibility of allowing extensions to use larger buffers needs to be balanced with adequate protection from memory exhaustion, especially in the presence of untrusted traffic.

To achieve this, Envoy now provides a per-request buffer size limit. Buffers that hold request data are also integrated with the overload manager, enabling a full range of protective actions under memory pressure, such as reducing idle timeouts or resetting requests that consume the most memory for an extended duration. These changes pave the way for Envoy to serve as a gateway and policy-enforcement point for gen AI protocols without compromising its resource efficiency.
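A per-request buffer limit can be pictured as a body accumulator that refuses to grow past a fixed cap. This minimal sketch uses a hypothetical 1 MiB default and a hypothetical `BodyTooLarge` exception, which a gateway might translate into an HTTP 413 response. It is not Envoy's implementation, which also reports buffer usage to the overload manager so that global protective actions can kick in under memory pressure.

```python
from typing import List


class BodyTooLarge(Exception):
    """Raised when a request body exceeds the per-request buffer limit."""


class BoundedBodyBuffer:
    """Accumulates streamed body chunks up to a fixed per-request byte limit."""

    def __init__(self, limit_bytes: int = 1 << 20) -> None:
        self.limit = limit_bytes
        self._chunks: List[bytes] = []
        self.size = 0

    def append(self, chunk: bytes) -> None:
        # Check before storing, so an over-limit chunk is never retained.
        if self.size + len(chunk) > self.limit:
            raise BodyTooLarge(f"body exceeds {self.limit} bytes")
        self._chunks.append(chunk)
        self.size += len(chunk)

    def body(self) -> bytes:
        """Return the complete buffered body once end-of-stream is reached."""
        return b"".join(self._chunks)
```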

2. Envoy enforces policy on things that matter

Understanding traffic is only useful if the gateway can act on it.

In agentic systems, policy is not just about which service an agent can reach. It’s about which tools an agent can call, which models it can use, what identity it presents, how much it can consume, and what kinds of outputs require additional controls. Those are higher-value decisions than simple layer-4 or path-based controls, and they are exactly the kinds of controls enterprises care about when agents are allowed to take action on their behalf.

Envoy is well-positioned here because it can combine transport-level security with application-aware policy enforcement. Teams can authenticate workloads with mTLS and SPIFFE identities, then enforce protocol-specific rules with RBAC, external authorization, external processing, access logging, and CEL-based policy expressions.

This capability is crucial because it lets platform teams decouple agent development from enforcement. Developers can focus on building useful agents, while operators enforce a consistent zero-trust posture at the network layer, even as tools, models, and protocols continue to change.

A prime example of this zero-trust decoupling is the critical "user-behind-agent" scenario, where an AI agent must execute tasks on a human user's behalf. Traditionally, handing user credentials directly to an application introduces severe security risks: if the agent is compromised or manipulated via prompt injection, an attacker could exfiltrate or misuse those credentials. By offloading identity management to Envoy, the proxy can automatically insert user delegation tokens into outbound requests at the infrastructure layer. Because the agent never directly holds the sensitive credential, a compromised agent cannot leak or reuse the token, and actions remain strictly bound to the user's actual permissions.

Case study: Restricting an agent to specific GitHub MCP tools
Consider an agent that triages GitHub issues.

The GitHub MCP server may expose dozens of tools, but the agent may only need a small read-only subset, such as list_issues, get_issue, and get_issue_comments. In most enterprises, that difference matters. A useful agent should not automatically become an unrestricted one.

With Envoy in front of the MCP server, the gateway can verify the agent identity using SPIFFE during the mTLS handshake, parse the MCP message via the deframing filter, extract the requested method and tool name, and enforce a policy that allows only the approved tool calls for that specific agent identity. RBAC uses metadata created by the MCP deframing filter to check the method and tool name in the MCP message:

```yaml
envoy.filters.http.rbac:
  "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBACPerRoute
  rbac:
    rules:
      policies:
        github-issue-reader-policy:
          permissions:
          - and_rules:
              rules:
              - sourced_metadata:
                  metadata_matcher:
                    filter: envoy.http.filters.mcp
                    path: [{ key: "method" }]
                    value: { string_match: { exact: "tools/call" } }
              - sourced_metadata:
                  metadata_matcher:
                    filter: envoy.http.filters.mcp
                    path: [{ key: "params" }, { key: "name" }]
                    value:
                      or_match:
                        value_matchers:
                        - string_match: { exact: "list_issues" }
                        - string_match: { exact: "get_issue" }
                        - string_match: { exact: "get_issue_comments" }
          principals:
          - authenticated:
              principal_name:
                exact: "spiffe://cluster.local/ns/github-agents/sa/issue-triage-agent"
```

That’s the real value: Policy is enforced centrally, close to the traffic, and in terms that match the agent's actual behavior.
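For readers less familiar with Envoy's RBAC syntax, this sketch restates the policy above in plain Python: a request is allowed only if the SPIFFE principal matches, the MCP method is `tools/call`, and the tool name is on the approved list. The metadata dictionary layout is an assumption modeled on the `path` keys in the YAML, and the function name is hypothetical; Envoy evaluates the real policy natively.

```python
from typing import Any, Dict

# Values copied from the RBAC policy above.
AGENT_ID = "spiffe://cluster.local/ns/github-agents/sa/issue-triage-agent"
ALLOWED_TOOLS = frozenset({"list_issues", "get_issue", "get_issue_comments"})


def github_issue_reader_allows(principal: str, mcp_metadata: Dict[str, Any]) -> bool:
    """Mirror github-issue-reader-policy: principal AND method AND tool name must match."""
    if principal != AGENT_ID:
        return False
    if mcp_metadata.get("method") != "tools/call":
        return False
    params = mcp_metadata.get("params") or {}
    return params.get("name") in ALLOWED_TOOLS
```

Taken literally, the policy (and this mirror of it) also rejects other MCP methods such as `initialize` and `tools/list`, because an RBAC filter with the default ALLOW action denies requests that match no policy; a real deployment would likely add permissions for those session-setup methods.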

Beyond static rules: External authorization
A complex compliance policy that can’t be expressed using RBAC rules can be implemented in an external authorization service using the ext_authz protocol. Envoy provides MCP message attributes along with HTTP headers in the context of the ext_authz RPC. It can also forward the agent's SPIFFE identity from the peer certificate:

```yaml
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    grpc_service:
      envoy_grpc:
        cluster_name: auth_service_cluster
    include_peer_certificate: true
    metadata_context_namespaces:
    - envoy.http.filters.mcp
```

This allows external services to make authorization decisions based on the full combination of agent identity, MCP method, tool name, and any other protocol attributes, without the agent or the MCP server needing to be aware of the policy layer.

Protocol-native error responses
When Envoy denies a request, the error should be meaningful to the calling agent. For MCP traffic, Envoy can use local_reply_config to map HTTP error codes to appropriate JSON-RPC error responses. For example, a 403 Forbidden can be mapped to a JSON-RPC response with isError: true and a human-readable message, ensuring the agent receives a protocol-appropriate denial rather than an opaque HTTP status code.
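The shape of such a denial can be sketched as follows, assuming it is reported as an MCP tool result (`result.isError: true` with a text content item) and that the gateway knows the JSON-RPC `id` of the rejected request, for example from deframed metadata. `mcp_denial` is a hypothetical helper for illustration; in Envoy the mapping is configured through local_reply_config rather than written as code.

```python
import json
from typing import Optional, Union

JsonRpcId = Union[int, str, None]


def mcp_denial(request_id: JsonRpcId, http_status: int, reason: Optional[str] = None) -> bytes:
    """Build a JSON-RPC 2.0 response that reports a policy denial as an MCP tool error."""
    text = reason or f"Request denied by gateway policy (HTTP {http_status})."
    reply = {
        "jsonrpc": "2.0",
        "id": request_id,            # must echo the id of the rejected request
        "result": {
            "content": [{"type": "text", "text": text}],
            "isError": True,
        },
    }
    return json.dumps(reply).encode("utf-8")
```

One alternative is a JSON-RPC `error` object with a numeric code; reporting the denial as a tool result instead lets the calling agent handle it like any other tool failure.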

3. Envoy supports stateful agent interactions at scale

Not all agent traffic is stateless. Some protocols, including Streamable HTTP for MCP, can rely on session-oriented behavior. That creates a new challenge for intermediaries, especially when traffic flows through multiple gateway instances to achieve scale and resilience. An MCP session effectively binds the agent to the server that established it, and all intermediaries need to know this to direct incoming MCP connections to the correct server.

If a session is established on one backend, later requests in that conversation need to reach the right destination. That sounds straightforward for a single-proxy deployment, but it becomes more complicated in horizontally scaled systems, where multiple Envoy instances may handle different requests from the same agent.

Passthrough gateway
In the simpler passthrough mode, Envoy establishes one upstream connection for each downstream connection. Its primary use is enforcing centralized policies, such as client authorization, RBAC, rate limiting, and authentication, for external MCP servers. The session state transferred between intermediaries needs to include only the address of the server that established the session over the initial HTTP connection, so that all session-related requests are directed to that server.

Session state transfer between different Envoy instances is achieved by appending encoded session state to the MCP session ID provided by the MCP server. Envoy removes the session-state suffix from the session ID before forwarding the request to the destination MCP server. This session stickiness is enabled by configuring Envoy's envoy.http.stateful_session.envelope extension.
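Conceptually, the envelope works like the sketch below: the gateway appends an encoded upstream address to the session ID returned to the agent in the `Mcp-Session-Id` header, and any gateway instance can later decode and strip that suffix before forwarding. The separator, the base64url encoding, and the function names are hypothetical; Envoy's envelope extension defines its own format, and a production design would likely also protect the suffix against tampering.

```python
import base64
from typing import Tuple

SEP = "."  # hypothetical separator; base64url output never contains "."


def wrap_session_id(server_session_id: str, upstream: str) -> str:
    """Append the upstream host:port, base64url-encoded, to the MCP session ID."""
    suffix = base64.urlsafe_b64encode(upstream.encode()).decode().rstrip("=")
    return f"{server_session_id}{SEP}{suffix}"


def unwrap_session_id(wrapped: str) -> Tuple[str, str]:
    """Split a wrapped ID into (original session ID, upstream address)."""
    # rpartition splits at the last separator, so dots in the server's ID survive.
    original, _, suffix = wrapped.rpartition(SEP)
    if not original:
        raise ValueError("session ID carries no routing suffix")
    padded = suffix + "=" * (-len(suffix) % 4)
    return original, base64.urlsafe_b64decode(padded).decode()


wrapped = wrap_session_id("a1b2c3", "10.0.0.5:8080")
print(unwrap_session_id(wrapped))  # ('a1b2c3', '10.0.0.5:8080')
```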

Aggregating gateway
In aggregating mode, Envoy acts as a single MCP server by aggregating the capabilities, tools, and resources of multiple backend MCP servers. In addition to enforcing policies, this simplifies agent configuration and unifies policy application for multiple MCP servers.

Session management in this mode is more complicated because the session state also needs to include mapping from tools and resources to the server addresses and session IDs that advertised them. The session ID that Envoy provides to the agent is created before tools or resources are known, and the mapping has to be established later, after the MCP initialization phases between Envoy and the backend MCP servers are complete.

One approach, currently implemented in Envoy, is to combine the name of a tool or resource with the identifier and session ID of its origin server. The exact tool or resource names are typically not meaningful to the agent and can carry this additional provenance information. If unmodified tool or resource names are desirable, another approach is to let an Envoy instance that does not have the mapping recreate it by issuing a tools/list command before calling a specific tool. This trades extra latency for not having to deploy an external global store of MCP sessions, and is currently in planning based on user feedback.
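The currently implemented approach can be sketched as a reversible naming scheme in which the aggregated tool name carries the origin server's identifier and session ID, so any Envoy instance can route a `tools/call` without shared state. The `__` delimiter, the field order, and the helper names are assumptions for illustration and do not reflect Envoy's exact encoding.

```python
from typing import NamedTuple

DELIM = "__"  # hypothetical delimiter; must not appear in server IDs or session IDs


class ToolOrigin(NamedTuple):
    server_id: str
    session_id: str
    tool: str


def aggregate_name(origin: ToolOrigin) -> str:
    """Encode provenance into the tool name advertised to the agent."""
    for part in (origin.server_id, origin.session_id):
        if DELIM in part:
            raise ValueError(f"{part!r} contains the delimiter")
    return DELIM.join(origin)


def resolve_name(aggregated: str) -> ToolOrigin:
    """Recover the origin server, its session ID, and the original tool name."""
    parts = aggregated.split(DELIM, 2)  # the tool name itself may contain DELIM
    if len(parts) != 3:
        raise ValueError("not an aggregated tool name")
    return ToolOrigin(*parts)


name = aggregate_name(ToolOrigin("github", "s-123", "list_issues"))
print(name)                # github__s-123__list_issues
print(resolve_name(name))  # ToolOrigin(server_id='github', session_id='s-123', tool='list_issues')
```

Splitting from the left with a limit of two keeps tool names that themselves contain the delimiter intact, as long as server identifiers and session IDs never contain it.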

This matters because it moves Envoy beyond simple traffic forwarding. It allows Envoy to serve as a reliable intermediary for real agent workflows, including those spanning multiple requests, tools, and backends.

4. Envoy supports agent discovery

Envoy is adding support for the A2A protocol and agent discovery via a well-known AgentCard endpoint. AgentCard, a JSON document with agent capabilities, enables discovery and multi-agent coordination by advertising skills, authentication requirements, and service endpoints. The AgentCard can be provisioned statically via direct response configuration or obtained from a centralized agent registry server via xDS or ext_proc APIs. A more detailed description of A2A implementation and agent discovery will be published in a forthcoming blog post.

5. Envoy is a complete solution for agentic networking challenges

Building on the same foundation that enabled policy application for the MCP protocol in demanding deployments, Envoy is adding support for the OpenAI API and for transcoding agentic protocols into RESTful HTTP APIs. This transcoding capability simplifies the integration of gen AI agents with existing RESTful applications, with out-of-the-box support for OpenAPI-based applications and custom options via dynamic modules or Wasm extensions. In addition to transcoding, Envoy is being strengthened in areas critical for production readiness, such as advanced policies like quota management, comprehensive telemetry that adheres to the OpenTelemetry semantic conventions for generative AI systems, and integrated guardrails for secure agent operation.
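To show what transcoding involves, here is a minimal Python sketch that turns an MCP tools/call JSON-RPC request into a REST call. The OPERATIONS table stands in for mappings that might be derived from an OpenAPI document; it, the tool names, and transcode_tools_call are hypothetical, and the reverse step (wrapping the HTTP response in a JSON-RPC result) is omitted.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical mapping, as an OpenAPI document might yield: tool -> (HTTP method, path template).
OPERATIONS = {
    "get_invoice": ("GET", "/invoices/{invoice_id}"),
    "create_invoice": ("POST", "/invoices"),
}


def transcode_tools_call(jsonrpc_body: bytes) -> tuple[str, str, bytes]:
    """Translate an MCP tools/call request into (HTTP method, path, body)."""
    msg = json.loads(jsonrpc_body)
    if msg.get("method") != "tools/call":
        raise ValueError("only tools/call is transcoded in this sketch")
    params = msg["params"]
    method, template = OPERATIONS[params["name"]]
    args = dict(params.get("arguments", {}))

    # Substitute path parameters such as {invoice_id} and drop them from the arguments.
    path = template
    for key in list(args):
        placeholder = "{" + key + "}"
        if placeholder in path:
            path = path.replace(placeholder, quote(str(args.pop(key)), safe=""))

    if method == "GET":
        # Remaining arguments become query parameters; GET carries no body.
        if args:
            path += "?" + urlencode(args)
        return method, path, b""
    # Other methods send the remaining arguments as a JSON body.
    return method, path, json.dumps(args).encode("utf-8")
```

A call to get_invoice with arguments {"invoice_id": "42", "expand": "lines"} becomes GET /invoices/42?expand=lines, which an existing REST service can handle unchanged.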

Guardrails for safe agents
The next significant area of investment is centralized management and application of guardrails for all agentic traffic. Integrating policy enforcement points with external guardrails presently requires bespoke implementation, and this problem area is ripe for standardization.

Control planes make this operational

The gateway is only part of the story. To manage and roll out policy at scale, a separate control plane is required to dynamically configure the data plane using the xDS protocol, also known as the universal data plane API.

That is where control planes become important. Cloud Service Mesh, alongside open-source projects such as Envoy AI Gateway and kube-agentic-networking, uses Envoy as the data plane while giving operators higher-level ways to define and manage policy for agentic workloads.

This combination is powerful: Envoy provides the enforcement and extensibility in the traffic path, while control planes provide the operating model teams need to deploy that capability consistently.

Why this matters now

The shift towards agentic systems and gen AI protocols such as MCP, A2A, and the OpenAI API necessitates an evolution in network intermediaries. The primary complexities Envoy addresses include:

Deep protocol inspection. Protocol deframing extensions extract policy-relevant attributes (tool names, model names, resource paths) from the body of HTTP requests, enabling precise policy enforcement where traditional proxies would only see an opaque byte stream.
Fine-grained policy enforcement. By exposing these internal attributes, existing Envoy extensions like RBAC and ext_authz can evaluate policies based on protocol-specific criteria. This allows network operators to enforce a unified, zero-trust security posture, ensuring agents comply with access policies for specific tools or resources.
Stateful transport management. Envoy supports managing session state for the Streamable HTTP transport used by MCP, enabling robust deployments in both passthrough and aggregating gateway modes, even across a fleet of intermediaries.
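The first two points can be sketched together in Python: parse the JSON-RPC body to pull out the method and tool name, then make an RBAC-style decision on that attribute. The attribute keys (mcp.method, mcp.tool, mcp.resource), the SPIFFE-style principals, and the POLICY table are hypothetical; in Envoy, deframing extensions and the RBAC and ext_authz filters are configured rather than written like this.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical allow-list: caller identity -> tool-name glob patterns it may invoke.
POLICY = {
    "spiffe://example.org/agent/billing": ["get_invoice", "list_invoices"],
    "spiffe://example.org/agent/support": ["get_*"],
}


def extract_attributes(body: bytes) -> dict[str, str]:
    """Deframe an MCP JSON-RPC request body into policy-relevant attributes."""
    msg = json.loads(body)
    attrs = {"mcp.method": msg.get("method", "")}
    params = msg.get("params") or {}
    if attrs["mcp.method"] == "tools/call":
        attrs["mcp.tool"] = params.get("name", "")
    elif attrs["mcp.method"] == "resources/read":
        attrs["mcp.resource"] = params.get("uri", "")
    return attrs


def allowed(principal: str, body: bytes) -> bool:
    """RBAC-style decision: a tools/call is denied unless the tool matches the caller's list."""
    attrs = extract_attributes(body)
    if attrs["mcp.method"] != "tools/call":
        return True  # this sketch only gates tool invocations
    patterns = POLICY.get(principal, [])
    return any(fnmatchcase(attrs["mcp.tool"], pattern) for pattern in patterns)
```

A traditional proxy would see only an opaque body here; once the tool name is exposed as an attribute, the same allow-list style used for HTTP paths can be applied per tool.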

Agentic AI protocols are still in their early stages, and the protocol landscape will continue to evolve. That’s exactly why the networking layer needs to be adaptable. Enterprises should not have to rebuild their security and traffic infrastructure every time a new agent framework, transport pattern, or tool protocol gains traction. They need a foundation that can absorb change without sacrificing control.

Envoy brings together three qualities that are hard to get in one place: proven production maturity, deep extensibility, and growing protocol awareness for agentic workloads. By leveraging Envoy as an agent gateway, organizations can decouple security and policy enforcement from agent development code.

That makes Envoy more than just a proxy that happens to handle AI traffic. It makes Envoy a future-ready foundation for agentic AI networking.

Special thanks to the additional co-authors of this blog: Boteng Yao, Software Engineer, Google; Tianyu Xia, Software Engineer, Google; and Sisira Narayana, Sr Product Manager, Google.

Introducing multi-cluster GKE Inference Gateway: Scale AI workloads around the world

Tue, 17 Mar 2026 16:00:00 +0000

The world of artificial intelligence is moving fast, and so is the need to serve models reliably and at scale. Today, we're thrilled to announce the preview of multi-cluster GKE Inference Gateway to enhance the scalability, resilience, and efficiency of your AI/ML inference workloads across multiple Google Kubernetes Engine (GKE) clusters — even those spanning different Google Cloud regions.

Built as an extension of the GKE Gateway API, the multi-cluster Inference Gateway leverages the power of multi-cluster Gateways to provide intelligent, model-aware load balancing for your most demanding AI applications.

Why multi-cluster for AI inference?

As AI models grow in complexity and users become more global, single-cluster deployments can face limitations:

Availability risks: Regional outages or cluster maintenance can impact service.
Scalability caps: Hitting hardware limits (GPUs/TPUs) within a single cluster or region.
Resource silos: Underutilized accelerator capacity in one cluster can’t be used by another.
Latency: Users far from your serving cluster may experience higher latency.

The multi-cluster GKE Inference Gateway addresses these challenges head-on, providing a variety of features and benefits:

Enhanced reliability and fault tolerance: Intelligently route traffic across multiple GKE clusters, including across different regions. If one cluster or region experiences issues, traffic is automatically re-routed, minimizing downtime.
Improved scalability and optimized resource usage: Pool and leverage GPU/TPU resources from various clusters. Handle demand spikes by bursting beyond the capacity of a single cluster and efficiently utilize available accelerators across your entire fleet.
Globally optimized, model-aware routing: The Inference Gateway can make smart routing decisions using advanced signals. With GCPBackendPolicy, you can configure load balancing based on real-time custom metrics, such as the model server's KV cache utilization metric, so that requests are sent to the best-equipped backend instance. Other modes like in-flight request limits are also supported.
Simplified operations: Manage traffic to a globally distributed AI service through a single Inference Gateway configuration in a dedicated GKE "config cluster," while your models run in multiple "target clusters."
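The routing idea behind model-aware load balancing can be sketched in a few lines of Python. This is not the GCPBackendPolicy implementation: the Backend fields, the max_in_flight cap of 8, and the tie-breaking rule are assumptions chosen to show how a KV cache utilization signal and an in-flight limit could combine when picking an endpoint across clusters.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    cluster: str
    healthy: bool
    kv_cache_utilization: float  # 0.0-1.0, as reported by the model server
    in_flight: int               # requests currently being served


def pick_backend(backends: list[Backend], max_in_flight: int = 8) -> Backend:
    """Choose the healthy backend with the most KV cache headroom.

    Backends at the in-flight limit are skipped, and ties on utilization are
    broken by preferring fewer in-flight requests.
    """
    candidates = [b for b in backends if b.healthy and b.in_flight < max_in_flight]
    if not candidates:
        raise RuntimeError("no backend has spare capacity")
    return min(candidates, key=lambda b: (b.kv_cache_utilization, b.in_flight))
```

Because candidates from every target cluster are considered together, an unhealthy or saturated cluster simply drops out of the choice, which is the failover and bursting behavior described above.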

How it works

In GKE Inference Gateway there are two foundational resources, InferencePool and InferenceObjective. An InferencePool acts as a resource group for pods that share the same compute hardware (like GPUs or TPUs) and model configuration, helping to ensure scalable and high-availability serving. An InferenceObjective defines the specific model names and assigns serving priorities, allowing Inference Gateway to intelligently route traffic and multiplex latency-sensitive tasks alongside less urgent workloads.
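To illustrate how serving priorities let latency-sensitive and less urgent work share a pool, here is a small Python priority dispatcher. The model names and values in PRIORITIES are hypothetical, the sketch assumes that a larger number means more urgent, and Inference Gateway's actual scheduling logic is not modeled.

```python
import heapq
import itertools
from typing import Optional

# Hypothetical objectives: model name -> priority (assumed: larger = more urgent).
PRIORITIES = {"chat-assistant": 10, "batch-summarizer": 0}


class PriorityDispatcher:
    """Hand out queued requests in priority order, FIFO within a priority level."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, model: str, request_id: str) -> None:
        priority = PRIORITIES.get(model, 0)
        # heapq is a min-heap, so negate the priority to pop the most urgent first.
        heapq.heappush(self._heap, (-priority, next(self._seq), model, request_id))

    def next_request(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[3]
```

Interleaving batch and chat submissions yields all queued chat requests first, in arrival order, followed by the batch requests.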

With this release, the system uses Kubernetes Custom Resources to manage your distributed inference service. InferencePool resources in each "target cluster" group model-server backends. These backends are exported and become visible as GCPInferencePoolImport resources in the "config cluster." Standard Gateway and HTTPRoute resources in the config cluster define the entry point and routing rules, directing traffic to these imported pools. Fine-grained load-balancing behaviors, such as using CUSTOM_METRICS or IN_FLIGHT requests, are configured using the GCPBackendPolicy resource attached to GCPInferencePoolImport.

This architecture enables use cases like global low-latency serving, disaster recovery, capacity bursting, and efficient use of heterogeneous hardware.

For more information about GKE Inference Gateway core concepts check out our guide.

Get started today

As you scale your AI inference serving workloads to more users in more places, we're excited for you to try multi-cluster GKE Inference Gateway. To learn more and get started, check out the documentation.

The AI-native core: Highly resilient telco architecture using Google Kubernetes Engine

Wed, 04 Mar 2026 08:00:00 +0000

The telecommunications industry has reached a critical tipping point. Traditional, on-premises-heavy data center models are struggling under the weight of escalating infrastructure costs and underutilization driven by availability and compliance requirements. But the AI era demands exponential scale and beyond-nines reliability. The question for operators is no longer if they should modernize, but which architectural path will help them do that fastest.

Modernization isn't a "rip and replace" event; it’s a strategic choice. Today, we’re showcasing how Google Kubernetes Engine (GKE) can serve as a high-performance foundation for two versatile deployment strategies: cloud-centric evolution and strategic hybrid modernization.

The two paths to network modernization

Every operator has a unique appetite for risk, regulatory landscape, and investment base, with some prioritizing agility, and others emphasizing the need for local control. You can use GKE to support both approaches:

1. Cloud-centric modernization: Agility at scale

This path is for operators looking to fully harness the cloud's elasticity. Whether you’re migrating your own containerized network functions (CNFs) or building a cloud-native service like Ericsson-on-Demand, the goal is the same: move the heavy lifting to Google Cloud.

The benefit: By running mission-critical workloads like Voice Core or Policy Control Functions on Google's global fiber backbone, operators can scale instantly for peak events and move toward "zero-human-touch" operations.
The economics: Transition from heavy upfront CAPEX to a "pay-as-you-grow" model. You no longer need to over-provision hardware that sits idle; the cloud absorbs the bursts for you.
Time to market: Accelerate time to market for new services like fixed wireless access, IoT and private 5G.

2. Strategic hybrid modernization: Cloud agility, local control

For many telcos, a hybrid approach offers a better balance. Here, operators can selectively move agile control plane components and data analytics to the cloud while keeping latency-sensitive user-plane functions on premises or at the edge.

The benefit: Optimize for ultra-low latency and meet strict data sovereignty requirements by keeping data plane traffic local, while still gaining the AI-driven insights and orchestration power of the cloud.
The versatility: Using GKE, you can run your control plane workloads in the cloud and data plane services directly in your own data centers or at the network edge, enjoying a unified operational model across your environments.

Engineering the "telco-grade" foundation

Today, we are proud to showcase how GKE has evolved into the industry's most specialized platform for containerized network functions (CNFs), backed by massive momentum from operators and equipment vendor partners.

It’s achieved this thanks to a variety of capabilities.

Connectivity and isolation

Standard Kubernetes wasn't designed for the complex traffic separation that telcos require. GKE bridges this gap with:

Multi-networking API: A native Kubernetes way to manage multiple interfaces per Pod, bringing standard Network Policies to every interface.
Simulated L2 networking: A "migration superpower" that allows legacy applications to maintain their Layer-2 operational model while running on a modern cloud-native stack.
The telco CNI: Support for Multus, IPvlan, and Whereabouts on specialized Ubuntu images. This allows operators to isolate management, control, and user planes with surgical precision.

Persistent reachability

In a world of ephemeral containers, telco functions need stability. GKE enables this through:

GKE IP route: We’ve integrated equal-cost multi-path (ECMP)-like functionality directly into the GKE dataplane. If a workload fails, it is automatically and rapidly removed from the service path, providing high availability without complex external router configurations.
Persistent IP: GKE provides the static IP support, unavailable in standard Kubernetes, that 5G core functions require for consistent, NAT-free reachability across their lifecycle.
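The ECMP-like behavior described above can be approximated with flow hashing. In this hedged Python sketch, a flow's 5-tuple is hashed onto the sorted set of healthy endpoints, so packets of one flow stick to one endpoint and failed endpoints drop out of the path as soon as they are marked unhealthy. The function name and the SHA-256-based hash are illustrative choices, not how the GKE dataplane is implemented.

```python
import hashlib


def pick_next_hop(flow: tuple[str, int, str, int, str],
                  endpoints: list[str], healthy: set[str]) -> str:
    """Map a (src IP, src port, dst IP, dst port, protocol) flow to a healthy endpoint."""
    live = sorted(e for e in endpoints if e in healthy)
    if not live:
        raise RuntimeError("no healthy endpoints")
    key = "|".join(map(str, flow)).encode()
    # Use the first 8 bytes of the digest as a stable 64-bit flow hash.
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return live[digest % len(live)]
```

Plain modulo hashing remaps some surviving flows whenever membership changes; production ECMP implementations often use resilient or consistent hashing to limit that churn.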

Sub-second convergence

For telcos, every millisecond of downtime is a lost connection. Through HA Policy, GKE's dataplane is optimized for near-zero downtime with ultra-fast failure detection and convergence, and operators can choose between self-managed recovery and fully Google-managed failure detection.

Shifting from "saving" to "solving" with AI

For operators, the ultimate goal of modernization is to transition to an autonomous network. By running the core network functions on a platform adjacent to Google Cloud AI and data platforms such as Vertex AI and BigQuery, they can turn telemetry into actionable changes to optimize the network. Some use cases and benefits that modernization enables include:

Predictive AIOps: Use AI to identify performance degradation and trigger automated healing before a call ever drops. Use the cloud for on-demand burst capacity during sporting events or service launches. Or use the data from your GKE-hosted 5G core to fuel AI-powered automation that anticipates issues before they impact subscribers.
Intent-driven programmability: Shift from expensive, reactive operations and cut down new deployment setup times from several weeks to a couple of hours.
Monetize insights: Leverage AI on cloud-native data to identify and capture entirely new revenue opportunities in addition to rightsizing your networks.

Your journey, your terms

The future of telco is intelligent, resilient, and incredibly flexible. Whether you are taking your first step into a hybrid deployment or launching a fully cloud-hosted core, Google Cloud is your strategic partner.

Join us at MWC: Visit booth #2H40 in Hall 2 to see these solutions in action, including live demonstrations of mobile core running on GKE.

Designing private network connectivity for RAG-capable gen AI apps

Mon, 02 Mar 2026 17:00:00 +0000

The flexibility of Google Cloud allows enterprises to build secure and reliable architecture for their AI workloads. In this blog we will look at a reference architecture for private connectivity for retrieval-augmented generation (RAG)-capable generative AI applications. This architecture is for scenarios where communications of the overall system must use private IP addresses and must not traverse the internet.

The power of RAG

RAG is a powerful technique used to optimize the output of large language models (LLMs) by grounding them in specific, authoritative knowledge bases outside of their original training data. RAG allows an application to retrieve relevant information from your documents, data sources, or databases in real time. This retrieved context is then provided to the model alongside the user’s query, helping to ensure that the AI’s responses are accurate, verifiable, and highly relevant to your business. This improves the quality of responses and reduces hallucinations.

This approach is helpful because it allows you to direct generative AI to use a designated source of truth, rather than relying solely on the model's pre-existing knowledge, and without needing to retrain or fine-tune the model itself.

Design pattern example

To see how to set up your network for private connectivity for a RAG application in a regional design, let's look at an example design pattern.

The setup comprises an external network (on-prem and other clouds) and Google Cloud environments consisting of a routing project, a Shared VPC host project for RAG, and three specialized service projects: data ingestion, serving, and frontend.

This design utilizes the following services to provide an end-to-end solution:

Cloud Interconnect or Cloud VPN: To securely connect from your on-premises or other clouds to the routing VPC network
Network Connectivity Center: Used as an orchestration framework to manage connectivity between the routing VPC network and the RAG VPC network via VPC spokes and hybrid spokes
Cloud Router: In the routing project, facilitates dynamic BGP route exchange between the external network and Google Cloud
Private Service Connect: Provides a private endpoint in the routing VPC network to reach the Cloud Storage bucket for data ingestion without traversing the public internet
Shared VPC: Host project architecture that allows multiple service projects to use a common, centralized VPC network
Google Cloud Armor and Application Load Balancer: Placed in the frontend service project to provide security and traffic management for user interaction
VPC Service Controls: Creates a managed security perimeter around all resources to mitigate data exfiltration risks

The traffic flow

RAG population flow

In the diagram, the green dashed line shows the RAG population flow, which describes how data travels from data engineers to vector storage.

From the external network, data travels over Cloud Interconnect or Cloud VPN.
In the routing project, it uses the Private Service Connect endpoint to get to the Cloud Storage bucket.
From the Cloud Storage bucket in the Data Ingestion service project, the data ingestion subsystem processes the raw data.
The AI model creates vectors from the resulting data chunks and returns them to the data ingestion subsystem, which writes them to the RAG datastore in the serving service project.
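The chunking that precedes embedding can be as simple as fixed-size windows with some overlap, so that text near a boundary appears in two neighboring chunks. This Python sketch is a generic illustration; chunk_size and overlap are counted in characters, and the defaults are assumptions rather than values from the reference architecture.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

For example, chunking "abcdefghij" with chunk_size=4 and overlap=1 yields "abcd", "defg", and "ghij"; each chunk would then be sent to the AI model for embedding.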

Inference flow

In the diagram, the orange dashed line shows the inference flow, which describes customer or user requests.

The request travels over Cloud Interconnect or Cloud VPN to the routing VPC network and then over the VPC spoke to the RAG VPC network.
The request reaches the Application Load Balancer, which is protected by Cloud Armor; once the request is allowed, the load balancer passes it to the frontend subsystem.
The frontend subsystem forwards the request to the serving subsystem, which augments the prompt with data from the RAG datastore.
The system generates a response via the AI model, and the grounded response is returned along the same path to the requestor.
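The augmentation step in the serving subsystem can be sketched in Python as a nearest-neighbor lookup followed by prompt assembly. The in-memory store, the cosine-similarity ranking, the prompt template, and k=2 are illustrative assumptions; a real serving subsystem would query the RAG datastore and obtain query_vec from the embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def augment_prompt(query: str, query_vec: list[float],
                   store: list[tuple[str, list[float]]], k: int = 2) -> str:
    """Build a prompt that prepends the k stored chunks most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    context = "\n".join(f"- {text}" for text, _ in ranked[:k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The grounded prompt is what the serving subsystem sends to the AI model; everything stays on private addresses because the datastore, the model endpoint, and the serving subsystem are all inside the Shared VPC.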

Management and routing

In the diagram, the blue dotted lines represent the Network Connectivity Center hybrid and VPC spokes that manage the control plane and route orchestration between the routing network and the RAG VPC network. This ensures that routes learned from the external network are appropriately propagated across the environment.

Please read the entire architecture document Private connectivity for RAG-capable generative AI applications to understand the specifics, including IAM permissions, VPC Service Controls, and deployment considerations.

Next steps

Take a deeper dive into the Cross-Cloud Network and other guides about generative AI with RAG:

Document set: Generative AI with RAG
Document: Cross-Cloud Network for distributed applications
Blog: Build Your First ADK Agent Workforce

Want to ask a question, find out more or share a thought? Please connect with me on LinkedIn.

Firefly: Illuminating the path to nanosecond-level clock sync in the data center

Mon, 23 Feb 2026 17:00:00 +0000

From the high-frequency trading floors of Wall Street to orchestrating cloud data centers, the ability to synchronize events with nanosecond accuracy is critical. Yet, achieving this level of temporal precision across thousands of interconnected devices in a modern data center is fraught with challenges like clock drift, network jitter, and path asymmetries. And doing so on cloud-hosted infrastructure has traditionally been impossible, preventing a certain class of applications from running there.

This is where Firefly, a clock synchronization system developed by researchers and engineers at Google, comes in. Firefly isn't just a clock synchronization protocol; it's a software-driven approach that combines theoretical insights and practical engineering to deliver ultra-accurate, scalable, and cost-effective time synchronization on commodity hardware within a demanding data center environment.

The nanosecond race: Why precise timing matters

Precise clock synchronization is the foundation of distributed systems. It is non-negotiable in financial exchanges, where regulatory requirements mandate sub-100µs external synchronization to Coordinated Universal Time, or UTC, and fairness demands sub-10ns internal clock synchronization. In high-frequency trading, a minuscule timing advantage can translate to significant financial gains, making accurate timestamping critical for market integrity. Beyond finance, numerous data center operations, including database consistency, distributed logging, virtual machine management, and network telemetry, rely on accurate temporal ordering of events. And as data centers scale, the need for a robust, scalable synchronization solution becomes even more important.

But achieving nanosecond-level synchronization in a dynamic data center environment is difficult. Several factors conspire to undermine precision:

Clock drift: Crystal oscillators, which are fundamental to all clocks, have inherent imperfections that cause them to gradually deviate over time. Deviations that were once considered minor become substantial when targeting sub-10ns accuracy.
Jitter: Network components such as switches and network interface cards (NICs) introduce unpredictable delays. These delays, often stemming from queuing in network buffers or the intricate processing of packets, can manifest as jitter, disrupting the timing of synchronization messages.
Asymmetry: The network path between two devices is rarely symmetrical. Differences in cable lengths, the number of hops, or the internal workings of network equipment can cause signals to take different amounts of time to travel in opposite directions. This asymmetry can introduce significant errors when estimating one-way delays and clock offsets.
Scalability: As data centers expand to house tens of thousands of servers, any synchronization solution must be able to scale efficiently without becoming a bottleneck or requiring disproportionate resources.
Fault tolerance: In a distributed system, failures are inevitable. A synchronization protocol must be resilient to the loss or misbehavior of individual nodes or network links, so that the overall synchronization accuracy is not compromised.
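The asymmetry problem follows from the standard two-way time-transfer model used by NTP- and PTP-style protocols (a textbook model shown for illustration, not Firefly's exact estimator). A probe is sent at t1 and its reply received at t4 on the local clock, while the peer records arrival t2 and reply departure t3 on its own clock; theta is the peer's true offset relative to the local clock, and d_f and d_r are the forward and reverse one-way delays:

```latex
\begin{aligned}
t_2 &= t_1 + \theta + d_f, \qquad t_4 = t_3 - \theta + d_r,\\
\delta &= (t_4 - t_1) - (t_3 - t_2) = d_f + d_r,\\
\hat{\theta} &= \frac{(t_2 - t_1) - (t_4 - t_3)}{2} = \theta + \frac{d_f - d_r}{2}.
\end{aligned}
```

The round-trip time delta is measurable, but the offset estimate is biased by half the delay difference, and repeated probes over the same asymmetric path do not average that bias away. For example, a forward path that is 10 ns slower than the reverse path shifts the estimate by 5 ns, half of a sub-10ns budget.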

Firefly: Bridging software and theory

Firefly uses a multi-faceted strategy to tackle these challenges, distinguishing itself from prior synchronization protocols. Its core innovations lie in its architectural design and its theoretical underpinnings.

1. Layered synchronization: Firefly employs a novel layered synchronization technique. Instead of relying on a central clock, which can be a single point of failure or introduce delays, it first establishes tight internal synchronization amongst NICs within the data center. Each NIC in the network constantly communicates with a set of its peers, comparing times and making adjustments. From this "swarm" of devices emerges a highly stable and accurate consensus time that the entire group agrees upon. This internal synchronization is rapid and robust, and is effectively shielded from external timing disturbances. Concurrently, Firefly synchronizes the entire swarm to UTC. Decoupling these two processes is crucial, as it prevents external factors like time-server jitter or drift from directly impacting internal synchronization.

2. Distributed consensus over random graphs: Unlike traditional hierarchical approaches that can be brittle and susceptible to single points of failure, Firefly uses a distributed consensus algorithm built on a d-regular random graph. This means each NIC communicates with a randomly selected set of 'd' peers. Theoretical analysis, as presented in the Firefly research paper, demonstrates that such random graphs offer significant advantages:

Faster convergence: Random graphs promote a more rapid dissemination of clock information across the network, leading to quicker synchronization.
Scalability: The theoretical bounds show that random graphs can maintain synchronization accuracy even as the size of the network grows, provided the number of peers ('d') scales logarithmically with the total number of nodes.
Resilience to asymmetry: The diverse probing paths inherent in random graphs help to average out and mitigate the impact of path asymmetries.
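The averaging idea behind the swarm can be simulated in a few lines of Python. This toy model builds an approximately d-regular random graph (a union of d/2 random cycles, so a shared edge can leave a node with fewer than d peers) and has every node repeatedly replace its offset with the mean of its own and its peers' offsets. It assumes nodes can read peers' offsets directly, whereas real NICs must estimate them from probes, and it is not Firefly's update rule.

```python
import random


def random_regular_ish_graph(n: int, d: int, rng: random.Random) -> list[set[int]]:
    """Union of d/2 random Hamiltonian cycles over n nodes.

    Each cycle gives every node two distinct peers, so degrees fall between 2
    and d; they equal d unless two cycles happen to share an edge.
    """
    if d % 2 or not 2 <= d < n:
        raise ValueError("d must be even, at least 2, and smaller than n")
    neighbors: list[set[int]] = [set() for _ in range(n)]
    for _ in range(d // 2):
        order = list(range(n))
        rng.shuffle(order)
        for i, u in enumerate(order):
            v = order[(i + 1) % n]
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


def consensus_round(offsets: list[float], neighbors: list[set[int]]) -> list[float]:
    """Each node moves to the mean of its own offset and its peers' offsets."""
    return [
        (offsets[u] + sum(offsets[v] for v in neighbors[u])) / (1 + len(neighbors[u]))
        for u in range(len(offsets))
    ]


def spread(offsets: list[float]) -> float:
    """Gap between the fastest and slowest clock in the swarm."""
    return max(offsets) - min(offsets)
```

Because every update is a convex combination of values over a connected graph, the spread never grows, and on a well-connected random graph it contracts quickly; a sparse ring with the same update converges far more slowly, which is the intuition behind preferring random peer selection.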

3. Mitigating jitter and asymmetry in practice: Beyond the theoretical advantages of random graphs, Firefly incorporates practical techniques to further refine accuracy:

RTT filtering: By analyzing round-trip time (RTT) measurements, Firefly can identify and discard probe samples that are likely affected by queuing jitter, thereby improving the accuracy of delay estimations.
Path profiling: Firefly actively probes network paths to identify and favor those with minimal asymmetry. This proactive approach helps to select the most reliable paths for synchronization.
Leveraging hardware: Where available, Firefly can utilize features like Transparent Clock (TC) in network switches to accurately account for in-switch delays, further reducing measurement error.
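RTT filtering can be sketched in Python by keeping only probes whose round-trip time is close to the smallest one observed, on the reasoning that extra RTT mostly reflects queuing. The Probe type uses the usual four-timestamp exchange; the 20 ns slack_ns threshold and the use of a median are assumptions for illustration, not Firefly's published filter.

```python
from statistics import median
from typing import NamedTuple


class Probe(NamedTuple):
    t1: float  # request sent, local clock (ns)
    t2: float  # request received, peer clock (ns)
    t3: float  # reply sent, peer clock (ns)
    t4: float  # reply received, local clock (ns)

    @property
    def rtt(self) -> float:
        """Round-trip time excluding the peer's turnaround time."""
        return (self.t4 - self.t1) - (self.t3 - self.t2)

    @property
    def offset(self) -> float:
        """Two-way offset estimate of the peer clock relative to the local clock."""
        return ((self.t2 - self.t1) - (self.t4 - self.t3)) / 2


def filtered_offset(probes: list[Probe], slack_ns: float = 20.0) -> float:
    """Estimate the peer offset using only probes whose RTT is near the minimum."""
    best = min(p.rtt for p in probes)
    kept = [p.offset for p in probes if p.rtt <= best + slack_ns]
    return median(kept)
```

With two clean probes (RTT 200 ns, offset 50 ns) and two probes whose forward leg queued for an extra 300 ns (RTT 500 ns, apparent offset 200 ns), the filter returns 50 ns, while a median over all four samples would report 125 ns.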

4. Robustness and fault tolerance: Firefly’s use of distributed consensus, combined with its averaging mechanisms, makes it inherently resilient to failures. By not relying on a single time server or a fixed hierarchical structure, the system can gracefully handle the loss or misbehavior of individual nodes.

Performance in the real world

The results discussed in our Firefly research paper are compelling:

Internal synchronization: Firefly consistently achieves sub-10ns NIC-to-NIC synchronization when used in conjunction with Google's latest data center fabric technology. This precision can be used to determine the order of events such as packets, logs, and remote procedure calls (RPCs) across machines.
External synchronization: The system also delivers significantly better synchronization to UTC than the 100µs regulatory requirement for financial exchanges.

The offset between a pair of clocks that are six hops away in a Firefly-synced network, measured by an oscilloscope using a one-pulse-per-second (1PPS) signal.

The accompanying video illustrates the accuracy of NIC-to-NIC synchronization, as quantified by an oscilloscope utilizing a one-pulse-per-second (1PPS) signal from the NICs. Each row corresponds to a NIC clock, with the rising edge indicating the precise moment the NIC clock attains an integer second. The oscilloscope observations confirm that all measured NICs exhibit close synchronization, maintaining alignment within a few nanoseconds.

These results are particularly impressive given that Firefly operates purely in software on commodity hardware, avoiding the need for expensive, specialized synchronization equipment. This makes ultra-accurate time synchronization accessible to a broader range of data center applications.

A foundation for future applications

Firefly's success in delivering nanosecond-level accuracy in a scalable and cost-effective manner has far-reaching implications:

Democratizing high-precision timing: Firefly allows cloud-hosted financial services that traditionally rely on expensive dedicated hardware to achieve the required precision using standard cloud infrastructure.
Enabling new applications: The availability of precise, synchronized clocks across data center devices can unlock new possibilities in areas like fine-grained network telemetry and congestion control, time-coordinated distributed systems, and deterministic fabric for ML workloads.
Transforming data center operations: By creating a tightly integrated and precisely timed computing entity, Firefly can enhance data centers’ overall efficiency, reliability, and performance.

In conclusion, Firefly represents a significant advancement in the field of clock synchronization. By ingeniously combining theoretical insights into graph theory and consensus algorithms with practical network engineering techniques, it overcomes the long-standing challenges of achieving nanosecond-level precision in complex, distributed environments. As data centers continue to evolve, systems like Firefly will be instrumental in building the high-performance, reliable, and fair infrastructure of the future.


Google Distributed Cloud brings public-cloud-like networking to air-gapped environments

Tue, 10 Feb 2026 17:00:00 +0000

Organizations in highly regulated industries often struggle to balance the rigid security of air-gapped environments with the need for the agility and flexibility that the cloud provides. To address this, Google Distributed Cloud (GDC) air-gapped 1.15 introduces new networking features in preview that give you more direct control and visibility without compromising your security posture, as well as a new IPAM feature in general availability that simplifies subnet management. These preview features are Cloud NAT, enhanced connectivity for standard clusters, and advanced HTTP and HTTPS health checks in load balancers. Together, they make it easier for you to manage complex workloads in a secure environment.

Manage outbound traffic with Cloud NAT

Cloud NAT for GDC air-gapped replaces previous egress solutions and gives you more control over how your instances communicate with other networks, on par with public cloud functionality. Cloud NAT provides several benefits:

Configurable egress IPs: You can assign and manage multiple egress IP addresses for your outbound traffic so you can identify exactly which workloads are communicating.
Customizable timeouts: Manage connection lifecycles by adjusting timeouts for different types of traffic.
Granular control: Administrators can create specific subnets for egress IPs, while application operators define how pods and VMs route their traffic.

Connect standard clusters directly to your organization

In a secure environment, isolation should not result in disconnected silos. With the latest release, standard clusters include networking updates that help you communicate across your organization while maintaining strict security boundaries, helping you manage your environment more effectively. The updates include:

Direct pod communication: Your standard cluster pods can now communicate directly with workloads in your organization’s Default VPC. This simplifies how you connect standard clusters and shared clusters.
Flexible firewall policies: You can use both Project Network Policy and Kubernetes Network Policy APIs to set granular rules for traffic entering and leaving your pods and nodes.
Managed load balancing: You can create internal and external load balancers using standard Kubernetes Service APIs, while GDC manages the underlying configuration for you.

Pods within a standard cluster can now connect to other pods directly or through a ClusterIP. While traffic to the Infra VPC remains blocked, you can send traffic to shared cluster workloads through GDC internal load balancers. This ensures your applications can reach necessary services quickly.

Improve reliability with Load Balancer HTTP and HTTPS health checks

Previously, L4 load balancer health checks monitored only basic TCP connectivity, confirming merely that a port was open. GDC air-gapped load balancers now support HTTP and HTTPS health checks, which allow you to verify whether an application is actually functioning correctly. By checking status codes and response content, you can:

Confirm application health: Verify that services are responding correctly, not just that the server is powered on.
Increase reliability: Automatically detect and route traffic away from applications experiencing internal errors.
Improve visibility: Access better data regarding the health of your VM-based workloads to manage performance before issues arise.
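This hypothetical model shows why an HTTP check catches failures that a port-only TCP check misses. It judges a backend on both the returned status code and an expected substring in the response body. The function and field names are illustrative and are not the GDC load balancer's configuration schema.

```javascript
// Hypothetical HTTP(S) health-check verdict: healthy only if the probe got
// the expected status code and, if configured, the body contains an
// expected substring. Not the GDC implementation; names are illustrative.
function evaluateProbe(probe, check) {
  if (probe.error) {
    // Connection refused, TLS failure, or timeout: a TCP check fails too.
    return { healthy: false, reason: `probe failed: ${probe.error}` };
  }
  if (probe.status !== check.expectedStatus) {
    return { healthy: false, reason: `unexpected status ${probe.status}` };
  }
  if (check.expectedBody && !probe.body.includes(check.expectedBody)) {
    return { healthy: false, reason: "response body mismatch" };
  }
  return { healthy: true, reason: "ok" };
}

const httpCheck = { expectedStatus: 200, expectedBody: '"status":"ok"' };

// The port is open and the server answers, but the app returns an error:
// a TCP check would pass, while this HTTP check marks the backend unhealthy.
console.log(evaluateProbe({ status: 500, body: "internal error" }, httpCheck));
console.log(evaluateProbe({ status: 200, body: '{"status":"ok"}' }, httpCheck));
```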

Make subnet management easier with subnet groups

Previously, a child subnet could only reference a single parent subnet. With the introduction of the subnet group, a child subnet can now reference a subnet group that may contain multiple parent subnets. This provides the following benefits:

Overcome the challenges of immutable subnet CIDRs: Although a subnet's CIDR range is immutable, you can scale up IP resources by attaching a new subnet to a subnet group. Referencing a subnet group instead of a single parent subnet makes scale-up easy.
Automatically identify a parent subnet: You can now reference a subnet group as the parent rather than a single subnet. When you do, you don't need to manually identify a parent subnet that has available IP resources; instead, GDC IPAM automatically selects a subnet in the group with enough available IP space as the parent.
Start with smaller CIDRs for easier planning: Using subnet groups to scale IP resources also means that you can start with smaller and discontinuous CIDRs when creating new parent subnets, making IP resource utilization more efficient and the planning process easier.
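The parent-selection idea can be sketched as follows. This is a hypothetical model, not GDC IPAM's actual algorithm. It picks the first member subnet whose free address count covers the requested child prefix. Real IPAM must also handle alignment and fragmentation inside each parent, which this sketch ignores. The group, subnet names, and allocation counts are made up.

```javascript
// Number of IPv4 addresses in a CIDR block, e.g. "10.0.0.0/24" -> 256.
function cidrSize(cidr) {
  const prefix = Number(cidr.split("/")[1]);
  return 2 ** (32 - prefix);
}

// Hypothetical parent selection: first member with enough free addresses
// for a child of the given prefix length. Ignores alignment/fragmentation.
function pickParent(subnetGroup, childPrefix) {
  const needed = 2 ** (32 - childPrefix);
  for (const subnet of subnetGroup.members) {
    const free = cidrSize(subnet.cidr) - subnet.allocated;
    if (free >= needed) return subnet.name;
  }
  return null; // Group exhausted: attach another subnet to the group.
}

// Two small, discontinuous parent subnets in one hypothetical group.
const appSubnetGroup = {
  name: "app-subnets",
  members: [
    { name: "parent-a", cidr: "10.0.0.0/24", allocated: 224 }, // 32 free
    { name: "parent-b", cidr: "10.8.4.0/23", allocated: 0 },   // 512 free
  ],
};

console.log(pickParent(appSubnetGroup, 27)); // prints parent-a (32 needed)
console.log(pickParent(appSubnetGroup, 26)); // prints parent-b (64 needed)
```

When `pickParent` returns `null`, the scale-up path is to attach another subnet to the group. Child subnets that reference the group need no change.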

Get started

To learn more about these features, please refer to our documentation or contact your Google Cloud account team.

A gRPC transport for the Model Context Protocol

Tue, 13 Jan 2026 17:30:00 +0000

AI agents are moving from test environments to the core of enterprise operations, where they must interact reliably with external tools and systems to execute complex, multi-step goals. The Model Context Protocol (MCP) is the standard that makes this agent-to-tool communication possible. In fact, just last month we announced the release of fully managed, remote MCP servers. Developers can now simply point their AI agents or standard MCP clients like Gemini CLI to a globally consistent and enterprise-ready endpoint for Google and Google Cloud services.

MCP uses JSON-RPC as its standard transport. This brings many benefits as it combines an action-oriented approach with natural language payloads that can be directly relayed by agents in their communication with foundational models. Yet many organizations rely on gRPC, a high-performance, open source implementation of the remote procedure call (RPC) model. Enterprises that have adopted the gRPC framework must adapt their tooling to be compatible with the JSON-RPC transport used by MCP. Today, these enterprises need to deploy transcoding gateways to translate between JSON-RPC MCP requests and their existing gRPC-based services.

An interesting alternative to MCP transcoding is to use gRPC as the custom transport for MCP. Many gRPC users are actively experimenting with this option by implementing their own custom MCP servers. At Google Cloud, we use gRPC extensively to enable services and offer APIs at a global scale, and we’re committed to sharing the technology and expertise that has resulted from this pervasive use of gRPC. Specifically, we’re committed to supporting gRPC practitioners in their journey to adopt MCP in production, and we’re actively working with the MCP community to explore mechanisms to support gRPC as a transport for MCP. The MCP core maintainers have arrived at an agreement to support pluggable transports in the MCP SDK, and in the near future, Google Cloud will contribute and distribute a gRPC transport package to be plugged into the MCP SDKs. A community-backed transport package will enable gRPC practitioners to deploy MCP with gRPC in a consistent and interoperable manner.

The use of gRPC as a transport avoids the need for transcoding and helps maintain operational consistency for environments that are actively using gRPC. In the rest of this post, we explore the benefits of using gRPC as a transport for MCP and how Google Cloud is supporting this journey.

The choice of RPC transport

For organizations already using gRPC for their services, gRPC support allows them to continue to use their existing tooling to access services via MCP without altering the services or implementing transcoding proxies. These organizations are on a journey to keep the benefits of gRPC as MCP becomes the mechanism for agents to access services.

“Because gRPC is our standard protocol in the backend, we have invested in experimental support for MCP over gRPC internally. And we already see the benefits: ease of use and familiarity for our developers, and reducing the work needed to build MCP servers by using the structure and statically typed APIs.” - Stefan Särne, Senior Staff Engineer and Tech Lead for Developer Experience, Spotify

Benefits of gRPC

Using gRPC as a transport aligns MCP with the best practices of modern gRPC-based distributed systems, improving performance, security, operations, and developer productivity.

Performance and efficiency

The performance advantages of gRPC provide a big boost in efficiency, thanks to the following attributes:

Binary encoding (protocol buffers): gRPC uses protocol buffers (Protobufs) for binary encoding, shrinking message sizes by up to 10x compared to JSON. This means less bandwidth consumption and faster serialization/deserialization, which translates to lower latency for tool calls, reduced network costs, and a much smaller resource footprint.
Full duplex bidirectional streaming: gRPC natively lets the client (the agent) and the server (the tool) send continuous data streams to each other simultaneously over a single, persistent connection. This feature is a game-changer for agent-tool interaction, opening the door to truly interactive, real-time agentic workflows without requiring application-level connection synchronization.
Built-in flow control (backpressure): gRPC includes native flow control to prevent a fast-sending tool from overwhelming the agent.
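To make the binary-encoding point concrete, here is a minimal hand-rolled sketch of the Protocol Buffers wire format for a single integer field. It compares the result with the equivalent JSON. Real services would use classes generated from a `.proto` file. The field name `resultCount` and field number 1 are hypothetical, and the sizes shown are for this toy message only.

```javascript
// Encode a non-negative integer (below 2^31) as a protobuf varint:
// 7 bits per byte, high bit set when more bytes follow.
function encodeVarint(value) {
  const bytes = [];
  while (value > 0x7f) {
    bytes.push((value & 0x7f) | 0x80);
    value = Math.floor(value / 128);
  }
  bytes.push(value);
  return bytes;
}

// Encode one varint field: tag = (field number << 3) | wire type 0.
function encodeUint32Field(fieldNumber, value) {
  const WIRE_TYPE_VARINT = 0;
  const tag = (fieldNumber << 3) | WIRE_TYPE_VARINT;
  return Uint8Array.from([...encodeVarint(tag), ...encodeVarint(value)]);
}

// Hypothetical message: field 1 ("resultCount") holds 150.
const protoBytes = encodeUint32Field(1, 150);
const jsonBytes = new TextEncoder().encode(JSON.stringify({ resultCount: 150 }));

const hex = Array.from(protoBytes, (b) => b.toString(16).padStart(2, "0")).join("");
console.log(hex);                                // 089601
console.log(protoBytes.length, jsonBytes.length); // 3 19
```

The protobuf bytes carry no field names, because both sides share the schema. Most of the size saving comes from that.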

Enterprise-grade security and authorization

gRPC treats security as a first-class citizen, with enterprise-grade features built directly into its core, including:

Mutual TLS (mTLS): Critical for Zero Trust architectures, mTLS authenticates both the client and the gRPC-powered server, preventing spoofing and helping to ensure only trusted services communicate.
Strong authentication: gRPC offers native hooks for integrating with industry-standard token-based authentication (JWT/OAuth), providing verifiable identity for every AI agent.
Method-level authorization: You can enforce authorization policies directly on specific RPC methods or MCP tools (e.g., an agent is authorized to ReadFile but not DeleteFile), helping to ensure strict adherence to the principle of least privilege and combating "excessive agency."
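Method-level authorization can be pictured as a lookup keyed on gRPC full method names (`/package.Service/Method`). This is a hypothetical sketch, not the gRPC interceptor API. The agent principals and the `mcp.v1.FileTools` service are made up. The numeric codes are gRPC's canonical values for OK, PERMISSION_DENIED, and UNAUTHENTICATED.

```javascript
// Canonical gRPC status codes used by this check.
const AUTHZ_STATUS = { OK: 0, PERMISSION_DENIED: 7, UNAUTHENTICATED: 16 };

// Hypothetical least-privilege table: principal -> allowed full methods.
const methodPolicy = new Map([
  ["agent-reporting", new Set(["/mcp.v1.FileTools/ReadFile"])],
  ["agent-admin", new Set(["/mcp.v1.FileTools/ReadFile", "/mcp.v1.FileTools/DeleteFile"])],
]);

// In a real server this would run in an interceptor, with the principal
// taken from the verified mTLS certificate or JWT. Unknown principals and
// unlisted methods are denied by default.
function authorize(principal, fullMethod) {
  if (!principal) return AUTHZ_STATUS.UNAUTHENTICATED;
  const allowed = methodPolicy.get(principal);
  return allowed && allowed.has(fullMethod)
    ? AUTHZ_STATUS.OK
    : AUTHZ_STATUS.PERMISSION_DENIED;
}

console.log(authorize("agent-reporting", "/mcp.v1.FileTools/ReadFile"));   // 0
console.log(authorize("agent-reporting", "/mcp.v1.FileTools/DeleteFile")); // 7
```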

Operational maturity and developer productivity

gRPC provides a powerful, integrated solution that helps offload resiliency measures and improves developer productivity through extensibility and reusability. Some of its capabilities include:

Unified observability: Native integration with distributed tracing (OpenTelemetry) and structured error codes provides a complete, auditable trail of every tool call. Developers can trace a single user prompt through every subsequent microservice interaction.
Robust resiliency: Features like deadlines, timeouts, and automatic flow control prevent a single unresponsive tool from causing system-wide failures. For example, a client can set a deadline on a tool call, and the framework automatically cancels the call if the deadline is exceeded, preventing a cascading failure.
Polyglot development: gRPC generates code for 11+ languages, allowing developers to implement MCP servers in the best language for the job while maintaining a consistent, strongly typed contract.
Schema-based input validation: Protobuf's strict typing mitigates injection attacks and simplifies the development task by rejecting malformed inputs at the serialization layer.
Error handling and metadata: The framework provides a standardized set of error codes (e.g., UNAVAILABLE, PERMISSION_DENIED) for reliable client handling, and clients can send and receive out-of-band information as key-value pairs in metadata (e.g., for tracing IDs) without cluttering the main request.
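This sketch shows the client-side idea behind deadlines and structured status codes. It is a hypothetical helper, not the gRPC client API. A real gRPC deadline also propagates to the server and cancels the RPC there, while this sketch only fails the local promise. The retry policy is an illustrative choice. It retries UNAVAILABLE, which the gRPC status-code guidance describes as usually transient.

```javascript
// Numeric values of a few canonical gRPC status codes.
const GRPC_CODE = {
  OK: 0,
  CANCELLED: 1,
  INVALID_ARGUMENT: 3,
  DEADLINE_EXCEEDED: 4,
  PERMISSION_DENIED: 7,
  UNAVAILABLE: 14,
  UNAUTHENTICATED: 16,
};

// Hypothetical helper: reject with DEADLINE_EXCEEDED if the tool call
// takes longer than `ms`. Only the local promise is failed.
function withDeadline(promise, ms) {
  let timer;
  const expired = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`deadline of ${ms} ms exceeded`);
      err.code = GRPC_CODE.DEADLINE_EXCEEDED;
      reject(err);
    }, ms);
  });
  return Promise.race([promise, expired]).finally(() => clearTimeout(timer));
}

// Hypothetical retry policy keyed on status codes: retry only transient
// failures, and stop once the overall time budget is spent.
function nextAction(code, msRemaining) {
  if (code === GRPC_CODE.OK) return "done";
  if (msRemaining <= 0) return "give-up";
  if (code === GRPC_CODE.UNAVAILABLE) return "retry";
  if (code === GRPC_CODE.PERMISSION_DENIED || code === GRPC_CODE.UNAUTHENTICATED) {
    return "report-to-agent";
  }
  return "fail";
}

// A simulated tool that takes 500 ms, called with a 100 ms deadline.
const slowTool = () => new Promise((resolve) => setTimeout(() => resolve("result"), 500));
withDeadline(slowTool(), 100).catch((err) => console.log(err.code)); // 4
```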

Get started

As a founding member of the Agentic AI Foundation and a core contributor to the MCP specification, Google Cloud, along with other members of the community, has championed the inclusion of pluggable transport interfaces in the MCP SDK. Participate and communicate your interest in having gRPC as a transport for MCP:

Express your interest in enabling gRPC as an MCP transport. Contribute to the active pull request for pluggable transport interfaces for the Python MCP SDK.
Join the community that is shaping the future of communications for AI and help advance the Model Context Protocol. Contributor Communication - Model Context Protocol.
Contact us. We want to learn from your experience and support your journey.

How Hackensack Meridian Health de-risked network migration using VPC Flow Logs

Fri, 09 Jan 2026 17:00:00 +0000

Network administrators rely heavily on VPC Flow Logs for visibility into their network traffic. Last year, we updated VPC Flow Logs to offer expanded network traffic visibility, extending beyond subnets to include VLAN attachments and VPN tunnels. This enhancement provides comprehensive monitoring of network traffic across your on-premises and multi-cloud environments.

Now, with VPC Flow Logs for VLAN attachments, you can export detailed telemetry data for your network traffic traversing Cloud Interconnect. This data encompasses essential information such as source and destination IP addresses, ports, protocols, bytes/packets transferred, timestamps, and other relevant metadata. These logs are crucial for a variety of use-cases, including network traffic analysis, troubleshooting, capacity planning, and maintaining compliance and security. Then, you can use Flow Analyzer to quickly analyze your VPC Flow Logs to gain valuable insights into your network without writing complex SQL queries.

Sounds great, but how do you use it? Hackensack Meridian Health (HMH) is a leading not-for-profit healthcare organization and the largest hospital system in New Jersey. For a network of hospitals, urgent care centers, and physician practices like HMH, system reliability is extremely important and a cornerstone value. In this blog post, we demonstrate how HMH leveraged VPC Flow Logs and Flow Analyzer to analyze their Cloud Interconnect traffic before migrating their Google Cloud network to a new architecture design.

Let’s jump in.

Using VPC Flow Logs to prepare for migration

Last year, HMH was getting ready to migrate their critical, large-scale network to a newer Google Cloud network design. Before a migration of this scale, they wanted to use Sankey diagrams to get a clear understanding of their most important hybrid traffic patterns. This analysis was the only way to accurately identify, and proactively plan for, the biggest risks that could cause disruption during the cutover.

"Getting a clear picture of our interconnect traffic always felt like a black box. Enabling VPC Flow Logs and feeding it into Flow Analyzer finally gave us the 'who-is-talking-to-what' map we needed. Identifying those critical traffic flows before we changed any routes was key to de-risking the entire migration." - Randall Brokaw, Cloud Engineering Manager, Hackensack Meridian Health

To collect the necessary data, HMH enabled VPC Flow Logs on all of their VLAN attachments, then leveraged Flow Analyzer to easily aggregate the ingress and egress data. The following query components were used for ingress analysis:

Flow Analyzer query

Source

Filter: Gateway type = INTERCONNECT_ATTACHMENT
Organize Flows By: Gateway location, Gateway VPC network

Destination

Organize Flows By: GCE Instance Project, Google service type

These selections filter VPC Flow Logs to ingress traffic over Cloud Interconnect VLAN attachments, and aggregate the source traffic volume by Google Cloud region and VPC network.

The destination data was grouped by Compute Engine instance project to easily identify the destination application, since each application is deployed into a dedicated service project. However, since not all traffic is sent to Compute Engine VMs, incorporating the Google service type enabled them to account for traffic destined for Google APIs and Google VPC hosted services.

In your environment, the best flow parameters and destination grouping to conduct this analysis will depend on how your organization deploys applications on Google Cloud. For example, you can group by any of the available fields collected by VPC Flow Logs metadata, such as IP address and port, VPC subnet, GKE cluster, and more.

HMH then transformed the VPC Flow Logs traffic volumes into Sankey diagrams. This required formatting each traffic flow into multiple three-column rows of {source, destination, weight}. For this analysis, the weight was the traffic volume displayed in Flow Analyzer, and the source and destination corresponded to each layer of the Sankey visualization in the following order:

Data center to Google Cloud region
Google Cloud region to VPC network
VPC network to application
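The row-formatting step described above can be sketched as a small transform. It expands each flow into one `[from, to, weight]` row per layer and sums identical edges. This is a sketch under stated assumptions: the input field names (`dataCenter`, `region`, `network`, `app`, `volume`) and the sample values are hypothetical, not the Flow Analyzer or Log Analytics export schema.

```javascript
// Hypothetical flow summaries: one row per (data center, region, VPC
// network, application) path with its traffic volume.
const sampleFlows = [
  { dataCenter: "On Premises", region: "us-central1", network: "Prod Network", app: "billing", volume: 3 },
  { dataCenter: "On Premises", region: "us-central1", network: "Prod Network", app: "scheduling", volume: 1 },
  { dataCenter: "On Premises", region: "us-east1", network: "Shared Network", app: "imaging", volume: 5 },
];

// Expand each flow into one [from, to, weight] row per Sankey layer and
// sum the weights of identical edges (Map keeps first-seen order).
function toSankeyRows(flows) {
  const layers = [
    ["dataCenter", "region"], // data center -> Google Cloud region
    ["region", "network"],    // region -> VPC network
    ["network", "app"],       // VPC network -> application
  ];
  const totals = new Map();
  for (const flow of flows) {
    for (const [fromKey, toKey] of layers) {
      const edge = JSON.stringify([flow[fromKey], flow[toKey]]);
      totals.set(edge, (totals.get(edge) || 0) + flow.volume);
    }
  }
  return Array.from(totals, ([edge, weight]) => [...JSON.parse(edge), weight]);
}

console.log(toSankeyRows(sampleFlows));
```

The resulting rows have the same `[from, to, weight]` shape that `data.addRows()` takes in the Google Charts example that follows.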

Selecting “View the query in Log Analytics” from the Flow Analyzer console allows the traffic flows to be easily exported to Google Sheets and combined correctly for the diagram. Then, using Google Charts, HMH created the Sankey diagram:

```javascript
// Assumes the Google Charts loader (loader.js) is included on the page.
google.charts.load('current', { packages: ['sankey'] });
google.charts.setOnLoadCallback(drawChart);

function drawChart() {
  var data = new google.visualization.DataTable();
  data.addColumn('string', 'From');
  data.addColumn('string', 'To');
  data.addColumn('number', 'Weight');
  data.addRows([
    ['On Premises', 'us-central1', 28],
    ['On Premises', 'us-east1', 7],
    ['us-east1', 'Prod Network', 2],
    ['us-east1', 'Shared Network', 9],
    ['us-central1', 'Prod Network', 4],
    // ... remaining rows elided
  ]);

  // 'sankey_chart' is a placeholder ID for the <div> that holds the chart.
  var chart = new google.visualization.Sankey(document.getElementById('sankey_chart'));
  chart.draw(data, {});
}
```

Google Charts sankey diagram - Analysis of Cloud Interconnect traffic

Using VPC Flow Logs, HMH network engineers pinpointed critical cutover moments in their plan, allowing them to de-risk the migration through proactive monitoring and preparedness. This preparation proved its value when a migration issue was detected in 3 minutes and resolved in just 5 — slashing a resolution process that previously could have taken hours. This readiness was fundamental to the migration's success.

This implementation uses Flow Analyzer which requires VPC Flow Logs to be stored in Cloud Logging. Alternatively, you have the option to forward VPC Flow Logs straight to BigQuery, bypassing Cloud Logging. From there, you can utilize visualization services like Looker to construct personalized dashboards and gain valuable insights.

VPC Flow Logs and Flow Analyzer for the win

HMH used VPC Flow Logs and Flow Analyzer to facilitate their network migration. But, by providing granular visibility into your Cloud Interconnect traffic, VPC Flow Logs can enable many other use cases, such as for capacity planning, cost attribution, and more. Enable VPC Flow Logs on your VLAN attachments today and leverage Flow Analyzer for insights into your traffic flow patterns.

To learn more, check out the VPC Flow Logs documentation or get started with Flow Analyzer to analyze your logs at no additional cost.

Responding to CVE-2025-55182: Secure your React and Next.js workloads

Wed, 03 Dec 2025 23:00:00 +0000

Editor's note: This blog was updated on Dec. 4, 5, 7, and 12, 2025, with additional guidance on Cloud Armor WAF rule syntax, and WAF enforcement across App Engine Standard, Cloud Functions, and Cloud Run.

Earlier today, Meta and Vercel publicly disclosed two vulnerabilities that expose services built with the popular open-source frameworks React Server Components (CVE-2025-55182) and Next.js to remote code execution risks in some server-side use cases. At Google Cloud, we understand the severity of these vulnerabilities, also known as React2Shell, and our security teams have shared their recommendations to help our customers take immediate, decisive action to secure their applications.

Vulnerability background

The React Server Components framework is commonly used for building user interfaces. On Dec. 3, 2025, CVE.org assigned this vulnerability the identifier CVE-2025-55182. Its official Common Vulnerability Scoring System (CVSS) base score is 10.0, rated Critical.

Vulnerable versions: React 19.0, 19.1.0, 19.1.1, and 19.2.0
Patched in React 19.2.1
Fix: https://github.com/facebook/react/commit/7dc903cd29dac55efb4424853fd0442fef3a8700
Announcement: https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components
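As a quick triage aid, this minimal sketch checks an installed React version against the vulnerable versions listed above. It normalizes the advisory's "19.0" to "19.0.0". It covers React only and does not check Next.js, whose patched versions are linked rather than listed here. It expects an exact installed version, not a `package.json` range, and does not replace the official advisory.

```javascript
// React versions listed as vulnerable to CVE-2025-55182 ("19.0" -> "19.0.0").
// Patched in 19.2.1.
const VULNERABLE_REACT_VERSIONS = new Set(["19.0.0", "19.1.0", "19.1.1", "19.2.0"]);

// Expects an exact installed version string (e.g. from the lockfile), not
// a range such as "^19.1.0", which can still resolve to a vulnerable version.
function isVulnerableReact(installedVersion) {
  const parts = installedVersion.trim().split(".");
  while (parts.length < 3) parts.push("0");
  return VULNERABLE_REACT_VERSIONS.has(parts.slice(0, 3).join("."));
}

// In a Node.js project, the installed version can typically be read with:
//   require("react/package.json").version
console.log(isVulnerableReact("19.1.1")); // true
console.log(isVulnerableReact("19.2.1")); // false (patched)
```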

Next.js is a web development framework that depends on React, and is also commonly used for building user interfaces. (The Next.js vulnerability was referenced as CVE-2025-66478 before being marked as a duplicate.)

Vulnerable versions: Next.js 15.x, Next.js 16.x, Next.js 14.3.0-canary.77 and later canary releases
Patched versions are listed here.
Fix: https://github.com/vercel/next.js/commit/6ef90ef49fd32171150b6f81d14708aa54cd07b2
Announcement: https://nextjs.org/blog/CVE-2025-66478

Google Threat Intelligence Group (GTIG) has also published a new report to help understand the specific threats exploiting React2Shell.

We strongly encourage organizations that manage environments relying on the React and Next.js frameworks to update to the latest versions and take the mitigation actions outlined below.

Mitigating CVE-2025-55182

We have created and rolled out a new Cloud Armor web application firewall (WAF) rule designed to detect and block exploitation attempts related to CVE-2025-55182. This new rule is available now and is intended to help protect your internet-facing applications and services that use global or regional Application Load Balancers. We recommend deploying this rule as a temporary mitigation while your vulnerability management program patches and verifies all vulnerable instances in your environment.

For customers using App Engine Standard, Cloud Functions, Cloud Run, Firebase Hosting or Firebase App Hosting, we provide an additional layer of defense for serverless workloads by automatically enforcing platform-level WAF rules that can detect and block the most common exploitation attempts related to CVE-2025-55182.

For Project Shield users, we have deployed WAF protections for all sites and no action is necessary to enable these WAF rules. For long-term mitigation, you will need to patch your origin servers as an essential step to eliminate the vulnerability (see additional guidance below).

Cloud Armor and the Application Load Balancer can be used to deliver and protect your applications and services regardless of whether they are deployed on Google Cloud, on-premises, or on another infrastructure provider. If you are not yet using Cloud Armor and the Application Load Balancer, please follow the guidance further down to get started.

While these platform-level rules and the optional Cloud Armor WAF rules (for services behind an Application Load Balancer) help mitigate the risk from exploits of the CVE, we continue to strongly recommend updating your application dependencies as the primary long-term mitigation.

Deploying the cve-canary WAF rule for Cloud Armor

To configure Cloud Armor to detect and protect against CVE-2025-55182, you can use the cve-canary preconfigured WAF rule with the new rule IDs that we have added for this vulnerability. This rule is opt-in only, and must be added to your policy even if you are already using the cve-canary rules.

In your Cloud Armor backend security policy, create a new rule and configure the following match condition:

```
(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) && evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})
```

This can be accomplished from the Google Cloud console by navigating to Cloud Armor and modifying an existing or creating a new policy.

Cloud Armor rule creation in the Google Cloud console.

Alternatively, the gcloud CLI can be used to create or modify a policy with the requisite rule:

```shell
gcloud compute security-policies rules create PRIORITY_NUMBER \
  --security-policy SECURITY_POLICY_NAME \
  --expression "(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) && evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})" \
  --action=deny-403
```

Additionally, if you are managing your rules with Terraform, you may implement the rule via the following syntax:

```hcl
rule {
  action   = "deny(403)"
  priority = "PRIORITY_NUMBER"
  match {
    expr {
      expression = "(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) && evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})"
    }
  }
  description = "Applies protection for CVE-2025-55182 (React/Next.JS)"
}
```
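To clarify what the first half of the rule expression does, here is an illustrative JavaScript mirror of its header guard. A request can match the rule only if it passes this guard and the `evaluatePreconfiguredWaf` check also flags it. This is a reading aid, not how Cloud Armor evaluates rules. It assumes lowercase header names, as in the expression. It also treats a missing Content-Type as an empty string and does not model CEL's handling of absent headers.

```javascript
// Illustrative mirror of the header guard in the Cloud Armor expression.
// Header names are assumed lowercase; missing Content-Type -> "".
function reachesWafCheck(headers) {
  const has = (name) => Object.prototype.hasOwnProperty.call(headers, name);
  const contentType = has("content-type") ? String(headers["content-type"]) : "";
  return (
    has("next-action") ||
    has("rsc-action-id") ||
    contentType.includes("multipart/form-data") ||
    contentType.includes("application/x-www-form-urlencoded")
  );
}

console.log(reachesWafCheck({ "next-action": "abc123" }));            // true
console.log(reachesWafCheck({ "content-type": "application/json" })); // false
```

In other words, a JSON request without the `next-action` or `rsc-action-id` headers cannot match this rule, whatever its body contains.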

Verifying WAF rule safety for your application and consuming telemetry

Cloud Armor rules can be configured in preview mode, a logging-only mode to test or monitor the expected impact of the rule without Cloud Armor enforcing the configured action. We recommend that the new rule described above first be deployed in preview mode in your production environments so that you can see what traffic it would block.

Once you verify that the new rule behaves as desired in your environment, you can disable preview mode to allow Cloud Armor to actively enforce it.

Cloud Armor per-request WAF logs are emitted as part of the Application Load Balancer logs to Cloud Logging. To see what Cloud Armor’s decision was on every request, load balancer logging first needs to be enabled on a per backend service basis. Once it is enabled, all subsequent Cloud Armor decisions will be logged and can be found in Cloud Logging by following these instructions.
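While the rule is in preview mode, you can summarize what it would have blocked from exported load balancer log entries, for example JSON copied out of Log Analytics. This is a sketch under an assumption. It assumes each entry carries `jsonPayload.previewSecurityPolicy` with `priority` and `outcome` fields, plus `httpRequest.requestUrl`, as we understand the Application Load Balancer log format. Verify these field names against your own entries before relying on it. The sample entries and URLs are made up.

```javascript
// Count requests a preview-mode rule (identified by priority) would have
// denied, and collect the affected URLs. Field names are assumptions to
// verify against your own load balancer log entries.
function summarizePreview(entries, rulePriority) {
  const summary = { wouldDeny: 0, urls: new Set() };
  for (const entry of entries) {
    const preview = entry.jsonPayload && entry.jsonPayload.previewSecurityPolicy;
    if (!preview || Number(preview.priority) !== rulePriority) continue;
    if (preview.outcome === "DENY") {
      summary.wouldDeny += 1;
      if (entry.httpRequest) summary.urls.add(entry.httpRequest.requestUrl);
    }
  }
  return summary;
}

// Hypothetical sample entries for a rule at priority 1000.
const sampleEntries = [
  {
    httpRequest: { requestUrl: "https://shop.example.com/checkout" },
    jsonPayload: { previewSecurityPolicy: { priority: 1000, outcome: "DENY" } },
  },
  { httpRequest: { requestUrl: "https://shop.example.com/" }, jsonPayload: {} },
  {
    httpRequest: { requestUrl: "https://shop.example.com/login" },
    jsonPayload: { previewSecurityPolicy: { priority: 2000, outcome: "DENY" } },
  },
];

console.log(summarizePreview(sampleEntries, 1000));
```

If the collected URLs include legitimate application paths, review the rule before you disable preview mode.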

Interaction of Cloud Armor rules with vulnerability scanning tools

There has been a proliferation of scanning tools designed to help identify vulnerable instances of React and Next.js in your environments. Many of those scanners are designed to identify the version number of relevant frameworks in your servers and do so by crafting a legitimate query and inspecting the response from the server to detect the version of React and Next.js that is running.

Our WAF rule is designed to detect and prevent exploit attempts of CVE-2025-55182. Because the scanners discussed above do not attempt an exploit, but instead send a safe query to elicit a response that reveals the software version, the above Cloud Armor rule will not detect or block them.

If these scanners report a vulnerable instance of software protected by Cloud Armor, that does not mean that an actual exploit attempt will successfully get through your Cloud Armor security policy. Instead, such findings mean that the version of React or Next.js detected is known to be vulnerable and should be patched.

How to get started with Cloud Armor for new users

If your workload is already using an Application Load Balancer to receive traffic from the internet, you can configure Cloud Armor to protect your workload from this and other application-level vulnerabilities (as well as DDoS attacks) by following these instructions.

If you are not yet using an Application Load Balancer and Cloud Armor, you can get started with the external Application Load Balancer overview, the Cloud Armor overview, and the Cloud Armor best practices.

If your workload is using Cloud Run, Cloud Run functions, or App Engine and receives traffic from the internet, you must first set up an Application Load Balancer in front of your endpoint to leverage Cloud Armor security policies to protect your workload. You will then need to configure the appropriate controls to ensure that Cloud Armor and the Application Load Balancer can’t be bypassed.

Best practices and additional risk mitigations

Once you configure Cloud Armor, we recommend consulting our best practices guide. Be sure to account for limitations discussed in the documentation to minimize risk and optimize performance while ensuring the safety and availability of your workloads.

Serverless platform protections

Google Cloud is enforcing platform-level protections across App Engine Standard, Cloud Functions, and Cloud Run to automatically help protect against common exploit attempts of CVE-2025-55182. This protection supplements the protections already in place for Firebase Hosting and Firebase App Hosting.

What this means for you:

Applications deployed to those serverless services benefit from these WAF rules that are enabled by default to help provide a base level of protection without requiring manual configuration.
These rules are designed to block known malicious payloads targeting this vulnerability.

Important considerations:

Patching is still critical: These platform-level defenses are intended to be a temporary mitigation. The most effective long-term solution is to update your application's dependencies to non-vulnerable versions of React and Next.js, and redeploy them.
Potential impacts: While unlikely, if you believe this platform-level filtering is incorrectly impacting your application's traffic, please contact Google Cloud Support and reference issue number 465748820.

Long-term mitigation: Mandatory framework update and redeployment

While WAF rules provide critical frontline defense, the most comprehensive long-term solution is to patch the underlying frameworks.

Alongside the platform-level protections and Cloud Armor options that Google Cloud provides, we urge all customers running React and Next.js applications on Google Cloud to immediately update their dependencies to the latest stable versions (React 19.2.1 or the relevant version of Next.js listed here), and redeploy their services.

This applies specifically to applications deployed on:

Cloud Run, Cloud Run functions, or App Engine: Update your application dependencies with the updated framework versions and redeploy.
Google Kubernetes Engine (GKE): Update your container images with the latest framework versions and redeploy your pods.
Compute Engine: The public OS images provided by Google Cloud do not have React or Next.js packages installed by default. If your instances use a custom OS image that includes the affected packages, update your workloads to include the latest framework versions and enable WAF rules in front of all workloads.
Firebase: If you’re using Cloud Functions for Firebase, Firebase Hosting, or Firebase App Hosting, update your application dependencies with the updated framework versions and redeploy. Firebase Hosting and App Hosting are also automatically enforcing a rule to limit exploitation of CVE-2025-55182 through requests to custom and default domains.

Patching your applications is an essential step to eliminate the vulnerability at its source and ensure the continued integrity and security of your services.

We will continue to monitor the situation closely and provide further updates and guidance as necessary. Please refer to our official Google Cloud Security advisories for the most current information and detailed steps.

If you have any questions or require assistance, please contact Google Cloud Support and reference issue number 465748820.

Gain Cross-Cloud Network traffic insights with VPC Flow Logs and Flow Analyzer

Mon, 01 Dec 2025 17:00:00 +0000

Gaining visibility into your network traffic is crucial, particularly with hybrid environments encompassing both on-premises and cross-cloud infrastructure. VPC Flow Logs have long been a staple to obtain detailed records of network traffic to, from, and within your Google Cloud subnets. But with the rise of more complex network topologies enabled by the Cross-Cloud Network, we knew we needed to expand VPC Flow Logs to give you a more complete picture.

That's why we're excited to share that you can now enable VPC Flow Logs directly on your Cloud VPN tunnels and VLAN attachments for Cloud Interconnect and Cross-Cloud Interconnect. This enhancement provides comprehensive monitoring of critical network traffic moving between your on-prem infrastructure, cross-cloud resources, and Google Cloud. With this new capability, you can:

Gain granular insights: Log network flows passing through Cloud Interconnect and Cloud VPN with 5-tuple granularity (source/destination IP, source/destination port, protocol).
Optimize performance: Quickly identify "elephant flows" (high-bandwidth flows) that might be congesting a specific VPN tunnel or VLAN attachment, enabling you to better plan and manage capacity.
Audit Shared VPC usage: In Shared VPC environments, identify which service projects are consuming the most hybrid bandwidth.
Map utilization to flows: Understand exactly how your hybrid connections are being utilized by mapping high-level bandwidth graphs to specific application flows.
Diagnose connectivity issues: When an on-prem/cross-cloud application can't reach a Google Cloud resource, use logs to check if the traffic is arriving at the Google Cloud gateway (VLAN attachment or VPN tunnel).
Fine-tune your application awareness on Cloud Interconnect policy configurations: Monitor and verify that your applications are marking differentiated services field codepoints (DSCP) correctly.
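To illustrate the elephant-flow use case, this hypothetical sketch groups flow records by gateway. It reports the flows that carry at least a chosen share of that gateway's bytes. The flattened record shape is an assumption. Real VPC Flow Logs entries nest the 5-tuple under `connection` and describe the gateway in a `gateway` object. The share threshold is an illustrative choice, not a standard definition.

```javascript
// Hypothetical, flattened records: one entry per 5-tuple per gateway.
// An "elephant" here is any flow carrying >= minShare of its gateway's bytes.
function findElephantFlows(records, { topN, minShare }) {
  // Group flows by the VPN tunnel or VLAN attachment they traversed.
  const byGateway = new Map();
  for (const rec of records) {
    if (!byGateway.has(rec.gateway)) byGateway.set(rec.gateway, []);
    byGateway.get(rec.gateway).push(rec);
  }

  const result = {};
  for (const [gateway, flows] of byGateway) {
    const total = flows.reduce((sum, f) => sum + f.bytes, 0);
    result[gateway] = flows
      .filter((f) => total > 0 && f.bytes / total >= minShare)
      .sort((a, b) => b.bytes - a.bytes)
      .slice(0, topN)
      .map((f) => `${f.srcIp}:${f.srcPort} -> ${f.destIp}:${f.destPort} proto ${f.protocol}`);
  }
  return result;
}

// Protocol uses IANA numbers (6 = TCP, 17 = UDP). Values are made up.
const sampleRecords = [
  { gateway: "vlan-attach-1", srcIp: "10.1.0.5", srcPort: 50432, destIp: "10.20.0.9", destPort: 443, protocol: 6, bytes: 800 },
  { gateway: "vlan-attach-1", srcIp: "10.1.0.6", srcPort: 50999, destIp: "10.20.0.9", destPort: 443, protocol: 6, bytes: 150 },
  { gateway: "vlan-attach-1", srcIp: "10.1.0.7", srcPort: 40123, destIp: "10.20.0.2", destPort: 53, protocol: 17, bytes: 50 },
  { gateway: "vpn-tunnel-1", srcIp: "192.168.1.4", srcPort: 40000, destIp: "10.30.0.3", destPort: 5432, protocol: 6, bytes: 300 },
];

console.log(findElephantFlows(sampleRecords, { topN: 3, minShare: 0.5 }));
```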

To provide more context to these flows, we've also added "gateway" annotations to VPC Flow Logs. Think of a gateway as the entry or exit point for traffic traveling between your Google Cloud VPC and an external network.

When you inspect a flow log of Cross-Cloud Network traffic, you'll now see two key new fields:

reporter: This field tells you the direction of the traffic, relative to the gateway.

SRC_GATEWAY: The traffic was observed entering Google Cloud through Cloud Interconnect or Cloud VPN (e.g., on-prem to Google Cloud).
DEST_GATEWAY: The traffic was observed exiting Google Cloud through Cloud Interconnect or Cloud VPN (e.g., Google Cloud to on-prem).

gateway object: This JSON payload provides the full context of the gateway itself, including its name, type (VPN_TUNNEL or INTERCONNECT_ATTACHMENT), project_id, and location.
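The `reporter` and `gateway` fields are enough to break hybrid traffic down by gateway and direction. This minimal sketch does that. The `reporter` values and the `gateway` fields (`name`, `type`, `location`) come from the description above. Reading volume from a top-level `bytes_sent` (possibly a string in exported JSON, hence `Number()`) is an assumption about the record shape. Gateway names and values are made up.

```javascript
// Sum bytes per gateway and direction. SRC_GATEWAY = entering Google
// Cloud (ingress); DEST_GATEWAY = leaving Google Cloud (egress).
function bytesByGateway(records) {
  const totals = {};
  for (const rec of records) {
    let direction;
    if (rec.reporter === "SRC_GATEWAY") direction = "ingress";
    else if (rec.reporter === "DEST_GATEWAY") direction = "egress";
    else continue; // Not observed at a hybrid gateway.

    const { name, type, location } = rec.gateway;
    const key = `${type} ${name} (${location})`;
    totals[key] = totals[key] || { ingress: 0, egress: 0 };
    totals[key][direction] += Number(rec.bytes_sent);
  }
  return totals;
}

const attachment = { name: "attach-1", type: "INTERCONNECT_ATTACHMENT", project_id: "net-hub", location: "us-central1" };
const tunnel = { name: "tunnel-1", type: "VPN_TUNNEL", project_id: "net-hub", location: "us-east4" };
const gatewayRecords = [
  { reporter: "SRC_GATEWAY", bytes_sent: "4096", gateway: attachment },
  { reporter: "DEST_GATEWAY", bytes_sent: "1024", gateway: attachment },
  { reporter: "SRC_GATEWAY", bytes_sent: "512", gateway: tunnel },
];

console.log(bytesByGateway(gatewayRecords));
```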

Analyze your logs with Flow Analyzer

To help you analyze your flow logs without writing complex SQL queries, we’ve also integrated the new gateway annotations directly into Flow Analyzer, a native tool for performing deep network traffic analysis on your VPC Flow Logs stored in Cloud Logging at no additional cost. Using Flow Analyzer, you can:

Quickly identify top talkers in your network with 5-tuple granularity.
Run Connectivity Tests in context to understand how your configurations (e.g., firewall policies) impact traffic flowing through your network.
Use Gemini Cloud Assist to construct natural language queries.
Analyze and compare current network flows with historical data (e.g., last hour, day, or week).

Flow Analyzer providing Cloud Interconnect traffic insights
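"Top talkers with 5-tuple granularity" means ranking flows by traffic volume, keyed on source IP, destination IP, source port, destination port, and protocol. The following Python sketch shows that idea, plus a simple comparison against a historical window. It is not how Flow Analyzer is implemented. The `FiveTuple` type and the `(five_tuple, bytes)` input pairs are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    src_ip: str
    dest_ip: str
    src_port: int
    dest_port: int
    protocol: int  # IANA protocol number: 6 = TCP, 17 = UDP


Flow = Tuple[FiveTuple, int]  # (5-tuple, bytes observed in one log entry)


def totals_by_tuple(flows: Iterable[Flow]) -> Counter:
    """Sum bytes per 5-tuple across log entries."""
    totals: Counter = Counter()
    for five_tuple, nbytes in flows:
        totals[five_tuple] += nbytes
    return totals


def top_talkers(flows: Iterable[Flow], n: int = 3) -> List[Tuple[FiveTuple, int]]:
    """Return the n heaviest 5-tuples as (5-tuple, bytes), largest first."""
    return totals_by_tuple(flows).most_common(n)


def change_vs_baseline(current: Iterable[Flow], baseline: Iterable[Flow]) -> Dict[FiveTuple, int]:
    """Byte delta per 5-tuple between the current window and a baseline window."""
    now, before = totals_by_tuple(current), totals_by_tuple(baseline)
    # Counter returns 0 for missing keys, so new or vanished flows are handled.
    return {t: now[t] - before[t] for t in now.keys() | before.keys()}
```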

Achieve essential visibility across the Cross-Cloud Network

If you're running a Cross-Cloud Network, enabling VPC Flow Logs on your VLAN attachments and VPN tunnels provides the essential telemetry you need to manage, secure, and scale your interconnected networks. You can enable this feature on your new and existing VLAN attachments and VPN tunnels using CLI, API, Terraform, or directly from the Google Cloud console.

To learn more, check out the VPC Flow Logs documentation or get started with Flow Analyzer.

AWS and Google Cloud collaborate to simplify multicloud networking

Sun, 30 Nov 2025 19:00:00 +0000

As organizations increasingly adopt multicloud architectures, the need for interoperability between cloud service providers has never been greater. Historically, however, connecting these environments has been a challenge, forcing customers to take a complex "do-it-yourself" approach to managing global multi-layered networks at scale.

To address these challenges and advance a more open cloud environment, Amazon Web Services (AWS) and Google Cloud collaborated to simplify how cloud service providers connect with one another.

Today, AWS and Google Cloud are excited to announce a jointly engineered multicloud networking solution that uses both AWS Interconnect - multicloud and Google Cloud’s Cross-Cloud Interconnect. This collaboration also introduces a new open specification for network interoperability, enabling customers to establish private, high-speed connectivity between Google Cloud and AWS with high levels of automation and speed.

“Integrating Salesforce Data 360 with the broader IT landscape requires robust, private connectivity. AWS Interconnect - multicloud allows us to establish these critical bridges to Google Cloud with the same ease as deploying internal AWS resources, utilizing pre-built capacity pools and the tools our teams already know and love. This native, streamlined experience — from provisioning through ongoing support — accelerates our customers' ability to ground their AI and analytics in trusted data, regardless of where it resides.” - Jim Ostrognai, SVP Software Engineering, Salesforce

Previously, to connect cloud service providers, customers had to manually set up complex networking components, including physical connections and equipment. This approach required lengthy lead times and coordination with multiple internal and external teams, and could take weeks or even months. AWS envisioned this capability as a unified specification that any cloud service provider could adopt, and collaborated with Google Cloud to bring it to market.

Now, this new solution reimagines multicloud connectivity by moving away from physical infrastructure management toward a managed, cloud-native experience. By integrating AWS with Google Cloud’s Cross-Cloud Network architecture, we are abstracting the complexity of physical connectivity, network addressing, and routing policies. Customers no longer need to wait weeks for circuit provisioning: they can now provision dedicated bandwidth on demand and establish connectivity in minutes through their preferred cloud console or API.

Reliability and security are the cornerstones of this collaboration. We designed this solution together to deliver high resiliency through quad-redundancy across physically redundant interconnect facilities and routers. Both providers continuously monitor the service to proactively detect and resolve issues. And the solution is built on a foundation of trust, using MACsec encryption between the Google Cloud and AWS edge routers.

“This collaboration between AWS and Google Cloud represents a fundamental shift in multicloud connectivity. By defining and publishing a standard that removes the complexity of any physical components for customers, with high availability and security fused into that standard, customers no longer need to worry about any heavy lifting to create their desired connectivity. When they need multicloud connectivity, it's ready to activate in minutes with a simple point and click.” - Robert Kennedy, VP of Network Services, AWS

“We are excited about this collaboration which enables our customers to move their data and applications between clouds with simplified global connectivity and enhanced operational effectiveness. Today's announcement further delivers on Google Cloud’s Cross-Cloud Network solution focused on delivering an open and unified multicloud experience for customers.” - Rob Enns, VP/GM of Cloud Networking, Google Cloud

This collaboration between AWS and Google Cloud is more than a multicloud solution: it’s a step toward a more open cloud environment. The API specifications developed for this product are open for other providers and partners to adopt, as we aim to simplify global connectivity for everyone. We invite you to explore this new capability today. To learn more about how to streamline your multicloud operations please visit the in-depth Google Cloud Cross-Cloud Interconnect blog and the AWS Interconnect - multicloud website to get started.

Expanding Google Cloud’s Cross-Cloud Network with a groundbreaking AWS collaboration

Sun, 30 Nov 2025 19:00:00 +0000

Today, we announced a significant collaboration with Amazon Web Services (AWS) to offer a managed, private, secure, and on-demand solution for cross-cloud connectivity. This solution is designed to enable customers to easily build enterprise-grade applications that span both Google Cloud and AWS environments. This collaboration is particularly timely, as the adoption of multicloud applications is rapidly accelerating, driven in part by the rise of AI. A Forbes survey highlighted that 82% of respondents anticipate that the arrival of AI services will increase the demand for multicloud networking due to the scarcity of specialized accelerator resources and the availability of diverse AI agents across different vendors. The surge in multicloud adoption is a strategic imperative for organizations looking to build agentic AI applications, optimize workloads, access best-of-breed services, meet data residency requirements, and ensure the necessary resiliency for modern hybrid and multicloud applications.

To address the inherent network infrastructure challenges introduced by multicloud deployments, we designed the Cross-Cloud Network to simplify and optimize networking between Google Cloud and other providers. This commitment to multicloud integration has led to over 50% of the Fortune 500 using the Cross-Cloud Network today, and this collaboration provides a significant boost. Importantly, this new jointly engineered solution with AWS is being published under an open specification, which expands its reach by allowing other providers to contribute to and implement it in their own environments, further benefiting mutual customers.

Introducing the Cross-Cloud Interconnect for AWS

Today marks a major step in simplifying and securing the multicloud journey. We are thrilled to announce a first-of-its-kind open specification that fundamentally streamlines private network connections between customers' environments across different cloud providers. This groundbreaking joint specification has culminated in the preview of partner Cross-Cloud Interconnect for AWS, a powerful expansion of our Cloud Interconnect portfolio. This innovation allows you to build on-demand connections in minutes between your Google Cloud and AWS VPCs, transforming multicloud networking from a complex build into a simple, managed service.

This is more than just a connection — it's a complete shift in how you adopt multicloud solutions. We are delivering substantial value to our mutual customers:

Simplicity and speed: Say goodbye to complex networking builds. This is a fully managed, cloud-native experience where a cross-cloud connection is as easy as peering two VPCs. We're cutting end-to-end setup time from days to mere minutes, with flexible, on-demand bandwidth starting at 1 Gbps during preview and scaling up to 100 Gbps at general availability.
Secure by default: Your data's security is paramount. All connections between the two clouds' edge routers are MACsec-encrypted — providing line-rate performance with always-on encryption — for a more secure foundation.
Inherently resilient: Benefit from an inherently resilient architecture that provides layers of protection against facility, network, and software failures, ensuring your critical applications remain online.
Open and optimized: The foundation is an open specification for seamless adoption across the industry. You can also benefit from an optimized total cost of ownership through vendor consolidation and an on-demand service model that lets you provision exactly what you need, when you need it.

This service is launching with availability in key locations like N. Virginia, Oregon, London, and Frankfurt, with rapid expansion planned to more locations globally.

Simplifying cross-cloud connectivity

Before today's jointly engineered solution, building applications that spanned multiple cloud environments was a significant undertaking, often becoming a barrier to multicloud adoption. Customers faced a complex, multi-layered process that involved cross-functional teams and substantial lead times.

A typical deployment required several intricate steps:

Procurement: Acquiring physical connections, whether dedicated or through a shared partner offering, and then building and managing the necessary infrastructure to ensure network availability and separation of fate.
Logical configuration: Establishing basic connectivity by meticulously assigning and negotiating non-overlapping link-local IP addresses and setting up VLANs.
Routing setup: Configuring BGP sessions, assigning Autonomous System (AS) numbers, and creating complex routing policies to meet specific performance and reliability requirements.
Security implementation: Conducting thorough security reviews and implementing custom solutions to encrypt traffic between the distinct cloud environments.
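To make the bookkeeping in the logical configuration step concrete, the sketch below carves non-overlapping link-local subnets out of 169.254.0.0/16 with Python's standard `ipaddress` module and splits each one into a local and a remote BGP address. It assumes a /30 per BGP session and skips the first and last /24 that RFC 3927 reserves. The prefix lengths and ranges each cloud provider actually requires may differ, and the function names are hypothetical. This is the kind of manual coordination the managed offering is meant to remove.

```python
import ipaddress

# IPv4 link-local block (RFC 3927). The RFC reserves its first and last /24.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")
RFC3927_RESERVED = ("169.254.0.0/24", "169.254.255.0/24")


def allocate_link_local(count, prefixlen=30, in_use=()):
    """Return `count` /prefixlen subnets of LINK_LOCAL that overlap neither
    the RFC 3927 reserved ranges nor any network listed in `in_use`."""
    if count <= 0:
        return []
    blocked = [ipaddress.ip_network(n) for n in (*RFC3927_RESERVED, *in_use)]
    chosen = []
    for candidate in LINK_LOCAL.subnets(new_prefix=prefixlen):
        if any(candidate.overlaps(b) for b in blocked):
            continue
        chosen.append(candidate)
        if len(chosen) == count:
            return chosen
    raise RuntimeError("not enough free link-local space")


def bgp_peer_pair(subnet):
    """Split a small subnet into its two usable hosts: (local, remote)."""
    hosts = list(subnet.hosts())
    if len(hosts) != 2:
        raise ValueError(f"expected exactly two usable hosts in {subnet}")
    return hosts[0], hosts[1]
```

With nothing else in use, the first two allocations are 169.254.1.0/30 and 169.254.1.4/30, and the first one yields the peer pair 169.254.1.1 and 169.254.1.2. Both sides of every session must agree on these values, which is why the manual process needed negotiation between teams.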

The integrated partner Cross-Cloud Interconnect offering completely abstracts away this complexity. Customers can now bypass all the manual steps and instantly leverage pre-built physical connections with built-in security and resiliency, achieving streamlined, on-demand connectivity between their Google Cloud VPCs and AWS.

Building this powerful cross-cloud connection is now remarkably simple. Customers configure a single "transport" resource in Google Cloud and accept it in AWS. This transport is an innovative, managed construct that completely abstracts and provisions the underlying physical interconnects, VLAN attachments, and Cloud Router instances. This profound simplification enables end-to-end connectivity in minutes, transforming multicloud deployment from a days-long engineering project into a simple, rapid configuration task.
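The "configure in Google Cloud, accept in AWS" flow can be pictured as a small two-state handshake. The following Python model is purely hypothetical: `Transport`, `State`, and `accept()` are illustrative names, not the real Google Cloud or AWS resources, and the real lifecycle very likely has more states. It only shows where the customer's work ends and the managed provisioning begins.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    REQUESTED = "requested"  # configured on the Google Cloud side, awaiting AWS
    ACTIVE = "active"        # accepted on the AWS side, connectivity provisioned


# Components the text says the transport provisions on the customer's behalf.
MANAGED_COMPONENTS = ("physical interconnects", "VLAN attachments", "Cloud Router instances")


@dataclass
class Transport:
    """Hypothetical model of the transport handshake; not a real API type."""
    name: str
    bandwidth_gbps: int
    state: State = State.REQUESTED
    components: List[str] = field(default_factory=list)

    def accept(self) -> None:
        """Model the AWS-side acceptance that triggers managed provisioning."""
        if self.state is not State.REQUESTED:
            raise RuntimeError(f"cannot accept a transport in state {self.state.value}")
        # The managed service, not the customer, creates the underlying pieces.
        self.components = list(MANAGED_COMPONENTS)
        self.state = State.ACTIVE
```

In this model the customer touches exactly two steps, creating the `Transport` and calling `accept()`; everything in `MANAGED_COMPONENTS` appears without further input.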

Under the hood: a secure and resilient foundation

We co-designed our solution to deliver a secure and resilient foundation for cross-cloud applications, with a simple new service that doesn’t compromise on core enterprise availability tenets.

Privacy and security: All peering relationships are built between link-local addresses, facilitating connectivity between IPv4 and IPv6 private address spaces across both environments. All underlying physical connections between Google Cloud and AWS edge routers are MACsec-encrypted, and both providers manage key rotation to meet enterprise security requirements.
Quad-redundancy: To enable connectivity between a Google Cloud and an AWS cloud region, quad-redundant connections are leveraged, ensuring facility redundancy, as well as edge-router redundancy. This design helps protect from multiple simultaneous failure scenarios and provides high resiliency levels for joint customers.
Managed operations: Operational management is key to enabling integrated solutions. Beyond streamlining the physical and logical builds on behalf of joint customers, the solution relies on a robust, proactive monitoring system that detects and reacts to failures before customers feel their impact. The system also relies on coordinated maintenance to avoid overlaps that could affect end-to-end service availability, and streamlines support operations to address potential issues on behalf of customers.
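A back-of-the-envelope calculation shows why quad-redundancy matters. If each of k paths is independently unavailable with probability 1 - a, all k are down at once with probability (1 - a)^k, so combined availability is 1 - (1 - a)^k. The Python sketch below computes this. The per-path availability you plug in is an assumption, and real failures are often correlated (a shared facility, a shared software bug), so the independence assumption gives an optimistic upper bound. Separating facilities and edge routers, as described above, is a way of reducing exactly that correlation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def combined_availability(path_availability: float, paths: int) -> float:
    """Availability of `paths` redundant paths, assuming independent failures:
    the service is down only if every path is down at the same time."""
    if not 0.0 <= path_availability <= 1.0:
        raise ValueError("path availability must be between 0 and 1")
    if paths < 1:
        raise ValueError("need at least one path")
    return 1.0 - (1.0 - path_availability) ** paths


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes per year during which the service is unavailable."""
    return (1.0 - availability) * MINUTES_PER_YEAR
```

With an illustrative per-path availability of 99%, four independent paths leave roughly a one-in-100-million chance that all of them are down simultaneously.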

A variety of multicloud workloads

These new streamlined network connections between Google Cloud and AWS enable application teams to automate network builds for a variety of interesting applications. Consider the following scenarios:

Infrastructure and AI deployments supporting active-active or active-standby disaster recovery strategies. With basic connectivity between two peer services — e.g., agentic AI applications or database replicas — applications can synchronize state across the cloud boundary as if they are co-located, supporting maximum application resilience and operational consistency.
AWS customers issuing inbound requests into Google Cloud, allowing a service running in AWS to securely and privately access a Google Cloud API. Examples include custom applications running on Compute Engine or a critical data warehouse hosted in BigQuery, reached over a path that bypasses the public internet for enhanced security and performance.
Google Cloud customers issuing outbound requests toward AWS, where a data pipeline orchestrated in Google Cloud can privately pull large datasets from an AWS datastore such as S3 or an RDS instance.

Build applications across Google Cloud and AWS today

Regardless of your use case, if your organization would benefit from simple, secure, and robust on-demand connectivity between your Google Cloud and AWS environments, we invite you to start building your applications across clouds and let us manage your network connectivity infrastructure for you.

This collaboration is not restricted to Google Cloud and AWS. We invite other cloud and service providers to offer their customers this streamlined private peering capability with Google Cloud. To learn more, check out the open specification, and contact us at cross-cloud@google.com. We are truly excited to grow this ecosystem for the benefit of our joint customers.