Architecture overview
This architecture runs agent workloads across AWS, Azure, and GCP simultaneously, with a shared orchestration plane sitting above all three clouds. Rather than operating three siloed stacks, agents route to whichever cloud has available capacity or the right model for the task.
AWS layer
Kinesis enhanced fan-out decouples ingestion from inference at high throughput. Lambda with provisioned concurrency handles sub-second synchronous tool calls with cold starts eliminated. ECS Fargate runs stateful, long-context agents. Step Functions Express Workflows orchestrate multi-step pipelines at up to 100k executions per second. Bedrock routes between Claude, Titan, Cohere, and Llama based on latency SLA and task type, so expensive models are invoked only when needed.
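The latency-aware routing idea can be sketched as a small decision function. This is an illustrative model of the policy, not Bedrock's router; the model names (`FAST_MODEL`, `LARGE_MODEL`) and task types are assumptions for the example.

```python
# Hypothetical latency-aware model router: tight-SLA calls always take the
# fast tier, and the expensive model is reserved for task types that need it.
FAST_MODEL = "fast-tier-model"    # low latency, low cost (illustrative name)
LARGE_MODEL = "large-tier-model"  # higher quality, higher cost (illustrative)

def choose_model(task_type: str, latency_budget_ms: int) -> str:
    """Pick a model by task type and latency SLA."""
    if latency_budget_ms < 500:
        # Hard real-time tool calls never wait on the large model.
        return FAST_MODEL
    if task_type in {"planning", "long-context-summarization"}:
        # Only these task types justify the expensive tier.
        return LARGE_MODEL
    return FAST_MODEL

print(choose_model("tool-call", 200))   # tight SLA: fast tier
print(choose_model("planning", 5000))   # relaxed SLA + heavy task: large tier
```

In a real deployment the returned identifier would be an actual Bedrock model ID passed to the inference call; the routing logic itself stays this simple.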
Azure layer
Event Hubs' Kafka-compatible interface lets agents reuse existing Kafka producers without a rewrite. KEDA scales AKS pods directly from queue depth rather than CPU, which is essential for bursty agent workloads that sit idle most of the time. Durable Functions orchestrate fan-out / fan-in patterns with native human-in-the-loop checkpoints, with no custom state machines needed. Azure OpenAI provisioned throughput units (PTUs) replace the per-minute token quota ceiling with reserved capacity, making sustained high-throughput inference predictable and cost-stable.
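Why queue-depth scaling matters for bursty workloads can be seen in a simplified model of the KEDA-style calculation: desired replicas are derived from backlog, clamped to a min/max, and drop to zero when idle. This is an illustrative sketch of the concept, not KEDA's actual implementation; the parameter names are assumptions.

```python
import math

def desired_replicas(queue_depth: int, target_per_pod: int,
                     min_replicas: int = 0, max_replicas: int = 50) -> int:
    """Replicas needed so each pod handles ~target_per_pod queued tasks."""
    if queue_depth <= 0:
        return min_replicas  # scale to zero when the queue is empty
    wanted = math.ceil(queue_depth / target_per_pod)
    return max(min_replicas, min(wanted, max_replicas))

print(desired_replicas(0, 100))     # idle burst workload: 0 pods
print(desired_replicas(950, 100))   # backlog of 950: 10 pods
```

CPU-based scaling would keep pods warm (and billed) through idle periods; backlog-based scaling pays only when tasks exist.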
GCP layer
Pub/Sub ordering keys guarantee per-entity ordering of agent tasks while scaling independently across keys. Vertex Reasoning Engine provides a fully managed agent runtime, so you do not need to operate orchestration infrastructure yourself. Gemini Flash is the right model for high-frequency, sub-200 ms tool calls. Dataflow (Apache Beam) handles real-time stream enrichment before tasks reach the agent layer. Spanner gives globally consistent agent state across regions without the complexity of custom replication logic.
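The ordering-key guarantee can be modeled as one FIFO per key: messages sharing a key are delivered in publish order, while different keys proceed independently. This is a toy model of the semantics, not the Pub/Sub client library.

```python
from collections import defaultdict, deque

class OrderedBroker:
    """Toy model of per-key ordered delivery: one FIFO per ordering key."""

    def __init__(self) -> None:
        self._queues: dict[str, deque] = defaultdict(deque)

    def publish(self, ordering_key: str, payload: str) -> None:
        self._queues[ordering_key].append(payload)

    def pull(self, ordering_key: str) -> str:
        # Per-key delivery order matches per-key publish order.
        return self._queues[ordering_key].popleft()

broker = OrderedBroker()
broker.publish("agent-42", "step-1")
broker.publish("agent-42", "step-2")
broker.publish("agent-7", "step-1")   # a different entity, independent order
print(broker.pull("agent-42"))        # "step-1": per-entity order preserved
```

The practical payoff: one agent's multi-step task sequence can never be processed out of order, yet throughput across distinct agents is not serialized.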
Cross-cloud fabric
Private dedicated circuits via Megaport or Equinix keep agent-to-agent and inter-service traffic off the public internet. OIDC workload identity federation means no static API keys ever cross cloud boundaries. An Istio service mesh with SPIFFE/SVID identities provides mTLS on every connection. Kafka MirrorMaker 2 in active/active replication keeps the orchestration plane cloud-agnostic. All telemetry is unified via OpenTelemetry before landing in Grafana Cloud, giving a single pane of glass for SLO tracking regardless of which cloud handles a request.
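The "single pane of glass" claim rests on every span carrying a uniform resource attribute identifying its cloud, so one dashboard can slice SLOs across providers. A minimal sketch of that idea, using plain dicts rather than the opentelemetry-sdk (the `cloud.provider` key follows OpenTelemetry semantic conventions; the span shape here is illustrative):

```python
def make_span(name: str, cloud: str, latency_ms: float) -> dict:
    """Build a span-like record stamped with a unified cloud attribute."""
    return {
        "name": name,
        "attributes": {
            "cloud.provider": cloud,   # same key regardless of source cloud
            "duration_ms": latency_ms,
        },
    }

spans = [
    make_span("tool_call", "aws", 120.0),
    make_span("tool_call", "gcp", 95.0),
]

# One query surface regardless of which cloud served the request:
slow = [s for s in spans if s["attributes"]["duration_ms"] > 100]
print(len(slow))  # 1
```

Because the attribute key is identical everywhere, SLO queries in Grafana never need per-cloud special cases.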
High-throughput design principles
The architecture is async-first at every tier. Synchronous paths are reserved only for tool calls with hard latency SLAs. Inference is always routed by task type. Agent session memory lives in a Redis-tier cache and never hits the origin store on hot paths. All three clouds provide horizontal autoscaling at the compute layer. The multi-cloud fabric gives a genuine cross-cloud autoscaling surface, not three independent siloed stacks.
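The hot-path rule for session memory is the classic cache-aside pattern: reads hit the cache first and fall through to the origin store only on a miss, which then populates the cache. A minimal sketch, with an in-process dict standing in for Redis and `origin_db` as a hypothetical store of record:

```python
origin_db = {"sess-1": {"history": ["hello"]}}  # slow store of record (stub)
cache: dict = {}                                 # stands in for Redis

def get_session(session_id: str) -> dict:
    """Cache-aside read: hot path never touches the origin store."""
    if session_id in cache:
        return cache[session_id]        # hot path: cache only
    session = origin_db[session_id]      # miss: fall through to origin
    cache[session_id] = session          # populate for subsequent reads
    return session

get_session("sess-1")       # first read: origin lookup + cache populate
print("sess-1" in cache)    # True: later reads skip the origin entirely
```

A production version would add a TTL and a write path that invalidates or updates the cache, but the read-path shape is the same.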