Agentic AI — multi-cloud high-throughput architecture
Architecture overview

Client & ingress: web / mobile API consumers · event producers · IoT / edge devices
Global API gateway + WAF + rate limiter: Cloudflare / AWS API Gateway / Azure API Management / GCP Apigee

Agent orchestration plane (shared):
- Orchestrator: LangGraph / Bedrock Agents
- Planner agent: task decomposition
- Executor agents: tool / API / RAG calls
- Memory + context: Redis / Cosmos / Spanner

Cloud layers:
- AWS: Bedrock + Claude / Titan (LLM inference) · Lambda + SQS + Kinesis (event-driven compute) · OpenSearch + Aurora (RAG vector + OLTP) · Step Functions · ECS Fargate
- Azure: Azure OpenAI + Phi (LLM inference) · Event Hubs + Functions (event-driven compute) · AI Search + Cosmos DB (RAG vector + NoSQL) · Durable Functions · AKS
- GCP: Vertex AI + Gemini (LLM inference) · Pub/Sub + Cloud Run (event-driven compute) · Vertex Search + Spanner (RAG vector + OLTP) · Workflows · GKE Autopilot

Cross-cloud fabric: HashiCorp Vault · Kafka MirrorMaker · Istio mesh · mTLS · OIDC federation
Security + observability plane: SIEM · OpenTelemetry · Prometheus + Grafana · cost sentinel · PII redaction
High-throughput design principles: Async-first event backbone, horizontal autoscaling on all compute tiers, model routing by latency SLA, shared agent memory via globally distributed cache.
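The "model routing by latency SLA" principle can be sketched as a simple routing table. The model names echo those used across the three cloud layers; the p99 latency figures are illustrative assumptions for the sketch, not measured benchmarks.

```python
# Hypothetical expected p99 latencies per model (ms). Values are
# illustrative assumptions only — calibrate from real telemetry.
EXPECTED_P99_MS = {
    "claude-3.5-sonnet": 900,   # highest quality, slowest
    "gemini-1.5-pro": 800,
    "gpt-4o": 700,
    "titan-express": 350,
    "gemini-flash": 180,        # low-latency tool-call tier
}

def route_model(sla_ms: int, candidates: dict = EXPECTED_P99_MS) -> str:
    """Pick the slowest (assumed highest-quality) model that still fits
    inside the caller's latency SLA; fall back to the fastest model
    overall when nothing fits."""
    within = {m: p99 for m, p99 in candidates.items() if p99 <= sla_ms}
    if within:
        return max(within, key=within.get)      # slowest model that fits
    return min(candidates, key=candidates.get)  # fastest model overall
```

In a real deployment this lookup would sit in front of the Bedrock / Azure OpenAI / Vertex AI invocation call, with latencies refreshed from live p99 metrics rather than a static table.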
AWS agentic layer — high-throughput detail

Ingress: API Gateway (HTTP API) + CloudFront + WAF + Shield Advanced — 10M+ RPS · edge caching · DDoS protection · JWT authoriser
Streaming: Kinesis Data Streams (enhanced fan-out) — ordered, durable · 1 MB/s ingest per shard
Compute:
- Lambda (provisioned concurrency) — sync agent tools · ~128 ms cold start without provisioning
- ECS Fargate — long-running agents · Spot + On-Demand mix
- Step Functions Express — workflow orchestration · 100k exec/sec
Inference: Amazon Bedrock model router — Claude 3.5 Sonnet · Titan Embeddings · Cohere · Mistral · Llama 3 · on-demand + provisioned throughput · model routing by latency SLA · batch inference for async workloads
Data:
- OpenSearch Serverless — vector RAG store · k-NN + HNSW index
- ElastiCache (Redis) — agent session memory · sub-ms reads
- Aurora Serverless v2 — OLTP + audit log · auto-scales to 256 ACUs
Security + observability: GuardDuty · Macie · CloudTrail · X-Ray · CloudWatch Evidently · Cost Explorer · SCPs
Messaging: SQS FIFO + DLQ for all async agent tasks · retry with exponential backoff · poison-pill isolation · SNS fan-out for multi-agent broadcast
Throughput targets: 50k concurrent agent tasks · p99 < 800 ms · 99.99% SLA
Scaling levers: Kinesis shard split · Fargate task autoscale · Bedrock provisioned throughput
AWS high-throughput pattern: Kinesis enhanced fanout decouples ingestion from inference. Lambda handles sub-second synchronous tool calls (provisioned concurrency eliminates cold starts). ECS Fargate runs stateful, long-context agents. Step Functions Express Workflows orchestrate multi-step pipelines at 100k executions/sec. Bedrock model router selects Claude vs Titan vs Cohere based on latency SLA and task type.
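The retry-with-backoff and poison-pill-isolation behaviour described above can be sketched cloud-agnostically. The queue, handler, and dead-letter list below are stand-ins for SQS FIFO and its DLQ, not the SQS API itself; the attempt cap and backoff parameters are illustrative assumptions.

```python
import random

MAX_ATTEMPTS = 5  # hypothetical redrive policy: maxReceiveCount analogue

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff delay in seconds (attempt is 1-based)."""
    return random.uniform(0.0, min(cap, base * 2 ** (attempt - 1)))

def process_with_isolation(task, handler, dead_letter: list):
    """Retry handler(task) up to MAX_ATTEMPTS; after the final failure,
    isolate the task in the dead-letter list (poison-pill isolation)
    instead of blocking the FIFO lane behind it."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return handler(task)
        except Exception:
            if attempt == MAX_ATTEMPTS:
                dead_letter.append(task)  # poison pill: park it, move on
                return None
            # A real worker would sleep here before the next receive:
            # time.sleep(backoff_delay(attempt))
```

With SQS FIFO the redrive to a DLQ is configured on the queue itself; the sketch only shows the equivalent control flow inside a single worker.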
Azure agentic layer — high-throughput detail

Ingress: Azure API Management + Front Door + DDoS Protection — global load balancing · TLS offload · OAuth2 / managed identity · subscriptions & quotas
Streaming: Azure Event Hubs (Kafka-compatible) — 10M events/sec · 32 partitions · capture to ADLS
Compute:
- Azure Functions (Premium) — event-triggered agent tools · pre-warmed instances
- AKS (KEDA autoscale) — long-running agent pods · event-driven pod scaling
- Durable Functions — fan-out / fan-in · human-in-the-loop waits
Inference: Azure OpenAI Service model gateway — GPT-4o · GPT-4 Turbo · text-embedding-3-large · PTU (provisioned throughput units) · APIM semantic caching · content filtering · model versioning · private endpoint
Data:
- Azure AI Search — hybrid vector + BM25 · semantic re-ranker
- Cosmos DB (multi-region write) — agent state / tool results · 99.999% SLA
- Azure Cache for Redis — session + semantic cache · active geo-replication
Security + observability: Defender for Cloud · Sentinel SIEM · Monitor + App Insights · Managed Identity · PIM · Purview DLP
Messaging: Service Bus Premium for guaranteed delivery · dead-letter queues · message sessions for ordered agent tasks · Schema Registry
Throughput targets: 40k concurrent · PTUs lift the token quota ceiling · p99 < 1 s
Scaling levers: KEDA event-driven pod autoscale · Event Hubs partition scale · Cosmos DB RU autoscale
Azure high-throughput pattern: Event Hubs' Kafka-compatible interface lets agents reuse existing Kafka tooling without a rewrite. Durable Functions orchestrate fan-out/fan-in patterns with native human-in-the-loop checkpoints. KEDA scales AKS pods directly from queue depth. Azure OpenAI PTUs lift the shared per-minute token quota ceiling for sustained high-throughput inference.
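Reusing Kafka tooling against Event Hubs mostly comes down to client configuration: Event Hubs exposes a Kafka endpoint on port 9093 with SASL PLAIN authentication, where the username is the literal string "$ConnectionString" and the password is the namespace connection string. A minimal sketch of that config (confluent-kafka style keys); the namespace name and connection string here are placeholders.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build a Kafka client config that points existing Kafka tooling at
    an Event Hubs namespace. Event Hubs speaks the Kafka protocol on
    port 9093; auth is SASL PLAIN with the literal user "$ConnectionString"."""
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "$ConnectionString",  # literal string, per Event Hubs docs
        "sasl.password": connection_string,
    }
```

The resulting dict can be handed to an existing confluent-kafka `Producer` or `Consumer` unchanged, which is what makes the "no rewrite" claim work in practice.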
GCP agentic layer — high-throughput detail

Ingress: Apigee API Gateway + Cloud Armor + reCAPTCHA Enterprise — API key / OAuth2 / Workload Identity · adaptive DDoS · geo-blocking · quota plans
Streaming: Cloud Pub/Sub (exactly-once delivery) — global · scales elastically with demand · push + pull delivery · ordering keys
Compute:
- Cloud Run (min-instances) — stateless agent executors · no cold starts with min=1
- GKE Autopilot — stateful agent pods · managed node provisioning
- Cloud Workflows + Cloud Tasks — DAG orchestration · HTTP step callbacks
Inference: Vertex AI model platform — Gemini 1.5 Pro · Gemini Flash · text-embedding-004 · Model Garden (Llama, Mistral) · Reasoning Engine (managed agent runtime) · grounding with Google Search · function calling
Data:
- Vertex AI Search — RAG + semantic search · grounding citations
- Spanner (globally distributed) — agent state + audit · strong consistency
- Memorystore (Redis) — agent memory cache · HA + read replicas
Security + observability: Security Command Center · Chronicle SIEM · Cloud Trace · Cloud Profiler · VPC Service Controls · DLP API
Streaming analytics: Dataflow (Apache Beam) for streaming enrichment · BigQuery for agent analytics · Eventarc for event-driven triggers
Throughput targets: Pub/Sub scales with demand · Gemini Flash for <200 ms tool calls · Spanner 10k TPS
Scaling levers: Cloud Run concurrency · GKE node auto-provisioning · Vertex AI batch predictions
GCP high-throughput pattern: Pub/Sub ordering keys guarantee per-entity agent task ordering while overall throughput scales elastically. Vertex AI Reasoning Engine provides a fully managed agent runtime, removing infrastructure overhead. Gemini Flash serves as the low-latency model for high-frequency tool calls. Dataflow enriches streams in real time before tasks reach the agent layer. Spanner's external consistency keeps agent state correct across regions.
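With google-cloud-pubsub, per-entity ordering means publishing with `ordering_key=entity_id` on a publisher created with message ordering enabled. The sketch below models only the key-assignment behaviour locally (no GCP client needed): tasks sharing a key stay in publish order, distinct keys fan out in parallel. The `entity_id` field is a hypothetical task attribute chosen for illustration.

```python
from collections import defaultdict

def partition_by_ordering_key(tasks):
    """Group agent tasks into per-key lanes, mirroring how Pub/Sub
    ordering keys behave: within a lane (one entity_id), order is the
    publish order; across lanes, delivery is parallel and unordered."""
    lanes = defaultdict(list)
    for task in tasks:
        # In real code: publisher.publish(topic, data, ordering_key=task["entity_id"])
        # on a PublisherClient with enable_message_ordering=True.
        lanes[task["entity_id"]].append(task)
    return dict(lanes)
```

The design point: choosing the ordering key per entity (rather than one global key) preserves correctness for each agent's state machine without serialising the whole stream.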
Cross-cloud fabric — connectivity, security and data plane

Regions: AWS VPC (us-east-1 / ap-southeast-2) · Azure VNet (australiaeast / eastus2) · GCP VPC (australia-southeast1 / us-central1)
Network fabric: Megaport / Equinix Fabric — private dedicated circuits (no public internet) · BGP routing · SD-WAN overlay · AWS Direct Connect + Azure ExpressRoute + Google Cloud Interconnect
Identity federation: OIDC cross-cloud workload identity (no long-lived keys) · Okta / Entra ID as IdP
Service mesh (Istio): mTLS everywhere · SPIFFE SVID identity · traffic policy + circuit breaker
Secrets management: HashiCorp Vault Enterprise — dynamic secrets · auto-rotate every 24 h
Event replication bus: Kafka MirrorMaker 2 — active/active replication across all clouds · Confluent Cloud as neutral broker option · topic-level failover · schema registry sync · consumer group offset translation
Unified observability: OpenTelemetry SDK (all agents) · Grafana Cloud unified dashboards · Prometheus federation · distributed tracing across cloud boundaries · cost sentinel (Apptio / CloudHealth) · SLO burn alerts
Resilience: active/active multi-cloud · chaos engineering (Gremlin) · cross-cloud circuit breaker · RTO < 2 min · RPO = 0 (sync replication)
Cross-cloud fabric design: Private circuits (Direct Connect + ExpressRoute + Cloud Interconnect) avoid the public internet for all agent-to-agent and inter-service traffic. OIDC workload identity federation means no static API keys cross cloud boundaries. Kafka MirrorMaker 2 provides active/active event replication so the agent orchestrator is cloud-agnostic. All telemetry is unified via OpenTelemetry before shipping to Grafana Cloud, giving a single pane of glass for SLO tracking regardless of which cloud is serving a request.
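The SLO burn alerts mentioned in the observability plane are typically multiwindow burn-rate alerts: page only when both a short and a long window burn the error budget faster than a threshold. A minimal sketch, assuming the 99.99% SLA from the AWS targets and the common 14.4x fast-burn threshold (burning ~2% of a 30-day budget in one hour); these parameters are illustrative, not prescribed by the architecture.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is burning: the observed error ratio
    divided by the budget ratio the SLO allows (0.01% for 99.99%)."""
    budget = 1.0 - slo_target
    return error_ratio / budget

def should_page(short_window_errors: float, long_window_errors: float,
                slo_target: float = 0.9999, threshold: float = 14.4) -> bool:
    """Multiwindow burn-rate alert: both windows must exceed the
    threshold, so a brief spike (short only) or a slow steady burn
    (long only) does not page on its own."""
    return (burn_rate(short_window_errors, slo_target) >= threshold
            and burn_rate(long_window_errors, slo_target) >= threshold)
```

In practice the two error ratios would come from the Prometheus federation layer (e.g. 5-minute and 1-hour windows over the cross-cloud request metrics).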