©MF — Muhammad Fouzan
Initializing architecture…
000
◆ Case 03 / 06Platform Architecture · 2022 · 11mo engagement

High-Load
Microservices
Hub.

ClientLogistics platform / EU-wide
RolePlatform Architect
StackGo · gRPC · Kubernetes · NATS
OutcomeFrom 240ms p99 → 38ms p99
Fig.01 — Inference topology (v2)
● LIVE
Go/TypeScript/Python/Rust/Kubernetes/Kafka/Postgres/Ray/Redis/gRPC/Terraform/AWS/GCP/DDD/Event-Sourcing/Temporal/OpenTelemetry/ClickHouse/pgvector/Go/TypeScript/Python/Rust/Kubernetes/Kafka/Postgres/Ray/Redis/gRPC/Terraform/AWS/GCP/DDD/Event-Sourcing/Temporal/OpenTelemetry/ClickHouse/pgvector/

Forty-plus services, no shared contracts, and a sync-call graph deep enough that one slow dependency pinned the whole product.

Tracing was partial. The retry storm was not a theoretical risk.

Shared gRPC contracts enforced at the CI level, NATS for everything that could be async, and a single OpenTelemetry pipeline everything had to emit through. Budgets were written into SLOs, not wikis.

The rule became: if you can't explain where a 300ms went, the service doesn't ship.

0ms p99
Gateway Latency
0%
Incident Reduction
0%
Contract Coverage

p99 dropped an order of magnitude. The next incident that would've been a full-platform outage became one team's Slack thread.

The shape of the system didn't change much. Its discipline did.

Next Case — 04/06
Custom Core ERP Systems