Architecture

Paper Notes: HAKES — High Availability Key-value Engine for Search

HAKES is a vector search system designed to scale to billions of vectors while maintaining sub-millisecond tail latencies. Unlike monolithic vector databases, HAKES adopts a disaggregated architecture that separates the storage, indexing, and search layers to achieve high availability and seamless horizontal scaling.
TL;DR: HAKES addresses the limitations of traditional vector databases, specifically resource contention (“heat”) and the high cost of graph-based indices. By employing a two-stage Filter-and-Refine architecture based on IVF + PQ, it offloads persistent data to Cloud Storage and uses a unified cluster management approach to handle planet-scale workloads. ...
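To make the two-stage Filter-and-Refine idea concrete, here is a minimal sketch in plain Python. It is not the HAKES implementation: the class name `TinyIndex`, the parameters `nprobe` and `k_filter`, and the crude scalar rounding used in place of real PQ codes are all assumptions made for illustration. The filter stage scans only the nearest IVF partitions using compressed vectors, and the refine stage re-ranks the shortlist with exact distances.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vec, step=0.5):
    """Crude scalar rounding: an illustrative stand-in for PQ codes, not real PQ."""
    return tuple(round(x / step) * step for x in vec)

class TinyIndex:
    """Hypothetical toy index: IVF partitions hold compressed vectors only."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one inverted list per centroid
        self.full = {}                        # id -> full-precision vector

    def add(self, vid, vec):
        # Assign the vector to the partition of its nearest centroid.
        part = min(range(len(self.centroids)),
                   key=lambda i: sq_dist(vec, self.centroids[i]))
        self.lists[part].append((vid, quantize(vec)))
        self.full[vid] = vec

    def search(self, query, k=1, nprobe=1, k_filter=4):
        # Filter: probe the nprobe nearest partitions, score compressed vectors.
        probe = heapq.nsmallest(nprobe, range(len(self.centroids)),
                                key=lambda i: sq_dist(query, self.centroids[i]))
        candidates = [(sq_dist(query, code), vid)
                      for p in probe for vid, code in self.lists[p]]
        shortlist = heapq.nsmallest(k_filter, candidates)
        # Refine: re-rank the shortlist with exact distances on full vectors.
        refined = sorted((sq_dist(query, self.full[vid]), vid)
                         for _, vid in shortlist)
        return [vid for _, vid in refined[:k]]

index = TinyIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
index.add("a", (0.1, 0.1))
index.add("b", (0.24, 0.24))
index.add("c", (9.0, 9.0))

# "a" and "b" compress to the same code, so only the refine stage separates them.
nearest = index.search((0.2, 0.2), k=1)
```

In this toy setup the filter stage cannot distinguish `a` from `b`, which is the point of the second stage: the cheap pass only needs to produce a good shortlist, and the exact pass touches only that shortlist.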

Beyond Logical State: The Case for Physical-Aware Orchestration

Solving the Coordination Problem
A decade ago, distributed systems faced a fundamental crisis: coordination. Managing the lifecycle of a distributed database (keeping replicas in sync, handling leader elections, and recovering from node failures) was a bespoke nightmare for every new product. LinkedIn Helix solved this by introducing a standardized state-machine model. It moved the industry from “manual scripts and prayers” to a world where a central controller manages transitions (e.g., OFFLINE → SLAVE → MASTER). If a node died, Helix knew exactly how to move the remaining nodes to a “Target State”. It turned cluster management into a deterministic logic problem. ...
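The state-machine idea can be sketched in a few lines of Python. This is not Helix's API: the `TRANSITIONS` table, the function names `path` and `plan`, and the node names are hypothetical, and the set of legal transitions is an assumption based on the OFFLINE → SLAVE → MASTER example above. Given each node's current state and a target state, the controller computes the ordered list of legal transitions to issue.

```python
from collections import deque

# Assumed legal transitions for a master/slave state model (illustrative only).
TRANSITIONS = {
    "OFFLINE": ["SLAVE"],
    "SLAVE": ["MASTER", "OFFLINE"],
    "MASTER": ["SLAVE"],
}

def path(src, dst):
    """Shortest sequence of legal states from src to dst (breadth-first search)."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        hops = queue.popleft()
        if hops[-1] == dst:
            return hops
        for nxt in TRANSITIONS[hops[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(hops + [nxt])
    raise ValueError(f"no legal path from {src} to {dst}")

def plan(current, target):
    """Return (node, from_state, to_state) steps that move current to target."""
    steps = []
    for node in sorted(target):
        hops = path(current.get(node, "OFFLINE"), target[node])
        steps += [(node, a, b) for a, b in zip(hops, hops[1:])]
    return steps

# Hypothetical scenario: the old master died, so it no longer appears in `current`.
current = {"node-b": "SLAVE", "node-c": "OFFLINE"}
target = {"node-b": "MASTER", "node-c": "SLAVE"}
steps = plan(current, target)
```

Because the plan is derived purely from the current state, the target state, and the transition table, the same failure always yields the same recovery steps, which is what makes the problem deterministic.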

The Fallacy of the Ring: Why Scale Requires Tablet-Based Placement

In modern distributed systems, cluster management is no longer a mathematical puzzle—it is a battle against physical constraints. While the Consistent Hash Ring offers a probabilistic shield against hotspots, it fails as we push hardware toward its theoretical limits. The “elegant math” of the 1990s now collides with the cold reality of system physics: data has gravity, metadata has a cost, and rebalancing creates heat. The industry is currently in the midst of a fundamental course-correction. We are retreating from the probabilistic decentralization of the Ring and moving toward the deterministic, explicit control of the Tablet Model. ...
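The contrast between the two placement models can be sketched as follows. Both classes are hypothetical illustrations (the names `Ring`, `TabletMap`, `vnodes`, and `move` are assumptions, not taken from any specific system). In the ring, ownership falls out of hash positions; in the tablet map, ownership is an explicit table entry that an operator or controller can change for one key range at a time.

```python
import bisect
import hashlib

def stable_hash(text):
    """Deterministic hash (MD5 used only for placement, not for security)."""
    return int(hashlib.md5(text.encode()).hexdigest(), 16)

class Ring:
    """Consistent hash ring: placement is implied by hash positions."""

    def __init__(self, nodes, vnodes=8):
        self.points = sorted((stable_hash(f"{n}#{i}"), n)
                             for n in nodes for i in range(vnodes))
        self.hashes = [h for h, _ in self.points]

    def owner(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect_right(self.hashes, stable_hash(key)) % len(self.points)
        return self.points[i][1]

class TabletMap:
    """Tablet model: each key range is explicitly assigned to a node."""

    def __init__(self, tablets):
        # tablets: (start_key, node) pairs sorted by start_key; first start is "".
        self.starts = [start for start, _ in tablets]
        self.nodes = [node for _, node in tablets]

    def owner(self, key):
        # The tablet whose start_key is the largest one not exceeding key.
        return self.nodes[bisect.bisect_right(self.starts, key) - 1]

    def move(self, start_key, node):
        # Rebalancing is a metadata update for exactly one tablet.
        self.nodes[self.starts.index(start_key)] = node

ring = Ring(["n1", "n2", "n3"])
ring_owner = ring.owner("apple")

tablets = TabletMap([("", "n1"), ("m", "n2")])
before = (tablets.owner("apple"), tablets.owner("zebra"))
tablets.move("m", "n3")
after = (tablets.owner("apple"), tablets.owner("zebra"))
```

Moving the tablet that starts at `"m"` changes the owner of `"zebra"` and nothing else, whereas on the ring the set of keys affected by a membership change is determined by where the hashes happen to land.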