I read a lot of papers. Most are noise; these are the ones that actually changed how I think about systems, scale, and software.
Moving forward, I plan to gradually publish my personal notes and “deconstructions” of these artifacts. This serves three purposes:
- Recap: A durable record for tracking and reviewing my own learning.
- Community: A way for other engineers to learn from my “filter.”
- Visibility: Sharing my perspective with the broader professional network.
🛰️ In-Flight
Current focus and active research.
- High Performance I/O For Large Scale Deep Learning
- Towards a Middleware for Large Language Models
- Slicer: Auto-Sharding for Datacenter Applications
- Lost in the Middle: How Language Models Use Long Contexts
- Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Load Balancing
💾 Indexed
High-signal resources moved to permanent storage.
- Chain Replication for Supporting High Throughput and Availability
- The slab allocator: an object-caching kernel memory allocator
- Large-scale cluster management at Google with Borg
- MapReduce: Simplified Data Processing on Large Clusters
- Apache Hadoop YARN: Yet Another Resource Negotiator
- Kafka: a Distributed Messaging System for Log Processing
- Untangling Cluster Management with Helix
- Data Infrastructure at LinkedIn
- All aboard the Databus!: LinkedIn’s scalable consistent change data capture platform
- On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform
- The Google File System
- Spanner: Google’s Globally-Distributed Database
- Bigtable: A Distributed Storage System for Structured Data
- Spanner: Becoming a SQL System
- Megastore: Providing Scalable, Highly Available Storage for Interactive Services
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
- Dynamo: Amazon’s Highly Available Key-value Store
- Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore