Engineering Knowledge Log

Technical notes, architecture decisions, and lessons from building production AI systems. Organized by domain.

fastapi · Nov 20, 2026

Why I Chose FastAPI Over Flask

A pragmatic comparison of async-first API frameworks for AI-heavy workloads and strict type safety requirements.

systems · Sep 5, 2026

Vector Databases: Choosing the Right One

Evaluating Pinecone, Weaviate, and Chroma for different embedding retrieval workloads and scaling patterns.

rag · Apr 14, 2026

Beyond RAG: What Karpathy's LLM Wiki Actually Changes

RAG rediscovers knowledge on every query. Karpathy's LLM Wiki compiles it once and lets it compound — here's what that shift means if you actually build these systems.

systems · Mar 21, 2026

Background Workers: Processing Jobs Without Blocking Users

How background workers separate async processing from your request-response cycle, when to use them, and the operational mistakes teams make when they run workers on the same server as their API.

systems · Mar 21, 2026

CDNs: Why Your Users Shouldn't Have to Talk to Your Origin

What a CDN is, how it reduces latency for global users, and when adding one actually helps versus when it's just another thing to operate.

systems · Mar 21, 2026

Circuit Breakers: Stopping Cascading Failures Before They Spread

What a circuit breaker does, how it protects your system from a single slow dependency taking everything down, and when it's worth the added complexity.

systems · Mar 21, 2026

GeoDNS: Routing Users to the Nearest Region

How GeoDNS resolves different IP addresses based on where a user is, why it's the first step in any multi-region architecture, and how it differs from a CDN.

systems · Mar 21, 2026

Load Balancers: Distributing Traffic Without Bottlenecks

What a load balancer actually does, when your system needs one, and the one mistake engineers make when they think adding one solves availability.

systems · Mar 21, 2026

Message Queues: Async Without the Chaos

Why message queues exist, how they decouple producers from consumers, and when adding one to your architecture actually helps versus when it's just extra infrastructure.

systems · Mar 21, 2026

Database Read Replicas: Scaling Reads Without Sharding

How read replicas work, when they solve your database bottleneck, and the replication lag problem that bites teams who add them without thinking about consistency.

systems · Mar 21, 2026

Caching with Redis: Fast Reads Without Hammering Your Database

Why caching exists, how Redis works as an in-memory store, and the cache invalidation problem that burns every engineer who doesn't think it through.

systems · Mar 21, 2026

Stateless Servers: Why They Scale and Stateful Ones Don't

What it means for an app server to be stateless, why it matters for horizontal scaling, and the one pattern that breaks everything when teams add a load balancer without thinking about state.

rag · Feb 23, 2026

How to Choose Chunk Size for RAG (With 7 Chunking Strategies & Trade-offs)

A developer-focused guide to chunk size selection in Retrieval-Augmented Generation (RAG), covering fixed, sliding window, recursive, semantic, and LLM-based chunking — with real failure modes and tuning advice.

python · Feb 20, 2026

Python Functions: Arguments, Scope, Lambdas, and First-Class Behavior

A practical guide to how Python functions work — from how you pass arguments to how variables are scoped and why functions are more powerful than they first appear.

python · Feb 13, 2026

Python Data Types & Structures

Lists, tuples, sets, and dictionaries — when to use each, how comprehensions work, mutability traps, and the time/space complexity that actually matters.

rag · Feb 10, 2026