Articles

Notes from building real things

Web, React and Next.js, AI and RAG, and mobile. Written from production work, not tutorials about tutorials.

How Much Does It Cost to Build a RAG Chatbot in 2026?

An MVP RAG chatbot costs $4k-$12k to build in 2026, production builds run $15k-$40k+, and running costs land at $5-$30 per 1,000 queries. Full breakdown.

Jun 9, 2026 · 6 min read

RAG · AI · Chatbot

AI Engineering

Reliable JSON From LLMs: Structured Outputs Compared 2026

Strict structured outputs hold ~99.9% schema compliance while plain JSON mode fails 8-15% of the time. I compare OpenAI, Claude, and Gemini with one Zod schema.

Jun 10 · 7 min read

AI Engineering

Do AI Agents Need a Memory Layer? Mem0 vs Letta vs Zep

Most AI agents don't need a memory vendor. Unless you need consolidation, decay, or cross-agent state, Postgres with pgvector covers memory for $0 extra.

Jun 10 · 8 min read

AI Engineering

How to Migrate Your MCP Server to the 2026 Stateless Spec

The final MCP spec ships July 28, 2026 and removes sessions from the protocol. I migrated my production Node server; here is the exact diff and checklist.

Jun 10 · 6 min read

AI Engineering

Private RAG on Local Models: Qwen3 vs Gemma 4 in 2026

Yes, you can ship private RAG on one 24GB GPU in 2026. I ran a 50-question eval: Gemma 4 26B MoE wins English corpora, Qwen3.6 27B wins multilingual.

Jun 10 · 7 min read

AI Engineering

How to Secure an MCP Server: 2026 Hardening Checklist

I audited my production MCP stack against the NSA's May 2026 guidance and the OX Security RCE disclosure. Here is the 12-point hardening checklist I use.

Jun 10 · 7 min read

AI Engineering

Claude Code vs Cursor vs Codex for Real Client Work 2026

Pricing converged at $20/$200 and SWE-bench scores sit within a point, so workflow decides. Real cost-per-feature numbers from paid client projects.

Jun 10 · 7 min read

AI Engineering

Is RAG Dead in 2026? Agentic Retrieval in Production

No. I rebuilt my production SaaS pipeline as agentic retrieval: cost per query down 36%, accuracy up from 68% to 89%. Only naive top-k RAG died in 2026.

Jun 10 · 7 min read

AI Engineering

How to Reduce LLM API Costs: Caching and Routing in 2026

I cut my reputation SaaS's LLM bill 79%, from $41.60 to $8.90 per 1,000 AI replies, using routing, prompt caching, semantic caching, and batching.

Jun 10 · 7 min read

AI Engineering

AI Agent Observability in Node.js with OpenTelemetry

OTel GenAI spans went stable in early 2026. Here is how I instrument a TypeScript agent in Node.js, track cost per trace, and alert on silent failures.

Jun 10 · 7 min read

AI Engineering

How to Add AI Features to an Existing SaaS Without a Rewrite

You do not need a rewrite to add AI to a SaaS. Add one endpoint beside your existing API, wire it to one workflow, stream the output, and cap token spend.

Jun 6 · 8 min read

Building something similar?

I take on a few projects at a time: web apps, AI features, and mobile. Tell me what you are working on.
