Wednesday, June 10, 2026

5 stories · 3 min read

The AI coding revolution just got a reality check. New research suggests that more than half of the benchmark results we counted as progress are unmergeable code, while the tools that do work are changing how people code in ways nobody predicted.

New benchmark exposes AI coding's dirty secret

Swyx shared research from METR showing that more than half of the results on SWE-bench, the widely cited AI coding benchmark, are "unmergeable slop" that can't actually be used in real software projects. The new FrontierCode benchmark required over 1,000 hours of maintainer validation and includes 3,000+ quality rubrics. Even Anthropic's best model, Opus, scores only 13.8% on the hardest problems.

Why it matters: AI coding demos that cite SWE-bench scores are probably overstating what these models can actually do. If you're a CTO planning to replace developers with AI agents, these numbers suggest you should plan for a much longer timeline.

Source →

Anthropic engineer reveals how AI coding actually works after a year

Boris Cherny from Anthropic sat down to discuss how Claude Code usage has evolved since launch. He now codes from his phone, uses auto mode instead of planning mode, and relies on routines that fix bugs before he sees them. The conversation covers the gap between early demos and how people actually use AI coding tools daily.

Why it matters: The most revealing insights about AI tools come from the people who built them, once they've used them daily for months. If Anthropic's own engineers are coding differently than the demos suggested, everyone else probably should be too.

Source →

Google's NotebookLM adds web search and file export

Josh Woodward from Google announced NotebookLM can now search beyond your uploaded source files and export research to PDFs, Word docs, Excel files, PowerPoint presentations, and charts. The update positions NotebookLM as a complete research workflow tool, not just a document chat interface.

Why it matters: Google is quietly building an AI research assistant that could change how knowledge workers gather and synthesize information. With file export, NotebookLM research can now flow directly into existing business workflows.

Source →

The autonomous AI company wave hits a wall

Nikunj Kothari observed that while many "autonomous" AI companies have launched recently, the last mile of actual autonomy remains difficult even with advanced looping techniques. He expects this gap to shrink in the coming months.

Why it matters: The flood of AI agent startups faces the same unsolved problem: getting from 80% accuracy to production-ready reliability. The companies that crack this first will have a significant advantage.

Source →

OpenAI asks: who's writing nested loops?

Thibault Sottiaux from OpenAI posted a simple question about nested loops, drawing more than 100 replies from developers sharing their experiences with complex AI workflows.

Source →