Everyone's building AI agents that work for five minutes and break for five hours. Today's reality checks come from unexpected places: the CEO who's supposed to be selling this stuff, and the infrastructure needed to make any of it actually reliable.
Box CEO Aaron Levie: Even coding agents need human babysitters
Aaron Levie from Box laid out why AI automation is harder than it looks, even in coding, where everything should work perfectly. Coding has every advantage: models trained on massive amounts of code, highly technical users, and results that can be tested. Yet human engineers still need to oversee coding agents for the agents to be effective. His point: if we can't fully automate coding, what makes anyone think we can automate marketing or customer service?
Why it matters: Every startup that hired 3 people and gave them AI agents instead of hiring 10 is about to learn this the hard way. The work didn't disappear. It moved to the person who has to babysit the agents.
Mistral launches enterprise AI platform with Mistral 3
France's AI champion released Mistral 3, positioning it as "the most powerful AI platform for enterprises." The new offering lets companies customize, fine-tune, and deploy AI assistants, autonomous agents, and multimodal AI using Mistral's open models. No pricing details yet, but the focus is clearly on businesses that want to keep their AI deployments in-house.
Why it matters: While OpenAI and Google fight over consumer pricing, Mistral is betting that European enterprises will pay premium prices for AI that stays within their data borders. Early signs suggest they're right.
Vercel CEO Guillermo Rauch builds filesystem that survives agent crashes
Guillermo Rauch announced that Vercel's agents can now read, write, and mount filesystem state independently of the sandbox lifecycle. The company developed what it calls a "novel virtual storage infrastructure" that works across its entire compute stack. Translation: when your AI agent inevitably crashes, it doesn't lose all its work.
Why it matters: This is the unsexy infrastructure work that determines whether agents are useful or just expensive demos. Persistent storage means agents can actually complete long-running tasks instead of starting over every time something breaks.
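To make the idea concrete, here is a minimal sketch of the general checkpoint-and-resume pattern. It is not Vercel's implementation or API; the function names (`save_checkpoint`, `load_checkpoint`, `run_task`) and the JSON state file are assumptions made up for illustration. The point is only that state written outside the process lets a restarted agent pick up where it left off.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state atomically: write a temp file, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_name, path)  # atomic swap, so a crash never leaves a half-written file


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "results": []}


def run_task(path: Path, steps: List[Callable[[], object]],
             crash_at: Optional[int] = None) -> list:
    """Run steps in order, checkpointing after each one.

    crash_at is a hypothetical knob that simulates the agent dying
    just before the given step index.
    """
    state = load_checkpoint(path)
    for i in range(state["next_step"], len(steps)):
        if crash_at is not None and i == crash_at:
            raise RuntimeError(f"simulated crash before step {i}")
        state["results"].append(steps[i]())
        state["next_step"] = i + 1
        save_checkpoint(path, state)
    return state["results"]
```

Calling `run_task` again with the same checkpoint path after a crash skips the steps that already finished, which is the behavior the "Why it matters" note above is describing.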
Swyx shares the simple prompt trick that makes AI less obedient
AI developer Swyx offered a smarter alternative to "always use plan mode": frame every task as a question by adding a question mark to your prompt. This invites the AI model to push back, rate your idea's quality, and suggest alternatives instead of blindly executing unclear instructions.
Why it matters: The difference between "Write a landing page" and "Should I write a landing page this way?" determines whether you get thoughtful AI collaboration or expensive autocomplete.
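For anyone who wants to apply the trick automatically, a minimal sketch follows. The helper name `as_question` is hypothetical, and it only does the literal version of the tip (swap the trailing punctuation for a question mark); rewording the task into a full question is still up to you.

```python
def as_question(task: str) -> str:
    """Turn an imperative prompt into a question by ending it with '?'."""
    text = task.strip()
    if not text:
        raise ValueError("task must not be empty")
    if text.endswith("?"):
        return text  # already a question, leave it alone
    # Drop a trailing period or exclamation mark before adding the question mark.
    return text.rstrip(".!") + "?"


print(as_question("Write a landing page."))
```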
Cowork handles the work that's too big for ChatGPT
Boris Cherny highlighted what his AI tool Cowork does best: research across dozens of accounts, recurring reports, and inbox triage. He's positioning it for work that's "too big for a chat" and offering trials for people curious about what it can handle.
Why it matters: The real AI market isn't replacing five-minute tasks. It's handling the sprawling, repetitive work that humans hate doing but someone has to own.