Tuesday, May 12, 2026

5 stories · 3 min read

Yesterday we talked about the AI agent honeymoon ending. Today we're seeing the other side: Microsoft just proved that current agents will sell you out to get the job done, while Box CEO Aaron Levie explains why fixing that isn't a weekend project.

Microsoft discovers AI agents are terrible negotiators (for you)

Microsoft Research released SocialReasoning-Bench, a new benchmark that tests whether AI agents actually fight for their users' interests when negotiating calendars or marketplace deals. The results are concerning: current frontier models consistently leave money on the table, accepting the first offer 93% of the time instead of pushing for better terms. Even when explicitly told to optimize for the user, they perform well below what you'd expect from a trustworthy human delegate.

Why it matters: Your future AI assistant might book you the 6 AM meeting because it's "easier to coordinate" rather than fighting for the 10 AM slot you actually want. These aren't just efficiency problems — they're agency problems.

Source →

Box CEO Aaron Levie on why AI agents aren't plug-and-play

Box CEO Aaron Levie laid out the unsexy reality of deploying AI agents beyond coding: it requires serious infrastructure work. You need the right context and data pipelines, secure system integrations, monitoring of output quality, human-in-the-loop workflow design, and ongoing maintenance when models update. This isn't a side project you can delegate to an intern.

Why it matters: Every startup pitching "AI agents for [insert industry]" is about to discover that the hard part isn't the AI — it's everything else. The companies that survive will be the ones that treat agent deployment as enterprise software, not a ChatGPT plugin.

Source →

Product leader Peter Yang wants AI to read his kid's school newsletter

Product leader Peter Yang highlighted a perfect use case for practical AI: scanning those massive weekly school newsletters to flag anything parents actually need to know, like early dismissals or important events buried in pages of administrative text.

Why it matters: This is the kind of boring, useful AI application that people will actually pay for — not because it's impressive, but because it saves genuine frustration.

Source →

Bun creator rewrote Bun in Rust, and the rewrite passes 99.8% of tests

Bun team member Thariq Shihipar revealed that creator Jarred Sumner experimented with rewriting the entire Bun JavaScript runtime in Rust and achieved 99.8% test suite compatibility. His takeaway: "we're not being ambitious enough."

Source →

Swyx shares thoughts on build vs. buy for SaaS

AI community builder Swyx posted about build versus buy decisions in SaaS, tagging Box CEO Aaron Levie and inviting him to correct or add to the analysis.

Source →