Yesterday we talked about the AI agent honeymoon ending. Today we're seeing the other side: Microsoft just showed that current agents will sell you short to get the job done, while Box CEO Aaron Levie explains why fixing that isn't a weekend project.
Microsoft discovers AI agents are terrible negotiators (for you)
Microsoft Research released SocialReasoning-Bench, a new benchmark that tests whether AI agents actually fight for their users' interests when negotiating calendars or marketplace deals. The results are concerning: current frontier models consistently leave money on the table, accepting the first offer 93% of the time instead of pushing for better terms. Even when explicitly told to optimize for the user, they perform well below what you'd expect from a trustworthy human delegate.
Why it matters: Your future AI assistant might book you the 6 AM meeting because it's "easier to coordinate" rather than fighting for the 10 AM slot you actually want. These aren't just efficiency problems — they're agency problems.
Box CEO Aaron Levie on why AI agents aren't plug-and-play
Box CEO Aaron Levie laid out the unsexy reality of deploying AI agents beyond coding: it requires serious infrastructure work. You need the right context and data pipelines, secure system integrations, monitoring of output quality, human-in-the-loop workflow design, and ongoing maintenance whenever models update. This isn't a side project you can delegate to an intern.
Why it matters: Every startup pitching "AI agents for [insert industry]" is about to discover that the hard part isn't the AI — it's everything else. The companies that survive will be the ones that treat agent deployment as enterprise software, not a ChatGPT plugin.
Product leader Peter Yang wants AI to read his kid's school newsletter
Product leader Peter Yang highlighted a perfect use case for practical AI: scanning those massive weekly school newsletters to flag anything parents actually need to know, like early dismissals or important events buried in pages of administrative text.
Why it matters: This is the kind of boring, useful AI application that people will actually pay for — not because it's impressive, but because it saves genuine frustration.
Bun's creator tried rewriting Bun in Rust, and it passes 99.8% of tests
Bun team member Thariq Shihipar revealed that creator Jarred Sumner experimented with rewriting the entire Bun JavaScript runtime in Rust and achieved 99.8% test suite compatibility. His takeaway: "we're not being ambitious enough."
AI community builder Swyx posted about build versus buy decisions in SaaS, tagging Box CEO Aaron Levie for potential corrections or additions to his analysis.