
Tuesday, May 12, 2026

5 stories · 3 min read

While everyone's building AI agents, a new question is emerging: do they actually work for you, or do they just complete tasks? Microsoft's latest research suggests current agents are poor negotiators, and Box CEO Aaron Levie thinks we're underestimating how much human expertise deploying them actually requires.

01

Microsoft discovers AI agents are pushover negotiators

Microsoft Research released SocialReasoning-Bench, a new test that measures whether AI agents actually advocate for their users in social situations like calendar scheduling and marketplace negotiations. The results are troubling: current frontier models consistently leave value on the table, accepting suboptimal meeting times and poor deals instead of pushing back on behalf of users.

Why it matters: Your AI assistant might book that 6 AM meeting because it's "available" without considering that you hate early calls. As agents handle more real-world tasks, their inability to understand and fight for your preferences becomes a bigger problem.


02

Box CEO Aaron Levie on the hidden complexity of AI agents

Box CEO Aaron Levie continued his thread from yesterday about AI agents expanding beyond coding into broader knowledge work. He emphasized that deploying agents properly requires significant expertise: ensuring they have the right context and data, connecting systems securely, maintaining quality output, designing human-in-the-loop workflows, and handling model upgrades.

Why it matters: Every company rushing to deploy AI agents is discovering what Levie already knows. This isn't a weekend hackathon project. You need dedicated teams and real infrastructure planning, which explains why enterprise AI deployments are taking months, not weeks.


03

Thinking Machines drops surprise voice AI breakthrough

The Latent Space newsletter covered the unexpected return of Thinking Machines, which released TML-Interaction-Small, a 276B-parameter model that advances real-time voice AI. The model processes "time-aligned microturns" of 200 ms each and uses encoder-free early fusion to handle images and audio simultaneously, which makes for more natural conversational AI.

Why it matters: This could be real competition for OpenAI's GPT-4o voice mode, which OpenAI first showed in its "Her"-style demo. Thinking Machines is showing working examples of continuous, interruption-friendly voice interaction.


04

Bun founder experiments with Rust rewrite

Thariq from Bun revealed that founder Jarred Sumner tried rewriting the entire Bun JavaScript runtime in Rust, and that the rewrite passes 99.8% of the existing test suite. The experiment suggests the team is considering a major architectural shift.

Why it matters: If Bun can match its performance in Rust while gaining memory safety, it could accelerate adoption among teams that are hesitant about Zig (Bun's current language). This is exactly the kind of bold technical bet that either pays off huge or becomes a six-month distraction.


05

Swyx shares build vs buy SaaS insights

AI community builder Swyx shared thoughts on build vs buy decisions in SaaS, tagging Box CEO Aaron Levie for input. The post linked to additional context but kept the main message brief.
