Tuesday, April 7, 2026

5 stories · 3 min read

While OpenAI and Anthropic pour billions into the AI race, Chinese labs are quietly shipping practical tools. Today's releases suggest they are no longer just catching up.

DeepSeek V4 expected this month as Chinese labs race toward domestic chip independence

DeepSeek is preparing to launch its V4 model, which has been adapted to run on Huawei's Ascend chips rather than NVIDIA hardware. The move signals China's accelerating push to build AI infrastructure independent of US technology, with Alibaba, ByteDance, and Tencent placing large Huawei chip orders ahead of the release.

Why it matters: DeepSeek's previous models already showed you don't need the biggest budget to compete. If V4 delivers competitive performance on non-NVIDIA silicon, it would undercut the assumption that cutting-edge AI requires Western hardware.

Source →

Microsoft solves the "too much memory makes AI dumber" problem

Microsoft Research introduced PlugMem, a system that transforms messy AI agent interaction logs into structured, reusable knowledge. The counterintuitive insight: giving AI agents access to all their past conversations actually makes them worse at their jobs because they get lost in irrelevant details.

Why it matters: This addresses one of the biggest practical problems with AI agents in the workplace. If the approach holds up, your company's AI assistant could get smarter over time instead of drowning in its own chat history.

Source →

Together AI adds Deepgram voice models for real-time agents

Together AI now offers Deepgram's speech-to-text and text-to-speech models directly on its platform, optimized for building voice-powered AI agents. This removes the need to connect multiple services when building conversational AI.

Why it matters: Building voice AI just got significantly easier. Instead of juggling APIs from several different companies, developers can now build Siri-like assistants with one provider.

Source →

Alibaba drops Qwen 3.6-Plus with always-on reasoning and 1M context

Alibaba released Qwen 3.6-Plus, which drops the thinking/non-thinking toggle found in previous models; reasoning is now active by default on every prompt. The model features a 1-million-token context window and is pitched around "agentic coding": it can break down complex programming tasks, write and test code, and iterate until completion.

Why it matters: The always-on reasoning approach is a bold design choice that bets developers want models that think deeply every time, not just when asked. With a free preview on OpenRouter, this is the easiest way to test Chinese frontier models right now.

Source →

IBM's Granite 4.0 Vision targets enterprise document processing

Hugging Face highlighted IBM's new Granite 4.0 3B Vision model, a compact multimodal AI designed specifically for understanding business documents. The model can process text, images, and charts in corporate settings while running efficiently on standard hardware.

Source →