Speed & Cost

27 items tagged with this topic

Recent

Official Sources from Google AI Blog, May 4

Reduce friction and latency for long-running jobs with Webhooks in Gemini API

Event-Driven Webhooks are a push-based notification system that eliminates the need for inefficient polling.

Speed & Cost

Official Sources from Microsoft Research Blog, May 5

Microsoft at NSDI 2026: Advances in large-scale networked systems

Microsoft researchers share advances in building and operating large-scale distributed systems, spanning datacenters, networking, and the growing intersection with AI during NSDI ’26.

Speed & Cost Research Custom AI

Official Sources from Together AI Blog, May 8

Deploy and inference any model from HuggingFace

Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets your model running in a production-grade GPU environment on release day.

Speed & Cost Open Source

Older

Chinese Models from Together AI Blog, May 8, 2026

Serving DeepSeek-V4: why million-token context is an inference systems problem

DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-conte…

Official Sources from Together AI Blog, May 4, 2026

Foundational research powering efficient inference at scale

As AI moves from research to production, the challenge for AI-native teams shifts from building models to running them — efficiently, reliably, and at scale.

Podcasts & Newsletters from Import AI, May 4, 2026

Import AI 455: Automating AI Research

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe. AI systems are about to start building themselves. What does that mean?…

Podcasts & Newsletters from Latent Space Newsletter, May 2, 2026

[AINews] AI Engineer World's Fair — Autoresearch, Memory, World Models, Tokenmaxxing, Agentic Commerce, and Vertical AI Call for Speakers

a quiet day lets us make a call for speakers!

Official Sources from OpenAI News, May 4, 2026

How OpenAI delivers low-latency voice AI at scale

How OpenAI rebuilt its WebRTC stack to power real-time Voice AI with low latency, global scale, and seamless conversational turn-taking.

Podcasts & Newsletters from Latent Space Newsletter, Apr 30, 2026

[AINews] The Inference Inflection

a quiet day lets us reflect on the growing implications of the inference age

Podcasts & Newsletters from Latent Space Newsletter, Apr 29, 2026

[AINews] not much happened today

a quiet day.

Podcasts & Newsletters from ChinaTalk, Apr 28, 2026

No Jensen, Not All Compute is Created Equal

+ Nick in Cape Town!

Podcasts & Newsletters from No Priors, May 1, 2026

Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud

Speaker 1 | 00:05 - 00:28 Hi, listeners. Today, Elad and I are here with Tuhin Srivastava, the founder and CEO of Baseten, inference cloud. We're here to talk about capacity constraints for AI compute, why inference is the last market, how…

Watchlist from Anthropic Engineering

An update on recent Claude Code quality reports

Over the past month, we’ve been looking into reports that Claude’s responses have worsened for some users. We’ve traced these reports to three separate changes that affected Claude Code, the Claude Agent SDK, and Claude Cowork. The API was…

Podcasts & Newsletters from Latent Space Newsletter, Apr 27, 2026

Physical AI that Moves the World — Qasar Younis & Peter Ludwig, Applied Intuition

Applied Intuition puts the AI in mining rigs, drones, trucks, warships and physical vehicles in the most adversarial environments imaginable. We dive in with their CEO and CTO as they emerge.

Builders from X, Apr 30, 2026

I haven't kept up to date with the latest @openclaw updates - is live low-latency calling with…

I haven't kept up to date with the latest @openclaw updates - is live low-latency calling with your claw now possible?

Chinese Models from Latent Space Newsletter, Apr 25, 2026

[AINews] DeepSeek V4 Pro (1.6T-A49B) and Flash (284B-A13B), Base and Instruct — runnable on Huawei Ascend chips

The prodigal Tiger returns... but is no longer the benchmarks leader.

Podcasts & Newsletters from Latent Space Newsletter, Apr 23, 2026

[AINews] Tasteful Tokenmaxxing

a quiet day lets us reflect on the top conversation that AI leaders are having everywhere.

Podcasts & Newsletters from Latent Space Newsletter, Apr 22, 2026

Shopify’s AI Phase Transition: 2026 Usage Explosion, Unlimited Opus-4.6 Token Budget, Tangle, Tangent, SimGym — with Mikhail Parakhin, Shopify CTO

A rare interview with Shopify's CTO on -everything- that Shopify is doing to maximize AI for their customers, with exclusive data on their own AI adoption.

Official Sources from OpenAI News, Apr 22, 2026

Speeding up agentic workflows with WebSockets in the Responses API

On the API, a new xhigh effort level between high and max gives you finer control over reasonin…

“Where do you run inference?” “allbirds” “The shoes?” “Yea” https://t.co/2DgPXr2rsw

What a powerful education in latency and how to build voice systems … all just by chatting with…

Don't just ask AI to "summarize" a piece of long-form/difficult content. Try these prompts: 1.…

New ways to balance cost and reliability in the Gemini API

Deepgram speech-to-text and voice models now available natively on Together AI

Gradient Labs gives every bank customer an AI account manager

Gemini 3.1 Flash Live: Making audio AI more natural and reliable