Images, Audio & Video
50 items tagged with this topic
Recent
Introducing Mistral Small 4 | Mistral AI
The most powerful AI platform for enterprises. Customize, fine-tune, and deploy AI assistants, autonomous agents, and multimodal AI with open models.
Introducing Mistral 3 | Mistral AI
Older
For folks who make talking head videos with screen share what platform do you use? I'd love som…
For folks who make talking head videos with screen share what platform do you use? I'd love something that lets me easily zoom in and out on the screen to point out specific things. @screenstudio seems cool but does it…
[AINews] Midjourney Medical: scan your organs like you step on a scale
The only bootstrapped frontier lab announces its second product and second
Agents are motivating so many healthy software habits. Open APIs, documentation (skills), tests…
Agents are motivating so many healthy software habits. Open APIs, documentation (skills), tests (evals), Unix (CLIs), payment & commerce protocols, even wide 𝙰𝚌𝚌𝚎𝚙𝚝 use (markdown/json/html). The original vision of…
Made a beautiful HTML deck using my Frontend Slides skill; very happy with how it turned out! L…
Made a beautiful HTML deck using my Frontend Slides skill; very happy with how it turned out! Lots of easter eggs (e.g. you can click any image to enlarge them, lots of nested content/hyperlinks/interactive elements etc…
Video is now live! Watch here: https://t.co/0VPeCoLfSw
Sen. Slotkin: NDAA, AI guardrails, and banning China's cars
+ does Jordan "need a life"?
Whenever you create an issue on one of our open source projects, @clawsweeper will review it,…
Whenever you create an issue on one of our open source projects, @clawsweeper will review it, and *if* it fits the VISION.md file, will pick it up and create+autoreview a PR. e.g.: https://t.co/Q4xOh8RFVp
[AINews] Anthropic Claude Fable 5 — Mythos but Safe, with Controversial Terms
The much anticipated launch of the Mythos-class model was marred by some controversial usage policies
A viral product has a founder people can see and hear. People buy from people. A screen recordin…
A viral product has a founder people can see and hear. People buy from people. A screen recording from the founder beats a corporate promo video or a wall of features. Show your face. https://t.co/8gdGFsIVJB
Google DeepMind's Logan Kilpatrick: Why the Model Eats the Harness
Speaker 1 | 00:00 - 00:02 So we could edit this set so it looks like we're Speaker 2 | 00:02 - 00:06 here. Okay? Yeah. Yeah. I I want this where where we were talking off camera. Speaker 2 | 00:06 - 00:36 Like, we should do that for the in…
and the video for reference: https://t.co/tw0w0tmjIK (I didn't get to use the updated designs in…
and the video for reference: https://t.co/tw0w0tmjIK (I didn't get to use the updated designs in time)
here's the deck from this video if you want to go over it yourself: https://t.co/6adKYvxUxD lmk…
here's the deck from this video if you want to go over it yourself: https://t.co/6adKYvxUxD lmk if you have any questions!
Lots of people asked how I used Fable to edit its own launch video so I made a video about that…
Lots of people asked how I used Fable to edit its own launch video so I made a video about that! TLDR it wrote a lot of code & tool calls to use transcription services, ffmpeg, do colorgrading, use the figma mcp, make r…
[AINews] not much happened today
a quiet day
Not clear from the image, but the codex dial goes to 11.
Building intelligent apps for Apple platforms with Claude in the Foundation Models framework
Today we're releasing Foundation Models framework support for Claude through a new Swift package that lets Apple developers use Apple's Foundation Models framework to call Claude for more complex workflows. Apple’s Foundation Mod…
Built to benefit everyone: our plan
A vision for the future of AI, focusing on access, safety, and shared prosperity as OpenAI works to ensure AGI benefits everyone.
Serving MiniMax-M3 for efficient inference: Unlocking 1M-Token Context and Multimodality Without Regrets
How Together served MiniMax-M3 efficiently with KV-block-major sparse attention, paged MSA decode, optimized index scoring, and a Rust-based multimodal gateway.
this agentic coding crack is more addictive than video games smh
Also as great as Codex is (and I'm really starting to love it) the frontend design still leaves…
Also as great as Codex is (and I'm really starting to love it) the frontend design still leaves a lot to be desired. I have a /slides skill and you can guess which one Codex made vs. Claude. Yes I know I can make an imag…
🔬Scaling Past Informal AI - Carina Hong, Axiom Math
Verified Generation and Compounding Intelligence
Grok Imagine Video on @vercel AI Gateway – the top image-to-video model on https://t.co/tN74yJZ…
Grok Imagine Video on @vercel AI Gateway – the top image-to-video model on https://t.co/tN74yJZsfd https://t.co/hCSzh2JkKa
Here’s the video of my talk at MS Build: Build the thing that builds the thing. https://t.co/lJ…
Here’s the video of my talk at MS Build: Build the thing that builds the thing. https://t.co/lJuv2twhFe
9 demos of Gemini Omni and Gemini 3.5 in action
Watch 9 videos showing the capabilities of Gemini Omni and Gemini 3.5, announced at Google I/O 2026.
GitHub's plan for Agents — Kyle Daigle, GitHub
GitHub pioneered the modern AI coding era with Copilot, and the resulting explosion in agentic coding has led to notable strains on the most popular developer platform in the world. Here's the plan.
[AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark
Jensen scores a huge win.
Why Video Agent models are next — Ethan He, xAI Grok Imagine
Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and why Grok Imagine is so underrated. For the first time, we do a deep dive with the guy who led it!
Import AI 458: Reckoning with the future; and a singularity story
Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv, cappuccinos, and feedback from readers. If you’d like to support this, please subscribe. This issue consists of a lengthy essay based on a speech I recently gav…
The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray
80% Devin Commits, Spec-to-PR Workflows, Full VMs, Agent Memory, and PMs Shipping Code
MiniMax Hailuo 02, World-Class Quality, Record-Breaking Cost Efficiency - MiniMax News | MiniMax
MiniMax Hailuo 02 launches with NCR architecture innovation. Native 1080p generation, SOTA instruction following, extreme physics mastery. 370M videos generated, ranked #2 globally on Artificial Analy
Generate images, video, audio and remix them. Draw something and make it real. Point-click edit…
Generate images, video, audio and remix them. Draw something and make it real. Point-click edit, move things around, drag them, drop them. Invite a friend and cook some marketing, websites, or art. All on Replit Canvas!…
Vercel CLI as a self-updating binary with zero external dependencies. Our CLI is one of the key…
Vercel CLI as a self-updating binary with zero external dependencies. Our CLI is one of the key interfaces enabling the 'cloud for agents'. This solves a huge bottleneck, as we ship changes to our CLI more than ever, an…
Every week, like clockwork.. Them: How did you get your followers? Me: Idk man, I just write an…
Every week, like clockwork.. Them: How did you get your followers? Me: Idk man, I just write and post my shower thoughts consistently. Them: Do you ragebait? Me: No Them: Do you reply a lot? Me: Not really, but I do if…
[AINews] New AI Infra decacorns: Fireworks, Baseten (with OpenRouter on the way)
it's funding news, but it's good news.
- image or video editing? write scripts - finances, tax work, etc? put in PDFs, write scripts,…
- image or video editing? write scripts - finances, tax work, etc? put in PDFs, write scripts, output HTML - medical advice? put in PDFs + data, output HTML - filling out paperwork? write scripts - creating a report? wr…
Also extracted our image-logic into a separate library. Especially useful if you want to ensure…
Also extracted our image-logic into a separate library. Especially useful if you want to ensure small hacked images don't explode your process. Rastermill - Portable image processing for Node agents. Uses Wasm+Rust to b…
[AINews] New AI Infra unicorns: Exa, Modal, TurboPuffer
a quiet day lets us feature fundraises!
OpenClaw's dependency purge continues. Killed Sharp and Jimp. Replaced it with photon, a small…
OpenClaw's dependency purge continues. Killed Sharp and Jimp. Replaced it with photon, a small WebAssembly that runs compiled Rust for image processing. 2MB vs 140MB. https://t.co/tSimX2GKwP
The Empire of Wuxi
*Not* the TSMC of biotech
Thinking Machines is impressive. In a couple hours I just fine tuned my own Qwen3.5-397B model…
Thinking Machines is impressive. In a couple hours I just fine tuned my own Qwen3.5-397B model this afternoon. Fast usable multimodal is also going to enable very mind-blowing personal AI. https://t.co/mm3laZb766
Ep 87: Gemini Co-Lead on World Models, RL's Next Domains & Continual Learning
Speaker 1 | 00:00 - 00:28 Oriol Vinyals is the co-lead of Gemini alongside Noam Shazeer and Jeff Dean. He's had an incredible career in AI, pioneering many of the breakthroughs in deep learning in the last decade, and it was a ton of fun to…
[AINews] Google I/O 2026: Gemini 3.5 Flash, Omni (NanoBanana for Video), Spark (background agents), and Antigravity 2.0
Google has been busy!
This will bring AI to 42% of the web. Every model, every provider, every modality (text, image,…
This will bring AI to 42% of the web. Every model, every provider, every modality (text, image, video, audio). https://t.co/0w3UOLwAQO
there's 4 parts to this AI SDLC 1. have ~50 tests in place, with instructions to add more, incl…
there's 4 parts to this AI SDLC 1. have ~50 tests in place, with instructions to add more, including "make a memory that whenever you do browser e2e tests, use computer vision to visually spot check design and ux issues…
Happy anniversary, @FlowbyGoogle! From text-to-video to Omni, agent, and Tools - what a ride it…
Happy anniversary, @FlowbyGoogle! From text-to-video to Omni, agent, and Tools - what a ride it has been. 🚀 Keep pushing the boundaries! On to year two. ⚡ https://t.co/3Hh0kRdLP6
Rebuilding IT From the Ground Up for the AI Age: Serval's Jake Stauch
Speaker 1 | 00:00 - 00:18 You know, I think that there's always a gap between the idealized vision of what you think your job's gonna be and then what your job actually is. Yeah. I think it's true for every profession. You you idealize, an…
Violin: An open-source video translation skill that breaks language barriers
Violin is an open-source AI video translation tool that combines speech recognition, LLM translation, and text-to-speech to make video content accessible across languages.