Images, Audio & Video
50 items tagged with this topic
Recent
[AINews] Anthropic growing 10x/year while everyone else is laying off >10% of their workforce
A quiet day lets us reflect on an interesting dichotomy in the economy.
[AINews] GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs
OpenAI continues deploying GPT-5 everywhere
Older
Ken Liu on AI and Freedom
this show was such a treat
Apparently, this is a hot take.. Startup vintage of 2023-2025 is slowly realizing that fancy launch videos & focusing just on distribution MIGHT lead to VC funding, but still leads to money being lit on fire when you sh…
We can now reproduce issues directly in ephemeral crabboxes with WebVNC (Linux/Windows/macOS). Agents set up the exact state to test + fix and post videos on the PR. Working hard to level up our QA. https://t.co/SEj2XRpa…
Together AI Brings NVIDIA Nemotron 3 Nano Omni to Developers on Day 0
NVIDIA Nemotron 3 Nano Omni is now on Together AI: a single open model that reasons across video, images, audio, and text, built for agentic workloads at scale.
🦀 Crabbox 0.3.0 is out. Remote Linux runs for dirty worktrees 🔐 GitHub browser login 🧰 Blacksmith Testbox wrap 📡 crabbox attach for live run replay 📜 Durable run events ☁️ AWS image create 🛡️ Cloudflare Access bre…
[AINews] ImageGen is on the Path to AGI
reflecting on the continued GPT-Image-2 explosion
request for chrome extension that augments all image input boxes on the web: - lets me generate a simple word text thing (no ai) OR - draw something with @tldraw (no ai) OR - use either words or drawings to generate som…
Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights: The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new…
Anyone figure out how to generate great YouTube thumbnails with GPT Image 2 yet? Have a good prompt?
Using ChatGPT Image 2 to make slides is SO fun https://t.co/R9exbKERDP
Demis Hassabis on Building DeepMind, AlphaFold, and the Final Stretch to AGI
Speaker 1 | 00:02 - 00:06 Demis, thank you so much. Exciting to be here. Thanks everyone for coming. It's great to be here. Speaker 2 | 00:06 - 00:09 We're so honored to have you at our chocolate factory. Speaker 1 | 00:09 - 00:13 Yes, I…
Don't get AI to generate images; get them to generate SVGs! Vector illustrations that seamlessly blend into the design style is the perfect complement to HTML Slides (Here I'm using the @QuiverAI API inside @AnyGenIO's…
Here’s how our TPUs power increasingly demanding AI workloads.
Learn how Google’s TPUs power increasingly demanding AI workloads with this new video.
Products that need APIs/MCPs: @substack @RiversidedotFM Every other video editing tool Every bank that's not @mercury Every government website Every healthcare portal
Coding agents will be the foundation of all superintelligence. At a minimum, coding ability is indistinguishable from 'proficiency with computers'. Great coding agents like Claude Code master bash, filesystems, configur…
Check out what else you can make with GPT 5.5 in my video: https://t.co/hdSCW7iaKT
“The core design is about understanding, not output.” A design is an intention. Not an image or a prototype. Output without intention is hallucination. https://t.co/S7QAxEwVRM
I shared my full impressions for GPT 5.5 and ChatGPT Images 2 here, watch now: https://t.co/hdSCW7iaKT
[AINews] OpenAI launches GPT-Image-2
with Cursor getting a $10B contract with xAI and a right to acquire for $60B.
btw in talking to friends the best framing for how to discuss GPT-Image-2-Thinking taking multiple tens of mins for generation and being able to oneshot QR codes and diagrams and logos and foods and faces.. ...is that I…
ChatGPT Images works great from the mobile app, but when I try to generate images on @ChatGPTapp web - it often forgets it has access to the image tool and starts generating code instead, resulting in "images" like this…
Making a birthday party invite website with my soon to be 8 year old it's gonna be dope (thanks ChatGPT image) https://t.co/smRmUlf6Xf
OpenClaw 2026.4.21 is live. Small release, important fix: npm updates now repair bundled plugin runtime deps, with Docker E2E coverage so Telegram/Discord/Slack do not break after upgrade. Also backports OpenAI Image 2…
Here is a manga made by ChatGPT Images 2.0 of @gabeeegoooh and me looking for more GPUs: https://t.co/ek95JfUN5V
New ways to create personalized images in the Gemini app
Nano Banana 2 now uses your personal context and Google Photos to create images that reflect your unique life.
the Codex x @skybysoftware acquisition may have been one of the best @openai deals made in the last year. I've been waiting for "real" computer use since @romainhuet demoed the ChatGPT App with 4o Vision at AIEWF 2024..…
Codex for (almost) everything
The updated Codex app for macOS and Windows adds computer use, in-app browsing, image generation, memory, and plugins to accelerate developer workflows.
Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Our newest audio model introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation.
Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers
@AnthropicAI Having A LOT of fun with Claude Design. Here's a video I just made demoing 5 use cases including videos, slides, websites, mobile apps, and even a design system. 📌 Watch now: https://t.co/vYMCY11TPx
We’ve watched you create the impossible with @FlowbyGoogle… today, the Flow family is growing. 🚀 Meet @googleflowmusic (formerly ProducerAI) - a standalone site that helps you create, share, and remix original music. U…
Everyone with a vision can produce very high-quality designs now (with a lil help from Claude) https://t.co/uR1cR5P2kp
Turns out in a lot of use cases, designing with code is superior to designing using image gen models. I’ve been making most of my graphics using HTML instead of Nano Banana.
Scaling Global Organizations in the Age of AI with ServiceNow CEO Bill McDermott
Speaker 1 | 00:00 - 00:40 The cost to replace an enterprise platform in this SaaS apocalypse that people talk about is an extraordinary expense. Let's take that cost, and then let's take the cost associated with the human capital doing tha…
See our blog post for videos and demos: https://t.co/YDE0XkFuUO Request access to GPT-Rosalind here: https://t.co/XqrW4n4Lmg Life Sciences plugin: https://t.co/m9JHIgLqNg
Some of my favorite things in Opus 4.7: - Very good at async work and following instructions - Effort levels are far more predictable for token control (+ new xhigh level) - No more downscaling of high-res images - Noti…
HTML videos are here! HTML is eating everything https://t.co/HIFQpP6saP
This is what the future of design looks like. Not just this specific tool¹, but the fact that every team in the world is now empowered to build their own 'design factory'. Shader Lab was built with Claude Code, @thre…
Multimodal Embedding & Reranker Models with Sentence Transformers
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowe…
Create your first full song today. For free. In under 50 days, over 100 million songs have been generated on @GeminiApp. To celebrate, we’re unlocking the full power of our music model, Lyria 3. Here’s what you get star…
If you’re still freaking out, I made a video for you: https://t.co/a4BYQ29Ots
Wan 2.7 now available on Together AI
A four-model video suite for generation, continuation, reference-driven workflows, and editing, rolling out on Together AI starting with text-to-video.
At some point, early stage founders decided to optimize for views and funding instead of focusing on product and retention.. And, it’s starting to show. One of the very first things I do is look at change logs (or featu…
Welcome Gemma 4: Frontier multimodal intelligence on device
Create, edit and share videos at no cost in Google Vids
New AI capabilities are coming to Google Vids, powered by Lyria 3 and Veo 3.1, like high-quality video generation at no cost and more.