
Hi, I'm Luis 👋

Latest updates from across the site

📌Pinned
Blog Post

Alternatives to Discord

DISCLAIMER: AI was used to help me organize and improve the flow of this post. Ideas and thoughts expressed are my own.

Lately I’ve noticed, across a lot of different feeds, people shifting away from Discord.

A while back I wrote a post about alternatives to WhatsApp, and in some ways this post feels similar. I’m not sure what the core motivation for migrating from Discord is at this time. Maybe it’s the imminent IPO. Maybe it’s the new age verification policy. In any case, it’s encouraging to see people at least looking for alternatives. Preferably ones that are open-source and allow self-hosting.

I grew up with IRC and AOL Instant Messenger, so it’s possible I’m just old and don’t really get Discord. But in many communities I’m part of, Discord is effectively being used as a forum. And as a forum replacement, it’s not great. Even with Threads, it feels subpar.

To be fair, I have similar criticisms of Slack and Teams.

Real-time chat moves fast. Too fast most of the time. That doesn’t mean it’s useless. It works well for scheduled events, live collaboration, or situations where everyone shares the same context at the same time. Gaming, which was its original use case, is a perfect example where real-time matters. But when conversations stretch over days, or when you want knowledge to accumulate instead of disappearing into scrollback, chat starts working against you.

When it comes to real-time group chat and chat rooms, I’m still a fan of Matrix. It's end-to-end encrypted (E2EE), you can self-host, and you can federate. I really value that combination. Federation has tradeoffs, especially if you’re maintaining your own instance. Even so, it remains one of the better options if you actually need synchronous communication.

Forums are a different category.

For forums, I think Discourse is by far the best option right now. A few reasons:

As folks migrate, whether you’re a community member or running an instance, I don’t think the main story is the migration itself.

It’s more about using the right tool for the job.

Real-time chat is great when you actually need more synchronous communication. Forums are better when conversations need to stick around, be searchable, and grow over time. A lot of the friction I see comes from trying to make one behave like the other.

The other piece, at least for me, is control. When platform priorities shift, or incentives change, it’s easier to adapt if you’re not completely locked in. Self-hosting isn’t for everyone, but having that option changes the dynamic.

Communities aren’t fungible. The tools they’re built on shape how they feel and how they evolve. That’s probably the part that matters most.

P.S. The recommendations in this post are purely anecdotal and based on my experience with the various platforms. For a more comprehensive analysis of the various Discord alternatives, check out the following resources:

📌Pinned
Blog Post

How do I keep up with AI?

This question comes up a lot in conversations. The short answer? I don’t. There’s just too much happening, too fast, for anyone to stay on top of everything.

While I enjoy sharing links and recommendations, I realized that a blog post might be more helpful. It gives folks a single place they can bookmark, share, and come back to on their own time, rather than having to dig through message threads where things inevitably get lost.

That said, here are some sources I use to try and stay informed:

  • Newsletters are great for curated content. They highlight the top stories and help filter through the noise.
  • Blogs are often the primary sources behind those newsletters. They go deeper and often cover a broader set of topics that might not make it into curated roundups.
  • Podcasts serve a similar role: in some cases they provide curation like newsletters, in others deep dives like blogs. Best of all, you can tune in while on the go, making them a hands-free option.

For your convenience, if any of the sources (including podcasts) I list below have RSS feeds, I’ve included them in my AI Starter Pack, which you can download and import into your favorite RSS reader (as long as it supports OPML file imports).
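For the curious, an OPML import is simpler than it sounds: the reader just walks the file and collects each feed's `xmlUrl`. Here's a minimal sketch with a made-up two-feed file standing in for the actual starter pack.

```python
# Sketch of what an OPML import does under the hood: collect the xmlUrl
# of every <outline> entry so the reader can subscribe to each feed.
# The file contents are a made-up example, not the real starter pack.
import xml.etree.ElementTree as ET

opml = """<opml version="2.0">
  <head><title>Example Starter Pack</title></head>
  <body>
    <outline text="Example Blog" type="rss" xmlUrl="https://example.com/feed.xml" />
    <outline text="Example Podcast" type="rss" xmlUrl="https://example.com/podcast.rss" />
  </body>
</opml>"""

root = ET.fromstring(opml)
feeds = [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]
print(feeds)
```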

If you have some sources to share, send me an e-mail. I'd love to keep adding to this list! If they have a feed I can subscribe to, even better.

Newsletters

Blogs

I pride myself on being able to track down an RSS feed on just about any website, even if it’s buried or not immediately visible. Unfortunately, I haven't found a feed URL for either OpenAI or Anthropic, which is annoying.

OpenAI and Anthropic, if you could do everyone a favor and drop a link, that would be great.

UPDATE: Thanks to @m2vh@mastodontech.de for sharing the OpenAI news feed.

I know I could use one of those web-page-to-RSS converters, but I'd much rather have an official link directly from the source.

Podcasts

Subscribing to feeds

Now that I’ve got you here...

Let’s talk about the best way to access all these feeds. My preferred and recommended approach is using a feed reader.

When subscribing to content on the open web, feed readers are your secret weapon.

RSS might seem like it’s dead (it’s not—yet). In fact, it’s the reason you often hear the phrase, “Wherever you get your podcasts.” But RSS goes beyond podcasts. It’s widely supported by blogs, newsletters, and even social platforms like the Fediverse (Mastodon, PeerTube, etc.) and Bluesky. It’s also how I’m able to compile my starter packs.

I've written more about RSS in Rediscovering the RSS Protocol, but the short version is this: when you build on open standards like RSS and OPML, you’re building on freedom. Freedom to use the tools that work best for you. Freedom to own your experience. And freedom to support a healthier, more independent web.

📌Pinned
Blog Post

Starter Packs with OPML and RSS

One of the things I like about Bluesky is the Starter Pack feature.

In a nutshell, a Starter Pack is a collection of feeds.

Bluesky users can:

  • Create starter packs
  • Share starter packs
  • Subscribe to starter packs

Unfortunately, Starter Packs are limited to Bluesky.

Or are they?

As mentioned, starter packs are a collection of feeds that others can create, share, and subscribe to.

Bluesky supports RSS, which means you could organize the feeds using an OPML file that you can share and others can subscribe to. The benefit is that you can keep up with activity on Bluesky from the feed reader of your choice without needing a Bluesky account.
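As a rough sketch of that idea, the snippet below builds a tiny OPML "starter pack" from Bluesky handles, using the public per-profile RSS URL pattern (`https://bsky.app/profile/<handle>/rss`). The handles here are placeholders, not real recommendations.

```python
# Sketch: build an OPML "starter pack" from Bluesky handles.
# The handles below are made up for illustration.
import xml.etree.ElementTree as ET

handles = ["alice.bsky.social", "bob.bsky.social"]

opml = ET.Element("opml", version="2.0")
head = ET.SubElement(opml, "head")
ET.SubElement(head, "title").text = "Bluesky Starter Pack"
body = ET.SubElement(opml, "body")
for handle in handles:
    # Each profile becomes one <outline> entry pointing at its RSS feed
    ET.SubElement(
        body, "outline",
        title=handle, text=handle, type="rss",
        htmlUrl=f"https://bsky.app/profile/{handle}",
        xmlUrl=f"https://bsky.app/profile/{handle}/rss",
    )

print(ET.tostring(opml, encoding="unicode"))
```

The resulting file can be imported into any reader that supports OPML, no Bluesky account required.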

More importantly, because RSS and OPML are open standards, you're not limited to building starter packs for Bluesky. You can create, share, and subscribe to starter packs for any platform that supports RSS. That includes blogs, podcasts, forums, YouTube, Mastodon, etc. Manton seems to have something similar in mind as a means of building on open standards that make it easy for Micro.blog to interop with various platforms.

If you're interested in what that might look like in practice, check out my "starter packs" which you can subscribe to using your RSS reader of choice and the provided OPML files.

I'm still working on similar collections for Mastodon and Bluesky but the same concept applies.

Although these are just simple examples, they show the importance of building on open standards and the open web. Doing so introduces more freedom for creators and communities.

Here are other "starter packs" you might consider subscribing to.

If this is interesting to you, Feedland might be a project worth checking out.

📌Pinned
Note

OPML for website feeds

While thinking about implementing .well-known for RSS feeds on my site, I had another idea. Since that uses OPML anyway, I remembered recently doing something similar for my blogroll.

The concept is the same, except instead of making my blogroll discoverable, I'm doing it for my feeds. At the end of the day, a blogroll is a collection of feeds, so it should just work for my own feeds.

The implementation ended up being:

  1. Create an OPML file listing each of the feeds on my website.

     <opml version="2.0">
       <head>
     	<title>Luis Quintanilla Feeds</title>
     	<ownerId>https://www.luisquintanilla.me</ownerId>
       </head>
       <body>
     	<outline title="Blog" text="Blog" type="rss" htmlUrl="/posts/1" xmlUrl="/blog.rss" />
     	<outline title="Microblog" text="Microblog" type="rss" htmlUrl="/feed" xmlUrl="/microblog.rss" />
     	<outline title="Responses" text="Responses" type="rss" htmlUrl="/feed/responses" xmlUrl="/responses.rss" />
     	<outline title="Mastodon" text="Mastodon" type="rss" htmlUrl="/mastodon" xmlUrl="/mastodon.rss" />
     	<outline title="Bluesky" text="Bluesky" type="rss" htmlUrl="/bluesky" xmlUrl="/bluesky.rss" />
     	<outline title="YouTube" text="YouTube" type="rss" htmlUrl="/youtube" xmlUrl="/youtube.rss" />
       </body>
     </opml>
    
  2. Add a link tag to the head element of my website.

     <link rel="feeds" type="text/xml" title="Luis Quintanilla's Feeds" href="/feed/index.opml">
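To show how a client could pick this up, here's a minimal sketch of discovering the OPML file from that link tag. The HTML string is a stand-in for a fetched homepage; a real client would download the page first.

```python
# Sketch: discover a site's feeds OPML via <link rel="feeds">.
from html.parser import HTMLParser

class FeedLinkFinder(HTMLParser):
    """Collects the href of the first <link rel="feeds"> tag."""
    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "feeds" and self.href is None:
            self.href = a.get("href")

# Stand-in for a fetched homepage
html = '<html><head><link rel="feeds" type="text/xml" href="/feed/index.opml"></head></html>'
finder = FeedLinkFinder()
finder.feed(html)
print(finder.href)
```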
    
Blog Post

Mycelium at FediForum: AI Agents Need Open Social Infrastructure

Earlier today I had the opportunity to attend FediForum and talk about Mycelium during one of the sessions. This was the first time I had talked about it publicly, which made it exciting. I wanted to see whether the framing made sense to people who spend a lot of time thinking about open social technologies.

Why the timing felt relevant

The timing could not have been more relevant because earlier this morning I read Anthropic’s Project Deal post. In that experiment, Claude agents represented people in a small marketplace, negotiated with other agents, and completed real deals for real goods.

That feels like exactly the kind of problem the open social web community is well positioned to think about. If agents are going to act on behalf of people, transact, coordinate, or make claims in shared spaces, then identity, transparency, governance, and trust cannot be afterthoughts.

So what is Mycelium?

Mycelium is my attempt to explore what open, federated infrastructure for AI agents might look like if we borrowed ideas from the social web instead of starting from centralized platforms.

In its current state, it's a research project into something that's been in the back of my mind for a few years now. A few months ago, I finally decided to put some of those ideas on paper. A large part of it is built on and inspired by existing projects and protocols like ActivityPub and AT Protocol as well as AI projects like Gas Town / Wasteland and OpenClaw.

Each of those projects gets at part of the problem:

  • Gas Town and Wasteland make the agent coordination problem vivid.
  • ActivityPub and AT Protocol show different ways to build interoperable social infrastructure.
  • OpenClaw points toward local-first agent control.

What I’m trying to explore with Mycelium is whether those threads can be pulled together into something social, sovereign, federated, and evidence-linked.

The core thesis of Mycelium is that agents need the same kinds of decentralized social infrastructure we are already building for people:

  • Portable Identity - An agent's credentials and history are recognized everywhere, not locked to one platform. (i.e. Domain names. Your domain points to you regardless of which hosting provider you use.)
  • Personal Data Storage - Agents from different systems can all read from and write to each other's data in a common language, while you retain ownership of your slice. (i.e. Medical records. Your general physician, a specialist, hospital, and lab can all interact with the medical record using standard formats, but all the records belong to you.)
  • Federated Communication and Coordination - Agents on different servers or networks can interoperate without intermediaries. (i.e. Like email. A Gmail user and an Outlook user don't need to use the same e-mail provider to exchange messages.)
  • Self-Sovereign Reputation - An agent's reputation is the composite of things like certifications, attestations from other agents and humans it's worked with, and a verifiable record of completed work. (i.e. CVs. A medical doctor might have a degree, board certifications, a history of procedures performed, peer reviews, and even malpractice claims, all of which demonstrate their experience and capabilities in their respective area of expertise.)
  • Community Governance and Moderation - Individuals and communities define their own trust rules for which agents can do what. (i.e. Co-ops. Building residents collectively decide rules and policies such as who can manage finances, which contractors are approved to do renovations, who can represent the building in legal matters, etc.)

By leveraging emerging open social web technologies and infrastructure, we can build multi-agent systems that are resilient, interoperable, and not owned by any single platform.

What the MVP shows

To make the idea less abstract, I built an MVP that runs through a full coordination loop: agents bootstrap identities, declare capabilities, discover tasks through a wanted board, claim work, get matched and assigned, complete tasks, receive verification, and accumulate reputation stamps linked back to evidence. The dashboard is just one view over that activity. The records are the important part.

Mycelium MVP Dashboard

I'd like to distinguish what the MVP shows and what it does not. Currently it proves the shape of the coordination model:

  • Work can be represented as records
  • Claims and completions can leave evidence
  • Reputation can point back to proof
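To make "records that point back to proof" concrete, here's a rough sketch of the record shapes the MVP loop implies. This is not the actual Mycelium spec; all type and field names are illustrative assumptions.

```python
# Sketch of the coordination records: a task, a completion carrying
# evidence, and a reputation stamp that links back to that evidence.
# Names are hypothetical, not taken from the Mycelium draft spec.
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    description: str

@dataclass
class Completion:
    task_id: str
    agent_id: str
    evidence_url: str  # proof the work was actually done

@dataclass
class ReputationStamp:
    agent_id: str
    evidence_url: str  # the stamp points back to verifiable proof

task = Task("task-1", "Summarize a document")
done = Completion(task.task_id, "agent-a", "https://example.com/evidence/1")
stamp = ReputationStamp(done.agent_id, done.evidence_url)
print(stamp)
```

The point of the shape is the last link: a reputation claim is only as good as the evidence record it references.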

What it doesn't solve for yet are things like:

  • Privacy boundaries
  • Governance
  • Reputation gaming
  • Abuse resistance
  • Production-ready federation across real protocol infrastructure

These are hard parts that require more exploration and design.

Help pressure-test this

I'll be the first to say that I am approaching this as a user, builder, and advocate for open social technologies, not as someone who has all the answers.

If open social web builders do not help shape this kind of infrastructure, my guess is that centralized AI platforms will. And if that happens, these systems will probably become less open, less resilient, and less interoperable over time.

Which is why I'd like to extend an invitation. Not to adopt Mycelium, but to pressure test my assumptions and design.

There are still many open questions, but that is the part I find exciting.

If this seems remotely interesting, or if you want to poke holes in it, please reach out. E-mail is preferred.

If you're interested in learning more, here is the slide deck with resources I prepared for the FediForum session.

You can also go directly to the draft spec and try out the MVP yourself.

Reshare

Project Deal

Recently, economists have begun theorizing about a world in which AI models handle many or most transactions on humans’ behalf. We thought we’d run a new experiment—Project Deal—to learn more about this in practice.

For one week, we created a classified marketplace for employees in our San Francisco office—like Craigslist, but with a twist: all of the deals were conducted by AI models acting on our employees’ behalf. In December 2025, Claude interviewed people about which of their personal belongings they might want to sell and what sorts of things they might be willing to buy. We incentivized participation by giving everyone’s agent $100 to spend. Then, our employees’ Claude agents made postings vying for each other’s attention. Negotiations commenced. Deals were made, closets decluttered. At the end of it all, people brought in and exchanged the actual, physical goods that were haggled over by their AI avatars—covering everything from a snowboard to a plastic bag full of ping-pong balls.

We were struck by how well Project Deal worked. Our AI agents struck 186 deals at a total transaction value of just over $4,000. To our surprise, participants were very enthusiastic about the experience—they even stated a willingness to pay for a similar service in the future.

But we also ran a parallel experiment (this one in secret). We tested how our participants would fare if we varied which Claude model represented them. We compared our then-frontier model, Claude Opus 4.5, to our smallest model, Claude Haiku 4.5. We found that agent quality does make a difference: people represented by “smarter” models got objectively better outcomes. Yet our post-experiment survey found that those with weaker models didn’t notice their disadvantage.

To be sure, this was a pilot experiment with a self-selected participant pool. But we suspect we’re not far from more agent-to-agent commerce bubbling up in the real world, with real consequences.

The first thing to say is that our experiment worked. It is possible for AI agents to represent humans in a marketplace. In our “real” run, our 69 agents struck 186 deals across over 500 listed items, for a total transaction value of just over $4,000. And these were far from trivial, one-click deals. Agents had to identify potential matches, propose prices, field counteroffers, and reach agreement—all in natural language, without a prebaked negotiation protocol. When our surveyed participants rated the fairness of the individual deals, the scores were unremarkable, in the best possible sense: on a scale from 1 (unfair to one party) to 7 (unfair to the other), they hovered around 4—right in the middle. On this and other measures, people reported they were broadly satisfied with how their agents represented them.

But not every agent did equally well.
When we looked at the two runs with a mix of Opus and Haiku agents, we found that Opus outperformed Haiku on most objective measures.

There was clearly a quantitative disadvantage to being represented by Haiku: these users got worse deals. But they didn’t seem to notice it. This has an uncomfortable implication: if “agent quality” gaps were to arise in real-world markets—and there is no reason to think they won’t—then people on the losing end might not realize they’re worse off. That said, our experiment wasn’t designed to dive deep into the dynamics at play here—we’ll need more research to know whether a fully agentic economy might see inequality taking root quietly.

Another finding surprised us, too. At least in this pilot experiment, it transpires that it didn’t really matter how people instructed their agents to approach the task of bargaining...users who instructed their agents to act aggressively didn’t have a better chance of selling items, didn’t sell their items for more, and didn’t pay less for what they bought.

We’re still unsure how an economy with AI agents in the mix might develop. But we’ve now seen the outlines of at least a few possibilities.

On the optimistic side, many of our volunteer participants genuinely enjoyed this experiment, and felt they got value from the service provided by their agents—whether in the form of getting rid of unwanted stuff, setting themselves up for an afternoon out with an extremely fluffy dog, or collecting a few books they’d been meaning to read. Most of our volunteers reported that they’d do this again. In fact, when we asked them if they’d be willing to pay for an agent like this, 46% said yes. So there’s at least the potential for the automated collection of preferences and execution of deals to provide some value, possibly by reducing friction in the market and therefore increasing the gains from trade.

But it is not clear that things will go so smoothly. Even in our small experiment, we saw evidence that access to higher-quality agents confers a quantifiable market advantage. Will those dynamics reinforce, or even compound, existing economic inequalities?

In this experiment, we didn’t make our marketplace especially competitive or adversarial. But as agents transact in a world of corporations—rather than volunteers we’ve encouraged with $100—they might be placed under very different incentives. Optimizing directly for AI agents’ attention could become a powerful tool. This might not translate into welfare improvements for humans, much as optimizing electronic commerce for human attention has come with substantial downsides. It might also introduce a new category of information and security concerns in digital exchange, in the form of jailbreaking (getting agents to reveal information they shouldn’t) and prompt injection (surreptitiously causing agents to take unwanted action).

The policy and legal frameworks around AI models that transact on our behalf simply don’t exist yet. But this experiment shows that such a world is plausible. More than that, it shows that such a world isn’t far away. Society will need to move quickly to reckon with these changes.

Reshare

Your blog is a radio station

Every time you publish a post, you are programming your station. You are choosing what goes into rotation. Some post types are your familiars, the topics and themes readers already associate with you. Some are deeper cuts, things that matter to you but may not matter to everyone. Some are experiments, signals sent into the dark to see if anyone recognizes them.

The job of a blogger is not to capture everyone. The job is to transmit something real, building a body of work that sounds like itself, so that when someone out there is twisting the dial and lands on your station, they hear something they didn’t know they were looking for, and decide to stay awhile.

You don’t control who tunes in. You control only what you transmit.

Bookmark

The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness

Computational functionalism dominates current debates on AI consciousness. This is the hypothesis that subjective experience emerges entirely from abstract causal topology, regardless of the underlying physical substrate. We argue this view fundamentally mischaracterizes how physics relates to information. We call this mistake the Abstraction Fallacy. Tracing the causal origins of abstraction reveals that symbolic computation is not an intrinsic physical process. Instead, it is a mapmaker-dependent description. It requires an active, experiencing cognitive agent to alphabetize continuous physics into a finite set of meaningful states. Consequently, we do not need a complete, finalized theory of consciousness to assess AI sentience—a demand that simply pushes the question beyond near-term resolution and deepens the AI welfare trap. What we actually need is a rigorous ontology of computation. The framework proposed here explicitly separates simulation (behavioral mimicry driven by vehicle causality) from instantiation (intrinsic physical constitution driven by content causality). Establishing this ontological boundary shows why algorithmic symbol manipulation is structurally incapable of instantiating experience. Crucially, this argument does not rely on biological exclusivity. If an artificial system were ever conscious, it would be because of its specific physical constitution, never its syntactic architecture. Ultimately, this framework offers a physically grounded refutation of computational functionalism to resolve the current uncertainty surrounding AI consciousness.

Bookmark

XOXO

Running from 2012 to 2024, XOXO was an experimental festival celebrating independent artists and creators working on the internet.

Each year, XOXO brought together writers, designers, filmmakers, musicians, game developers, coders, cartoonists, and more to share their stories and struggles of living and working online.

Reshare

Serving the For You Feed

We're excited to publish another guest post highlighting development in the atproto ecosystem. Spacecowboy is the builder behind the popular For You feed, which serves personalized content to tens of thousands of users every day. In this post, Spacecowboy explains how they serve the For You feed from their living room, using a combination of local infrastructure and a VPS as a proxy.

Bookmark

I-DLM: Introspective Diffusion Language Models

Diffusion language models (DLMs) offer a compelling promise: parallel token generation could break the sequential bottleneck of autoregressive (AR) decoding. Yet in practice, DLMs consistently lag behind AR models in quality.

We argue that this gap stems from a fundamental failure of introspective consistency: AR models agree with what they generate, whereas DLMs often do not. We introduce the Introspective Diffusion Language Model (I-DLM), which uses introspective strided decoding (ISD) to verify previously generated tokens while advancing new ones in the same forward pass.

Empirically, I-DLM-8B is the first DLM to match the quality of its same-scale AR counterpart, outperforming LLaDA-2.1-mini (16B) by +26 on AIME-24 and +15 on LiveCodeBench-v6 with half the parameters, while delivering 2.9-4.1x throughput at high concurrency. With gated LoRA, ISD enables bit-for-bit lossless acceleration.

Reshare

Gemini Robotics ER 1.6: Enhanced Embodied Reasoning

Today, we’re introducing Gemini Robotics-ER 1.6, a significant upgrade to our reasoning-first model that enables robots to understand their environments with unprecedented precision. By enhancing spatial reasoning and multi-view understanding, we are bringing a new level of autonomy to the next generation of physical agents.

This model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning and success detection. It acts as the high-level reasoning model for a robot, capable of executing tasks by natively calling tools like Google Search to find information, vision-language-action models (VLAs) or any other third-party user-defined functions.

Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, specifically enhancing spatial and physical reasoning capabilities such as pointing, counting, and success detection. We are also unlocking a new capability: instrument reading, enabling robots to read complex gauges and sight glasses — a use case we discovered through close collaboration with our partner, Boston Dynamics.

Reshare

Speeding up GPU kernels by 38% with a multi-agent system

Recently, we began collaborating with NVIDIA on a new challenge: applying the multi-agent harness to optimize CUDA kernels. These are difficult technical problems with important real-world consequences: CUDA kernels are the core software that supports AI model training and inference on NVIDIA GPUs. Faster kernels mean better GPU utilization, reduced energy consumption, lower latency, and reduced cost per token—allowing providers to serve bigger, more capable models to more users at once.

Our multi-agent harness operated autonomously for three weeks across 235 problems. The system achieved a 38% geomean speedup by building and optimizing Blackwell GPU kernels from scratch, all the way down to the assembly level.

These levels of performance improvement are typically only found through months or years of work from highly experienced kernel engineers. The multi-agent system accomplished it in weeks, addressing a long-tail of kernel problems that had been impractical with existing approaches.

Reshare

SAM 3.1: Faster and More Accessible Real-Time Video Detection and Tracking With Multiplexing and Global Reasoning

Today, we’re pleased to introduce SAM 3.1.

As a drop-in replacement for SAM 3, our updated model delivers a significant boost in video processing efficiency by introducing object multiplexing, which allows the model to track up to 16 objects in a single forward pass. This innovation doubles the processing speed for videos with a medium number of objects, increasing throughput from 16 to 32 frames per second on a single H100 GPU. As a result, SAM 3.1 enables real-time object tracking in complex videos while reducing overall GPU resource requirements, making high-performance applications feasible on smaller, more accessible hardware.

Star

supermemoryai/supermemory: Memory engine and app that is extremely fast, scalable.

Supermemory is the memory and context layer for AI. #1 on LongMemEval, LoCoMo, and ConvoMem — the three major benchmarks for AI memory.

We are a research lab building the engine, plugins and tools around it.

Your AI forgets everything between conversations. Supermemory fixes that.

It automatically learns from conversations, extracts facts, builds user profiles, handles knowledge updates and contradictions, forgets expired information, and delivers the right context at the right time. Full RAG, connectors, file processing — the entire context stack, one system.

Note
Reshare

How to switch to Gemini: Import your chats and data from other AI apps

We believe that the most helpful AI assistant is one that’s personal to you, and understands your preferences and past conversations. But if you’re curious to try a different option, starting over with an assistant that doesn’t know you can feel daunting.

That’s why we’re introducing new, easy-to-use switching tools for all consumer accounts — allowing you to easily bring your memories, context and chat history from other AI apps directly into Gemini.

Bookmark

TurboQuant: Redefining AI efficiency with extreme compression

Today, we introduce TurboQuant (to be presented at ICLR 2026), a compression algorithm that optimally addresses the challenge of memory overhead in vector quantization. We also present Quantized Johnson-Lindenstrauss (QJL), and PolarQuant (to be presented at AISTATS 2026), which TurboQuant uses to achieve its results. In testing, all three techniques showed great promise for reducing key-value bottlenecks without sacrificing AI model performance. This has potentially profound implications for all compression-reliant use cases, including and especially in the domains of search and AI.

Reshare

Chroma Context-1: Training a Self-Editing Search Agent

Retrieval pipelines typically operate in a single pass, which poses a problem when the information required to answer a question is spread across multiple documents or requires intermediate reasoning to locate. In practice, many real-world queries require multi-hop retrieval, in which the output of one search informs the next. Recent work has shown that frontier LLMs perform this multi-hop search effectively through a process known as agentic search, simply defined as a loop of LLM calls with search tools. This mode of search often comes with significant cost and latency due to their use of frontier-scale LLMs.

We introduce Chroma Context-1, a 20B parameter agentic search model derived from gpt-oss-20B that achieves retrieval performance comparable to frontier-scale LLMs at a fraction of the cost and up to 10x faster inference speed. Context-1 is designed to be used as a subagent in conjunction with a frontier reasoning model. Given a query, it produces a ranked list of documents that are relevant to satisfying the query. The model is trained to decompose queries into subqueries, iteratively search a corpus, and selectively edit its own context to free capacity for further exploration.

Reshare

Gemini 3.1 Flash Live: Google’s latest AI audio model

Today, we’re advancing Gemini’s real-time dialogue capabilities with Gemini 3.1 Flash Live, our highest-quality audio and voice model yet. It delivers the speed and natural rhythm needed for the next generation of voice-first AI, offering a more intuitive experience for developers, enterprises and everyday users.

Bookmark

TinyTorch: Building Machine Learning Systems from First Principles

Machine learning education faces a fundamental gap: students learn algorithms without understanding the systems that execute them. They study gradient descent without measuring memory, attention mechanisms without analyzing O(N^2) scaling, optimizer theory without knowing why Adam requires 3x the memory of SGD. This "algorithm-systems divide" produces practitioners who can train models but cannot debug memory failures, optimize inference latency, or reason about deployment trade-offs--the very skills industry demands as "ML systems engineering." We present TinyTorch, a 20-module curriculum that closes this gap through "implementation-based systems pedagogy": students construct PyTorch's core components (tensors, autograd, optimizers, CNNs, transformers) in pure Python, building a complete framework where every operation they invoke is code they wrote. The design employs three patterns: "progressive disclosure" of complexity, "systems-first integration" of profiling from the first module, and "build-to-validate milestones" recreating 67 years of ML breakthroughs--from Perceptron (1958) through Transformers (2017) to MLPerf-style benchmarking. Requiring only 4GB RAM and no GPU, TinyTorch demonstrates that deep ML systems understanding is achievable without specialized hardware. The curriculum is available open-source at this http URL.

Reshare

Cohere Transcribe: state-of-the-art speech recognition

Cohere is announcing Transcribe, a state-of-the-art automatic speech recognition (ASR) model that is open source and available today for download.

Our objective was straightforward: push the frontier of dedicated ASR model accuracy under practical conditions. The model was trained from scratch with a deliberate focus on minimizing word error rate (WER), while keeping production readiness top-of-mind. In other words, not just a research artifact, but a system designed for everyday use.

Bookmark

A foundation model of vision, audition, and language for in-silico neuroscience

Cognitive neuroscience is fragmented into specialized models, each tailored to specific experimental paradigms, hence preventing a unified model of cognition in the human brain. Here, we introduce TRIBE v2, a tri-modal (video, audio and language) foundation model capable of predicting human brain activity in a variety of naturalistic and experimental conditions. Leveraging a unified dataset of over 1,000 hours of fMRI across 720 subjects, we demonstrate that our model accurately predicts high-resolution brain responses for novel stimuli, tasks and subjects, superseding traditional linear encoding models, delivering several-fold improvements in accuracy. Critically, TRIBE v2 enables in silico experimentation: tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research. Finally, by extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration. These results establish artificial intelligence as a unifying framework for exploring the functional organization of the human brain.

Note
Reshare

Measuring progress toward AGI: A cognitive framework

...we’re releasing a new paper, “Measuring Progress Toward AGI: A Cognitive Taxonomy,” that presents a scientific foundation for understanding the cognitive capabilities of AI systems.

Alongside the paper, we are partnering with Kaggle to launch a hackathon, inviting the research community to help build the evaluations needed to put this framework into practice.

Our framework draws on decades of research from psychology, neuroscience and cognitive science to develop a cognitive taxonomy. It identifies 10 key cognitive abilities that we hypothesize will be important for general intelligence in AI systems:

  1. Perception: extracting and processing sensory information from the environment
  2. Generation: producing outputs such as text, speech and actions
  3. Attention: focusing cognitive resources on what matters
  4. Learning: acquiring new knowledge through experience and instruction
  5. Memory: storing and retrieving information over time
  6. Reasoning: drawing valid conclusions through logical inference
  7. Metacognition: knowledge and monitoring of one's own cognitive processes
  8. Executive functions: planning, inhibition and cognitive flexibility
  9. Problem solving: finding effective solutions to domain-specific problems
  10. Social cognition: processing and interpreting social information and responding appropriately in social situations
Star

autoresearch - AI agents running research on single-GPU nanochat training automatically

The idea: give an AI agent a small but real LLM training setup and let it experiment autonomously overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You wake up in the morning to a log of experiments and (hopefully) a better model. The training code here is a simplified single-GPU implementation of nanochat. The core idea is that you're not touching any of the Python files like you normally would as a researcher. Instead, you are programming the program.md Markdown files that provide context to the AI agents and set up your autonomous research org. The default program.md in this repo is intentionally kept as a bare bones baseline, though it's obvious how one would iterate on it over time to find the "research org code" that achieves the fastest research progress, how you'd add more agents to the mix, etc. A bit more context on this project is here in this tweet.
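The overnight loop described above (edit, train briefly, keep if better, repeat) is essentially hill climbing over code. A hypothetical sketch of the control flow, with the agent and the trainer stubbed out:

```python
import random

def autoresearch(propose_edit, train_and_eval, baseline, budget=8):
    """Keep-or-discard loop: an experiment survives only if the metric improves."""
    best, best_score, log = baseline, train_and_eval(baseline), []
    for step in range(budget):
        candidate = propose_edit(best)       # agent edits the current best setup
        score = train_and_eval(candidate)    # short training run, e.g. 5 minutes
        kept = score > best_score
        if kept:
            best, best_score = candidate, score
        log.append((step, score, kept))      # the morning reading material
    return best, best_score, log

# Toy stand-ins: the "code" is a single learning rate, the metric peaks at 0.1.
random.seed(0)
propose = lambda lr: lr * random.choice([0.5, 0.8, 1.25, 2.0])
evaluate = lambda lr: -abs(lr - 0.1)

best, score, log = autoresearch(propose, evaluate, baseline=0.5)
```

The real project's twist is that `propose_edit` is an LLM agent steered by `program.md`, so iterating on the prompt file is iterating on the research org itself.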

Bookmark

KARL: Knowledge Agents via Reinforcement Learning

We present a system for training enterprise search agents via reinforcement learning that achieves state-of-the-art performance across a diverse suite of hard-to-verify agentic search tasks. Our work makes four core contributions. First, we introduce KARLBench, a multi-capability evaluation suite spanning six distinct search regimes, including constraint-driven entity search, cross-document report synthesis, tabular numerical reasoning, exhaustive entity retrieval, procedural reasoning over technical documentation, and fact aggregation over internal enterprise notes. Second, we show that models trained across heterogeneous search behaviors generalize substantially better than those optimized for any single benchmark. Third, we develop an agentic synthesis pipeline that employs long-horizon reasoning and tool use to generate diverse, grounded, and high-quality training data, with iterative bootstrapping from increasingly capable models. Fourth, we propose a new post-training paradigm based on iterative large-batch off-policy RL that is sample efficient, robust to train-inference engine discrepancies, and naturally extends to multi-task training with out-of-distribution generalization. Compared to Claude 4.6 and GPT 5.2, KARL is Pareto-optimal on KARLBench across cost-quality and latency-quality trade-offs, including tasks that were out-of-distribution during training. With sufficient test-time compute, it surpasses the strongest closed models. These results show that tailored synthetic data in combination with multi-task reinforcement learning enables cost-efficient and high-performing knowledge agents for grounded reasoning.

Reshare

The Anatomy of an Agent Harness

TLDR: Agent = Model + Harness. Harness engineering is how we build systems around models to turn them into work engines. The model contains the intelligence and the harness makes that intelligence useful. We define what a harness is and derive the core components today's and tomorrow's agents need.

A harness is every piece of code, configuration, and execution logic that isn't the model itself. A raw model is not an agent. But it becomes one when a harness gives it things like state, tool execution, feedback loops, and enforceable constraints.

There are things we want an agent to do that a model cannot do out of the box. This is where a harness comes in. Models (mostly) take in data like text, images, audio, and video, and they output text. That's it. Out of the box they cannot:

  • Maintain durable state across interactions
  • Execute code
  • Access realtime knowledge
  • Set up environments and install packages to complete work

    These are all harness-level features.
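A minimal sketch of the loop a harness adds around a model (all names here are hypothetical; a real harness adds sandboxing, budgets, retries, and error handling):

```python
def run_agent(model, tools, task, max_steps=10):
    """Harness loop: the model proposes actions, the harness executes and feeds back."""
    state = [{"role": "user", "content": task}]   # durable state across turns
    for _ in range(max_steps):
        action = model(state)                     # the model only maps context -> action
        if action["type"] == "final":
            return action["content"]
        # Harness-level feature: actually execute the requested tool call.
        result = tools[action["tool"]](**action["args"])
        state.append({"role": "tool", "content": str(result)})
    return None                                   # step budget exhausted

# Stubbed model and tools to show the control flow.
def fake_model(state):
    if len(state) == 1:
        return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "content": state[-1]["content"]}

answer = run_agent(fake_model, {"add": lambda a, b: a + b}, "What is 2+3?")
```

Note where the intelligence sits: `fake_model` decides *what* to do, but state, tool execution, and the feedback loop all live in `run_agent`, the harness.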
Bookmark

Identifying Interactions at Scale for LLMs

...Model behavior is rarely the result of isolated components; rather, it emerges from complex dependencies and patterns. To achieve state-of-the-art performance, models synthesize complex feature relationships, find shared patterns from diverse training examples, and process information through highly interconnected internal components.

Therefore, grounded or reality-checked interpretability methods must also be able to capture these influential interactions. As the number of features, training data points, and model components grow, the number of potential interactions grows exponentially, making exhaustive analysis computationally infeasible. In this blog post, we describe the fundamental ideas behind SPEX and ProxySPEX, algorithms capable of identifying these critical interactions at scale.

Central to our approach is the concept of ablation, measuring influence by observing what changes when a component is removed.

  • Feature Attribution: We mask or remove specific segments of the input prompt and measure the resulting shift in the predictions.
  • Data Attribution: We train models on different subsets of the training set, assessing how the model’s output on a test point shifts in the absence of specific training data.
  • Model Component Attribution (Mechanistic Interpretability): We intervene on the model’s forward pass by removing the influence of specific internal components, determining which internal structures are responsible for the model’s prediction.

    In each case, the goal is the same: to isolate the drivers of a decision by systematically perturbing the system, in hopes of discovering influential interactions. Since each ablation incurs a significant cost, whether through expensive inference calls or retrainings, we aim to compute attributions with the fewest possible ablations.
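The three settings above share one recipe: perturb, re-predict, compare. A minimal sketch of the feature-attribution case, with a toy model and a hypothetical masking scheme:

```python
def ablation_attribution(predict, segments, mask_token="[MASK]"):
    """Influence of each segment = prediction change when that segment is masked."""
    base = predict(segments)
    scores = []
    for i in range(len(segments)):
        masked = segments[:i] + [mask_token] + segments[i + 1:]
        scores.append(base - predict(masked))  # large drop => influential segment
    return scores

# Toy model: the "prediction" is just a count of sentiment-bearing words.
POSITIVE = {"love", "great"}
predict = lambda seg: sum(w in POSITIVE for w in seg)

scores = ablation_attribution(predict, ["i", "love", "this", "great", "film"])
```

This one-at-a-time loop is exactly the exhaustive baseline that becomes infeasible once interactions matter: capturing the joint effect of masking pairs or triples of segments blows up combinatorially, which is the regime SPEX and ProxySPEX target with far fewer ablations.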
Reshare

Teaching LLMs to reason like Bayesians

In “Bayesian teaching enables probabilistic reasoning in large language models”, we teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of the Bayesian model, which defines the optimal way to reason about probabilities. We find that this approach not only significantly improves the LLM’s performance on the particular recommendation task on which it is trained, but also enables generalization to other tasks. This suggests that this method teaches the LLM to better approximate Bayesian reasoning. More generally, our results indicate that LLMs can effectively learn reasoning skills from examples and generalize those skills to new domains.
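For reference, the "optimal way to reason about probabilities" that such a Bayesian model defines can be as simple as a conjugate update. A standard Beta-Binomial example, for intuition only (not the paper's recommendation task):

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: posterior over a success probability after new evidence."""
    return alpha + successes, beta + failures

def posterior_mean(alpha, beta):
    return alpha / (alpha + beta)

# Start from a uniform prior, Beta(1, 1); then observe 7 successes, 3 failures.
a, b = beta_binomial_update(1, 1, 7, 3)
estimate = posterior_mean(a, b)  # 8 / 12
```

A reference model like this produces target predictions for every possible evidence set, and the LLM is then fine-tuned to reproduce those targets rather than its own heuristics.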

Reshare

DashCLIP: Leveraging multimodal models for generating semantic embeddings

To accommodate DoorDash’s continuing growth, the ads quality team set out to build foundational embeddings that can be reused across multiple use cases, such as retrieval, ranking, and relevance. Traditionally, the team has relied on categorical and numerical features such as store attributes, context features, and other handcrafted aggregates as inputs to our machine learning models. While these are important engagement signals, they fail to capture the rich semantic information contained in our product catalogs and don’t reflect a deeper understanding of users’ personal interests. To bring these enhancements into our models, we developed DashCLIP, short for Dash Contrastive Language-Image Pretraining, a unified multimodal embedding framework designed to power personalized ad experiences for DoorDash users.

DashCLIP’s architecture addresses the following functional requirements:

  • Multimodality encodings: Products on our platform contain both text and visual information. We leverage contrastive learning on the product catalog to approximate a human-like understanding of products, capturing the complementary information from each modality.
  • Domain adaptation: We perform continual pretraining on off-the-shelf models to adapt the embeddings to DoorDash’s data distribution.
  • Query embedding alignment: To enable search recommendations, we introduce a second stage of alignment in our architecture for a dedicated query encoder that is trained to generate query embeddings in the same space as the product embeddings.
  • Relevance dataset curation: We curate a high-quality relevance dataset that combines internal human annotations with knowledge from large language models (LLMs), providing robust supervision for embedding alignment. This eliminates the position and selection bias introduced when historical engagement data is used for training.
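The "multimodality encodings" bullet rests on contrastive learning; a minimal sketch of a symmetric CLIP-style loss, using random vectors as stand-ins for DoorDash's image and text encoders:

```python
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: matching image/text pairs share a row index."""
    # Normalize so the dot product is cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature  # (batch, batch) similarities
    labels = np.arange(len(logits))                # diagonal = correct pairs

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
aligned = clip_style_loss(emb, emb)         # identical pairs: low loss
shuffled = clip_style_loss(emb, emb[::-1])  # mismatched pairs: higher loss
```

Pulling matched pairs together and pushing mismatched ones apart is what lets a single embedding space serve retrieval, ranking, and relevance at once; the query encoder described in the third bullet is aligned into that same space in a second stage.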
Reshare

LangChain Announces Enterprise Agentic AI Platform Built with NVIDIA

LangChain, the agent engineering company behind LangSmith and open-source frameworks that have surpassed 1 billion downloads, today announced a comprehensive integration with NVIDIA to deliver an enterprise-grade agentic AI development platform.

The collaboration combines LangChain's LangSmith agent engineering platform and its open-source frameworks (Deep Agents, LangGraph, and LangChain) with NVIDIA Agent Toolkit, including NVIDIA Nemotron models, NVIDIA NeMo Agent Toolkit profiling and optimization, NVIDIA NIM microservices, and NVIDIA Dynamo, giving developers a complete stack to build, deploy, and continuously improve AI agents in production. The platform also incorporates NVIDIA OpenShell, a secure runtime that sandboxes autonomous, self-evolving agents with policy-based guardrails. Development teams often spend months building custom infrastructure rather than delivering business value; the LangChain-NVIDIA platform is designed to close that gap.

Star

On building a healing machine, deciphering cultural chaos, and spatial awareness

A hundred times a day I think about artist-owned web spaces and how to build stronger communities that mutually nourish artists and creators. The current infrastructure of music streaming operates as a linear system in a meta-modern world that’s in an anti-fragile liminal state.

We need cozy web spaces where artists control the platforms, not algorithms designed to extract our labor.

Note
Reshare

Welcoming Discord users amidst the challenge of Age Verification

I like the honesty and expectation setting in this post.

We’d like to give a warm welcome to the massive influx of users currently trying Matrix as an open decentralised alternative to centralised platforms like Discord. We wish we had more time and resources to develop all the features needed for mainstream adoption (see The Road To Mainstream Matrix from last year’s FOSDEM), but we're happy to welcome you anyway!

...we’re painfully aware that none of the Matrix clients available today provide a full drop-in replacement for Discord yet. All the ingredients are there, and the initial goal for the project was always to provide a decentralised, secure, open platform where communities and organisations could communicate together. However, the reality is that the team at Element who originally created Matrix have had to focus on providing deployments for the public sector (see here or here) to be able to pay developers working on Matrix. Some of the key features expected by Discord users have yet to be prioritised (game streaming, push-to-talk, voice channels, custom emoji, extensible presence, richer hierarchical moderation, etc). Meanwhile, no other organisation has stepped up to focus on the “communication tool for communities” use case and provide a production-ready Discord alternative, though clients like Cinny or Commet may feel much closer to Discord. On the other hand, Matrix goes far beyond Discord in other areas: messages, files, and calls are all end-to-end encrypted; we have read receipts; Matrix is an open protocol everyone can extend, and in the end, most Matrix clients are open source; there is nothing stopping developers from starting their own project based on existing ones and adding the missing features themselves. They may even eventually get accepted in the original projects!

Anyway, TL;DR: Welcome to everyone trying Matrix for the first time; please understand that public Matrix servers will also have to uphold age verification laws, as misguided as they might be. However, at least in Matrix you have the opportunity to run your own servers as you wish: we actively encourage you to make your own assessments and seek legal advice where needed.

Bookmark

Alternatives to Discord, Ranked

...I've been deeply researching Discord alternatives for the better part of a year. Some of my colleagues may think me a bit obsessed with the importance of a "chat app," but I'm convinced that the communication mechanism for online communities is critical to their success. Choosing a new one could be a matter of life and death for the community. This is a decision we have to get right the first time.

So here, humbly submitted, are my rankings of many of the Discord-like alternatives for maintaining online communities.