Guide Labs debuts a new kind of interpretable LLM

12:25 AM IST · February 24, 2026

The challenge of wrangling a deep learning model is often understanding why it does what it does: Whether it’s xAI’s repeated struggle sessions to fine-tune Grok’s odd politics, ChatGPT’s struggles with sycophancy, or run-of-the-mill hallucinations, plumbing through a neural network with billions of parameters isn’t easy.

Guide Labs, a San Francisco start-up founded by CEO Julius Adebayo and chief science officer Aya Abdelsalam Ismail, is offering an answer to that problem today. On Monday, the company open-sourced an 8-billion-parameter LLM, Steerling-8B, trained with a new architecture designed to make its actions easily interpretable: Every token produced by the model can be traced back to its origins in the LLM’s training data.

That can be as simple as determining the reference materials for facts cited by the model, or as complex as probing the model’s understanding of humor or gender.

“If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I’ve encoded, and then you have to be able to reliably turn that on, turn them off,” Adebayo told TechCrunch. “You can do it with current models, but it’s very fragile … It’s sort of one of the holy grail questions.”

Adebayo began this work while earning his PhD at MIT, co-authoring a widely cited 2018 paper that showed existing methods of understanding deep learning models were not reliable. That work ultimately led to the creation of a new way of building LLMs: Developers insert a concept layer in the model that buckets data into traceable categories. This requires more up-front data annotation, but by using other AI models to help, they were able to train this model as their largest proof of concept yet.

“The kind of interpretability people do is…neuroscience on a model, and we flip that,” Adebayo said.
“What we do is actually engineer the model from the ground up so that you don’t need to do neuroscience.”

One concern with this approach is that it might eliminate some of the emergent behaviors that make LLMs so intriguing: their ability to generalize in new ways about things they haven’t been trained on yet. Adebayo says that still happens in his company’s model: His team tracks what they call “discovered concepts” that the model found on its own, like quantum computing.

Adebayo argues this interpretable architecture will be something everyone needs. For consumer-facing LLMs, these techniques should allow model builders to do things like block the use of copyrighted materials, or better control outputs around subjects like violence or drug abuse. Regulated industries will require more controllable LLMs, for example in finance, where a model evaluating loan applicants needs to consider things like financial records but not race.

There’s also a need for interpretability in scientific work, another area where Guide Labs has developed technology. Protein folding has been a big success for deep learning models, but scientists need more insight into why their software figured out promising combinations.

“[What] this model demonstrates is that training interpretable models is no longer a sort of science; it’s now an engineering problem,” Adebayo said. “We figured out the science and we can scale them, and there is no reason why this kind of model wouldn’t match the performance of the frontier level models,” which have many more parameters.

Guide Labs says that Steerling-8B can achieve 90% of the capability of existing models while using less training data, thanks to its novel architecture.

The next step for the company, which emerged from Y Combinator and raised a $9 million seed round from Initialized Capital in November 2024, is to build a larger model and begin offering API and agentic access to users.
“The way we’re current[ly] training models is super primitive, and so democratizing inherent interpretability is actually going to be a long term good thing for our role within the human race,” Adebayo told TechCrunch. “As we’re going after these models that are going to be super intelligent, you don’t want something to be making decisions on your behalf that’s sort of mysterious to you.”
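
As a rough illustration of the concept-layer idea and the loan example described in the article, here is a toy Python sketch. It is not Guide Labs’ architecture or Steerling-8B code: the concept names, the weights, and the functions `concept_layer` and `score` are all hypothetical, chosen only to show how routing a prediction through named concepts makes each contribution traceable and lets a builder switch one off.

```python
# Toy illustration only: NOT Guide Labs' architecture or Steerling-8B code.
# Every concept name, weight, and function below is hypothetical.

CONCEPTS = ["income", "credit_history", "race"]

# Hypothetical weights from each named concept to a single output score.
CONCEPT_WEIGHTS = {"income": 0.6, "credit_history": 0.9, "race": 0.4}


def concept_layer(features):
    """Map raw input features onto named concept activations.

    Here the mapping is a plain lookup; in a real model it would be learned.
    """
    return {name: float(features.get(name, 0.0)) for name in CONCEPTS}


def score(features, blocked=()):
    """Return (total score, per-concept contributions).

    Concepts listed in `blocked` contribute exactly zero to the score.
    """
    activations = concept_layer(features)
    contributions = {
        name: 0.0 if name in blocked else CONCEPT_WEIGHTS[name] * act
        for name, act in activations.items()
    }
    return sum(contributions.values()), contributions


# Hypothetical applicant, scored with and without the "race" concept.
applicant = {"income": 0.5, "credit_history": 1.0, "race": 1.0}
full_score, full_parts = score(applicant)
safe_score, safe_parts = score(applicant, blocked={"race"})

print("all concepts:", full_score, full_parts)
print("race blocked:", safe_score, safe_parts)
```

Because the score in this sketch is just a sum of per-concept terms, the returned dictionary shows which concepts drove the result, and blocking a concept removes its influence entirely. How a production LLM would achieve anything comparable at scale is not something this sketch attempts to show.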

Latest AI News

Which are the Leading AI Agents in Software Testing?

AI agents are now taking over repetitive work, identifying issues humans may miss, and helping teams maintain testing speed without slowing down releases.

21 minutes ago

Best Accessibility Testing Tools For Large-Scale Enterprise Applications

The right accessibility testing tools help organisations catch issues early, improve usability, and build products that work for users with disabilities and a wider range of people.

21 minutes ago

Mark Zuckerberg tells staff that AI agents haven’t progressed as quickly as he’d hoped

Replacing people with AI doesn’t seem to be that easy to do, if Meta can be seen as an example. Reuters reports that at an internal town hall Thursday, CEO Mark Zuckerberg told staff that the pace of AI agent development had not “accelerated in the way” executives had previously expected it to.

Earlier this year, Meta laid off some 8,000 employees (approximately 10% of its corporate workforce) and reassigned another 7,000 to various AI groups, including one called Agent Transformation, Bloomberg reported. During this week’s meeting, Zuckerberg apparently commented on these job cuts, noting that they were not as “clean” as they should have been. The cuts were made because top officials at the company “were worried that we weren’t going to move fast enough to adapt” to the changing landscape of the tech industry, Zuckerberg reportedly added.

The corporate leader also apparently said that the perceived upside of the new AI-focused company structure hadn’t “come to fruition yet,” although he said that he believed the company would begin to see improvements from its AI investments during the next three to six months.

Several other investigative reports have depicted Meta’s months-old AI unit as a soul-crushing gulag, according to some of the engineers assigned to it. Meta has invested heavily in AI and is expected to spend as much as $145 billion on AI infrastructure this year, Reuters reports. TechCrunch reached out to Meta for comment.

4 hours ago

Jersey Mike’s IPO illustrates how bad the AI hype has become

I can’t tell the exact tipping point from realistic excitement over a new technology, to hype, to aww-come-on, but I’m pretty sure when a sandwich shop with Danny DeVito as its public face talks about AI in its IPO documents, we must be getting close. So it is with Jersey Mike’s.

Because of investor thirst for all things AI these days, I understand why tech companies feel the need to sprinkle AI dust all over their pitches. This is as true for non-AI startups raising venture capital as it is for the public debut of Bending Spoons, a company in the business of buying aging, “not-AI” tech companies to rehabilitate.

Just for kicks, I took a look at Jersey Mike’s IPO documents to see how far this compulsion may go. Surely a sandwich shop would have no need to mention AI in its S-1. But lo and behold! The term artificial intelligence and its acronym “AI” were mentioned 22 times.

In this case, the company can’t claim to be selling AI software. It sells submarine sandwiches. AI products are what investors are really hungering for (terrible pun intended). Still, it found a way to mention AI in its investor-risk warnings. That may be even funnier. It doesn’t explain what it’s using AI for that could be dangerous to investors, beyond a hand-wave of a phrase, “We are beginning to use AI Technologies in our business.”

In all fairness, as a company that operates franchises, it does rely on software (mentioned 52 times) and data (112 mentions), as all businesses do. Its AI risk warning was boilerplate copy, perhaps even necessary, as such disasters have already happened to other food businesses, like the half-baked AI inventory tool that Starbucks rolled out, which couldn’t count and was recently scrapped.

Still, I’m going to go out on a limb here and predict that the risk of an AI disaster for a company that produces real-life sandwiches, not AI slop, is about the same as, say, a franchise shop getting hit by lightning. That actually happened, by the way, to a shop in Texas in 2021. Yet weather was only mentioned five times in the S-1. And lightning? Not once.

8 hours ago
