Latest AI News

After Nvidia’s $20B not-acqui-hire, AI chip startup Groq reportedly raising $650M
Groq is looking to raise $650 million in new funding from existing investors, sources tell Axios, as it leans into its inference neocloud business, which relies on its homegrown AI chip and systems. In December, Groq struck one of those not-an-acquisition agreements with Nvidia for a reported $20 billion, which involved the departure of some top-level Groq employees to the chip giant and the licensing of Groq’s hardware technology to Nvidia. That deal was good news for the startup’s investors, who got paid out in cash in what would have been Nvidia’s largest purchase had the deal been a full acquisition, Axios reports. Now these investors have been asked to pony up and back the company’s plans to grow its inference cloud business, which lets developers and enterprises host their inference-hungry apps. Inference is the processing that happens after an AI prompt and is currently a much bigger need in the AI world than model training. The new direction is led right now by Groq’s interim CEO and CFO, Adam Winter and Matt Eng, respectively. In some ways, the $650 million in funding is guaranteed. Axios reports that Groq’s backers Disruptive and Infinitium have agreed to fill the round should other existing investors not want their pro-rata shares.

What happens when companies become too AI-pilled?
The people deciding that AI can replace your job are also the ones least likely to understand what your job truly involves, according to Box founder Aaron Levie, who pointed to this as an example of “AI psychosis.” Indeed, ClickUp recently cut 22% of its workforce for AI agents, tech layoffs in 2026 are already nearly matching all of 2025, and DuckDuckGo installs are climbing from users who want Google to stop forcing AI into search and just give them links. Watch as TechCrunch’s Equity podcast hosts Kirsten Korosec, Anthony Ha, and Sean O’Kane dig into what happens when the AI-pilled and the AI-skeptical are both right at the same time, plus three deals worth knowing about and Waymo’s new robotaxi hitting the road. Subscribe to Equity on YouTube, Apple Podcasts, Overcast, Spotify, and all the casts. You can also follow Equity on X and Threads, at @EquityPod.

So you’ve heard these AI terms and nodded along; let’s fix that
Artificial intelligence is changing the world, and simultaneously inventing a whole new language to describe how it’s doing it. Spend five minutes reading about AI and you’ll run into LLMs, RAG, RLHF, and a dozen other terms that can make even very smart people in the tech world feel insecure. This glossary is our attempt to fix that. We update it regularly as the field evolves, so consider it a living document, much like the AI systems it describes. Artificial general intelligence, or AGI, is a nebulous term. But it generally refers to AI that’s more capable than the average human at many, if not most, tasks. OpenAI CEO Sam Altman once described AGI as the “equivalent of a median human that you could hire as a co-worker.” Meanwhile, OpenAI’s charter defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind’s understanding differs slightly from these two definitions; the lab views AGI as “AI that’s at least as capable as humans at most cognitive tasks.” Confused? Not to worry — so are experts at the forefront of AI research. An AI agent refers to a tool that uses AI technologies to perform a series of tasks on your behalf — beyond what a more basic AI chatbot could do — such as filing expenses, booking tickets or a table at a restaurant, or even writing and maintaining code. However, as we’ve explained before, there are lots of moving pieces in this emergent space, so “AI agent” might mean different things to different people. Infrastructure is also still being built out to deliver on its envisaged capabilities. But the basic concept implies an autonomous system that may draw on multiple AI systems to carry out multistep tasks. Think of API endpoints as “buttons” on the back of a piece of software that other programs can press to make it do things.
Developers use these interfaces to build integrations — for example, allowing one application to pull data from another, or enabling an AI agent to control third-party services directly without a human manually operating each interface. Most smart home devices and connected platforms have these hidden buttons available, even if ordinary users never see or interact with them. As AI agents grow more capable, they are increasingly able to find and use these endpoints on their own, opening up powerful — and sometimes unexpected — possibilities for automation. Given a simple question, a human brain can answer without even thinking too much about it — things like “which animal is taller, a giraffe or a cat?” But in many cases, you need a pen and paper to come up with the right answer because there are intermediary steps. For instance, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might need to write down a simple equation to come up with the answer (20 chickens and 20 cows). In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in a logic or coding context. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking thanks to reinforcement learning. (See: Large language model) A coding agent is a more specific concept than an “AI agent,” which means a program that can take actions on its own, step by step, to complete a goal. A coding agent is a specialized version applied to software development. Rather than simply suggesting code for a human to review and paste in, a coding agent can write, test, and debug code autonomously, handling the kind of iterative, trial-and-error work that typically consumes a developer’s day.
These agents can operate across entire codebases, spotting bugs, running tests, and pushing fixes with minimal human oversight. Think of it like hiring a very fast intern who never sleeps and never loses focus — though, as with any intern, a human still needs to review the work. Although somewhat of a multivalent term, compute generally refers to the vital computational power that allows AI models to operate. This type of processing fuels the AI industry, giving it the ability to train and deploy its powerful models. The term is often a shorthand for the kinds of hardware that provide the computational power — things like GPUs, CPUs, TPUs, and other forms of infrastructure that form the bedrock of the modern AI industry. Deep learning is a subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural network (ANN) structure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain. Deep learning AI models are able to identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results (millions or more). They also typically take longer to train compared to simpler machine learning algorithms — so development costs tend to be higher. (See: Neural network) Diffusion is the tech at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly “destroy” the structure of data — for example, photos, songs, and so on — by adding noise until there’s nothing left.
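The "adding noise until there's nothing left" step can be sketched in a few lines. This is a minimal illustration, not how any particular product works: the update rule (shrink the data slightly, then add Gaussian noise), the `beta` noise level, and the names `noising_step` and `signal_kept` are all assumptions chosen for the example.

```python
import math
import random


def noising_step(values, beta, rng):
    """One forward step: shrink the data a little and mix in Gaussian noise.

    beta is a hypothetical per-step noise level between 0 and 1.
    """
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in values]


def signal_kept(beta, steps):
    """Fraction of the original signal that survives after `steps` steps."""
    return (1.0 - beta) ** (steps / 2)


rng = random.Random(0)           # fixed seed so runs are repeatable
data = [1.0, -1.0, 1.0, -1.0]    # stand-in for pixels or audio samples
noisy = data
for _ in range(200):
    noisy = noising_step(noisy, beta=0.05, rng=rng)

# After 200 steps well under 1% of the original structure remains,
# so `noisy` is essentially pure noise.
print(signal_kept(0.05, 200))
```

A generative model would then be trained to run this process backwards, one small step at a time; that part is omitted here.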
In physics, diffusion is spontaneous and irreversible — sugar diffused in coffee can’t be restored to cube form. But diffusion systems in AI aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise. Distillation is a technique used to extract knowledge from a large AI model with a ‘teacher-student’ model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior. Distillation can be used to create a smaller, more efficient model based on a larger model with a minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. While all AI companies use distillation internally, it may have also been used by some AI companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI APIs and chat assistants. Fine-tuning refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training — typically by feeding in new, specialized (i.e., task-oriented) data. Many AI startups are taking large language models as a starting point to build a commercial product but are vying to amp up utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise. (See: Large language model [LLM]) A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI when it comes to producing realistic data — including (but not only) deepfake tools. GANs involve the use of a pair of neural networks, one of which draws on its training data to generate an output that is passed to the other model to evaluate.
The two models are essentially programmed to try to outdo each other. The generator is trying to get its output past the discriminator, while the discriminator is working to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without the need for additional human intervention. GANs work best, though, for narrower applications (such as producing realistic photos or videos) rather than general-purpose AI. Hallucination is the AI industry’s preferred term for AI models making stuff up — literally generating information that is incorrect. Obviously, it’s a huge problem for AI quality. Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks — with potentially dangerous consequences (think of a health query that returns harmful medical advice). The problem of AIs fabricating information is thought to arise as a consequence of gaps in training data. Hallucinations are contributing to a push toward increasingly specialized and/or vertical AI models — i.e., domain-specific AIs that require narrower expertise — as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks. Inference is the process of running an AI model. It’s setting a model loose to make predictions or draw conclusions from previously seen data. To be clear, inference can’t happen without training; a model must learn patterns in a set of data before it can effectively extrapolate from this training data. Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. But not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips. (See: Training) Large language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google’s Gemini, Meta’s AI Llama, Microsoft Copilot, or Mistral’s Le Chat.
When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters. LLMs are deep neural networks made of billions of numerical parameters (or weights, see below) that learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words. These models are created from encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt. (See: Neural network) Memory cache refers to an important process that boosts inference (which is the process by which AI works to generate a response to a user’s query). In essence, caching is an optimization technique, designed to make inference more efficient. AI is obviously driven by high-octane mathematical calculations, and every time those calculations are made, they use up more power. Caching is designed to cut down on the number of calculations a model might have to run by saving particular calculations for future user queries and operations. There are different kinds of memory caching, although one of the more well-known is KV (or key-value) caching. KV caching works in transformer-based models and increases efficiency, driving faster results by reducing the amount of time (and algorithmic labor) it takes to generate answers to user questions. (See: Inference) A neural network refers to the multi-layered algorithmic structure that underpins deep learning — and, more broadly, the whole boom in generative AI tools following the emergence of large language models.
Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates all the way back to the 1940s, it was the much more recent rise of graphics processing units (GPUs) — via the video game industry — that really unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier epochs — enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery. (See: Large language model [LLM]) Open source refers to software — or, increasingly, AI models — where the underlying code is made publicly available for anyone to use, inspect, or modify. In the AI world, Meta’s Llama family of models is a prominent example; Linux is the famous historical parallel in operating systems. Open source approaches allow researchers, developers, and companies around the world to build on top of one another’s work, accelerating progress and enabling independent safety audits that closed systems cannot easily provide. Closed source means the code is private — you can use the product but not see how it works, as is the case with OpenAI’s GPT models — a distinction that has become one of the defining debates in the AI industry. Parallelization means doing many things at the same time instead of one after another — like having 10 employees working on different parts of a project at the same time instead of one employee doing everything sequentially. In AI, parallelization is fundamental to both training and inference: modern GPUs are specifically designed to perform thousands of calculations in parallel, which is a big reason why they became the hardware backbone of the industry.
As AI systems grow more complex and models grow larger, the ability to parallelize work across many chips and many machines has become one of the most important factors in determining how quickly and cost-effectively models can be built and deployed. Research into better parallelization strategies is now a field of study in its own right. RAMageddon is the fun new term for a not-so-fun trend that is sweeping the tech industry: an ever-increasing shortage of random access memory, or RAM chips, which power pretty much all the tech products we use in our daily lives. As the AI industry has blossomed, the biggest tech companies and AI labs — all vying to have the most powerful and efficient AI — are buying so much RAM to power their data centers that there’s not much left for the rest of us. And that supply bottleneck means that what’s left is getting more and more expensive. That affects industries like gaming (where major companies have had to raise prices on consoles because it’s harder to find memory chips for their devices), consumer electronics (where the memory shortage could cause the biggest dip in smartphone shipments in more than a decade), and general enterprise computing (because those companies can’t get enough RAM for their own data centers). The surge in prices is only expected to stop after the dreaded shortage ends but, unfortunately, there’s not really much of a sign that’s going to happen anytime soon. Like AGI, recursive self-improvement (RSI) is a threshold for how smart AI can get, and how little it may rely on humans. In the RSI scenario, AI models start improving themselves without human intervention, leading to a huge acceleration in capabilities and autonomy. In some tellings, this would be a cataclysmic moment akin to the singularity, a moment when AI models become immune to outside intervention. But RSI also describes a basic capability — can an AI model design its own successor?
— which makes it much easier for engineers to try to build it. A number of recent AI startups have set out to build recursively self-improving models, but most of them dismiss the apocalyptic implications, presenting RSI as simply the next frontier for research. Reinforcement learning is a way of training AI where a system learns by trying things and receiving rewards for correct answers — like training your beloved pet with treats, except the “pet” in this scenario is a neural network and the “treat” is a mathematical signal indicating success. Unlike supervised learning, where a model is trained on a fixed dataset of labeled examples, reinforcement learning lets a model explore its environment, take actions, and continuously update its behavior based on the feedback it receives. This approach has proven especially powerful for training AI to play games, control robots, and, more recently, sharpen the reasoning ability of large language models. Techniques like reinforcement learning from human feedback, or RLHF, are now central to how leading AI labs fine-tune their models to be more helpful, accurate, and safe. When it comes to human-machine communication, there are some obvious challenges — people communicate using human language, while AI programs execute tasks through complex algorithmic processes informed by data. Tokens bridge that gap: they are the basic building blocks of human-AI communication, representing discrete segments of data that have been processed or produced by an LLM. They are created through a process called tokenization, which breaks down raw text into bite-sized units a language model can digest, similar to how a compiler translates human-readable source code into machine code a computer can execute. In enterprise settings, tokens also determine cost — most AI companies charge for LLM usage on a per-token basis, meaning the more a business uses, the more it pays.
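To make tokenization and per-token pricing concrete, here is a rough sketch. It is only an illustration: real LLM tokenizers split text into subword pieces using learned vocabularies, whereas `toy_tokenize` below simply separates words and punctuation, and the price of $2 per million tokens is a made-up figure, not any vendor's rate.

```python
import re


def toy_tokenize(text):
    """Split text into word and punctuation pieces.

    A stand-in for a real subword tokenizer, used only for illustration.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def estimate_cost(num_tokens, price_per_million_tokens):
    """Per-token billing: cost grows linearly with the number of tokens."""
    return num_tokens * price_per_million_tokens / 1_000_000


prompt = "Which animal is taller, a giraffe or a cat?"
tokens = toy_tokenize(prompt)

print(tokens)       # words plus the comma and question mark as separate pieces
print(len(tokens))  # 11 pieces under this toy scheme
print(estimate_cost(len(tokens), price_per_million_tokens=2.0))  # hypothetical price
```

The same arithmetic applies to the model's reply, which is also counted in tokens, so longer answers cost more than shorter ones.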
So again, tokens are the small chunks of text — often parts of words rather than whole ones — that AI language models break language into before processing it; they are roughly analogous to “words” for the purposes of understanding AI workloads. Throughput refers to how much can be processed in a given period of time, so token throughput is essentially a measure of how much AI work a system can handle at once. High token throughput is a key goal for AI infrastructure teams, since it determines how many users a model can serve simultaneously and how quickly each of them receives a response. AI researcher Andrej Karpathy has described feeling anxious when his AI subscriptions sit idle — echoing the feeling he had as a grad student when expensive computer hardware wasn’t being fully utilized — a sentiment that captures why maximizing token throughput has become something of an obsession in the field. Developing machine learning AIs involves a process known as training. In simple terms, this refers to data being fed in so that the model can learn from patterns and generate useful outputs. Essentially, it’s the process of the system responding to characteristics in the data that enables it to adapt outputs toward a sought-for goal — whether that’s identifying images of cats or producing a haiku on demand. Training can be expensive because it requires lots of inputs, and the volumes required have been trending upwards — which is why hybrid approaches, such as fine-tuning a rules-based AI with targeted data, can help manage costs without starting entirely from scratch. (See: Inference) Transfer learning is a technique where a previously trained AI model is used as the starting point for developing a new model for a different but typically related task — allowing knowledge gained in previous training cycles to be reapplied. Transfer learning can drive efficiency savings by shortcutting model development.
It can also be useful when data for the task that the model is being developed for is somewhat limited. But it’s important to note that the approach has limitations. Models that rely on transfer learning to gain generalized capabilities will likely require training on additional data in order to perform well in their domain of focus. (See: Fine-tuning) Validation loss is a number that tells you how well an AI model is learning during training — and lower is better. Researchers track it closely as a kind of real-time report card, using it to decide when to stop training, when to adjust hyperparameters, or whether to investigate a potential problem. One of the key concerns it helps flag is overfitting, a condition in which a model memorizes its training data rather than truly learning patterns it can generalize to new situations. Think of it as the difference between a student who genuinely understands the material and one who simply memorized last year’s exam — validation loss helps reveal which one your model is becoming. Weights are core to AI training, as they determine how much importance (or weight) is given to different features (or input variables) in the data used for training the system — thereby shaping the AI model’s output. Put another way, weights are numerical parameters that define what’s most salient in a dataset for the given training task. They achieve their function by applying multiplication to inputs. Model training typically begins with weights that are randomly assigned, but as the process unfolds, the weights adjust as the model seeks to arrive at an output that more closely matches the target. For example, an AI model for predicting housing prices that’s trained on historical real estate data for a target location could include weights for features such as the number of bedrooms and bathrooms, whether a property is detached or semi-detached, whether it has parking, a garage, and so on.
Ultimately, the weights the model attaches to each of these inputs reflect how much they influence the value of a property, based on the given dataset. This article is updated regularly with new information.
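The housing-price example in the weights entry above can be sketched as a tiny linear model. Everything numeric here is invented for illustration: the starting weights, the `bias` base price, the target sale price, and the learning rate `lr` are assumptions, and real models have far more weights and train on many examples rather than one.

```python
def predict_price(features, weights, bias):
    """Multiply each input by its weight and add everything up."""
    return bias + sum(weights[name] * value for name, value in features.items())


def training_step(features, target, weights, bias, lr):
    """One gradient-descent update: nudge each weight to shrink the error."""
    error = predict_price(features, weights, bias) - target
    new_weights = {
        name: weight - lr * error * features[name]
        for name, weight in weights.items()
    }
    new_bias = bias - lr * error
    return new_weights, new_bias


# Hypothetical starting weights, in dollars per unit of each feature.
weights = {"bedrooms": 30_000.0, "bathrooms": 15_000.0,
           "detached": 40_000.0, "parking": 10_000.0}
bias = 100_000.0

house = {"bedrooms": 3, "bathrooms": 2, "detached": 1, "parking": 0}
target = 275_000.0  # made-up sale price for this house

before = predict_price(house, weights, bias)
weights, bias = training_step(house, target, weights, bias, lr=0.01)
after = predict_price(house, weights, bias)

print(before)  # prediction with the starting weights
print(after)   # prediction after one update, closer to the target
```

Note that the weight for `parking` does not move in this step, because the example house has no parking and so that feature contributed nothing to the error.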

Kiwibit’s AI-powered bird feeder is my new backyard buddy
Earlier this month, I got my hands on the Kiwibit Bird Feeder Pro 4K AI Camera, and it has become my favorite backyard accessory. Setting it up is pretty straightforward. Multiple mounting options allow you to place the feeder on a pole, window ledge, or tree. Its dual seed compartments are designed for easy refills and cleaning. The solar panel on top ensures you don’t have to worry about batteries running low. Durability and camera quality are also strong points. Other specs include support for 2.4 GHz Wi-Fi, cloud storage, built-in two-way audio with a microphone and speaker, and a 130-degree wide-angle lens. As soon as I installed it in the backyard, I connected the feeder to the companion Kiwibit app on my phone. This is where you can be notified when a bird stops by, watch recordings, and track all the visits. A few weeks into testing is when the real fun started. My phone buzzed with a notification every time a new visitor showed up, and I found myself eagerly waiting for updates. Even on extremely rainy days, I managed to entice a few birds, including a stunning northern cardinal that I’ve now come to anticipate seeing every morning. As of this writing, the device has successfully recorded visits from six species. I’ve been addicted ever since. I find myself eagerly checking the app every morning to see which feathered little guy stopped by. I show off the videos to almost everyone I know as if they’re my own pets. One amusing notification I keep receiving is “a nuisance animal detected” when squirrels raid my birdseed stash (which happens as often as you’d expect). The app uses Kiwibit’s proprietary bird-identification algorithm to identify over 10,000 bird species, such as blue jays, ravens, and mourning doves. The Activity tab is particularly useful, as it tracks the number of “visits” captured, videos recorded, and total species observed. You can also navigate through the calendar to view specific days.
The Birds tab offers in-depth information on each species, featuring detailed descriptions from Wikipedia. However, I did notice that the system occasionally has trouble accurately counting “visits.” For example, if a house sparrow is feeding in front of the camera for several minutes, the AI might record it as multiple visits, even if the bird hasn’t moved that much. Overall, testing the Kiwibit Bird Feeder Pro has been delightful. If you’re looking for a way to connect with nature while having some fun collecting bird species like Pokémon, give this smart feeder a try. Just be prepared for all the squirrels to visit, too.

Final 24 hours to save up to $410 on your TechCrunch Disrupt 2026 ticket
This is it. The countdown is almost over. You now have until tonight at 11:59 p.m. PT to lock in Early Bird savings of up to $410 for TechCrunch Disrupt 2026 before prices increase. If Disrupt has been on your must-attend list, this is your final chance to secure the lowest available rates before the next price jump hits. Once the deadline passes, so do the savings. Register now and join 10,000+ founders, investors, operators, and innovators at Moscone West in San Francisco from October 13–15 for three days packed with networking, startup discovery, and conversations shaping the future of tech. Bring a plus-one at 50% off, or bring a group to get a discount of up to 30%. TechCrunch Disrupt is where startup momentum accelerates. The event brings together the people actively building, funding, and scaling what’s next across AI, fintech, SaaS, climate, cybersecurity, consumer tech, and beyond. With 300+ exhibiting startups, Startup Battlefield 200, curated networking experiences, and multiple stages of programming, Disrupt is built to help attendees make meaningful connections and real business progress. Disrupt is designed for founders raising capital, investors sourcing opportunities, operators scaling companies, and innovators looking for an edge. Whether you’re launching your next startup, growing your network, or tracking the future of technology, Disrupt puts you in the room with the people driving the industry forward. Every year, Disrupt brings together hundreds of influential voices across startups and venture capital. Past speakers have included leaders from the companies and firms shaping the future of AI, enterprise software, fintech, consumer tech, and more.
This year will deliver the same high-caliber experience, with 200+ sessions across six industry-focused stages, plus roundtables and breakouts covering scaling, AI, fintech, infrastructure, robotics, and emerging technologies. Explore the growing agenda to see the latest sessions and speaker announcements. Early Bird savings of up to $410 end tonight at 11:59 p.m. PT. After that, ticket prices increase. Register now to secure your TechCrunch Disrupt 2026 pass at a low rate before the deadline expires. Bringing more than just you? Save 50% on a second ticket, or up to 30% on community passes.

Today is the last day to apply to speak at TechCrunch Disrupt 2026
TechCrunch Disrupt 2026 returns October 13–15 to Moscone West in San Francisco — and applications to speak are open for just a few more hours. We’re inviting founders, investors, operators, and technology experts to apply for a chance to take the stage at one of the most influential tech events of the year. More than 10,000 startup and VC leaders will gather at Disrupt 2026 to explore what’s next in AI, scaling, fintech, infrastructure, robotics, and the future of innovation. Applications close tonight at 11:59 p.m. PT. Apply now to share your expertise and help shape the conversations defining the tech industry. We’re looking for high-impact speakers to lead one of two session types. Breakout Sessions: A 30-minute talk (up to 4 speakers, including a moderator) with a 20-minute audience Q&A. Capacity: 100 attendees. Roundtables: A 30-minute speaker-led group discussion, designed for up to 40 participants. No slides or AV — just insight and conversation. Each application will be carefully reviewed by our editorial team. Finalists will be selected for the Audience Choice vote — where TechCrunch readers choose which sessions make it to the Disrupt Stage. Learn more about speaking on Disrupt’s Call for Content page. If you have actionable insights, real-world experience, and a desire to contribute meaningfully to the tech ecosystem, we want to hear from you. Submit your application before today’s deadline.

Cognition’s Scott Wu says AI coding agents shouldn’t replace humans
Cognition CEO Scott Wu made headlines again this week when his two-year-old AI coding agent startup raised $1 billion at a $26 billion valuation. Cognition is the maker of Devin, one of the first and, arguably, most successful AI coding agents. Devin, the CEO says, “naturally owns tasks end to end.” In fact, in the blog post announcing that raise, Cognition laid out a vision where “we are shifting to a world of self-driving software development.” So, could Devin replace, say, a mid-level L4 programmer? Yes, and no, Wu told TechCrunch. “We’ve never thought about it as replacing humans. I know it’s like a scenario, folks have said these things. It has never been our view.” In this wild year of 2026, when every day another tech CEO announces layoffs in the name of supplanting workers with AI, Wu says he especially doesn’t want coders to lose their jobs. “We are all programmers ourselves,” he explained. “I started coding when I was nine.” In fact, Wu has been called one of the most accomplished child competitive programmers of all time, according to a recent profile in Colossus. As a second-grader, Wu won a nationwide math competition for seventh-graders, which launched a childhood filled with math and programming tournaments. It also introduced him to other wunderkinds who went on to launch other AI tech startups, like Scale AI founder Alexandr Wang. So, he tells TechCrunch, the idea was never to make human programmers obsolete. “When we started building Devin, it’s kind of a funny thing,” he mused, “but we really just thought of it as: this is your buddy who helps you build more.” In fact, he showed off a little stuffed animal holding a computer, his own Devin teddy bear of sorts, that he keeps on his desk. He thinks of it as a physical symbol of the Devin AI coder: “This is my buddy that helps you build more.” Wu doesn’t want AI agents to take the joy of programming away from people. “It’s not a secret, most software engineers love building software, right?” he said.
“If you ask them why, what they’ll basically tell you is, ‘Well, it’s like I get to build things from nothing. I can make my whole idea that I have, and turn it into a product. I can turn it into an experience.’” Just as visual development environments abstracted software creation away from machine instructions, he views agents as another layer of abstraction between envisioning a software product and producing it. Yet Cognition says that Devin’s role in its own company is to ship nearly all the software. The company says that 89% of code committed by its engineers was committed by Devin, and the rest by local agents in Windsurf, the AI coding competitor it acquired last year. Wu explains that his agent’s role is largely to do the kinds of long-tail maintenance tasks that many programmers don’t like to do anyway: bringing old software up to date; moving applications off one platform and onto another. Agents will free programmers “from a lot of the toil, and so they can do much more of the creation side,” he promises. So Wu bristles at the idea of Devin “replacing” human coders. While he says it can work independently, it works at “somewhere between a junior and a mid-level engineer” depending on the task at hand. As for the concept of self-driving software, where the agent learns and improves itself so that one day it will work at higher levels (“recursive” is the latest buzzword in AI these days), Wu says, “I think we are in for a wild ride.” He sees agents entering other fields where they will learn tasks, from customer service to medicine, but hopes the goal will be to augment human workers in those areas, too. “Code and software has been the first to move, but we’ll see this happen in all these other industries,” he predicts. “One thing that’s been clear to us since the beginning is, it should always be up to the human what to do … you really see this in software engineering, but I think it’s true in all these other professions too.”

This chip startup just raised $135M on a bet that AI’s biggest bottleneck isn’t compute — it’s memory
Every time you ask ChatGPT a question, your request triggers a data relay race. Information leaves memory, passes through a CPU for preprocessing, travels to a GPU for heavy computation, and then makes its way back — and that entire journey repeats for every single word the AI generates. The bottleneck is structural — it means routing through some of the most expensive and power-intensive chips in the industry on every single request. That inefficiency is exactly what XCENA, a startup with offices in South Korea and the U.S., is trying to solve. The four-year-old startup has designed a chip that places compute capabilities much closer to DRAM — the fast, short-term memory chips that store data a processor is actively using — allowing routine data operations to be handled near memory, without the costly round trips between CPUs, GPUs, and memory. If it works at scale, the implications for AI infrastructure costs could be significant, which largely explains investor enthusiasm around the company. Indeed, XCENA just raised $135 million in a Series B at a valuation of $570 million, bringing its total raised to $185 million. XCENA CEO Jin Kim co-founded the startup in 2022 alongside CTO Dohun Kim and CPO Harry Juhyun Kim, all veterans of Samsung and SK Hynix, the memory giants that supply chips powering Nvidia’s GPUs. “CPUs and GPUs have both gotten smarter over the decades. Memory never did. XCENA wants to change that,” Kim said in an interview with TechCrunch. “The recent rise in memory prices and related stocks points to a broader shift in AI infrastructure toward memory-centric architectures,” he added. (This month, the three companies that dominate the global memory chip market — Samsung, SK Hynix, and Micron — each crossed a trillion-dollar valuation for the first time.) XCENA is betting its business on the thesis that “inference isn’t just a compute problem; it’s increasingly a memory scaling problem,” said Kim.
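To make the round-trip cost concrete, here is a minimal Python sketch of the general near-memory idea described above. It is a toy model, not XCENA's design: the names `TransferLog`, `filter_on_host`, and `filter_near_memory`, the 64-byte record size, and the one-in-ten match rate are all assumptions chosen for illustration. It only counts how many bytes would cross the link between memory and the processor when a simple filter runs on the host versus next to the data.

```python
from dataclasses import dataclass


@dataclass
class TransferLog:
    """Counts bytes that cross the (hypothetical) memory-to-host link."""
    bytes_moved: int = 0


def filter_on_host(records, predicate, log, record_size):
    # Conventional path: every record travels to the host before it is examined.
    log.bytes_moved += len(records) * record_size
    return [r for r in records if predicate(r)]


def filter_near_memory(records, predicate, log, record_size):
    # Near-memory path: the predicate runs beside the data,
    # so only the matching records cross the link.
    matches = [r for r in records if predicate(r)]
    log.bytes_moved += len(matches) * record_size
    return matches


if __name__ == "__main__":
    records = list(range(1000))           # stand-in for 1,000 stored records
    keep = lambda r: r % 10 == 0          # assumed filter: one record in ten matches

    host_log, near_log = TransferLog(), TransferLog()
    filter_on_host(records, keep, host_log, record_size=64)
    filter_near_memory(records, keep, near_log, record_size=64)

    print(host_log.bytes_moved)   # 64000
    print(near_log.bytes_moved)   # 6400
```

Both functions return the same matches; only the traffic differs. In this toy case the saving equals the filter's selectivity, so real gains would depend entirely on the workload.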
XCENA’s chip, the MX1, connects to the CPU through CXL (Compute Express Link) — essentially a dedicated express lane between the processor and memory — processing data before it ever needs to leave the memory module. It brings compute to the data, not the other way around. The company claims that what used to require 10 servers could potentially run on just one. “While GPUs excel at matrix multiplication — the heavy math behind AI model training — much of the surrounding data orchestration, including preprocessing, KV cache management (the system that stores prior conversation context so a model doesn’t have to reprocess it), and data caching, still runs on CPUs. Our chip handles those tasks directly within the memory module itself,” Kim said. Demand for memory solutions has surged since the second half of last year, and the company believes the timing is working in its favor. Conversations with several global memory vendors are in early stages, though Kim declined to name them. The company’s ideal customers are hyperscalers spending tens of billions a year on AI infrastructure, where even a small gain in memory efficiency can mean hundreds of millions in savings. The MX1 is still a prototype. Mass production chips are scheduled to roll off Samsung’s foundry lines by the end of 2026, with the company expecting to generate revenue starting in 2027. While neural processing unit (NPU) makers are competing to challenge Nvidia for training workloads, XCENA is targeting the memory-intensive layer that sits underneath all of it. XCENA’s closest rivals include Astera Labs and Marvell, both Nasdaq-listed companies working on next-generation memory connectivity. Marvell is a large, established player already working in the same space, Kim said, adding that the differentiator comes down to intellectual property. “We have thousands of cores,” Kim said. Based on public specs, Marvell’s approach relies on a handful of general-purpose cores by comparison. 
Those cores are built on RISC-V — an open source chip design blueprint — and optimized specifically for data processing, with each core deliberately kept small and efficient. Beyond the cores themselves, XCENA designs its own internal memory hierarchy, interconnect bus, and DRAM controller — a level of vertical integration that most chip companies, including larger rivals, typically outsource. Seoul-based VC firms Altinum and IMM Investment co-led the Series B round, along with Corstone Asia and existing investors SBI Investment and Mirae Asset Capital. The company, which has more than 90 staff across offices in Pangyo, a tech hub outside Seoul, and Sunnyvale, is also in conversations with international investors about additional funding.
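For readers unfamiliar with the KV cache mentioned in Kim's quote, the following Python sketch shows why storing prior context saves work. It is a simplified illustration under stated assumptions, not how any particular model or the MX1 implements it: `encode` is a hypothetical placeholder for the per-token key/value computation, and `count_encode_calls` simply counts how often that computation runs while new tokens are generated.

```python
def encode(token):
    # Hypothetical placeholder for computing one token's key/value entry.
    return (token, len(token))


def count_encode_calls(prompt, generated, use_cache):
    """Count encode() calls needed to generate each token in `generated`."""
    cache = []                # stored entries for tokens already processed
    calls = 0
    context = list(prompt)
    for next_token in generated:
        if use_cache:
            # Only tokens that have no cached entry yet need processing.
            pending = context[len(cache):]
        else:
            # Without a cache, the whole context is reprocessed every step.
            cache = []
            pending = context
        for token in pending:
            cache.append(encode(token))
            calls += 1
        context.append(next_token)
    return calls


if __name__ == "__main__":
    prompt = ["what", "is", "a", "cache"]
    generated = ["it", "stores", "data"]
    print(count_encode_calls(prompt, generated, use_cache=False))  # 15
    print(count_encode_calls(prompt, generated, use_cache=True))   # 6
```

Without the cache, the work grows with the full context at every step (4 + 5 + 6 calls here). With it, each new step touches only the newest token. The trade-off is that the cache itself occupies memory that grows with the conversation, which is the kind of memory-side workload the article describes.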

Happiest Minds Sees AI-led Momentum in FY26, But Can it Escape the Mid-Tier Trap?
Happiest Minds’ AI business is growing rapidly, but the real challenge is whether the mid-tier IT firm can scale fast enough to stand out in an increasingly crowded AI services market.

Sarvam AI Begins Hiring for New San Francisco Office
The hiring drive comes amid reports that Sarvam is in talks to raise up to $300 million at a valuation of up to $1.5 billion.

Enterprise AI Has a Massive Problem, and It’s Not the Models
Organisations recognise that good data quality and governance are crucial for sustainable AI in personalised customer engagement.