Approaching The Event Horizon
OpenAI’s latest breakthrough suggests we’re nearing artificial general intelligence through an algorithm that can scale reasoning with compute power. The implications are profound: potentially half of knowledge work could be automated. -
DeepSeek: The Quiet Giant Leading China’s AI Race
Chinese AI startup DeepSeek’s R1 model outperformed OpenAI’s o1 on multiple reasoning benchmarks, solidifying its reputation in the AI community. Led by CEO Liang Wenfeng and funded by the hedge fund High-Flyer, DeepSeek is focused on open-sourcing its innovative AI models and cutting-edge architectural advancements like MLA and DeepSeekMoE, which have triggered price wars among Chinese tech companies. DeepSeek prioritizes AGI research over commercialization, relying on a bottom-up organizati… -
Moonlight 16B Muon trained model (GitHub Repo)
This is the first public large-scale model trained with the Muon optimizer. It was trained on 5.7T tokens and uses an architecture very similar to DeepSeek-V3. -
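Muon's core idea is to replace the raw momentum-smoothed gradient of a 2-D weight matrix with an approximately orthogonalized version, computed with a few Newton-Schulz iterations. A minimal NumPy sketch of that update follows; the iteration coefficients match the commonly published Muon reference implementation, but everything here is an illustrative simplification, not Moonlight's actual training code:

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a 2-D gradient matrix, as in Muon.

    A quintic Newton-Schulz iteration pushes the singular values of `g`
    toward 1 while preserving its singular vectors.
    """
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from the public Muon reference
    x = g / (np.linalg.norm(g) + 1e-7)  # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate in the short-and-wide orientation
        x = x.T
    for _ in range(steps):
        xxt = x @ x.T
        x = a * x + (b * xxt + c * xxt @ xxt) @ x
    return x.T if transposed else x

def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One simplified Muon update: accumulate momentum, orthogonalize, apply."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    return weight - lr * update, momentum

# Toy usage on a random weight matrix (shapes are illustrative).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
g = rng.standard_normal((64, 32))
m = np.zeros_like(g)
w_new, m_new = muon_step(w, g, m)
print(w_new.shape)  # update keeps the parameter shape
```

In a real training loop this update is applied only to the hidden 2-D weight matrices, with embeddings and scalars handled by a conventional optimizer such as AdamW.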
Deep Dive into LLMs (3 hour video)
Andrej Karpathy has released another highly educational video that dives deep into many aspects of developing language models, covering pre-training, hallucination mitigation, and post-training. -
OpenAI unveils GPT-4.5 ‘Orion,’ its largest AI model yet
OpenAI is now rolling out GPT-4.5 to ChatGPT Pro subscribers and developers on paid tiers of the company’s API. Other users should get the model sometime next week. GPT-4.5 is so expensive to run that OpenAI is evaluating whether to continue serving the model on its API in the long term. It currently costs $75 for every million input tokens and $150 for every million output tokens. This article looks at how the new model compares to OpenAI’s other offerings. -
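At those rates, per-call cost is simple arithmetic; a quick sketch using the prices quoted above (the token counts in the example are hypothetical):

```python
# GPT-4.5 API pricing quoted above: $75 per 1M input tokens,
# $150 per 1M output tokens.
INPUT_RATE = 75 / 1_000_000    # dollars per input token
OUTPUT_RATE = 150 / 1_000_000  # dollars per output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call at GPT-4.5 rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical request: a 10k-token prompt with a 2k-token reply.
print(round(call_cost(10_000, 2_000), 2))  # → 1.05
```

For comparison, the same hypothetical call would cost well under a tenth as much on most of OpenAI's other current models, which is why the long-term fate of GPT-4.5 on the API is in question.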
Anthropic’s latest flagship AI might not have been incredibly costly to train
Claude 3.7 Sonnet apparently cost just a few tens of millions of dollars to train, not factoring in related expenses - a sign of how relatively cheap it is becoming to release state-of-the-art models. -
Claude 3.7 and Claude Code
Anthropic has announced Claude 3.7 Sonnet, an AI model with integrated reasoning capabilities that provides both quick responses and deeper, step-by-step thinking. Alongside the model, they introduced Claude Code, a command line tool for agentic coding. Claude 3.7 Sonnet offers fine-grained control over its thinking time through the API. -
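A minimal sketch of what that fine-grained control looks like as a request payload, based on Anthropic's published `thinking` parameter; the model name and token budgets below are illustrative, not prescriptive:

```python
import json

# Sketch of a Claude 3.7 Sonnet request with extended thinking enabled.
# `budget_tokens` caps how many tokens the model may spend thinking
# before it answers; the values here are illustrative.
request = {
    "model": "claude-3-7-sonnet-20250219",
    "max_tokens": 4096,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 2048,  # must be smaller than max_tokens
    },
    "messages": [
        {"role": "user", "content": "How many primes are there below 100?"}
    ],
}
print(json.dumps(request["thinking"]))
```

Raising the thinking budget trades latency and cost for deeper step-by-step reasoning; omitting the `thinking` block gives the quick-response behavior.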
Claude Code Research Preview (GitHub Repo)
Claude Code is an agentic coding tool that lives inside the terminal. It understands code bases and helps developers code faster by executing routine tasks, explaining complex code, and handling git workflows, all through natural language commands. Claude Code can edit files, fix bugs, execute and fix tests, search through git history, resolve merge conflicts, create commits and PRs, and more. -
A new generation of AIs: Claude 3.7 and Grok 3
The new generation of AI models are smarter and the jump in capabilities is striking, particularly in how they handle complex tasks, math, and code. -
OpenAI Researchers Find That Even the Best AI Is “Unable To Solve the Majority” of Coding Problems
OpenAI researchers found that even the most advanced AI models struggle to solve most coding problems. They tested frontier models like GPT-4o and Claude 3.5 Sonnet on software engineering tasks from Upwork. While the AI models were fast, they often failed to grasp the context and scope of bugs, creating incorrect or incomplete solutions. -
A.I. Is Changing How Silicon Valley Builds Start-Ups
Tech startups used to raise huge sums to hire armies of workers and grow fast, but AI tools are enabling workers to be more efficient, making tiny teams more successful. Startups are using AI tools to increase employees’ productivity in everything from customer service and marketing to coding and customer research. They are able to gain more revenue with fewer employees and less cash, something that wouldn’t have been possible without the technology. The potential for AI to let startups do mo… -
AI used to design a multi-step enzyme that can digest some plastics
AI-driven protein design has enabled the possibility of creating things unlike anything found in nature. Scientists have successfully created an enzyme with the potential to digest plastics. Breaking down ester bonds in plastics requires four steps - getting AI to design a protein with the right configuration to do one of these steps is easy, but having it cycle through all four is much harder. The scientists overcame challenges by adding more AI models and eventually designed an esterase cap… -
Elon Musk’s xAI releases its latest flagship model, Grok 3
xAI has released Grok 3 and unveiled new capabilities for the Grok iOS and web apps. The company used an enormous data center in Memphis that contained around 200,000 GPUs to train Grok 3. Grok 3 is a family of models: Grok 3 mini responds to questions more quickly but at the cost of some accuracy; Grok 3, which beats GPT-4o on some benchmarks; Grok 3 Reasoning, which surpasses o3-mini on several popular benchmarks; and Grok 3 mini Reasoning. Subscribers to X’s Premi… -
DeepMind working on distributed training of large AI models
DeepMind has published research that discusses how to distribute the training of models with billions of parameters among clusters of computers (that could theoretically be widely separated) while producing the same level of quality as before. -
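The key pattern in this line of DeepMind research (the DiLoCo family of methods) is low-communication data parallelism: each worker takes many local optimizer steps and the clusters only synchronize occasionally. A toy NumPy sketch of that periodic-averaging pattern on a linear-regression problem; the problem, step sizes, and sync schedule are all illustrative, and real systems use a separate outer optimizer rather than a plain average:

```python
import numpy as np

# Each "worker" owns a data shard and runs many local SGD steps;
# workers only communicate at the end of each outer round.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 0.5])

def make_shard():
    x = rng.standard_normal((200, 3))
    y = x @ true_w + 0.01 * rng.standard_normal(200)
    return x, y

shards = [make_shard() for _ in range(4)]  # one data shard per worker

def local_sgd(w, x, y, steps=25, lr=0.05):
    """Many cheap local steps with no communication."""
    for _ in range(steps):
        grad = 2 * x.T @ (x @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(3)
for outer_round in range(5):  # each round = local work + one sync
    local_ws = [local_sgd(w_global.copy(), x, y) for x, y in shards]
    w_global = np.mean(local_ws, axis=0)  # the only communication step

print(np.round(w_global, 2))
```

Because synchronization happens once per round instead of once per gradient step, the clusters can in principle be far apart on slow links, which is the property the research aims to preserve without losing model quality.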
What Happens to SaaS in a World with Computer-Using Agents
The rise of general-purpose AI agents will likely commodify many simple SaaS products by automating their functions behind the scenes, shifting value to platforms providing the underlying infrastructure and AI models. -
The future belongs to idea guys who can just do things
AI will make execution cheap. This will make brand, distribution, and ideas much more important. There is currently an unbounded opportunity for any proficient software engineer. People are still stuck at the level of thinking about singular AI coworkers - it will be possible to have thousands of agents working on the same problems at the same time. -
We Tried OpenAI’s New Deep Research—Here’s What We Found
OpenAI’s ‘deep research’ is a new agentic research assistant that can compile full-blown research reports. Unlike previous versions of ChatGPT, it runs fully autonomously and works through many steps before returning an answer. The assistant can take anywhere from one to 30 minutes to research an answer depending on the complexity of the question - users can follow its work in real time. This article looks at how ‘deep research’ works, how well it works, and the impact the technology will hav… -
Chatbot Software Begins to Face Fundamental Limitations
LLMs are showing fundamental limitations in solving complex, multi-step reasoning problems, particularly those requiring compositional tasks. Research shows that LLMs struggle with these tasks because their underlying transformer architecture has inherent mathematical limits on their ability to handle increasingly complex computations. While techniques like chain-of-thought prompting and embedding tricks can improve performance on some tasks, these are essentially workarounds, not solutions t… -
Where Will the AI Horde Strike Next
AI disruption isn’t one big invasion; it’s a wave of startups simultaneously probing every market niche, from Hollywood to social media. Legacy players can’t just defend one fortress; there’s always another angle for AI founders. AI’s expansion is like a roving horde of nimble companies. You either partner up or risk being overrun—because the AI horde can adapt, pivot, and push wherever it sees weakness. -
IBM’s $7B Bet on Vertical AI and What It Means for SaaS Founders
IBM invests $7 billion in vertical AI, focusing on industry-specific solutions through its watsonx platform. The approach targets economic efficiencies, leveraging smaller models and techniques like InstructLab for significant cost and time savings. Strategic partnerships with firms like ServiceNow and Salesforce aim to accelerate enterprise AI adoption, offering SaaS founders new opportunities for domain-specific innovation. -
Claude’s Extended Thinking Mode
Anthropic’s extended thinking mode, introduced in Claude 3.7 Sonnet, allows the model to allocate more cognitive effort to complex problems and make its thought process visible for greater transparency and trust. -
Advancing game ideation with Muse: the first World and Human Action Model (WHAM)
Microsoft Research has introduced “Muse,” an AI model capable of generating video game visuals and gameplay sequences. Developed with Xbox Game Studios’ Ninja Theory, Muse was trained on extensive gameplay data and is now open-sourced. The WHAM Demonstrator enables interaction with the model, highlighting its potential for new creative uses in game development. -
Competitive Programming with Large Reasoning Models
An OpenAI paper on applying its o-series reasoning models to competitive programming. The authors found that earlier models required hand-crafted inference strategies, but later versions of o3 scored well without human intervention. -
Sam Altman Regrets Ditching Open Source, Says He’s Been on the “Wrong Side of History”
Chinese AI startup DeepSeek demonstrated its ability to replicate OpenAI’s chatbots at a significantly lower cost, reigniting the open-source debate in the AI industry. -
Why everyone is freaking out about DeepSeek
DeepSeek’s AI models, which are significantly cheaper to train compared to other leading models, have disrupted the AI market, potentially challenging Nvidia and other tech giants by showcasing efficient use of resources. This has shaken investor confidence in the AI sector, which traditionally believed that more spending equated to better performance. DeepSeek’s success suggests that innovation, rather than just financial investment, could redefine the competitive landscape.
This Month in Tech: February 2025
TLDR of the TLDR: February 2025 in Tech