This Month in Tech: January 2025

TLDR of the TLDR: January 2025 in Tech

  1. Why DeepSeek had to be open-source (and why it won’t defeat OpenAI)
    DeepSeek open-sourced its reasoning model R1, rivaling OpenAI’s o1 in performance at a fraction of the cost thanks to innovative training methods. This open-source approach, counterintuitive by standard business logic, was necessary for DeepSeek to overcome Western skepticism toward a Chinese AI company and gain market access. As LLMs become increasingly commoditized, with similar performance available across many models, the higher cost of proprietary models like OpenAI’s becomes less appealing.

  2. OpenAI launches Operator, an AI agent that performs tasks autonomously
    OpenAI has launched a research preview of Operator, a general-purpose AI agent that can control a web browser and independently perform certain actions. Operator is coming first to US users on ChatGPT Pro but will eventually roll out to the Plus, Team, and Enterprise tiers. OpenAI plans to integrate Operator into all of its ChatGPT clients. Operator is powered by a Computer-Using Agent that combines the vision capabilities of GPT-4o with reasoning abilities from the company’s more advanced models.
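    Under the hood, agents like this typically run an observe-decide-act loop: capture the screen, ask the model for the next action, execute it, and repeat. Here is a minimal sketch of that loop (the browser and model interfaces are hypothetical illustrations, not OpenAI’s actual API):

    ```python
    # Illustrative observe-decide-act loop for a computer-using agent.
    # `browser` and `model` are hypothetical stand-ins, not OpenAI's API.
    def run_agent(goal, browser, model, max_steps=20):
        history = []
        for _ in range(max_steps):
            screenshot = browser.screenshot()                 # observe the page
            action = model.decide(goal, screenshot, history)  # vision + reasoning
            if action["type"] == "done":
                return action["result"]
            if action["type"] == "click":
                browser.click(action["x"], action["y"])       # act with mouse
            elif action["type"] == "type":
                browser.type_text(action["text"])             # act with keyboard
            history.append(action)
        raise TimeoutError("agent did not finish within the step budget")
    ```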

  3. DeepSeek Crushes OpenAI o1 with an MIT-Licensed Model—Developers Are Losing It
    Chinese AI research lab DeepSeek has unveiled its latest reasoning models, DeepSeek-R1 and DeepSeek-R1-Zero. DeepSeek-R1 is fully open-source and distributed under the MIT license. The lab has also released six distilled models, ranging from 1.5 billion to 70 billion parameters, designed to address tasks in math, code generation, and reasoning.

  4. Google reports halving code migration time with AI help
    Google computer scientists have released a paper describing how the company uses AI for internal code migrations. The company focuses on bespoke AI tools developed for specific product areas rather than generic AI tools that provide broadly applicable services. While humans were still needed to double-check the AI’s work, the authors estimate that AI helped reduce the time required to complete a code migration by 50%. The researchers note that large language models should be used in conjunction with other tools and techniques.
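    The paper describes bespoke internal tooling, but the general shape of an LLM-assisted migration pipeline is easy to sketch: have the model propose an edit per file, gate it behind automated tests, and leave the final call to a human reviewer. All helper interfaces below are hypothetical:

    ```python
    # Hypothetical sketch of an LLM-assisted code migration loop:
    # the model proposes edits, tests gate them, a human reviews.
    def migrate(files, model, run_tests, request_human_review):
        for path in files:
            with open(path) as f:
                original = f.read()
            prompt = ("Rewrite this file to use the new API instead of "
                      "the deprecated one, changing nothing else:\n\n" + original)
            patched = model.complete(prompt)
            if not run_tests(path, patched):      # machine check first
                continue                          # reject edits that break the build
            if request_human_review(path, original, patched):
                with open(path, "w") as f:
                    f.write(patched)              # land only human-approved edits
    ```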

  5. o3, Oh My
    OpenAI presented o3 the Friday before Christmas. o3 improves on some of the most challenging benchmarks and represents substantial progress in general-domain reasoning with reinforcement learning. It potentially ranks as one of the best competitive programmers in the world. This article looks at o3, its strengths and weaknesses, how o3 impacts the AI industry, and much more.

  6. Everything in Business is About Fighting Entropy — Here’s How Rippling Does It
    Rippling COO Matt MacInnis identifies organizational entropy as the main challenge for scaling startups, emphasizing the need for executives to maintain high standards and decisiveness to counteract it. He outlines strategies such as under-staffing, focusing on inputs, and leveraging impatience to keep teams agile and accountable. MacInnis further highlights the importance of maintaining a founder’s mindset and building wisdom through decision-making practice.

  7. AI2’s New Model Surpasses DeepSeek V3
    AI2’s Tulu-3, a 405B parameter open-weight language model, surpasses DeepSeek V3 and even OpenAI’s GPT-4o on key benchmarks.

  8. Google quietly announces its next flagship AI model
    Google revealed its next-gen flagship AI model, Gemini 2.0 Pro Experimental, in a changelog for its Gemini chatbot app. The model was made available to Gemini Advanced users on Thursday, but Google has since removed the mention from its changelog. The new model offers better factuality and stronger performance on coding and mathematics tasks. It is still in early preview and can display unexpected behaviors; it doesn’t have access to real-time information and won’t work with some of Gemini’s other features.

  9. DeepSeek FAQ
    DeepSeek’s R1 model, which is comparable to OpenAI’s o1, caused a meltdown in the tech industry over the weekend. Many of the revelations that contributed to the meltdown were already included in DeepSeek’s V3 announcement over Christmas, and most of the breakthroughs in V3 were actually revealed with the release of the V2 model last year. DeepSeek-V2 introduced two important breakthroughs, DeepSeekMoE and DeepSeekMLA; their implications only became apparent with V3, which added a new approach to load balancing and multi-token prediction in training.
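    To see why DeepSeekMLA (multi-head latent attention) matters for serving costs, compare per-token KV-cache sizes with and without latent compression. The dimensions below are illustrative assumptions, not DeepSeek’s actual configuration:

    ```python
    # Back-of-envelope KV-cache comparison (assumed dimensions, not
    # DeepSeek's real config). Standard attention caches full per-head
    # keys and values; MLA caches one small latent vector per layer.
    n_layers, n_heads, head_dim = 60, 128, 128
    latent_dim = 512      # assumed compressed KV latent size
    bytes_fp16 = 2

    standard = n_layers * n_heads * head_dim * 2 * bytes_fp16  # K and V
    mla = n_layers * latent_dim * bytes_fp16                   # one latent

    print(f"standard: {standard / 1e6:.2f} MB per token")      # ~3.93 MB
    print(f"MLA:      {mla / 1e6:.2f} MB per token")           # ~0.06 MB
    ```

    Under these assumptions that is roughly a 64x reduction per token, the kind of saving that makes long-context inference dramatically cheaper.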

  10. Meta is reportedly scrambling ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price
    Meta’s Mark Zuckerberg responded to the release of DeepSeek R1 by assembling four war rooms of engineers to figure out how the Chinese hedge fund-backed lab built its AI. DeepSeek’s model may outperform the next version of Meta’s Llama, due for release in early 2025. Meta is working to decipher how DeepSeek lowered the cost of training and running its model, what data the lab used for training, and how Meta might restructure its own models based on DeepSeek’s attributes.

  11. Perplexity launches Sonar, an API for AI search
    Perplexity has launched Sonar, an API that allows enterprises and developers to integrate its generative AI search tools into applications, providing real-time, citation-backed answers.
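    Perplexity documents Sonar as an OpenAI-compatible chat-completions endpoint; a minimal request might look like the sketch below (endpoint, model name, and citations field per the launch docs, and subject to change):

    ```python
    # Minimal Sonar request via Perplexity's OpenAI-compatible endpoint.
    # Verify field names against the current docs before relying on them.
    import requests

    resp = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": "sonar",
            "messages": [{"role": "user", "content": "What happened in AI this month?"}],
        },
        timeout=30,
    )
    data = resp.json()
    print(data["choices"][0]["message"]["content"])  # the generated answer
    print(data.get("citations", []))                 # URLs backing the answer
    ```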

  12. OpenAI’s agent tool may be nearing release
    A software engineer claims to have uncovered evidence of OpenAI’s long-rumored Operator, an AI tool that can take control of users’ PCs and perform actions on their behalf. OpenAI may be releasing the tool this month. The company’s site and apps now contain hidden references to Operator. While AI agents are risky and speculative, tech giants are already touting them as the next big thing in AI. Analysts say the market for AI agents could be worth $47.1 billion by 2030.

  13. Millions of Accounts Vulnerable Due to Google’s OAuth Flaw
    A flaw in Google’s “Sign in with Google” OAuth flow allows attackers to access the accounts of former employees at defunct startups by purchasing the startup’s old domain. The vulnerability affects millions of Americans, granting access to sensitive data across SaaS platforms like Slack and Notion, as well as HR systems containing Social Security numbers. Google is working on a fix.
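    As reported, the root cause is that many downstream apps key accounts on the email and domain claims in Google’s ID token, which a new owner of the domain can reproduce, rather than on the stable per-account sub claim. A simplified illustration of the difference:

    ```python
    # Why domain re-registration breaks email-keyed logins (simplified).
    # The email/hd claims in an ID token are reproducible by whoever owns
    # the domain; `sub` is Google's stable per-account identifier.
    def find_account_fragile(accounts_by_email, id_token):
        # Vulnerable: buying the defunct domain lets an attacker mint a
        # Workspace user with the same email and inherit this account.
        return accounts_by_email.get(id_token["email"])

    def find_account_robust(accounts_by_sub, id_token):
        # Safer: key on the immutable `sub` claim instead.
        return accounts_by_sub.get(id_token["sub"])
    ```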

  14. Inside Meta’s race to beat OpenAI: “We need to learn how to build frontier and win this race”
    Data scarcity has led AI developers to look for new ways to obtain unique data. Documents from a major copyright lawsuit against Meta show internal discussions about avoiding media coverage suggesting that the company used a dataset it knew to be pirated. Meta reportedly used the book piracy site LibGen to train its AI systems, and the documents hint that OpenAI and Mistral may also be using the library for their models. Meta and other AI companies have previously argued that training on copyrighted material constitutes fair use.

  15. Amidst the Noise and Haste, Google Has Successfully Pulled a SpaceX
    Google’s long-term, vertically integrated approach to AI, from chip design (TPUs) to applications, mirrors SpaceX’s success in space launch, creating a massive cost and performance advantage.

  16. How AI-assisted coding will change software engineering: hard truths
    There are two developer types: “bootstrappers,” who use AI for rapid prototyping, and “iterators,” who use AI in daily workflows. While AI dramatically accelerates development, it introduces challenges like the “70% problem”: reaching the final 30% of software quality still requires real human expertise. As a result, AI makes the lives of experienced software engineers easier but doesn’t replace them.

  17. Reflections
    Sam Altman reflects on OpenAI’s journey, the unexpected success of ChatGPT’s launch, and the subsequent rapid growth. It was challenging and messy to build a company around groundbreaking technology, as shown by his own public firing and subsequent reinstatement. He’s learned a lot and will be more confident moving forward.

  18. Things we learned about LLMs in 2024
    This review of LLMs in 2024 highlights the year’s significant developments. The cost of using top-tier LLMs plummeted due to competition and improved efficiency, while multimodal capabilities became increasingly common. GPT-4 is no longer the top LLM, having been surpassed by models like Gemini 2.0 and OpenAI’s o1 and o3. Many challenges remain, such as the unreliability of LLMs, the need for better evaluation methods, and the uneven distribution of knowledge about how to use these tools well.

  19. How AI is unlocking ancient texts — and could rewrite history
    Artificial neural networks are being used to decipher ancient texts. The technology is producing more data for scholars than they’ve had for centuries. It is making sense of vast archives, filling in missing and unreadable characters, and decoding rare and lost languages. The technology promises a fundamentally new way to explore ancient sources.

  20. Luma AI releases Ray2
    Ray2 is a large-scale generative video model that sets a new standard for realistic visuals with natural, coherent motion and logical event sequences. It was trained on Luma’s new multi-modal architecture with 10x the compute of Ray1.

  21. Mistral Small 3
    Mistral has released a powerful 24B-parameter model that achieves strong performance, especially on multilingual tasks. Its size strikes a good balance between capability and ease of deployment.

  22. Chain of RAG
    Reasoning models can now be trained to perform retrieval iteratively. This is similar in spirit to the approach behind the Operator system. Interestingly, it leads to a substantial improvement, although it’s not yet clear what the true FLOP-controlled gain looks like.
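    The core loop of chain-of-retrieval is simple to sketch: the model generates a sub-query from everything gathered so far, retrieves against it, and repeats until it decides it can answer. The model and retriever interfaces below are hypothetical:

    ```python
    # Minimal chain-of-retrieval loop (illustrative interfaces).
    # Each hop conditions the next sub-query on prior retrievals.
    def chain_of_rag(question, model, retriever, max_hops=4):
        context = []
        for _ in range(max_hops):
            sub_query = model.next_query(question, context)  # decompose step by step
            if sub_query is None:                            # model has enough evidence
                break
            docs = retriever.search(sub_query, k=5)
            context.append((sub_query, docs))
        return model.answer(question, context)               # final grounded answer
    ```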

  23. Google is building its own ‘world modeling’ AI team for games and robot training
    Google DeepMind is forming a team led by Tim Brooks to develop AI “world models” for simulating physical environments. These models aim to enhance real-time interactive media and training scenarios, contributing to Google’s pursuit of AGI. The initiative will collaborate with existing Google AI projects like Gemini and Veo.

  24. Can AI do maths yet? Thoughts from a mathematician
    OpenAI’s new language model o3 scored 25% on FrontierMath, a collection of challenging math problems curated by Epoch AI. Experts note that many of the problems o3 likely solved sit at the easier, undergraduate end of the dataset. Concerns remain about AI’s ability to tackle more complex mathematical proofs, as current performance in logical reasoning still lags expert human capability.

  25. The Next Great Leap in AI Is Behind Schedule and Crazy Expensive
    OpenAI’s GPT-5 project, codenamed Orion, faces delays and high costs due to unexpected challenges and a lack of diverse data sources.