This Month in Tech: December 2025

TLDR of the TLDR: December 2025 in Tech

  1. The Junior Hiring Crisis
    The tech industry is facing a crisis in hiring junior-level engineers, driven by AI automation replacing entry-level roles and senior engineers deprioritizing mentorship. Universities are aware that students are struggling to find jobs, but they can do little about it. As a result, juniors should focus on developing their networks and their interpersonal skills.

  2. The Age of Scaling Is Over!
    AI labs have spent five years betting that more data and more GPUs would solve everything. That bet is no longer paying off. As scaling laws flatten, the industry is shifting focus to “test-time compute”, forcing models to spend more resources reasoning during a task rather than just training bigger brains.

  3. State of AI
    This year was a turning point in the real-world use of large language models. The field shifted from single-pass pattern generation to multi-step deliberative inference, and the shift unfolded so fast that our understanding of how these models are used in practice has lagged behind. This study leverages the OpenRouter platform to analyze over 100 trillion tokens of real-world AI interactions to see how the technology is actually being used.

  4. Clopus-Watcher: An autonomous monitoring agent
    AI will likely make 24/7 on-call a thing of the past. 24/7 monitoring is a lot simpler than the development process. There are often reference documents that engineers can follow to bring systems back up, and if they fail, there’s always a backup and recovery plan in place. On-call jobs have always been more systematic. This post introduces an autonomous monitoring agent that does what an on-call engineer would do, but autonomously, forever.
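    The post describes the loop an on-call engineer follows: detect a failure, walk the reference runbook, and escalate to the backup plan if recovery fails. A minimal sketch of that loop, where the runbook structure and all function names are hypothetical stand-ins for real probes and recovery actions:

    ```python
    # Hypothetical runbook: maps a detected failure to ordered recovery steps,
    # mirroring the reference documents an on-call engineer would follow.
    RUNBOOK = {
        "api_down": ["restart api service", "clear connection pool", "failover"],
    }

    def check_health():
        """Stand-in for real probes (HTTP checks, metrics queries).
        Returns a list of failure names; empty when everything is healthy."""
        return []

    def attempt_step(step):
        """Stand-in for executing one recovery action; True on success."""
        return True

    def monitor_once():
        """One iteration of the agent: detect, walk the runbook, escalate."""
        for failure in check_health():
            recovered = False
            for step in RUNBOOK.get(failure, []):
                if attempt_step(step):
                    recovered = True
                    break
            if not recovered:
                return f"escalate: {failure}"  # fall back to recovery plan or page a human
        return "healthy"
    ```

    Run in a scheduler (or a `while` loop with a sleep), this is the "systematic" part of on-call the post argues an agent can take over.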

  5. AI’s real superpower: consuming, not creating
    AI’s real superpower is in consumption, not creation. Most people should use it less for generating content and more for its analytical capabilities. Connecting AI to a knowledge base, such as an Obsidian vault filled with notes and reflections, and having it act as a research assistant, allows it to uncover hidden patterns and connect disparate ideas across multitudes of information that would be impossible for a human to process.
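    The workflow the post describes — pointing a model at a vault of notes and asking it to find connections — mostly amounts to gathering the notes into one analytical prompt. A minimal sketch of that assembly step (the model call itself is vendor-specific and left out; the prompt wording is illustrative):

    ```python
    from pathlib import Path

    def collect_notes(vault_dir, query, max_chars=4000):
        """Gather Markdown notes from an Obsidian-style vault and build a
        research-assistant prompt asking the model to connect ideas."""
        chunks = []
        total = 0
        for path in sorted(Path(vault_dir).rglob("*.md")):
            text = path.read_text(encoding="utf-8")
            snippet = f"## {path.stem}\n{text}\n"
            if total + len(snippet) > max_chars:
                break  # stay within a rough context budget
            chunks.append(snippet)
            total += len(snippet)
        return (
            "You are a research assistant. Find patterns and connections "
            f"across these notes relevant to: {query}\n\n" + "".join(chunks)
        )
    ```

    Real setups would add retrieval or chunk ranking rather than a simple size cutoff, but the consumption-over-creation point is the same: the model reads far more than any human reviewer could.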

  6. AI agents are starting to eat SaaS
    AI agents allow companies to build customized solutions more easily and reduce reliance on external SaaS. The shift is already visible as engineers use agents to quickly create internal dashboards, code wrappers, and UI/UX mockups. As a result, SaaS companies, especially those offering simpler back-office or CRUD-based tools, are seeing pressure on their net revenue retention.

  7. Yes, AGI Can Happen – A Computational Perspective
    Today’s models are nowhere near the compute or efficiency ceilings of the hardware, and there’s at least an order of magnitude more computation available.

  8. Claude Agent Skills: A First Principles Deep Dive
    Claude’s Agent Skills system is a prompt-based architecture that extends LLM capabilities through specialized instruction injection rather than traditional function calling or code execution. Skills are prompt templates that modify conversation context and execution permissions. Claude makes skill selection decisions through pure language understanding rather than algorithmic routing.
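    The architecture the deep dive describes — skills as prompt templates, selected by language understanding and spliced into context with associated permissions — can be sketched roughly as follows. All names and structures here are illustrative, not Claude's actual implementation:

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class Skill:
        name: str
        description: str   # what the model reads to judge relevance
        instructions: str  # the prompt template injected when selected
        allowed_tools: list = field(default_factory=list)  # execution permissions

    def inject_skill(conversation, skill):
        """Selection happens inside the model via language understanding, not
        algorithmic routing; once a skill is chosen, its instructions are
        injected into the conversation context as a system message."""
        return [{"role": "system", "content": skill.instructions}] + conversation

    # Hypothetical skill and conversation.
    pdf_skill = Skill(
        name="pdf-processing",
        description="Fill out and extract data from PDF forms",
        instructions="When handling PDFs, use the bundled scripts...",
        allowed_tools=["bash"],
    )
    history = [{"role": "user", "content": "Fill in this PDF form."}]
    context = inject_skill(history, pdf_skill)
    ```

    The contrast with function calling is that nothing here is a schema-validated API call: the "skill" changes what the model is told and what it is permitted to do, not which function gets invoked.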

  9. Introducing: Devstral 2 and Mistral Vibe CLI
    Mistral AI has launched Devstral 2 and Devstral Small 2, its next-generation open-source agentic coding models, along with the Mistral Vibe CLI for end-to-end code automation. Devstral 2 (123B) is a state-of-the-art model establishing new benchmarks for open-weight code agents. The Mistral Vibe CLI provides an open-source command-line interface that uses Devstral models to autonomously explore, modify, and execute changes across entire codebases.

  10. Nobody “Wants” AI
    Henry Ford’s customers didn’t want cars; they wanted to get from point A to point B, and cars simply fit that demand better than horses.

  11. We removed 80% of our agent’s tools
    Vercel spent months building a sophisticated internal text-to-SQL agent with specialized tools, heavy prompt engineering, and careful context management. It mostly worked, but it was fragile, slow, and required constant maintenance. The team then deleted most of it and stripped the agent down to a single tool that executed arbitrary bash commands. The agent got simpler and better at the same time, reaching a 100% success rate instead of 80%.
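    The stripped-down design — an agent whose only tool executes bash commands — is a very small loop. A sketch of the pattern (not Vercel's code; `ask_model` stands in for a real LLM call, and the `bash:` action convention is invented for illustration):

    ```python
    import subprocess

    def run_bash(command):
        """The agent's single tool: run a bash command, return its output."""
        result = subprocess.run(
            ["bash", "-c", command], capture_output=True, text=True, timeout=30
        )
        return result.stdout + result.stderr

    def agent_loop(ask_model, task, max_steps=10):
        """Minimal single-tool agent: each turn, the model either emits a
        bash command or a final answer."""
        transcript = [f"Task: {task}"]
        for _ in range(max_steps):
            action = ask_model("\n".join(transcript))
            if action.startswith("bash:"):
                command = action[len("bash:"):].strip()
                output = run_bash(command)
                transcript.append(f"$ {command}\n{output}")
            else:
                return action  # the model chose to answer directly
        return "step limit reached"
    ```

    The appeal is exactly what the post found: with one general tool, there is no tool-routing layer to prompt-engineer or maintain; the model composes shell commands itself.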

  12. Introducing Bloom: an open source tool for automated behavioral evaluations
    Anthropic’s Bloom is an open-source tool for generating automated behavioral evaluations of AI models. Bloom assesses specific behaviors like self-preferential bias and sabotage by creating scenarios and quantifying behavior occurrence across models. It efficiently differentiates between aligned and misaligned models and correlates strongly with human judgment, enabling scalable and reliable behavior evaluations.

  13. Agent Skills Becomes an Open Standard
    Agent Skills, folders of instructions, scripts, and resources that give AI agents new capabilities on demand, originated at Anthropic (which also created MCP) and is now an open format with adoption from Cursor, GitHub, VS Code, Claude Code, and OpenAI’s Codex CLI. Skills let teams package domain expertise and workflows into portable, version-controlled packages that work across different agent products.

  14. If AI could interact with and learn from the physical world, could it make more scientific advances?
    OpenAI partnered with Red Queen Bio to introduce a new evolutionary framework that uses GPT-5 to optimize cloning protocols. The project piloted an autonomous robot to carry out the protocols. GPT-5 achieved a 79x gain in cloning efficiency and introduced a new enzyme-based approach. While this is not a biology breakthrough, the novel optimization shows that GPT-5 performs at the level of a competent PhD student on this task.

  15. NVIDIA Debuts Nemotron 3 Family of Open Models
    Nvidia released Nemotron 3 Nano (30B parameters, 3B active), with Super (100B) and Ultra (500B) coming in early 2026; Nano’s benchmark scores rival or exceed those of closed-source rivals. Nvidia is publishing training data and releasing libraries for agent customization in what appears to be an attempt to undermine OpenAI, Google, and Anthropic, which are increasingly developing their own chips instead of buying Nvidia’s.

  16. OLMo 3: A Deep Dive Into the Fully-Open LLM
    AI2 published the most comprehensive starting point for open LLM research: all checkpoints, training data, and code for OLMo 3. The walkthrough covers every stage from data mixing through the full SFT/DPO/RLVR post-training stack, including OlmoRL infrastructure that cuts RL training from 15 days to 6 days. RL with random rewards, which worked on Qwen, fails on OLMo 3.

  17. How AI Is Transforming Work at Anthropic
    Anthropic surveyed 132 of its own engineers and found that 27% of Claude-assisted work wouldn’t have happened otherwise, but the gains come with costs. Mentorship is declining as Claude becomes the first stop for questions, and engineers worry that the skills needed to supervise AI are the same ones atrophying from overreliance on it.

  18. Introducing Runway Gen-4.5: A New Frontier for Video Generation
    Gen-4.5 topped the Artificial Analysis text-to-video benchmark, ahead of Veo 3 and Sora. The model emphasizes physical accuracy (realistic momentum, fluid motion, and material coherence). Runway has acknowledged persisting issues with object permanence.

  19. Nvidia Launches Vision Language Model for Autonomous Vehicles
    Nvidia introduced Alpamayo-R1, the first open-source vision language action model designed for autonomous driving, at NeurIPS. Alpamayo-R1 integrates visual and textual reasoning to enhance decision-making in real-world environments.

  20. The Thinking Game (Website)
    The Thinking Game is a documentary that covers the pivotal moments over five years at DeepMind, an AI company founded by Demis Hassabis.