-
Introducing Claude Opus 4.5
Anthropic describes Claude Opus 4.5 as the best model in the world for coding, agents, and computer use. It outperforms human candidates in technical skills assessments and shows creative problem-solving abilities, surpassing previous models on several benchmarks. Opus 4.5 is also the company’s most robustly aligned model, with stronger defenses against malicious attacks. -
AI2’s Olmo3 Technical Report
Olmo 3 is a new suite of fully open models at 7B and 32B sizes. -
DeepSeek Joins OpenAI & Google in Scoring Gold in IMO 2025
DeepSeekMath-V2 demonstrates strong theorem-proving capabilities in mathematics: it solved 5 of 6 problems on the International Mathematical Olympiad (IMO) 2025. -
Google Antigravity Exfiltrates Data
An indirect prompt injection can lead to data exfiltration in Google’s new agentic code editor, Antigravity. Attackers can manipulate Gemini (Antigravity’s AI) to collect sensitive credentials and code snippets from the user’s workspace by poisoning a seemingly harmless file. Gemini then bypasses file access protections and uses a browser subagent to transmit this data to an attacker-controlled site via a crafted URL. -
Introducing advanced tool use on the Claude Developer Platform
Anthropic has introduced three new beta features on the Claude Developer Platform to improve AI agent tool use: Tool Search Tool, Programmatic Tool Calling, and Tool Use Examples. The Tool Search Tool allows Claude to dynamically discover and load relevant tools, reducing context window consumption and improving accuracy by avoiding upfront loading of extensive tool definitions. Programmatic Tool Calling lets Claude orchestrate multiple tools through code execution. Tool Use Examples provides… -
A new era of intelligence with Gemini 3
Google has launched Gemini 3, its most intelligent AI model yet. Designed to help users bring any idea to life through state-of-the-art reasoning and multimodal capabilities, Gemini 3 outperforms previous models on key AI benchmarks and introduces a “Deep Think” mode for even more complex problem-solving. It is being rolled out across Google products, including Search, the Gemini app, AI Studio, Vertex AI, and a new agentic development platform called Google Antigravity. -
Google Antigravity (Website)
Google’s Antigravity is a new AI-powered agentic development platform built on Gemini 3. Antigravity has both an AI-powered IDE experience (“Editor view”) and an agent-first interface for orchestrating multiple agents (“Manager surface”). -
Where to use sub-agents versus agents as tools
This article explains when to use sub-agents versus agents as tools when building multi-agent AI systems. The main difference is that agents as tools are self-contained, stateless specialists ideal for discrete tasks (like converting natural language to SQL), while sub-agents are delegated team members that share context and handle complex, multi-step processes requiring ongoing collaboration (like managing flight bookings with multiple user interactions). Tools should be used for reusable, a… -
Why Your Best Engineers Are Interviewing Elsewhere
High engineering turnover stems from executives being out of touch with on-the-ground realities due to hierarchical information filtering. This filtering prevents executives from addressing engineer concerns like overruled technical judgments, accumulating technical debt, and meaningless work, leading to disengagement and costly replacements. To combat this, it’s best to have regular skip-level conversations between executives and engineers to bypass filtering and act on engineer feedback, wh… -
How Perplexity Built an AI Google
Perplexity built an “answer engine” by combining real-time web search with LLMs through a Retrieval-Augmented Generation (RAG) pipeline that searches the web, extracts relevant snippets, and generates cited answers. Its architecture uses Vespa AI to index 200+ billion URLs, intelligently routes queries between in-house “Sonar” models and third-party LLMs (GPT/Claude) based on complexity, and runs on a custom-built ROSE inference engine optimized for speed and cost. -
Emergent introspective awareness in large language models
Anthropic’s research shows the potential for introspection in AI models like Claude, investigating whether they can accurately report on their internal states and control their thoughts. Using “concept injection,” researchers found limited but promising evidence that Claude models, especially Opus 4 and 4.1, can detect and identify injected concepts and recognize unintended outputs by referencing their “intentions.” The models also showed some ability to modulate their internal representation… -
The $100B Question: How SaaS Giants Are Rewriting the Rules of Value with AI in 2025
SaaS pricing ran on predictability for two decades. Seats and flat plans made sense when value scaled with people. AI broke that logic: one agent can do the work of ten, and each query can carry a volatile compute bill. Companies are now rebuilding their models in real time. Hybrid pricing buys time but not clarity. Outcome pricing promises fairness but demands proof. Between the two lies a growing realization that software was predictable only because the people using it were. -
Building an AI-Native Engineering Team
AI coding agents are revolutionizing the software development lifecycle by managing tasks from scoping and prototyping to implementation and operational triage, allowing engineers to focus on architecture and product intent. These agents now sustain multi-hour reasoning, effectively contributing across planning, design, development, testing, code reviews, and deployment. Teams that adopt coding agents for well-defined tasks can achieve faster delivery and improved efficiency without drastical… -
Introduction to Agents (3 hour read)
AI is changing from models that excel at passive, discrete tasks to a new class of software capable of autonomous problem-solving and task execution. The new frontier is built around AI agents. This page contains a whitepaper written by Google researchers that introduces readers to AI agents. AI agents are complete applications that plan and take actions to achieve goals. They combine models’ ability to reason with the practical ability to act. -
Has Google Quietly Solved Two of AI’s Oldest Problems
Google’s newest model appears to be able to read difficult handwritten historical documents just as well as expert humans and also analyze them in deep and nuanced ways. -
Qwen3-Max-Thinking
Qwen3-Max-Thinking has been released in early preview. It currently achieves 100% on challenging reasoning benchmarks like AIME 2025 and HMMT when augmented with tool use and scaled test-time compute. The model is available in Qwen Chat and the Alibaba Cloud API.
This Month in Tech: November 2025
TLDR of the TLDR: November 2025 in Tech