This Month in Tech: August 2025

TLDR of the TLDR: August 2025 in Tech

  1. Claude Sonnet 4 now supports 1M tokens of context
    Anthropic’s Claude Sonnet 4 now supports a 1 million token context window, enabling more extensive, data-intensive use cases such as large-scale code analysis and document synthesis.

  2. AI’s labor impact, and how to not lose our minds
    Stanford researchers analyzed payroll data from 25 million American workers and found AI is already displacing early-career workers in specific fields. Software developers aged 22-25 have seen employment fall 6% since ChatGPT launched, while developers over 30 have seen 9% job growth over the same period. Customer service shows the same pattern - young workers losing jobs, older workers gaining them. The data suggests AI isn’t coming for all jobs or even most jobs, but it’s systematically eliminating the…

  3. DeepMind reveals Genie 3 “world model” that creates real-time interactive simulations
    Google DeepMind’s Genie 3 model can create interactive worlds from prompts or images. It can create continuously generated environments that can be changed on the fly. The ability to create alterable 3D environments could make games more dynamic for players and offer developers new ways to prove out concepts and level designs. Examples of environments generated by the model are available in the article.

  4. GPT-5’s rollout fell flat for consumers, but the AI model is gaining where it matters most
    GPT-5’s rollout wasn’t about the consumer - it was OpenAI’s effort to crack the enterprise market. Startups like Cursor, Vercel, and Factory have already made GPT-5 the default model in certain key products and tools due to its faster setup, better results on complex tasks, and lower price. OpenAI has built out an enterprise sales team with more than 500 people. Enterprise demand is rising sharply, especially for planning and multi-step reasoning tasks.

  5. From GPT-2 to gpt-oss: Analyzing the Architectural Advances
    This post compares OpenAI’s new open-weight LLMs, gpt-oss-20b and gpt-oss-120b, with GPT-2 and Qwen3. There are architectural advances such as removing dropout, using RoPE for positional embeddings, incorporating Mixture-of-Experts, and more. The post also discusses the training process, reasoning abilities, optimization for single-GPU use, and benchmark performance of the gpt-oss models.

  6. OpenAI launches GPT-5 free to all ChatGPT users
    OpenAI has announced GPT-5 in three variants: GPT-5 Pro, GPT-5 mini, and GPT-5 nano. Some of the models will be available across all ChatGPT tiers, including to free users. OpenAI claims the new model family comes with reduced confabulations, improved coding capabilities, and a new approach to handling sensitive requests. Free users now have access to a simulated reasoning AI model. GPT-5 Pro is replacing o3-pro in ChatGPT for those subscriber tiers with access to it.

  7. Read That Code!
    It’s important to read AI-generated code, despite the efficiency of tools like Claude Code. Otherwise, the codebase risks weakened architecture, and developers risk losing implementation knowledge. There needs to be a distinction between asynchronous tasks suitable for “auto-accept mode” and synchronous coding for core features requiring careful oversight.

  8. RAG is Dead, Context Engineering is King
    Chroma’s founder argues RAG is dead because it bundles three distinct concepts poorly - the real job is context engineering: figuring out what belongs in the context window for each LLM generation. The best teams now use a two-stage approach: first-stage retrieval (vector search, text search, and metadata filters) culls 10,000 candidates down to 300, then LLMs re-rank those to the final 20-30. This matters because context rot is real - despite perfect needle-in-a-haystack marketing, models degrade as you add tokens.
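    The two-stage shape described above can be sketched in a few lines. This is a minimal illustration, not Chroma’s actual pipeline: the toy cosine-similarity scoring, the stored `relevance` field standing in for an LLM re-ranker, and names like `first_stage_retrieve` are all assumptions for the example.

```python
import random

def first_stage_retrieve(query_vec, corpus, k=300):
    """Cheap first stage: score every candidate (here, cosine similarity on
    toy vectors) and keep the top k. In a real system this stage combines
    vector search, text search, and metadata filters."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0
    return sorted(corpus, key=lambda d: cosine(query_vec, d["vec"]),
                  reverse=True)[:k]

def rerank(candidates, k=25):
    """Expensive second stage: placeholder for an LLM re-ranker that reads
    each shortlisted candidate and keeps the final 20-30. Here a stored
    relevance score stands in for the LLM's judgment."""
    return sorted(candidates, key=lambda d: d["relevance"], reverse=True)[:k]

random.seed(0)
corpus = [{"id": i,
           "vec": [random.random() for _ in range(8)],
           "relevance": random.random()}
          for i in range(10_000)]
query_vec = [random.random() for _ in range(8)]

shortlist = first_stage_retrieve(query_vec, corpus, k=300)  # 10,000 -> 300
final = rerank(shortlist, k=25)                             # 300 -> 25
print(len(shortlist), len(final))  # 300 25
```

    The design point is the cost asymmetry: the cheap stage touches every candidate, while the expensive re-ranker only ever sees a few hundred.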

  9. The State of AI 2025
    AI startups are growing like nothing we’ve ever seen - some hit $100M ARR in their first year. Bessemer studied 20 companies and found two types: Supernovas that explode to $125M by year two but have 25% gross margins and fragile retention, and Shooting Stars that quadruple yearly with 60% margins. The new benchmark isn’t T2D3 anymore, it’s Q2T3 (quadruple, quadruple, triple, triple, triple).

  10. When Building Gets Easy, Winning Gets Hard
    Sam Altman says we’re entering the “fast fashion era of SaaS,” and he’s right. AI split software into two camps - the hard builds like infrastructure and foundation models, and the easy builds you can ship in a weekend with Cursor. When building gets easy, winning gets exponentially harder because everyone has the same product. The margin reality is brutal. Replit swings between -14% and +36% gross margins, while OpenAI sits at 45% and Nvidia at 75%.

  11. Gemini 2.5 Flash Image Model Released
    Gemini 2.5 Flash Image is an advanced image generation and editing model that supports image blending, character consistency, and natural language-driven edits.

  12. The Bubble That Knows It’s a Bubble
    The sequence of ‘revolutionary technology, abundant capital, speculative frenzy, and sudden reality checks’ has played out with remarkable consistency for over 180 years.

  13. Parallel Thinking with Confidence
    Meta AI introduced a parallel thinking method that uses model-internal confidence to filter low-quality reasoning traces during or after generation, requiring no extra training or tuning and integrating into existing serving frameworks.
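    One way to picture confidence-based filtering: score each parallel trace by its mean token log-probability and keep only the most confident fraction before aggregating answers. This is a hedged sketch of the general idea, not Meta’s method; the `trace_confidence` metric, the keep-fraction, and the data layout are all illustrative assumptions.

```python
def trace_confidence(token_logprobs):
    """Model-internal confidence for one reasoning trace: the mean token
    log-probability (closer to 0 means the model was more certain)."""
    return sum(token_logprobs) / len(token_logprobs)

def filter_traces(traces, keep_fraction=0.5):
    """Rank parallel traces by confidence and keep the top fraction,
    e.g. before taking a majority vote over their final answers."""
    ranked = sorted(traces, key=lambda t: trace_confidence(t["logprobs"]),
                    reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

traces = [
    {"answer": "42", "logprobs": [-0.1, -0.2, -0.1]},  # confident
    {"answer": "41", "logprobs": [-2.5, -3.0, -1.8]},  # unsure
    {"answer": "42", "logprobs": [-0.3, -0.4, -0.2]},  # confident
    {"answer": "7",  "logprobs": [-4.0, -3.5, -2.9]},  # unsure
]
survivors = filter_traces(traces, keep_fraction=0.5)
print([t["answer"] for t in survivors])  # ['42', '42']
```

    Because the signal comes from quantities the model already produces during decoding, this kind of filter can sit inside an existing serving framework with no extra training.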

  14. Void (GitHub Repo)
    An open‑source, VS Code–derived AI code editor offering direct connections to any LLM (local or cloud), agent modes, change checkpoints, and full data privacy—with no backend middleman.

  15. How Anthropic builds safeguards for Claude
    Anthropic outlined a multi‑layer safeguards program spanning policy design, training influence, red‑teaming and testing, real‑time enforcement, and threat intelligence to reduce misuse.

  16. Anthropic Unveils More Powerful AI Model Ahead of Rival GPT-5 Release
    Anthropic’s new model, Opus 4.1, is more capable at coding, research, and data analysis and better at fielding complex multistep problems than previous versions. It can better navigate large codebases and make more precise modifications to code. Opus 4.1 scores two percentage points higher than its predecessor on SWE-Bench Verified, a popular coding evaluation benchmark. Anthropic is currently finalizing a deal to raise as much as $5 billion in a new funding round at a valuation of $170 billion.

  17. Google releases Gemini 2.5 Deep Think for AI Ultra subscribers
    Google’s most powerful AI model generates multiple solution approaches simultaneously before selecting the best answer. A specialized version that can work for hours on complex problems recently achieved gold-medal-level performance at the International Mathematical Olympiad and is being shared with select mathematicians.

  18. Our first outage from LLM-written code
    Sketch experienced a series of mini-outages on July 15. An LLM had made a small but critical change during a refactor that was missed by a human reviewer. When humans refactor code, they select the original text, cut it, move it to the new file, paste it, and then make intentional changes. LLMs write two patches, a deletion and an insertion, leaving room for transcription errors. Sketch added clipboard support to its agent environment to try to prevent further issues in transcription.