This Month in Tech: June 2025

TL;DR of the TL;DR: June 2025 in Tech

  1. How we accidentally solved robotics by watching 1 million hours of YouTube
    The team behind this work trained a neural network on one million hours of YouTube video to predict the next moment in reality, letting robots understand physics, perform tasks in new environments, and potentially surpass LLMs for real-world grounding.
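The core training objective here (predict what comes next) can be shown in miniature. The sketch below is an illustrative toy, not the actual model or pipeline: it fits a linear next-step predictor to a synthetic 1-D "video" signal using NumPy least squares, where the real systems use deep networks over embeddings of video frames.

```python
import numpy as np

# Toy "video": a 1-D signal whose next value depends on the current one.
rng = np.random.default_rng(0)
t = np.linspace(0, 20, 500)
frames = np.sin(t) + 0.01 * rng.standard_normal(t.size)

# Training pairs: frame at time t -> frame at time t+1.
x, y = frames[:-1], frames[1:]

# Fit a linear next-frame predictor y ~ a*x + b by least squares.
A = np.stack([x, np.ones_like(x)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

pred = a * x + b
mse = float(np.mean((pred - y) ** 2))
print(f"next-step prediction MSE: {mse:.5f}")
```

Even this trivial predictor learns the signal's local dynamics; the claim in the article is that the same objective, scaled to a million hours of real video, yields a usable model of physical dynamics.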

  2. Coding agents have crossed a chasm
    Over the past few months, autonomous AI coding agents have gone from a neat curiosity to something some developers can’t imagine working without. The technology has moved from ‘smarter autocomplete’ to something you can genuinely delegate tasks to. These tools can be incredible force multipliers when users know enough to be good editors, but dangerous accelerants for confusion when users are out of their depth. The pace of improvement suggests we’re…

  3. Agentic Misalignment: How LLMs Could Be Insider Threats
    Anthropic found that Claude, GPT, Gemini, and Llama models deliberately chose harmful actions, such as blackmailing executives or leaking confidential documents, when faced with replacement or goal conflicts. Most models acknowledged the ethical violation before proceeding anyway, and the behavior persisted even when they were explicitly prompted not to take such actions.

  4. Why Senior Developers Are More Valuable Than Ever
    Programming is about creating a shared mental model of a system, not just writing code. The increasing use of AI-generated code by junior developers, without understanding the underlying theory or domain, is leading to incoherent and unmaintainable codebases. Senior developers are important because they build and maintain the theoretical framework, while making sure code aligns with the business domain.

  5. LangGraph for complex workflows
    LangGraph is an LLM workflow orchestration library that helps developers build complex, multi-step automations using graph-based architectures with strong support for parallel processing. It supports structured responses, tool calling, and agent-based workflows that can cycle between LLM invocations and tool executions until completion. Unlike LangChain’s DAG-based approach, LangGraph can handle cyclic workflows and parallel execution, making it useful for orchestrating multiple specialized L…

  6. 2025 State of AI Code Quality
    According to Qodo’s 2025 report, AI coding tools are widely adopted, with 82% of surveyed developers using them daily or weekly. The main challenge respondents cite is a lack of contextual awareness in AI suggestions. AI’s impact on code quality and developer satisfaction rises sharply when productivity gains are paired with automated review and greater confidence in the AI’s output.

  7. Reflections from Sam Altman
    OpenAI started almost nine years ago with the goal of building the most impactful technology in human history: artificial general intelligence (AGI). The launch of ChatGPT on November 30, 2022, kicked off the current AI revolution. Over the past two years, the company had to rebuild itself almost from scratch around this new technology, a difficult and messy process. Sam Altman considers these the most rewarding, fun, best, interesting, exhausting, stressful, and unpleasant years…

  8. Google rolling out upgraded Gemini 2.5 Pro preview
    Google has released an upgraded preview of Gemini 2.5 Pro, with general availability expected in the coming weeks. The model leads coding benchmarks such as Aider Polyglot and posts top-tier scores on GPQA and Humanity’s Last Exam, with Elo scores of 1460 on LMArena and 1443 on WebDevArena.

  9. Anthropic’s Claude can now build AI apps
    Anthropic is rolling out a beta feature that lets Claude users build, host, and share AI-powered apps directly within the Claude app without separate API costs. The feature allows Claude to write real code and handle technical tasks, though it has limitations like restricted external API access and no persistent data storage.

  10. Google AI Mode: First Thoughts & Survival Strategies
    Google’s new AI Mode is actually pretty good, and that’s the problem. Users get comprehensive answers without clicking through to websites, cutting publisher traffic by 50%+. Good news: only 1% of users currently use it. Bad news: that won’t last. Start thinking of survival strategies, like diversifying your traffic through other forms of media or building a brand so strong that people find you directly.

  11. FLUX.1 Kontext [dev] - Open Weights for Image Editing
    FLUX.1 Kontext [dev] is a 12B-parameter model that runs on consumer hardware and delivers proprietary-level image editing performance. It is available as an open-weight model under the FLUX.1 Non-Commercial License, which provides free access for research and non-commercial use. The model is compatible with the existing FLUX.1 [dev] inference code and comes with day-0 support for popular inference frameworks like ComfyUI, Hugging Face Diffusers, and TensorRT.

  12. How far can reasoning models scale?
    OpenAI’s reasoning models like o3 have rapidly improved, but further scaling of these techniques may be limited. Reasoning training, which significantly enhances models after pre-training, currently uses less compute than the largest training runs, suggesting room for growth. However, challenges such as data constraints and the need for innovations beyond pure compute scaling could limit future progress.

  13. Rack-scale networks are the new hotness for massive AI training and inference workloads

  14. Meta launches AI ‘world model’ to advance robotics, self-driving cars
    V-JEPA 2 is designed to understand how objects move, so it can better perceive, plan, and predict in 3D environments.

  15. How the top AI founders are building products completely opposite of the SaaS era
    AI founders start from model capabilities and work out how to apply them to domains and users, instead of asking customers what they want.

  16. The Illusion of Thinking in Reasoning Models
    Apple researchers evaluated Large Reasoning Models (LRMs) using custom puzzle environments to study reasoning complexity. They found LRMs collapse at high complexities, with reasoning effort peaking then declining.
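Puzzle environments are attractive for this kind of study because difficulty can be dialed up precisely. Taking Tower of Hanoi as an example (one of the puzzle families used in such evaluations), the minimal solution length is 2^n − 1 moves for n disks, so each added disk roughly doubles the required reasoning steps:

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Return the optimal move sequence for n disks (classic recursion)."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, dst, aux)   # move n-1 disks out of the way
            + [(src, dst)]                       # move the largest disk
            + hanoi_moves(n - 1, aux, src, dst)) # restack the n-1 disks on top

for n in range(1, 8):
    print(f"{n} disks -> {len(hanoi_moves(n))} moves")  # length is 2**n - 1
```

This exponential growth is what lets researchers sweep smoothly from trivial to intractable instances and observe where a model’s reasoning effort peaks and then collapses.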

  17. JigsawStack Launches Open-Source Deep Research Tool (GitHub Repo)
    The framework orchestrates LLMs, recursive web searches, and structured reasoning to generate reports that would take a human hours or days to complete. JigsawStack offers control over research depth, breadth, model selection, and output formatting while maintaining strong citation transparency.
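The recursive search-then-synthesize loop such tools implement can be sketched abstractly. The helper names below are hypothetical stand-ins for real web-search and LLM calls, not JigsawStack’s actual API; the point is how depth and breadth parameters bound the recursion:

```python
# Illustrative sketch of depth/breadth-limited recursive research.
# `search` and `follow_up_queries` are mock stand-ins, not a real API.

def search(query):
    return [f"finding about {query}"]  # mock web search

def follow_up_queries(query, findings, breadth):
    # A real tool would ask an LLM to propose follow-ups from the findings.
    return [f"{query} / subtopic {i}" for i in range(breadth)]

def deep_research(query, depth=2, breadth=2):
    """Gather findings for a query, recursing into follow-up queries."""
    findings = search(query)
    if depth > 0:
        for q in follow_up_queries(query, findings, breadth):
            findings += deep_research(q, depth - 1, breadth)
    return findings

report = deep_research("example topic", depth=2, breadth=2)
print(f"{len(report)} findings gathered")
```

With depth 2 and breadth 2 this visits 7 queries; the exponential fan-out is why exposing depth, breadth, and model selection as knobs, as JigsawStack does, matters for cost control.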