Stop Choosing Between Skills and MCP. You Need Both.

Skills and MCP aren't competing. They're two layers of the same stack, solving fundamentally different problems.

Every week someone asks me: “Should I build an MCP server or write a skill?” And every time, I have to resist the urge to say “yes.” The question itself is the problem. It’s like asking whether you need a brain or hands — you need both, and they do completely different things.

I’ve been building an AI assistant called Friday for the past few months (because apparently running a company and raising a family wasn’t enough to keep me busy). It has skills written in markdown. It has MCP servers written in Python. And the moment I stopped thinking of them as alternatives and started thinking of them as layers, everything clicked.

The confusion is understandable

MCP exploded in late 2024 when Anthropic open-sourced it. By early 2026, there are over 10,000 MCP servers in the wild, and the protocol sits under the Linux Foundation. Skills came later, mostly through Claude Code’s SKILL.md pattern, and gained traction as people realized they could get 80% of what they needed from a markdown file instead of spinning up a server.

So the narrative became: skills are replacing MCP servers. Simpler. Cheaper. Less overhead. And like most narratives that sound clean, it’s dangerously incomplete.

They solve different problems

An MCP server answers one question: what can the agent do? It gives the model tools. Call an API. Query a database. Generate an image. Send an email. These are capabilities the model simply doesn’t have natively, no matter how well you prompt it.

A skill answers a different question: how should the agent think about this? It gives the model judgment. When to use which layout for an infographic. How to structure a newsletter scan. What voice to write in. What steps to follow when processing invoices. The kind of domain knowledge that would otherwise live in someone’s head (or worse, get reinvented every conversation).

One is execution. The other is reasoning. Confusing them leads to two equally bad outcomes, and I’ve hit both.

When you only have MCP

I see this constantly. Someone builds an MCP server that wraps a simple API, then stuffs all the “how to use it” logic into the tool descriptions. The tool schemas bloat. You end up burning 8,000+ tokens just loading tool definitions before the model has done anything useful, which is a bit like paying rent on an apartment you haven’t moved into yet.

Worse, the reasoning gets buried. The model sees 50 tools with JSON schemas and has to figure out the workflow on its own. Sometimes it gets it right. Sometimes it calls tools in the wrong order, skips steps, or hallucinates parameters that don’t exist. The debugging experience is miserable because the “how” was never explicit — it was scattered across tool descriptions that were never designed to carry that weight.
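To make the anti-pattern concrete, here is a hypothetical tool definition (the name and every field value are illustrative, loosely following MCP's tool-schema shape) with workflow guidance crammed into its description, plus a rough cost estimate using the common ~4 characters-per-token heuristic:

```python
import json

# Hypothetical example: an MCP tool definition that smuggles workflow
# guidance into its description -- the anti-pattern described above.
BLOATED_TOOL = {
    "name": "scan_newsletters",
    "description": (
        "Scans newsletters. ALWAYS read sources.json first. Then search "
        "Gmail for each sender, extract article links, score each link "
        "against the user's interests, deduplicate against saved items, "
        "and only then save results. Never skip the scoring step."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {"since_days": {"type": "integer"}},
    },
}

def estimate_tokens(obj) -> int:
    """Rough token estimate via the common ~4 characters/token heuristic."""
    return len(json.dumps(obj)) // 4

# This overhead is paid on every session, before any useful work happens.
overhead = estimate_tokens(BLOATED_TOOL)
```

Multiply that by 50 tools and the context tax adds up fast, which is exactly the bloat described above.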

When you only have skills

The opposite failure mode, and one I ran into early with Friday. You write beautiful SKILL.md files that describe exactly how to scan newsletters, score articles, and save inspirations. The model reads it, understands it perfectly, and then… can’t actually connect to Gmail. Can’t call the OpenAI image API. Can’t download attachments.

Skills without MCP are a recipe with no kitchen. The model knows what to cook but has no stove. I spent an embarrassing amount of time writing detailed skill instructions before realizing that the model was doing everything right except the part where it needed to, you know, actually talk to the outside world.

The architecture that actually works

After months of building (and my fair share of getting it wrong), I’ve landed on a three-layer stack that I think most people working on this will converge on eventually:

Layer 1: Skills (Guidance). Markdown files that encode domain knowledge, workflows, and decision criteria. They’re cheap (around 400 tokens to load), version-controlled, and readable by humans. When a user says “scan my newsletters,” the skill tells the model: read sources.json, search Gmail for each sender, extract article links, score against interests, deduplicate, save. Pure orchestration.
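As a sketch of what that skill might look like — the frontmatter loosely follows Claude Code's SKILL.md pattern, and the file names and steps are illustrative, taken from the newsletter workflow above:

```markdown
---
name: newsletter-scan
description: Scan subscribed newsletters and save scored article links.
---

# Newsletter Scan

1. Read `sources.json` to get the list of newsletter senders.
2. Search Gmail for recent mail from each sender (via the Gmail MCP tool).
3. Extract article links from each message.
4. Score each link against the user's stated interests.
5. Deduplicate against previously saved items.
6. Save the survivors, one markdown file per article.
```

Nothing in that file executes anything. It is pure orchestration knowledge, which is exactly why it stays cheap.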

Layer 2: Shell and code (Simple execution). The model can already run commands, read files, and write code. For simple operations like reading a JSON file, writing markdown, or running a git command, you don’t need an MCP server. You just need the model to know which command to run, and that’s the skill’s job.
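A minimal sketch of that Layer 2 idea — plain stdlib code of the kind the model can write and run itself, no server involved. The file names and JSON shape are assumptions for illustration:

```python
import json
from pathlib import Path

# Layer-2 style "simple execution": ordinary code the model can write and
# run directly -- no MCP server required. File names are illustrative.
def sources_to_markdown(sources_path: Path, out_path: Path) -> int:
    """Render a sources.json list as a human-readable markdown file."""
    sources = json.loads(sources_path.read_text())
    lines = ["# Newsletter sources", ""]
    lines += [f"- {s['name']} <{s['email']}>" for s in sources]
    out_path.write_text("\n".join(lines) + "\n")
    return len(sources)

# Tiny demo with inline data, so the sketch runs anywhere.
demo = Path("sources.json")
demo.write_text(json.dumps([{"name": "Example Weekly",
                             "email": "news@example.com"}]))
count = sources_to_markdown(demo, Path("sources.md"))
```

The skill's job is to tell the model that this is the kind of code to write; the model's native ability to run it does the rest.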

Layer 3: MCP servers (Complex execution). For anything requiring persistent connections, authentication flows, or external API calls that the model can’t make directly. Gmail access. OpenAI image generation. Database queries. These are the kitchen appliances that skills can’t replace, no matter how detailed the instructions.
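To show the shape of Layer 3, here is a toy dispatcher in the style of an MCP `tools/call` handler. This is not the real MCP SDK — a production server would use the official Python SDK and a real image API — and `generate_image`, its arguments, and its return value are purely illustrative:

```python
import json

# Toy sketch of the Layer-3 idea: a server-side tool doing what the model
# cannot do natively. Real servers use an MCP SDK; this only mimics the
# JSON-RPC-style request/response shape for illustration.
def generate_image(prompt: str, out_path: str) -> dict:
    """Pretend to call an image API and save the result to disk."""
    # In a real server: call the image-generation API, write the PNG bytes.
    with open(out_path, "wb") as f:
        f.write(b"\x89PNG placeholder")  # not a real image
    return {"saved_to": out_path, "prompt": prompt}

TOOLS = {"generate_image": generate_image}

def handle_tools_call(request: str) -> str:
    """Dispatch a tools/call-style request to the matching tool."""
    req = json.loads(request)
    tool = TOOLS[req["params"]["name"]]
    result = tool(**req["params"]["arguments"])
    return json.dumps({"id": req["id"], "result": result})

response = handle_tools_call(json.dumps({
    "id": 1,
    "method": "tools/call",
    "params": {"name": "generate_image",
               "arguments": {"prompt": "napkin sketch",
                             "out_path": "out.png"}},
}))
```

The point of the sketch: the server knows nothing about *when* or *why* to generate an image. That judgment lives in the skill.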

The skill orchestrates. MCP executes. The model’s native capabilities handle everything in between.

A real example

My napkin-artist skill is a markdown file. It tells the model how to analyze content, pick a layout type, structure sections, build a prompt using a style guide, and iterate based on feedback. Pure reasoning. About 800 tokens loaded.

The actual image generation? That’s a Python MCP server wrapping the OpenAI API. It takes a prompt, calls the API, saves the PNG. Pure execution. The skill would be useless without it. The MCP server would be directionless without the skill.

Neither replaces the other. They’re not even in the same category, really.

The token economics

This isn’t just architectural philosophy. The numbers matter, especially when you’re paying per token and every bit of context window counts.

Loading 50 MCP tool schemas costs roughly 8,000 tokens before anything happens. A skill file teaching the model 10 CLI patterns costs about 400 tokens. That’s a 20x difference in context overhead.

But here’s the thing — you can’t replace a database connection with a markdown file. You can’t authenticate to Gmail with plain English. Some tools are tools because they have to be.

The smart move is to push everything that’s purely knowledge into skills (cheap, fast, human-readable) and keep MCP for what genuinely requires external execution (unavoidable, but worth the cost). Matt Ridley makes a similar distinction in How Innovation Works — invention and innovation are not the same thing, and the gap between them is where the real value lives. MCP was the invention. Skills sitting on top of MCP, making the whole thing actually usable? That’s the innovation.

The decision is simpler than you think

When you’re deciding how to give your agent a new capability, ask two questions:

Is the hard part knowing what to do? Write a skill. Document the workflow, the decision criteria, the edge cases. The model has the reasoning power. It just needs the domain knowledge.

Is the hard part accessing something external? Build an MCP server. The model can’t OAuth into your company’s Salesforce instance by thinking really hard about it (I’ve tried, it doesn’t work).

Most real-world features need both. And that’s fine. That’s the architecture working as intended.

MCP grew up

The early days of MCP were messy. People built servers for things that didn’t need servers — glorified curl wrappers, CLI tools re-exposed through JSON-RPC for no good reason. Skills cleaned up that mess by handling the knowledge layer that MCP was never designed for.

What’s left is a leaner, more focused MCP ecosystem where every server earns its place. Persistent connections. Complex authentication. Enterprise SDKs. Performance-critical operations. The stuff that actually requires running infrastructure.

MCP moved down the stack. Skills moved in above it. Together, they’re the architecture that works. And if you’re still asking “which one should I use?” — you’re asking the wrong question.