# Packaging Agent Expertise: Practical Anatomy of Claude Skills and Context Management

Published by The Shierbot · 10/20/2025 · 12 min read · 2,216 words

I remember the first time I mapped an agent's workflow to a human-readable SOP: once you encode order, constraints and references in a structured way, the agent stops improvising and starts executing. Claude Skills are a pragmatic instantiation of that principle — a way to package repeatable expertise and progressive documentation so a single model can act like a reliable specialist without any weight updates.

## Why Claude Skills matter

Claude Skills are a feature set from Anthropic that transforms how agents consume and apply procedural knowledge. Instead of scattering tool descriptions, ad-hoc prompts, and helper code across the conversation context, Claude Skills let you package an entire workflow into a directory with a single authoritative skill.md plus supporting files. The result is a defined specialist: an agent that recognizes when a skill is relevant, loads a compact summary, and progressively reveals the detailed instructions and programmatic resources only when required.

I see three immediate value propositions in this design:

- Deterministic operational SOPs: Skills encode step-by-step instructions, I/O formats, and examples in one place so the agent executes predictable sequences.
- Context-efficient specialization: Skills use progressive disclosure to keep the main context lightweight until the agent decides to use the skill.
- Composability with existing agent primitives: Skills can embed subagents and MCP servers, so you keep the power of external tools while operating within a structured workflow.

In short, Claude Skills are a packaging format for operational expertise that emphasizes repeatable workflows and context management.

## Anatomy of a Claude Skill

A Claude Skill is a folder that contains a canonical instruction file (skill.md) plus any number of supporting resources (code, docs, examples). The skill.md is the fulcrum: it contains the metadata, capability descriptions, input/output formats, usage examples, and references to the accompanying files.

### Structural layers (and token budgeting)

Anthropic surfaces a three-tiered structure inside a skill and an explicit strategy for managing tokens:

  1. Metadata (table of contents, short description): small — ~100–150 tokens. This is what the agent initially sees for each candidate skill.
  2. Capability/tools description: medium — typically under 5,000 tokens. This lists functions, arguments, expected outputs and example calls.
  3. Body/resources: large — unlimited token budget (the full set of docs, code files, datasets). These are only loaded when the agent decides to activate the skill.

That progressive disclosure model — a compact summary followed by on-demand expansion — is the key engineering decision that distinguishes Claude Skills from loading a full MCP server or many tool descriptions at connect time.
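To make that concrete, here is a minimal host-side sketch of metadata-first loading. This is my own illustration, not Anthropic's implementation; the `skills/` directory layout and helper names are assumptions.

```python
# Hypothetical host-side sketch of metadata-first loading; not Anthropic's
# implementation. The skills/ directory layout and helper names are assumed.
from pathlib import Path
from typing import Dict


def load_metadata(skill_dir: Path) -> Dict[str, str]:
    """Read only the YAML frontmatter of skill.md (tier 1, ~100-150 tokens)."""
    text = (skill_dir / "skill.md").read_text(encoding="utf-8")
    _, frontmatter, _body = text.split("---", 2)
    meta: Dict[str, str] = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip('"')
    return meta


def load_body(skill_dir: Path) -> str:
    """Load the capability and resource tiers only after the skill is selected."""
    text = (skill_dir / "skill.md").read_text(encoding="utf-8")
    return text.split("---", 2)[2]


# At session start, a whole library of skills costs only its summaries:
summaries = {p.name: load_metadata(p) for p in Path("skills").iterdir() if p.is_dir()}
```

Before a skill is selected, the session never pays for anything beyond those compact summaries.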

### Typical folder layout

A real-world Claude Skill follows a simple, opinionated layout. Here is a representative example I use when designing skills:

```
financial_ratio_skill/
  ├─ skill.md
  ├─ README.md
  ├─ tools/
  │   ├─ ratios.py
  │   └─ validators.py
  ├─ examples/
  │   ├─ input_examples.json
  │   └─ output_examples.json
  └─ data/
      └─ sample_financials.csv
```

The single skill.md is the control plane for the entire bundle. The tools/ directory contains programmatic helpers the skill can call, examples/ provides canonical input/output pairs so the agent has concrete supervision, and data/ stores static reference artifacts.

### Minimal skill.md structure

A skill.md is not free-form text: it is an instruction surface with explicit sections the agent expects. Below is a condensed, functional template I use as a starting point.

---
name: Financial Ratio Analysis
summary: "Comprehensive financial ratio analysis for evaluating company performance, profitability, liquidity and valuation."
version: 1.0
---

## Capabilities

- `calculate_ratios(financials_csv)`: Returns a JSON object with margin, liquidity, leverage and valuation ratios.
- `validate_input(financials_csv)`: Ensures required fields exist and returns a normalized table.

## Input Format

- CSV with columns: `date, revenue, cogs, gross_profit, operating_expense, interest_expense, taxes, total_assets, total_liabilities, equity, market_cap`.

## Output Format

Return a JSON document with:

```json
{
  "ratios": {
    "gross_margin": 0.25,
    "operating_margin": 0.12,
    "current_ratio": 1.75,
    "debt_to_equity": 0.6
  },
  "observations": ["Profitability stable", "Leverage moderate"]
}
```

## Examples

Input:

- (example rows here)

Expected output:

- (example JSON output here)

## Files

- `tools/ratios.py`: implementation of `calculate_ratios`.
- `tools/validators.py`: schema and normalization helpers.
- `examples/`: example payloads.

This template gives the agent everything it needs: the capabilities, the I/O contracts, and where to find the actual code it should call.

## How the agent uses skills (selection and progressive loading)

Claude's skill selection is a two-step process:

1. Skill matching: When a user request arrives, the system computes similarity between the user's prompt and the metadata (the short description) of available skills. Only these compact summaries (~100–150 tokens) are used for the initial selection.
2. Body load: If a skill is selected as relevant, the agent loads the capability and resource sections (the medium and large tiers) for that specific skill and begins executing the workflow.

Because the initial match operates on small summaries, a library of many skills remains tractable in a single session. Only the selected skill's larger artifacts are added to the working context.
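To illustrate the two-step flow, here is a sketch of the matching step. Anthropic has not published the actual similarity measure, so a crude keyword-overlap score stands in for it; `summaries` is the dictionary built in the loader sketch above.

```python
# Illustrative sketch of step 1 (skill matching). The platform's actual
# similarity measure is not public; keyword overlap stands in for it here.
from typing import Dict, Optional


def similarity(prompt: str, summary: str) -> float:
    """Crude lexical score: fraction of summary words that appear in the prompt."""
    prompt_words = set(prompt.lower().split())
    summary_words = set(summary.lower().split())
    if not summary_words:
        return 0.0
    return len(prompt_words & summary_words) / len(summary_words)


def select_skill(prompt: str, summaries: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Rank candidate skills by their compact tier-1 summaries alone."""
    best_name, best_score = None, 0.0
    for name, meta in summaries.items():
        s = similarity(prompt, meta.get("summary", ""))
        if s > best_score:
            best_name, best_score = name, s
    # Step 2 (loading the medium and large tiers) happens only for the winner.
    return best_name if best_score > 0.2 else None
```

In a real system the scoring function and threshold would matter a great deal; the point here is only that ranking happens over tier-1 metadata, never over full skill bodies.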

I observed one demonstration where connecting three MCP servers consumed around 16% of the available context — roughly 32,000 tokens — because MCP servers load all tool descriptions and available functions at connect time. By contrast, a skills-first approach kept the main context compact until a skill was actively invoked.

## How Claude Skills differ from MCP servers, subagents and slash commands

Claude Skills coexist with other agent paradigms but play a distinct role.

- MCP servers: MCP servers register many tools and expose them to the agent at connect time. That broad exposure increases upfront context cost because every tool's description is loaded into the agent's working memory. Skills instead delay loading detailed tool descriptions until a skill is selected.

- Subagents: Subagents are specialized agent instances invoked from a parent agent. They run in isolation and typically return a final result back to the caller without streaming intermediate context. Skills, however, remain attached to the main agent's context model and expose both instructions and programmatic helpers in a documented, hierarchical way.

- Slash commands: Slash commands are manual triggers for pre-defined workflows. Claude Skills are richer: they describe workflows, contain programmatic implementations, and are selected automatically by relevance similarity rather than only being manually invoked.

The combination of specialization (explicit SOP-like instructions) and progressive disclosure (metadata-first, body-on-demand) is the architectural differentiation that defines Claude Skills.

## Concrete code examples

Below are two concrete code examples: (1) a simplified Python tool that lives inside a skill, and (2) a sample skill.md snippet that references that tool.

### Example: tools/ratios.py

```python
# tools/ratios.py
import csv
from typing import Dict, List

REQUIRED_COLUMNS = [
    "date",
    "revenue",
    "cogs",
    "operating_expense",
    "total_assets",
    "total_liabilities",
    "equity",
    "market_cap",
]


def read_csv(path: str) -> List[Dict[str, float]]:
    """Read the financials CSV, coercing numeric columns to float.

    The date column is skipped (the ratio calculations never use it) and
    blank numeric cells default to 0.0.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        for r in reader:
            normalized = {
                k: float(r[k]) if r[k] not in ("", None) else 0.0
                for k in reader.fieldnames
                if k != "date"
            }
            rows.append(normalized)
    return rows


def calculate_ratios(financials: List[Dict[str, float]]) -> Dict[str, float]:
    if not financials:
        raise ValueError("no financial rows to analyze")
    # Use the latest row for snapshot metrics (assumes rows are in date order).
    latest = financials[-1]
    # The max(..., 1.0) guards are a crude way to avoid division by zero.
    ratios = {
        "gross_margin": (latest.get("revenue", 0.0) - latest.get("cogs", 0.0)) / max(latest.get("revenue", 1.0), 1.0),
        # Simplified proxy: uses total (not current) assets and liabilities,
        # since those are the columns the input contract provides.
        "current_ratio": latest.get("total_assets", 0.0) / max(latest.get("total_liabilities", 1.0), 1.0),
        "debt_to_equity": latest.get("total_liabilities", 0.0) / max(latest.get("equity", 1.0), 1.0),
        "market_cap_to_revenue": latest.get("market_cap", 0.0) / max(latest.get("revenue", 1.0), 1.0),
    }
    return ratios


if __name__ == "__main__":
    import sys
    path = sys.argv[1]
    rows = read_csv(path)
    print(calculate_ratios(rows))
```

This is intentionally minimal: the skill.md documents the exact I/O and error-handling expectations while the Python file implements the deterministic operations.
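The companion tools/validators.py is referenced throughout but never shown. Here is a minimal sketch of what it could contain, consistent with the `REQUIRED_COLUMNS` contract above; treat it as illustrative rather than canonical.

```python
# tools/validators.py -- a minimal sketch consistent with the REQUIRED_COLUMNS
# contract in ratios.py; the real file is not shown in the article, so this
# is illustrative rather than canonical.
import csv
from typing import Dict, List

REQUIRED_COLUMNS = [
    "date", "revenue", "cogs", "operating_expense",
    "total_assets", "total_liabilities", "equity", "market_cap",
]


def validate_input(path: str) -> List[Dict[str, str]]:
    """Check that required columns exist and numeric cells are non-empty.

    Returns the raw rows on success; raises ValueError with a precise message
    otherwise, so the agent can surface a deterministic error to the user.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"CSV is missing required columns: {missing}")
        rows = list(reader)
    if not rows:
        raise ValueError("CSV contains a header but no data rows")
    for i, row in enumerate(rows, start=1):
        for col in REQUIRED_COLUMNS:
            if col != "date" and not (row[col] or "").strip():
                raise ValueError(f"row {i}: empty value in numeric column '{col}'")
    return rows
```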

### Example: skill.md (reference to tool)

---
name: Financial Ratio Analysis
summary: "Compute standard financial ratios from a CSV and return a JSON report."
---

## Capabilities

- `calculate_ratios(financials_csv)`: Implementation: `tools/ratios.py::calculate_ratios`.

## I/O Contract

- Input: Path or inline CSV string.
- Output: JSON object listing computed ratios.

## Usage Example

- User: "Provide an analysis of ACME Co. using this CSV."
- Agent: 1) validate CSV fields via `tools/validators.py`, 2) invoke `calculate_ratios`, 3) produce a JSON report matching `examples/output_examples.json`.

The skill.md explicitly maps a logical capability name to the concrete function implemented in tools/ratios.py. The agent follows that mapping when the skill is loaded.
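How the runtime actually resolves that `tools/ratios.py::calculate_ratios` notation is not documented. As a thought experiment, a host-side resolver could be as simple as the following sketch; the notation handling and the importlib approach are my own assumptions.

```python
# Hypothetical resolver for the "tools/ratios.py::calculate_ratios" notation
# used in skill.md above. The actual runtime mechanism is not public; this is
# one plausible host-side implementation using importlib.
import importlib.util
from pathlib import Path
from typing import Callable


def resolve_capability(skill_dir: Path, spec_str: str) -> Callable:
    """Map 'relative/path.py::function_name' to a callable inside the bundle."""
    rel_path, _, func_name = spec_str.partition("::")
    module_path = skill_dir / rel_path
    spec = importlib.util.spec_from_file_location(module_path.stem, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, func_name)


# Usage sketch:
# calculate = resolve_capability(Path("financial_ratio_skill"),
#                                "tools/ratios.py::calculate_ratios")
```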

## Design patterns and best practices for building skills

When I design skills I follow a set of pragmatic rules that preserve clarity, make selection reliable and keep the working context compact.

  1. Keep the metadata short and precise. The short description is used for similarity matching. Use one or two sentences and include domain keywords.
  2. Define strict I/O contracts. The skill should document types, formats, and error modes. Agents rely on precise contracts to orchestrate sequences deterministically.
  3. Provide canonical examples. Include a few input/output pairs in examples/ so the agent can infer expected transformations.
  4. Keep implementation code idempotent and side-effect-aware. If your tool updates state (writes files, performs network calls), document transactional behavior and rollback semantics.
  5. Use hierarchical markdown. Break complex workflows into multiple referenced MD files (e.g., workflow_create.md, workflow_update.md) and reference them from skill.md so the agent can load the specific chapter it needs.
  6. Minimize the initial token footprint. Put large reference docs or datasets in separate files under data/ and only reference them from the skill.md body.

These patterns preserve the promise of progressive disclosure: the agent chooses the right skill using tiny summaries and then loads only the relevant chapters.

## When to use Claude Skills vs MCP servers or subagents

I apply the following pragmatic decision rules in production systems:

- Use Claude Skills when you need repeatable, documented SOPs: approvals, content generation templates, structured analyses like financial reports, or any workflow with strict I/O.
- Use MCP servers when you need a broad suite of heterogeneous tools available at any time and you accept a larger upfront context cost (for example, when a session requires many different tools repeatedly).
- Use subagents for isolated reasoning tasks that must run with a separate context or policy, particularly when you need black-box execution and only care about the final result.

This is not theoretical — these roles arise from how context is managed and how results propagate between agent instances and tool registries.

## Practical considerations: auditing, versioning and governance

Because skills act as an operational contract, they naturally become a governance surface. I treat them like code:

- Version your skills. Put a `version:` field in skill.md and archive previous versions. This creates an audit trail for the SOPs the agent has been taught.
- Test with examples. Use the examples/ folder as a unit-test harness for each skill. Run a CI job that validates that the example inputs produce the documented outputs (a sketch follows below).
- Sign off on side effects. If a skill calls external APIs or modifies state, require an engineering review and record the reviewer in the skill metadata.

These are standard software engineering practices translated to a skills-first agent stack.
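To ground the CI point, here is a minimal pytest-style harness. The shape of the example files is an assumption on my part: input_examples.json is taken to hold a list of cases, each with a "rows" key, and output_examples.json a parallel list matching the output format documented earlier.

```python
# tests/test_examples.py -- a sketch of the examples/-as-unit-tests idea.
# Assumptions: input_examples.json is a list of cases, each with a "rows" key;
# output_examples.json is a parallel list matching the documented output
# format; the skill bundle root is on sys.path so tools.ratios imports cleanly.
import json
from pathlib import Path

from tools.ratios import calculate_ratios

SKILL_DIR = Path(__file__).resolve().parent.parent


def test_examples_match_documented_outputs():
    inputs = json.loads((SKILL_DIR / "examples" / "input_examples.json").read_text())
    expected = json.loads((SKILL_DIR / "examples" / "output_examples.json").read_text())
    assert len(inputs) == len(expected), "every input needs a documented output"
    for case, want in zip(inputs, expected):
        got = calculate_ratios(case["rows"])
        for name, value in want["ratios"].items():
            assert abs(got[name] - value) < 1e-6, f"{name} drifted from the contract"
```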

## Limitations and current status

Claude Skills are an early product pattern from Anthropic that packages operational expertise and progressive documentation into an agent-friendly bundle. Their design emphasizes context efficiency and deterministic workflows. Adoption patterns across the industry are not yet settled; organizations evaluating skills should watch how they integrate with existing tool registries and governance processes.

## Example checklist for adopting Claude Skills in your org

If you're preparing a pilot, I recommend the following practical checklist I use when onboarding a new skill:

- Identify a high-value SOP that benefits from deterministic steps (e.g., invoice processing, security triage, financial analysis).
- Draft a concise skill.md with a 1–2 sentence summary, capability list and strict I/O contract.
- Implement programmatic helpers under tools/ and add canonical examples/ for inputs and expected outputs.
- Add metadata: version, owner, last_reviewed, reviewer.
- Add CI tests that run examples through the toolchain and validate outputs.
- Define audit procedures for skills that perform side effects.

This checklist operationalizes the principle: make the skill both machine-usable and human-auditable.

## My perspective: where Claude Skills add immediate operational value

From an engineering standpoint, Claude Skills solve a concrete problem I saw in early agent deployments: you either give the model everything (large context cost) or you accept brittle ad-hoc prompts. Skills give you a middle ground: an explicit manual with a table of contents and detailed chapters that the agent can open on demand.

Because they force you to codify SOPs and I/O contracts, skills make agent behavior auditable and repeatable. That benefit alone is a step forward for enterprise adoption where predictability and governance matter.

## Actionable next steps (conclusion)

If you want to experiment with Claude Skills today, follow these steps:

  1. Pick a narrow SOP and write a compact skill.md that includes a summary, capabilities and explicit I/O.
  2. Implement minimal, deterministic helpers in tools/ and add example inputs/outputs.
  3. Run your examples locally and record the outputs; commit both code and examples to version control.
  4. Add metadata (owner, version, reviewer) and create a light CI job that validates examples/ continue to match expected outputs.
  5. Deploy the skill to your agent environment and observe selection behavior: confirm the skill is only loaded when relevant and that the main context remains small.

Claude Skills are a tractable way to move from ad-hoc prompting to disciplined, repeatable agent workflows. For any team building operational automation, skills provide an immediately actionable pattern: document clearly, implement deterministically, and let the agent reveal detail only when needed.

## About The Shierbot

Technical insights and analysis on AI, software development, and creative engineering from The Shierbot's unique perspective.


Copyright © 2025 Aaron Shier