Using Agent Skills to Teach LLMs New Things
I’ve been experimenting with using AI to write software (“vibe coding”). Claude and ChatGPT often excel at scaffolding and implementing the basic features of an application, but as the application becomes more complex, the quality of LLM output decreases. The cause of this seems to be dilution of context—as the application grows, the main goal can get lost.
One solution to this problem is to externalize project context into documents like README.md, ARCHITECTURE.md, or ROADMAP.md. You can tell the LLM to read these files at the beginning of every chat. To make the most of this approach, these documents should link to other files with more detailed information, so the LLM can reference them as needed. (This is called progressive disclosure.)
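As a sketch of what that looks like, a top-level README.md can link out to the more detailed documents (the file descriptions here are hypothetical, just to show the pattern):

```markdown
# My App

A task tracker for small teams.

Before making changes, read:

- [ARCHITECTURE.md](ARCHITECTURE.md): module layout and data flow.
- [ROADMAP.md](ROADMAP.md): planned milestones, in priority order.
```

The LLM reads the top-level file cheaply on every chat and only follows the links when a task actually requires the detail.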
However, it can be difficult to make sure that these context documents are used properly in every new chat. Copying prompts and maintaining a prompt library is time-consuming and tedious. This is where “agent skills” come into play. Skills are folders of instructions and code that teach an LLM how to perform a specific task. They can live in your home directory, where CLI tools like Codex or Claude Code can use them in every project, or in a specific project repo under .agents/skills. The difference between agent skills and a prompt library is that the LLM can decide when to use a particular skill based on user input.
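For example, a project-local skills folder might look like this (the skill names are illustrative, matching skills discussed later in this post):

```
.agents/skills/
├── gif-creator/
│   └── SKILL.md
└── senior-dev/
    ├── SKILL.md
    └── context-building.md
```

Each skill is just a folder with a SKILL.md entry point, plus any supporting files it needs.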
For example, if I have a skill called “GIF creator” and I ask Codex to create a GIF of Arnold Schwarzenegger doing the moonwalk, it recognizes that the request matches the skill and uses it.
Note that this is also different from an MCP server, which provides tools behind an API that the LLM interacts with programmatically. MCP servers have strict API definitions and are written in code. Skills, on the other hand, are written in natural language, and are therefore easy to revise and update. Like MCP servers, they can provide specific tools in the form of programs or scripts that the LLM can use to complete the task.
This powerful system was developed by Anthropic and recently released as an open standard. Agent skills are supported by ChatGPT Codex, Claude Code, and other agentic tools. To learn more about agent skills, you can head to agentskills.io, which provides an excellent introduction. Anthropic’s GitHub repository has several interesting examples of the cool things you can do with them, as well.
ChatGPT Codex comes with a “Skill Creator” and a “Skill Installer.” These skills are a quick way to teach an AI agent how to do something or to install predefined skills for common tasks.
To help solve the quality problem in my application development, I’ve created several skills based on different roles in a development project. The first is a project manager skill, whose main task is to maintain context documents such as README.md and TASK.md. This skill instructs the LLM to work with the user to plan features and hit predefined milestones. It doesn’t write code; instead, it prevents scope creep and keeps the project focused on its primary goals.
Another skill is a “Senior web developer,” responsible for writing code and implementing medium- to large-scale tasks. The senior developer works with the project manager via the context documents to write more maintainable code that supports the project goals. This is the primary skill I use to write code in the app I’m developing. Like every skill, it provides a concise overview of the senior developer’s instructions, along with supporting documents the LLM can read as needed. For example, .agents/skills/senior-dev/SKILL.md defines the main instructions for the senior developer, while context-building.md in the same folder gives additional instructions for when the task is not quite clear. SKILL.md links to these supporting documents and briefly summarizes when each should be read.
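A minimal sketch of what such a SKILL.md could look like, assuming the standard YAML front matter with `name` and `description` fields (the body text is my paraphrase, not the actual file):

```markdown
---
name: senior-web-developer
description: Writes and refactors application code for medium- to large-scale
  tasks. Use when the user asks to implement a feature or fix a bug.
---

# Senior web developer

Read README.md and TASK.md before writing any code, and keep changes aligned
with the goals they describe.

## Supporting documents

- [context-building.md](context-building.md): read this first when the task is
  ambiguous or underspecified, before asking the user clarifying questions.
```

The description in the front matter is what lets the agent decide, from the user’s request alone, whether this skill applies.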
Next, I’ve created a “UI reviewer” whose main goal is to inspect the visual design of the app and make suggestions for improvement based on the design principles I laid out for it. Based on the instructions, it can launch a headless browser to take screenshots, interact with the app using Playwright, and analyze the app layout and workflow for pain points.
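The instructions for a skill like this can be procedural. A hypothetical excerpt (again, a sketch rather than my actual file):

```markdown
---
name: ui-reviewer
description: Reviews the app's visual design against the project's design
  principles. Use when the user asks for UI feedback or a design review.
---

# UI reviewer

1. Start the dev server and open the app in a headless browser via Playwright.
2. Take screenshots of each main screen and key user workflows.
3. Compare what you see against design-principles.md and list concrete issues,
   ordered by severity, with a suggested fix for each.
```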
Lastly, I’ve created a “Git committer” skill that lays out my preferences for Git commits, specifically Conventional Commits. It analyzes unstaged files in the repository and breaks them into separate commits with appropriate messages for each.
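For readers unfamiliar with the format, the Conventional Commits header the skill produces follows a simple `type(scope): description` pattern. A small Python sketch of that formatting rule (my own illustration, not part of the skill itself):

```python
def conventional_commit(type_, description, scope=None, breaking=False):
    """Format a Conventional Commits message header, e.g. feat(auth): add login."""
    scope_part = f"({scope})" if scope else ""
    bang = "!" if breaking else ""  # "!" marks a breaking change
    return f"{type_}{scope_part}{bang}: {description}"

print(conventional_commit("feat", "add session-based login", scope="auth"))
# feat(auth): add session-based login
print(conventional_commit("fix", "handle empty input"))
# fix: handle empty input
```

Because the skill is written in natural language, adding a new preference (say, a maximum header length) is a one-line edit to the skill file.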
I’ve only just scratched the surface with these. I’m sure there’s a lot more to learn, but here are a few things that make me excited:
- Agent skills give organizations a way to provide domain expertise and business knowledge to LLMs in a concise, structured form. You can give the AI whatever context and tooling it needs to do a task in a reliable, repeatable way.
- The agent skill specification lets you teach the LLM, through example user queries, when it should use a skill and when it should do something else. You can apply data science techniques such as train-test splits to these example queries to improve how reliably the skill is triggered. For cases where the skill fires at the wrong time, fail-safes and redirection can be built into the skill itself.
- Skills provide a fast way to develop specific use cases for generative AI, such as CSV analysis, slideshow creation and editing, or even GIF creation.
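The train-test idea in the second point can be sketched in a few lines of Python. The queries and labels below are made up for illustration: each is a user request paired with whether it *should* trigger a hypothetical “GIF creator” skill.

```python
import random

# Hypothetical labeled queries: (user request, should trigger the skill?)
examples = [
    ("Make a GIF of a dancing robot", True),
    ("Convert this video clip into an animated GIF", True),
    ("Summarize this CSV file", False),
    ("Write a unit test for the login form", False),
    ("Animate these frames into a looping image", True),
    ("Fix the typo in README.md", False),
]

def train_test_split(data, test_ratio=0.5, seed=42):
    """Shuffle and split labeled queries: tune the skill's trigger wording on
    the train half, then measure trigger accuracy on the held-out half."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(examples)
print(len(train), len(test))
# 3 3
```

You would iterate on the skill’s description using only the training queries, then check how often the held-out queries trigger (or correctly fail to trigger) the skill.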
I’ve just started using them in my personal projects, but I plan to explore how I can use this new framework at work as well. I’m excited to apply skills to my context dilution problem, and I’m also keenly interested in exploring how AI can complete complex tasks with minimal human intervention. Agent skills are a promising tool for that.