AI Config: From Afterthought to Infra

TL;DR

What	Details
Project	Méthode Aristote EdTech platform, 9 months (Aug 2025 → Apr 2026)
Total AI config commits	506, 8.6% of all repo commits
Current scale	407 files in `.claude/`, 34 rules, 15 hooks, 36 skills, 57 commands, 17 agents
Distribution	23% in first 5 months, 77% in last 4 months
5 phases	Afterthought → Documentation → Infrastructure → Engineering Practice → Compound Engineering
Context Diet	Apr 2026: always-on context 2,518L → 646L (-74%), ai:score 85 → 125/145
The big bang	Jan 6, 2026: 86 files created in a single day (commit `1421e863`)
Key insight	AI config doesn’t get designed upfront; you figure out what it needs by running into the friction.

This isn’t a guide to setting up a perfect CLAUDE.md; it’s a post-mortem of how AI configuration evolves when you use it seriously over 9 months on a production codebase.

The data is real: I ran `git log --all --format="%ad" --date=format:"%Y-%m" -- CLAUDE.md .claude/` on the repository and categorized 506 commits touching AI configuration files. The pattern surprised me, and so did the final number: 8.6% of all commits in this repository touch AI config. CLAUDE.md alone has been committed 149 times.
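The command prints one `YYYY-MM` stamp per commit, so the monthly figures come down to counting identical lines (the full command quoted at the end of this article does that with `sort | uniq -c`). A minimal TypeScript sketch of the same tally, assuming the output lines have already been captured into an array; `tallyCommitsByMonth` is an illustrative name, not something from the repository:

```typescript
// Tally `YYYY-MM` lines as printed by:
//   git log --all --format="%ad" --date=format:"%Y-%m" -- CLAUDE.md .claude/
// Hypothetical helper, equivalent in spirit to `sort | uniq -c`.
function tallyCommitsByMonth(lines: string[]): Record<string, number> {
  const tally: Record<string, number> = {};
  for (const raw of lines) {
    const month = raw.trim();
    // Skip blank lines and anything that is not a YYYY-MM stamp.
    if (!/^\d{4}-\d{2}$/.test(month)) continue;
    tally[month] = (tally[month] ?? 0) + 1;
  }
  return tally;
}

// Made-up sample input, not real repository data.
const sampleTally = tallyCommitsByMonth(["2025-08", "2025-09", "2025-08", "", "2025-09"]);
console.log(sampleTally);
```

Feeding it the real `git log` output would reproduce the per-month counts without leaving Node.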

The data

AI-config commits per month (506 total, 9 months)

Aug 2025  ██                                                       5
Sep 2025  ████████                                                24
Oct 2025  ███████████                                             34
Nov 2025  █████████                                               27
Dec 2025  █████████                                               27
Jan 2026  ██████████████████████████████████████████████         138  ← Big Bang (86 files in 1 day)
Feb 2026  ███████████████████████████                             81
Mar 2026  ███████████████████████████████████████                116  ← ACE pipeline
Apr 2026  ██████████████                                          43  (partial)

          |←── 23% (117) ───→|←──────────── 77% (378) ────────────────|
                 Aug–Dec                       Jan–Apr

Month	AI-config commits	% of total
Aug 2025	5	1.0%
Sep 2025	24	4.7%
Oct 2025	34	6.7%
Nov 2025	27	5.3%
Dec 2025	27	5.3%
Jan 2026	138	27.3%
Feb 2026	81	16.0%
Mar 2026	116	22.9%
Apr 2026	43	8.5% (partial)
Total	506	100%

Two spikes, not one. January 2026 (month 6) and March 2026 (month 8). Together they represent 50.2% of all AI config activity. Both were triggered by specific structural decisions, not by gradual accumulation.
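The percentages in the table divide each month's count by the stated total of 506. As a quick check of the two-spike figure, here is a small sketch; the function name is illustrative:

```typescript
// Share of a total, as a percentage rounded to one decimal place.
function shareOfTotal(count: number, total: number): number {
  if (total <= 0) throw new RangeError("total must be positive");
  return Math.round((count / total) * 1000) / 10;
}

const STATED_TOTAL = 506; // total AI-config commits reported in the article
const januaryCommits = 138;
const marchCommits = 116;

console.log(shareOfTotal(januaryCommits, STATED_TOTAL));                // 27.3
console.log(shareOfTotal(marchCommits, STATED_TOTAL));                  // 22.9
console.log(shareOfTotal(januaryCommits + marchCommits, STATED_TOTAL)); // 50.2
```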

Phase 1: Config as Afterthought (Aug-Sep, 29 commits)

The first AI config file. August 22, 2025, commit a48d5017b. CLAUDE.md, 282 lines, shipped in the first release.

The content was minimal. Project identity, the T3 stack (Next.js, tRPC, Prisma), basic conventions. A starting point, not a system.

Aug 22 ─── CLAUDE.md (282 lines) ──────── Born: AI config exists
Sep 05 ─── .claude/ directory ──────────── Dedicated structure
Sep 11 ─── .claude/commands/ ───────────── First slash commands

At this stage the mental model was basic: CLAUDE.md as a context file where you write down what the AI needs to know. Useful the way documentation is useful, better than nothing, but not yet a system.

The AI had the obvious stuff: project stack, naming conventions, basic architecture intent. Everything that mattered day-to-day was missing, meaning the business domain, the patterns we’d already established, the reasoning behind specific decisions, and what it should refuse to do.

24 commits in September added the .claude/ directory structure and first slash commands. Basic namespacing: tech:commit, tech:PR, tech:review. Useful shortcuts for repetitive operations.

At this point, AI configuration is something one person maintains informally, evolving whenever something breaks or feels missing. No system yet, just a file someone cares about.

Phase 2: Config as Documentation (Oct-Dec, 88 commits)

Phase 2 is where the AI starts needing business context, not just technical context.

Oct 15 ─── .claude/agents/ (v0.8.0) ────── First custom agents
Oct 23 ─── knowledge-base.md (v0.10.0) ─── Business rules codified
Nov 04 ─── MCP Serena (v0.14.0) ─────────── Persistent memory
Nov 21 ─── CLAUDE.md -50% tokens ────────── First optimization pass

October 23: doc/knowledge-base.md created. This is when the configuration started encoding business knowledge rather than just technical setup. Session mechanics (supervised vs autonomous, 15-minute tolerance, doublet/triplet offsets). User lifecycle rules. Tutor compensation logic. The glossary of French terms that appeared in the code.

Without this, the AI was technically capable but business-ignorant. It could write a repository method but didn’t know that a “session” in this codebase meant something specific (SUPERVISED: 1h with a tutor, or AUTONOMOUS: 30min solo), with a lifecycle of SCHEDULED → STARTED → COMPLETED.
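To show what "business context" means in code terms, here is a minimal sketch of the session concept as TypeScript types. The two kinds, their durations, and the lifecycle come from the description above; every identifier is hypothetical, and the strictly linear lifecycle (no cancellation state) is an assumption:

```typescript
// Hypothetical model of the session concept; names are illustrative,
// not taken from the codebase.
type SessionKind = "SUPERVISED" | "AUTONOMOUS";
type SessionStatus = "SCHEDULED" | "STARTED" | "COMPLETED";

// Durations from the article: 1h with a tutor, 30min solo.
const SESSION_MINUTES: Record<SessionKind, number> = {
  SUPERVISED: 60,
  AUTONOMOUS: 30,
};

// Assumed linear lifecycle: SCHEDULED → STARTED → COMPLETED.
const NEXT_STATUS: Record<SessionStatus, SessionStatus | null> = {
  SCHEDULED: "STARTED",
  STARTED: "COMPLETED",
  COMPLETED: null,
};

function advanceSession(current: SessionStatus): SessionStatus {
  const next = NEXT_STATUS[current];
  if (next === null) {
    // Fail loudly instead of silently returning the same status.
    throw new Error(`Session is already ${current}; no further transition`);
  }
  return next;
}
```

This is the kind of knowledge an assistant cannot infer from a repository method signature alone, which is what the knowledge base was for.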

November 4 (v0.14.0): MCP Serena integration. Persistent memory across sessions. The AI could now remember architectural decisions made in previous sessions without restating them every time.

November 21 (v0.15.6): CLAUDE.md optimized, 50% token reduction. The file had grown organically and accumulated noise. First deliberate compression pass.

By Phase 2 the mental model had shifted: AI configuration as onboarding documentation, the kind you’d write for a new senior hire. Business rules, conventions, architectural decisions, the “why” behind the patterns.

Phase 3: Config as Infrastructure (Jan, 138 commits, 27.3%)

January 2026 is the first turning point. The configuration stops being a file and becomes a system, faster than any planned migration would have allowed.

January 6, 2026. Commit 1421e863. 86 files created in a single day.

Jan 06 ─── 12 agents + 5 hooks + settings.json ─── Big Bang (86 files, 1 day)
Jan 09 ─── .claude/rules/ ──────────────────────── Guardrails formalized (21 files)
Jan 16 ─── grepai MCP ──────────────────────────── Semantic code search
Jan 19 ─── Pre-push security hooks ─────────────── Defense at push level
Jan 26 ─── Tasks API (450 lines doc) ────────────── Multi-session management
Jan 26 ─── Perplexity MCP, Jam.dev MCP ──────────── Expanded context sources
Jan 29 ─── SonarQube MCP ───────────────────────── Real-time quality analysis
Jan 29 ─── 283 tests added ─────────────────────── TDD enforcement in practice
Jan 29 ─── RTK enforcement hook ─────────────────── Token optimization mandatory

What triggered the explosion was team growth. Augustin was joining, and the configuration that worked for one developer (me, on macOS, with a specific workflow) now had to cover multiple people on different setups, tools, and levels of experience. A single monolithic CLAUDE.md couldn’t absorb that, so a system had to.

Skills: 12 agents and the first skills created on January 6, growing to 36 skills today. Loaded on-demand rather than burning context permanently. TDD methodology, security playbooks, database patterns, accessibility rules, all available on trigger and silent otherwise.

Hooks: 5 hooks created on January 6 in a single commit: dangerous-actions-blocker.sh, security-gate.sh, activity-logger.sh, auto-format.sh, notification.sh. Pre-push security checks (January 19) and token optimization enforcement (January 29, RTK mandatory for all CLI operations) followed later in the month. All of them run automatically, without requiring the developer to remember to run them. For a full breakdown of hook events and what each can deterministically block, see Claude Code Under the Hood.
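The actual hooks are shell scripts and their contents aren't reproduced here. As a rough illustration of the deny-list idea behind a "dangerous actions" blocker, here is a sketch in TypeScript. The patterns, the `HookVerdict` shape, and `checkCommand` are all assumptions, and the wiring into the hook runner is left out:

```typescript
// Hypothetical deny-list check in the spirit of a "dangerous actions" hook.
// These patterns are examples, not the contents of dangerous-actions-blocker.sh.
interface HookVerdict {
  allowed: boolean;
  reason?: string;
}

const DENY_PATTERNS: { pattern: RegExp; reason: string }[] = [
  { pattern: /\brm\s+-(rf|fr)\b/, reason: "recursive force delete" },
  // Deliberately conservative: also matches --force-with-lease.
  { pattern: /\bgit\s+push\b.*--force\b/, reason: "force push" },
  { pattern: /\bdrop\s+(table|database)\b/i, reason: "destructive SQL" },
];

function checkCommand(command: string): HookVerdict {
  for (const { pattern, reason } of DENY_PATTERNS) {
    if (pattern.test(command)) return { allowed: false, reason };
  }
  return { allowed: true };
}
```

The point of a hook over a written rule is determinism: the check runs on every command, whether or not anyone remembers it exists.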

Rules: First 3 rule files on January 9, growing to 34 today. Guardrails that fire during coding sessions. Silent catches, hidden fallbacks, and unvalidated nullable access are out. A failing test before implementation code is mandatory, hence the blunt version of the rule: “Write code before the test? Delete it and start over.”
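To make those guardrails concrete, here is a sketch of the shape they push code toward. This is an illustration, not one of the project's rule files; `ParseResult` and `parseSessionMinutes` are hypothetical names:

```typescript
// Illustrative only: no silent catch, no hidden fallback, no unvalidated nullable access.
type ParseResult =
  | { ok: true; minutes: number }
  | { ok: false; reason: string };

// Banned shape (kept as a comment): the error vanishes and a fallback hides it.
//   try { return JSON.parse(raw).minutes; } catch { return 30; }

function parseSessionMinutes(raw: string): ParseResult {
  let payload: unknown;
  try {
    payload = JSON.parse(raw);
  } catch (err) {
    // The failure is surfaced to the caller, not swallowed.
    return { ok: false, reason: `invalid JSON: ${(err as Error).message}` };
  }
  // Validate the possibly-null value before reading from it.
  if (typeof payload !== "object" || payload === null) {
    return { ok: false, reason: "payload is not an object" };
  }
  const minutes = (payload as { minutes?: unknown }).minutes;
  if (typeof minutes !== "number" || !Number.isFinite(minutes)) {
    return { ok: false, reason: "minutes is missing or not a number" };
  }
  return { ok: true, minutes };
}
```

The caller has to handle the `ok: false` branch explicitly, which is exactly what a silent `catch` lets you skip.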

6 MCP servers in place by the end of January, most of them integrated within 3 weeks: Serena (persistent memory, running since November), grepai (semantic code search), Perplexity (web search with citations), Jam.dev (bug recording), SonarQube (code quality), Postgres read-only (direct production queries for context).

By Phase 3, AI configuration has its own PRs, its own review process, and a real maintenance burden. It’s infrastructure in practice. You optimize it, test it, and measure the impact when changes land.

Phase 4: Config as Engineering Practice (Feb, 81 commits, 16.0%)

81 commits in 28 days = 2.9 commits per day on AI configuration alone.

Feb 03 ─── .cursor/ config ──────────────── Cursor support (Augustin)
Feb 05 ─── profiles/ + modules/ YAML ────── Modular system
Feb 09 ─── Zod validation + CI ──────────── Config has tests
Feb 11 ─── Cross-editor sync ────────────── Claude + Cursor synchronized
Feb 13 ─── Memory compression -6.2K tokens ─ Ongoing optimization

February 5 (PR #598): The modular system. Instead of one CLAUDE.md that everyone reads, a generation pipeline:

5 YAML profiles (one per developer)
14 modules (composable content blocks)
A TypeScript pipeline that assembles them with Zod validation
Generated outputs: CLAUDE.md (703 lines for Florian, Claude Code, all modules) and .cursorrules (289 lines for Augustin, Cursor, minimal modules)

February 9 (PR #614): The generated outputs have their own tests. The pipeline validates that no placeholder remains unresolved. The CI catches configuration regressions.
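The real pipeline uses Zod and isn't reproduced here. As a dependency-free sketch of the two ideas, assembling modules per profile and failing when a placeholder survives, something like the following would do. The `{{name}}` placeholder syntax, the field names, and both function names are assumptions:

```typescript
// Hypothetical sketch of a profile → modules → output assembly step.
interface ConfigProfile {
  developer: string;
  editor: "claude-code" | "cursor";
  moduleIds: string[];
}

function assembleConfig(
  profile: ConfigProfile,
  modules: ReadonlyMap<string, string>,
  variables: ReadonlyMap<string, string>,
): string {
  const body = profile.moduleIds
    .map((id) => {
      const content = modules.get(id);
      if (content === undefined) throw new Error(`Unknown module: ${id}`);
      return content;
    })
    .join("\n\n");
  // Substitute known variables; unknown placeholders stay in place
  // so the check below can report them.
  return body.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    variables.get(key) ?? match,
  );
}

// Returns each distinct placeholder still present in the generated output.
function findUnresolvedPlaceholders(output: string): string[] {
  const found = output.match(/\{\{\w+\}\}/g);
  return found === null ? [] : Array.from(new Set(found));
}
```

A CI step can then fail the build whenever `findUnresolvedPlaceholders` returns a non-empty list for any generated file.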

The AI configuration now has the properties we expect from production code:

Version controlled (source files, not generated outputs)
Validated (Zod schema with 7 fields, 5 valid tone values, 6 valid feature modules)
Tested (pipeline tests, CI checks)
Reviewed (PRs for configuration changes, same process as feature PRs)
Documented (450 lines of documentation for the Tasks API alone)

Phase 5: Config as Compound Engineering (Mar, 116 commits, 22.9%)

March 2026 produced more AI config commits than January, which wasn’t in any plan.

Mar 04 ─── ACE pipeline + 12 ADRs ──────── Commit 3fc8c14f (43 files)
Mar 04 ─── Compound engineering patterns ── Architecture decisions codified
Mar ────── multi-agent-coordination.md ──── Agent orchestration rules
Mar ────── research-output.md ───────────── Structured research protocol
Mar ────── retex-review.md ──────────────── Post-task retrospective system

Commit 3fc8c14f: the complete ACE pipeline, 12 Architecture Decision Records, compound engineering patterns, 43 files in one commit. The configuration had matured to the point where it could start encoding how to evolve itself.

Phase 5 goes past “more rules”. The configuration starts capturing meta-patterns: how to coordinate agents, how to structure research before implementation, how to extract learnings after each task. The system began codifying its own methodology.

April 2026: The Context Diet

The next inflection wasn’t a spike in commits. It was a deliberate reduction.

By April 2026, the configuration had grown to 23 always-on rules loaded into every Claude session, regardless of what you were working on. 2,518 lines of guardrails firing unconditionally, whether you were touching a React component or a Prisma migration. The system worked, but it was eating 14% of a 200K context window before any code was read.

The branch fix-improve-context ran in 5 phases over a single week.

Phase 0, baseline. I built a scoring script (scripts/ai/score-ai-context.ts) before touching anything, because “better” needed to mean something measurable. 85/100, grade A. That’s the number you optimize from, not a vague intuition.

Phase 1, triage. Every rule classified against one question: does Claude need this without being asked? Three categories emerged. CONSTRAINT means always-on, non-negotiable. PROCEDURE means step-by-step workflows that load on demand. HYBRID is a short directive plus a long protocol. Of 23 always-on rules, 13 turned out to be procedures or hybrids masquerading as constraints.
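The triage itself was a manual judgment call per rule, but its outcome can be sketched as a simple partition. The three category names come from the paragraph above; the types, field names, and `planTriage` are illustrative:

```typescript
// Hypothetical triage model: only the category names come from the article.
type RuleCategory = "CONSTRAINT" | "PROCEDURE" | "HYBRID";

interface TriagedRule {
  file: string;
  category: RuleCategory;
}

interface TriagePlan {
  keepAlwaysOn: string[]; // CONSTRAINT: stays loaded in every session
  moveOnDemand: string[]; // PROCEDURE: workflow that loads on demand
  splitInTwo: string[];   // HYBRID: short directive stays, long protocol moves out
}

function planTriage(rules: TriagedRule[]): TriagePlan {
  const plan: TriagePlan = { keepAlwaysOn: [], moveOnDemand: [], splitInTwo: [] };
  for (const rule of rules) {
    switch (rule.category) {
      case "CONSTRAINT":
        plan.keepAlwaysOn.push(rule.file);
        break;
      case "PROCEDURE":
        plan.moveOnDemand.push(rule.file);
        break;
      case "HYBRID":
        plan.splitInTwo.push(rule.file);
        break;
    }
  }
  return plan;
}
```

Writing the categories down as data also makes the question "does Claude need this without being asked?" reviewable in a PR instead of living in someone's head.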

Phase 2, extraction. Nine procedural rules converted to skills: code-duplication.md became /tech:dupes, defensive-code-audit.md became /tech:audit, implementation-checklist.md became /tech:checklist. Each always-on version got replaced by a 5-to-15 line stub keeping the directive, while the protocol itself moved on-demand. A smart-suggest.sh pattern was added to every extracted skill so Claude surfaces it when relevant context appears, without loading it permanently.

Phase 3, compression. Rules that belonged in always-on but were too verbose got trimmed. rtk-enforcement.md disappeared entirely since RTK already lives in the global CLAUDE.md, making the per-project copy redundant. debugging-methodology.md dropped from 112L to 43L, with three more rules cut alongside it. Net result, 2,518L down to 646L in always-on context, a 74% reduction.

Phase 3+, Cursor parity. The .cursorrules file had grown into a 703L monolith. It was converted to a 133L stub (metadata plus pointers), with 23 .cursor/rules/*.mdc path-scoped rules carrying the real content. Cursor now loads rules only when file paths match, which gives it the same context economy as the Claude side.
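Path scoping can be pictured as a glob match between the file being edited and each rule's declared patterns. The sketch below is a simplified stand-in: it supports only `*` and `**`, the rule file names are made up, and Cursor's own matcher may behave differently:

```typescript
// Minimal glob matcher: `**/` spans zero or more directories,
// `**` matches anything, `*` stays within one path segment.
function globToRegExp(glob: string): RegExp {
  const source = glob.replace(/\*\*\/|\*\*|\*|[.+?^${}()|[\]\\]/g, (token) => {
    if (token === "**/") return "(?:.*/)?";
    if (token === "**") return ".*";
    if (token === "*") return "[^/]*";
    return `\\${token}`; // escape regex metacharacters
  });
  return new RegExp(`^${source}$`);
}

// Hypothetical shape for a path-scoped rule.
interface ScopedRule {
  file: string;
  globs: string[];
}

// Returns the rule files whose globs match the given file path.
function rulesForPath(filePath: string, rules: ScopedRule[]): string[] {
  return rules
    .filter((rule) => rule.globs.some((glob) => globToRegExp(glob).test(filePath)))
    .map((rule) => rule.file);
}
```

With a rule scoped to `src/**/*.tsx` and another to `prisma/**`, editing a migration would load only the second one, which is where the savings come from.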

Phase 4, machine-readable index. Three new generated files landed in machine-readable/:

ai-config.yaml (~270L): structured index of all 27 rules, 36 skills, 59 commands, 16 agents, 15 hooks, module list, profile list. One @ reference answers “what’s available?” in a new session without grep.
llms.txt (~50L): standard llms.txt format for LLM crawlers and context injection.
llms-full.txt (~4,500L): full content concat of all rules, modules, and skeletons, for offline/no-tools fallback.

All three generated by pnpm ai:sync. Three new canary checks (C18-C20) were added, worth +10pts to the quality scorer.

The final numbers:

Always-on context:   2,518L → 646L       (-74%)
.cursorrules:          703L → 133L        (-81%)
ai:score:              85/100 → 125/145   (+B grade, +10pts machine-readable)
Canary checks:         17/17  → 20/20
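The two percentage reductions above follow from the line counts directly. A small sketch for recomputing them (the function name is illustrative):

```typescript
// Percentage reduction between two line counts, rounded to the nearest whole percent.
function reductionPercent(before: number, after: number): number {
  if (before <= 0) throw new RangeError("before must be positive");
  return Math.round(((before - after) / before) * 100);
}

console.log(reductionPercent(2518, 646)); // 74 (always-on context)
console.log(reductionPercent(703, 133));  // 81 (.cursorrules)
```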

Same pattern that shaped the January and March spikes. Friction made the cost visible, measurement made the improvement verifiable, and the tooling (scoring script, canary checks, generated index) catches the next drift before it compounds. The grade matters less than the maintenance loop underneath. Once that loop runs, the next accumulation doesn’t get the chance to build into a cleanup sprint.

Why two spikes, not one gradual curve

There’s no planning failure here. You can’t write the rules for problems you haven’t hit yet.

Day one, you don’t know what your AI configuration needs to be. You find out by running into the friction.

The January spike was team-driven. Rules that prevent silent catches showed up after we caught AI-generated code doing exactly that. The TDD enforcement rule landed after a stretch of shipping code without tests. The profile system appeared once a second developer with a different setup needed the configuration that had been tuned for one person. Guardrails encode lessons, and you can’t write those down until you’ve hit the problem they’re solving.

The March spike was scale-driven: not new people joining, but the system itself becoming complex enough to require architectural governance. The 12 ADRs codified decisions that had been made implicitly over months. Compound engineering patterns emerged from observing what worked across 1,100 commits. The configuration caught up with the maturity of the codebase.

Team size shapes the investment curve too. Phase 1-2 configuration works fine for a solo developer. Phase 3-4 becomes necessary once someone else joins and the monolithic setup stops fitting their machine or context. Collaboration is what eventually forces the systematization, while solo use rarely asks for it.

What to take from this

If I were starting from scratch today, I wouldn’t try to skip to Phase 3. Those abstractions encode lessons I didn’t have yet. The rules describe problems that hadn’t happened. Start with a CLAUDE.md that honestly reflects what you know: stack, conventions, the business domain you’ve figured out so far. The rest shows up when it needs to. That climb from a plain file to versioned, tested infrastructure is the ladder the six-level context engineering guide lays out one level at a time. This article is that ladder playing out in one repo over nine months; the guide is the version without the commit archaeology.

The thing I’d track earlier: hook execution rates and token counts. Not because early investment pays off faster, but because “better” without a score is just a feeling. 703 lines vs 289 lines is a real cost difference. Once I had the scoring script, optimization became verifiable instead of intuitive. I later built ccboard to watch exactly that, hook activity, per-session cost, and MCP health in one view, which is the instrumentation I wish I’d had in this phase.

The 77% back-loaded distribution isn’t a planning failure. It’s what an honest investment curve looks like when you’re building a system while using it at the same time. You can’t know what the system needs to be until the friction shows you.

The real number is this: 8.6% of all commits in this repository touch AI configuration. On a 5,820-commit project, that’s 506 commits dedicated to improving how the AI works alongside the humans. More than most business logic files received. If that feels like a lot, consider that the alternative is a stale CLAUDE.md and an AI that drifts from the codebase it’s supposed to understand. The factual half of that drift, the paths and scripts and version numbers that quietly stop matching the code, is what ctxharness checks automatically, so the stale-CLAUDE.md case gets caught instead of noticed months later.

This dataset gives a concrete answer on timing: later than any upfront plan would suggest, and then much faster than expected once the team’s complexity makes it unavoidable.

Beyond the two-person team

This dataset is one project, two developers, one stack. The friction that shaped the 506 commits was scoped to that size: a second dev joining, one monorepo to cover, one set of conventions to encode.

At organizational scale, the friction shifts shape. Across multiple repos, the same rule ends up duplicated in N CLAUDE.md files and drifts the moment someone updates one without the others. Across stacks, a rule about TypeScript Result types doesn’t translate to a Rust repo using anyhow::Result. And once multiple teams are in the mix, profiles proliferate: not 5 YAML profiles but 50, most of that variance with legitimate reasons behind it.

The pattern likely to emerge: a dedicated AI config repository as the source of truth, consumed by downstream projects. A few shapes this can take:

Git submodule: the config repo is pinned at a commit, reproducible across machines, stack-agnostic. The cost is submodule UX, which is real but manageable with a wrapper script.
Published package (npm, cargo, pip): versioned with semver, distributed through existing release workflows. Works well when your org is stack-homogeneous, less well when a TypeScript team and a Rust team both need the same base rules. It’s the shape I used for a narrower slice of the same problem, reusable hooks and templates rather than full org config: claude-code-plugins ships 181 of them as installable plugins, versioned and pulled in per project.
Shared config CLI that fetches + assembles (think Terraform modules or a dotfiles manager): purpose-built, cross-stack, but it’s another tool to maintain.

Whichever shape, the mental model is the same: org modules + team modules + project modules, composed by a pipeline. The modular system that emerged at Phase 4 for one project is the same shape, scaled up one level.
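I haven't built this, so the following is only a sketch of what "composed by a pipeline" could mean at its simplest: three layers merged by module id, with the more specific layer winning. The precedence order, the output format, and all names are assumptions:

```typescript
// Hypothetical layering: later layers override earlier ones by module id.
type ModuleLayer = Record<string, string>;

function composeLayers(org: ModuleLayer, team: ModuleLayer, project: ModuleLayer): string {
  // Assumed precedence: project overrides team, team overrides org.
  const merged: ModuleLayer = { ...org, ...team, ...project };
  return Object.keys(merged)
    .sort() // deterministic output regardless of layer order
    .map((id) => `## ${id}\n${merged[id]}`)
    .join("\n\n");
}
```

Sorting the module ids keeps the generated file stable across machines, so a diff in the output always means a real change in one of the layers.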

I haven’t run this setup myself. The projection rests on the principle that made Phases 3-4 work: systematize when collaboration demands it, not before. At org scale, that demand jumps from repo-internal to cross-repo. The investment curve likely mirrors what happened here, a long slow accumulation followed by a spike once a second repo needs the same rules that had been tuned for one.

If you’re in that situation, the prediction is concrete: you’ll resist the central repo for months because your team-level configs feel fine, and then you’ll build it in two weeks because version drift across four repos became impossible to manage.

If you’ve already hit that inflection point at org scale, I’d be curious to hear what shape it took.

Single project (this article)        → CLAUDE.md + .claude/ scoped to one repo
Multi-repo org (logical next step)    → Shared AI config repo + per-project override

The bottom line

AI configuration follows a predictable arc: it starts as a single file someone maintains informally, and it ends up as a system with tests, a generation pipeline, and its own PR review process. The 506 commits on this one project aren’t an outlier; they’re what happens when you take the tooling seriously and measure it. Each phase emerged from friction, not from a plan: the rules that exist today encode problems that had to be hit first. If you’re starting out, don’t try to skip ahead. Ship a CLAUDE.md that reflects what you actually know, instrument it early (hook execution rates, token counts, a scoring script), and let the friction show you what comes next. The investment curve will back-load itself whether you want it to or not.

The modular AI instruction system described here (YAML profiles, modules, generation pipeline with Zod validation) is documented in the Claude Code Ultimate Guide. Source data comes from the Méthode Aristote repository git history (5,820 commits, v0.1.0 → current). The archaeology command: `git log --all --format="%ad" --date=format:"%Y-%m" -- CLAUDE.md .claude/ | sort | uniq -c`.
