Context Engineering P4: The New AI Job Titles

Context Engineering, a six-part series. You’re reading part 4: the roles. Part 1: the science · Part 2: the discipline · Part 3: the tooling · Part 4: the roles (you are here) · Part 5: the team · Part 6: the portability


The shift: Prompt engineer was wave one. Context engineer is the role this series describes.
The cluster: Five jobs that didn’t exist three years ago, now on real job boards.
Harness engineer: Fights AI entropy, meaning code that looks fine per file but drifts as a system.
Spec engineer: AI made specs more important, not less. 19.4% pass@1 without one.
Agent identity architect: 77% of orgs have no identity strategy for their agents.

The first three parts of this series were about the work: why context degrades, how to keep a configuration honest, which tools cut the cost. This one is about the people. The AI wave didn’t only create tools. It created jobs that didn’t exist three years ago, and it quietly rewrote what the existing ones mean.

Start with the obvious one. “Context engineer” is now a title companies hire for and one of the fastest-growing specializations of 2025. Current US job-posting estimates put pay at roughly $140K to $230K at the mid-to-senior end. It’s the role the first three articles describe: someone who designs the systems that give a model the right information at the right time, rather than crafting one good prompt. If you’ve read this far, you’ve been reading a job description. The guide keeps the full roster, with required skills, entry paths and salary bands, at cc.bruniaux.com/roles.

The more interesting story is the cluster of roles that formed around it.

From prompt engineer to context engineer

Andrej Karpathy coined “vibe coding” in early 2025, then got behind a more precise frame for the serious version of the work: context engineering, a term Tobi Lütke had just put into circulation. Philipp Schmid at Google put it cleanly: context engineering is providing the right information and tools, in the right format, at the right time.

That name marks a shift in the seniority of the thinking. Prompt engineering, the first wave from 2022 and 2023, is consolidating. As a standalone title it’s contracting, surviving mostly in high-stakes prompt domains like legal and medical compliance. The skill didn’t disappear. It got folded into a role with a wider scope, because crafting one instruction turned out to be the easy part, and designing the system around it turned out to be the job. That distinction is the entire premise of part one of this series, seen now from the hiring side.

Figure: Prompt engineering was wave one and is consolidating. Context engineering is the role this series describes, and it is hiring.

Where the demand actually sits

Two axes organize the new roles: how close you sit to the model, and how close you sit to production. Researchers training and red-teaming models sit on one end. Engineers building infrastructure around them sit on the other. The split that matters for most people is the second one. The bulk of hiring is not in the pure research quadrant, which stays competitive and specialized. It’s in the applied, product-facing work: building AI systems that ship and stay reliable once they do.

That’s where the new titles cluster, and where the demand is loudest.

Figure: a two-axis map of AI roles, proximity to the model versus proximity to production. The bulk of hiring sits in the applied, product-facing quadrant.

The jobs that didn’t exist three years ago

Four roles are worth knowing by name. Each answers a problem that didn’t exist before agents reached production.

The harness engineer. Martin Fowler put a name to this one, and the idea now has its own survey literature (arXiv:2605.18747, “Code as Agent Harness”, May 2026). The problem it solves is counter-intuitive. The risk with AI in a codebase isn’t that agents write bad code, it’s that they write code that looks fine file by file and drifts as a system. Run enough agents and the architecture erodes without any single change being wrong. The harness engineer builds the infrastructure that holds the line: watchdogs, architectural linters, the machinery that keeps a fleet of agents producing something a human can still maintain. As Fowler put it, a raw model is not an agent, it becomes one when connected to a harness. The title is not standardized yet. Right now it’s absorbed into platform and staff engineering roles, which is exactly where pioneering titles live before they get their own line on a job board.

Figure: The harness engineer fights AI entropy: code that looks fine file by file but erodes the architecture once enough agents touch it.
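To make "architectural linter" concrete, here is a minimal sketch in Python of one kind of check a harness might run. It is an illustration, not a description of any particular team's tooling. The layer names and the `ALLOWED_IMPORTS` rule table are hypothetical, and the check only looks at absolute imports.

```python
import ast

# Hypothetical layering rule: each layer may import only from the layers listed.
ALLOWED_IMPORTS = {
    "domain": set(),
    "services": {"domain"},
    "api": {"services", "domain"},
}


def top_package(module_name: str) -> str:
    """Return the first dotted component, used here as the layer name."""
    return module_name.split(".")[0]


def find_violations(sources: dict[str, str]) -> list[tuple[str, str]]:
    """Return (module, imported_layer) pairs that break ALLOWED_IMPORTS.

    `sources` maps a dotted module name to its source code.
    """
    violations = []
    for module, code in sorted(sources.items()):
        layer = top_package(module)
        allowed = ALLOWED_IMPORTS.get(layer, set())
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                targets = [node.module]
            else:
                continue
            for target in targets:
                imported = top_package(target)
                # Only imports between known layers are checked; stdlib and
                # third-party imports are ignored in this sketch.
                if imported in ALLOWED_IMPORTS and imported != layer and imported not in allowed:
                    violations.append((module, imported))
    return violations
```

Each file in this example could pass review on its own. The violation only shows up when the import is compared against a system-level rule, which is the kind of drift the harness is meant to catch. In practice a check like this would run in CI against every agent-authored change.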

The spec engineer. Everyone assumed AI would kill the specification. Why write requirements if the model can just write the code? The opposite happened. Agents working without a spec do poorly once a task spans more than one file, with reported multi-file pass@1 rates falling below 20%. In one reported Slack-clone build, independent validator agents reading the spec alone surfaced dozens of issues, roughly a third of the implementation work, before a line shipped. The spec became the diff-able ground truth that catches an agent when its implementation wanders. So a role grew up around writing specs precise enough for an agent, readable enough for a product manager, and stable enough to serve as the reference both trust. Pay runs an estimated $90K to $250K, often embedded in a team rather than standalone.

Figure: Specs got more important, not less. Without one, agents drop to 19.4% pass@1 on multi-file tasks in reported benchmarks.
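For readers unfamiliar with the metric, pass@1 is the share of tasks an agent solves on its first attempt. The sketch below uses the standard unbiased pass@k estimator from code-generation benchmarks. Whether the figure quoted above was computed exactly this way is an assumption, and the sample results are hypothetical numbers chosen only to show the arithmetic.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: attempts sampled, c: attempts that passed, k: attempt budget.
    For k = 1 this reduces to c / n.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over tasks; each entry is (attempts, passing_attempts)."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical run: five multi-file tasks, one attempt each, one success.
hypothetical_results = [(1, 0), (1, 1), (1, 0), (1, 0), (1, 0)]
score = benchmark_pass_at_1(hypothetical_results)  # one task in five passes
```

A score near 0.2 means roughly four in five tasks fail on the first try, which is the scale of the gap the spec is meant to close.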

The agent identity architect. This one was born from a documented attack. Simon Willison named the “lethal trifecta”: give an agent access to private data, expose it to untrusted content, and let it communicate externally, and you have an exfiltration path. The role exists because companies handed agents the keys before anyone built the locks. The numbers are stark: in a 2025 Cloud Security Alliance survey, 77% of organizations had no formal identity strategy for their agents, and 44% still authenticated them with static API keys shared across sessions. The agent identity architect designs how agents authenticate, what they’re scoped to do, and how privilege escalation gets blocked when one agent chains tool calls into another. On IFTTD, Guillaume Lours described one concrete implementation for high-autonomy agents: a micro-VM with network-level credential injection, where the agent receives credentials via a man-in-the-middle proxy and never sees them directly (ep 360). The agent is constrained not by its own policy but by what the infrastructure is willing to hand it. It’s a senior-only role demanding real IAM depth, with pay at an estimated $170K to $340K, and the guide that tracks it flags the gap as critical rather than emerging.

Figure: Born from a documented threat, the lethal trifecta. 77% of organizations have no identity strategy for their agents, and 44% still use static API keys.
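The lethal trifecta is simple enough to express as a configuration audit. The following is a minimal sketch, assuming agent permissions are already described as three boolean capabilities. The `AgentScope` type, its field names and the example agents are hypothetical, and a real audit would derive these flags from actual tool and network grants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical summary of what one agent is allowed to do."""
    name: str
    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool


def has_lethal_trifecta(scope: AgentScope) -> bool:
    """True when all three conditions for an exfiltration path are present."""
    return (
        scope.reads_private_data
        and scope.sees_untrusted_content
        and scope.can_communicate_externally
    )


def audit(scopes: list[AgentScope]) -> list[str]:
    """Return the names of agents whose scope combines all three capabilities."""
    return [scope.name for scope in scopes if has_lethal_trifecta(scope)]


# Hypothetical fleet: only the first agent combines all three capabilities.
fleet = [
    AgentScope("support-bot", True, True, True),
    AgentScope("code-reviewer", True, True, False),
    AgentScope("web-summarizer", False, True, True),
]
flagged = audit(fleet)
```

Removing any one of the three capabilities breaks the path, which is why the check is a conjunction. A declared scope is only useful if the infrastructure enforces it, as in the credential-injection setup described above.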

The AI eval engineer. Production agents need a third measurement layer, because the first two don’t hold. Human review doesn’t scale: in production, the large majority of agent permission requests get approved without real scrutiny. Using an LLM as the judge doesn’t hold either, since LLM judges show measurable style bias and weak true-negative rates. So a role emerged to build continuous, structured measurement that catches whether a system is improving or silently degrading. The pattern that works is creator-verifier, an independent agent checking another agent’s output, which studies consistently find beats self-verification. On IFTTD, Louis Pinsard argued for deploying LLM-as-judge asynchronously rather than synchronously: synchronous judgment adds latency for every user to catch failures that only occur in a minority of outputs, while async judgment scales the measurement without blocking the pipeline (ep 338). Pay sits at an estimated $110K to $290K, and demand climbs as more agentic systems reach production. This is the same context-rot problem from part one, measured at the level of a whole system instead of a single window.

Figure: Human review does not scale and the LLM judge is biased. The AI eval engineer builds the third, structured measurement layer.
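The asynchronous pattern can be sketched with a queue and a background worker. This is one possible shape under stated assumptions, not the implementation discussed in the episode. The `judge` function is a placeholder rule standing in for a real LLM-as-judge call, and `respond` stands in for the agent itself.

```python
import asyncio


async def judge(output: str) -> bool:
    """Placeholder for an LLM-as-judge call; a trivial rule is used here."""
    await asyncio.sleep(0)  # stands in for awaiting a remote model call
    return "ERROR" not in output


async def judge_worker(queue: asyncio.Queue, failures: list[str]) -> None:
    """Consume outputs in the background and record the ones the judge rejects."""
    while True:
        output = await queue.get()
        try:
            if not await judge(output):
                failures.append(output)
        finally:
            queue.task_done()


async def respond(prompt: str, queue: asyncio.Queue) -> str:
    """Return the answer immediately; judging happens off the request path."""
    output = f"answer to {prompt}"  # stand-in for the agent's real output
    queue.put_nowait(output)        # enqueue for judging, do not wait for the verdict
    return output


async def main(prompts: list[str]) -> tuple[list[str], list[str]]:
    queue: asyncio.Queue = asyncio.Queue()
    failures: list[str] = []
    worker = asyncio.create_task(judge_worker(queue, failures))
    outputs = [await respond(prompt, queue) for prompt in prompts]
    await queue.join()  # in this demo, wait for the backlog before reporting
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return outputs, failures
```

Running `asyncio.run(main(["hello", "ERROR case"]))` returns both answers along with the one output the placeholder judge rejected. The user-facing call never awaits the judge, so its latency does not depend on how long judging takes.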

What happened to the role you already have

Most people won’t take one of those titles. They’ll keep the one they have and grow a new layer underneath it.

The clearest case is the plain software engineer becoming an AI engineer, the role with the largest volume of openings and pay at an estimated $160K to $300K at mid-to-senior level. The work is recognizable: build software, but with LLM integration, evaluation, and product judgment about model limits added to the stack. This is not a researcher who trains models, and not a pure integrator either.

Two phrases are worth not mistaking for jobs. “AI-native engineer” is not a title, it’s a baseline, a quality increasingly expected of everyone rather than a role you apply for. “Vibe coder” is a method, not a job, and no serious company puts it on a requisition. The honest read is that the AI wave is less a wave of new hats and more a rising floor under the hats people already wear.

The exception is the early stage. The founding AI engineer is a genuine hybrid, part engineer, part product owner, part technical co-founder, typically aimed at people with 0 to 4 years of experience who are comfortable owning a feature end to end and already use these tools daily. The leverage is large in both directions, which is the whole appeal and the whole risk.

What this means if you’re the one reading it

The bar is lower than it looks. The same surveys that show agents flooding into production also show how few teams have the basics in place: no identity strategy, no evaluation layer, configs generated by the very models they’re meant to guide. The field is hiring for skills most of the field doesn’t have yet. On IFTTD, Antonio Goncalves put the minimum threshold concretely: knowing the vocabulary (RAG, MCP) has become a baseline expectation for 2026 engineering roles, not a specialization (ep 357). If you’ve read this series, you are past that floor. If you’ve actually shipped something and measured it, you’re already ahead of most of the people who say they can.

Figure: a mapping of current roles to their natural next AI role (QA to spec engineer, security to agent identity architect, senior systems thinker to harness engineer). The bar is lower than it looks if you have actually shipped.

The path is more concrete than the titles suggest. Build something with the APIs, write about what you learned, add evaluation so you can show numbers rather than vibes, then apply. Where you start maps cleanly onto where you can go. A QA engineer or technical writer with an engineering background is most of the way to spec engineer. A cloud or IAM security engineer moving into AI is the natural agent identity architect. A senior engineer who already thinks in systems is the raw material for a harness engineer or an AI architect. Someone non-technical who uses AI every day has a real, if longer, route from prompt work into context engineering over six to eighteen months.

The first three parts of this series gave you the practice: the science, the discipline, the tooling. This part is the market that practice created. The same skill that keeps a model accurate over a long session, scoped right and measured honestly, is the one every role here is built on. These titles are the market putting a price on it.

Part five shows what that practice looks like when six developers share the same instruction system: profiles, a sync pipeline, CI gates that catch drift before it reaches production.

From the field, via IFTTD episode transcripts: Louis Pinsard, ep 338 on async LLM-as-judge in production; Guillaume Lours, ep 360 on credential isolation for autonomous agents; Antonio Goncalves, ep 357 on AI vocabulary as the 2026 baseline.

If you’ve moved into one of these roles, or watched your own job quietly turn into one of them, I’d like to hear which part of the description matched and which part the recruiters got wrong.