The Theater Is the Spec
This week I was designing the configuration model for Workstream's operating theaters - the isolated execution environments where autonomous agents do implementation work. The immediate problem was practical: there are repos Workstream can't touch. We work with client codebases we don't own, and committing a .workstream/config.yml to them isn't an option. The config has to live somewhere else.
That's where I expected the interesting problem to end. It didn't.
What I Thought I Was Designing
The initial framing was infrastructure. A theater needs to know how to bootstrap an environment: which repos to check out, how to install dependencies, how to run migrations. The config file encodes that. Whether it lives in the repo or in an external registry seemed like a deployment question, not an architectural one.
I started sketching a schema. Checkout topologies - single repo, dual checkout, mass update. Bootstrap lifecycle hooks - install, configure, migrate, seed, verify. Runtime sizing. Network policy. All reasonable. All still about boot sequences.
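That first-pass flat schema might have looked something like this. This is a sketch, not Workstream's actual format - every field name and command here is a hypothetical stand-in, with Laravel commands chosen only because that's the stack mentioned later:

```yaml
# Hypothetical flat config - one file per repo, all about boot sequences.
topology: dual-checkout          # single-repo | dual-checkout | mass-update
bootstrap:                       # lifecycle hooks, run in order
  install: composer install
  configure: cp .env.example .env
  migrate: php artisan migrate
  seed: php artisan db:seed
  verify: php artisan test --group=smoke
runtime:
  cpu: 2
  memory: 4Gi
network:
  egress: allowlist              # network policy
```

Reasonable, and entirely about how an environment boots - which is exactly the framing the next constraint broke.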
Then I ran into the "define once, reuse many times" constraint.
The Constraint That Reframed Everything
The requirement sounds simple: the same configuration should run against different branches, different PRs, different contributions, without modification. An engineer opens a PR against a client repo. A theater boots. The same configuration ran last week for a different PR, and will run next week for another. The only thing that changes is which branch to check out.
This forced a split I hadn't made explicit: what stays stable versus what varies per contribution.
What stays stable: checkout topology, bootstrap sequence, runtime sizing, network policy, ingress type. What varies: the branch, the ref, occasionally a secondary repo ref.
The config isn't a description of a specific environment. It's a description of a kind of work. The effort - the specific PR, the specific branch - supplies the parameters. The use case template supplies everything else.
Once I saw it that way, the schema changed. Instead of a flat config with a topology field, the model became named use cases:
feature: single checkout, standard Laravel bootstrap
client-feature: dual checkout, client + platform side by side
qa: single checkout, persistent app server, tunnel ingress
Each use case is a template. Each effort selects one and provides a branch.
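A registry entry for one codebase might then read as a map of named templates, with the effort reduced to a selection plus refs. All names below are hypothetical - a sketch of the shape, not the real schema:

```yaml
# Hypothetical registry entry for one client codebase.
use_cases:
  feature:
    checkout: single
    bootstrap: laravel-standard
  client-feature:
    checkout: dual               # client repo + platform side by side
    bootstrap: laravel-standard
  qa:
    checkout: single
    app_server: persistent
    ingress: tunnel              # live instance exposed to a reviewer
    commands:
      e2e: npx playwright test   # runs against the running app

# An effort selects a use case and supplies the variable part:
effort:
  use_case: client-feature
  ref: feature/new-invoice-flow
```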
The Spec in the Config
Here's what I didn't expect: the use case template is doing more than describing an environment. It's specifying a cognitive context.
A feature theater and a qa theater don't just have different boot sequences. They create different epistemic situations for the agent working inside them. The agent in a QA theater knows - from the config - that a running application will be exposed via a public tunnel, that a human reviewer is on the other end, that the e2e command runs Playwright against a live instance. That's not just infrastructure. That's a description of what kind of work this is and who else is involved.
The template encodes assumptions about the contribution context. It says: in this theater, you are doing this kind of thing, with these resources available, under these conditions. The effort parameters fill in the specific instance. But the shape of the work - the cognitive contract between the system and the agent - lives in the template.
This is why the external registry matters beyond the practical constraint of repos Workstream can't commit to. The registry is where Workstream records its understanding of how work happens in a given codebase. It's not just configuration storage. It's an external model of contribution context - maintained by operators, versioned separately from the code it describes, authoritative when the repo itself can't carry that knowledge.
A Familiar Shape
I've seen this structure before in Lux - the retrieval and routing system I use for context management. Lux separates the capability description of each expert (what it knows, what queries it handles well) from the specific query a caller is routing. The capability description is stable, authored once, updated only when the expert's scope changes. The query is ephemeral - it arrives, gets routed, and disappears.
The theater use case template is the same structure. Stable capability description. Variable instance. The agent in the theater is the expert. The effort is the query.
What's interesting about both cases is that the stable part - the template, the capability description - ends up being where the meaningful design work happens. It's where you're forced to articulate what a thing is rather than what it does right now.
What Remains Unresolved
The template encodes assumptions about how work happens. Assumptions drift.
If a client repo changes its bootstrap requirements - a new build step, a changed migration pattern - the external registry entry won't reflect it unless someone updates it. The verify hook might catch the drift at boot time if the smoke test fails. Or it might not, and the theater will start from a subtly wrong baseline.
This is a general problem for any system that maintains an external model of another system: the model and the reality diverge over time. I don't have a good answer yet. One option is to treat the template as always potentially stale and let the agent observe and correct at runtime - which trades configuration precision for agent autonomy. Another is to version the template against the repo at a point in time, which adds rigor but also overhead.
For now, the verify hook is the safety net. A theater that can't pass its own smoke test won't be handed to an agent. That's a coarse guard, but it's the right instinct: the configuration is only as good as the environment it claims to produce.
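In config terms, that guard amounts to making verify the final bootstrap hook and treating its failure as fatal. A minimal sketch, assuming the same hypothetical fields as above:

```yaml
# Hypothetical: verify runs last in the bootstrap sequence.
bootstrap:
  install: composer install
  migrate: php artisan migrate
  verify: php artisan test --group=smoke
# A theater whose verify hook fails is torn down, not handed to an agent.
on_verify_failure: abort
```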
The deeper question - who is responsible for keeping the external model of a codebase accurate, and how does that responsibility scale - is one I'll be thinking about as the theater system matures.