Model Your Task
Define states, transitions, and decision prompts
Break your process into a state machine. Each state is a decision point; each transition is a possible action. Write a decision prompt for each state that tells any specialist — human or AI — what to evaluate and what their options are.
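The state machine described above could be modeled as plain data. This is a minimal sketch with invented names (`State`, `Transition`, and the "triage" workflow are all illustrative, not part of the product):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    # One possible action out of a state (names are illustrative).
    name: str
    target_state: str

@dataclass
class State:
    # One decision point, with the prompt any specialist sees.
    name: str
    decision_prompt: str
    transitions: list[Transition] = field(default_factory=list)

# A toy review workflow, purely as an example.
states = {
    "triage": State(
        name="triage",
        decision_prompt="Given the ticket, decide: escalate, resolve, or request more info.",
        transitions=[
            Transition("escalate", "review"),
            Transition("resolve", "done"),
            Transition("request_info", "waiting"),
        ],
    ),
}

def allowed_transitions(state_name: str) -> list[str]:
    """List the actions a specialist may choose in this state."""
    return [t.name for t in states[state_name].transitions]
```

Because states and transitions are data rather than code, the same definition can drive both the human screens and the LLM prompts.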
Build Human Screens
Give humans the interface to operate the machine
Build screens for each state so humans can see the context, select the next transition, and provide metadata and reasoning for their choice. Every human decision becomes ground truth that the system learns from.
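Whatever the screens look like, each submission could reduce to a record like the one below. The field names are assumptions for illustration; the point is that transition, metadata, and reasoning are captured together as ground truth:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class HumanDecision:
    # One ground-truth record captured by a decision screen.
    session_id: str
    state: str
    chosen_transition: str
    metadata: dict
    reasoning: str
    decided_at: str

def record_decision(session_id, state, transition, metadata, reasoning):
    """Build a persistable record of a human choice; stored records become exemplars."""
    return asdict(HumanDecision(
        session_id=session_id,
        state=state,
        chosen_transition=transition,
        metadata=metadata,
        reasoning=reasoning,
        decided_at=datetime.now(timezone.utc).isoformat(),
    ))
```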
Build Prompt Strategies
Give LLMs the tools to participate
Create prompt strategies that collect recent session history and exemplars so LLMs can select the next transition as a tool call, providing metadata and reasoning as parameters. The prompts improve automatically as exemplars accumulate.
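One way a prompt strategy could work, sketched under assumptions: the prompt text is assembled from recent history and stored human exemplars, and the LLM answers by calling a `select_transition` tool (the tool name and schema shape are hypothetical, modeled on common tool-call APIs):

```python
def build_prompt(state, decision_prompt, history, exemplars, allowed):
    """Assemble a decision prompt from session history and stored exemplars."""
    lines = [f"State: {state}", f"Task: {decision_prompt}", "", "Recent session history:"]
    lines += [f"- {h}" for h in history[-5:]]      # last few events only
    lines.append("")
    lines.append("Exemplar decisions (human ground truth):")
    for ex in exemplars[:3]:                       # a handful is enough
        lines.append(f"- chose {ex['chosen_transition']}: {ex['reasoning']}")
    lines.append("")
    lines.append(f"Call select_transition with one of: {', '.join(allowed)}")
    return "\n".join(lines)

# Hypothetical tool schema the LLM calls to answer.
SELECT_TRANSITION_TOOL = {
    "name": "select_transition",
    "description": "Choose the next transition for the current state.",
    "parameters": {
        "type": "object",
        "properties": {
            "transition": {"type": "string"},
            "metadata": {"type": "object"},
            "reasoning": {"type": "string"},
        },
        "required": ["transition", "reasoning"],
    },
}
```

Since exemplars are pulled at prompt-build time, every new human decision automatically enriches the next prompt.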
Train the Human Team
Assemble operators and teach them the screens
Recruit the humans who will operate the state machine. Train them on the screens, the decision criteria, and how to provide clear reasoning. Their decisions are the standard everything else is measured against.
Assemble the LLM Group
Register AI specialists with their prompt strategies
Register a group of LLMs as proposers, each with a prompt strategy. Different models, different prompts, different approaches — all competing at the same decision points. More diversity means better coverage.
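A registry of proposers might look like this minimal sketch (class and proposer names are invented for illustration); each entry pairs a model with a prompt strategy and can later be disabled during pruning:

```python
class ProposerRegistry:
    """Registry of LLM specialists, each paired with a prompt strategy."""

    def __init__(self):
        self._proposers = {}

    def register(self, name, model, strategy):
        self._proposers[name] = {"model": model, "strategy": strategy, "enabled": True}

    def active(self):
        return [n for n, p in self._proposers.items() if p["enabled"]]

    def disable(self, name):
        self._proposers[name]["enabled"] = False

# Diverse proposers competing at the same decision points (names are examples).
registry = ProposerRegistry()
registry.register("fast-screener", model="model-a", strategy="history_only")
registry.register("careful-reviewer", model="model-b", strategy="history_plus_exemplars")
```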
Humans Operate, LLMs Shadow
Run sessions with humans in the lead
Humans begin operating sessions using the state machine and screens. In parallel, LLMs shadow every decision, submitting proposals to an arbiter. At this stage, every decision still requires human approval. The LLMs are learning, not deciding.
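The shadow phase could be sketched as follows: the arbiter logs every proposal against the human choice, but the executed transition is always the human's. The `Arbiter` class here is an illustrative sketch, not the product's implementation:

```python
class Arbiter:
    """Collects shadow proposals and records agreement once a human decides."""

    def __init__(self):
        self.log = []

    def shadow(self, decision_id, proposals, human_choice):
        # In shadow mode proposals are only scored; the human choice always wins.
        for proposer, proposed in proposals.items():
            self.log.append({
                "decision_id": decision_id,
                "proposer": proposer,
                "proposed": proposed,
                "human": human_choice,
                "agreed": proposed == human_choice,
            })
        return human_choice  # the executed transition is always the human's

arbiter = Arbiter()
executed = arbiter.shadow(
    "d1",
    proposals={"model-a": "escalate", "model-b": "resolve"},
    human_choice="escalate",
)
```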
Measure and Prune
The arbiter accumulates alignment data on each specialist
Over time, the arbiter tracks how often each LLM agrees with human decisions. Specialists that consistently disagree or add no signal are disabled. The group is gradually reduced to as few LLMs as possible — the ones that best predict what humans would choose.
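Alignment tracking and pruning could reduce to something like this sketch (the 0.8 alignment floor and 50-sample minimum are placeholder values, not the guide's actual criteria):

```python
def alignment_rate(log, proposer):
    """Fraction of decisions where this proposer matched the human choice."""
    rows = [r for r in log if r["proposer"] == proposer]
    if not rows:
        return 0.0
    return sum(r["agreed"] for r in rows) / len(rows)

def prune(log, proposers, min_alignment=0.8, min_samples=50):
    """Keep only proposers with enough history and high enough agreement."""
    kept = []
    for p in proposers:
        rows = [r for r in log if r["proposer"] == p]
        if len(rows) >= min_samples and alignment_rate(log, p) >= min_alignment:
            kept.append(p)
    return kept
```

Requiring a minimum sample count before pruning avoids disabling a proposer on a handful of unlucky decisions.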
LLMs Take Over
Consensus confidence passes the threshold
Iterate on the LLMs and prompt strategies until consensus confidence crosses the threshold you set. The LLMs gradually take over the process — making decisions autonomously while humans drop to occasional spot-checks. If alignment ever degrades, the system automatically reverts to human oversight.
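The takeover logic with its trip line could be sketched like this. Consensus confidence is treated here as the share of proposers agreeing on the modal transition, and the 0.9 threshold and 0.75 trip line are placeholder values; the implementation guide defines the exact scoring:

```python
from collections import Counter

def consensus_confidence(proposals):
    """Return the modal transition and the share of proposers backing it."""
    counts = Counter(proposals.values())
    top, n = counts.most_common(1)[0]
    return top, n / len(proposals)

def decide(proposals, recent_alignment, threshold=0.9, trip_line=0.75):
    """Autonomous when consensus is high; revert to humans if alignment degrades."""
    if recent_alignment < trip_line:
        return ("human", None)      # trip line hit: humans resume oversight
    choice, confidence = consensus_confidence(proposals)
    if confidence >= threshold:
        return ("auto", choice)     # LLMs decide autonomously
    return ("human", None)          # below threshold: a human decides
```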
Want the full picture?
The implementation guide covers the exact algorithms — consensus scoring, alignment measurement, pruning criteria, the trip line, and self-healing — with worked examples at every stage.
Read the implementation guide →