Model Your Task
Define states, transitions, and decision prompts
Break your process into a state machine. Each state is a decision point; each transition is a possible action. Write a decision prompt for each state that tells any specialist — human or AI — what to evaluate and what their options are.
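The state machine described above could be modeled as plain data. This is a minimal sketch with invented names (`State`, `Transition`, and the "triage" workflow are all illustrative, not part of the product):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    # One possible action out of a state (names are illustrative).
    name: str
    target_state: str

@dataclass
class State:
    # One decision point, with the prompt any specialist sees.
    name: str
    decision_prompt: str
    transitions: list[Transition] = field(default_factory=list)

# A toy review workflow, purely as an example.
states = {
    "triage": State(
        name="triage",
        decision_prompt="Given the ticket, decide: escalate, resolve, or request more info.",
        transitions=[
            Transition("escalate", "review"),
            Transition("resolve", "done"),
            Transition("request_info", "waiting"),
        ],
    ),
}

def allowed_transitions(state_name: str) -> list[str]:
    """List the actions a specialist may choose in this state."""
    return [t.name for t in states[state_name].transitions]
```

Because states and transitions are data rather than code, the same definition can drive both the human screens and the LLM prompts.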
Build Human Screens
Give humans the interface to operate the machine
Build screens for each state so humans can see the context, select the next transition, and provide metadata and reasoning for their choice. Every human decision becomes ground truth that the system learns from.
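Whatever the screens look like, each submission could reduce to a record like the one below. The field names are assumptions for illustration; the point is that transition, metadata, and reasoning are captured together as ground truth:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class HumanDecision:
    # One ground-truth record captured by a decision screen.
    session_id: str
    state: str
    chosen_transition: str
    metadata: dict
    reasoning: str
    decided_at: str

def record_decision(session_id, state, transition, metadata, reasoning):
    """Build a persistable record of a human choice; stored records become exemplars."""
    return asdict(HumanDecision(
        session_id=session_id,
        state=state,
        chosen_transition=transition,
        metadata=metadata,
        reasoning=reasoning,
        decided_at=datetime.now(timezone.utc).isoformat(),
    ))
```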
Build Prompt Strategies
Give LLMs the tools to participate
Create prompt strategies that collect recent session history and exemplars so LLMs can select the next transition as a tool call, providing metadata and reasoning as parameters. The prompts improve automatically as exemplars accumulate.
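One way a prompt strategy could work, sketched under assumptions: the prompt text is assembled from recent history and stored human exemplars, and the LLM answers by calling a `select_transition` tool (the tool name and schema shape are hypothetical, modeled on common tool-call APIs):

```python
def build_prompt(state, decision_prompt, history, exemplars, allowed):
    """Assemble a decision prompt from session history and stored exemplars."""
    lines = [f"State: {state}", f"Task: {decision_prompt}", "", "Recent session history:"]
    lines += [f"- {h}" for h in history[-5:]]      # last few events only
    lines.append("")
    lines.append("Exemplar decisions (human ground truth):")
    for ex in exemplars[:3]:                       # a handful is enough
        lines.append(f"- chose {ex['chosen_transition']}: {ex['reasoning']}")
    lines.append("")
    lines.append(f"Call select_transition with one of: {', '.join(allowed)}")
    return "\n".join(lines)

# Hypothetical tool schema the LLM calls to answer.
SELECT_TRANSITION_TOOL = {
    "name": "select_transition",
    "description": "Choose the next transition for the current state.",
    "parameters": {
        "type": "object",
        "properties": {
            "transition": {"type": "string"},
            "metadata": {"type": "object"},
            "reasoning": {"type": "string"},
        },
        "required": ["transition", "reasoning"],
    },
}
```

Since exemplars are pulled at prompt-build time, every new human decision automatically enriches the next prompt.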
Train the Human Team
Assemble operators and teach them the screens
Recruit the humans who will operate the state machine. Train them on the screens, the decision criteria, and how to provide clear reasoning. Their decisions are the standard everything else is measured against.
Assemble the LLM Group
Register AI specialists with their prompt strategies
Register a group of LLMs as proposers, each with a prompt strategy. Different models, different prompts, different approaches — all competing at the same decision points. More diversity means better coverage.
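A registry of proposers might look like this minimal sketch (class and proposer names are invented for illustration); each entry pairs a model with a prompt strategy and can later be disabled during pruning:

```python
class ProposerRegistry:
    """Registry of LLM specialists, each paired with a prompt strategy."""

    def __init__(self):
        self._proposers = {}

    def register(self, name, model, strategy):
        self._proposers[name] = {"model": model, "strategy": strategy, "enabled": True}

    def active(self):
        return [n for n, p in self._proposers.items() if p["enabled"]]

    def disable(self, name):
        self._proposers[name]["enabled"] = False

# Diverse proposers competing at the same decision points (names are examples).
registry = ProposerRegistry()
registry.register("fast-screener", model="model-a", strategy="history_only")
registry.register("careful-reviewer", model="model-b", strategy="history_plus_exemplars")
```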
Humans Operate, LLMs Shadow
Run sessions with humans in the lead
Humans begin operating sessions using the state machine and screens. In parallel, LLMs shadow every decision, submitting proposals to an arbiter. At this stage, every decision still requires human approval. The LLMs are learning, not deciding.
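The shadow phase could be sketched as follows: the arbiter logs every proposal against the human choice, but the executed transition is always the human's. The `Arbiter` class here is an illustrative sketch, not the product's implementation:

```python
class Arbiter:
    """Collects shadow proposals and records agreement once a human decides."""

    def __init__(self):
        self.log = []

    def shadow(self, decision_id, proposals, human_choice):
        # In shadow mode proposals are only scored; the human choice always wins.
        for proposer, proposed in proposals.items():
            self.log.append({
                "decision_id": decision_id,
                "proposer": proposer,
                "proposed": proposed,
                "human": human_choice,
                "agreed": proposed == human_choice,
            })
        return human_choice  # the executed transition is always the human's

arbiter = Arbiter()
executed = arbiter.shadow(
    "d1",
    proposals={"model-a": "escalate", "model-b": "resolve"},
    human_choice="escalate",
)
```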
Measure and Prune
The arbiter accumulates alignment data on each specialist
Over time, the arbiter tracks how often each LLM agrees with human decisions. Specialists that consistently disagree or add no signal are disabled. The group is gradually reduced to as few LLMs as possible — the ones that best predict what humans would choose.
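Alignment tracking and pruning could reduce to something like this sketch (the 0.8 alignment floor and 50-sample minimum are placeholder values, not the guide's actual criteria):

```python
def alignment_rate(log, proposer):
    """Fraction of decisions where this proposer matched the human choice."""
    rows = [r for r in log if r["proposer"] == proposer]
    if not rows:
        return 0.0
    return sum(r["agreed"] for r in rows) / len(rows)

def prune(log, proposers, min_alignment=0.8, min_samples=50):
    """Keep only proposers with enough history and high enough agreement."""
    kept = []
    for p in proposers:
        rows = [r for r in log if r["proposer"] == p]
        if len(rows) >= min_samples and alignment_rate(log, p) >= min_alignment:
            kept.append(p)
    return kept
```

Requiring a minimum sample count before pruning avoids disabling a proposer on a handful of unlucky decisions.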
LLMs Take Over
Consensus confidence passes the threshold
Iterate on the LLMs and prompt strategies until consensus confidence crosses the threshold you set. The LLMs gradually take over the process — making decisions autonomously while humans drop to occasional spot-checks. If alignment ever degrades, the system automatically reverts to human oversight.
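The takeover logic with its trip line could be sketched like this. Consensus confidence is treated here as the share of proposers agreeing on the modal transition, and the 0.9 threshold and 0.75 trip line are placeholder values; the implementation guide defines the exact scoring:

```python
from collections import Counter

def consensus_confidence(proposals):
    """Return the modal transition and the share of proposers backing it."""
    counts = Counter(proposals.values())
    top, n = counts.most_common(1)[0]
    return top, n / len(proposals)

def decide(proposals, recent_alignment, threshold=0.9, trip_line=0.75):
    """Autonomous when consensus is high; revert to humans if alignment degrades."""
    if recent_alignment < trip_line:
        return ("human", None)      # trip line hit: humans resume oversight
    choice, confidence = consensus_confidence(proposals)
    if confidence >= threshold:
        return ("auto", choice)     # LLMs decide autonomously
    return ("human", None)          # below threshold: a human decides
```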
Want the full picture?
The implementation guide covers the exact algorithms — consensus scoring, alignment measurement, pruning criteria, the trip line, and self-healing — with worked examples at every stage.
Read the implementation guide →