Workflow governance

Agent Approvals and Human Leashes, 2026

A category report on how human approval, delegation windows, renewal, and runtime leash enforcement should work in serious agent systems.

Why one-time approval is not the same thing as continuing authority, and why long-running agents need both.


As of March 22, 2026, the biggest governance mistake in agent systems is still conceptual, not technical: teams keep collapsing approval and delegation into one control. A human approval answers a narrow question such as "may this workflow begin?" or "may it cross this checkpoint?" A human leash answers a different one: "while delegated authority remains active, what can this system continue doing without asking again?" Those are related controls, but they are not interchangeable.

That distinction matters because long-running agents do not stay inside one moment of risk. A workflow can be created safely, drift into a higher-risk action later, pause for human review, resume after context has changed, keep running on a recurring schedule, and finally publish or release something irreversible. Treating all of that as one blanket consent invites one of two opposite failure modes. Either the system nags humans for every harmless step until operators disable the controls, or it quietly converts one-time approval into standing authority.

The better model is stage-aware authority. Microsoft's AG-UI human-in-the-loop guidance and Cloudflare's workflow approval model both treat approval as a workflow event with a clear pause, response, and continuation. Oracle's delegate-versus-reassign distinction shows why temporary delegation is not the same as permanent ownership transfer. Passage's step-up docs and F5's step-up authentication overview show why some actions deserve fresh user presence even when the session is otherwise valid. And the runtime-policy work captured by Cerbos, AI Runtime Security, and the arXiv paper on customizable runtime enforcement for LLM agents makes the other half explicit: once authority is delegated, it still needs to be constrained continuously.

This report argues for a practical split:

  • approval should be modeled by workflow stage
  • human leashes should be time-bounded and scope-bounded
  • resume should be treated as a new risk surface, not a silent continuation
  • renewal should be a first-class ceremony for unattended systems
  • publish or release should require the strongest fresh-auth path

That sounds like more ceremony, but in practice it reduces friction. When approval and continuing authority are modeled separately, most low-risk runtime steps can proceed without surprise while the genuinely sensitive moments still force a human decision.


The Stage Model

The cleanest way to reason about human authority is to track it alongside the workflow lifecycle:

Human authority moves with the workflow

Approval and delegated authority should change shape as the system moves from setup to runtime to outward-facing release.

  1. Create. Approve intent, budget, and capabilities before a job or subscription exists. Controls: approval, budget, scope.
  2. Run. Let low-risk steps proceed inside a bounded runtime envelope. Controls: leash, spend caps, policy.
  3. Resume. Treat recovery as a fresh authority checkpoint instead of a silent continuation. Controls: reapproval, denial reasons, lease validity.
  4. Renew. Re-open continuing authority before unattended delegation becomes stale. Controls: renewal, notifications, revocation.
  5. Publish. Require the strongest ceremony for outward-facing or irreversible change. Controls: step-up, owner action, review.

That framing fixes a lot of product confusion. It tells operators that "approval" is not one toggle and "autonomy" is not one mode. The question is always narrower: what kind of human decision belongs at this stage, and what kind of continuing scope is safe afterward?

Chart: Which control surface should dominate each workflow stage

Create and publish should stay human-presence heavy. Steady-state runtime should be leash heavy. Resume and renew are blended checkpoints.

Legend: approval or review, active leash, fresh step-up

This is more useful than treating approval as one global switch. It shows where the decision should live: human ceremony at the edges, runtime enforcement in the middle.

| Stage | Human question | Runtime leash role | Denial reasons that should be explicit |
| --- | --- | --- | --- |
| Create | Should this workflow or subscription exist at all? | Usually none yet, because authority has not been delegated | budget_exceeded, capability_disallowed, private_lane_not_authorized |
| Run | Can the system keep executing inside known bounds? | Primary control surface for spend, capabilities, destinations, and private-data lanes | expired_lease, out_of_scope, spend_limit_exceeded |
| Resume | Can this run safely re-enter after a block or failure? | Helpful, but not sufficient if the block itself requires fresh review | approval_required, expired_lease, renewal_required |
| Renew | Should unattended authority continue? | Central, because renewal is literally about extending delegated scope | renewal_required, stale_scope, user_presence_required |
| Publish | Is this outward-facing or irreversible action allowed right now? | Usually not enough on its own | step_up_required, review_required, publish_blocked |

That table is the category-level answer many teams are missing. A leash is not an all-purpose substitute for human judgment. It is a runtime envelope. Approval remains the mechanism for crossing a boundary that should not be crossed silently.


Create: Approval Is About Intent

Workflow creation is the moment when human intent is clearest and easiest to capture. Before a job or subscription exists, the operator still knows the requested task, the expected budget, the allowed data lane, the target capability set, and whether the run is one-shot or recurring. That is why creation should carry the cleanest approval ceremony.

Microsoft Copilot Studio's multistage approval model is useful here because it separates AI policy checks from human approvals instead of pretending one replaces the other. A system can reject obviously out-of-policy work automatically, then ask a human only when the remaining decision actually needs business judgment. That same split belongs in agent systems. Policy can reject unsupported capabilities, overspend, missing prerequisites, or out-of-scope destinations. Human approval should decide whether the workflow is worth creating in the first place.

This is also the stage where denial reasons matter most. "Denied" is not enough. The system should say whether the block came from budget, capability scope, private data, recipient restrictions, or a more general policy mismatch. The reason is operational, not just UX polish. Teams can only tune thresholds and workflows if they know what is being denied most often.

Creation is also the worst place to fake runtime delegation. A common anti-pattern is to ask for a fresh approval once, then quietly let that approval imply ongoing autonomous authority forever. That feels efficient for a week, then turns into confusion the moment a run is resumed, retried, or scheduled again under changed conditions. One-time approval is about starting. It is not a permanent license.

The strongest creation pattern is therefore:

  1. run automated policy checks first
  2. ask for human approval only when judgment is still needed
  3. show the exact scope being approved: spend, capabilities, private routes, recurrence, and outward effects
  4. if approval succeeds, mint a separate runtime leash when ongoing authority is actually required

That last step is the important one. Approval should not be overloaded just because a separate delegation artifact feels more complex.
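
The four-step creation pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: WorkflowRequest, Leash, policy_check, create_workflow, the policy limits, the 24-hour leash window, and the approval_denied reason are all hypothetical names introduced here. Only budget_exceeded, capability_disallowed, and private_lane_not_authorized come from this report. For brevity the sketch assumes every request that passes policy still needs human judgment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

# Hypothetical policy limits, chosen only for illustration.
MAX_BUDGET_USD = 100.0
ALLOWED_CAPABILITIES = frozenset({"search", "summarize", "email_draft"})


@dataclass(frozen=True)
class WorkflowRequest:
    task: str
    budget_usd: float
    capabilities: frozenset
    uses_private_lane: bool = False
    recurring: bool = False


@dataclass(frozen=True)
class Leash:
    """Continuing authority, kept separate from the one-time approval."""
    capabilities: frozenset
    spend_cap_usd: float
    expires_at: datetime


@dataclass(frozen=True)
class CreateResult:
    created: bool
    denial_reason: Optional[str] = None
    leash: Optional[Leash] = None


def policy_check(req: WorkflowRequest, private_lane_authorized: bool) -> Optional[str]:
    """Step 1: automated checks. Returns a denial reason, or None if policy passes."""
    if req.budget_usd > MAX_BUDGET_USD:
        return "budget_exceeded"
    if not req.capabilities <= ALLOWED_CAPABILITIES:
        return "capability_disallowed"
    if req.uses_private_lane and not private_lane_authorized:
        return "private_lane_not_authorized"
    return None


def create_workflow(
    req: WorkflowRequest,
    ask_human: Callable[[WorkflowRequest], bool],
    now: datetime,
    private_lane_authorized: bool = False,
    leash_window: timedelta = timedelta(hours=24),
) -> CreateResult:
    reason = policy_check(req, private_lane_authorized)
    if reason is not None:
        # Out-of-policy work is rejected before any human is interrupted.
        return CreateResult(created=False, denial_reason=reason)
    # Steps 2 and 3: the human is shown the full request, i.e. the exact scope.
    if not ask_human(req):
        return CreateResult(created=False, denial_reason="approval_denied")
    # Step 4: mint a separate leash only when ongoing authority is required.
    leash = None
    if req.recurring:
        leash = Leash(req.capabilities, req.budget_usd, now + leash_window)
    return CreateResult(created=True, leash=leash)
```

In this sketch a one-shot request that is approved returns no leash at all, which keeps the approval from quietly standing in for ongoing authority.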


Run: A Leash Is About Continuing Authority

Once a workflow is running, the question changes. The operator is no longer deciding whether the work should exist. The question is whether the system may continue acting inside bounded constraints without another interrupt.

This is where a human leash becomes useful. The best description of the leash category is not "background approval." It is time-bounded, scope-bounded delegated authority. The system may continue to act only while the window is still open and only inside the explicit scope that was delegated. If the run drifts outside that envelope, the right answer is not "keep going because the workflow was approved yesterday." The right answer is a runtime denial.

AI Runtime Security's multi-agent controls are especially clear on the principles that should govern delegated execution: no privilege escalation, scope inheritance, and delegation-depth limits. Cerbos on authorization in workflows makes a similar point from the application side: authorization decisions do not disappear after a process starts. They continue to matter as the workflow transitions through states.
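
Those three principles are easy to state as checks at the point where one agent hands authority to another. The sketch below is illustrative only: Delegation, sub_delegate, the depth cap of 2, and the two reason strings are hypothetical names introduced here, not taken from any of the cited sources.

```python
from dataclasses import dataclass

MAX_DELEGATION_DEPTH = 2  # hypothetical cap on how far authority may be passed on


class DelegationDenied(Exception):
    """Raised with a machine-readable reason when a sub-delegation is refused."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


@dataclass(frozen=True)
class Delegation:
    holder: str
    capabilities: frozenset
    depth: int = 0  # 0 means granted directly by a human


def sub_delegate(parent: Delegation, holder: str, requested: frozenset) -> Delegation:
    # Delegation-depth limit: authority cannot be passed on indefinitely.
    if parent.depth >= MAX_DELEGATION_DEPTH:
        raise DelegationDenied("delegation_depth_exceeded")
    # No privilege escalation: the child may only receive a subset of the
    # parent's scope, which is also what scope inheritance means here.
    if not requested <= parent.capabilities:
        raise DelegationDenied("privilege_escalation")
    return Delegation(holder, requested, parent.depth + 1)
```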

The arXiv paper on customizable runtime enforcement helps clarify why runtime policy is not just another approval queue. Some constraints are hard constraints that must never be violated: forbidden functions, forbidden destinations, no delete or payout outside an allowlist. Others are softer, such as rate ceilings or retry thresholds, where the system can fail gracefully and recover. A runtime leash is the container that makes those constraints enforceable over time.
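
The difference between hard and soft constraints shows up in what the enforcement point returns. The following is a rough sketch under assumed names and limits (Verdict, evaluate, the forbidden-function set, the payout allowlist, and the call ceiling are all hypothetical), and it is not the paper's method: a hard violation is denied outright, while a soft limit only defers the action so the caller can back off and retry.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DEFER = "defer"  # soft limit hit: back off and try again later
    DENY = "deny"    # hard constraint violated: never executed


# Hypothetical hard constraints.
FORBIDDEN_FUNCTIONS = frozenset({"delete_records"})
PAYOUT_ALLOWLIST = frozenset({"acct_ops"})
# Hypothetical soft constraint: a per-window rate ceiling.
MAX_CALLS_PER_WINDOW = 3


@dataclass(frozen=True)
class Action:
    function: str
    destination: str = ""


def evaluate(action: Action, calls_in_window: int) -> Verdict:
    # Hard constraints are checked first so a soft limit can never mask them.
    if action.function in FORBIDDEN_FUNCTIONS:
        return Verdict.DENY
    if action.function == "payout" and action.destination not in PAYOUT_ALLOWLIST:
        return Verdict.DENY
    # Soft constraints degrade gracefully instead of failing the run.
    if calls_in_window >= MAX_CALLS_PER_WINDOW:
        return Verdict.DEFER
    return Verdict.ALLOW
```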

In practice, a good leash usually needs at least four dimensions:

  • time window: when delegated authority expires
  • capability scope: which tools, actions, or workflow templates remain allowed
  • economic scope: spend ceilings, rate limits, or per-window totals
  • data and destination scope: which private surfaces, recipients, hosts, or webhook targets are still in bounds

When those checks fail, the system should surface machine-readable reasons. expired_lease, out_of_scope, and renewal_required are much better product primitives than a vague "authorization failed." They tell the operator what changed and whether the fix is new approval, a narrower request, or a simple renewal.
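
A leash with those four dimensions, returning the reason codes named above, might look like the sketch below. The Leash and Action shapes, the check_leash function, and the choice to report both capability and destination violations as out_of_scope are assumptions made for illustration. The current time is passed in explicitly so the check stays deterministic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Leash:
    expires_at: datetime     # time window
    capabilities: frozenset  # capability scope
    spend_cap_usd: float     # economic scope
    destinations: frozenset  # data and destination scope


@dataclass(frozen=True)
class Action:
    capability: str
    cost_usd: float
    destination: str


def check_leash(leash: Leash, action: Action, spent_usd: float, now: datetime) -> Optional[str]:
    """Return None if the action is inside the leash, else a machine-readable reason."""
    if now >= leash.expires_at:
        return "expired_lease"
    if action.capability not in leash.capabilities:
        return "out_of_scope"
    if action.destination not in leash.destinations:
        return "out_of_scope"
    if spent_usd + action.cost_usd > leash.spend_cap_usd:
        return "spend_limit_exceeded"
    return None
```

Because the function returns a reason rather than a boolean, the caller can route expired_lease toward renewal and out_of_scope toward a narrower request or new approval.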

The operator goal at runtime is not to eliminate friction entirely. It is to keep the predictable path quiet while making every out-of-bounds event legible.


Resume: Recovery Is a New Risk Surface

Resume is where many otherwise careful systems become careless. The common mistake is to treat resume as if it were a harmless continuation of the original run. But a paused or blocked workflow has already told you that the original assumptions were not enough. Maybe a human approval was pending. Maybe a policy check failed. Maybe a dependency timed out. Maybe the operator context changed while the run was waiting.

That is why resume needs its own authority model.

Cloudflare's human-in-the-loop workflow guidance treats an approval checkpoint as a first-class workflow pause, not a minor flag. The workflow reaches an approval step, waits, and then resumes only after a decision. The operator meaning is obvious: resume is conditional on a human event. Community discussions around Argo Workflows and restart-safe approval patterns in long-running automation make the same operational point from another angle. Once a system pauses around manual intervention, safe resumption becomes part of the design problem.

Resume therefore deserves at least three checks:

  1. Why did the run stop? A policy denial, pending approval, transient fault, and exhausted budget should not share one resume path.
  2. Is the delegated leash still valid? Old delegated authority should not be smuggled through a new moment of uncertainty.
  3. Is fresh approval required? If the run stopped at a human checkpoint, resume should not bypass that checkpoint just because someone clicked "continue."

This is where Oracle's distinction between delegation and reassignment becomes more than a process footnote. If a task is delegated, the original accountable actor still owns the underlying authority. If the system resumes under someone else's click without preserving that accountability, the audit model gets muddied very quickly.

For operators, the design lesson is straightforward: resume is not a retry button. It is a controlled re-entry into a run that has already proven it needs more scrutiny than the happy path.
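
The three resume checks can be expressed as a small gate. This is a sketch with assumed names: StopCause, resume_decision, and the rule that policy denials and exhausted budgets need a fresh human decision are illustrative choices, not a prescribed design. The reason codes expired_lease and approval_required are the ones used in this report.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class StopCause(Enum):
    TRANSIENT_FAULT = "transient_fault"
    PENDING_APPROVAL = "pending_approval"
    POLICY_DENIAL = "policy_denial"
    BUDGET_EXHAUSTED = "budget_exhausted"


# Assumption: only a transient fault may resume without a new human decision.
NEEDS_FRESH_APPROVAL = frozenset({
    StopCause.PENDING_APPROVAL,
    StopCause.POLICY_DENIAL,
    StopCause.BUDGET_EXHAUSTED,
})


def resume_decision(
    cause: StopCause,
    leash_expires_at: datetime,
    now: datetime,
    fresh_approval: bool,
) -> Optional[str]:
    """Return None if the run may re-enter, else a machine-readable denial reason."""
    # Check 2: stale delegated authority blocks every resume path.
    if now >= leash_expires_at:
        return "expired_lease"
    # Checks 1 and 3: the stop cause selects the path, and a human checkpoint
    # cannot be bypassed by a bare "continue".
    if cause in NEEDS_FRESH_APPROVAL and not fresh_approval:
        return "approval_required"
    return None
```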


Renew: Recurring Automation Needs Its Own Ceremony

Recurring subscriptions and unattended workflows are where human leashes either become useful or become dangerous. If the system requires full fresh approval for every low-risk recurring action, operators stop trusting the automation because it becomes noisy and slow. If the system grants broad standing authority with no renewal, the automation becomes invisible.

That is why renewal should be treated as a dedicated ceremony rather than an error state.

ServiceNow's approvals and delegation discussion is useful because it frames delegation as an explicit managed state, not a hidden background behavior. AI Runtime Security's multi-agent controls add the guardrail view: delegation should inherit scope, forbid privilege escalation, and cap depth. Combined with Cloudflare-style pause and timeout patterns, the right operator model becomes clear: a recurring run should move smoothly while authority is current, then shift into a renewal path before that authority silently goes stale.

Operators should expect more product variance around runtime delegation and renewal than around basic approval or step-up, so this is where design choices still matter most. Renewal is not simply "approve again." A good renewal flow should tell the operator:

  • what has been happening during the delegated window
  • what scope remains active if renewed
  • what spend, recipient, or capability limits will continue
  • what changed since the last approval
  • how to narrow or revoke the leash instead of only extending it

That is why the strongest subscription products expose expiring_soon, expired, and renewed events instead of only surfacing a sudden failure after the fact. Renewal should be visible before it becomes a production surprise.
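
A minimal sketch of that lifecycle follows, assuming a hypothetical Leash shape, a caller-chosen warning window, and two illustrative functions: renewal_event, which reports expiring_soon or expired, and renew, which extends the window and may narrow scope but refuses to widen it. Emitting the renewed event after a successful renew call is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Leash:
    capabilities: frozenset
    expires_at: datetime


def renewal_event(leash: Leash, now: datetime, warn_window: timedelta) -> Optional[str]:
    """Return the event to surface to the operator, or None while authority is current."""
    if now >= leash.expires_at:
        return "expired"
    if leash.expires_at - now <= warn_window:
        return "expiring_soon"
    return None


def renew(
    leash: Leash,
    now: datetime,
    extend_by: timedelta,
    keep: Optional[frozenset] = None,
) -> Leash:
    """Extend the window, optionally narrowing scope to the capabilities in keep."""
    scope = leash.capabilities if keep is None else keep
    # Assumption: renewal may narrow scope but never widen it. New scope is a
    # new decision and belongs to approval, not to renewal.
    if not scope <= leash.capabilities:
        raise ValueError("renewal cannot widen scope")
    return Leash(scope, now + extend_by)
```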


Publish or Release: Require Fresh Presence

Publish and release actions deserve the strongest ceremony in the system because they are usually outward-facing, reputation-bearing, and often irreversible. A human leash that was appropriate for repeated background work is usually not strong enough for a final public action.

Passage's step-up authentication docs and F5's step-up authentication overview both make the principle explicit: some actions require fresh proof of user presence even when a broader session is otherwise valid. In workflow terms, publish is one of those actions.

This is also the place where review state matters. A strong publish flow should not only ask "who is clicking publish?" It should also ask:

  • what changed since the last published version
  • did the new draft drop claims, charts, or sources that matter
  • is this a no-op publish dressed up as progress
  • is the actor authorized to make an outward-facing release right now

That is why publish controls should usually combine three things:

  1. diff-aware review
  2. fresh owner or publisher presence
  3. clear deny reasons when the release is blocked

The category lesson is simple: a runtime leash is excellent for bounded continuity. It is a poor substitute for final-release ceremony.
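
One way to combine the three publish controls is sketched below. Everything here is an assumption made for illustration: the PublishRequest fields, the five-minute step-up freshness window, and the rule that a review only counts if it covered the exact draft being released. The denial reasons publish_blocked, review_required, and step_up_required are the ones listed for the Publish stage in this report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

STEP_UP_MAX_AGE = timedelta(minutes=5)  # hypothetical freshness window


@dataclass(frozen=True)
class PublishRequest:
    actor: str
    published_text: str                   # what is live now
    draft_text: str                       # what would go live
    reviewed_text: Optional[str]          # the draft the reviewer actually saw
    last_step_up_at: Optional[datetime]   # most recent fresh-presence proof


def publish_decision(req: PublishRequest, publishers: frozenset, now: datetime) -> Optional[str]:
    """Return None if the release may proceed, else a machine-readable denial reason."""
    # The actor must be authorized for outward-facing release.
    if req.actor not in publishers:
        return "publish_blocked"
    # A publish that changes nothing is a no-op dressed up as progress.
    if req.draft_text == req.published_text:
        return "publish_blocked"
    # Diff-aware review: approval of an older draft does not carry over.
    if req.reviewed_text != req.draft_text:
        return "review_required"
    # Fresh presence: a valid session alone is not enough.
    if req.last_step_up_at is None or now - req.last_step_up_at > STEP_UP_MAX_AGE:
        return "step_up_required"
    return None
```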


Comparison Table

| Control surface | What it decides | What it should never silently replace | Best use |
| --- | --- | --- | --- |
| Approval | Whether a workflow may begin or cross a checkpoint | Ongoing delegated runtime authority | Create, risky transitions, manual decision points |
| Human leash | What the system may continue doing while delegation is still valid | Final release decisions or new high-risk scope | Repeated low-risk runtime activity, subscriptions, bounded autonomy |
| Resume gate | Whether a blocked run may safely re-enter | The original approval state that caused the pause | Recovery after approval pauses, policy denials, or uncertain failures |
| Renewal | Whether unattended authority should continue | Silent permanent delegation | Recurring subscriptions, long-lived sessions, scheduled refreshes |
| Step-up auth | Whether a sensitive actor is really present right now | General runtime delegation | Publish, release, destructive actions, ownership transfer |

That table is the practical answer for operator teams. If everything is treated as approval, autonomy becomes unusable. If everything is treated as delegation, accountability becomes blurry. Serious systems need both.


Recommendations for Operators

  1. Model approval and leash as separate objects. Approval should answer the stage-specific "may this proceed?" question. The leash should answer the runtime "what can continue?" question.
  2. Give resume its own policy. If a workflow stopped because of approval, denial, or ambiguity, resume should not be treated like a harmless retry.
  3. Use explicit denial reasons everywhere. Operators should see whether the problem is budget, out-of-scope behavior, expired delegation, missing renewal, or required step-up.
  4. Make renewal proactive, not punitive. Expiring-soon notices, revocation paths, and one-tap extension flows are better than sudden unattended failure.
  5. Reserve fresh step-up for truly sensitive edges. Publish, release, destructive mutation, and ownership-changing actions should ask for fresh presence even if runtime delegation is otherwise valid.
  6. Keep runtime quiet when it is behaving. If a human has to approve every harmless step, the system is not governed. It is stalled.

Bottom Line

Human approval and human leashes should be treated as complementary controls, not rival ones. Approval is about intent at a stage. A leash is about continuing authority inside bounds. Resume is where those models collide. Renewal is where unattended systems become either trustworthy or invisible. Publish is where fresh human presence matters most.

The best operator pattern in 2026 is therefore not blanket approval and not blanket autonomy. It is a staged model:

  • approve creation intentionally
  • enforce runtime scope continuously
  • treat resume as a fresh risk surface
  • make renewal explicit before authority expires
  • require fresh step-up for release

That is the design that preserves human accountability without turning every workflow into a queue of pointless clicks.

Report details

  • Status: Open access
  • Updated: March 23, 2026
  • Source mix: 5 official, 5 ecosystem
  • Method steps: 4
  • Versions: 1
  • Definition: 0 sections, 4 query runners, 1 prompt runner, and 0 chart goals