ENACTED · THESIS · May 16, 2026, 01:39 PM

Human Approval For Destructive

system-sync · novice

no constitutional pin (legacy thread)

slug: human_approval_for_destructive
element_type: PRINCIPLE
mutability: LOCKED
inline: true
current_version: 0
status: seed-draft
contentURI: null

Any action under this Sub-Leviathan that destroys, blocks, quarantines, isolates, deletes, revokes, or otherwise removes capability — from any system, account, asset, or participant — requires an explicit human approval recorded in the action's audit trail before execution. Automation may detect, propose, prepare, and queue such actions; automation may not execute them autonomously. The human approver must hold at least guardian standing (inherited from meta) and must be identified in the audit record. This principle applies regardless of urgency: a "fast path" with pre-approval is permitted only when the pre-approval is itself documented as a constitutional decision through proposal lifecycle.

Status

Seed-draft, no personal attribution. Cyber Security Sub-Leviathan opening set (2026-05-16). LOCKED because the human-in-the-loop boundary on destructive action is a foundational safety property — relaxing it would change the Sub-Leviathan's risk posture in ways that cascade through every downstream rule.

Why this matters

Among recorded security incidents, automated destructive action is the failure mode with the largest blast radius. Two patterns dominate:

False positive mass-block. Automation classifies a benign account/IP as malicious and blocks it; downstream services degrade; recovery takes hours. Documented across industry: cloud provider WAFs, EDR auto-isolation, account-fraud systems.
Adversarial misdirection. An attacker deliberately triggers defender automation to harm a target the defender intends to protect. The defender's own systems become the attack vector.

Both modes evaporate when destructive action requires a human approver who can pause, examine context, and refuse.

Reasoning trail

"Destructive" broadly defined. The word "destructive" here covers any action that removes capability from a participant — not just literal deletion. Blocking an account is destructive (revokes capability). Quarantining a host is destructive. Revoking a credential is destructive. The breadth is intentional: under stress, narrow definitions of "destructive" become escape hatches.
Detect/propose/prepare/queue is fine. Automation can do enormous work right up to the approval gate. The rule is about execution, not about preparation. A well-prepared queue makes the human's approval task fast.
Guardian standing minimum. A junior participant cannot approve destructive action under pressure. The standing requirement (inherited from meta) ensures the approver has earned trust through accumulated enactment.
Audit identifier required. The approver is named in the record. "An admin approved it" is not enough: the record must say which admin, at what time, and on what evidence. This enables both accountability and learning from approval errors.
Pre-approval permitted but constitutional. A federation may pre-authorize a class of automated destruction (e.g., "auto-revoke any compromised API key matching this rule"). Permitted, but the pre-authorization itself must go through proposal lifecycle. No silent fast paths.
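
The points above can be sketched as a small approval gate. This is a minimal illustration, not a reference implementation: the names `Standing`, `Approval`, `QueuedAction`, `approve`, and `execute` are hypothetical, the `MEMBER` rank is invented to fill out the ladder, and the real standing thresholds are whatever meta defines. The sketch assumes a fail-closed reading of "destructive": only the four preparatory verbs named in the principle bypass the gate, and every other action kind needs a recorded approval.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Standing(IntEnum):
    # Hypothetical ordering; the real standing ladder is defined in meta.
    NOVICE = 0
    MEMBER = 1
    GUARDIAN = 2


# Preparatory work automation may always do. Anything not listed here is
# treated as destructive (fail closed), mirroring the broad definition.
NON_DESTRUCTIVE_KINDS = {"detect", "propose", "prepare", "queue"}


@dataclass(frozen=True)
class Approval:
    approver_id: str       # a named individual, never just "an admin"
    standing: Standing
    evidence: str          # what the approver reviewed before deciding
    approved_at: datetime


@dataclass
class QueuedAction:
    kind: str              # e.g. "block", "quarantine", "revoke"
    target: str
    audit_trail: list = field(default_factory=list)
    executed: bool = False


class ApprovalRequired(Exception):
    """Raised when an action lacks a valid recorded human approval."""


def approve(action: QueuedAction, approver_id: str,
            standing: Standing, evidence: str) -> None:
    """Record a human approval in the action's audit trail."""
    if not approver_id:
        raise ApprovalRequired("approver must be identified")
    if standing < Standing.GUARDIAN:
        raise ApprovalRequired("approver lacks guardian standing")
    action.audit_trail.append(
        Approval(approver_id, standing, evidence, datetime.now(timezone.utc))
    )


def execute(action: QueuedAction) -> None:
    """Run the action; destructive kinds need an approval recorded first."""
    destructive = action.kind not in NON_DESTRUCTIVE_KINDS
    approved = any(isinstance(e, Approval) for e in action.audit_trail)
    if destructive and not approved:
        raise ApprovalRequired(
            f"no recorded approval for {action.kind} on {action.target}"
        )
    action.executed = True  # placeholder for the real side effect
```

In this sketch, automation can build and queue a `QueuedAction` freely; only `execute` is gated. Calling `execute` on a `"quarantine"` action before `approve` raises `ApprovalRequired`, and an approver below `GUARDIAN` is rejected before anything is written to the audit trail.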

Relationship to "speed of response"

A frequent objection: human approval slows response, and attackers move fast. The response:

For detection, monitoring, alerting, evidence collection, and prepared actions: speed is unconstrained. The bottleneck is only at execution of destructive action.
For genuinely time-critical destruction (e.g., active credential compromise mid-exfiltration): pre-authorization through proposal lifecycle is the answer. Define the rule, ratify it, then automation may execute under the pre-authorization. The constitutional process is paid once.
Most "we needed to act fast" claims in post-mortems do not survive scrutiny: the actor that "needed to be blocked in 10 seconds" usually could have been blocked in 10 minutes with the same outcome.
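
The pre-authorization path described above might look like the following routing check. It is a sketch under stated assumptions: `PreAuthorization`, `find_fast_path`, `route`, and the `proposal_id` field are hypothetical names, and how a proposal actually becomes ratified is left to the proposal lifecycle. The point it illustrates is that automation executes only when a ratified rule covers the exact action class and event, and otherwise falls back to the human queue.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PreAuthorization:
    proposal_id: str                 # hypothetical reference to the ratified proposal
    ratified: bool                   # True only once the proposal lifecycle completes
    action_kind: str                 # the single class of action this rule covers
    matches: Callable[[dict], bool]  # predicate over the triggering event


def find_fast_path(kind: str, event: dict,
                   rules: list[PreAuthorization]) -> Optional[PreAuthorization]:
    """Return the ratified pre-authorization covering this action, if any."""
    for rule in rules:
        if rule.ratified and rule.action_kind == kind and rule.matches(event):
            return rule
    return None


def route(kind: str, event: dict, rules: list[PreAuthorization]) -> str:
    """Decide whether automation may execute or must queue for a human."""
    rule = find_fast_path(kind, event, rules)
    if rule is None:
        return "queue-for-human-approval"
    # The returned label cites the constitutional decision, so the audit
    # record can show which ratified proposal authorized execution.
    return f"execute-under:{rule.proposal_id}"
```

An unratified draft rule never opens a fast path here, which is one way to keep "no silent fast paths" enforceable in code rather than by convention.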

Sub-Leviathan inheritance

LOCKED at this Sub-Leviathan. Instances cannot weaken (cannot remove human-approval requirement). Instances may strengthen (require multiple approvers, require senior standing, require external audit) in their own rules/.
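
The strengthen-only inheritance rule can be checked mechanically. The sketch below assumes a hypothetical `ApprovalPolicy` record and a numeric standing rank (with `GUARDIAN_RANK = 2` chosen purely for illustration); neither is defined by this principle. It compares an instance's policy against the baseline and reports every way the instance is weaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovalPolicy:
    human_approval_required: bool
    min_approvers: int
    min_standing: int            # hypothetical numeric rank; higher is more senior
    external_audit: bool = False


GUARDIAN_RANK = 2                # assumed rank value, for illustration only

# Baseline implied by this principle: one human approver with guardian standing.
BASELINE = ApprovalPolicy(True, 1, GUARDIAN_RANK)


def violations(instance: ApprovalPolicy,
               baseline: ApprovalPolicy = BASELINE) -> list[str]:
    """List every way an instance policy is weaker than the baseline."""
    problems = []
    if baseline.human_approval_required and not instance.human_approval_required:
        problems.append("human approval removed")
    if instance.min_approvers < baseline.min_approvers:
        problems.append("fewer approvers than baseline")
    if instance.min_standing < baseline.min_standing:
        problems.append("lower standing than baseline")
    if baseline.external_audit and not instance.external_audit:
        problems.append("external audit removed")
    return problems
```

An instance that requires two approvers, senior standing, and external audit yields an empty list; one that drops the human-approval requirement is flagged. Such a check could run when an instance proposes changes to its own rules/.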

Related elements

principle:defensive-only (this principle is how defense is bounded)
term:response, term:incident (response actions during an incident)
rule:await-approval-before-mitigation (the operational rule that implements this principle)
Inherited from meta: term:standing (guardian threshold), principle:user-sovereignty (consent + revocability)

Lineage

Pattern matches aigentone/leviathan-security README: "approval-aware operator workflows, bounded validation and enforcement surfaces." Generalized here from instance practice to Sub-Leviathan principle.
