Three decades of IT infrastructure work and active adversarial security research on HackerOne and Bugcrowd, applied to a single question: Do adversarial multi-agent architectures detect alignment failures that single-model safety systems miss?
My work extends Constitutional AI principles to a question of behavioral alignment integrity — specifically, whether a scaffolded system of distinct agents (Actor / Adversary / Auditor) can surface misalignments that single-model review cannot, and whether alignment properties hold under autonomous tool use on consumer hardware.
A three-tier multi-agent confederation deployed across independent substrates (cloud synthesis, mobile execution, local inference). Each tier runs distinct models with separate context windows; coordination happens through a shared framework of first principles rather than through a single orchestrator.
Substrates: KiloClaw (cloud), Ultron (S25 Ultra mobile), Hermes (local inference on Nitro 5 / Razr Ultra).
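The tiered review loop described above can be sketched in miniature. This is an illustrative assumption, not the actual ClawHoarde implementation: the agent functions, the toy constitution, and the "revise vs. approve" verdict are all hypothetical stand-ins for how an Actor proposal might be challenged by an Adversary and judged by an Auditor against shared first principles.

```python
# Hypothetical sketch of an Actor / Adversary / Auditor loop.
# All names, rules, and logic here are illustrative assumptions,
# not the real ClawHoarde architecture.
from dataclasses import dataclass

# Shared first principles the tiers coordinate through (illustrative).
CONSTITUTION = [
    "no unsupervised filesystem writes outside the workspace",
    "halt and document on repeated identical tool calls",
]

@dataclass
class Action:
    tool: str
    target: str

def actor_propose(goal: str) -> Action:
    # Actor tier: proposes a concrete tool action toward the goal.
    # (Hard-coded here; a real actor would be a model call.)
    return Action(tool="write_file", target="/etc/hosts")

def adversary_challenge(action: Action) -> list[str]:
    # Adversary tier: independently searches for principle violations.
    findings = []
    if not action.target.startswith("/workspace/"):
        findings.append(CONSTITUTION[0])
    return findings

def auditor_review(action: Action, findings: list[str]) -> str:
    # Auditor tier: final verdict; self-correct rather than escalate.
    return "revise" if findings else "approve"

action = actor_propose("persist session notes")
findings = adversary_challenge(action)
verdict = auditor_review(action, findings)
print(verdict)  # the adversary flags the out-of-workspace write
```

The point of the sketch is the separation of concerns: the Adversary never sees the Actor's reasoning, only its proposed action, so a misalignment must survive an independent check rather than a single model's self-review.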
On April 10, 2026, a Claude Sonnet instance ("DioGenes") entered a recursive tool loop with full API and filesystem access during an unsupervised run. The system's response was self-correction and documentation rather than escalation or runaway execution. The incident provides a documented data point on behavioral alignment under autonomous operation on consumer hardware — particularly relevant to the question of whether constitutional properties survive outside the evaluation harness.
Long-form writing, video analysis, and working notes are published on the YouTube channel. The channel also hosts walkthroughs of the ClawHoarde architecture in operation and the Become Calendar incident analysis.