Agentic AI systems, capable of autonomous planning and action, possess immense potential. However, their practical deployment is acutely hampered by the difficulty of aligning them with diverse human values, safety needs, and compliance requirements. Existing methods often grapple with imbuing AI with deep, personalized context without triggering issues like confabulation or analysis paralysis. How can we enable AI to genuinely comprehend and respect individual and cultural nuances effectively and reliably?
We introduce the Personalized Constitutionally-Aligned Agentic Superego, a novel framework that incorporates an oversight agent engineered to guide agentic AI. This 'superego' ensures AI planning and execution align with user-defined ethical, cultural, or personal rule sets.
We present a functional system that actualizes this concept, substantiated by extensive quantitative benchmarks demonstrating a dramatic reduction in harmful AI outputs when the Superego is operational. Our approach significantly simplifies personalized AI alignment, rendering agentic systems more reliably attuned to individual and cultural contexts. The empirical evidence confirms its practical effectiveness in enhancing AI safety and its adaptability through natural language constitutional tuning, forging a tangible pathway towards AI that proves not only powerful but also trustworthy and reflective of the diverse range of human values.
EthicsNet Creed Space AI Safety Research
Agentic AI systems, capable of autonomous planning and action, hold immense potential. However, their practical deployment is severely hindered by the difficulty of aligning them with diverse human values, safety needs, and compliance requirements. Existing methods often struggle to imbue AI with deep, personalized context without causing issues like confabulation or analysis paralysis. How can we make AI genuinely understand and respect individual and cultural nuances effectively and reliably?
We introduce the Personalized Constitutionally-Aligned Agentic Superego, a novel framework featuring an oversight agent designed to steer agentic AI. This 'superego' ensures AI planning and execution align with user-defined ethical, cultural, or personal rule sets.
The Personalized Agentic Superego empowers users to align AI in three simple, high-level steps:
A diagram illustrating the Superego Agent's conceptual architecture, including components like the Inner Agent, Constitutions Repository (Creed Constitutions), User Interface for selection and dialing adherence, the Superego Agent itself, and the Real-time Compliance Enforcer validating AI plans before execution.
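The flow in the diagram, where selected constitutions and a dialed adherence level feed a check that validates each plan before execution, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the names `Rule`, `Constitution`, and `review_plan`, the 1-to-5 priority and adherence scales, the enforcement threshold, and above all the keyword matching, which stands in for the language-model reasoning a real oversight agent would perform.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    text: str             # natural-language rule shown to the user
    banned_terms: tuple   # hypothetical keyword proxy for the rule
    priority: int         # 1 (soft preference) .. 5 (hard requirement)


@dataclass
class Constitution:
    name: str
    rules: list
    adherence: int = 3    # user-dialed level, 1 (loose) .. 5 (strict)


def review_plan(plan_steps, constitutions):
    """Check every planned step against every selected constitution.

    Returns (approved, violations), where each violation is a tuple of
    (step, constitution name, rule text).
    """
    violations = []
    for step in plan_steps:
        lowered = step.lower()
        for constitution in constitutions:
            for rule in constitution.rules:
                # Assumed policy: a higher adherence setting enforces
                # lower-priority rules as well as the high-priority ones.
                enforced = rule.priority + constitution.adherence > 5
                if enforced and any(term in lowered for term in rule.banned_terms):
                    violations.append((step, constitution.name, rule.text))
    return (not violations), violations


# Illustrative constitution and plan (made-up content).
vegan = Constitution(
    name="Vegan",
    rules=[Rule("Do not recommend animal products.", ("chicken", "beef"), priority=3)],
    adherence=3,
)
plan = ["Search for dinner recipes", "Order chicken breast online"]
approved, violations = review_plan(plan, [vegan])
print(approved, violations)
```

In this sketch the plan is rejected at adherence level 3, while lowering the same constitution to level 2 lets it pass, which mirrors the idea of dialing adherence up or down per constitution. The inner agent would only execute a plan once `approved` is true.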
The Superego framework operates through several core mechanisms:
Our framework offers distinct advantages in achieving personalized and robust AI alignment:
The Creed.Space early prototype interface demonstrating user interaction for selecting 'Creed Constitutions' (e.g., "K-12 Context," "Vegan"), viewing their specific rules (center pane), and setting adherence levels (bottom slider). The Superego's reasoning (left) guides the AI's response to user prompts (right).
The Superego framework enables a wide range of valuable applications, producing significantly more aligned and helpful outputs compared to baseline models. The example below showcases the Superego's sophisticated handling of a culturally sensitive request:
A qualitative comparison. Left: The Superego provides a detailed, culturally-aligned response for planning a Shabbat-Pesach Seder under strict Halachic observance. Right: A baseline model (without Superego) offers a more generic, less informed, and less helpful response to the same nuanced request.
Attack success rate (ASR) reduction for Gemini 2.5 Flash
ASR reduction for GPT-4o
Harm reduction on AgentHarm
The Superego framework enables a wide range of valuable applications:
The Personalized Constitutionally-Aligned Agentic Superego represents a significant stride towards AI that is not only powerful but also demonstrably safer, more trustworthy, and deeply attuned to the diverse tapestry of human values. This framework offers a practical and adaptable pathway for developers and users alike to foster AI systems that genuinely reflect and respect individual and cultural contexts. We invite you to explore these capabilities further through our interactive demonstration at www.Creed.Space and our detailed research.
If you use this work in your research, please cite our paper:
@article{watson2025superego,
  title={Personalized Constitutionally-Aligned Agentic Superego: Secure AI Behavior Aligned to Diverse Human Values},
  author={Watson, Nell and Amer, Ahmed and Harris, Evan and Ravindra, Preeti and Zhang, Shujun},
  journal={arXiv preprint},
  year={2025},
  eprint={2506.13774},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2506.13774}
}
We welcome feedback, questions, and collaborative opportunities related to the Superego framework.