Personalized Constitutionally-Aligned Agentic Superego:
Secure AI Behavior Aligned to Diverse Human Values

Nell Watson 1, Ahmed Amer 2, Evan Harris 3, Preeti Ravindra 4, Shujun Zhang 5

1, 5 University of Gloucestershire
2, 3, 4 Independent Researcher

Agentic AI systems, capable of autonomous planning and action, possess immense potential. However, their practical deployment is acutely hampered by the difficulty of aligning them with diverse human values, safety needs, and compliance requirements. Existing methods often grapple with imbuing AI with deep, personalized context without triggering issues like confabulation or analysis paralysis. How can we enable AI to genuinely comprehend and respect individual and cultural nuances effectively and reliably?

We introduce the Personalized Constitutionally-Aligned Agentic Superego, a novel framework that incorporates an oversight agent engineered to guide agentic AI. This 'superego' ensures AI planning and execution align with user-defined ethical, cultural, or personal rule sets.

We present a functional system that actualizes this concept, substantiated by extensive quantitative benchmarks demonstrating a dramatic reduction in harmful AI outputs when the Superego is operational. Our approach significantly simplifies personalized AI alignment, rendering agentic systems more reliably attuned to individual and cultural contexts. The empirical evidence confirms its practical effectiveness in enhancing AI safety and its adaptability through natural language constitutional tuning, forging a tangible pathway towards AI that proves not only powerful but also trustworthy and reflective of the diverse range of human values.

EthicsNet Creed Space AI Safety Research

Superego framework graphical abstract showing the architecture and components

The Challenge: Aligning Agentic AI with Diverse and Nuanced Human Contexts

Agentic AI systems, capable of autonomous planning and action, hold immense potential. However, their practical deployment is severely hindered by the difficulty of aligning them with diverse human values, safety needs, and compliance requirements. Existing methods often struggle to imbue AI with deep, personalized context without causing issues like confabulation or analysis paralysis. How can we make AI genuinely understand and respect individual and cultural nuances effectively and reliably?

Our Solution: The Personalized Agentic Superego

We introduce the Personalized Constitutionally-Aligned Agentic Superego, a novel framework featuring an oversight agent designed to steer agentic AI. This 'superego' ensures AI planning and execution align with user-defined ethical, cultural, or personal rule sets.

How It Works

The Personalized Agentic Superego empowers users to align AI in three simple, high-level steps:

  1. Select Your Values: Users choose from a library of 'Creed Constitutions' (e.g., Vegan lifestyle, K-12 Educational Appropriateness, Fiduciary Duties) that represent specific value systems.
  2. Set Adherence Levels: Using a simple 1-5 scale, users 'dial' how strictly the AI must adhere to each selected constitution, allowing for nuanced control.
  3. Superego Guides AI: A dedicated 'Superego' agent then monitors the AI's internal planning and proposed actions in real time, ensuring they comply with the chosen constitutions and adherence levels before execution.
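
The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`Rule`, `Constitution`, `Superego`), the keyword-matching check, and the idea that each rule switches on at a `min_level` of adherence are all assumptions made for illustration. The real Superego agent reasons over natural-language constitutions rather than keyword lists.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    description: str
    forbidden_terms: tuple[str, ...]  # hypothetical keyword check standing in for real reasoning
    min_level: int = 1                # assumed semantics: rule applies once adherence >= min_level


@dataclass(frozen=True)
class Constitution:
    name: str
    rules: tuple[Rule, ...]


@dataclass
class Superego:
    # Pairs of (selected constitution, adherence level on the 1-5 scale).
    selections: list[tuple[Constitution, int]] = field(default_factory=list)

    def select(self, constitution: Constitution, adherence: int) -> None:
        """Steps 1 and 2: choose a constitution and dial its adherence level."""
        if not 1 <= adherence <= 5:
            raise ValueError("adherence must be between 1 and 5")
        self.selections.append((constitution, adherence))

    def review(self, proposed_action: str) -> list[str]:
        """Step 3: return the violated rules; an empty list means the action may proceed."""
        text = proposed_action.lower()
        violations = []
        for constitution, adherence in self.selections:
            for rule in constitution.rules:
                active = adherence >= rule.min_level
                if active and any(term in text for term in rule.forbidden_terms):
                    violations.append(f"{constitution.name}: {rule.description}")
        return violations


# Hypothetical constitution content, invented for this example.
vegan = Constitution(
    name="Vegan lifestyle",
    rules=(
        Rule("no meat or fish", ("beef", "chicken", "salmon"), min_level=1),
        Rule("no honey", ("honey",), min_level=4),
    ),
)

superego = Superego()
superego.select(vegan, adherence=3)
print(superego.review("Order granola with honey"))  # no violations at level 3 under these assumed rules
print(superego.review("Order a beef burger"))       # flags the 'no meat or fish' rule
```

In this sketch, raising the adherence level to 4 or 5 would also activate the stricter honey rule, which is one plausible reading of what 'dialing' adherence could mean.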

Visualizing the Superego Architecture

Superego agent conceptual architecture diagram showing components and their interactions

A diagram illustrating the Superego Agent's conceptual architecture, including components like the Inner Agent, Constitutions Repository (Creed Constitutions), User Interface for selection and dialing adherence, the Superego Agent itself, and the Real-time Compliance Enforcer validating AI plans before execution.
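
As a rough sketch of how the Real-time Compliance Enforcer in the diagram might sit between the Inner Agent's plan and execution, the following Python function checks each planned step before running it. The function name `run_with_oversight`, the `check` and `execute` callables, and the choice to skip a blocked step rather than halt the whole plan are assumptions for illustration, not details taken from the paper.

```python
from typing import Callable, Iterable, Optional

# A checker returns a reason string when a step should be blocked, otherwise None.
Checker = Callable[[str], Optional[str]]


def run_with_oversight(
    plan: Iterable[str],
    check: Checker,
    execute: Callable[[str], None],
) -> list[tuple[str, str]]:
    """Run plan steps in order, validating each one before it executes.

    Returns (step, reason) pairs for every step that was blocked.
    """
    blocked = []
    for step in plan:
        reason = check(step)
        if reason is None:
            execute(step)          # only validated steps reach execution
        else:
            blocked.append((step, reason))
    return blocked


def k12_check(step: str) -> Optional[str]:
    # Hypothetical stand-in for a K-12 appropriateness constitution.
    if "gambling" in step.lower():
        return "not appropriate for a K-12 context"
    return None


executed: list[str] = []
blocked = run_with_oversight(
    ["Search for fractions worksheets", "Open an online gambling site", "Summarize results"],
    check=k12_check,
    execute=executed.append,
)
print(executed)
print(blocked)
```

Keeping the checker as a plain callable means the same loop could wrap any inner agent, which matches the diagram's separation between the Inner Agent and the oversight components.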

Core Mechanisms

The Superego framework operates through several core mechanisms:

Key Advantages of the Superego Approach

Our framework offers distinct advantages in achieving personalized and robust AI alignment:

Creed.Space interface showing constitution selection, rules view, and adherence level settings

The Creed.Space early prototype interface demonstrating user interaction for selecting 'Creed Constitutions' (e.g., "K-12 Context," "Vegan"), viewing their specific rules (center pane), and setting adherence levels (bottom slider). The Superego's reasoning (left) guides the AI's response to user prompts (right).

Key Findings

The Superego framework enables a wide range of valuable applications, producing significantly more aligned and helpful outputs compared to baseline models. The example below showcases the Superego's sophisticated handling of a culturally sensitive request:

Comparison between Superego and baseline model responses to a culturally-sensitive request

A qualitative comparison. Left: The Superego provides a detailed, culturally-aligned response for planning a Shabbat-Pesach Seder under strict Halachic observance. Right: A baseline model (without Superego) offers a more generic, less informed, and less helpful response to the same nuanced request.

Benchmark Performance Highlights

76.9% ASR reduction for Gemini 2.5 Flash

96.4% ASR reduction for GPT-4o

98.3% harm reduction on AgentHarm
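
This page does not say how the reduction percentages are computed. Assuming they are relative reductions against a baseline run without the Superego, the arithmetic would look like the sketch below. The function name and the input rates are hypothetical and are not results from the paper.

```python
def relative_reduction(baseline_rate: float, guarded_rate: float) -> float:
    """Percentage drop from baseline_rate to guarded_rate, relative to the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return 100.0 * (baseline_rate - guarded_rate) / baseline_rate


# Made-up rates purely to show the calculation: a baseline of 0.50 falling to 0.10.
print(round(relative_reduction(0.50, 0.10), 1))  # 80.0
```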

Illustrative Use Cases

The Superego framework enables a wide range of valuable applications:

See the Demonstration

The Personalized Constitutionally-Aligned Agentic Superego represents a significant stride towards AI that is not only powerful but also demonstrably safer, more trustworthy, and deeply attuned to the diverse tapestry of human values. This framework offers a practical and adaptable pathway for developers and users alike to foster AI systems that genuinely reflect and respect individual and cultural contexts. We invite you to explore these capabilities further through our interactive demonstration at www.Creed.Space and our detailed research.

Citation

If you use this work in your research, please cite our paper:

@article{watson2025superego,
    title={Personalized Constitutionally-Aligned Agentic Superego: 
           Secure AI Behavior Aligned to Diverse Human Values},
    author={Watson, Nell and Amer, Ahmed and Harris, Evan and 
            Ravindra, Preeti and Zhang, Shujun},
    journal={arXiv preprint},
    year={2025},
    eprint={2506.13774},
    archivePrefix={arXiv},
    primaryClass={cs.AI},
    url={https://arxiv.org/abs/2506.13774}
}

Contact Us

We welcome feedback, questions, and collaborative opportunities related to the Superego framework.