Scalable AI Alignment Through Constitutional Training
A novel approach to training large language models with built-in safety constraints and human value alignment.
Ensuring artificial intelligence remains beneficial as it becomes more capable.
Featured Research

Research Areas

Our research spans multiple critical areas of AI safety and alignment.
Ensuring AI systems pursue intended goals and remain beneficial to humanity
Understanding how AI models work internally to enable better control and safety
Designing AI systems that cooperate effectively with humans and other agents
Developing robust methods to evaluate AI safety and identify potential risks
Our most impactful recent work in AI safety and alignment
Understanding the internal mechanisms of transformer models through activation patching and circuit analysis.
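To illustrate the core idea behind activation patching, here is a minimal sketch in Python. It uses a tiny two-layer MLP as a stand-in for a transformer component (the real technique operates on cached activations inside a full transformer): run the model on a "clean" input, cache an intermediate activation, then re-run on a "corrupted" input while patching the cached activation back in. The change in output localizes how much that activation site contributes to the behavior. All names and the toy model here are illustrative, not part of any published codebase.

```python
import numpy as np

# Toy activation patching sketch. A two-layer MLP stands in for a
# transformer block; the hidden layer is the "site" we patch.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Forward pass; optionally overwrite the hidden activation with `patch`."""
    h = np.maximum(x @ W1, 0.0)  # hidden activation (the patching site)
    if patch is not None:
        h = patch                # restore the cached clean activation
    return h @ W2, h

clean_x = rng.normal(size=4)
corrupt_x = rng.normal(size=4)

clean_out, clean_h = forward(clean_x)      # clean run: cache activation
corrupt_out, _ = forward(corrupt_x)        # corrupted baseline
patched_out, _ = forward(corrupt_x, patch=clean_h)  # patched run

# How much does restoring this one site move the output back toward clean?
effect = np.linalg.norm(patched_out - corrupt_out)
print(effect)
```

In this toy model the patched site fully determines the output, so the patched run exactly recovers the clean output; in a real transformer, patching one site typically recovers only part of the behavior, and that partial recovery is what circuit analysis measures across many sites.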
Additional research and ongoing projects
Designing AI systems that can cooperate effectively with humans and other AI agents to solve complex global challenges.
Interested in contributing to AI safety research? We welcome collaborations with researchers, institutions, and organizations sharing our mission.