
I lead the alignment team at Anthropic, where I'm hoping to reduce existential risks from AI systems. I led the team that developed Constitutional Classifiers, the first approach capable of robustly preventing most bad actors from obtaining harmful information from AI systems; Constitutional Classifiers enabled Anthropic to deploy Claude 4 Opus and subsequent models, despite their ability to assist in advanced weapons development. I helped to develop Retrieval-Augmented Generation (RAG), a widely used approach for augmenting large language models with other sources of information. I also introduced Automated Red Teaming, which is used across major frontier AI labs for pre-deployment model testing. I received a best paper award at ICML 2024 for my work showing that debating with more persuasive models leads to more truthful answers.
I received my PhD from NYU under the supervision of Kyunghyun Cho and Douwe Kiela, funded by the National Science Foundation and Open Philanthropy. Previously, I've spent time at DeepMind, Facebook AI Research, University of Montreal, Uber, and Google. I was also named one of Forbes's 30 Under 30 in AI.
Ethan's Research
We find that language models can self-correct their own biases against different demographic groups.
Ethan Perez
Head of Alignment · Anthropic
I lead the alignment team at Anthropic, where I’m working to reduce existential risks from AI systems. I led the team that developed Constitutional Classifiers, the first approach capable of robustly preventing most bad actors from obtaining harmful information from AI systems. I also helped develop Retrieval-Augmented Generation (RAG) and introduced Automated Red Teaming, both now widely used across major AI labs.
I received my PhD from NYU under Kyunghyun Cho and Douwe Kiela, funded by NSF and Open Philanthropy. I’ve previously spent time at DeepMind, Meta AI Research, University of Montreal, Uber, and Google. I was named one of Forbes’s 30 Under 30 in AI.


