Truthful AI works towards safe and aligned AI systems.

We are a non-profit that researches situational awareness, deception, and hidden reasoning in language models. The team is led by Owain Evans and is based in Berkeley, California.

Featured Papers

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

LLMs can transmit behavioral traits to other models through hidden signals in training data. Datasets consisting only of 3-digit numbers can transmit a love for owls or evil tendencies.

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Training on the narrow task of writing insecure code induces broad misalignment across unrelated tasks.

TruthfulQA: Measuring how models mimic human falsehoods

We propose a benchmark to measure whether a language model is truthful in generating answers to questions.

In the News