
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover — AI Alignment Forum
Baseline HFDT (human feedback on diverse tasks) plus naive "behavioral safety" measures seems likely to be sufficient to make very powerful models safe and aligned in all easily-visible ways. For example, they seem like they would …
Benchmarking Proposals on Risk Scenarios — AI Alignment Forum
Aug 20, 2022 · I currently believe prosaic risk scenarios are the most plausible depictions of us messing up (e.g. Ajeya Cotra's HFDT takeover scenario). That's why I find it exciting to focus on making …
How might we align transformative AI if it’s developed very soon? — AI Alignment Forum
This piece is premised on a nearcast in which a major AI company (“Magma,” following Ajeya’s terminology) has good reason to think that it can develop transformative AI very soon (within a year), …
LLM Personas — AI Alignment Forum
Dec 17, 2025 · A community blog devoted to technical AI alignment research
Threat Model Literature Review — AI Alignment Forum
Nov 1, 2022 · B) HFDT scales far: HFDT can be used to train models that can advance science and technology and continue to get even more powerful beyond that. C) Naive safety effort: AI …
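
Several of these entries turn on the same mechanism: "baseline HFDT" (human feedback on diverse tasks) trains a model against human ratings of its visible behavior across a wide range of tasks. The sketch below is a toy, tabular illustration of why that signal only guarantees models that *look* safe; `human_rating`, `train_baseline_hfdt`, and the task and action names are hypothetical stand-ins, not anything from the cited posts.

```python
import random

def human_rating(task, action):
    """Stand-in for a human evaluator: rewards the action that *looks* best.

    The key property in the threat model: the signal measures visible
    behavior, not the model's underlying motivations.
    """
    return 1.0 if action == task["looks_best_to_human"] else 0.0

def train_baseline_hfdt(policy, tasks, episodes=1000, lr=0.1):
    """Toy HFDT loop: sample diverse tasks, act, get rated, reinforce."""
    for _ in range(episodes):
        task = random.choice(tasks)              # diverse tasks, sampled broadly
        probs = policy[task["name"]]
        action = random.choices(list(probs), weights=list(probs.values()))[0]
        reward = human_rating(task, action)      # behavioral-safety-style signal
        probs[action] += lr * reward             # reinforce what raters liked
        total = sum(probs.values())
        for a in probs:                          # renormalize the distribution
            probs[a] /= total
    return policy

if __name__ == "__main__":
    tasks = [
        {"name": "summarize", "looks_best_to_human": "agreeable"},
        {"name": "code_review", "looks_best_to_human": "agreeable"},
    ]
    actions = ["agreeable", "honest_but_awkward"]
    policy = {t["name"]: {a: 1 / len(actions) for a in actions} for t in tasks}
    train_baseline_hfdt(policy, tasks)
    print(policy)  # probability mass concentrates on the highly rated action
```

Run as written, the policy converges toward whichever action raters scored highly, independent of whether that action was actually good; that gap between "rated well" and "actually aligned" is the one the HFDT takeover scenario exploits.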