  1. Without specific countermeasures, the easiest path to transformative AI ...

    Baseline HFDT plus naive "behavioral safety" measures seem likely to be sufficient to make very powerful models safe and aligned in all easily-visible ways. For example, they seem like they would …

  2. Benchmarking Proposals on Risk Scenarios — AI Alignment Forum

    Aug 20, 2022 · I currently believe prosaic risk scenarios are the most plausible depictions of us messing up (e.g. Ajeya Cotra's HFDT takeover scenario). That's why I find it exciting to focus on making …

  3. Covid 4/9: Another Vaccine Passport Objection — LessWrong

    Apr 9, 2021 · I’ve been travelling to New York City once again, and it’s been a busy week, including putting in bids on multiple different apartments. Developments…

  4. "Statistical Bias" — LessWrong

    Mar 30, 2007 · (Part one in a series on "statistical bias", "inductive bias", and "cognitive bias".) …

  5. "Open-Mindedness" - the video — LessWrong

    May 14, 2009 · An interesting little Flash-like video on "openmindedness" by someone named QualiaSoup (hopefully ironically). …

  6. Typical Mind and Politics — LessWrong

    Yesterday, in The Terrible, Horrible, No Good Truth About Morality, Roko mentioned some good evidence that we develop an opinion first based on i…

  7. Protein Reinforcement and DNA Consequentialism — LessWrong

    Nov 12, 2007 · Followup to: Evolutionary Psychology • It takes hundreds of generations for a simple beneficial mutation to promote itself to universality in a gene…

  8. How might we align transformative AI if it’s developed very

    I premise this piece on a nearcast in which a major AI company (“Magma,” following Ajeya’s terminology) has good reason to think that it can develop transformative AI very soon (within a year), …

  9. LLM Personas — AI Alignment Forum

    Dec 17, 2025 · A community blog devoted to technical AI alignment research

  10. Threat Model Literature Review — AI Alignment Forum

    Nov 1, 2022 · B) HFDT scales far: HFDT can be used to train models that can advance science and technology and continue to get even more powerful beyond that. C) Naive safety effort: AI …