Safety, security and risk
AI poses, or intersects with, a range of safety and security challenges: some are immediate, while others may only manifest in the future as more powerful systems are developed and deployed across a wider range of societal settings. AI:FAR has explored risks associated with the role of AI in synthetic media, manipulation and information security; defence and military use; and critical processes such as agriculture. It also explores catastrophic risks associated with expected advances in frontier AI towards the industry goal of AGI, including the risks posed by increasing generality, agency and long time-horizon planning. It collaborates with CFI’s Kinds of Intelligence project on the role and limitations of evaluations and other methods for assessing AI risks. This strand includes the Future of Life Institute-funded Paradigms of AGI and their Associated Risks project, which explores safety challenges that may emerge for AI systems of increasing generality and capability.
Relevant papers include:
Anwar, U., et al. (including Corsi, G., & Ó hÉigeartaigh, S. S.) (2024). Foundational challenges in assuring alignment and safety of large language models. arXiv preprint arXiv:2404.09932.
Burden, J. (2024). Evaluating AI evaluation: Perils and prospects. arXiv preprint arXiv:2407.09221.
Burden, J., Cebrian, M., & Hernández-Orallo, J. (2024). Conversational complexity for assessing risk in large language models. arXiv preprint arXiv:2409.01247.
Guest, O., Aird, M., & Ó hÉigeartaigh, S. S. (2023). Safeguarding the safeguards: How best to promote AI alignment in the public interest. arXiv preprint arXiv:2312.08039.
Gruetzemacher, R., Chan, A., Frazier, K., Manning, C., Los, Š., Fox, J., Hernández-Orallo, J., Burden, J., Franklin, M., Ghuidhir, C. N., & Bailey, M. (2023). An international consortium for evaluations of societal-scale risks from advanced AI. arXiv preprint arXiv:2310.14455.
Shevlane, T., Farquhar, S., Garfinkel, B., Phuong, M., Whittlestone, J., Leung, J., Avin, S., ... & Dafoe, A. (2023). Model evaluation for extreme risks. arXiv preprint arXiv:2305.15324.
Burden, J., Clarke, S., & Whittlestone, J. (2023). From Turing’s speculations to an academic discipline: A history of AI existential safety.
Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O'Keefe, C., Whittlestone, J., Avin, S., ... & Wolf, K. (2023). Frontier AI regulation: Managing emerging risks to public safety. arXiv preprint arXiv:2307.03718.
Casares, P. A. M., Loe, B. S., Burden, J., Ó hÉigeartaigh, S. S., & Hernández-Orallo, J. (2022, June). How general-purpose is a language model? Usefulness and safety with human prompters in the wild. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 5, pp. 5295-5303).