Benchmarking Dark Patterns in LLMs

Abstract

This paper builds upon the research in Seemingly Human: Dark Patterns in ChatGPT (Park et al., 2024) by introducing a new benchmark of 392 questions designed to elicit dark pattern behaviours in language models. We ran this benchmark on GPT-4 Turbo and Claude 3 Sonnet, and had them self-evaluate and cross-evaluate the responses.

By Jord Nguyen, Akash Kundu, Sami Jawhar

Read the paper here