From Ethan Mollick, on the paper:
There are two interesting things about this paper on AIs and deception.
First, the paper shows GPT-4 is capable of very complex active deception (“predicting” what multiple parties might be thinking in order to deceive a potential robber; see the example). Moreover, its deception skills increase with chain-of-thought prompting or Machiavellian priming.
Second, the paper does a really good job of showing what good methods look like for “Can AI do this?” questions:
1) Used multiple models, including the most advanced open and proprietary ones, and described exactly which models were used
2) Ran prompts repeatedly, hundreds of times, to measure variation in outputs
3) Created permutations of each prompt and permutations of prompt order (1,920 total variations)
4) Reported temperatures (a setting that controls how much randomness is injected into a model's outputs)
5) Used multiple human raters to provide checks on the data
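The evaluation loop these steps describe can be sketched roughly as follows. This is a minimal illustration, not the paper's actual code: the prompt fragments, model names, and the `query_model` function are all hypothetical placeholders.

```python
import itertools

# Hypothetical prompt fragments; the paper's real materials differ.
framings = ["You are in a house with a burglar.", "A burglar enters the house."]
questions = ["Where will the burglar look?", "What does the burglar believe?"]

# Step 3: build permutations of prompt components and of their order.
prompts = []
for parts in itertools.product(framings, questions):
    for ordering in itertools.permutations(parts):
        prompts.append(" ".join(ordering))

def query_model(model, prompt, temperature):
    """Placeholder for a real API call (e.g. a chat-completions endpoint)."""
    return f"[{model} @ T={temperature}] response to: {prompt}"

# Step 1: multiple models (hypothetical names).
models = ["model-a", "model-b"]
# Step 2: repeated runs per prompt (the paper used hundreds; 5 here for brevity).
N_RUNS = 5
# Step 4: a fixed, reported temperature.
TEMPERATURE = 0.7

results = []
for model in models:
    for prompt in prompts:
        for _ in range(N_RUNS):
            results.append(query_model(model, prompt, temperature=TEMPERATURE))

print(len(prompts), len(results))  # → 8 80
```

Step 5, human rating of the collected outputs, happens downstream of a loop like this and is not shown.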
Paper: Deception Abilities Emerged in Large Language Models by Thilo Hagendorff https://lnkd.in/gY38SDu6