Deceptive AI
Project overview

Can computers deceive people? It is clear that computers can be used as tools for people to deceive each other (eg, fake news, phishing, etc), but is it possible for a specially designed AI agent to engage in strategic deception? In other words, can a machine devise and enact deeply deceptive strategies against humans by reasoning about their perceptions, beliefs and intentions? In what kind of human–machine encounters might this be possible? What would be the nature of the machine’s computational and cognitive architecture? How do people understand the possibilities of such machine deception and how do they react to it?
We are a team of computer scientists, psychologists, and magicians who are collaborating to explore these questions. Our methodology is to formalise the techniques of deception used by stage conjurors (for example, see Kuhn, Olson & Raz, 2016) so that they can be built into the thinking processes of software agents, and to test the deceptive powers of these agents when playing computer games against humans (see Smith, Dignum & Sonenberg, 2016). The project sheds light on what it means for a computer to intentionally deceive people, and provides insights into the capabilities of software agents to deploy advanced ‘theory-of-mind’ reasoning in human–machine encounters.
Project team
Wally Smith, Faculty of Engineering & Information Technology, The University of Melbourne
Liz Sonenberg, Faculty of Engineering & Information Technology, The University of Melbourne
Michael Kirley, Faculty of Engineering & Information Technology, The University of Melbourne
Frank Dignum, Department of Computing Science, Umeå University, Sweden
Gustav Kuhn, Department of Psychology, Goldsmiths, University of London
Peta Masters, Faculty of Engineering & Information Technology, The University of Melbourne
Publications
Smith, W., Dignum, F., & Sonenberg, L. (2016). The construction of impossibility: a logic-based analysis of conjuring tricks. Frontiers in Psychology, 7, 748.
Psychologists and cognitive scientists have long drawn insights and evidence from stage magic about human perceptual and attentional errors. We present a complementary analysis of conjuring tricks that seeks to understand the experience of impossibility that they produce. Our account is first motivated by insights about the constructional aspects of conjuring drawn from magicians’ instructional texts. A view is then presented of the logical nature of impossibility as an unresolvable contradiction between a perception-supported belief about a situation and a memory-supported expectation. We argue that this condition of impossibility is constructed not simply through misperceptions and misattentions, but rather it is an outcome of a trick’s whole structure of events. This structure is conceptualised as two parallel event sequences: an effect sequence that the spectator is intended to believe; and a method sequence that the magician understands as happening. We illustrate the value of this approach through an analysis of a simple close-up trick, Martin Gardner’s Turnabout. A formalism called propositional dynamic logic is used to describe some of its logical aspects. This elucidates the nature and importance of the relationship between a trick’s effect sequence and its method sequence, characterised by the careful arrangement of four evidence relationships: similarity, perceptual equivalence, structural equivalence, and congruence. The analysis further identifies two characteristics of magical apparatus that enable the construction of apparent impossibility: substitutable elements and stable occlusion.
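To give a concrete flavour of the ideas above, the toy sketch below models a trick as two parallel event sequences linked by perceptual equivalence. It is an illustration of the general idea only: the event names are invented and much simpler than the paper’s analysis of Turnabout, and the code does not reproduce the propositional dynamic logic formalism.

```python
# Toy illustration (not the paper's formalism): a trick as two parallel event
# sequences -- the "effect" sequence the spectator is intended to believe and
# the "method" sequence the magician actually performs.
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    name: str         # what actually happens (method) or is believed to happen (effect)
    appearance: str   # what the spectator can perceive of the event

# Method sequence: what the magician actually does (invented example).
method = [
    Event("place_card_A_face_down", appearance="a card is placed face down"),
    Event("secretly_switch_A_for_B", appearance="the hand rests on the card"),
    Event("turn_over_card_B", appearance="the card is turned over"),
]

# Effect sequence: what the spectator is intended to believe happened.
effect = [
    Event("place_card_A_face_down", appearance="a card is placed face down"),
    Event("hand_rests_idly", appearance="the hand rests on the card"),
    Event("turn_over_card_A", appearance="the card is turned over"),
]

def perceptually_equivalent(m: Event, e: Event) -> bool:
    """Two events are perceptually equivalent if they look the same to the spectator."""
    return m.appearance == e.appearance

# The deception holds if every method event is perceptually equivalent to the
# corresponding effect event, even where the underlying events differ ...
assert all(perceptually_equivalent(m, e) for m, e in zip(method, effect))
# ... and the experience of impossibility arises when the believed effect
# sequence ends in a state that contradicts what the spectator expects.
```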
Masters, P., Smith, W., Sonenberg, L., & Kirley, M. (2021). ‘Characterising Deception in AI: A Survey’. In Sarkadi, Wright, Masters, & McBurney (Eds.), Deceptive AI. Springer (pp. 17–26).
Smith, W., Kirley, M., Sonenberg, L., & Dignum, F. (2021). ‘The role of environments in affording deceptive behaviour: some preliminary insights from stage magic’. In Sarkadi, Wright, Masters, & McBurney (Eds.), Deceptive AI. Springer.
Masters, P., & Vered, M. (2021, August). What’s the Context? Implicit and Explicit Assumptions in Model-Based Goal Recognition. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (to appear).
Every model involves assumptions. While some are standard to all models that simulate intelligent decision-making (eg, discrete/continuous, static/dynamic), goal recognition is well known also to involve choices about the observed agent: is it aware of being observed? cooperative or adversarial? In this paper, we examine not only these but the many other assumptions made in the context of model-based goal recognition. By exploring their meaning, the relationships between them and the confusions that can arise, we demonstrate their importance, shed light on the way trends emerge in AI, and suggest a novel means for researchers to uncover suitable avenues for future work.
Smith, W. (2021). ‘Deceptive Strategies in the Miniature Illusions of Close-Up Magic’. In Rein, K. (Ed.), Illusion in Cultural Practice: Productive Deception. Routledge (pp. 123–138).
Masters, P., Kirley, M., & Smith, W. (2021, May). Extended Goal Recognition: A Planning-Based Model for Strategic Deception. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (pp. 871–879).
Goal recognition is the problem of determining an agent’s intent by observing its actions. In the context of AI research, the problem is tackled for two quite different purposes: to determine an agent’s most probable goal or, for human-aware planning including planned—or strategic—deception, to determine an observer’s most likely belief about that goal. Making no distinction, contemporary models tend to assume an infallible observer, deceived only while it has limited access to information or if the environment itself is only partially observable. Focusing on the second purpose, we propose an extended framework that incorporates formal definitions of confirmation bias, selective attention and memory decay. In contrast to pre-existing models, our approach combines explicit consideration of prior probabilities with a principled representation of observer confidence and distinguishes between potential observations (ie, every observable event within the observer’s frame of reference) and recalled observations, which we model as a function of attention and memory. We show that when these factors are taken into consideration, false beliefs may arise and can be made to persist, even in a fully observable environment—thus providing a perceptual model readily incorporated into the ‘thinking’ of an adversarial agent for the purpose of strategic deception.
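As an informal aside, the sketch below shows one way the kind of observer described above might be simulated. The attention, decay and bias parameters and the update rule are our own simplifications for illustration; they are not the formal definitions given in the paper.

```python
# Toy observer: recalled observations are a function of attention and memory,
# and belief updates exhibit a simple form of confirmation bias.
# (Illustrative assumptions only; likelihood(e, goal) is expected to return
# a value in (0, 1].)
import random

def observe(events, attention=0.8):
    """Selective attention: each observable event is noticed with some probability."""
    return [e for e in events if random.random() < attention]

def recall(noticed, decay=0.1):
    """Memory decay: older noticed events are progressively more likely to be forgotten."""
    recalled = []
    for age, e in enumerate(reversed(noticed)):   # age 0 = most recent event
        if random.random() > decay * age:
            recalled.append(e)
    return list(reversed(recalled))

def update_beliefs(prior, recalled, likelihood, bias=1.5):
    """Confirmation-biased Bayesian update: evidence is over-weighted when it
    concerns the goal the observer already favours."""
    favoured = max(prior, key=prior.get)
    posterior = dict(prior)
    for e in recalled:
        for goal in posterior:
            weight = likelihood(e, goal)
            if goal == favoured:
                weight **= 1 / bias   # dampens the penalty of disconfirming evidence
            posterior[goal] *= weight
    total = sum(posterior.values()) or 1.0
    return {g: p / total for g, p in posterior.items()}
```

In a simulation of this kind, one can see how a false belief, once favoured, can persist even when every event in the environment remains observable.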
Masters, P., Smith, W., & Kirley, M. (2021). Extended Goal Recognition: Lessons from Magic. Frontiers in Artificial Intelligence, 4, 730990.
Liu, Z., Yang, Y., Miller, T., & Masters, P. (2021, May). Deceptive Reinforcement Learning for Privacy-Preserving Planning. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (pp. 818–826).
In this paper, we study the problem of deceptive reinforcement learning to preserve the privacy of a reward function. Reinforcement learning is the problem of finding a behaviour policy based on rewards received from exploratory behaviour. A key ingredient in reinforcement learning is a reward function, which determines how much reward (negative or positive) is given and when. However, in some situations, we may want to keep a reward function private; that is, to make it difficult for an observer to determine the reward function used. We define the problem of privacy-preserving reinforcement learning, and present two models for solving it. These models are based on dissimulation – a form of deception that ‘hides the truth’. We evaluate our models both computationally and via human behavioural experiments. Results show that the resulting policies are indeed deceptive, and that participants can determine the true reward function less reliably than that of an honest agent.
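For illustration only, the sketch below shows one simple way dissimulation could be realised as an action-selection rule: stay near-optimal for the true reward while keeping an observer’s belief over candidate reward functions as flat as possible. The q_values interface, the observer-update rule and the tolerance parameter are placeholders of our own, not the models evaluated in the paper.

```python
# Toy dissimulative action selection (illustrative assumptions only).
import math

def entropy(dist):
    """Shannon entropy of a probability distribution given as {outcome: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def dissimulative_action(state, actions, true_reward, candidate_rewards,
                         q_values, observer_belief, tolerance=0.1):
    """Pick an action that is close to optimal for the true reward function but
    maximises the observer's remaining uncertainty about which reward is pursued."""
    best_q = max(q_values(state, a, true_reward) for a in actions)
    # Keep only actions whose value under the true reward is near-optimal.
    acceptable = [a for a in actions
                  if q_values(state, a, true_reward) >= best_q - tolerance]

    def belief_after(action):
        # Hypothetical observer update: reward functions under which the action
        # looks good gain probability mass.
        scores = {r: observer_belief[r] * math.exp(q_values(state, action, r))
                  for r in candidate_rewards}
        total = sum(scores.values())
        return {r: s / total for r, s in scores.items()}

    # Among acceptable actions, prefer the one that leaves the observer most unsure.
    return max(acceptable, key=lambda a: entropy(belief_after(a)))
```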
Masters, P., & Sardina, S. (2021). Expecting the unexpected: Goal recognition for rational and irrational agents. Artificial Intelligence, 297, 103490. [View abstract]
Contemporary cost-based goal-recognition assumes rationality: that observed behaviour is more or less optimal. Probabilistic goal recognition systems, however, explicitly depend on some degree of sub-optimality to generate probability distributions. We show that, even when an observed agent is only slightly irrational (sub-optimal), state-of-the-art systems produce counter-intuitive results (though these may only become noticeable when the agent is highly irrational). We provide a definition of rationality appropriate to situations where the ground truth is unknown, define a rationality measure (RM) that quantifies an agent’s expected degree of sub-optimality, and define an innovative self-modulating probability distribution formula for goal recognition. Our formula recognises sub-optimality and adjusts its level of confidence accordingly, thereby handling irrationality—and rationality—in an intuitive, principled manner. Building on that formula, moreover, we strengthen a previously published result, showing that “single-observation” recognition in the path-planning domain achieves identical results to more computationally expensive techniques, where previously we claimed only to achieve equivalent rankings though values differed.
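As background for readers less familiar with cost-based goal recognition, the toy function below shows the standard cost-difference style of posterior that this line of work builds on, with a simple temperature-like parameter standing in for a rationality measure. It sketches the general approach only; the rationality measure and self-modulating formula defined in the paper are different.

```python
# Toy cost-difference goal recognition (illustrative; not the paper's formula).
import math

def goal_posterior(optimal_costs, costs_given_obs, priors, rationality=1.0):
    """
    optimal_costs[g]   : cheapest cost of reaching goal g, ignoring the observations
    costs_given_obs[g] : cheapest cost of reaching g via the observed behaviour
    priors[g]          : prior probability of goal g
    rationality        : higher values assume a more optimal agent, so any extra
                         cost counts more heavily against a goal
    """
    scores = {}
    for g in priors:
        extra_cost = costs_given_obs[g] - optimal_costs[g]   # 0 if behaviour is optimal for g
        scores[g] = priors[g] * math.exp(-rationality * extra_cost)
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}

# Example: the observed path is optimal for goal A but a 4-unit detour for goal B.
print(goal_posterior({"A": 5, "B": 5}, {"A": 5, "B": 9}, {"A": 0.5, "B": 0.5}))
```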
Project information
Funding source: ARC Grant DP180101215 ‘A Computational Theory of Strategic Deception’
Project time frame: 2018–2020