Confidence and Trust in Human-Machine Teaming

Fall 2019: Volume 6 Issue 3

Photo illustration created by HDIAC and adapted from Adobe Stock images and a DIA photo (available for viewing at: https://www.dia.mil/portals/27/Images/Phase%20I%20Redesign%20Files/Career%20Field%20Images/IT/170603-F-LW859-002.jpg?ver=2018-09-16-213142-767).

Posted: October 25, 2019 | By: Lt. Col. Aaron Celaya, Nick Yeung, Ph.D.

Following World War II and into the Cold War, the United States began utilizing offset strategies to gain strategic and operational advantage over adversaries. These strategies focused on technological advance as key to creating disparities in military and coercive power between opposing forces. Former U.S. Secretary of Defense Harold Brown explained this strategic approach to Congress in 1981 when he stated, “Technology can be a force multiplier, a resource that can be used to help offset numerical advantages of an adversary. Superior technology is one very effective way to balance military capabilities other than matching an adversary tank-for-tank or soldier-for-soldier [1].” Thus, the U.S. developed tactical nuclear warheads and precision-guided weapons as key components of the first and second offset strategies. A similar emphasis on technological advance is evident in the current National Defense Strategy, released in January 2018, which anticipates further development of traditional warfighting technologies (nuclear forces, missile defense, forward forces), but crucially goes beyond these to also include Artificial Intelligence (AI) and other forms of algorithmic technology implementations [2]. Correspondingly, the Department of Defense (DoD) is calling for AI development, employment, and deployment to support decision making, intelligence operations, and additional capabilities via its new Artificial Intelligence Strategy, released in February 2019 [3].

There are many varieties of artificial systems used daily by companies, governments, universities, and individual citizens. Artificial systems can range from fixed, rules-based automation to evolutionary, judgement-based systems. Fixed automation consists of pre-programmed, strictly controlled and contained functions, such as scripts, macros, or robotic-process automation. By contrast, evolutionary, judgement-based algorithms are self-learning, autonomous, and potentially unbounded AI capabilities that typically rely on deep learning and statistical prediction, and can exhibit behaviors not programmed or even anticipated by their human designers [4, 5].

Much of the focus and interest in artificial systems is in increasing their capabilities: optimizing algorithms, increasing data quality, and expanding the domains to which they are applied. Recent years have seen striking progress in AI systems, from algorithms capable of teaching themselves superhuman chess-playing ability in a matter of hours [6] to predictive maintenance of military vehicles via machine learning, which allows AI to flag failing vehicle parts before they break down in hostile territory [7]. However, our focus here is on an aspect of artificial system development that has received less attention: the question of how people interact with these emerging technologies.

This issue is crucial for at least two reasons. First, in some areas it is agreed that there must be meaningful human control in place—effectively placing a limit on autonomy—and the respective roles of human and artificial systems must be considered. Second, and more broadly, artificial system applications (especially for AI) remain narrow and specialized, such that situations involving complex, ill-defined, strategic problems are likely to depend on human input for the foreseeable future. Thus, in the coming years, effective artificial systems deployment will not only depend on the quality of those systems, but also human operators’ ability to work effectively with those artificial systems. The importance of human-machine teaming (HMT) is explicitly recognized in the emphasis within the DoD’s AI strategy on “human-centered AI,” and has been discussed in depth elsewhere in military strategy [8].

Some of the work to be done in the HMT domain will be in the vein of well-established human factors principles [9], namely, designing systems that are sensitive to both human capabilities and human limitations. On one side are people’s unparalleled ability to communicate naturally with others, solve complex and ill-defined problems, and understand their world using a combination of domain expertise and common sense. On the other are striking limitations in human operators’ cognitive capacity and ability to sustain attention, and their susceptibility to cognitive bias. Understanding the relative strengths and limitations of human vs. artificial systems will remain critical to effective HMT in the coming years. Beyond this, however, we expect artificial systems to increasingly act as team members working alongside human operators, rather than as tools used by those operators, providing specialist skills and abilities that complement those provided by people.

As such, effective HMT will depend on factors that govern effective human teams, namely trust and effective communication. Decades of research in psychology have documented the principles of trust, and there is a growing body of work applying these principles to understand what is shared and what is distinctive about human-machine trust. Here we briefly review this work and discuss future directions for this emerging field.

Human Trust

Current research regarding HMT is predicated on the assumption that human trust of an artificial system in any form is guided by factors that determine trust in normal human-human interactions [10]. Trust has been defined in a highly influential theory as “the willingness of a party to be vulnerable to the actions of another party based on the expectation that the other will perform a particular action important to the trustor, irrespective of the ability to monitor or control that other party” (emphasis added) [11]. As automation and AI increase in capability and complexity, human operators will find their abilities to monitor or possibly even control artificial systems lessened. Therefore, understanding the components of human-machine trust and factors that influence the user’s desire to trust will be vital in designing and fielding future artificial systems.

Figure 1. Trust factors, adapted from Mayer, Davis, & Schoorman [11]

To analyze the components of human-machine trust, we adapt the two-party trust model developed by Mayer et al. [11]. Their model consists of a trusting party (trustor) and a party to be trusted (trustee). In our adaptation, the trustor is our human operator and the trustee is the artificial system, whether it be automation or AI-based (see Figure 1). The three perceived trustworthiness factors of the trustee are their Ability, Benevolence, and Integrity (ABI). The trustee’s Ability to perform its given task well, as we will discuss later, will remain one of the most important factors in determining human trust. Benevolence and Integrity refer to the trustor’s belief that the trustee wants to act in the trustor’s interests and adheres to a set of principles that the trustor finds acceptable.

Meanwhile, the trustor is largely characterized by their general willingness to trust others and the relative weight they place on ability, benevolence, and integrity as key factors in winning and maintaining their trust. This propensity to trust is considered a stable characteristic of the trustor and, importantly, mediates the relationship between the trustee’s ABI and overall relationship trust. Thus, the model emphasizes that individual characteristics and personality traits have a heavy influence in the HMT relationship.

Figure 2. Integrated, three-layer trust model, adapted from Hoff & Bashir [12]

Hoff and Bashir have developed a related model that applies specifically to the case of human trust in artificial systems [12]. Their model simplifies the critical determinants of trust to two factors—the ability of the system as experienced by the user (“learned trust,” depicted in Figure 2) and the user’s propensity to trust. They unpack the user’s propensity into two components: dispositional trust and situational trust.

Dispositional trust is what the user brings to the interaction with automation, reflecting his or her culture, age, gender, personality, etc. As such, it varies widely from person to person, making it a difficult concept to isolate, study, and effectively build into future artificial systems. However, trends and correlations have begun to emerge in the past decade in ongoing research regarding characteristics affecting human-machine trust. This is especially true with the advent of specific assessment tools that measure human personality traits in relation to various types of artificial systems [13–16].

Developers and users of technologies should continue to support research that aims to uncover the trends that drive individual differences in the use of artificial systems. Further, in a military context, commanders should ensure that operators are trained to appropriately rely on their algorithmic counterparts, with trust that scales with the utility and flexibility of these systems—never blindly accepting and complying with their outputs. Future AI should have the ability to adapt to its user, much like computer systems today have user preferences. However, in the case of AI, user preferences will be determined from machine learning that works to maximize correct and confident output by its individual user. This will be driven by the aforementioned research efforts and could manifest in the form of individualized appearance, communication mode and style, etc., as well as advanced features such as information about its own reliability (confidence), as discussed below.

Inclusion of situational trust in Hoff and Bashir’s model emphasizes that people’s reliance on automation will vary according to context. External factors, such as task difficulty, workload, perceived risks and benefits, organizational setting, and framing of the task can all affect trust in artificial systems. For example, Ross conducted an experiment that utilized decision aids in a video-based search-and-rescue task [17]. In this study, human users were able to adjust their reliance on, and trust in, a single automated decision aid based on that aid’s actual ability. However, users misjudged the reliability of a single decision aid when an additional decision aid was present and produced a different outcome (i.e., the decision aids had mixed reliability levels). In Ross’ experiment, outcome variation due to external factors (i.e., having another decision aid present) resulted in a bias that significantly affected human trust.

Trust in automation also depends on variable factors of the user, such as their subject matter expertise, mood, and attentional capability. For example, De Vries, Midden, and Bouwhuis experimentally observed that human participants who were more self-confident in a mapping task used automation less throughout the study even though their workload would have been lessened [18].

Hoff and Bashir’s model provides a similarly detailed analysis of how trust reflects the perceived ability of an artificial system. Learned trust will depend partly on a user’s initial pre-conceptions: their expectations, the reputation of the system and/or brand, prior experience with a similar technology, and their understanding of the system. For example, Bliss, Dunn, and Fuller assessed the impact of “hearsay” on the reliability of an automated agent prior to interaction [19]. This information inflated the perceived accuracy of the automated aid, which subsequently increased participants’ initial reliance on the aid over that of a control group.

Once established, trust then changes dynamically over the course of system experience such that, through repeated interactions, a new reliance strategy is formed by the user. These interactions depend critically on the observed performance of the system [20], but human operators are far from perfect in evaluating system performance even when clear feedback is provided [21]. Moreover, in the absence of feedback, human users have been shown to weigh advice differently depending on whether it agrees with their own judgement, regardless of its accuracy [22].

In addition to system performance, trust also depends on usability features of the system, including its appearance, ease of use, communication style, transparency, and the operator’s level of control. Dynamic learned trust has been demonstrated experimentally. For example, Metzger & Parasuraman conducted an experiment to see how experienced air traffic controllers responded to an automated aircraft collision warning system [23]. In their study, operators were first exposed to a warning system with perfect reliability, then the automated aid became less reliable. This change in reliability had a significantly negative effect on the user’s trust of the automated aid.
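The general pattern described here, trust that builds while an aid performs reliably and then erodes once its reliability drops, can be illustrated with a toy model. The sketch below is not taken from any of the cited studies: the update rule (an exponential moving average of observed outcomes), the function names, the learning rate, and the sequence of outcomes are all assumptions chosen purely for illustration.

```python
def update_trust(trust, aid_was_correct, learning_rate=0.2):
    """Move trust a fraction of the way toward 1 after a hit, 0 after a miss."""
    target = 1.0 if aid_was_correct else 0.0
    return trust + learning_rate * (target - trust)


def trust_trajectory(outcomes, initial_trust=0.5, learning_rate=0.2):
    """Return the trust level before the first trial and after each trial."""
    trajectory = [initial_trust]
    for aid_was_correct in outcomes:
        trajectory.append(update_trust(trajectory[-1], aid_was_correct, learning_rate))
    return trajectory


# Hypothetical session: ten correct warnings, then the aid becomes unreliable.
reliable_phase = [True] * 10
unreliable_phase = [True, False] * 5
history = trust_trajectory(reliable_phase + unreliable_phase)
print(f"after reliable phase: {history[10]:.2f}, at end: {history[-1]:.2f}")
```

In this toy model the initial trust value plays the role of the user’s pre-conceptions, and the learning rate controls how quickly experience overrides them. Real operators are unlikely to follow so simple a rule.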

Together, these models of trust highlight several conclusions relevant for human-machine teaming. First, users’ reliance on artificial systems varies according to their trust in these systems: Technological developments will only provide military advantage to the extent that they are trusted by human operators. Second, trust depends not only on objective system performance, but also on the complex psychology of trust that crucially includes the users’ propensity to trust according to their personality, prior experience, expectations, and their current situation. Third, users will form an initial reliance strategy that is created even before any experience with a particular system—biasing their trust even before any system interaction takes place. Thus, identifying specific aspects of dispositional, situational, and learned trust is of great importance to understanding how best to design and deploy new artificial systems in the service of effective HMT.

Trust in Human-Machine Teaming

Our recent research explores the themes identified above in the context of an AI decision aid for visual judgment tasks that are designed to parallel operationally-relevant tasks (e.g., is there a military installation in this satellite image?; is there suspicious activity in this radar map?). For example, while system performance remains the most important factor in determining human-machine trust, we observe large individual differences in actual use and reliance for artificial systems. We conducted an experiment where participants were required to complete a task with the assistance of either a human or a computer advisor. Even though human and computer advice was equally accurate and presented in the same format, we observed large individual differences in advisor influence and choice.

Figure 3. Plot of individual participants depicting advisor influence and choice. There is a strong correlation between an individual user’s preference for one advisor over the other (y-axis values) and the relative influence of the two advisors’ advice (x-axis). However, there is substantial variability across users in whether this preference is for human or algorithmic advice (evident as wide scatter in datapoints across both axes).

Figure 3 depicts one of the significant findings from our study, showing variation across individuals (blue circles) according to their relative preference for choosing computer vs. human advice (y-axis) and being more influenced in their decision making by computer vs. human advice when provided (x-axis). People showed a strong tendency to choose the advisor they were more influenced by (datapoints generally fall on the green regression trend line), as one might expect, but there was a wide and varied difference regarding which advisor was more influential and preferred: Some participants overwhelmingly chose computer advice (e.g., with four people choosing computer advice more than 90% of the time), but others showed consistent aversion to the artificial system (with four others choosing it less than 30% of the time).

This finding brings to the forefront the importance of individual differences when it comes to propensity to trust. Singh, Molloy, & Parasuraman conducted ground-breaking research in identifying characteristics that could predict future automation-use traits [24]. They developed a psychometric measurement tool, the Complacency-Potential Rating Scale, which measures a person’s individualized automation-induced complacency. Figure 3 shows that, all things being equal, people remain diverse in their preferences and usage strategies for artificial systems. A current focus of our research is on uncovering the traits that drive these individual differences when it comes to artificial systems employment—a potentially critical factor in the effectiveness of emerging technologies. As Paul Scharre aptly stated in 2014, “the winner…will not be who develops [the] technology first or even who has the best technology, but who figures out how to best use it [25].”

A second strand of our current research focuses on effective communication in decision making involving human-machine teams. This work focuses in particular on communication about uncertainty and confidence. Complex tasks are typically characterized by uncertainty—in the information provided, its provenance and reliability, the courses of action available, etc.

When addressing these problems, it is crucial to deal with uncertainty appropriately. When it comes to group decision making, team members need to communicate and integrate their uncertainty or, conversely, their confidence. Psychology has known for over 100 years, though the significance is only beginning to be understood, that people’s confidence judgements about their own decisions convey very useful information about the reliability of those decisions [26]. People seem exquisitely sensitive to the information carried by statements of confidence. Consistent with this, evidence indicates that communication of confidence critically underpins human-human trust in decision making. Confidently expressed opinions have more influence [27], but trust is lost if this confidence proves to be unfounded [28]. More confident decision makers are less receptive to advice [29] and down-weight dissenting opinions [22].

Also, teams can make optimal decisions that outperform the best individual decision makers only if team members communicate confidence effectively [30]. Bahrami and colleagues used simple visual judgment tasks in their work to research which human-human communication model produced an optimal joint decision [30]. They showed that communication about confidence (which they defined as a person’s subjective estimate of their probability of being correct) can improve team decision making, with the group significantly outperforming the best individual within the team.
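A small simulation can show why sharing confidence helps a team. The sketch below is a simplified illustration, not the model or procedure used in [30]: it assumes two members who each see the same binary signal through independent Gaussian noise, uses the strength of each member’s evidence relative to their own noise as a stand-in for confidence, and lets the team adopt the more confident member’s choice. All names and parameter values are hypothetical.

```python
import random


def simulate_team(n_trials=20000, signal=0.5, noise_a=1.0, noise_b=1.0, seed=0):
    """Compare two noisy observers with a team that follows the more confident one."""
    rng = random.Random(seed)
    hits = {"member_a": 0, "member_b": 0, "team": 0}
    for _ in range(n_trials):
        truth = rng.choice([-1, 1])
        # Each member sees the true signal corrupted by private Gaussian noise.
        obs_a = truth * signal + rng.gauss(0.0, noise_a)
        obs_b = truth * signal + rng.gauss(0.0, noise_b)
        choice_a = 1 if obs_a > 0 else -1
        choice_b = 1 if obs_b > 0 else -1
        # Assumed confidence proxy: evidence strength scaled by the member's own noise.
        conf_a = abs(obs_a) / noise_a
        conf_b = abs(obs_b) / noise_b
        team_choice = choice_a if conf_a >= conf_b else choice_b
        hits["member_a"] += int(choice_a == truth)
        hits["member_b"] += int(choice_b == truth)
        hits["team"] += int(team_choice == truth)
    return {name: count / n_trials for name, count in hits.items()}


accuracy = simulate_team()
print(accuracy)
```

With two similarly reliable members, the team’s accuracy in this sketch exceeds that of either individual. When the members agree, nothing changes; when they disagree, confidence decides which member to follow, and the more confident member is more often the correct one.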

In human-machine teams, communication is predominantly provided only in terms of the answer or advice from the artificial system to the human user. HMT may remain critically limited if confidence cues are ignored or lacking. Specifically, the absence of artificial system confidence cues may give rise to overuse or misuse of the system through at least two distinct mechanisms: perceived overconfidence (of the artificial system) and attributed sources of uncertainty. For overconfidence, human advice is discounted when confidence is poorly calibrated (i.e., it correlates weakly with objective accuracy) [27, 28]. Therefore, if an artificial system teammate is perceived as being confident but incorrect, this could have a large negative impact on human user trust and reliance. Moreover, even though people are generally more influenced by advice that is given confidently (as opposed to unconfidently), they are not averse to uncertain advice. In particular, people show greater trust when advice acknowledges the uncertainty that is inherent in many complex decision making contexts [31, 32].
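Calibration can be made concrete with a small calculation. The sketch below uses expected calibration error, one common way of summarizing the gap between stated confidence and observed accuracy. It is not necessarily the measure used in the cited studies, and the function name, bin count, and example data are assumptions for illustration only.

```python
def expected_calibration_error(confidences, outcomes, n_bins=5):
    """Weighted mean gap between stated confidence and observed accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for confidence, was_correct in zip(confidences, outcomes):
        # Confidence is a probability in [0, 1]; a value of 1.0 goes in the top bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, was_correct))
    total = len(confidences)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(hit for _, hit in members) / len(members)
        error += (len(members) / total) * abs(mean_confidence - accuracy)
    return error


# Hypothetical advisors that both state 90% confidence on ten judgements.
stated = [0.9] * 10
well_calibrated = expected_calibration_error(stated, [1] * 9 + [0])
overconfident = expected_calibration_error(stated, [1] * 5 + [0] * 5)
print(f"well calibrated: {well_calibrated:.2f}, overconfident: {overconfident:.2f}")
```

An advisor whose stated confidence matches its hit rate scores near zero, while one that claims 90% confidence but is right only half the time scores much higher. That gap is the kind of mismatch that would lead a human teammate to discount the advice.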

Figure 4. Study participants favored advisors that communicated confidence over advisors that did not.

Both the communication of confidence itself and the manner in which it is expressed matter. In another of our experiments involving visual judgments with AI decision system support, we assessed preferences between two different computer algorithms. The advisors gave equally accurate advice, but one also communicated its confidence (e.g., a recommendation with 75% confidence) while the other did not (simply stating its answer). In Figure 4, the impact of communicated confidence is clear when it comes to advisor preference. Participants chose the advisor that gave them confidence information significantly more than the advisor that did not provide that information.

This simple study illustrates human users’ sensitivity to subtle aspects of communication in HMT. Given that uncertainty is inherent in many complex task domains, implementing confidence communication capabilities into artificial systems may drastically impact the team building capability of human-machine teams. More broadly, these findings illustrate that it is crucial to understand how people interact with each other in effective teams (and indeed in ineffective teams), because their behavior in HMT contexts will reflect these established patterns of interaction.

Conclusion

Advances in AI promise to have a transformational impact on military strategy and capability in the coming years. However, fulfilling this promise depends on an understanding of human psychology to optimize human users’ trust and reliance on AI systems. Thus, governments, militaries, and other organizations that employ these technologies would do well to ensure that appropriate user trust and reliance are considered in the original design and initial implementation of any new artificial system.

Effective HMT means going beyond simply focusing on ensuring that artificial systems have the best performance possible. System design should also consider that new technologies must incorporate principles that support effective interactions among human team members—particularly as the complexity of AI systems increases such that they come to take on more complex roles and tasks that historically would have been the preserve of human team members. In the context of strategic decision making, this means effective communication of confidence and uncertainty. Further, producers and requestors of these technologies should also lead the way in encouraging continuing research in the diverse domain of human trust.

Authors’ note: This work was partially funded by a grant from the U.S. Air Force Office of Scientific Research, European Office of Aerospace Research and Development. The views expressed are those of the authors and do not necessarily reflect the official policy or position of the Air Force, the DoD, or the U.S. Government.

References

1. Brown, H. (1981, January). Department of Defense Annual Report Fiscal Year 1982. Washington, D.C.: Department of Defense. Retrieved from https://history.defense.gov/Portals/70/Documents/annual_reports/1982_DoD_AR.pdf?ver=2014-06-24-150904-113

2. Department of Defense. (2018). Summary of the 2018 National Defense Strategy of the United States of America: Sharpening the American Military’s Competitive Edge. Retrieved from https://dod.defense.gov/Portals/1/Documents/pubs/2018-National-Defense-Strategy-Summary.pdf

3. U.S. Department of Defense. (2018). Summary of the 2018 Department of Defense Artificial Intelligence Strategy: Harnessing AI to Advance our Security and Prosperity. Retrieved from https://media.defense.gov/2019/Feb/12/2002088963/-1/-1/1/SUMMARY-OF-DOD-AI-STRATEGY.PDF

4. Krendl, P. (2017, July 25). Better with ‘bots? Five questions to ask before automating oil and gas processes. Accenture Energy Blog. Retrieved from https://www.accenture.com/us-en/blogs/blogs-betterwith-bots

5. Agrawal, A., Gans, J., & Goldfarb, A. (2018). Prediction machines: The simple economics of artificial intelligence. Boston: Harvard Business Review Press.

6. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., … & Chen, Y. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354. doi:10.1038/nature24270

7. Gregg, A. (2018, June 26). Army to use artificial intelligence to predict which vehicles will break down. Washington Post. Retrieved from https://www.washingtonpost.com/business/capitalbusiness/army-to-use-artificial-intelligence-to-predict-which-vehicles-will-break-down/2018/06/25/bfa1ef34-789f-11e8-93cc-6d3beccdd7a3_story.html

8. United Kingdom Ministry of Defence. (2018, May). Joint Concept Note (JCN) 1/18, Human-Machine Teaming. Retrieved from https://www.gov.uk/government/publications/human-machine-teaming-jcn-118

9. Wickens, C., Hollands, J., Banbury, S., & Parasuraman, R. (2016). Engineering psychology and human performance (4th ed.). New York: Routledge.

10. de Visser, E. J., Pak, R., & Shaw, T. (2018). From ‘automation’ to ‘autonomy’: The importance of trust repair in human–machine interaction. Ergonomics, 61(10), 1409–1427. doi:10.1080/00140139.2018.1457725

11. Mayer, R. C., Davis, J. H., & Schoorman, F. D. (1995, July). An Integrative Model of Organizational Trust. Academy of Management Review, 20(3), 709–734. doi:10.2307/258792

12. Hoff, K., & Bashir, M. (2014, September). Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors, 57(3), 407–434. doi:10.1177/0018720814547570

13. Jian, J. -Y., Bisantz, A., & Drury, C. (2000). Foundations for an empirically determined scale of trust in automated systems. International Journal of Cognitive Ergonomics, 4(1), 53–71. doi:10.1207/S15327566IJCE0401_04

14. Davis, F. D. (1989, September). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319–340. doi:10.2307/249008

15. Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2003). User acceptance of information technology: Toward a unified view. MIS Quarterly, 27(3), 425–478. doi:10.2307/30036540

16. Charalambous, G., Fletcher, S., & Webb, P. (2016). The development of a scale to evaluate trust in industrial human-robot collaboration. International Journal of Social Robotics, 8(2), 193–209. doi:10.1007/s12369-015-0333-8

17. Ross, J. (2008). Moderators of trust and reliance across multiple decision aids. Electronic Theses and Dissertations, 3754. Retrieved from https://stars.library.ucf.edu/etd/3754/

18. de Vries, P., Midden, C., & Bouwhuis, D. (2003). The effects of errors on system trust, self-confidence, and the allocation of control in route planning. International Journal of Human Computer Studies, 58(6), 719–735. doi:10.1016/S1071-5819(03)00039-9

19. Bliss, J., Dunn, M., & Fuller, B. (1995). Reversal of the cry-wolf effect: An investigation of two methods to increase alarm response rates. Perceptual and Motor Skills, 80(3_suppl), 1231–1242. doi:10.2466/pms.1995.80.3c.1231

20. Hancock, P., Billings, D., Schaefer, K., Chen, J., De Visser, E., & Parasuraman, R. (2011). A meta-analysis of factors affecting trust in human-robot interaction. Human Factors, 53(5), 517–527. doi:10.1177/0018720811417254

21. de Visser, E. J., Monfort, S. S., McKendrick, R., Smith, M. A., McKnight, P. E., Krueger, F., & Parasuraman, R. (2016, September). Almost human: Anthropomorphism increases trust resilience in cognitive agents. Journal of Experimental Psychology: Applied, 22(3), 331–349. doi:10.1037/xap0000092

22. Pescetelli, N., & Yeung, N. (2018, September). On the use of metacognitive evidence in feedback-free situations: Advice-taking and trust formation. arXiv preprint. Retrieved from https://arxiv.org/pdf/1809.10453

23. Metzger, U. & Parasuraman, R. (2005). Automation in future air traffic management: Effects of decision aid reliability on controller performance and mental workload. Human Factors, 47(1), 35–49. doi:10.1518/0018720053653802

24. Singh, I. L., Molloy, R., & Parasuraman, R. (1993). Automation-induced “complacency”: Development of the complacency-potential rating scale. The International Journal of Aviation Psychology, 3(2), 111–122. doi:10.1207/s15327108ijap0302_2

25. Scharre, P. (2014). Robotics on the battlefield Part II: The coming swarm. Washington, DC: Center for a New American Security. Retrieved from https://www.cnas.org/publications/reports/roboticson-the-battlefield-part-ii-the-comingswarm

26. Henmon, V. (1911). The relation of the time of a judgment to its accuracy. Psychological Review, 18(3), 186–201. doi:10.1037/h0074579

27. Yaniv, I. (2004). Receiving other people’s advice: Influence and benefit. Organizational Behavior and Human Decision Processes, 93(1), 1–13. doi:10.1016/j.obhdp.2003.08.002

28. Tenney, E., Maccoun, R., Spellman, B., & Hastie, R. (2007). Calibration Trumps Confidence as a Basis for Witness Credibility. Psychological Science, 18(1), 46–50. doi:10.1111/j.1467-9280.2007.01847.x

29. See, K., Morrison, E., Rothman, N., & Soll, J. (2011). The detrimental effects of power on confidence, advice taking, and accuracy. Organizational Behavior and Human Decision Processes, 116(2), 272–285. doi:10.1016/j.obhdp.2011.07.006

30. Bahrami, B., Olsen, K., Latham, P., Roepstorff, A., Rees, G., & Frith, C. (2010). Optimally Interacting Minds. Science, 329(5995), 1081–1085. doi:10.1126/science.1185718

31. Ülkümen, G., Fox, C., & Malle, B. (2016). Two Dimensions of Subjective Uncertainty: Clues From Natural Language. Journal of Experimental Psychology: General, 145(10), 1280–1297. doi:10.1037/xge0000202

32. Gaertig, C., & Simmons, J. P. (2018). Do people inherently dislike uncertain advice? Psychological Science, 29(4), 504–520. doi:10.1177/0956797617739369
