Friday, April 5, 2024 12pm to 2pm
About this Event
Title: Modeling Theory of Mind in Multimodal Dialogue
Presenter: Dr. James Pustejovsky, TJX Feldberg Professor, Department of Computer Science, Volen National Center for Complex Systems, Brandeis University
Abstract: Theory of Mind (ToM) refers to the human cognitive capacity to attribute mental states such as beliefs (true or false), desires, and intentions to oneself and others, and thereby to predict and explain behavior. Within Human-Computer Interaction (HCI), this concept has recently become more relevant for computational agents, especially in the context of multimodal communication. Because multimodal interactions involve not only speech but also gesture, haptics, eye movement, and other types of input, each modality introduces subtleties that can be misinterpreted without a deeper understanding of the agent's mental state. In this talk, I argue that Simulation Theory of Mind (SToM), encoded as an evidence-based dynamic epistemic logic (EB-DEL), can help model these complexities. Specifically, I apply this model to the problem of Common Ground Tracking (CGT) in task-oriented interactions.
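To make the dynamic-epistemic machinery concrete, here is a minimal Python sketch of a belief update over a toy Kripke model. All names here (EpistemicModel, believes, announce) and the two-world example are hypothetical scaffolding: the sketch implements only a plain public-announcement-style update, whereas the EB-DEL framework discussed in the talk additionally tracks the evidence supporting each belief.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

World = str
Agent = str

@dataclass
class EpistemicModel:
    worlds: Set[World]
    valuation: Dict[World, Set[str]]               # atoms true at each world
    access: Dict[Agent, Set[Tuple[World, World]]]  # per-agent accessibility relation

    def believes(self, agent: Agent, world: World, atom: str) -> bool:
        """B_a(atom) at `world`: atom holds in every world the agent considers possible."""
        successors = {v for (w, v) in self.access[agent] if w == world}
        return all(atom in self.valuation[v] for v in successors)

    def announce(self, atom: str) -> "EpistemicModel":
        """Public-announcement update: discard all worlds where `atom` fails."""
        kept = {w for w in self.worlds if atom in self.valuation[w]}
        return EpistemicModel(
            worlds=kept,
            valuation={w: self.valuation[w] for w in kept},
            access={a: {(w, v) for (w, v) in rel if w in kept and v in kept}
                    for a, rel in self.access.items()},
        )

# Usage: agent "b" initially considers a red-block world and a blue-block
# world possible; after "red" is announced, "b" comes to believe "red".
m = EpistemicModel(
    worlds={"w1", "w2"},
    valuation={"w1": {"red"}, "w2": {"blue"}},
    access={"b": {("w1", "w1"), ("w1", "w2"), ("w2", "w1"), ("w2", "w2")}},
)
assert not m.believes("b", "w1", "red")
assert m.announce("red").believes("b", "w1", "red")
```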
Unlike dialogue state tracking (DST), which updates a representation of the speaker's needs at each turn by taking the dialogue history into account, common ground tracking (CGT) identifies the belief space shared by all participants in a task-oriented dialogue. Within the SToM framework, I present a method for automatically identifying the current set of shared beliefs and questions under discussion (QUDs) of a group with a shared goal. We annotate a dataset of multimodal interactions in a shared physical space with speech transcriptions, prosodic features, gestures, actions, and facets of collaboration, and operationalize these features for use in a deep neural model that predicts moves toward the construction of common ground. Model outputs cascade into a set of formal closure rules, derived from situated evidence and belief axioms, together with update operations. We empirically assess the contribution of each feature type to the successful construction of common ground relative to ground truth.
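As a rough illustration of how classifier outputs might cascade into closure rules, the following Python sketch maintains three hypothetical "banks" (QUDs, evidenced-but-unaccepted propositions, and shared beliefs) and updates them from predicted dialogue moves. The move labels (STATEMENT, ACCEPT, DOUBT), bank names, and rules below are illustrative stand-ins suggested by the abstract, not the actual rule set or move inventory used in the work.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class CommonGround:
    qbank: Set[str] = field(default_factory=set)  # questions under discussion (QUDs)
    ebank: Set[str] = field(default_factory=set)  # propositions backed by evidence
    fbank: Set[str] = field(default_factory=set)  # shared beliefs (common ground)

def apply_closure(cg: CommonGround, move: str, prop: str) -> None:
    """Cascade one predicted dialogue move into a common ground update."""
    question = f"is it the case that {prop}?"
    if move == "STATEMENT":
        # A claim is put on the table: evidence offered, not yet jointly accepted.
        cg.ebank.add(prop)
        cg.qbank.add(question)
    elif move == "ACCEPT" and prop in cg.ebank:
        # Group uptake: promote the proposition to a shared belief
        # and retire the question it resolves.
        cg.ebank.discard(prop)
        cg.fbank.add(prop)
        cg.qbank.discard(question)
    elif move == "DOUBT":
        # Challenge: demote the proposition out of the shared belief space.
        cg.fbank.discard(prop)
        cg.ebank.add(prop)

# Usage: per utterance, a classifier emits a (move, proposition) pair,
# and each pair cascades through the closure rules above.
cg = CommonGround()
for move, prop in [("STATEMENT", "the red block weighs 10g"),
                   ("ACCEPT", "the red block weighs 10g")]:
    apply_closure(cg, move, prop)
print(cg.fbank)  # {'the red block weighs 10g'}
```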
Bio: James Pustejovsky is the TJX Feldberg Endowed Chair in Computer Science at Brandeis University, where he is also Chair of the Linguistics Program, Chair of the Computational Linguistics M.S. Program, and Director of the Lab for Linguistics and Computation. He conducts research in computational linguistics, lexical semantics, multimodal interaction and reasoning, situated grounding, and the development of standards and annotated datasets for machine learning. Currently, as part of the NSF-funded AI Institute for Student-AI Teaming (iSAT), he and his lab are studying multimodal communication and nonverbal behavior in task-oriented workgroup and classroom interactions. The research question being addressed is how AI can help foster innovation, equity, and creativity in classroom settings, increasing students' sense of inclusion and participation. To accomplish this, student interactions in the classroom must be studied and then computationally modeled, so that AI models can understand what is being communicated in class. As a practical matter, this involves identifying speech, gaze, speaker orientation, gesture, and actions, all of which are annotated in order to build models for multi-agent multimodal behavior identification and tracking.