
Research shows humans are still better than AI at reading the room

Board room. Credit: Unsplash/CC0 Public Domain

Humans, it turns out, are better than current AI models at describing and interpreting social interactions in a moving scene—a skill necessary for self-driving cars, assistive robots, and other technologies that rely on AI systems to navigate the real world.

The research, led by scientists at Johns Hopkins University, finds that artificial intelligence systems fail to grasp the social dynamics and context necessary for interacting with people, and suggests the problem may be rooted in the infrastructure of AI systems.

“AI for a self-driving car, for example, would need to recognize the intentions, goals, and actions of human drivers and pedestrians. You would want it to know which way a pedestrian is about to start walking, or whether two people are in conversation versus about to cross the street,” said lead author Leyla Isik, an assistant professor of cognitive science at Johns Hopkins University.

“Any time you want an AI to interact with humans, you want it to be able to recognize what people are doing. I think this sheds light on the fact that these systems can’t right now.”

Kathy Garcia, a doctoral student working in Isik’s lab at the time of the research and co–first author, presented the findings at the International Conference on Learning Representations on April 24. The study is also available as a preprint on PsyArXiv.

To determine how AI models measure up to human perception, the researchers asked human participants to watch three-second video clips and rate, on a scale of one to five, features important for understanding social interactions. The clips showed people interacting with one another, performing activities side by side, or acting independently.

The researchers then asked more than 350 AI language, video, and image models to predict how humans would judge the videos and how their brains would respond to watching. For large language models, the researchers had the AIs evaluate short, human-written captions.
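The paper's exact evaluation pipeline isn't spelled out here, but the basic comparison can be illustrated with a minimal sketch: correlate a model's per-clip predictions with averaged human ratings, and compare that against how well humans agree with one another. Everything below (the array shapes, the use of Spearman correlation, the leave-one-out agreement ceiling) is an illustrative assumption, not the authors' method, and the arrays are random placeholders rather than study data.

```python
# Minimal sketch (not the authors' code): compare a model's per-clip ratings
# against average human ratings, with human-human agreement as a reference ceiling.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

n_clips = 20
# Hypothetical human ratings: 10 raters x 20 clips, on a 1-5 scale.
human_ratings = rng.integers(1, 6, size=(10, n_clips)).astype(float)
# Hypothetical model predictions for the same clips (continuous scores).
model_preds = rng.uniform(1, 5, size=n_clips)

# Model-human agreement: correlate predictions with the mean human rating per clip.
mean_human = human_ratings.mean(axis=0)
model_rho, _ = spearmanr(model_preds, mean_human)

# Human-human agreement ceiling: correlate each rater with the mean of the others.
ceilings = []
for i in range(human_ratings.shape[0]):
    others = np.delete(human_ratings, i, axis=0).mean(axis=0)
    rho, _ = spearmanr(human_ratings[i], others)
    ceilings.append(rho)

print(f"model-human rho: {model_rho:.2f}")
print(f"human-human ceiling (mean leave-one-out rho): {np.mean(ceilings):.2f}")
```

A model whose correlation with the averaged human ratings falls well below the human-human ceiling would, under this toy setup, reflect the kind of gap the study reports.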

Participants, for the most part, agreed with each other on all the questions; the AI models, regardless of size or the data they were trained on, did not. Video models were unable to accurately describe what people were doing in the videos.

Even image models that were given a series of still frames to analyze could not reliably predict whether people were communicating. Language models were better at predicting human behavior, while video models were better at predicting neural activity in the brain.

The results provide a sharp contrast to AI’s success in reading still images, the researchers said.

“It’s not enough to just see an image and recognize objects and faces. That was the first step, which took us a long way in AI. But real life isn’t static. We need AI to understand the story that is unfolding in a scene. Understanding the relationships, context, and dynamics of social interactions is the next step, and this research suggests there might be a blind spot in AI model development,” Garcia said.

Researchers believe this is because AI neural networks were inspired by the infrastructure of the part of the brain that processes static images, which is different from the area of the brain that processes dynamic social scenes.

“There’s a lot of nuances, but the big takeaway is none of the AI models can match human brain and behavior responses to scenes across the board, like they do for static scenes,” Isik said. “I think there’s something fundamental about the way humans are processing scenes that these models are missing.”

More information:
Kathy Garcia et al., Modeling dynamic social vision highlights gaps between deep learning and humans, International Conference on Learning Representations (2025).

Kathy Garcia et al., Modeling dynamic social vision highlights gaps between deep learning and humans, PsyArXiv (2024). DOI: 10.31234/osf.io/4mpd9

Provided by
Johns Hopkins University



