Novel framework can create egocentric human demonstrations for imitation learning


One of the most promising approaches to teaching robots how to complete manual tasks such as cleaning dishes or preparing food is known as imitation learning. End-to-end imitation learning typically entails training a deep learning algorithm on raw videos, images and/or motion capture data of humans completing manual tasks.

During this training, the algorithm gradually learns to produce output actions (i.e., robot joint movements, trajectories, etc.) that would allow a robot to successfully complete the same tasks.
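In its simplest form, this amounts to behavioral cloning: a supervised regression from observations to the actions demonstrated by humans. The sketch below illustrates that idea in PyTorch; the network, dataset and action dimensions are hypothetical placeholders for illustration, not the architecture used in the work discussed here.

```python
# Minimal behavioral-cloning sketch (illustrative only).
# PolicyNet and the (image, action) dataset are hypothetical placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset


class PolicyNet(nn.Module):
    """Maps an RGB observation to a vector of robot joint actions."""

    def __init__(self, action_dim: int = 14):
        super().__init__()
        self.encoder = nn.Sequential(          # small conv encoder for RGB frames
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(action_dim)  # infers its input size on first call

    def forward(self, obs):
        return self.head(self.encoder(obs))


def train(policy: PolicyNet, demos: Dataset, epochs: int = 10):
    """Supervised regression of demonstrated actions from observations."""
    loader = DataLoader(demos, batch_size=64, shuffle=True)
    opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
    for _ in range(epochs):
        for obs, action in loader:             # (image, expert action) pairs
            loss = nn.functional.mse_loss(policy(obs), action)
            opt.zero_grad()
            loss.backward()
            opt.step()
```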

While imitation learning techniques can enhance the ability of robots to complete complex object manipulation tasks, they often do not allow robots to generalize across tasks that are not included in the training dataset. Moreover, collecting training demonstrations for a wide range of tasks can be challenging and requires advanced sensors or equipment.

Researchers at the Georgia Institute of Technology have recently introduced EgoMimic, a new framework that could be used to easily collect more varied demonstration data for imitation learning. This framework, introduced in a paper posted to the arXiv preprint server, offers a scalable platform for gathering video demonstrations of humans completing manual tasks, from the point of view of the person completing the task (i.e., egocentric).

“We present EgoMimic, a full-stack framework which scales manipulation via human embodiment data, specifically egocentric human videos paired with 3D hand tracking,” Simar Kareer, Dhruv Patel and their colleagues wrote in their paper.

“EgoMimic achieves this through: (1) a system to capture human embodiment data using the ergonomic Project Aria glasses, (2) a low-cost bimanual manipulator that minimizes the kinematic gap to human data, (3) cross-domain data alignment techniques, and (4) an imitation learning architecture that co-trains on human and robot data.”

The first component of the EgoMimic framework, the system to capture demonstration videos, relies on Project Aria, wearable smart glasses created at Meta Reality Labs Research. Humans wear these glasses while completing everyday manual tasks, recording each task from their own viewpoint.

The bimanual robotic system the researchers used to tackle the same tasks consists of two ViperX robotic arms fitted with Intel RealSense wrist cameras and teleoperated via two WidowX robotic arms. Notably, this bimanual robot also “wears” Aria glasses when completing a task, which minimizes the difference between the footage of human demonstrators and the robot’s view of the workspace.

“Compared to prior works that only extract high-level intent from human videos, our approach treats human and robot data equally as embodied demonstration data and learns a unified policy from both data sources,” wrote Kareer, Patel and their colleagues.
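One common way to learn a unified policy from two data sources is co-training, in which gradient updates alternate between human and robot batches that pass through a shared backbone. The sketch below illustrates that general pattern; the shared trunk, per-embodiment heads and dimensions are assumptions made for illustration, not the exact EgoMimic architecture.

```python
# Hypothetical co-training sketch over human and robot demonstrations.
# The trunk/head split and dimensions are illustrative assumptions.
from itertools import cycle

import torch
import torch.nn as nn
from torch.utils.data import DataLoader


class UnifiedPolicy(nn.Module):
    def __init__(self, human_dim: int = 6, robot_dim: int = 14):
        super().__init__()
        self.trunk = nn.Sequential(                 # shared visual encoder for both domains
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.human_head = nn.LazyLinear(human_dim)  # e.g., predicts hand/wrist trajectories
        self.robot_head = nn.LazyLinear(robot_dim)  # e.g., predicts robot joint actions

    def forward(self, obs, embodiment: str):
        feats = self.trunk(obs)
        return self.human_head(feats) if embodiment == "human" else self.robot_head(feats)


def co_train(policy: UnifiedPolicy, human_set, robot_set, steps: int = 1000):
    """Alternate gradient updates on human and robot batches."""
    human_iter = cycle(DataLoader(human_set, batch_size=64, shuffle=True))
    robot_iter = cycle(DataLoader(robot_set, batch_size=64, shuffle=True))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
    for _ in range(steps):
        for embodiment, it in (("human", human_iter), ("robot", robot_iter)):
            obs, target = next(it)                  # (observation, demonstrated action) pair
            loss = nn.functional.mse_loss(policy(obs, embodiment), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```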

The researchers tested their proposed framework by running a series of experiments in their lab, where their robot learned to complete long-horizon real-world tasks. For instance, the robot learned to pick up a small plush toy, place it in a bowl, pick up the bowl and dump the toy on the table, and then repeat this sequence of movements for 40 seconds.

Other tasks it was trained on included folding T-shirts in a particular fashion and filling a grocery bag with bags of chips. The results of these initial experiments were highly promising: the EgoMimic framework yielded better performance on these three tasks than other state-of-the-art imitation learning techniques, while also allowing the robot to effectively apply the skills it learned to tasks it had not encountered during training.

“EgoMimic achieves significant improvement on a diverse set of long-horizon, single-arm and bimanual manipulation tasks over state-of-the-art imitation learning methods and enables generalization to entirely new scenes,” wrote Kareer, Patel and their colleagues. “Finally, we show a favorable scaling trend for EgoMimic, where adding 1 hour of additional hand data is significantly more valuable than 1 hour of additional robot data.”

The code for the data processing and training models used by the researchers is available on GitHub. In the future, EgoMimic or adaptations of it could be employed by other roboticists to improve the performance and generalizability of robotic systems across a range of everyday object manipulation tasks.

More information:
Simar Kareer et al, EgoMimic: Scaling Imitation Learning via Egocentric Video, arXiv (2024). DOI: 10.48550/arxiv.2410.24221


