
3D streaming gets leaner by seeing only what matters

(a) The viewing pyramid defines potentially visible content between near and far planes; only highlighted points are actually visible from the viewpoint. (b) The resulting view as seen by the viewer. (c) By dividing the point cloud into 3D cells, only those containing visible points need transmission, optimizing data usage. Credit: Proceedings of the 16th ACM Multimedia Systems Conference (2025). DOI: 10.1145/3712676.3714435

A new approach to streaming technology may significantly improve how users experience virtual reality and augmented reality environments, according to a study from NYU Tandon School of Engineering.

The research—presented in a paper at the 16th ACM Multimedia Systems Conference (ACM MMSys 2025) on April 1, 2025—describes a method for directly predicting visible content in immersive 3D environments, potentially reducing bandwidth requirements by up to 7-fold while maintaining visual quality.

The technology is being applied in an ongoing NYU Tandon project to bring point cloud video to dance education, making 3D dance instruction streamable on standard devices with lower bandwidth requirements.

“The fundamental challenge with streaming immersive content has always been the massive amount of data required,” explained Yong Liu—professor in the Electrical and Computer Engineering Department (ECE) at NYU Tandon and faculty member at both NYU Tandon’s Center for Advanced Technology in Telecommunications (CATT) and NYU WIRELESS—who led the research team.

“Traditional video streaming sends everything within a frame. This new approach is more like having your eyes follow you around a room—it only processes what you’re actually looking at.”

The technology addresses the field-of-view (FoV) challenge for immersive experiences: at any moment a viewer sees only part of a 3D scene, yet streaming the whole scene demands high bandwidth. A point cloud video (which renders 3D scenes as collections of data points in space) consisting of 1 million points per frame requires more than 120 megabits per second, nearly 10 times the bandwidth of standard high-definition video.
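A rough calculation shows where that figure comes from. The sketch below reproduces the article's numbers under assumed values for the per-point bit budget and frame rate; those two constants are illustrative choices, not figures from the paper.

```python
# Back-of-envelope bandwidth estimate for point cloud video.
# The per-point bit budget and frame rate are assumptions chosen to
# illustrate the scale reported in the article, not values from the paper.

points_per_frame = 1_000_000   # points in one frame (from the article)
bits_per_point = 4             # assumed average compressed size per point
frames_per_second = 30         # assumed playback rate

bitrate_bps = points_per_frame * bits_per_point * frames_per_second
print(f"Full-scene bitrate: {bitrate_bps / 1e6:.0f} Mbps")             # ~120 Mbps

# The article's "up to 7-fold" reduction from sending only visible content
# would bring that down to roughly:
print(f"Visibility-culled bitrate: {bitrate_bps / 7 / 1e6:.0f} Mbps")  # ~17 Mbps
```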

Unlike traditional approaches that first predict where a user will look and then calculate what’s visible, this new method directly predicts content visibility in the 3D scene. By avoiding this two-step process, the approach reduces error accumulation and improves prediction accuracy.
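For contrast, the conventional two-step pipeline can be sketched in a few lines: first predict a future viewpoint, then geometrically test which 3D cells fall inside the viewing frustum, as in the figure caption above. The snippet below is a simplified, hypothetical illustration of that baseline; the cone-shaped frustum test, grid size, and function name are assumptions made for clarity, not the paper's method.

```python
import numpy as np

def visible_cells(cell_centers, eye, view_dir, near=0.1, far=10.0, half_angle_deg=45.0):
    """Return a boolean mask of cells whose centers lie inside a simple view cone."""
    view_dir = view_dir / np.linalg.norm(view_dir)
    offsets = cell_centers - eye                      # vectors from the eye to each cell
    depth = offsets @ view_dir                        # distance along the view direction
    within_planes = (depth > near) & (depth < far)    # between near and far planes
    cos_angle = depth / (np.linalg.norm(offsets, axis=1) + 1e-9)
    within_cone = cos_angle > np.cos(np.radians(half_angle_deg))
    return within_planes & within_cone

# Example: a 5x5x5 grid of unit cells; only cells the viewer can see need transmission.
grid = np.stack(np.meshgrid(*[np.arange(5)] * 3, indexing="ij"), axis=-1).reshape(-1, 3).astype(float)
mask = visible_cells(grid, eye=np.array([-2.0, 2.0, 2.0]), view_dir=np.array([1.0, 0.0, 0.0]))
print(f"Transmit {mask.sum()} of {len(grid)} cells")
```

In this baseline, any error in the predicted viewpoint propagates directly into the visibility mask; that is the error accumulation the new method avoids by predicting per-cell visibility in a single step.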

The system divides 3D space into “cells” and treats each cell as a node in a graph network. It uses transformer-based graph neural networks to capture spatial relationships between neighboring cells, and recurrent neural networks to analyze how visibility patterns evolve over time.
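The sketch below, written in PyTorch, illustrates that general structure: per-frame attention across cells stands in for the transformer-based graph layer, and a GRU over past frames models how visibility evolves, ending in a per-cell visibility probability. The layer choices, dimensions, and class name are assumptions made for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class CellVisibilityPredictor(nn.Module):
    """Illustrative model: each 3D cell is a node; attention mixes spatial
    information across cells, and a GRU tracks visibility over time."""

    def __init__(self, feat_dim=8, hidden_dim=64, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hidden_dim)
        # Self-attention across cells stands in for the transformer-based graph layer.
        self.spatial = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # A GRU over the frame sequence models temporal visibility dynamics.
        self.temporal = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # per-cell visibility score

    def forward(self, cell_feats):
        # cell_feats: (batch, time, num_cells, feat_dim), e.g. past occupancy
        # and visibility statistics for each cell.
        b, t, n, f = cell_feats.shape
        x = self.embed(cell_feats.reshape(b * t, n, f))
        x, _ = self.spatial(x, x, x)                      # spatial mixing within each frame
        x = x.reshape(b, t, n, -1).permute(0, 2, 1, 3)    # (batch, cells, time, hidden)
        x, _ = self.temporal(x.reshape(b * n, t, -1))     # temporal mixing per cell
        logits = self.head(x[:, -1])                      # use the final time step
        return torch.sigmoid(logits).reshape(b, n)        # visibility probability per cell

# Example: 2 clips, 10 past frames, 125 cells, 8 features per cell.
model = CellVisibilityPredictor()
visibility = model(torch.randn(2, 10, 125, 8))
print(visibility.shape)  # torch.Size([2, 125])
```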

For pre-recorded virtual reality experiences, the system can predict what will be visible for a user 2–5 seconds ahead, a significant improvement over previous systems that could only accurately predict a user’s FoV a fraction of a second ahead.

“What makes this work particularly interesting is the time horizon,” said Liu. “Previous systems could only accurately predict what a user would see a fraction of a second ahead. This team has extended that.”

The research team’s approach reduces prediction errors by up to 50% compared to existing methods for long-term predictions, while maintaining real-time performance of more than 30 frames per second even for point cloud videos with over 1 million points.

For consumers, this could mean more responsive AR/VR experiences with reduced data usage, while developers can create more complex environments without requiring ultra-fast internet connections.

“We’re seeing a transition where AR/VR is moving from specialized applications to consumer entertainment and everyday productivity tools,” Liu said. “Bandwidth has been a constraint. This research helps address that limitation.”

The researchers have released their code to support continued development.

More information:
Chen Li et al, Spatial Visibility and Temporal Dynamics: Rethinking Field of View Prediction in Adaptive Point Cloud Video Streaming, Proceedings of the 16th ACM Multimedia Systems Conference (2025). DOI: 10.1145/3712676.3714435

Provided by
NYU Tandon School of Engineering


Citation:
3D streaming gets leaner by seeing only what matters (2025, April 9), retrieved 9 April 2025.


