
Capture, process, and export multimodal data on hardware you already own.
Ten million frames of in-the-wild egocentric activity. 200 hours across 354 sessions, with a longest continuous capture of 108 minutes. Every frame is annotated with depth, 6-DoF pose, MANO hands, and an atomic-to-session-scale instruction tree. Available on Hugging Face.
Start your own portable data lab with an iPhone Pro and the Stera App. ARKit fuses RGB, depth, IMU, and 6-DoF tracking entirely on-device, letting you capture multimodal data anywhere. Just mount, record, and go.
pip install stera-sdk
One pipeline turns every session into multiple modalities: RGB-D, 6-DoF poses, 21 MANO articulations per hand, IMU, upper-body coordinates, a 3D mesh for real-to-sim, and hierarchical textual instruction trees, all with no human in the loop.
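To make the "atomic-to-session-scale instruction tree" idea concrete, here is a minimal sketch of how such a hierarchy could be represented and queried per frame. This is not the stera-sdk API: the `InstructionNode` class, its fields, the `path_to` helper, and the example task are all hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InstructionNode:
    """Hypothetical node in a hierarchical instruction tree (not the SDK's type)."""
    text: str                 # natural-language instruction at this scale
    start_frame: int          # first frame covered (inclusive)
    end_frame: int            # last frame covered (inclusive)
    children: List["InstructionNode"] = field(default_factory=list)

    def path_to(self, frame: int) -> List[str]:
        """Return instructions covering `frame`, from session scale down to atomic."""
        if not (self.start_frame <= frame <= self.end_frame):
            return []
        for child in self.children:
            sub_path = child.path_to(frame)
            if sub_path:
                return [self.text] + sub_path
        # No child covers this frame, so this node is the finest available scale.
        return [self.text]


# Made-up example session: one session-level goal, two steps, two atomic actions.
session = InstructionNode("make coffee", 0, 299, [
    InstructionNode("fill the kettle", 0, 99, [
        InstructionNode("pick up kettle", 0, 39),
        InstructionNode("turn on tap", 40, 99),
    ]),
    InstructionNode("pour water", 100, 299),
])

print(session.path_to(50))
# ['make coffee', 'fill the kettle', 'turn on tap']
```

Storing frame ranges on every node means a single frame index resolves to an instruction at each scale, which matches the per-frame annotation described above; the actual export format may differ.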