Ego-Pose Estimation and Forecasting as Real-Time PD Control


We propose using a proportional-derivative (PD) control-based policy, learned via reinforcement learning (RL), to estimate and forecast 3D human pose from egocentric videos. The method learns directly from unsegmented egocentric videos and motion capture data covering a variety of complex human motions (e.g., crouching, hopping, bending, and motion transitions). We propose a video-conditioned recurrent control technique to forecast physically valid and stable future motions of arbitrary length. We also introduce a value-function-based fail-safe mechanism that enables our method to run as a single-pass algorithm over the video data. Experiments with both controlled and in-the-wild data show that our approach outperforms prior work in both quantitative metrics and the visual quality of the motions, and that it is robust enough to transfer directly to real-world scenarios. Additionally, our timing analysis shows that pose estimation and forecasting combined can run at 30 FPS, making the method suitable for real-time applications.
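To make the PD control idea concrete, the following is a minimal sketch of the standard PD control law applied to a single joint. It is not the paper's implementation: it assumes, as is common in physics-based character control, that a policy supplies a target joint angle and that the PD controller converts it into a torque. The function names, the gains `kp` and `kd`, the time step, and the unit-inertia joint are all illustrative assumptions.

```python
def pd_torque(q, qdot, q_target, kp, kd):
    """Standard PD law: pull toward the target angle and damp the joint velocity."""
    return kp * (q_target - q) - kd * qdot


def simulate_joint(q_target, kp=100.0, kd=20.0, dt=1.0 / 240.0, steps=480):
    """Drive one unit-inertia joint toward q_target (hypothetical toy dynamics)."""
    q, qdot = 0.0, 0.0  # start at rest at angle zero
    for _ in range(steps):
        tau = pd_torque(q, qdot, q_target, kp, kd)
        qdot += tau * dt  # unit inertia, so acceleration equals torque
        q += qdot * dt    # semi-implicit Euler: use the updated velocity
    return q


# In this sketch the target is a constant; in a learned controller it would
# come from the policy at every control step.
final_angle = simulate_joint(q_target=0.5)
```

With these assumed gains the joint is critically damped, so the angle settles at the target without oscillating. Because the motion comes from torques applied in a simulator rather than from directly regressed poses, it respects the simulated dynamics by construction.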



Ye Yuan and Kris Kitani. Ego-Pose Estimation and Forecasting as Real-Time PD Control. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
