HARP: Personalized Hand Reconstruction from a Monocular RGB Video

Department of Computer Science,
ETH Zurich
Teaser image.

HARP turns a short RGB video into a realistic hand avatar.

Abstract

We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar with high-fidelity appearance and geometry.

In contrast to the major trend of neural implicit representations, HARP models a hand with a mesh-based parametric hand model, a vertex displacement map, a normal map, and an albedo without any neural components.

As validated by our experiments, the explicit nature of our representation enables a truly scalable, robust, and efficient approach to hand avatar creation. HARP is optimized via gradient descent from a short sequence captured by a hand-held mobile phone and can be directly used in AR/VR applications with real-time rendering capability.

To enable this, we carefully design and implement a shadow-aware differentiable rendering scheme that is robust to the high-degree articulations and self-shadowing regularly present in hand motion sequences, as well as to challenging lighting conditions. HARP also generalizes to unseen poses and novel viewpoints, producing photo-realistic renderings of hand animations performing highly articulated motions. Furthermore, the learned HARP representation can be used to improve 3D hand pose estimation quality in challenging viewpoints. The key advantages of HARP are validated by in-depth analyses of appearance reconstruction, novel-view and novel-pose synthesis, and 3D hand pose refinement. HARP is an AR/VR-ready personalized hand representation that shows superior fidelity and scalability.

Result Video

Overview

Given a short monocular RGB video of a hand, our hand avatar creation method consists of two steps: (1) coarse hand pose and shape estimation for initialization; (2) an optimization framework to reconstruct the personalized hand geometry and appearance. The hand geometry is first rasterized and combined with a normal map. Then, the shader combines the albedo, geometry, and lighting to render the personalized hand. The optimization solves for the hand and scene parameters using only the input images.
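To make the shading step concrete, here is a minimal per-pixel sketch of how an albedo, a surface normal, and a light could be combined. It assumes a simple Lambertian model with one directional light, an ambient term, and a scalar shadow visibility factor; these choices and all names (`shade_pixel`, `ambient`, `visibility`) are illustrative assumptions and are not taken from the HARP implementation.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def shade_pixel(albedo, normal, light_dir,
                light_intensity=1.0, ambient=0.1, visibility=1.0):
    """Shade one pixel with an assumed Lambertian model.

    albedo:     RGB reflectance in [0, 1], e.g. sampled from an albedo map
    normal:     surface normal, e.g. sampled from a normal map
    light_dir:  direction from the surface towards the light
    visibility: 1.0 if the light reaches the point, 0.0 if it is shadowed
    """
    n = normalize(normal)
    l = normalize(light_dir)
    # Diffuse term: cosine of the angle between normal and light, clamped at 0.
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    irradiance = ambient + visibility * light_intensity * n_dot_l
    # Clamp to the displayable range.
    return tuple(min(1.0, a * irradiance) for a in albedo)


lit = shade_pixel((0.8, 0.6, 0.5), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
shadowed = shade_pixel((0.8, 0.6, 0.5), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0),
                       visibility=0.0)
```

In this sketch a shadowed pixel keeps only the ambient contribution, which is one simple way a renderer can account for self-shadowing instead of baking shadows into the albedo.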

Method overview.
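The idea of recovering appearance parameters by gradient descent on an image-based loss can be illustrated with a toy example. The sketch below fits a single scalar albedo so that albedo times a known shading value matches the observed pixel intensities. It is a deliberately reduced, hypothetical setup (one parameter, known shading, squared-error loss); the actual method optimizes hand and scene parameters jointly through a differentiable renderer.

```python
def fit_albedo(shading, observed, lr=0.5, steps=200, init=0.5):
    """Fit a scalar albedo by gradient descent on a mean squared error.

    shading:  per-pixel shading values, assumed known in this toy setup
    observed: per-pixel intensities from the input image
    """
    albedo = init
    n = len(shading)
    for _ in range(steps):
        # d/d(albedo) of mean((albedo * s - o)^2) = mean(2 * (albedo * s - o) * s)
        grad = sum(2.0 * (albedo * s - o) * s
                   for s, o in zip(shading, observed)) / n
        albedo -= lr * grad
    return albedo


# Synthetic observations generated from a known albedo of 0.7.
shading = [1.0, 0.5, 0.2]
observed = [0.7 * s for s in shading]
estimated = fit_albedo(shading, observed)
```

Because the observations here are noise-free and the loss is quadratic in the single unknown, the estimate converges to the albedo used to generate the data; real captures would add noise, unknown lighting, and many more parameters.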

Hand Avatar

Robustness

Given a short video, HARP can create an avatar across different capture scenarios.

Results on different datasets.

Out-of-distribution appearance

HARP works without modification on out-of-distribution appearances that parametric models cannot capture.

Out-of-distribution result.

BibTeX

@article{karunratanakul2022harp,
  author    = {Karunratanakul, Korrawe and Prokudin, Sergey and Hilliges, Otmar and Tang, Siyu},
  title     = {HARP: Personalized Hand Reconstruction from a Monocular RGB Video},
  journal   = {arXiv preprint arXiv:2212.09530},
  year      = {2022},
}