DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars

ICCV 2023

Abstract

We present DINAR, an approach for creating realistic rigged full-body avatars from a single RGB image.

Similar to previous works, our method combines neural textures with the SMPL-X body model to achieve photo-realistic avatar quality while keeping the avatars easy to animate and fast to render. To restore the texture, we use a latent diffusion model and show how such a model can be trained in the neural texture space. The diffusion model allows us to realistically reconstruct large unseen regions, such as the back of a person, given only the frontal view. The models in our pipeline are trained using 2D images and videos only.
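The core idea of inpainting a partially observed neural texture with a diffusion model can be illustrated with a toy sketch. Everything below is a stand-in, not the paper's actual pipeline: the texture size, the `denoise_step` function, and the visibility mask are all hypothetical, and the RePaint-style "pin the known texels at every reverse step" loop is one common way to condition a diffusion model on observed regions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: a 16x16 neural texture with 8 feature channels.
H, W, C = 16, 16, 8
observed = rng.normal(size=(H, W, C))      # texture texels recovered from the input view
visible = np.zeros((H, W), dtype=bool)
visible[:, : W // 2] = True                # pretend only the front half was observed

def denoise_step(x, t, T):
    """Placeholder for a trained denoiser: just shrinks the sample toward zero."""
    return x * (t / T)

# Inpainting loop: run the reverse diffusion from pure noise, but at each step
# overwrite the visible texels with the (appropriately noised) observed values,
# so the generated back side stays consistent with the observed front.
T = 50
x = rng.normal(size=(H, W, C))
for t in range(T, 0, -1):
    x = denoise_step(x, t - 1, T)
    noise = rng.normal(size=(H, W, C)) * (t - 1) / T
    x[visible] = (observed + noise)[visible]

# At t = 0 the noise scale is zero, so visible texels equal the observation exactly.
assert np.allclose(x[visible], observed[visible])
```

With a real trained denoiser in place of `denoise_step`, the invisible half of the texture would be filled with plausible features rather than zeros; the conditioning mechanism is the same.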

In our experiments, the approach achieves state-of-the-art rendering quality and generalizes well to new poses and viewpoints. In particular, it improves the state of the art on the public SnapshotPeople benchmark.

Video


One-shot results

Multi-view merging

We can create an avatar from several images by merging the corresponding neural textures.
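Merging per-view neural textures can be sketched as a visibility-weighted average over texels. This is an illustrative scheme under assumed inputs, not necessarily the paper's exact merging rule: the soft visibility masks and texture shapes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 16, 16, 4

# Hypothetical per-view data: a partial neural texture per input image, plus a
# soft visibility mask (close to 1 where that view saw the texel, 0 elsewhere).
textures = [rng.normal(size=(H, W, C)) for _ in range(3)]
masks = [rng.uniform(size=(H, W, 1)) for _ in range(3)]

# Visibility-weighted average; eps guards texels that no view covers.
eps = 1e-8
merged = sum(t * m for t, m in zip(textures, masks)) / (sum(masks) + eps)

assert merged.shape == (H, W, C)
```

Texels seen by several views are blended according to how confidently each view observed them, while texels seen by a single view keep that view's features.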

Bringing photos to life

An additional use case of our approach is replacing a person in a photo with their animated avatar, producing the effect of a photo brought to life.

Input image

Output video


Input image

Output video


BibTeX

@InProceedings{Svitov_2023_ICCV,
    author    = {Svitov, David and Gudkov, Dmitrii and Bashirov, Renat and Lempitsky, Victor},
    title     = {DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {7062-7072}
}