Svitov David
Hello!
I defended my first Ph.D. in 2023 at the Institute of Automation and Electrometry
of the Siberian Branch of the Russian Academy of Sciences on the topic
"Performance optimization of convolutional neural networks in a face recognition system".
The results of this work have been deployed in the products of the partners of Expasoft LLC,
where I worked during my Ph.D. studies.
My current research interests lie in 3D computer vision, in particular human avatars,
since creating photorealistic animated 3D humans is much more difficult than creating static 3D objects.
This year I am starting my journey towards my second Ph.D. in the ELLIS PhD program. I enrolled at the Università degli Studi di Genova in affiliation with the Istituto Italiano di Tecnologia. I will work under the supervision of Alessio Del Bue and the co-supervision of Lourdes Agapito.
PROJECTS
HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior
ACCV 2024
We present HAHA - a novel approach for animatable human avatar generation from monocular input videos.
The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and
high-fidelity rendering. We demonstrate its efficiency in animating and rendering full-body human avatars controlled via the SMPL-X
parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, like hair
and out-of-mesh clothing. This results in a minimal number of Gaussians being used to represent the full avatar, and reduced rendering
artifacts. This allows us to handle the animation of small body parts such as fingers, which are traditionally disregarded.
We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method achieves on-par
reconstruction quality with the state of the art on SnapshotPeople while using fewer than a third of the Gaussians. HAHA outperforms the previous
state of the art on novel poses from X-Humans both quantitatively and qualitatively.
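As an illustration only (not the released HAHA code), the sketch below shows one way the two representations could be combined: the Gaussian-splatting render is alpha-composited over the textured-mesh render, and Gaussians whose learned opacity stays negligible are pruned, leaving the mesh texture to cover those regions. All function and argument names here are assumptions made for the example.

import torch

def composite_over_mesh(gaussian_rgb, gaussian_alpha, mesh_rgb):
    """Alpha-composite the Gaussian-splatting render over the textured-mesh render.

    gaussian_rgb   : (B, 3, H, W) color rendered from the Gaussians
    gaussian_alpha : (B, 1, H, W) accumulated opacity of the Gaussians
    mesh_rgb       : (B, 3, H, W) color rendered from the textured SMPL-X mesh
    """
    return gaussian_alpha * gaussian_rgb + (1.0 - gaussian_alpha) * mesh_rgb

def prune_gaussians(params, opacity, threshold=0.05):
    """Keep only Gaussians with significant learned opacity, e.g. those covering
    hair or loose clothing that the mesh texture cannot represent."""
    keep = opacity.squeeze(-1) > threshold     # opacity: (N, 1) per-Gaussian opacity
    return {name: value[keep] for name, value in params.items()}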
MoRF: Mobile Realistic Fullbody Avatars from a Monocular Video
WACV 2024
We present a system to create Mobile Realistic Fullbody (MoRF) avatars. MoRF avatars are rendered in real time on mobile devices,
are learned from monocular videos, and have high realism. We use SMPL-X as a proxy geometry and render it with DNR (a neural texture
and an image-2-image network). We improve on prior work by overfitting per-frame warping fields in the neural texture space, which
allows us to better align the training signal between different frames. We also refine the SMPL-X mesh fitting procedure to improve
the overall avatar quality. In comparisons with other monocular video-based avatar systems, MoRF avatars achieve higher image sharpness
and temporal consistency. Participants of our user study also preferred avatars generated by MoRF.
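Below is a minimal sketch of the deferred-neural-rendering part of such a pipeline, assuming a per-frame UV rasterization of the SMPL-X proxy mesh is already available; the tiny convolutional translator stands in for the full image-2-image network, and all class and variable names are illustrative assumptions rather than MoRF's actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTextureRenderer(nn.Module):
    def __init__(self, tex_channels=16, tex_size=512):
        super().__init__()
        # Learnable neural texture defined in the UV space of the SMPL-X mesh.
        self.neural_texture = nn.Parameter(0.01 * torch.randn(1, tex_channels, tex_size, tex_size))
        # Stand-in for the image-2-image network that turns sampled features into RGB.
        self.translator = nn.Sequential(
            nn.Conv2d(tex_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, uv):
        # uv: (B, H, W, 2) per-pixel UV coordinates in [-1, 1] obtained by
        # rasterizing the posed SMPL-X mesh; per-frame warping of these
        # coordinates is the part MoRF additionally overfits.
        batch = uv.shape[0]
        sampled = F.grid_sample(self.neural_texture.expand(batch, -1, -1, -1),
                                uv, align_corners=False)
        return self.translator(sampled)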
DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars
ICCV 2023
We present DINAR, an approach for creating realistic rigged fullbody avatars from single RGB images.
Similarly to previous works, our method uses neural textures combined with the SMPL-X body model to achieve
photo-realistic quality of avatars while keeping them easy to animate and fast to infer. To restore the texture,
we use a latent diffusion model and show how such a model can be trained in the neural texture space. The use of the
diffusion model allows us to realistically reconstruct large unseen regions such as the back of a person given the
frontal view. The models in our pipeline are trained using 2D images and videos only.
In our experiments, the approach achieves state-of-the-art rendering quality and good generalization to new poses
and viewpoints. In particular, it improves on the state of the art on the SnapshotPeople public benchmark.
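To make the inpainting idea concrete, here is a generic masked DDPM-style sampling loop that keeps the known (visible) part of a texture fixed and lets the diffusion model generate the unseen part. It is only a sketch of the family of methods, not DINAR's exact sampler, and the eps_model interface, schedule, and step count are assumptions.

import torch

@torch.no_grad()
def inpaint_texture(eps_model, known_texture, mask, steps=1000):
    """Masked diffusion inpainting sketch.

    eps_model     : network predicting the added noise eps(x_t, t) (assumed interface)
    known_texture : (B, C, H, W) neural texture recovered from the visible (frontal) view
    mask          : (B, 1, H, W) 1 where the texture is known, 0 where it must be generated
    """
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn_like(known_texture)
    for t in reversed(range(steps)):
        # Re-noise the known regions to the current noise level and keep them fixed.
        noise = torch.randn_like(x)
        known_t = alpha_bar[t].sqrt() * known_texture + (1 - alpha_bar[t]).sqrt() * noise
        x = mask * known_t + (1 - mask) * x

        # Standard DDPM reverse step on the whole texture.
        eps = eps_model(x, t)
        coef = betas[t] / (1 - alpha_bar[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)

    # Finally, paste back the exact known regions.
    return mask * known_texture + (1 - mask) * x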
MarginDistillation: distillation for margin-based softmax
Automation and Remote Control 2022
The use of convolutional neural networks (CNNs) in conjunction with a margin-based softmax demonstrates state-of-the-art performance for the face recognition problem. Recently, lightweight
neural network models trained with the margin-based softmax have been introduced for the face identification task for edge devices. In this paper,
we propose a novel distillation method for lightweight neural network architectures that outperforms other known methods for the face recognition task on LFW, AgeDB-30 and Megaface datasets. The idea of the
proposed method is to reuse the class centers from the teacher network in the student network. The student network is then trained so that the
angles between the class centers and its face embeddings match the angles predicted by the teacher network.
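A minimal PyTorch sketch of this angle-matching idea follows, assuming the teacher's class-center matrix (the weight matrix of its margin-based softmax layer) is available; the names margin_distillation_loss, student_emb, teacher_emb, and teacher_centers are illustrative, not taken from the paper's code.

import torch
import torch.nn.functional as F

def margin_distillation_loss(student_emb, teacher_emb, teacher_centers):
    # Normalize embeddings and class centers so dot products equal cosines of angles.
    s = F.normalize(student_emb, dim=1)        # (batch, dim) student face embeddings
    t = F.normalize(teacher_emb, dim=1)        # (batch, dim) teacher face embeddings
    w = F.normalize(teacher_centers, dim=1)    # (classes, dim) teacher class centers

    cos_student = s @ w.t()                    # cosines between student embeddings and class centers
    cos_teacher = t @ w.t()                    # the same angles as produced by the teacher

    # Train the student so its angles to the shared class centers match the teacher's.
    return F.mse_loss(cos_student, cos_teacher)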
AmphibianDetector: adaptive computation for moving objects detection
Optoelectronics, Instrumentation and Data Processing 2021
Convolutional neural networks (CNNs) achieve the highest accuracy for the task of object detection in images. Major
challenges in the further development of object detectors are false-positive detections and the high demand for processing power.
In this paper, we propose an approach to object detection that reduces the number of false-positive detections
by processing only moving objects and lowers the processing power required for inference. The proposed approach is a modification
of a CNN already trained for the object detection task. The method can be used to improve the accuracy of an existing system by applying minor changes
to the algorithm. The efficiency of the proposed approach was demonstrated on the open dataset "CDNet2014 pedestrian".
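A heavily simplified sketch of the motion-gating idea is shown below: the full detector runs only when the current frame differs enough from the previous one. The frame-difference measure, the threshold, and the detector interface are illustrative assumptions, not the exact mechanism from the paper.

import numpy as np

def detect_adaptive(frame, prev_frame, run_detector, motion_threshold=5.0):
    """Run the (expensive) detector only when the scene has changed enough.

    frame, prev_frame : grayscale frames of equal shape (np.ndarray)
    run_detector      : callable returning a list of detections for a frame
    """
    # Cheap motion measure: mean absolute difference between consecutive frames.
    motion = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32)).mean()
    if motion < motion_threshold:
        # Static scene: skip inference, saving compute and avoiding
        # false positives on non-moving background objects.
        return []
    return run_detector(frame)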
BaldGAN
https://github.com/david-svitov/baldgan
An open-source project for removing hair from a person's head in a photo.
The project contains a pre-trained neural network, code, and the training dataset.
Detection of suspicious objects on the basis of analysis of human X-ray images
Optoelectronics, Instrumentation and Data Processing 2017
A new approach is proposed for the detection of suspicious objects in X-ray images for security assurance. The approach is
based on a statistical model of the image for detecting anomalies. The model uses a “bag-of-words” representation in which
the image coordinates of the visual words provide context during statistical pattern formation. It is
experimentally demonstrated that this approach adequately approximates the results of suspicious-object
detection performed by humans.
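For illustration, here is a rough sketch of the general bag-of-visual-words anomaly-scoring idea, using raw image patches as descriptors and ignoring the spatial-context part described above; every function name and parameter here is an assumption, not the paper's implementation.

import numpy as np
from sklearn.cluster import KMeans

def extract_patches(image, patch=8, stride=8):
    """Cut a grayscale image into flattened patches used as visual-word descriptors."""
    h, w = image.shape
    return np.array([image[y:y + patch, x:x + patch].ravel()
                     for y in range(0, h - patch + 1, stride)
                     for x in range(0, w - patch + 1, stride)])

def build_vocabulary(normal_images, n_words=64):
    """Cluster patches from anomaly-free X-ray images into a visual vocabulary
    and record the word-frequency statistics of normal images."""
    descriptors = np.vstack([extract_patches(img) for img in normal_images])
    vocab = KMeans(n_clusters=n_words, n_init=10).fit(descriptors)
    counts = np.bincount(vocab.labels_, minlength=n_words).astype(float)
    return vocab, counts / counts.sum()

def anomaly_score(image, vocab, word_probs, eps=1e-8):
    """Score an image by how unlikely its visual words are under the normal model."""
    words = vocab.predict(extract_patches(image))
    return -np.mean(np.log(word_probs[words] + eps))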
EXPERIENCE
Research Engineer
Samsung AI Center (SAIC)
Feb 2022 - Oct 2023
I was a research engineer in the Vision, Learning & Telepresence (VIOLET) lab, doing research in the area of photorealistic human avatars, which resulted in a first-author ICCV paper. The work was carried out under the supervision of Dr. Lempitsky.
Computer Vision Engineer
Expasoft LLC
Sep 2016 - Feb 2022
I was involved in the development and acceleration of neural networks for biometric tasks,
such as models for recognizing a person by voice or face. I developed new methods for distilling
face recognition models and for speeding up neural network detectors. As a result, I published several articles
and defended my Ph.D.
I also collaborated with Huawei on the development of generative neural networks for mobile NPUs.
EDUCATION
Ph.D., Mathematical Modeling, Numerical Methods and Program Complexes
Institute of Automation and Electrometry
of the Siberian Branch of the Russian Academy of Sciences
2019 - 2022
Thesis:
"Performance optimization of convolutional neural networks in a face recognition system".
Scientific adviser: Dr. Nezhevenko.
Co-adviser: Dr. Alyamkin.
B.Sc., M.Sc., Informatics and Computer Engineering
Novosibirsk State University (NSU)
2012 - 2018
Bachelor's thesis: “Detection of suspicious objects on the basis of analysis of human X-ray images”. Scientific adviser: Dr. Kulikov.
Master's thesis: “Applying deep learning techniques to detect local anomalies in X-ray images”. Scientific adviser: Dr. Kulikov.