Svitov David
Hello!
I defended my first Ph.D. in 2023 at the Institute of Automation and Electrometry
of the Siberian Branch of the Russian Academy of Sciences on the topic
"Performance optimization of convolutional neural networks in a face recognition system".
The results of this work have been integrated into the products of the partners of Expasoft LLC,
where I worked during my Ph.D. studies.
My current research interests are in the field of 3D computer vision, in particular human avatars,
since creating photorealistic animated 3D objects is much more challenging than creating static 3D objects.
This year I am starting my journey towards my second Ph.D. in the ELLIS PhD program. I enrolled at the Università degli Studi di Genova in affiliation with the Istituto Italiano di Tecnologia. I will work under the supervision of Alessio Del Bue and the co-supervision of Lourdes Agapito.
You can get in touch with me using the following links:
PROJECTS
BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis
arXiv preprint
We present BillBoard Splatting (BBSplat), a novel approach to 3D scene representation based on textured geometric primitives.
BBSplat represents the scene as a set of optimizable textured planar primitives with learnable RGB textures and alpha-maps to control their shape.
BBSplat primitives can be used in any Gaussian Splatting pipeline as drop-in replacements for Gaussians.
Our method's qualitative and quantitative improvements over 3D and 2D Gaussians are most noticeable when fewer primitives are used, in which case BBSplat
achieves over 1200 FPS. Our novel regularization term encourages textures to have a sparser structure, unlocking an efficient compression that
reduces the storage space required by the model.
Our experiments show the efficiency of BBSplat on standard datasets of real indoor and outdoor scenes such as Tanks&Temples, DTU, and Mip-NeRF-360.
We demonstrate improvements in PSNR, SSIM, and LPIPS metrics compared to the state-of-the-art, especially when fewer primitives are used,
which in turn yields up to a 2x inference speed-up at the same rendering quality.
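To make the textured-primitive idea concrete, here is a minimal sketch of the per-primitive data such a representation could store; the class name and fields below are illustrative assumptions, not the actual BBSplat code:

from dataclasses import dataclass
import numpy as np

@dataclass
class TexturedBillboard:
    center: np.ndarray         # (3,) position of the plane in world space
    rotation: np.ndarray       # (3, 3) orientation of the plane
    scale: np.ndarray          # (2,) extent along the plane's two axes
    texture_rgb: np.ndarray    # (H, W, 3) learnable RGB texture
    texture_alpha: np.ndarray  # (H, W) learnable alpha map controlling the shape

    def sample(self, u: float, v: float):
        """Return (rgb, alpha) at texture coordinates u, v in [0, 1]."""
        h, w = self.texture_alpha.shape
        i = min(int(v * h), h - 1)
        j = min(int(u * w), w - 1)
        return self.texture_rgb[i, j], self.texture_alpha[i, j]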
HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior
ACCV 2024
We present HAHA, a novel approach for animatable human avatar generation from monocular input videos.
The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and
high-fidelity rendering. We demonstrate its efficiency in animating and rendering full-body human avatars controlled via the SMPL-X
parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, such as hair
and out-of-mesh clothing. This results in a minimal number of Gaussians being used to represent the full avatar and in fewer rendering
artifacts, and it allows us to handle the animation of small body parts such as fingers that are traditionally disregarded.
We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method achieves
reconstruction quality on par with the state-of-the-art on SnapshotPeople while using less than a third of the Gaussians. HAHA outperforms the previous
state-of-the-art on novel poses from X-Humans both quantitatively and qualitatively.
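As a rough illustration of the idea of keeping Gaussians only where the textured mesh is insufficient, the sketch below prunes Gaussians whose learned opacity has faded during training; the function and threshold are hypothetical and not the paper's actual procedure:

import numpy as np

def prune_transparent_gaussians(opacities: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Return a boolean mask of Gaussians to keep.

    Intuition: where the textured SMPL-X mesh already explains the appearance,
    the corresponding Gaussians can become transparent during training and be
    removed, leaving Gaussians only for hair, loose clothing, etc."""
    return opacities > threshold

# Example: most candidate Gaussians end up transparent and are dropped.
opacities = np.random.rand(10_000) * 0.2
print(int(prune_transparent_gaussians(opacities).sum()), "Gaussians kept")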
MoRF: Mobile Realistic Fullbody Avatars from a Monocular Video
WACV 2024
We present a system to create Mobile Realistic Fullbody (MoRF) avatars. MoRF avatars are rendered in real time on mobile devices, are learned from monocular videos, and have high realism. We use SMPL-X as a proxy geometry and render it with DNR (a neural texture and an image-to-image network). We improve on prior work by overfitting per-frame warping fields in the neural texture space, which allows better alignment of the training signal between different frames. We also refine the SMPL-X mesh fitting procedure to improve the overall avatar quality. In comparisons with other monocular video-based avatar systems, MoRF avatars achieve higher image sharpness and temporal consistency. Participants in our user study also preferred avatars generated by MoRF.
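A simplified sketch of what a per-frame warping field in texture space might look like; the function name, tensor shapes, and warping scheme are assumptions for illustration, not the MoRF implementation:

import torch
import torch.nn.functional as F

def warp_neural_texture(texture: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
    """texture: (1, C, H, W) shared neural texture.
    offsets: (1, H, W, 2) per-frame learnable offsets in normalized coordinates.
    Returns the texture warped for one training frame."""
    _, _, h, w = texture.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # identity sampling grid, (1, H, W, 2)
    return F.grid_sample(texture, grid + offsets, align_corners=True)

texture = torch.randn(1, 16, 256, 256)                           # shared neural texture
frame_offsets = torch.zeros(1, 256, 256, 2, requires_grad=True)  # overfit separately for each frame
warped = warp_neural_texture(texture, frame_offsets)             # then fed to the image-to-image network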
DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars
ICCV 2023
We present DINAR, an approach for creating realistic rigged fullbody avatars from single RGB images.
Similarly to previous works, our method uses neural textures combined with the SMPL-X body model to achieve
photo-realistic quality of avatars while keeping them easy to animate and fast to infer. To restore the texture,
we use a latent diffusion model and show how such a model can be trained in the neural texture space. The use of the
diffusion model allows us to realistically reconstruct large unseen regions such as the back of a person given the
frontal view. The models in our pipeline are trained using 2D images and videos only.
In our experiments, the approach achieves state-of-the-art rendering quality and good generalization to new poses
and viewpoints. In particular, it improves on the state-of-the-art on the SnapshotPeople public benchmark.
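The texture-completion step can be pictured as follows; this is only a schematic of combining observed and inpainted texels (names and shapes assumed), not the DINAR code:

import numpy as np

def fuse_textures(partial_texture: np.ndarray,
                  visibility_mask: np.ndarray,
                  inpainted_texture: np.ndarray) -> np.ndarray:
    """partial_texture, inpainted_texture: (H, W, C) neural textures.
    visibility_mask: (H, W, 1), 1 where the texel was observed in the input photo.
    Observed texels are kept; unseen ones (e.g. the person's back) come from the
    diffusion-inpainted texture."""
    return visibility_mask * partial_texture + (1.0 - visibility_mask) * inpainted_texture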
MarginDistillation: distillation for margin-based softmax
Automation and Remote Control 2022
The usage of convolutional neural networks (CNNs) in conjunction with a margin-based softmax approach demonstrates state-of-the-art performance for the face recognition problem. Recently, lightweight
neural network models trained with the margin-based softmax have been introduced for the face identification task on edge devices. In this paper,
we propose a novel distillation method for lightweight neural network architectures that outperforms other known methods for the face recognition task on the LFW, AgeDB-30, and MegaFace datasets. The idea of the
proposed method is to transfer the class centers from the teacher network to the student network. The student network is then trained to reproduce the
angles between the class centers and the face embeddings predicted by the teacher network.
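A minimal sketch of this idea; the function names and the exact loss are assumptions, and the paper defines the precise formulation:

import torch
import torch.nn.functional as F

def cos_to_centers(embeddings: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """Cosine of the angles between L2-normalized embeddings and class centers."""
    return F.normalize(embeddings, dim=1) @ F.normalize(centers, dim=1).t()

def margin_distillation_loss(student_emb, teacher_emb, teacher_centers):
    # The student reuses the teacher's class centers and is trained so that the
    # angles between its embeddings and those centers match the teacher's angles.
    cos_student = cos_to_centers(student_emb, teacher_centers)
    cos_teacher = cos_to_centers(teacher_emb, teacher_centers).detach()
    return F.mse_loss(cos_student, cos_teacher)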
AmphibianDetector: adaptive computation for moving objects detection
Optoelectronics, Instrumentation and Data Processing 2021
Convolutional neural networks (CNNs) achieve the highest accuracy for the task of object detection in images. Major
challenges in the further development of object detectors are false-positive detections and a high demand for processing power.
In this paper, we propose an approach to object detection that reduces the number of false-positive detections
by processing only moving objects and lowers the processing power required for inference. The proposed approach is a modification
of a CNN already trained for the object detection task, so it can be used to improve the accuracy of an existing system with only minor changes
to the algorithm. The efficiency of the proposed approach was demonstrated on the open dataset "CDNet2014 pedestrian".
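The motion-gating idea can be sketched as follows; this toy version simply thresholds a raw frame difference (the actual method modifies the trained CNN itself), and all names are illustrative:

import numpy as np

def has_motion(prev_frame: np.ndarray, frame: np.ndarray, threshold: float = 5.0) -> bool:
    """Mean absolute difference between consecutive grayscale frames."""
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32)).mean()
    return float(diff) > threshold

def detect_if_moving(prev_frame, frame, run_detector):
    """run_detector: any callable mapping a frame to a list of detections.
    The expensive detector runs only when the scene actually changed,
    saving compute and suppressing false positives on the static background."""
    return run_detector(frame) if has_motion(prev_frame, frame) else []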
BaldGAN
https://github.com/david-svitov/baldgan
An open-source project for removing hair from a person's head in a photo.
The project contains a pre-trained neural network, code, and a training dataset.
Detection of suspicious objects on the basis of analysis of human X-ray images
Optoelectronics, Instrumentation and Data Processing 2017
A new approach is proposed for the detection of suspicious objects in X-ray images for security assurance. The approach is
based on using a statistical model of the image to detect anomalies. The model is built using the
"bag-of-words" representation, with the coordinates of the visual words in the image providing context during statistical pattern formation. It is
experimentally demonstrated that this approach adequately approximates the detection of suspicious
objects by humans.
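As a generic, much-simplified illustration of bag-of-words-based anomaly scoring (not the paper's exact model; the names and the distance measure are assumptions):

import numpy as np

def word_histogram(word_ids: np.ndarray, vocabulary_size: int) -> np.ndarray:
    """Normalized histogram of visual-word occurrences in an image region."""
    hist = np.bincount(word_ids, minlength=vocabulary_size).astype(np.float64)
    return hist / max(hist.sum(), 1.0)

def anomaly_score(region_hist: np.ndarray, normal_mean_hist: np.ndarray, eps: float = 1e-6) -> float:
    """Chi-squared distance to the average histogram of anomaly-free regions;
    a large value flags the region as potentially containing a suspicious object."""
    return float(0.5 * np.sum((region_hist - normal_mean_hist) ** 2 / (region_hist + normal_mean_hist + eps)))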
EXPERIENCE
Research Engineer
Samsung AI Center (SAIC)
Feb 2022 - Oct 2023
I was a research engineer in the Vision, Learning & Telepresence (VIOLET) lab. I did research in the area of photorealistic human avatars, which resulted in a first-author ICCV paper. The work was done under the supervision of Dr. Lempitsky.
Computer Vision Engineer
Expasoft LLC
Sep 2016 - Feb 2022
I was involved in the development and acceleration of neural networks for biometric tasks,
such as models for recognizing a person by voice or face. I developed new methods for distilling
face recognition models and speeding up neural network detectors. As a result, I published several articles
and defended my Ph.D.
I also collaborated with Huawei on the development of generative neural networks for mobile NPUs.
EDUCATION
Ph.D., Mathematical Modeling, Numerical Methods and Program Complexes
Institute of Automation and Electrometry
of the Siberian Branch of the Russian Academy of Sciences
2019 - 2022
Thesis:
"Performance optimization of convolutional neural networks in a face recognition system".
Scientific adviser: Dr. Nezhevenko.
Co-adviser: Dr. Alyamkin.
B.Sc., M.Sc., Informatics and Computer Engineering
Novosibirsk State University (NSU)
2012 - 2018
Bachelor's thesis: “Detection of suspicious objects on the basis of analysis of human X-ray images”. Scientific adviser: Dr. Kulikov.
Master's thesis: “Applying deep learning techniques to detect local anomalies in X-ray images”. Scientific adviser: Dr. Kulikov.