Toronto AI Lab
Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis

1NVIDIA
2University of Toronto
3Vector Institute
NeurIPS 2021

We introduce DMTet, a deep 3D conditional generative model that can synthesize high-resolution 3D shapes using simple user guides such as coarse voxels or noisy point clouds.

Abstract


We introduce DMTet, a deep 3D conditional generative model that can synthesize high-resolution 3D shapes using simple user guides such as coarse voxels. It marries the merits of implicit and explicit 3D representations by leveraging a novel hybrid 3D representation. Compared to the current implicit approaches, which are trained to regress the signed distance values, DMTet directly optimizes for the reconstructed surface, which enables us to synthesize finer geometric details with fewer artifacts. Unlike deep 3D generative models that directly generate explicit representations such as meshes, our model can synthesize shapes with arbitrary topology. The core of DMTet includes a deformable tetrahedral grid that encodes a discretized signed distance function and a differentiable marching tetrahedra layer that converts the implicit signed distance representation to the explicit surface mesh representation. This combination allows joint optimization of the surface geometry and topology as well as generation of the hierarchy of subdivisions using reconstruction and adversarial losses defined explicitly on the surface mesh. Our approach significantly outperforms existing work on conditional shape synthesis from coarse voxel inputs, trained on a dataset of complex 3D animal shapes. Project page: https://nv-tlabs.github.io/DMTet/.




DMTet predicts the underlying surface parameterized by an implicit function encoded via a deformable tetrahedral grid. The underlying surface is converted into an explicit mesh with a Marching Tetrahedra (MT) algorithm, which we show is differentiable. Therefore, DMTet can jointly optimize the surface geometry and topology using losses defined explicitly on the surface mesh.

Here we demonstrate this with a 2D example, where the loss is defined as the distance between the extracted surface (shown in red) and the ground-truth point cloud (shown in purple).
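To make the differentiability concrete: MT places a surface vertex at the zero crossing of each tetrahedron edge whose endpoint signed distances have opposite signs, by linear interpolation. Since this expression is a smooth function of both the SDF values and the (deformable) vertex positions, gradients of a surface loss flow back to them. A minimal NumPy sketch of this interpolation (an illustration, not the authors' implementation; the function name is hypothetical):

```python
import numpy as np

def edge_zero_crossing(v_a, v_b, s_a, s_b):
    """Locate the zero crossing of a linearly interpolated SDF along
    the edge (v_a, v_b).

    v_a, v_b : (3,) vertex positions of the edge endpoints.
    s_a, s_b : signed distance values at the endpoints; opposite signs
               mean the surface crosses this edge.

    The formula is differentiable w.r.t. all four inputs, which is what
    allows losses on the extracted mesh to update both the SDF values
    and the grid vertex positions.
    """
    assert s_a * s_b < 0, "edge must cross the surface"
    # v' = (v_a * s_b - v_b * s_a) / (s_b - s_a)
    return (v_a * s_b - v_b * s_a) / (s_b - s_a)

# Symmetric SDF values place the crossing at the edge midpoint:
p = edge_zero_crossing(np.array([0., 0., 0.]),
                       np.array([1., 0., 0.]), -1.0, 1.0)
```

The same expression, written with a framework such as PyTorch, yields the gradients automatically via autograd.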

3D Shape Synthesis from Coarse Voxels


Qualitative results on 3D Shape Synthesis from Coarse Voxels. Compared with all baselines, our method (fifth column) reconstructs shapes with much higher quality. Adding a GAN (highlighted in orange) further improves the realism of the generated shapes. We also show the retrieved shapes from the training set in the second-to-last column.





DMTet generalizes to human-created low-resolution voxels collected online. Although these human-created shapes (in yellow) differ noticeably from the coarse voxels used in training, e.g., different ratios of body parts compared with our training shapes (larger heads, thinner legs, longer necks), our model faithfully generates high-quality 3D details (in blue) conditioned on each coarse voxel – an exciting result.

Point Cloud 3D Reconstruction


Qualitative results on 3D Reconstruction from Point Clouds: Our model reconstructs shapes with more geometric details compared to baselines using different representations - voxels, deforming a mesh with a fixed template, deforming a mesh generated from a volumetric representation, tetrahedral mesh, and implicit functions.

Quantitative Results on Point Cloud Reconstruction (Chamfer L1). Our model outperforms ConvONet, the SOTA implicit approach, across all object categories even when running at a low grid resolution (in yellow). Our model also runs significantly faster at inference time and produces an explicit mesh as output, making it suitable for interactive graphics applications.
With volume and surface subdivision (in blue) we can further boost the reconstruction quality. In this case, we apply volume subdivision once, which doubles the grid resolution. The runtime only doubles instead of growing cubically with resolution.
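The favorable scaling comes from refining only near the surface: the number of cells that actually cross the surface grows roughly quadratically with grid resolution, while the total number of cells grows cubically. A toy NumPy experiment with a sphere SDF illustrates this gap (an illustrative sketch, not the paper's implementation; the function name is hypothetical):

```python
import numpy as np

def cell_counts(n):
    """Count total cells and surface-crossing cells for a sphere SDF
    sampled on an n^3 grid over [-1, 1]^3 (toy example)."""
    xs = np.linspace(-1.0, 1.0, n + 1)
    X, Y, Z = np.meshgrid(xs, xs, xs, indexing="ij")
    sdf = np.sqrt(X**2 + Y**2 + Z**2) - 0.5  # sphere of radius 0.5
    # Gather the 8 corner values of every cell via shifted slices.
    corners = np.stack([sdf[dx:dx + n, dy:dy + n, dz:dz + n]
                        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)])
    # A cell crosses the surface if its corner SDFs change sign.
    crossing = (corners.min(axis=0) < 0) & (corners.max(axis=0) > 0)
    return n**3, int(crossing.sum())

total16, surf16 = cell_counts(16)
total32, surf32 = cell_counts(32)
# Doubling the resolution multiplies total cells by 8, but the
# surface-crossing cells grow by only about 4x.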

We demonstrate the effect of learning on the surface by comparing with the oracle performance of MT/MC evaluated on ShapeNet Chairs. Without deforming the grid, DMTet outperforms the oracle performance of MT by a large margin when querying the same number of points, even though DMTet predicts the surface from a noisy point cloud. This demonstrates that directly optimizing the reconstructed surface can largely mitigate the discretization errors imposed by MT.

Citation



        @inproceedings{shen2021dmtet,
          title = {Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis},
          author = {Tianchang Shen and Jun Gao and Kangxue Yin and Ming-Yu Liu and Sanja Fidler},
          year = {2021},
          booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}
        }


Paper


Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis

Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, Sanja Fidler

Paper (camera-ready)
arXiv version
Supplement
BibTeX