Configuration Reference
ViPE uses Hydra YAML presets for composition and Pydantic models for runtime validation. The tables below are generated from the Pydantic config models, so field descriptions, required values, and numeric constraints stay aligned with the code.
Common override examples:
uv run vipe infer assets/examples/dog-example.mp4 --pipeline dav3
uv run python run.py pipeline=default streams=raw_mp4_stream streams.base_path=YOUR_VIDEO.mp4
uv run python run.py pipeline=default streams.base_path=YOUR_VIDEO.mp4 pipeline.post.depth_align_model=null
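The overrides above follow Hydra's `key=value` form, with `null` standing in for an unset value. As a minimal sketch, assuming you want to assemble such a command from Python, the helper names `to_override` and `build_command` below are hypothetical and not part of ViPE:

```python
import shlex


def to_override(key, value):
    """Render one Hydra-style key=value override (hypothetical helper)."""
    if value is None:
        rendered = "null"  # Hydra/YAML spelling of None
    elif isinstance(value, bool):
        rendered = "true" if value else "false"
    else:
        rendered = str(value)
    return f"{key}={rendered}"


def build_command(overrides):
    """Prefix the overrides with the run.py invocation shown above."""
    argv = ["uv", "run", "python", "run.py"]
    argv += [to_override(key, value) for key, value in overrides.items()]
    return argv


cmd = build_command(
    {
        "pipeline": "default",
        "streams": "raw_mp4_stream",
        "streams.base_path": "YOUR_VIDEO.mp4",
        "pipeline.post.depth_align_model": None,
    }
)
print(shlex.join(cmd))
```

Passing the list form (`cmd`) to `subprocess.run` avoids shell quoting issues when the video path contains spaces.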
Pipeline Presets
These are the pipeline values accepted by pipeline=... in Hydra overrides and by vipe infer --pipeline.
| Preset | Purpose | Pipeline Class | Camera | Keyframe Depth | Depth Post-Processing |
|---|---|---|---|---|---|
| default | Default pipeline for pinhole videos. | DefaultAnnotationPipeline | pinhole | unidepth-l | adaptive_unidepth-l_svda |
| dav3 | Default pipeline using Depth Anything 3 for keyframe and multiview depth. | DefaultAnnotationPipeline | pinhole | dav3 | mvd_dav3 |
| lyra | Configuration used for Lyra-style results, with MoGe keyframe depth and VDA alignment. | DefaultAnnotationPipeline | pinhole | moge | adaptive_moge_vda |
| no_vda | Default pipeline without Video Depth Anything alignment. | DefaultAnnotationPipeline | pinhole | unidepth-l | adaptive_unidepth-l |
| static_vda | Default pipeline without instance segmentation, using static VDA alignment. | DefaultAnnotationPipeline | pinhole | unidepth-l | adaptive_unidepth-l_vda |
| wide_angle | Default pipeline configured for wide-angle or fisheye input. | DefaultAnnotationPipeline | mei | unidepth-l | null |
| panorama | Panorama pipeline that projects 360-degree frames into virtual perspective views. | PanoramaAnnotationPipeline | panorama | null | null |
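When choosing a preset programmatically, the table above can be mirrored as a small lookup. This is only an illustrative sketch: the `PRESETS` dictionary and both helper functions are hypothetical, and the values are copied from the table rather than read from ViPE's YAML files.

```python
# Preset name -> (camera, keyframe depth, depth post-processing), copied from
# the table above. None stands for the table's "null" entries.
PRESETS = {
    "default": ("pinhole", "unidepth-l", "adaptive_unidepth-l_svda"),
    "dav3": ("pinhole", "dav3", "mvd_dav3"),
    "lyra": ("pinhole", "moge", "adaptive_moge_vda"),
    "no_vda": ("pinhole", "unidepth-l", "adaptive_unidepth-l"),
    "static_vda": ("pinhole", "unidepth-l", "adaptive_unidepth-l_vda"),
    "wide_angle": ("mei", "unidepth-l", None),
    "panorama": ("panorama", None, None),
}


def presets_for_camera(camera):
    """Return the preset names whose Camera column equals `camera`."""
    return [name for name, (cam, _, _) in PRESETS.items() if cam == camera]


def presets_without_post_depth():
    """Return the preset names whose Depth Post-Processing column is null."""
    return [name for name, (_, _, post) in PRESETS.items() if post is None]


print(presets_for_camera("mei"))
print(presets_without_post_depth())
```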
Stream Presets
Use streams=raw_mp4_stream for videos and streams=frame_dir_stream for directories of frames.
| Preset | Implementation | frame_start | frame_end | frame_skip | cached |
|---|---|---|---|---|---|
| frame_dir_stream | FrameDirStreamList | 0 | -1 | 1 | false |
| raw_mp4_stream | RawMP4StreamList | 0 | 1000 | 1 | false |
Top-Level Config
ViPEConfig
Top-level ViPE runtime configuration.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| streams | RawMP4StreamListConfig \| FrameDirStreamListConfig | required | - | Input stream list that supplies videos or frame directories to process. |
| pipeline | DefaultPipelineConfig \| PanoramaPipelineConfig | required | - | Annotation pipeline and all pipeline-specific runtime options. |
Input Streams
BaseStreamListConfig
Shared options for stream lists backed by videos or image-frame folders.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| base_path | str | required | - | Input path. For MP4 streams this can be one video or a directory of videos; for frame-dir streams this can be one frame directory or a directory containing multiple frame directories. |
| frame_start | int | required | >= 0 | First frame index to include. |
| frame_end | int | required | >= -1 | Exclusive end frame index. Use -1 to process through the end of each stream. |
| frame_skip | int | required | >= 1 | Frame stride. A value of 1 processes every frame. |
| cached | bool | required | - | Cache each stream before processing. This helps with malformed videos whose frame counts are unreliable when decoded lazily. |
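The interaction of frame_start, frame_end, and frame_skip can be sketched in plain Python. This is a minimal illustration of the semantics described in the table (exclusive end, -1 for "through the end", stride of at least 1). The function name `selected_frames` is hypothetical, and clamping frame_end to the stream length is an assumption rather than confirmed ViPE behavior.

```python
def selected_frames(num_frames, frame_start=0, frame_end=-1, frame_skip=1):
    """Return the frame indices a stream would yield under these options."""
    # Mirror the documented constraints.
    if frame_start < 0:
        raise ValueError("frame_start must be >= 0")
    if frame_end < -1:
        raise ValueError("frame_end must be >= -1")
    if frame_skip < 1:
        raise ValueError("frame_skip must be >= 1")

    # -1 means "through the end"; otherwise frame_end is exclusive.
    # Assumption: an end index past the stream length is clamped.
    stop = num_frames if frame_end == -1 else min(frame_end, num_frames)
    return list(range(frame_start, stop, frame_skip))


print(selected_frames(10, frame_start=2, frame_end=-1, frame_skip=3))
print(selected_frames(10, frame_start=0, frame_end=4))
```

Under these assumptions, a stream shorter than the configured frame_end is simply processed through its end.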
RawMP4StreamListConfig
Stream list that reads raw MP4 files.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| base_path | str | required | - | Input path. For MP4 streams this can be one video or a directory of videos; for frame-dir streams this can be one frame directory or a directory containing multiple frame directories. |
| frame_start | int | required | >= 0 | First frame index to include. |
| frame_end | int | required | >= -1 | Exclusive end frame index. Use -1 to process through the end of each stream. |
| frame_skip | int | required | >= 1 | Frame stride. A value of 1 processes every frame. |
| cached | bool | required | - | Cache each stream before processing. This helps with malformed videos whose frame counts are unreliable when decoded lazily. |
| instance | vipe.streams.raw_mp4_stream.RawMP4StreamList | required | fixed vipe.streams.raw_mp4_stream.RawMP4StreamList | Implementation class for MP4 video input streams. |
FrameDirStreamListConfig
Stream list that reads directories of image frames.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| base_path | str | required | - | Input path. For MP4 streams this can be one video or a directory of videos; for frame-dir streams this can be one frame directory or a directory containing multiple frame directories. |
| frame_start | int | required | >= 0 | First frame index to include. |
| frame_end | int | required | >= -1 | Exclusive end frame index. Use -1 to process through the end of each stream. |
| frame_skip | int | required | >= 1 | Frame stride. A value of 1 processes every frame. |
| cached | bool | required | - | Cache each stream before processing. This helps with malformed videos whose frame counts are unreliable when decoded lazily. |
| instance | vipe.streams.frame_dir_stream.FrameDirStreamList | required | fixed vipe.streams.frame_dir_stream.FrameDirStreamList | Implementation class for image-frame directory input streams. |
Pipelines
InstanceInitConfig
Object and sky-mask initialization used by the segmentation stage.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| kf_gap_sec | float | required | > 0.0 | Minimum time gap, in seconds, between keyframes used to initialize instance segmentation. |
| phrases | list[str] | required | min items 1 | Text prompts passed to the open-vocabulary detector for objects that should be segmented. |
| add_sky | bool | required | - | Add a sky mask to the instance segmentation output when the detector supports it. |
DefaultInitConfig
Initialization options for the default pinhole and wide-angle pipelines.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| camera_type | pinhole \| panorama \| simple_divisional \| mei | required | choices pinhole \| panorama \| simple_divisional \| mei | Camera model used by SLAM and projection code. Use mei for wide-angle/fisheye input. |
| intrinsics | geocalib \| gt | required | choices geocalib \| gt | Source of camera intrinsics. geocalib estimates intrinsics; gt expects each frame to provide them. |
| instance | InstanceInitConfig \| null | required | - | Instance-segmentation initialization. Set to null to skip instance masks. |
PanoramaInitConfig
Initialization options for the panorama pipeline.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| instance | InstanceInitConfig \| null | required | - | Instance-segmentation initialization. Set to null to skip instance masks. |
VirtualCameraConfig
Perspective cameras sampled from a 360-degree panorama for SLAM.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| height | int | required | >= 1 | Height, in pixels, of each virtual perspective view. |
| fovx | float | required | > 0.0, < 180.0 | Horizontal field of view of each virtual view, in degrees. |
| fovy | float | required | > 0.0, < 180.0 | Vertical field of view of each virtual view, in degrees. |
| num_views | int | required | >= 1 | Number of evenly spaced horizontal virtual views. |
| top | bool | required | - | Add an upward-looking virtual view. |
| bottom | bool | required | - | Add a downward-looking virtual view. |
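To make the geometry of these fields concrete, the sketch below derives a view width and a list of viewing directions from a VirtualCameraConfig-like set of values. It rests on several assumptions that the table does not confirm: square pixels (so width follows from height, fovx, and fovy), yaw angles starting at 0 degrees and spaced by 360 / num_views, and a pitch of +90 / -90 degrees for the top and bottom views. The function name `virtual_view_layout` is hypothetical.

```python
import math


def virtual_view_layout(height, fovx, fovy, num_views, top=False, bottom=False):
    """Return (width_px, [(yaw_deg, pitch_deg), ...]) for the virtual views."""
    if not (0.0 < fovx < 180.0 and 0.0 < fovy < 180.0):
        raise ValueError("fovx and fovy must lie strictly between 0 and 180")
    if height < 1 or num_views < 1:
        raise ValueError("height and num_views must be >= 1")

    # Pinhole relation with a shared focal length (square-pixel assumption):
    # height / 2 = f * tan(fovy / 2) and width / 2 = f * tan(fovx / 2).
    focal = (height / 2.0) / math.tan(math.radians(fovy) / 2.0)
    width = round(2.0 * focal * math.tan(math.radians(fovx) / 2.0))

    # Evenly spaced horizontal views around the vertical axis (assumed layout).
    views = [(i * 360.0 / num_views, 0.0) for i in range(num_views)]
    if top:
        views.append((0.0, 90.0))   # upward-looking view (sign is an assumption)
    if bottom:
        views.append((0.0, -90.0))  # downward-looking view
    return width, views


width, views = virtual_view_layout(512, 90.0, 90.0, num_views=4, top=True, bottom=True)
print(width, views)
```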
PostConfig
Depth post-processing options.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| depth_align_model | str \| null | required | - | Depth model or alignment recipe used after SLAM. Examples include adaptive_unidepth-l, adaptive_unidepth-l_svda, adaptive_moge_vda, mvd_dav3, dap, and unik3d. Set to null for pose-only output. |
OutputConfig
Output paths and artifact/visualization controls.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| path | str | required | - | Directory where ViPE writes artifacts, visualization videos, and optional SLAM maps. |
| skip_exists | bool | required | - | Skip a sequence when the expected output already exists. |
| save_artifacts | bool | required | - | Save reusable RGB, pose, intrinsics, depth, and mask artifacts for visualization or downstream use. |
| save_slam_map | bool | false | - | Save the sparse SLAM reconstruction map for lightweight COLMAP conversion. |
| save_viz | bool | required | - | Render MP4 visualization videos for the configured visualization attributes. |
| viz_downsample | int | required | >= 1 | Downsample factor applied when rendering visualization videos. |
| viz_attributes | list[list[rgb \| instance \| depth \| pcd \| rectified]] | required | min items 1 | Groups of frame attributes to render into visualization videos. Each inner list becomes one panel. |
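The nested shape of viz_attributes is easy to get wrong, so here is a small plain-Python check that mirrors the documented type and the "min items 1" constraint. It is a stand-in for illustration, not ViPE's Pydantic validation, and the name `validate_viz_attributes` is hypothetical.

```python
ALLOWED_ATTRIBUTES = {"rgb", "instance", "depth", "pcd", "rectified"}


def validate_viz_attributes(groups):
    """Check a list of panels, where each panel is a list of attribute names."""
    if len(groups) < 1:
        raise ValueError("viz_attributes needs at least one panel")
    for index, panel in enumerate(groups):
        unknown = [name for name in panel if name not in ALLOWED_ATTRIBUTES]
        if unknown:
            raise ValueError(f"panel {index} has unknown attributes: {unknown}")
    return groups


# Two panels: one showing RGB frames, one combining depth and point cloud.
panels = validate_viz_attributes([["rgb"], ["depth", "pcd"]])
print(panels)
```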
DefaultPipelineConfig
Default annotation pipeline for pinhole and wide-angle videos.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| instance | vipe.pipeline.default.DefaultAnnotationPipeline | required | fixed vipe.pipeline.default.DefaultAnnotationPipeline | Implementation class for the default annotation pipeline. |
| init | DefaultInitConfig | required | - | Initial camera and instance-mask setup. |
| slam | SLAMConfig | required | - | SLAM and bundle-adjustment configuration. |
| post | PostConfig | required | - | Depth alignment and post-processing configuration. |
| output | OutputConfig | required | - | Output artifact and visualization configuration. |
PanoramaPipelineConfig
Annotation pipeline for 360-degree panorama videos.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| instance | vipe.pipeline.panorama.PanoramaAnnotationPipeline | required | fixed vipe.pipeline.panorama.PanoramaAnnotationPipeline | Implementation class for the panorama annotation pipeline. |
| init | PanoramaInitConfig | required | - | Initial instance-mask setup for panorama input. |
| virtual | VirtualCameraConfig | required | - | Virtual perspective views projected from each panorama frame. |
| slam | SLAMConfig | required | - | SLAM and bundle-adjustment configuration for virtual views. |
| output | OutputConfig | required | - | Output artifact and visualization configuration. |
| post | PostConfig | required | - | Panorama depth estimation and post-processing configuration. |
SLAM
SLAMConfig
DROID-style SLAM frontend, backend, map extraction, and metric-depth options.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| buffer | int | required | >= 1 | Maximum number of keyframes stored in the SLAM graph buffer. |
| beta | float | required | >= 0.0 | Relative weighting of translation and rotation when measuring frame motion from optical flow. |
| filter_thresh | float | required | >= 0.0 | Motion-filter threshold for accepting an incoming frame as a candidate keyframe. |
| warmup | int | required | >= 0 | Number of accepted keyframes used before regular frontend updates begin. |
| keyframe_thresh | float | required | >= 0.0 | Frontend motion threshold below which the second-newest keyframe is removed. |
| frontend_thresh | float | required | >= 0.0 | Distance threshold for adding frontend proximity edges. |
| frontend_window | int | required | >= 1 | Number of recent keyframes kept active in the frontend window. |
| frontend_radius | int | required | >= 1 | Frame-neighborhood radius forced into the frontend graph. |
| frontend_nms | int | required | >= 1 | Non-max suppression radius for frontend proximity edges. |
| seq_init | bool | required | - | Initialize poses sequentially before regular graph optimization. |
| frontend_backend_iters | list[int] | required | - | Accepted-keyframe counts at which the backend runs during frontend initialization. |
| backend_thresh | float | required | >= 0.0 | Distance threshold for adding backend proximity edges. |
| backend_radius | int | required | >= 1 | Frame-neighborhood radius forced into the backend graph. |
| backend_nms | int | required | >= 1 | Non-max suppression radius for backend proximity edges. |
| backend_iters | int | required | >= 1 | Number of backend optimization iterations. |
| init_disp | float | required | > 0.0 | Initial inverse-depth value assigned to new keyframes. |
| optimize_intrinsics | bool | required | - | Optimize camera intrinsics during bundle adjustment. |
| optimize_rig_rotation | bool | required | - | Optimize rig rotations for multi-view inputs. |
| cross_view | bool | required | - | Add cross-view reprojection factors for multi-view or panorama-derived inputs. |
| cross_view_idx | list[int] \| null | required | - | Optional cross-view index selection. Set to null to use the default neighboring-view selection. |
| adaptive_cross_view | bool | required | - | Recompute cross-view pairs in the backend using current geometry. |
| infill_chunk_size | int | required | >= 1 | Chunk size for dense trajectory and disparity infill. |
| infill_dense_disp | bool | required | - | Also optimize dense disparity while filling non-keyframe outputs. |
| map_filter_thresh | float | required | >= 0.0 | Depth-consistency threshold used when filtering SLAM-map points and extracting dense disparity. |
| visualize | bool | required | - | Stream SLAM internals to rerun for debugging. |
| keyframe_depth | str \| null | required | - | Metric depth model used on keyframes to recover scale. Examples include metric3d-small, unidepth-l, moge, and dav3. Set to null to skip keyframe metric-depth recovery. |
| ba | BAConfig | required | - | Bundle-adjustment solver options. |
| sparse_tracks | SparseTracksConfig | required | - | Sparse-track backend options. |
BAConfig
Bundle-adjustment solver and robust loss options.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| dense_disp_alpha | float | required | >= 0.0 | Weight for dense-disparity regularization during bundle adjustment. |
| fused | bool | required | - | Use the fused CUDA bundle-adjustment path; unsupported layouts raise an error. |
| intrinsics_damping_scale | float | required | > 0.0 | Multiplier for damping applied to optimized camera intrinsics. |
| robust_kernel | huber \| tukey \| gnc_tls \| null | required | - | Robust loss for dense-flow residuals. Set to null for L2 residuals. |
| robust_kernel_threshold | float | required | > 0.0 | Robust-kernel threshold in 1/8-resolution feature-map pixels. |
| gnc_mu_init | float | required | > 0.0 | Initial mu value for the GNC-TLS robust-kernel schedule. |
| gnc_mu_step | float | required | > 0.0 | Multiplicative mu update step for GNC-TLS. |
| gnc_mu_max | float | required | > 0.0 | Maximum mu value for GNC-TLS. |
| gnc_n_mu_steps | int | required | >= 1 | Number of GNC-TLS continuation steps. |
| gnc_gn_iters_per_mu | int | required | >= 1 | Number of Gauss-Newton iterations to run for each GNC-TLS mu value. |
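For intuition about the robust-kernel options, the sketch below shows the textbook reweighting functions for the Huber and Tukey kernels and one plausible reading of the GNC mu fields as a clamped geometric schedule. None of this is taken from ViPE's solver: the function names are hypothetical, the schedule is an assumption based only on the field descriptions above, and the numeric values are illustrative rather than ViPE defaults.

```python
def huber_weight(residual, threshold):
    """IRLS weight for the Huber kernel: 1 inside the threshold, k/|r| outside."""
    magnitude = abs(residual)
    return 1.0 if magnitude <= threshold else threshold / magnitude


def tukey_weight(residual, threshold):
    """IRLS weight for Tukey's biweight: residuals past the threshold get 0."""
    magnitude = abs(residual)
    if magnitude >= threshold:
        return 0.0
    return (1.0 - (magnitude / threshold) ** 2) ** 2


def gnc_mu_schedule(mu_init, mu_step, mu_max, n_mu_steps):
    """Assumed schedule: multiply mu by mu_step each step, clamped at mu_max."""
    schedule = []
    mu = mu_init
    for _ in range(n_mu_steps):
        schedule.append(min(mu, mu_max))
        mu *= mu_step
    return schedule


print(huber_weight(4.0, 1.0), tukey_weight(0.5, 1.0))
print(gnc_mu_schedule(mu_init=1.0, mu_step=2.0, mu_max=5.0, n_mu_steps=4))
```

Under this reading, gnc_gn_iters_per_mu Gauss-Newton iterations would run at each value in the returned schedule.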
SparseTracksConfig
Sparse feature-track backend used to seed or support SLAM.
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| name | dummy \| cuvslam | required | choices dummy \| cuvslam | Sparse-track provider. dummy disables external sparse tracks; cuvslam uses NVIDIA cuVSLAM. |