Configuration Reference

ViPE uses Hydra YAML presets for composition and Pydantic models for runtime validation. The tables below are generated from the Pydantic config models, so field types, defaults, descriptions, and constraints stay aligned with the code.
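As a rough illustration of how reference rows like these can be derived from model metadata, the sketch below uses standard-library dataclasses in place of Pydantic. FrameWindow, its metadata keys (ge, doc), and reference_rows are hypothetical; ViPE's actual table generator is not shown here.

```python
from dataclasses import MISSING, dataclass, field, fields


@dataclass
class FrameWindow:
    # Hypothetical model: metadata stands in for Pydantic's Field(ge=..., description=...).
    frame_start: int = field(metadata={"ge": 0, "doc": "First frame index to include."})
    frame_skip: int = field(default=1, metadata={"ge": 1, "doc": "Frame stride."})


def reference_rows(model: type) -> list:
    """Render one 'Field Type Default Constraints Description' row per field."""
    rows = []
    for f in fields(model):
        # A field without a default is reported as "required", as in the tables here.
        default = "required" if f.default is MISSING else str(f.default).lower()
        ge = f.metadata.get("ge")
        constraint = f">= {ge}" if ge is not None else "-"
        type_name = f.type if isinstance(f.type, str) else f.type.__name__
        rows.append(f"{f.name} {type_name} {default} {constraint} {f.metadata['doc']}")
    return rows


for row in reference_rows(FrameWindow):
    print(row)
```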

Common command-line examples (vipe infer with a pipeline preset, or run.py with Hydra overrides):

uv run vipe infer assets/examples/dog-example.mp4 --pipeline dav3
uv run python run.py pipeline=default streams=raw_mp4_stream streams.base_path=YOUR_VIDEO.mp4
uv run python run.py pipeline=default streams.base_path=YOUR_VIDEO.mp4 pipeline.post.depth_align_model=null
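The override forms used above (pipeline=default selecting a preset, dotted keys such as streams.base_path setting nested values, and null clearing a value) can be pictured with the minimal parser below. It is a hypothetical illustration, not Hydra's override grammar, which also supports list values, +key, ~key, and sweeps; treating every undotted key as a config-group choice is an assumption that happens to fit these examples.

```python
def _coerce(raw: str):
    """Convert an override string to None, bool, int, or float, else keep it as str."""
    lowered = raw.lower()
    if lowered == "null":
        return None
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_overrides(args):
    """Split overrides into config-group choices and nested value overrides."""
    groups = {}
    values = {}
    for arg in args:
        key, sep, raw = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        if "." not in key:
            groups[key] = raw  # e.g. pipeline=default picks a preset
            continue
        *parents, leaf = key.split(".")
        node = values
        for part in parents:
            node = node.setdefault(part, {})  # walk or create nested dicts
        node[leaf] = _coerce(raw)
    return groups, values


groups, values = parse_overrides([
    "pipeline=default",
    "streams.base_path=YOUR_VIDEO.mp4",
    "pipeline.post.depth_align_model=null",
])
print(groups)
print(values)
```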

Pipeline Presets

These are the pipeline values accepted by pipeline=... in Hydra overrides and by vipe infer --pipeline.

| Preset | Purpose | Pipeline Class | Camera | Keyframe Depth | Depth Post-Processing |
| --- | --- | --- | --- | --- | --- |
| default | Default pipeline for pinhole videos. | DefaultAnnotationPipeline | pinhole | unidepth-l | adaptive_unidepth-l_svda |
| dav3 | Default pipeline using Depth Anything 3 for keyframe and multiview depth. | DefaultAnnotationPipeline | pinhole | dav3 | mvd_dav3 |
| lyra | Configuration used for Lyra-style results, with MoGe keyframe depth and VDA alignment. | DefaultAnnotationPipeline | pinhole | moge | adaptive_moge_vda |
| no_vda | Default pipeline without Video Depth Anything alignment. | DefaultAnnotationPipeline | pinhole | unidepth-l | adaptive_unidepth-l |
| static_vda | Default pipeline without instance segmentation, using static VDA alignment. | DefaultAnnotationPipeline | pinhole | unidepth-l | adaptive_unidepth-l_vda |
| wide_angle | Default pipeline configured for wide-angle or fisheye input. | DefaultAnnotationPipeline | mei | unidepth-l | null |
| panorama | Panorama pipeline that projects 360-degree frames into virtual perspective views. | PanoramaAnnotationPipeline | panorama | null | null |
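To script runs over several presets, it can help to mirror the table above in code and validate names before launching. Preset, PRESETS, and infer_command are hypothetical helpers; only the uv run vipe infer ... --pipeline ... command line comes from this page.

```python
import shlex
from typing import NamedTuple, Optional


class Preset(NamedTuple):
    camera: str
    keyframe_depth: Optional[str]  # None mirrors "null" in the table
    depth_post: Optional[str]


# Copied by hand from the Pipeline Presets table; keep the two in sync.
PRESETS = {
    "default": Preset("pinhole", "unidepth-l", "adaptive_unidepth-l_svda"),
    "dav3": Preset("pinhole", "dav3", "mvd_dav3"),
    "lyra": Preset("pinhole", "moge", "adaptive_moge_vda"),
    "no_vda": Preset("pinhole", "unidepth-l", "adaptive_unidepth-l"),
    "static_vda": Preset("pinhole", "unidepth-l", "adaptive_unidepth-l_vda"),
    "wide_angle": Preset("mei", "unidepth-l", None),
    "panorama": Preset("panorama", None, None),
}


def infer_command(video: str, preset: str = "default") -> list:
    """Build the argv for `vipe infer`, rejecting unknown preset names early."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    return ["uv", "run", "vipe", "infer", video, "--pipeline", preset]


print(shlex.join(infer_command("assets/examples/dog-example.mp4", "dav3")))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues with unusual file names.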

Stream Presets

Use streams=raw_mp4_stream for videos and streams=frame_dir_stream for directories of frames.

| Preset | Implementation | frame_start | frame_end | frame_skip | cached |
| --- | --- | --- | --- | --- | --- |
| frame_dir_stream | FrameDirStreamList | 0 | -1 | 1 | false |
| raw_mp4_stream | RawMP4StreamList | 0 | 1000 | 1 | false |

Top-Level Config

ViPEConfig

Top-level ViPE runtime configuration.

Field Type Default Constraints Description
streams RawMP4StreamListConfig | FrameDirStreamListConfig required - Input stream list that supplies videos or frame directories to process.
pipeline DefaultPipelineConfig | PanoramaPipelineConfig required - Annotation pipeline and all pipeline-specific runtime options.

Input Streams

BaseStreamListConfig

Shared options for stream lists backed by videos or image-frame folders.

Field Type Default Constraints Description
base_path str required - Input path. For MP4 streams this can be one video or a directory of videos; for frame-dir streams this can be one frame directory or a directory containing multiple frame directories.
frame_start int required >= 0 First frame index to include.
frame_end int required >= -1 Exclusive end frame index. Use -1 to process through the end of each stream.
frame_skip int required >= 1 Frame stride. A value of 1 processes every frame.
cached bool required - Cache each stream before processing. This helps with malformed videos whose frame counts are unreliable when decoded lazily.
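The interaction of frame_start, frame_end, and frame_skip can be sketched as a range over frame indices. This assumes the stride counts from frame_start and that a frame_end beyond the stream length is clipped; selected_frames is a hypothetical helper, not ViPE code.

```python
def selected_frames(num_frames: int, frame_start: int = 0,
                    frame_end: int = -1, frame_skip: int = 1) -> list:
    """Return the frame indices a stream would yield under these settings."""
    if frame_start < 0 or frame_end < -1 or frame_skip < 1:
        raise ValueError("need frame_start >= 0, frame_end >= -1, frame_skip >= 1")
    # frame_end is exclusive; -1 means "through the last frame".
    end = num_frames if frame_end == -1 else min(frame_end, num_frames)
    return list(range(frame_start, end, frame_skip))


# raw_mp4_stream preset values (0, 1000, 1) applied to a 1500-frame video:
print(len(selected_frames(1500, 0, 1000, 1)))
print(selected_frames(10, frame_start=2, frame_end=8, frame_skip=3))
```

Under these assumptions the raw_mp4_stream preset stops after 1000 frames, so a longer video would need streams.frame_end=-1 to be processed in full.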

RawMP4StreamListConfig

Stream list that reads raw MP4 files.

Field Type Default Constraints Description
base_path str required - Input path. For MP4 streams this can be one video or a directory of videos; for frame-dir streams this can be one frame directory or a directory containing multiple frame directories.
frame_start int required >= 0 First frame index to include.
frame_end int required >= -1 Exclusive end frame index. Use -1 to process through the end of each stream.
frame_skip int required >= 1 Frame stride. A value of 1 processes every frame.
cached bool required - Cache each stream before processing. This helps with malformed videos whose frame counts are unreliable when decoded lazily.
instance vipe.streams.raw_mp4_stream.RawMP4StreamList required fixed vipe.streams.raw_mp4_stream.RawMP4StreamList Implementation class for MP4 video input streams.

FrameDirStreamListConfig

Stream list that reads directories of image frames.

Field Type Default Constraints Description
base_path str required - Input path. For MP4 streams this can be one video or a directory of videos; for frame-dir streams this can be one frame directory or a directory containing multiple frame directories.
frame_start int required >= 0 First frame index to include.
frame_end int required >= -1 Exclusive end frame index. Use -1 to process through the end of each stream.
frame_skip int required >= 1 Frame stride. A value of 1 processes every frame.
cached bool required - Cache each stream before processing. This helps with malformed videos whose frame counts are unreliable when decoded lazily.
instance vipe.streams.frame_dir_stream.FrameDirStreamList required fixed vipe.streams.frame_dir_stream.FrameDirStreamList Implementation class for image-frame directory input streams.
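Each stream config pins its instance field to a fixed implementation path. A common way to turn such a dotted path into a class is importlib, and the fixed value can also tell the two stream configs apart. Both helpers below and STREAM_CONFIG_BY_INSTANCE are hypothetical; whether ViPE dispatches on instance this way is an assumption.

```python
import importlib

# Fixed instance values from the two tables above, mapped to their config models.
STREAM_CONFIG_BY_INSTANCE = {
    "vipe.streams.raw_mp4_stream.RawMP4StreamList": "RawMP4StreamListConfig",
    "vipe.streams.frame_dir_stream.FrameDirStreamList": "FrameDirStreamListConfig",
}


def config_model_for(instance: str) -> str:
    """Pick the stream config model whose fixed instance value matches."""
    try:
        return STREAM_CONFIG_BY_INSTANCE[instance]
    except KeyError:
        raise ValueError(f"unknown stream implementation {instance!r}") from None


def resolve_class(dotted_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    return getattr(importlib.import_module(module_name), class_name)


print(config_model_for("vipe.streams.frame_dir_stream.FrameDirStreamList"))
# A standard-library path keeps the demo runnable without ViPE installed.
print(resolve_class("collections.OrderedDict"))
```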

Pipelines

InstanceInitConfig

Object and sky-mask initialization used by the segmentation stage.

Field Type Default Constraints Description
kf_gap_sec float required > 0.0 Minimum time gap, in seconds, between keyframes used to initialize instance segmentation.
phrases list[str] required min items 1 Text prompts passed to the open-vocabulary detector for objects that should be segmented.
add_sky bool required - Add a sky mask to the instance segmentation output when the detector supports it.
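One plausible reading of kf_gap_sec is a minimum spacing that is converted to frames through the video frame rate. The sketch below assumes that conversion (rounding up to whole frames) and enforces the documented constraints (kf_gap_sec > 0, at least one phrase). InstanceInit and init_keyframes are hypothetical names, and the example phrases are placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceInit:
    kf_gap_sec: float
    phrases: list
    add_sky: bool

    def __post_init__(self):
        # Mirrors the "> 0.0" and "min items 1" constraints in the table above.
        if self.kf_gap_sec <= 0:
            raise ValueError("kf_gap_sec must be > 0")
        if len(self.phrases) < 1:
            raise ValueError("phrases needs at least one text prompt")

    def init_keyframes(self, num_frames: int, fps: float) -> list:
        """Frame indices at least kf_gap_sec apart (gap rounded up to whole frames)."""
        if fps <= 0:
            raise ValueError("fps must be > 0")
        gap_frames = max(1, math.ceil(self.kf_gap_sec * fps))
        return list(range(0, num_frames, gap_frames))


cfg = InstanceInit(kf_gap_sec=1.0, phrases=["person", "car"], add_sky=True)
print(cfg.init_keyframes(num_frames=100, fps=30.0))
```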

DefaultInitConfig

Initialization options for the default pinhole and wide-angle pipelines.

Field Type Default Constraints Description
camera_type pinhole | panorama | simple_divisional | mei required choices pinhole | panorama | simple_divisional | mei Camera model used by SLAM and projection code. Use mei for wide-angle/fisheye input.
intrinsics geocalib | gt required choices geocalib | gt Source of camera intrinsics. geocalib estimates intrinsics; gt expects each frame to provide them.
instance InstanceInitConfig | null required - Instance-segmentation initialization. Set to null to skip instance masks.
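Fields such as camera_type and intrinsics accept a fixed set of strings, which Pydantic models typically express with typing.Literal. A standard-library approximation using typing.get_args is sketched below; InitChoices is a hypothetical class, and representing instance as an optional dict is a simplification.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

CameraType = Literal["pinhole", "panorama", "simple_divisional", "mei"]
IntrinsicsSource = Literal["geocalib", "gt"]


@dataclass
class InitChoices:
    camera_type: str
    intrinsics: str
    instance: Optional[dict]  # None skips instance masks

    def __post_init__(self):
        for name, allowed in (("camera_type", CameraType),
                              ("intrinsics", IntrinsicsSource)):
            value = getattr(self, name)
            if value not in get_args(allowed):  # get_args lists the Literal's values
                raise ValueError(f"{name}={value!r}; expected one of {get_args(allowed)}")


InitChoices(camera_type="mei", intrinsics="geocalib", instance=None)  # wide-angle camera model
try:
    InitChoices(camera_type="fisheye", intrinsics="gt", instance=None)
except ValueError as err:
    print(err)
```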

PanoramaInitConfig

Initialization options for the panorama pipeline.

Field Type Default Constraints Description
instance InstanceInitConfig | null required - Instance-segmentation initialization. Set to null to skip instance masks.

VirtualCameraConfig

Perspective cameras sampled from a 360-degree panorama for SLAM.

Field Type Default Constraints Description
height int required >= 1 Height, in pixels, of each virtual perspective view.
fovx float required > 0.0, < 180.0 Horizontal field of view of each virtual view, in degrees.
fovy float required > 0.0, < 180.0 Vertical field of view of each virtual view, in degrees.
num_views int required >= 1 Number of evenly spaced horizontal virtual views.
top bool required - Add an upward-looking virtual view.
bottom bool required - Add a downward-looking virtual view.
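The virtual-camera fields can be related to view directions and image size with basic pinhole geometry. The sketch assumes square pixels (so the focal length fixed by height and fovy also determines the width through fovx), yaw angles evenly spaced from 0 degrees, and pitch of +90/-90 degrees for the top/bottom views. ViewDirection, virtual_views, and view_width are hypothetical helpers, not ViPE's projection code.

```python
import math
from typing import NamedTuple


class ViewDirection(NamedTuple):
    yaw_deg: float    # rotation about the vertical axis
    pitch_deg: float  # +90 looks up, -90 looks down (assumed convention)


def virtual_views(num_views: int, top: bool, bottom: bool) -> list:
    """Evenly spaced horizontal views, plus optional up/down views."""
    views = [ViewDirection(360.0 * i / num_views, 0.0) for i in range(num_views)]
    if top:
        views.append(ViewDirection(0.0, 90.0))
    if bottom:
        views.append(ViewDirection(0.0, -90.0))
    return views


def view_width(height: int, fovx: float, fovy: float) -> int:
    """Pixel width implied by height, fovx, and fovy (degrees) with square pixels."""
    focal = (height / 2) / math.tan(math.radians(fovy) / 2)  # focal length in pixels
    return round(2 * focal * math.tan(math.radians(fovx) / 2))


print([v.yaw_deg for v in virtual_views(4, top=True, bottom=False)])
print(view_width(480, fovx=90.0, fovy=60.0))
```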

PostConfig

Depth post-processing options.

Field Type Default Constraints Description
depth_align_model str | null required - Depth model or alignment recipe used after SLAM. Examples include adaptive_unidepth-l, adaptive_unidepth-l_svda, adaptive_moge_vda, mvd_dav3, dap, and unik3d. Set to null for pose-only output.

OutputConfig

Output paths and artifact/visualization controls.

Field Type Default Constraints Description
path str required - Directory where ViPE writes artifacts, visualization videos, and optional SLAM maps.
skip_exists bool required - Skip a sequence when the expected output already exists.
save_artifacts bool required - Save reusable RGB, pose, intrinsics, depth, and mask artifacts for visualization or downstream use.
save_slam_map bool false - Save the sparse SLAM reconstruction map for lightweight COLMAP conversion.
save_viz bool required - Render MP4 visualization videos for the configured visualization attributes.
viz_downsample int required >= 1 Downsample factor applied when rendering visualization videos.
viz_attributes list[list[rgb | instance | depth | pcd | rectified]] required min items 1 Groups of frame attributes to render into visualization videos. Each inner list becomes one panel.
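A quick pre-flight check of the visualization fields can catch typos before a long run. This sketch validates viz_attributes against the five documented attribute names and assumes viz_downsample shrinks frame dimensions by integer floor division; check_viz_attributes and viz_frame_size are hypothetical helpers.

```python
VIZ_ATTRIBUTES = {"rgb", "instance", "depth", "pcd", "rectified"}


def check_viz_attributes(viz_attributes: list) -> int:
    """Validate panel groups and return the number of panels to render."""
    if len(viz_attributes) < 1:
        raise ValueError("viz_attributes needs at least one panel group")
    for index, group in enumerate(viz_attributes):
        unknown = sorted(set(group) - VIZ_ATTRIBUTES)
        if unknown:
            raise ValueError(f"panel {index} has unknown attributes {unknown}")
    return len(viz_attributes)  # one panel per inner list


def viz_frame_size(width: int, height: int, viz_downsample: int) -> tuple:
    """Frame size after downsampling (floor division is an assumption)."""
    if viz_downsample < 1:
        raise ValueError("viz_downsample must be >= 1")
    return width // viz_downsample, height // viz_downsample


print(check_viz_attributes([["rgb", "depth"], ["pcd"]]))
print(viz_frame_size(1920, 1080, 2))
```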

DefaultPipelineConfig

Default annotation pipeline for pinhole and wide-angle videos.

Field Type Default Constraints Description
instance vipe.pipeline.default.DefaultAnnotationPipeline required fixed vipe.pipeline.default.DefaultAnnotationPipeline Implementation class for the default annotation pipeline.
init DefaultInitConfig required - Initial camera and instance-mask setup.
slam SLAMConfig required - SLAM and bundle-adjustment configuration.
post PostConfig required - Depth alignment and post-processing configuration.
output OutputConfig required - Output artifact and visualization configuration.

PanoramaPipelineConfig

Annotation pipeline for 360-degree panorama videos.

Field Type Default Constraints Description
instance vipe.pipeline.panorama.PanoramaAnnotationPipeline required fixed vipe.pipeline.panorama.PanoramaAnnotationPipeline Implementation class for the panorama annotation pipeline.
init PanoramaInitConfig required - Initial instance-mask setup for panorama input.
virtual VirtualCameraConfig required - Virtual perspective views projected from each panorama frame.
slam SLAMConfig required - SLAM and bundle-adjustment configuration for virtual views.
output OutputConfig required - Output artifact and visualization configuration.
post PostConfig required - Panorama depth estimation and post-processing configuration.

SLAM

SLAMConfig

DROID-style SLAM frontend, backend, map extraction, and metric-depth options.

Field Type Default Constraints Description
buffer int required >= 1 Maximum number of keyframes stored in the SLAM graph buffer.
beta float required >= 0.0 Relative weighting of translation and rotation when measuring frame motion from optical flow.
filter_thresh float required >= 0.0 Motion-filter threshold for accepting an incoming frame as a candidate keyframe.
warmup int required >= 0 Number of accepted keyframes used before regular frontend updates begin.
keyframe_thresh float required >= 0.0 Frontend motion threshold below which the second-newest keyframe is removed.
frontend_thresh float required >= 0.0 Distance threshold for adding frontend proximity edges.
frontend_window int required >= 1 Number of recent keyframes kept active in the frontend window.
frontend_radius int required >= 1 Frame-neighborhood radius forced into the frontend graph.
frontend_nms int required >= 1 Non-max suppression radius for frontend proximity edges.
seq_init bool required - Initialize poses sequentially before regular graph optimization.
frontend_backend_iters list[int] required - Accepted-keyframe counts at which the backend runs during frontend initialization.
backend_thresh float required >= 0.0 Distance threshold for adding backend proximity edges.
backend_radius int required >= 1 Frame-neighborhood radius forced into the backend graph.
backend_nms int required >= 1 Non-max suppression radius for backend proximity edges.
backend_iters int required >= 1 Number of backend optimization iterations.
init_disp float required > 0.0 Initial inverse-depth value assigned to new keyframes.
optimize_intrinsics bool required - Optimize camera intrinsics during bundle adjustment.
optimize_rig_rotation bool required - Optimize rig rotations for multi-view inputs.
cross_view bool required - Add cross-view reprojection factors for multi-view or panorama-derived inputs.
cross_view_idx list[int] | null required - Optional cross-view index selection. Set to null to use the default neighboring-view selection.
adaptive_cross_view bool required - Recompute cross-view pairs in the backend using current geometry.
infill_chunk_size int required >= 1 Chunk size for dense trajectory and disparity infill.
infill_dense_disp bool required - Also optimize dense disparity while filling non-keyframe outputs.
map_filter_thresh float required >= 0.0 Depth-consistency threshold used when filtering SLAM-map points and extracting dense disparity.
visualize bool required - Stream SLAM internals to rerun for debugging.
keyframe_depth str | null required - Metric depth model used on keyframes to recover scale. Examples include metric3d-small, unidepth-l, moge, and dav3. Set to null to skip keyframe metric-depth recovery.
ba BAConfig required - Bundle-adjustment solver options.
sparse_tracks SparseTracksConfig required - Sparse-track backend options.
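The interplay of filter_thresh, warmup, and frontend_backend_iters can be traced with a toy simulation. It assumes a precomputed per-frame motion magnitude relative to the last accepted keyframe (DROID-style systems derive this from optical flow), always accepts the first frame, and records the frame at which warmup keyframes have accumulated. FrontendTrace and simulate_frontend are hypothetical and omit keyframe removal, graph edges, and optimization.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FrontendTrace:
    keyframes: list = field(default_factory=list)     # accepted frame indices
    backend_runs: list = field(default_factory=list)  # keyframe counts that fired the backend
    initialized_at: Optional[int] = None              # frame where `warmup` keyframes exist


def simulate_frontend(motion, filter_thresh, warmup, frontend_backend_iters):
    trace = FrontendTrace()
    schedule = set(frontend_backend_iters)
    for frame_idx, magnitude in enumerate(motion):
        # After the first frame, skip frames whose motion does not exceed the threshold.
        if trace.keyframes and magnitude <= filter_thresh:
            continue
        trace.keyframes.append(frame_idx)
        count = len(trace.keyframes)
        if count in schedule:
            trace.backend_runs.append(count)
        if trace.initialized_at is None and count >= warmup:
            trace.initialized_at = frame_idx  # regular frontend updates follow from here
    return trace


trace = simulate_frontend([0.0, 1.0, 3.0, 0.5, 4.0, 5.0],
                          filter_thresh=2.0, warmup=3, frontend_backend_iters=[2, 3])
print(trace)
```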

BAConfig

Bundle-adjustment solver and robust loss options.

Field Type Default Constraints Description
dense_disp_alpha float required >= 0.0 Weight for dense-disparity regularization during bundle adjustment.
fused bool required - Use the fused CUDA bundle-adjustment path; unsupported layouts raise an error.
intrinsics_damping_scale float required > 0.0 Multiplier for damping applied to optimized camera intrinsics.
robust_kernel huber | tukey | gnc_tls | null required - Robust loss for dense-flow residuals. Set to null for L2 residuals.
robust_kernel_threshold float required > 0.0 Robust-kernel threshold in 1/8-resolution feature-map pixels.
gnc_mu_init float required > 0.0 Initial mu value for the GNC-TLS robust-kernel schedule.
gnc_mu_step float required > 0.0 Multiplicative mu update step for GNC-TLS.
gnc_mu_max float required > 0.0 Maximum mu value for GNC-TLS.
gnc_n_mu_steps int required >= 1 Number of GNC-TLS continuation steps.
gnc_gn_iters_per_mu int required >= 1 Number of Gauss-Newton iterations to run for each GNC-TLS mu value.
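The robust_kernel choices correspond to standard iteratively reweighted least-squares weights, sketched below alongside a multiplicative GNC-TLS mu schedule. The Huber and Tukey weights are textbook forms and the GNC-TLS weight follows the graduated non-convexity formulation of Yang et al. (2020); whether ViPE's kernels match these exactly, and the clamping of mu at gnc_mu_max, are assumptions. Because robust_kernel_threshold is measured on a 1/8-resolution feature map, a threshold of 1.0 spans roughly 8 pixels at the resolution that map was computed from.

```python
import math


def huber_weight(residual: float, k: float) -> float:
    """IRLS weight of the Huber loss: 1 inside the threshold, k/|r| outside."""
    a = abs(residual)
    return 1.0 if a <= k else k / a


def tukey_weight(residual: float, c: float) -> float:
    """IRLS weight of Tukey's biweight: (1 - (r/c)^2)^2 inside, 0 outside."""
    a = abs(residual)
    if a >= c:
        return 0.0
    u = 1.0 - (a / c) ** 2
    return u * u


def gnc_tls_weight(residual: float, c: float, mu: float) -> float:
    """GNC weight for truncated least squares at continuation parameter mu."""
    r2 = residual * residual
    if r2 <= mu / (mu + 1.0) * c * c:
        return 1.0  # treated as an inlier
    if r2 >= (mu + 1.0) / mu * c * c:
        return 0.0  # treated as an outlier
    return c / abs(residual) * math.sqrt(mu * (mu + 1.0)) - mu  # smooth transition


def gnc_mu_schedule(mu_init: float, mu_step: float, mu_max: float, n_mu_steps: int) -> list:
    """Multiplicative mu continuation, clamped at mu_max (clamping is assumed)."""
    mus, mu = [], mu_init
    for _ in range(n_mu_steps):
        mus.append(min(mu, mu_max))
        mu *= mu_step
    return mus


print(gnc_mu_schedule(mu_init=1.0, mu_step=2.0, mu_max=5.0, n_mu_steps=4))
print([round(gnc_tls_weight(r, c=1.0, mu=1.0), 3) for r in (0.5, 1.0, 1.6)])
```

In a GNC loop, each mu in the schedule would be held for gnc_gn_iters_per_mu Gauss-Newton iterations, with weights recomputed from the current residuals before each solve.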

SparseTracksConfig

Sparse feature-track backend used to seed or support SLAM.

Field Type Default Constraints Description
name dummy | cuvslam required choices dummy | cuvslam Sparse-track provider. dummy disables external sparse tracks; cuvslam uses NVIDIA cuVSLAM.