Configuration Reference

ViPE uses Hydra YAML presets for composition and Pydantic models for runtime validation. The tables below are generated from the Pydantic config models, so field descriptions, required values, and numeric constraints stay aligned with the code.

Common override examples:

uv run vipe infer assets/examples/dog-example.mp4 --pipeline dav3
uv run python run.py pipeline=default streams=raw_mp4_stream streams.base_path=YOUR_VIDEO.mp4
uv run python run.py pipeline=default streams.base_path=YOUR_VIDEO.mp4 pipeline.post.depth_align_model=null
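
When runs are launched from a script, the same overrides can be assembled programmatically. The sketch below only builds the argument list for the `run.py` entry point shown above. The helper names `render_value` and `build_run_command` are hypothetical and not part of ViPE, and the code assumes Hydra's `key=value` override syntax with `null`, `true`, and `false` written literally.

```python
import shlex


def render_value(value):
    """Render a Python value the way it is spelled in a key=value override."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_run_command(video, pipeline="default", stream="raw_mp4_stream", overrides=None):
    """Assemble the `uv run python run.py ...` argument list (hypothetical helper)."""
    args = [
        "uv", "run", "python", "run.py",
        f"pipeline={pipeline}",
        f"streams={stream}",
        f"streams.base_path={video}",
    ]
    # Extra overrides are appended after the preset selections.
    for key, value in (overrides or {}).items():
        args.append(f"{key}={render_value(value)}")
    return args


cmd = build_run_command(
    "YOUR_VIDEO.mp4",
    overrides={"pipeline.post.depth_align_model": None},
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell-quoting problems with paths that contain spaces.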

Pipeline Presets

These are the pipeline values accepted by `pipeline=...` in Hydra overrides and by `vipe infer --pipeline`.

Preset	Purpose	Pipeline Class	Camera	Keyframe Depth	Depth Post-Processing
`default`	Default pipeline for pinhole videos.	`DefaultAnnotationPipeline`	`pinhole`	`unidepth-l`	`adaptive_unidepth-l_svda`
`dav3`	Default pipeline using Depth Anything 3 for keyframe and multiview depth.	`DefaultAnnotationPipeline`	`pinhole`	`dav3`	`mvd_dav3`
`lyra`	Configuration used for Lyra-style results, with MoGe keyframe depth and VDA alignment.	`DefaultAnnotationPipeline`	`pinhole`	`moge`	`adaptive_moge_vda`
`no_vda`	Default pipeline without Video Depth Anything alignment.	`DefaultAnnotationPipeline`	`pinhole`	`unidepth-l`	`adaptive_unidepth-l`
`static_vda`	Default pipeline without instance segmentation, using static VDA alignment.	`DefaultAnnotationPipeline`	`pinhole`	`unidepth-l`	`adaptive_unidepth-l_vda`
`wide_angle`	Default pipeline configured for wide-angle or fisheye input.	`DefaultAnnotationPipeline`	`mei`	`unidepth-l`	`null`
`panorama`	Panorama pipeline that projects 360-degree frames into virtual perspective views.	`PanoramaAnnotationPipeline`	`panorama`	`null`	`null`

Stream Presets

Use `streams=raw_mp4_stream` for videos and `streams=frame_dir_stream` for directories of frames.

Preset	Implementation	`frame_start`	`frame_end`	`frame_skip`	`cached`
`frame_dir_stream`	`FrameDirStreamList`	`0`	`-1`	`1`	`false`
`raw_mp4_stream`	`RawMP4StreamList`	`0`	`1000`	`1`	`false`

Top-Level Config

ViPEConfig

Top-level ViPE runtime configuration.

Field	Type	Default	Constraints	Description
`streams`	RawMP4StreamListConfig \| FrameDirStreamListConfig	required	-	Input stream list that supplies videos or frame directories to process.
`pipeline`	DefaultPipelineConfig \| PanoramaPipelineConfig	required	-	Annotation pipeline and all pipeline-specific runtime options.

Input Streams

BaseStreamListConfig

Shared options for stream lists backed by videos or image-frame folders.

Field	Type	Default	Constraints	Description
`base_path`	str	required	-	Input path. For MP4 streams this can be one video or a directory of videos; for frame-dir streams this can be one frame directory or a directory containing multiple frame directories.
`frame_start`	int	required	>= 0	First frame index to include.
`frame_end`	int	required	>= -1	Exclusive end frame index. Use -1 to process through the end of each stream.
`frame_skip`	int	required	>= 1	Frame stride. A value of 1 processes every frame.
`cached`	bool	required	-	Cache each stream before processing. This helps with malformed videos whose frame counts are unreliable when decoded lazily.
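
The interaction of `frame_start`, `frame_end`, and `frame_skip` can be illustrated with a short sketch. It assumes the stride is counted from `frame_start` and that an end index past the stream length is clamped; neither detail is stated in the table, and `selected_frames` is a hypothetical helper rather than a ViPE function.

```python
def selected_frames(num_frames, frame_start=0, frame_end=-1, frame_skip=1):
    """Return the frame indices kept by the start/end/skip options (illustrative)."""
    if frame_start < 0 or frame_end < -1 or frame_skip < 1:
        raise ValueError("values violate the documented constraints")
    # -1 means "through the end"; any other end index is exclusive and is
    # clamped to the stream length (an assumption of this sketch).
    stop = num_frames if frame_end == -1 else min(frame_end, num_frames)
    return list(range(frame_start, stop, frame_skip))


print(selected_frames(10, frame_start=2, frame_end=8, frame_skip=2))  # [2, 4, 6]
print(selected_frames(5))  # [0, 1, 2, 3, 4]
```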

RawMP4StreamListConfig

Stream list that reads raw MP4 files.

Field	Type	Default	Constraints	Description
`base_path`	str	required	-	Input path. For MP4 streams this can be one video or a directory of videos; for frame-dir streams this can be one frame directory or a directory containing multiple frame directories.
`frame_start`	int	required	>= 0	First frame index to include.
`frame_end`	int	required	>= -1	Exclusive end frame index. Use -1 to process through the end of each stream.
`frame_skip`	int	required	>= 1	Frame stride. A value of 1 processes every frame.
`cached`	bool	required	-	Cache each stream before processing. This helps with malformed videos whose frame counts are unreliable when decoded lazily.
`instance`	`vipe.streams.raw_mp4_stream.RawMP4StreamList`	required	fixed `vipe.streams.raw_mp4_stream.RawMP4StreamList`	Implementation class for MP4 video input streams.

FrameDirStreamListConfig

Stream list that reads directories of image frames.

Field	Type	Default	Constraints	Description
`base_path`	str	required	-	Input path. For MP4 streams this can be one video or a directory of videos; for frame-dir streams this can be one frame directory or a directory containing multiple frame directories.
`frame_start`	int	required	>= 0	First frame index to include.
`frame_end`	int	required	>= -1	Exclusive end frame index. Use -1 to process through the end of each stream.
`frame_skip`	int	required	>= 1	Frame stride. A value of 1 processes every frame.
`cached`	bool	required	-	Cache each stream before processing. This helps with malformed videos whose frame counts are unreliable when decoded lazily.
`instance`	`vipe.streams.frame_dir_stream.FrameDirStreamList`	required	fixed `vipe.streams.frame_dir_stream.FrameDirStreamList`	Implementation class for image-frame directory input streams.
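
The two stream configs share every field except the fixed `instance` value, which is what distinguishes one from the other. The standard-library sketch below mirrors the documented constraints to show that idea; `StreamListSketch` is a hypothetical stand-in, not the real Pydantic model, and the real validation messages will differ.

```python
from dataclasses import dataclass

MP4_TARGET = "vipe.streams.raw_mp4_stream.RawMP4StreamList"
FRAME_DIR_TARGET = "vipe.streams.frame_dir_stream.FrameDirStreamList"


@dataclass(frozen=True)
class StreamListSketch:
    """Stand-in for the stream-list configs; not the real Pydantic model."""

    instance: str
    base_path: str
    frame_start: int
    frame_end: int
    frame_skip: int
    cached: bool

    def __post_init__(self):
        # Mirror the constraints listed in the tables above.
        if self.instance not in (MP4_TARGET, FRAME_DIR_TARGET):
            raise ValueError(f"unknown stream implementation: {self.instance}")
        if self.frame_start < 0:
            raise ValueError("frame_start must be >= 0")
        if self.frame_end < -1:
            raise ValueError("frame_end must be >= -1")
        if self.frame_skip < 1:
            raise ValueError("frame_skip must be >= 1")

    @property
    def kind(self):
        """Report which stream type the fixed `instance` value selects."""
        return "mp4" if self.instance == MP4_TARGET else "frame_dir"


# Values taken from the raw_mp4_stream preset row.
cfg = StreamListSketch(MP4_TARGET, "YOUR_VIDEO.mp4", 0, 1000, 1, False)
print(cfg.kind)  # mp4
```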

Pipelines

InstanceInitConfig

Object and sky-mask initialization used by the segmentation stage.

Field	Type	Default	Constraints	Description
`kf_gap_sec`	float	required	> 0.0	Minimum time gap, in seconds, between keyframes used to initialize instance segmentation.
`phrases`	list[str]	required	min items 1	Text prompts passed to the open-vocabulary detector for objects that should be segmented.
`add_sky`	bool	required	-	Add a sky mask to the instance segmentation output when the detector supports it.

DefaultInitConfig

Initialization options for the default pinhole and wide-angle pipelines.

Field	Type	Default	Constraints	Description
`camera_type`	`pinhole` \| `panorama` \| `simple_divisional` \| `mei`	required	choices `pinhole` \| `panorama` \| `simple_divisional` \| `mei`	Camera model used by SLAM and projection code. Use mei for wide-angle/fisheye input.
`intrinsics`	`geocalib` \| `gt`	required	choices `geocalib` \| `gt`	Source of camera intrinsics. geocalib estimates intrinsics; gt expects each frame to provide them.
`instance`	InstanceInitConfig \| null	required	-	Instance-segmentation initialization. Set to null to skip instance masks.

PanoramaInitConfig

Initialization options for the panorama pipeline.

Field	Type	Default	Constraints	Description
`instance`	InstanceInitConfig \| null	required	-	Instance-segmentation initialization. Set to null to skip instance masks.

VirtualCameraConfig

Perspective cameras sampled from a 360-degree panorama for SLAM.

Field	Type	Default	Constraints	Description
`height`	int	required	>= 1	Height, in pixels, of each virtual perspective view.
`fovx`	float	required	> 0.0, < 180.0	Horizontal field of view of each virtual view, in degrees.
`fovy`	float	required	> 0.0, < 180.0	Vertical field of view of each virtual view, in degrees.
`num_views`	int	required	>= 1	Number of evenly spaced horizontal virtual views.
`top`	bool	required	-	Add an upward-looking virtual view.
`bottom`	bool	required	-	Add a downward-looking virtual view.
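
To make the virtual-camera fields concrete, the sketch below lays out view directions and derives a view width. It is illustrative only: it assumes yaw starts at 0 degrees, that pitch +90 means looking up, and that each view is a pinhole camera with square pixels. ViPE's actual conventions may differ, and both function names are hypothetical.

```python
import math


def virtual_view_angles(num_views, top=False, bottom=False):
    """Return (yaw_deg, pitch_deg) pairs for the virtual views (assumed layout)."""
    if num_views < 1:
        raise ValueError("num_views must be >= 1")
    step = 360.0 / num_views
    # Evenly spaced horizontal views around the panorama.
    views = [(i * step, 0.0) for i in range(num_views)]
    if top:
        views.append((0.0, 90.0))  # upward-looking view
    if bottom:
        views.append((0.0, -90.0))  # downward-looking view
    return views


def view_width(height, fovx, fovy):
    """Pixel width implied by height and the two FOVs for a square-pixel pinhole view."""
    if height < 1 or not (0.0 < fovx < 180.0 and 0.0 < fovy < 180.0):
        raise ValueError("values violate the documented constraints")
    ratio = math.tan(math.radians(fovx) / 2) / math.tan(math.radians(fovy) / 2)
    return round(height * ratio)


print(virtual_view_angles(4, top=True))
print(view_width(480, 90.0, 90.0))  # 480
```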

PostConfig

Depth post-processing options.

Field	Type	Default	Constraints	Description
`depth_align_model`	str \| null	required	-	Depth model or alignment recipe used after SLAM. Examples include adaptive_unidepth-l, adaptive_unidepth-l_svda, adaptive_moge_vda, mvd_dav3, dap, and unik3d. Set to null for pose-only output.

OutputConfig

Output paths and artifact/visualization controls.

Field	Type	Default	Constraints	Description
`path`	str	required	-	Directory where ViPE writes artifacts, visualization videos, and optional SLAM maps.
`skip_exists`	bool	required	-	Skip a sequence when the expected output already exists.
`save_artifacts`	bool	required	-	Save reusable RGB, pose, intrinsics, depth, and mask artifacts for visualization or downstream use.
`save_slam_map`	bool	`false`	-	Save the sparse SLAM reconstruction map for lightweight COLMAP conversion.
`save_viz`	bool	required	-	Render MP4 visualization videos for the configured visualization attributes.
`viz_downsample`	int	required	>= 1	Downsample factor applied when rendering visualization videos.
`viz_attributes`	list[list[`rgb` \| `instance` \| `depth` \| `pcd` \| `rectified`]]	required	min items 1	Groups of frame attributes to render into visualization videos. Each inner list becomes one panel.
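
The nested shape of `viz_attributes` is easy to get wrong, so here is a minimal check that mirrors the documented type: a non-empty outer list whose inner lists contain only the five allowed attribute names. `check_viz_attributes` is a hypothetical helper written for this page; the real validation is done by the Pydantic model.

```python
ALLOWED_ATTRIBUTES = {"rgb", "instance", "depth", "pcd", "rectified"}


def check_viz_attributes(groups):
    """Validate a list of attribute groups against the documented constraints."""
    if not groups:
        raise ValueError("viz_attributes needs at least one group")
    for index, group in enumerate(groups):
        unknown = [name for name in group if name not in ALLOWED_ATTRIBUTES]
        if unknown:
            raise ValueError(f"group {index} has unknown attributes: {unknown}")
    return groups


# Two groups: one with RGB and depth, one with the point cloud.
layout = check_viz_attributes([["rgb", "depth"], ["pcd"]])
print(len(layout))  # 2
```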

DefaultPipelineConfig

Default annotation pipeline for pinhole and wide-angle videos.

Field	Type	Default	Constraints	Description
`instance`	`vipe.pipeline.default.DefaultAnnotationPipeline`	required	fixed `vipe.pipeline.default.DefaultAnnotationPipeline`	Implementation class for the default annotation pipeline.
`init`	DefaultInitConfig	required	-	Initial camera and instance-mask setup.
`slam`	SLAMConfig	required	-	SLAM and bundle-adjustment configuration.
`post`	PostConfig	required	-	Depth alignment and post-processing configuration.
`output`	OutputConfig	required	-	Output artifact and visualization configuration.
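
Dotted override keys such as `pipeline.post.depth_align_model` follow this nesting: each segment names a field of the enclosing config. The sketch below applies such a key to a plain dictionary to show the mapping. It is not how Hydra resolves overrides internally, `set_by_path` is a hypothetical helper, and the sample values are taken from the `default` preset row.

```python
def set_by_path(config, dotted_key, value):
    """Set an existing nested entry addressed by a dotted key (illustrative)."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        if name not in node:
            raise KeyError(f"unknown section: {name}")
        node = node[name]
    if leaf not in node:
        raise KeyError(f"unknown field: {leaf}")
    node[leaf] = value
    return config


config = {
    "pipeline": {
        "init": {"camera_type": "pinhole"},
        "slam": {"keyframe_depth": "unidepth-l"},
        "post": {"depth_align_model": "adaptive_unidepth-l_svda"},
    }
}

# Equivalent in spirit to pipeline.post.depth_align_model=null (pose-only output).
set_by_path(config, "pipeline.post.depth_align_model", None)
print(config["pipeline"]["post"])  # {'depth_align_model': None}
```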

PanoramaPipelineConfig

Annotation pipeline for 360-degree panorama videos.

Field	Type	Default	Constraints	Description
`instance`	`vipe.pipeline.panorama.PanoramaAnnotationPipeline`	required	fixed `vipe.pipeline.panorama.PanoramaAnnotationPipeline`	Implementation class for the panorama annotation pipeline.
`init`	PanoramaInitConfig	required	-	Initial instance-mask setup for panorama input.
`virtual`	VirtualCameraConfig	required	-	Virtual perspective views projected from each panorama frame.
`slam`	SLAMConfig	required	-	SLAM and bundle-adjustment configuration for virtual views.
`output`	OutputConfig	required	-	Output artifact and visualization configuration.
`post`	PostConfig	required	-	Panorama depth estimation and post-processing configuration.

SLAM

SLAMConfig

DROID-style SLAM frontend, backend, map extraction, and metric-depth options.

Field	Type	Default	Constraints	Description
`buffer`	int	required	>= 1	Maximum number of keyframes stored in the SLAM graph buffer.
`beta`	float	required	>= 0.0	Relative weighting of translation and rotation when measuring frame motion from optical flow.
`filter_thresh`	float	required	>= 0.0	Motion-filter threshold for accepting an incoming frame as a candidate keyframe.
`warmup`	int	required	>= 0	Number of accepted keyframes used before regular frontend updates begin.
`keyframe_thresh`	float	required	>= 0.0	Frontend motion threshold below which the second-newest keyframe is removed.
`frontend_thresh`	float	required	>= 0.0	Distance threshold for adding frontend proximity edges.
`frontend_window`	int	required	>= 1	Number of recent keyframes kept active in the frontend window.
`frontend_radius`	int	required	>= 1	Frame-neighborhood radius forced into the frontend graph.
`frontend_nms`	int	required	>= 1	Non-max suppression radius for frontend proximity edges.
`seq_init`	bool	required	-	Initialize poses sequentially before regular graph optimization.
`frontend_backend_iters`	list[int]	required	-	Accepted-keyframe counts at which the backend runs during frontend initialization.
`backend_thresh`	float	required	>= 0.0	Distance threshold for adding backend proximity edges.
`backend_radius`	int	required	>= 1	Frame-neighborhood radius forced into the backend graph.
`backend_nms`	int	required	>= 1	Non-max suppression radius for backend proximity edges.
`backend_iters`	int	required	>= 1	Number of backend optimization iterations.
`init_disp`	float	required	> 0.0	Initial inverse-depth value assigned to new keyframes.
`optimize_intrinsics`	bool	required	-	Optimize camera intrinsics during bundle adjustment.
`optimize_rig_rotation`	bool	required	-	Optimize rig rotations for multi-view inputs.
`cross_view`	bool	required	-	Add cross-view reprojection factors for multi-view or panorama-derived inputs.
`cross_view_idx`	list[int] \| null	required	-	Optional cross-view index selection. Set to null to use the default neighboring-view selection.
`adaptive_cross_view`	bool	required	-	Recompute cross-view pairs in the backend using current geometry.
`infill_chunk_size`	int	required	>= 1	Chunk size for dense trajectory and disparity infill.
`infill_dense_disp`	bool	required	-	Also optimize dense disparity while filling non-keyframe outputs.
`map_filter_thresh`	float	required	>= 0.0	Depth-consistency threshold used when filtering SLAM-map points and extracting dense disparity.
`visualize`	bool	required	-	Stream SLAM internals to rerun for debugging.
`keyframe_depth`	str \| null	required	-	Metric depth model used on keyframes to recover scale. Examples include metric3d-small, unidepth-l, moge, and dav3. Set to null to skip keyframe metric-depth recovery.
`ba`	BAConfig	required	-	Bundle-adjustment solver options.
`sparse_tracks`	SparseTracksConfig	required	-	Sparse-track backend options.

BAConfig

Bundle-adjustment solver and robust loss options.

Field	Type	Default	Constraints	Description
`dense_disp_alpha`	float	required	>= 0.0	Weight for dense-disparity regularization during bundle adjustment.
`fused`	bool	required	-	Use the fused CUDA bundle-adjustment path; unsupported layouts raise an error.
`intrinsics_damping_scale`	float	required	> 0.0	Multiplier for damping applied to optimized camera intrinsics.
`robust_kernel`	`huber` \| `tukey` \| `gnc_tls` \| null	required	-	Robust loss for dense-flow residuals. Set to null for L2 residuals.
`robust_kernel_threshold`	float	required	> 0.0	Robust-kernel threshold in 1/8-resolution feature-map pixels.
`gnc_mu_init`	float	required	> 0.0	Initial mu value for the GNC-TLS robust-kernel schedule.
`gnc_mu_step`	float	required	> 0.0	Multiplicative mu update step for GNC-TLS.
`gnc_mu_max`	float	required	> 0.0	Maximum mu value for GNC-TLS.
`gnc_n_mu_steps`	int	required	>= 1	Number of GNC-TLS continuation steps.
`gnc_gn_iters_per_mu`	int	required	>= 1	Number of Gauss-Newton iterations to run for each GNC-TLS mu value.
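
The four `gnc_*` schedule fields read naturally as a geometric continuation schedule. The sketch below shows one plausible interpretation: start at `gnc_mu_init`, multiply by `gnc_mu_step` after each step, and cap at `gnc_mu_max`. The exact update order in ViPE is an assumption here, the function name is hypothetical, and the numbers are made up for illustration rather than taken from any preset.

```python
def gnc_mu_schedule(mu_init, mu_step, mu_max, n_mu_steps):
    """Return the mu value used at each continuation step (assumed schedule)."""
    if min(mu_init, mu_step, mu_max) <= 0.0 or n_mu_steps < 1:
        raise ValueError("values violate the documented constraints")
    schedule = []
    mu = mu_init
    for _ in range(n_mu_steps):
        schedule.append(min(mu, mu_max))  # never exceed the configured maximum
        mu *= mu_step  # multiplicative update
    return schedule


schedule = gnc_mu_schedule(mu_init=1.0, mu_step=2.0, mu_max=10.0, n_mu_steps=5)
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 10.0]

# With a hypothetical gnc_gn_iters_per_mu of 3, the total Gauss-Newton
# iteration count is the product of the two settings.
gn_iters_per_mu = 3
total_gn_iters = len(schedule) * gn_iters_per_mu
print(total_gn_iters)  # 15
```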

SparseTracksConfig

Sparse feature-track backend used to seed or support SLAM.

Field	Type	Default	Constraints	Description
`name`	`dummy` \| `cuvslam`	required	choices `dummy` \| `cuvslam`	Sparse-track provider. dummy disables external sparse tracks; cuvslam uses NVIDIA cuVSLAM.