Thanks to the rapid progress in RGB & thermal imaging, also known as multispectral imaging,
the task of multispectral video semantic segmentation, or MVSS for short,
has recently drawn significant attention.
Notably, it offers new opportunities for improving segmentation performance under unfavorable visual conditions such as poor lighting or overexposure.
Unfortunately, very few datasets are currently available;
one example is the MVSeg dataset, which focuses purely on the eye-level view
and is only sparsely annotated because of the intensive demands of the labeling process.
To confront these challenges, this paper presents two major contributions to advance MVSS:
the introduction of MVUAV, a new MVSS benchmark dataset, and the development of a dedicated semi-supervised MVSS baseline, SemiMV.
Our MVUAV dataset is captured via Unmanned Aerial Vehicles (UAVs),
offering a unique oblique bird’s-eye view complementary to that of the existing MVSS datasets;
it also encompasses a broad range of day/night lighting conditions and over 30 semantic categories.
Meanwhile, to better leverage the sparse annotations and extra unlabeled RGB-Thermal videos,
a semi-supervised learning baseline, SemiMV,
is proposed to enforce consistency regularization through a dedicated Cross-collaborative Consistency Learning (C3L) module and a denoised temporal aggregation strategy.
Comprehensive empirical evaluations on both the MVSeg and MVUAV benchmark datasets demonstrate the efficacy of our SemiMV baseline.
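To make the idea of consistency regularization on unlabeled frames more concrete, the following is a minimal sketch of a generic cross pseudo-label loss between two prediction branches. It is not the C3L module itself, whose details are not given here; the two-branch setup, the function names (`cross_consistency_loss` and its helpers), and the use of hard pseudo-labels are all assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to class index `target`."""
    return -math.log(softmax(logits)[target])


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def cross_consistency_loss(logits_a, logits_b):
    """Mean cross pseudo-label loss over the pixels of one unlabeled frame.

    logits_a, logits_b: lists of per-pixel class logits produced by two
    hypothetical branches (for example, one per modality). Each branch is
    supervised by the hard pseudo-label predicted by the other branch.
    """
    if not logits_a or len(logits_a) != len(logits_b):
        raise ValueError("both branches must predict the same non-empty pixel set")
    total = 0.0
    for pixel_a, pixel_b in zip(logits_a, logits_b):
        pseudo_a = argmax(pixel_a)  # pseudo-label from branch A
        pseudo_b = argmax(pixel_b)  # pseudo-label from branch B
        total += cross_entropy(pixel_a, pseudo_b) + cross_entropy(pixel_b, pseudo_a)
    return total / len(logits_a)


# One pixel, three classes: agreeing branches versus disagreeing branches.
agree = cross_consistency_loss([[4.0, 0.0, 0.0]], [[3.0, 0.0, 0.0]])
disagree = cross_consistency_loss([[4.0, 0.0, 0.0]], [[0.0, 3.0, 0.0]])
```

In this sketch the loss is small when the two branches agree on a pixel and large when they disagree, which is the behavior a consistency term is meant to encourage on unlabeled data. In practice such a term would be computed with a tensor library and combined with a supervised loss on the sparsely labeled frames.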
We introduce MVUAV, a new MVSS dataset containing a wide range of RGB-T videos captured by Unmanned Aerial Vehicles (UAVs) from an oblique bird’s-eye viewpoint. This viewpoint offers a complementary perspective to the eye-level viewpoint adopted by the existing MVSeg dataset.
MVUAV Examples
The MVUAV dataset captures diverse real-world scenarios such as roads, streets, bridges, parks, seas, beaches, courts and schools; it also spans different lighting conditions from daytime to low-light and even pitch-dark scenarios.
Illustration of the information used in the semi-supervised MVSS (Semi-MVSS) task and related semantic segmentation tasks.
Illustration of the proposed semi-supervised MVSS framework, namely SemiMV.
We visualize multispectral video sequences from both the MVSeg and MVUAV datasets, alongside the segmentation results of the SupOnly baseline and our SemiMV method. SemiMV produces more accurate segmentation predictions by effectively exploiting both labeled and unlabeled multispectral videos.