Overview
We present a large-scale dataset based on the KITTI Vision Benchmark, using all sequences provided by the odometry task. We provide dense annotations for each individual scan of sequences 00-10, which enables the usage of multiple sequential scans for semantic scene interpretation, such as semantic segmentation and semantic scene completion.
The remaining sequences, i.e., sequences 11-21, are used as a test set covering a large variety of challenging traffic situations and environment types. Labels for the test set are not provided; instead, we use an evaluation service that scores submissions and provides test set results.
Classes
The dataset contains 28 classes, including classes that distinguish between non-moving and moving objects. Overall, our classes cover traffic participants, but also functional classes for ground, such as parking areas and sidewalks.
Folder structure and format
Semantic Segmentation and Panoptic Segmentation
For each scan XXXXXX.bin of the velodyne folder in the sequence folder of the original KITTI Odometry Benchmark, we provide a file XXXXXX.label in the labels folder that contains a label in binary format for each point. The label is a 32-bit unsigned integer (aka uint32_t) per point, where the lower 16 bits correspond to the semantic label and the upper 16 bits encode the instance id. The instance id is temporally consistent over the whole sequence, i.e., the same object in two different scans gets the same id. This holds for moving cars, but also for static objects that are seen again after loop closures.
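As a minimal sketch of decoding these files with numpy (the file path is a placeholder), the two 16-bit parts can be split with bit operations:

import numpy as np

# one uint32 per point: lower 16 bits = semantic label, upper 16 bits = instance id
label = np.fromfile("sequences/00/labels/000000.label", dtype=np.uint32)
semantic = label & 0xFFFF  # semantic class of each point
instance = label >> 16     # temporally consistent instance id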
We furthermore provide a poses.txt file containing the poses, estimated by a surfel-based SLAM approach (SuMa), that we used to annotate the data.
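The poses can also be read with numpy; the following sketch assumes the KITTI odometry convention of one pose per scan, stored as the 12 entries of a 3x4 transformation matrix in row-major order:

import numpy as np

# one line per scan: 12 floats forming a 3x4 pose matrix (row-major)
poses = np.loadtxt("sequences/00/poses.txt").reshape(-1, 3, 4)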
Semantic Scene Completion
For each scan XXXXXX.bin of the velodyne folder in the sequence folder of the original KITTI Odometry Benchmark, we provide in the voxels folder:
- a file XXXXXX.bin in a packed binary format that contains for each voxel a flag indicating whether that voxel is occupied by laser measurements. This is the input to the semantic scene completion task and corresponds to the voxelization of a single LiDAR scan.
- a file XXXXXX.label that contains for each voxel of the completed scene a label in binary format. The label is a 16-bit unsigned integer (aka uint16_t) per voxel.
- a file XXXXXX.invalid in a packed binary format that contains for each voxel a flag indicating whether that voxel is considered invalid, i.e., the voxel is never directly observed from any of the poses used to generate the completed scene. These voxels are also not considered in the evaluation.
- a file XXXXXX.occluded in a packed binary format that contains for each voxel a flag that specifies whether this voxel is either occupied by LiDAR measurements or occluded by a voxel in the line of sight of all poses used to generate the completed scene.
To allow a higher compression rate, we store the binary flags in a custom packed format: each flag occupies a single bit, i.e., each byte of the file corresponds to 8 voxels of the unpacked voxel grid. Please see the development kit for further information on how to efficiently read these files using numpy.
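As a sketch of such a read (assuming a 256x256x32 voxel grid; the file paths are placeholders):

import numpy as np

# voxel labels: one uint16 per voxel of the completed scene
labels = np.fromfile("sequences/00/voxels/000000.label", dtype=np.uint16)
labels = labels.reshape(256, 256, 32)  # assuming a 256x256x32 grid

# packed bit flags: 8 voxels per byte, unpacked to one flag per voxel
packed = np.fromfile("sequences/00/voxels/000000.invalid", dtype=np.uint8)
invalid = np.unpackbits(packed).reshape(256, 256, 32)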
See also our development kit for further information on the labels and on reading them using Python. The development kit also provides tools for visualizing the point clouds.
License
Our dataset is based on the KITTI Vision Benchmark and therefore we distribute the data under the Creative Commons Attribution-NonCommercial-ShareAlike license. You are free to share and adapt the data, but you have to give appropriate credit and may not use the work for commercial purposes.
Specifically, you should cite our work (PDF):
@inproceedings{behley2019iccv,
author = {J. Behley and M. Garbade and A. Milioto and J. Quenzel and S. Behnke and C. Stachniss and J. Gall},
title = {{SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences}},
booktitle = {Proc.~of the IEEE/CVF International Conf.~on Computer Vision (ICCV)},
year = {2019}
}
Please also cite the original KITTI Vision Benchmark:
@inproceedings{geiger2012cvpr,
author = {A. Geiger and P. Lenz and R. Urtasun},
title = {{Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite}},
booktitle = {Proc.~of the IEEE Conf.~on Computer Vision and Pattern Recognition (CVPR)},
pages = {3354--3361},
year = {2012}
}
Download
Semantic Segmentation and Panoptic Segmentation
We provide only the label files; the remaining files must be downloaded from the KITTI Vision Benchmark. In particular, the following steps are needed to get the complete data:
- Download KITTI Odometry Benchmark Velodyne point clouds (80 GB)
- Download KITTI Odometry Benchmark calibration data (1 MB)
- Download SemanticKITTI label data (179 MB)
- Extract everything into the same folder. The folder structure inside the zip files of our labels matches the folder structure of the original data.
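After extraction, the combined folder structure should look as follows (a sketch, showing sequence 00 as an example; the root folder name may differ):

sequences/
  00/
    poses.txt
    calib.txt
    velodyne/
      000000.bin
      000001.bin
      ...
    labels/
      000000.label
      000001.label
      ...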
Semantic Scene Completion
Note: On August 24, 2020, we updated the data to fix an issue with the voxelizer. Ensure that you have version 1.1 of the data!
We provide the voxel grids for learning and inference; to get them, download the SemanticKITTI voxel data (700 MB). This archive contains the training data (all files) and the test data (only the bin files). Refer to the development kit to see how to read our binary files.
We additionally provide all extracted data for the training set, which can be downloaded here (3.3 GB). This does not contain the test bin files.