Overview

We present a large-scale dataset based on the KITTI Vision Benchmark, using all sequences provided by the odometry task. We provide dense annotations for each individual scan of sequences 00-10, which enables the use of multiple sequential scans for semantic scene interpretation tasks such as semantic segmentation and semantic scene completion.

The remaining sequences, i.e., sequences 11-21, are used as a test set covering a large variety of challenging traffic situations and environment types. Labels for the test set are not provided; instead, we use an evaluation service that scores submissions and provides test set results.


Classes

The dataset contains 28 classes, including classes that distinguish between non-moving and moving objects. Overall, the classes cover traffic participants as well as functional ground classes such as parking areas and sidewalks.


Folder structure and format

Semantic Segmentation and Panoptic Segmentation

For each scan XXXXXX.bin in the velodyne folder of a sequence folder of the original KITTI Odometry Benchmark, we provide a file XXXXXX.label in the labels folder that contains a label for each point in binary format. The label is a 32-bit unsigned integer (aka uint32_t) for each point, where the lower 16 bits correspond to the semantic label. The upper 16 bits encode the instance id, which is temporally consistent over the whole sequence, i.e., the same object in two different scans gets the same id. This holds for moving cars, but also for static objects seen again after loop closures.
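As a minimal reading sketch (the development kit provides equivalent functionality), the per-point labels can be loaded and split with numpy; the filename is just an example:

  import numpy as np

  # each entry is a uint32: lower 16 bits = semantic label, upper 16 bits = instance id
  label = np.fromfile("000000.label", dtype=np.uint32)
  semantic_label = label & 0xFFFF   # semantic class of each point
  instance_id = label >> 16         # temporally consistent instance id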

We furthermore provide the poses.txt file that contains the poses, estimated by a surfel-based SLAM approach (SuMa), which we used to annotate the data.
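Assuming the KITTI odometry pose format (12 values per line forming a row-major 3x4 transformation matrix), a minimal loading sketch with numpy looks as follows:

  import numpy as np

  poses = []
  with open("poses.txt") as f:
      for line in f:
          values = np.array(line.split(), dtype=np.float64)
          if values.size != 12:     # skip empty or malformed lines
              continue
          pose = np.eye(4)          # extend to a 4x4 homogeneous transform
          pose[:3, :4] = values.reshape(3, 4)
          poses.append(pose)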

Semantic Scene Completion

For each scan XXXXXX.bin in the velodyne folder of a sequence folder of the original KITTI Odometry Benchmark, we provide in the voxel folder:

  • a file XXXXXX.bin in a packed binary format that contains for each voxel a flag indicating whether that voxel is occupied by laser measurements. This is the input to the semantic scene completion task and it corresponds to the voxelization of a single LiDAR scan.
  • a file XXXXXX.label that contains for each voxel of the completed scene a label in binary format. The label is a 16-bit unsigned integer (aka uint16_t) for each voxel.
  • a file XXXXXX.invalid in a packed binary format that contains for each voxel a flag indicating whether that voxel is considered invalid, i.e., the voxel was never directly observed from any of the poses used to generate the completed scene. These voxels are not considered in the evaluation.
  • a file XXXXXX.occluded in a packed binary format that contains for each voxel a flag that specifies whether this voxel is either occupied by LiDAR measurements or occluded by a voxel in the line of sight of all poses used to generate the completed scene.
The label, invalid, and occluded files are only given for the training data; the label file must be predicted for the semantic scene completion task.
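A minimal sketch for reading the voxel labels with numpy; the grid dimensions used here (256 x 256 x 32) are an assumption and should be taken from the development kit's configuration:

  import numpy as np

  GRID_DIMS = (256, 256, 32)  # assumed voxel grid resolution; see the development kit

  # one uint16 label per voxel of the completed scene
  voxel_labels = np.fromfile("000000.label", dtype=np.uint16).reshape(GRID_DIMS)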

To allow a higher compression rate, we store the binary flags in a custom format, where we store the flags as bit flags, i.e., each byte of the file corresponds to 8 voxels in the unpacked voxel grid. Please see the development kit for further information on how to efficiently read these files using numpy.
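As a rough illustration (assuming the same 256 x 256 x 32 grid as above and most-significant-bit-first packing), a packed flag file could be unpacked like this:

  import numpy as np

  GRID_DIMS = (256, 256, 32)  # assumed voxel grid resolution; see the development kit

  # each byte packs the flags of 8 consecutive voxels (MSB-first packing assumed)
  compressed = np.fromfile("000000.invalid", dtype=np.uint8)
  flags = np.unpackbits(compressed).reshape(GRID_DIMS)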

See also our development kit for further information on the labels and on reading them using Python. The development kit also provides tools for visualizing the point clouds.


License

Our dataset is based on the KITTI Vision Benchmark and therefore we distribute the data under the Creative Commons Attribution-NonCommercial-ShareAlike license. You are free to share and adapt the data, but you have to give appropriate credit and may not use the work for commercial purposes.

Specifically, you should cite our work:

@inproceedings{behley2019iccv,
  author = {J. Behley and M. Garbade and A. Milioto and J. Quenzel and S. Behnke and C. Stachniss and J. Gall},
  title = {{SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences}},
  booktitle = {Proc.~of the IEEE/CVF International Conf.~on Computer Vision (ICCV)},
  year = {2019}
}

Please also cite the original KITTI Vision Benchmark:

@inproceedings{geiger2012cvpr,
  author = {A. Geiger and P. Lenz and R. Urtasun},
  title = {{Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite}},
  booktitle = {Proc.~of the IEEE Conf.~on Computer Vision and Pattern Recognition (CVPR)},
  pages = {3354--3361},
  year = {2012}
}

Download

Semantic Segmentation and Panoptic Segmentation

We only provide the label files; the remaining files must be downloaded from the KITTI Vision Benchmark. In particular, the following steps are needed to get the complete data:

  1. Download KITTI Odometry Benchmark Velodyne point clouds (80 GB)
  2. Download KITTI Odometry Benchmark calibration data (1 MB)
  3. Download label data (179 MB)
  4. Extract everything into the same folder. The folder structure inside the zip files of our labels matches the folder structure of the original data (see the sketch below).
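For reference, the merged layout should roughly look as follows (sequence 00 shown; the top-level folder name is an example and up to you):

  dataset/
    sequences/
      00/
        velodyne/        # point clouds from the KITTI Odometry Benchmark
          000000.bin
          ...
        labels/          # our label files
          000000.label
          ...
        calib.txt
        poses.txt
      01/
      ...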

Semantic Scene Completion

Note: On August 24, 2020, we updated the data to fix an issue with the voxelizer. Ensure that you have version 1.1 of the data!

We provide the voxel grids for learning and inference; to obtain them, download the voxel data (700 MB). This archive contains the training data (all files) and the test data (only bin files). Refer to the development kit to see how to read our binary files.

We additionally provide all extracted data for the training set, which can be downloaded here (3.3 GB). This does not contain the test bin files.