## Overview

In addition to the data, we provide a benchmark suite covering different aspects of semantic scene understanding at different levels of granularity. To ensure unbiased evaluation of these tasks, we follow the common best practice of evaluating test set results on the server side, which lets us keep the test set labels private.

Test set evaluation is performed using CodaLab competitions. For each task, we set up a competition that handles the submissions and scores them using the non-public labels for the test set sequences. See the individual competition websites for further details on the participation process. Here, we provide only a short task description and the leaderboards.

## Semantic Segmentation

See our competition website for more information on the competition and submission process.

Task. In semantic segmentation of point clouds, we want to infer the label of each three-dimensional point. The input to all evaluated methods is therefore a list of coordinates of the three-dimensional points along with their remission, i.e., the strength of the reflected laser beam, which depends on the properties of the surface that was hit. Each method should then output a label for each point of a scan, i.e., one full turn of the rotating LiDAR sensor.

We evaluate two settings for this task: methods using a single scan as input and methods using multiple scans as input. With single scan evaluation, we do not distinguish between moving and non-moving objects, i.e., moving and non-moving are mapped to a single class. With multiple scan evaluation, we distinguish between moving and non-moving objects, which makes the task harder, since the method has to decide whether something is dynamic.
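The single scan mapping described above can be sketched as a simple lookup. This is a minimal illustration only: the class names and the `MOVING_TO_STATIC` table are hypothetical and do not reproduce the benchmark's actual label definitions.

```python
# Hypothetical class names for illustration; the actual label set is
# defined by the benchmark and is not reproduced here.
MOVING_TO_STATIC = {
    "moving-car": "car",
    "moving-person": "person",
}


def to_single_scan_labels(labels):
    """Map each moving class to its non-moving counterpart.

    Labels without an entry in MOVING_TO_STATIC are returned unchanged.
    """
    return [MOVING_TO_STATIC.get(label, label) for label in labels]


multi_scan_labels = ["moving-car", "car", "road", "moving-person"]
single_scan_labels = to_single_scan_labels(multi_scan_labels)
```

In the multiple scan setting, the labels would be left as they are, so a method must predict `moving-car` and `car` as separate classes.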

Metric. We use the mean Jaccard index, also called mean intersection-over-union (mIoU), over all classes, i.e., $$\frac{1}{C} \sum^C_{c=1} \frac{TP_c}{TP_c + FP_c + FN_c},$$ where $TP_c$, $FP_c$, and $FN_c$ correspond to the number of true positive, false positive, and false negative predictions for class $c$, and $C$ is the number of classes.
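As a rough illustration of this formula, the following sketch computes mIoU from two lists of integer class IDs. The function name `mean_iou` and the choice to score a class with an empty denominator as 0 are assumptions made for this example, not the behavior of the evaluation server.

```python
def mean_iou(gt, pred, num_classes):
    """Compute mIoU from per-point ground truth and predicted class IDs."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes

    for g, p in zip(gt, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # point was wrongly assigned to class p
            fn[g] += 1  # point of class g was missed

    iou_sum = 0.0
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        # Assumption: a class that never occurs contributes an IoU of 0.
        iou_sum += tp[c] / denom if denom else 0.0
    return iou_sum / num_classes


gt = [0, 0, 1, 1, 2]
pred = [0, 1, 1, 1, 2]
score = mean_iou(gt, pred, num_classes=3)
```

For this toy input, the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18.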

Leaderboard. The following leaderboard contains only published approaches for which we can provide at least an arXiv link. (Last updated: .)

## Panoptic Segmentation

See our competition website for more information on the competition and submission process.

Task. In panoptic segmentation of point clouds, we want to infer the label of each three-dimensional point and the instance of so-called thing classes. The input to all evaluated methods is therefore a list of coordinates of the three-dimensional points along with their remission, i.e., the strength of the reflected laser beam, which depends on the properties of the surface that was hit. Each method should then output a label for each point of a scan, i.e., one full turn of the rotating LiDAR sensor, together with the instance for points of thing classes.

Metric. We use the panoptic quality (PQ) proposed by Kirillov et al., defined as $$\frac{1}{C} \sum^C_{c=1} \frac{ \sum_{(\mathcal{S}, \hat{\mathcal{S}}) \in \text{TP}_c} \text{IoU}(\mathcal{S}, \hat{\mathcal{S}})}{|\text{TP}_c| + \frac{1}{2}|\text{FP}_c| + \frac{1}{2}|\text{FN}_c|},$$ where $\text{TP}_c$, $\text{FP}_c$, and $\text{FN}_c$ correspond to the sets of true positive, false positive, and false negative matches for class $c$, and $C$ is the number of classes. A match between segments is a true positive if their IoU (intersection-over-union) is larger than 0.5. To account for segments of stuff classes that have multiple connected components, Porzi et al. proposed a modified metric PQ$^\dagger$ that uses just the IoU for stuff classes without distinguishing between different segments.
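The PQ formula can be sketched as follows for segments given as sets of point indices. This is a minimal illustration under stated assumptions: the function name, the dictionary layout, and the class names are hypothetical, segments of one class are assumed not to overlap (so a match with IoU above 0.5 is unique), and a class without any segments is scored as 0.

```python
def panoptic_quality(gt_segments, pred_segments, classes):
    """Compute PQ; segments are dicts mapping class -> list of point-index sets."""
    pq_sum = 0.0
    for c in classes:
        gts = gt_segments.get(c, [])
        preds = pred_segments.get(c, [])
        matched = set()  # indices of predictions already matched
        iou_sum = 0.0
        tp = 0
        for g in gts:
            for j, p in enumerate(preds):
                if j in matched:
                    continue
                union = len(g | p)
                iou = len(g & p) / union if union else 0.0
                if iou > 0.5:  # true positive match
                    matched.add(j)
                    iou_sum += iou
                    tp += 1
                    break
        fp = len(preds) - tp  # unmatched predicted segments
        fn = len(gts) - tp    # unmatched ground truth segments
        denom = tp + 0.5 * fp + 0.5 * fn
        # Assumption: a class with no segments at all contributes 0.
        pq_sum += iou_sum / denom if denom else 0.0
    return pq_sum / len(classes)


# Toy scan with ten points (indices 0..9) and hypothetical class names.
gt = {"car": [{0, 1, 2, 3}, {4, 5}], "road": [{6, 7, 8, 9}]}
pred = {"car": [{0, 1, 2}, {3, 4}], "road": [{5, 6, 7, 8, 9}]}
pq = panoptic_quality(gt, pred, classes=["car", "road"])
```

In this toy example, one car segment is matched with IoU 0.75 while the other car segments remain unmatched, giving 0.75 / (1 + 0.5 + 0.5) = 0.375 for car. The road segment is matched with IoU 0.8, so the overall PQ is (0.375 + 0.8) / 2 = 0.5875.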

Leaderboard. The following leaderboard contains only published approaches for which we can provide at least an arXiv link. (Last updated: April 1st, 2020.)

## Semantic Scene Completion

See our competition website for more information on the competition and submission process.

Task. In semantic scene completion, we are interested in predicting the complete scene inside a certain volume from a single initial scan. More specifically, we use as input a voxel grid, where each voxel is marked as empty or occupied, depending on whether or not it contains a laser measurement. For semantic scene completion, one needs to predict whether a voxel is occupied and its semantic label in the completed scene.
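To make the input representation concrete, the sketch below marks a voxel as occupied when at least one laser measurement falls inside it. The grid origin, voxel size, and grid dimensions used here are hypothetical values chosen for illustration, not the benchmark's actual volume.

```python
import math


def voxelize(points, origin, voxel_size, dims):
    """Return the set of (i, j, k) indices of voxels containing a point.

    Voxels not in the returned set are empty. Points outside the grid
    are ignored.
    """
    occupied = set()
    for point in points:
        index = tuple(
            math.floor((coord - start) / voxel_size)
            for coord, start in zip(point, origin)
        )
        if all(0 <= i < d for i, d in zip(index, dims)):
            occupied.add(index)
    return occupied


# Hypothetical grid: 4 x 4 x 4 voxels of 0.2 m, starting at the origin.
points = [(0.1, 0.1, 0.1), (0.15, 0.05, 0.0), (0.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
occupied = voxelize(points, origin=(0.0, 0.0, 0.0), voxel_size=0.2, dims=(4, 4, 4))
```

Here the first two points fall into the same voxel, the third into another, and the last lies outside the grid, so two voxels are marked as occupied.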

Metric. For evaluation, we follow the protocol of Song et al. We compute the intersection-over-union (IoU) for the task of scene completion, which only classifies a voxel as occupied or empty, i.e., it ignores the semantic label. For the task of semantic scene completion, we compute the mIoU over the same 19 classes that were used for the single scan semantic segmentation task.
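The scene completion IoU can be sketched by reducing each voxel label to occupied or empty before counting. This is a minimal illustration: the function name and the use of label 0 for empty voxels are assumptions made for this example.

```python
EMPTY = 0  # assumption: label 0 marks an empty voxel


def completion_iou(gt, pred):
    """Compute the occupancy IoU, ignoring which semantic class is assigned."""
    tp = fp = fn = 0
    for g, p in zip(gt, pred):
        gt_occupied = g != EMPTY
        pred_occupied = p != EMPTY
        if gt_occupied and pred_occupied:
            tp += 1
        elif pred_occupied:
            fp += 1  # predicted occupied, but actually empty
        elif gt_occupied:
            fn += 1  # predicted empty, but actually occupied
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


gt_voxels = [0, 1, 2, 2, 0]
pred_voxels = [0, 2, 2, 0, 1]
iou = completion_iou(gt_voxels, pred_voxels)
```

In this toy example, the second voxel counts as a true positive even though the predicted class differs from the ground truth, because only occupancy matters here. The result is 2 / (2 + 1 + 1) = 0.5.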