In order to be eligible for the official ranking, any submission must be described in a corresponding paper (see section on Paper Submission).

There are two tasks this year:

Task 1: Primary tumor (GTVp) and lymph nodes (GTVn) segmentation in PET/CT images.
Task 2: Recurrence-Free Survival (RFS) prediction relying on PET/CT images and/or available clinical information.

Evaluation of Task 1

Algorithms producing fully-automatic segmentation of the test cases will be assessed.

  1. The predicted segmentation masks should have the same resolution as the CT images; masks that do not will not be resampled. Each voxel should take the value 1 for the predicted GTVp, 2 for the predicted GTVn, and 0 for the background.
  2. We will use a metric, the aggregated DSC (DSC_agg), adapted from the Aggregated Jaccard Index of [Kumar et al. 2017] but without its detection step. That step is designed for nuclei segmentation, where many separate objects must be matched, and is less relevant for our tasks. It is also sensitive to the order in which the GT objects are iterated over (step 2 of Algorithm 1 in [Kumar et al. 2017]), which is problematic for reproducibility. Separately for GTVp and GTVn, we will accumulate, across all test images, the intersection between the ground truth and the predicted volume and the sum of their volumes. For a given image, these quantities can be zero for GTVp or for GTVn, since some cases contain no GTVp or no GTVn. Such a case would have an undefined per-image DSC (0/0), but it simply adds zero to the aggregated sums. Finally, for each structure, the aggregated DSC is twice the aggregated intersection divided by the aggregated volume sum.

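$$\mathrm{DSC}_{agg} = \frac{2\sum_{i=1}^{N}\sum_{k} G_i^k P_i^k}{\sum_{i=1}^{N}\sum_{k}\left(G_i^k + P_i^k\right)},$$

where N is the number of test images, G_i^k \in \{0,1\} is the ground truth (GTVp or GTVn) for voxel k of image i, and P_i^k \in \{0,1\} is the corresponding prediction.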
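The definition of DSC_agg for a single structure can be sketched in Python with NumPy. This is a minimal illustration, not the challenge's evaluation code: it assumes each image's ground truth and prediction for one structure are already loaded as binary arrays on the same grid, and the function name `aggregated_dsc` and the choice to return NaN when the structure never appears are hypothetical.

```python
import numpy as np


def aggregated_dsc(gts, preds):
    """DSC_agg for one structure (GTVp or GTVn) over all N test images.

    gts, preds: sequences of binary NumPy arrays, one pair per image i;
    element k of each array plays the role of G_i^k and P_i^k.
    """
    if len(gts) != len(preds):
        raise ValueError("expected one prediction per ground-truth image")
    intersection = 0  # sum over i, k of G_i^k * P_i^k
    volume_sum = 0    # sum over i, k of (G_i^k + P_i^k)
    for g, p in zip(gts, preds):
        g = np.asarray(g, dtype=bool)
        p = np.asarray(p, dtype=bool)
        intersection += int(np.logical_and(g, p).sum())
        volume_sum += int(g.sum()) + int(p.sum())
    if volume_sum == 0:
        # Hypothetical choice: the structure is absent from every GT and prediction.
        return float("nan")
    return 2.0 * intersection / volume_sum


# Image 2 contains no structure at all: its per-image DSC would be 0/0,
# but here it simply contributes nothing to either sum.
gts = [np.array([1, 1, 0, 0]), np.array([0, 0, 0, 0])]
preds = [np.array([1, 0, 0, 0]), np.array([0, 0, 0, 0])]
print(aggregated_dsc(gts, preds))  # 2 * 1 / (2 + 1)
```

Accumulating the two sums before dividing is what lets empty cases be included without special handling.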

DSC_agg will be computed separately for GTVp and GTVn on the test set, and the average of the two values will be used for the ranking. This choice gives equal importance to the two GTV types, so algorithms must perform well on both lesion types.
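The Task 1 ranking value described here can be sketched end to end: basic checks on each submitted label map, per-structure accumulation, and the unweighted mean of the two DSC_agg values. This is a hypothetical sketch, not the organizers' script. It assumes label maps are NumPy arrays, uses the ground-truth shape as a stand-in for the CT grid (a real check would also compare voxel spacing, origin and orientation from the image headers), and assumes both structures occur somewhere in the test set.

```python
import numpy as np

# Label convention for submitted masks: 0 = background, 1 = GTVp, 2 = GTVn.
LABELS = {"GTVp": 1, "GTVn": 2}


def task1_score(gt_maps, pred_maps):
    """Return DSC_agg for GTVp and GTVn and their unweighted mean.

    gt_maps, pred_maps: sequences of integer label maps as NumPy arrays,
    one pair per test image. Reading the image files is not shown.
    """
    if len(gt_maps) != len(pred_maps):
        raise ValueError("expected one prediction per test image")
    inter = {name: 0 for name in LABELS}
    vol = {name: 0 for name in LABELS}
    for gt, pred in zip(gt_maps, pred_maps):
        # Predictions are not resampled, so the grids must already match.
        if gt.shape != pred.shape:
            raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
        extra = set(np.unique(pred).tolist()) - {0, 1, 2}
        if extra:
            raise ValueError(f"unexpected label values: {sorted(extra)}")
        for name, label in LABELS.items():
            g = gt == label
            p = pred == label
            inter[name] += int(np.logical_and(g, p).sum())
            vol[name] += int(g.sum()) + int(p.sum())
    # Assumes each structure occurs somewhere in the test set (vol > 0).
    scores = {name: 2.0 * inter[name] / vol[name] for name in LABELS}
    # Equal weight for both structures, whatever their total volume.
    scores["mean"] = (scores["GTVp"] + scores["GTVn"]) / 2.0
    return scores


gt = [np.array([1, 1, 1, 1, 1, 1, 2, 0])]
pred = [np.array([1, 1, 1, 1, 1, 1, 0, 0])]
print(task1_score(gt, pred))
```

In this one-image example GTVp is segmented perfectly and the single GTVn voxel is missed, giving DSC_agg values of 1.0 and 0.0 and a final score of 0.5: the missed lymph node costs as much as the large primary tumour is worth, which is the intended effect of averaging per structure.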


Evaluation of Task 2

Algorithms producing fully automatic predictions of Recurrence-Free Survival (RFS; see the Ground Truth section for the exact definition) for the test cases will be assessed.

The ranking will be based on the concordance index (C-index) computed on the test data. The C-index quantifies the model’s ability to correctly rank the survival times according to the predicted individual risk scores, and it generalizes the area under the ROC curve (AUC). It accounts for censored data and provides a global assessment of the model’s discriminative power.
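To illustrate how censoring enters the C-index, here is a minimal sketch of Harrell's C-index in plain Python. It assumes that higher risk scores mean earlier expected recurrence, treats a pair as comparable only when the patient with the shorter time had an observed event, counts tied risk scores as one half, and simply skips pairs with tied times. The challenge's exact tie handling and score convention are not specified here, and the function name is hypothetical. Like the AUC, it is the fraction of comparable pairs that the model orders correctly.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data (simplified).

    times:  observed follow-up times (RFS time or censoring time)
    events: 1 if the event was observed, 0 if the patient was censored
    risks:  predicted risk scores; higher means earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the shorter time is an observed event;
        # tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Patient 3 (time 6) is censored, so the pair (time 6, time 8) is not comparable.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

Established implementations also exist, for example `concordance_index` in the lifelines package; it expects scores where higher means longer survival, so risk scores would be passed negated.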


REFERENCES

[Kumar et al. 2017] Kumar N, et al. "A dataset and a technique for generalized nuclear segmentation for computational pathology." IEEE Transactions on Medical Imaging, 36(7): 1550-1560 (2017).