To obtain the data, please follow the instructions at this link. Once your request has been approved, you will be granted access to the Data Download page, where the data can be downloaded.

DATASET STRUCTURE
Training

The training dataset is organized according to the following folder structure:

hecktor2022_training/
├── imagesTr
│   ├── CHUM-001__CT.nii.gz
│   ├── CHUM-001__PT.nii.gz
│   └── ...
├── labelsTr
│   ├── CHUM-001.nii.gz
│   └── ...
├── hecktor2022_clinical_info_training.csv
└── hecktor2022_endpoint_training.csv

All PET/CT images are gathered in the imagesTr folder and follow the naming convention CenterName-PatientID__Modality.nii.gz (e.g. CHUM-001__CT.nii.gz). The segmentations of the primary tumor (GTVp) and lymph nodes (GTVn) are in the labelsTr folder, stored in one .nii.gz file per patient. Label 1 is assigned to GTVp and label 2 to GTVn.
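
The naming convention makes it easy to group files per patient. The sketch below uses hypothetical helpers that are not part of the HECKTOR tooling. It assumes image names of the form CHUM-001__CT.nii.gz (capital-letter center acronym, a hyphen, a numeric patient ID, two underscores, then CT or PT) and lists the patients for which both modalities are present.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern: <CenterName>-<PatientID>__<Modality>.nii.gz, e.g. "CHUM-001__CT.nii.gz"
IMAGE_NAME = re.compile(
    r"^(?P<center>[A-Z]+)-(?P<number>\d+)__(?P<modality>CT|PT)\.nii\.gz$"
)


def index_images(paths):
    """Map each patient ID (e.g. "CHUM-001") to {"CT": path, "PT": path}."""
    patients = defaultdict(dict)
    for path in paths:
        match = IMAGE_NAME.match(Path(path).name)
        if match is None:
            continue  # ignore files that do not follow the convention
        patient_id = f"{match['center']}-{match['number']}"
        patients[patient_id][match["modality"]] = str(path)
    return dict(patients)


def complete_patients(patients):
    """Sorted (patient_id, ct_path, pet_path) tuples for patients with both modalities."""
    return [
        (pid, files["CT"], files["PT"])
        for pid, files in sorted(patients.items())
        if "CT" in files and "PT" in files
    ]
```

For a local copy, index_images(Path("hecktor2022_training/imagesTr").glob("*.nii.gz")) would index the training images; the same call works on imagesTs.
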
The clinical information for each patient is provided in hecktor2022_clinical_info_training.csv and includes center, gender, age, weight, tobacco and alcohol consumption, performance status (Zubrod), HPV status, and treatment (surgery and/or chemotherapy in addition to the radiotherapy that all patients underwent). Some of this information may be missing for some patients. The survival events and the times (in days) between the end of radiotherapy and the event or last follow-up are provided in hecktor2022_endpoint_training.csv. For Task 2, only patients with a complete response are included in this file, so it contains fewer cases than the image set and the clinical data file used for Task 1.
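
Because the endpoint file covers only a subset of patients, joining it with the clinical file on the patient identifier recovers the Task 2 cohort together with its clinical variables. This is a minimal standard-library sketch; the key column name PatientID and the endpoint columns used in the demo (Relapse, RFS) are assumptions to check against the actual CSV headers.

```python
import csv
import io


def read_by_patient(handle, key="PatientID"):
    """Read a CSV file into {patient_id: row}; the key column name is an assumption."""
    return {row[key]: row for row in csv.DictReader(handle)}


def task2_cohort(clinical_handle, endpoint_handle, key="PatientID"):
    """Merge clinical and endpoint rows, keeping only patients listed in the endpoint file."""
    clinical = read_by_patient(clinical_handle, key)
    endpoints = read_by_patient(endpoint_handle, key)
    return {
        pid: {**clinical[pid], **outcome}
        for pid, outcome in endpoints.items()
        if pid in clinical
    }


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the two CSV files (hypothetical columns and values).
    clinical_csv = io.StringIO("PatientID,Age\nCHUP-001,61\nCHUP-002,57\n")
    endpoint_csv = io.StringIO("PatientID,Relapse,RFS\nCHUP-001,0,830\n")
    print(sorted(task2_cohort(clinical_csv, endpoint_csv)))  # ['CHUP-001']
```

With the real files, open each one with open(path, newline="") and pass the file handles instead of the StringIO objects.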

Training cases:

Cases  | CHUM | CHUP | CHUS | CHUV | MDA | HGJ | HMR | Total
Task 1 |   56 |   72 |   72 |   53 | 198 |  55 |  18 |   524
Task 2 |   56 |   44 |   72 |   47 | 197 |  55 |  18 |   489
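
The difference between the two rows is the number of imaged patients without a Task 2 endpoint, i.e. the patients left out of the endpoint file. A quick check using the per-center counts copied from the table above:

```python
# Per-center case counts copied from the training table.
TASK1 = {"CHUM": 56, "CHUP": 72, "CHUS": 72, "CHUV": 53, "MDA": 198, "HGJ": 55, "HMR": 18}
TASK2 = {"CHUM": 56, "CHUP": 44, "CHUS": 72, "CHUV": 47, "MDA": 197, "HGJ": 55, "HMR": 18}

# Sanity-check the totals of each row.
assert sum(TASK1.values()) == 524
assert sum(TASK2.values()) == 489

# Cases with images (Task 1) but no Task 2 endpoint, per center.
missing = {c: TASK1[c] - TASK2[c] for c in TASK1 if TASK1[c] != TASK2[c]}
print(missing)               # {'CHUP': 28, 'CHUV': 6, 'MDA': 1}
print(sum(missing.values()))  # 35
```
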

Testing

The testing dataset is organized according to the following folder structure:

hecktor2022_testing/
├── imagesTs
│   ├── USZ-001__CT.nii.gz
│   ├── USZ-001__PT.nii.gz
│   └── ...
└── hecktor2022_clinical_info_testing.csv

Testing cases:

Cases  | MDA | USZ | CHB | Total
Task 1 | 200 | 101 |  58 |   359
Task 2 | 200 | 101 |  38 |   339

DATASET DESCRIPTION

Patients with histologically proven oropharyngeal H&N cancer who underwent radiotherapy and/or chemotherapy treatment planning were considered.

The data originates from FDG-PET and low-dose non-contrast-enhanced CT images (acquired with combined PET/CT scanners) of the H&N region.

Data were collected from 9 centers:

Center | Acronym | PET/CT scanner | In HECKTOR 2021
Hôpital général juif, Montréal, CA | HGJ | Discovery ST, GE Healthcare | Yes
Centre hospitalier universitaire de Sherbrooke, Sherbrooke, CA | CHUS | GeminiGXL 16, Philips | Yes
Hôpital Maisonneuve-Rosemont, Montréal, CA | HMR | Discovery STE, GE Healthcare | Yes
Centre hospitalier de l’Université de Montréal, Montréal, CA | CHUM | Discovery STE, GE Healthcare | Yes
Centre Hospitalier Universitaire Vaudois, CH | CHUV | Discovery D690 TOF, GE Healthcare | Yes
Centre Hospitalier Universitaire de Poitiers, FR | CHUP | Biograph mCT 40 ToF, Siemens | Yes
MD Anderson Cancer Center, Houston, Texas, USA | MDA | Discovery HR, Discovery RX, Discovery ST, Discovery STE (GE Healthcare) | No
UniversitätsSpital Zürich, CH | USZ | Discovery HR, Discovery RX, Discovery STE, Discovery LS, Discovery 690 (GE Healthcare) | No
Centre Henri Becquerel, Rouen, FR | CHB | GE710, GE Healthcare | No

The image information includes the clinical center, scanner information, and DICOM metadata such as acquisition parameters and reconstruction algorithms.

The patient information includes center, age, gender, weight, tobacco and alcohol consumption, performance status, HPV status, and treatment (radiotherapy only, or with additional chemotherapy and/or surgery). TNM stage is not provided, as it reveals information, such as the tumor and nodal status, that is part of what Task 1 aims to determine. Some values may be missing for some patients.

Each training and test case consists of one 3D FDG-PET volume registered with a 3D CT volume of the head and neck region. For Task 1, ground-truth tumor contours are provided to the participating teams for the training cases only. The label maps take three values: 0 for the background, 1 for the primary Gross Tumor Volume (GTVp), and 2 for the nodal Gross Tumor Volumes (GTVn); when several lymph nodes are present, they all share label 2. For Task 2, the recurrence-free survival (RFS) outcome, i.e. the time-to-event in days and the censoring status, is likewise provided for the training cases only.
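
Since each label map encodes both structures in a single volume, per-structure volumes can be obtained by counting voxels with each label. A minimal standard-library sketch; it expects the label values as a flat iterable and the voxel spacing in millimetres, both of which the caller extracts from the NIfTI file.

```python
from collections import Counter

# Label values as described above.
LABEL_NAMES = {0: "background", 1: "GTVp", 2: "GTVn"}


def label_volumes_ml(voxels, spacing_mm):
    """Volume in mL of GTVp and GTVn in a flattened label map.

    voxels: iterable of label values (0, 1 or 2), e.g. a flattened array.
    spacing_mm: voxel size along the three axes, in millimetres.
    """
    sx, sy, sz = spacing_mm
    voxel_ml = sx * sy * sz / 1000.0  # 1 mL = 1000 mm^3
    counts = Counter(int(v) for v in voxels)
    unexpected = set(counts) - set(LABEL_NAMES)
    if unexpected:
        raise ValueError(f"unexpected label values: {sorted(unexpected)}")
    return {LABEL_NAMES[k]: counts.get(k, 0) * voxel_ml for k in (1, 2)}
```

With nibabel, for instance, img = nibabel.load(label_path) gives the values through numpy.asarray(img.dataobj).ravel() and the spacing through img.header.get_zooms()[:3], where label_path is a placeholder for one file in labelsTr.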

The total number of cases is 883. The training set contains 524 cases for Task 1 and 489 for Task 2, from 7 different centers. No dedicated validation cases are provided; the training set can be split in any manner for cross-validation. The test cases of the 2021 challenge were merged into the training cases. The test set contains 359 cases (339 for Task 2) from 3 centers; these cases are not public and are all new to this year’s challenge. The center MDA is represented in both the training and test sets.
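
Since the split is left to participants, two common options are sketched below: leave-one-center-out and a seeded random K-fold. Both are hypothetical helpers that assume patient IDs of the form CHUM-001; splitting at the patient level keeps a patient's CT, PET and label in the same fold.

```python
import random
from collections import defaultdict


def center_of(patient_id):
    """Center acronym from an ID such as "CHUM-001" (assumed CenterName-Number format)."""
    return patient_id.split("-")[0]


def leave_one_center_out(patient_ids):
    """Yield (held_out_center, train_ids, val_ids), holding out one center per fold."""
    by_center = defaultdict(list)
    for pid in patient_ids:
        by_center[center_of(pid)].append(pid)
    for center in sorted(by_center):
        val = sorted(by_center[center])
        train = sorted(p for c, ids in by_center.items() if c != center for p in ids)
        yield center, train, val


def random_kfold(patient_ids, k=5, seed=0):
    """Split patients into k disjoint folds after a seeded shuffle."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]
```

Which scheme to use is a judgment call: two of the three test centers (USZ and CHB) do not appear in the training data, so holding out whole centers may give a more realistic estimate than a random split, at the cost of very unequal folds (MDA alone contributes 198 of the 524 Task 1 cases).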

Training and test cohorts are representative of the real-world population of patients accepted for initial staging of oropharyngeal cancer (in the training set, ~21% of patients had a recurrence event and the median recurrence-free survival was 14 months).
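
The sketch below computes the event rate and two possible readings of a "median recurrence-free survival": the plain median of the recorded RFS times, and the Kaplan-Meier median, which accounts for censoring but is undefined when the estimated survival never falls to 0.5 (possible when only about a fifth of patients have an event). The source does not say which definition it used, so this is illustrative only. It assumes times are RFS values in days and events are 1 for an observed recurrence and 0 for censoring.

```python
import statistics
from fractions import Fraction


def event_rate(events):
    """Fraction of patients with an observed recurrence."""
    return sum(events) / len(events)


def median_rfs_time(times):
    """Plain median of recorded RFS times (ignores censoring)."""
    return statistics.median(times)


def km_median(times, events):
    """Kaplan-Meier median: first time at which estimated survival is <= 0.5, else None.

    Survival is tracked with exact fractions so a value of exactly 0.5 is not lost to rounding.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Patients censored at t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - Fraction(n_events, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= n_leaving
    return None
```
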

The preprocessing applied to the PET/CT images (for both training and test cases) consisted of (i) computing the Standardized Uptake Value (SUV) for the PET images and (ii) converting the DICOM files to NIfTI format. Various functions to load, crop, and resample the data, train a baseline CNN, and evaluate the results are available in our GitHub repository: https://github.com/voreille/hecktor.
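
The body-weight SUV divides the measured activity concentration by the injected activity per gram of body weight, with the injected activity decay-corrected to the scan time. The sketch below implements that textbook formula for 18F; the source does not specify which SUV variant or decay-correction reference time was used, so both are assumptions. In practice the inputs would come from DICOM attributes such as PatientWeight, RadionuclideTotalDose, RadionuclideHalfLife and RadiopharmaceuticalStartTime, with voxel values already in Bq/mL.

```python
import math

F18_HALF_LIFE_S = 6586.2  # 18F half-life, about 109.77 min


def suv_bw(activity_bq_per_ml, injected_dose_bq, weight_kg, delay_s,
           half_life_s=F18_HALF_LIFE_S):
    """Body-weight SUV of a PET voxel value (assumed SUVbw definition).

    activity_bq_per_ml: tissue activity concentration at scan time (Bq/mL).
    injected_dose_bq: injected activity at injection time (Bq).
    weight_kg: patient body weight (kg).
    delay_s: time between injection and scan start (s), used to decay-correct the dose.
    """
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    decayed_dose = injected_dose_bq * math.exp(-math.log(2) * delay_s / half_life_s)
    weight_g = weight_kg * 1000.0  # assumes 1 g of tissue occupies about 1 mL
    return activity_bq_per_ml / (decayed_dose / weight_g)
```

For example, 370 MBq injected into a 74 kg patient gives 5000 Bq per gram at injection time, so a voxel at 5000 Bq/mL scanned immediately has an SUV of 1; after one half-life the same concentration corresponds to an SUV of 2.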

Patient weight, which is necessary to compute SUVs, was not available for a small subset of patients. It was set to an estimated 75 kg for the following 8 cases: (train) HMR-005, HMR-016, HMR-024, HMR-029, HMR-030, HMR-034; (test) USZ-042, USZ-085.
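
Anyone recomputing SUVs from raw data could apply the same fallback. The helper below is hypothetical and only mirrors the 75 kg rule described above. Because body-weight SUV is proportional to the weight used, the estimate scales SUVs by 75 / true weight, e.g. by a factor of 1.25 for a patient who actually weighs 60 kg.

```python
import math

DEFAULT_WEIGHT_KG = 75.0

# Cases whose weight was estimated, as listed above.
ESTIMATED_WEIGHT_CASES = {
    "HMR-005", "HMR-016", "HMR-024", "HMR-029", "HMR-030", "HMR-034",  # training
    "USZ-042", "USZ-085",                                              # test
}


def patient_weight_kg(raw_value):
    """Parse a weight value, falling back to 75 kg when it is missing, empty or NaN."""
    if raw_value is None:
        return DEFAULT_WEIGHT_KG
    if isinstance(raw_value, float) and math.isnan(raw_value):
        return DEFAULT_WEIGHT_KG
    if str(raw_value).strip() == "":
        return DEFAULT_WEIGHT_KG
    return float(raw_value)
```
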