We are currently trying to extend the agreement with the partner hospitals and cannot give access to the dataset at this time.
To obtain the data, please follow instructions under this link. After approval of your request, you will be granted access to the page Data Download to download the data.
DATASET STRUCTURE
Training
The training dataset is organized according to the following folder structure:
hecktor2022_training/
├── imagesTr
│   ├── CHUM-001__CT.nii.gz
│   ├── CHUM-001__PT.nii.gz
│   └── ...
├── labelsTr
│   ├── CHUM_001.nii.gz
│   └── ...
├── hecktor2022_clinical_info_training.csv
└── hecktor2022_endpoint_training.csv
All PET/CT images are stored in the imagesTr folder. The
naming convention is CenterName-PatientID__Modality.nii.gz, where
Modality is CT or PT. The segmentations of the primary tumor (GTVp)
and lymph nodes (GTVn) are in the labelsTr folder, in a single
.nii.gz file per patient. Label 1 denotes GTVp and label 2 denotes GTVn.
The clinical information for each patient is contained in
hecktor2022_clinical_info_training.csv and includes center, gender, age,
weight, tobacco and alcohol consumption, performance status (Zubrod),
HPV status, and treatment (surgery and/or chemotherapy in addition to the
radiotherapy that all patients underwent). Note that some information
may be missing for some patients. The survival events and the times between
the end of radiotherapy and the event or last follow-up (in days) are
provided in hecktor2022_endpoint_training.csv. Only patients
with complete responses are included in this file for Task 2, which is
why it contains fewer cases than the image set and the clinical data
file used for Task 1.
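As a minimal sketch of how the naming convention can be used, the helper below splits an image file name into center, patient ID, and modality, and pairs the CT and PET volumes of each patient. The names `parse_image_name` and `pair_modalities` are illustrative and not part of the challenge code, and the pattern assumes the hyphenated form shown in the folder listing (e.g. CHUM-001__CT.nii.gz).

```python
import re
from collections import defaultdict

# Pattern assumed from the folder listing: <Center>-<Number>__<Modality>.nii.gz
IMAGE_NAME = re.compile(
    r"^(?P<center>[A-Za-z]+)-(?P<number>\d+)__(?P<modality>CT|PT)\.nii\.gz$"
)


def parse_image_name(filename):
    """Return (center, patient_id, modality), or None if the name does not match."""
    match = IMAGE_NAME.match(filename)
    if match is None:
        return None
    patient_id = f"{match['center']}-{match['number']}"
    return match["center"], patient_id, match["modality"]


def pair_modalities(filenames):
    """Group file names by patient: {patient_id: {"CT": name, "PT": name}}."""
    pairs = defaultdict(dict)
    for name in filenames:
        parsed = parse_image_name(name)
        if parsed is None:
            continue  # skip anything that is not a CT/PT image
        _, patient_id, modality = parsed
        pairs[patient_id][modality] = name
    return dict(pairs)


if __name__ == "__main__":
    names = ["CHUM-001__CT.nii.gz", "CHUM-001__PT.nii.gz"]
    print(pair_modalities(names))
```

Patients for which only one modality is found end up with a single-entry dictionary, which makes incomplete downloads easy to spot.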
Training cases:
| | CHUM | CHUP | CHUS | CHUV | MDA | HGJ | HMR | Total |
|---|---|---|---|---|---|---|---|---|
| Task 1 | 56 | 72 | 72 | 53 | 198 | 55 | 18 | 524 |
| Task 2 | 56 | 44 | 72 | 47 | 197 | 55 | 18 | 489 |
Testing
The testing dataset is organized according to the following folder structure:
hecktor2022_testing/
├── imagesTs
│   ├── USZ-001__CT.nii.gz
│   ├── USZ-001__PT.nii.gz
│   └── ...
└── hecktor2022_clinical_info_testing.csv
Testing cases:
| | MDA | USZ | CHB | Total |
|---|---|---|---|---|
| Task 1 | 200 | 101 | 58 | 359 |
| Task 2 | 200 | 101 | 38 | 339 |
DATASET DESCRIPTION
Patients with histologically proven oropharyngeal H&N cancer who underwent radiotherapy and/or chemotherapy treatment planning were considered.
The data originates from FDG-PET and low-dose non-contrast-enhanced CT images (acquired with combined PET/CT scanners) of the H&N region.
Data were collected from 9 centers:
| Center | Acronym | PET/CT scanner | Included in HECKTOR 2021 |
|---|---|---|---|
| Hôpital général juif, Montréal, CA | HGJ | Discovery ST, GE Healthcare | Yes |
| Centre hospitalier universitaire de Sherbrooke, Sherbrooke, CA | CHUS | GeminiGXL 16, Philips | Yes |
| Hôpital Maisonneuve-Rosemont, Montréal, CA | HMR | Discovery STE, GE Healthcare | Yes |
| Centre hospitalier de l’Université de Montréal, Montréal, CA | CHUM | Discovery STE, GE Healthcare | Yes |
| Centre Hospitalier Universitaire Vaudois, CH | CHUV | Discovery D690 TOF, GE Healthcare | Yes |
| Centre Hospitalier Universitaire de Poitiers, FR | CHUP | Biograph mCT 40 ToF, Siemens | Yes |
| MD Anderson Cancer Center, Houston, Texas, USA | MDA | Discovery HR, Discovery RX, Discovery ST, Discovery STE (GE Healthcare) | No |
| UniversitätsSpital Zürich, CH | USZ | Discovery HR, Discovery RX, Discovery STE, Discovery LS, Discovery 690 (GE Healthcare) | No |
| Centre Henri Becquerel, Rouen, FR | CHB | GE710, GE Healthcare | No |
The image data information includes the clinical center, scanner information, and DICOM metadata (including acquisition parameters and reconstruction algorithms).
The patient information includes center, age, gender, weight, tobacco and alcohol consumption, performance status, HPV status, and treatment (radiotherapy only or additional chemotherapy and/or surgery). TNM stage is not provided because it conveys information that is part of the goal of Task 1. Some values may be missing for some patients.
Each training and testing case consists of one 3D FDG-PET volume registered with a 3D CT volume of the head and neck region. For the purpose of Task 1, contours of the annotated ground-truth tumors are provided (available to the participating teams for training cases only). The labels have three values: 0 for background, 1 for primary Gross Tumor Volumes (GTVp), and 2 for nodal Gross Tumor Volumes (GTVn); when there are several lymph nodes, they all share the same label. For the purpose of Task 2, the patient outcome information on recurrence-free survival (RFS), given as time-to-event in days and censoring, is provided (available to the participating teams for training cases only).
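To make the label encoding concrete, here is a small sketch that tallies voxels per structure and converts a voxel count to a volume. It works on a flat sequence of label values so that it stays free of third-party dependencies; the function names and the choice to reject unexpected values are assumptions made for illustration, not part of the challenge code.

```python
from collections import Counter

# Label values as described above.
LABEL_NAMES = {0: "background", 1: "GTVp", 2: "GTVn"}


def count_voxels(labels):
    """Count voxels per structure in a flat iterable of integer label values."""
    counts = Counter(int(value) for value in labels)
    unknown = set(counts) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    return {name: counts.get(value, 0) for value, name in LABEL_NAMES.items()}


def volume_ml(n_voxels, spacing_mm):
    """Convert a voxel count to millilitres given (x, y, z) voxel spacing in mm."""
    sx, sy, sz = spacing_mm
    return n_voxels * sx * sy * sz / 1000.0  # 1 mL = 1000 mm^3


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 2, 2, 2]
    print(count_voxels(toy_labels))
```

A segmentation loaded as a NumPy array can be passed in flattened form, for example `count_voxels(array.ravel())`.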
The total number of cases is 883 (524 training and 359 test cases for Task 1). For Task 2, 489 training and 339 test cases are available. The training data come from 7 centers. No specific validation cases are provided, and the training set can be split in any manner for cross-validation. The test cases of the 2021 challenge were merged with the training cases. The test cases come from 3 centers; none of them are public, and all are new to this year’s challenge. The center MDA is represented in both the training and test sets.
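Since the split is left open, one possible scheme (a suggestion, not a recommendation from the organizers) is leave-one-center-out cross-validation, which holds out all patients of one center per fold. The sketch below assumes patient IDs of the form CHUM-001, with the center acronym before the hyphen; the function names are hypothetical.

```python
from collections import defaultdict


def center_of(patient_id):
    """Extract the center acronym from an ID such as 'CHUM-001'."""
    return patient_id.split("-", 1)[0]


def leave_one_center_out(patient_ids):
    """Yield (held_out_center, train_ids, val_ids), one fold per center."""
    patient_ids = list(patient_ids)  # allow any iterable, iterate twice safely
    by_center = defaultdict(list)
    for pid in patient_ids:
        by_center[center_of(pid)].append(pid)
    for center in sorted(by_center):
        val_ids = by_center[center]
        train_ids = [pid for pid in patient_ids if center_of(pid) != center]
        yield center, train_ids, val_ids


if __name__ == "__main__":
    ids = ["CHUM-001", "CHUM-002", "MDA-001", "HGJ-001"]
    for center, train_ids, val_ids in leave_one_center_out(ids):
        print(center, len(train_ids), len(val_ids))
```

Holding out whole centers gives a rough estimate of how a model transfers to scanners and protocols it has not seen, at the cost of folds with very different sizes.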
Training and test cohorts are representative of the distribution of the real-world population of patients accepted for initial staging of oropharyngeal cancer (with ~21% of recurrence events and a median recurrence-free survival of 14 months in the training set).
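How the figures above were derived is not stated, so the following is only a sketch of two simple summary statistics for endpoint records: the fraction of patients with an event and the median recorded time in months. The record layout `(time_days, event)`, the coding of `event` (1 for recurrence, 0 for censored), and the days-per-month conversion are all assumptions.

```python
from statistics import median

DAYS_PER_MONTH = 365.25 / 12  # assumed conversion from days to months


def summarize_endpoints(records):
    """Return (event_rate, median_time_months) for (time_days, event) records.

    event is assumed to be 1 for a recurrence and 0 for a censored case.
    """
    records = list(records)
    if not records:
        raise ValueError("no records")
    event_rate = sum(1 for _, event in records if event) / len(records)
    median_months = median(time for time, _ in records) / DAYS_PER_MONTH
    return event_rate, median_months


if __name__ == "__main__":
    toy = [(365.25, 1), (730.5, 0), (1095.75, 0), (1461.0, 0)]
    print(summarize_endpoints(toy))
```

Note that the plain median of recorded times ignores censoring; a Kaplan-Meier estimate would be the usual choice for a survival median.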
The preprocessing of the PET/CT images (for both the training and test cases) consisted of (i) computing the Standardized Uptake Value (SUV) for the PET images and (ii) converting the DICOM files to NIfTI format. Various functions to load, crop, and resample the data, train a baseline CNN, and evaluate the results are available on our GitHub repository: https://github.com/voreille/hecktor.
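For readers unfamiliar with step (i), the sketch below shows the standard body-weight SUV formula: tissue activity concentration divided by the decay-corrected injected dose per gram of body weight. It is not taken from the repository; the function name, parameter names, and the assumption that the activity is referenced to the acquisition time are illustrative, and DICOM decay-correction conventions vary between scanners.

```python
import math

F18_HALF_LIFE_S = 109.77 * 60  # physical half-life of fluorine-18, in seconds


def suv_body_weight(activity_bq_per_ml, injected_dose_bq, weight_kg,
                    seconds_since_injection):
    """Body-weight SUV (g/mL) from activity concentration, dose, and weight."""
    if injected_dose_bq <= 0 or weight_kg <= 0:
        raise ValueError("dose and weight must be positive")
    # Decay-correct the injected dose to the acquisition time.
    decay = math.exp(-math.log(2) * seconds_since_injection / F18_HALF_LIFE_S)
    dose_at_scan_bq = injected_dose_bq * decay
    weight_g = weight_kg * 1000.0
    return activity_bq_per_ml * weight_g / dose_at_scan_bq


if __name__ == "__main__":
    # 5 kBq/mL, 375 MBq injected, 75 kg patient, scanned one hour after injection
    print(suv_body_weight(5000.0, 375e6, 75.0, 3600.0))
```

With no decay between injection and scan, 5 kBq/mL in a 75 kg patient injected with 375 MBq gives an SUV of 1, which corresponds to a uniform distribution of the tracer in the body.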
Patient weight, which is necessary to compute SUVs, was not available for a small subset of patients. We set it to 75 kg for the following 8 cases: (train) HMR-005, HMR-016, HMR-024, HMR-029, HMR-030, HMR-034, and (test) USZ-042, USZ-085.
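When reading the clinical file, the same fallback can be reproduced with a small helper such as the one below. This is a sketch under assumptions: the function name is hypothetical, the way missing values are encoded in the CSV (empty string or "nan") is a guess, and treating non-positive weights as missing is an added safeguard rather than something described above.

```python
DEFAULT_WEIGHT_KG = 75.0  # value the text above reports using for missing weights


def weight_or_default(raw_value, default=DEFAULT_WEIGHT_KG):
    """Parse a weight field; return (weight_kg, was_imputed)."""
    text = "" if raw_value is None else str(raw_value).strip()
    if text == "" or text.lower() == "nan":
        return default, True  # missing value: fall back to the default
    weight = float(text)
    if weight <= 0:
        return default, True  # assumption: non-positive weights count as missing
    return weight, False


if __name__ == "__main__":
    for raw in ["82.5", "", None]:
        print(repr(raw), weight_or_default(raw))
```

Returning the `was_imputed` flag alongside the value keeps track of which SUVs rest on an estimated weight.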