Dataset - MICCAI HECKTOR 2022

We are currently trying to extend the agreement with the partner hospitals and cannot give access to the dataset at this time.

To obtain the data, please follow the instructions under this link. Once your request is approved, you will be granted access to the Data Download page, where the data can be downloaded.

DATASET STRUCTURE

Training

The training dataset is organized according to the following folder structure:

  hecktor2022_training/
  ├── imagesTr
  │   ├── CHUM-001__CT.nii.gz
  │   ├── CHUM-001__PT.nii.gz
  │   └── ...
  ├── labelsTr
  │   ├── CHUM-001.nii.gz
  │   └── ...
  ├── hecktor2022_clinical_info_training.csv
  └── hecktor2022_endpoint_training.csv

All PET/CT images are gathered in the imagesTr folder. The naming convention is CenterName-PatientID__Modality.nii.gz, where Modality is CT or PT. The segmentations of the primary tumor (GTVp) and lymph nodes (GTVn) are in the labelsTr folder, with one .nii.gz file per patient. Label 1 denotes the GTVp and label 2 the GTVn.
The clinical information for each patient is contained in hecktor2022_clinical_info_training.csv and includes center, gender, age, weight, tobacco and alcohol consumption, performance status (Zubrod), HPV status, and treatment (surgery and/or chemotherapy in addition to the radiotherapy that all patients underwent). Some information may be missing for some patients. The survival events and the times (in days) between the end of radiotherapy and the event or last follow-up are provided in hecktor2022_patient_endpoint_training.csv. Only patients with a complete response are included in this file for Task 2, which is why it contains fewer cases than the image set and the clinical data file used for Task 1.
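
As an illustration of the naming convention, the sketch below splits an image file name into patient ID, center, and modality, and pairs the CT and PT files of each patient. The regular expression is inferred from the example names in the folder tree (such as CHUM-001__CT.nii.gz); the function names are hypothetical and are not part of any official tooling.

```python
import re
from collections import defaultdict
from pathlib import Path

# Pattern inferred from the example names: <Center>-<Number>__<Modality>.nii.gz
IMAGE_NAME = re.compile(
    r"^(?P<center>[A-Za-z]+)-(?P<number>\d+)__(?P<modality>CT|PT)\.nii\.gz$"
)


def parse_image_name(filename):
    """Split an image file name into (patient_id, center, modality)."""
    match = IMAGE_NAME.match(Path(filename).name)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    patient_id = f"{match['center']}-{match['number']}"
    return patient_id, match["center"], match["modality"]


def group_by_patient(filenames):
    """Map each patient ID to a {modality: filename} dictionary."""
    patients = defaultdict(dict)
    for name in filenames:
        patient_id, _, modality = parse_image_name(name)
        patients[patient_id][modality] = name
    return dict(patients)
```

For a local copy of the data, the file names could be collected with `Path("hecktor2022_training/imagesTr").glob("*.nii.gz")` and passed to `group_by_patient`; patients whose dictionary lacks either "CT" or "PT" would then be easy to spot.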

Training cases:

	CHUM	CHUP	CHUS	CHUV	MDA	HGJ	HMR	Total
Task 1	56	72	72	53	198	55	18	524
Task 2	56	44	72	47	197	55	18	489
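
The gap between the Task 1 and Task 2 rows corresponds to patients who appear in the clinical information file but not in the endpoint file. A minimal sketch for listing them is given below. The column names PatientID, Age, Relapse, and RFS are assumptions, and the rows are synthetic; the actual CSV headers should be checked before use.

```python
import csv
import io


def patient_ids(csv_file, id_column="PatientID"):
    """Return the set of patient IDs found in one open CSV file."""
    return {row[id_column] for row in csv.DictReader(csv_file)}


def task1_only(clinical_csv, endpoint_csv, id_column="PatientID"):
    """Patients with clinical info but no endpoint row (usable for Task 1 only)."""
    missing = patient_ids(clinical_csv, id_column) - patient_ids(endpoint_csv, id_column)
    return sorted(missing)


# Synthetic rows for illustration only; they are not taken from the dataset.
clinical = io.StringIO("PatientID,Age\nCHUP-001,60\nCHUP-002,55\nCHUM-001,70\n")
endpoint = io.StringIO("PatientID,Relapse,RFS\nCHUP-001,0,900\nCHUM-001,1,400\n")
print(task1_only(clinical, endpoint))  # ['CHUP-002']
```

With the real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.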

Testing

The testing dataset is organized according to the following folder structure:

  hecktor2022_testing/
  ├── imagesTs
  │   ├── USZ-001__CT.nii.gz
  │   ├── USZ-001__PT.nii.gz
  │   └── ...
  └── hecktor2022_clinical_info_testing.csv

Testing cases:

	MDA	USZ	CHB	Total
Task 1	200	101	58	359
Task 2	200	101	38	339

DATASET DESCRIPTION

Patients with histologically proven oropharyngeal H&N cancer who underwent radiotherapy and/or chemotherapy treatment planning were considered.

The data originate from FDG-PET and low-dose non-contrast-enhanced CT images of the H&N region, acquired with combined PET/CT scanners.

Data were collected from 9 centers:

Center	Acronym	PET/CT scanner	Included in HECKTOR 2021
Hôpital général juif, Montréal, CA	HGJ	Discovery ST, GE Healthcare	Yes
Centre hospitalier universitaire de Sherbrooke, Sherbrooke, CA	CHUS	GeminiGXL 16, Philips	Yes
Hôpital Maisonneuve-Rosemont, Montréal, CA	HMR	Discovery STE, GE Healthcare	Yes
Centre hospitalier de l’Université de Montréal, Montréal, CA	CHUM	Discovery STE, GE Healthcare	Yes
Centre Hospitalier Universitaire Vaudois, CH	CHUV	Discovery D690 TOF, GE Healthcare	Yes
Centre Hospitalier Universitaire de Poitiers, FR	CHUP	Biograph mCT 40 ToF, Siemens	Yes
MD Anderson Cancer Center, Houston, Texas, USA	MDA	Discovery HR, Discovery RX, Discovery ST, Discovery STE (GE Healthcare)	No
UniversitätsSpital Zürich, CH	USZ	Discovery HR, Discovery RX, Discovery STE, Discovery LS, Discovery 690 (GE Healthcare)	No
Centre Henri Becquerel, Rouen, FR	CHB	GE710, GE Healthcare	No

The information on the image data includes the clinical center, scanner information, and DICOM metadata, including acquisition parameters and reconstruction algorithms.

The patient information includes center, age, gender, weight, tobacco and alcohol consumption, performance status, HPV status, and treatment (radiotherapy only, or additional chemotherapy and/or surgery). TNM stage is not provided because it informs on tumor status, which is part of the goal of Task 1. Some values may be missing for some patients.

Each training and testing case consists of one 3D FDG-PET volume registered with a 3D CT volume of the head and neck region. For Task 1, contours of the annotated ground-truth tumors are provided (available to the participating teams for training cases only). The labels take three values: 0 for background, 1 for the primary Gross Tumor Volume (GTVp), and 2 for the nodal Gross Tumor Volumes (GTVn); when there are several lymph nodes, they all share the same label. For Task 2, the patient outcome information (also available for training cases only) is provided as Recurrence-Free Survival (RFS): a time-to-event in days together with censoring.
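
To make the label convention concrete, the following sketch converts voxel counts per label into volumes. It assumes the label values have already been read from a segmentation file (for example with a NIfTI library) and flattened into a plain sequence of integers, and that the voxel volume in cubic millimeters is known from the image header. The function name is hypothetical.

```python
from collections import Counter

# Label convention described in the text.
LABEL_NAMES = {0: "background", 1: "GTVp", 2: "GTVn"}


def label_volumes_ml(flat_labels, voxel_volume_mm3):
    """Return the volume in mL of each label from a flat sequence of voxel labels."""
    counts = Counter(int(value) for value in flat_labels)
    unknown = set(counts) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    # 1 mL equals 1000 cubic millimeters.
    return {
        name: counts.get(value, 0) * voxel_volume_mm3 / 1000.0
        for value, name in LABEL_NAMES.items()
    }
```

Because all lymph nodes share label 2, the GTVn entry is the total nodal volume; separating individual nodes would require an additional connected-component analysis.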

The total number of cases is 883. There are 524 training cases for Task 1 and 489 for Task 2, coming from 7 different centers. No specific validation cases are provided, and the training set can be split in any manner for cross-validation. The test cases of the 2021 challenge were merged with the training cases. There are 359 test cases for Task 1 and 339 for Task 2, from 3 centers. None of the test cases are public, and all are new to this year’s challenge. The center MDA is represented in both the training and test sets.
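
Since the training set may be split freely, one possible scheme (an assumption here, not a recommendation from the organizers) is leave-one-center-out cross-validation, which probes generalization to centers unseen during training. The sketch below assumes patient IDs of the form Center-Number, such as CHUM-001.

```python
from collections import defaultdict


def leave_one_center_out(patient_ids):
    """Yield (held_out_center, train_ids, val_ids) for each center in turn."""
    by_center = defaultdict(list)
    for pid in patient_ids:
        center = pid.split("-", 1)[0]  # "CHUM-001" -> "CHUM"
        by_center[center].append(pid)
    for center in sorted(by_center):
        val_ids = sorted(by_center[center])
        train_ids = sorted(
            pid
            for other, ids in by_center.items()
            if other != center
            for pid in ids
        )
        yield center, train_ids, val_ids
```

With the 7 training centers this yields 7 folds of very unequal size (from 18 cases for HMR to 198 for MDA), so per-fold scores are best reported separately rather than only as an unweighted mean.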

Training and test cohorts are representative of the distribution of the real-world population of patients accepted for initial staging of oropharyngeal cancer (with ~21% of recurrence events and a median recurrence-free survival of 14 months in the training set).

The preprocessing of the PET/CT images, for both the training and test cases, consisted of (i) computation of the Standardized Uptake Value (SUV) for the PET images and (ii) conversion from the DICOM file format to the NIfTI format. Various functions to load, crop, and resample the data, train a baseline CNN, and evaluate the results are available on our GitHub repository: https://github.com/voreille/hecktor.

Patient weight, which is necessary to compute SUVs, was not available for a small subset of patients. We set it to 75 kg for the following 8 cases: (train) HMR-005, HMR-016, HMR-024, HMR-029, HMR-030, HMR-034, and (test) USZ-042, USZ-085.
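
To show why the weight is needed, here is a minimal sketch of the standard body-weight SUV formula: the measured activity concentration divided by the decay-corrected injected dose per gram of body weight. It is not necessarily the exact procedure used to prepare this dataset. The function and parameter names are hypothetical, and real DICOM headers carry decay-correction and timing details that this sketch ignores.

```python
import math

F18_HALF_LIFE_S = 6586.2   # physical half-life of fluorine-18 (about 109.77 min)
DEFAULT_WEIGHT_KG = 75.0   # fallback value from the text for missing weights


def suv_bw(activity_bq_per_ml, injected_dose_bq, seconds_since_injection,
           weight_kg=None):
    """Body-weight SUV for one voxel value (assumes tissue density of 1 g/mL)."""
    if weight_kg is None:
        weight_kg = DEFAULT_WEIGHT_KG
    # Decay-correct the injected dose from injection time to scan time.
    decay = math.exp(-math.log(2) * seconds_since_injection / F18_HALF_LIFE_S)
    dose_at_scan_bq = injected_dose_bq * decay
    # Weight is converted to grams so that the result is dimensionless.
    return activity_bq_per_ml * (weight_kg * 1000.0) / dose_at_scan_bq
```

Because SUV scales linearly with weight, substituting 75 kg for a patient who actually weighs 60 kg would overestimate every SUV in that scan by 25%, which is worth keeping in mind for the 8 cases listed above.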