Data Preparation

Image Import

A database is a prerequisite for developing an ML model in PAI. Please refer to the PMOD Basic Functionality User Guide for instructions on how to create and use databases. In the example below, a database called BraTS was created and the data from the MICCAI BraTS Challenge was imported into it.

Image Association

A training sample consists of one or more image series together with the target from which the neural network should learn: either the reference segmentation or, for classification, the class. All of these images need to be associated in the database so that referencing a single image identifies all related images.

To associate the images, select a subject in the Subjects list and then all series to be associated in the Series list. From the option menu indicated below select Associate Images, which brings up a dialog window confirming association of the selected series.

clip0114

clip0229

To identify which image series is the reference segment map, select it in the list, click its TAG column, and choose the SEGMENT entry from the menu that appears. If more than one input image is required for the segmentation, it is important that the inputs appear in the same order in the association list of every sample. Use the arrow buttons to the right of the list to shift the position of a selected element.
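The ordering requirement can be checked outside PAI before training. The following is a minimal sketch, not part of PMOD: it assumes the association lists have been written down by hand as plain Python lists, and the subject names and series labels are hypothetical.

```python
def inconsistent_samples(associations):
    """Return the subjects whose label order differs from the first sample's."""
    if not associations:
        return []
    subjects = list(associations)
    reference = associations[subjects[0]]
    return [s for s in subjects[1:] if associations[s] != reference]


# Hypothetical association lists: subject -> series labels in list order.
associations = {
    "subject_01": ["1_T1", "2_T1ce", "3_T2", "4_FLAIR", "SEGMENT"],
    "subject_02": ["1_T1", "2_T1ce", "3_T2", "4_FLAIR", "SEGMENT"],
    "subject_03": ["1_T1", "3_T2", "2_T1ce", "4_FLAIR", "SEGMENT"],
}

mismatched = inconsistent_samples(associations)
print(mismatched)
```

Any subject reported here would need its series reordered with the arrow buttons before it is used for training.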

Existing associations can be checked by selecting one of the image series and activating the button indicated below:

clip0115

Adding a Descriptive Variable or Class for Training (Project Description)

For segmentation projects we strongly recommend adding a descriptive label to the series used for training by defining the Project using Assign to Project | Group. This description will be used to check that new data used for Prediction has the same content as that used for Training.

If a difference in the Project description (or number of studies) in Training/Prediction is detected, warning messages based on the following structure will be returned:

clip0090

For classification projects the Project label is required because it assigns the Class to which the sample belongs. For example, in the Amyloid PET classification Case Study each sample belongs to either the “AD” (Alzheimer’s disease) or the “YC” (Young Control) class, and the model is trained to predict, with a probability, which of these classes a new sample belongs to.

Data Cropping

Another part of the data preparation consists of reducing the data volume to the relevant portion. In the brain segmentation example the image should be restricted to the brain. This process can be included in the training set definition by creating a VOI that will serve as the cropping box and associating it with the input data using the same tools as image association.

PAI_img20

To achieve this, open the input image, create a suitable VOI such as a box, position it properly and save it to the database. Then select the input image in the Series list on the DB Load page, followed by Associate VOI from the same menu where the images were associated.

AssociateVOI

In the dialog window which opens select the saved VOI and activate Set Selected.
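To illustrate what restricting the data to a box VOI means for the image matrix, here is a small conceptual sketch. It is not PAI's implementation: it assumes the volume is a nested list indexed as volume[z][y][x] and that the box is given as hypothetical (start, stop) index pairs per axis.

```python
def crop_to_box(volume, box):
    """Crop volume[z][y][x] to a box given as (start, stop) pairs per axis.

    The start index is included and the stop index is excluded.
    """
    (z0, z1), (y0, y1), (x0, x1) = box
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]


# Toy 4 x 4 x 4 volume in which each voxel value encodes its own position.
volume = [
    [[100 * z + 10 * y + x for x in range(4)] for y in range(4)]
    for z in range(4)
]

box = ((1, 3), (0, 2), (2, 4))  # hypothetical cropping box
cropped = crop_to_box(volume, box)
dims = (len(cropped), len(cropped[0]), len(cropped[0][0]))
print(dims)
```

Only the voxels inside the box remain, so the network sees a smaller volume focused on the relevant anatomy.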

Automatic Association Creation

Neural network training requires the preparation of a large number of samples. To make this easier, a mechanism for automatically associating the images is available. It uses either the Incoming Folder method or batch assignment of the Project using database queries. For the Incoming Folder method, a folder that is regularly checked for data to be imported into the database is defined in the DICOM Server configuration. The import takes into account information prepared in a CSV file that must also be located in the incoming folder. The structure of such a CSV file is illustrated below:

PAI_img22

The label defined in the Project column is assigned to the imported image series. Once the data has been imported, Associate Images Automatically can be used to generate the associations. Note that in the example, four images in each sample are used as input for the segmentation, in accordance with the requirements of the MICCAI BraTS Challenge. To establish a consistent order, numbers are used in the labels.

PAI_img23
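When many subjects are involved, such a file is easier to generate than to type. The sketch below shows one way to produce numbered Project labels with Python's csv module. Only the Project column is taken from the text above. The Subject and Series columns, the subject names, and the label spellings are placeholders, so the actual column layout must be taken from the figure. The "[SEGMENT]" label is borrowed from the database-query example and is an assumption here.

```python
import csv
import io

# The four input contrasts of a BraTS sample, in the order they should
# appear in the association list.
MODALITIES = ["T1", "T1ce", "T2", "FLAIR"]


def build_rows(subject_ids):
    """Create one row per series, with a numeric prefix fixing the order."""
    rows = []
    for subject in subject_ids:
        for index, modality in enumerate(MODALITIES, start=1):
            rows.append(
                {"Subject": subject, "Series": modality,
                 "Project": f"{index}_{modality}"}
            )
        # Label for the reference segmentation (assumed, see lead-in).
        rows.append({"Subject": subject, "Series": "seg", "Project": "[SEGMENT]"})
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Subject", "Series", "Project"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_rows(["BraTS_001", "BraTS_002"])  # hypothetical subject names
csv_text = to_csv(rows)
print(csv_text)
```

Because the numeric prefix is part of the label, sorting the labels alphabetically reproduces the same input order for every subject.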

Automatic association may also be used for data that is already in a database, where an advanced database query makes it easy to label series in batches. For example, when each subject in the database has only a single image series and a single segment series, all image series can be given the Project “input” and all segments the Project “[SEGMENT]”. These two Projects are then selected in the Associate Images Automatically dialog.

clip0183

clip0184
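The pairing logic behind this example can be sketched in a few lines. This is a conceptual illustration only, not the code PAI runs: the record fields (subject, series_id, project) and the sample data are hypothetical, and subjects that do not have exactly one input and one segment are simply skipped.

```python
from collections import defaultdict


def associate_automatically(series, input_project="input",
                            segment_project="[SEGMENT]"):
    """Pair each subject's single input series with its single segment series."""
    by_subject = defaultdict(lambda: {"inputs": [], "segments": []})
    for item in series:
        if item["project"] == input_project:
            by_subject[item["subject"]]["inputs"].append(item["series_id"])
        elif item["project"] == segment_project:
            by_subject[item["subject"]]["segments"].append(item["series_id"])

    samples = {}
    for subject, group in by_subject.items():
        # Keep only subjects with an unambiguous one-to-one pairing.
        if len(group["inputs"]) == 1 and len(group["segments"]) == 1:
            samples[subject] = (group["inputs"][0], group["segments"][0])
    return samples


# Hypothetical database content after batch labelling.
series = [
    {"subject": "S1", "series_id": "S1_img", "project": "input"},
    {"subject": "S1", "series_id": "S1_seg", "project": "[SEGMENT]"},
    {"subject": "S2", "series_id": "S2_img", "project": "input"},
    {"subject": "S3", "series_id": "S3_img", "project": "input"},
    {"subject": "S3", "series_id": "S3_seg", "project": "[SEGMENT]"},
]

samples = associate_automatically(series)
print(samples)
```

Subject S2 has no segment series and is therefore left out, which is a useful reminder to check the batch labels before generating the associations.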