Dataset Loader

Paired Loader

Loads paired image data. Supports h5 and Nifti formats, and both labeled and unlabeled data.

class deepreg.dataset.loader.paired_loader.PairedDataLoader(file_loader, data_dir_paths: List[str], labeled: bool, sample_label: str, seed, moving_image_shape: (<class 'list'>, <class 'tuple'>), fixed_image_shape: (<class 'list'>, <class 'tuple'>))

Loads paired data using the given file loader. Handles both labeled and unlabeled cases. The function sample_index_generator needs to be defined for the GeneratorDataLoader class.

Parameters
  • file_loader

  • data_dir_paths – paths of the directories storing data; the data must be saved under four different sub-directories: moving_images, fixed_images, moving_labels, fixed_labels

  • labeled – true if the data are labeled

  • sample_label

  • seed

  • moving_image_shape – (width, height, depth)

  • fixed_image_shape – (width, height, depth)
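The directory layout required by data_dir_paths can be checked before a loader is constructed. The following is a minimal sketch and not part of DeepReg: the helper missing_subdirs is hypothetical, and it assumes that the two label folders are only required when labeled is true.

```python
import tempfile
from pathlib import Path
from typing import List

# Sub-directory names taken from the data_dir_paths description.
PAIRED_SUBDIRS = ["moving_images", "fixed_images", "moving_labels", "fixed_labels"]


def missing_subdirs(data_dir: str, labeled: bool) -> List[str]:
    """Return the expected sub-directories that are absent from data_dir.

    Hypothetical helper; assumes label folders are only needed when labeled.
    """
    required = [d for d in PAIRED_SUBDIRS if labeled or not d.endswith("_labels")]
    return [d for d in required if not (Path(data_dir) / d).is_dir()]


# Build a throw-away directory that holds images but no labels.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("moving_images", "fixed_images"):
        (Path(tmp) / name).mkdir()
    missing_when_unlabeled = missing_subdirs(tmp, labeled=False)
    missing_when_labeled = missing_subdirs(tmp, labeled=True)
```

With only the image folders present, the check passes for unlabeled data and reports the two label folders for labeled data.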

sample_index_generator()

Generate indexes in order to load data using the GeneratorDataLoader class

validate_data_files()

Verify all loaders have the same files

Unpaired Loader

Loads unpaired data. Supports h5 and Nifti formats, and both labeled and unlabeled data.

class deepreg.dataset.loader.unpaired_loader.UnpairedDataLoader(file_loader, data_dir_paths: List[str], labeled: bool, sample_label: str, seed, image_shape: (<class 'list'>, <class 'tuple'>))

Loads unpaired data using the given file loader. Handles both labeled and unlabeled cases. The function sample_index_generator needs to be defined for the GeneratorDataLoader class.

Load data which are unpaired, labeled or unlabeled

Parameters
  • file_loader

  • data_dir_paths – paths of the directories storing data; the data must be saved under two different sub-directories: images, labels

  • sample_label

  • seed

  • image_shape – (width, height, depth)

close()

Close the moving files opened by the file_loaders

sample_index_generator()

Generates sample indexes in order to load data using the GeneratorDataLoader class

validate_data_files()

Verify all loaders have the same files. Since the fixed and moving loaders come from the same file_loader, there is no need to check both (this avoids a duplicate check).

Grouped Loader

Loads grouped data. Supports h5 and Nifti formats, and both labeled and unlabeled data. Read https://deepreg.readthedocs.io/en/latest/api/loader.html#module-deepreg.dataset.loader.grouped_loader for more details.

class deepreg.dataset.loader.grouped_loader.GroupedDataLoader(file_loader, data_dir_paths: List[str], labeled: bool, sample_label: (<class 'str'>, None), intra_group_prob: float, intra_group_option: str, sample_image_in_group: bool, seed: (<class 'int'>, None), image_shape: (<class 'list'>, <class 'tuple'>))

Loads grouped data. sample_index_generator from GeneratorDataLoader is defined to yield indexes of images to load. AbstractUnpairedLoader handles different file formats.

Parameters
  • file_loader – a subclass of FileLoader

  • data_dir_paths

    paths of the directories storing data; the data must be saved under two different sub-directories:

    • images

    • labels

  • labeled – bool, true if the data is labeled, false if unlabeled

  • sample_label – “sample” or “all”, read get_label_indices in deepreg/dataset/util.py for more details.

  • intra_group_prob

    float between 0 and 1,

    • 0 means generating only inter-group samples,

    • 1 means generating only intra-group samples

  • intra_group_option – str, “forward”, “backward”, or “unconstrained”

  • sample_image_in_group

    bool,

    • if true, only one image pair will be yielded for each group, so one epoch has num_groups pairs of data,

    • if false, iterating through this loader will generate all possible pairs

  • seed – controls the randomness in sampling, if seed=None, then the randomness is not fixed

  • image_shape – list or tuple of length 3, corresponding to (dim1, dim2, dim3) of the 3D image
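How intra_group_prob and seed interact can be illustrated with a small sketch. This is not DeepReg code: choose_sampling_modes is a hypothetical helper, and it assumes that each draw independently picks intra-group sampling with probability intra_group_prob.

```python
import random
from typing import List, Optional


def choose_sampling_modes(
    num_draws: int, intra_group_prob: float, seed: Optional[int] = None
) -> List[str]:
    """Pick "intra" with probability intra_group_prob, otherwise "inter".

    Hypothetical helper, shown only to illustrate the two parameters.
    """
    if not 0.0 <= intra_group_prob <= 1.0:
        raise ValueError("intra_group_prob must lie in [0, 1]")
    rng = random.Random(seed)  # seed=None gives non-reproducible draws
    return [
        "intra" if rng.random() < intra_group_prob else "inter"
        for _ in range(num_draws)
    ]


only_inter = choose_sampling_modes(5, intra_group_prob=0.0, seed=0)
only_intra = choose_sampling_modes(5, intra_group_prob=1.0, seed=0)
mixed_a = choose_sampling_modes(5, intra_group_prob=0.5, seed=42)
mixed_b = choose_sampling_modes(5, intra_group_prob=0.5, seed=42)
```

The two extreme probabilities give a single mode each, and a fixed seed makes the mixed sequence repeatable.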

close()

Close file loaders

get_inter_sample_indices() → list

Calculate the sample indices for inter-group sampling. The index to identify a sample is (group1, image1, group2, image2), which means

  • image1 of group1 is moving image

  • image2 of group2 is fixed image

All pairs of images in the dataset are registered. Assuming group i has ni images, and that N=[n1, n2, …, nI], then in total the number of samples are: sum(N) * (sum(N)-1) - sum( N * (N-1) )
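The count above can be checked by brute force. The sketch below is illustrative only: both function names are hypothetical, and it assumes an inter-group sample is any ordered pair of images taken from two different groups, which is what the formula implies.

```python
from typing import List, Tuple


def enumerate_inter_samples(
    num_images_per_group: List[int],
) -> List[Tuple[int, int, int, int]]:
    """List every (group1, image1, group2, image2) with group1 != group2."""
    samples = []
    for group1, n1 in enumerate(num_images_per_group):
        for group2, n2 in enumerate(num_images_per_group):
            if group1 == group2:
                continue  # same-group pairs belong to intra-group sampling
            for image1 in range(n1):
                for image2 in range(n2):
                    samples.append((group1, image1, group2, image2))
    return samples


def inter_sample_count(num_images_per_group: List[int]) -> int:
    """Closed form: sum(N) * (sum(N) - 1) - sum(N * (N - 1))."""
    total = sum(num_images_per_group)
    return total * (total - 1) - sum(n * (n - 1) for n in num_images_per_group)


N = [2, 3, 1]
inter_samples = enumerate_inter_samples(N)
```

For N = [2, 3, 1] there are 6 * 5 = 30 ordered pairs in total, of which 2 + 6 + 0 = 8 stay inside one group, leaving 22 inter-group samples.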

Returns

a list of sample indices

get_intra_sample_indices() → list

Calculate the sample indices for intra-group sampling. The index to identify a sample is (group1, image1, group2, image2), which means

  • image1 of group1 is moving image

  • image2 of group2 is fixed image

Assuming group i has ni images, then in total the number of samples are

  • sum( ni * (ni-1) / 2 ) for forward/backward

  • sum( ni * (ni-1) ) for unconstrained
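These counts can also be verified by enumeration. The sketch below is not DeepReg code, and the helper name is hypothetical. It assumes that “forward” keeps pairs where the moving image index is smaller than the fixed image index and “backward” keeps the reverse; the text above does not define the direction, but the counts are the same under either reading.

```python
from typing import List, Tuple


def enumerate_intra_samples(
    num_images_per_group: List[int], intra_group_option: str
) -> List[Tuple[int, int, int, int]]:
    """List (group, image1, group, image2) pairs inside each group."""
    # Assumed meaning of each option (not stated in the documentation).
    keep = {
        "forward": lambda image1, image2: image1 < image2,
        "backward": lambda image1, image2: image1 > image2,
        "unconstrained": lambda image1, image2: image1 != image2,
    }[intra_group_option]
    return [
        (group, image1, group, image2)
        for group, n in enumerate(num_images_per_group)
        for image1 in range(n)
        for image2 in range(n)
        if keep(image1, image2)
    ]


N = [2, 3, 1]
forward = enumerate_intra_samples(N, "forward")
backward = enumerate_intra_samples(N, "backward")
unconstrained = enumerate_intra_samples(N, "unconstrained")
```

For N = [2, 3, 1] the forward and backward options each give 1 + 3 + 0 = 4 samples, and the unconstrained option gives 2 + 6 + 0 = 8.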

Returns

a list of sample indices

sample_index_generator()

Yield (moving_index, fixed_index, image_indices) sequentially, where

  • moving_index = (group1, image1)

  • fixed_index = (group2, image2)

  • image_indices = [group1, image1, group2, image2]
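The shape of the yielded values can be sketched as follows. This stand-alone generator is hypothetical and only mirrors the tuple layout described above; it takes a precomputed list of (group1, image1, group2, image2) samples rather than reading any loader state.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[int, int, int, int]
GroupedIndex = Tuple[int, int]


def index_generator(
    samples: List[Sample],
) -> Iterator[Tuple[GroupedIndex, GroupedIndex, List[int]]]:
    """Yield (moving_index, fixed_index, image_indices) for each sample in order."""
    for group1, image1, group2, image2 in samples:
        moving_index = (group1, image1)
        fixed_index = (group2, image2)
        image_indices = [group1, image1, group2, image2]
        yield moving_index, fixed_index, image_indices


yielded = list(index_generator([(0, 0, 0, 1), (0, 1, 2, 0)]))
```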

validate_data_files()

If the data are labeled, verify image loader and label loader have the same files

File Loader

Interface

class deepreg.dataset.loader.interface.FileLoader(dir_paths: list, name: str, grouped: bool)

Interface / abstract class to load data from multiple directories

Parameters
  • dir_paths – paths to the directories of the data set

  • name – name is used to identify the subdirectories or file names

  • grouped – true if the data is grouped

close()

Close opened file handles, if any exist.

get_data(index: (<class 'int'>, <class 'tuple'>))

Get one data array by specifying an index.

Parameters

index

the data index which is required

  • for paired or unpaired, the index is one single int, data_index

  • for grouped, the index is a tuple of two ints, (group_index, in_group_data_index)

Returns

the data array at the specified index

get_data_ids()

Return the unique IDs of the data in this data set. This function is used to verify the consistency between moving and fixed images and labels.
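A consistency check built on these IDs might look like the sketch below. It is an illustration rather than the library's implementation: ids_are_consistent is a hypothetical helper, and it assumes that two loaders are consistent only if they return the same IDs in the same order.

```python
from typing import Hashable, Sequence


def ids_are_consistent(*id_lists: Sequence[Hashable]) -> bool:
    """Return True if every list holds the same IDs in the same order."""
    if not id_lists:
        return True  # nothing to compare
    first = list(id_lists[0])
    return all(list(ids) == first for ids in id_lists[1:])


# Example IDs shaped like (dir_path, file_name) tuples.
image_ids = [("data/train", "case1"), ("data/train", "case2")]
label_ids = [("data/train", "case1"), ("data/train", "case2")]
short_ids = [("data/train", "case1")]
```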

get_num_groups() → int

Return the number of groups in grouped data set.

Returns

int, number of groups in this data set, if grouped

get_num_images() → int

Return the number of images in this data set.

Returns

int, number of images in this data set

get_num_images_per_group() → List[int]

Return the number of images in each group. Each group must have at least one image.

Returns

a list of integers, representing the number of images in each group.

set_data_structure()

Store the data structure in the memory so that we can retrieve data using data_index

set_group_structure()

In addition to set_data_structure, store the group structure in group_struct so that group_struct[group_index] = list of data_index, and data can be retrieved by data_index = group_struct[group_index][in_group_data_index]
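The lookup through group_struct can be sketched in a few lines. The function resolve_data_index is hypothetical; it only shows how an int index and a (group_index, in_group_data_index) tuple both end up as a flat data_index.

```python
from typing import List, Tuple, Union

Index = Union[int, Tuple[int, int]]


def resolve_data_index(index: Index, group_struct: List[List[int]]) -> int:
    """Map a paired/unpaired int index or a grouped tuple index to data_index."""
    if isinstance(index, int):
        return index  # paired or unpaired data: already a data_index
    group_index, in_group_data_index = index
    return group_struct[group_index][in_group_data_index]


# Two groups: the first holds data 0 and 1, the second holds data 2, 3 and 4.
group_struct = [[0, 1], [2, 3, 4]]
num_images_per_group = [len(group) for group in group_struct]
```

The same structure also gives the per-group image counts directly.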

Nifti Loader

class deepreg.dataset.loader.nifti_loader.NiftiFileLoader(dir_paths: List[str], name: str, grouped: bool)

Generalized loader for nifti files

Parameters
  • dir_paths – paths to the directories of the data set

  • name – name is used to identify the subdirectories or file names

  • grouped – true if the data is grouped

close()

Close opened files

get_data(index: (<class 'int'>, <class 'tuple'>)) → numpy.ndarray

Get one data array by specifying an index

Parameters

index

the data index which is required

  • for paired or unpaired, the index is one single int, data_index

  • for grouped, the index is a tuple of two ints, (group_index, in_group_data_index)

Returns arr

the data array at the specified index

get_data_ids()

Return the unique IDs of the data in this data set. This function is used to verify the consistency between images and labels, and between moving and fixed data.

Returns

data_path_splits but without suffix

get_num_images() → int

Returns

int, number of images in this data set

set_data_structure()

Store the data structure in memory so that data can be retrieved using data_index. This function sets data_path_splits, a list of string tuples that identify the path of each data file.

  • if grouped, a split is (dir_path, group_path, file_name, suffix); data is stored in dir_path/name/group_path/file_name.suffix

  • if not grouped, a split is (dir_path, file_name, suffix); data is stored in dir_path/name/file_name.suffix
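The two split layouts can be illustrated with a small sketch. It is not the loader's code: split_nifti_path and join_nifti_split are hypothetical helpers, paths are assumed to use forward slashes, and the suffix is assumed to be stored without its leading dot, as the file_name.suffix pattern suggests.

```python
from typing import Tuple

# Longest suffix first, so "case.nii.gz" is not mistaken for a ".gz" file.
NIFTI_SUFFIXES = ("nii.gz", "nii")


def split_nifti_path(
    dir_path: str, name: str, file_path: str, grouped: bool
) -> Tuple[str, ...]:
    """Split a file path into the tuple layout described above."""
    prefix = f"{dir_path}/{name}/"
    if not file_path.startswith(prefix):
        raise ValueError(f"{file_path} is not under {prefix}")
    relative = file_path[len(prefix):]
    for suffix in NIFTI_SUFFIXES:
        if relative.endswith("." + suffix):
            stem = relative[: -(len(suffix) + 1)]
            break
    else:
        raise ValueError(f"{file_path} is not a Nifti file")
    if grouped:
        group_path, file_name = stem.rsplit("/", 1)
        return (dir_path, group_path, file_name, suffix)
    return (dir_path, stem, suffix)


def join_nifti_split(name: str, split: Tuple[str, ...]) -> str:
    """Rebuild dir_path/name/[group_path/]file_name.suffix from a split."""
    dir_path, *middle, suffix = split
    return "/".join([dir_path, name, *middle]) + "." + suffix


grouped_path = "data/train/images/group1/case1.nii.gz"
grouped_split = split_nifti_path("data/train", "images", grouped_path, grouped=True)
flat_split = split_nifti_path(
    "data/train", "images", "data/train/images/case2.nii", grouped=False
)
```

Dropping the last element of a split, for example grouped_split[:-1], gives an ID without the suffix.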

set_group_structure()

In addition to set_data_structure, store the group structure in group_struct so that group_struct[group_index] = list of data_index. Data can then be retrieved using (group_index, in_group_data_index), with data_index = group_struct[group_index][in_group_data_index]

deepreg.dataset.loader.nifti_loader.load_nifti_file(file_path: str) → numpy.ndarray

Parameters

file_path – path of a Nifti file with suffix .nii or .nii.gz

Returns

the data as a numpy array

H5 Loader

Loads h5 files and some associated information

class deepreg.dataset.loader.h5_loader.H5FileLoader(dir_paths: List[str], name: str, grouped: bool)

Generalized loader for h5 files

Parameters
  • dir_paths – paths to the directories of the data set

  • name – name is used to identify the subdirectories or file names

  • grouped – true if the data is grouped

close()

Close opened h5 file handles

get_data(index: (<class 'int'>, <class 'tuple'>)) → numpy.ndarray

Get one data array by specifying an index

Parameters

index

the data index which is required

  • for paired or unpaired, the index is one single int, data_index

  • for grouped, the index is a tuple of two ints, (group_index, in_group_data_index)

Returns arr

the data array at the specified index

get_data_ids()

Return the unique IDs of the data in this data set. This function is used to verify the consistency between images and labels, and between moving and fixed data.

Returns

data_path_splits as the data can be identified using dir_path and data_key

get_num_images() → int

Returns

int, number of images in this data set

set_data_structure()

Store the data structure in memory so that data can be retrieved using data_index. This function sets two attributes:

  • h5_files, a dict such that h5_files[dir_path] = opened h5 file handle

  • data_path_splits, a list of string tuples to identify path of data

    • if grouped, a split is (dir_path, group_name, data_key) such that data = h5_files[dir_path]["group-{group_name}-{data_key}"]

    • if not grouped, a split is (dir_path, data_key) such that data = h5_files[dir_path][data_key]
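The relationship between h5 keys and splits can be sketched without opening real files. In the example below, plain dicts stand in for opened h5 file handles, the helper names h5_splits and h5_lookup are hypothetical, and grouped keys are assumed to contain no hyphen inside group_name or data_key.

```python
from typing import Any, Dict, List, Tuple


def h5_splits(h5_files: Dict[str, Dict[str, Any]], grouped: bool) -> List[Tuple[str, ...]]:
    """Build data_path_splits from the keys of each (stand-in) h5 file."""
    splits: List[Tuple[str, ...]] = []
    for dir_path in sorted(h5_files):
        for key in sorted(h5_files[dir_path]):
            if grouped:
                # Assumes the key looks like "group-{group_name}-{data_key}".
                prefix, group_name, data_key = key.split("-")
                if prefix != "group":
                    raise ValueError(f"unexpected key {key}")
                splits.append((dir_path, group_name, data_key))
            else:
                splits.append((dir_path, key))
    return splits


def h5_lookup(h5_files: Dict[str, Dict[str, Any]], split: Tuple[str, ...]) -> Any:
    """Fetch the data identified by a split."""
    if len(split) == 3:
        dir_path, group_name, data_key = split
        return h5_files[dir_path][f"group-{group_name}-{data_key}"]
    dir_path, data_key = split
    return h5_files[dir_path][data_key]


# Dicts standing in for opened h5 handles; values stand in for arrays.
fake_grouped = {"data/train": {"group-1-2": "a", "group-1-1": "b", "group-2-1": "c"}}
fake_flat = {"data/train": {"case1": "x", "case2": "y"}}
grouped_splits = h5_splits(fake_grouped, grouped=True)
flat_splits = h5_splits(fake_flat, grouped=False)
```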

set_group_structure()

Same code as NiftiLoader, as the first two tokens of a split form a group_id.

In addition to set_data_structure, store the group structure in group_struct so that group_struct[group_index] = list of data_index. Data can then be retrieved using (group_index, in_group_data_index), with data_index = group_struct[group_index][in_group_data_index]
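Grouping by the first two tokens of each split can be sketched as follows. The function build_group_struct is hypothetical and assumes that groups are numbered in the order in which their first split appears.

```python
from typing import Dict, List, Sequence, Tuple


def build_group_struct(data_path_splits: Sequence[Tuple[str, ...]]) -> List[List[int]]:
    """Group data indices by the first two tokens of each split."""
    groups: Dict[Tuple[str, ...], List[int]] = {}
    for data_index, split in enumerate(data_path_splits):
        group_id = tuple(split[:2])  # e.g. (dir_path, group_name)
        groups.setdefault(group_id, []).append(data_index)
    # Dicts preserve insertion order, so groups keep first-seen order.
    return list(groups.values())


splits = [
    ("data/train", "1", "1"),
    ("data/train", "1", "2"),
    ("data/train", "2", "1"),
]
group_struct = build_group_struct(splits)
```

With this structure, the grouped index (1, 0) resolves to data_index 2, the only member of the second group.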