Dataset Loader

Paired Loader

Load paired image data. Supported formats: h5 and Nifti. Image data can be labeled or unlabeled.

class deepreg.dataset.loader.paired_loader.PairedDataLoader(file_loader, data_dir_paths: List[str], labeled: bool, sample_label: str, seed, moving_image_shape: Union[Tuple[int, ], List[int]], fixed_image_shape: Union[Tuple[int, ], List[int]])

Load paired data using given file loader. The function sample_index_generator needs to be defined for the GeneratorDataLoader class.

  • file_loader

  • data_dir_paths – paths of the directories storing data; the data must be saved under four sub-directories: moving_images, fixed_images, moving_labels, fixed_labels

  • labeled – true if the data are labeled

  • sample_label

  • seed

  • moving_image_shape – (width, height, depth)

  • fixed_image_shape – (width, height, depth)
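The directory layout described for data_dir_paths can be checked before constructing a loader. The following is a minimal sketch, not part of DeepReg: missing_subdirs is a hypothetical helper, and it assumes that unlabeled data only needs the two image sub-directories.

```python
import os
import tempfile
from typing import List

# Sub-directory names taken from the data_dir_paths description above.
IMAGE_DIRS = ("moving_images", "fixed_images")
LABEL_DIRS = ("moving_labels", "fixed_labels")


def missing_subdirs(data_dir: str, labeled: bool) -> List[str]:
    """Return the required sub-directories that are absent from data_dir.

    Assumption: label sub-directories are only required when labeled is True.
    """
    required = IMAGE_DIRS + LABEL_DIRS if labeled else IMAGE_DIRS
    return [d for d in required if not os.path.isdir(os.path.join(data_dir, d))]


# Usage: a temporary directory that contains only the image sub-directories.
with tempfile.TemporaryDirectory() as data_dir:
    for sub_dir in IMAGE_DIRS:
        os.makedirs(os.path.join(data_dir, sub_dir))
    missing_unlabeled = missing_subdirs(data_dir, labeled=False)
    missing_labeled = missing_subdirs(data_dir, labeled=True)
```

In this example the unlabeled check finds nothing missing, while the labeled check reports the two label sub-directories.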


Generate indexes in order to load data using the GeneratorDataLoader class.


Verify all loaders have the same files.

Unpaired Loader

Load unpaired data. Supported formats: h5 and Nifti. Image data can be labeled or unlabeled.

class deepreg.dataset.loader.unpaired_loader.UnpairedDataLoader(file_loader, data_dir_paths: List[str], labeled: bool, sample_label: str, seed: int, image_shape: Union[Tuple[int, ], List[int]])

Load unpaired data using given file loader. Handles both labeled and unlabeled cases. The function sample_index_generator needs to be defined for the GeneratorDataLoader class.

Load data which are unpaired, labeled or unlabeled.

  • file_loader

  • data_dir_paths – paths of the directories storing data; the data must be saved under two sub-directories: images, labels

  • labeled – whether the data is labeled.

  • sample_label

  • seed

  • image_shape – (width, height, depth)


Close the moving files opened by the file_loaders.


Generates sample indexes to load data using the GeneratorDataLoader class.


Verify all loaders have the same files. Since fixed and moving loaders come from the same file_loader, there is no need to check both (this avoids a duplicate check).

Grouped Loader

Load grouped data. Supported formats: h5 and Nifti. Image data can be labeled or unlabeled.

class deepreg.dataset.loader.grouped_loader.GroupedDataLoader(file_loader, data_dir_paths: List[str], labeled: bool, sample_label: Optional[str], intra_group_prob: float, intra_group_option: str, sample_image_in_group: bool, seed: Optional[int], image_shape: Union[Tuple[int, ], List[int]])

Load grouped data.

Yield indexes of images to load using sample_index_generator from GeneratorDataLoader. AbstractUnpairedLoader handles different file formats.

  • file_loader – a subclass of FileLoader

  • data_dir_paths

    paths of the directories storing data; the data must be saved under two sub-directories:

    • images

    • labels

  • labeled – bool, true if the data is labeled, false if unlabeled

  • sample_label – “sample” or “all”, read get_label_indices in deepreg/dataset/ for more details.

  • intra_group_prob

    float between 0 and 1,

    • 0 means generating only inter-group samples,

    • 1 means generating only intra-group samples

  • intra_group_option – str, “forward”, “backward”, or “unconstrained”

  • sample_image_in_group


    • if true, only one image pair will be yielded for each group, so one epoch has num_groups pairs of data,

    • if false, iterating through this loader will generate all possible pairs

  • seed – controls the randomness in sampling, if seed=None, then the randomness is not fixed

  • image_shape – list or tuple of length 3, corresponding to (dim1, dim2, dim3) of the 3D image
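One way to read intra_group_prob is as the per-sample probability of drawing an intra-group pair rather than an inter-group pair. The following is a minimal sketch under that assumption, not the GroupedDataLoader implementation; choose_sampling_modes is a hypothetical helper.

```python
import random
from typing import List


def choose_sampling_modes(
    intra_group_prob: float, num_samples: int, seed: int = 0
) -> List[str]:
    """Decide, per sample, between intra-group and inter-group sampling."""
    if not 0.0 <= intra_group_prob <= 1.0:
        raise ValueError("intra_group_prob must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the choices reproducible
    modes = []
    for _ in range(num_samples):
        # rng.random() lies in [0, 1), so a probability of 0 never selects
        # "intra" and a probability of 1 always does.
        modes.append("intra" if rng.random() < intra_group_prob else "inter")
    return modes


only_inter = choose_sampling_modes(0.0, num_samples=5)
only_intra = choose_sampling_modes(1.0, num_samples=5)
```

With a fixed seed the sequence of choices repeats between runs, which mirrors the role of the seed parameter described above.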


Close file loaders


Calculate the sample indices for inter-group sampling. The index identifying a sample is (group1, image1, group2, image2), meaning that

  • image1 of group1 is moving image

  • image2 of group2 is fixed image

All pairs of images in the dataset are registered. Assuming group i has ni images and that N=[n1, n2, …, nI], the total number of samples is: sum(N) * (sum(N)-1) - sum( N * (N-1) )
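The inter-group count can be checked by brute force. The following is a minimal sketch, not DeepReg's implementation; enumerate_inter_group_samples, inter_group_count, and the group sizes are hypothetical.

```python
from typing import List, Tuple


def enumerate_inter_group_samples(
    num_images_per_group: List[int],
) -> List[Tuple[int, int, int, int]]:
    """List every (group1, image1, group2, image2) with group1 != group2."""
    samples = []
    for group1, n1 in enumerate(num_images_per_group):
        for image1 in range(n1):
            for group2, n2 in enumerate(num_images_per_group):
                if group1 == group2:
                    continue  # same group: that would be an intra-group pair
                for image2 in range(n2):
                    samples.append((group1, image1, group2, image2))
    return samples


def inter_group_count(num_images_per_group: List[int]) -> int:
    """Closed form: sum(N) * (sum(N) - 1) - sum(N * (N - 1))."""
    total = sum(num_images_per_group)
    return total * (total - 1) - sum(n * (n - 1) for n in num_images_per_group)


group_sizes = [2, 3, 1]  # hypothetical: three groups with 2, 3 and 1 images
samples = enumerate_inter_group_samples(group_sizes)
```

The first term counts all ordered pairs of distinct images, and the second term subtracts the ordered pairs that fall inside a single group.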


a list of sample indices


Calculate the sample indices for intra-group sampling. The index identifying a sample is (group1, image1, group2, image2), meaning that

  • image1 of group1 is moving image

  • image2 of group2 is fixed image

Assuming group i has ni images, the total number of samples is

  • sum( ni * (ni-1) / 2 ) for forward/backward

  • sum( ni * (ni-1) ) for unconstrained
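The intra-group counts can be reproduced by enumeration. The following is a minimal sketch, not DeepReg's implementation; enumerate_intra_group_samples is a hypothetical helper, and it assumes that "forward" keeps pairs whose moving image index is smaller than the fixed image index, with "backward" as the reverse.

```python
from typing import List, Tuple

VALID_OPTIONS = ("forward", "backward", "unconstrained")


def enumerate_intra_group_samples(
    num_images_per_group: List[int], option: str
) -> List[Tuple[int, int, int, int]]:
    """List (group, image1, group, image2) pairs within each group."""
    if option not in VALID_OPTIONS:
        raise ValueError(f"option must be one of {VALID_OPTIONS}, got {option!r}")
    samples = []
    for group, n in enumerate(num_images_per_group):
        for image1 in range(n):
            for image2 in range(n):
                if image1 == image2:
                    continue  # an image is never paired with itself
                # Assumption: forward means moving index < fixed index.
                if option == "forward" and image1 > image2:
                    continue
                if option == "backward" and image1 < image2:
                    continue
                samples.append((group, image1, group, image2))
    return samples


group_sizes = [2, 3, 1]  # hypothetical group sizes
num_forward = len(enumerate_intra_group_samples(group_sizes, "forward"))
num_backward = len(enumerate_intra_group_samples(group_sizes, "backward"))
num_unconstrained = len(enumerate_intra_group_samples(group_sizes, "unconstrained"))
```

Forward and backward each keep half of the ordered pairs in a group, which is where the division by 2 comes from.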


a list of sample indices


Yield (moving_index, fixed_index, image_indices) sequentially, where

  • moving_index = (group1, image1)

  • fixed_index = (group2, image2)

  • image_indices = [group1, image1, group2, image2]
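The yielded triple can be derived from a list of sample indices as follows. This is a minimal sketch of the tuple layout only; iterate_samples is a hypothetical name, and shuffling or random sampling is left out.

```python
from typing import Iterator, List, Sequence, Tuple

Index = Tuple[int, int]


def iterate_samples(
    sample_indices: Sequence[Tuple[int, int, int, int]],
) -> Iterator[Tuple[Index, Index, List[int]]]:
    """Yield (moving_index, fixed_index, image_indices) for each sample."""
    for group1, image1, group2, image2 in sample_indices:
        moving_index = (group1, image1)
        fixed_index = (group2, image2)
        image_indices = [group1, image1, group2, image2]
        yield moving_index, fixed_index, image_indices


# Usage with two hypothetical samples.
yielded = list(iterate_samples([(0, 1, 2, 0), (1, 0, 1, 2)]))
```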


If the data are labeled, verify image loader and label loader have the same files.

File Loader


class deepreg.dataset.loader.interface.FileLoader(dir_paths: list, name: str, grouped: bool)

Interface / abstract class to load data from multiple directories.

  • dir_paths – path to the directory of the data set

  • name – name is used to identify the subdirectories or file names

  • grouped – true if the data is grouped


Close opened file handles, if any exist.

get_data(index: Union[int, Tuple[int, ]]) → numpy.ndarray

Get one data array by specifying an index.



the data index which is required

  • for paired or unpaired, the index is one single int, data_index

  • for grouped, the index is a tuple of two ints, (group_index, in_group_data_index)


the data array at the specified index


Return the unique IDs of the data in this data set. This function is used to verify the consistency between moving and fixed images and labels.
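A consistency check built on these IDs might look like the following. This is a minimal sketch, not DeepReg's validation code; check_same_ids and the example IDs are hypothetical.

```python
from typing import Hashable, Sequence


def check_same_ids(
    ids_a: Sequence[Hashable], ids_b: Sequence[Hashable], name_a: str, name_b: str
) -> None:
    """Raise ValueError unless both loaders list the same IDs in the same order."""
    if list(ids_a) != list(ids_b):
        only_a = sorted(set(ids_a) - set(ids_b), key=str)
        only_b = sorted(set(ids_b) - set(ids_a), key=str)
        raise ValueError(
            f"{name_a} and {name_b} differ: "
            f"only in {name_a}: {only_a}, only in {name_b}: {only_b}"
        )


# Hypothetical IDs: one tuple per data item.
moving_ids = [("data/train", "case1"), ("data/train", "case2")]
fixed_ids = [("data/train", "case1"), ("data/train", "case2")]
check_same_ids(moving_ids, fixed_ids, "moving_images", "fixed_images")  # passes
```

Comparing the lists in order, rather than as sets, also catches the case where both loaders hold the same files but would pair them differently.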


Return the number of groups in grouped data set.


int, number of groups in this data set, if grouped


Return the number of images in this data set.


int, number of images in this data set


Return the number of images in each group. Each group must have at least one image.


a list of integers, representing the number of images in each group.


Store the data structure in memory to retrieve data using data_index.


In addition to set_data_structure, store the group structure in group_struct so that group_struct[group_index] = list of data_index, and data can be retrieved by data_index = group_struct[group_index][in_group_data_index].
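The two-level lookup through group_struct can be illustrated as follows. This is a minimal sketch, not DeepReg's implementation; build_group_struct and the group IDs are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def build_group_struct(group_ids: Sequence[Hashable]) -> List[List[int]]:
    """Group flat data indexes by group ID.

    group_ids[data_index] is the group ID of that data item; groups are
    numbered in order of first appearance.
    """
    group_index_of: Dict[Hashable, int] = {}
    group_struct: List[List[int]] = []
    for data_index, group_id in enumerate(group_ids):
        if group_id not in group_index_of:
            group_index_of[group_id] = len(group_struct)
            group_struct.append([])
        group_struct[group_index_of[group_id]].append(data_index)
    return group_struct


group_ids = ["a", "a", "b", "a", "b"]  # hypothetical group ID per data item
group_struct = build_group_struct(group_ids)

# A grouped index (group_index, in_group_data_index) maps to a flat data_index.
group_index, in_group_data_index = 1, 0
data_index = group_struct[group_index][in_group_data_index]
```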

Nifti Loader

class deepreg.dataset.loader.nifti_loader.NiftiFileLoader(dir_paths: List[str], name: str, grouped: bool)

Generalized loader for nifti files.


  • dir_paths – path of directories having nifti files.

  • name – name is used to identify the subdirectories.

  • grouped – whether the data is grouped.


Close opened files.

get_data(index: Union[int, Tuple[int, ]]) → numpy.ndarray

Get one data array by specifying an index.



the data index which is required

  • for paired or unpaired, the index is one single int, data_index

  • for grouped, the index is a tuple of two ints, (group_index, in_group_data_index)

Returns arr

the data array at the specified index


Return the unique IDs of the data in this data set. This function is used to verify the consistency between images and labels, moving and fixed.


data_path_splits but without suffix


int, number of images in this data set


Store the data structure in memory so that data can be retrieved using data_index. This function sets data_path_splits, a list of string tuples identifying the path of each data item:

  • if grouped, a split is (dir_path, group_path, file_name, suffix), and the data is stored in dir_path/name/group_path/file_name.suffix

  • if not grouped, a split is (dir_path, file_name, suffix), and the data is stored in dir_path/name/file_name.suffix
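The split layout can be illustrated with plain string handling. This is a minimal sketch, not the NiftiFileLoader code; make_split and join_split are hypothetical helpers, paths are assumed to use "/" separators, and the suffix is assumed to be stored without a leading dot.

```python
from pathlib import PurePosixPath
from typing import Tuple

SUFFIXES = ("nii.gz", "nii")  # the longer suffix is checked first


def split_file_name(file_name: str) -> Tuple[str, str]:
    """Split 'case1.nii.gz' into ('case1', 'nii.gz')."""
    for suffix in SUFFIXES:
        if file_name.endswith("." + suffix):
            return file_name[: -len(suffix) - 1], suffix
    raise ValueError(f"not a Nifti file name: {file_name}")


def make_split(dir_path: str, rel_path: str, grouped: bool) -> Tuple[str, ...]:
    """Build a split from a path given relative to dir_path/name."""
    parts = PurePosixPath(rel_path).parts
    if len(parts) != (2 if grouped else 1):
        raise ValueError(f"unexpected path depth: {rel_path}")
    stem, suffix = split_file_name(parts[-1])
    # Grouped splits carry the group directory between dir_path and file name.
    return (dir_path, *parts[:-1], stem, suffix)


def join_split(split: Tuple[str, ...], name: str) -> str:
    """Rebuild dir_path/name/[group_path/]file_name.suffix from a split."""
    dir_path, *middle, file_name, suffix = split
    return "/".join([dir_path, name, *middle, f"{file_name}.{suffix}"])


ungrouped = make_split("data/train", "case1.nii.gz", grouped=False)
grouped = make_split("data/train", "group1/case1.nii", grouped=True)
```

Joining a split back with name, for example "images", gives the file location described in the bullets above.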


In addition to set_data_structure, store the group structure in group_struct so that group_struct[group_index] = list of data_index. Data can then be retrieved using (group_index, in_group_data_index): data_index = group_struct[group_index][in_group_data_index].

deepreg.dataset.loader.nifti_loader.load_nifti_file(file_path: str) → numpy.ndarray

file_path – path of a Nifti file with suffix .nii or .nii.gz


the data loaded from the file, as a numpy array

H5 Loader

Load h5 files and associated information.

class deepreg.dataset.loader.h5_loader.H5FileLoader(dir_paths: List[str], name: str, grouped: bool)

Generalized loader for h5 files.


  • dir_paths – path of h5 files.

  • name – name is used to identify the file names.

  • grouped – whether the data is grouped.


Close opened h5 file handles.

get_data(index: Union[int, Tuple[int, ]]) → numpy.ndarray

Get one data array by specifying an index.



the data index which is required

  • for paired or unpaired, the index is one single int, data_index

  • for grouped, the index is a tuple of two ints, (group_index, in_group_data_index)

Returns arr

the data array at the specified index


Get the unique IDs of data in this data set to verify consistency between images and labels, moving and fixed.


data_path_splits as the data can be identified using dir_path and data_key


int, number of images in this data set


Store the data structure in memory so that we can retrieve data using data_index. This function sets two attributes:

  • h5_files, a dict such that h5_files[dir_path] = opened h5 file handle

  • data_path_splits, a list of string tuples to identify path of data

    • if grouped, a split is (dir_path, group_name, data_key) such that data = h5_files[dir_path]["group-{group_name}-{data_key}"]

    • if not grouped, a split is (dir_path, data_key) such that data = h5_files[dir_path][data_key]
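The key convention for h5 files can be illustrated without opening a file. This is a minimal sketch, not the H5FileLoader code; split_h5_key and key_from_split are hypothetical helpers, and group_name is assumed to contain no hyphen.

```python
from typing import Tuple


def split_h5_key(dir_path: str, key: str, grouped: bool) -> Tuple[str, ...]:
    """Turn an h5 dataset key into a data_path_splits entry."""
    if not grouped:
        return (dir_path, key)
    # Assumption: keys look like "group-{group_name}-{data_key}" and
    # group_name has no hyphen, so at most two splits are needed.
    tokens = key.split("-", 2)
    if len(tokens) != 3 or tokens[0] != "group":
        raise ValueError(f"unexpected grouped key: {key}")
    _, group_name, data_key = tokens
    return (dir_path, group_name, data_key)


def key_from_split(split: Tuple[str, ...], grouped: bool) -> str:
    """Rebuild the h5 dataset key from a split."""
    if grouped:
        _, group_name, data_key = split
        return f"group-{group_name}-{data_key}"
    _, data_key = split
    return data_key


grouped_split = split_h5_key("data/train", "group-1-2", grouped=True)
ungrouped_split = split_h5_key("data/train", "case1", grouped=False)
```

The first element of each split selects the opened file handle in h5_files, and the rebuilt key selects the dataset inside that file.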


Similar to NiftiFileLoader, as the first two tokens of a split form a group_id. Store the group structure in group_struct so that group_struct[group_index] = list of data_index. Retrieve data using (group_index, in_group_data_index), where data_index = group_struct[group_index][in_group_data_index].