Dataset Loader
Paired Loader
Load paired image data. Supported formats: h5 and Nifti. Image data can be labeled or unlabeled.
class deepreg.dataset.loader.paired_loader.PairedDataLoader(file_loader, data_dir_paths: List[str], labeled: bool, sample_label: str, seed, moving_image_shape: Union[Tuple[int, …], List[int]], fixed_image_shape: Union[Tuple[int, …], List[int]])
Load paired data using the given file loader. The function sample_index_generator needs to be defined for the GeneratorDataLoader class.
- Parameters
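file_loader – a subclass of FileLoader, used to read the data files
data_dir_paths – paths of the directories storing the data; the data must be saved under four sub-directories: moving_images, fixed_images, moving_labels, fixed_labels
labeled – true if the data are labeled
sample_label – “sample” or “all”; read get_label_indices in deepreg/dataset/util.py for more details
seed – controls the randomness in sampling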
moving_image_shape – (width, height, depth)
fixed_image_shape – (width, height, depth)
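The expected directory layout can be checked before building the loader. The following is a minimal sketch using only the standard library; check_paired_layout is a hypothetical helper, not part of DeepReg, and it assumes that the two label sub-directories are only required when labeled is true.

```python
from pathlib import Path
from typing import List

# Sub-directory names listed in the PairedDataLoader description.
IMAGE_DIRS = ["moving_images", "fixed_images"]
LABEL_DIRS = ["moving_labels", "fixed_labels"]


def check_paired_layout(data_dir_paths: List[str], labeled: bool) -> List[str]:
    """Return the expected sub-directories that are missing (hypothetical helper).

    Assumption: the label sub-directories are only required when labeled=True.
    """
    required = IMAGE_DIRS + (LABEL_DIRS if labeled else [])
    missing = []
    for dir_path in data_dir_paths:
        for sub_dir in required:
            path = Path(dir_path) / sub_dir
            if not path.is_dir():
                missing.append(str(path))
    return missing
```

An empty return value means every directory in data_dir_paths has the required sub-directories.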
sample_index_generator()
Generate indices in order to load data using the GeneratorDataLoader class.
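For paired data the moving and fixed images share one index, so a sample can be described by a single image index. The sketch below assumes that the generator shuffles the image indices reproducibly with random.Random(seed) and yields tuples shaped like (moving_index, fixed_index, image_indices), as the grouped loader does; paired_index_generator and its num_images argument are hypothetical names, not DeepReg's implementation.

```python
import random
from typing import Iterator, List, Optional, Tuple


def paired_index_generator(
    num_images: int, seed: Optional[int] = None
) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (moving_index, fixed_index, image_indices) for paired data (sketch).

    Assumption: a paired sample uses the same index for the moving and the
    fixed image, and the order is shuffled reproducibly with the seed.
    """
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)  # seed=None gives a non-fixed order
    for index in indices:
        yield index, index, [index]
```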
validate_data_files()
Verify that all loaders have the same files.
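The consistency check can be sketched as a comparison of the ID lists that each loader returns from get_data_ids. This standalone sketch assumes that order matters, i.e. the i-th moving image must correspond to the i-th fixed image; ids_are_consistent is a hypothetical name.

```python
from typing import Sequence


def ids_are_consistent(*id_lists: Sequence) -> bool:
    """Return True if all loaders report the same data IDs in the same order (sketch)."""
    if not id_lists:
        return True
    reference = list(id_lists[0])
    return all(list(ids) == reference for ids in id_lists[1:])
```

A loader would typically raise an error when this check fails rather than continue with mismatched pairs.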
Unpaired Loader
Load unpaired data. Supported formats: h5 and Nifti. Image data can be labeled or unlabeled.
class deepreg.dataset.loader.unpaired_loader.UnpairedDataLoader(file_loader, data_dir_paths: List[str], labeled: bool, sample_label: str, seed: int, image_shape: Union[Tuple[int, …], List[int]])
Load unpaired data using the given file loader. Handles both labeled and unlabeled cases. The function sample_index_generator needs to be defined for the GeneratorDataLoader class.
- Parameters
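file_loader – a subclass of FileLoader, used to read the data files
data_dir_paths – paths of the directories storing the data; the data must be saved under two sub-directories: images and labels
labeled – whether the data is labeled.
sample_label – “sample” or “all”; read get_label_indices in deepreg/dataset/util.py for more details
seed – controls the randomness in sampling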
image_shape – (width, height, depth)
close()
Close the moving files opened by the file loaders.
sample_index_generator()
Generate sample indices to load data using the GeneratorDataLoader class.
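Since unpaired images have no predefined partner, one way to form samples is to shuffle all image indices and pair consecutive entries, which gives num_images // 2 samples per epoch. This is a sketch of that strategy only; the function name, the consecutive-pairing rule and the dropping of a leftover image when num_images is odd are assumptions, not confirmed DeepReg behavior.

```python
import random
from typing import Iterator, List, Optional, Tuple


def unpaired_index_generator(
    num_images: int, seed: Optional[int] = None
) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (moving_index, fixed_index, image_indices) for unpaired data (sketch).

    Assumption: images are shuffled, then consecutive images are paired;
    with an odd number of images, the last shuffled image is unused.
    """
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)
    for k in range(num_images // 2):
        moving_index, fixed_index = indices[2 * k], indices[2 * k + 1]
        yield moving_index, fixed_index, [moving_index, fixed_index]
```

Each image appears in at most one sample per epoch, so no image is registered to itself.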
validate_data_files()
Verify that all loaders have the same files. Since the fixed and moving loaders come from the same file_loader, there is no need to check both, which avoids a duplicate check.
Grouped Loader
Load grouped data. Supported formats: h5 and Nifti. Image data can be labeled or unlabeled. Read https://deepreg.readthedocs.io/en/latest/api/loader.html#module-deepreg.dataset.loader.grouped_loader for more details.
class deepreg.dataset.loader.grouped_loader.GroupedDataLoader(file_loader, data_dir_paths: List[str], labeled: bool, sample_label: Optional[str], intra_group_prob: float, intra_group_option: str, sample_image_in_group: bool, seed: Optional[int], image_shape: Union[Tuple[int, …], List[int]])
Load grouped data.
Yield the indices of images to load using sample_index_generator from GeneratorDataLoader. AbstractUnpairedLoader handles the different file formats.
- Parameters
file_loader – a subclass of FileLoader
data_dir_paths – paths of the directories storing the data; the data must be saved under two sub-directories: images and labels
labeled – bool, true if the data is labeled, false if unlabeled
sample_label – “sample” or “all”, read get_label_indices in deepreg/dataset/util.py for more details.
intra_group_prob –
float between 0 and 1,
0 means generating only inter-group samples,
1 means generating only intra-group samples
intra_group_option – str, “forward”, “backward”, or “unconstrained”
sample_image_in_group –
bool,
if true, only one image pair is yielded for each group, so one epoch has num_groups pairs of data,
if false, iterating through this loader generates all possible pairs
seed – controls the randomness in sampling, if seed=None, then the randomness is not fixed
image_shape – list or tuple of length 3, corresponding to (dim1, dim2, dim3) of the 3D image
close()
Close the file loaders.
get_inter_sample_indices() → list
Calculate the sample indices for inter-group sampling. The index identifying a sample is (group1, image1, group2, image2), meaning that
image1 of group1 is the moving image
image2 of group2 is the fixed image
Every ordered pair of images from two different groups is registered. Assuming group $i$ has $n_i$ images for $i = 1, \dots, I$, and writing $S = \sum_{i=1}^{I} n_i$, the total number of samples is $S(S-1) - \sum_{i=1}^{I} n_i(n_i - 1)$, i.e. all ordered pairs of distinct images minus the ordered pairs taken within a single group.
- Returns
a list of sample indices
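The counting formula can be checked by brute force. The standalone sketch below (not DeepReg's implementation; the function names are hypothetical) enumerates every inter-group index (group1, image1, group2, image2) for a list of group sizes and computes the closed-form count for comparison.

```python
from typing import List, Tuple

Index = Tuple[int, int, int, int]  # (group1, image1, group2, image2)


def inter_sample_indices(num_images_per_group: List[int]) -> List[Index]:
    """Enumerate all inter-group samples (sketch).

    image1 of group1 is the moving image and image2 of group2 is the fixed
    image; only pairs with group1 != group2 are kept.
    """
    indices = []
    for group1, n1 in enumerate(num_images_per_group):
        for group2, n2 in enumerate(num_images_per_group):
            if group1 == group2:
                continue
            for image1 in range(n1):
                for image2 in range(n2):
                    indices.append((group1, image1, group2, image2))
    return indices


def expected_inter_count(num_images_per_group: List[int]) -> int:
    """Closed-form count S * (S - 1) - sum(n * (n - 1)), with S the total number of images."""
    total = sum(num_images_per_group)
    return total * (total - 1) - sum(n * (n - 1) for n in num_images_per_group)
```

For example, with group sizes [2, 3] both functions give 12 samples: 5 * 4 = 20 ordered pairs minus 2 + 6 = 8 intra-group pairs.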
get_intra_sample_indices() → list
Calculate the sample indices for intra-group sampling. The index identifying a sample is (group1, image1, group2, image2), meaning that
image1 of group1 is the moving image
image2 of group2 is the fixed image
For intra-group samples, group1 and group2 are the same group. Assuming group $i$ has $n_i$ images, the total number of samples is
$\sum_i n_i(n_i - 1)/2$ for forward or backward
$\sum_i n_i(n_i - 1)$ for unconstrained
- Returns
a list of sample indices
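These counts follow from the fact that a group of n images has n(n - 1)/2 unordered image pairs: a direction constraint keeps one ordering of each pair, while unconstrained sampling keeps both. The sketch below assumes that “forward” pairs an earlier moving image with a later fixed image (by in-group index) and “backward” does the opposite; this interpretation and the function name are assumptions, not taken from the DeepReg source.

```python
from typing import List, Tuple

Index = Tuple[int, int, int, int]  # (group1, image1, group2, image2)


def intra_sample_indices(num_images_per_group: List[int], option: str) -> List[Index]:
    """Enumerate intra-group samples for one intra_group_option (sketch).

    Assumption: "forward" pairs an earlier moving image with a later fixed
    image, "backward" the opposite, and "unconstrained" keeps both directions.
    """
    if option not in ("forward", "backward", "unconstrained"):
        raise ValueError(f"Unknown intra_group_option: {option}")
    indices = []
    for group, n in enumerate(num_images_per_group):
        for i in range(n):
            for j in range(i + 1, n):  # every unordered pair i < j
                if option in ("forward", "unconstrained"):
                    indices.append((group, i, group, j))
                if option in ("backward", "unconstrained"):
                    indices.append((group, j, group, i))
    return indices
```

With group sizes [3, 4], forward and backward each give 3 + 6 = 9 samples and unconstrained gives 18.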
sample_index_generator()
Yield (moving_index, fixed_index, image_indices) sequentially, where
moving_index = (group1, image1)
fixed_index = (group2, image2)
image_indices = [group1, image1, group2, image2]
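One way intra_group_prob could steer the sampling is a biased coin flip per sample between the intra-group and inter-group index pools, followed by conversion into the yielded tuple. This is a hypothetical sketch: grouped_index_generator, its pool arguments and num_samples are illustrative, and DeepReg's own generator (including how sample_image_in_group is handled) may differ.

```python
import random
from typing import Iterator, List, Optional, Tuple

Index = Tuple[int, int, int, int]  # (group1, image1, group2, image2)
Sample = Tuple[Tuple[int, int], Tuple[int, int], List[int]]


def grouped_index_generator(
    intra_indices: List[Index],
    inter_indices: List[Index],
    intra_group_prob: float,
    num_samples: int,
    seed: Optional[int] = None,
) -> Iterator[Sample]:
    """Yield (moving_index, fixed_index, image_indices) tuples (sketch).

    Assumption: each sample is drawn from the intra-group pool with
    probability intra_group_prob and from the inter-group pool otherwise.
    """
    rng = random.Random(seed)
    for _ in range(num_samples):
        use_intra = rng.random() < intra_group_prob  # 0 -> never, 1 -> always
        group1, image1, group2, image2 = rng.choice(
            intra_indices if use_intra else inter_indices
        )
        yield (group1, image1), (group2, image2), [group1, image1, group2, image2]
```

Because random.random() returns values in [0, 1), intra_group_prob=0 yields only inter-group samples and intra_group_prob=1 only intra-group samples, matching the parameter description.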
validate_data_files()
If the data are labeled, verify that the image loader and label loader have the same files.
File Loader
Interface
class deepreg.dataset.loader.interface.FileLoader(dir_paths: list, name: str, grouped: bool)
Interface / abstract class to load data from multiple directories.
- Parameters
dir_paths – paths to the directories of the data set
name – name is used to identify the subdirectories or file names
grouped – true if the data is grouped
close()
Close opened file handles, if any exist.
get_data(index: Union[int, Tuple[int, …]]) → numpy.ndarray
Get one data array by specifying an index.
- Parameters
index –
the data index which is required
for paired or unpaired, the index is one single int, data_index
for grouped, the index is a tuple of two ints, (group_index, in_group_data_index)
- Returns
the data array at the specified index
get_data_ids() → List
Return the unique IDs of the data in this data set. This function is used to verify the consistency between moving and fixed images and labels.
get_num_groups() → int
Return the number of groups in a grouped data set.
- Returns
int, number of groups in this data set, if grouped
get_num_images() → int
Return the number of images in this data set.
- Returns
int, number of images in this data set
get_num_images_per_group() → List[int]
Return the number of images in each group. Each group must have at least one image.
- Returns
a list of integers, representing the number of images in each group.
set_data_structure()
Store the data structure in memory so that data can be retrieved using data_index.
set_group_structure()
In addition to set_data_structure, store the group structure in group_struct so that group_struct[group_index] = list of data_index, and data can be retrieved by data_index = group_struct[group_index][in_group_data_index].
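The two indexing modes of get_data and the role of group_struct can be illustrated with a small in-memory class. This sketch is hypothetical: InMemoryLoader is not part of DeepReg, does not subclass FileLoader, and stores plain Python objects in a dict instead of reading files from dir_paths and returning numpy arrays.

```python
from typing import Any, Dict, List, Tuple, Union


class InMemoryLoader:
    """Hypothetical in-memory loader mirroring the FileLoader methods above.

    Data are given as {group_name: [array, ...]}; a real FileLoader would
    read them from dir_paths instead and return numpy arrays.
    """

    def __init__(self, groups: Dict[str, List[Any]], grouped: bool):
        self.groups = groups
        self.grouped = grouped
        self.set_data_structure()
        if grouped:
            self.set_group_structure()

    def set_data_structure(self):
        # data_splits[data_index] = (group_name, position within the group)
        self.data_splits = [
            (name, pos) for name in sorted(self.groups) for pos in range(len(self.groups[name]))
        ]

    def set_group_structure(self):
        # group_struct[group_index] = list of data_index belonging to that group
        self.group_struct = [
            [i for i, (name, _) in enumerate(self.data_splits) if name == group_name]
            for group_name in sorted(self.groups)
        ]

    def get_data(self, index: Union[int, Tuple[int, int]]) -> Any:
        if isinstance(index, tuple):  # grouped: (group_index, in_group_data_index)
            group_index, in_group_data_index = index
            index = self.group_struct[group_index][in_group_data_index]
        name, pos = self.data_splits[index]
        return self.groups[name][pos]

    def get_num_images(self) -> int:
        return len(self.data_splits)

    def get_num_groups(self) -> int:
        return len(self.group_struct)

    def get_num_images_per_group(self) -> List[int]:
        return [len(data_indices) for data_indices in self.group_struct]
```

With groups {"a": ["a0", "a1"], "b": ["b0"]} and grouped=True, group_struct is [[0, 1], [2]], so get_data((1, 0)) and get_data(2) both return "b0".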
Nifti Loader
class deepreg.dataset.loader.nifti_loader.NiftiFileLoader(dir_paths: List[str], name: str, grouped: bool)
Generalized loader for Nifti files.
- Parameters
dir_paths – paths of the directories containing Nifti files.
name – name is used to identify the subdirectories.
grouped – whether the data is grouped.
close()
Close opened files.
get_data(index: Union[int, Tuple[int, …]]) → numpy.ndarray
Get one data array by specifying an index.
- Parameters
index –
the data index which is required
for paired or unpaired, the index is one single int, data_index
for grouped, the index is a tuple of two ints, (group_index, in_group_data_index)
- Returns
the data array at the specified index
get_data_ids() → List
Return the unique IDs of the data in this data set. This function is used to verify the consistency between images and labels, and between moving and fixed data.
- Returns
data_path_splits without the suffix
get_num_images() → int
Return the number of images in this data set.
- Returns
int, number of images in this data set
set_data_structure()
Store the data structure in memory so that data can be retrieved using data_index. This function sets data_path_splits, a list of string tuples identifying the path of each data file:
if grouped, a split is (dir_path, group_path, file_name, suffix) and the data is stored in dir_path/name/group_path/file_name.suffix
if not grouped, a split is (dir_path, file_name, suffix) and the data is stored in dir_path/name/file_name.suffix
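The split layout can be sketched with pathlib. The helpers below are hypothetical, not DeepReg's code; they assume that, when grouped, every group is a direct sub-directory of dir_path/name, that the suffix is stored without its leading dot (so dir_path/name/file_name.suffix reconstructs the path), and that splits are sorted for a deterministic order.

```python
from pathlib import Path
from typing import List, Tuple

SUFFIXES = (".nii.gz", ".nii")  # check the longer suffix first


def split_nifti_path(path: str) -> Tuple[str, str]:
    """Split a file path into (file_name, suffix), e.g. 'case1.nii.gz' -> ('case1', 'nii.gz')."""
    name = Path(path).name
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)], suffix[1:]
    raise ValueError(f"Not a Nifti file: {path}")


def nifti_data_path_splits(dir_paths: List[str], name: str, grouped: bool) -> List[tuple]:
    """Build data_path_splits following the layout described above (sketch).

    Assumption: when grouped, every group is a direct sub-directory of
    dir_path/name; the splits are sorted to get a deterministic order.
    """
    splits = []
    for dir_path in dir_paths:
        root = Path(dir_path) / name
        pattern = "*/*.nii*" if grouped else "*.nii*"
        for file_path in root.glob(pattern):
            if not str(file_path).endswith(SUFFIXES):
                continue  # skip files such as backups that only contain ".nii"
            file_name, suffix = split_nifti_path(str(file_path))
            if grouped:
                splits.append((dir_path, file_path.parent.name, file_name, suffix))
            else:
                splits.append((dir_path, file_name, suffix))
    return sorted(splits)
```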
set_group_structure()
In addition to set_data_structure, store the group structure in group_struct so that group_struct[group_index] = list of data_index. Data can then be retrieved using (group_index, in_group_data_index) via data_index = group_struct[group_index][in_group_data_index].
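As the H5 loader description notes, the first two tokens of a split form a group_id, so group_struct can be built by grouping data indices on split[:2]. This is a hypothetical sketch (build_group_struct is not a DeepReg function) that assumes groups are numbered in order of first appearance in data_path_splits; it works for both Nifti and h5 splits.

```python
from typing import Dict, List, Tuple


def build_group_struct(data_path_splits: List[tuple]) -> List[List[int]]:
    """Group data indices by the first two tokens of each split (sketch).

    Assumption: groups are ordered by first appearance in data_path_splits.
    group_struct[group_index][in_group_data_index] gives a data_index.
    """
    group_positions: Dict[Tuple, int] = {}
    group_struct: List[List[int]] = []
    for data_index, split in enumerate(data_path_splits):
        group_id = split[:2]  # e.g. (dir_path, group_path) identifies a group
        if group_id not in group_positions:
            group_positions[group_id] = len(group_struct)
            group_struct.append([])
        group_struct[group_positions[group_id]].append(data_index)
    return group_struct
```

Including dir_path in the group_id keeps groups with the same name in different directories apart.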
deepreg.dataset.loader.nifti_loader.load_nifti_file(file_path: str) → numpy.ndarray
Load a Nifti file as a numpy array.
- Parameters
file_path – path of a Nifti file with suffix .nii or .nii.gz
- Returns
the numpy array loaded from the file
H5 Loader
Load h5 files and associated information.
class deepreg.dataset.loader.h5_loader.H5FileLoader(dir_paths: List[str], name: str, grouped: bool)
Generalized loader for h5 files.
- Parameters
dir_paths – paths of the directories containing the h5 files.
name – name is used to identify the file names.
grouped – whether the data is grouped.
close()
Close opened h5 file handles.
get_data(index: Union[int, Tuple[int, …]]) → numpy.ndarray
Get one data array by specifying an index.
- Parameters
index –
the data index which is required
for paired or unpaired, the index is one single int, data_index
for grouped, the index is a tuple of two ints, (group_index, in_group_data_index)
- Returns
the data array at the specified index
get_data_ids() → List
Get the unique IDs of the data in this data set, used to verify the consistency between images and labels, and between moving and fixed data.
- Returns
data_path_splits, since each data array can be identified by its dir_path and data_key
get_num_images() → int
Return the number of images in this data set.
- Returns
int, number of images in this data set
set_data_structure()
Store the data structure in memory so that data can be retrieved using data_index. This function sets two attributes:
h5_files, a dict such that h5_files[dir_path] = opened h5 file handle
data_path_splits, a list of string tuples to identify path of data
if grouped, a split is (dir_path, group_name, data_key) such that data = h5_files[dir_path]["group-{group_name}-{data_key}"]
if not grouped, a split is (dir_path, data_key) such that data = h5_files[dir_path][data_key]
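Building the splits mostly amounts to parsing the dataset keys of each opened file. The sketch below is hypothetical: it takes the keys already read from each file (with h5py, for example, list(h5_files[dir_path].keys())), assumes grouped keys have the form "group-{group_name}-{data_key}" with no "-" inside group_name, and sorts keys only to get a deterministic order.

```python
from typing import Dict, List


def h5_data_path_splits(keys_per_file: Dict[str, List[str]], grouped: bool) -> List[tuple]:
    """Build data_path_splits from the dataset keys of each opened h5 file (sketch).

    keys_per_file maps dir_path to the keys of h5_files[dir_path]. When grouped,
    keys are assumed to look like "group-{group_name}-{data_key}" with no "-"
    inside group_name.
    """
    splits = []
    for dir_path, keys in keys_per_file.items():
        for key in sorted(keys):
            if grouped:
                prefix, group_name, data_key = key.split("-", 2)
                if prefix != "group":
                    raise ValueError(f"Unexpected grouped key: {key}")
                splits.append((dir_path, group_name, data_key))
            else:
                splits.append((dir_path, key))
    return splits
```

Rebuilding the key from a grouped split with f"group-{group_name}-{data_key}" gives back the original dataset name.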
set_group_structure()
As in NiftiFileLoader, the first two tokens of a split form a group_id. Store the group structure in group_struct so that group_struct[group_index] = list of data_index. Retrieve data using (group_index, in_group_data_index) via data_index = group_struct[group_index][in_group_data_index].