Design Experiments

DeepReg dataset loaders use a folder-based (directory-based) file storage approach, in which the user is responsible for organising image and label files in the required file formats and folders. This design was primarily motivated by the need to minimise the risk of data leakage (or information leakage), both during code development and in subsequent applications.

Random-split

Every call of the deepreg_train or deepreg_predict function uses a dataset “physically” separated by folders, including the ‘train’, ‘val’ and ‘test’ sets used in a random-split experiment. In this case, the user needs to randomly assign the available experiment image and label files to the three folders. For more details, see the Dataset loader.
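The random assignment described above can be scripted. The following is a minimal sketch, not part of DeepReg: the function names `assign_splits` and `copy_splits`, the split fractions and the flat source folder are all assumptions made for illustration. It also assumes each sample is a single file; if images and labels are stored as separate files, they would need to be assigned to the same folder together.

```python
import random
import shutil
from pathlib import Path


def assign_splits(file_names, fractions=(0.7, 0.15, 0.15), seed=0):
    """Randomly assign file names to 'train', 'val' and 'test' (hypothetical helper)."""
    shuffled = sorted(file_names)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # the remainder goes to the test set
    }


def copy_splits(src_dir, dst_dir, splits):
    """Copy each file from src_dir into dst_dir/<split name>/ (hypothetical helper)."""
    for split_name, names in splits.items():
        out_dir = Path(dst_dir) / split_name
        out_dir.mkdir(parents=True, exist_ok=True)
        for name in names:
            shutil.copy2(Path(src_dir) / name, out_dir / name)
```

Because every file name appears in exactly one of the three lists, no sample can end up in more than one folder, which is the property the folder-based design relies on.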

Cross-validation

Experiments such as cross-validation can be readily implemented by using the “multi-folder support” in the dataset section of the yaml configuration files. See configuration for details.

For example, in a 3-fold cross-validation, the user may randomly partition the available experiment data files into four folders, ‘fold0’, ‘fold1’, ‘fold2’ and ‘test’. The ‘test’ folder is a hold-out test set. Each run of the 3-fold cross-validation can then be specified in a different yaml file as follows.

“cv_run1.yaml”:

dataset:
  dir:
    train: # training data set
      - "data/test/h5/paired/fold0"
      - "data/test/h5/paired/fold1"
    valid: "data/test/h5/paired/fold2" # validation data set
    test: ""

“cv_run2.yaml”:

dataset:
  dir:
    train: # training data set
      - "data/test/h5/paired/fold0"
      - "data/test/h5/paired/fold2"
    valid: "data/test/h5/paired/fold1" # validation data set
    test: ""

“cv_run3.yaml”:

dataset:
  dir:
    train: # training data set
      - "data/test/h5/paired/fold1"
      - "data/test/h5/paired/fold2"
    valid: "data/test/h5/paired/fold0" # validation data set
    test: ""
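The three configurations above follow a regular pattern: each run holds out one fold for validation and trains on the rest. As a minimal sketch, assuming a hypothetical helper named `cv_dataset_configs` (not a DeepReg function), the dataset sections could be generated for any number of folds:

```python
def cv_dataset_configs(fold_dirs):
    """Build one dataset section per cross-validation run (hypothetical helper).

    Runs are ordered as in the yaml files above: the first run holds out
    the last fold for validation, the last run holds out the first fold.
    """
    configs = []
    for held_out in reversed(range(len(fold_dirs))):
        train_dirs = [d for i, d in enumerate(fold_dirs) if i != held_out]
        configs.append(
            {
                "dataset": {
                    "dir": {
                        "train": train_dirs,
                        "valid": fold_dirs[held_out],
                        "test": "",  # the hold-out test set is not used during cross-validation
                    }
                }
            }
        )
    return configs


folds = [f"data/test/h5/paired/fold{i}" for i in range(3)]
for run, config in enumerate(cv_dataset_configs(folds), start=1):
    print(f"cv_run{run}.yaml:", config["dataset"]["dir"])
```

Each dictionary could then be written to its own file with a YAML library, for example PyYAML's `yaml.safe_dump`.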

To further facilitate flexible use of these dataset loaders, the deepreg_train and deepreg_predict functions also accept multiple yaml files, so the same train section does not have to be repeated for each cross-validation fold or for the test. An example dataset section for configuring testing with deepreg_predict is given below.

“test.yaml”:

dataset:
  dir:
    train: ""
    valid: ""
    test: "data/test/h5/paired/test" # test data set