Authors: Alexandre Hutton $^{1}$, Maria Teleńczuk $^{2}$, Swetha Shanker $^{2}$, Guillaume Lemaitre $^{2}$, François Caud $^{2}$, Alexandre Gramfort $^{2}$, Sook-Lei Liew $^{1}$
$^{1}$: Neural Plasticity & Neurorehabilitation Lab (Univ. of Southern California), USA
$^{2}$: Paris-Saclay Center for Data Science (Dataia - Univ. Paris Saclay - Inria), France
Stroke is the leading cause of adult disability worldwide, with up to two-thirds of affected individuals experiencing long-term disabilities. Large-scale neuroimaging studies have shown promise in identifying robust biomarkers (e.g. measures of brain structure) of long-term stroke recovery following rehabilitation. However, analyzing large rehabilitation-related datasets is problematic due to barriers in accurate stroke lesion segmentation. Manually-traced lesions are currently the gold standard for lesion segmentation on T1-weighted MRIs, but require anatomical expertise and are labor-intensive. Further, manual segmentation is subjective with raters producing different results. While algorithms have been developed to automate this process, the resulting lesion masks often lack the accuracy needed to make them reliable information. Newer algorithms that employ machine-learning and deep learning techniques are promising avenues, but they require large, diverse datasets for training and testing and developing generalizable models. In this challenge, training can be performed on our public ATLAS 2.0 dataset, and testing is done with a private dataset comprised of multi-site data from the same sites as ATLAS 2.0.
For more information refer to: Anatomical Tracings of Lesions After Stroke
In this challenge you will be given 3D medical images (T1 MRI scans in nii.gz
format) of stroke patients and the files with the corresponding lesions (binary masks) traced manually by experts. Your algorithm will be scored on the overlap between your prediction and the segmentation mask.
The following cell will install the required package dependencies, if necessary. You can examine the file, requirements.txt
, included in the repo to view the list of dependencies.
NOTE: Due to the structure of the challenge, libraries not included in requirements.txt
will need to be added via a pull request to the GitHub repo.
import sys
!{sys.executable} -m pip install -r requirements.txt
To get this notebook running and test your models locally using ramp-test
(from ramp-workflow), we recommend that you use the Anaconda or Miniconda Python distribution.
You will need four things to be able to get a local copy of the data:
You can get a local (encrypted) copy of the data by going to the NITRC website or running the following cell. The relevant code is commented to prevent accidentally downloading the (large) data. Make sure to uncomment the last line.
Note that the data is approx. 15GB; ensure that you have enough space.
import download_data
# This will take some time; you can skip data validation by setting check_hash=False, but that is not recommended.
### Uncomment this line to download the data:
# download_data.data_fetch(check_hash=True)
The data is encrypted; to obtain the password to decrypt the data, you'll need to agree to the terms of use found on this form. Please be careful not to close the tab before acquiring the decryption key.
If you are running either Linux or MacOS, you can run the following cell. You will be prompted for the password obtained in step 2; you can copy+paste it.
import getpass, subprocess
## Decrypt the data.
subprocess.call(['openssl', 'aes-256-cbc', '-md', 'sha256',
'-d', '-a', '-in',
'ATLAS_R2.0_encrypted.tar.gz', '-out', 'ATLAS_R2.0.tar.gz',
'-pass', f'pass:{getpass.getpass("Enter password")}'])
subprocess.call(['tar', '-xzf', 'ATLAS_R2.0.tar.gz'])
*** WARNING : deprecated key derivation used. Using -iter or -pbkdf2 would be better.
0
For Windows, you'll need to install OpenSSL, open the terminal, then run the following from that window:
openssl aes-256-cbc -md sha256 -d -a -in ATLAS_R2.0_encrypted.tar.gz -out ATLAS_R2.0.tar.gz
Next, extract the .tar.gz archive. You should be left with a directory called ATLAS_2 once it is extracted.
Unfortunately, the data provided via INDI is not compatible with PyBIDS, which is used throughout the challenge. You can run the following cell to correct the formatting; update the values as appropriate. WARNING: This will move the data files on your disk.
from stroke import indi_reformat
## Uncomment the following lines to correct the formatting.
atlas_2_path = 'ATLAS_R2.0/ATLAS_2' # Assumes that it is in the current directory. Set this to the location on your drive.
destination = 'data' # Assumes that we're in the stroke/ directory.
indi_reformat.bidsify_indi_atlas(atlas_path=atlas_2_path, destination_path=destination)
In your data
folder you should now be able to find the test
and train
directories, each containing BIDS directories. Subject directories contain two files:
sub-X_ses-Y_space-MNI152NLin2009aSym_T1w.nii.gz
sub-X_ses-Y_space-MNI152NLin2009aSym_label-L_desc-T1lesion_mask.nii.gz
If you wish to view any of those files from outside of Python, you can use applications like ITK-snap. There, you can load the structural file as the main image and the lesion mask as a segmentation to overlap the two.
To load the images to Python, we will be using PyBIDS. See this link to read details about BIDS.
If you want to use a libarary which is not listed in requirements.txt
or extra_libraries.txt
, please make a pull request to the the Stroke Lesion Segmentation Challenge repository on GitHub by adding the required library to extra_libraries.txt file).
import bids
from bids import BIDSLayout
bids.config.set_option('extension_initial_dot', True) # Warning suppression
data = BIDSLayout('data/train', derivatives=True).derivatives['ATLAS']
print(data)
/home/frcaud/anaconda3/envs/ramp-stroke/lib/python3.8/site-packages/bids/config.py:39: FutureWarning: Setting 'extension_initial_dot' will be removed in pybids 0.16. warnings.warn("Setting 'extension_initial_dot' will be removed in pybids 0.16.",
BIDS Layout: ...s/data/train/derivatives/ATLAS | Subjects: 436 | Sessions: 436 | Runs: 0
We see that there are 436 subjects in the training set with a total of 436 sessions.
# Pick random sample
import random
subjects = data.entities['subject'].unique()
sample = random.choice(subjects)
structural = data.get(subject=sample, suffix='T1w')[0]
mask = data.get(subject=sample, suffix='mask')[0]
print(structural.filename)
print(mask.filename)
sub-r031s022_ses-1_space-MNI152NLin2009aSym_T1w.nii.gz sub-r031s022_ses-1_space-MNI152NLin2009aSym_label-L_desc-T1lesion_mask.nii.gz
%matplotlib inline
from stroke.nii_slice import qc_slice
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = [14, 6]
qc_slice(structural.path, mask_path=mask.path, nslices=5)
# Beautify plot
plt.text(0,0, f'{sample}', bbox={'facecolor': 'white'}); # Label which subject
plt.tick_params(axis='both', labelbottom=False, labelleft=False)
/home/frcaud/anaconda3/envs/ramp-stroke/lib/python3.8/site-packages/bids/config.py:39: FutureWarning: Setting 'extension_initial_dot' will be removed in pybids 0.16. warnings.warn("Setting 'extension_initial_dot' will be removed in pybids 0.16.", /home/frcaud/anaconda3/envs/ramp-stroke/lib/python3.8/site-packages/bids/config.py:39: FutureWarning: Setting 'extension_initial_dot' will be removed in pybids 0.16. warnings.warn("Setting 'extension_initial_dot' will be removed in pybids 0.16.", /home/frcaud/anaconda3/envs/ramp-stroke/lib/python3.8/site-packages/bids/config.py:39: FutureWarning: Setting 'extension_initial_dot' will be removed in pybids 0.16. warnings.warn("Setting 'extension_initial_dot' will be removed in pybids 0.16.",
In the above image, we see slices for each plane. The masked stroke is denoted in red, although the slices may not intersect the lesion mask for every subject (re-run the last two cells to display a new subject). Some preprocessing has already been done on the images; you can review the preprocessing steps here.
Due to the unreliable performance of non-linear registration with subjects with pathologies, the data was registered linearly. If you examine multiple subjects, you may notice that although the image dimensions are the same and the brains are roughly in the right area, there are differences between subjects (e.g., dimensions of the head, location of specific structures). This is typical of neuroimaging datasets.
Before we look closer at the lesions, let's look at some of the image properties:
print(f'Shape of structural scan: {structural.get_image().shape}')
print(structural.get_image().header)
Shape of structural scan: (197, 233, 189) <class 'nibabel.nifti1.Nifti1Header'> object, endian='<' sizeof_hdr : 348 data_type : b'' db_name : b'' extents : 0 session_error : 0 regular : b'' dim_info : 0 dim : [ 3 197 233 189 1 1 1 1] intent_p1 : 0.0 intent_p2 : 0.0 intent_p3 : 0.0 intent_code : none datatype : float32 bitpix : 32 slice_start : 0 pixdim : [1. 1. 1. 1. 0. 1. 1. 1.] vox_offset : 0.0 scl_slope : nan scl_inter : nan slice_end : 0 slice_code : unknown xyzt_units : 10 cal_max : 0.0 cal_min : 0.0 slice_duration : 0.0 toffset : 0.0 glmax : 0 glmin : 0 descrip : b'FreeSurfer Jan 18 2017' aux_file : b'' qform_code : scanner sform_code : scanner quatern_b : 0.0 quatern_c : 0.0 quatern_d : 0.0 qoffset_x : -98.0 qoffset_y : -134.0 qoffset_z : -72.0 srow_x : [ 1. 0. 0. -98.] srow_y : [ 0. 1. 0. -134.] srow_z : [ 0. 0. 1. -72.] intent_name : b'' magic : b'n+1'
The previous cell displays some of the metadata about the file. You can read here for more information, but the shape of the data is most relevant. Every image contains 197x233x189 (8.6M) voxels. Consequently, it is not usually possible to load the entire dataset into memory at once. We will see the effects of this in later sections.
Loading individual files is fine for examining the data, but we need to ensure that our data and masks are paired, even if new sessions or modalities are added later. We've included a loader for this in bids_loader.py
, as BIDSLoader
. We can look at its docstrings to see the expected input:
from stroke.bids_loader import BIDSLoader
help(BIDSLoader)
Help on class BIDSLoader in module stroke.bids_loader: class BIDSLoader(builtins.object) | BIDSLoader(root_dir: str, data_entities: list, target_entities: list, batch_size: int = 1, data_derivatives_names: list = None, target_derivatives_names: list = None, root_list: list = None, label_names: list = None) | | BIDS-compatible data loader used for classifying BIDS datasets. | Parameters | ---------- | root_dir : str | BIDS root directory; subject directories should be immediately below this (e.g. root_dir/sub-123) | data_entities : list [dict] | List of dictionaries, where each dictionary contains BIDS entities that will uniquely match data. Multiple | dictionaries should be used to if multiple files will be used for prediction. | Empty entries ({'subject': ''}) indicate that that entry should match across samples (e.g., [{'session': ''}] | would ensure that entities from a returned sample are from the same session, but that any value is valid. | For example: ({'subject': '', 'session': '1', 'desc': 'Normalized'}, {'session': '1', 'desc': 'defaced'}) would | return samples of two images: The first could be 'sub-123_ses-1_desc-Normalized.nii.gz', and the second would be | 'sub-123_ses-1_desc-defaced.nii.gz'. The subject entity matches, session is restricted to "1", and the | description entitiy is used to differentiate between them. | target_entities : list [dict] | Same as data_entities, but for the prediction target. | batch_size : int | Optional. Size of the batch to train the estimator. Default: 1. | data_derivatives_names : list [str] | Optional. If an entry in data_entities is BIDS derivatives data, its name should be listed here. Entries | that don't correspond to derivatives should be listed as None. Default: [None for _ in data_entities] | target_derivatives_names : list [str] | Optional. If an entry in target_entities is BIDS derivatives data, its name should be listed here. Entries | that don't correspond to derivatives should be listed as None. Default: [None for _ in target_entities] | root_list : list | Reserved. Not yet implemented. List of BIDS root directories, if data must be loaded from different BIDS | directories. There must be exactly len(data_entities) + len(target_entities) entries in the list, with the | order corresponding to the order of the data_entities, followed by the target_entities. | label_names : list [str] | Names of the values of the target, if any. | | Methods defined here: | | __init__(self, root_dir: str, data_entities: list, target_entities: list, batch_size: int = 1, data_derivatives_names: list = None, target_derivatives_names: list = None, root_list: list = None, label_names: list = None) | Initialize self. See help(type(self)) for accurate signature. | | __len__(self) | | load_batch(self, indices: list) | Loads a batch of N images and returns the data/target in arrays. | Parameters | ---------- | indices : list [int] | List of indices to load. | Returns | ------- | np.array | Array of shape (len(indices), num_data, *image.shape) containing data. | np.array | Array of shape (len(indices), num_target, *image.shape) containing data. | | load_sample(self, idx: int) | Loads the sample at idx. | Parameters | ---------- | idx : int | Index of the sample to load. Max valid value is len(BIDSClassifier)-1 | | Returns | ------- | np.array | Array of shape (num_data, *image.shape) containing the data. | np.array | Array of shape (num_target, *image.shape) containing the target. | | ---------------------------------------------------------------------- | Static methods defined here: | | get_matching_images(image_to_match: bids.layout.models.BIDSImageFile, bids_dataset: bids.layout.layout.BIDSLayout, matching_entities: list = None, required_entities: dict = None) | Returns a list of images from the BIDS dataset that has the specified required_entities and has the same | value for entities listed in matching_entities as the image_to_match. | Example: for an image "sub-123_ses-1_T1w.nii" with matching_entities ['ses'] and required_entities | {'suffix': 'FLAIR'}, the image "sub-123_ses-1_FLAIR.nii" would match, but "sub-123_ses-2_FLAIR.nii" would not. | Parameters | ---------- | required_entities: dict | Entity-value dictionary that are required. | matching_entities: list | List of entities that must match, if present, between the previous image and the one to fetch. | image_to_match: BIDSImageFile | Image to use as reference for matching_entities. | bids_dataset: BIDSLayout | BIDS dataset from which to fetch the new image. | | Returns | ------- | list [BIDSImageFile] | BIDS image file matching the input specifications. Empty if there are no matches. | | load_image_tuple(image_tuple: tuple, dtype=<class 'numpy.float32'>) | Loads the tuple and returns it in an array | Parameters | ---------- | image_tuple : tuple (BIDSImageFile,) | Tuple of BIDSImageFile to be loaded and returned in an array | Returns | ------- | np.array | Loaded data | | load_image_tuple_list(image_list: list, dtype=<class 'numpy.float32'>) | Loads each image in the tuple and returns in a single array; different tuples in the list are assumed to be | batches. The returned array will be of shape (len(image_list), len(image_tuple), *image.shape | Parameters | ---------- | image_list : list [tuple] | List of tuples containing BIDSImageFile | | Returns | ------- | np.array | Loaded data. | | ---------------------------------------------------------------------- | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined)
training_set = BIDSLoader(root_dir='data/train/',
data_entities={'subject': '',
'session': '',
'suffix': 'T1w'},
target_entities={'label': 'L',
'desc': 'T1lesion',
'suffix': 'mask'},
data_derivatives_names=['ATLAS'],
target_derivatives_names=['ATLAS'],
label_names=['not lesion', 'lesion'])
The object, training_set
now has the data list paired with the target list:
print(f'Image: {training_set.data_list[0]}')
print(f'Mask: {training_set.target_list[0]}')
Image: (<BIDSImageFile filename='/home/frcaud/ramp-board/ramp_deployment/ramp-kits/stroke_lesions/data/train/derivatives/ATLAS/sub-r001s002/ses-1/anat/sub-r001s002_ses-1_space-MNI152NLin2009aSym_T1w.nii.gz'>,) Mask: (<BIDSImageFile filename='/home/frcaud/ramp-board/ramp_deployment/ramp-kits/stroke_lesions/data/train/derivatives/ATLAS/sub-r001s002/ses-1/anat/sub-r001s002_ses-1_space-MNI152NLin2009aSym_label-L_desc-T1lesion_mask.nii.gz'>,)
We can load the data/target into separate arrays:
data, target = training_set.load_sample(0)
print(f'data shape: {data.shape}')
print(f'target shape: {target.shape}')
data shape: (1, 197, 233, 189) target shape: (1, 197, 233, 189)
We can also load multiple images as a batch:
data, target = training_set.load_batch([0,1,2])
print(f'data shape: {data.shape}')
print(f'target shape: {target.shape}')
data shape: (3, 1, 197, 233, 189) target shape: (3, 1, 197, 233, 189)
Along with load_image_tuple
and load_image_tuple_list
, these methods are used during RAMP's training to feed your estimator.
Now, that we understand better the anatomical T1 images that we will work with, let's look at the lesion masks. We will visualize overlap of the two:
from nilearn.image import load_img
from nilearn import plotting
/home/frcaud/anaconda3/envs/ramp-stroke/lib/python3.8/site-packages/nilearn/datasets/__init__.py:93: FutureWarning: Fetchers from the nilearn.datasets module will be updated in version 0.9 to return python strings instead of bytes and Pandas dataframes instead of Numpy arrays. warn("Fetchers from the nilearn.datasets module will be "
def plot_overlap(path_to_t1, path_to_lesion, title='overlap'):
fig = plt.figure(1, figsize=(14, 5), frameon=False, dpi=50)
ax = plt.gca()
plotting.plot_roi(path_to_lesion, path_to_t1, title=title,
axes=ax,
draw_cross=False,
annotate=False,
cmap='autumn', # the lesions will be shown in red
cut_coords=(0, 0, 0)
)
plot_overlap(path_to_t1=structural.path,
path_to_lesion=mask.path,
title=f'overlap, {sample}')
/home/frcaud/anaconda3/envs/ramp-stroke/lib/python3.8/site-packages/numpy/ma/core.py:2831: UserWarning: Warning: converting a masked element to nan. _data = np.array(data, dtype=dtype, copy=copy,
Note that lesions are heterogeneous: they appear in different places, have different sizes, and are different shapes.
We can examine a number of subjects and get the overall lesions for each subject. Note that some subjects have multiple lesions, so the following value is the lesion load rather than the lesion size. Even using the aggregate measure, we'll see a wide distribution.
from stroke.stroke_config import bids_loader_train, bids_loader_test # Default loaders for starting kit
import numpy as np
train_fraction = np.floor(len(bids_loader_train.target_list)/10)
test_fraction = np.floor(len(bids_loader_train.target_list))
subset_size = np.min([50, train_fraction, test_fraction]) # number of subjects to examine; reduce if it takes too long to load
subset_size = int(subset_size)
lesion_load_train = np.zeros((subset_size,)) # lesion load per subject
lesion_load_test = np.zeros((subset_size,))
# Load the data, compute the lesion load, store
for i in range(subset_size):
target_tuple = bids_loader_train.target_list[i]
target = bids_loader_train.load_image_tuple(target_tuple)
lesion_load_train[i] = np.sum(target)
for i in range(subset_size):
target_tuple = bids_loader_test.target_list[i]
target = bids_loader_test.load_image_tuple(target_tuple)
lesion_load_test[i] = np.sum(target)
# Plot the distribution!
from matplotlib import pyplot as plt
plt.hist(np.log10(lesion_load_train), bins=10) # Note the log scale
plt.hist(np.log10(lesion_load_test), bins=10, alpha=0.5)
plt.xlabel('log10 Lesion Load')
plt.ylabel('Frequency of lesion load')
plt.title('Distribution of lesion load in sets')
plt.legend(['Training set', 'Testing set'])
<matplotlib.legend.Legend at 0x7f3e29c498b0>
We can see in the distribution that even among the 100 subjects, the lesion load varies quite a bit across patients. The test set is a random subset from the same sites as the training set, but we can (qualitatively) observe different distribution between the two sets.
This section will show you the expected format for your estimators. We'll first go through a sample estimator that defines the required properties and methods.
The starter kit has a sample estimator in submissions/sample/
. There, you'll find a file, estimator.py
, which RAMP expects to have BIDSEstimator defined. The class is intentionally barebones, and is a good starting place for your own estimators.
Let's take a look at the sample class and its methods.
from submissions.sample.estimator import BIDSEstimator # Load estimator
methods = [d for d in dir(BIDSEstimator) if not d.startswith('_')] # Get methods; remove private ones.
print(methods)
['fit', 'fit_partial', 'get_params', 'predict', 'predict_proba', 'set_params']
Two of these methods, get_params
and set_params
are from sklearn
as BIDSEstimator
is a custom estimator built from BaseEstimator
. You don't need to implement these.
The fitting methods, fit
and fit_partial
are related to fitting your estimator to the data.
fit
expects to be given the entirety of the data to fit. You won't be able to fit all the data in memory, and as such fit
is not required.
fit_partial
behaves similarly, but instead assumes that not all data is being presented at once. It is used in iterative fitting (e.g. stochastic gradient descent). This method is required.
help(BIDSEstimator.fit_partial)
Help on function fit_partial in module submissions.sample.estimator: fit_partial(self, X, y) Fit the estimator using the input data (X) and target (y). Assumes that the inputs represent only a fraction of the data and that it will be called multiple times while using the dataset. I.e., learning rates and adaptive parameters should not be entirely recalculated with each call to this method. Required. This estimator in particular does nothing. Parameters ---------- X : np.array Data of the form (n_samples, n_channels, *image.shape) y : np.array Target (labels) of the form (n_samples, n_channels, *image.shape) Returns ------- None
Lastly, the prediction methods, predict
and predict_proba
use input data and return a prediction.
Broadly, predict
is expected to return the predicted class (i.e. argmax instead of softmax). In the case of our binary masks, we would expect predictions of either '0' or '1'.
predict_proba
is expected to return continuous values (i.e. softmax instead of argmax), which can be useful for model evaluation and calibration.
Scoring functions will call the relevant prediction method for their requirements, but will cast the predictions into the right type (e.g. binary) before scoring. You can implement only predict
and then have predict_proba
simply return the results of predict
without issue. Having your own implementation allows you to customize parameters such as thresholds, but is not required.
help(BIDSEstimator.predict_proba)
Help on function predict_proba in module submissions.sample.estimator: predict_proba(self, X) Applies the data to the estimator to produce a prediction. The output can be continuous to represent the relative confidence the estimator has in the prediction. Optional. Typically, correct but uncertain predictions are rewarded less. Similarly, incorrect but uncertain predictions are punished less severely. This estimator always returns 1. Parameters ---------- X : np.array Data of the form (n_samples, n_channels, *image.shape) Returns ------- np.array Prediction made by the estimator.
Before submitting to RAMP, you can test your solution locally to ensure that trivial errors (e.g. typos, path issues, etc.) are resolved. We can test a given submission using the ramp
command that was installed in the virtual environment.
We'll use the following command:
!ramp-test --submission sample --quick-test
The !
signals that the command should be run on the command line instead of this notebook.
ramp
is the command to be executed.
test
is the first argument; it signals ramp
to perform a local test.
--submission sample
specifies which submission to run. You can have multiple potential submissions in the submissions/
directory; this prevents ramp
from running all of them.
!ramp-test --quick-test --submission sample
/home/frcaud/anaconda3/envs/ramp-stroke/lib/python3.8/site-packages/bids/config.py:39: FutureWarning: Setting 'extension_initial_dot' will be removed in pybids 0.16. warnings.warn("Setting 'extension_initial_dot' will be removed in pybids 0.16.", /home/frcaud/anaconda3/envs/ramp-stroke/lib/python3.8/site-packages/bids/config.py:39: FutureWarning: Setting 'extension_initial_dot' will be removed in pybids 0.16. warnings.warn("Setting 'extension_initial_dot' will be removed in pybids 0.16.", Testing ATLAS Stroke Lesion Segmentation Reading train and test files from ./data/ ... Reading cv ... Training submissions/sample ... CV fold 0 score Sørensen–Dice Coefficient time train 0.012 1.257557 valid 0.004 0.000009 test 0.017 0.000004 CV fold 1 score Sørensen–Dice Coefficient time train 0.012 1.323619 valid 0.009 0.000010 test 0.017 0.000005 CV fold 2 score Sørensen–Dice Coefficient time train 0.006 1.301342 valid 0.018 0.000007 test 0.017 0.000004 CV fold 3 score Sørensen–Dice Coefficient time train 0.018 1.360942 valid 0.001 0.000010 test 0.017 0.000005 CV fold 4 score Sørensen–Dice Coefficient time train 0.012 1.287049 valid 0.009 0.000006 test 0.017 0.000005 ---------------------------- Mean CV scores ---------------------------- score Sørensen–Dice Coefficient time train 0.012 ± 0.0036 1.3 ± 0.03 valid 0.008 ± 0.0058 0.0 ± 0.0 test 0.017 ± 0.0 0.0 ± 0.0 ---------------------------- Bagged scores ---------------------------- score Sørensen–Dice Coefficient valid38;5;12m0m 0 test38;5;1m0m 0
We can see that the results are not very good, but that is expected: our estimator completely ignores the data!
RAMP will automatically perform 5-fold cross-validation and report the Sørensen–Dice for each of the folds and report the mean across the folds. Bagging of the results has been disabled; the output can be ignored.
You can find the scoring metrics in the scoring.py
file. If you want to use the score to evaluate your own results, use the calc_score
method.
from stroke.scoring import DiceCoeff
help(DiceCoeff)
Help on class DiceCoeff in module stroke.scoring: class DiceCoeff(rampwf.score_types.base.BaseScoreType) | DiceCoeff(name='Sørensen–Dice Coefficient', precision=3) | | Method resolution order: | DiceCoeff | rampwf.score_types.base.BaseScoreType | builtins.object | | Methods defined here: | | __call__(self, y_true: tuple, y_pred: <built-in function array>) | Call self as a function. | | __init__(self, name='Sørensen–Dice Coefficient', precision=3) | Scoring class for RAMP workflows. When called, returns the Sørensen–Dice coefficient. Note that this | implementation allows for continuous values in the prediction. | Parameters | ---------- | name : str | Name of the score; used for creating column headers. | precision : str | Numerical precision. | | score_function(self, Y_true: <built-in function array>, Y_pred: <built-in function array>) | Returns the Sørensen–Dice coefficient for the input images. If multiple samples are given, the mean score | is returned. | Parameters | ---------- | true_bids : BIDSPrediction | BIDSPrediction with true_bids.y_true set as an array. | pred : np.array | Array containing the predicted labels of an image. | | Returns | ------- | dice_coefficient : float | Sørensen–Dice coefficient. | | ---------------------------------------------------------------------- | Static methods defined here: | | calc_score(array_0: <built-in function array>, array_1: <built-in function array>) | Performs the calculation to get the Sørensen–Dice coefficient. | Parameters | ---------- | array_0 : np.array | First array to score. | array_1 : np.array | Second array to score. | | Returns | ------- | float | Sørensen–Dice coefficient | | calc_score_parts(array_0: <built-in function array>, array_1: <built-in function array>) | Computes the three parts of the Sørensen–Dice coefficient: overlap and 2 positives | Parameters | ---------- | array_0 | array_1 | | Returns | ------- | tuple | Tuple containing (overlap, sum(array_0), sum(array_1) | | check_y_pred_dimensions(array_0: <built-in function array>, array_1: <built-in function array>) | Checks that the dimensions of the inputs are consistent. | Parameters | ---------- | array_0 : np.array | First array to check. | array_1 : np.array | Second array to check | | Returns | ------- | bool | | unpack_data(array_0: <built-in function array>, output_shape: <built-in function array>) | Unpacks boolean data packed via np.packbits into appropriate shape, discarding excess bytes | Parameters | ---------- | array_0 : np.array | np.uint8 array to unpack | output_shape : tuple | Expected shape of output. | | Returns | ------- | np.array | Unpacked, reshape array | | ---------------------------------------------------------------------- | Readonly properties inherited from rampwf.score_types.base.BaseScoreType: | | worst | | ---------------------------------------------------------------------- | Data descriptors inherited from rampwf.score_types.base.BaseScoreType: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined)
help(DiceCoeff.calc_score)
Help on function calc_score in module stroke.scoring: calc_score(array_0: <built-in function array>, array_1: <built-in function array>) Performs the calculation to get the Sørensen–Dice coefficient. Parameters ---------- array_0 : np.array First array to score. array_1 : np.array Second array to score. Returns ------- float Sørensen–Dice coefficient
For the reference, if you wish to view previous medical image segmentation challenges:
When your estimator in estimator.py
is ready, place it in the submissions/
directory in its own, unique directory (e.g., the path should be submissions/my_estimator/estimator.py
).
You can then test your submission locally using the command:
ramp-test --submission <your submission folder name>
Example:
ramp-test --submission my_estimator
For more information on how to submit your code on ramp.studio, refer to the online documentation.