Detection and classification of ovarian follicles

Frédérique Clément (INRIA), Raphaël Corre (CNRS), Céline Guigon (INSERM), François Caud, Benjamin Habert, Alexandre Gramfort (DATAIA, Univ. Paris-Saclay)

This challenge was carried out with the support of DATAIA, in collaboration with INRIA, CNRS, INSERM and INRAE.

Introduction

The challenge consists of automatically detecting and classifying ovarian follicles on histological sections of mammalian ovaries.

The ovary is a unique instance of a dynamic, permanently remodeling endocrine organ in adulthood. The ovarian function is supported by spheroid, multilayered, and multiphasic structures, the ovarian follicles, which shelter the oocyte (the female germ cell) and secrete a variety of hormones and growth factors. The ovary is endowed with a pool of follicles established early in life, which gets progressively exhausted through follicle development or death. Understanding the dynamics of ovarian follicle populations is critical to characterizing the reproductive physiological status of females, from birth (or even prenatal life) to reproductive senescence.

Accurate estimation of the number of ovarian follicles at various stages of development is of key importance in the field of reproductive biology, for basic research, pharmacological and toxicological studies, as well as clinical management of fertility. The associated societal challenges relate to physiological ovarian aging (decrease in fertility with age, menopause), pathological aging (premature ovarian failure) and toxic-induced aging (endocrine disruptors, anticancer treatments).

In vivo, only the terminal stages of follicles, hence the tip of the iceberg, can be monitored through ultrasonographic imaging. To detect all follicles, invasive approaches relying on histology are needed. Ovaries are fixed, serially sectioned, stained with proper dyes, and manually analyzed by light microscopy. Such counting is a complex, tedious, operator-dependent and, above all, very time-consuming procedure. To save time, only some slices sampled from a whole ovary are examined, which adds to the experimental noise and further degrades the reliability of the measurements.

Experimentalists expect a lot from improvements to the classical counting procedure, and deep learning-based approaches to follicle counting could bring a considerable breakthrough in the field of reproductive biology.

We will distinguish here 4 categories of follicles, from smaller to larger: Primordial, Primary, Secondary, Tertiary. One difficulty lies in the great disparity in size among follicles. Another is that most pre-trained classifiers are trained on everyday objects, not biological tissues.

Data description

The data consist of 34 images of histological sections taken from 6 mouse ovaries: 29 sections in the train dataset and 5 in the test dataset. Each section has been annotated with ground-truth follicle locations and categories. Bounding box coordinates and class labels are stored in a CSV file named labels.csv, one for each of the train and test sets. A Negative class has also been created, with bounding boxes of various sizes at locations where there is no positive example of follicles from the 4 retained categories.

Requirements for running the notebook

Exploratory data analysis

First, uncomment the following line to download the data using the provided Python script. It will create a data folder in which the train and test data will be placed.

In order to get a feel for what the data look like, let's visualize an image of a section and its corresponding annotations.

First we need to be able to read and extract bounding box coordinates and class names from the CSV file.

This function extracts box coordinates for true locations (ground-truth annotations):
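A minimal sketch of such an extraction function, assuming labels.csv has columns named filename, xmin, ymin, xmax, ymax and class (the actual column names in the challenge data may differ):

```python
import pandas as pd

def extract_true_boxes(csv_path, image_name):
    """Return ground-truth boxes and class names for one section image.

    Assumes labels.csv columns: filename, xmin, ymin, xmax, ymax, class.
    """
    df = pd.read_csv(csv_path)
    rows = df[df["filename"] == image_name]
    boxes = rows[["xmin", "ymin", "xmax", "ymax"]].values.tolist()
    classes = rows["class"].tolist()
    return boxes, classes
```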

Here is a function that displays the image and bounding boxes:
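One possible implementation with matplotlib, drawing one Rectangle patch per box (the function name and styling here are illustrative; the Agg backend line is only needed outside a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def display_image_with_boxes(image, boxes, classes):
    """Show an image with one rectangle per (xmin, ymin, xmax, ymax) box."""
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.imshow(image)
    for (xmin, ymin, xmax, ymax), name in zip(boxes, classes):
        rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                 linewidth=2, edgecolor="red", facecolor="none")
        ax.add_patch(rect)
        ax.text(xmin, ymin - 5, name, color="red", fontsize=8)
    return fig, ax
```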

Let's visualize one section from the training set with all the annotated true boxes:

Feel free to change the IMAGE_TO_ANALYSE name to display other examples.

Now, what is the size distribution of all the ground-truth bounding boxes in the train set? First we make a list of all those locations, then we count by class and visualize histograms of the bounding box widths per class.
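A minimal way to gather the widths per class, assuming boxes and class names come as parallel lists (the helper name is hypothetical); the resulting lists can be passed directly to plt.hist:

```python
from collections import defaultdict

def widths_by_class(boxes, classes):
    """Group bounding-box widths (xmax - xmin) by follicle class."""
    out = defaultdict(list)
    for (xmin, ymin, xmax, ymax), name in zip(boxes, classes):
        out[name].append(xmax - xmin)
    return out
```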

We clearly see that the annotated follicles are ordered by size: Primordial < Primary < Secondary < Tertiary.

Multiclass classification

The chosen strategy for the baseline algorithm is random window cropping followed by multiclass classification of each extracted window. First, let's build and train the classifier. Here we take a model pretrained on ImageNet and freeze its weights; then we train only the last fully-connected layer on the training data.
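The freeze-and-retrain setup can be sketched in Keras as follows. The MobileNetV2 backbone is an assumption (the notebook's actual pretrained model may differ), and weights=None keeps the sketch runnable offline; pass weights="imagenet" to actually load the pretrained filters:

```python
import tensorflow as tf

NUM_CLASSES = 5  # Negative + Primordial + Primary + Secondary + Tertiary

def build_classifier(num_classes=NUM_CLASSES):
    """Frozen convolutional backbone plus a trainable classification head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=None)
    base.trainable = False  # freeze the backbone weights
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```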

Building model

Extracting all training examples from the images and creating Xtrain and ytrain for the classifier

Training of the classifier

We load a test image and visualize it with ground truths:

Let's make some predictions on cropped images:

The classifier predicts with 97% confidence that this follicle is Tertiary (index 4).

Here it predicted 0 (the Negative class) with 81% confidence, instead of a Secondary follicle.

Finally, with 95% confidence, it predicted a Tertiary follicle (index 4).

Loading test data

We will load all test data to evaluate the classifier.

Evaluation of the classifier

We use model.evaluate with the test data to evaluate our model.

We can save the model with this line:

Here we make predictions on the whole test set:

Confusion matrix:

Classification report:

Object detection

The strategy is to generate random windows on the test images and pass them through the classifier. We can choose different means (window_size), and each window size will be drawn from a normal distribution with that mean.

Random window generator

First we create a list of random bounding boxes of certain sizes: one bbox = (xmin, ymin, xmax, ymax).
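A sketch of such a generator with numpy; the 10% standard deviation and the minimum window size are assumptions, not values taken from the baseline:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_boxes(image_w, image_h, mean_size, n_boxes, size_std=None):
    """Generate n_boxes square windows as (xmin, ymin, xmax, ymax) tuples.

    Window sides are drawn from a normal distribution centred on mean_size
    (std defaults to 10% of the mean, an assumption), clipped to the image.
    """
    if size_std is None:
        size_std = 0.1 * mean_size
    boxes = []
    for _ in range(n_boxes):
        size = int(np.clip(rng.normal(mean_size, size_std),
                           16, min(image_w, image_h)))
        xmin = rng.integers(0, image_w - size + 1)
        ymin = rng.integers(0, image_h - size + 1)
        boxes.append((xmin, ymin, xmin + size, ymin + size))
    return boxes
```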

Then we build a tensor of cropped images, all 224x224x3 in size:
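A dependency-free sketch of the cropping step, using nearest-neighbour indexing for the resize (in practice a library resize such as tf.image.resize would normally be used instead):

```python
import numpy as np

def crop_and_resize(image, boxes, out_size=224):
    """Crop each (xmin, ymin, xmax, ymax) box and resize it to
    (out_size, out_size, 3) by nearest-neighbour indexing."""
    crops = np.empty((len(boxes), out_size, out_size, 3), dtype=image.dtype)
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        patch = image[ymin:ymax, xmin:xmax]
        rows = (np.arange(out_size) * patch.shape[0] / out_size).astype(int)
        cols = (np.arange(out_size) * patch.shape[1] / out_size).astype(int)
        crops[i] = patch[rows][:, cols]
    return crops
```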

This function creates a list of locations {"class": ..., "proba": ..., "bbox": ...} from the probabilities and boxes obtained at prediction time.
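It could look like the following sketch, keeping for each window the highest-probability class (the function name and the class-name ordering are assumptions):

```python
def build_locations(probas, boxes, class_names):
    """Pair each window with its best class and confidence.

    probas: sequence of per-window probability vectors (n_boxes, n_classes);
    returns a list of {"class": name, "proba": float, "bbox": tuple} dicts.
    """
    locations = []
    for p, bbox in zip(probas, boxes):
        idx = max(range(len(p)), key=lambda i: p[i])
        locations.append({"class": class_names[idx],
                          "proba": float(p[idx]),
                          "bbox": tuple(bbox)})
    return locations
```

These location dicts can then be filtered with a probability threshold, as done in the next step.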

Predictions on windows and filtering on one test image

The next function takes an image of a section, a list of box sizes and a list of numbers of boxes for every size, and returns a list of predicted locations:

Then we filter those predictions with a probability threshold:

IoU-based non-maximum suppression (NMS) to remove duplicates
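A standard greedy NMS sketch over the location dicts above; note this version is class-agnostic, whereas the baseline may suppress duplicates per class:

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(locations, iou_threshold=0.5):
    """Greedy non-maximum suppression on {"class","proba","bbox"} dicts:
    keep each detection unless it overlaps a higher-confidence one."""
    kept = []
    for loc in sorted(locations, key=lambda l: l["proba"], reverse=True):
        if all(iou(loc["bbox"], k["bbox"]) <= iou_threshold for k in kept):
            kept.append(loc)
    return kept
```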

Precision-Recall curve

Average precision over the whole test set

Your submission should at least beat this mAP score.
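For reference, a per-class average precision in the Pascal VOC style can be sketched as below; the challenge's exact matching rules and IoU threshold may differ from this simplified version:

```python
import numpy as np

def box_iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def average_precision(pred_locations, true_boxes, iou_threshold=0.5):
    """Rank predictions by confidence, greedily match each one to an
    unmatched ground-truth box (IoU >= threshold), then integrate the
    precision-recall curve by step interpolation."""
    preds = sorted(pred_locations, key=lambda l: l["proba"], reverse=True)
    matched, tp = set(), []
    for p in preds:
        best, best_iou = None, iou_threshold
        for j, gt in enumerate(true_boxes):
            if j in matched:
                continue
            v = box_iou(p["bbox"], gt)
            if v >= best_iou:
                best, best_iou = j, v
        if best is not None:
            matched.add(best)
        tp.append(best is not None)
    cum_tp = np.cumsum(tp)
    recall = cum_tp / max(len(true_boxes), 1)
    precision = cum_tp / np.arange(1, len(preds) + 1)
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```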

Quick submission test

You can test any submission locally by running:

ramp-test --submission <submission folder>

If you want to quickly check that there are no obvious code errors, use the --quick-test flag to run on only a small subset of the data.

ramp-test --submission <submission folder> --quick-test

See the online documentation for more details.