Detection and classification of ovarian follicles

Frédérique Clément (INRIA), Raphaël Corre (CNRS), Céline Guigon (INSERM), François Caud, Benjamin Habert, Alexandre Gramfort (DATAIA, Univ. Paris-Saclay)

This challenge was carried out with the support of DATAIA, in collaboration with INRIA, CNRS, INSERM and INRAE.

Introduction

The challenge consists of automatically detecting and classifying ovarian follicles in histological sections of mammalian ovaries.

The ovary is a unique instance of a dynamic endocrine organ that remodels permanently throughout adulthood. Ovarian function is supported by spheroid, multilayered, and multiphasic structures, the ovarian follicles, which shelter the oocyte (female germ cell) and secrete a variety of hormones and growth factors. The ovary is endowed with a pool of follicles established early in life, which is progressively exhausted through follicle development or death. Understanding the dynamics of ovarian follicle populations is critical to characterizing the reproductive physiological status of females, from birth (or even prenatal life) to reproductive senescence.

Accurate estimation of the number of ovarian follicles at various stages of development is of key importance in the field of reproductive biology, for basic research, pharmacological and toxicological studies, as well as clinical management of fertility. The associated societal challenges relate to physiological ovarian aging (decrease in fertility with age, menopause), pathological aging (premature ovarian failure) and toxic-induced aging (endocrine disruptors, anticancer treatments).

In vivo, only the terminal stages of follicle development, the tip of the iceberg, can be monitored through ultrasonographic imaging. Detecting all follicles requires invasive approaches that rely on histology. Ovaries are fixed, serially sectioned, stained with appropriate dyes, and manually analyzed by light microscopy. This counting procedure is complex, tedious, operator-dependent and, above all, very time-consuming. To save time, only some of the sections sampled from a whole ovary are examined, which adds to the experimental noise and further degrades the reliability of the measurements.

Experimentalists have high expectations for improvements to the classical counting procedure, and deep-learning-based approaches to follicle counting could bring a considerable breakthrough in the field of reproductive biology.

We distinguish here four categories of follicles, from smallest to largest: Primordial, Primary, Secondary, Tertiary. One difficulty is the great disparity in size between follicles of different categories. Another is that most pre-trained classifiers are trained on everyday objects, not biological tissues.

Data description

The data consist of 34 images of histological sections taken from 6 mouse ovaries: 29 sections in the train dataset and 5 sections in the test dataset. Each section has been annotated with ground truth follicle locations and categories. Bounding box coordinates and class labels are stored in a csv file named labels.csv, one for the train set and one for the test set. A Negative class has also been created, with bounding boxes of various sizes placed at locations that contain no follicle from the four retained categories.
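
To make the annotation format concrete, here is a minimal sketch that counts boxes per class and computes the median box area, which gives a quick view of the size disparity between categories. The column names (filename, xmin, ymin, xmax, ymax, class) are an assumption about labels.csv and should be checked against the actual file. The rows below are made-up toy values, not challenge data.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def summarize_boxes(csv_file):
    """Return {class: {"count": n, "median_area": a}} from an open csv file.

    Assumes columns named xmin, ymin, xmax, ymax and class (hypothetical
    schema; adapt to the real header of labels.csv).
    """
    areas = defaultdict(list)
    for row in csv.DictReader(csv_file):
        width = float(row["xmax"]) - float(row["xmin"])
        height = float(row["ymax"]) - float(row["ymin"])
        areas[row["class"]].append(width * height)
    return {
        label: {"count": len(values), "median_area": median(values)}
        for label, values in areas.items()
    }


# Toy annotations, invented for illustration only.
toy_labels = io.StringIO(
    "filename,xmin,ymin,xmax,ymax,class\n"
    "section_a.jpg,10,20,40,60,Primordial\n"
    "section_a.jpg,0,0,20,20,Primordial\n"
    "section_a.jpg,100,120,400,450,Tertiary\n"
    "section_a.jpg,5,5,30,30,Negative\n"
)

summary = summarize_boxes(toy_labels)
for label, stats in summary.items():
    print(label, stats["count"], stats["median_area"])
```

On the real data, the same function could be called with `open("data/train/labels.csv", newline="")`, where the path is also an assumption about the downloaded folder layout.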

Requirements for running the notebook

Exploratory data analysis

First, uncomment the following line to download the data using this Python script. It will create a data folder containing the train and test data.

To get a feel for what the data look like, let's visualize an image of a section and the corresponding annotations.

First, we need to be able to read the csv file and extract the bounding box coordinates and class names from it.

This function extracts the box coordinates of the true locations (ground truth annotations):
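
A minimal sketch of such a function, using only the standard library, might look as follows. It is not necessarily the notebook's own implementation. The function name `load_true_boxes`, the column names, and the choice to treat every class except Negative as a true location are assumptions made for illustration. The toy rows are invented.

```python
import csv
import io


def load_true_boxes(csv_file, image_name, negative_label="Negative"):
    """Return the ground truth boxes of one image as a list of dicts.

    Assumes columns filename, xmin, ymin, xmax, ymax, class (hypothetical
    schema). Rows of the Negative class are skipped, since they mark
    locations without follicles.
    """
    boxes = []
    for row in csv.DictReader(csv_file):
        if row["filename"] != image_name or row["class"] == negative_label:
            continue
        boxes.append(
            {
                "xmin": float(row["xmin"]),
                "ymin": float(row["ymin"]),
                "xmax": float(row["xmax"]),
                "ymax": float(row["ymax"]),
                "label": row["class"],
            }
        )
    return boxes


# Toy annotations, invented for illustration only.
toy_csv = io.StringIO(
    "filename,xmin,ymin,xmax,ymax,class\n"
    "section_a.jpg,10,20,40,60,Primordial\n"
    "section_a.jpg,100,120,400,450,Tertiary\n"
    "section_a.jpg,5,5,30,30,Negative\n"
    "section_b.jpg,50,60,90,110,Primary\n"
)

true_boxes = load_true_boxes(toy_csv, "section_a.jpg")
print([box["label"] for box in true_boxes])
```

Returning plain dicts keeps the sketch free of dependencies. A pandas DataFrame filtered on the same two conditions would serve equally well.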

Here is a function that displays the image and bounding boxes:
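
One possible way to write such a display function with matplotlib is sketched below. The function name `show_image_with_boxes`, the box format (dicts with xmin, ymin, xmax, ymax and label keys), and the color assigned to each class are all illustrative assumptions. A blank array stands in for a real section image.

```python
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical color per follicle category, chosen only for readability.
CLASS_COLORS = {
    "Primordial": "cyan",
    "Primary": "lime",
    "Secondary": "orange",
    "Tertiary": "red",
}


def show_image_with_boxes(image, boxes, ax=None):
    """Draw an image and overlay one labeled rectangle per bounding box."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(image)
    for box in boxes:
        color = CLASS_COLORS.get(box["label"], "white")
        rect = patches.Rectangle(
            (box["xmin"], box["ymin"]),   # top-left corner in pixel coordinates
            box["xmax"] - box["xmin"],    # width
            box["ymax"] - box["ymin"],    # height
            linewidth=1.5,
            edgecolor=color,
            facecolor="none",
        )
        ax.add_patch(rect)
        ax.text(box["xmin"], box["ymin"] - 2, box["label"], color=color, fontsize=8)
    ax.set_axis_off()
    return ax


# Placeholder image and toy boxes, invented for illustration only.
demo_image = np.zeros((500, 500, 3), dtype=np.uint8)
demo_boxes = [
    {"xmin": 10, "ymin": 20, "xmax": 40, "ymax": 60, "label": "Primordial"},
    {"xmin": 100, "ymin": 120, "xmax": 400, "ymax": 450, "label": "Tertiary"},
]
demo_ax = show_image_with_boxes(demo_image, demo_boxes)
```

In a notebook the figure renders at the end of the cell. In a script, `plt.show()` or `demo_ax.figure.savefig(...)` would be needed to see it.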

Let's visualize one section from the training set with all the annotated true boxes: