About dogsled

dogsled is an open-source Python package that does only one thing: Macenko [1] stain normalisation of medical slides. It works with OpenSlide slide formats, generates either JPEG or TIFF normalised images, and is designed to be:

  • Simple – minimum input is required from the user

  • Flexible – dogsled supports different ways of specifying which slides have to be normalised

  • Customisable – the user can either use the predefined parameters, such as normalisation constants or paths (for storing the data), or specify their own

Why dogsled? Well, first of all, because of the dogs. Second, because together many dogs can pull a cargo too heavy for one dog to handle. Similarly, dogsled divides heavy computations into smaller ones. As with many algorithms and life situations, divide and conquer, right?

Motivation behind the development

While working on a method for segmenting and classifying all tubular structures seen in medical slides of kidneys, the author of this package noticed slight differences in stain colour between slides. The difference was even more apparent when slides from another laboratory were examined. As data quality plays an important (if not the most important) role in model development [2], the author decided to step back from the exciting model development and turn to improving data quality. Several normalisation techniques exist [3, 4, 5]; Macenko stain normalisation was chosen for its relative simplicity. However, the available tools could not handle SVS slides (20,000 × 20,000 px and up) and crashed when processing the data. Therefore, a tool was created that

  • would not crash the system

  • could normalise several specified slides

  • could create thumbnails for reporting

This Python package is a slightly refactored version of this tool with some additions and modifications, and the normalisation looks something like this:

[Figure: normalisation gallery]

The images on the left are the source slides (with their corresponding views at 100% zoom); those on the right are their normalised copies. Slides from [6, 7, 8].

Quirks and features

Currently, dogsled can:

  • normalise all slides located in a specified folder

  • normalise slides specified by name

  • normalise slides defined in a QuPath library (either all or the ones specified by name and/or index)

  • generate JPEG equivalents of the normalised slides

  • generate TIFF equivalents of the normalised slides (also for large slides not fitting in RAM)

  • create hematoxylin/eosin decoupled normalised images

  • create thumbnails of all slides (pre-normalised and normalised)

See quickstart and API for further details.

How does it work?

[Figure: normalisation process]

dogsled is based on the Macenko normalisation implementations found here and here. In contrast to these raw implementations, dogsled is tailored for automatic handling of many large medical slides. This is achieved, among other things, by processing slides in tiles. First, dogsled estimates a tile size suitable for the machine, and then it calculates the tile locations. To estimate the slide-specific parameters, the tile in the centre of the slide is normalised first (as this tile is more likely to contain tissue than background). This process is illustrated in the picture above (slide from the OpenSlide test dataset). Once all tiles have been normalised, they are stitched together into a normalised equivalent of the source slide.
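The tiling steps described above can be sketched roughly as follows. This is a minimal illustration and not dogsled's actual code: the function names, the memory heuristic (bytes per pixel, number of working copies) and the 4096 px cap are all assumptions made for the example.

```python
import math


def estimate_tile_side(available_bytes, bytes_per_pixel=3,
                       working_copies=8, max_side=4096):
    """Pick a square tile side whose working set should fit in memory.

    Hypothetical heuristic: assume the normalisation needs roughly
    `working_copies` arrays of the tile's size at the same time.
    """
    pixels = available_bytes // (bytes_per_pixel * working_copies)
    side = math.isqrt(max(pixels, 1))
    return max(1, min(side, max_side))


def tile_locations(width, height, tile_side):
    """Return (x, y, w, h) boxes that cover the slide without overlap.

    Tiles on the right and bottom edges are clipped to the slide size.
    """
    boxes = []
    for y in range(0, height, tile_side):
        for x in range(0, width, tile_side):
            w = min(tile_side, width - x)
            h = min(tile_side, height - y)
            boxes.append((x, y, w, h))
    return boxes


def centre_first(boxes, width, height):
    """Move the tile containing the slide centre to the front.

    That tile would be normalised first to estimate the
    slide-specific parameters; the other tiles keep their order.
    """
    cx, cy = width // 2, height // 2
    for i, (x, y, w, h) in enumerate(boxes):
        if x <= cx < x + w and y <= cy < y + h:
            return [boxes[i]] + boxes[:i] + boxes[i + 1:]
    return boxes


# Example: a 20,000 x 20,000 px slide with 2 GiB of memory to work with.
slide_w, slide_h = 20_000, 20_000
side = estimate_tile_side(available_bytes=2 * 1024**3)
boxes = centre_first(tile_locations(slide_w, slide_h, side), slide_w, slide_h)
```

Because the boxes do not overlap and together cover the whole slide, the normalised tiles can be written back at their (x, y) offsets to stitch the output image. A real implementation would also have to decide how to treat tiles that contain only background.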

References

[1] M. Macenko, M. Niethammer, J. S. Marron, D. Borland, J. T. Woosley, Guan Xiaojun, C. Schmitt, and N. E. Thomas. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 1107–1110. 2009. doi:10.1109/ISBI.2009.5193250.

[2] Alon Halevy, Peter Norvig, and Fernando Pereira. The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2):8–12, 2009. doi:10.1109/MIS.2009.36.

[3] E. Reinhard, M. Ashikhmin, B. Gooch, and P. Shirley. Color transfer between images. IEEE Computer Graphics and Applications, 21(5):34–41, 2001. doi:10.1109/38.946629.

[4] A. M. Khan, N. Rajpoot, D. Treanor, and D. Magee. A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Transactions on Biomedical Engineering, 61(6):1729–1738, 2014. doi:10.1109/TBME.2014.2303294.

[5] Deepak Anand, Goutham Ramakrishnan, and Amit Sethi. Fast GPU-enabled color normalization for digital pathology. In International Conference on Systems, Signals and Image Processing (IWSSIP), 219–224. IEEE, 2019.

[6] Christof A. Bertram, Marc Aubreville, Christian Marzahl, Andreas Maier, and Robert Klopfleisch. A large-scale dataset for mitotic figure assessment on whole slide images of canine cutaneous mast cell tumor. Scientific Data, 6(1):274, 2019. doi:10.1038/s41597-019-0290-4.

[7] Gloria Bueno, Lucia Gonzalez-Lopez, Marcial Garcia-Rojo, Arvydas Laurinavicius, and Oscar Deniz. Data for glomeruli characterization in histopathological images. Data in Brief, 29:105314, 2020. doi:10.1016/j.dib.2020.105314.

[8] Kenneth Clark, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen Moore, Stanley Phillips, David Maffitt, Michael Pringle, Lawrence Tarbox, and Fred Prior. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. Journal of Digital Imaging, 26(6):1045–1057, 2013. doi:10.1007/s10278-013-9622-7.