API¶
The NormaliseSlides class¶
NormaliseSlides class¶
dogsled.normaliser.NormaliseSlides
NormaliseSlides is the main interface for slide normalisation. The folder that will contain the normalised slides must be specified by passing the norm_path argument. The slides themselves can be specified by passing either qpproj_path or source_path. If both are specified, qpproj_path is used and source_path is ignored. See __init__() for further details.
Example usage:
from dogsled.normaliser import NormaliseSlides
normaliser = NormaliseSlides(norm_path='/Users/uname/slides/normalised',
                             source_path='/Users/uname/slides/',
                             qpproj_path='/Users/uname/QuPath_projects/project.qpproj',
                             slides_indexes=[0, 1, 8],
                             slide_names=['SAS_21883_001.svs', 'VUHSK_1912.svs'])
Note
It is possible to specify both the names and the indexes of the slides in the QuPath project. If the two selections overlap, each slide is still normalised only once.
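The overlap rule in the note above can be pictured with a small sketch. This is not dogsled's own code: the helper select_slides and the project_slides list are hypothetical, and the sketch assumes that the indexes refer to positions in the project's slide list.

```python
def select_slides(project_slides, slides_indexes=(), slide_names=()):
    """Merge index- and name-based selections, keeping each slide once."""
    selected = []
    # Slides picked by index come first, in the order given.
    for index in slides_indexes:
        name = project_slides[index]
        if name not in selected:
            selected.append(name)
    # Slides picked by name are added only if not already selected.
    for name in slide_names:
        if name not in selected:
            selected.append(name)
    return selected


# Hypothetical project containing three slides.
project = ['SAS_21883_001.svs', 'VUHSK_1912.svs', 'VUHSK_2000.svs']
chosen = select_slides(project,
                       slides_indexes=[0, 2],
                       slide_names=['SAS_21883_001.svs', 'VUHSK_1912.svs'])
print(chosen)
```

Here SAS_21883_001.svs is selected both by index and by name, yet it appears only once in the resulting list.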
As the normalisation may take a significant amount of time, it is best to double-check that all specified parameters are correct before proceeding. This can be done by examining the output of file_data:
normaliser.file_data
This shows the location of the QuPath project, the paths to the original and normalised slides, the temporary folder path, the indexes of the selected slides, and the names of the slide files which will be normalised.
Once the NormaliseSlides instance has been created and its arguments checked, the normalisation can be started using the start() method:
normaliser.start()
This method does not require any arguments. During normalisation, dogsled reports basic progress information, e.g.:
12/06/2021 09:08:48 PM normaliser INFO: [ Slide 1/2 SAS_21883_001.svs tile 1/4 reading slide sector ]
The status information includes the number of the current slide and the total number of slides to be normalised (1/2), the name of the slide currently being processed, the number of the tile currently being normalised and the total number of tiles for the slide (1/4), and the operation currently in progress (reading slide sector).
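If the progress lines need to be processed by a script, they can be parsed with a regular expression. The following is a minimal sketch based only on the example line above; parse_progress is a hypothetical helper rather than part of dogsled, and it assumes that slide file names contain no spaces.

```python
import re

# Pattern for the bracketed part of a progress line.
# Assumption: slide file names contain no spaces.
PROGRESS = re.compile(
    r"\[ Slide (?P<slide>\d+)/(?P<slides>\d+) (?P<name>\S+) "
    r"tile (?P<tile>\d+)/(?P<tiles>\d+) (?P<operation>.+?) \]"
)


def parse_progress(line):
    """Return the progress fields of a log line, or None if it does not match."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the counters from strings to integers.
    for key in ("slide", "slides", "tile", "tiles"):
        fields[key] = int(fields[key])
    return fields


line = ("12/06/2021 09:08:48 PM normaliser INFO: "
        "[ Slide 1/2 SAS_21883_001.svs tile 1/4 reading slide sector ]")
info = parse_progress(line)
print(info["name"], info["tile"], info["tiles"], info["operation"])
```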
repeat_stitching() method¶
When large slides are handled, many normalised tiles can be produced. These tiles are later stitched together to produce the end result, a normalised image. If this stitching causes the system to crash, it can be repeated using the repeat_stitching() method. To confirm that stitching caused the crash and that repeated stitching is applicable, navigate to norm_path and check the last lines of the log file:
user@arch:~$ cd /Users/uname/slides/normalised
user@arch:normalised$ tail -5 dogsled.log
01/12/2021 09:16:50 PM normaliser INFO: [ Slide 1/1 SAS_21883_001.svs tile 4/4 stitching image together ]
01/12/2021 09:16:50 PM normaliser INFO: [ Slide 1/1 SAS_21883_001.svs tile 4/4 stitching using vips ]
01/12/2021 09:16:50 PM normaliser INFO: [ Slide 1/1 SAS_21883_001.svs tile 4/4 stitching slide: SAS_21883_001.svs ]
01/12/2021 09:16:50 PM normaliser INFO: [ Slide 1/1 SAS_21883_001.svs tile 4/4 stitching finished ]
01/12/2021 09:16:50 PM normaliser INFO: [ Slide 1/1 SAS_21883_001.svs tile 4/4 saving jpeg image: norm_SAS_21883_001 ]
If the last logged event states the name of the stitched slide or the status of saving the image (the last three lines in the example above), the stitching might have caused the crash, and re-stitching can be applied. You might also want to check whether the temporary folder of the slide contains the correct number of tiles (4 in the example above). For this purpose, NormaliseSlides has to be reinitialised, using only the slide which caused the crash, prior to repeating the stitching. It is also recommended to use VIPS stitching instead of the default NumPy array stacking:
from dogsled.normaliser import NormaliseSlides
from dogsled.defaults import DEFAULTS
# redefine the parameter prior to NormaliseSlides initialisation
DEFAULTS.vips_stitcher = True
normaliser = NormaliseSlides(norm_path='/Users/uname/slides/normalised',
                             source_path='/Users/uname/slides/',
                             qpproj_path='/Users/uname/QuPath_projects/project.qpproj',
                             slide_names=['SAS_21883_001.svs'])
normaliser.repeat_stitching()
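The log check described before this example can also be scripted. The sketch below is an illustration and not part of dogsled: restitching_may_apply is a hypothetical helper, and the marker strings are taken from the last three lines of the example log, so they may need adjusting for other log output.

```python
# Operations which, when logged last, suggest that the run stopped during or
# right after stitching (taken from the example log).
STITCHING_MARKERS = ("stitching slide:", "stitching finished", "saving")


def restitching_may_apply(log_lines):
    """Check whether the last logged event points at a crash during stitching."""
    # Ignore blank lines, e.g. a trailing newline at the end of the file.
    entries = [line.strip() for line in log_lines if line.strip()]
    if not entries:
        return False
    last = entries[-1]
    return any(marker in last for marker in STITCHING_MARKERS)


crashed = [
    "01/12/2021 09:16:50 PM normaliser INFO: "
    "[ Slide 1/1 SAS_21883_001.svs tile 4/4 stitching using vips ]",
    "01/12/2021 09:16:50 PM normaliser INFO: "
    "[ Slide 1/1 SAS_21883_001.svs tile 4/4 stitching slide: SAS_21883_001.svs ]",
]
still_reading = [
    "12/06/2021 09:08:48 PM normaliser INFO: "
    "[ Slide 1/2 SAS_21883_001.svs tile 1/4 reading slide sector ]",
]
print(restitching_may_apply(crashed), restitching_may_apply(still_reading))
```

In practice, the lines would be read from the dogsled.log file in norm_path, for example with open(...).readlines().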
Note
dogsled is designed to avoid crashes. When NormaliseSlides is initialised, it analyses the available RAM and tailors the size of the tiles accordingly. The thresholds are defined in the DEFAULTS dictionary (see below). In addition, dogsled gives a (very) approximate estimate of the space required for normalising the selected slides and of the space available (the normalisation is started regardless of the result of this estimate). Both the RAM and the space estimates are shown during NormaliseSlides initialisation. Please note that the actual space required for slide processing might be higher, as your system might save intermediate data on the disk. It is recommended to have at least 10 GB of free memory.
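Before a long run, the free disk space can also be checked independently of dogsled using the Python standard library. This is a minimal sketch: free_gigabytes and enough_space are hypothetical helpers, and the 10 GB threshold is the recommendation above read as disk space, which is an assumption.

```python
import shutil


def free_gigabytes(path):
    """Return the free space, in GB, on the file system that holds path."""
    return shutil.disk_usage(path).free / 1e9


def enough_space(path, required_gb=10):
    """Check whether at least required_gb of disk space is free at path."""
    # Assumption: the 10 GB recommendation refers to free disk space.
    return free_gigabytes(path) >= required_gb


# '.' stands in for the norm_path folder.
print(f"{free_gigabytes('.'):.1f} GB free, enough: {enough_space('.')}")
```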
The DEFAULTS_VALS dictionary¶
dogsled.defaults.DEFAULTS_VALS
This dictionary holds constants used for normalisation and some other normalisation parameters.
import numba
import numpy as np
DEFAULTS_VALS = {'show_results': False,
                 'ram_megapixel': {12000: 12000, 12001: 24500},
                 'output_type': ['norm'],
                 'dtype': np.float32,
                 'numba_dtype': numba.float32,
                 'temporary_folder_name': 'dogsled_temp',
                 'remove_temporary_files': True,
                 'jpeg_quality': 95,
                 'vips_tiff_compression': 'lzw',
                 'thumbnail': True,
                 'thumbnail_max_side': 6000,
                 'vips_stitcher': False,
                 'OpenSlide_formats': ['.svs', '.tif', '.tiff', '.scn', '.vms', '.vmu', '.ndpi', '.mrxs', '.svslide', '.bif'],
                 'first_tile': 'middle',
                 # normalisation constants:
                 'normalising_c': 255,
                 'alpha': 0.0001,
                 'beta': 0.0015,
                 'he_ref': np.array([[0.6895, 0.1759],
                                     [0.6973, 0.8286],
                                     [0.674 , 0.5312]]),
                 'max_s_ref': np.array([0.498, 0.927])}
This dictionary is processed internally and forwarded to the Defaults class, which holds all parameters. Its instance, DEFAULTS, can be used to redefine parameters prior to NormaliseSlides initialisation. If you wish to obtain the hematoxylin and eosin stain images in addition to the default normalised slide, this can be done via the StainTypes class:
from dogsled.normaliser import NormaliseSlides
from dogsled.defaults import DEFAULTS, StainTypes
# if you want norm stain + he and eo
DEFAULTS.output_type = [StainTypes.norm, StainTypes.he, StainTypes.eo]
DEFAULTS.vips_stitcher = True
DEFAULTS.vips_tiff_compression = 'deflate'
normaliser = NormaliseSlides(norm_path='/Users/uname/slides/normalised',
                             source_path='/Users/uname/slides/',
                             qpproj_path='/Users/uname/QuPath_projects/project.qpproj',
                             slides_indexes=[0, 1, 8],
                             slide_names=['SAS_21883_001.svs', 'VUHSK_1912.svs'])
normaliser.start()
- ram_megapixel¶
ram_megapixel is a mapping of available RAM to the tile size that will be used during normalisation. During initialisation, dogsled checks the available RAM using psutil. If the value is less than or equal to 12000 MB, the maximum width and height of a tile are set to 12000 px; if it is larger, they are set to 24500 px.
- Type
dictionary
- Default
{12000: 12000, 12001: 24500}
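The rule described for ram_megapixel can be sketched as a small lookup. This is an illustration of the documented behaviour for a two-entry mapping, not dogsled's implementation; max_tile_side is a hypothetical helper.

```python
def max_tile_side(available_mb, ram_megapixel=None):
    """Return the maximum tile width/height in px for the available RAM in MB."""
    if ram_megapixel is None:
        ram_megapixel = {12000: 12000, 12001: 24500}
    low_threshold = min(ram_megapixel)
    # At or below the lower threshold, the smaller tile size is used.
    if available_mb <= low_threshold:
        return ram_megapixel[low_threshold]
    # Above it, the larger tile size is used.
    return ram_megapixel[max(ram_megapixel)]


# The available RAM in MB could be obtained with psutil, for example:
#     psutil.virtual_memory().available / 1024 ** 2
print(max_tile_side(8000), max_tile_side(32000))
```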
- output_type¶
As the Macenko normalisation gives access to stain decoupling, dogsled can generate the hematoxylin and eosin images in addition to the normalised image. If these images are needed, they can be specified using attributes of the StainTypes class, e.g. [StainTypes.he, StainTypes.eo]. For all three, specify [StainTypes.norm, StainTypes.he, StainTypes.eo].
- Type
list[<enum ‘StainTypes’>]
- Default
[StainTypes.norm]
- dtype¶
NumPy data type used when performing the normalisation operations
- Type
type
- Default
np.float32
Attention
It is possible to set this attribute to a lower-precision type, np.float16, which will increase the speed of calculations and decrease the intermediate space required. This, however, may result in artefacts in the end result due to the lower precision (dark pixels scattered over regions with high contrast).
- numba_dtype¶
Numba data type used for Numba-optimised calculations
- Type
type
- Default
numba.float32
- temporary_folder_name¶
Name of the temporary folder which is created inside norm_path and will hold the normalised slide tiles
- Type
string
- Default
'dogsled_temp'
- remove_temporary_files¶
Whether the normalised slide tiles stored in the temporary folder should be removed after normalisation. It may be set to False if subsequent tile processing is required by the user. If set to True, dogsled checks whether the normalised slide was indeed stitched together before deleting the tiles.
Note
The normalised slide tiles will be removed; however, the temporary folders themselves are kept
- Type
boolean
- Default
True
- jpeg_quality¶
JPEG quality used when creating the tiles
- Type
integer
- Default
95
- vips_stitcher¶
If large slides are handled, their tiles might not fit into memory all at once when stitched together (e.g. slides with 40x magnification and a size of over 100,000 pixels per side). In these cases, it is possible to stitch the tiles together using sequential read, which is disabled by default.
Attention
When vips sequential access is used for stitching, the normalised slide will be saved as a TIFF file. It is recommended to use vips/openslide for further processing of this normalised image. This TIFF file will also contain spoofed metadata and can be opened with QuPath.
- Type
boolean
- Default
False
- vips_tiff_compression¶
Sets the TIFF compression. See pyvips manual for other options
- Type
string
- Default
'lzw'
- thumbnail¶
Whether thumbnails of the original and normalised images should be generated
- Type
boolean
- Default
True
- thumbnail_max_side¶
Maximum side length of the generated thumbnail. E.g. if set to 6000px and the source slide has dimensions of 12,000x18,000px, the resulting thumbnail will have dimensions of 4,000x6,000px
- Type
integer
- Default
6000
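The arithmetic behind thumbnail_max_side can be reproduced with a short sketch. thumbnail_size is a hypothetical helper, not a dogsled function, and leaving slides that are already within the limit unchanged is an assumption.

```python
def thumbnail_size(width, height, max_side=6000):
    """Scale (width, height) so that the longest side equals max_side."""
    longest = max(width, height)
    # Assumption: slides already within the limit are not enlarged.
    if longest <= max_side:
        return width, height
    # Keep the aspect ratio while shrinking the longest side to max_side.
    return (round(width * max_side / longest),
            round(height * max_side / longest))


print(thumbnail_size(12000, 18000))
```

For the 12,000x18,000px slide from the description, this gives a thumbnail of 4,000x6,000px.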
- OpenSlide_formats¶
List of the slide formats which can be handled by libvips and thus by dogsled as well
- Type
list[string]
- Default
['.svs', '.tif', '.tiff', '.scn', '.vms', '.vmu', '.ndpi', '.mrxs', '.svslide', '.bif']
- first_tile¶
When the slide is processed in tiles, the first normalised tile is used for estimating the slide-specific parameters. However, if the tile in the upper left corner (the first one, if ordered from left to right, top to bottom) contains only background, these parameters are estimated incorrectly, leading to the normalised slide having wrong colours. Therefore, the tile located in the centre of the slide is normalised first. As an alternative, an int may be provided indicating the number of the tile to be processed first.
- Type
string
- Default
'middle'
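One way to picture the choice of the first tile is the sketch below. It is not dogsled's implementation: first_tile_index is a hypothetical helper, and it assumes that tiles are numbered from 0, left to right and top to bottom, and that 'middle' means the tile in the central row and column of the grid.

```python
def first_tile_index(first_tile, n_cols, n_rows):
    """Return the 0-based index of the tile to normalise first."""
    if first_tile == 'middle':
        # Tile in the central row and column of the grid.
        return (n_rows // 2) * n_cols + n_cols // 2
    index = int(first_tile)
    if not 0 <= index < n_cols * n_rows:
        raise ValueError(f"tile {index} is outside a {n_cols}x{n_rows} grid")
    return index


print(first_tile_index('middle', 3, 3), first_tile_index(0, 2, 2))
```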
- normalising_c, alpha, beta, he_ref, max_s_ref¶
Macenko normalisation constants
- Type
integer, float, float, np.array, np.array
- Default
255, 0.001, 0.0015, array([[0.6895, 0.1759], [0.6973, 0.8286], [0.674 , 0.5312]], dtype=float16), array([0.498, 0.927], dtype=float16)