We evaluate registration performance with the relative Target Registration Error (rTRE), which measures the geometric accuracy between the target landmarks and the warped landmarks in the target image frame. We compute several statistical measures of rTRE for each registered pair and then aggregate these statistics into an overall score for each participant.

Besides accuracy, we also measure execution time. To normalise the measured time across machines, we ask participants to run the attached benchmark code written in Python and include its output in the submission.
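The actual benchmark script is distributed with the challenge materials; purely as an illustration of the idea, such a normalisation benchmark times a fixed workload on the participant's machine and records the result. The function names, workload, and JSON key below are our own, not the official script or schema:

```python
import json
import time


def time_fixed_workload(size=150, repeats=3):
    """Time a fixed pure-Python matrix multiplication as a CPU speed proxy."""
    a = [[(i * j) % 7 for j in range(size)] for i in range(size)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # naive O(n^3) multiply; the numeric result does not matter,
        # only the wall-clock time it takes on this machine
        [[sum(a[i][k] * a[k][j] for k in range(size)) for j in range(size)]
         for i in range(size)]
        best = min(best, time.perf_counter() - start)
    return best


def save_performance(path="computer-performances.json"):
    """Write the timing to JSON, mimicking the role of the official file
    (the key name here is a guess, not the official schema)."""
    with open(path, "w") as fh:
        json.dump({"workload-seconds": time_fixed_workload()}, fh)
```

Dividing each participant's reported registration times by such a machine-specific factor makes timings from different hardware roughly comparable.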

Scoring and ranking

The evaluation is based on rTRE measured for each pair of landmarks in the image pair being registered. Let us denote `TRE=d_e(x_l^T,x_l^W)`, where `x^T` and `x^W` are the landmark coordinates in the target and warped image, respectively, and `d_e(.)` is the Euclidean distance. All TREs are normalised by the image diagonal, `rTRE={TRE} / \sqrt{w^2+h^2}`, where `w` and `h` are the image width and height, respectively. Let us also introduce the following abbreviations: `a_d(.)=\text{mean}_{\text{dataset}}(.)`, `r_m(.)=\text{rank}_{\text{method}}(.)`, `m_i(.)=\text{median}_{\text{image}}(.)`, `s_i(.)=\text{max}_{\text{image}}(.)`. The main criterion used for ranking is the average rank of median rTRE - `a_d(r_m(m_i(rTRE)))`. The motivation for using the median is not to penalise a few inaccurate landmarks if most of them are registered well. For each image pair, the methods are then ranked by this median rTRE, which makes even very different images with very different expected rTREs comparable. The final ranking of the methods is the mean of their ranks over the individual image pairs. This score only becomes available once several registration results are known for each image pair.
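The per-pair statistics and the rank aggregation described above can be sketched in plain Python as follows (the helper names and data layout are ours; tied scores are broken arbitrarily here for simplicity):

```python
import math


def rtre(target_pts, warped_pts, width, height):
    """Per-landmark relative TRE: Euclidean error over the image diagonal."""
    diag = math.hypot(width, height)
    return [math.dist(t, w) / diag for t, w in zip(target_pts, warped_pts)]


def median(values):
    """Median of a list of numbers (m_i in the text)."""
    vals = sorted(values)
    n = len(vals)
    mid = n // 2
    return vals[mid] if n % 2 else (vals[mid - 1] + vals[mid]) / 2


def rank_methods(scores_per_pair):
    """Average rank per method: {pair: {method: median_rTRE}} -> {method: rank}.

    For each image pair, methods are ranked by their median rTRE (r_m),
    then ranks are averaged over all pairs (a_d).
    """
    ranks = {}
    for pair_scores in scores_per_pair.values():
        ordered = sorted(pair_scores, key=pair_scores.get)
        for rank, method in enumerate(ordered, start=1):
            ranks.setdefault(method, []).append(rank)
    return {m: sum(r) / len(r) for m, r in ranks.items()}
```

Ranking within each pair first, and only then averaging, is what allows images with very different intrinsic difficulty to contribute equally to the final score.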

Moreover, we will report several other criteria, such as:

  • Average robustness - `a_d(R_i)`
  • Average median rTRE - `a_d(m_i(rTRE))` and `a_d(m_i(rTRE) if R_i=1)`
  • Average rank of median rTRE - `a_d(r_m(m_i(rTRE)))`
  • Average max rTRE - `a_d(s_i(rTRE))` and `a_d(s_i(rTRE) if R_i=1)`
  • Average rank max rTRE - `a_d(r_m(s_i(rTRE)))`
  • Execution time `t_i` measured in minutes - average rank `a_d(r_m(t_i))` and average time `a_d(t_i if R_i=1)`

where the robustness `R_i` is the fraction of landmarks in `L_i` whose rTRE improved through the performed registration compared to the initial rTRE; formally, `R_i=1/|L_i| \sum_{j \in L_i}(rTRE_{j}^{\text{regist}} < rTRE_{j}^{\text{init}})`. All missing or incomplete registrations in a submission are assigned the initial rTRE and are ranked last.
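The robustness formula above counts improved landmarks and divides by the total; a direct sketch (with our own function name and list-based inputs):

```python
def robustness(initial_rtre, registered_rtre):
    """Fraction of landmarks whose rTRE improved after registration (R_i).

    Each True in the comparison counts as 1, so summing the boolean
    comparisons implements the indicator sum in the formula.
    """
    improved = sum(reg < init for init, reg
                   in zip(initial_rtre, registered_rtre))
    return improved / len(initial_rtre)
```

A pair where every landmark improves gives `R_i = 1`, which is why several of the reported criteria condition on `R_i = 1` to look only at fully robust registrations.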

The benchmark provides two scales of the images for the newly introduced images and multiple scales for previously presented datasets:

  • Small - ~5% of the original image scale, with approximate image sizes of 1k - 2k pixels
  • Medium - ~25% of the original image scale, with approximate image sizes of 8k - 16k pixels

Only the medium size results will be evaluated in this Challenge. The small size images are intended for overview only.

Submission details

We provide a file listing the image pairs (target and source) to be registered, together with the paths/names of the landmark files (also target and source), and a few empty columns to be filled in by participants as part of the submission (the path to the warped source landmarks and the execution time).

The table has the following columns (where [dataset-path] means a relative path to the file within the dataset; [submission-path] means a relative path to the file in the submission):

  1. Target image [dataset-path]
  2. Source Image [dataset-path]
  3. Target landmarks [dataset-path]
  4. Source landmarks [dataset-path]
  5. Warped target landmarks [dataset-path]
  6. Warped source landmarks (to be filled by participants) [submission-path]
  7. Execution time [seconds] (to be filled by participants)

Registration time should be measured including loading the images but excluding warping the landmarks. The warped landmarks (the subject of the submission) are the landmarks from one image transformed into the coordinate frame of the other image. In other words, they are the new positions of the source landmarks after registration with the target image.
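The exact column layout of the landmark CSV files is defined by the provided dataset files; as a hedged sketch, assuming an index column followed by X and Y coordinates (an assumption, not the official specification), writing a warped-landmarks file could look like:

```python
import csv


def save_warped_landmarks(path, points):
    """Write warped landmark coordinates to a CSV file.

    Assumed layout: a header row with an unnamed index column plus X and Y,
    then one row per landmark. Check the provided dataset files for the
    authoritative format before submitting.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["", "X", "Y"])
        for idx, (x, y) in enumerate(points):
            writer.writerow([idx, x, y])
```

The relative path you pass as `path` is the same string that must appear in the warped-source-landmarks column of the cover table.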

Structure of the submission

The participants should submit a zip archive. The submission archive needs to contain the following files:

  1. Cover table - registration-results.csv - a completed version of the CSV table that we provide. It contains the list of registration pairs (images to register and the related landmark files). In this table, participants are requested to fill in the execution time for each registration and the name of the file containing the source landmarks warped to the target image frame. [registration-results.csv has to be in the root of the submission folder]
  2. Computer benchmark - computer-performances.json - a simple JSON file with the results of the computer-performance benchmark. It serves to normalise the registration times. We assume that all registration experiments run on the same machine. This file is generated automatically by the attached computation benchmark. [computer-performances.json has to be in the root of the submission folder]
  3. Collection of warped landmarks - a folder (it can be a collection of folders, or the files can be placed directly in the submission root) with the CSV files containing the warped landmarks. Note that the relative path to each CSV file has to match the path given in the cover table registration-results.csv.
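Before uploading, it is easy to sanity-check that the archive matches the structure above. The helper below is our own sketch, not the official validator, and the cover-table column name it looks for is an assumption:

```python
import csv
import io
import zipfile


def check_submission(archive):
    """Return a list of problems found in a submission zip (empty = OK).

    Checks that the two required root files exist and that every
    warped-landmarks path referenced in the cover table is present.
    The column name "Warped source landmarks" is assumed, not official.
    """
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
        problems = []
        for required in ("registration-results.csv",
                         "computer-performances.json"):
            if required not in names:
                problems.append("missing " + required)
        if "registration-results.csv" in names:
            with zf.open("registration-results.csv") as raw:
                reader = csv.DictReader(io.TextIOWrapper(raw, "utf-8"))
                for row in reader:
                    lnd = row.get("Warped source landmarks")
                    if lnd and lnd not in names:
                        problems.append("missing landmarks file " + lnd)
        return problems
```

Running such a check locally catches missing or mis-pathed landmark files before the evaluation server treats them as failed registrations with the initial rTRE.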

You can find a sample submission from bUnwarpJ.

Winning criteria

We will provide sets of points inside the tissue for each registration pair and will evaluate the rTRE only on a subset of these points/landmarks, which typically mark significant structures in the tissue. Participants will be ranked on the evaluation image pairs according to the criteria described earlier (average rank of median rTRE).

All participants also have to submit a short description of their method in the standard double-column IEEE format. The minimum length is 1 page and it should not be longer than 4 pages (excluding references). To submit your paper/report, use the "Supplementary File" field along with your result submission, or provide a "Publication URL" pointing to your publicly available paper, e.g. on https://arxiv.org. To submit your paper/report after the results deadline, use the following email address: anhir.submission@gmail.com.

Extra information

We provide a benchmark framework - https://borda.github.io/BIRL - which contains several useful scripts and the benchmark implementation. The benchmark uses an object-oriented architecture, so by overriding two methods you can simply inherit the complete benchmark for your own image registration method. The two methods are: (i) run the image registration and (ii) obtain the warped landmarks. Running them generates all submission files. For detailed information, see the README in the BIRL project.
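The inheritance pattern is roughly the following. The class and method names here are illustrative only, not BIRL's actual API; consult the BIRL README for the real hook names:

```python
class ImageRegBenchmark:
    """Minimal sketch of the benchmark's two-hook pattern (not the real BIRL API)."""

    def run(self, image_pairs):
        """Drive the whole experiment: register each pair, collect landmarks."""
        results = []
        for target, source in image_pairs:
            self._register(target, source)
            results.append(self._warped_landmarks(target, source))
        return results

    def _register(self, target, source):
        raise NotImplementedError  # hook (i): run the image registration

    def _warped_landmarks(self, target, source):
        raise NotImplementedError  # hook (ii): obtain the warped landmarks


class IdentityBenchmark(ImageRegBenchmark):
    """Trivial 'method': identity transform, landmarks pass through unchanged."""

    def _register(self, target, source):
        pass  # no-op registration

    def _warped_landmarks(self, target, source):
        return source  # identity warp keeps the source landmarks as-is
```

Because the driving loop, bookkeeping, and output generation live in the base class, plugging in a real registration method only requires filling in the two hooks.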

Reference

Borovec J, Munoz-Barrutia A, Kybic J. Benchmarking of Image Registration Methods for Differently Stained Histological Slides. 2018 25th IEEE International Conference on Image Processing (ICIP). 2018. doi:10.1109/icip.2018.8451040