Open Kidney Dataset
Fine-grained Annotated Ultrasound for Medical Image Analysis
Current Release: 14 June 2022
Introduction
Ultrasound imaging is a portable, real-time, non-ionizing, and non-invasive imaging modality. It is the first-line imaging modality for numerous organs, including the kidney. With recent advances in technology, artificial intelligence (AI)-enhanced ultrasound is imminent. However, compared to other modalities such as CT or MRI, little open ultrasound data is available to researchers.
We present the Open Kidney Dataset. It includes over 500 two-dimensional B-mode abdominal ultrasound images and two sets of fine-grained polygon annotations for four classes, each set generated by an expert sonographer. The dataset is available for non-commercial use.
Motivation
Artificial intelligence for medical imaging has seen unprecedented growth in the last ten years. As imaging data have been made available to researchers, cornerstone algorithms such as U-Net have been created. However, in the field of ultrasound, there is a lack of high-quality data available. This is in part due to the difficulty of accessing medical imaging data, as well as anonymization and privacy considerations. Even in competitions within biomedical imaging, such as the MICCAI Segmentation Decathlon, ultrasound is underrepresented. The lack of data accentuates the growing reproducibility crisis within the ultrasound machine learning field. To the best of our knowledge, no widely available kidney ultrasound dataset exists. To further expand and improve academic efforts for machine learning in ultrasound, we present the Open Kidney Dataset.
This dataset may help standardize ultrasound segmentation benchmarking and, in the long term, reduce the effort of ultrasound interpretation, further simplifying ultrasound use.
Data Description
Institutional approval was received (H21-02375). The ultrasound images were originally acquired between January 2015 and September 2019. These B-mode ultrasound images were collected from real patients who had a clinical indication to receive an ultrasound investigation of their kidneys. Consequently, a significant portion were obtained in real-world situations, such as at the bedside or in intensive care units, rather than under finely controlled laboratory conditions. The participant population includes adults with chronic kidney disease, prospective kidney donors, and adults with a transplanted kidney. No filtering for specific patient or imaging characteristics was applied. No filtering for specific vendors was applied either, and hence a variety of ultrasound manufacturers are represented, including Philips, General Electric (GE), Acuson, Siemens, Toshiba, and SonoSite.
Each annotated image additionally comes with labels for the view type and kidney type (native or transplant).
Repository Structure
The repository is laid out as follows:
- The top-level directory includes requirements.txt, the README, and a CSV file on transducer properties. This CSV contains, where possible, per-image information on the make and model of the ultrasound machine, the transducer used, the physical resolution, and the transducer frequency.
- /src/ includes the scripts used to evaluate segmentation results.
- /src/tools/ contains utility scripts to perform image manipulation, calculate distributions, etc.
- /src/echotools/ contains scripts used to process the DICOM images originally obtained. Note that only .PNG files are shared, so these scripts are provided for reference.
- /src/annotation_analysis/ contains utility scripts for evaluating the manual annotations provided by sonographers and calculating variance statistics on those annotations.
- /labels/ contains the CSV files generated from the VGG Image Annotator for all images. Each CSV record includes the filename, file size, attributes on quality, the view type, and any brief comments provided by the sonographer. It also includes the coordinates of each polygon for each annotation region.
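As an illustration of how the polygon coordinates in the label files might be read, here is a minimal sketch that assumes the CSVs follow the standard VGG Image Annotator (VIA) 2.x export layout, in which `region_shape_attributes` holds a JSON object with `all_points_x` and `all_points_y`. The function names, the sample row, and the `label` attribute key are hypothetical and not taken from the repository; check the actual column names in /labels/ before relying on it.

```python
import csv
import io
import json
from collections import defaultdict


def load_via_polygons(csv_file):
    """Group polygon regions by filename from a VIA-style CSV export."""
    polygons = defaultdict(list)
    for row in csv.DictReader(csv_file):
        shape = json.loads(row["region_shape_attributes"] or "{}")
        if shape.get("name") != "polygon":
            continue  # skip rows with no region or a non-polygon shape
        points = list(zip(shape["all_points_x"], shape["all_points_y"]))
        attrs = json.loads(row["region_attributes"] or "{}")
        polygons[row["filename"]].append({"points": points, "attributes": attrs})
    return dict(polygons)


def polygon_area(points):
    """Polygon area in square pixels, computed with the shoelace formula."""
    total = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Made-up sample row in the assumed VIA 2.x layout (not real dataset content).
SAMPLE = '''filename,file_size,file_attributes,region_count,region_id,region_shape_attributes,region_attributes
example.png,1000,"{}",1,0,"{""name"":""polygon"",""all_points_x"":[0,4,4,0],""all_points_y"":[0,0,3,3]}","{""label"":""example_class""}"
'''

regions = load_via_polygons(io.StringIO(SAMPLE))
for name, regs in regions.items():
    for region in regs:
        print(name, region["attributes"], polygon_area(region["points"]))
```

To use this with a real label file, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` sample. The region attributes are kept as a plain dictionary because the attribute keys used by the annotators are not specified here.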
Data Structure
The data are provided as a folder of PNG images. Each file corresponds to a randomly sampled image from a unique patient; no patient contributes more than one image. Access to the data requires registration.
License and Usage
The data and code are made available under the CC BY-NC-SA license. Data may not be used for commercial purposes. Due to accessibility and privacy terms, registration is required for manual verification prior to the release of data.
Access
Please complete the registration form at this link: https://ubc.ca1.qualtrics.com/jfe/form/SV_1TfBnLm1wwZ9srk
Upon registering, your submission will be reviewed manually. After review, an email will be sent to you with relevant links.
Code and Trained Models
Code for masking and cropping data and for reading and processing summary statistics of labels, along with pre-trained models and additional helper code, is available at: https://github.com/rsingla92/kidneyUS
Citation
Singla R, Ringstrom C, Hu G, Lessoway V, Reid J, Nguan C, Rohling R. The open kidney ultrasound data set. arXiv preprint arXiv:2206.06657. 2022 Jun 14.
Support or Contact
For additional information, or to report errors in the data, please contact us at rsingla92 [at] gmail [dot] com
Errata
Errata to the code, data, or other materials will be listed here with date stamps.
None to date.