Abstract: Scientific imaging instruments with modern, fast CMOS detectors are generating increasingly large datasets, which makes data management ever more critical. This is particularly true for large multi-user facilities such as the Centre for Microscopy and Microanalysis (CMM) at the University of Queensland (UQ), which operates a wide range of microscopes, many of them big data producers. A central data repository that stores, indexes, and annotates data not only allows researchers to search, browse, and retrieve their data easily but also has the potential to harvest metadata to enrich these datasets.
Pitschi is a central data repository for CMM scientific instruments that adheres to the FAIR data principles. It is part of the Australian Characterisation Commons at Scale (ACCS) project, which is funded by the Australian Research Data Commons. Pitschi is based on the open-source Clowder data management framework from the National Center for Supercomputing Applications in the US, and it is fully integrated with the instrument booking system and the storage infrastructure at UQ. Pitschi provides end-to-end data management, from capturing raw data, to transferring them to a storage collection, and finally to ingesting and indexing them in the repository. As part of the ingest process, metadata are extracted automatically from supported file types. These metadata are then used to facilitate search and discovery. Once the data are ingested in Pitschi, they are available on various platforms, such as HPC systems, personal computers, and processing platforms such as CVL. Data transport is arranged transparently using the Metropolitan Data Caching Infrastructure (MeDICI).
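The ingest step described above (walk a storage collection, extract metadata from supported file types, index the result for search) can be illustrated with a minimal Python sketch. This is not Pitschi's or Clowder's actual code or API: the function names, the extractor table, and the use of JSON sidecar files as the only "supported" type are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def extract_basic(path):
    """Metadata available for every file, whatever its type."""
    return {"name": path.name, "size_bytes": path.stat().st_size}


def extract_json_sidecar(path):
    """Hypothetical extractor: treat a JSON file as a flat metadata record."""
    return json.loads(path.read_text())


# Hypothetical mapping from file extension to extractor (an assumption,
# standing in for whatever file types a real repository supports).
EXTRACTORS = {".json": extract_json_sidecar}


def ingest(collection_dir):
    """Walk a collection and build an in-memory index: relative path -> metadata."""
    root = Path(collection_dir)
    index = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        metadata = extract_basic(path)
        extractor = EXTRACTORS.get(path.suffix.lower())
        if extractor is not None:
            # Type-specific metadata is merged on top of the basic record.
            metadata.update(extractor(path))
        index[path.relative_to(root).as_posix()] = metadata
    return index


def search(index, key, value):
    """Return the paths whose metadata has the given key/value pair."""
    return [p for p, m in index.items() if m.get(key) == value]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "scan1.json").write_text(json.dumps({"instrument": "TEM-1"}))
        (root / "raw.bin").write_bytes(b"\x00" * 16)
        idx = ingest(root)
        print(search(idx, "instrument", "TEM-1"))
```

A production repository would persist the index in a database and a search engine rather than a dictionary; the sketch only shows how automatically extracted metadata becomes the basis for search and discovery.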
In March 2021, we presented our evaluation of data repositories to Microscopy Australia’s Data Management and Informatics Committee. In this follow-up webinar, we would like to share our experience in the development and operation of Pitschi.
Dr Hoang Nguyen is a computer scientist with a background in high-throughput and high-performance computing. He works at the Research Computing Centre at the University of Queensland. His interests centre on scientific workflows and the different ways users can interact with them. Since joining the ACCS project in late 2020, he has become more involved in data management.
Dr Rubbiya Ali completed her PhD at the Institute for Molecular Bioscience, The University of Queensland, Australia. She developed three-dimensional computational methods (3D BLE, RAZA and RAZAps) for edge detection, segmentation and particle picking of 3D organelles, macromolecules, and membrane proteins. She works at the Centre for Microscopy and Microanalysis (CMM) as Data Informatics Manager. Her major research activities are 2D/3D image processing and helping researchers establish workflows for microscopy data management and analysis that support the entire microscopy data lifecycle at CMM.