Catalog Item

S&T Project 22041 Final Report: Evaluation of file formats for storage and transfer of large datasets in the RISE platform

Through the Science & Technology Program, the Reclamation Research and Development Office funded an evaluation of file formats for large datasets in RISE. A team of Reclamation scientific and information technology (IT) subject matter experts evaluated multiple file formats commonly used for scientific data through literature review and independent benchmarks. The team identified the network Common Data Form (netCDF) and Zarr formats as open-source options that could meet a variety of Reclamation use cases. Both formats allow for metadata, data compression, subsetting, and appending in a single file using an efficient binary format. In addition, the Zarr format is optimized for cloud storage applications. While supporting both formats would provide the most flexibility, the maturity of the netCDF format led to its prioritization as the preferred RISE file format for large datasets. This report documents the evaluation and selection of large data file formats for the RISE platform. It also provides a preliminary list of changes to the RISE platform needed to support the netCDF format. The intent is to frame future RISE development by providing a roadmap to support large datasets within the platform.
Generation Effort: S&T Project 22041: Evaluation of file formats for storage and transfer of large datasets in the RISE platform
Location Name: Worldwide
Type: Uploaded file(s)
File Type: PDF
Publisher: Bureau of Reclamation
Publication Date: Friday, December 1st, 2023
Update Frequency: Not planned
Last Update: Friday, January 5th, 2024

Disclaimer

The findings and conclusions of this work are those of the author(s) and do not necessarily represent the views of the Bureau of Reclamation.