Re: [gdal-dev] How to wrap a zarr in a zip and read it with vsizip?
Nice, I've been hitting that up with the multidim model and it works pretty well (I'm still scratchy in the C++ but getting what I want out of it). We were checking out CMIP6 holdings too, which look similarly impressive.

M

On Thu, Feb 27, 2025 at 5:27 PM Kurt Schwehr wrote:
> Mike,
>
> That's an interesting one for sure.
>
> Even before the ESA talk of Zarr, we have a good amount of Zarr data
> around. Just one I know of:
> https://cloud.google.com/storage/docs/public-datasets/era5
>
> On Wed, Feb 26, 2025 at 8:07 PM Michael Sumner wrote:
>
>> Just clueing into why you might be working with this, have you seen this
>> critique? (was a bit shocked to see that this is apparently going forward
>> for Sentinel 2, let alone that it was even considered!)
>>
>> https://github.com/csaybar/ESA-zar-zip-decision
>>
>> Also glad to see a working example outlined that I can follow, Thanks!
>>
>> Cheers, Mike
>>
>> On Tue, Feb 25, 2025 at 7:10 AM Kurt Schwehr via gdal-dev <
>> gdal-dev@lists.osgeo.org> wrote:
>>
>>> Thanks Laurentiu and Scott!
>>>
>>> I can't believe 1) I left off -r and 2) I didn't think to look at what's
>>> in the zip. Doh!
>>>
>>> And thanks for reminding me about sozip. I was one of the reviewers of
>>> the initial spec and still don't have it front-of-mind.
>>>
>>> unzip -l nczarr_v2.zarr.zip
>>> Archive:  nczarr_v2.zarr.zip
>>>   Length      Date    Time    Name
>>> ---------  ---------- -----   ----
>>>         0  2024-12-23 02:00   nczarr_v2.zarr/
>>> ---------                     -------
>>>         0                     1 file
>>>
>>> Now happily able to see the contents:
>>>
>>> gdalinfo /vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr
>>> Driver: Zarr/Zarr
>>> Files: /vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr
>>> Size is 512, 512
>>> Subdatasets:
>>>   SUBDATASET_1_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/lon
>>>   SUBDATASET_1_DESC=Array /MyGroup/lon
>>>   SUBDATASET_2_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/lat
>>>   SUBDATASET_2_DESC=Array /MyGroup/lat
>>>   SUBDATASET_3_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/dset1
>>>   SUBDATASET_3_DESC=Array /MyGroup/dset1
>>>   SUBDATASET_4_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/Group_A/dset2
>>>   SUBDATASET_4_DESC=Array /MyGroup/Group_A/dset2
>>>   SUBDATASET_5_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/Group_A/dset3
>>>   SUBDATASET_5_DESC=Array /MyGroup/Group_A/dset3
>>> [SNIP]
>>>
>>> -Kurt
>>>
>>> On Mon, Feb 24, 2025 at 11:21 AM Laurențiu Nicola via gdal-dev <
>>> gdal-dev@lists.osgeo.org> wrote:
>>>
>>>> I suspect it won't make a lot of difference. SOZIP is designed to allow
>>>> seeking within a compressed file, but Zarr is tile- (block) based, where
>>>> each of those is stored in a different file. So you end up uncompressing
>>>> a whole file anyway.
>>>>
>>>> Laurentiu
>>>>
>>>> On Mon, Feb 24, 2025, at 21:17, Scott via gdal-dev wrote:
>>>>> There's GDAL's sozip (Seek-Optimized ZIP) utility as well. I have no
>>>>> idea if it works with .zarr. I'm sure someone will correct me! ;)
>>>>>
>>>>> cd nczarr_v2.zarr
>>>>> sozip ../nczarr_v2.zarr.zip .
>>>>> sozip -l /vsizip/../nczarr_v2.zarr.zip
>>>>>
>>>>> On 2/24/25 10:09, Kurt Schwehr via gdal-dev wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I seem to be having trouble working out exactly how to correctly make
>>>>>> a zip of a zarr and how to correctly specify the vsizip url.
>>>>>>
>>>>>> e.g. from autotest/gdrivers/data/zarr
>>>>>>
>>>>>> gdalinfo nczarr_v2.zarr # Works
>>>>>> zip nczarr_v2.zarr.zip nczarr_v2.zarr
>>>>>>
>>>>>> Then what? I've tried lots of variations and not had any success.
>>>>>>
>>>>>> Thanks!
>>>>>> -Kurt

--
Michael Sumner
Research Software Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsum...@gmail.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
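For anyone following along without the zip CLI, here is a minimal sketch of the two steps discussed in this message: archiving the Zarr directory recursively (the -r that was missing) and building the /vsizip/ dataset and subdataset names. It uses only the Python standard library. The helper names zip_directory, vsizip_path and zarr_subdataset are made up for illustration, and storing entries uncompressed is an assumption on my part, not something the thread prescribes.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Archive src_dir recursively, keeping its own name as the top-level folder."""
    src_dir = os.path.abspath(src_dir)
    parent = os.path.dirname(src_dir)
    # ZIP_STORED (no extra compression) is an assumption made for this sketch,
    # on the grounds that Zarr chunks are usually compressed already.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Entry names are relative to the parent, with "/" separators,
                # e.g. "nczarr_v2.zarr/MyGroup/lon/0".
                arcname = os.path.relpath(full, parent).replace(os.sep, "/")
                zf.write(full, arcname)


def vsizip_path(zip_path, inner_path):
    """Build a GDAL-style path: /vsizip/<zip file>/<path inside the zip>."""
    return "/vsizip/" + zip_path + "/" + inner_path


def zarr_subdataset(dataset_path, array_path):
    """Build a subdataset name in the form shown in the gdalinfo output above."""
    return 'ZARR:"{}":{}'.format(dataset_path, array_path)
```

With the test file from the thread, vsizip_path("nczarr_v2.zarr.zip", "nczarr_v2.zarr") gives the same string that was passed to gdalinfo above, and zarr_subdataset(..., "/MyGroup/lon") matches SUBDATASET_1_NAME.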
Re: [gdal-dev] How to wrap a zarr in a zip and read it with vsizip?
Hi,

While I agree with the sentiment, I'm not sure I agree about some of the details. The critique assumes that on Zarr ZIPs you need to make twice as many reads: one for the local file header and one for the data. But the information in the local header is already available in the central directory, so you can seek directly to the data. This assumption is then used throughout the analysis.

But there are things I don't really like about the format, like the very flexible data model and physical organization, the implicit assumption that everything is lat/lon, and the lack of overviews. I'm also a bit skeptical about the Python ecosystem. My impression is that Zarr just tries (and manages) to be a better NetCDF.

Laurentiu

On Thu, Feb 27, 2025, at 06:06, Michael Sumner via gdal-dev wrote:
> Just clueing into why you might be working with this, have you seen this
> critique? (was a bit shocked to see that this is apparently going forward
> for Sentinel 2, let alone that it was even considered!)
>
> https://github.com/csaybar/ESA-zar-zip-decision
>
> Also glad to see a working example outlined that I can follow, Thanks!
>
> Cheers, Mike
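To make the central-directory point concrete, here is a rough sketch using Python's standard zipfile module. It computes where an entry's data starts in two ways: from central-directory fields alone, and by reading the 30-byte fixed part of the local header. The first function assumes the local header carries the same name and extra-field lengths as the central directory entry; the ZIP format does not guarantee that for the extra field, so treat it as an assumption that holds for simple archives like the one in the check below, not as a general rule. Both function names are made up for illustration.

```python
import struct
import zipfile

LOCAL_HEADER_FIXED_SIZE = 30  # fixed part of a ZIP local file header, in bytes


def data_offset_from_central_directory(info):
    """Estimate the data offset using only central-directory fields.

    ASSUMPTION: the local header's name and extra field have the same
    lengths as the central directory's copy of them.
    """
    return (info.header_offset
            + LOCAL_HEADER_FIXED_SIZE
            + len(info.filename.encode("utf-8"))
            + len(info.extra))


def data_offset_from_local_header(fp, info):
    """Compute the data offset by actually reading the local header."""
    fp.seek(info.header_offset)
    header = fp.read(LOCAL_HEADER_FIXED_SIZE)
    # Bytes 26-27: file name length, bytes 28-29: extra field length.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    return info.header_offset + LOCAL_HEADER_FIXED_SIZE + name_len + extra_len
```

When the two functions agree, a reader that already holds the central directory can fetch a chunk with a single ranged read of compress_size bytes from the computed offset. When they might not agree, one option is to read a little extra ahead of the data so the local header comes along in the same request.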
Re: [gdal-dev] How to wrap a zarr in a zip and read it with vsizip?
Agreed, I know I'm heavily biased myself towards a little corner of tech.

PS: I raised the issue of the local headers in https://github.com/csaybar/ESA-zar-zip-decision/issues/5, if you want to follow.

On Thu, Feb 27, 2025, at 12:07, Michael Sumner wrote:
> Agree with all this. Unfortunately xarray was a bit late with PR 9543, which
> provides a basis for implicit coordinates (and will hopefully feed down into
> Zarr), and knowing the difference from the low basis NetCDF provided.
>
> There's a lot to it, but my read is that xarray is the new and way better
> NetCDF (and I mean really damn impressive and ambitious and general), but
> because it was born in Python it missed a lot of really key geospatial
> foundations that we take for granted and that, for various reasons, don't
> flow well from rasterio through xarray.
>
> I have long seen a need for some pretty serious cross-discipline reviews, and
> many of those are happening but not always enough, especially with
> generational overwork and "novelty" burnout.
>
> Appreciate the discussion here a lot.
> Cheers, Mike
Re: [gdal-dev] How to wrap a zarr in a zip and read it with vsizip?
Agree with all this. Unfortunately xarray was a bit late with PR 9543, which provides a basis for implicit coordinates (and will hopefully feed down into Zarr), and knowing the difference from the low basis NetCDF provided.

There's a lot to it, but my read is that xarray is the new and way better NetCDF (and I mean really damn impressive and ambitious and general), but because it was born in Python it missed a lot of really key geospatial foundations that we take for granted and that, for various reasons, don't flow well from rasterio through xarray.

I have long seen a need for some pretty serious cross-discipline reviews, and many of those are happening but not always enough, especially with generational overwork and "novelty" burnout.

Appreciate the discussion here a lot.
Cheers, Mike

On Thu, Feb 27, 2025, 19:53 Laurențiu Nicola wrote:
> Hi,
>
> While I agree with the sentiment, I'm not sure I agree about some of the
> details. It assumes that on ZARR ZIPs, you need to make twice as many
> reads, for the local file headers and for the data. But the information in
> the local header is already available in the central directory, so you can
> seek directly to the data. This assumption is then used throughout the
> analysis.
>
> But there are things I don't really like about the format, like the very
> flexible data model and physical organization, the implicit assumption that
> everything is lat/lon, and the lack of overviews. I'm also a bit skeptical
> about the Python ecosystem. My impression is that Zarr just tries (and
> manages) to be a better NetCDF.
>
> Laurentiu
>
> On Thu, Feb 27, 2025, at 06:06, Michael Sumner via gdal-dev wrote:
>
>> Just clueing into why you might be working with this, have you seen this
>> critique? (was a bit shocked to see that this is apparently going forward
>> for Sentinel 2, let alone that it was even considered!)
>>
>> https://github.com/csaybar/ESA-zar-zip-decision
>>
>> Also glad to see a working example outlined that I can follow, Thanks!
>>
>> Cheers, Mike