Re: [gdal-dev] How to wrap a zarr in a zip and read it with vsizip?

2025-02-27 Thread Michael Sumner via gdal-dev
Nice, I've been hitting that up with the multidim model and it works pretty
well. (I'm still scratchy in the C++ but getting what I want out of it).

We were checking out CMIP6 holdings too that look similarly impressive.

M

On Thu, Feb 27, 2025 at 5:27 PM Kurt Schwehr wrote:

> Mike,
>
> That's an interesting one for sure.
>
> Even before the ESA talk of Zarr, we have a good amount of Zarr data
> around. Just one I know of:
> https://cloud.google.com/storage/docs/public-datasets/era5
>
> On Wed, Feb 26, 2025 at 8:07 PM Michael Sumner wrote:
>
>> Just clueing into why you might be working with this, have you seen this
>> critique?  (was a bit shocked to see that this is apparently going forward
>> for Sentinel 2, let alone that it was even considered!)
>>
>> https://github.com/csaybar/ESA-zar-zip-decision
>>
>> Also glad to see a working example outlined that I can follow, Thanks!
>>
>> Cheers, Mike
>>
>>
>>
>> On Tue, Feb 25, 2025 at 7:10 AM Kurt Schwehr via gdal-dev <
>> gdal-dev@lists.osgeo.org> wrote:
>>
>>> Thanks Laurentiu and Scott!
>>>
>>> I can't believe 1) I left off -r and 2) I didn't think to look at what's
>>> in the zip. Doh!
>>>
>>> And thanks for reminding me about sozip. I was one of the reviewers of
>>> the initial spec and still don't have it front-of-mind.
>>>
>>> unzip -l nczarr_v2.zarr.zip
>>> Archive:  nczarr_v2.zarr.zip
>>>   Length      Date    Time    Name
>>> ---------  ---------- -----   ----
>>>         0  2024-12-23 02:00   nczarr_v2.zarr/
>>> ---------                     -------
>>>         0                     1 file
>>>
>>> Now happily able to see the contents:
>>>
>>> gdalinfo /vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr
>>> Driver: Zarr/Zarr
>>> Files: /vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr
>>> Size is 512, 512
>>> Subdatasets:
>>>
>>> SUBDATASET_1_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/lon
>>>   SUBDATASET_1_DESC=Array /MyGroup/lon
>>>
>>> SUBDATASET_2_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/lat
>>>   SUBDATASET_2_DESC=Array /MyGroup/lat
>>>
>>> SUBDATASET_3_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/dset1
>>>   SUBDATASET_3_DESC=Array /MyGroup/dset1
>>>
>>> SUBDATASET_4_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/Group_A/dset2
>>>   SUBDATASET_4_DESC=Array /MyGroup/Group_A/dset2
>>>
>>> SUBDATASET_5_NAME=ZARR:"/vsizip/nczarr_v2.zarr.zip/nczarr_v2.zarr":/MyGroup/Group_A/dset3
>>>   SUBDATASET_5_DESC=Array /MyGroup/Group_A/dset3
>>> [SNIP]
>>>
>>> -Kurt
>>>
>>> On Mon, Feb 24, 2025 at 11:21 AM Laurențiu Nicola via gdal-dev <
>>> gdal-dev@lists.osgeo.org> wrote:
>>>
>>>> I suspect it won't make a lot of difference. SOZIP is designed to allow
>>>> seeking within a compressed file, but Zarr is tile- (block) based, where
>>>> each of those is stored in a different file. So you end up uncompressing a
>>>> whole file anyway.
>>>>
>>>> Laurentiu
>>>>
>>>> On Mon, Feb 24, 2025, at 21:17, Scott via gdal-dev wrote:
>>>> > There's GDAL's sozip (Seek-Optimized ZIP) utility as well. I have no
>>>> > idea if it works with .zarr. I'm sure someone will correct me! ;)
>>>> >
>>>> > cd nczarr_v2.zarr
>>>> > sozip ../nczarr_v2.zarr.zip .
>>>> > sozip -l /vsizip/../nczarr_v2.zarr.zip
>>>> >
>>>> > On 2/24/25 10:09, Kurt Schwehr via gdal-dev wrote:
>>>> >> Hi all,
>>>> >>
>>>> >> I seem to be having trouble working out exactly how to make a zip of a
>>>> >> zarr and how to correctly specify the vsizip url.
>>>> >>
>>>> >> e.g. from autotest/gdrivers/data/zarr
>>>> >>
>>>> >> gdalinfo nczarr_v2.zarr # Works
>>>> >> tar cf nczarr_v2.zarr.zip nczarr_v2.zarr
>>>> >>
>>>> >> Then what? I've tried lots of variations and not had any success.
>>>> >>
>>>> >> Thanks!
>>>> >> -Kurt
>>>> >>
>>>> >> ___
>>>> >> gdal-dev mailing list
>>>> >> gdal-dev@lists.osgeo.org
>>>> >> https://lists.osgeo.org/mailman/listinfo/gdal-dev


-- 
Michael Sumner
Research Software Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsum...@gmail.com
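
For anyone following along, the fix Kurt describes above (archiving the store recursively, then checking what actually landed in the zip) can be sketched in Python with only the standard library. This is a minimal sketch, not GDAL's own tooling: the helper names zip_store and vsizip_path are hypothetical, the tiny store built in the demo is a made-up stand-in, ZIP_STORED is an assumption on my part (chunks in a Zarr store are often compressed already), and the /vsizip/ string simply mirrors the path form used in the gdalinfo call above.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_store(store_dir, zip_path):
    """Recursively archive a Zarr store directory (the equivalent of `zip -r`)."""
    store = Path(store_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        # Keep the store's own directory name as the top-level entry, so the
        # archive contains nczarr_v2.zarr/... rather than bare member files.
        for path in sorted(store.rglob("*")):
            zf.write(path, path.relative_to(store.parent).as_posix())
    return Path(zip_path)


def vsizip_path(zip_path, inner):
    """Build a /vsizip/ path of the form used with gdalinfo in this thread."""
    return f"/vsizip/{Path(zip_path).as_posix()}/{inner}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a real store: two metadata files, one chunk.
        store = Path(tmp) / "nczarr_v2.zarr"
        (store / "MyGroup" / "lon").mkdir(parents=True)
        (store / ".zgroup").write_text('{"zarr_format": 2}')
        (store / "MyGroup" / "lon" / ".zarray").write_text("{}")
        (store / "MyGroup" / "lon" / "0").write_bytes(b"\x00" * 8)

        archive = zip_store(store, Path(tmp) / "nczarr_v2.zarr.zip")
        with zipfile.ZipFile(archive) as zf:
            for name in zf.namelist():  # same check as `unzip -l`
                print(name)
        print(vsizip_path("nczarr_v2.zarr.zip", "nczarr_v2.zarr"))
```

If the listing shows only the top-level directory entry, as in the unzip -l output quoted above, the archive was not built recursively.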


Re: [gdal-dev] How to wrap a zarr in a zip and read it with vsizip?

2025-02-27 Thread Laurențiu Nicola via gdal-dev
Hi,

While I agree with the sentiment, I'm not sure I agree with some of the
details. The critique assumes that on Zarr ZIPs you need to make twice as many
reads: one for the local file header and one for the data. But the information
in the local header is already available in the central directory, so you can
seek directly to the data. This assumption is then used throughout the analysis.

But there are things I don't really like about the format, like the very 
flexible data model and physical organization, the implicit assumption that 
everything is lat/lon, and the lack of overviews. I'm also a bit skeptical 
about the Python ecosystem. My impression is that Zarr just tries (and manages) 
to be a better NetCDF.

Laurentiu


On Thu, Feb 27, 2025, at 06:06, Michael Sumner via gdal-dev wrote:
> Just clueing into why you might be working with this, have you seen this 
> critique?  (was a bit shocked to see that this is apparently going forward 
> for Sentinel 2, let alone that it was even considered!)
> 
> https://github.com/csaybar/ESA-zar-zip-decision
> 
> Also glad to see a working example outlined that I can follow, Thanks!
> 
> Cheers, Mike
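
Laurentiu's point above, that the central directory already tells a reader where each member lives, can be illustrated with Python's zipfile module. This is a rough sketch under stated assumptions, not how GDAL or any Zarr reader does it: the chunk names are made up, members are stored uncompressed, and predicted_data_range (a hypothetical helper) assumes the local header carries the same name and extra-field lengths as the central directory entry. The ZIP format does not guarantee that, so actual_data_start reads the local header to check the prediction.

```python
import io
import struct
import zipfile

LOCAL_HEADER_FIXED = 30  # bytes in the fixed part of a ZIP local file header


def build_archive(chunks):
    """Write {member name: payload} into an in-memory ZIP, uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in chunks.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def predicted_data_range(info):
    """Guess (start, size) of a member's data from central directory fields only.

    Assumption: the local header repeats the central directory's file name
    and extra field byte for byte, so their lengths can be reused.
    """
    name_len = len(info.filename.encode("utf-8"))
    start = info.header_offset + LOCAL_HEADER_FIXED + name_len + len(info.extra)
    return start, info.compress_size


def actual_data_start(raw, info):
    """Read the name and extra lengths from the local header itself."""
    name_len, extra_len = struct.unpack_from("<HH", raw, info.header_offset + 26)
    return info.header_offset + LOCAL_HEADER_FIXED + name_len + extra_len


if __name__ == "__main__":
    # Hypothetical chunk layout, loosely shaped like a Zarr v2 store.
    raw = build_archive({
        "demo.zarr/dset1/0.0": b"chunk-00",
        "demo.zarr/dset1/0.1": b"chunk-01",
    })
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        for info in zf.infolist():
            start, size = predicted_data_range(info)
            matches = start == actual_data_start(raw, info)
            print(info.filename, start, size, raw[start:start + size], matches)
```

For an archive written this way the prediction and the local header agree, so one ranged read per chunk is enough. For archives from other writers that agreement would need checking.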


Re: [gdal-dev] How to wrap a zarr in a zip and read it with vsizip?

2025-02-27 Thread Laurențiu Nicola via gdal-dev
Agreed, I know I'm heavily biased myself towards a little corner of tech.

PS: I raised the issue of the local headers in 
https://github.com/csaybar/ESA-zar-zip-decision/issues/5, if you want to follow.

On Thu, Feb 27, 2025, at 12:07, Michael Sumner wrote:
> Agree with all this, unfortunately xarray was a bit late with PR 9543 that 
> provides a basis for implicit coordinates (and will hopefully feed down into 
> Zarr), and knowing the difference from the low basis NetCDF provided.
> 
> There's a lot to it, but my read is that xarray is the new and way better 
> NetCDF (and I mean really damn impressive and ambitious and general), but 
> because it's born in Python it missed a lot of really key geospatial
> foundations that we take for granted, and for various reasons don't flow well
> from rasterio through xarray.
> 
> I have long seen a need for some pretty serious cross discipline reviews, and 
> many of those are happening but not always enough, especially with 
> generational overwork and "novelty" burnout.
> 
> 
> Appreciate the discussion here a lot. 
> Cheers, Mike


Re: [gdal-dev] How to wrap a zarr in a zip and read it with vsizip?

2025-02-27 Thread Michael Sumner via gdal-dev
Agree with all this, unfortunately xarray was a bit late with PR 9543 that
provides a basis for implicit coordinates (and will hopefully feed down
into Zarr), and knowing the difference from the low basis NetCDF provided.

There's a lot to it, but my read is that xarray is the new and way better
NetCDF (and I mean really damn impressive and ambitious and general), but
because it's born in Python it missed a lot of really key geospatial
foundations that we take for granted, and for various reasons don't flow
well from rasterio through xarray.

I have long seen a need for some pretty serious cross discipline reviews,
and many of those are happening but not always enough, especially with
generational overwork and "novelty" burnout.

Appreciate the discussion here a lot.
Cheers, Mike

On Thu, Feb 27, 2025, 19:53 Laurențiu Nicola wrote:

> Hi,
>
> While I agree with the sentiment, I'm not sure I agree with some of the
> details. The critique assumes that on Zarr ZIPs you need to make twice as
> many reads: one for the local file header and one for the data. But the
> information in the local header is already available in the central
> directory, so you can seek directly to the data. This assumption is then
> used throughout the analysis.
>
> But there are things I don't really like about the format, like the very
> flexible data model and physical organization, the implicit assumption that
> everything is lat/lon, and the lack of overviews. I'm also a bit skeptical
> about the Python ecosystem. My impression is that Zarr just tries (and
> manages) to be a better NetCDF.
>
> Laurentiu
>
>
> On Thu, Feb 27, 2025, at 06:06, Michael Sumner via gdal-dev wrote:
>
> Just clueing into why you might be working with this, have you seen this
> critique?  (was a bit shocked to see that this is apparently going forward
> for Sentinel 2, let alone that it was even considered!)
>
> https://github.com/csaybar/ESA-zar-zip-decision
>
> Also glad to see a working example outlined that I can follow, Thanks!
>
> Cheers, Mike