Erik,

I don't think it is really worth sozip'ing a zipped Zarr, given that Zarr is made of many relatively small files, and sozip shines with big compressed files. More generally, even when creating a zipped (sozip or not) Zarr file, you need to make sure that your writing pattern matches chunk boundaries, to avoid chunk files being rewritten several times and making the zip bigger than needed.

Please file an issue about the error not being propagated up to the caller.
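To illustrate the point about chunk boundaries, here is a minimal sketch in plain C++ (no GDAL calls) of how write windows along one dimension could be aligned to the chunk size, so that each chunk file is produced by exactly one write. The names Window, chunk_aligned_windows and is_chunk_aligned are hypothetical helpers made up for this example, not part of GDAL; the resulting start/count pairs would be what you pass as the start and count arguments of your Write() calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one write window along a single dimension,
// covering the index range [start, start + count).
struct Window {
    std::uint64_t start;
    std::uint64_t count;
};

// Split [0, dim_size) into windows that each cover exactly one chunk.
// The last window is shorter when dim_size is not a multiple of chunk_size.
std::vector<Window> chunk_aligned_windows(std::uint64_t dim_size,
                                          std::uint64_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<Window> windows;
    for (std::uint64_t start = 0; start < dim_size; start += chunk_size) {
        windows.push_back({start, std::min(chunk_size, dim_size - start)});
    }
    return windows;
}

// Return true if writing [start, start + count) touches only whole chunks,
// i.e. it starts on a chunk boundary and ends on one (or at the array end).
bool is_chunk_aligned(std::uint64_t start, std::uint64_t count,
                      std::uint64_t dim_size, std::uint64_t chunk_size) {
    assert(chunk_size > 0);
    const std::uint64_t end = start + count;
    const bool starts_on_boundary = (start % chunk_size == 0);
    const bool ends_on_boundary = (end % chunk_size == 0) || (end == dim_size);
    return starts_on_boundary && ends_on_boundary;
}
```

For a dimension of size 10 with a chunk size of 4, this yields the windows (0, 4), (4, 4) and (8, 2). A write starting at index 2 with count 4 would straddle two chunks, so both chunk files would have to be written again later, which is what makes the zip grow.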

Even

On 19/07/2025 at 17:44, Erik Schnetter via gdal-dev wrote:
I am using GDAL to create a multidimensional zarr file that is sozip compressed. I see this error when creating the file:

ERROR 1: dish_positions.00000000.zarr/zarr.json already exists in ZIP file
ERROR 8: Open file /vsizip/data/fengine_init_pathfinder/cx66_dish_positions.00000000.zarr.zip/dish_positions.00000000.zarr/zarr.json to write failed

Everything is working fine when I do not use sozip compression. I enable sozip compression by adding a "/vsizip" prefix to the file name. Although there is an error reported on screen, I do not see an error code reported by the function creating or closing the multidimensional dataset. The resulting file ("*.zarr.zip") is created fine and looks almost correct, but all attributes seem to be missing.

I wonder – is it actually possible to create a zarr file that is sozip compressed, given that zarr probably writes to each of its files multiple times? If not, what is the preferred way to create a sozip-compressed zarr file efficiently?

Some details:

I create the dataset (i.e. the file) via

                const auto driver_manager = GetGDALDriverManager();
                const auto driver = driver_manager->GetDriverByName("Zarr");
                const auto dataset = std::unique_ptr<GDALDataset>(driver->CreateMultiDimensional(
                    full_path.c_str(), root_group_options_c.data(), options_c.data()));

where "full_path" is "/vsizip/data/fengine_init_pathfinder/cx66_dish_positions.00000000.zarr.zip/dish_positions.00000000.zarr".

I then create multiple attributes ("CreateAttribute") and then

                const auto mdarray = group->CreateMDArray(meta->get_name(), dimensions, datatype,
                                                          array_options_c.data());
                    const bool success = mdarray->Write(
                        arrayStart.data(), count.data(), nullptr, bufferStride.data(), datatype,
                        frame + datatypesize * meta->offset, frame, buffer->frame_size);

and finish with

                const CPLErr err = dataset->Close();
                assert(!err);

The full code is available at <https://github.com/kotekan/kotekan/blob/eschnett/updates-2/lib/stages/gdalFileWrite.cpp>.

-erik

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.
