Erik,
I don't think it is really worth sozip'ing a zipped Zarr, given that
Zarr is made of many relatively small files, and sozip shines with big
compressed files. Generally, even when creating a zipped (sozip or not)
Zarr file, you need to make sure that your writing pattern matches
chunk boundaries, to avoid chunk files being rewritten several times
and making the zip bigger than needed. Please file an issue about the
error not being transmitted up to the caller.
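
To illustrate what a write pattern that matches chunk boundaries could
look like, here is a minimal sketch in plain C++ with no GDAL calls. The
names Window and chunk_aligned_windows are hypothetical, and it assumes
a single dimension whose chunk size is the one chosen when the array was
created. It splits the dimension into windows that each cover exactly
one chunk, so each chunk file would be written once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one write window along a single dimension,
// covering the index range [start, start + count).
struct Window {
    std::uint64_t start;
    std::uint64_t count;
};

// Split a dimension of length dim_size into windows that each cover
// exactly one chunk of length chunk_size. The last window is shorter
// when dim_size is not a multiple of chunk_size.
std::vector<Window> chunk_aligned_windows(std::uint64_t dim_size,
                                          std::uint64_t chunk_size) {
    assert(chunk_size > 0);  // a zero chunk size would never advance
    std::vector<Window> windows;
    std::uint64_t start = 0;
    while (start < dim_size) {
        const std::uint64_t count = std::min(chunk_size, dim_size - start);
        windows.push_back({start, count});
        start += count;  // the next window begins on the next chunk boundary
    }
    return windows;
}
```

Each window's start and count would then be used as the arrayStart and
count values for that dimension in a single Write() call, with data
buffered on the caller's side until a full chunk is available.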
Even
On 19/07/2025 at 17:44, Erik Schnetter via gdal-dev wrote:
I am using GDAL to create a multidimensional zarr file that is sozip
compressed. I see this error when creating the file:
ERROR 1: dish_positions.00000000.zarr/zarr.json already exists in ZIP file
ERROR 8: Open file /vsizip/data/fengine_init_pathfinder/cx66_dish_positions.00000000.zarr.zip/dish_positions.00000000.zarr/zarr.json to write failed
Everything is working fine when I do not use sozip compression. I
enable sozip compression by adding a "/vsizip" prefix to the file
name. Although there is an error reported on screen, I do not see an
error code reported by the function creating or closing the
multidimensional dataset. The resulting file ("*.zarr.zip") is created
fine and looks almost correct, but all attributes seem to be missing.
I wonder – is it actually possible to create a zarr file that is sozip
compressed, given that zarr probably writes to each of its files
multiple times? If not, what is the preferred way to create a
sozip-compressed zarr file efficiently?
Some details:
I create the dataset (i.e. the file) via
    const auto driver_manager = GetGDALDriverManager();
    const auto driver = driver_manager->GetDriverByName("Zarr");
    const auto dataset =
        std::unique_ptr<GDALDataset>(driver->CreateMultiDimensional(
            full_path.c_str(), root_group_options_c.data(),
            options_c.data()));
where "full_path" is
"/vsizip/data/fengine_init_pathfinder/cx66_dish_positions.00000000.zarr.zip/dish_positions.00000000.zarr".
I then create multiple attributes ("CreateAttribute") and then
    const auto mdarray =
        group->CreateMDArray(meta->get_name(), dimensions, datatype,
                             array_options_c.data());
    const bool success = mdarray->Write(
        arrayStart.data(), count.data(), nullptr,
        bufferStride.data(), datatype,
        frame + datatypesize * meta->offset, frame,
        buffer->frame_size);
and finish with
    const CPLErr err = dataset->Close();
    assert(!err);
The full code is available at
<https://github.com/kotekan/kotekan/blob/eschnett/updates-2/lib/stages/gdalFileWrite.cpp>.
-erik
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.