Hi Even,

Cool!

Out of curiosity: I thought sqlite/geopackage was already relatively 'skinny' 
packed? Am I wrong?

Does anybody have experience with "laaaarge zipped GeoPackages"? ;-) Is that 
useful? I just tested an 885MB mbtiles file (I know, not GeoPackage... but still SQLite, 
isn't it?), and it ended up at 868MB.

OR... are we talking about sets of geopackages?

Regards,

Richard Duivenvoorde


On 1/9/23 16:42, Even Rouault via QGIS-Developer wrote:
Sorry for cross-posting, but this is a very relevant topic for QGIS. To make it short, 
provided the .zip files are compressed in the SOZip way, it is possible to directly 
read laaaarge zipped GeoPackage files (or Shapefiles for the nostalgic) from QGIS 
without prior decompression.

Even

-------- Forwarded Message --------
Subject:        [gdal-dev] Announcing SOZip: Seek-Optimized profile for the 
.zip format
Date:   Mon, 9 Jan 2023 15:19:07 +0100
From:   Even Rouault <[email protected]>
To:     [email protected] <[email protected]>



Hi,

It is my pleasure to announce ( 
https://github.com/sozip/sozip-spec/blob/master/blog/01-announcement.md ) the 
initial release of the specification ( 
https://github.com/sozip/sozip-spec/blob/master/sozip_specification.md ) for 
the SOZip (Seek-Optimized Zip) profile to the ZIP file format, as well as its 
GDAL implementation.

What is SOZip?
--------------

A Seek-Optimized ZIP file (SOZip) is a ZIP file that contains one or several 
Deflate-compressed files that are organized and annotated such that a 
SOZip-aware reader can perform very fast random access (seek) within a 
compressed file.

SOZip makes it possible to access large compressed files directly from a .zip 
file without prior decompression. It is not a new file format, but a profile of 
the existing ZIP format, done in a fully backward compatible way. ZIP readers 
that are non-SOZip aware can read a SOZip-enabled file normally and ignore the 
extended features that support efficient seek capability.
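
To make the idea concrete, here is a minimal Python sketch of seekable Deflate in general. It is not the SOZip on-disk layout (the specification linked above defines that), and the names compress_seekable, read_at and CHUNK_SIZE are made up for this illustration. It assumes the compressor is flushed with a full flush at each chunk boundary, so that every chunk can be decompressed on its own, and that the compressed offset of each chunk is kept in a small index.

```python
import zlib

CHUNK_SIZE = 32 * 1024  # illustrative chunk size of uncompressed data


def compress_seekable(data, chunk_size=CHUNK_SIZE):
    """Return (raw Deflate stream, compressed start offset of each chunk)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw Deflate, no header
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(out))
        out += comp.compress(data[start:start + chunk_size])
        # A full flush ends on a byte boundary and resets the dictionary,
        # so the next chunk never refers back to earlier data.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush()  # finish the stream
    return bytes(out), offsets


def read_at(stream, offsets, pos, size, chunk_size=CHUNK_SIZE):
    """Read `size` bytes at uncompressed position `pos`, touching only the needed chunks."""
    result = bytearray()
    while size > 0:
        idx = pos // chunk_size
        if idx >= len(offsets):
            break  # position is past the end of the data
        end = offsets[idx + 1] if idx + 1 < len(offsets) else len(stream)
        chunk = zlib.decompressobj(-15).decompress(stream[offsets[idx]:end])
        piece = chunk[pos - idx * chunk_size:][:size]
        if not piece:
            break
        result += piece
        pos += len(piece)
        size -= len(piece)
    return bytes(result)
```

A reader that knows nothing about the index can still decompress the whole stream from the start, which is the backward-compatibility property described above; the extra flush markers are where the small size overhead comes from.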

Use cases
--------------

The SOZip specification is intended to be general purpose / not domain 
specific. It was first developed to serve geospatial use cases, which commonly 
have large compressed files inside of ZIP archives. In particular, it makes it 
possible for users to read large GIS files using the Shapefile, GeoPackage or 
FlatGeobuf formats (which have no native provision for compression) compressed 
in .zip files without prior decompression.

Efficient random access and selective decompression are a requirement to 
provide acceptable performance in many usage scenarios: spatial index 
filtering, access to a feature by its identifier, etc.

Performance
------------------

SOZip is efficient:

* The overhead of using a file from a SOZip archive, compared to using it 
uncompressed, is of the order of 10% for common read operations.
* Generation of a SOZip file can be much faster than regular ZIP generation 
when using multithreading.
* SOZip files are typically only ~ 5% larger than regular ZIPs (dependent on 
content, and chunk size)

Have a look at benchmarking results: 
https://github.com/sozip/sozip-spec/blob/master/README.md#benchmarking

Other ZIP-related specification
-------------------------------

The SOZip GitHub organization also hosts the KeyValuePairs extra-field 
specification ( 
https://github.com/sozip/keyvaluepairs-spec/blob/master/zip_keyvalue_extra_field_specification.md
 ), which makes it possible to encode arbitrary key-value pairs of metadata 
associated with a file within a ZIP, for example to store the Content-Type of a file.

How does this relate to GDAL?
-----------------------------

Pull request https://github.com/OSGeo/gdal/pull/7042 has been submitted with 
the following enhancements:

* The /vsizip/ virtual file system uses the SOZip index to perform fast
  random access within a compressed SOZip-enabled file.

* The Shapefile and GPKG drivers can directly generate SOZip-enabled
  .shz/.shp.zip or .gpkg.zip files.

* Addition of the CPLAddFileInZip() C function, which can compress a file and
  add it to a new or existing ZIP file, and enable the SOZip optimization
  when relevant.

* The existing VSIGetFileMetadata() function can be called on a filename of
  the form /vsizip/path/to/the/file.zip/path/inside/the/zip/file and
  with domain = "ZIP" to find out whether a SOZip index is available for
  that file.

* The new sozip command line utility
  (https://github.com/rouault/gdal/blob/sozip/doc/source/programs/sozip.rst)
  can be used to create a seek-optimized ZIP file, append files to an
  existing ZIP file, list the contents of a ZIP file and display the SOZip
  optimization status, or validate a SOZip file.
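
As a rough sketch of the VSIGetFileMetadata() item above, seen from Python: this assumes a GDAL build that includes the pull request, and that the Python bindings expose the call as gdal.GetFileMetadata(path, domain), which is an assumption to check against your GDAL version. The helper names vsizip_path and sozip_metadata, and the file names in the usage comment, are hypothetical.

```python
def vsizip_path(zip_path, inner_path):
    """Build a filename of the form /vsizip/<zip file>/<path inside the zip>."""
    return "/vsizip/" + zip_path + "/" + inner_path.lstrip("/")


def sozip_metadata(zip_path, inner_path):
    """Return the "ZIP" domain metadata for a file inside a ZIP, as a dict.

    Assumption: gdal.GetFileMetadata(path, domain) is the Python counterpart
    of the C function VSIGetFileMetadata(). The import is done lazily so the
    path helper above stays usable without GDAL installed.
    """
    from osgeo import gdal
    return gdal.GetFileMetadata(vsizip_path(zip_path, inner_path), "ZIP") or {}


# Hypothetical usage, with made-up file names:
#   print(sozip_metadata("data/my_layers.gpkg.zip", "my_layers.gpkg"))
```

The returned items should indicate whether a SOZip index was found for that file; the exact key names are not spelled out in this message, so the sketch just returns the whole dictionary.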

Best regards,

Even

--

http://www.spatialys.com
My software is free, but my time generally not.

_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev


_______________________________________________
QGIS-Developer mailing list
[email protected]
List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer
