Hi,

This has been much improved in the upcoming GDAL 3.10.0; see in particular https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538 . Now only the header and trailers of part-00000 are read.

That said, DuckDB will likely still outperform the OGR GeoParquet driver (GDAL 3.11, with https://github.com/OSGeo/gdal/pull/11003 , will allow the use of libduckdb).

Even

On 24/10/2024 at 21:41, Varun Sharma via gdal-dev wrote:
Hello GDAL'ers,

I have made a few attempts at using ogr2ogr for getting bounding box based extracts from overturemaps datasets.

I am unfortunately not able to do so: something that takes duckdb or overturemaps-py <https://github.com/OvertureMaps/overturemaps-py> 30s or less takes forever when using ogr2ogr. overturemaps-py is essentially a wrapper over pyarrow, with the arrow filter constructed from the bbox.

I suspect I am doing something wrong. The less likely possibility is that ogr2ogr is not the right tool for this.

Attempt 1: Command at the top of the link
---------------------------------------------
https://pastebin.com/bh05Kcww

Attempt 2:
----------------------------------------------

https://pastebin.com/BG3WmQ9Y

From what I can tell, all row groups from each of the parquet files are being loaded and checked. This is clearly not correct.
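
What I would expect instead is that a reader uses per-row-group min/max statistics to skip groups that cannot intersect the requested bbox. The following is only a conceptual sketch of that idea in plain Python; `RowGroupStats`, `may_intersect` and `row_groups_to_read` are hypothetical names, and this is not how GDAL or Arrow actually implement it.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics of the bbox columns for one row group."""
    xmin_min: float  # smallest bbox.xmin in the group
    xmax_max: float  # largest bbox.xmax in the group
    ymin_min: float  # smallest bbox.ymin in the group
    ymax_max: float  # largest bbox.ymax in the group


def may_intersect(stats, query):
    """Return False only when no feature in the group can overlap the query box."""
    qxmin, qymin, qxmax, qymax = query
    return (
        stats.xmin_min <= qxmax
        and stats.xmax_max >= qxmin
        and stats.ymin_min <= qymax
        and stats.ymax_max >= qymin
    )


def row_groups_to_read(all_stats, query):
    """Indices of the row groups that still need to be read and checked."""
    return [i for i, s in enumerate(all_stats) if may_intersect(s, query)]


# Made-up statistics for three row groups and one query box.
groups = [
    RowGroupStats(0.0, 10.0, 0.0, 10.0),
    RowGroupStats(20.0, 30.0, 20.0, 30.0),
    RowGroupStats(5.0, 25.0, 5.0, 25.0),
]
to_read = row_groups_to_read(groups, (8.0, 8.0, 12.0, 12.0))
```

With these made-up numbers, the second group lies entirely outside the query box and would never need to be fetched.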

Below are my libs and versions on Ubuntu 20.04. All attempts are within a conda environment.

gdal                      3.9.2
gcc_linux-64              12.4.0
libarrow                  17.0.0
libarrow-dataset          17.0.0
libparquet                17.0.0
zstd                      1.5.6
libgdal-core              3.9.2
libgdal-arrow-parquet     3.9.2
libcurl/8.9.1
OpenSSL/3.3.2

I typically use the command line tools to test GDAL/OGR's functionality and performance before I embed that functionality in my own C++ app. Thus, while there are other tools, I would love to understand how to do this in GDAL/OGR.

Please advise!

cheers,
Varun



_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is 
just about bytes.