Hi,
This has been much improved in the upcoming GDAL 3.10.0; see in particular
https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538
Now only the header and trailer of part-00000 are read.
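For context, a Parquet file begins with the 4-byte magic `PAR1` and ends with its footer (the Thrift-encoded schema and row-group metadata), followed by the footer length as a 4-byte little-endian integer and a second `PAR1`. The sketch below only illustrates that layout and is not GDAL's code; `read_parquet_footer` is a hypothetical name. It shows that getting a file's metadata takes just a few small reads near the start and end of the file, so limiting this to part-00000 avoids touching every other file of the dataset.

```python
import io
import struct

MAGIC = b"PAR1"

def read_parquet_footer(f) -> bytes:
    """Return the raw (Thrift-encoded) footer of a Parquet file.

    Only the 4-byte header, the 8-byte trailer and the footer itself
    are read: no column data is touched.
    """
    size = f.seek(0, io.SEEK_END)
    if size < 12:
        raise ValueError("file too small to be Parquet")
    # Header: the file must start with the magic bytes.
    f.seek(0)
    if f.read(4) != MAGIC:
        raise ValueError("missing leading PAR1 magic")
    # Trailer: footer length (uint32, little-endian) followed by the magic.
    f.seek(size - 8)
    trailer = f.read(8)
    if trailer[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", trailer[:4])
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")
    # The footer sits immediately before the trailer.
    f.seek(size - 8 - footer_len)
    return f.read(footer_len)
```

With a seekable remote reader (for instance one issuing HTTP range requests), the same logic amounts to a handful of small reads per file rather than a full download.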
That said, duckdb will likely still outperform the OGR GeoParquet driver
(GDAL 3.11, with https://github.com/OSGeo/gdal/pull/11003, will allow
using libduckdb).
Even
On 24/10/2024 at 21:41, Varun Sharma via gdal-dev wrote:
Hello GDAL'ers,
I have made a few attempts at using ogr2ogr to get bounding-box-based
extracts from the Overture Maps datasets.
Unfortunately, I have not been able to: an extract that takes duckdb or
overturemaps-py <https://github.com/OvertureMaps/overturemaps-py> 30 s
or less takes forever with ogr2ogr. overturemaps-py
is essentially a wrapper over pyarrow, with the Arrow filter
constructed from the bbox.
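A bbox filter of this kind boils down to an interval-overlap test on each feature's bounding box. A minimal sketch, assuming each feature carries a box with `xmin`, `ymin`, `xmax`, `ymax` fields (Overture's GeoParquet files expose a `bbox` struct column of roughly this shape); the `BBox`, `intersects` and `bbox_extract` names are hypothetical and this is not overturemaps-py's actual code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BBox:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def intersects(a: BBox, b: BBox) -> bool:
    # Two axis-aligned boxes overlap iff both their x ranges and
    # their y ranges overlap (touching edges count as overlapping).
    return (a.xmin <= b.xmax and a.xmax >= b.xmin
            and a.ymin <= b.ymax and a.ymax >= b.ymin)

def bbox_extract(features, query: BBox):
    """Yield the (bbox, payload) pairs whose bbox intersects the query box."""
    for box, payload in features:
        if intersects(box, query):
            yield box, payload
```

In pyarrow the same predicate would be written as a dataset filter expression on the bbox subfields, which lets the scanner evaluate it while reading instead of after loading everything.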
I suspect I am doing something wrong. Less likely, ogr2ogr is simply
not the right tool for this.
Attempt 1: Command at the top of the link
---------------------------------------------
https://pastebin.com/bh05Kcww
Attempt 2:
----------------------------------------------
https://pastebin.com/BG3WmQ9Y
From what I can tell, all row groups of each Parquet file are being
loaded and checked, which is clearly not what should happen.
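What one would expect instead is row-group pruning: Parquet writers can record min/max statistics per column and per row group, so a reader can compare the minimum of `bbox.xmin`, the maximum of `bbox.xmax` (and likewise for y) against the query box and skip any row group that provably holds no match, without decoding it. A minimal sketch of that decision, with hypothetical names (`RowGroupStats`, `row_groups_to_read`) and no claim that this is how GDAL's driver works internally:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RowGroupStats:
    # Per-row-group column statistics, as a Parquet writer may record them.
    min_xmin: float  # minimum of bbox.xmin over the row group
    max_xmax: float  # maximum of bbox.xmax over the row group
    min_ymin: float  # minimum of bbox.ymin over the row group
    max_ymax: float  # maximum of bbox.ymax over the row group

def row_groups_to_read(stats, qxmin, qymin, qxmax, qymax):
    """Return indices of row groups that may contain features in the query box.

    A row group is skipped only when its statistics prove that no feature
    in it can intersect the query; kept groups still need per-row filtering.
    """
    keep = []
    for i, s in enumerate(stats):
        disjoint = (s.min_xmin > qxmax or s.max_xmax < qxmin
                    or s.min_ymin > qymax or s.max_ymax < qymin)
        if not disjoint:
            keep.append(i)
    return keep
```

Pruning like this only pays off when rows are spatially clustered within row groups; otherwise every group's extent covers most of the world and nothing can be skipped.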
Below are my libraries and versions on Ubuntu 20.04. All attempts were
made within a conda environment.
gdal 3.9.2
gcc_linux-64 12.4.0
libarrow 17.0.0
libarrow-dataset 17.0.0
libparquet 17.0.0
zstd 1.5.6
libgdal-core 3.9.2
libgdal-arrow-parquet 3.9.2
libcurl/8.9.1
OpenSSL/3.3.2
I typically use the command-line tools to test GDAL/OGR functionality
and performance before embedding that functionality in my own C++ app.
So, while there are other tools, I would love to understand how to do
this in GDAL/OGR.
Please advise!
cheers,
Varun
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is
just about bytes.