Thanks Even for your prompt reply!

1. Just to clarify: with GDAL v3.10.0, the command

   ogr2ogr -f GPKG ogr_water.gpkg -spat 7.5 46.5 7.7 46.7 /vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=base/type=water/

   is fine, and I should see a (significant) speed-up, yes?

2. The Apache Arrow libraries themselves have many knobs to tweak, such as thread pools, I/O threads and memory pools. Are these exposed as GDAL configuration options?

3. GDAL 3.11's ADBC support with libduckdb would be amazing. In my C++ app, I was thinking of using libduckdb and duckdb-spatial directly, but I don't know how to use DuckDB from C++ other than by passing SQL queries as strings :).

Your linked PR thread and https://github.com/OSGeo/gdal/issues/10887 are very interesting reads!

Best,
Varun

On Thu, Oct 24, 2024 at 10:01 PM Even Rouault <even.roua...@spatialys.com> wrote:
> Hi,
>
> This has been much improved in upcoming GDAL 3.10.0: cf. in particular
> https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538
> . Now only the header and trailers of part-00000 are read.
>
> That said, duckdb will likely still outperform the OGR GeoParquet driver
> (GDAL 3.11 with https://github.com/OSGeo/gdal/pull/11003 will allow using
> libduckdb).
>
> Even
>
> On 24/10/2024 at 21:41, Varun Sharma via gdal-dev wrote:
> > Hello GDAL'ers,
> >
> > I have made a few attempts at using ogr2ogr to get bounding-box-based
> > extracts from Overture Maps datasets.
> >
> > Unfortunately, I am not able to do so: something that takes duckdb or
> > overturemaps-py <https://github.com/OvertureMaps/overturemaps-py> 30 s or
> > less takes forever when using ogr2ogr. overturemaps-py is essentially a
> > wrapper over pyarrow with the Arrow filter constructed from the bbox.
> >
> > I suspect I am doing something wrong. The less likely possibility is that
> > ogr2ogr is not the right tool for this.
> >
> > Attempt 1: command at the top of the link
> > ---------------------------------------------
> > https://pastebin.com/bh05Kcww
> >
> > Attempt 2:
> > ---------------------------------------------
> > https://pastebin.com/BG3WmQ9Y
> >
> > From what I can tell, all row groups from each of the Parquet files are
> > being loaded and checked. This is clearly not correct.
> >
> > Below are my libs and versions on Ubuntu 20.04. All attempts are within a
> > conda environment.
> >
> > gdal                   3.9.2
> > gcc_linux-64           12.4.0
> > libarrow               17.0.0
> > libarrow-dataset       17.0.0
> > libparquet             17.0.0
> > zstd                   1.5.6
> > libgdal-core           3.9.2
> > libgdal-arrow-parquet  3.9.2
> > libcurl                8.9.1
> > OpenSSL                3.3.2
> >
> > I typically use the command-line tools to test GDAL/OGR's functionality
> > and performance before embedding that functionality in my own C++ app.
> > Thus, while there are other tools, I would love to understand how to do
> > this in GDAL/OGR.
> >
> > Please advise!
> >
> > cheers,
> > Varun
> >
> > _______________________________________________
> > gdal-dev mailing list
> > gdal-dev@lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
> Butcher of all kinds of standards, open or closed formats. At the end, this
> is just about bytes.
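Why reading "only the header and trailers" of a Parquet file is cheap: per the Parquet file format, a file starts with the 4-byte magic `PAR1` and ends with the Thrift-encoded FileMetaData (schema, row groups, column statistics), followed by a 4-byte little-endian metadata length and `PAR1` again. A reader can therefore get everything it needs for planning with two small range reads from the end of the object. The sketch below is a minimal illustration, not GDAL's or Arrow's code; `read_footer` and the `read_range(offset, length)` callback are hypothetical names, with the callback standing in for an HTTP range request against /vsis3/.

```python
import struct
from typing import Callable

MAGIC = b"PAR1"


def read_footer(size: int, read_range: Callable[[int, int], bytes]) -> bytes:
    """Fetch the raw (still Thrift-encoded) FileMetaData of a Parquet object.

    `size` is the object size; `read_range(offset, length)` returns those bytes.
    Only two reads are issued, both at the end of the object.
    """
    if size < 12:  # leading magic + length field + trailing magic
        raise ValueError("too small to be a Parquet file")
    # Read 1: last 8 bytes = 4-byte little-endian metadata length + "PAR1".
    tail = read_range(size - 8, 8)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (meta_len,) = struct.unpack("<I", tail[:4])
    meta_start = size - 8 - meta_len
    if meta_start < 4:  # metadata must not overlap the leading magic
        raise ValueError("corrupt metadata length")
    # Read 2: the metadata block itself; no data pages are touched.
    return read_range(meta_start, meta_len)
```

Usage with an in-memory stand-in for the remote object, recording which ranges were requested:

```python
meta = b"thrift-encoded FileMetaData"
blob = MAGIC + b"x" * 1000 + meta + struct.pack("<I", len(meta)) + MAGIC
requests = []

def read_range(offset, length):
    requests.append((offset, length))
    return blob[offset:offset + length]

assert read_footer(len(blob), read_range) == meta
print(requests)  # two small reads near the end of the object
```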
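The observation above, that every row group was loaded and checked, is the opposite of what duckdb and the pyarrow-based overturemaps-py can do: skip row groups whose footer statistics prove they cannot intersect the query window. The sketch below shows only that pruning predicate. It assumes, as in the Overture Maps schema, a `bbox` struct column with `xmin`, `ymin`, `xmax`, `ymax` fields whose per-row-group min/max statistics have already been decoded into a hypothetical `RowGroupStats` record; it is not how GDAL or Arrow implement filtering.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# (minx, miny, maxx, maxy), the same order as ogr2ogr -spat
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical decoded statistics for one row group."""
    xmin_min: float  # minimum of bbox.xmin over the row group
    ymin_min: float  # minimum of bbox.ymin
    xmax_max: float  # maximum of bbox.xmax
    ymax_max: float  # maximum of bbox.ymax


def may_intersect(rg: RowGroupStats, query: BBox) -> bool:
    """Conservative test: False only if no feature in the row group can match."""
    qminx, qminy, qmaxx, qmaxy = query
    # A feature's bbox intersects the query iff xmin <= qmaxx, xmax >= qminx,
    # ymin <= qmaxy and ymax >= qminy; the row-group extremes bound all features.
    return (rg.xmin_min <= qmaxx and rg.xmax_max >= qminx
            and rg.ymin_min <= qmaxy and rg.ymax_max >= qminy)


def row_groups_to_read(stats: Iterable[RowGroupStats], query: BBox) -> List[int]:
    """Indices of the row groups that must actually be fetched and decoded."""
    return [i for i, rg in enumerate(stats) if may_intersect(rg, query)]
```

Usage with the `-spat 7.5 46.5 7.7 46.7` window from the thread and made-up statistics:

```python
query = (7.5, 46.5, 7.7, 46.7)
stats = [
    RowGroupStats(6.0, 45.0, 7.0, 46.0),    # entirely west/south: skipped
    RowGroupStats(7.0, 46.0, 8.0, 47.0),    # overlaps the window: read
    RowGroupStats(10.0, 50.0, 11.0, 51.0),  # far away: skipped
]
print(row_groups_to_read(stats, query))  # [1]
```

The test only says a row group *may* contain matches; surviving row groups still need a per-feature filter, which is why it is cheap to evaluate from the footer alone.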