As a follow-up, I was able to get this working with the gdal-master build from conda (which is quite cool), using DuckDB SQL to query Parquet:

ogrinfo -ro \
  -oo PRELUDE_STATEMENTS="LOAD SPATIAL" \
  -oo PRELUDE_STATEMENTS="LOAD PARQUET" \
  ADBC:'overture-places.parquet' \
  -sql "select st_astext(geometry), * from \"overture-places\" where st_dwithin_spheroid(geometry, ST_POINT( -72.1440, 43.6406 ), 500)=true and bbox.xmin BETWEEN -73 AND -72 AND bbox.ymin BETWEEN 43 AND 44"

I find that I need a dummy local Parquet file to open, and then I can query remote datasets just fine:

ogrinfo -ro \
  -sql "select *, st_astext(geometry) geom from read_parquet(\"s3://overturemaps-us-west-2/release/2024-12-18.0/theme=places/type=place/*\", filename=true, hive_partitioning=1) where st_dwithin_spheroid(geometry,ST_POINT( -72.1440, 43.6406 ), 500)=true and bbox.xmin BETWEEN -73 AND -72 AND bbox.ymin BETWEEN 43 AND 44" \
  -oo PRELUDE_STATEMENTS="load httpfs" \
  -oo PRELUDE_STATEMENTS="load spatial" \
  -oo PRELUDE_STATEMENTS="load parquet" \
  ADBC:~/dummy.parquet

Mike

--
Michael Smith
Remote Sensing/GIS Center
US Army Corps of Engineers

On 12/22/24, 10:05 AM, "Even Rouault" <even.roua...@spatialys.com> wrote:

Hi Michael,

I've also noticed that the ADBC / Arrow interface of libduckdb seems to be less efficient than their native API. I have no idea whether this has a fundamental cause or whether it is "just" an implementation issue that could be improved (on their side). In particular, I had the impression that getting an Arrow stream for "SELECT * FROM 'the_filename'", as used internally by the driver, seemed to trigger ingestion of the whole file. Or maybe just the first row group, but that might already be too much.

Note also that the driver itself requests Arrow streams a couple of times when geometries are detected, because it rewrites the SQL to use ST_AsWKB() on the geometry columns. Otherwise, when the spatial extension is loaded, DuckDB returns geometries in its own geometry encoding, and I didn't bother writing a parser for this custom encoding (ADBC support in GDAL is an unsponsored effort).

So perhaps, to get the most out of DuckDB, a dedicated driver should be written.

Regarding the lack of geometry for your use case, I'm not sure what the cause is. I believe that duckdb_spatial is a bit stricter than the OGR GeoParquet driver in recognizing GeoParquet. At least older versions of OvertureMaps were only loosely compliant with GeoParquet.

With https://github.com/OSGeo/gdal/pull/11536 applied, the following works (although much more slowly than we'd like):

$ ogrinfo ADBC: -oo SQL="SELECT * FROM 's3://overturemaps-us-west-2/release/2024-12-18.0/theme=places/type=place/part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd.parquet' LIMIT 1" -al
INFO: Open of `ADBC:'
      using driver `ADBC' successful.

Layer name: part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd
Geometry: Point
Feature Count: 1
Extent: (-179.999992, -84.996332) - (-0.001674, 44.999998)
Layer SRS WKT:
GEOGCRS["WGS 84",
[ ... snip ... ]
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Geometry Column = geometry
id: String (0.0)
[ ... snip ... ]
type: String (0.0)
OGRFeature(part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd):0
  id (String) = 08ff39bac830c5900361ff7fe23acab8
  version (Integer) = 0
  sources (String(JSON)) = [{"property":"","dataset":"meta","record_id":"1150855701606590","update_time":"2024-09-10T00:00:00.000Z","confidence":null}]
  names.primary (String) = KK Beauty Shop 2
  categories.primary (String) = shopping
  categories.alternate (StringList) = (1:cosmetic_and_beauty_supplies)
  confidence (Real) = 0.265179677819083
  websites (StringList) = (null)
  socials (StringList) = (1:https://www.facebook.com/1150855701606590)
  emails (StringList) = (null)
  phones (StringList) = (1:+959765858258)
  brand.wikidata (String) = (null)
  brand.names.primary (String) = (null)
  addresses (String(JSON)) = [{"freeform":"အမှတ်(၂၁),ပွဲစားလမ်း(အောက်လမ်း)၊ ကြည့်မြင်တိုင်","locality":"Yangon","postcode":"11101","region":null,"country":"MM"}]
  theme (String) = places
  type (String) = place
  POINT (-179.13203 -84.5792175)

Even

On 21/12/2024 at 21:39, Michael Smith via gdal-dev wrote:
> Using gdal-master conda packages, trying to use the new ADBC driver for
> libduckdb integration, I’m able to connect to a parquet dataset (only if it
> has the parquet extension) but the geometry is not being recognized.
> Seems to take a long time to load compared with duckdb. So, I must be doing
> something wrong.
> Note private s3 bucket.
>
>
> CPL_DEBUG=on ogrinfo
> ADBC:"s3://private-bucket/overture-base/overture-places.parquet" -oo
> ADBC_DRIVER=libduckdb -oo PRELUDE_STATEMENTS="INSTALL httpfs" -oo
> PRELUDE_STATEMENTS="load httpfs" -oo PRELUDE_STATEMENTS="INSTALL parquet" -oo
> PRELUDE_STATEMENTS="load parquet" -oo PRELUDE_STATEMENTS="install aws" -oo
> PRELUDE_STATEMENTS="load aws" -oo PRELUDE_STATEMENTS="CREATE SECRET ( TYPE
> S3,PROVIDER CREDENTIAL_CHAIN)"
> GDAL: On-demand registering
> /Users/rdcrlmds/mambaforge/envs/gdalmaster/lib/gdalplugins/ogr_ADBC.dylib
> using RegisterOGRADBC.
> GDAL:
> GDALOpen(ADBC:s3://private-bucket/overture-base/overture-places.parquet,
> this=0x13a70a000) succeeds as ADBC.
> INFO: Open of `ADBC:s3://private-bucket/overture-base/overture-places.parquet'
> using driver `ADBC' successful.
> OGR: GetLayerCount() = 1
>
> 1: overture-places (None)
> GDAL:
> GDALClose(ADBC:s3://private-bucket/overture-base/overture-places.parquet,
> this=0x13a70a000)
> GDAL: In GDALDestroy - unloading GDAL shared library.
>
>
> time CPL_DEBUG=on ogrinfo
> ADBC:"s3://private-bucket/overture-base/overture-places.parquet" -oo
> ADBC_DRIVER=libduckdb -oo PRELUDE_STATEMENTS="INSTALL spatial" -oo
> PRELUDE_STATEMENTS="load spatial" -oo PRELUDE_STATEMENTS="INSTALL httpfs" -oo
> PRELUDE_STATEMENTS="load httpfs" -oo PRELUDE_STATEMENTS="INSTALL parquet" -oo
> PRELUDE_STATEMENTS="load parquet" -oo PRELUDE_STATEMENTS="install aws" -oo
> PRELUDE_STATEMENTS="load aws" -oo PRELUDE_STATEMENTS="CREATE SECRET ( TYPE
> S3,PROVIDER CREDENTIAL_CHAIN)"
> GDAL: On-demand registering
> /Users/rdcrlmds/mambaforge/envs/gdalmaster/lib/gdalplugins/ogr_ADBC.dylib
> using RegisterOGRADBC.
> GDAL:
> GDALOpen(ADBC:s3://private-bucket/overture-base/overture-places.parquet,
> this=0x129e15350) succeeds as ADBC.
> INFO: Open of `ADBC:s3://private-bucket/overture-base/overture-places.parquet'
> using driver `ADBC' successful.
> OGR: GetLayerCount() = 1
>
> 1: overture-places (None)
> GDAL:
> GDALClose(ADBC:s3://private-bucket/overture-base/overture-places.parquet,
> this=0x129e15350)
> GDAL: In GDALDestroy - unloading GDAL shared library.
> CPL_DEBUG=on ogrinfo -oo ADBC_DRIVER=libduckdb -oo -oo -oo -oo -oo -oo
> 90.25s user 22.43s system 41% cpu 4:29.75 total
>
>

--
http://www.spatialys.com
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.
Mood of the day: "Of course, one can jump up and down on one's chair like a young goat, saying: standards! standards! standards! But that achieves nothing and means nothing." ~ as De Gaulle put it

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
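
The commands in this thread repeat -oo PRELUDE_STATEMENTS=... once per DuckDB statement. The following is a minimal sketch of a shell wrapper that builds those options from a list of statements. The function name adbc_ogrinfo and the DRY_RUN variable are hypothetical (they are not part of GDAL), and ADBC_DRIVER=libduckdb is hard-coded on the assumption that libduckdb is the intended ADBC driver, as in the original report. Only options that already appear in the thread are used.

```shell
# Hypothetical helper, not part of GDAL.
# Usage: adbc_ogrinfo DATASET SQL [PRELUDE_STATEMENT...]
# Set DRY_RUN=1 to print the arguments, one per line, instead of running ogrinfo.
adbc_ogrinfo() {
    adbc_dataset=$1
    adbc_sql=$2
    shift 2

    # Rotate through the remaining arguments, replacing each statement with
    # "-oo PRELUDE_STATEMENTS=<statement>". Each statement stays a single
    # argument even when it contains spaces.
    adbc_n=$#
    while [ "$adbc_n" -gt 0 ]; do
        adbc_stmt=$1
        shift
        set -- "$@" -oo "PRELUDE_STATEMENTS=$adbc_stmt"
        adbc_n=$((adbc_n - 1))
    done

    # Assumption: libduckdb is the ADBC driver to load.
    set -- ogrinfo -ro -oo ADBC_DRIVER=libduckdb "$@" "ADBC:$adbc_dataset" -sql "$adbc_sql"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$@"
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 adbc_ogrinfo overture-places.parquet 'select 1' 'LOAD spatial' 'LOAD parquet' prints the assembled argument list without running ogrinfo, which makes it easy to check the quoting before sending a long query to a remote dataset.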