Fixed per https://github.com/OSGeo/gdal/pull/13941 . I would expect the
performance to be better than your GDAL Python snippet, because OGR SQL
sets ignored fields so that only the geometry and file_size columns are
read, which will reduce I/O if there are many attribute fields.
On 17/02/2026 at 11:48, Michael Smith via gdal-dev wrote:
I wanted to get the sum of a column's values using a spatial filter on a
Parquet file. I can easily do this with DuckDB, but I was trying to do it via GDAL.
I was able to do it by fetching features, but was unable to do it with
ExecuteSQL alone, as the spatial filter part wouldn't find the geometry column
unless it was part of the query.
This worked:
from osgeo import gdal, ogr
gf = gdal.OpenEx(f'PARQUET:{parquet_file}')
lay = gf.GetLayer()
lay.SetSpatialFilter(ogr.CreateGeometryFromWkb(aoi.wkb))
# Sum file_size over the features that pass the spatial filter
totsize_bytes += sum(feat.GetFieldAsInteger64('file_size') for feat in lay)
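For this feature-fetching approach, a similar I/O saving to the one OGR SQL gets from ignored fields can be approximated by hand with `SetIgnoredFields`, so that only `file_size` and the geometry are read. This is a minimal sketch. `fields_to_ignore` and `sum_field_in_aoi` are hypothetical helper names, and the path and WKB are assumed to be supplied by the caller.

```python
def fields_to_ignore(field_names, keep):
    """Return the attribute fields that can be skipped when reading."""
    keep = set(keep)
    return [name for name in field_names if name not in keep]


def sum_field_in_aoi(path, aoi_wkb, field="file_size"):
    """Sum an integer field over the features intersecting the AOI, reading only that field."""
    # Imported here so the pure helper above works without GDAL installed.
    from osgeo import gdal, ogr

    gdal.UseExceptions()
    ds = gdal.OpenEx(path, gdal.OF_VECTOR)
    lay = ds.GetLayer(0)
    defn = lay.GetLayerDefn()
    names = [defn.GetFieldDefn(i).GetName() for i in range(defn.GetFieldCount())]
    # Only attribute names are passed, so the geometry is still read and the
    # spatial filter keeps working.
    lay.SetIgnoredFields(fields_to_ignore(names, [field]))
    lay.SetSpatialFilter(ogr.CreateGeometryFromWkb(aoi_wkb))
    return sum(feat.GetFieldAsInteger64(field) for feat in lay)
```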
This didn’t:
res = gf.ExecuteSQL('select sum(file_size) from "parquet-file"',
                    ogr.CreateGeometryFromWkb(aoi.wkb))
RuntimeError: Cannot set spatial filter: no geometry field present in layer.
Is this just a limitation of OGR SQL?
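Via DuckDB:
import duckdb
# ST_* functions come from DuckDB's spatial extension, which must be loaded
wkb_bytes = aoi.wkb.tobytes()
sql = (f"select sum(file_size) from read_parquet('{parquet_file}') "
       "where ST_Intersects_Extent(geometry, ST_GeomFromWKB(?))")
params = [wkb_bytes]
total = duckdb.execute(sql, params).fetchone()[0]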
Performance difference:
gdal: size: 1139758617, time: 0:01:37.471977
duck: size: 1139758617, time: 0:00:15.171584
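To put the two timings above on one scale, here is a small sketch that parses the `H:MM:SS.ffffff` strings (the format Python prints for a `datetime.timedelta`) and computes their ratio. The function name `parse_elapsed` is hypothetical. For this run, DuckDB was roughly 6.4 times faster, and both tools returned identical sums.

```python
from datetime import timedelta


def parse_elapsed(text):
    """Parse an 'H:MM:SS.ffffff' string, as printed by str(timedelta)."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


gdal_time = parse_elapsed("0:01:37.471977")
duck_time = parse_elapsed("0:00:15.171584")
speedup = gdal_time / duck_time  # timedelta / timedelta gives a float
print(f"{gdal_time.total_seconds():.2f} s vs {duck_time.total_seconds():.2f} s: {speedup:.1f}x")
# prints: 97.47 s vs 15.17 s: 6.4x
```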
--
http://www.spatialys.com
My software is free, but my time generally not.
_______________________________________________
gdal-dev mailing list
https://lists.osgeo.org/mailman/listinfo/gdal-dev