Dan,

No, you didn't do anything obviously wrong. I'm not sure that, in ArrowDataset mode, libarrow actually uses row group statistics to filter out row groups. If it doesn't, that might cause it to ingest the whole files.
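
To illustrate what filtering on row group statistics means, here is a toy sketch in Python. It is not libarrow's code: the RowGroupStats class, the helper functions, and the min/max values are all invented for illustration. The idea is that a reader can skip a row group whenever the wanted value falls outside the stored [min, max] range of a column.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Min/max of one column in one row group (hypothetical toy structure)."""
    min_value: str
    max_value: str


def may_contain(stats, wanted):
    """Return False only when an equality test on `wanted` cannot match."""
    return stats.min_value <= wanted <= stats.max_value


def row_groups_to_read(row_groups, predicate):
    """Return indices of row groups that cannot be ruled out.

    row_groups: list of dicts mapping column name -> RowGroupStats
    predicate:  dict mapping column name -> value required by the filter
    A column without statistics can never rule a row group out.
    """
    keep = []
    for index, columns in enumerate(row_groups):
        if all(
            column not in columns or may_contain(columns[column], value)
            for column, value in predicate.items()
        ):
            keep.append(index)
    return keep


# Invented statistics for three row groups, sorted by country.
row_groups = [
    {"country": RowGroupStats("AD", "CA"), "region": RowGroupStats("AD-02", "CA-YT")},
    {"country": RowGroupStats("US", "US"), "region": RowGroupStats("US-AK", "US-WY")},
    {"country": RowGroupStats("UY", "ZW"), "region": RowGroupStats("UY-AR", "ZW-MW")},
]

predicate = {"country": "US", "region": "US-VT"}
selected = row_groups_to_read(row_groups, predicate)
print(selected)
```

With statistics like these, only one row group has to be downloaded. Without this pruning step, every row group is read and the filter is applied afterwards, which would be consistent with the amount of data you see being fetched.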

You could also try tuning the config options at https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/parquet/ogrparquetdatasetlayer.cpp#L522-L558

Do you observe a similar difference if you work with just a single file, like /vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area/part-00000-5466202d-8cdf-48e5-9aee-886c73dafc5f-c000.zstd.parquet ?
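
A rough way to run that comparison is sketched below in Python. The -select and -where arguments are taken from your command. The output paths and the build_command and time_command helpers are placeholders I made up, and time_command is only defined here, not called, since it needs network access and S3 credentials or settings appropriate to your setup.

```python
import os
import shlex
import subprocess
import time

# Paths taken from this thread.
SINGLE_FILE = "/vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area/part-00000-5466202d-8cdf-48e5-9aee-886c73dafc5f-c000.zstd.parquet"
DATASET_DIR = "PARQUET:/vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area"


def build_command(source, destination):
    """Build the ogr2ogr argument list used in the original report."""
    return [
        "ogr2ogr", destination, source,
        "-select", "id,division_id,names.primary",
        "-where", "subtype='county' AND country='US' AND region='US-VT'",
    ]


def time_command(args):
    """Run a command with CPL_DEBUG=ON and return the elapsed seconds."""
    env = dict(os.environ, CPL_DEBUG="ON")
    start = time.perf_counter()
    subprocess.run(args, check=True, env=env)
    return time.perf_counter() - start


# Hypothetical output paths, one per run so the results do not collide.
single_cmd = build_command(SINGLE_FILE, "/tmp/vt_single.geojson")
dataset_cmd = build_command(DATASET_DIR, "/tmp/vt_dataset.geojson")

# Print the commands so they can be checked before anything is run.
print(shlex.join(single_cmd))
print(shlex.join(dataset_cmd))
```

Calling time_command(single_cmd) and time_command(dataset_cmd) then gives two timings to compare.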

Even

On 28/08/2024 at 18:45, Daniel Baston via gdal-dev wrote:
Hello,

I'm trying to use ogr2ogr with an attribute filter to pull 14 polygons
from Overture maps. Running the following command with CPL_DEBUG=ON
tells me that "PARQUET: Attribute filter fully translated to Arrow"
yet it takes about 7 minutes to complete, and appears to download
quite a bit of data:

ogr2ogr /tmp/vt.geojson \
  "PARQUET:/vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area" \
  -select "id,division_id,names.primary" \
  -where "subtype='county' AND country='US' AND region='US-VT'"

Have I made a mistake in my ogr2ogr invocation? For comparison,
running what I believe to be an equivalent query in DuckDB takes about
10 seconds:

SELECT
       id,
       division_id,
       names.primary,
       ST_GeomFromWKB(geometry) as geometry
       FROM
           read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area/*', hive_partitioning=1)
       WHERE
           subtype = 'county'
           AND country = 'US'
           AND region = 'US-VT';

I am using GDAL master (e09d07a7) and libarrow 16.1.

Thanks,
Dan
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.
