Both issues below should now be fixed by
https://github.com/OSGeo/gdal/pull/13606 . It turns out that what caused
GDAL to probe all files even when _metadata is present is perhaps
completely different from the cause behind the Python reproducer in the
apache/arrow issue below.
On 28/12/2025 at 16:48, Even Rouault via gdal-dev wrote:
Hi Mike,
the problem is likely twofold:
- "gdal vector partition" doesn't write the "_metadata" file that
contains the schema and the path to the actual .parquet files
- but even if it did, I cannot manage to convince libarrow/libparquet
to not probe all files. Not sure if I'm missing something in the API
or if that's a fundamental limitation of the library. I've filed
https://github.com/apache/arrow/issues/48671 about that. I've
considered implementing a workaround on the GDAL side but I couldn't
come up with anything.
Your best workaround is to directly access
"/vsis3/bucket/overture/20251217/overture-buildings/country=US"
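For illustration, here is a minimal Python sketch of that workaround. It only builds the partition directory path and the argument list for the pipeline quoted further down in this thread. The helper names (partition_dir, build_pipeline_args) are hypothetical, and the --where clause is dropped on the assumption that it is no longer needed once the partition directory is opened directly; the thread does not say whether the "country" field is still exposed in that case.

```python
# Sketch only: helper names are hypothetical, not part of GDAL.
ROOT = "/vsis3/bucket/overture/20251217/overture-buildings/"
BBOX = "-117.486117584442,33.9156194185775,-117.333055544584,33.9745995301481"


def partition_dir(root, key, value):
    """Return the hive-style sub-directory for one partition value."""
    return f"{root.rstrip('/')}/{key}={value}"


def build_pipeline_args(src, dst, bbox):
    """Mirror the pipeline from the original message, minus --where.

    Assumption: the attribute filter on the partition key is redundant
    when the partition directory itself is the source.
    """
    return [
        "gdal", "vector", "pipeline",
        "!", "read", src,
        "!", "filter", "--bbox", bbox,
        "!", "write", "-f", "parquet", dst, "--progress", "--overwrite",
    ]


src = partition_dir(ROOT, "country", "US")
args = build_pipeline_args(src, "/tmp/test1.parquet", BBOX)
# The list could then be handed to subprocess.run(args, check=True).
```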
Even
On 28/12/2025 at 13:26, Michael Smith via gdal-dev wrote:
I know that gdal can write parquet data with hive partitioning using "gdal
vector partition", but after doing so, can gdal do partition elimination on
reading when a where/attribute filter is specified on the partition key?
I was trying to do a pipeline with:
gdal vector pipeline ! read "/vsis3/bucket/overture/20251217/overture-buildings/" !
filter --bbox -117.486117584442,33.9156194185775,-117.333055544584,33.9745995301481 --where
"country='US'" ! write -f parquet /tmp/test1.parquet --progress --overwrite
but in CPL_DEBUG I see it scanning all the partitions rather than just querying
the country=US partition.
S3: Downloading 0-1605631
(https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAI/data_0.parquet)...
S3: Got response_code=206
S3: Downloading 0-16383999
(https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet)...
S3: Got response_code=206
S3: Downloading 0-16383999
(https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet)...
S3: Got response_code=206
S3: Downloading 16384000-32767999
(https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet)...
S3: Got response_code=206
S3: Downloading 16384000-29741378
(https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet)...
....
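To make "partition elimination" concrete, here is a small Python sketch of the idea, not of how GDAL or libarrow implement it: the hive key=value pairs are read from each object's path (URL-decoded, since "=" shows up as %3D in the log above), and only files whose partition value matches the filter are kept. The function names are hypothetical, and the country=US URL in the sample list is a made-up entry added for the example; the other three come from the log.

```python
from urllib.parse import unquote, urlsplit


def hive_partitions(url):
    """Extract hive-style key=value pairs from the path segments of a URL."""
    parts = {}
    for segment in unquote(urlsplit(url).path).split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            parts[key] = value
    return parts


def prune(urls, key, wanted):
    """Keep only the files whose partition value for `key` equals `wanted`."""
    return [u for u in urls if hive_partitions(u).get(key) == wanted]


BASE = "https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings"
urls = [
    f"{BASE}/country%3DAI/data_0.parquet",
    f"{BASE}/country%3DAL/data_2.parquet",
    f"{BASE}/country%3DAL/data_3.parquet",
    f"{BASE}/country%3DUS/data_0.parquet",  # hypothetical, not in the log
]

kept = prune(urls, "country", "US")
```

With a filter of country='US', a reader doing this kind of pruning would have no reason to issue range requests against the country=AI or country=AL files.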
--
http://www.spatialys.com
My software is free, but my time generally not.
_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev