Re: [I] Combination of sorting and writing to GeoParquet 1.1 causes an expectation to panic in DataFusion internals [sedona-db]

via GitHub Fri, 28 Nov 2025 20:52:00 -0800


Kontinuation commented on issue #379:
URL: https://github.com/apache/sedona-db/issues/379#issuecomment-3590990787


   This could be caused by the column ordinals changing when the write plan is 
created. Here is a fragment of the verbose query plan for writing the 
dataframe:
   
   ```
   ...
   | logical_plan            | CopyTo: format=parquet output_url=foofy.parquet 
options: ()
   |                         |   Sort: sd_order(water_point.geometry) ASC NULLS 
LAST
   |                         |     SubqueryAlias: water_point
   |                         |       TableScan: ?table? projection=[OBJECTID, 
OBJECTID_1, FEAT_CODE, ZVALUE, MINZ, MAXZ, POLY_CLASS, NAMEID_1, NAME_1, HID, 
SHAPE_LENG, SHAPE_AREA, SHAPE_LEN, geometry]
   | initial_physical_plan   | DataSinkExec: sink=ParquetSink(file_groups=[])
   |                         |   ProjectionExec: expr=[OBJECTID@0 as OBJECTID, 
OBJECTID_1@1 as OBJECTID_1, FEAT_CODE@2 as FEAT_CODE, ... SHAPE_LEN@12 as 
SHAPE_LEN, geoparquet_bbox(geometry@13) as bbox, geometry@13 as geometry]
   |                         |     SortExec: expr=[sd_order(geometry@13) ASC 
NULLS LAST], preserve_partitioning=[false]
   |                         |       DataSourceExec: file_groups={1 group: 
[[../files/ns-water_water-poly_geo.parquet]]}, projection=[OBJECTID, 
OBJECTID_1, FEAT_CODE, ... , SHAPE_LEN, geometry], file_type=parquet
   ...
   
   ```
   
   The sort key in the plan is `sd_order(geometry@13)`, which assumes that the 
argument of `sd_order` comes from the column at index 13 of the batches 
produced by the upstream operator.
   
   After the write physical plan is created, the actual columns are as follows:
   
   ```
   OBJECTID@0 as OBJECTID,
   OBJECTID_1@1 as OBJECTID_1,
   FEAT_CODE@2 as FEAT_CODE,
   ZVALUE@3 as ZVALUE,
   MINZ@4 as MINZ,
   MAXZ@5 as MAXZ,
   POLY_CLASS@6 as POLY_CLASS,
   NAMEID_1@7 as NAMEID_1,
   NAME_1@8 as NAME_1,
   HID@9 as HID,
   SHAPE_LENG@10 as SHAPE_LENG,
   SHAPE_AREA@11 as SHAPE_AREA,
   SHAPE_LEN@12 as SHAPE_LEN,
   geoparquet_bbox(geometry@13) as bbox,  # was geometry@13 as geometry in the 
table schema
   geometry@13 as geometry  # now should be geometry@14
   ```
   
   The physical plan for the sort, including its sort key expression, could 
have been generated against the schema of the output relation, so the `Column` 
expression is bound to index 13. The change in the ordinals of the bound 
columns was not taken into account when the write physical plan was created, 
which would cause the problem.
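
   To illustrate the hypothesis above, here is a minimal, stdlib-only Rust 
sketch of how an index-bound column reference goes stale when a column is 
inserted ahead of it. `BoundColumn`, `resolve` and `rebind` are hypothetical 
names for this toy model, not DataFusion or sedona-db APIs, and the 
three-column schema is a shortened stand-in for the real one.

```rust
// Toy model of an index-bound column reference (NOT DataFusion's API).
#[derive(Debug, Clone, PartialEq)]
struct BoundColumn {
    name: String,
    index: usize,
}

/// Return the field that a bound column actually points at in `schema`.
/// Only the index is consulted, the name is ignored.
fn resolve<'a>(col: &BoundColumn, schema: &[&'a str]) -> &'a str {
    schema[col.index]
}

/// Rebind a column by name against a (possibly reordered) schema.
fn rebind(col: &BoundColumn, schema: &[&str]) -> Option<BoundColumn> {
    schema
        .iter()
        .position(|field| *field == col.name)
        .map(|index| BoundColumn { name: col.name.clone(), index })
}

fn main() {
    // Shortened stand-in for the scan schema; geometry is the last column.
    let scan_schema = ["OBJECTID", "SHAPE_LEN", "geometry"];
    let sort_key = BoundColumn { name: "geometry".to_string(), index: 2 };
    assert_eq!(resolve(&sort_key, &scan_schema), "geometry");

    // The write plan adds a bbox column in front of geometry.
    let write_schema = ["OBJECTID", "SHAPE_LEN", "bbox", "geometry"];

    // The stale index now points at bbox instead of geometry.
    assert_eq!(resolve(&sort_key, &write_schema), "bbox");

    // Rebinding by name restores the intended column.
    let fixed = rebind(&sort_key, &write_schema).expect("geometry is present");
    assert_eq!(fixed.index, 3);
    assert_eq!(resolve(&fixed, &write_schema), "geometry");
}
```

   In this model the fix is to rebind by name whenever the schema under an 
expression changes. Whether the actual fix belongs in the sort expression or 
in the write plan is left open here.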


