Kontinuation commented on issue #379:
URL: https://github.com/apache/sedona-db/issues/379#issuecomment-3590990787
This could be caused by the column ordinals changing when the write plan is
created. Here is a fragment of the verbose query plan for writing the
dataframe:
```
...
| logical_plan | CopyTo: format=parquet output_url=foofy.parquet options: ()
| | Sort: sd_order(water_point.geometry) ASC NULLS LAST
| | SubqueryAlias: water_point
| | TableScan: ?table? projection=[OBJECTID, OBJECTID_1, FEAT_CODE, ZVALUE, MINZ, MAXZ, POLY_CLASS, NAMEID_1, NAME_1, HID, SHAPE_LENG, SHAPE_AREA, SHAPE_LEN, geometry]
| initial_physical_plan | DataSinkExec: sink=ParquetSink(file_groups=[])
| | ProjectionExec: expr=[OBJECTID@0 as OBJECTID, OBJECTID_1@1 as OBJECTID_1, FEAT_CODE@2 as FEAT_CODE, ... SHAPE_LEN@12 as SHAPE_LEN, geoparquet_bbox(geometry@13) as bbox, geometry@13 as geometry]
| | SortExec: expr=[sd_order(geometry@13) ASC NULLS LAST], preserve_partitioning=[false]
| | DataSourceExec: file_groups={1 group: [[../files/ns-water_water-poly_geo.parquet]]}, projection=[OBJECTID, OBJECTID_1, FEAT_CODE, ... , SHAPE_LEN, geometry], file_type=parquet
...
```
The sort key in the plan is `sd_order(geometry@13)`, which assumes that the
input argument of `sd_order` comes from the column at index 13 of the batches
produced upstream.
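To illustrate what a bound reference such as `geometry@13` means, here is a
minimal Python sketch. It is not DataFusion or SedonaDB code; `BoundColumn`
and `evaluate` are hypothetical names used only for illustration, and the
batch values are placeholders. The point is that the name is just a label and
evaluation reads whatever sits at the bound index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundColumn:
    """Hypothetical stand-in for a physical column reference like geometry@13."""

    name: str   # label only, never used for lookup
    index: int  # position in the incoming batch

    def evaluate(self, batch: list) -> object:
        # Lookup is purely positional.
        return batch[self.index]


# Column names taken from the table scan in the plan above.
names = [
    "OBJECTID", "OBJECTID_1", "FEAT_CODE", "ZVALUE", "MINZ", "MAXZ",
    "POLY_CLASS", "NAMEID_1", "NAME_1", "HID", "SHAPE_LENG", "SHAPE_AREA",
    "SHAPE_LEN", "geometry",
]
# Placeholder column values, one entry per column.
batch = [f"<{n} values>" for n in names]

sort_key = BoundColumn("geometry", 13)
print(sort_key.evaluate(batch))
```

With the table schema, index 13 is indeed `geometry`, so the reference is
valid as long as the batch layout matches the schema it was bound against.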
After the write physical plan is created, the actual columns are as follows:
```
OBJECTID@0 as OBJECTID,
OBJECTID_1@1 as OBJECTID_1,
FEAT_CODE@2 as FEAT_CODE,
ZVALUE@3 as ZVALUE,
MINZ@4 as MINZ,
MAXZ@5 as MAXZ,
POLY_CLASS@6 as POLY_CLASS,
NAMEID_1@7 as NAMEID_1,
NAME_1@8 as NAME_1,
HID@9 as HID,
SHAPE_LENG@10 as SHAPE_LENG,
SHAPE_AREA@11 as SHAPE_AREA,
SHAPE_LEN@12 as SHAPE_LEN,
geoparquet_bbox(geometry@13) as bbox, # was geometry@13 as geometry in the table schema
geometry@13 as geometry # now should be geometry@14
```
The physical plan for the sort and its sort key expression could have been
generated according to the schema of the output relation, so the `Column`
expression binds to the column at index 13. The changes to the ordinals of
bound columns were not taken into account when creating the write physical
plan, which caused the problem.
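The ordinal shift described above can be sketched in a few lines of Python.
This is only an illustration under the assumption that `bbox` is inserted
immediately before `geometry`, as in the column listing; `rebind` is a
hypothetical helper and not part of DataFusion or SedonaDB.

```python
def rebind(name: str, schema: list[str]) -> int:
    """Hypothetical helper: resolve a column name to its ordinal in `schema`."""
    return schema.index(name)


table_schema = [
    "OBJECTID", "OBJECTID_1", "FEAT_CODE", "ZVALUE", "MINZ", "MAXZ",
    "POLY_CLASS", "NAMEID_1", "NAME_1", "HID", "SHAPE_LENG", "SHAPE_AREA",
    "SHAPE_LEN", "geometry",
]
# Write schema: bbox is inserted right before geometry.
write_schema = table_schema[:-1] + ["bbox", "geometry"]

stale = rebind("geometry", table_schema)  # ordinal bound against the table schema
fresh = rebind("geometry", write_schema)  # ordinal after the bbox column is added

print(stale, write_schema[stale])
print(fresh, write_schema[fresh])
```

An index bound against the table schema points at `bbox` once it is applied to
the write schema, whereas resolving the name again against the schema that the
expression will actually see yields the shifted ordinal.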