zhangfengcdt commented on PR #254:
URL: https://github.com/apache/sedona-db/pull/254#issuecomment-3470434020
> Thank you for looking into this!
>
> I would prefer to find the place in Comet that is dropping the type and
fix it upstream (perhaps in one of its calls to `return_field()`,
`accumlator()`, or `invoke_batch_with_args()`?).
>
> If we absolutely must do it here, I think we should do it behind a feature
flag...for all standard usage we want to error for this case to catch and fix
dropped metadata as soon as we know it occurs.
This is actually an issue in Datafusion (not Comet) issue for aggregation
function, which impact the four ST UADFs we added before: ST_Envelope_Aggr,
ST_Intersection_Aggr, ST_Union_Aggr, and ST_Analyze_Aggr.
In datafusion, datafusion/physical-plan/src/aggregates/mod.rs -
PhysicalGroupBy::group_fields()
```
Field::new(name, expr.data_type(input_schema)?,
group_expr_nullable || expr.nullable(input_schema)?)
.with_metadata(expr.return_field(input_schema)?.metadata().clone())
```
However, in datafusion/physical-expr-common/src/physical_expr.rs, it simply
creates a new fileds without adding the metadata from original field. See:
https://github.com/apache/datafusion/blob/f57da83aac35f0ea4506ccb6a4ddbd26a503c1c1/datafusion/physical-expr-common/src/physical_expr.rs#L86
This could be fixed by changing the Datafusion source code to check if
expression references an input field, if so, get that field's metadata and call
.with_metadata() to preserve it. However, I am not sure this is acceptable for
all senarios because the aggregated fields may NOT be legit to actually inherit
original field's metadata. For example, this is not the case for
ST_Analyze_Aggr, which the output is not Geometry type anymore. So, not sure
Datafusion commit will accept such changes.
Anyway, I can add a feature flag to only have changes in the PR, but I don't
see risks of not having it under a feature flag.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]