zhangfengcdt commented on PR #254: URL: https://github.com/apache/sedona-db/pull/254#issuecomment-3471311563
> If this is a DataFusion issue it likely means that the aggregations in SedonaDB are broken (I am not sure we have integration tests for these and we probably should!). > > I would be surprised if this is not solvable in DataFusion...in any case, the lack of field metadata propagation for our specific case is a bug (not necessarily the generic case, as you noted). Our return field in theory declares metadata properly! I got your points! I have added e2e tests (aggregate.rs) to test these aggregation functions and verified that without the "fix" in this PR, they will fail with the same errors we see earlier. I understand that our aggregate_udf.rs has the correct return_field() method implemented that carries the metadata. However, it seems only be used in Datafusion logic plan planning. But in Datafusion's physical planning, as noted before, it does not honor the return_field(), instead, it creates a brand new field from scratch based only on the array's DataType, which is just Binary without extension metadata. The real fix is to make DataFusion's physical aggregate use the logical plan's field (with metadata) instead of creating a new one. Until we have that fix in Datafusion, it does not seem to have an easy way to fully fix it in sedona db. The "fix" in this PR should be able to unblock us, so I would suggest we: (1) add feature flag to control this fixing behavior (2) add todo to remove such "fix" once the root cause in datafusion is resolved. Please let me know if you'd be good with this or any other thoughts, and I can make the changes accordingly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
