Geethapranay1 commented on issue #4131: URL: https://github.com/apache/datafusion-comet/issues/4131#issuecomment-4386430150
Hi @comphead, I just went through the merged pr #4003 and noticed the fallback for `FIRST/LAST` under `PartialMerge` is already in place doConvert rejects them at operators.scala. So the correctness issue from the original bug report is handled. I think what's left here is removing that fallback and actually supporting `FIRST/LAST` natively in `PartialMerge` mode. The underlying problem is that datafusion's hash table doesn't guarantee the same group visitation order as spark during `merge_batch`, which breaks order-dependent aggregates like `FIRST` and `LAST`. one way i think to fix this would be to add a monotonic ordinal to the accumulator state something like `(value, ordinal)` so that `merge_batch` can always pick the right value regardless of processing order: * `FIRST` would keep the smallest ordinal * `LAST` the largest that would mean touching the state schema and merge logic in `native/spark-expr/src/agg_funcs/`, and then removing the guard in operators.scala once native execution is correct. does this direction make sense before I start working on it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
