berkaysynnada commented on PR #13560: URL: https://github.com/apache/datafusion/pull/13560#issuecomment-2507351353
Hi @haohuaijin, and sorry for the delayed response; I have been very busy over the past few days. I have reviewed your fix and have some comments about the problem and the solution.

The issue originates from `handle_custom_pushdown()`, which overfits to operators that simply concatenate the input children's schemas side by side. In your example, the join is a right-semi join, which exposed this bug. Your fix refers to column names when propagating the order. However, if there are multiple columns in the child schema, this fix may still be insufficient.

To address this, I suggest two potential solutions:

1. Write a specific handler for hash joins. This approach would not rely on a general mapping of input_fields to output_fields but would instead use a join type parameter to propagate the ordering. It should work for any type of join in our project and is my recommended approach. You could copy the current logic from this function and adapt it for the different join types.
2. Generalize the logic further to include a mapping of input children fields to output children fields, and propagate the ordering accordingly. However, implementing this would be non-trivial and would require extensive testing.

If you go with the first solution, I suggest adding a note to the function’s header to clarify its purpose: "The function can be used for operators that simply concatenate the input children's schemas side by side."

By the way, @alamb, isn’t it risky to have an `index_of()` API in the `Schema` implementation? Doesn’t this create the potential for mismatches when columns are matched based solely on their names?
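To make the first option more concrete, here is a rough sketch of how a join-type-aware mapping from output columns to child columns could look. The `JoinType` and `JoinSide` enums and the `output_to_child_column` function below are simplified, hypothetical stand-ins written for illustration; they are not the actual DataFusion types or an existing helper, and the sketch ignores projections and any join types not listed.

```rust
// Hypothetical, simplified stand-ins for illustration only;
// these are not DataFusion's actual types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinSide {
    Left,
    Right,
}

/// Maps an output column index of a join to the child it comes from and
/// the column index within that child's schema. Returns `None` when the
/// index is out of range for the given join type.
fn output_to_child_column(
    join_type: JoinType,
    left_len: usize,
    right_len: usize,
    output_idx: usize,
) -> Option<(JoinSide, usize)> {
    match join_type {
        // Output schema is the left columns followed by the right columns.
        JoinType::Inner | JoinType::Left | JoinType::Right | JoinType::Full => {
            if output_idx < left_len {
                Some((JoinSide::Left, output_idx))
            } else if output_idx < left_len + right_len {
                Some((JoinSide::Right, output_idx - left_len))
            } else {
                None
            }
        }
        // Output schema contains only the left child's columns.
        JoinType::LeftSemi | JoinType::LeftAnti => {
            (output_idx < left_len).then_some((JoinSide::Left, output_idx))
        }
        // Output schema contains only the right child's columns.
        JoinType::RightSemi | JoinType::RightAnti => {
            (output_idx < right_len).then_some((JoinSide::Right, output_idx))
        }
    }
}
```

For a right-semi join with a two-column left child and a three-column right child, `output_to_child_column(JoinType::RightSemi, 2, 3, 0)` yields `Some((JoinSide::Right, 0))`, whereas the side-by-side assumption would attribute output column 0 to the left child. Because the mapping works on indices rather than names, it does not depend on column names being unique.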
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org