berkaysynnada commented on PR #13560:
URL: https://github.com/apache/datafusion/pull/13560#issuecomment-2507351353

   Hi @haohuaijin, and sorry for the delayed response. I have been very busy 
over the past few days. I have reviewed your fix and have some comments about 
the problem and the solution.
   
   The issue originates in `handle_custom_pushdown()`, which assumes that the 
operator's output schema is simply its input children's schemas concatenated 
side by side. Your example uses a right-semi join, whose output contains only 
the right child's columns, so that assumption no longer holds and the bug 
surfaces.
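   To make the concatenation assumption concrete, here is a simplified, hypothetical model. It is not DataFusion's actual code: `Child` and `map_concatenated` are illustrative names. An output column index is routed to a child by comparing it against the left child's width, which is only valid when the output is `left ++ right`:

```rust
/// Which child an output column comes from, plus its index in that child.
#[derive(Debug, PartialEq)]
enum Child {
    Left(usize),
    Right(usize),
}

/// Hypothetical helper: assumes the output schema is `left ++ right`,
/// i.e. the children's schemas concatenated side by side.
fn map_concatenated(output_idx: usize, left_len: usize) -> Child {
    if output_idx < left_len {
        Child::Left(output_idx)
    } else {
        Child::Right(output_idx - left_len)
    }
}

fn main() {
    // Inner join with 2 left columns and 3 right columns: the output has 5
    // columns, and output column 3 really is right column 1.
    assert_eq!(map_concatenated(3, 2), Child::Right(1));

    // Right-semi join: the output has only the right child's 3 columns, so
    // output column 0 is right column 0. The concatenation assumption
    // wrongly routes it to the left child instead.
    assert_eq!(map_concatenated(0, 2), Child::Left(0));
    println!("right-semi output column 0 mapped to {:?}", map_concatenated(0, 2));
}
```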
   
   Your fix matches columns by name when propagating the ordering. However, if 
the child schemas contain multiple columns with the same name, this fix may 
still be insufficient. To address this, I suggest two potential solutions:
   
   1. You can write a specific handler for hash joins. Instead of relying on a 
general mapping of input_fields to output_fields, it would use the join type 
to decide how to propagate the ordering. This solution should work for every 
join type in our project, and it is my recommended approach. You could copy 
the current logic from this function and adapt it for the different join types.
   2. You could generalize the logic further by mapping the input children's 
fields to the output fields and propagating the ordering accordingly. However, 
implementing this would be non-trivial and would require extensive testing.
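   A rough sketch of what a join-type-aware mapping (the first option) might look like, under stated assumptions: `JoinType` below is a simplified stand-in rather than DataFusion's own enum, and `map_output_column` and `push_down_ordering` are hypothetical names. A sort requirement is reduced to a list of output column indices and is pushed down only if every column resolves to the same child. The sketch models the column mapping only; whether an ordering may legally be pushed through a given join type (for example, outer joins that emit unmatched rows) is a separate question it ignores.

```rust
/// Simplified stand-in for a join type enum; not DataFusion's `JoinType`.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Side {
    Left,
    Right,
}

/// Map an output column index to (child, index in child) using the join type
/// instead of assuming side-by-side concatenation. None if out of range.
fn map_output_column(
    join_type: JoinType,
    idx: usize,
    left_len: usize,
    right_len: usize,
) -> Option<(Side, usize)> {
    match join_type {
        // Output schema is left ++ right.
        JoinType::Inner | JoinType::Left | JoinType::Right | JoinType::Full => {
            if idx < left_len {
                Some((Side::Left, idx))
            } else if idx < left_len + right_len {
                Some((Side::Right, idx - left_len))
            } else {
                None
            }
        }
        // Output schema is only the left child's columns.
        JoinType::LeftSemi | JoinType::LeftAnti => (idx < left_len).then(|| (Side::Left, idx)),
        // Output schema is only the right child's columns.
        JoinType::RightSemi | JoinType::RightAnti => (idx < right_len).then(|| (Side::Right, idx)),
    }
}

/// Push a required ordering (output column indices) down to a single child,
/// only if all of its columns come from that same child.
fn push_down_ordering(
    join_type: JoinType,
    required: &[usize],
    left_len: usize,
    right_len: usize,
) -> Option<(Side, Vec<usize>)> {
    let mut side: Option<Side> = None;
    let mut mapped = Vec::with_capacity(required.len());
    for &idx in required {
        let (s, child_idx) = map_output_column(join_type, idx, left_len, right_len)?;
        match side {
            None => side = Some(s),
            Some(prev) if prev != s => return None, // spans both children
            Some(_) => {}
        }
        mapped.push(child_idx);
    }
    side.map(|s| (s, mapped))
}

fn main() {
    // Right-semi join: output columns 0 and 2 are right columns 0 and 2.
    assert_eq!(
        push_down_ordering(JoinType::RightSemi, &[0, 2], 2, 3),
        Some((Side::Right, vec![0, 2]))
    );
    // Inner join: output columns 3 and 4 are right columns 1 and 2.
    assert_eq!(
        push_down_ordering(JoinType::Inner, &[3, 4], 2, 3),
        Some((Side::Right, vec![1, 2]))
    );
    // A requirement spanning both children is not pushed down.
    assert_eq!(push_down_ordering(JoinType::Inner, &[0, 3], 2, 3), None);
}
```

Keeping the join-type match in one place means a new join variant forces the match to be updated, instead of silently falling back to the concatenation assumption.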
   
   If you go with the first solution, I suggest adding a note to the function's 
header to clarify its purpose:
   "The function can be used for operators that simply concatenate the input 
children's schemas side by side."
   
   By the way, @alamb, isn't it risky to have an `index_of()` API on `Schema`? 
Doesn't it create the potential for mismatches when columns are matched solely 
by name?
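   The concern can be illustrated with a hypothetical, simplified schema. `SimpleSchema` and `all_indices_of` are illustrative names, not arrow-rs APIs, and the sketch assumes a name lookup that returns the first matching field. When both join children have a column named `a`, a name-only lookup cannot tell them apart:

```rust
/// Hypothetical, simplified schema holding only field names, in order.
struct SimpleSchema {
    fields: Vec<String>,
}

impl SimpleSchema {
    fn new(names: &[&str]) -> Self {
        SimpleSchema {
            fields: names.iter().map(|n| n.to_string()).collect(),
        }
    }

    /// First-match lookup by name (the assumed behavior being questioned).
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f.as_str() == name)
    }

    /// Every position with a matching name; more than one entry means the
    /// name alone does not identify a unique column.
    fn all_indices_of(&self, name: &str) -> Vec<usize> {
        self.fields
            .iter()
            .enumerate()
            .filter(|(_, f)| f.as_str() == name)
            .map(|(i, _)| i)
            .collect()
    }
}

fn main() {
    // left = [a, b], right = [a, c]; concatenated join output = [a, b, a, c].
    let output = SimpleSchema::new(&["a", "b", "a", "c"]);
    // We want the right child's "a" (index 2), but a first-match lookup
    // silently returns the left child's "a" (index 0).
    assert_eq!(output.index_of("a"), Some(0));
    assert_eq!(output.all_indices_of("a"), vec![0, 2]);
}
```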


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

