alamb commented on issue #12454:
URL: https://github.com/apache/datafusion/issues/12454#issuecomment-2350940281

   You are right about broadcast join, but I think for `OUTER JOIN` cases the 
relation that is not preserved (aka the one that is not being padded with 
nulls) is the one that is broadcast, and the other needs to be partitioned on 
the join key (to ensure each possible non-matching row occurs on only one 
node). In this case I think the distribution isn't quite right.
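
   To make the distribution argument concrete, here is a minimal Python sketch (not DataFusion code). It assumes a `LEFT OUTER JOIN` where the left input is the preserved side, two nodes, and integer join keys; `left_outer_join` and `hash_partition` are hypothetical helpers invented for this illustration.

```python
from collections import defaultdict

def left_outer_join(left, right):
    """Single-node LEFT OUTER JOIN over (key, value) rows.
    Unmatched left rows are emitted once, padded with None."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    out = []
    for key, value in left:
        matches = index.get(key)
        if matches:
            out.extend((key, value, m) for m in matches)
        else:
            out.append((key, value, None))
    return out

def hash_partition(rows, num_nodes):
    """Assign each row to exactly one node based on its join key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[0]) % num_nodes].append(row)
    return parts

left = [(1, "a"), (2, "b"), (3, "c")]    # preserved side (assumption)
right = [(1, "x"), (3, "y"), (3, "z")]   # other side

expected = sorted(left_outer_join(left, right), key=repr)

# Preserved side partitioned on the join key, other side broadcast:
# every left row lives on one node, so it is null-padded at most once.
distributed = []
for part in hash_partition(left, 2):
    distributed.extend(left_outer_join(part, right))

# Reversed distribution: preserved side broadcast, other side partitioned.
# Each node sees only part of `right`, so it pads rows that match elsewhere.
wrong = []
for part in hash_partition(right, 2):
    wrong.extend(left_outer_join(left, part))
```

   In this sketch `distributed` contains the same rows as the single-node result, while `wrong` contains extra null-padded rows, including a duplicate of `(2, "b", None)`.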
   
   Though @thinkharderdev, maybe that is another idea: how about doing the 
`OUTER JOIN` across all tables and then running the results through a second 
operator that removes any duplicate `NULL`-padded rows 🤔 
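
   A rough Python sketch of what such a second operator might look like (not an existing DataFusion operator). It assumes that `(key, left_value)` uniquely identifies a preserved row and that a `None` in the last column only ever comes from padding; `remove_spurious_null_padding` and the sample per-node outputs are hypothetical.

```python
def remove_spurious_null_padding(rows):
    """rows: (key, left_value, right_value) tuples gathered from all nodes.
    Keeps every matched row, and keeps one null-padded row per preserved
    row only when that preserved row matched on no node at all."""
    # Preserved rows that found at least one match on some node.
    matched = {(k, lv) for k, lv, rv in rows if rv is not None}
    seen_unmatched = set()
    out = []
    for k, lv, rv in rows:
        if rv is not None:
            out.append((k, lv, rv))
        elif (k, lv) not in matched and (k, lv) not in seen_unmatched:
            seen_unmatched.add((k, lv))
            out.append((k, lv, None))
    return out

# Hypothetical per-node outputs of a LEFT OUTER JOIN where the preserved
# side was visible on both nodes and the other side was split between them.
per_node_output = [
    [(1, "a", None), (2, "b", None), (3, "c", None)],
    [(1, "a", "x"), (2, "b", None), (3, "c", "y"), (3, "c", "z")],
]
merged = [row for node_rows in per_node_output for row in node_rows]
cleaned = remove_spurious_null_padding(merged)
```

   Under these assumptions, the operator seems to need slightly more than plain duplicate removal: a null-padded row also has to be dropped when the same preserved row matched on another node, as with `(1, "a", None)` here.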
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

