Re: [I] Remove nulls in joins during hash table lookup [datafusion]

2025-04-21 Thread via GitHub

ctsk commented on issue #15784: URL: https://github.com/apache/datafusion/issues/15784#issuecomment-2817904944 That does make sense. I found the [filter_null_join_keys](https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/filter_null_join_keys.rs) rule which takes care of

Re: [I] Remove nulls in joins during hash table lookup [datafusion]

2025-04-21 Thread via GitHub

Dandandan commented on issue #15784: URL: https://github.com/apache/datafusion/issues/15784#issuecomment-2817826955 Additionally, I think there is an optimization that pushes null filters down. Maybe it would be worth testing whether it can be left not pushed down, so that it executes in the join itself (

Re: [I] Remove nulls in joins during hash table lookup [datafusion]

2025-04-20 Thread via GitHub

2010YOUY01 commented on issue #15784: URL: https://github.com/apache/datafusion/issues/15784#issuecomment-2817567103 It's a good idea. However, I think if the build side has a large hash table (HT) and the probe side has many `NULL`s, the existing implementation is likely to be faster. Some statistics m

[I] Remove nulls in joins during hash table lookup [datafusion]

2025-04-20 Thread via GitHub

ctsk opened a new issue, #15784: URL: https://github.com/apache/datafusion/issues/15784 ### Is your feature request related to a problem or challenge? The join implementation currently removes nulls during equality checking - I believe it would be faster / more efficient if we did so