acking-you commented on PR #15694:
URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2799010340

   The error in `cargo test` is caused by an incorrect calculation of the 
pre-selection. The correct steps for calculating the pre-selection are as 
follows:
   
   1. Compute the boolean array on the left-hand side.
   2. Filter the record batch with the left-hand boolean array to obtain a new record batch containing only the rows where the left-hand array is `true`.
   3. Compute the boolean array on the right-hand side using the new record 
batch (note that this boolean array will have incorrect positions since it 
corresponds to the new record batch).
   4. Combine the left-hand and right-hand boolean arrays to produce the 
correct boolean array: walk the left-hand array and replace each position 
marked `true`, in order, with the next value from the right-hand array; 
positions marked `false` stay `false`.
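
   A minimal sketch of the four steps in plain Rust, without arrow-rs. It assumes a "record batch" modeled as a single `i64` column and boolean arrays modeled as `Vec<bool>` with no nulls; `pre_selection` and the example predicates are hypothetical names for illustration, not DataFusion code.

```rust
/// Hypothetical sketch: the batch is one i64 column and boolean arrays are
/// Vec<bool> without nulls. This is not the arrow-rs or DataFusion API.
fn pre_selection(
    batch: &[i64],
    left_pred: impl Fn(i64) -> bool,
    right_pred: impl Fn(i64) -> bool,
) -> Vec<bool> {
    // Step 1: evaluate the left-hand predicate on the full batch.
    let left: Vec<bool> = batch.iter().map(|&v| left_pred(v)).collect();

    // Step 2: keep only the rows where the left-hand array is `true`.
    let filtered: Vec<i64> = batch
        .iter()
        .zip(&left)
        .filter_map(|(&v, &keep)| if keep { Some(v) } else { None })
        .collect();

    // Step 3: evaluate the right-hand predicate on the filtered batch.
    // Its length equals the number of `true` values in `left`, so its
    // positions do not line up with the original batch.
    let right: Vec<bool> = filtered.iter().map(|&v| right_pred(v)).collect();

    // Step 4: put each right-hand value back at the next `true` position
    // of `left`; `false` positions stay `false`.
    let mut right_values = right.into_iter();
    left.iter()
        .map(|&l| {
            if l {
                right_values
                    .next()
                    .expect("one right-hand value per true left-hand value")
            } else {
                false
            }
        })
        .collect()
}

fn main() {
    let batch = [1, 4, 3, 6, 5, 8];
    // Example predicates (hypothetical): left is `v > 2`, right is `v % 2 == 0`.
    let selection = pre_selection(&batch, |v| v > 2, |v| v % 2 == 0);
    println!("{:?}", selection);
}
```

   With this batch, step 3 produces only five values, and step 4 maps them back onto the five `true` positions of the left-hand array, so the result matches evaluating both predicates row by row.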
   
   To illustrate this, I’ve drawn a diagram (it’s quite rough since I used a 
mouse, so I hope you don’t mind 😂):
   
   
![image](https://github.com/user-attachments/assets/83197db1-fed1-47da-90e4-07acb4d7de69)
   
   
   The current code only handles up to the second step, which is why the error 
occurs. Using `evaluate_selection` does not fix it either: its internal call to 
`scatter` (which corresponds to completing the fourth step) fills the positions 
of the right-hand boolean array that were filtered out in the second step with 
null values, which produces an incorrect result.
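
   To make the difference concrete, here is a simplified model in plain Rust. `scatter_with_nulls` is an assumption that only mimics the behavior described above (unselected positions become null); it is not DataFusion's actual `scatter` signature. `merge_into_left` is a hypothetical helper implementing the fourth step as described in the list.

```rust
/// Simplified model of a scatter step (assumption, not DataFusion's API):
/// for each `true` in `mask`, take the next value from `truthy`;
/// every other position becomes null (`None`).
fn scatter_with_nulls(mask: &[bool], truthy: &[bool]) -> Vec<Option<bool>> {
    let mut values = truthy.iter().copied();
    mask.iter()
        .map(|&m| {
            if m {
                Some(values.next().expect("one truthy value per true mask value"))
            } else {
                None // filtered-out position is filled with null
            }
        })
        .collect()
}

/// Hypothetical fourth step: replace the `true` positions of `left` with the
/// right-hand values and keep `false` everywhere else. `&&` short-circuits,
/// so a right-hand value is consumed only for `true` left-hand positions.
fn merge_into_left(left: &[bool], right: &[bool]) -> Vec<bool> {
    let mut values = right.iter().copied();
    left.iter()
        .map(|&l| l && values.next().expect("one right value per true left value"))
        .collect()
}

fn main() {
    let left = [false, true, true, true, true, true];
    // Right-hand values evaluated on the filtered batch (five rows).
    let right = [true, false, true, false, true];
    println!("scatter: {:?}", scatter_with_nulls(&left, &right));
    println!("merge:   {:?}", merge_into_left(&left, &right));
}
```

   Position 0 comes back as null from the scatter model but as `false` from the merge, so any consumer that distinguishes null from `false` sees a different value there.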
   
   I’m currently working on implementing the fourth step to fix the issue, but 
it may take some time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
