acking-you commented on PR #15694: URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2799010340
The error in `cargo test` is caused by an incorrect calculation of the pre-selection. The correct steps for calculating the pre-selection are as follows:

1. Compute the boolean array on the left-hand side.
2. Filter the record batch with the left-hand boolean array to obtain a new record batch.
3. Compute the boolean array on the right-hand side using the new record batch (note that its positions do not line up with the original batch, since it corresponds to the new, filtered record batch).
4. Combine the left-hand and right-hand boolean arrays to produce the correct boolean array (modify the positions in the left-hand array marked as `true` based on the values from the right-hand array).

To illustrate this, I’ve drawn a diagram (it’s quite rough since I used a mouse, so I hope you don’t mind 😂).

The current code only handles up to the second step, which is why the error occurs. If you use `evaluate_selection`, the issue persists because its internal call to `scatter` (which corresponds to completing the fourth step) fills in the missing parts of the right-hand boolean array with null values, resulting in an incorrect outcome.

I’m currently working on implementing the fourth step to fix the issue, but it may take some time.
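The four steps above can be sketched in plain Rust. This is only an illustration, not the code in the PR: it uses `Vec<bool>` and `i32` rows instead of Arrow arrays and record batches, it ignores nulls, and the names `filter_rows`, `combine_masks`, and `pre_selection_and` are hypothetical.

```rust
/// Step 2: keep only the rows whose mask entry is `true`.
fn filter_rows<T: Clone>(rows: &[T], mask: &[bool]) -> Vec<T> {
    rows.iter()
        .zip(mask)
        .filter_map(|(row, &keep)| keep.then(|| row.clone()))
        .collect()
}

/// Step 4: map the right-hand mask (one entry per `true` in `left`)
/// back onto the original row positions. Rows rejected by `left`
/// stay `false` instead of becoming null.
fn combine_masks(left: &[bool], right_on_filtered: &[bool]) -> Vec<bool> {
    let mut right = right_on_filtered.iter();
    left.iter()
        .map(|&l| {
            if l {
                // Each `true` in `left` consumes the next right-hand value.
                *right.next().expect("right mask is shorter than the number of selected rows")
            } else {
                false
            }
        })
        .collect()
}

/// Steps 1 to 4 for `lhs AND rhs`, evaluating `rhs` only on the
/// rows that passed `lhs`.
fn pre_selection_and(
    rows: &[i32],
    lhs: impl Fn(&i32) -> bool,
    rhs: impl Fn(&i32) -> bool,
) -> Vec<bool> {
    // Step 1: left-hand boolean array over all rows.
    let left: Vec<bool> = rows.iter().map(|r| lhs(r)).collect();
    // Step 2: new, smaller batch containing only the selected rows.
    let filtered = filter_rows(rows, &left);
    // Step 3: right-hand boolean array, indexed by the filtered batch.
    let right: Vec<bool> = filtered.iter().map(|r| rhs(r)).collect();
    // Step 4: realign the right-hand results with the original positions.
    combine_masks(&left, &right)
}
```

For rows `[1, 8, 3, 10, 6]` with `x > 2` on the left and `x % 2 == 0` on the right, the left mask is `[false, true, true, true, true]`, the right mask over the four filtered rows is `[true, false, true, true]`, and the combined result is `[false, true, false, true, true]`, which matches evaluating both predicates on every row.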