Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-12 Thread via GitHub
acking-you commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2798871646 > > [@acking-you](https://github.com/acking-you) the code needs to be extended to support nulls (you can take a look at the true_count implementation in arrow-rs to do this e

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-12 Thread via GitHub
acking-you commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2798742956 > @acking-you the code needs to be extended to support nulls (you can take a look at the true_count implementation in arrow-rs to do this efficiently). I have an idea f
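
To illustrate the null-handling point in the quoted comment, here is a minimal sketch of counting "true and non-null" positions on plain bitmaps. It is not the arrow-rs implementation: the function name `true_count_with_nulls` and the use of raw `u64` word slices are assumptions for illustration, with a set validity bit meaning "not null" as in the Arrow format.

```rust
/// Hypothetical helper (not an arrow-rs API): counts positions whose value bit
/// is set AND whose validity bit is set (validity bit 1 = non-null).
pub fn true_count_with_nulls(values: &[u64], validity: &[u64]) -> usize {
    // Both bitmaps are assumed to cover the same number of 64-bit words.
    assert_eq!(values.len(), validity.len());
    values
        .iter()
        .zip(validity.iter())
        // Masking with the validity word drops bits that belong to null slots.
        .map(|(v, n)| (*v & *n).count_ones() as usize)
        .sum()
}
```

With a count like this, an all-false check in the presence of nulls would compare the result against zero rather than relying on the value bitmap alone.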

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-11 Thread via GitHub
kosiew commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2798494451 Hi @Dandandan, I am getting failed tests with ```rust #[test] fn test_all_one() -> Result<()> { // Helper function to run tests and repo

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-11 Thread via GitHub
Dandandan commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2796844672 By the way, as a simple proof of concept, I tested this yesterday and it reduced the execution time of the short-circuiting all-false / all-true cases by about 25% compared to `true_count` / `false_count`:

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-10 Thread via GitHub
alamb commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2792363126 `ShortCircuitStrategy` is a pretty neat idea. In my opinion, as long as the code is easy to understand, makes realistic benchmarks faster, and doesn't regress existing perfo

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-10 Thread via GitHub
Dandandan commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2792442938 > I don't know if you think it's a good idea? @alamb @Dandandan I think it is a pretty good idea given that evaluation is so important.

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-09 Thread via GitHub
acking-you commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2788923437 I have an idea that might improve the effectiveness of short-circuit optimization, and it seems necessary to use `false_count` for evaluation counting. The current iss

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-08 Thread via GitHub
Dandandan commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2786711757 Would be good to compare it with a boolean version of this as well, like this, to see if it vectorizes better: ``` pub fn all_zero(&self) -> bool { // plat

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-08 Thread via GitHub
alamb commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2786424000 > sum += chunk[i].count_ones() as usize; Maybe simply manually unrolling the loop to check 1024 bits at a time would let LLVM generate the best code. Something
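
The idea of checking 1024 bits at a time can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code proposed in the thread: the free function `all_zero` over a `&[u64]` slice and the `BLOCK` constant are hypothetical, and whether LLVM actually vectorizes the inner fold would need to be confirmed by benchmarking.

```rust
/// Hypothetical sketch: returns true if no bit is set in `words`,
/// scanning 16 words (16 * 64 = 1024 bits) per step.
pub fn all_zero(words: &[u64]) -> bool {
    const BLOCK: usize = 16;
    let mut blocks = words.chunks_exact(BLOCK);
    for block in &mut blocks {
        // OR-fold the whole block without branching, so the compiler is free
        // to vectorize it; the early-exit branch runs only once per block.
        let acc = block.iter().fold(0u64, |acc, w| acc | *w);
        if acc != 0 {
            return false;
        }
    }
    // Handle the trailing words that do not fill a whole block.
    blocks.remainder().iter().all(|w| *w == 0)
}
```

Branching once per block rather than once per word trades a slightly later exit for a tighter inner loop; the block size is a tunable guess.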

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-08 Thread via GitHub
Dandandan commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2786341929 Interesting! I think we probably can take some inspiration from arrow-rs aggregate code, e.g. doing something like (?): ```rust /// Counts the number of on

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-08 Thread via GitHub
acking-you commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2786166445 This might require manual SIMD for optimization, but that would increase the porting difficulty ([as DuckDB says](https://duckdb.org/faq.html#does-duckdb-use-simd)). However,

[I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-08 Thread via GitHub
alamb opened a new issue, #15631: URL: https://github.com/apache/datafusion/issues/15631 ### Is your feature request related to a problem or challenge? @acking-you's wonderful PR https://github.com/apache/datafusion/pull/15462 adds short-circuiting to boolean operation evaluation whi