Dandandan commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2783121877
> > The related results are as follows. I’m not sure what you think about these results: should we continue using true_count/false_count, or look for a better solution? @alamb @Dandandan
>
> Thank you for your diligence @acking-you
>
> I think we should merge this PR in (with the count bits and the benchmarks) and file a follow-on ticket to potentially improve the performance in some other way.
>
> I don't fully understand your performance results given that the two functions seem very similar -- maybe it has to do with option handling messing up auto-vectorization or something
>
> https://docs.rs/arrow-array/54.3.1/src/arrow_array/array/boolean_array.rs.html#160
> https://docs.rs/arrow-arith/54.3.1/src/arrow_arith/aggregate.rs.html#751

I think the likely cause of it getting slower is the short-circuiting whenever a `true` is observed.

For this case it might be interesting to compare it with `array.values().bit_chunks().iter_padded().fold(true, |acc, i: u64| i != 0 && acc)` and see if it is any faster than the `true_count` implementation.
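
To make the comparison concrete, here is a minimal standalone sketch of the three strategies being discussed. It is not the arrow or DataFrame code: it assumes the boolean values are already available as a plain `&[u64]` of bit-packed chunks (roughly what `bit_chunks().iter_padded()` would yield), and the function names are hypothetical. It also assumes the question being asked is "is every value false?", so the fold predicate is `c == 0` rather than the `i != 0` written above.

```rust
/// Count-based check: sums the set bits of every chunk and compares with zero.
/// Always scans the whole buffer, similar in spirit to `true_count() == 0`.
fn all_false_by_count(chunks: &[u64]) -> bool {
    chunks.iter().map(|c| c.count_ones() as usize).sum::<usize>() == 0
}

/// Short-circuiting check: `Iterator::all` stops at the first non-zero chunk.
/// The early exit adds a data-dependent branch to every iteration.
fn all_false_short_circuit(chunks: &[u64]) -> bool {
    chunks.iter().all(|&c| c == 0)
}

/// Fold-based check: visits every chunk with no early exit.
/// Uses the non-short-circuiting `&` so the loop body stays branch-free.
fn all_false_fold(chunks: &[u64]) -> bool {
    chunks.iter().fold(true, |acc, &c| acc & (c == 0))
}
```

All three return the same answer for any input (for example `true` for `&[0, 0, 0]` and `false` for `&[0, 1 << 17, 0]`); only the loop shape differs. Whether the fold actually beats the count-based version would have to be measured in the PR's benchmarks.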