Dandandan commented on PR #15462:
URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2783121877

   > > The related results are as follows. I’m not sure what you think about
   > > these results: should we continue using true_count/false_count, or look
   > > for a better solution? @alamb @Dandandan
   > 
   > Thank you for your diligence @acking-you
   > 
   > I think we should merge this PR in (with the count bits and the
   > benchmarks) and file a follow-on ticket to potentially improve the
   > performance in some other way.
   > 
   > I don't fully understand your performance results given that the two
   > functions seem very similar -- maybe it has to do with Option handling
   > interfering with auto-vectorization or something
   > 
   > https://docs.rs/arrow-array/54.3.1/src/arrow_array/array/boolean_array.rs.html#160
   > https://docs.rs/arrow-arith/54.3.1/src/arrow_arith/aggregate.rs.html#751
   
   I think the likely reason it gets slower is the short-circuiting whenever
   a `true` is observed.
   
   For this case it might be interesting to compare it with
   `array.values().bit_chunks().iter_padded().fold(true, |acc, i: u64| i != 0 && acc)`
   and see if it is any faster than the `true_count` implementation.
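
   To make the comparison concrete, here is a minimal std-only sketch of the
   three loop shapes under discussion. It is not the arrow implementation: a
   plain `&[u64]` stands in for the padded bit chunks, and all function names
   are hypothetical. The last function mirrors the fold expression exactly as
   written above, which yields `true` only when every chunk is non-zero; the
   OR-accumulating variant is an assumed alternative for an "any bit set"
   check.

```rust
/// Popcount over every chunk, similar in spirit to a `true_count`-style check.
/// Always visits all chunks; no early exit.
fn count_set_bits(chunks: &[u64]) -> usize {
    chunks.iter().map(|c| c.count_ones() as usize).sum()
}

/// Short-circuiting check: stops at the first non-zero chunk.
/// The data-dependent early exit can make the loop harder to vectorize.
fn any_set_short_circuit(chunks: &[u64]) -> bool {
    chunks.iter().any(|&c| c != 0)
}

/// Fold that visits every chunk and ORs the per-chunk results together
/// (assumed variant, not taken from the comment).
fn any_set_fold(chunks: &[u64]) -> bool {
    chunks.iter().fold(false, |acc, &c| acc | (c != 0))
}

/// The fold from the comment, transcribed literally onto a `&[u64]`:
/// true only if every chunk is non-zero (and true for empty input).
fn all_chunks_nonzero_fold(chunks: &[u64]) -> bool {
    chunks.iter().fold(true, |acc, &c| c != 0 && acc)
}
```

   Whether either fold actually beats the popcount or the short-circuiting
   loop depends on the data and on what the compiler generates, so it would
   need to be measured with the PR's benchmarks.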


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

