On 26/03/2013 09:14, Peter Lieven wrote:
> If no one objects, I would use is_zero_page_2 and continue with v5 of
> the patch set, as I am out of office for the next 8 days starting
> tomorrow. I prefer v3 as it has better performance if the non-zeroness
> is within the 8*sizeof(VECTYPE) bytes and not in the first 256 bits.

Either v2 or v3 is fine. v3 has slightly simpler code and v2 optimizes
for a rare case, but v2 is indeed a bit faster and your benchmarking
effort should be rewarded. :)

> Paolo, with the version that has lower setup costs in mind, shall I
> use the vectorized or the unrolled version of patch 4 (find_next_bit
> optimization)?

I think for that we should, at least for now, use the version we
discussed a few weeks ago (with no SIMD and just unrolling).

Paolo
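For illustration only, here is a minimal sketch of what a "no SIMD, just unrolling" find_next_bit could look like. It is not the patch under discussion. The function name find_next_bit_unrolled, the four-word unroll factor, and the use of the GCC/Clang builtin __builtin_ctzl are all assumptions. The idea is to skip runs of all-zero words four at a time with a single OR-and-test, then locate the set bit within the first nonzero word.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical sketch (not the actual QEMU patch): return the index of the
 * first set bit at or after 'offset' in a bitmap of 'size' bits, or 'size'
 * if there is none. Assumes GCC/Clang for __builtin_ctzl. */
static size_t find_next_bit_unrolled(const unsigned long *addr, size_t size,
                                     size_t offset)
{
    if (offset >= size) {
        return size;
    }
    size_t word = offset / BITS_PER_LONG;
    size_t nwords = (size + BITS_PER_LONG - 1) / BITS_PER_LONG;

    /* First (possibly partial) word: mask off bits below 'offset'. */
    unsigned long tmp = addr[word] & (~0UL << (offset % BITS_PER_LONG));
    while (!tmp) {
        word++;
        /* Unrolled scan: skip four all-zero words per iteration, no SIMD. */
        while (word + 4 <= nwords &&
               !(addr[word] | addr[word + 1] | addr[word + 2] | addr[word + 3])) {
            word += 4;
        }
        if (word >= nwords) {
            return size;
        }
        /* Fewer than four words left, or a nonzero word is among the next
         * four: examine them one at a time. */
        tmp = addr[word];
    }
    /* Lowest set bit of the nonzero word; clamp bits past 'size'. */
    size_t bit = word * BITS_PER_LONG + (size_t)__builtin_ctzl(tmp);
    return bit < size ? bit : size;
}
```

The appeal of this shape over a vectorized loop is the low setup cost: there is no alignment prologue or vector-register setup, so short scans (the common case when dirty bits are dense) stay cheap.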