> Another thought: for non-x86 platforms, the SIMD nodes degenerate to
> "simple loop", and looping over up to 32 elements is not great
> (although possibly okay). We could do binary search, but that has bad
> branch prediction.

I am not sure that for relevant non-x86 platforms SIMD / vector
instructions would not be used (though it would be a good idea to
verify)
Do you know any modern platforms that do not have SIMD ?

I would definitely test before assuming binary search is better.

Often other approaches like counting search over such small vectors is
much better when the vector fits in cache (or even a cache line) and
you always visit all items as this will completely avoid branch
predictions and allows compiler to vectorize and / or unroll the loop
as needed.

Cheers
Hannu


Reply via email to