https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118072

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So we are just left with the instability of the choice based on the cache, and
sometimes the cache differs depending on whether divide or mod came first.

I suspect that if you time the mod with and without the udiv instruction, both
might end up being similar.  NOTE: you need to time large values too, not just
small ones, since the udiv instruction has an early out on almost all aarch64
cores.
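
A rough sketch of the kind of timing comparison suggested above, not taken
from the bug report.  DIVISOR, mod_runtime, mod_const and time_mod are all
hypothetical names, and it is only an assumption that the compiler emits
udiv+msub for the runtime divisor and a multiply sequence for the constant
one; check the generated assembly before trusting any numbers.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define DIVISOR 7u  /* hypothetical divisor, not from the bug report */

/* Read through a volatile so the compiler cannot fold the divisor into a
   constant.  On aarch64 this would normally become udiv + msub (assumption).
   Note the volatile load itself adds a small cost per call. */
static volatile uint32_t runtime_divisor = DIVISOR;

static uint32_t mod_runtime(uint32_t x)
{
    return x % runtime_divisor;
}

/* Constant divisor: the compiler is free to pick a multiply-based sequence
   instead of udiv (assumption, depends on target and options). */
static uint32_t mod_const(uint32_t x)
{
    return x % DIVISOR;
}

/* Apply fn to iters consecutive values starting at start.  Returns elapsed
   CPU seconds and stores a checksum so the loop is not optimized away. */
static double time_mod(uint32_t (*fn)(uint32_t), uint32_t start,
                       size_t iters, uint32_t *checksum)
{
    uint32_t sum = 0;
    clock_t t0 = clock();
    for (size_t i = 0; i < iters; i++)
        sum += fn(start + (uint32_t)i);
    clock_t t1 = clock();
    *checksum = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Call time_mod for both functions twice, once with a small start such as 0 and
once with a large one such as 0xF0000000u, so that the early out in udiv does
not hide the cost for large dividends.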
