https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115749

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to kim.walisch from comment #4)
> One possible explanation for why GCC's current integer division by a
> constant assembly sequence was chosen back in the day (I guess one or two
> decades ago) is that GCC's current assembly sequence uses only 1 mul
> instruction whereas Clang uses 2 mul instructions.
> 
> Historically, multiplication instructions used to be slower than add, sub
> and shift instructions on nearly all CPU architectures and so it made sense
> to avoid mul instructions whenever possible. However in the past decade this
> performance gap has narrowed and now it is more important to avoid long
> instruction dependency chains which GCC's current integer modulo by a
> constant assembly sequence suffers from.

Note that GCC handles mult->shift/add (and even the divide/mod
expansion) in a target-independent part which queries the target for the
costs of specific instructions (mult, shift, add, etc.). So if the target does
not model those costs correctly, you get the less efficient sequence. This is
why I said it was a cost model issue and why PR 115756 is asking to change
the default (generic) output.
So yes, the costs might be based on older cores and might not have been
retuned since.
Anyway, the middle-end is doing the correct thing based on what the target is
giving it.
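
To illustrate the point about costs, here is a minimal sketch in C of a
cost-driven choice between two candidate sequences. It is not GCC's actual
code: the struct names, the function names, the instruction counts and the
cost numbers are all hypothetical, picked only to show how the same selection
logic gives a different answer when the target reports different costs.

```c
#include <assert.h>

/* Hypothetical per-target cost table (NOT GCC's real data structure). */
struct target_costs {
    int mult;   /* cost the target reports for a multiply */
    int shift;  /* cost the target reports for a shift */
    int add;    /* cost the target reports for an add/sub */
};

/* A candidate expansion, described only by how many of each
   instruction kind it uses.  The counts are made up for illustration. */
struct sequence {
    const char *name;
    int n_mult;
    int n_shift;
    int n_add;
};

/* Total cost of one candidate under a given target's cost table. */
static int sequence_cost(const struct target_costs *t,
                         const struct sequence *s)
{
    return s->n_mult * t->mult + s->n_shift * t->shift + s->n_add * t->add;
}

/* Target-independent selection: return the cheapest candidate.
   Ties keep the earlier candidate. */
static const struct sequence *pick_cheapest(const struct target_costs *t,
                                            const struct sequence *cands,
                                            int n)
{
    assert(n > 0);
    const struct sequence *best = &cands[0];
    for (int i = 1; i < n; i++)
        if (sequence_cost(t, &cands[i]) < sequence_cost(t, best))
            best = &cands[i];
    return best;
}
```

With a table where mult costs 10 and shift/add cost 1 each, a candidate using
1 mult, 3 shifts and 3 adds (cost 16) beats one using 2 mults, 1 shift and
1 add (cost 22). With mult at 3, the second one wins (8 vs. 9). The selection
code is the same in both cases; only the numbers from the target changed. A
plain sum like this also says nothing about dependency chain length, which is
the other concern raised in comment #4.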
