[Bug target/115749] Non optimal assembly for integer modulo by a constant on x86-64 CPUs

liuhongt at gcc dot gnu.org via Gcc-bugs Wed, 03 Jul 2024 18:17:45 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115749


Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |haochen.jiang at intel dot com,
                   |                            |liuhongt at gcc dot gnu.org

--- Comment #10 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> One of the comments in PR 115756 was "I'd lean towards shift+add because for
> example Intel E-cores have a slow imul.". However, my benchmarks suggest
> that even on Intel Efficiency CPU cores the algorithm with 2 multiplication
> instructions is faster. (I used the Process Lasso tool on Windows 11 to
> force the benchmark to be run on an Efficiency CPU core).

@haochen, could you try the benchmark on our Sierra Forest machine?
I'm OK with adjusting the rtx_cost of imulq from COST_N_INSNS (4) to
COST_N_INSNS (3) if the performance test looks OK.
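
For readers following the thread, here is a minimal sketch of the two code shapes being compared. It is not taken from the bug report: the divisor 10, the 32-bit operand width, and the function names are all assumptions chosen only because the reciprocal constant for 10 is well known. Both variants get the quotient from a multiply by a reciprocal constant. They differ only in how q * 10 is formed for the remainder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: n / 10 via a multiply by the rounded-up
   reciprocal 0xCCCCCCCD = ceil(2^35 / 10), then a right shift by 35.
   This is exact for every 32-bit unsigned n. */
static uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Variant A: second multiplication to form q * 10. */
static uint32_t mod10_two_mul(uint32_t n)
{
    uint32_t q = div10(n);
    return n - q * 10u;
}

/* Variant B: shift+add to form q * 10 = 8q + 2q. */
static uint32_t mod10_shift_add(uint32_t n)
{
    uint32_t q = div10(n);
    return n - ((q << 3) + (q << 1));
}

/* Illustrative self-check against the built-in % operator. */
static void check_mod10(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 9u, 10u, 11u, 99u, 12345u, 2147483647u, 4294967295u
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(mod10_two_mul(samples[i]) == samples[i] % 10u);
        assert(mod10_shift_add(samples[i]) == samples[i] % 10u);
    }
}
```

Note that an optimizing compiler is free to rewrite either source form into the other, so the C text does not pin down the emitted instructions. Which sequence comes out depends on the target cost model, which is why the rtx_cost value for imulq matters here. Comparing the two in practice means inspecting the generated assembly.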
