https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115749
Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |haochen.jiang at intel dot com,
                   |                            |liuhongt at gcc dot gnu.org

--- Comment #10 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> One of the comments in PR 115756 was "I'd lean towards shift+add because for
> example Intel E-cores have a slow imul.". However, my benchmarks suggest
> that even on Intel Efficiency CPU cores the algorithm with 2 multiplication
> instructions is faster. (I used the Process Lasso tool on Windows 11 to
> force the benchmark to be run on an Efficiency CPU core).

@haocheng, could you try the benchmark on our Sierra Forest machine?
I'm OK with adjusting the rtx_cost of imulq from COST_N_INSNS (4) to
COST_N_INSNS (3) if the performance test looks OK.
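
For readers unfamiliar with the trade-off being discussed, here is a minimal sketch in C of why the imulq cost matters. COST_N_INSNS is defined as in GCC's rtl.h; everything else (the helper prefer_shift_add, the mul10_* functions, and the strict less-than comparison) is a hypothetical toy model for illustration, not GCC's actual multiplication-synthesis logic and not the code from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as in GCC's rtl.h: costs are scaled so that one
   "typical" instruction costs 4 units. */
#define COST_N_INSNS(N) ((N) * 4)

/* Hypothetical toy decision rule (an assumption, not GCC's real code):
   prefer a shift+add sequence of n_simple_insns unit-cost instructions
   only when it is strictly cheaper than a single multiply. */
static int prefer_shift_add(int n_simple_insns, int mul_cost)
{
    return COST_N_INSNS(n_simple_insns) < mul_cost;
}

/* Multiply by 10 with one multiply instruction. */
static uint64_t mul10_imul(uint64_t x)
{
    return x * 10u;
}

/* Multiply by 10 as 8*x + 2*x: two shifts and one add. */
static uint64_t mul10_shift_add(uint64_t x)
{
    return (x << 3) + (x << 1);
}

/* Sanity check that both forms agree, including on wraparound. */
static void self_check(void)
{
    const uint64_t samples[] = { 0u, 1u, 7u, 12345u, UINT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(mul10_imul(samples[i]) == mul10_shift_add(samples[i]));
}
```

Under this toy rule, a three-instruction shift+add sequence beats a multiply costed at COST_N_INSNS (4) but not one costed at COST_N_INSNS (3), which is the kind of shift in code generation the proposed adjustment would be expected to cause. Whether that is a net win on E-cores is exactly what the requested Sierra Forest benchmark is meant to check.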