https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115749
--- Comment #11 from kong lingling <lingling.kong7 at gmail dot com> ---
After adjusting the rtx_cost of imulq from COST_N_INSNS (4) to COST_N_INSNS (3), I tested the benchmark on a Sierra Forest machine with GCC trunk, and the algorithm with two multiplications ran 2% faster. On SPEC CPU2017, the performance improvement is around 0.2% (1 copy; -march=native -Ofast -funroll-loops -flto / -mtune=generic -O2 -march=x86-64-v3).