> Am 02.04.2025 um 00:02 schrieb Andi Kleen <a...@linux.intel.com>:
>
> From: Andi Kleen <a...@gcc.gnu.org>
>
> This isn't a regression, but it's a very simple patch with high
> performance improvement, so perhaps suitable in the current stage.
Ok. Can you add a comment as to why we store unconditionally?
Richard
> ---
>
> bitmap_set_bit checks the original value of the bit to return it to the
> caller and then only writes the new value back if it changes.
> Most callers of bitmap_set_bit don't need the return value, but with the
> conditional store
> the CPU still has to predict it correctly since gcc doesn't know how to do
> that without APX on x86 (even though CMOV could do it with a dummy target).
> Really if-conversion should handle this case, but for now we can fix
> it.
>
> This simple patch improves runtime by 15% for the test case in the PR.
> Which is more than I expected given it only has ~1.44% of the cycles, but I
> guess
> the mispredicts caused some down stream effects.
>
> ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc -quiet ran
> 1.15 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2
> pr119482.cc -quiet
>
> At least with this test case the total number of branches decreases
> drastically. Even though the mispredict rate goes up slightly it is
> still a big win.
>
> $ perf stat -e
> branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/
> \
> -a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc -quiet -w
>
> Performance counter stats for 'system wide':
>
> 41,932,957,091 branches
> 686,117,623 branch-misses # 1.64% of all
> branches
> 43,690.47 MiB uncore_imc/cas_count_read/
> 12,362.56 MiB uncore_imc/cas_count_write/
>
> 49.328633365 seconds time elapsed
>
> $ perf stat -e
> branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/
> \
> -a ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc -quiet -w
>
> Performance counter stats for 'system wide':
>
> 37,092,113,179 branches
> 663,641,708 branch-misses # 1.79% of all
> branches
> 43,196.52 MiB uncore_imc/cas_count_read/
> 12,369.33 MiB uncore_imc/cas_count_write/
>
> 42.632458350 seconds time elapsed
>
> gcc/ChangeLog:
>
> PR middle-end/119482
> * bitmap.cc (bitmap_set_bit): Write back value unconditionally
> ---
> gcc/bitmap.cc | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
> index f5a64b495ab3..7744f8f8c2e4 100644
> --- a/gcc/bitmap.cc
> +++ b/gcc/bitmap.cc
> @@ -969,8 +969,7 @@ bitmap_set_bit (bitmap head, int bit)
> if (ptr != 0)
> {
> bool res = (ptr->bits[word_num] & bit_val) == 0;
> - if (res)
> - ptr->bits[word_num] |= bit_val;
> + ptr->bits[word_num] |= bit_val;
> return res;
> }
>
> --
> 2.49.0
>