> Am 02.04.2025 um 00:02 schrieb Andi Kleen <a...@linux.intel.com>:
> 
> From: Andi Kleen <a...@gcc.gnu.org>
> 
> This isn't a regression, but it's a very simple patch with high
> performance improvement, so perhaps suitable in the current stage.

Ok.  Can you add a comment as to why we store unconditionally?

Richard 

> ---
> 
> bitmap_set_bit checks the original value of the bit to return it to the
> caller and then only writes the new value back if it changes.
> Most callers of bitmap_set_bit don't need the return value, but with the 
> conditional store
> the CPU still has to predict it correctly since gcc doesn't know how to do
> that without APX on x86 (even though CMOV could do it with a dummy target).
> Really if-conversion should handle this case, but for now we can fix
> it.
> 
> This simple patch improves runtime by 15% for the test case in the PR.
> Which is more than I expected given it only has ~1.44% of the cycles, but I 
> guess
> the mispredicts caused some down stream effects.
> 
>  ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet ran
>    1.15 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 
> pr119482.cc  -quiet
> 
> At least with this test case the total number of branches decreases
> drastically. Even though the mispredict rate goes up slightly it is
> still a big win.
> 
> $ perf stat -e 
> branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ 
> \
> -a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc  -quiet -w
> 
> Performance counter stats for 'system wide':
> 
>    41,932,957,091      branches
>       686,117,623      branch-misses                    #    1.64% of all 
> branches
>         43,690.47 MiB  uncore_imc/cas_count_read/
>         12,362.56 MiB  uncore_imc/cas_count_write/
> 
>      49.328633365 seconds time elapsed
> 
> $ perf stat -e 
> branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ 
> \
> -a ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet -w
> 
> Performance counter stats for 'system wide':
> 
>    37,092,113,179      branches
>       663,641,708      branch-misses                    #    1.79% of all 
> branches
>         43,196.52 MiB  uncore_imc/cas_count_read/
>         12,369.33 MiB  uncore_imc/cas_count_write/
> 
>      42.632458350 seconds time elapsed
> 
> gcc/ChangeLog:
> 
>    PR middle-end/119482
>    * bitmap.cc (bitmap_set_bit): Write back value unconditionally
> ---
> gcc/bitmap.cc | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
> index f5a64b495ab3..7744f8f8c2e4 100644
> --- a/gcc/bitmap.cc
> +++ b/gcc/bitmap.cc
> @@ -969,8 +969,7 @@ bitmap_set_bit (bitmap head, int bit)
>   if (ptr != 0)
>     {
>       bool res = (ptr->bits[word_num] & bit_val) == 0;
> -      if (res)
> -    ptr->bits[word_num] |= bit_val;
> +      ptr->bits[word_num] |= bit_val;
>       return res;
>     }
> 
> --
> 2.49.0
> 

Reply via email to