Hi Segher,

on 2023/5/25 19:22, Segher Boessenkool wrote:
> Hi!
>
> On Thu, May 25, 2023 at 07:05:55AM -0300, Alexandre Oliva wrote:
>> On May 25, 2023, "Kewen.Lin" <li...@linux.ibm.com> wrote:
>>> So both lp64 and ilp32 have the same count, could we merge it and
>>> remove the selectors?
>>
>> We could, but... I thought I wouldn't, since they were different
>> before, and they're likely to diverge again in the future.  I thought
>> that combining them might suggest that they ought to be the same, when
>> we already know that this is not the case.
>>
>> I'll prepare an alternate patch that combines them.
>
> Fwiw, updating the insn counts blindly like this has very small value
> on the one hand, and negative value on the other.  In total, negative
> value.
>
> If it is not possible to keep these tests up-to-date easily, the tests
> should be improved.  If tests regressed, otoh, we should ***not*** paper
> over that with patches like this, but investigate what happened instead:
> such regressions are *real*.
>
> So which is it here?  I am assuming it is a not-too-well written
> testcase without all the necessary noipa attrs, and/or one putting more
> than one thing to test per function directly.  Insn counts then shift
> easily if the compiler decides to factor (CSE etc.) your code
> differently, but that is a testcase artifact then, not something we
> want to adjust counts for all of the time.
>
> It is feasible to do these insn count things only for trivial tiny
> snippets.  Everything bigger will regress all of the time, no one will
> look at it properly, and instead people will just send blind "update
> counts" patches like this :-/  *Good* insn count tests are quite
> valuable, but harder to write.  But maintenance costs noticeably bigger
> than zero for a testcase are not good; how many testcases do we run in
> the testsuite?
>
> So, can we fix the underlying problem here, please?
Thanks for all the comments, and good question.  I looked into this
issue and found that the current 32-bit counts are mainly for aix (no
updates are needed there), and that there are some differences in the
generated assembly between 32-bit aix and 32-bit Linux, which seem to
be related to whether or not the compiler saves the frame pointer.

Take the function testbc_var from fold-vec-extract-char.p7.c as an
example:

#include <altivec.h>

unsigned char
testbc_var (vector bool char vbc2, signed int si)
{
  return vec_extract (vbc2, si);
}

1) on aix 32-bit, with trunk:

.testbc_var:
        li 9,32
        addi 10,1,-64
        stxvw4x 34,10,9
        rlwinm 3,3,0,28,31
        addi 9,3,-64   // these two lines
        add 3,9,1      // can be combined, see below.
        lbz 3,32(3)
        blr

with old gcc (without r11-6615):

.testbc_var:
        addi 10,1,-64
        li 9,32
        stxvw4x 34,10,9
        rlwinm 3,3,0,28,31
        add 3,10,3     // better
        lbz 3,32(3)
        blr

Apparently an extra, unnecessary addi is created.  The test case
expects one addi to adjust the stack for a temp space and one add to
prepare the index of the extracted byte.

2) the same thing happens on aix 64-bit and Linux 64-bit:

trunk:

.testbc_var:
        li 9,48
        addi 10,1,-64
        stxvw4x 34,10,9
        rldicl 5,5,0,60
        addi 9,5,-64   // similar to aix 32-bit
        add 5,9,1      // ....
        lbz 3,48(5)
        blr

vs. optimized:

.testbc_var:
        addi 10,1,-64
        li 9,48
        stxvw4x 34,10,9
        rldicl 5,5,0,60
        add 5,10,5     // better
        lbz 3,48(5)
        blr

3) but for Linux 32-bit, trunk and the old gcc (without r11-6615)
generate identical code:

testbc_var:
        stwu 1,-48(1)
        li 9,16
        rlwinm 3,3,0,28,31
        stxvw4x 34,1,9
        add 3,1,3
        lbz 3,16(3)
        addi 1,1,48
        blr

So the expected counts, adjusted for aix 32-bit, broke Linux 32-bit.

As shown above, the behavior change (one more addi) on 64-bit and aix
32-bit results in code that is sub-optimal compared to before, but we
previously just updated the counts, so I have changed PR101169's
component to rtl-optimization for further investigation.

From 3), what Alexandre proposed for Linux 32-bit actually just
restores the expected count to what it was before. :)

BR,
Kewen
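P.S. For reference, a minimal sketch of the shape such a count test
could take along the lines Segher suggests: one operation tested per
function, hardened with the noipa attribute so IPA transforms cannot
perturb the per-function insn sequence.  The dg-* directives,
scan-assembler-times, and __attribute__((noipa)) are real GCC
testsuite/compiler features, but the options, regex patterns, and
counts below are illustrative assumptions, not the actual contents of
fold-vec-extract-char.p7.c:

/* { dg-do compile } */
/* { dg-options "-O2 -mdejagnu-cpu=power7" } */

#include <altivec.h>

/* noipa prevents inlining, cloning, and other IPA transforms, so the
   insn sequence of this function does not depend on the rest of the
   file.  */
__attribute__ ((noipa))
unsigned char
testbc_var (vector bool char vbc2, signed int si)
{
  return vec_extract (vbc2, si);
}

/* Illustrative expectations only: one addi to carve out the temp
   slot and one add to form the address of the extracted byte.  */
/* { dg-final { scan-assembler-times {\maddi\M} 1 } } */
/* { dg-final { scan-assembler-times {\madd\M} 1 } } */

Even written this way, the counts still encode incidental instruction
selection rather than a property we directly care about, which is
Segher's larger point: such expectations break whenever code
generation changes globally, correctly or not.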