Hi Segher,
On 2023/5/25 19:22, Segher Boessenkool wrote:
> Hi!
>
> On Thu, May 25, 2023 at 07:05:55AM -0300, Alexandre Oliva wrote:
>> On May 25, 2023, "Kewen.Lin" <[email protected]> wrote:
>>> So both lp64 and ilp32 have the same count, could we merge it and
>>> remove the selectors?
>>
>> We could, but... I thought I wouldn't, since they were different
>> before, and they're likely to diverge again in the future. I thought
>> that combining them might suggest that they ought to be the same, when
>> we already know that this is not the case.
>>
>> I'll prepare an alternate patch that combines them.
>
> Fwiw, updating the insn counts blindly like this has very small value on
> the one hand, and negative value on the other. In total, negative
> value.
>
> If it is not possible to keep these tests up-to-date easily the test
> should be improved. If tests regressed otoh we should ***not*** paper
> over that with patches like this, but investigate what happened instead:
> such regressions are *real*.
>
> So which is it here? I am assuming it is a not-too-well written testcase
> without all the necessary noipa attrs, and/or putting more than one
> thing to test per function directly. Insn counts then shift easily if
> the compiler decides to factor (CSE etc.) your code differently, but
> that is a testcase artifact then, not something we want to adjust counts
> for all of the time.
>
> It is feasible to do these insn count things only for trivial tiny
> snippets. Everything bigger will regress all of the time, no one will
> look at it properly, and instead people will just do blind "update
> counts" patches like this :-/ *Good* insn count tests are quite
> valuable, but harder to write. But maintenance costs noticeably bigger
> than zero for a testcase are not good, how many testcases do we run in
> the testsuite?
>
> So, can we fix the underlying problem here please?
Thanks for all the comments, and good question.
I looked into this issue and found that the current counts for 32-bit
are mainly for AIX (no updates are needed there). There are some
differences in the generated assembly between 32-bit AIX and 32-bit
Linux, which seem to be related to whether or not the compiler saves
the frame pointer.
Take the function testbc_var from fold-vec-extract-char.p7.c as an example:
#include <altivec.h>
unsigned char
testbc_var (vector bool char vbc2, signed int si)
{
  return vec_extract (vbc2, si);
}
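For a variable index, vec_extract is open-coded: the vector is stored
into a stack temporary and the selected byte is loaded back, with the
index masked to the vector length. A rough C equivalent of what the
assembly below does (ignoring endianness details; the name and the
union are only for illustration):

unsigned char
testbc_var_equiv (vector bool char vbc2, signed int si)
{
  /* Store the vector into a 16-byte stack temporary, then load the
     selected byte; "si & 15" matches the rlwinm/rldicl masking seen
     in the assembly below.  */
  union { vector bool char v; unsigned char b[16]; } u;
  u.v = vbc2;
  return u.b[si & 15];
}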
1) On 32-bit AIX, with trunk:
.testbc_var:
li 9,32
addi 10,1,-64
stxvw4x 34,10,9
rlwinm 3,3,0,28,31
addi 9,3,-64 // these two lines
add 3,9,1 // can be combined, see below.
lbz 3,32(3)
blr
With the old GCC (without r11-6615):
.testbc_var:
addi 10,1,-64
li 9,32
stxvw4x 34,10,9
rlwinm 3,3,0,28,31
add 3,10,3 // better
lbz 3,32(3)
blr
Apparently an extra unnecessary addi is created. The test case expects
one addi to adjust the stack for a temporary slot and one add to apply
the extracted byte's index.
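To make the redundancy concrete: with r10 = r1 - 64 already set up for
the stxvw4x, trunk's two instructions compute

  addi 9,3,-64   // r9 = r3 - 64
  add 3,9,1      // r3 = r9 + r1 = r3 + (r1 - 64) = r3 + r10

which is exactly what the old code's single

  add 3,10,3     // r3 = r10 + r3

already produced.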
2) The same thing happens on 64-bit AIX and 64-bit Linux:
trunk:
.testbc_var:
li 9,48
addi 10,1,-64
stxvw4x 34,10,9
rldicl 5,5,0,60
addi 9,5,-64 // similar to aix 32-bit
add 5,9,1 // ....
lbz 3,48(5)
blr
vs. the old GCC (without r11-6615):
.testbc_var:
addi 10,1,-64
li 9,48
stxvw4x 34,10,9
rldicl 5,5,0,60
add 5,10,5 // better
lbz 3,48(5)
blr
3) But on 32-bit Linux, trunk and the old GCC (without r11-6615)
generate the same code:
testbc_var:
stwu 1,-48(1)
li 9,16
rlwinm 3,3,0,28,31
stxvw4x 34,1,9
add 3,1,3
lbz 3,16(3)
addi 1,1,48
blr
Here the frame is allocated directly with stwu and the temporary is
addressed from r1, so one fewer addi is needed. So the expected count,
adjusted for 32-bit AIX, broke 32-bit Linux.
As shown above, the behavior change (one more addi) on 64-bit and on
32-bit AIX results in code that is less optimal than before, but we
updated the counts previously, so I changed PR101169's component to
rtl-optimization for further investigation. From 3), what Alexandre
proposed for 32-bit Linux actually just restores the expected count
to what it was before. :)
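
As an aside, a trivially small insn-count test of the kind you
describe might have the following shape (just a sketch: the function,
mnemonic and count are illustrative, and noipa keeps IPA transforms
from perturbing the count):

/* { dg-do compile } */
/* { dg-options "-O2" } */

__attribute__ ((noipa)) long
add_const (long x)
{
  return x + 16;
}

/* { dg-final { scan-assembler-times {\maddi\M} 1 } } */

And once the lp64 and ilp32 counts really agree, the
{ target lp64 } / { target ilp32 } variants of such a line can be
merged into a single unconditional one, as in Alexandre's alternate
patch.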
BR,
Kewen