On 2021/12/21 09:32, David Edelsohn wrote:
> On Mon, Dec 20, 2021 at 6:55 PM Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
>>
>> On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote:
>>> On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo <luo...@linux.ibm.com> wrote:
>>>> These four UNSPECS seems could be replaced with native RTL, and why
>>>> "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))"
>>>> in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT):
>>>>
>>>>   This bit is sticky; that is, once set to 1 it
>>>>   remains set to 1 until it is set to 0 by an
>>>>   mtvscr instruction.
>>>>
>>>> The RTL pattern set it to 0 but final ASM doesn't present it?  And why
>>>> not use Clobber VSCR_REGNO instead?
>>>
>>> The design came from the early implementation of Altivec:
>>>
>>> https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html

Thanks, I constructed a testcase for this,

cat vadds.c
#include <stdio.h>
#include <altivec.h>
vector signed int foo1 (vector signed int a, vector signed int c) {
  vector signed int ret = vec_vaddsws (a, c);
  union {vector unsigned short v; unsigned short s[8];} u;
  u.v = vec_mfvscr();
  printf("foo1: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1],
u.s[2],
         u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]);
  return a;
}

vector signed int foo2 (vector signed int a, vector signed int c) {
  vector signed int ret = vec_vaddsws (a, c);
  union {vector unsigned short v; unsigned short s[8];} u;
  u.v = vec_mfvscr();
  printf("foo2: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1],
u.s[2],
         u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]);
  return ret;
}

int main ()
{
  vector signed int a = {0x12345678, 0x22345678, 0x32345678, 0x42345678};
  vector signed int c = {0x12345678, 0x22345678, 0x32345678, 0x42345678};
  vector signed int b1 = foo1 (a, c);
  vector signed int b2 = foo2 (a, c);
  return b1[0]+b2[0];
}

The output is:

./a.out
foo1: vscr == { 0,0,0,0,0,0,0,0 }
foo2: vscr == { 1,0,0,0,0,0,0,0 }


So is this expected behavior?  Seems doesn't meet with Aldy's
description...  (foo1's temp is optimized away so no vaddsws produced)

" foo()
{
  vector int result;

  result = vec_adds(blah, blah);
  __check_for_saturation__
}

gcc, will optimize away vec_adds() because the result is a local
variable unused later.  then when we check the saturation bit in VSCR,
we get wrong results.

this patch explains to gcc all about VSCR, and adds it as a global
register as well."


>>>
>>> If one later checks for saturation (reads VSCR), one needs a
>>> corresponding SET of the value.  It's set in an architecture-specific
>>> manner that isn't described to GCC, but it's set, not just clobbered
>>> and in an undefined state.
>>
>> Well.  RTL clobber and set do exactly the same thing, except with
>> clobber it is not specified *what* value is set.  All bits are set, all
>> bits are defined.  There is no (direct) way in RTL to say
>> "undetermined".
>>
>> An RTL clobber would work just fine afaics?
> 
> I don't know about the original intention from Aldy, but if one were
> looking at an RTL dump and the code used the saturation bit from VSCR,
> it might be confusing to see a CLOBBER instead of a SET.  The SET
> documents that VSCR_REGNO is assigned a specific value; GCC doesn't
> know about the semantics, but it's not some undefined bit pattern.
> CLOBBER implies a trash value or a value that one will not query
> later, i.e., one would want to SET the register to a specific value
> before using it.
> 
>>
>>> The RTL does not describe that VSCR is set to the value 0.  The
>>> (const_int 0) is not the value set.  You can think of the (const_int
>>> 0) as a dummy RTL argument to the VSCR UNSPEC.  UNSPEC requires at
>>> least one argument and the pattern doesn't try to express the
>>> argument, so it uses a dummy RTL constant.
>>
>> Yup.  Traditionally (pc) was used for this.  Nowadays (const_int 0) is
>> not really more expensive anymore, and many people find it clearer (but
>> not in this case it seems :-) ).
>>
>>> It's part of a PARALLEL
>>> and the plus or minus already expresses the data dependency of the
>>> pattern on the input operands.
>>
>> But they do not describe any dependency on vscr, or output to it.  This
>> is the same problem we have with fpscr (most FP insns use some of its
>> fields, most set some, but there is no way to cleanly express that).
> 
> It describes that VSCR_REGNO is set, an output. It doesn't describe
> how it is set nor inform the compiler that the value depends on the
> input operands in some complicated way unknown to the compiler, but
> the compiler cannot do anything useful with the additional
> information.
> 
>>
>> Explicit clobbers like this help one side of the issue.  For vscr, other
>> than the sat bit there is only the nj bit, and we just ignore that :-)
>>
>>> This patch is okay.  Thanks for updating the machine description and
>>> for cleaning up the formatting.
>>
>> x2.  Thanks!
>>
>>
>> Segher

-- 
Thanks,
Xionghu

Reply via email to