On 2021/12/21 09:32, David Edelsohn wrote:
> On Mon, Dec 20, 2021 at 6:55 PM Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
>>
>> On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote:
>>> On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo <luo...@linux.ibm.com> wrote:
>>>> These four UNSPECS seems could be replaced with native RTL, and why
>>>> "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))"
>>>> in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT):
>>>>
>>>> This bit is sticky; that is, once set to 1 it
>>>> remains set to 1 until it is set to 0 by an
>>>> mtvscr instruction.
>>>>
>>>> The RTL pattern set it to 0 but final ASM doesn't present it? And why
>>>> not use Clobber VSCR_REGNO instead?
>>>
>>> The design came from the early implementation of Altivec:
>>>
>>> https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html
Thanks, I constructed a testcase for this,
cat vadds.c
#include <stdio.h>
#include <altivec.h>
vector signed int foo1 (vector signed int a, vector signed int c) {
vector signed int ret = vec_vaddsws (a, c);
union {vector unsigned short v; unsigned short s[8];} u;
u.v = vec_mfvscr();
printf("foo1: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1],
u.s[2],
u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]);
return a;
}
vector signed int foo2 (vector signed int a, vector signed int c) {
vector signed int ret = vec_vaddsws (a, c);
union {vector unsigned short v; unsigned short s[8];} u;
u.v = vec_mfvscr();
printf("foo2: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1],
u.s[2],
u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]);
return ret;
}
int main ()
{
vector signed int a = {0x12345678, 0x22345678, 0x32345678, 0x42345678};
vector signed int c = {0x12345678, 0x22345678, 0x32345678, 0x42345678};
vector signed int b1 = foo1 (a, c);
vector signed int b2 = foo2 (a, c);
return b1[0]+b2[0];
}
The output is:
./a.out
foo1: vscr == { 0,0,0,0,0,0,0,0 }
foo2: vscr == { 1,0,0,0,0,0,0,0 }
So is this expected behavior? Seems doesn't meet with Aldy's
description... (foo1's temp is optimized away so no vaddsws produced)
" foo()
{
vector int result;
result = vec_adds(blah, blah);
__check_for_saturation__
}
gcc, will optimize away vec_adds() because the result is a local
variable unused later. then when we check the saturation bit in VSCR,
we get wrong results.
this patch explains to gcc all about VSCR, and adds it as a global
register as well."
>>>
>>> If one later checks for saturation (reads VSCR), one needs a
>>> corresponding SET of the value. It's set in an architecture-specific
>>> manner that isn't described to GCC, but it's set, not just clobbered
>>> and in an undefined state.
>>
>> Well. RTL clobber and set do exactly the same thing, except with
>> clobber it is not specified *what* value is set. All bits are set, all
>> bits are defined. There is no (direct) way in RTL to say
>> "undetermined".
>>
>> An RTL clobber would work just fine afaics?
>
> I don't know about the original intention from Aldy, but if one were
> looking at an RTL dump and the code used the saturation bit from VSCR,
> it might be confusing to see a CLOBBER instead of a SET. The SET
> documents that VSCR_REGNO is assigned a specific value; GCC doesn't
> know about the semantics, but it's not some undefined bit pattern.
> CLOBBER implies a trash value or a value that one will not query
> later, i.e., one would want to SET the register to a specific value
> before using it.
>
>>
>>> The RTL does not describe that VSCR is set to the value 0. The
>>> (const_int 0) is not the value set. You can think of the (const_int
>>> 0) as a dummy RTL argument to the VSCR UNSPEC. UNSPEC requires at
>>> least one argument and the pattern doesn't try to express the
>>> argument, so it uses a dummy RTL constant.
>>
>> Yup. Traditionally (pc) was used for this. Nowadays (const_int 0) is
>> not really more expensive anymore, and many people find it clearer (but
>> not in this case it seems :-) ).
>>
>>> It's part of a PARALLEL
>>> and the plus or minus already expresses the data dependency of the
>>> pattern on the input operands.
>>
>> But they do not describe any dependency on vscr, or output to it. This
>> is the same problem we have with fpscr (most FP insns use some of its
>> fields, most set some, but there is no way to cleanly express that).
>
> It describes that VSCR_REGNO is set, an output. It doesn't describe
> how it is set nor inform the compiler that the value depends on the
> input operands in some complicated way unknown to the compiler, but
> the compiler cannot do anything useful with the additional
> information.
>
>>
>> Explicit clobbers like this help one side of the issue. For vscr, other
>> than the sat bit there is only the nj bit, and we just ignore that :-)
>>
>>> This patch is okay. Thanks for updating the machine description and
>>> for cleaning up the formatting.
>>
>> x2. Thanks!
>>
>>
>> Segher
--
Thanks,
Xionghu