Hi,

For a processor that supports SSE, but not AVX.

the following code:

typedef int __attribute__((mode(QI))) qi;
typedef qi __attribute__((vector_size (32))) v32qi;

v32qi foo (int x)
{
  v32qi y = {'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f',
          '0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'};
  return y;
}

produces a warning " warning: AVX vector return without AVX enabled changes the 
ABI [-Wpsabi]”.

so - the question is what is the resultant ABI in the changed case (since _m256 
is supported for such processors)

====

Looking at the psABI v1.0 

* pp24 Returning of Values

The returning of values is done according to the following algorithm:

        • Classify the return type with the classification algorithm.

…
        • If the class is SSE, the next available vector register of the 
sequence %xmm0, %xmm1 is used.

        • If the class is SSEUP, the eight byte is returned in the next 
available eightbyte chunk of the last used vector register.

...

* classification algorithm : pp20

        • Arguments of type __m256 are split into four eightbyte chunks. The 
least significant one belongs to class SSE and all the others to class SSEUP.

        • Arguments of type __m512 are split into eight eightbyte chunks. The 
least significant one belongs to class SSE and all the others to class SSEUP.

*  footnote on pp21

12 The post merger clean up described later ensures that, for the processors 
that do not support the __m256 type, if the size of an object is larger than 
two eightbytes and the first eightbyte is not SSE or any other eightbyte is not 
SSEUP, it still has class MEMORY. 

This in turn ensures that for processors that do support the __m256 type, if 
the size of an object is four eightbytes and the first eightbyte is SSE and all 
other eightbytes are SSEUP, it can be passed in a register. This also applies 
to the __m512 type. That is for processors that support the __m512 type, if the 
size of an object is eight eightbytes and the first eightbyte is SSE and all 
other eightbytes are SSEUP, it can be passed in a register, otherwise, it will 
be passed in memory.

---

However : the case where the processor does *not* support __m256 but the first 
eightbyte *is* SSE and the following eighbytes *are* SSEUP is not clarified.

The intent for SSE seems clear - use a reg
The intent for following SSEUP is less clear.

Nevertheless, it seems to imply that the intent for processors with SSE that 
the __m256 (and __m512) returns should be passed in xmm0:1(:3, maybe).

figure 3.4 pp23 does not clarify xmm* use for vector return at all - only 
mentioning floating point.

===== status

In any event, GCC passes the vec32 return in memory,
LLVM conversely passes it in xmm0:1 (at least for the versions I’ve tried).

which leads to an ABI discrepancy when GCC is used to build code on systems 
based on LLVM.

Please could the X86 maintainers clarify the intent (and maybe consider 
enhancing the footnote classification notes to make things clearer)?

- and then we can figure out how to deal with the systems that are already 
implemented - and how to move forward.

(as an aside, in any event, it seems inefficient to pass through memory when at 
least xmm0:1 are already set aside for return value use).

thanks
Iain

Reply via email to