On Tue, Oct 30, 2018 at 4:28 AM Iain Sandoe <i...@sandoe.co.uk> wrote:
>
> Hi,
>
> For a processor that supports SSE, but not AVX, the following code:
>
> typedef int __attribute__((mode(QI))) qi;
> typedef qi __attribute__((vector_size (32))) v32qi;
>
> v32qi foo (int x)
> {
>   v32qi y = {'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f',
>           '0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'};
>   return y;
> }
>
> produces the warning "warning: AVX vector return without AVX enabled changes
> the ABI [-Wpsabi]".
>
> So the question is: what is the resultant ABI in the changed case (since
> __m256 is supported for such processors)?
>
> ====
>
> Looking at the psABI v1.0
>
> * pp24 Returning of Values
>
> The returning of values is done according to the following algorithm:
>
>         • Classify the return type with the classification algorithm.
>
> …
>         • If the class is SSE, the next available vector register of the 
> sequence %xmm0, %xmm1 is used.
>
>         • If the class is SSEUP, the eightbyte is returned in the next
> available eightbyte chunk of the last used vector register.
>
> ...
>
> * classification algorithm: pp20
>
>         • Arguments of type __m256 are split into four eightbyte chunks. The 
> least significant one belongs to class SSE and all the others to class SSEUP.
>
>         • Arguments of type __m512 are split into eight eightbyte chunks. The 
> least significant one belongs to class SSE and all the others to class SSEUP.
>
> *  footnote on pp21
>
> 12 The post merger clean up described later ensures that, for the processors 
> that do not support the __m256 type, if the size of an object is larger than 
> two eightbytes and the first eightbyte is not SSE or any other eightbyte is 
> not SSEUP, it still has class MEMORY.
>
> This in turn ensures that for processors that do support the __m256 type, if 
> the size of an object is four eightbytes and the first eightbyte is SSE and 
> all other eightbytes are SSEUP, it can be passed in a register. This also 
> applies to the __m512 type. That is for processors that support the __m512 
> type, if the size of an object is eight eightbytes and the first eightbyte is 
> SSE and all other eightbytes are SSEUP, it can be passed in a register, 
> otherwise, it will be passed in memory.
>
> ---
>
> However: the case where the processor does *not* support __m256 but the
> first eightbyte *is* SSE and the following eightbytes *are* SSEUP is not
> clarified.
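>
> To make that concrete, here is a minimal sketch (my own reading of the
> post-merger cleanup quoted above, not compiler source; the function and
> enum names are mine) showing that a 32-byte vector classified as one SSE
> eightbyte followed by three SSEUP eightbytes is not demoted to MEMORY by
> the rule as written, even on a target without AVX:
>
> #include <stdbool.h>
> #include <stdio.h>
>
> enum cls { SSE, SSEUP, MEMORY };
>
> /* Footnote 12's rule: an object larger than two eightbytes whose first
>    eightbyte is not SSE, or with any later eightbyte that is not SSEUP,
>    gets class MEMORY.  */
> static bool demoted_to_memory (const enum cls *c, int n_eightbytes)
> {
>   if (n_eightbytes <= 2)
>     return false;
>   if (c[0] != SSE)
>     return true;
>   for (int i = 1; i < n_eightbytes; i++)
>     if (c[i] != SSEUP)
>       return true;
>   return false;
> }
>
> int main (void)
> {
>   /* A 32-byte vector: eightbyte 0 is SSE, eightbytes 1-3 are SSEUP.  */
>   enum cls v32[4] = { SSE, SSEUP, SSEUP, SSEUP };
>   printf ("demoted to MEMORY? %s\n",
>           demoted_to_memory (v32, 4) ? "yes" : "no");  /* prints "no" */
>   return 0;
> }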
>
> The intent for SSE seems clear - use a register.
> The intent for the following SSEUP eightbytes is less clear.
>
> Nevertheless, it seems to imply that, for processors with only SSE, the
> intent is that __m256 (and __m512) returns should be passed in xmm0:1 (and
> xmm0:3, maybe).
>
> Figure 3.4 (pp23) does not clarify xmm* use for vector returns at all - it
> only mentions floating point.
>
> ===== status
>
> In any event, GCC passes the v32qi return in memory, while LLVM passes it in
> xmm0:1 (at least for the versions I've tried).
>
> This leads to an ABI discrepancy when GCC is used to build code on systems
> based on LLVM.
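>
> To illustrate the consequence, here is a hypothetical two-file sketch (the
> file split, symbol names and register comments are mine, not real compiler
> output) of what goes wrong when the two conventions meet:
>
> typedef int __attribute__((mode(QI))) qi;
> typedef qi __attribute__((vector_size (32))) v32qi;
>
> /* callee.c: built by the compiler that returns v32qi in registers
>    (xmm0:1 as described above, or ymm0 when AVX is enabled).  */
> v32qi make_v32 (void)
> {
>   v32qi y = { 0 };
>   return y;
> }
>
> /* caller.c: built by the compiler that uses a MEMORY-class return -
>    it allocates a 32-byte slot, passes its address in %rdi, and reads
>    the result from that slot, which the callee above never writes.  */
> extern v32qi make_v32 (void);
>
> qi first_lane (void)
> {
>   v32qi v = make_v32 ();
>   return v[0];   /* garbage when the two conventions are mixed */
> }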
>
> Please could the X86 maintainers clarify the intent (and maybe consider 
> enhancing the footnote classification notes to make things clearer)?
>
> - and then we can figure out how to deal with the systems that have already
> been implemented either way - and how to move forward.
>
> (As an aside, it seems inefficient in any event to pass the value through
> memory when at least xmm0:1 are already set aside for return-value use.)

Please open a bug to keep track.

Thanks.

-- 
H.J.
