In Tue, Oct 30, 2018 at 4:28 AM Iain Sandoe <i...@sandoe.co.uk> wrote: > > Hi, > > For a processor that supports SSE, but not AVX. > > the following code: > > typedef int __attribute__((mode(QI))) qi; > typedef qi __attribute__((vector_size (32))) v32qi; > > v32qi foo (int x) > { > v32qi y = {'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f', > '0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'}; > return y; > } > > produces a warning " warning: AVX vector return without AVX enabled changes > the ABI [-Wpsabi]”. > > so - the question is what is the resultant ABI in the changed case (since > _m256 is supported for such processors) > > ==== > > Looking at the psABI v1.0 > > * pp24 Returning of Values > > The returning of values is done according to the following algorithm: > > • Classify the return type with the classification algorithm. > > … > • If the class is SSE, the next available vector register of the > sequence %xmm0, %xmm1 is used. > > • If the class is SSEUP, the eight byte is returned in the next > available eightbyte chunk of the last used vector register. > > ... > > * classification algorithm : pp20 > > • Arguments of type __m256 are split into four eightbyte chunks. The > least significant one belongs to class SSE and all the others to class SSEUP. > > • Arguments of type __m512 are split into eight eightbyte chunks. The > least significant one belongs to class SSE and all the others to class SSEUP. > > * footnote on pp21 > > 12 The post merger clean up described later ensures that, for the processors > that do not support the __m256 type, if the size of an object is larger than > two eightbytes and the first eightbyte is not SSE or any other eightbyte is > not SSEUP, it still has class MEMORY. > > This in turn ensures that for processors that do support the __m256 type, if > the size of an object is four eightbytes and the first eightbyte is SSE and > all other eightbytes are SSEUP, it can be passed in a register. This also > applies to the __m512 type. That is for processors that support the __m512 > type, if the size of an object is eight eightbytes and the first eightbyte is > SSE and all other eightbytes are SSEUP, it can be passed in a register, > otherwise, it will be passed in memory. > > --- > > However : the case where the processor does *not* support __m256 but the > first eightbyte *is* SSE and the following eighbytes *are* SSEUP is not > clarified. > > The intent for SSE seems clear - use a reg > The intent for following SSEUP is less clear. > > Nevertheless, it seems to imply that the intent for processors with SSE that > the __m256 (and __m512) returns should be passed in xmm0:1(:3, maybe). > > figure 3.4 pp23 does not clarify xmm* use for vector return at all - only > mentioning floating point. > > ===== status > > In any event, GCC passes the vec32 return in memory, > LLVM conversely passes it in xmm0:1 (at least for the versions I’ve tried). > > which leads to an ABI discrepancy when GCC is used to build code on systems > based on LLVM. > > Please could the X86 maintainers clarify the intent (and maybe consider > enhancing the footnote classification notes to make things clearer)? > > - and then we can figure out how to deal with the systems that are already > implemented - and how to move forward. > > (as an aside, in any event, it seems inefficient to pass through memory when > at least xmm0:1 are already set aside for return value use).
Please open a bug to keep track. Thanks. -- H.J.