Hi, We would like to update the x86-64 psABI so that 32-byte aggregates with a single __m256 field are passed in AVX registers instead of in memory. However, finding the proper wording seems tricky. Here is what I have so far. Any comments?
Thanks. -- H.J.
Index: low-level-sys-info.tex
===================================================================
--- low-level-sys-info.tex (revision 5099)
+++ low-level-sys-info.tex (working copy)
@@ -343,10 +343,12 @@ classes are corresponding to \xARCH regi
 \begin{description}
 \item[INTEGER] This class consists of integral types that fit into one of
   the general purpose registers.
-\item[SSE] The class consists of types that fit into a SSE register.
-\item[SSEUP] The class consists of types that fit into a SSE register
+\item[SSE] The class consists of types that fit into an SSE register.
+\item[SSEUP] The class consists of types that fit into an SSE register
+  and can be passed and returned in the most significant half of it.
+\item[AVX] The class consists of types that fit into an AVX register.
+\item[AVXUP] The class consists of types that fit into an AVX register
   and can be passed and returned in the most significant half of it.
-\item[AVX] The class consists of types that fit into a AVX register.
 \item[X87, X87UP] These classes consists of types that will be returned
   via the x87 FPU.
 \item[COMPLEX\_X87] This class consists of types that will be returned
@@ -372,7 +374,9 @@ The basic types are assigned their natur
 \item Arguments of types \code{__float128}, \code{_Decimal128} and
   \code{__m128} are split into two halves. The least significant ones
   belong to class SSE, the most significant one to class SSEUP.
-\item Arguments of type \code{__m256} are in class AVX.
+\item Arguments of type \code{__m256} are split into into two halves.
+  The least significant ones belong to class AVX, the most significant
+  one to class AVXUP.
 \item The 64-bit mantissa of arguments of type \code{long double}
   belongs to class X87, the 16-bit exponent plus 6 bytes of padding
   belongs to class X87UP.
@@ -407,11 +411,10 @@ The classification of aggregate (structu
 types works as follows:
 
 \begin{enumerate}
-\item If the size of an object is larger than two \eightbytes, or
-  it contains unaligned fields, it has class MEMORY.
+\item If it contains unaligned fields, it has class MEMORY.
 
 \item If a C++ object has either a non-trivial copy constructor
-  or a non-trivial destructor
+  or a non-trivial destructor,
   \footnote{A de/constructor is trivial if it is an implicitly-declared
     default de/constructor and if:
     \begin{itemize}
@@ -433,6 +436,15 @@ types works as follows:
   because such objects must have well defined addresses. Similar
   issues apply when returning an object from a function.}
 
+\item If the size of the aggregate is four \eightbytes, two
+  consecutive \eightbytes are classified as an aggregate of two
+  \eightbytes. If the first of two \eightbytes aggregates has the
+  AVX class, it is broken into the SSE and SSEUP classes for
+  class merge purpose.
+
+\item If the size of an object is larger than two \eightbytes,
+  it has class MEMORY.
+
 \item If the size of the aggregate exceeds a single \eightbyte, each is
   classified separately. Each \eightbyte gets initialized to class
   NO_CLASS.
@@ -453,6 +465,8 @@ types works as follows:
   \begin{enumerate}
   \item If one of the classes is MEMORY, the whole argument is
     passed in memory.
   \item If SSEUP is not preceeded by SSE, it is converted to SSE.
+  \item If AVXUP is preceeded by SSE, the SSE class is converted to AVX.
+  \item If AVXUP is not preceeded by AVX, it is converted to AVX.
   \end{enumerate}
 \end{enumerate}