Some people over at OpenBLAS were asking me whether I knew of a whitepaper on gcc asm. I didn't besides the gcc manual, and wrote a note explaining some tricks. This patch is that note cleaned up. Tested by an x86_64-linux build. OK to apply?
BTW, anyone wandering over to look at OpenBLAS might notice that this example doesn't match the file exactly. Yes, writing this doco made me realize I need to submit a patch there.

	* doc/extend.texi (Extended Asm): Add OpenBLAS example.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 594b32a..05c6892 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2017-03-31  Alan Modra  <amo...@gmail.com>
+
+	* doc/extend.texi (Extended Asm): Add OpenBLAS example.
+
 2017-03-31  Matthew Fortune  <matthew.fort...@imgtec.com>
 
 	* config/mips/mips-msa.md (msa_vec_extract_<msafmt_f>): Update
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index fadbc96..991a2f6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8516,6 +8516,84 @@ asm ("cmoveq %1, %2, %[result]"
    : "r" (test), "r" (new), "[result]" (old));
 @end example
 
+Here is a larger PowerPC example taken from OpenBLAS.  The over 150
+lines of assembly have been removed except for comments added to check
+gcc's register assignments, because the assembly itself isn't that
+important.  You do need to know that all of the function parameters
+are inputs except for the @code{y} array, which is modified by the
+function, and that early assembly sets up four pointers into the
+@code{ap} array, @code{a0=ap}, @code{a1=ap+lda}, @code{a2=ap+2*lda},
+and @code{a3=ap+3*lda}.
+
+Illustrated here is a technique you can use to have gcc allocate
+temporary registers for an asm, giving the compiler more freedom than
+the programmer allocating fixed registers via clobbers.  This is done
+by declaring a variable and making it an early-clobber asm output as
+with @code{a2} and @code{a3}, or making it an output tied to an input
+as with @code{a0} and @code{a1}.  The vsx registers used by the asm
+could have used the same technique except for gcc's limit on number of
+asm parameters.  It shouldn't be surprising that @code{a0} is tied to
+@code{ap} from the above description, and @code{lda} is only used
+early so that register is available for reuse as @code{a1}.  Tying an
+input to an output is the way to set up an initialised temporary
+register that is modified by an asm.  The example also shows an
+initialised register unchanged by the asm; @code{"b" (16)} sets up
+@code{%11} to 16.
+
+Also shown is a somewhat better method than using a @code{"memory"}
+clobber to tell gcc that an asm accesses or modifies memory.  Here we
+use @code{"+m" (*y)} in the list of outputs to tell gcc that the
+@code{y} array is both read and written by the asm.  @code{"m" (*x)}
+and @code{"m" (*ap)} in the inputs tell gcc that these arrays are
+read.  At a minimum, aliasing rules will allow gcc to know what memory
+@emph{doesn't} need to be flushed, and if the function were inlined
+then gcc may be able to do even better.  Notice that @code{x},
+@code{y}, and @code{ap} all appear twice in the asm parameters, once
+to specify memory accessed, and once to specify a base register used
+by the asm.  You won't normally be wasting a register by doing this as
+gcc can use the same register for both purposes.  However, it would be
+foolish to use both @code{%0} and @code{%2} for @code{y} in your asm
+and expect them to be the same.
+
+@example
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+                  const double *x, double *y, double alpha)
+@{
+  double *a0;
+  double *a1;
+  double *a2;
+  double *a3;
+
+  __asm__
+    (
+     ...
+     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+     "#a0=%3 a1=%4 a2=%5 a3=%6"
+     :
+       "+m" (*y),
+       "+r" (n),	// 1
+       "+b" (y),	// 2
+       "=b" (a0),	// 3
+       "=b" (a1),	// 4
+       "=&b" (a2),	// 5
+       "=&b" (a3)	// 6
+     :
+       "m" (*x),
+       "m" (*ap),
+       "d" (alpha),	// 9
+       "r" (x),		// 10
+       "b" (16),	// 11
+       "3" (ap),	// 12
+       "4" (lda)	// 13
+     :
+       "cr0",
+       "vs32","vs33","vs34","vs35","vs36","vs37",
+       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+     );
+@}
+@end example
+
 @anchor{Clobbers}
 @subsubsection Clobbers
 @cindex @code{asm} clobbers

-- 
Alan Modra
Australia Development Lab, IBM
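
For readers without a PowerPC toolchain, the two tricks in the patch can be sketched on x86_64 roughly as follows. This is an illustration only and is not taken from OpenBLAS or from the patch: add_arrays and tmp are hypothetical names, the loop is deliberately trivial, and the array casts in the memory operands are an assumption about how to describe all n elements to gcc (the patch itself writes "+m" (*y)). The asm path is guarded so that other targets fall back to plain C.

```c
#include <assert.h>

/* Hypothetical helper, not from OpenBLAS: y[i] += x[i] for 0 <= i < n.
   Requires n > 0.  */
static void
add_arrays (long n, const long *x, long *y)
{
  assert (n > 0);
#if defined (__x86_64__) && defined (__LP64__)
  long tmp;	/* scratch register; gcc chooses which one */

  __asm__
    (
     "1:\n\t"
     "movq (%[x]), %[tmp]\n\t"		/* tmp = *x */
     "addq %[tmp], (%[y])\n\t"		/* *y += tmp */
     "addq $8, %[x]\n\t"		/* advance both pointers */
     "addq $8, %[y]\n\t"
     "decq %[n]\n\t"
     "jnz 1b"
     :
       "+m" (*(long (*)[n]) y),		/* y[0..n-1] is read and written */
       [n] "+r" (n),			/* initialised, then modified */
       [x] "+r" (x),
       [y] "+r" (y),
       [tmp] "=&r" (tmp)		/* early-clobber temporary */
     :
       "m" (*(const long (*)[n]) x)	/* x[0..n-1] is only read */
     :
       "cc"				/* no "memory" clobber needed */
    );
#else
  /* Portable fallback for targets the asm above doesn't cover.  */
  for (long i = 0; i < n; i++)
    y[i] += x[i];
#endif
}
```

As in the PowerPC example, x and y each appear twice in the operand list, once as a memory operand and once as a base register, and the asm template only ever refers to the register operands. Calling add_arrays (4, x, y) with two four-element arrays adds x into y element by element and leaves x untouched.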