On Wed, Apr 05, 2017 at 09:37:04AM -0600, Sandra Loosemore wrote:
> On 04/04/2017 06:14 AM, Alan Modra wrote:
> >Revised patch.
> >
> >[snip]
> >+@smallexample
> >+static void
> >+dgemv_kernel_4x4 (long n, const double *ap, long lda,
> >+                  const double *x, double *y, double alpha)
> >+@{
> >+  double *a0;
> >+  double *a1;
> >+  double *a2;
> >+  double *a3;
> >+
> >+  __asm__
> >+    (
> >+     "lxvd2x 34, 0, %10 \n\t"   // x0, x1
> >+     "lxvd2x 35, %11, %10 \n\t" // x2, x3
> >+     "xxspltd 32, %x9, 0 \n\t"  // alpha, alpha
> >+     "sldi %6, %13, 3 \n\t"     // lda * sizeof (double)
> >+     "xvmuldp 34, 34, 32 \n\t"  // x0 * alpha, x1 * alpha
> >+     "xvmuldp 35, 35, 32 \n\t"  // x2 * alpha, x3 * alpha
> >+     "add %4, %3, %6 \n\t"      // a0 = ap, a1 = a0 + lda
> >+     "add %6, %6, %6 \n\t"      // 2 * lda
> >+     "xxspltd 32, 34, 0 \n\t"   // x0 * alpha, x0 * alpha
> >+     "xxspltd 33, 34, 1 \n\t"   // x1 * alpha, x1 * alpha
> >+     "xxspltd 34, 35, 0 \n\t"   // x2 * alpha, x2 * alpha
> >+     "xxspltd 35, 35, 1 \n\t"   // x3 * alpha, x3 * alpha
> >+     "add %5, %3, %6 \n\t"      // a2 = a0 + 2 * lda
> >+     "add %6, %4, %6 \n\t"      // a3 = a1 + 2 * lda
> >+     ...
> >+     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
> >+     "#a0=%3 a1=%4 a2=%5 a3=%6"
> >+     :
> >+       "+m" (*y),
> >+       "+r" (n),    // 1
> >+       "+b" (y),    // 2
> >+       "=b" (a0),   // 3
> >+       "=b" (a1),   // 4
> >+       "=&b" (a2),  // 5
> >+       "=&b" (a3)   // 6
> >+     :
> >+       "m" (*x),
> >+       "m" (*ap),
> >+       "d" (alpha), // 9
> >+       "r" (x),     // 10
> >+       "b" (16),    // 11
> >+       "3" (ap),    // 12
> >+       "4" (lda)    // 13
> >+     :
> >+       "cr0",
> >+       "vs32","vs33","vs34","vs35","vs36","vs37",
> >+       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
> >+     );
> >+@}
> >+@end smallexample
>
> Hmmm.  My main objection to this version is that it's unintelligible to
> anyone who can't parse PowerPC assembly language without the help of an
> architecture manual, and that's probably the majority of readers.

Heh, even I have trouble parsing some powerpc assembly!  That's why
there are a few lines of text describing what the assembly code does.
I am concerned that the 14 lines of assembly shown make the example
too big, but it's harder to describe code that isn't shown than to
describe something under the nose of the reader.

> I'm now wondering if it would be better to have a series of small examples
> showing these tricks individually instead of one giant example that tries to
> illustrate multiple things?

Possibly, but this example comes after many others.  If people have
waded this far into the asm section of the manual they shouldn't have
too much trouble understanding the concepts here.  Also, there's value
in a real-world example.

Maybe that's just me.  I'm not someone who tends to read manuals
first, preferring to dive right in then go back to a manual later for
some detail that can't be easily deduced.  In fact, I have a distrust
of manuals.. ;)  This isn't a criticism of the gcc manual, but other
documents I've read over the years are often just plain wrong.  I've
even been the *author* of technical documentation that had errors,
some by yours truly, and some introduced by a "technical writer" who
edited my input to make it read better, in the process accidentally
changing something that made the details incorrect.  I'm sure others
have had the same experience.  So I like *and trust* code snippets
taken from working code more than made up examples created for
documentation.

-- 
Alan Modra
Australia Development Lab, IBM