As I said about 10 years ago here:
http://www.itworld.com/Comp/2102/UIR960701perf/
(which can be found more easily here:
http://perfcap.blogspot.com/2005/10/sunworld-columns-at-itworld.html)
-------------------------------

You can expect a speedup from -xarch=v8plus if your code is double-precision, vectorizable, and the compiler can unroll DO loops. A large number of temporary variables need to be stored in registers in the unrolled loop to hide load-use latencies. A version of the Linpack DP1000 benchmark went 70 percent faster with this option, which is the most you can expect. Single precision code shows no speedup, as there are already 32 single-precision registers.

The performance improvement obtained using the above options with -xarch=v8 and -xarch=v8plus on each component of SPECfp92 varied from 0 percent in several cases to a best case of 29 percent. The geometric mean increased by 11 percent. It is rare for one loop to dominate an application, so a mixture of accelerated and unaccelerated loops gives rise to a varying overall speedup. The potential for speedup increases with the highest optimization levels and should increase over time as the compiler improves its optimization strategies.

I have not seen a significant speedup on typical C code. In general, don't waste time trying -xarch=v8plus with the C compiler.

The compiler's code generator has many more options. The ones I described usually make a significant difference. In a few cases the profile feedback is useful as well. The highest level of optimization is now -xO5, and it should only ever be used in conjunction with a collected profile, so the code generator knows which loops to optimize aggressively. You simply compile with -xO4 -xprofile=collect, run the program, then recompile with -xO5 -xprofile=use. This is easy to set up on small benchmarks, but trickier with large apps.

--------------------------

I don't think the fundamentals have changed since I wrote this. The compilers are a bit better, but -xarch=v8plus is only going to improve DP floating point code.

Adrian

On 7/12/06, Roland Mainz <[EMAIL PROTECTED]> wrote:
Hi!

----

While looking at the "isaexec" machinery which selects the ksh93 binary to be used, I realised that it uses /usr/bin/i86/ksh93 on a Solaris version whose minimum supported CPU is the Intel Pentium-1 (or higher) ...

... my proposal would be to bump the default optimisation settings up to match the minimum supported hardware for better performance. The question is: which flags should we use?

- For SPARC we have the choice between the currently used "sparcv8" and "sparcv8plus". "sparcv8plus" has the advantage of native 64bit registers, but there is a warning in http://cvs.opensolaris.org/source/xref/on/usr/src/lib/libc/sparc/threads/sparc.il#58 which makes me worry about binary compatibility:
-- snip --
We can't compile the 32-bit libc with -xarch=v8plus because then %g5 would become a scratch register and we would break 32-bit applications that use %g5 as an invariant register.
-- snip --
... which simply raises the question whether we can mix "sparcv8" and "sparcv8plus" code in one application. I guess the answer is "no"... but I am not sure, and I can't remember anymore since I have now been awake for more than 96h... ;-(
Does anyone know whether a "sparcv8" application can link to a library which was compiled with "sparcv8plus"?

- For x86 I think we can use "-xarch=pentium_pro" (or -xtarget=pentium_pro ?) and move the binary to /usr/bin/pentium_pro/ksh93 ... are there any objections to that?

----

Bye,
Roland

--
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org