As I said about 10 years ago here:
http://www.itworld.com/Comp/2102/UIR960701perf/
Which can be found more easily here:
http://perfcap.blogspot.com/2005/10/sunworld-columns-at-itworld.html

-------------------------------
You can expect a speedup from -xarch=v8plus if your code is
double-precision, vectorizable, and the compiler can unroll DO loops.
A large number of temporary variables need to be stored in registers
in the unrolled loop to hide load-use latencies. A version of the
Linpack DP1000 benchmark went 70 percent faster with this option,
which is the most you can expect. Single precision code shows no
speedup, as there are already 32 single-precision registers. The
performance improvement obtained using the above options with
-xarch=v8 and -xarch=v8plus on each component of SPECfp92 varied from
0 percent in several cases to a best case of 29 percent. The geometric
mean increased by 11 percent. It is rare for one loop to dominate an
application, so a mixture of accelerated and unaccelerated loops gives
rise to a varying overall speedup. The potential for speedup increases
with the highest optimization levels and should increase over time as
the compiler improves its optimization strategies.

I have not seen a significant speedup on typical C code. In general,
don't waste time trying -xarch=v8plus with the C compiler. The
compiler's code generator has many more options. The ones I described
usually make a significant difference. In a few cases the profile
feedback is useful as well. The highest level of optimization is now
-xO5, and it should only ever be used in conjunction with a collected
profile, so the code generator knows which loops to optimize
aggressively. You simply compile with -xO4 -xprofile=collect, run the
program, then recompile with -xO5 -xprofile=use. This is easy to set up
on small benchmarks, but trickier with large apps.
--------------------------
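
To make the point about temporaries concrete, here is a rough sketch in
C (not from the original column) of a double-precision loop unrolled by
four, roughly the shape a compiler might produce. The function name
daxpy_unrolled4 and the unroll factor are illustrative assumptions. Each
pass keeps eight double-precision temporaries live, which is where the
extra double-precision registers available with -xarch=v8plus can help;
the real compiler's unrolling and scheduling will differ.

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i], hand-unrolled by four.
 * All loads are issued before any result is needed, so the load-use
 * latency can overlap with other work. That only pays off if the
 * eight temporaries below all fit in floating point registers. */
void daxpy_unrolled4(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Loads first: eight double-precision temporaries. */
        double x0 = x[i],     x1 = x[i + 1];
        double x2 = x[i + 2], x3 = x[i + 3];
        double y0 = y[i],     y1 = y[i + 1];
        double y2 = y[i + 2], y3 = y[i + 3];

        /* Arithmetic and stores afterwards. */
        y[i]     = y0 + a * x0;
        y[i + 1] = y1 + a * x1;
        y[i + 2] = y2 + a * x2;
        y[i + 3] = y3 + a * x3;
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

With more unrolling, or more arrays per loop, the number of live
temporaries grows quickly, and in double precision the smaller register
set fills up sooner than it does in single precision.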

I don't think the fundamentals have changed since I wrote this. The
compilers are a bit better, but -xarch=v8plus is only going to improve
DP floating point code.
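
As a back-of-the-envelope check on what that means for a whole
application, the usual Amdahl-style estimate can be sketched in C as
below. The function name overall_speedup is hypothetical; f is the
fraction of run time spent in loops that benefit and s is the speedup of
those loops (1.7 would correspond to the 70 percent Linpack case quoted
above). Treat it as a rough model, not a measurement.

```c
/* Hypothetical helper: estimated whole-program speedup when a fraction
 * f (0.0 to 1.0) of the original run time is accelerated by a factor s
 * and the remaining (1 - f) runs at its original speed. */
double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

For example, if only half of the run time is in DP loops that get the
full 1.7x, the estimate for the whole program is about 1.26x, and code
with no such loops (f = 0) gets nothing.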

Adrian

On 7/12/06, Roland Mainz <[EMAIL PROTECTED]> wrote:

Hi!

----

While looking at the "isaexec" machinery which selects the ksh93 binary
to be used, I realised that it uses /usr/bin/i86/ksh93 on a Solaris
version whose minimum supported CPU is the Intel Pentium-1 (or higher) ...

... my proposal would be to bump the default optimisation settings up to
match the minimum supported hardware for better performance - the
question is: Which flags should we use?

- For SPARC we have the choice between the currently used "sparcv8" and
"sparcv8plus" - "sparcv8plus" has the advantage of native 64-bit
registers but there is a warning in
http://cvs.opensolaris.org/source/xref/on/usr/src/lib/libc/sparc/threads/sparc.il#58
which makes me worry about binary compatibility:
-- snip --
We can't compile the 32-bit libc with -xarch=v8plus because
then %g5 would become a scratch register and we would break
32-bit applications that use %g5 as an invariant register.
-- snip --
... which simply raises the question whether we can mix "sparcv8" and
"sparcv8plus" code in one application. I guess the answer is "no"... but
I am not sure and I can't remember anymore since I have now been awake
for more than 96h... ;-(

Does anyone know whether a "sparcv8" application can link to a library
which was compiled with "sparcv8plus"?

- For x86 I think we can use "-xarch=pentium_pro" (or
-xtarget=pentium_pro?) and move the binary to
/usr/bin/pentium_pro/ksh93 ... are there any objections to that?

----

Bye,
Roland

--
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
