On 04/23/2012 11:42 AM, Ulrich Weigand wrote:
Vladimir Makarov wrote:
I have a mixed feeling with the patch. I've tried it on SPEC2000 on
x86/x86-64 and ARM. Model algorithm generates bigger code up to 3.5%
(SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in
comparison with the current algorithm. It is slower too. Although
the difference is quite insignificant on Corei7, compiler speed
slowdown achieves 0.4% on SPECFP2000 on arm. The algorithm also
generates slower code on x86 (1.5% on SPECINT and 5% on SPECFP200)
and practically the same average code on x86-64 and ARM (I've tried
only SPECINT on ARM).
Was this using NEON on ARM? The biggest benefits we saw involved
vector code ...
Sorry, No.
On 04/17/2012 04:29 AM, Richard Sandiford wrote:
Anyway, given those results and your mixed feelings, I think it would
be best to drop the patch. It's a lot of code to carry around, and its
ad-hoc nature would make it hard to change in future. There must be
better ways of achieving the same thing.
It is up to you, Richard. I don't mind if you commit it into the trunk.
There is something in your approach too. If it would be possible to get
the best of the two methods, we could see a big improvement.
On s390, the patch is pretty much a straight improvement across the line:
0.5% on SPECint, 1.7% on SPECfp overall, with the best single improvement
in GemsFDTD (9.3%), and no regression beyond usual noise levels.
That is a really big improvement.
Given that, and the impressive improvements we saw on some ARM benchmarks
involving vector code (up to 70%), it would really be a pity to see the
patch going nowhere ...
Is there anything we can do to help make the patch more acceptable?
Any suggestions welcome ...
I did not object the patch. As I wrote Richard, it is up to him to
decide to commit the patch or not (it still is). I had mixed results on
x86-64 -- some better and some worse (but with average the same) with
pretty big variations. Those big variations mean that people could use
this for fine-tuning their programs.
Taking your results for S390 and ARM with Neon into account, I guess it
should be included and probably made by default for these 2 targets (for
sure for s390).