Re: which opt. flags go where? - references

2007-02-08 Thread Ronny Peine
Hi,

Maybe http://docs.lib.purdue.edu/ecetr/123/ would also be interesting for you.
It describes a quadratic algorithm for finding a nearly optimal set of compiler
flags. The results are quite promising, and I have also tested it on my own
benchmarking suite with good results.

cu,
Ronny Peine




Re: which opt. flags go where? - references

2007-02-10 Thread Ronny Peine
Hi,

On Thursday, 8 February 2007 13:18, you wrote:
> Thank you very much. After reading the abstract, I'm highly
> interested in this work, because they also use GCC and SPEC CPU2000,
> as I'm planning to do...
>
> Which benchmarks did you test on?
I tested it on freebench-1.03, nbench-byte-2.2.2 and a homemade LAME benchmark
(encoding a WAV file to MP3).
I don't have SPEC CPU because it is not free.
The runtimes are about one day per benchmark when testing nearly all of
gcc's optimization CFLAGS (tested with 3.4.6 and 4.1.1). So the
quadratic nature of the algorithm can be quite painful, but it gives better
results than the linear approach.
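To make that cost concrete (a rough estimate added here, not part of the
original mail): eliminating one flag per round from n candidate flags needs
on the order of

$$n + (n-1) + \cdots + 1 = \frac{n(n+1)}{2}$$

benchmark runs in the worst case, so with 15 to 20 candidate flags and runs
of several minutes each, a total runtime of about a day per benchmark is
what one would expect.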

cu,
Ronny Peine




Re: __builtin_cpow((0,0),(0,0))

2005-03-07 Thread Ronny Peine
Well, I'm studying mathematics, and as far as I know 0^0 is always 1 (for
real and complex numbers) and well defined, even in numerical and
theoretical mathematics. Could you point me to some publications that
say otherwise?

cu, Ronny
Duncan Sands wrote:
On Mon, 2005-03-07 at 10:51 -0500, Robert Dewar wrote:
Paolo Carlini wrote:
Andrew Haley wrote:

F9.4.4 requires pow (x, 0) to return 1 for any x, even NaN.

Indeed. My point, basically, is that consistency appears to require the
very same behavior for *complex* zero^zero.
I am not sure, it looks like the standard is deliberately vague here,
and is not requiring this result.

Mathematically speaking zero^zero is undefined, so it should be NaN.
This is already clear for real numbers: consider x^0 where x decreases
to zero.  This is always 1, so you could deduce that 0^0 should be 1.
However, consider 0^x where x decreases to zero.  This is always 0, so
you could deduce that 0^0 should be 0.  In fact the limit of x^y
where x and y decrease to 0 does not exist, even if you exclude the
degenerate cases where x=0 or y=0.  This is why there is no reasonable
mathematical value for 0^0.
Ciao,
Duncan.
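A worked example, added here for illustration (it is not part of the original
exchange), makes the path-dependence explicit: write x^y = exp(y*ln(x)) and
approach the origin along the non-degenerate path x = e^{-c/y}, y -> 0+, for
a constant c > 0:

$$x^y = e^{y \ln x} = e^{y \cdot (-c/y)} = e^{-c},$$

so the limit along this path is e^{-c}; any value in (0,1) can be reached by
choosing the path, which is why no limit exists at the origin.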



Re: __builtin_cpow((0,0),(0,0))

2005-03-07 Thread Ronny Peine
Hi again,
a small proof.
If A and X are real numbers and A > 0, then
A^X := exp(X*ln(A)) (the definition used in analysis).
0^0 = lim_{A->0, A>0} exp(0*ln(A)) = 1, if exp(X*ln(A)) is continuously extended.
The complex case can be derived from this (0^(0+ib) = 0^0*0^ib = 1 =
0^a*0^(i*0)).
I only know the German mathematical terms, so the English translations may
not be accurate; sorry for that :)

cu, Ronny


Re: __builtin_cpow((0,0),(0,0))

2005-03-07 Thread Ronny Peine
Hi,
Marcin Dalecki wrote:
On 2005-03-08, at 01:47, Ronny Peine wrote:
Hi again,
a small proof.

How cute.
If A and X are real numbers and A > 0, then
A^X := exp(X*ln(A)) (the definition used in analysis).
0^0 = lim_{A->0, A>0} exp(0*ln(A)) = 1, if exp(X*ln(A)) is continuously extended.

The complex case can be derived from this (0^(0+ib) = 0^0*0^ib = 1 =
0^a*0^(i*0)).
I only know the German mathematical terms, so the English translations may
not be accurate; sorry for that :)

You managed to hide the proof very well. I can't find it.
I don't think it's hidden. The definition given above is absolutely right.


Re: __builtin_cpow((0,0),(0,0))

2005-03-07 Thread Ronny Peine

Joe Buck wrote:
On Tue, Mar 08, 2005 at 01:47:13AM +0100, Ronny Peine wrote:
Hi again,
a small proof.
if A and X are real numbers and A>0 then
A^X := exp(X*ln(A)) (Definition in analytical mathematics).

That is an incomplete definition, as 0^X is well-defined.

0^0 = lim_{A->0, A>0} exp(0*ln(A)) = 1, if exp(X*ln(A)) is continuously extended.

Your proof is wrong; since you even propose it you probably have not been
exposed to partial differential equations.  You have a two-dimensional
plane; you can approach the origin from any direction.
The direction you chose was to keep the exponent constant at 0.  Then
you get a limit of 1.
An alternate choice is to keep the base constant at 0, choose a positive
exponent and let it approach zero.  Then you get a limit of 0.
Well, then it would be lim x->0 (0^x) = 1 because 0^x is 1 for every x 
element of |R_>0


Re: __builtin_cpow((0,0),(0,0))

2005-03-07 Thread Ronny Peine

Ronny Peine wrote:

Joe Buck wrote:
On Tue, Mar 08, 2005 at 01:47:13AM +0100, Ronny Peine wrote:
Hi again,
a small proof.
if A and X are real numbers and A>0 then
A^X := exp(X*ln(A)) (Definition in analytical mathematics).

That is an incomplete definition, as 0^X is well-defined.

0^0 = lim_{A->0, A>0} exp(0*ln(A)) = 1, if exp(X*ln(A)) is continuously extended.

Your proof is wrong; since you even propose it you probably have not been
exposed to partial differential equations.  You have a two-dimensional
plane; you can approach the origin from any direction.
The direction you chose was to keep the exponent constant at 0.  Then
you get a limit of 1.
An alternate choice is to keep the base constant at 0, choose a positive
exponent and let it approach zero.  Then you get a limit of 0.
Well, then it would be lim x->0 (0^x) = 1 because 0^x is 1 for every x 
element of |R_>0

Sorry for this, maybe I should sleep :) (it's 2 o'clock here).
But as far as I know, 0^0 was defined as 1 in every lecture I have had so far.


Re: __builtin_cpow((0,0),(0,0))

2005-03-07 Thread Ronny Peine
Well, these were math lectures (Analysis 1, 2 and 3, Function Theory,
Numerical Mathematics and so on). In every lecture it was defined as 1,
and in most cases mathematical expressions are transformed into equivalent
calculations for the FPU anyway (even though, for example, associativity is
not preserved).

I don't know of any standard which defines this as 0.
Robert Dewar wrote:
Ronny Peine wrote:
Sorry for this, maybe i should sleep :) (It's 2 o'clock here)
But as i know of 0^0 is defined as 1 in every lecture i had so far.

Were these math classes, or CS classes.
Generally when you have a situation like this, where the value of
the function is different depending on how you approach the limit,
you prefer to simply say that the function is undefined at that
point. As we have discussed, computers, which are not doing real
arithmetic in any case, often extend domains for convenience, as
in this case.



Re: __builtin_cpow((0,0),(0,0))

2005-03-07 Thread Ronny Peine
Hi again,
a small example often used in mathematics and electrical engineering:
the geometric series ("Reihe" in German):

sum from k=0 to infinity of q^k = 1/(1-q) for |q| < 1.
This is also correct for q = 0, where the sum gives q^0 + q^1 + q^2 + ... =
1 + 0 + 0 + ... (if 0^0 = 1), and 1/(1-q) = 1 too.
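Written out compactly (a restatement added here for clarity):

$$\sum_{k=0}^{\infty} q^k = \frac{1}{1-q} \quad (|q| < 1); \qquad \text{at } q = 0:\quad \underbrace{0^0}_{=\,1} + 0 + 0 + \cdots = 1 = \frac{1}{1-0}.$$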
I have read some parts of IEEE 754 and articles about this, but they say
that 0^0 is very questionable; some say it's 0, some say it's 1.
So according to the standard there doesn't seem to be a definite answer; in
mathematics it's 1 (and will always be 1).

cu,
Ronny
Ronny Peine wrote:
Well, these were math lectures (Analysis 1,2 and 3, Function Theory, 
Numerical Mathematics and so on). In every lectures it was defined as 1 
and in most cases mathematical expressions are mostly tried to transform 
in equivalent calculations for the FPU (even though associativity is for 
example not preserved).

I don't know of any standard which defines this to 0.
Robert Dewar wrote:
Ronny Peine wrote:
Sorry for this, maybe i should sleep :) (It's 2 o'clock here)
But as i know of 0^0 is defined as 1 in every lecture i had so far.

Were these math classes, or CS classes.
Generally when you have a situation like this, where the value of
the function is different depending on how you approach the limit,
you prefer to simply say that the function is undefined at that
point. As we have discussed, computers, which are not doing real
arithmetic in any case, often extend domains for convenience, as
in this case.




Re: __builtin_cpow((0,0),(0,0))

2005-03-07 Thread Ronny Peine
Maybe I found something:
http://www.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps
page 9 says:
"A number of real expressions are sometimes implemented as INVALID
by mistake, or declared Undefined by illconsidered
language standards; a few examples are ...
0.0**0.0 = inf**0.0 = NaN**0.0 = 1.0, not Nan;"
I'm not really sure whether he means it should be 1.0 or NaN, but I think
he means 1.0.

Ronny Peine wrote:
Hi again,
a small example often used in mathematics and electronic engineering:
the geometric row ("Reihe" in german, i don't know the correct 
expression in english):

sum from k=0 to +unlimited q^k = 1/(1-q) if |q|<1.
this is also correct for q=0 where the sum gives q^0+q^1+q^2+...= 1 + 0 
+ 0 + ... (if 0^0 = 1) and 1/(1-q) = 1 too.
I have read some parts in ieee 754 and articles about this but they say
that 0^0 is very questionable, some say it's 0 some say it's 1.
Well, after the standard there doesn't seem to be an accurate answer, in 
mathematics it's 1 (and will always be 1).

cu,
Ronny
Ronny Peine wrote:
Well, these were math lectures (Analysis 1,2 and 3, Function Theory, 
Numerical Mathematics and so on). In every lectures it was defined as 
1 and in most cases mathematical expressions are mostly tried to 
transform in equivalent calculations for the FPU (even though 
associativity is for example not preserved).

I don't know of any standard which defines this to 0.
Robert Dewar wrote:
Ronny Peine wrote:
Sorry for this, maybe i should sleep :) (It's 2 o'clock here)
But as i know of 0^0 is defined as 1 in every lecture i had so far.


Were these math classes, or CS classes.
Generally when you have a situation like this, where the value of
the function is different depending on how you approach the limit,
you prefer to simply say that the function is undefined at that
point. As we have discussed, computers, which are not doing real
arithmetic in any case, often extend domains for convenience, as
in this case.




Re: __builtin_cpow((0,0),(0,0))

2005-03-08 Thread Ronny Peine
Well, this article was referenced by http://grouper.ieee.org/groups/754/,
so I don't think it's an unreliable source.

It would be nice if you didn't try to insult me, Joe Buck; that's not
very productive.

Robert Dewar wrote:
Marcin Dalecki wrote:
Are we a bit too obedient today? Look I was talking about the paper 
presented
above not about the author there of.

But a paper like this must be read in context, and if you don't
know who the author is, you
a) don't have the context to read the paper
b) you show yourself to be remarkably ignorant about the field



Re: [OT] __builtin_cpow((0,0),(0,0))

2005-03-08 Thread Ronny Peine
Well, you are right, this discussion is getting a bit off topic.
I think 0^0 should be 1 in the complex case, too; otherwise the complex
and real definitions would be inconsistent.
Example:
take the complex number 0 + i*0; it should be handled equivalently to the
real number 0. Otherwise a programmer would be quite surprised when
converting real numbers into the equivalent complex numbers (a -> a + i*0).
Paolo Carlini wrote:
Chris Jefferson wrote:
What we are debating here isn't really maths at all, just the
definition which will be most useful and least surprising (and perhaps
also what various standards tell us to use).

Also, since we are definitely striving to consistently implement the 
current C99 and C++ Standards, it's *totally* pointless discussing 0^0 
in the real domain: it *must* be one. Please, people, don't overflow 
the gcc development list with this kind of discussion. I feel guilty 
because of that, by the way: please, accept my apologies. My original 
question was *only* about consistency between the real case (pow) and 
the complex case (cpow, __builtin_cpow, std::complex::pow).

Paolo.




Re: __builtin_cpow((0,0),(0,0))

2005-03-08 Thread Ronny Peine
Maybe I should make it clearer why 0^x is not defined for real exponents x,
and is not continuous in any way.

Let G be a set ("Menge" in German) and op : G x G -> G, (a,b) -> a op b.
If op is associative, then (G, op) is called a semigroup ("Halbgruppe").
Exponentiation is then defined as follows:
for a from G and n from |N>0:
a^1 = a; a^n = a op a^(n-1).
If G contains a neutral element (usually called the "1", making G a monoid),
then a^0 is defined as 1.

Example: (Z, +) is a semigroup (it's even a group). There, a^n = a + a
+ a + ... + a (n times).

For real exponents this definition does not apply (example: what would
2^pi be?), so a definition consistent with the previous one is used:
for A, X from |R, A > 0:
A^X = exp(X*ln(A)).

With exp(N*X) = exp(X)^N (which can be proved by induction) one sees that
this agrees with the previous definition (if X is from |N).

The rule a^(1/n) = n-th root of a follows from:
let a from |R, a > 0, p from Z and q from |N>1; then
a^p = exp(p*ln(a)) = exp(q*(p/q)*ln(a)) = exp((p/q)*ln(a))^q = (a^(p/q))^q,
hence a^(p/q) = q-th root of a^p (note that this only holds for a > 0).

For 0^x there is no such definition except when x is from |N. Therefore
0^0 is defined, according to the first rule, as 1 (because we look at the
monoid (|R, *) with a^n = a*a*a* ... *a (n times) and neutral element 1,
so a^0 = 1 for every element of |R).

I hope this makes things clearer for those who don't believe that 0^0 = 1
in the real case.
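Put compactly (a summary added here, using the two definitions above):

$$\text{In a monoid } (M, \cdot) \text{ with neutral element } e:\quad a^0 := e \ \text{ for every } a \in M; \qquad \text{for } A > 0:\quad A^X := e^{X \ln A}.$$

The analytic definition simply does not apply at A = 0, while the algebraic
one gives 0^0 = 1 in (|R, *).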

cu, Ronny
Robert Dewar wrote:
Ronny Peine wrote:
Well this article was referenced by 
http://grouper.ieee.org/groups/754/, so i don't think it's an 
unreliable source.

Since Kahan is one of the primary movers behind 754 that's not so 
surprising.
For me, 754 is authoritative significantly because of this connection.
If there were a case where Kahan disagreed with 754, I would suspect
that the standard had made a mistake :-)




Re: __builtin_cpow((0,0),(0,0))

2005-03-08 Thread Ronny Peine
This proof is absolutely correct and in no way bogus; it is taught to
nearly every mathematics student, period.
But you are right: if the standard handles this differently, then the proof
doesn't help in any case.

Robert Dewar wrote:
Ronny Peine wrote:
I hope that this make things clearer for some who don't believe 0^0 = 
1 in the real case.

Believe??? So now it's a matter of religion. Anyway, your bogus proof is
irrelevant for the real case, since the language standard is clear in
the real case anyway. It really is completely pointless to argue this
from a mathematical point of view, the only proper viewpoint is that
of the standard. You would do better to go read that!



Re: __builtin_cpow((0,0),(0,0))

2005-03-16 Thread Ronny Peine
Hi,
Kai Henningsen wrote:
[EMAIL PROTECTED] (Robert Dewar)  wrote on 07.03.05 in <[EMAIL PROTECTED]>:

Ronny Peine wrote:

Sorry for this, maybe i should sleep :) (It's 2 o'clock here)
But as i know of 0^0 is defined as 1 in every lecture i had so far.
Were these math classes, or CS classes.

Let's just say that this didn't happen in any of the German math classes I  
ever took, school or uni. This is in fact a classic example of this type  
of behaviour.


Generally when you have a situation like this, where the value of
the function is different depending on how you approach the limit,
you prefer to simply say that the function is undefined at that
point.

And that's how it was always taught to me.
Well, yes, in the general case that is the right way. But in some
special cases a convention is used to simplify mathematical theorems, as
is done for 0^0 = 1 or gcd(0,0,...,0) = 0. See for example:
http://mathworld.wolfram.com/ExponentLaws.html

In any case, gcc returns 1 for pow(0.0, 0.0) in version 3.4.3, like many
other C compilers do. The same behaviour would be expected from cpow.


This is, of course, a different question from what a library should  
implement ... though I must say if I were interested in NaNs at all for a  
given problem, I'd be disappointed by any such library that didn't return  
a NaN for 0^0, and of any language standard saying so - I'd certainly  
consider a result of 1 wrong in the general case.

MfG Kai


cu, Ronny


Re: __builtin_cpow((0,0),(0,0))

2005-03-17 Thread Ronny Peine

Dave Korn wrote:
Original Message
From: Ronny Peine
Sent: 16 March 2005 17:34

See for example:
http://mathworld.wolfram.com/ExponentLaws.html

  Ok, I did.

Even though, gcc returns 1 for pow(0.0,0.0) in version 3.4.3 like many 
other c-compiler do. The same behaviour would be expected from cpow.

  No, you're wrong (that the same behaviour would be expected from cpow).
See for example:
http://mathworld.wolfram.com/ExponentLaws.html
" Note that these rules apply in general only to real quantities, and can
give manifestly wrong results if they are blindly applied to complex
quantities. "

Well, yes, in the general case it's not applicable, but x^0 is 1 in the
complex case, too. And if 0^0 is carried over from the real to the complex
domain (the reals are even a subset of the complex numbers), then the same
behaviour would be expected; otherwise the definition wouldn't be well chosen.

Has anyone found anything about this in the IEEE 754 standard? I don't have
a copy here right now; well, it's not free of charge. Otherwise this
discussion won't end.

cheers,
  DaveK
cu, Ronny


Performance comparison of gcc releases

2005-12-15 Thread Ronny Peine
Hi,

I thought this might interest you. I have done some benchmarking to compare
different gcc releases. For this, I have written a bash-based benchmark suite
which compiles and runs mostly C benchmarks.

Benchmark environment:

The benchmarks run on an i686-pc-linux-gnu (Gentoo-based) system
with an Athlon XP 2600+ (Barton core, 512 KByte L2 cache) at 1.9 GHz and
512 MB RAM. All benchmarks run with nice -n -20 to minimize noise,
and only very few processes are running while benchmarking (only
kernel, agetty, bash, udev, login and syslog).

The benchmarks:

I'm using freebench-1.03, nbench-byte-2.2.2 and a homemade lamebench-3.96.1
based on lame-3.96.1.
In the LAME benchmark a WAV file is compressed to an MP3 file and the time
taken is measured.

The benchmark procedure:

A bash script named startbenchrotation starts the given benchmarks with the
following command lines:

startfreebench-1.03()
{
make distclean >/dev/null 2>&1
nice -n -20 make ref >/dev/null 2>&1
}

startnbench-byte-2.2.2()
{
make mrproper >/dev/null 2>&1
make >/dev/null 2>&1
nice -n -20 ./nbench 2>/dev/null > ./resultfile.txt
}

startlamebench-3.96.1()
{
rm -rf lamebuild
rm -f testfile.mp3
mkdir lamebuild || error "Couldn't mkdir lamebuild"
cd lamebuild
../lame-3.96.1/configure >/dev/null 2>&1
make >/dev/null 2>&1
START=`date +%s`
nice -n -20 frontend/lame -m s --quiet -q 0 -b 128 --cbr ../testfile.wav ../testfile.mp3
END=`date +%s`
cd ..
echo "$((${END}-${START}))" >./resultfile.txt
}

Each benchmark is run with a combination of CFLAGS.
The CFLAGS are composed of base flags and testing flags, e.g.:
BASEFLAGS="-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe"
TESTINGFLAGS="-fforce-addr|-fsched-spec-load|-fmove-all-movables|-freduce-all-givs|-ffast-math|-ftracer|
-funroll-loops|-funroll-all-loops|-fprefetch-loop-arrays|-mfpmath=sse|-mfpmath=sse,387|
-momit-leaf-frame-pointer"
'|' is used as a field separator. First, all flags from the base flags and
testing flags are combined, the benchmark is run, and the result is taken as
the best result so far.
Then one flag is removed from the testing flags and the benchmark is repeated.
If the arithmetic average of the results of the repeated benchmark is better
than the arithmetic average of the best results so far, the removed flag is
noted as a candidate for the worst flag. This is done for every flag in the
testing flags, and the worst flag of all is then filtered out of the testing
flags.
After that, the whole procedure is started again with the new testing flags,
without the filtered one; a sketch of this loop follows below.
(This heuristic approach to flag selection was described on the gcc mailing
list some months ago; see "Compiler Optimization Orchestration for Peak
Performance" by Zhelong Pan and Rudolf Eigenmann.)

The protagonists:

The tested compilers were: gcc-3.3.6 (gentoo system compiler), gcc-3.4.4 
(gentoo system compiler) and gcc-4.0.2 (FSF release).

The results:

All results are given as relative performance measures, with gcc-3.3.6 as
the baseline.
A value of x% > 0% means that the given compiler generated code that was
x% faster than the code from gcc-3.3.6; x% <= 0% means the generated
code was slower. All ratios are based on the arithmetic average over all
passes of a benchmark for the best result achieved.

benchmark     gcc-3.3.6   gcc-3.4.4   gcc-4.0.2

freebench     -           +1%         -5%
nbench        -           +13%        +11%
lamebench     -           +1%         +1%


Conclusion:

Well, this benchmark suite is only meant for some comparisons between
different compilers, to estimate performance in real-life applications.
If you are interested in future benchmarks of newer releases, I can offer
this service; if you think it is uninteresting, I won't send any more
benchmark results. I hope this will help track the performance of code
generated by gcc and help make gcc better in this respect. Constructive
criticism is always welcome. I hope you guys keep up your work on improving
gcc.

Thanks for reading,
Ronny Peine


Performance comparison of gcc releases

2005-12-15 Thread Ronny Peine
Hi,

I forgot to post the best CFLAGS for each gcc version and benchmark.
Here are the results:

gcc-3.3.6:
nbench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe 
-fforce-addr -fsched-spec-load -fmove-all-movables -ffast-math -ftracer 
-funroll-loops -funroll-all-loops -mfpmath=sse -momit-leaf-frame-pointer

freebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fmove-all-movables -freduce-all-givs -ftracer 
-funroll-all-loops -fprefetch-loop-arrays -mfpmath=sse 
-momit-leaf-frame-pointer

lamebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fmove-all-movables -freduce-all-givs -funroll-loops 
-funroll-all-loops -mfpmath=sse -mfpmath=sse,387 -momit-leaf-frame-pointer


gcc-3.4.4:
nbench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fsched2-use-superblocks -fsched2-use-superblocks 
-fsched2-use-traces -fmove-all-movables -ffast-math -funroll-loops 
-funroll-all-loops -fpeel-loops -fold-unroll-loops 
-fbranch-target-load-optimize2 -mfpmath=sse -mfpmath=sse,387

freebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fsched2-use-superblocks -fsched2-use-superblocks 
-fsched2-use-traces -freduce-all-givs -ffast-math -ftracer -funroll-loops 
-funroll-all-loops -fpeel-loops -fold-unroll-loops -fold-unroll-all-loops 
-fbranch-target-load-optimize -fbranch-target-load-optimize2 -mfpmath=sse 
-mfpmath=sse,387 -momit-leaf-frame-pointer

lamebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fsched2-use-superblocks -fsched2-use-superblocks 
-fsched2-use-traces -fmove-all-movables -freduce-all-givs -ftracer 
-funroll-loops -funroll-all-loops -fpeel-loops -fold-unroll-loops 
-fold-unroll-all-loops -fbranch-target-load-optimize 
-fbranch-target-load-optimize2 -mfpmath=sse -mfpmath=sse,387 
-momit-leaf-frame-pointer


gcc-4.0.2:
nbench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fmodulo-sched -fgcse-sm -fgcse-las -fsched-spec-load -ftree-vectorize 
-ftracer -funroll-loops -fvariable-expansion-in-unroller 
-fprefetch-loop-arrays -freorder-blocks-and-partition -fweb -ffast-math 
-fmove-loop-invariants -fbranch-target-load-optimize 
-fbranch-target-load-optimize2 -fbtr-bb-exclusive -momit-leaf-frame-pointer 
-D__NO_MATH_INLINES

freebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fmodulo-sched 
-fsched-spec-load -freschedule-modulo-scheduled-loops -ftree-vectorize 
-ftracer -funroll-loops -fvariable-expansion-in-unroller 
-fprefetch-loop-arrays -freorder-blocks-and-partition -fmove-loop-invariants 
-fbranch-target-load-optimize -fbranch-target-load-optimize2 
-fbtr-bb-exclusive -momit-leaf-frame-pointer -D__NO_MATH_INLINES

lamebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fgcse-sm 
-fgcse-las -fsched-spec-load -fsched2-use-superblocks -fsched2-use-traces 
-freschedule-modulo-scheduled-loops -ftracer -funroll-loops 
-fvariable-expansion-in-unroller -freorder-blocks-and-partition -fweb 
-ffast-math -fpeel-loops -fmove-loop-invariants -fbranch-target-load-optimize 
-fbranch-target-load-optimize2 -fbtr-bb-exclusive -mfpmath=sse 
-mfpmath=sse,387 -momit-leaf-frame-pointer -D__NO_MATH_INLINES


The time for one benchmark with one compiler ranges from 6 to 48 hours and
depends heavily on the given testing flags (the algorithm used for
flag filtering is O(n^2)).

The testing flags for each compiler are:

gcc-3.3.6:
TESTINGFLAGS="-fforce-addr|-fsched-spec-load|-fmove-all-movables|-freduce-all-givs|-ffast-math|
-ftracer|-funroll-loops|-funroll-all-loops|-fprefetch-loop-arrays|-mfpmath=sse|-mfpmath=sse,387|
-momit-leaf-frame-pointer"

gcc-3.4.4:
TESTINGFLAGS="-fforce-addr|-fsched-spec-load|-fsched2-use-superblocks|
-fsched2-use-superblocks -fsched2-use-traces|-fmove-all-movables|
-freduce-all-givs|-ffast-math|-ftracer|-funroll-loops|-funroll-all-loops|
-fpeel-loops|-fold-unroll-loops|-fold-unroll-all-loops|-fprefetch-loop-arrays|
-fbranch-target-load-optimize|-fbranch-target-load-optimize2|-mfpmath=sse|
-mfpmath=sse,387|-momit-leaf-frame-pointer"

gcc-4.0.2:
TESTINGFLAGS="-fforce-addr|-fmodulo-sched|-fgcse-sm|-fgcse-las|-fsched-spec-load|
-fsched2-use-superblocks -fsched2-use-traces|
-freschedule-modulo-scheduled-loops| -ftree-vectorize|
-ftracer|-funroll-loops|-fvariable-expansion-in-unroller|
-fprefetch-loop-arrays|-freorder-blocks-and-partition|-fweb|-ffast-math|-fpeel-loops|
-fmove-loop-invariants|-fbranch-target-load-optimize|-fbranch-target-load-optimize2|
-fbtr-bb-exclusive|-mfpmath=sse|-mfpmath=sse,387|-momit-leaf-frame-pointer|-D__NO_MATH_INLINES"

-ftree-loop-linear was removed from the testing flags for gcc-4.0.2 because
it leads to an endless loop in the 'neural net' test in nbench.


Re: Performance comparison of gcc releases

2005-12-16 Thread Ronny Peine
Hi,

On Friday, 16 December 2005 19:50, Sebastian Pop wrote:
> Ronny Peine wrote:
> > -ftree-loop-linear is removed from the testingflags in gcc-4.0.2 because
> > it leads to an endless loop in neural net in nbench.
>
> Could you file a bug report for this one?

Done.

cu,
Ronny Peine


Re: Performance comparison of gcc releases

2005-12-16 Thread Ronny Peine
Hi,

On Friday, 16 December 2005 19:31, Dan Kegel wrote:
> Your PR is a bit short on details.  For instance, it'd be nice to
> include a link to the source for nbench, so people don't have
> to guess what version you're using.  Was it
>   http://www.tux.org/~mayer/linux/nbench-byte-2.2.2.tar.gz
> ?
>
> It'd be even more helpful if you included a recipe a sleepy person
> could use to reproduce the problem.  In this case,
> something like
>
> wget http://www.tux.org/~mayer/linux/nbench-byte-2.2.2.tar.gz
> tar -xzvf nbench-byte-2.2.2.tar.gz
> cd nbench-byte-2.2.2
> make CC=gcc-4.0.1  CFLAGS="-ftree-loop-linear"
>
> Unfortunately, I couldn't reproduce your problem with that command.
> Can you give me any tips?
>
> Finally, it's helpful when replying to the list about filing a PR
> to include the PR number or a link to the PR.
> The shortest link is just gcc.gnu.org/PR%d, e.g.
>http://gcc.gnu.org/PR25449

Sorry, I had forgotten to give that information. It was nbench-2.2.2;
a 'make CC=gcc-4.0.2 CFLAGS="-O3 -march=... -ftree-loop-linear"'
should be enough.
The bug report is a duplicate of 20256, as I have noted in Bugzilla.
The source extracted in 20256 is nearly the same as the 'neural net'
benchmark.

The next time I write a bug report, I will pay more attention to it; sorry
again for this.

cu,
Ronny Peine


Christmas

2005-12-23 Thread Ronny Peine
Hi all,

I'm going on holiday, and I wish all of you in the gcc team a merry Christmas.
Thanks for all your work, even though it is still a bit too early for Christmas
wishes :).

cu,
Ronny Peine


Re: Very Fast: Directly Coded Lexical Analyzer

2007-06-01 Thread Ronny Peine
Hi,

my question is: why not use the element construction algorithm? The Thompson
algorithm creates an epsilon-NFA, which needs quite a lot of memory. The
element construction creates an NFA directly and therefore has fewer states.
Admittedly, this only matters for scanner generation, which is less
performance-critical than the scanner itself, but it can reduce the memory
footprint of the generator. It's a pity I can't find a URL describing the
algorithm; maybe I even have the wrong name for it. I have only seen it in
the lecture notes of a Compiler Construction course at university.
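For a rough illustration of the difference in size (an example added here;
it is assumed that "element construction" refers to what is elsewhere called
the position or Glushkov construction, which builds an epsilon-free NFA
directly from the positions of the regular expression):

$$r = (a \mid b)^{*}abb:\quad \text{textbook Thompson construction: } 11 \text{ states with } \varepsilon\text{-transitions}; \quad \text{position construction: } |\mathrm{pos}(r)| + 1 = 6 \text{ states, no } \varepsilon\text{-transitions}.$$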

cu,
Ronny




Re: Very Fast: Directly Coded Lexical Analyzer

2007-08-17 Thread Ronny Peine
On Friday, 10 August 2007, you wrote:
> To me, very fast (millions of lines a second) lexical analyzers are
> trivial to write by hand, and I really don't see the point of tools,
> and certainly not the utility of any theory in writing such code.
> If anything the formalism of a finite state machine just gets in the
> way, since it is more efficient to encode the state in the code
> location than in data.

Well, there are people out there who don't want to write the same code every
time. Why not make your life easier by using code generation tools? It also
reduces the probability of bugs.



