Hello, maybe this is the better list to post the problem (see below).
Regards,
Ralf

On Wednesday, 26. September 2007 18:23:34 Ralf Lübben wrote:
> Ok,
>
> the problem seems to be the pow() function. If I instead use the function
> gsl_pow_int(double x, int n) from the gsl library, the performance on the
> x86_64 machine is much better.
> I call the pow function with the following values:
>
> pow(5.0,-3.0);
> pow(10.0,-3.0);
> pow(15.0,-3.0);
> pow(20.0,-3.0);
>
> The problem also occurs with gcc 4.2.1, but not with the x86 Ubuntu Feisty
> Fawn distribution on the x86_64 machine.
> Sorry for the earlier misinformation.
>
> Ralf
>
> On Wednesday, 26. September 2007 16:35:50 Ralf Lübben wrote:
> > Hi,
> >
> > I have just tried two other setups on the x86_64 machine:
> >
> > 1. Ubuntu Feisty Fawn (gcc 4.1.2) server x86:
> > - Expected performance: about two times faster than on my notebook
> >
> > 2. Ubuntu Gutsy Gibbon (gcc 4.2.1) server x86:
> > - nearly the same performance as "Ubuntu Feisty Fawn (gcc 4.1.2) server
> > x86"
> > - Expected performance: about two times faster than on my notebook
> >
> > Was there a change from gcc 4.1.2 to gcc 4.2.1 which could explain that?
> > Or is there anything else which could explain that?
> >
> > Ralf
> >
> > On Wednesday, 26. September 2007 10:35:20 Ralf Lübben wrote:
> > > Hello,
> > >
> > > over the last few days I ran a simulation on an x86_64 architecture:
> > > ###################
> > > processor : 0
> > > vendor_id : GenuineIntel
> > > cpu family : 15
> > > model : 6
> > > model name : Genuine Intel(R) CPU 3.20GHz
> > > stepping : 8
> > > cpu MHz : 3192.081
> > > cache size : 8192 KB
> > > physical id : 0
> > > siblings : 2
> > > core id : 0
> > > cpu cores : 2
> > > fpu : yes
> > > fpu_exception : yes
> > > cpuid level : 6
> > > wp : yes
> > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> > > mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
> > > nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cid cx16 xtpr lahf_lm
> > > bogomips : 6390.34
> > > clflush size : 64
> > > cache_alignment : 128
> > > address sizes : 40 bits physical, 48 bits virtual
> > > power management:
> > > #####################
> > >
> > > with very poor performance.
> > >
> > > I ran the same simulations on my notebook:
> > >
> > > ######################
> > > processor : 0
> > > vendor_id : AuthenticAMD
> > > cpu family : 6
> > > model : 8
> > > model name : mobile AMD Athlon(tm) XP 2000+
> > > stepping : 1
> > > cpu MHz : 797.820
> > > cache size : 256 KB
> > > fdiv_bug : no
> > > hlt_bug : no
> > > f00f_bug : no
> > > coma_bug : no
> > > fpu : yes
> > > fpu_exception : yes
> > > cpuid level : 1
> > > wp : yes
> > > flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca
> > > cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow ts fid vid
> > > bogomips : 1596.37
> > > clflush size : 32
> > > #######################
> > >
> > > The same simulation is about 10 times faster on my notebook.
> > > The simulation was compiled with "-O3 -ffast-math"; without
> > > "-ffast-math" the performance on the x86_64 architecture is much worse.
> > > I used gcc 4.1.2 on Ubuntu; the simulator is Omnet++.
> > >
> > > There was already a post about this topic on AMD machines:
> > > http://gcc.gnu.org/ml/gcc-help/2006-05/msg00185.html
> > >
> > > I also found that one problem is the pow() function; maybe
> > > there are more functions with poor performance on x86_64 machines.
> > >
> > > Does anyone have an idea about the reasons, or how to improve the
> > > performance on x86_64 machines?
> > >
> > > Thanks.
> > >
> > > Regards,
> > > Ralf