------- Comment #24 from lucier at math dot purdue dot edu  2008-01-21 22:43 
-------
Subject: Re:  [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in
floating-point code


On Jan 21, 2008, at 2:21 PM, ubizjak at gmail dot com wrote:

> It is not possible to create an executable from direct.i.

That's correct, sorry.

> Could you attach the source that can be used to create the executable?

Here are instructions on how to build and test a modified version of  
Gambit, from which I derived direct.i.

Download the file

http://www.math.purdue.edu/~lucier/gcc/test-files/bugzilla/33928/gambc-v4_1_2.tgz

Build it with the following commands:

> tar zxf gambc-v4_1_2.tgz
> cd gambc-v4_1_2
> ./configure CC='/pkgs/gcc-mainline/bin/gcc -save-temps'
> make -j

If you want to recompile the source after reconfiguring, do

> make mostlyclean

rather than 'make clean', unfortunately.

Then test it with

> gsi/gsi -e '(define a (time (expt 3 10000000)))(define b (time (* a a)))'

The output ends with something like

> (time (##bignum.make (##fixnum.quotient result-length (##fixnum.quotient ##bignum.adigit-width ##bignum.fdigit-width)) #f #f))
>     4 ms real time
>     5 ms cpu time (3 user, 2 system)
>     no collections
>     3962448 bytes allocated
>     968 minor faults
>     no major faults
> (time (##make-f64vector (##fixnum.* two^n 2)))
>     5 ms real time
>     5 ms cpu time (1 user, 4 system)
>     1 collection accounting for 5 ms real time (1 user, 4 system)
>     33554464 bytes allocated
>     59 minor faults
>     no major faults
> (time (make-w (##fixnum.- log-two^n 1)))
>     30 ms real time
>     31 ms cpu time (17 user, 14 system)
>     no collections
>     16810144 bytes allocated
>     4097 minor faults
>     no major faults
> (time (make-w-rac log-two^n))
>     28 ms real time
>     28 ms cpu time (16 user, 12 system)
>     no collections
>     16826272 bytes allocated
>     4097 minor faults
>     no major faults
> (time (bignum->f64vector-rac x a))
>     45 ms real time
>     45 ms cpu time (20 user, 25 system)
>     no collections
>     -16 bytes allocated
>     8192 minor faults
>     no major faults
> (time (componentwise-rac-multiply a rac-table))
>     26 ms real time
>     26 ms cpu time (26 user, 0 system)
>     no collections
>     -16 bytes allocated
>     no minor faults
>     no major faults
> (time (direct-fft-recursive-4 a table))
>     445 ms real time
>     445 ms cpu time (445 user, 0 system)
>     no collections
>     64 bytes allocated
>     no minor faults
>     no major faults
> (time (componentwise-complex-multiply a a))
>     24 ms real time
>     24 ms cpu time (24 user, 0 system)
>     no collections
>     -16 bytes allocated
>     no minor faults
>     no major faults
> (time (inverse-fft-recursive-4 a table))
>     418 ms real time
>     418 ms cpu time (418 user, 0 system)
>     no collections
>     64 bytes allocated
>     no minor faults
>     no major faults
> (time (componentwise-rac-multiply-conjugate a rac-table))
>     26 ms real time
>     26 ms cpu time (26 user, 0 system)
>     no collections
>     -16 bytes allocated
>     no minor faults
>     no major faults
> (time (bignum<-f64vector-rac a result result-length))
>     108 ms real time
>     108 ms cpu time (108 user, 0 system)
>     no collections
>     112 bytes allocated
>     no minor faults
>     no major faults
> (time (* a a))
>     1170 ms real time
>     1170 ms cpu time (1105 user, 65 system)
>     1 collection accounting for 5 ms real time (1 user, 4 system)
>     71266896 bytes allocated
>     17413 minor faults
>     no major faults
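
Taken together, the forward and inverse FFT passes account for most of the time of the full multiplication in this sample run (figures copied from the output above; other runs will differ somewhat):

```latex
\frac{445\,\text{ms} + 418\,\text{ms}}{1170\,\text{ms}} = \frac{863}{1170} \approx 0.74
```

so roughly three quarters of the (* a a) time is spent in these two routines.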


The time for the routine in direct.i is the time reported for direct-fft-recursive-4:

> (time (direct-fft-recursive-4 a table))
>     445 ms real time
>     445 ms cpu time (445 user, 0 system)
>     no collections
>     64 bytes allocated
>     no minor faults
>     no major faults
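
To pull a single figure like this one out of a full run, a small awk filter could be used. This is a sketch that assumes the layout shown above (a line beginning "(time (<name> " followed by an indented "N ms real time" line, without the "> " mail quoting); the function name real_time_ms is hypothetical, and the 2>&1 in the usage line is there because I have not confirmed whether gsi writes these reports to stdout or stderr.

```shell
#!/bin/sh
# Hypothetical helper: print the "ms real time" figure reported for one
# (time ...) form in gsi output read from stdin.
# Assumes each report starts with a line "(time (<name> ..." and that the
# next line of the form "    N ms real time" belongs to it.
real_time_ms() {
  # $1: procedure name, e.g. direct-fft-recursive-4
  awk -v name="$1" '
    # index() is a literal match; the trailing space keeps
    # componentwise-rac-multiply from matching ...-multiply-conjugate.
    index($0, "(time (" name " ") == 1 { want = 1; next }
    # First "N ms real time" line after the match: print N and stop.
    want && $2 == "ms" && $3 == "real" { print $1; exit }
  '
}
```

For example:

> gsi/gsi -e '(define a (time (expt 3 10000000)))(define b (time (* a a)))' 2>&1 | real_time_ms direct-fft-recursive-4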

The name of the routine in the .i and .s files is  
___H_direct_2d_fft_2d_recursive_2d_4.

By the way, ___H_inverse_2d_fft_2d_recursive_2d_4 is a similar
routine implementing the inverse FFT, which, for some reason, runs
faster than the direct (forward) FFT (418 ms vs. 445 ms above).
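
The C label can be predicted from the Scheme name. The sketch below is inferred only from the two names in this message: it assumes the "___H_" prefix is prepended and each '-' (ASCII 0x2d) is written as "_2d_", and it does not handle any other special characters. The function name gambit_label is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: guess the C label Gambit emits for a Scheme procedure.
# Inferred from direct-fft-recursive-4 -> ___H_direct_2d_fft_2d_recursive_2d_4:
# prepend "___H_" and replace each '-' with "_2d_". Other characters untouched.
gambit_label() {
  printf '___H_%s\n' "$(printf '%s' "$1" | sed 's/-/_2d_/g')"
}
```

It could then be used to locate the routine in the files left by -save-temps, e.g.

> grep -n "$(gambit_label direct-fft-recursive-4)" direct.s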

Brad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
