On Mon, 2003-11-10 at 16:27, Adam Heath wrote:
> On Mon, 10 Nov 2003, Joe Wreschnig wrote:
>
> > A program that is CPU-bound *and* can be encoded more efficiently will
> > benefit from compiler optimizations. Some CPU-bound things just aren't
> > going to be helped much by vectorization, instruction reordering, etc.
> > I mean, integer multiply is integer multiply.
>
> But if the target CPU supports pipelining, and has multiple multiplication
> units (which means it can do them in parallel), or can do one 128-bit
> multiply, or one 64-bit multiply, at once, then it's more efficient to do
> a partial loop unroll, and thereby have faster code, because of more
> efficient parallelization.
>
> (sorry, read Dr. Dobbs last week).
I knew someone would chime in with this. :)

AIUI this is only possible when there is no data dependency (i.e. multiply
no. n+1 does not depend on the result of multiply no. n); otherwise the
multiplies still have to be serialized.

This is also a good example of how optimizing for one chip can slow down
another. Say chip A has two multiplication units but chip B has only one,
and you partially unroll the loop when compiling. On A this helps, because
both multiplies can issue at once. On B it may actually slow things down,
because the unrolled loop takes more icache, or because B could have been
doing (e.g.) an add and a multiply in parallel, but not two multiplies.

Of course, I'm far from a compiler or chip design expert (or even a
novice); this is just what I remember from my classes last year. :) But it
shows how complicated optimizing compilers can get, and why you can't say
any optimization is always good/safe/faster/etc. The only truly safe way
to tell is extensive, controlled benchmarking.
-- 
Joe Wreschnig <[EMAIL PROTECTED]>
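P.S. To make the unrolling (and the data-dependency caveat) concrete,
here's a minimal C sketch. The function names are mine, and whether the
unrolled version actually wins depends entirely on the chip, as above:

```c
#include <stddef.h>

/* Straightforward multiply loop: every iteration is independent, so a
 * compiler (or a hand unroll) is free to overlap the multiplies. */
void scale(int *dst, const int *src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* The same loop, partially unrolled by 2. On a chip with two multiply
 * units the pair of multiplies per iteration can issue in parallel; on
 * a chip with one unit, the bigger body may just cost extra icache. */
void scale_unrolled(int *dst, const int *src, size_t n, int k)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        dst[i]     = src[i]     * k;
        dst[i + 1] = src[i + 1] * k;
    }
    if (i < n)                  /* odd-length tail */
        dst[i] = src[i] * k;
}

/* Counter-example: here each multiply depends on the previous result,
 * so even two multiply units can't overlap them; unrolling alone
 * doesn't help, the chain is inherently serial. */
int product(const int *src, size_t n)
{
    int acc = 1;
    for (size_t i = 0; i < n; i++)
        acc *= src[i];
    return acc;
}
```

Both scale variants compute the same thing, which is the point: the
transformation is purely about how the work maps onto the hardware.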