On Tue, Mar 11, 2014 at 10:35 AM, Roland Scheidegger <srol...@vmware.com> wrote:
> On 11.03.2014 17:29, Ian Romanick wrote:
>> and there was much rejoicing.  The timings that we
>> use in the compiler backend are 22 cycles for POW, and 14 cycles for MUL
>> on Haswell.  The numbers are similar (but slightly longer) on
>> Sandybridge and Ivybridge.
> I think that works if you just care about latency. Since it appears you
> have a "base latency" of 14 cycles for anything, but 22 for POW, however,
> it looks to me like POW is significantly more expensive. (That is, if
> you'd try to issue nothing but POWs or probably other functions from the
> extended math group, you'd find you could only get 1/4 or so of the
> throughput you get with MULs, since you probably cannot issue that
> function every two cycles, but you can do that with MULs. Just a guess
> though, assuming that during these additional latency cycles the hw
> cannot do another POW, and even if true maybe latency is really still
> more relevant in practice. But as said, that's just a wild guess; I blame
> the docs for that :-).)

Nope, you're right.  Haswell can issue 8 multiplies per EU per cycle,
but only one pow.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
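
To make the latency-versus-throughput distinction in this thread concrete, here is a rough sketch of a toy cost model. It uses only the figures quoted above (14-cycle MUL and 22-cycle POW latency, 8 MULs versus 1 POW issued per EU per cycle on Haswell). The pipeline formula and the function names are assumptions made for illustration; this is not how the Mesa backend scheduler computes anything.

```python
# Toy model: cycles for one EU to finish n *independent* ops of a single kind.
# Assumption: the first result arrives after `latency` cycles, and every
# further issue cycle adds one cycle (a fully pipelined unit).

def cycles_for_independent_ops(n, latency, issues_per_cycle):
    if n <= 0:
        return 0
    issue_cycles = -(-n // issues_per_cycle)  # ceiling division
    return latency + issue_cycles - 1

# Figures quoted in this thread for Haswell.
MUL = {"latency": 14, "issues_per_cycle": 8}
POW = {"latency": 22, "issues_per_cycle": 1}

def compare(n):
    """Return (MUL cycles, POW cycles) for n independent ops."""
    return (cycles_for_independent_ops(n, **MUL),
            cycles_for_independent_ops(n, **POW))

if __name__ == "__main__":
    for n in (1, 64, 1024):
        mul_cycles, pow_cycles = compare(n)
        print(f"n={n}: MUL {mul_cycles} cycles, POW {pow_cycles} cycles")
```

Under these assumptions a single op differs only by latency (14 versus 22 cycles), while 1024 independent ops take 141 cycles as MULs and 1045 as POWs, so the issue rate dominates once the work is throughput-bound.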