http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51581

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-12-16 16:25:26 UTC ---
Created attachment 26111
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26111
div2.c

Testcase where f1-f6 are normal integer division loops and f7-f12 are the same
divisions rewritten by hand as the multiplication sequence the expander emits,
so that they can be autovectorized.  A pattern recognizer would do something
similar to this.
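
For illustration only (this is not the attached div2.c): a minimal sketch of
the kind of rewrite described above, for division by 3 on 32-bit values.  The
function names are hypothetical, the magic constants are the usual
multiply-high reciprocals for 3, and the signed variant assumes that >> on a
negative value is an arithmetic shift, as it is with GCC.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: one integer division per element. */
void udiv3_plain(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] / 3u;
}

/* Unsigned x / 3 as a widening multiply and a shift.
   0xAAAAAAAB == (2^33 + 1) / 3, so (x * 0xAAAAAAAB) >> 33 == x / 3
   for every 32-bit unsigned x. */
void udiv3_mul(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint32_t)(((uint64_t)src[i] * 0xAAAAAAABu) >> 33);
}

/* Signed x / 3: take the high 32 bits of x * 0x55555556
   (0x55555556 == (2^32 + 2) / 3), then add 1 for negative x so the
   result truncates toward zero.  Assumes arithmetic right shift of
   negative values (implementation-defined in ISO C). */
void sdiv3_mul(int32_t *dst, const int32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t x = src[i];
        int32_t hi = (int32_t)(((int64_t)x * 0x55555556LL) >> 32);
        dst[i] = hi - (x >> 31);   /* x >> 31 is -1 when x < 0, else 0 */
    }
}
```

The multiply-and-shift loops contain no division instruction, which is what
lets the vectorizer handle them where the plain division loop is left scalar.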

Timings for -O3 -mavx on a Sandy Bridge CPU; each case makes 500000 calls to
fN (), i.e. 500000 * 4096 divisions (division loop -> multiplication loop):
/ 3   0m1.964s -> 0m0.706s
/ 3U  0m1.626s -> 0m0.705s
/ 18  0m2.181s -> 0m0.868s
/ 18U 0m1.629s -> 0m0.708s
/ 19  0m2.183s -> 0m0.863s
/ 19U 0m2.635s -> 0m0.862s
