http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51581
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-12-16 16:25:26 UTC ---
Created attachment 26111
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26111
div2.c

Testcase where f1-f6 are normal integer division loops and f7-f12 are the
same divisions rewritten manually as the multiplication sequence the expander
performs, so that they can be autovectorized. With a pattern recognizer we'd
do something similar to this.

Timings for -O3 -mavx on a SandyBridge CPU, in each case 500000 calls to
fN (), i.e. 500000 * 4096 divisions (division loop -> multiplication loop):

/ 3     0m1.964s -> 0m0.706s
/ 3U    0m1.626s -> 0m0.705s
/ 18    0m2.181s -> 0m0.868s
/ 18U   0m1.629s -> 0m0.708s
/ 19    0m2.183s -> 0m0.863s
/ 19U   0m2.635s -> 0m0.862s
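
For readers without the attachment at hand, here is a minimal sketch of the kind of rewrite meant above, for the / 3U case only. It is not taken from div2.c: the function names div3_plain and div3_mulhi are hypothetical, and the constant 0xAAAAAAAB (ceil(2^33 / 3)) with a shift by 33 is the usual multiply-high sequence for 32-bit unsigned division by 3, assumed here to match what the expander emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain division loop (hypothetical stand-in for one of f1-f6). */
void div3_plain(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] / 3U;
}

/* Same result computed as a widening multiply followed by a shift
   (hypothetical stand-in for one of f7-f12).
   0xAAAAAAAB == ceil(2^33 / 3), so (x * 0xAAAAAAAB) >> 33 == x / 3
   for every 32-bit unsigned x. */
void div3_mulhi(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t prod = (uint64_t)src[i] * 0xAAAAAAABu;
        dst[i] = (uint32_t)(prod >> 33);
    }
}
```

The second loop contains only a multiplication and a shift, which is the form the comment describes as being open to autovectorization. The signed cases and some other divisors need extra fixup steps that this sketch leaves out.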