http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47990
--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-01 13:07:00 UTC ---
The Intel compiler does not perform this optimization even at -fast. It does perform the demotion on

float foo (float x, float y)
{
  return (int)((float)(x/y + 0.5)) * y;
}

though, even with default optimization (also with the conversion to int removed or associated to apply to the first operand of the multiplication only). So they leave alone what looks like a usual "rounding" pattern.

My original idea was to fold (int)((double)(x/y) + 0.5) to (int)(x/y + 0.5f), similar to the fold of (float)((double)(x/y) + 0.5) to (x/y + 0.5f), which we already do (at -O0, in convert_to_real).
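
A minimal sketch of why the two folds discussed here are not equally safe, assuming IEEE 754 binary32/binary64 arithmetic with round-to-nearest-even. The function names and the parameter q (standing in for the float quotient x/y) are hypothetical and chosen only for illustration. The volatile temporaries are there to force each sum to be rounded to its declared type before the conversion.

```c
/* Rounding pattern as written: the addition is carried out in double,
   then the sum is truncated to int. */
int trunc_double_add(float q)
{
    volatile double s = (double)q + 0.5;
    return (int)s;
}

/* Demoted form: the addition is carried out in float,
   then the sum is truncated to int. */
int trunc_float_add(float q)
{
    volatile float s = q + 0.5f;
    return (int)s;
}

/* Pattern with a final conversion to float: add in double, round to float. */
float float_of_double_add(float q)
{
    volatile float s = (float)((double)q + 0.5);
    return s;
}

/* Its demoted form: add directly in float. */
float float_add(float q)
{
    volatile float s = q + 0.5f;
    return s;
}
```

With q = 0x1.fffffep-2f (the largest float below 0.5), the double sum is exactly representable and truncates to 0, while the float sum is a tie that rounds up to 1.0f and truncates to 1, so trunc_double_add and trunc_float_add disagree. On the same input float_of_double_add and float_add both return 1.0f, because the final conversion to float rounds the same sum in both cases.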