[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

ubizjak at gmail dot com Thu, 05 Apr 2007 04:46:25 -0700


------- Comment #11 from ubizjak at gmail dot com  2007-04-05 10:58 -------
(In reply to comment #10)
> I would look at the lreg output, which contains the results of regclass.


No, the difference is due to the SSA pass, which generates:

  # v1z_10 = PHI <v1z_13(2), v1z_32(3)>
  # v1y_9 = PHI <v1y_12(2), v1y_31(3)>
  # v1x_8 = PHI <v1x_11(2), v1x_30(3)>
  # i_7 = PHI <i_17(2), i_36(3)>
  # v3z_6 = PHI <v3z_18(D)(2), v3z_29(3)>
  # v3y_5 = PHI <v3y_19(D)(2), v3y_26(3)>
  # v3x_4 = PHI <v3x_20(D)(2), v3x_23(3)>
  # v2z_3 = PHI <v2z_16(2), v2z_35(3)>
  # v2y_2 = PHI <v2y_15(2), v2y_34(3)>
  # v2x_1 = PHI <v2x_14(2), v2x_33(3)>

without -msse and

  # v3z_10 = PHI <v3z_18(D)(2), v3z_29(3)>
  # v3y_9 = PHI <v3y_19(D)(2), v3y_26(3)>
  # v3x_8 = PHI <v3x_20(D)(2), v3x_23(3)>
  # v2z_7 = PHI <v2z_16(2), v2z_35(3)>
  # v2y_6 = PHI <v2y_15(2), v2y_34(3)>
  # v2x_5 = PHI <v2x_14(2), v2x_33(3)>
  # v1z_4 = PHI <v1z_13(2), v1z_32(3)>
  # v1y_3 = PHI <v1y_12(2), v1y_31(3)>
  # v1x_2 = PHI <v1x_11(2), v1x_30(3)>
  # i_1 = PHI <i_17(2), i_36(3)>

with the -msse compile flag. Note the different variable suffixes, which create
a different sort order. This is (IMO) due to the fact that -msse enables lots
of additional __builtin functions (these can be seen in the 001.tu dump). Since
we don't have an x87 scheduler, the results become quite unpredictable and
depend on the -msseX settings. It just _happens_ that the second form better
suits the stack nature of x87.
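
To make the ordering difference concrete, here is a small sketch that only
re-reads the two dumps above. It says nothing about how GCC orders things
internally; the helper names (split_ssa_name, order_by_version) are made up
for illustration. It takes the left-hand-side names of the PHI nodes, sorts
them by SSA version number (the dumps list them in descending version order),
and shows that the same ten variables come out in a different order.

```python
# Left-hand-side names of the PHI nodes, copied from the two dumps above.
WITHOUT_MSSE = ["v1z_10", "v1y_9", "v1x_8", "i_7", "v3z_6",
                "v3y_5", "v3x_4", "v2z_3", "v2y_2", "v2x_1"]
WITH_MSSE = ["v3z_10", "v3y_9", "v3x_8", "v2z_7", "v2y_6",
             "v2x_5", "v1z_4", "v1y_3", "v1x_2", "i_1"]


def split_ssa_name(name):
    """Split an SSA name such as 'v1z_10' into ('v1z', 10)."""
    base, version = name.rsplit("_", 1)
    return base, int(version)


def order_by_version(names):
    """Return the base variable names sorted by descending SSA version."""
    pairs = [split_ssa_name(n) for n in names]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return [base for base, _ in pairs]


order_without = order_by_version(WITHOUT_MSSE)
order_with = order_by_version(WITH_MSSE)

if __name__ == "__main__":
    print("without -msse:", " ".join(order_without))
    print("with    -msse:", " ".join(order_with))
    print("same variables:", sorted(order_without) == sorted(order_with))
    print("same order:    ", order_without == order_with)
```

Without -msse the v1* variables and i get the highest version numbers and come
first; with -msse the v3* variables do.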

So, why does the SSA pass have to interfere with the computation dataflow? This
interference makes things worse and effectively takes away the user's control
over the flow of data.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780
