https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|2014-12-03 00:00:00         |
          Component|middle-end                  |rtl-optimization
   Target Milestone|---                         |4.9.3

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The difference is in whether there are extra user-named variables in the end
and thus SSA coalescing decision differences:

 stm_load (volatile stm_word_t * addr)
 {
-  stm_word_t l;
-  stm_word_t value;
   stm_word_t version;
   stm_word_t l;
   struct r_entry_t * r;
-  stm_word_t now;
...
+  size_t _32;
+  size_t _33;
+  size_t _34;
...
 Conflict graph:
+1: 3
+3: 1

 After sorting:
 Sorted Coalesce list:
+(16610) _30 <-> _33
 (651) _10 <-> _30
...
-Coalesce list: (10)_10 & (30)_30 [map: 1, 2] : Success -> 1
+Coalesce list: (30)_30 & (33)_33 [map: 2, 3] : Success -> 2
+Coalesce list: (10)_10 & (30)_30 [map: 1, 2] : Fail due to conflict

So it turns out the different coalescing ends up generating worse code.

It would be interesting to see why we decide that coalescing _30 and _33
is so much more beneficial than coalescing _10 and _30.  Ah, it simply
uses EDGE_FREQUENCY... and for some reason we predicted that
_33 & 1 != 0 is 10% taken only.

So ... the theory is that the version is faster on the important path?
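To illustrate why the sort order in the dump matters, here is a toy model of
greedy, cost-ordered coalescing. It is only a sketch and not GCC's actual
coalescing code: every name in it (coalesce_state, try_coalesce, coalesce_all,
MAX_PARTS, ...) is hypothetical. Partition numbers 1, 2 and 3 stand for _10,
_30 and _33, following the [map: ...] fields in the dump, and the rule that
the lower-numbered partition survives a merge is an assumption chosen to match
"Success -> 2".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_PARTS 8   /* hypothetical upper bound for this toy model */

/* One coalesce candidate: two partitions and a cost (higher = more beneficial). */
typedef struct { int p1, p2; int cost; } coalesce_pair;

typedef struct {
    int  rep[MAX_PARTS];                  /* rep[i]: partition that i was merged into */
    bool conflict[MAX_PARTS][MAX_PARTS];  /* symmetric conflict matrix */
} coalesce_state;

void state_init(coalesce_state *s)
{
    for (int i = 0; i < MAX_PARTS; i++) {
        s->rep[i] = i;
        for (int j = 0; j < MAX_PARTS; j++)
            s->conflict[i][j] = false;
    }
}

void add_conflict(coalesce_state *s, int a, int b)
{
    assert(a >= 0 && a < MAX_PARTS && b >= 0 && b < MAX_PARTS);
    s->conflict[a][b] = s->conflict[b][a] = true;
}

/* Follow merge links to the surviving partition. */
int find_rep(const coalesce_state *s, int p)
{
    while (s->rep[p] != p)
        p = s->rep[p];
    return p;
}

/* Merge two partitions unless they conflict.  The lower-numbered
   representative survives (an assumption) and inherits all conflicts. */
bool try_coalesce(coalesce_state *s, int p1, int p2)
{
    int a = find_rep(s, p1), b = find_rep(s, p2);
    if (a == b)
        return true;
    if (s->conflict[a][b])
        return false;
    if (b < a) { int t = a; a = b; b = t; }
    s->rep[b] = a;
    for (int i = 0; i < MAX_PARTS; i++)
        if (s->conflict[b][i])
            s->conflict[a][i] = s->conflict[i][a] = true;
    return true;
}

static int by_cost_desc(const void *x, const void *y)
{
    const coalesce_pair *a = x, *b = y;
    return (b->cost > a->cost) - (b->cost < a->cost);
}

/* Sort candidates by descending cost, then coalesce greedily.
   ok[i] records whether the i-th pair of the sorted list was merged. */
void coalesce_all(coalesce_state *s, coalesce_pair *list, size_t n, bool *ok)
{
    qsort(list, n, sizeof *list, by_cost_desc);
    for (size_t i = 0; i < n; i++)
        ok[i] = try_coalesce(s, list[i].p1, list[i].p2);
}
```

Fed the conflict 1 <-> 3 and the two candidates (651) 1 <-> 2 and
(16610) 2 <-> 3, this model merges 2 and 3 first. Partition 2 then inherits
the conflict with 1, so the cheaper 1 <-> 2 candidate is rejected, as in the
"+" lines of the dump. With only the (651) candidate present, 1 and 2 merge
into 1, as in the "-" line.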