https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117924
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #5)
> Note there is another way of solving this. From my analysis (which I wrote in
> PR 121921):
> currently DSE5 can remove the stores:
> ```
> Deleted dead store: MEM[(struct __as_base &)&data] ={v} {CLOBBER(bob)};
>
> Deleted dead store: MEM[(struct _Bvector_impl_data *)&data] ={v}
> {CLOBBER(bob)};
>
> ```
> But DCE7 (which is right afterwards) does not `remove operator new/delete`
> because of this missed optimization and then forwprop4 (which is right after
> dce7) is able to see (b+s) - (b+s - b) is just b and then later on the next
> DCE optimizes away the new/delete pair.
>
> > Unused new/delete pair is only being determined at cddce3 which is a bit
> > late.
>
> The reason why it is not beforehand is due to `e - (e - b)` not being
> optimized to b until forwprop4 which is right after dce7. If `e - (e - b)`
> got folded in, say, fre1:
> ```
>   _1 = this_15(D)->_M_impl.D.25104._M_start.D.16464._M_p;
> ...
>   _20 = MEM[(const struct _Bvector_impl
> *)this_15(D)].D.25104._M_end_of_storage;
>   _5 = _20 - _1; // e - b
>   _8 = (long unsigned int) _5;
>   _9 = -_8;
>   _10 = _20 + _9; // e - (e - b)
>   _11 = &this_15(D)->_M_impl;
>   operator delete (_10, _8);
> ```
> We should recognize the operator new/delete pair earlier too.

Nope, because we are still left with:
```
  _133 = _34 + _33;
...
  _9 = _133 - _34;
  _10 = (long unsigned int) _9;
```
This is still not converted into _33 until forwprop.

The reason is that fre5 does not get it, due to the need for jump threading:
```
  <bb 8> [local count: 111448560]:
  # _150 = PHI <_34(7), 0B(4), _34(6)>
  # data$D25093$_M_end_of_storage_175 = PHI <_28(7), 0B(4), _28(6)>
  __first ={v} {CLOBBER(eos)};
  __result ={v} {CLOBBER(eos)};
  if (_150 != 0B)
    goto <bb 10>; [53.47%]
  else
    goto <bb 11>; [46.53%]
...
  <bb 10> [local count: 58514395]:
  _9 = data$D25093$_M_end_of_storage_175 - _150;
```

In theory we could optimize:
```
  _28 = _34 + _33;
...
  <bb 10> [local count: 111448560]:
  # __result_72 = PHI <_69(7), _34(8), _71(9), 0B(4)>
  # _150 = PHI <_34(7), _34(8), _34(9), 0B(4)>
  # data$D25093$_M_end_of_storage_175 = PHI <_28(7), _28(8), _28(9), 0B(4)>
...
  _9 = data$D25093$_M_end_of_storage_175 - _150;
  _10 = (long unsigned int) _9;
```
Into:
```
  <bb 10> [local count: 111448560]:
  # _t = PHI<_33(7),_33(8),_33(9),0>
...
  _9 = (long int)_t;
  _10 = (long unsigned int) _9;
...
```
But I am not sure how expensive this would be in compile time.
Then in ccp4 we would get decent code.
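
For reference, a minimal C++ sketch of the identities involved. It is not GCC code and not the testcase from this PR. All function names are hypothetical, and the edge numbers 4, 7, 8, 9 are only borrowed from the bb 10 dump above to label the PHI arguments. It models `e - (e - b)` folding to `b`, `(b + s) - b` folding to `s`, and the proposed rewrite where the difference of two PHIs with matching incoming edges becomes a PHI of the per-edge differences.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors "_5 = _20 - _1; _9 = -_8; _10 = _20 + _9", i.e. e - (e - b).
inline const char *end_minus_size(const char *b, const char *e) {
    std::ptrdiff_t size = e - b;  // e - b
    return e - size;              // e - (e - b), which should fold to b
}

// Mirrors "_133 = _34 + _33; _9 = _133 - _34", i.e. (b + s) - b.
inline std::ptrdiff_t sum_minus_base(const char *b, std::ptrdiff_t s) {
    const char *e = b + s;
    return e - b;                 // should fold to s
}

// Models the current bb 10: two PHIs, then a pointer subtraction.
// Edge 4 (hypothetical label for the null path) supplies 0B for both.
inline std::ptrdiff_t diff_of_phis(int edge, const char *b, std::ptrdiff_t s) {
    const char *start = (edge == 4) ? nullptr : b;      // _150
    const char *eos = (edge == 4) ? nullptr : b + s;    // ..._M_end_of_storage_175
    return eos - start;           // null - null is 0 in C++
}

// Models the proposed form: _t = PHI <_33, _33, _33, 0>.
inline std::ptrdiff_t phi_of_diffs(int edge, std::ptrdiff_t s) {
    return (edge == 4) ? 0 : s;
}

// Checks that both forms agree on every modelled incoming edge.
inline bool identities_hold() {
    static const char storage[64] = {};
    const char *b = storage;
    const std::ptrdiff_t s = 40;
    bool ok = end_minus_size(b, b + s) == b;
    ok = ok && sum_minus_base(b, s) == s;
    const int edges[] = {4, 7, 8, 9};
    for (int edge : edges)
        ok = ok && diff_of_phis(edge, b, s) == phi_of_diffs(edge, s);
    return ok;
}
```

The sketch only shows that the two forms compute the same value. It says nothing about the compile-time cost of finding such PHI pairs, which is the open question above.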