https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117924

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #5)
> Note there is another way of solving this. From my analysis (which I wrote in
> PR 121921):
> currently DSE5 can remove the stores:
> ```
>   Deleted dead store: MEM[(struct __as_base  &)&data] ={v} {CLOBBER(bob)};
> 
>   Deleted dead store: MEM[(struct _Bvector_impl_data *)&data] ={v}
> {CLOBBER(bob)};
> 
> ```
> But DCE7 (which is right afterwards) does not `remove operator new/delete`
> because this missed optimization and then forwprop4 (which is right after
> dce7) is able to see (b+s) - (b+s - b) is just b and then later on the next
> DCE optimizes away the new/delete pair.
> 
> > Unused new/delete pair is only being determined at cddce3 which is bit 
> > late. 
> 
> The reason why it is not beforehand is due to `e - (e - b)` not being
> optimized to b until forwprop4 which is right after dce7. If `e - (e - b)`
> got folded say fre1:
> ```
>   _1 = this_15(D)->_M_impl.D.25104._M_start.D.16464._M_p;
> ...
>   _20 = MEM[(const struct _Bvector_impl
> *)this_15(D)].D.25104._M_end_of_storage;
>   _5 = _20 - _1; // e - b
>   _8 = (long unsigned int) _5;
>   _9 = -_8;
>   _10 = _20 + _9; // e - (e - b)
>   _11 = &this_15(D)->_M_impl;
>   operator delete (_10, _8);
> ```
> We should recognize the operator new/delete pair earlier too.

Nope, because we are still left with:
```
  _133 = _34 + _33;
...
  _9 = _133 - _34;
  _10 = (long unsigned int) _9;
```
This is still not converted into _33 until forwprop.
The reason is that fre5 does not catch it, since doing so would need jump threading:
```
  <bb 8> [local count: 111448560]:
  # _150 = PHI <_34(7), 0B(4), _34(6)>
  # data$D25093$_M_end_of_storage_175 = PHI <_28(7), 0B(4), _28(6)>
  __first ={v} {CLOBBER(eos)};
  __result ={v} {CLOBBER(eos)};
  if (_150 != 0B)
    goto <bb 10>; [53.47%]
  else
    goto <bb 11>; [46.53%]

...

  <bb 10> [local count: 58514395]:
  _9 = data$D25093$_M_end_of_storage_175 - _150;
```
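As a rough source-level analogue of the dump above (a sketch only; `select_begin`, `select_end` and `dealloc_size` are hypothetical names, not libstdc++ functions, and it assumes the end value on the non-null edges is begin plus a length), both PHIs choose their value from the same incoming edge. On every path that reaches the subtraction the result is just the length, but seeing that requires looking at the PHI arguments edge by edge:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two PHIs in bb 8: each yields either the
// "allocated" value or null, depending on the incoming edge.
inline const char *select_begin(bool allocated, const char *b) {
    return allocated ? b : nullptr;       // models _150
}

inline const char *select_end(bool allocated, const char *b, std::ptrdiff_t s) {
    return allocated ? b + s : nullptr;   // models ..._M_end_of_storage_175
}

// Mirrors bb 10: the subtraction only runs when the begin pointer is non-null.
inline std::ptrdiff_t dealloc_size(bool allocated, const char *b, std::ptrdiff_t s) {
    assert(s >= 0);
    const char *begin = select_begin(allocated, b);
    const char *end = select_end(allocated, b, s);
    if (begin != nullptr)
        return end - begin;               // equals s on every path reaching here
    return 0;
}
```

Without threading the `begin != nullptr` test back through the selects, value numbering only sees two unrelated merged values being subtracted.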

In theory we could optimize:
```
  _28 = _34 + _33;
...
  <bb 10> [local count: 111448560]:
  # __result_72 = PHI <_69(7), _34(8), _71(9), 0B(4)>
  # _150 = PHI <_34(7), _34(8), _34(9), 0B(4)>
  # data$D25093$_M_end_of_storage_175 = PHI <_28(7), _28(8), _28(9), 0B(4)>

...
  _9 = data$D25093$_M_end_of_storage_175 - _150;
  _10 = (long unsigned int) _9;
```

Into:
```
  <bb 10> [local count: 111448560]:
  # _t = PHI <_33(7), _33(8), _33(9), 0(4)>
...
  _9 = (long int) _t;
  _10 = (long unsigned int) _9;
...
```

But I am not sure how expensive this would be in compile time. With that in
place, ccp4 would then give us decent code.
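
To make the idea concrete, here is a toy model of the rewrite, not GCC code: a PHI is represented as one value per incoming edge, addresses are plain integers, and the names `Phi`, `diff_of_phis` and `phi_of_diffs` are made up for illustration. It only shows that subtracting per edge first yields the same value as merging first and subtracting afterwards:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model: one value per incoming edge (four edges, like bb 10 above).
using Phi = std::array<std::intptr_t, 4>;

// What the IL computes today: pick each PHI's value for the edge taken,
// then subtract.
inline std::intptr_t diff_of_phis(const Phi &end, const Phi &begin, std::size_t edge) {
    assert(edge < end.size());
    return end[edge] - begin[edge];
}

// The proposed form: subtract on each edge up front, giving a new PHI (_t)
// whose arguments may simplify individually.
inline Phi phi_of_diffs(const Phi &end, const Phi &begin) {
    Phi t{};
    for (std::size_t e = 0; e < t.size(); ++e)
        t[e] = end[e] - begin[e];
    return t;
}
```

The compile-time concern is visible even in the toy: the rewrite has to visit every incoming edge of both PHIs and try to simplify each per-edge difference.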
