> On Sunday, 19 November 2023 22:53:37 CET Jan Hubicka wrote:
> > Sadly it is really hard to work out this
> > from IPA passes, since we basically care whether the iterator points to
> > the same place as the end pointer, which are both passed by reference.
> > This is inter-procedural value numbering that is quite out of reach.
> 
> I've done a fair share of branching on __builtin_constant_p in 
> std::experimental::simd to improve code-gen. It's powerful! But maybe we 
> also need the other side of the story to tell the optimizer: "I know you 
> can't const-prop everything; but this variable / expression, even if you 
> need to put in a lot of effort, the performance difference will be worth 
> it."
> 
> For std::vector, the remaining capacity could be such a value. The 
> functions f() and g() are equivalent (their code-gen isn't; see
> https://compiler-explorer.com/z/r44ejK1qz):
> 
> #include <vector>
> 
> auto
> f()
> {
>   std::vector<int> x;
>   x.reserve(10);
>   for (int i = 0; i < 10; ++i)
>     x.push_back(0);
>   return x;
> }
> auto
> g()
> { return std::vector<int>(10, 0); }
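
To make the equivalence claim concrete, here is a small run-time check that f() and g() return equal vectors. It is only a sketch: same_result is a hypothetical helper added for illustration. It shows observable equivalence, not identical code-gen.

```cpp
#include <cassert>
#include <vector>

// f() and g() as in the quoted message: both build a vector of ten zeros.
auto
f()
{
  std::vector<int> x;
  x.reserve(10);
  for (int i = 0; i < 10; ++i)
    x.push_back(0);
  return x;
}

auto
g()
{ return std::vector<int>(10, 0); }

// Hypothetical helper: compares contents and size of the two results.
bool
same_result()
{
  std::vector<int> a = f();
  std::vector<int> b = g();
  return a == b && a.size() == 10;
}
```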

With my changes at -O3 we now inline push_back, so in principle we could
optimize f() into g().  However, with

  ~/trunk-install/bin/gcc -O3 auto.C -S -fdump-tree-all-details -fno-exceptions -fno-store-merging -fno-tree-slp-vectorize

the first problem is right at the beginning:

  <bb 2> [local count: 97603128]:
  MEM[(struct _Vector_impl_data *)x_4(D)]._M_start = 0B;
  MEM[(struct _Vector_impl_data *)x_4(D)]._M_finish = 0B;
  MEM[(struct _Vector_impl_data *)x_4(D)]._M_end_of_storage = 0B;
  _37 = operator new (40);
  _22 = x_4(D)->D.26019._M_impl.D.25320._M_finish;
  _23 = x_4(D)->D.26019._M_impl.D.25320._M_start;
  _24 = _22 - _23;
  if (_24 > 0)
    goto <bb 3>; [41.48%]
  else
    goto <bb 4>; [58.52%]

So the vector is first initialized with _M_start = _M_finish = 0, but
after the call to operator new we are already unable to propagate these
values.

This is because x is returned, so PTA (points-to analysis) considers it
escaping.  This is the problem discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653
which suggests that it is likely worthwhile to fix PTA to handle this
case correctly.
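
The reduced C++ sketch below shows the same pattern outside of std::vector. The names impl_data and used_after_new are hypothetical and are not libstdc++ or GCC identifiers. Here d points to memory owned by the caller, as the return slot x_4(D) does in the dump. Because the call to operator new is opaque (it is replaceable, and *d may be reachable from global state), the compiler cannot in general prove that *d is left untouched. It is therefore not free to fold d->finish - d->start to 0 across that call, even though the result is always 0 at run time with the default operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for libstdc++'s _Vector_impl_data.
struct impl_data {
  int *start;
  int *finish;
  int *end_of_storage;
};

// `d` points to caller-owned memory, like the hidden return slot in the
// dump.  Hypothetical reduction, not actual libstdc++ code.
std::ptrdiff_t
used_after_new(impl_data *d)
{
  d->start = d->finish = d->end_of_storage = nullptr;

  // Opaque call: operator new is replaceable and *d may be reachable from
  // global state, so the compiler cannot in general assume this call
  // leaves *d unchanged.
  int *p = static_cast<int *>(::operator new(10 * sizeof(int)));

  // Always 0 at run time (both pointers are null), but folding it to 0
  // requires proving that the call above did not modify *d.
  std::ptrdiff_t used = d->finish - d->start;

  ::operator delete(p);
  return used;
}
```

Whether the subtraction actually gets folded can be checked in the -fdump-tree-* output, as with the dump above.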
