On 24/11/24 16:26 +0100, Jan Hubicka wrote:
Hi,
looking into reason why we still do throw_bad_alloc in clang binary I noticed
that quite few calls come from deque::_M_reallocate_map.  This patch adds
unreachable to limit the size of realloc_map.  _M_reallocate_map is called only
if new size is smaller then max_size.  map is an array holding pointers to
entries of fixed size.

Since rellocation is done by doubling the map size, I think the maximal size of
map allocated is max_size / deque_buf_size rounded up times two.  This should
be also safe for overflows since we have extra bit.

map size is always at least 8. Theoretically this computation may be wrong for
very large T, but in that case callers should never reallocate.

On the testcase I get:
jh@shroud:~> ~/trunk-install-new4/bin/g++ -O2 dq.C -c ; size -A dq.o | grep text
.text                                              284      0
.text._ZNSt5dequeIiSaIiEE17_M_reallocate_mapEmb    485      0
.text.unlikely                                      10      0
jh@shroud:~> ~/trunk-install-new5/bin/g++ -O2 dq.C -c ; size -A dq.o | grep text
.text                                              284      0
.text._ZNSt5dequeIiSaIiEE17_M_reallocate_mapEmb    465      0
.text.unlikely                                      10      0

so this saves about 20 bytes of rellocate_map, which I think is worthwhile.
Curiously enough gcc14 does:

jh@shroud:~> g++ -O2 dq.C -c ; size -A dq.o | grep text
.text                 604      0
.text.unlikely         10      0

which is 145 bytes smaller. Obvoius difference is that _M_reallocate_map gets 
inlined.
Compiling gcc14 preprocessed file with trunk gives:

jh@shroud:~> g++ -O2 dq.C -S ; size -A dq.o | grep text
.text                 762      0
.text.unlikely         10      0

So inlining is due to changes at libstdc++ side, but code size growth is due to
something else.

For clang this reduced number of thris_bad_new_array_length from 121 to 61.

Regtested x86_64-linux, OK?

libstdc++-v3/ChangeLog:

        * include/bits/deque.tcc: Compute maximal size of alloc_map.

gcc/testsuite/ChangeLog:

        * g++.dg/tree-ssa/deque.C: New test.


diff --git a/gcc/testsuite/g++.dg/tree-ssa/deque.C 
b/gcc/testsuite/g++.dg/tree-ssa/deque.C
new file mode 100644
index 00000000000..c79de9b2161
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/deque.C
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+#include <deque>
+void
+test(std::deque<int> &q, int v)
+{
+  q.push_back (v);
+}
+// { dg-final { scan-tree-dump-not "throw_bad_alloc" "optimized" } }
diff --git a/libstdc++-v3/include/bits/deque.tcc 
b/libstdc++-v3/include/bits/deque.tcc
index deb010a0ebb..653354f90a7 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -955,6 +955,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
          size_type __new_map_size = this->_M_impl._M_map_size
                                     + std::max(this->_M_impl._M_map_size,
                                                __nodes_to_add) + 2;
+         size_type __max = __deque_buf_size(sizeof(_Tp));
+         if (__new_map_size > ((max_size() + __deque_buf_size(sizeof(_Tp)) - 1)
+                               / __deque_buf_size(sizeof(_Tp))) * 2)
+           __builtin_unreachable ();

I forgot dead variable here which yields to -Wall warning.  Also I
noticed that deque copy operation may be optimized if we know that
size <= max_size.
With this change we can now fully optimize away unused deque copies.  I
am not sure how common it is in practice, but I think all the containers
should be optimizable this way.

Next interesting challenge (seen at llvm binary) is std::hash.

Here is updated patch which got regtested&bootstrapped on x86_64-linux

libstdc++-v3/ChangeLog:

        * include/bits/deque.tcc (std::deque::_M_reallocate_map): Add
        __builtin_unreachable check to declare that maps are not very large.
        * include/bits/stl_deque.h (std::deque::size): Add __builtin_unreachable
        to check for maximal size of map.

gcc/testsuite/ChangeLog:

        * g++.dg/tree-ssa/deque-1.C: New test.
        * g++.dg/tree-ssa/deque-2.C: New test.

diff --git a/gcc/testsuite/g++.dg/tree-ssa/deque-1.C 
b/gcc/testsuite/g++.dg/tree-ssa/deque-1.C
new file mode 100644
index 00000000000..c639ebb1a5f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/deque-1.C
@@ -0,0 +1,9 @@
+// { dg-do compile }
+// { dg-options "-O1 -fdump-tree-optimized" }
+#include <deque>
+void
+test(std::deque<int> &q, int v)
+{
+  q.push_back (v);
+}
+// { dg-final { scan-tree-dump-not "throw_bad_alloc" "optimized" } }
diff --git a/gcc/testsuite/g++.dg/tree-ssa/deque-2.C 
b/gcc/testsuite/g++.dg/tree-ssa/deque-2.C
new file mode 100644
index 00000000000..7e268b3f018
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/deque-2.C
@@ -0,0 +1,10 @@
+// { dg-do compile }
+// { dg-options "-O3 -fdump-tree-optimized" }
+#include <deque>
+std::deque<int *>
+test2(std::deque<int *> &q)
+{
+  return q;
+}
+// rethrow is OK, but throw is not.
+// { dg-final { scan-tree-dump-not {[^e]throw} "optimized" } }
diff --git a/libstdc++-v3/include/bits/deque.tcc 
b/libstdc++-v3/include/bits/deque.tcc
index deb010a0ebb..e56cd0b9319 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -955,6 +955,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
          size_type __new_map_size = this->_M_impl._M_map_size
                                     + std::max(this->_M_impl._M_map_size,
                                                __nodes_to_add) + 2;
+         if (__new_map_size > ((max_size() + __deque_buf_size(sizeof(_Tp)) - 1)
+                               / __deque_buf_size(sizeof(_Tp))) * 2)
+           __builtin_unreachable ();

I think it's worth avoiding the duplicate call, for -O0 -std=c++98.
It makes the line shorter too:

  const size_t __bufsz = __deque_buf_size(sizeof(_Tp));
  if (__new_map_size > ((max_size() + __bufsz - 1) / __bufsz) * 2)
    __builtin_unreachable();



          _Map_pointer __new_map = this->_M_allocate_map(__new_map_size);
          __new_nstart = __new_map + (__new_map_size - __new_num_nodes) / 2
diff --git a/libstdc++-v3/include/bits/stl_deque.h 
b/libstdc++-v3/include/bits/stl_deque.h
index c617933bd81..e47e7059330 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -1266,7 +1266,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
      _GLIBCXX_NODISCARD
      size_type
      size() const _GLIBCXX_NOEXCEPT
-      { return this->_M_impl._M_finish - this->_M_impl._M_start; }
+      {
+       size_type __siz = this->_M_impl._M_finish - this->_M_impl._M_start;

__sz here too please.

+       if (__siz > max_size ())
+         __builtin_unreachable ();
+       return __siz;
+      }

      /**  Returns the size() of the largest possible %deque.  */
      _GLIBCXX_NODISCARD

Reply via email to