Hi!

On 2021-08-07T09:54:53+0100, Jonathan Wakely via Gcc <gcc@gcc.gnu.org> wrote:
> On Sat, 7 Aug 2021, 09:08 Thomas Schwinge, <tho...@codesourcery.com> wrote:
>> On 2021-08-06T19:37:58+0100, Jonathan Wakely <jwakely....@gmail.com> wrote:
>> > On Fri, 6 Aug 2021, 17:58 Thomas Schwinge, <tho...@codesourcery.com> wrote:
>> >> So I'm trying to do some C++...  ;-)
>> >>
>> >> Given:
>> >>
>> >>     /* A map from SSA names or var decls to record fields.  */
>> >>     typedef hash_map<tree, tree> field_map_t;
>> >>
>> >>     /* For each propagation record type, this is a map from SSA names or 
>> >> var decls
>> >>        to propagate, to the field in the record type that should be used 
>> >> for
>> >>        transmission and reception.  */
>> >>     typedef hash_map<tree, field_map_t> record_field_map_t;
>> >>
>> >> Thus, that's a 'hash_map<tree, hash_map<tree, tree>>'.  (I may do that,
>> >> right?)  Looking through GCC implementation files, very most of all uses
>> >> of 'hash_map' boil down to pointer key ('tree', for example) and
>> >> pointer/integer value.
>> >>
>> >> Then:
>> >>
>> >>     record_field_map_t field_map ([...]); // see below
>> >>     for ([...])
>> >>       {
>> >>         tree record_type = [...];
>> >>         [...]
>> >>         bool existed;
>> >>         field_map_t &fields
>> >>           = field_map.get_or_insert (record_type, &existed);
>> >>         gcc_checking_assert (!existed);
>> >>         [...]
>> >>         for ([...])
>> >>           fields.put ([...], [...]);
>> >>         [...]
>> >>       }
>> >>     [stuff that looks up elements from 'field_map']
>> >>     field_map.empty ();
>> >>
>> >> This generally works.
>> >>
>> >> If I instantiate 'record_field_map_t field_map (40);', Valgrind is happy.
>> >> If however I instantiate 'record_field_map_t field_map (13);' (where '13'
>> >> would be the default for 'hash_map'), Valgrind complains: [...]
>> >>
>> >> My suspicion was that it is due to the 'field_map' getting resized as it
>> >> incrementally grows (and '40' being big enough for that to never happen),
>> >> and somehow the non-POD (?) value objects not being properly handled
>> >> during that.  Working my way a bit through 'gcc/hash-map.*' and
>> >> 'gcc/hash-table.*' (but not claiming that I understand all that, off
>> >> hand), it seems as if my theory is right: I'm able to plug this memory
>> >> leak as follows:
>> >>
>> >>     --- gcc/hash-table.h
>> >>     +++ gcc/hash-table.h
>> >>     @@ -820,6 +820,8 @@ hash_table<Descriptor, Lazy, Allocator>::expand ()
>> >>              {
>> >>                value_type *q = find_empty_slot_for_expand 
>> >> (Descriptor::hash (x));
>> >>           new ((void*) q) value_type (std::move (x));
>> >>     +     //BAD Descriptor::remove (x); // (doesn't make sense and) a ton 
>> >> of "Invalid read [...] inside a block of size [...] free'd"
>> >>     +     x.~value_type (); //GOOD This seems to work!  -- but does it 
>> >> make sense?
>> >>              }
>> >>
>> >>            p++;
>> >>
>> >> However, that doesn't exactly look like a correct fix, does it?  I'd
>> >> expect such a manual destructor call in combination with placement new
>> >> (that is being used here, obviously) -- but this is after 'std::move'?
>> >> However, this also survives a smoke-test-like run of parts of the GCC
>> >> testsuite, bootstrap and complete run now ongoing.
>>
>> That testing came back without any issues.
>>
>> > Does GCC's hash_map assume you only use it to store POD (plain old data)
>> > types
>>
>> Don't you disappoint me, C++!
>
> It's not a limitation of C++, just this data structure.

(Understood, of course.  Yet, the programming language paves the way for
making it "easy" to achieve similar behavior for different kinds of data
types -- but I know, the devil's in the details, always.)

Actually, I suppose not "non-POD" is the problem here, but rather
non-trivial constructor/destructor, because the latter is how you have a
C++ class data type allocate additional resources (such as memory), which
is what's the problem here regarding the memory leak.

>> > which don't need to be destroyed, because they don't have any
>> > dynamically allocated memory or other resources?
>> >
>> > A hash_map is not a POD, because it does have dynamically allocated memory.
>>
>> ACK, that's what I tried to say above in my "layman's terms".  ;-)
>>
>> > If my guess is right, then hash_map should really use a static_assert to
>> > enforce that requirement, instead of letting you use it in a way that will
>> > leak.
>>
>> Eh, yes, at the very least!

'gcc/hash-map.h':

    /* Class hash_map is a hash-value based container mapping objects of
       KeyId type to those of the Value type.
       Both KeyId and Value may be non-trivial (non-POD) types provided
       a suitabe Traits class.  [...]

..., so this ought to work in principle.  Indeed, if I try:

    --- gcc/hash-map.h
    +++ gcc/hash-map.h
    @@ -38,6 +38,9 @@ template<typename KeyId, typename Value,
                                                        Value> */>
     class GTY((user)) hash_map
     {
    +  static_assert (std::is_pod<KeyId>::value, "non-POD KeyId");
    +  static_assert (std::is_pod<Value>::value, "non-POD Value");
    +
       [...]

... we get a very lot of complaints.

Trying another thing (catching non-trivial destructor instead of
non-POD):

    --- gcc/hash-table.h
    +++ gcc/hash-table.h
    @@ -814,30 +814,36 @@ hash_table<Descriptor, Lazy, Allocator>::expand ()
       value_type *p = oentries;
       do
         {
           value_type &x = *p;

           if (!is_empty (x) && !is_deleted (x))
             {
               value_type *q = find_empty_slot_for_expand (Descriptor::hash 
(x));
              new ((void*) q) value_type (std::move (x));
             }

           p++;
         }
       while (p < olimit);

    +  /* If we get here for types with non-trivial destructor, there is a 
memory
    +     leak: above, the individual 'x's have been 'move'd, and below, the
    +     original 'm_entries' container gets 'free'd -- but the individual 'x's
    +     never get destructed.  */
    +  gcc_checking_assert (std::is_trivially_destructible<value_type>::value);
    +
       if (!m_ggc)
         Allocator <value_type> ::data_free (oentries);
       else
         ggc_free (oentries);
     }

..., that is getting closer, but it still fires in a number of places
where there in fact is no leak (because the non-trivial constructor
doesn't actually allocate any resources dynamically).  (See attached,
just for posterity.)

>> Or, of course, make it work?  I mean GCC surely isn't the first software
>> project to desire implementing a 'hash_map' storing non-POD objects?
>> Don't you disappoint me, C++!
>
> Of course it's possible.

So let's do it.  :-)

>> Alternative to that manual destructor call (per my patch/hack above) --
>> is maybe something wrong in the 'value_type' constructor implementation
>> or any other bits related to the 'std::move'?  (Is that where the non-POD
>> source data ought to be destructed; via "move" instead of "copy"
>> semantics?)
>
> No, a move is just a transfer of resources, it doesn't end the object's
> lifetime. You still need a destructor. I don't know if that is the right
> place to do it though (I haven't looked into it). The destructor should be
> run just before an object is removed from the container.

ACK, thanks.  For a continuation of this specific discussion, please see
my reply to Martin Sebor's email.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 233b21395e8e8fd0e35bd6c81ac360d2bf355500 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tho...@codesourcery.com>
Date: Wed, 11 Aug 2021 14:51:16 +0200
Subject: [PATCH] [WIP] In 'hash_table::expand' verify
 'is_trivially_destructible': selectively disable

---
 gcc/analyzer/store.cc  | 4 ++++
 gcc/cp/module.cc       | 2 ++
 gcc/cp/parser.c        | 2 ++
 gcc/hash-table.c       | 2 ++
 gcc/hash-table.h       | 5 ++++-
 gcc/ipa-devirt.c       | 2 ++
 gcc/ipa-icf.c          | 2 ++
 gcc/sanopt.c           | 6 ++++++
 gcc/tree-sra.c         | 2 ++
 gcc/tree-ssa-uncprop.c | 2 ++
 gcc/tree-ssa.c         | 4 ++++
 11 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index eac1295c5de..c0957df9f9e 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -1931,7 +1931,9 @@ store_manager::get_concrete_binding (bit_offset_t start_bit_offset,
     return existing;
 
   concrete_binding *to_save = new concrete_binding (b);
+  check_hash_table_expand = false;
   m_concrete_binding_key_mgr.put (b, to_save);
+  check_hash_table_expand = true;
   return to_save;
 }
 
@@ -1943,7 +1945,9 @@ store_manager::get_symbolic_binding (const region *reg)
     return existing;
 
   symbolic_binding *to_save = new symbolic_binding (b);
+  check_hash_table_expand = false;
   m_symbolic_binding_key_mgr.put (b, to_save);
+  check_hash_table_expand = true;
   return to_save;
 }
 
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ccbde292c22..8927f3cef3d 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -15452,7 +15452,9 @@ module_state::read_pendings (unsigned count)
       dump () && dump ("Pending:%u keyed to %P", index, key.ns, key.id);
 
       index += entity_lwm;
+      check_hash_table_expand = false;
       auto &vec = pending_table->get_or_insert (key);
+      check_hash_table_expand = true;
       vec.safe_push (index);
     }
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index edb69aeb926..b22a6b9e436 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -33194,7 +33194,9 @@ class_decl_loc_t::add (cp_parser *parser, location_t key_loc,
   /* Set if a declaration of TYPE has previously been seen or if it must
      exist in a precompiled header.  */
   bool exist;
+  check_hash_table_expand = false;
   class_decl_loc_t *rdl = &class2loc.get_or_insert (type_decl, &exist);
+  check_hash_table_expand = true;
   if (!exist)
     {
       tree type = TREE_TYPE (type_decl);
diff --git a/gcc/hash-table.c b/gcc/hash-table.c
index a1603ee0170..093895a5c23 100644
--- a/gcc/hash-table.c
+++ b/gcc/hash-table.c
@@ -136,3 +136,5 @@ hashtab_chk_error ()
 	   "of values with a different hash value\n");
   gcc_unreachable ();
 }
+
+bool check_hash_table_expand = true;
diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index afddcd66199..73d05cfec3b 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -773,6 +773,8 @@ hash_table<Descriptor, Lazy, Allocator>::too_empty_p (unsigned int elts)
    table entries is changed.  If memory allocation fails, this function
    will abort.  */
 
+extern bool check_hash_table_expand;
+
 template<typename Descriptor, bool Lazy,
 	 template<typename Type> class Allocator>
 void
@@ -830,7 +832,8 @@ hash_table<Descriptor, Lazy, Allocator>::expand ()
      leak: above, the individual 'x's have been 'move'd, and below, the
      original 'm_entries' container gets 'free'd -- but the individual 'x's
      never get destructed.  */
-  gcc_checking_assert (std::is_trivially_destructible<value_type>::value);
+  gcc_checking_assert (!check_hash_table_expand
+		       || std::is_trivially_destructible<value_type>::value);
 
   if (!m_ggc)
     Allocator <value_type> ::data_free (oentries);
diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 8deec75b2df..b8142730466 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -4143,8 +4143,10 @@ ipa_odr_read_section (struct lto_file_decl_data *file_data, const char *data,
       name = XOBFINISH (&odr_enum_obstack, char *);
 
       bool existed_p;
+      check_hash_table_expand = false;
       class odr_enum &this_enum
 		 = odr_enum_map->get_or_insert (xstrdup (name), &existed_p);
+      check_hash_table_expand = true;
 
       /* If this is first time we see the enum, remember its definition.  */
       if (!existed_p)
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 4c1f25d0834..5148dab79a3 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -167,7 +167,9 @@ sem_item::add_reference (ref_map *refs,
   bool existed;
 
   sem_usage_pair *pair = new sem_usage_pair (target, index);
+  check_hash_table_expand = false;
   vec<sem_item *> &v = refs->get_or_insert (pair, &existed);
+  check_hash_table_expand = true;
   if (existed)
     delete pair;
 
diff --git a/gcc/sanopt.c b/gcc/sanopt.c
index 2e401554abf..441a7309038 100644
--- a/gcc/sanopt.c
+++ b/gcc/sanopt.c
@@ -365,7 +365,9 @@ maybe_optimize_ubsan_null_ifn (class sanopt_ctx *ctx, gimple *stmt)
   gcc_assert (TREE_CODE (cur_align) == INTEGER_CST);
   bool remove = false;
 
+  check_hash_table_expand = false;
   auto_vec<gimple *> &v = ctx->null_check_map.get_or_insert (ptr);
+  check_hash_table_expand = true;
   gimple *g = maybe_get_dominating_check (v);
   if (!g)
     {
@@ -716,13 +718,17 @@ maybe_optimize_asan_check_ifn (class sanopt_ctx *ctx, gimple *stmt)
 
   gimple_set_uid (stmt, info->freeing_call_events);
 
+  check_hash_table_expand = false;
   auto_vec<gimple *> *ptr_checks = &ctx->asan_check_map.get_or_insert (ptr);
+  check_hash_table_expand = true;
 
   tree base_addr = maybe_get_single_definition (ptr);
   auto_vec<gimple *> *base_checks = NULL;
   if (base_addr)
     {
+      check_hash_table_expand = false;
       base_checks = &ctx->asan_check_map.get_or_insert (base_addr);
+      check_hash_table_expand = true;
       /* Original pointer might have been invalidated.  */
       ptr_checks = ctx->asan_check_map.get (ptr);
     }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 3a9e14f50a0..192dfbfbfb8 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -872,7 +872,9 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
   access->offset = offset;
   access->size = size;
 
+  check_hash_table_expand = false;
   base_access_vec->get_or_insert (base).safe_push (access);
+  check_hash_table_expand = true;
 
   return access;
 }
diff --git a/gcc/tree-ssa-uncprop.c b/gcc/tree-ssa-uncprop.c
index fcc8445c982..12bfd77e6c2 100644
--- a/gcc/tree-ssa-uncprop.c
+++ b/gcc/tree-ssa-uncprop.c
@@ -290,7 +290,9 @@ remove_equivalence (tree value)
 static void
 record_equiv (tree value, tree equivalence)
 {
+  check_hash_table_expand = false;
   val_ssa_equiv->get_or_insert (value).safe_push (equivalence);
+  check_hash_table_expand = true;
 }
 
 class uncprop_dom_walker : public dom_walker
diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index 4cc400d3c2e..b26493132d2 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -59,7 +59,9 @@ redirect_edge_var_map_add (edge e, tree result, tree def, location_t locus)
   if (edge_var_maps == NULL)
     edge_var_maps = new hash_map<edge, auto_vec<edge_var_map> >;
 
+  check_hash_table_expand = false;
   auto_vec<edge_var_map> &slot = edge_var_maps->get_or_insert (e);
+  check_hash_table_expand = true;
   new_node.def = def;
   new_node.result = result;
   new_node.locus = locus;
@@ -95,7 +97,9 @@ redirect_edge_var_map_dup (edge newe, edge olde)
   if (!edge_var_maps)
     return;
 
+  check_hash_table_expand = false;
   auto_vec<edge_var_map> *new_head = &edge_var_maps->get_or_insert (newe);
+  check_hash_table_expand = true;
   auto_vec<edge_var_map> *old_head = edge_var_maps->get (olde);
   if (!old_head)
     return;
-- 
2.30.2

Reply via email to