Re: Some questions on hash_set/hash_map traits

Richard Biener via Gcc Thu, 09 Sep 2021 00:06:04 -0700

On Wed, Sep 8, 2021 at 10:50 PM Erick Ochoa via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hello,
>
> I am refactoring some old code that runs as an IPA_PASS. This code is
> intended to run at link-time using the LTO framework and so I am
> always testing it that way. At the moment, I am trying to use
> hash-maps but having some difficulties. I am adding some patches that
> will help illustrate along the way.
>
> The first patch simply adds a link-time optimization pass that will be
> used to demo the problem that I am having. One can run with -flto
> -fipa-hash. During the first patch, the pass does not do any
> meaningful work but it just helps to set up the stage.
>
> For the second patch I create a wrapper around a symtab_node. This
> wrapper is intended to add some other functionality to elements that I
> might want to insert in hash-sets or hash-maps.
>
> class symtab_wrapper
> {
> private:
>   symtab_node *_symtab_node;
> public:
>   friend symtab_wrapper_traits;
>   symtab_wrapper ()
>     : _symtab_node (NULL)
>   {}
>   symtab_wrapper (symtab_node *s)
>     : _symtab_node (s)
>   { gcc_assert (s); }
>
>   void print (void)
>   {
>     if (!dump_file) return;
>     gcc_assert (_symtab_node);
>     fprintf (dump_file, "%s\n", _symtab_node->name ());
>   }
> };
>
> I also have a wrapper around a hash-set to add some things that might
> be of interest to me in the future:
>
> template <typename KeyId, bool Lazy = false, typename Traits =
> default_hash_traits <KeyId> >
> class my_set_t
> {
> private:
>   hash_set <KeyId, Lazy, Traits> _impl;
> public:
>   my_set_t () {}
>
>   void insert (const KeyId &k)
>   {
>     _impl.add (k);
>   }
>
>   bool contains (const KeyId &k)
>   {
>     return _impl.contains (k);
>   }
>
>   void print (void)
>   {
>     if (!dump_file) return;
>     for (auto i = _impl.begin (), e = _impl.end (); i != e; ++i)
>     {
>       (*i).print ();
>     }
>   }
> };
>
> I also created a symtab_wrapper_traits because I want to store the
> object in the hash-map itself, and so I need to specify how to hash
> it.
>
> class symtab_wrapper_traits
> {
> public:
>   typedef symtab_wrapper value_type;
>   typedef symtab_wrapper compare_type;
>   static const bool empty_zero_p = true;
>
>   static inline hashval_t hash (const value_type &v)
>   {
>     return pointer_hash <void>::hash (v._symtab_node);
>   }
>
>   static bool equal (const value_type &l, const value_type &r)
>   {
>     return l._symtab_node == r._symtab_node;
>   }
>
>   static void mark_deleted (value_type &v)
>   {
>     v._symtab_node = NULL;
>   }
>
>   static void mark_empty (value_type &v)
>   {
>     v._symtab_node = NULL;
>   }
>
>   static void remove (value_type &v)
>   {
>     v._symtab_node = NULL;
>   }
>
>   static bool is_deleted (const value_type &v)
>   {
>      return !v._symtab_node;
>   }
>
>   static bool is_empty (const value_type &v)
>   {
>      return !v._symtab_node;
>   }
> };
>
> I then create a global set with my traits and populate it with some
> information and later print. It seems to be working:
>
> my_set_t <symtab_wrapper, false, symtab_wrapper_traits> my_set;
>
>  static void
>  ipa_hash_generate_summary (void)
>  {
>   cgraph_node *cnode = NULL;
>   FOR_EACH_FUNCTION (cnode)
>   {
>     symtab_wrapper w (cnode);
>     my_set.insert (w);
>   }
>
>   my_set.print ();
>
>   return;
>  }
>
> Here, I already have some questions, but this is not the biggest issue
> that I am having: I believe that the hash_set implementation manages
> its own memory, but I am unclear about the semantics of "removed".
>
> * Is "removed" supposed to, for example, call the destructor? Since,
> in this particular case, it is only a wrapper and cgraph_node is not
> intended to be destroyed, I just assign the pointer to NULL. Not sure
> if that's the intention.
> * I don't understand the semantics of empty_zero_p, I think it is used
> to zero out deleted entries? If I mark something as empty.
> * I am assuming that its memory will be re-used, is this the case?
> * I see that there is a typed_delete_remove function that is sometimes
> used in traits, but I think that it doesn't apply to this case because
> I am storing the whole object. Is that the case?
>
> I later did some refactoring (patch 3), which did not seem to impact
> the code now, but will impact hash_map later. The refactoring is as
> follows, I create an abstract "printable" class
>
> class printable
> {
> public:
>   virtual char* str () = 0;
>   void print (void);
> };
>
> void
> printable::print (void)
> {
>   if (!dump_file) return;
>   char* _str = str ();
>   fprintf (dump_file, "%s\n", _str);
>   free (_str);
> }
>
> Make symtab_wrapper a publically derived class
>
> -class symtab_wrapper
> +class symtab_wrapper : public printable
>
> and implemented the str function in symtab_wrapper:
>
>   virtual char* str (void)
>    {
>     if (!dump_file) return;
>     gcc_assert (_symtab_node);
>     char *retval = NULL;
>     asprintf (&retval, "%s", _symtab_node->name ());
>     return retval;
>   }
>
> Again, this seems to be working correctly.
>
> What is more interesting is moving from a hash-set to a hash-map. On
> the fourth patch, I implemented the hash_map equivalent to the
> hash_set that I am describing above:
>
> template <typename KeyId, typename Value, typename Traits>
> class my_map_t
> {
> private:
>   hash_map <KeyId, Value, Traits> _impl;
> public:
>   my_map_t () {}
>
>   void insert (const KeyId &k, const Value &v)
>   {
>     _impl.put (k, v);
>   }
>
>   Value *select (const KeyId &k)
>   {
>     return _impl.get (k);
>   }
>
>   void print (void)
>   {
>     if (!dump_file) return;
>     for (auto i = _impl.begin (), e = _impl.end (); i != e; ++i)
>     {
>       (*i).first.print ();
>     }
>   }
> };
>
> my_map_t <symtab_wrapper, int, simple_hashmap_traits
> <symtab_wrapper_traits, int>> my_map;
>
> I fill the keys with the same data as the hash_set and for the
> integers, just some incrementing values. I try to print the hash_map
> after printing the hash_set, but it crashes at the call-site to "str
> ()"
>
> void
> printable::print (void)
> {
>   if (!dump_file) return;
>   char* _str = str (); // <-- it crashes here
>   fprintf (dump_file, "%s\n", _str);
>   free (_str);
> }
>
> The interesting thing is that if I remove printable from
> symtab_wrapper's class hierarchy and inline print (and devirtualize
> str), then everything works correctly... (or at least to my
> understanding). This is illustrated in the fifth patch, which
> succeeds.
>
> Now, I am not sure if I am doing something wrong and I am a bit lost
> here. If anyone could help me, I'd appreciate it.
>
> I have built these patches on top of trunk and also bootstrapped them.
>
> Many thanks in advance!


There are likely still holes in how hash-set and hash-map handle
non-POD key/value types.  There have been patches floating around
and also some changes to ensure that constructors and destructors
are appropriately invoked.  As with your printable issue it's likely
the object is not properly constructed and thus the vtable pointer
initialization is missing.

The empty/zero/deleted stuff is to aid in representing hash table
entries that are unoccupied, either empty or deleted.  For every
object you put into a hash you have to define representations for
those states (for hash-map that applies to the 'key' type).

Richard.

Re: Some questions on hash_set/hash_map traits

Reply via email to