On Wed, Nov 26, 2014 at 09:49:52AM -0500, David Malcolm wrote:
> On Wed, 2014-11-26 at 09:13 +0100, Uros Bizjak wrote:
> > Hello!
> > 
> > > cgraph*.c and ipa-*.c use xstrdup on strings when dumping them via
> > > fprintf, leaking all of the duplicated buffers.
> > >
> > > Is/was there a reason for doing this?
> > 
> > Yes, please see [1] and PR 53136 [2]. As said in [1]:
> > 
> > "There is a problem with multiple calls of cgraph_node_name in fprintf
> > dumps. Please note that C++ uses caching in
> > cxx_printable_name_internal (aka LANG_HOOKS_DECL_PRINTABLE_NAME), so
> > when cxx_printable_name_internal is called multiple times from printf
> > (i.e. fprintf "%s/%i -> %s/%i"), it can happen that the first string
> > gets evicted by the second call, before fprintf is fully evaluated."
> > 
> > > Taking them out fixes these leaks (seen when dumping is enabled):
> > 
> > But you will get "Invalid read of size X" instead.
> > 
> > The patch at [1] fixed these, but introduced memory leaks, which were
> > tolerable at the time:
> > 
> > "I think that small memory leak is tolerable here (the changes are
> > exclusively in the dump code), and follows the same approach as in
> > java frontend."
> > 
> > It seems that these assumptions are not valid anymore.
> > 
> > [1] https://gcc.gnu.org/ml/gcc-patches/2012-04/msg01904.html
> > [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53136
> 
> Aha!   Thanks.
> 
> I was only running under valgrind when testing the jit code, and the jit
> code looks like a frontend to the rest of GCC, it doesn't provide an
> implementation of LANG_HOOKS_DECL_PRINTABLE_NAME.  I ran the rest of the
> gcc testsuite outside of gcc.  So perhaps the duplications of the
> transient decl strings is still needed, and my patch would undo that
> fix.  Oops.
> 
> I'm about to disappear for a US holiday, but maybe a good solution here
> is to invent something like a "xstrdup_for_dump" function like this:
> 
>   /* When using fprintf (or similar), problems can arise with
>      transient generated strings.  Many string-generation APIs
>      only support one result being alive at once (e.g. by
>      returning a pointer to a statically-allocated buffer).
> 
>      If there is more than one generated string within one
>      fprintf call: the first string gets evicted or overwritten
>      by the second, before fprintf is fully evaluated.
>      See e.g. PR/53136.
> 
>      This function provides a workaround for this, by providing
>      a simple way to create copies of these transient strings,
>      without the need to have explicit cleanup:
> 
>          fprintf (dumpfile, "string 1: %s string 2:%s\n",
>                   xstrdup_for_dump (EXPR_1),
>                   xstrdup_for_dump (EXPR_2));
>      
>      The copied string is eventually freed, from code within
>      toplev::finalize.  */
> 
>   extern const char *
>   xstrdup_for_dump (const char *transient_str);
> 
> We could then use this instead of xstrdup at these callsites, which
> would document what's going on.
> 
> (I'm not in love with the name, but hopefully the idea is clear).
> 
> This could be implemented either:
> 
> (A) in terms of the "long-term allocator" idea here:
>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02420.html
> which uses a obstack on the gcc::context, eventually freeing the
> allocations from toplev::finalize (although that's currently only called
> from libgccjit.so, but we could call it from cc1 etc when running under
> valgrind).
> 
> (B) or in terms of ggc_xstrdup, with the assumption that no GC can occur
> in anything within a fprintf call.
> 
> That would document what's going on, ensure that we don't have
> use-after-free issues, and stop memory leak warnings from valgrind
> concerning these duplicates.
> 
> Thoughts?

"bleh"? ;-) however I'm not sure I have a  better idea, but here are
some other options.

- have lang_hooks_decl_printable_name return something like
  auto_ptr<char *> so you'd write
  fprintf ("blah: %s", hooks.printable_name (t).get ());

  - depending on who all calls this function we could also replace the
    static cache with a hash map from trees to strings.

    Trev

> Dave
> 

Reply via email to