On Fri, 2017-11-10 at 16:45 -0500, David Malcolm wrote:
> On Thu, 2017-11-02 at 10:46 -0400, Jason Merrill wrote:
> > On Tue, Oct 31, 2017 at 5:09 PM, David Malcolm <dmalc...@redhat.com
> > >
> > wrote:
> > > On Tue, 2017-10-24 at 09:53 -0400, Jason Merrill wrote:
> > > > On Fri, Oct 20, 2017 at 5:53 PM, David Malcolm <dmalcolm@redhat
> > > > .c
> > > > om>
> > > > wrote:
> > > > > Design questions:
> > > > > 
> > > > > * The patch introduces a new kind of tree node, currently
> > > > > called
> > > > >   DECL_WRAPPER_EXPR (although it's used for wrapping
> > > > > constants
> > > > > as
> > > > > well
> > > > >   as decls).  Should wrappers be a new kind of tree node, or
> > > > > should
> > > > > they
> > > > >   reuse an existing TREE_CODE? (e.g. NOP_EXPR, CONVERT_EXPR,
> > > > > etc).
> > > > >     * NOP_EXPR: seems to be for use as an rvalue
> > > > >     * CONVERT_EXPR: for type conversions
> > > > >     * NON_LVALUE_EXPR: "Value is same as argument, but
> > > > > guaranteed
> > > > > not an
> > > > >       lvalue"
> > > > >       * but we *do* want to support lvalues here
> > > > 
> > > > I think using NON_LVALUE_EXPR for constants would be
> > > > appropriate.
> > > > 
> > > > >     * VIEW_CONVERT_EXPR: viewing one thing as of a different
> > > > > type
> > > > >       * can it support lvalues?
> > > > 
> > > > Yes, the purpose of VIEW_CONVERT_EXPR is to support lvalues, it
> > > > seems
> > > > like the right choice.
> > > > 
> > > > Jason
> > > 
> > > Thanks.  I've been working on a new version of the patch using
> > > those
> > > tree codes, but have run into an issue.
> > > 
> > > In g++.dg/conversion/reinterpret1.C:
> > > 
> > >   // PR c++/15076
> > > 
> > >   struct Y { Y(int &); };
> > > 
> > >   int v;
> > >   Y y1(reinterpret_cast<int>(v));  // { dg-error "" }
> > > 
> > > With trunk, this successfully generates an error:
> > > 
> > >   reinterpret1.C:6:6: error: cannot bind non-const lvalue
> > > reference
> > > of type ‘int&’ to an rvalue of type ‘int’
> > >    Y y1(reinterpret_cast<int>(v));  // { dg-error "" }
> > >         ^~~~~~~~~~~~~~~~~~~~~~~~
> > >   reinterpret1.C:3:12: note:   initializing argument 1 of
> > > ‘Y::Y(int&)’
> > >    struct Y { Y(int &); };
> > >               ^
> > > 
> > > where internally there's a NON_LVALUE_EXPR around a VAR_DECL,
> > > where
> > > both have the same type:
> > > 
> > > (gdb) call debug_tree (expr)
> > >  <non_lvalue_expr 0x7ffff145f6e0
> > >     type <integer_type 0x7ffff132e5e8 int public type_6 SI
> > >         size <integer_cst 0x7ffff1331120 constant 32>
> > >         unit-size <integer_cst 0x7ffff1331138 constant 4>
> > >         align:32 warn_if_not_align:0 symtab:0 alias-set -1
> > > canonical-type 0x7ffff132e5e8 precision:32 min <integer_cst
> > > 0x7ffff13310d8 -2147483648> max <integer_cst 0x7ffff13310f0
> > > 2147483647>
> > >         pointer_to_this <pointer_type 0x7ffff1336a80>
> > > reference_to_this <reference_type 0x7ffff144ca80>>
> > > 
> > >     arg:0 <var_decl 0x7ffff7ffbd80 v type <integer_type
> > > 0x7ffff132e5e8 int>
> > >         used public static tree_1 read SI /home/david/coding-
> > > 3/gcc-
> > > git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:5:5 size
> > > <integer_cst 0x7ffff1331120 32> unit-size <integer_cst
> > > 0x7ffff1331138 4>
> > >         align:32 warn_if_not_align:0 context
> > > <translation_unit_decl
> > > 0x7ffff131e168 /home/david/coding-3/gcc-git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C>
> > >         chain <type_decl 0x7ffff141a720 Y type <record_type
> > > 0x7ffff144c150 Y>
> > >             public decl_2 VOID /home/david/coding-3/gcc-git-expr-
> > > vs-decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:3:8
> > >             align:8 warn_if_not_align:0 context
> > > <translation_unit_decl 0x7ffff131e168 /home/david/coding-3/gcc-
> > > git-
> > > expr-vs-decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C>
> > > chain <function_decl 0x7ffff144f800 __cxa_call_unexpected>>>
> > >     /home/david/coding-3/gcc-git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:6
> > > start:
> > > /home/david/coding-3/gcc-git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:6
> > > finish:
> > > /home/david/coding-3/gcc-git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:29>
> > > 
> > > The problem is that this reinterpret cast "looks" just like one
> > > of
> > > my
> > > location wrappers.
> > 
> > Your code shouldn't strip a NON_LVALUE_EXPR around a VAR_DECL.
> > > I see a similar issue with constants, where with:
> > > 
> > >   struct Y { Y(int &); };
> > >   Y y1(reinterpret_cast<int>(42));
> > > 
> > > trunk generates an error like the above, but my code handles the
> > >   NON_LVALUE_EXPR<int>(INTEGER_CST<int>(42))
> > > as if it were a location wrapper around the INTEGER_CST, and thus
> > > doesn't emit the error.
> > 
> > Why doesn't it emit the error?  We should get the same error
> > whether
> > or not we strip the wrapper.
> 
> Thanks: my stripping macro was over-zealous: it was stripping any
> NON_LVALUE_EXPR or VIEW_CONVERT_EXPR where the type matched that of
> the wrapped node.  I've added the additional condition that a
> NON_LVALUE_EXPR has to be around a CONSTANT_CLASS_P, and
> a VIEW_CONVERT_EXPR around a !CONSTANT_CLASS_P.
> 
> Here's an updated version of the patch (v2), now a patch kit (on top
> of r254387).  I split it up thematically for ease of review, but all
> the patches go together.
> 
> This version of the patch kit bootstraps and passes the regression
> tests (on x86_64-pc-linux-gnu)
> 
> To do so, I've made some simplfications to how wrappers nodes are
> added.
> 
> The previous patch added wrappers in the C++ parser around constants
> and uses-of-declarations, along with some other places in the
> parser (typeid, alignof, sizeof, offsetof).
> 
> This version takes a much more minimal approach: it only adds
> location wrapper nodes around the arguments at callsites, thus
> not adding wrapper nodes around uses of constants and decls in other
> locations.
> 
> It keeps them for the other places in the parser (typeid, alignof,
> sizeof, offsetof).
> 
> In addition, for now, each site that adds wrapper nodes is guarded
> with !processing_template_decl, suppressing the creation of wrapper
> nodes when processing template declarations.  This is to simplify
> the patch kit so that we don't have to support wrapper nodes during
> template expansion.
> 
> With this, we get a big usability win: we always have a location_t
> for every argument at a callsite, and so various errors involving
> mismatching arguments are much easier to read (as the pertinent
> argument is underlined).
> 
> I marked this as "PR 43486" as it's a big step towards solving that
> PR (which seeks to preserve locations all the way through to the
> middle end), but much more would need to be done to solve it:
> 
> * this patch kit only adds wrappers to the C++ frontend, not to C
> 
> * as noted above, location_t wrapper nodes are only added at
>   arguments of callsites (and a few other places), with some
>   restrictions.
> 
> * none of the wrapper nodes survive past gimplification;
>   it's not clear to me how best to preserve them into the
>   middle-end.  But even without doing so, we get the big usability
>   win.
> 
> The later parts of the patch kit add STRIP_ANY_LOCATION_WRAPPER uses
> in various places where the tree code of an expression is examined,
> so that such conditions use the code of the wrapped node, rather
> than that of the wrapper.  One of the risks of the patch kit is that
> although the testsuite passes, there are probably places in our code
> which still need uses of STRIP_ANY_LOCATION_WRAPPER.
> 
> 
> Performance of the patch kit
> ****************************
> 
> Benchmarking shows an apparent 2-3% in cc1plus wallclock compile-time
> for kdecore.cc -O3 -g:
> 
> Compilation of kdecore.cc at -O3 with -g for x86_64-pc-linux-gnu:
> wall
>       control: [56.55, 56.54, 56.68, 56.51, 56.45, 56.5, 56.45,
> 56.46, 56.49, 56.5, 56.42, 56.37, 56.41, 56.55]
>    experiment: [57.32, 58.37, 58.17, 58.18, 58.78, 58.48, 57.99,
> 58.16, 58.14, 57.62, 58.36, 58.1, 57.71, 57.7]
> Min: 56.370000 -> 57.320000: 1.02x slower
> Avg: 56.491429 -> 58.077143: 1.03x slower
> Significant (t=-15.14)
> Stddev: 0.07655 -> 0.38426: 5.0199x larger
> 
> although I'm not quite sure why; the difference is in "phase setup",
> which
> gains a large "wall" value (but not in usr or sys), whereas "phase
> parsing"
> is unaffected:
> 
> unpatched:
> 
>  phase setup             :   0.00 ( 0%) usr   0.02 ( 0%) sys   0.02 (
> 0%) wall    1488 kB ( 0%) ggc
>  phase parsing           :   1.83 ( 4%) usr   0.82 (16%) sys   2.66 (
> 5%) wall  156603 kB (12%) ggc
>  phase lang. deferred    :   0.30 ( 1%) usr   0.09 ( 2%) sys   0.38 (
> 1%) wall   29861 kB ( 2%) ggc
>  phase opt and generate  :  48.09 (94%) usr   4.28 (81%) sys  52.55
> (93%) wall 1085420 kB (84%) ggc
>  phase last asm          :   0.86 ( 2%) usr   0.05 ( 1%) sys   0.92 (
> 2%) wall   15806 kB ( 1%) ggc
>  phase finalize          :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 (
> 0%) wall       0 kB ( 0%) ggc
>  |name lookup            :   0.30 ( 1%) usr   0.13 ( 2%) sys   0.21 (
> 0%) wall    4955 kB ( 0%) ggc
>  |overload resolution    :   0.49 ( 1%) usr   0.06 ( 1%) sys   0.68 (
> 1%) wall   27263 kB ( 2%) ggc
>  garbage collection      :   0.77 ( 2%) usr   0.00 ( 0%) sys   0.77 (
> 1%) wall       0 kB ( 0%) ggc
>  [...]
>  TOTAL                 :  51.08             5.27            56.54    
>         1289189 kB
> 
> patched:
>  phase setup             :   0.01 ( 0%) usr   0.03 ( 1%) sys   1.91 (
> 3%) wall    1488 kB ( 0%) ggc
>  phase parsing           :   1.80 ( 4%) usr   0.84 (16%) sys   2.66 (
> 5%) wall  158066 kB (12%) ggc
>  phase lang. deferred    :   0.29 ( 1%) usr   0.09 ( 2%) sys   0.39 (
> 1%) wall   29861 kB ( 2%) ggc
>  phase opt and generate  :  48.09 (94%) usr   4.27 (81%) sys  52.52
> (90%) wall 1085428 kB (84%) ggc
>  phase last asm          :   0.82 ( 2%) usr   0.05 ( 1%) sys   0.89 (
> 2%) wall   15806 kB ( 1%) ggc
>  phase finalize          :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 (
> 0%) wall       0 kB ( 0%) ggc
>  |name lookup            :   0.29 ( 1%) usr   0.07 ( 1%) sys   0.36 (
> 1%) wall    4955 kB ( 0%) ggc
>  |overload resolution    :   0.51 ( 1%) usr   0.07 ( 1%) sys   0.50 (
> 1%) wall   27296 kB ( 2%) ggc
>  garbage collection      :   0.79 ( 2%) usr   0.00 ( 0%) sys   0.78 (
> 1%) wall       0 kB ( 0%) ggc
>  TOTAL                 :  51.01             5.29            58.37    
>         1290660 kB
> 
> What's up with that "phase setup" change?  Did I mess up my testing
> somehow?


I had used taskset to pin everything onto one processor.  On using
"echo 3 > /proc/sys/vm/drop_caches" and rerunning, after a warmup, the
slowdown moved to the unpatched binary, with the patched binary
becoming faster.  My theory is that this is NUMA-related, and that the
buffer cache's copy of the binaries can end up on the "wrong" NUMA
node.  Depending on which I run first, I can make the patched or
unpatched binary faster (based on this slowdown of "wall" seen in
"phase setup").

Speed differences in the "parsing" phase are lost in the noise e.g.:

Compilation of cp-stdlib.cc at -O3 with -g for x86_64-pc-linux-gnu: phase 
parsing: usr
      control: [2.2, 2.25, 2.27, 2.25, 2.23, 2.22, 2.37, 2.23, 2.22, 2.23, 
2.25, 2.21, 2.21, 2.26, 2.23, 2.26, 2.26, 2.23, 2.22, 2.25]
   experiment: [2.22, 2.26, 2.26, 2.2, 2.25, 2.2, 2.21, 2.17, 2.25, 2.23, 2.21, 
2.21, 2.2, 2.2, 2.25, 2.28, 2.19, 2.2, 2.24, 2.24]
Min: 2.200000 -> 2.170000: 1.01x faster
Avg: 2.242500 -> 2.223500: 1.01x faster
Stddev: 0.03582 -> 0.02889: 1.2399x smaller
Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,1.17,3.37&chco=FF0000,0000FF&chdl=control|experiment&chds=1.17,3.37&chd=t:2.2,2.25,2.27,2.25,2.23,2.22,2.37,2.23,2.22,2.23,2.25,2.21,2.21,2.26,2.23,2.26,2.26,2.23,2.22,2.25|2.22,2.26,2.26,2.2,2.25,2.2,2.21,2.17,2.25,2.23,2.21,2.21,2.2,2.2,2.25,2.28,2.19,2.2,2.24,2.24&chxl=0:|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|2:||Iteration|3:||Time+(secs)&chtt=Compilation+of+cp-stdlib.cc+at+-O3+with+-g+for+x86_64-pc-linux-gnu:+phase+parsing:+usr


Compilation of kdecore.cc at -O3 with -g for x86_64-pc-linux-gnu: phase 
parsing: usr
      control: [2.66, 2.61, 2.65, 2.69, 2.62, 2.67, 2.63, 2.6, 2.68, 2.67, 
2.64, 2.67, 2.66, 2.67, 2.65, 2.64, 2.69, 2.68, 2.67, 2.67]
   experiment: [2.64, 2.66, 2.7, 2.69, 2.71, 2.67, 2.67, 2.73, 2.68, 2.64, 
2.66, 2.71, 2.66, 2.68, 2.66, 2.66, 2.66, 2.72, 2.64, 2.68]
Min: 2.600000 -> 2.640000: 1.02x slower
Avg: 2.656000 -> 2.676000: 1.01x slower
Stddev: 0.02563 -> 0.02664: 1.0393x larger
Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,1.6,3.73&chco=FF0000,0000FF&chdl=control|experiment&chds=1.6,3.73&chd=t:2.66,2.61,2.65,2.69,2.62,2.67,2.63,2.6,2.68,2.67,2.64,2.67,2.66,2.67,2.65,2.64,2.69,2.68,2.67,2.67|2.64,2.66,2.7,2.69,2.71,2.67,2.67,2.73,2.68,2.64,2.66,2.71,2.66,2.68,2.66,2.66,2.66,2.72,2.64,2.68&chxl=0:|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|2:||Iteration|3:||Time+(secs)&chtt=Compilation+of+kdecore.cc+at+-O3+with+-g+for+x86_64-pc-linux-gnu:+phase+parsing:+usr


> ...and a slight increase in GGC usage:
> 
> Compilation of kdecore.cc at -O3 with -g for x86_64-pc-linux-gnu: ggc
>       control: [1289179.0, 1289189.0, 1289170.0, 1289186.0,
> 1289194.0, 1289172.0, 1289176.0, 1289192.0, 1289189.0, 1289179.0,
> 1289172.0, 1289190.0, 1289169.0, 1289185.0]
>    experiment: [1290654.0, 1290660.0, 1290655.0, 1290659.0,
> 1290631.0, 1290655.0, 1290650.0, 1290652.0, 1290642.0, 1290650.0,
> 1290658.0, 1290662.0, 1290638.0, 1290655.0]
> Mem max: 1289194.000 -> 1290662.000: 1.0011x larger
> 
> (this is with stripped binaries, and --enable-checking=release)
> 
> Testing with a simple file that includes all of the C++ standard
> library
> (but doesn't do anything with it) shows no change in time, and GGC
> usage
> in the parsing phase increasing 50kB from 149954 kB to 150004 kB.
> 
> 
> Next steps?
> ***********
> 
> I'm working on extending the patch kit to add wrapper nodes at all
> constants and uses-of-decls in the C++ parser, but of course this
> could have some effect on time/memory, and require more uses of
> STRIP_ANY_LOCATION_WRAPPER.
> 
> An alternative appoach is:
> 
>   "[PATCH] C++: use an optional vec<location_t> for callsites"
>     https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01392.html
> 
> which eschews wrapper nodes in favor of duplicating the C frontend's
> workaround here (though Jason disliked this approach, when we
> discussed
> it at Cauldron).
> 
> Thoughts?
> 
> (I'm keen on getting *some* solution for providing location_t values
> for arguments at C++ callsites into gcc 8)
> 
> 
> David Malcolm (14):
>   C++: preserve locations within build_address
>   Support for adding and stripping location_t wrapper nodes
>   C++: add location_t wrapper nodes during parsing (minimal impl)
>   Update testsuite to show improvements
>   tree.c: strip location wrappers from integer_zerop etc
>   Fix Wsizeof-pointer-memaccess*.c
>   reject_gcc_builtin: strip any location wrappers
>   cp/tree.c: strip location wrappers in lvalue_kind
>   Strip location wrappers in null_ptr_cst_p
>   warn_for_memset: handle location wrappers
>   Handle location wrappers in string_conv_p
>   C++: introduce null_node_p
>   c-format.c: handle location wrappers
>   pp_c_cast_expression: don't print casts for location wrappers
> 
>  gcc/c-family/c-common.c                            |   3 +
>  gcc/c-family/c-common.h                            |   1 +
>  gcc/c-family/c-format.c                            |   9 +-
>  gcc/c-family/c-pretty-print.c                      |  66 +++++-
>  gcc/c-family/c-warn.c                              |   8 +
>  gcc/cp/call.c                                      |   6 +-
>  gcc/cp/cp-tree.h                                   |  13 ++
>  gcc/cp/cvt.c                                       |   2 +-
>  gcc/cp/error.c                                     |   2 +-
>  gcc/cp/except.c                                    |   2 +-
>  gcc/cp/parser.c                                    |  30 ++-
>  gcc/cp/tree.c                                      |   2 +
>  gcc/cp/typeck.c                                    |   6 +-
>  .../g++.dg/diagnostic/param-type-mismatch.C        |  27 +--
>  .../g++.dg/plugin/diagnostic-test-expressions-1.C  | 260
> +++++++++++++--------
>  gcc/tree.c                                         |  59 +++++
>  gcc/tree.h                                         |  26 +++
>  17 files changed, 392 insertions(+), 130 deletions(-)
> 

Reply via email to