Re: Question about undefined functions' parameters during LTO

2020-03-13 Thread Richard Biener via Gcc
On Thu, Mar 12, 2020 at 5:31 PM Erick Ochoa
 wrote:
>
> Hello,
>
> I am trying to find out the arguments of functions which are undefined
> during LTO.
>
> Basically:
>
> gcc_assert(in_lto_p && !cnode->definition)
> // Do we have arguments?
> gcc_assert(DECL_ARGUMENTS(cnode->decl)) // fails
> // No, we don't.
>
> As I understand it, functions which are not defined are ones which have
> been declared external.
>
> I believe that, when building an application with -flto, the only
> functions which are not visible during LTO **and** are declared external
> are functions defined in libraries which have not been compiled with
> -flto. An example of this is glibc.
>
> Indeed, I have implemented an analysis pass in gcc which prints out
> undefined functions, and it prints out the following:
>
> undefined function __gcov_merge_add
> undefined function fopen
> undefined function printf
> undefined function __builtin_putchar
> undefined function calloc
> undefined function __gcov_merge_topn
> undefined function strtol
> undefined function free
> ... and more
>
> Now, I am not interested in the bodies of these. I am only interested in
> determining the type of the arguments passed to these functions.
> However, when I call the following function:
>
> ```
> void
> print_parameters_undefined_functions(const cgraph_node *cnode)
> {
>   gcc_assert(cnode);
>   gcc_assert(in_lto_p);
>   gcc_assert(!cnode->definition);
>
>   tree function = cnode->decl;
>   gcc_assert(function);
>   enum tree_code code = TREE_CODE (function);
>   bool is_function_decl = FUNCTION_DECL == code;
>   gcc_assert (is_function_decl);
>
>   log("about to print decl_arguments(%s)\n", cnode->name());
>   for (tree parm = DECL_ARGUMENTS (function); parm; parm = DECL_CHAIN(parm))
>   {
>     log("hello world\n");
>   }
> }
> ```
>
> I never see "hello world" but I do see "about to print...".
> Does anyone have any idea on how to obtain the arguments to undefined
> functions?

The argument types or the actual arguments to all calls?  "hello world" sounds
like you want actual arguments.  For those you need to look at the callgraph
edges to the cgraph node of the external functions (node->callers) and there
at the call stmts - which will not be available in WPA mode.

>
> The only way I see to do this, is to walk through the gimple
> instructions, find GIMPLE_CALL statements and look at the argument list
> at that moment. But I was wondering if there's a more efficient way to
> do it.
>
> Thanks!


Re: How to extend SLP to support this case

2020-03-13 Thread Richard Biener via Gcc
On Tue, Mar 10, 2020 at 12:32 PM Tamar Christina 
wrote:

>
> > -----Original Message-----
> > From: Gcc  On Behalf Of Richard Biener
> > Sent: Tuesday, March 10, 2020 11:12 AM
> > To: Kewen.Lin 
> > Cc: GCC Development ; Segher Boessenkool
> > 
> > Subject: Re: How to extend SLP to support this case
> >
> > On Tue, Mar 10, 2020 at 7:52 AM Kewen.Lin  wrote:
> > >
> > > Hi all,
> > >
> > > I'm investigating whether GCC can vectorize the below case on ppc64le.
> > >
> > >   extern void test(unsigned int t[4][4]);
> > >
> > >   void foo(unsigned char *p1, int i1, unsigned char *p2, int i2)
> > >   {
> > > >     unsigned int tmp[4][4];
> > > >     unsigned int a0, a1, a2, a3;
> > > >
> > > >     for (int i = 0; i < 4; i++, p1 += i1, p2 += i2) {
> > > >       a0 = (p1[0] - p2[0]) + ((p1[4] - p2[4]) << 16);
> > > >       a1 = (p1[1] - p2[1]) + ((p1[5] - p2[5]) << 16);
> > > >       a2 = (p1[2] - p2[2]) + ((p1[6] - p2[6]) << 16);
> > > >       a3 = (p1[3] - p2[3]) + ((p1[7] - p2[7]) << 16);
> > > >
> > > >       int t0 = a0 + a1;
> > > >       int t1 = a0 - a1;
> > > >       int t2 = a2 + a3;
> > > >       int t3 = a2 - a3;
> > > >
> > > >       tmp[i][0] = t0 + t2;
> > > >       tmp[i][2] = t0 - t2;
> > > >       tmp[i][1] = t1 + t3;
> > > >       tmp[i][3] = t1 - t3;
> > > >     }
> > > >     test(tmp);
> > >   }
> > >
> > > > With unlimited costs, I saw loop aware SLP can vectorize it but with
> > > > very inefficient code.  It builds the SLP instance from store group
> > > > {tmp[i][0] tmp[i][1] tmp[i][2] tmp[i][3]}, builds nodes {a0, a0, a0,
> > > > a0}, {a1, a1, a1, a1}, {a2, a2, a2, a2}, {a3, a3, a3, a3} after
> > > > parsing operands for tmp* and t*.  It means it's unable to make the
> > > > isomorphic group for a0, a1, a2, a3, although they appear isomorphic
> > > > to merge.  Even if it can recognize the over_widening pattern and do
> > > > some parallel work for two a0 from two iterations, it's still
> > > > inefficient (high cost).
> > >
> > > In this context, it looks better to build  first by
> > > leveraging isomorphic computation trees constructing them, eg:
> > >   w1_0123 = load_word(p1)
> > >   V1_0123 = construct_vec(w1_0123)
> > >   w1_4567 = load_word(p1 + 4)
> > >   V1_4567 = construct_vec(w1_4567)
> > >   w2_0123 = load_word(p2)
> > >   V2_0123 = construct_vec(w2_0123)
> > >   w2_4567 = load_word(p2 + 4)
> > >   V2_4567 = construct_vec(w2_4567)
> > >   V_a0123 = (V1_0123 - V2_0123) + (V1_4567 - V2_4567)<<16
> > >
> > > But how to teach it to be aware of this? Currently the processing
> > > starts from bottom to up (from stores), can we do some analysis on the
> > > SLP instance, detect some pattern and update the whole instance?
> >
> > In theory yes (Tamar had something like that for AARCH64 complex
> > rotations IIRC).  And yes, the issue boils down to how we handle SLP
> > discovery.  I'd like to improve SLP discovery but it's on my list only
> > after I managed to get rid of the non-SLP code paths.  I have played with
> > some ideas (even produced hackish patches) to find "seeds" to form SLP
> > groups from using multi-level hashing of stmts.
>
> I still have this but missed the stage-1 deadline after doing the
> rewriting to C++ 😊
>
> We've also been looking at this and the approach I'm investigating now is
> trying to get the SLP codepath to handle this after it's been fully
> unrolled. I'm looking into whether the build-slp can be improved to work
> for the group size == 16 case that it tries but fails on.
>
> My intention is to see if doing so would make it simpler to recognize this
> as just 4 linear loads and two permutes. I think the loop aware SLP will
> have a much harder time with this seeing the load permutations it thinks it
> needs because of the permutes caused by the +/- pattern.
>
> One idea I had before was from your comment on the complex number patch,
> which is to try and move up TWO_OPERATORS and undo the permute always when
> doing +/-. This would simplify the load permute handling and if a target
> doesn't have an instruction to support this it would just fall back to
> doing an explicit permute after the loads.  But I wasn't sure this approach
> would get me the results I wanted.
>
> In the end you don't want a loop here at all. And in order to do the above
> with TWO_OPERATORS I would have to let the SLP pattern matcher be able to
> reduce the group size and increase the number of iterations during the
> matching, otherwise the matching itself becomes quite difficult in certain
> cases.
>

Just to show where I'm heading I'm attaching current work-in-progress that
introduces an explicit SLP merge node and implements SLP_TREE_TWO_OPERATORS
that way:

   v1 + v2   v1 - v2
        \     /
         merge

The SLP merge operation concatenates its operands in order and then applies
a lane permutation mask (basically a select from the input lanes).  The actual
patch depends on earlier cleanups, and more preparatory changes are in order to
make
it "nice".  With such SLP merge operation in place it sho
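
For readers following along, here is a minimal standalone sketch in plain C
of what a merge of this kind computes: concatenate the lanes of the two
operands in order, then select output lanes through a permutation mask.  It
is only a toy model and not the work-in-progress patch; merge_lanes, addsub
and LANES are made-up names, and GCC's actual SLP data structures are not
used.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Toy model of a "merge": concatenate the lanes of A and B in order, then
   take output lane i from position mask[i] of that concatenation.  */
static void
merge_lanes (const int a[LANES], const int b[LANES],
             const size_t mask[LANES], int out[LANES])
{
  int concat[2 * LANES];
  for (size_t i = 0; i < LANES; i++)
    {
      concat[i] = a[i];
      concat[LANES + i] = b[i];
    }
  for (size_t i = 0; i < LANES; i++)
    {
      assert (mask[i] < 2 * LANES);
      out[i] = concat[mask[i]];
    }
}

/* A two-operator node {+, -, +, -}: compute v1 + v2 and v1 - v2 in full,
   then merge with a mask that takes even lanes from the sum and odd lanes
   from the difference.  */
static void
addsub (const int v1[LANES], const int v2[LANES], int out[LANES])
{
  static const size_t mask[LANES] = { 0, LANES + 1, 2, LANES + 3 };
  int sum[LANES], diff[LANES];
  for (size_t i = 0; i < LANES; i++)
    {
      sum[i] = v1[i] + v2[i];
      diff[i] = v1[i] - v2[i];
    }
  merge_lanes (sum, diff, mask, out);
}
```

Keeping the permutation in a separate merge step means both the sum and the
difference stay plain lane-wise operations; only the mask encodes which lane
comes from which operand.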

Re: Thought on inlining indirect function calls

2020-03-14 Thread Richard Biener via Gcc
On March 14, 2020 10:55:09 AM GMT+01:00, "FRÉDÉRIC RECOULES" 
 wrote:
>Hello the GCC community,
>I just want to share some thoughts on inlining a function even if
>it is called through a function pointer.
>My starting point is version 9.2 (used at https://godbolt.org/),
>so I am sorry if something similar has already been discussed since.
>
>
>For context, I got very excited when I discovered the (not so new
>but not yet really used) Link Time Optimization, and I started to play
>with it to put the inlining capabilities to the test.
>I will assume however that LTO is just an enabler, so examples can be
>simplified by writing everything in the same file and activating
>whole-program optimization.
>
>
>To make my remarks concrete, I will rely on the following (dumb but
>inspired by real software) example compiled with -O3 -fwhole-program:
>
>int (*f) (int, int);
>
>static int f_add (int x, int y)
>{
>    return x + y;
>}
>
>static int f_sub (int x, int y)
>{
>    return x - y;
>}
>
>enum f_e { ADD, SUB };
>void f_init(enum f_e op) {
>    switch (op) {
>    case ADD:
>        f = &f_add;
>        break;
>    case SUB:
>        f = &f_sub;
>    }
>}
>
>STEP 1: statically known at function call site
>
>#include <stdio.h>
>#include <stdlib.h>
>
>int main (int argc, char *argv[])
>{
>    int x, y, z;
>    f_init(ADD);
>    if (argc < 3) return -1;
>    x = atoi(argv[1]);
>    y = atoi(argv[2]);
>    z = f(x, y);
>    printf("%d\n", z);
>    return 0;
>}
>
>I was pretty disappointed to see that even if the compiler knows we are
>calling f_add, it doesn't inline the call (it ends up with "call
>f_add").

It's probably because we know it's only called once and thus not performance
relevant. Try putting it into a loop.

Richard. 
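
A minimal sketch of what "in a loop" could look like, reusing the
definitions from the quoted example.  The helper sum_through_pointer is
hypothetical and not part of the original test case.  Whether a given GCC
version then inlines the call is not claimed here; that would have to be
checked in the generated assembly (for instance with -O3 -fwhole-program -S
once a main is added).

```c
#include <assert.h>

int (*f) (int, int);

static int f_add (int x, int y) { return x + y; }
static int f_sub (int x, int y) { return x - y; }

enum f_e { ADD, SUB };

void f_init (enum f_e op)
{
  switch (op)
    {
    case ADD: f = &f_add; break;
    case SUB: f = &f_sub; break;
    }
}

/* Hypothetical helper: calls through F on every iteration, so the
   indirect call site sits inside a loop instead of being executed once.  */
long sum_through_pointer (int n, int y)
{
  long acc = 0;
  assert (f != 0);
  for (int i = 0; i < n; i++)
    acc += f (i, y);
  return acc;
}
```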

>I can only suppose it is because its address is taken, and from a
>blind black box user perspective, it doesn't sound too difficult to
>completely inline it.
>
>STEP 2: statically known as being among a pool of less than
>(arbitrarily fixed = 2) N functions
>
>#include <stdio.h>
>#include <stdlib.h>
>#include <string.h>
>
>int main (int argc, char *argv[])
>{
>    int x, y, z;
>    enum f_e e;
>    if (argc < 4) return -1;
>    if (strcmp(argv[1], "add") == 0)
>        e = ADD;
>    else if (strcmp(argv[1], "sub") == 0)
>        e = SUB;
>    else return -1;
>    f_init(e);
>    x = atoi(argv[2]);
>    y = atoi(argv[3]);
>    z = f(x, y);
>    printf("%d\n", z);
>    return 0;
>}
>
>Here the compiler can't know at compile time the function that will be
>called but I suppose that it knows that it will be either f_add or
>f_sub.
>A simple work around would be for the compiler to test at the call site
>the value of f and inline the call thereafter:
>
>if (f == &f_add)
>    z = f_add(x, y);
>else if (f == &f_sub)
>    z = f_sub(x, y);
>else __builtin_unreachable(); /* or z = f(x, y) to be conservative */
>
>Once again, this transformation doesn't sound too complicated to
>implement.
>Still, it is easy to say so without diving into the compiler's code.
>
>
>I hope it will assist you in your reflections,
>Have a nice day,
>Frédéric Recoules



Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level

2020-03-17 Thread Richard Biener via Gcc
On Tue, Mar 17, 2020 at 3:33 PM Aditya K via Gcc  wrote:
>
> As I understand the openmp outliner is also at the tree level. A region based 
> outliner could be reused there. I’m not particular about the outliner being 
> specific to ipa-split. A GSoC project can help us get the coding+testing 
> done.  Any pass that needs a function splitting at tree level can reuse them.

There's a SESE region outliner (it also handles SEME regions with
alternate exits exiting the function), in move_sese_region_to_fn.
Probably exactly what would be needed for this.  It's also used by
OpenMP outlining (but not IPA split as Honza said).

Richard.

> -Aditya
> 
> From: Jakub Jelinek 
> Sent: Monday, March 16, 2020 5:19:16 PM
> To: Aditya K 
> Cc: Jan Hubicka ; gcc@gcc.gnu.org 
> Subject: Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
>
> On Mon, Mar 16, 2020 at 11:11:14PM +, Aditya K via Gcc wrote:
> > >
> > > 2) ipa-split is very simplistic and only splits when there is no value
> > >    computed in header of function used in the tail.  We should support
> > >    adding extra parameters for values computed and do more general SESE
> > >    outlining
> > >    Note that we do SESE outlining for openMP but this code is not
> > >    interfaced very generically to be easily used by ipa-split.
> >
> > This sounds like a good GSoC project to work on. We could have a SESE/SEME 
> > based ipa-split, that
> > could help with function splitting as well as openMP.
>
> No, OpenMP region outlining needs to be done where it is done currently,
> ipa-split is way too late for that.
>
> Jakub
>


Re: Not usable email content encoding

2020-03-19 Thread Richard Biener via Gcc
On Thu, Mar 19, 2020 at 2:28 PM Florian Weimer  wrote:
>
> * Tom Tromey:
>
> > Also, gerrit was pretty bad about threading messages, so it became quite
> > hard to follow progress in email (but following all patches in the web
> > interface is very difficult, a problem shared by all these web UIs).
>
> What I found most disappointing was that the web interface doesn't
> guide you to the next reasonable step for your reviews and patches,
> like showing comments which need addressing.  Tagging messages in an
> email client for later action actually works better than that, I think.

I guess if anything we'd want something git-centric now like github
or gitlab pull requests & reviews.  The only complication is approval
then which would still mean manual steps.  Patch review would also not
be publicly visible and archived(?) so both chiming in late after visible
progress and archeology would be harder.  I think following all
patch reviews by clicking on websites rather than watching gcc-patches
is impractical.

Richard.


Re: Vectorization Messages

2020-03-24 Thread Richard Biener via Gcc
On March 24, 2020 5:45:05 PM GMT+01:00, Roger Martz via Gcc  
wrote:
>I was glad to see that compiler flags such as -fopt-info-vec-missed ...
>provide information about what is happening under the hood w.r.t. code
>that can and can't be vectorized.
>
>Can anyone point me to a document, etc. that would be helpful in
>understanding what the messages output from the compiler mean?  Most
>are not obvious.

There is no documentation besides the source unfortunately... 

Richard. 

>Thanks.
>
>Roger



Re: Question on lto-stream-out

2020-03-26 Thread Richard Biener via Gcc
On Thu, Mar 26, 2020 at 12:01 PM lizekun (A)  wrote:
>
> Hi,
> I have a question on function "get_symbol_initial_value" in lto-stream-out.c.
>
> When the initial value of a symbol is a constructor, it will be replaced by an
> error_mark.
> What's the benefit of doing this? In some cases, it increases the size of the
> binary.

constructors are streamed into a separate section, the above just "marks" the
variable that there is one available.

> I would be grateful if anyone could help.
>
> Best regards


Re: Question on lto-stream-out

2020-03-26 Thread Richard Biener via Gcc
On Thu, Mar 26, 2020 at 2:00 PM lizekun (A)  wrote:
>
> Thanks for replying!
>
> I've dumped the lto stream-out file and noticed that constructors are
> streamed to function_body.
> Also, I noticed that the binary generated when streaming constructors
> to function_body is bigger than the one generated when streaming them to
> decls.
>
> So, I wonder why we need to stream constructors to another section.

Because we then can ship them selectively to only the LTRANS units
that can make use of them saving in streaming for other LTRANS units.

Richard.

> Best regards!
>
> > -----Original Message-----
> > From: Richard Biener [mailto:richard.guent...@gmail.com]
> > Sent: 2020年3月26日 20:27
> > To: lizekun (A) 
> > Cc: gcc@gcc.gnu.org
> > Subject: Re: Question on lto-stream-out
> >
> > On Thu, Mar 26, 2020 at 12:01 PM lizekun (A)  wrote:
> > >
> > > Hi,
> > > I have a question on function "get_symbol_initial_value" in
> > > lto-stream-out.c.
> > >
> > > When the initial value of a symbol is a constructor, it will be replaced
> > > by an error_mark.
> > > What's the benefit of doing this? In some cases, it increases the size
> > > of the binary.
> >
> > constructors are streamed into a separate section, the above just "marks"
> > the variable that there is one available.
> >
> > > I would be grateful if anyone could help.
> > >
> > > Best regards


Re: [QUESTION] About RTL optimization at forward propagation

2020-03-30 Thread Richard Biener via Gcc
On Sat, Mar 28, 2020 at 4:19 AM xiezhiheng  wrote:
>
> Hi,
>   I find there are some restrictions in function fwprop that prevent it from
> forward propagating addresses into loops.
> /* Go through all the uses.  df_uses_create will create new ones at the
>    end, and we'll go through them as well.
>
>    Do not forward propagate addresses into loops until after unrolling.
>    CSE did so because it was able to fix its own mess, but we are not.  */
> for (i = 0; i < DF_USES_TABLE_SIZE (); i++)
>   {
>     if (!propagations_left)
>       break;
>
>     df_ref use = DF_USES_GET (i);
>     if (use)
>       {
>         if (DF_REF_TYPE (use) == DF_REF_REG_USE
>             || DF_REF_BB (use)->loop_father == NULL
> <<
>             /* The outer most loop is not really a loop.  */
>             || loop_outer (DF_REF_BB (use)->loop_father) == NULL)
>           forward_propagate_into (use, fwprop_addr_p);
>
>         else if (fwprop_addr_p)
>           forward_propagate_into (use, false);
>       }
>   }
>
>   And I have two questions.
>   1) What are the reasons or background for not forward propagating addresses 
> into loops ?
>   2) Can we still forward propagate addresses if the def and use are in the 
> same loop ?
>   I mean something like:

The condition indeed looks odd; the canonical way would be to check

  !flow_loop_nested_p (DF_REF_BB (def)->loop_father,
                       DF_REF_BB (use)->loop_father)

which would allow propagating addresses defined in loops outside as well.
And loop_father should never be NULL I think.
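
To make the intent of such a check concrete, here is a small standalone
sketch on a toy loop tree.  It is not GCC code: struct toy_loop,
toy_loop_nested_p and may_propagate_address_p are made-up names, and the
sketch assumes the nesting predicate means "strictly nested inside".

```c
#include <assert.h>
#include <stddef.h>

/* Toy loop tree: each loop only records its immediately enclosing loop.
   This is not GCC's loop structure.  */
struct toy_loop
{
  const struct toy_loop *outer;   /* NULL for the outermost pseudo-loop.  */
};

/* Return 1 if LOOP is strictly nested inside OUTER, else 0.  */
static int
toy_loop_nested_p (const struct toy_loop *outer, const struct toy_loop *loop)
{
  assert (outer != NULL && loop != NULL);
  for (const struct toy_loop *l = loop->outer; l != NULL; l = l->outer)
    if (l == outer)
      return 1;
  return 0;
}

/* Allow propagating an address from DEF_LOOP to USE_LOOP unless the use
   sits in a loop nested inside the loop of the definition.  */
static int
may_propagate_address_p (const struct toy_loop *def_loop,
                         const struct toy_loop *use_loop)
{
  return !toy_loop_nested_p (def_loop, use_loop);
}
```

With a chain root > l1 > l2, this permits def and use in the same loop and
a def in l2 feeding a use in l1, but rejects a def in l1 feeding a use in
l2.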

> diff -Nurp a/gcc/fwprop.c b/gcc/fwprop.c
> --- a/gcc/fwprop.c  2020-03-27 03:17:50.70400 -0400
> +++ b/gcc/fwprop.c  2020-03-27 04:58:35.14800 -0400
> @@ -1573,10 +1573,12 @@ fwprop (bool fwprop_addr_p)
>df_ref use = DF_USES_GET (i);
>if (use)
> {
> + df_ref def = get_def_for_use (use);
>   if (DF_REF_TYPE (use) == DF_REF_REG_USE
>   || DF_REF_BB (use)->loop_father == NULL
>   /* The outer most loop is not really a loop.  */
> - || loop_outer (DF_REF_BB (use)->loop_father) == NULL)
> + || loop_outer (DF_REF_BB (use)->loop_father) == NULL
> + || (def && DF_REF_BB (def)->loop_father == DF_REF_BB (use)->loop_father))
> forward_propagate_into (use, fwprop_addr_p);
>
>   else if (fwprop_addr_p)
>
> I would be grateful if anyone could help.
>
> Best regards


Re: Question about undefined functions' parameters during LTO

2020-04-07 Thread Richard Biener via Gcc
On Tue, Apr 7, 2020 at 1:54 PM Erick Ochoa
 wrote:
>
> Hello Michael,
>
> Thanks for this lead! It is almost exactly what I need. I do have one
> more question about this. It seems that the types obtained via
> FOREACH_FUNCTION_ARGS and TREE_TYPE are different pointers when
> compiled with -flto.
>
> What do I mean by this? Consider the following code:
>
> #include <stdio.h>
> int main(){
>    FILE *f = fopen("hello.txt", "w");
>    fclose(f);
>    return 0;
> }
>
> The trees corresponding to types FILE* and FILE obtained via the
> variable f are different from the trees obtained from the argument to
> fclose.
>
> Let's say that we have a gcc pass with the following global variables:
>
> tree _local_file_ptr_type;
> tree _local_file_type;
> tree _glibc_file_ptr_type;
> tree _glibc_file_type;
>
> These variables will hold the trees that correspond to types:
>
> * FILE* and FILE, obtained via TREE_TYPE(f)
> * FILE* and FILE, obtained via FOREACH_FUNCTION_ARGS(fclose_func,i,t)
>
> And these variables will be compared using pointer equality. Here, we
> can print the address of the variables to find out their values.
>
> log("%p =?= %p\n", _local_file_ptr_type, _glibc_file_ptr_type);
> log("%p =?= %p\n", _local_file_type, _glibc_file_type);
>
> When the simple C program is compiled via:
> /path/to/gcc a.c -fdump-ipa-hello-world -fipa-hello-world
> we see that the pointers are the same.
>
> pointers 0x7a8dcb70 =?= 0x7a8dcb70
> records 0x7a8dbfa0 =?= 0x7a8dbfa0
>
> However, when we are compiling the simple C program via
> /path/to/gcc -flto a.c -fdump-ipa-hello-world -fipa-hello-world
> /path/to/gcc -flto -flto-partition=none -fipa-hello-world a.c -o a.out
> one can see that the pointers are different:
>
> pointers 0x79ee1c38 =?= 0x79ee0b28
> records 0x79ee1b90 =?= 0x79ee0a80
>
> Do you, or anyone else for that matter, know if it would be possible
> to keep the trees pointing to the same address? Or, in case it can be
> possible with some modifications, where could I start looking to modify
> the source code to make these addresses match? The other alternative for
> me would be to make my own type comparison function, which is something
> I can do. But I was wondering about this first.

You are probably running into the fact that we have a global
fileptr_type_node which may end up used by FE parsing(?) in
addition to uses from builtins (there's no builtin for fclose).
We're keeping those equal to that node through LTO
via preload_common_nodes but any others may or may not be
merged (but we never merge [into] common nodes!).

That said, you cannot really rely on pointer equality of type trees.
Consider for example different levels of "completion" of the "same"
type in different TUs:

TU1:
struct S { struct B *b; };

TU2:
struct B {};
struct S { struct B *b; };

the S types are equal but they will _not_ share the same type node
(not reliably so - you might find that we indeed do unify the nodes
by means of making all pointers point to incomplete variants of aggregates)
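
As a toy illustration of why pointer equality is the wrong test, the sketch
below models the two TUs above with separately allocated nodes and compares
them structurally, treating an incomplete record as matching a complete one
with the same tag.  None of this is GCC's tree API: struct toy_type,
toy_types_match and the tu1_*/tu2_* objects are invented for the example,
and the recursion does not handle self-referential types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_kind { TOY_RECORD, TOY_POINTER };

struct toy_type
{
  enum toy_kind kind;
  const char *tag;               /* Record tag; NULL for pointers.  */
  int complete;                  /* Record: is the body known?  */
  const struct toy_type *ref;    /* Pointer: pointee.  Record: type of its
                                    single field, or NULL if it has none.  */
};

/* Structural comparison: same kind, same tag, and matching fields when
   both records are complete.  */
static int
toy_types_match (const struct toy_type *a, const struct toy_type *b)
{
  if (a == b)
    return 1;
  if (a == NULL || b == NULL || a->kind != b->kind)
    return 0;
  if (a->kind == TOY_POINTER)
    return toy_types_match (a->ref, b->ref);
  assert (a->tag != NULL && b->tag != NULL);
  if (strcmp (a->tag, b->tag) != 0)
    return 0;
  if (!a->complete || !b->complete)
    return 1;   /* An incomplete record matches by tag alone.  */
  return toy_types_match (a->ref, b->ref);
}

/* TU1: struct S { struct B *b; };  B is only forward-declared.  */
static const struct toy_type tu1_B = { TOY_RECORD, "B", 0, NULL };
static const struct toy_type tu1_B_ptr = { TOY_POINTER, NULL, 0, &tu1_B };
static const struct toy_type tu1_S = { TOY_RECORD, "S", 1, &tu1_B_ptr };

/* TU2: struct B {}; struct S { struct B *b; };  */
static const struct toy_type tu2_B = { TOY_RECORD, "B", 1, NULL };
static const struct toy_type tu2_B_ptr = { TOY_POINTER, NULL, 0, &tu2_B };
static const struct toy_type tu2_S = { TOY_RECORD, "S", 1, &tu2_B_ptr };
```

Here &tu1_S and &tu2_S are different addresses, yet toy_types_match reports
the two S nodes as the same type.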

> Here is the patch necessary for running this hello world pass. The
> interesting part of the code is in ipa-hello-world.c . The utils file is
> only used for printing out a human readable name.
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index fa9923bb270..1c0fef5c8a2 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1399,6 +1399,8 @@ OBJS = \
> incpath.o \
> init-regs.o \
> internal-fn.o \
> +   ipa-hello-world.o \
> +   ipa-str-reorg-utils.o \
> ipa-cp.o \
> ipa-sra.o \
> ipa-devirt.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4368910cb54..d61498d722c 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3392,4 +3392,8 @@ fipa-ra
>   Common Report Var(flag_ipa_ra) Optimization
>   Use caller save register across calls if possible.
>
> +fipa-hello-world
> +Common Report Var(flag_ipa_hello_world) Optimization
> +TBD
> +
>   ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/ipa-hello-world.c b/gcc/ipa-hello-world.c
> new file mode 100644
> index 000..41cab07c357
> --- /dev/null
> +++ b/gcc/ipa-hello-world.c
> @@ -0,0 +1,192 @@
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple-expr.h"
> +#include "predict.h"
> +#include "alloc-pool.h"
> +#include "tree-pass.h"
> +#include "cgraph.h"
> +#include "diagnostic.h"
> +#include "fold-const.h"
> +#include "gimple-fold.h"
> +#include "symbol-summary.h"
> +#include "tree-vrp.h"
> +#include "ipa-prop.h"
> +#include "tree-pretty-print.h"
> +#include "tree-inline.h"
> +#include "ipa-fnsummary.h"
> +#include "ipa-utils.h"
> +#include "tree-ssa-ccp.h"
> +#include "stringpool.h"
> +#include "attribs.h"
> +
> +
> +#include 
> +#include 
> +
> +#include "ipa-str-reorg-utils.h"
> +
> +inline static void
> +log(const char* format, ...)
> +{
> +  va_list args;
> +  va_start(ar

Re: Question about undefined functions' parameters during LTO

2020-04-07 Thread Richard Biener via Gcc
On Tue, Apr 7, 2020 at 2:41 PM Erick Ochoa
 wrote:
>
>
>
> On 07/04/2020 14:34, Michael Matz wrote:
> > Hello,
> >
> > On Tue, 7 Apr 2020, Erick Ochoa wrote:
> >
> >> Thanks for this lead! It is almost exactly what I need. I do have one more
> >> question about this. It seems that the types obtained via
> >> FOREACH_FUNCTION_ARGS and TREE_TYPE are different pointers when compiled
> >> with -flto.
> >>
> >> What do I mean by this? Consider the following code:
> >>
> >> #include <stdio.h>
> >> int main(){
> >>    FILE *f = fopen("hello.txt", "w");
> >>    fclose(f);
> >>    return 0;
> >> }
> >>
> >> The trees corresponding to types FILE* and FILE obtained via the variable f
> >> are different from the trees obtained from the argument to fclose.
> >
> > Yes, quite possible.
> >
> >> However, when we are compiling the simple C program via
> >> /path/to/gcc -flto a.c -fdump-ipa-hello-world -fipa-hello-world
> >> /path/to/gcc -flto -flto-partition=none -fipa-hello-world a.c -o a.out
> >> one can see that the pointers are different:
> >>
> >> pointers 0x79ee1c38 =?= 0x79ee0b28
> >> records 0x79ee1b90 =?= 0x79ee0a80
> >>
> >> Do you, or anyone else for that matter, know if it would be possible to
> >> keep the trees pointing to the same address? Or, in case it can be
> >> possible with some modifications, where could I start looking to modify
> >> the source code to make these addresses match? The other alternative for
> >> me would be to make my own type comparison function, which is something
> >> I can do. But I was wondering about this first.
> >
> > So, generally type equality can't be established by pointer equality in
> > GCC, even more so with LTO; there are various reasons why the "same" type
> > (same as in language equality) is represented by different trees, and
> > those reasons are amplified with LTO.  We try to unify some equal types to
> > the same trees when reading in LTO bytecode, but that's only an
> > optimization mostly.
> >
> > So, when you want to compare types use useless_type_conversion_p (for
> > equivalence you need useless(a,b) && useless(b,a)).  In particular, for
> > record types T it's TYPE_CANONICAL(T) that needs to be pointer-equal.
> > (I.e. you could hard-code that as well, but it's probably better to use
> > the existing predicates we have).  Note that useless_type_conversion_p is
> > for the middle-end type system (it's actually one part of the definition
> > of that type system), i.e. it's language agnostic.  If you need language
> > specific equality you would have to use a different approach, but given
> > that you're adding IPA passes you probably don't worry about that.
>
> I've been using TYPE_MAIN_VARIANT(T) as opposed to TYPE_CANONICAL(T).
> This was per the e-mail thread:
> https://gcc.gnu.org/legacy-ml/gcc/2020-01/msg00077.html .

TYPE_CANONICAL (which might be NULL!) is conservative on the side
of making types equal when they are not, and is used for type-based alias
analysis where erring that way is conservatively correct but not optimal.

TYPE_MAIN_VARIANT (which should never be NULL) is conservative on the
side of making types distinct when they are not, and is not really used in the
middle-end except as a means of caching variants of types (aka const qualified
vs. unqualified).  Erring in making types distinct when they are not is then
merely a missed caching opportunity.

ISTR having said that being conservative on both sides is impossible.
More so with LTO and absolutely when multiple source languages are
involved.  Which is why I chose to dismiss the idea of using "type escape".

> I am not 100% sure what the differences are between these two yet, but I
> think TYPE_CANONICAL(T) was not helpful because of typedefs? I might be
> wrong here, it has been a while since I did the test to see what worked.
>
> Using TYPE_MAIN_VARIANT(T) has gotten us far in an optimization we are
> working on, but I do think that a custom type comparison is needed now.
>
> I do not believe I can use useless_type_conversion_p because I need a
> partial order in order to place types in a set.
>
> >
> >
> > Ciao,
> > Michael.
> >


Re: Vectorization of loop which operate on local arrays

2020-04-14 Thread Richard Biener via Gcc
On Tue, Apr 14, 2020 at 4:39 PM Shubham Narlawar via Gcc
 wrote:
>
> Hello,
>
> I am working on gcc-4.9.4 and encountered different results of loop
> vectorization on array arr0, arr1 and arr2.
>
> Testcase -
>
> int main()
> {
>   int i;
>   for (i=0; i<64; i++)
>     {
>       arr2[i]=(arr1[i]|arr0[i]);
>     }
> }
>
> Using -O2 -ftree-vectorize, the above loop is vectorized if arr0, arr1,
> arr2 are global arrays whereas if they are local, the loop is not getting
> vectorized.
>
> 1. Is there any flag which will enable vectorization of loop which
> will operate on local array as well?

Generally vectorization does not care about this difference.  It might be
that alignment of local arrays is not sufficient (you do not say which
target you are working with).  Maybe you are restricted by the
vectorizer's default cost model at -O2 which is "cheap", try
-fvect-cost-model=dynamic or -O3.

> 2. Which part of code I need to tweak so that I would get
> vectorization on loops operating on local arrays or at least know why
> vectorization is not suggested for such loops?

You can look at the vectorizer debugging dumps output by
-fdump-tree-vect-details (in a file, source.1xxt.vect).  It can
be a bit overwhelming though.
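
A self-contained variant of the test case with local arrays, for comparing
the vectorizer dumps with and without the cost-model option.  This is only a
sketch: or_local_arrays, check_or_local_arrays and the file name local.c are
made up, and the arrays are filled and summed so that the loop is not
trivially dead.  No particular vectorization outcome is claimed; the dump
file is what tells you.

```c
/* Possible invocations (dump ends up in a *.vect file next to the object):
     gcc -c -O2 -ftree-vectorize -fdump-tree-vect-details local.c
     gcc -c -O2 -ftree-vectorize -fvect-cost-model=dynamic \
         -fdump-tree-vect-details local.c  */
#include <assert.h>

#define N 64

/* Same OR loop as in the report, but on arrays local to the function.  */
unsigned
or_local_arrays (unsigned seed)
{
  unsigned arr0[N], arr1[N], arr2[N];
  unsigned sum = 0;
  int i;

  for (i = 0; i < N; i++)
    {
      arr0[i] = seed + (unsigned) i;
      arr1[i] = 2u * (unsigned) i;
    }

  for (i = 0; i < N; i++)
    arr2[i] = (arr1[i] | arr0[i]);

  for (i = 0; i < N; i++)
    sum += arr2[i];
  return sum;
}

/* Cross-check against a version that uses no arrays at all.  */
void
check_or_local_arrays (unsigned seed)
{
  unsigned expect = 0;
  for (unsigned i = 0; i < N; i++)
    expect += (2u * i) | (seed + i);
  assert (or_local_arrays (seed) == expect);
}
```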

I also strongly recommend updating to a newer version of GCC;
GCC 4.9 is more than 5 years old and no longer maintained.

Richard.

> Thanks and Regards
> Shubham


Re: [RFC, doloop] How to deal with invariants introduced by doloop

2020-04-17 Thread Richard Biener via Gcc
On Fri, Apr 17, 2020 at 1:10 PM Kewen.Lin via Gcc  wrote:
>
> Hi all,
>
> This is a question originating from PR61837, which shows that the doloop
> pass could introduce some loop invariants against its outer loop when
> doloop performs and prepares the iteration count for hardware
> count register assignment.
>
> Even with Xionghu's simplification patch
> https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543887.html,
> we still have one zero extension un-hoisted, and possibly much
> more if we don't have any chances to simplify it.
>
> The current related pass pipeline is:
>
> NEXT_PASS (pass_rtl_loop_init);
> NEXT_PASS (pass_rtl_move_loop_invariants);
> NEXT_PASS (pass_rtl_unroll_loops);
> NEXT_PASS (pass_rtl_doloop);
> NEXT_PASS (pass_rtl_loop_done);
>
> * How about tweaking pass order?
>
>   1. move pass_rtl_move_loop_invariants after pass_rtl_doloop?
>
> I don't know the historical reason on this order, but one no-go
> reason may be that it can impact following unroll, which is
> sensitive to #insn.  Hoisting can probably reduce loop body #insn,
> easy to break existing things without it there?

It's probably historical up to the point where we did not have the
SSA infrastructure and optimization passes.  I'd indeed be more
worried about invariants created later.

>   2. move pass_rtl_doloop before pass_rtl_move_loop_invariants?
>
> It looks impractical, pass_rtl_unroll_loops can update the loop
> niter which is required by doloop preparation, predicting the value
> correctly sounds impossible.
>
> * rerun hoisting pass?
>
>   3. run pass_rtl_move_loop_invariants again after pass_rtl_doloop?
>
> It looks practical? But currently on Power it can't hoist all the
> invariants. This is because the pseudo used for the count register
> isn't taken as "invariant" by pass_rtl_move_loop_invariants, since
> we still leave the doloop end pattern inside the loop, for example:
>
>   67: r143:SI=r119:DI#0-0x1
>   68: r142:DI=zero_extend(r143:SI)
>   69: r141:DI=r142:DI+0x1// count
>   ...
>   66: {pc={(r141:DI!=0x1)?L64:pc};r141:DI=r141:DI-0x1;
>clobber scratch;clobber scratch;}   // doloop_end
>
> though the doloop_end will be eliminated on Power eventually,
>
> I noticed that there is one hook doloop_begin, which looks to help
> ports to introduce loop count pseudo.  By supporting this for rs6000,
> I can obtain the insns I expected on Power.
>
> Original:
>12: NOTE_INSN_BASIC_BLOCK 4
>67: %9:SI=%5:SI-0x1   // invariant
>14: %8:DI=%4:DI-0x1
>68: %9:DI=zero_extend(%9:SI)  // invariant
>15: %10:DI=%3:DI
>69: %9:DI=%9:DI+0x1   // invariant
>82: ctr:DI=%9:DI
>   REG_DEAD %9:DI
>
> With rerun hoisting:
>
>12: NOTE_INSN_BASIC_BLOCK 4
>69: %8:DI=%5:DI+0x1  // invariant
>14: %10:DI=%4:DI-0x1
>84: ctr:DI=%8:DI
>   REG_DEAD %8:DI
>15: %9:DI=%3:DI
>
> With rerun hoisting + doloop_begin:
>
>12: NOTE_INSN_BASIC_BLOCK 4
>70: ctr:DI=%5:DI
>14: %10:DI=%4:DI-0x1
>15: %9:DI=%3:DI
>
> But I still had the concern that I'm not sure the current hoisting pass
> will consider register pressure, if the niter related invariants
> have some derived uses inside the loop, probably creating more
> register pressure here.
>
> Attached is the initial patch for this naive proposal, I'd like to post it
> first to see whether it's reasonable to proceed.  If it is, one thing
> to improve would be to guard it so that it only runs on the outer loops
> of loops where doloop succeeded, to save compilation time.
>
> Welcome any comments!

I suggest trying both approaches and counting the number of transforms
done in each instance.

Richard.

>
> BR,
> Kewen


Re: SH Port Status

2020-04-21 Thread Richard Biener via Gcc
On Mon, Apr 20, 2020 at 11:05 PM Jeff Law via Gcc  wrote:
>
> On Mon, 2020-04-20 at 15:29 -0500, Joel Sherrill wrote:
> >
> >
> > On Mon, Apr 20, 2020, 3:13 PM Jeff Law  wrote:
> > > On Mon, 2020-04-20 at 14:47 -0500, Joel Sherrill wrote:
> > > > Hi
> > > >
> > > > Over at RTEMS, we were discussing ports to deprecate/obsolete
> > > > and the SH seems to be on everyone's candidate list. I can't seem
> > > > to find any gcc test results sh-unknown-elf since 2009 and none
> > > > for sh-rtems. I know I posted some but when, I can't say. But the
> > > > new mailing list setup may be messing that up. I expected more
> > > > recent results.
> > > >
> > > > (1) Is my search right? Have there been no test results in 10 years?
> > > >
> > > > (2) Is the toolchain in jeopardy?
> > > >
> > > > (3) I know there was an effort to do an open implementation with
> > > > j-core.org but there is no News or download item newer than 2016.
> > > > Is this architecture effectively dead except for legacy hardware out
> > > > in the field (Sega?)
> > > >
> > > > I'm leaning to RTEMS dropping support for the SH after we branch
> > > > a release and wondering if the GCC community knows anything that
> > > > I don't.
> > > I'm not aware of the SH toolchain being in any jeopardy.
> > >
> > >
> > > I'm doing weekly bootstrap (yes, really) & regression tests for
> > > {sh4,sh4eb}-linux-gnu and daily builds of {sh3,sh3b}-linux-gnu.  See
> > >
> > > http://gcc.gnu.org/jenkins
> >
> > Awesome!
> > >
> > > The Linux kernel is currently broken, but I suspect it's a transient
> > > issue as it was fine until a week ago -- my tester usually builds the
> > > kernel too, but that's been temporarily disabled for SH targets.
> >
> > Thanks Jeff! Are you using the simulator in gdb? That's what we have a
> > BSP for?
> I'm using qemu -- its user mode emulation is strong enough that I can
> create a little sh4 native root filesystem and bootstrap GCC within it.
>
>
> >
> > We build the cross RTEMS tools regularly on Linux, Mac, FreeBSD, Mingw, and
> > Cygwin. All of our BSPs build including sh1 and the odd sh2e.
> >
> > Our BSP status for the gdb simulator is unknown. We replaced a lot of
> > testing infrastructure scripting and the SH hasn't gotten to the top of
> > the list.
> ACK.  In general, if there's a qemu solution, that's my preference these days.
> For the *-elf targets I usually have to fall back to the old gdb-sim bits.
>
> >
> > So we both are building a lot and making sure rot hasn't set in. But in
> > practice, is this worth the trouble anymore?
> I'm not sure about that ;-)  I haven't seen anyone suggest removal of the
> port or anything like that.  The port doesn't use CC0, so there's
> essentially zero chance it'll be deprecated for gcc-11.  I believe the
> port is not using LRA, so if/when we move on deprecating reload, SH might
> be at some degree of risk.

There are two listed maintainers as well (albeit at their anonymous
gcc.gnu.org domain).

Richard.

> jeff
> >
>


Re: Help implementing support for vec in gengtype

2020-04-21 Thread Richard Biener via Gcc
On Mon, Apr 20, 2020 at 11:45 PM Giuliano Belinassi
 wrote:
>
> Hi. Sorry for the late reply.
>
> On 03/02, Richard Biener wrote:
> > On Thu, Feb 27, 2020 at 6:56 PM Giuliano Belinassi
> >  wrote:
> > >
> > > Hi, all.
> > >
> > > I am trying to fix an issue with a global variable in the parallel gcc
> > > project. For this, I am trying to move some global variables from
> > > tree-ssa-operands to struct function. One of these variables is a
> > > vec type, and gengtype doesn't look so happy with it.
> >
> > I think the solution for this is to not move it to struct function
> > but instead have it local to function scope - the state is per
> > GIMPLE stmt we process and abstracting what is
> > cleaned up in cleanup_build_arrays() into a class we can
> > instantiate in the two callers is most appropriate.  In theory
> > all the functions that access the state could move into the
> > class as methods as well but you can pass down the state
> > everywhere needed as well.
>
> I implemented this strategy, but the issue remains. Therefore, the
> cause of it must be something else.

Btw, it would be nice to push those changes as cleanups during
next stage1.

> Just to contextualize, in [1], I also implemented parallelism in
> ParallelGcc using a pipeline method for testing, where I split the set
> of GIMPLE Intra Procedural Optimizations into multiple sets, and assign
> each set to a thread.  Then, the function passes through this pipeline.
>
> Now, I am trying to make this version pass the testsuite. There is a test
> in particular ('gcc.dg/20031102-1.c') that I am having difficulties
> finding the cause of the issue.
>
> If I run:
>
> /tmp/gcc10_parallel/usr/local/bin/gcc --param=num-threads=2 -O2 -c 20031102-1.c
>
> The crash message is:
>
> ```
>  type  align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
> 0x7f44334fb000
> pointer_to_this >
> used static ignored external VOID :0:0
> align:8 warn_if_not_align:0>
>
> In function ‘main’:
> cc1: error: virtual use of statement not up to date
> # VUSE <.MEM_1(D)>
> _2 = FooBar ();
> during GIMPLE pass: walloca

^^^

So this pass isn't supposed to change anything, which means you're
probably missing some global state in the verify_ssa checker.  Notably the
check that triggers the verification error looks at the underlying .MEM
symbol (a VAR_DECL).  Since your message above only shows one var_decl,
build_vuse must be NULL somehow.

Now it could be that the pass_local_pure_const (the late one) changes 'FooBar'
to be const which means it wouldn't get a virtual operand anymore.  Looking
at the testcase that's likely the issue here.

That's a "tough one" and would ask for the const/pure-ness of call stmts
to be encoded in the call stmt itself (this issue is also one reason for
the fixup_cfg passes we have).  So instead of looking at the decl we'd
track this via a gimple_call_flags () flag and update that from the decl
at known points (for example when updating SSA operands (sic!) but
exactly not when just verifying them).

So for your branch try adding a verifying_p member to the class
and when verifying instead of

  /* If aliases have been computed already, add VDEF or VUSE
     operands for all the symbols that have been found to be
     call-clobbered.  */
  if (!(call_flags & ECF_NOVOPS))
    {
      /* A 'pure' or a 'const' function never call-clobbers anything.  */
      if (!(call_flags & (ECF_PURE | ECF_CONST)))
        add_virtual_operand (fn, stmt, opf_def);
      else if (!(call_flags & ECF_CONST))
        add_virtual_operand (fn, stmt, opf_use);
    }

rely on existing vuse/vdef like

  if (verifying_p)
    {
      /* ???  Track const/pure/novop-ness in gimple call flags.  */
      if (gimple_vdef (stmt))
        add_virtual_operand (...);
      else if (gimple_vuse (stmt))
        add_virtual_operand (...);
      return;
    }

and call it a day ;)

> cc1: internal compiler error: verify_ssa failed
> 0xfdb8fe verify_ssa(bool, bool)
> ../../gcc/gcc/tree-ssa.c:1208
> 0xcd3d08 execute_function_todo
> ../../gcc/gcc/passes.c:2017
> 0xcd49f2 execute_todo
> ../../gcc/gcc/passes.c:2064
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> ```
>
> Which is triggered by tree-ssa-operands.c:1066 in this branch, checking
> if build_vuse != use. Interestingly, this crash does not happen if:
>
> 1 - I set the number of threads to 1.
> 2 - I set the optimization level to O0, O1 or O3.
> 3 - I disable O2, but enable all flags enabled by O2
> (gcc -O2 -Q --help=optimizer).
> 4 - I leave the first 115 passes in the same thread with a parameter I
> implemented (--param=num-threads=2 --param=break=116). Any value
> smaller than this causes the issue.
>
> The crash is also consistent, which means that it happens 100% of the time.
>
> Any light concerning this issue is welcome.

Re: [RFC, doloop] How to deal with invariants introduced by doloop

2020-04-21 Thread Richard Biener via Gcc
On Tue, Apr 21, 2020 at 10:42 AM Kewen.Lin  wrote:
>
> on 2020/4/17 下午7:32, Richard Biener wrote:
> > On Fri, Apr 17, 2020 at 1:10 PM Kewen.Lin via Gcc  wrote:
> >>
> >> Hi all,
> >>
> >> This question originates from PR61837, which shows that the doloop
> >> pass can introduce loop invariants with respect to its outer loop when
> >> doloop applies and prepares the iteration count for the hardware
> >> count register assignment.
> >>
> > I suggest trying both approaches and counting the number of transforms
> > done in each instance.
> >
> > Richard.
> >
>
> Hi Richi,
>
> Thanks for the suggestion, some statistics were collected as below.
>
> A: default
> B: move pass_rtl_move_loop_invariants after pass_rtl_doloop
> C: rerun pass_rtl_move_loop_invariants after pass_rtl_doloop
> D: C + doloop_begin
>
> Ran by bootstrapping and regression testing on ppc64le Power8 configured
> with languages c,c++,fortran,objc,obj-c++,go.
>
> Counting move #transformations in function move_invariant_reg (before
> label fail, probably better with inv == repr to filter out those
> replacements with rep, but I think the trend is similar?).
>
> A: 802650
> B: 841476
> C: 803485 (C1) + 827883 (C2)
> D: 802622 (D1) + 841476 (D2)
>
> Let's call pass_rtl_move_loop_invariants "hoisting".
> PS: C1/D1 are the 1st hoisting run, C2/D2 the 2nd.
> The small differences (~0.1%) among A/C1/D1 should be caused by noise.
>
> The numbers for the two-run variants (C/D) are almost twice those of a
> single run, which surprised me.  On further investigation, it looks like
> the current pass_rtl_move_loop_invariants needs improving if we want
> to rerun it.  Take gcc/testsuite/gfortran.dg/inline_matmul_16.f90 at
> -O1 as an example: C1 does 178 transforms and C2 does 165.  This is
> unrelated to the unroll/doloop passes; the result doesn't change when
> they are disabled explicitly.
>
> Currently, without flag_ira_loop_pressure, the regs_used estimation
> isn't good.  I'd expect invariants hoisted out of the loop the first
> time to be counted as regs_used by the next regs_used analysis.
> Checking regs_used, it's set to 2 for all loops of inline_matmul_16,
> in both C1 and C2.  I think this leads the 2nd hoisting to estimate
> register pressure optimistically and hoist more.  With a simple hack
> that accounts for the new_reg of the 1st hoisting, the 2nd hoisting
> does fewer moves (57).  It means the above statistics comparison is
> unfair and unreliable.

OK, so that alone argues against doing C or D without better understanding
and fixing this.  That is, when you invoke invariant motion twice at its
current place the second invocation shouldn't really do any more hoisting,
definitely not a significant amount.

> With flag_ira_loop_pressure, the #transforms become 255 (1st) and
> 68 (2nd); that looks better but might also need more enhancements?
>
> Since rs6000 sets flag_ira_loop_pressure at O3, I did SPEC2017
> performance evaluation on Power8 (against baseline A) with option
> -Ofast -funroll-loops:
>  * B showed 525.x264_r +1.43%, 538.imagick_r +1.23% speedup
>but 503.bwaves_r -2.74% degradation.
>  * C showed 500.perlbench_r -1.31%, 520.omnetpp_r -2.20% degradation.
>
> The evaluation shows that running hoisting after doloop can give us some
> benefits, but running it twice doesn't give similar gains.  It looks
> like, regardless of flag_ira_loop_pressure, rerunning the pass requires
> more tweaks, probably to the related parameters.  If we go with B, we
> need to figure out what we miss for bwaves_r.

Of course it also requires benchmarking on other archs.

> BR,
> Kewen
>


Re: Help implementing support for vec in gengtype

2020-04-21 Thread Richard Biener via Gcc
On Tue, Apr 21, 2020 at 5:56 PM Giuliano Belinassi
 wrote:
>
> Hi, Richi
>
> On 04/21, Richard Biener wrote:
> > On Mon, Apr 20, 2020 at 11:45 PM Giuliano Belinassi
> >  wrote:
> > >
> > > Hi. Sorry for the late reply.
> > >
> > > On 03/02, Richard Biener wrote:
> > > > On Thu, Feb 27, 2020 at 6:56 PM Giuliano Belinassi
> > > >  wrote:
> > > > >
> > > > > Hi, all.
> > > > >
> > > > > I am trying to fix an issue with a global variable in the parallel gcc
> > > > > project. For this, I am trying to move some global variables from
> > > > > tree-ssa-operands to struct function. One of these variables is a
> > > > > vec type, and gengtype doesn't look so happy with it.
> > > >
> > > > I think the solution for this is to not move it to struct function
> > > > but instead have it local to function scope - the state is per
> > > > GIMPLE stmt we process and abstracting what is
> > > > cleaned up in cleanup_build_arrays() into a class we can
> > > > instantiate in the two callers is most appropriate.  In theory
> > > > all the functions that access the state could move into the
> > > > class as methods as well but you can pass down the state
> > > > everywhere needed as well.
> > >
> > > I implemented this strategy, but the issue remains. Therefore, the
> > > cause of it must be something else.
> >
> > Btw, it would be nice to push those changes as cleanups during
> > next stage1.
> >
> > > Just to contextualize, in [1], I also implemented parallelism in
> > > ParallelGcc using a pipeline method for testing, where I split the set
> > > of GIMPLE Intra Procedural Optimizations into multiple sets, and assign
> > > each set to a thread.  Then, the function passes through this pipeline.
> > >
> > > Now, I am trying to make this version pass the testsuite. There is a test
> > > in particular ('gcc.dg/20031102-1.c') that I am having difficulties
> > > finding the cause of the issue.
> > >
> > > If I run:
> > >
> > > /tmp/gcc10_parallel/usr/local/bin/gcc --param=num-threads=2 -O2 -c 20031102-1.c
> > >
> > > The crash message is:
> > >
> > > ```
> > >  type  align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> > > 0x7f44334fb000
> > > pointer_to_this >
> > > used static ignored external VOID :0:0
> > > align:8 warn_if_not_align:0>
> > >
> > > In function ‘main’:
> > > cc1: error: virtual use of statement not up to date
> > > # VUSE <.MEM_1(D)>
> > > _2 = FooBar ();
> > > during GIMPLE pass: walloca
> >
> > ^^^
> >
> > so this pass isn't supposed to change anything which either means you're
> > missing some global state in the verify_ssa checker.  Notably the actual
> > verification error triggering checks the underlying .MEM symbol
> > (a VAR_DECL).
> > Since your message above only shows one var_decl build_vuse must be
> > NULL somehow.
> >
> > Now it could be that the pass_local_pure_const (the late one) changes
> > 'FooBar' to be const which means it wouldn't get a virtual operand
> > anymore.  Looking at the testcase that's likely the issue here.
> >
> > That's a "tough one" and would ask for the const/pure-ness of call stmts
> > to be encoded in the call stmt itself (this issue is also one reason for
> > the fixup_cfg passes we have).  So instead of looking at the decl we'd
> > track this via a gimple_call_flags () flag and update that from the decl
> > at known points (for example when updating SSA operands (sic!) but
> > exactly not when just verifying them).
> >
> > So for your branch try adding a verifying_p member to the class
> > and when verifying instead of
> >
> >   /* If aliases have been computed already, add VDEF or VUSE
> >      operands for all the symbols that have been found to be
> >      call-clobbered.  */
> >   if (!(call_flags & ECF_NOVOPS))
> >     {
> >       /* A 'pure' or a 'const' function never call-clobbers anything.  */
> >       if (!(call_flags & (ECF_PURE | ECF_CONST)))
> >         add_virtual_operand (fn, stmt, opf_def);
> >       else if (!(call_flags & ECF_CONST))
> >         add_virtual_operand (fn, stmt, opf_use);
> >     }
> >
> > rely on existing vuse/vdef like
> >
> >   if (verifying_p)
> >     {
> >       /* ???  Track const/pure/novop-ness in gimple call flags.  */
> >       if (gimple_vdef (stmt))
> >         add_virtual_operand (...);
> >       else if (gimple_vuse (stmt))
> >         add_virtual_operand (...);
> >       return;
> >     }
> >
> > and call it a day ;)
>
> That indeed worked! Thank you. This one in particular was really tough!
>
> I will prepare a patch with these changes against trunk, ready for stage1.
> There is some unused stuff that I found here that would be nice to have
> cleaned up.
>
> I am just curious about how it was working before these changes, since it
> seems not to be a race condition. Or probably there is a race condition
> lost somewhere that was triggering it?

Well it needs appropriate timing to catch the issue ... for non-threaded
builds the fixup_cfg pass fixes this up.

Richard.

> T

Re: Not usable email content encoding

2020-04-23 Thread Richard Biener via Gcc
On Thu, Apr 23, 2020 at 7:47 AM Florian Weimer  wrote:
>
> * Tamar Christina:
>
> > A bit late to the party, but this really doesn't work that well
> > because until recent versions of GitLab there was no fairness
> > guarantee.  Another patch could be approved after mine (with hours
> > in between because of CI) and yet still get merged first, causing my
> > own patch to no longer apply; you'd rebase and roll the dice again.
> > To fix this they added merge trains
> > https://docs.gitlab.com/ee/ci/merge_request_pipelines/pipelines_for_merged_results/merge_trains/
> >
> > but trains for GCC will likely be very short because of Changelog
> > conflicts.  So I don't think an automated merge workflow would work
> > for projects where every single commit changes the same files.
>
> I had not thought about that.
>
> Does Gitlab support pluggable merge helpers?  The gnulib changelog
> auto-merger did a great job when we were still writing changelogs for
> glibc.

Btw, I encourage everybody trying to experiment with CI to set it up
for release branches first because of the lower check-in count.

Richard.


Re: Question about alias or points-to analysis

2020-05-06 Thread Richard Biener via Gcc
On Wed, May 6, 2020 at 12:26 PM Erick Ochoa
 wrote:
>
> Hi,
>
> I am trying to find out how to use the alias and/or points-to analysis
> in GCC. Ideally, I would like to find a function that given an
> allocation site, the return value is a set of pointers which may point
> to memory allocated from that allocation site.
>
> For example:
>
> int
> main(int argc, char** argv)
> {
>int a;
>int *b = argc > 2 ? &a : NULL;
>int *c = b;
> }
>
> Here, querying the allocation site corresponding to the declaration of
> local variable "a", should return { "b",  "c" }.

So that's a "reverse query" to what you are asking for below ...

> I've found the following documentation on Alias-Analysis [0] and two
> source files[1][2] which seem to implement some (distinct?) alias analysis.
>
> I am trying to keep the discussion a bit high level, otherwise I would
> have a lot of questions, but given this C example, **how would someone
> be able to use any of the alias analyses in GCC to determine that "b"
> and "c" may-alias "a"?**

... here?  Otherwise for a pointer "b" you can query whether it may-alias
"a" by using ptr_deref_may_alias_decl_p (b, a) or, if 'a' is not a decl
but a general reference, there is ptr_deref_may_alias_ref_p_1
(not exported - there wasn't any need so far).

> I compiled my example and placed a pass to experiment with alias
> analysis at link time. (I include the patch at the end). This is the
> gimple produced by the example above.
>
> main (int argc, char * * argv)
> {
>int * c;
>int * b;
>int a;
>int D.4170;
>int * iftmp.0;
>int * iftmp.0_1;
>int * iftmp.0_3;
>int * iftmp.0_4;
>int _9;
>
> :
>if (argc_2(D) > 2)
>  goto ; [INV]
>else
>  goto ; [INV]
>
> :
>iftmp.0_4 = &a;
>goto ; [INV]
>
> :
>iftmp.0_3 = 0B;
>
> :
># iftmp.0_1 = PHI 
>b_5 = iftmp.0_1;
>c_6 = b_5;
>a ={v} {CLOBBER};
>_9 = 0;
>
> :
> :
>return _9;
>
> }
>
> I include this example because looking at the Alias Analysis [0]
> section, it mentions memory SSA form. But I do not see any #.MEM_n
> assignments.

You need to dump with the -vops modifier (to get virtual operands dumped).
And you can use the -alias modifier to dump points-to results.

> Furthermore, I made an equivalent code to the example of Memory SSA form
> and I still don't see any Memory SSA forms:
>
> ```c
> int i;
> int foo()
> {
>i = 1;
>return i;
> }
> ```
>
> ```gimple
> foo ()
> {
>int D.4164;
>int _3;
>
> :
>i = 1;
>_3 = i;
>
> :
> :
>return _3;
>
> }
> ```
>
> So, I am not sure how the gimple shown on the Alias-analysis page is
> produced. **Does anyone know why the gimple produced is not showing the
> virtual SSA names?**
>
> Afterwards, instead of looking at the virtual SSA names, I focused
> on finding out whether SSA_NAME_PTR_INFO was set, but I found that it
> was not. **Do I need to run something to make sure that
> SSA_NAME_PTR_INFO is set?** Maybe the example I chose for compilation
> did not trigger the path for setting SSA_NAME_PTR_INFO. What would be an
> example of some code that does set SSA_NAME_PTR_INFO?

SSA_NAME_PTR_INFO is computed by points-to analysis, for a simple
IPA pass run at LTRANS time that will not be computed yet (it is not
streamed into the IL because it's not in a convenient form and it can be
and is re-computed early enough - just not for you ;)).  Without LTO
the info should be still there from the early optimization pipeline computation.

Hope this helps,
Richard.

> Here is the patch, and in order to compile an example and dump the log:
>
> /path/to/gcc -fdump-ipa-hello-world -fipa-hello-world a.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 543b477ff18..bc1af09cbf8 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1399,6 +1399,7 @@ OBJS = \
> incpath.o \
> init-regs.o \
> internal-fn.o \
> +   ipa-hello-world.o \
> ipa-cp.o \
> ipa-sra.o \
> ipa-devirt.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d33383b523c..09cabeb114d 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3408,4 +3408,8 @@ fipa-ra
>   Common Report Var(flag_ipa_ra) Optimization
>   Use caller save register across calls if possible.
>
> +fipa-hello-world
> +Common Report Var(flag_ipa_hello_world) Optimization
> +TBD
> +
>   ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/ipa-hello-world.c b/gcc/ipa-hello-world.c
> new file mode 100644
> index 000..00e276a4bd7
> --- /dev/null
> +++ b/gcc/ipa-hello-world.c
> @@ -0,0 +1,126 @@
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple-expr.h"
> +#include "predict.h"
> +#include "alloc-pool.h"
> +#include "tree-pass.h"
> +#include "cgraph.h"
> +#include "diagnostic.h"
> +#include "fold-const.h"
> +#include "gimple-fold.h"
> +#include "symbol-summary.

Re: Question about alias or points-to analysis

2020-05-06 Thread Richard Biener via Gcc
On Wed, May 6, 2020 at 3:04 PM Erick Ochoa
 wrote:
>
>
>
> On 06/05/2020 14:25, Richard Biener wrote:
> > On Wed, May 6, 2020 at 12:26 PM Erick Ochoa
> >  wrote:
> >>
> >> Hi,
> >>
> >> I am trying to find out how to use the alias and/or points-to analysis
> >> in GCC. Ideally, I would like to find a function that given an
> >> allocation site, the return value is a set of pointers which may point
> >> to memory allocated from that allocation site.
> >>
> >> For example:
> >>
> >> int
> >> main(int argc, char** argv)
> >> {
> >> int a;
> >> int *b = argc > 2 ? &a : NULL;
> >> int *c = b;
> >> }
> >>
> >> Here, querying the allocation site corresponding to the declaration of
> >> local variable "a", should return { "b",  "c" }.
> >
> > So that's a "reverse query" to what you are asking for below ...
> >
> >> I've found the following documentation on Alias-Analysis [0] and two
> >> source files[1][2] which seem to implement some (distinct?) alias analysis.
> >>
> >> I am trying to keep the discussion a bit high level, otherwise I would
> >> have a lot of questions, but given this C example, **how would someone
> >> be able to use any of the alias analyses in GCC to determine that "b"
> >> and "c" may-alias "a"?**
> >
> > ... here?  Otherwise for a pointer "b" you can query whether it may-alias
> > "a" by using ptr_deref_may_alias_decl_p (b, a) or, if 'a' is not a decl
> > but a general reference, there is ptr_deref_may_alias_ref_p_1
> > (not exported - there wasn't any need so far).
>
> Thanks Richard. I'll look into the Tree alias-oracle API.
>
> >
> >> I compiled my example and placed a pass to experiment with alias
> >> analysis at link time. (I include the patch at the end). This is the
> >> gimple produced by the example above.
> >>
> >> main (int argc, char * * argv)
> >> {
> >> int * c;
> >> int * b;
> >> int a;
> >> int D.4170;
> >> int * iftmp.0;
> >> int * iftmp.0_1;
> >> int * iftmp.0_3;
> >> int * iftmp.0_4;
> >> int _9;
> >>
> >>  :
> >> if (argc_2(D) > 2)
> >>   goto ; [INV]
> >> else
> >>   goto ; [INV]
> >>
> >>  :
> >> iftmp.0_4 = &a;
> >> goto ; [INV]
> >>
> >>  :
> >> iftmp.0_3 = 0B;
> >>
> >>  :
> >> # iftmp.0_1 = PHI 
> >> b_5 = iftmp.0_1;
> >> c_6 = b_5;
> >> a ={v} {CLOBBER};
> >> _9 = 0;
> >>
> >>  :
> >> :
> >> return _9;
> >>
> >> }
> >>
> >> I include this example because looking at the Alias Analysis [0]
> >> section, it mentions memory SSA form. But I do not see any #.MEM_n
> >> assignments.
> >
> > You need to dump with the -vops modifier (to get virtual operands dumped).
> > And you can use the -alias modifier to dump points-to results.
> >
> >> Furthermore, I made an equivalent code to the example of Memory SSA form
> >> and I still don't see any Memory SSA forms:
> >>
> >> ```c
> >> int i;
> >> int foo()
> >> {
> >> i = 1;
> >> return i;
> >> }
> >> ```
> >>
> >> ```gimple
> >> foo ()
> >> {
> >> int D.4164;
> >> int _3;
> >>
> >>  :
> >> i = 1;
> >> _3 = i;
> >>
> >>  :
> >> :
> >> return _3;
> >>
> >> }
> >> ```
> >>
> >> So, I am not sure how the gimple shown on the Alias-analysis page is
> >> produced. **Does anyone know why the gimple produced is not showing the
> >> virtual SSA names?**
> >>
> >> Afterwards, instead of looking at the virtual SSA names, I focused
> >> on finding out whether SSA_NAME_PTR_INFO was set, but I found that it
> >> was not. **Do I need to run something to make sure that
> >> SSA_NAME_PTR_INFO is set?** Maybe the example I chose for compilation
> >> did not trigger the path for setting SSA_NAME_PTR_INFO. What would be an
> >> example of some code that does set SSA_NAME_PTR_INFO?
> >
> > SSA_NAME_PTR_INFO is computed by points-to analysis, for a simple
> > IPA pass run at LTRANS time that will not be computed yet (it is not
> > streamed into the IL because it's not in a convenient form and it can be
> > and is re-computed early enough - just not for you ;)).  Without LTO
> > the info should be still there from the early optimization pipeline
> > computation.
> >
>
> So, does this mean that there's no alias information available at LTO?
> Or are you saying that I should store alias information at LGEN time and
> use it at WPA time to make my transformation plan and finally transform
> at LTRANS time?

There is no points-to information available during WPA.  There is no
points-to information during LTRANS until pass_build_alias is run
which is before the first user (if you exclude your simple IPA pass).

If you need points-to information at WPA you either need to compute it
(it looks like you need to stream in function bodies anyway to use it,
so you could simply call compute_may_aliases on each function) or,
what seems to be a better strategy, try to compute as much as
possible during IPA analysis and do final verification at LTRANS stage.
If that's possible of course depends on the exact t

Re: Multilibs in stage-1

2020-05-06 Thread Richard Biener via Gcc
On May 6, 2020 11:15:08 PM GMT+02:00, Uros Bizjak via Gcc wrote:
>Hello!
>
>I wonder, if the build process really needs to build all multilibs in
>stage-1 bootstrap build. IIRC, stage-1 uses system compiler to build
>stage-1 gcc, so there is no need for multilibs, apart from library
>that will be used by stage-1 gcc during compilation of stage-2
>compiler.

Correct. Only stage3 needs those. But IIRC we already avoid building them?
Likewise we avoid building libsanitizer and friends in stage 1/2 unless
ubsan bootstrap is enabled.

>Uros.



Re: Multilibs in stage-1

2020-05-07 Thread Richard Biener via Gcc
On Thu, May 7, 2020 at 8:25 AM Uros Bizjak  wrote:
>
> On Thu, May 7, 2020 at 8:16 AM Richard Biener
>  wrote:
> >
> > On May 6, 2020 11:15:08 PM GMT+02:00, Uros Bizjak via Gcc wrote:
> > >Hello!
> > >
> > >I wonder, if the build process really needs to build all multilibs in
> > >stage-1 bootstrap build. IIRC, stage-1 uses system compiler to build
> > >stage-1 gcc, so there is no need for multilibs, apart from library
> > >that will be used by stage-1 gcc during compilation of stage-2
> > >compiler.
> >
> > Correct. Only stage3 needs those. But IIRC we already avoid building
> > them? Likewise we avoid building libsanitizer and friends in stage 1/2
> > unless ubsan bootstrap is enabled.
>
> Looking at:
>
> [gcc-build]$ ls stage1-x86_64-pc-linux-gnu/32/
> libgcc  libgomp  libstdc++-v3
>
> it seems that 32bit multilibs are built anyway, also in stage2:
>
> [gcc-build]$ ls prev-x86_64-pc-linux-gnu/32/
> libgcc  libgomp  libstdc++-v3

Hmm.  IIRC it required special-handling in the individual libs - Jakub
may remember (IIRC he implemented short-cutting libsanitizer builds).

For libstdc++ there's also a bug report I opened at some point - we're
using the target runtime for the host when not cross-compiling but it
would be better to build the host libraries only - target libs need not
be bootstrapped and host ones are not multilib.

Richard.

> Uros.


Re: Question about alias or points-to analysis

2020-05-07 Thread Richard Biener via Gcc
On Wed, May 6, 2020 at 9:25 PM Erick Ochoa
 wrote:
>
>
>
> On 06/05/2020 18:40, Richard Biener wrote:
> > On Wed, May 6, 2020 at 3:04 PM Erick Ochoa
> >  wrote:
> >>
> >>
> >>
> >> On 06/05/2020 14:25, Richard Biener wrote:
> >>> On Wed, May 6, 2020 at 12:26 PM Erick Ochoa
> >>>  wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I am trying to find out how to use the alias and/or points-to analysis
> >>>> in GCC. Ideally, I would like to find a function that given an
> >>>> allocation site, the return value is a set of pointers which may point
> >>>> to memory allocated from that allocation site.
> >>>>
> >>>> For example:
> >>>>
> >>>> int
> >>>> main(int argc, char** argv)
> >>>> {
> >>>>  int a;
> >>>>  int *b = argc > 2 ? &a : NULL;
> >>>>  int *c = b;
> >>>> }
> >>>>
> >>>> Here, querying the allocation site corresponding to the declaration of
> >>>> local variable "a", should return { "b",  "c" }.
> >>>
> >>> So that's a "reverse query" to what you are asking for below ...
> >>>
> >>>> I've found the following documentation on Alias-Analysis [0] and two
> >>>> source files[1][2] which seem to implement some (distinct?) alias
> >>>> analysis.
> >>>>
> >>>> I am trying to keep the discussion a bit high level, otherwise I would
> >>>> have a lot of questions, but given this C example, **how would someone
> >>>> be able to use any of the alias analyses in GCC to determine that "b"
> >>>> and "c" may-alias "a"?**
> >>>
> >>> ... here?  Otherwise for a pointer "b" you can query whether it may-alias
> >>> "a" by using ptr_deref_may_alias_decl_p (b, a) or, if 'a' is not a decl
> >>> but a general reference, there is ptr_deref_may_alias_ref_p_1
> >>> (not exported - there wasn't any need so far).
> >>
> >> Thanks Richard. I'll look into the Tree alias-oracle API.
> >>
> >>>
>  I compiled my example and placed a pass to experiment with alias
>  analysis at link time. (I include the patch at the end). This is the
>  gimple produced by the example above.
> 
>  main (int argc, char * * argv)
>  {
>   int * c;
>   int * b;
>   int a;
>   int D.4170;
>   int * iftmp.0;
>   int * iftmp.0_1;
>   int * iftmp.0_3;
>   int * iftmp.0_4;
>   int _9;
> 
>    :
>   if (argc_2(D) > 2)
> goto ; [INV]
>   else
> goto ; [INV]
> 
>    :
>   iftmp.0_4 = &a;
>   goto ; [INV]
> 
>    :
>   iftmp.0_3 = 0B;
> 
>    :
>   # iftmp.0_1 = PHI 
>   b_5 = iftmp.0_1;
>   c_6 = b_5;
>   a ={v} {CLOBBER};
>   _9 = 0;
> 
>    :
>  :
>   return _9;
> 
>  }
> 
>  I include this example because looking at the Alias Analysis [0]
>  section, it mentions memory SSA form. But I do not see any #.MEM_n
>  assignments.
> >>>
> >>> You need to dump with the -vops modifier (to get virtual operands dumped).
> >>> And you can use the -alias modifier to dump points-to results.
> >>>
>  Furthermore, I made code equivalent to the example of Memory SSA form
>  and I still don't see any Memory SSA forms:
> 
>  ```c
>  int i;
>  int foo()
>  {
>   i = 1;
>   return i;
>  }
>  ```
> 
>  ```gimple
>  foo ()
>  {
>   int D.4164;
>   int _3;
> 
>    :
>   i = 1;
>   _3 = i;
> 
>    :
>  :
>   return _3;
> 
>  }
>  ```
> 
>  So, I am not sure how the gimple shown on the Alias-analysis page is
>  produced. **Does anyone know why the gimple produced is not showing the
>  virtual SSA names?**
> 
>  Afterwards, instead of looking at the virtual SSA names, I focused
>  on SSA_NAME_PTR_INFO, but I found that it was not
>  set. **Do I need to run something to make sure that
>  SSA_NAME_PTR_INFO is set?** Maybe the example I chose for compilation
>  did not trigger the path for setting SSA_NAME_PTR_INFO. What would be an
>  example of some code that does set SSA_NAME_PTR_INFO?
> >>>
> >>> SSA_NAME_PTR_INFO is computed by points-to analysis, for a simple
> >>> IPA pass run at LTRANS time that will not be computed yet (it is not
> >>> streamed into the IL because it's not in a convenient form and it can be
> >>> and is re-computed early enough - just not for you ;)).  Without LTO
> >>> the info should be still there from the early optimization pipeline 
> >>> computation.
> >>>
> >>
> >> So, does this mean that there's no alias information available at LTO?
> >> Or are you saying that I should store alias information at LGEN time and
> >> use it at WPA time to make my transformation plan and finally transform
> >> at LTRANS time?
> >
> > There is no points-to information available during WPA.  There is no
> > points-to information during LTRANS until pass_build

Re: Multilibs in stage-1

2020-05-07 Thread Richard Biener via Gcc
On Thu, May 7, 2020 at 9:24 AM Jakub Jelinek  wrote:
>
> On Thu, May 07, 2020 at 09:02:58AM +0200, Richard Biener wrote:
> > Hmm.  IIRC it required special-handling in the individual libs - Jakub
> > may remember (IIRC he implemented short-cutting libsanitizer builds)
>
> Just fuzzy memories, but I think the libsanitizer case was that the build
> of it is extremely slow due to the huge sources.
>
> > For libstdc++ there's also a bugreport I opened at some point - we're
> > using the target runtime for the host when not cross-compiling but it
> > would be better to build the host libraries only - target libs need not
> > be bootstrapped and host ones are not multilib.
>
> Perhaps not for stage1, but don't we want multilibs for stage2 so that
> we compare not just the primary multilib, but also objects of the other
> multilib as a way to e.g. catch -m32 only related issues?

We don't compare target libs at all, just host objects ...

Richard.

> Jakub
>


Re: performance of exception handling

2020-05-12 Thread Richard Biener via Gcc
On Tue, May 12, 2020 at 8:14 AM Thomas Neumann via Gcc  wrote:
>
> > Not all GCC/G++ targets are GNU/Linux and use GLIBC.  A duplicate
> > implementation in GLIBC creates its own set of advantages and
> > disadvantages.
>
> So what should I do now? Should I try to move the lookup into GLIBC? Or
> handle it within libgcc, as I had originally proposed? Or give up due
> to the inertia of a large, grown system?
>
> Another concern is memory consumption. I wanted to store the FDE entries
> in a b-tree, which allows for fast lookup and low overhead
> synchronization. Memory wise that is not really worse than what we have
> today (the "linear" and "erratic" arrays). But the current code has a
> fallback for when it is unable to allocate these arrays, falling back to
> linear search. Is something like that required? It would make the code
> much more complicated (but I gathered from Moritz's mail that some people
> really care about memory constrained situations).

Some people use exceptions to propagate "low memory" up which
made me increase the size of the EH emergency pool (which is
used when malloc cannot even allocate the EH data itself) ...

So yes, people care.  There absolutely has to be a path in
unwinding that allocates no (as little as possible) memory.

Richard.

> Thomas


Re: how to find variable related to a virtual ssa name

2020-05-12 Thread Richard Biener via Gcc
On Tue, May 12, 2020 at 2:44 PM 易会战 via Gcc  wrote:
>
> Hi, I am working with GCC SSA names. For each function, we can traverse all
> defined SSA names with the macro FOR_EACH_SSA_NAME. If an SSA name is the
> default definition of a symbol (checked with SSA_NAME_IS_DEFAULT_DEF), I can
> get the symbol via SSA_NAME_VAR. But for a virtual default definition this
> does not work: SSA_NAME_VAR returns an identifier named .MEM, and I cannot
> find which variable is related to the default definition. Why is that, and
> how should I find the related variable?
>
>
> For context, here is my current work: I want to find out whether a MEM_REF
> refers to global/heap memory or to the local stack. A MEM_REF has a base
> address, which is often an SSA name (although not a virtual one), but I find
> that following the SSA name data flow alone is not enough to get this
> information.
> For example, a malloc call allocates some heap memory and the address is
> recorded in a global pointer. In GIMPLE SSA form, malloc returns an address
> that is assigned to an SSA name, and that SSA name is then assigned to the
> global pointer. When I look at an SSA name defined from the global pointer, I
> do not know whether it points to global memory or local memory.
> Please see the GIMPLE code:
> _2 = malloc()
> ptr = _2
> _3 = ptr
> MEM_REF[BASE _3]
> I would like to learn that _3 is an address pointing to global memory, but
> that cannot be determined from _3 = ptr alone.
> I hoped memory SSA could help solve the problem.

memory SSA will not solve this problem.  You can instead query points-to
information on _3, for example by calling ptr_deref_may_alias_global_p (_3),
which internally looks at SSA_NAME_PTR_INFO, which contains the solution of
the points-to computation.

Richard.

>
> Or does GCC provide this information in some other pass? I would appreciate
> any advice. Thanks a lot.


Re: how to find variable related to a virtual ssa name

2020-05-12 Thread Richard Biener via Gcc
On Tue, May 12, 2020 at 4:16 PM 易会战  wrote:
>
> Thanks a lot. I will look into your advice.
> Can you give some explanation of memory SSA and how to use it? I checked the
> internals manual but could not work it out. Maybe you know of some examples
> or further material.

memory SSA in GCC is simply an SSA chain of all memory statements local
to a function with a _single_ underlying variable (.MEM) and thus only
one SSA name live at the same time.  It can be used to quickly traverse
stores via use->def chains and loads in between two stores via
immediate uses.

Richard.
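
A minimal sketch of the chain described above, written as a standalone toy
model in C rather than as GCC code. All type and function names here are
invented for illustration. Each store creates a new version of the one
memory variable and records the version it replaces; each load records the
version it reads. Version 0 stands in for the default definition of .MEM.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are not GCC's data structures.  */
enum toy_kind { TOY_STORE, TOY_LOAD };

struct toy_stmt
{
  enum toy_kind kind;
  int vuse;   /* memory version this statement reads */
  int vdef;   /* memory version a store creates; 0 for loads */
};

/* Return the index of the store that defines VERSION, or -1 if VERSION
   is the default definition (0) or has no defining store.  */
int
toy_def_stmt (const struct toy_stmt *stmts, size_t n, int version)
{
  for (size_t i = 0; i < n; i++)
    if (stmts[i].kind == TOY_STORE && stmts[i].vdef == version)
      return (int) i;
  return -1;
}

/* Walk the use->def chain from statement START back to the default
   definition and count the stores visited on the way.  */
int
toy_count_reaching_stores (const struct toy_stmt *stmts, size_t n,
                           size_t start)
{
  assert (start < n);
  int count = 0;
  int version = stmts[start].vuse;
  while (version != 0)
    {
      int def = toy_def_stmt (stmts, n, version);
      if (def < 0)
        break;
      count++;
      version = stmts[def].vuse;
    }
  return count;
}

/* Count the loads among the immediate uses of store STORE, that is, the
   loads that read exactly the memory version this store created.  */
int
toy_count_loads_of (const struct toy_stmt *stmts, size_t n, size_t store)
{
  assert (store < n && stmts[store].kind == TOY_STORE);
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (stmts[i].kind == TOY_LOAD && stmts[i].vuse == stmts[store].vdef)
      count++;
  return count;
}
```

Because there is only one memory variable, the backward walk is a simple
linked chain in straight-line code; a real function would also have PHI
nodes for .MEM at control-flow joins, which this sketch leaves out.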



Re: how to find variable related to a virtual ssa name

2020-05-13 Thread Richard Biener via Gcc
On Wed, May 13, 2020 at 6:03 AM 易会战  wrote:
>
> It seems the function ptr_deref_may_alias_global_p does not give the right
> result.
> For example,
> int func(int size, int i)
> {
> int * sum;
> sum = malloc()
> here some code access sum pointing to memory
> return sum[i]
> }
> ptr_deref_may_alias_global_p tells me it is a local memory access. Indeed,
> sum is a local variable, but the pointer points to heap memory.
> In fact there is a similar function, ref_may_alias_global_p, and it gives a
> similar result.

GCC can be clever and notice your malloc() result does not escape the
function, which means stores to it are dead once you leave it.  For this
reason it does not mark the memory global.  So make sure the allocated
pointer escapes and try again.
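
A minimal sketch of the difference, assuming a global pointer is an
acceptable way to make the allocation escape. The function names and the
global escaped_ptr are invented for this illustration and do not come from
the thread. What the alias oracle reports for either function depends on
the GCC version and options, so no expected oracle result is given here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical global, used only to make the pointer escape.  */
int *escaped_ptr;

/* The allocation never leaves this function: the pointer is not stored
   anywhere a caller can see, and the memory is freed before returning.  */
int
sum_local_heap (int size, int i)
{
  assert (size > 0 && i >= 0 && i < size);
  int *sum = malloc (size * sizeof *sum);
  if (!sum)
    return -1;
  for (int k = 0; k < size; k++)
    sum[k] = k;
  int result = sum[i];
  free (sum);
  return result;
}

/* Same computation, but the pointer is stored in a global, so the
   allocation stays reachable after the function returns.  The caller
   owns the memory through escaped_ptr.  */
int
sum_escaped_heap (int size, int i)
{
  assert (size > 0 && i >= 0 && i < size);
  int *sum = malloc (size * sizeof *sum);
  if (!sum)
    return -1;
  escaped_ptr = sum;
  for (int k = 0; k < size; k++)
    sum[k] = k;
  return sum[i];
}
```

Returning the pointer or passing it to a function the compiler cannot see
into would be other ways to make it escape.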



Re: how to find variable related to a virtual ssa name

2020-05-13 Thread Richard Biener via Gcc
On Wed, May 13, 2020 at 11:08 AM 易会战  wrote:
>
> Yes, it does not escape the function, but it does allocate memory on the
> heap. Is there a more specific way to tell that the memory is on the heap
> even though it does not escape the function?

Not at the moment.  The info is computed by tree-ssa-structalias.c in
compute_may_aliases; the pass knows that a variable points to not escaped
heap storage but this is not stored anywhere ready for consumption.
Adding a flag to pt_solution would be easy though.

Richard.
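
To illustrate what such a flag could record, here is a standalone toy
model in C. It is not GCC code: toy_var, toy_pt_solution and the flag name
vars_contains_nonescaped_heap are all hypothetical, and the sketch does not
claim to match how pt_solution is actually laid out or filled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a variable in a points-to set.  */
struct toy_var
{
  bool is_heap;   /* created by a malloc-like call */
  bool escaped;   /* reachable from outside the function */
};

/* Toy stand-in for a points-to summary.  */
struct toy_pt_solution
{
  bool vars_contains_escaped;
  /* Hypothetical flag of the kind suggested above.  */
  bool vars_contains_nonescaped_heap;
};

/* Summarize the N variables a pointer may point to.  */
struct toy_pt_solution
toy_summarize (const struct toy_var *vars, size_t n)
{
  assert (vars != NULL || n == 0);
  struct toy_pt_solution pt = { false, false };
  for (size_t i = 0; i < n; i++)
    {
      if (vars[i].escaped)
        pt.vars_contains_escaped = true;
      else if (vars[i].is_heap)
        pt.vars_contains_nonescaped_heap = true;
    }
  return pt;
}
```

With a summary like this, a consumer could distinguish "local stack" from
"heap that happens not to escape", which is the distinction asked about.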



Re: Automatically generated ChangeLog files - PHASE 1

2020-05-13 Thread Richard Biener via Gcc
On Wed, May 13, 2020 at 11:27 AM Martin Liška  wrote:
>
> On 5/13/20 10:16 AM, Richard Sandiford wrote:
> > As far as this particular example goes, shouldn't the "testsuite/" line
> > be dropped from the above?
>
> Good point. Fixed now with:
>
> $ ./git_email.py 
> patches/0020-IPA-Avoid-segfault-in-devirtualization_time_bonus-PR.patch
> Errors:
> first line should start with a tab, asterisk and space:"testsuite/"

Hmm, it's OK in the commit but it should be omitted in the
ChangeLog files.

Richard.

> Martin


Re: how to find variable related to a virtual ssa name

2020-05-13 Thread Richard Biener via Gcc
On Wed, May 13, 2020 at 11:38 AM 易会战  wrote:
>
> I am now working with gcc-9.3. Can you point me to the specific code
> location that checks for non-escaped heap? I will try to add a flag.

set_uids_in_ptset


Re: how to find variable related to a virtual ssa name

2020-05-14 Thread Richard Biener via Gcc
On Thu, May 14, 2020 at 6:00 AM 易会战  wrote:
>
> There are some other cases where I cannot get the right answer.
> case 1: interprocedural
> func(int*arg)
> {
> return arg[0] + arg[1]
> }
> func2()
> {
> int a[10]
> return func(a);
> }
> Here, within func, the analysis cannot tell that arg points to a local
> variable.
>
> case 2: global array pointing to locals
> int *array[3]
> int func(int x)
> {
> int sub1[10];
> int sub2[10];
> int sub3[10];
> array[0] = sub1;
> array[1]=sub2;
> array[2]=sub3;
> then refer to array by array[x][y]
> }
> Here I refer to a local variable, but the points-to analysis cannot give the
> right answer.
>
> I do not know whether this is a problem of the points-to analysis or whether
> I am using it improperly.

GCC's analysis is not powerful enough to provide the "right" answers you
are looking for.  It does provide conservatively correct answers for the
user it has, though, which is alias analysis.

Richard.
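
As a sketch of what "conservative" means in practice, here is a standalone
toy model in C (not GCC code; toy_ptset and toy_may_point_to are invented
names). A query may answer "no" only when aliasing is ruled out. A pointer
the analysis knows nothing about, such as a function parameter seen without
its callers, therefore "may point to" every object, local or global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy points-to set, not GCC's representation.  */
struct toy_ptset
{
  bool anything;      /* nothing is known about the pointer */
  const int *objs;    /* ids of the objects it may point to */
  size_t n_objs;
};

/* Conservative query: return false only when PT provably excludes OBJ.  */
bool
toy_may_point_to (const struct toy_ptset *pt, int obj)
{
  assert (pt != NULL);
  if (pt->anything)
    return true;
  for (size_t i = 0; i < pt->n_objs; i++)
    if (pt->objs[i] == obj)
      return true;
  return false;
}
```

Such an answer is safe for deciding whether two accesses can be reordered,
but it cannot be read as a classification of the memory as local or global.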


Re: Writing automated tests for the GCC driver

2020-05-22 Thread Richard Biener via Gcc
On Thu, May 21, 2020 at 11:00 PM Giuliano Belinassi via Gcc
 wrote:
>
> Hi, all.
>
> GCC has an extensive testsuite; that is no news at all. However, the tests
> are focused on the compiler (cc1*) or on libraries, and I can't find tests
> related to the GCC driver.
>
> Are there tests for the GCC driver? If so, are there any docs about how
> to write them?

I think all tests of the driver exercise a compiler in the end.
One obvious one I can find is gcc.dg/driver-specs.c, which tests that
-specs=not-a-file properly diagnoses the missing file.

So the question back would be what kind of "driver" tests do you have?
That is, what makes them not, for example, "C compiler driven by driver" tests?

Thanks,
Richard.

> Thank you,
> Giuliano.


Re: Writing automated tests for the GCC driver

2020-05-25 Thread Richard Biener via Gcc
On Mon, May 25, 2020 at 4:37 PM Giuliano Belinassi
 wrote:
>
> Hi,
>
> On 05/22, Richard Biener wrote:
> > On Thu, May 21, 2020 at 11:00 PM Giuliano Belinassi via Gcc
> >  wrote:
> > >
> > > Hi, all.
> > >
> > > GCC has an extensive testsuite; that is no news at all. However, the
> > > tests are focused on the compiler (cc1*) or on libraries, and I can't
> > > find tests related to the GCC driver.
> > >
> > > Are there tests for the GCC driver? If so, are there any docs about how
> > > to write them?
> >
> > I think all tests of the driver exercise a compiler in the end.
> > One obvious one I can find is gcc.dg/driver-specs.c, which tests that
> > -specs=not-a-file properly diagnoses the missing file.
>
> Yes, but that does not cover all driver features.
>
> >
> > So the question back would be what kind of "driver" tests do you have?
> > That is, what makes them not, for example, "C compiler driven by driver" 
> > tests?
>
> The GCC driver supports several modes which are required for bootstrapping,
> but there is no quick automated test for them. For instance:
>
> gcc a.c b.o -o a.out
> gcc a.c b.c
> gcc a.S
>
> and so on. So if you make some radical change to the GCC driver, making
> sure everything is correct gets somewhat painful because you have to do
> a clean bootstrap and find out what is going wrong. If we had a
> dedicated testsuite for that, bootstrap could be avoided during
> development and only done as a final step, reducing development time.

Hmm, indeed.  I don't think we have specific tests for all variants above.
Most of them get exercised one way or another of course but the testsuite
usually separates the compile and link steps.

I would think writing a dedicated DejaGnu .exp file would be possible here:
something in a new gcc.dg/driver/, with a driver.exp driving things and using
test files in that directory.  It could share code from testsuite/lib to find
the driver and figure out how to invoke it; gcc-dg.exp has gcc-dg-test,
which might be enough to drive things.

> Thank you,
> Giuliano.
>
> >
> > Thanks,
> > Richard.
> >
> > > Thank you,
> > > Giuliano.


Re: [1-800-GIT-HELP] Backporting a series of commits into a combined commit?

2020-06-02 Thread Richard Biener via Gcc
On Mon, Jun 1, 2020 at 2:17 PM Thomas Koenig via Fortran
 wrote:
>
> Hi Martin,
>
> > For now, I would recommend doing 1:1 backports. Otherwise, you'll need
> > to merge
> > all ChangeLog entries in a format the server hook accepts. That can
> > require some
> > work.
>
> If the first commit caused a regression, which the second one fixed,
> this would keep the first regression, right?  Is that what we want?

IMHO squashing is preferred.  Is it really so hard to do that?  You'll
get concatenated ChangeLogs you have to merge manually, sure.
So what?  Just do it ;)  Maybe the scripts even accept "duplicates"
and you only have to edit out the duplicate date/author headers.

Richard.

> Regards
>
> Thomas


Re: Question about comparing function function decls

2020-06-05 Thread Richard Biener via Gcc
On Thu, Jun 4, 2020 at 10:24 PM Gary Oblock via Gcc  wrote:
>
> 
>
>
>
> I'm trying to determine during LTO optimization (with one partition)
> whether or not a function call is to a function in the partition.
>
> Here is the routine I've written. Note, I'm willing to admit up front
> that the comparison below ( ) is probably dicey.

I think you should simply lookup the callgraph edge for the call_stmt
in the caller via caller_graph_node->get_edge (call_stmt).
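
To illustrate why the edge lookup is more robust than comparing decls, here
is a small self-contained toy model in plain C++.  It is not GCC code: Decl,
Node, CallStmt and Edge are hypothetical stand-ins for tree, cgraph_node, the
gimple call and cgraph_edge, and the get_edge member only mimics the call
named above.  The sketch assumes, as in the dump quoted below, that the call
statement still carries the decl of the original function while the callgraph
already points at the clone.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins, NOT the real GCC types.
struct Decl { std::string name; };

struct Node;

struct CallStmt {
  const Decl *fndecl;   // decl recorded on the statement (may be stale)
};

struct Edge {
  const Node *callee;   // node the callgraph says this call goes to
};

struct Node {
  const Decl *decl;
  bool definition;      // true if a body is available in this partition
  std::unordered_map<const CallStmt *, Edge> edges;

  // Hypothetical analogue of looking up the edge for a call statement.
  const Edge *get_edge (const CallStmt *stmt) const
  {
    assert (stmt != nullptr);
    auto it = edges.find (stmt);
    return it == edges.end () ? nullptr : &it->second;
  }
};

// Decl comparison, as in the routine from the question.
bool is_user_function_by_decl (const std::vector<const Node *> &nodes,
                               const CallStmt &call)
{
  for (const Node *n : nodes)
    if (n->definition && n->decl == call.fndecl)
      return true;
  return false;
}

// Edge lookup: ask the caller's node where the call goes.
bool is_user_function_by_edge (const Node &caller, const CallStmt &call)
{
  const Edge *e = caller.get_edge (&call);
  return e && e->callee && e->callee->definition;
}
```

In this model a call whose statement still names max1 is missed by the decl
comparison once only max1.constprop.0 has a body, while the edge lookup finds
the clone and reports it as defined.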

> ---
> static bool
> is_user_function ( gimple *call_stmt)
> {
>   tree fndecl = gimple_call_fndecl ( call_stmt);
>
>   DEBUG_L("is_user_function: decl in: %p,", fndecl);
>   DEBUG_F( print_generic_decl, stderr, fndecl, (dump_flags_t)-1);
>   DEBUG("\n");
>   INDENT(2);
>
>   cgraph_node* node;
>   bool ret_val = false;
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
>   {
>     DEBUG_L("decl %p,", node->decl);
>     DEBUG_F( print_generic_decl, stderr, node->decl, (dump_flags_t)-1);
>     DEBUG("\n");
>
>     if ( node->decl == fndecl )
>       {
>         ret_val = true;
>         break;
>       }
>   }
>
>   INDENT(-2);
>   return ret_val;
> }
> ---
>
> Here's the test program I was compiling.
>
> -- aux.h --
> #include "stdlib.h"
> typedef struct type type_t;
> struct type {
>   int i;
>   double x;
> };
>
> #define MAX(x,y) ((x)>(y) ? (x) : (y))
>
> extern int max1( type_t *, size_t);
> extern double max2( type_t *, size_t);
> extern type_t *setup( size_t);
> -- aux.c --
> #include "aux.h"
> #include "stdlib.h"
>
> type_t *
> setup( size_t size)
> {
>   type_t *data = (type_t *)malloc( size * sizeof(type_t));
>   size_t i;
>   for( i = 0; i < size; i++ ) {
>     data[i].i = rand();
>     data[i].x = drand48();
>   }
>   return data;
> }
>
> int
> max1( type_t *array, size_t len)
> {
>   size_t i;
>   int result = array[0].i;
>   for( i = 1; i < len; i++  ) {
>     result = MAX( array[i].i, result);
>   }
>   return result;
> }
>
> double
> max2( type_t *array, size_t len)
> {
>   size_t i;
>   double result = array[0].x;
>   for( i = 1; i < len; i++  ) {
>     result = MAX( array[i].x, result);
>   }
>   return result;
> }
> -- main.c --
> #include "aux.h"
> #include "stdio.h"
>
> type_t *data1;
>
> int
> main(void)
> {
>   type_t *data2 = setup(200);
>   data1 = setup(100);
>
>   printf("First %d\n" , max1(data1,100));
>   printf("Second %e\n", max2(data2,200));
> }
> ---
>
> The output follows:
>
> ---
> L# 1211: is_user_function: decl in: 0x7f078461be00,  static intD. max1D. (struct type_t *, size_t);
> L# 1222:   decl 0x7f078462,  static struct type_t * setupD. (size_t);
> L# 1222:   decl 0x7f078461bf00,  static intD. max1.constprop.0D. (struct type_t *);
> L# 1222:   decl 0x7f078461bd00,  static doubleD. max2.constprop.0D. (struct type_t *);
> L# 1222:   decl 0x7f078461bb00,  static intD. mainD. (void);
> ---
>
> Now it's pretty obvious that constant propagation decided the size_t
> len arguments to max1 and max2 were no longer needed. However, the
> function declaration information on the calls to them weren't updated
> so they'll never match. Now if there is another way to see if the
> function is in the partition or if there is some other way to compare
> the functions in a partition, please let me know.
>
> Thanks,
>
> Gary Oblock
> Ampere Computing
>
> PS. The body of the message is attached in a file because my email program
> (Outlook) mangled the above.


Re: gcc math functions for OpenMP vectoization

2020-06-05 Thread Richard Biener via Gcc
On June 5, 2020 7:58:20 PM GMT+02:00, Toon Moene  wrote:
>On 6/5/20 6:10 PM, Tobias Burnus wrote:
>
>> On 6/5/20 4:11 PM, Jakub Jelinek via Gcc wrote:
>
>>> It is glibc that provides them, not GCC.
>>> See
>>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/fpu/bits/math-vector.h;h=0801905da7b85e2f43fb6b682a7b84c5eec469d4;hb=HEAD
>>
>> Minor addition: That header file is included in math.h, i.e.
>> automatically available.
>> For Fortran/gfortran there is math-vector-fortran.h (also provided by
>> glibc) which has the same functions and a similar effect.
>
>I wonder if there are Linux distributions where this has actually taken
>effect already.
>
>I know for sure that it is not in Debian Testing (as of two weeks ago) 
>and Red Hat Fedora 30 (similarly).
>
>Do you know of any ?

It works in openSUSE Tumbleweed at least.

Richard. 

>
>Kind regards,



Re: Inquire a potential bug when printing out GIMPLE ASAN statements

2020-06-09 Thread Richard Biener via Gcc
On Tue, Jun 9, 2020 at 3:38 AM Shuai Wang via Gcc  wrote:
>
> Hello!
>
> I am writing to report a potential bug I encountered when playing with the
> GIMPLE IR. I enabled the ASan and would like to print out all ASAN_MARK
> statements for the following simple code:
>
>  int main(int argc ,char **argv)
>  {
>   int stack_array[100];
>   stack_array[1] = 100;
>   stack_array[argc + 12];  // an ASan check, namely, ASAN_MARK, will
>                            // be inserted at this point
>  }
>
> And I am using the following code snippet (basically derived from this
> post) to print out all function calls, including ASAN_MARK:
>
>  if (is_gimple_call(stmt)){
>    tree current_fn_decl = gimple_call_fndecl(stmt);
>    const char* name = get_name(current_fn_decl);
>    cerr << " Function : " << name << " is called \n";
>  }
>
> However, I note that some internal exceptions are encountered, when I
> use gcc version 7.4, 8.3, and also 9.3:
>
> test.c: In function ‘main’:
> test.c:9:5: internal compiler error: Segmentation fault
> 9 | int main(int argc ,char **argv)
>   | ^~~~
> 0xab88bf crash_signal
> ../../gcc-9.3.0/gcc/toplev.c:326
> 0xcfc836 location_wrapper_p(tree_node const*)
> ../../gcc-9.3.0/gcc/tree.h:3812
> 0xcfc836 tree_nop_conversion
> ../../gcc-9.3.0/gcc/tree.c:12850
> 0xcfc836 tree_strip_nop_conversions(tree_node*)
> ../../gcc-9.3.0/gcc/tree.c:12888
> 0xcfc836 get_name(tree_node*)
> ../../gcc-9.3.0/gcc/tree.c:12559
> 0x7f9466d86bb7 execute
> 
> /home/shuaiw/work/sanitizer_reduction_gcc/demo/walk_gimple/walk_gimple.cc:61
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> Makefile:10: recipe for target 'test' failed
> make: *** [test] Error 1
>
>
> I think the issue is due to ASAN_MARK, because when I comment out the
> particular array access which induces the ASAN_MARK, all other function
> calls, including the ASan-related functions __builtin___asan_init
> and __builtin___asan_version_mismatch_check_v8, can be printed out
> with no issue.
>
> Can I interpret this as a bug? Any suggestions are welcome.
> Thank you very much.

ASAN_MARK is likely an internal function, which does not have a function
declaration, so gimple_call_fndecl returns NULL and you feed get_name a
NULL pointer.
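
The defensive pattern can be sketched with a toy model in plain C++ (not GCC
code).  FnDecl and Call are hypothetical stand-ins for the decl and the call
statement, and callee_name is an illustrative helper.  In GCC itself I believe
the corresponding checks are a NULL test on the result of gimple_call_fndecl
plus gimple_call_internal_p / internal_fn_name, but treat that mapping as an
assumption to verify against your GCC version.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins, NOT the real GCC types.
struct FnDecl { std::string name; };

struct Call {
  const FnDecl *fndecl;        // null for internal functions and indirect calls
  const char *internal_name;   // set only for internal functions
};

// Return a printable callee name without ever dereferencing a null decl.
std::string callee_name (const Call &call)
{
  // In this model a call has either a decl or an internal name, never both.
  assert (!(call.fndecl && call.internal_name));

  if (call.fndecl)
    return call.fndecl->name;
  if (call.internal_name)
    return std::string ("<internal: ") + call.internal_name + ">";
  return "<indirect call>";
}
```

The point is only the ordering: test for a missing declaration first, and
fall back to some other description of the callee instead of passing a null
pointer on to a name lookup.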

>
> Best,
> Shuai


Re: Seeking clarification and way forward on limited scope variables.

2020-06-09 Thread Richard Biener via Gcc
On Tue, Jun 9, 2020 at 8:00 AM Tomar, Sourabh Singh
 wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
> Hello Everyone,
>
> I need to have your thoughts on this.
>
> Consider the following test case --
> ---
>   1 int main(int Argc, char **Argv) {
>   2   int Local = 6;
>   3   printf("%d\n",Local);
>   4
>   5   {
>   6     printf("%d\n",Local);
>   7     int Local = 7;
>   8     printf("%d\n",Local);
>   9   }
>  10
>  11   return 0;
>  12 }
> 
> When this is compiled in debug mode (with trunk gcc as well as trunk clang)
> and debugged with GDB at line 6, the following behavior is observed:
> Breakpoint 1, main (Argc=1, Argv=0x7fffe458) at MainScope.c:6
> 6   printf("%d\n",Local);
> (gdb) print Local
> $1 = 2102704   -- some garbage value
> (gdb) info addr Local
> Symbol "Local" is a variable at frame base reg $rbp offset 0+-24.   -- This
> is the location of the *Local* declared in the inner scope, even though the
> variable being referred to at this point is the one from the outer scope.
>
> This problem persists with both GDB and LLDB. Once we have entered the
> lexical scope and try to print the value of *Local*, the debugger looks into
> the *current scope* and fetches the value if the variable exists in that
> scope (if it doesn't, GDB searches for it in the outer scope).
>
> This happens regardless of whether the variable has actually come into scope
> (that is, been declared) at line 7, because DWARF describes a location (on
> the stack) that is valid for the whole lexical block rather than from the
> point where the variable is actually declared, which in this case is line 7.
> -
>   0x006d: DW_TAG_lexical_block
>   DW_AT_low_pc  (0x002016d1)
>   DW_AT_high_pc (0x0020170b)
> 0x007a:   DW_TAG_variable
> DW_AT_location  (DW_OP_fbreg -24)
> DW_AT_name  ("Local")
> DW_AT_decl_file ("MainScope.c")
> DW_AT_decl_line (7)
> DW_AT_type  (0x008a "int")
> --
>
> The DWARF specification provides the DW_AT_start_scope attribute to deal with 
> this issue (Sec 3.9 Declarations with Reduced Scope DWARFv5). This attribute 
> aims at limiting the scope of variables within the lexical scope in which it 
> is defined to from where it has been declared/ defined.
>
> In order to fix this issue, we want to modify llvm so that DW_AT_start_scope 
> is emitted for the variable in the inner block (in the above example). This 
> limits the scope of the inner block variable to start from the point of its 
> declaration.
>
> For POC, we inserted DW_AT_start_scope in this inner *Local* variable, 
> resultant dwarf after this.
> -
> 0x006d: DW_TAG_lexical_block
>   DW_AT_low_pc  (0x002016d1)
>   DW_AT_high_pc (0x0020170b)
> 0x007a:   DW_TAG_variable
>  DW_AT_start_scope   (0x17) -- restricted within a 
> subset(starting from the point of definition(specified as an offset)) of 
> entire ranges covered by Lex Block.
> DW_AT_location  (DW_OP_fbreg -24)
> DW_AT_name  ("Local")
> DW_AT_decl_file ("MainScope.c")
> DW_AT_decl_line (7)
> DW_AT_type  (0x0092 "int")
> 
>
>
> We also modified 'gdb' to interpret DW_AT_start_scope so that the scope of 
> the variable is limited from the PC where the value of DW_AT_start_scope is. 
> If the debugger is stopped at a point within the same lexical block but at a 
> PC before DW_AT_start_scope, then gdb follows the normal search mechanism of 
> searching in consecutive super blocks till it gets a match or it reaches the 
> global block. After the modification,  GDB is able to correctly show the 
> value *6* in our example.
>
>
> After incorporating changes --
>   Breakpoint 1, main (Argc=1, Argv=0x7fffe458) at MainScope.c:6
> 6   printf("%d\n",Local);
> (gdb) print Local
> $1 = 6 --- Value retrieved from outer scope
> (gdb) info addr Local
> Symbol "Local" is a variable at frame base reg $rbp offset 0+-20.
>
> Could you please let us know your thoughts or suggestions on this? Was
> there, or is there, an existing effort to deal with this problem?

There are one or two bug reports about this in Bugzilla which have some
extra info.  The issue is that lexical blocks do not align with variable
lifetime.  IIRC I spotted a DWARF feature that might help, but I don't
remember which one (try to find the bug).

I'm not aware of anyone actually trying to fix the issue.

Richard.

> Even though locatio

Re: Push to my private branches is disallowed

2020-06-15 Thread Richard Biener via Gcc
On June 15, 2020 6:05:26 PM GMT+02:00, Segher Boessenkool 
 wrote:
>Hi!
>
>$ git push -n fsf
>To git+ssh://gcc.gnu.org/git/gcc.git
> + 1db88c6...71e5e35 cc0 -> refs/users/segher/heads/cc0 (forced update)
>
>$ git push fsf
>Counting objects: 664, done.
>Delta compression using up to 64 threads.
>Compressing objects: 100% (239/239), done.
>Writing objects: 100% (504/504), 87.72 KiB | 0 bytes/s, done.
>Total 504 (delta 434), reused 321 (delta 265)
>remote: Resolving deltas: 100% (434/434), completed with 159 local
>objects.
>remote: *** !!! WARNING: This is *NOT* a fast-forward update.
>remote: *** !!! WARNING: You may have removed some important commits.
>remote: *** This update introduces too many new commits (2898), which
>would
>remote: *** trigger as many emails, exceeding the current limit (1000).
>remote: *** Contact your repository adminstrator if you really meant
>remote: *** to generate this many commit emails.
>remote: error: hook declined to update refs/users/segher/heads/cc0
>To git+ssh://gcc.gnu.org/git/gcc.git
> ! [remote rejected] cc0 -> refs/users/segher/heads/cc0 (hook declined)
>error: failed to push some refs to 'git+ssh://gcc.gnu.org/git/gcc.git'
>
>What.
>
>Of course it is not a fast-forward.  I rebase the branches I publish,
>what is the point of publishing them otherwise?  This is so that people
>can see the stuff that will make its way into master *later*.
>
>The number of new commits is nonsense (it is just 13), and the number
>of
>emails that triggers should be 0.
>
>Please fix?  Or, what else is wrong?

The number of commits is correct. It comes from merges, I suppose.

Richard. 

>
>
>Segher



Re: SSA_NAME_DEF_STMT or print_gimple_stmt for MEM_REF seems mal-functional

2020-06-15 Thread Richard Biener via Gcc
On June 15, 2020 6:31:38 PM GMT+02:00, Shuai Wang via Gcc  
wrote:
>Hello,
>
>Suppose given the following SSA statement generated by the `sanopt`
>pass:
>
>   _17 = (signed char *) _16;
>   _18 = *_17;
>
>I am using the following code to identify that _17 depends on _16:
>
>// def_stmt refers to _18 = *_17;
>for (unsigned i = 1; i < gimple_num_ops(def_stmt); i++) {
> op1 = gimple_assign_rhs1(def_stmt);
> if (is_gimple_addressable(op1)) {
>  gimple* def_stmt = SSA_NAME_DEF_STMT(op1);
>  print_gimple_stmt(stderr, def_stmt, 0, TDF_SLIM); // crash at this point
> }
>
>It crashes with the following call stack:
>
>0xb5cd5f crash_signal
>../../gcc-10.1.0/gcc/toplev.c:328
>0x1452134 pp_format(pretty_printer*, text_info*)
>../../gcc-10.1.0/gcc/pretty-print.c:1828
>0x14533e4 pp_printf(pretty_printer*, char const*, ...)
>../../gcc-10.1.0/gcc/pretty-print.c:1773
>0x8dcc81 print_gimple_stmt(_IO_FILE*, gimple*, int, dump_flag)
>../../gcc-10.1.0/gcc/gimple-pretty-print.c:157
>
>I tried hard but just cannot understand why this would crash. Indeed,
>this
>code works pretty well when printing out other dependency statements,
>but
>just gets stuck in front of pointer dereference like _18 = *_17.
>
>Any suggestion would be appreciated. Thank you!

Build your compiler with --enable-checking and you'll find that you are taking
SSA_NAME_DEF_STMT of a non-SSA_NAME. I suggest you learn to use a debugger.

Richard. 

>Best,
>Shuai



Re: Push to my private branches is disallowed

2020-06-15 Thread Richard Biener via Gcc
On June 15, 2020 7:19:13 PM GMT+02:00, Joseph Myers  
wrote:
>On Mon, 15 Jun 2020, Segher Boessenkool wrote:
>
>> It should never send email for things that are on master (or any
>release
>> branch) already.
>
>https://github.com/AdaCore/git-hooks/issues/9
>
>https://github.com/AdaCore/git-hooks/pull/12 is marked "Approved".  It 
>certainly has fixes for some of the issues reported in the GCC context,
>
>but I'm not sure if it includes a fix for that particular one.  And
>once 
>"Approved" has turned into actually present in master, the upstream 
>changes will need merging into the version used by GCC (replacing local
>
>changes where the features those implement have been implemented more 
>generally upstream).
>
>> It should never send email for user branches *at all*.
>
>I think sending email for all branches showing the development taking 
>place there (as opposed to commits that are already in the repository
>and 
>are just being added to another ref) is entirely appropriate.
>
>I don't know if deleting and then recreating a user branch (in separate
>
>pushes) avoids the limit (and the excess mails) in the case where a
>user 
>branch is being rebased, but I expect it should.

Can you document this (and how to do it) in git.html?



Re: SSA_NAME_DEF_STMT or print_gimple_stmt for MEM_REF seems mal-functional

2020-06-15 Thread Richard Biener via Gcc
On June 15, 2020 6:58:27 PM GMT+02:00, Shuai Wang  
wrote:
>Thank you very much for your prompt response, Richard. Sorry, I was kinda
>"learning by doing". I am familiar with LLVM stuff but a newbie to GCC
>specifications.
>
>Just want to make sure I got it right; _17 and _16 in the IR code are SSA
>variables. They are initialized once and used once. Could you please
>shed some light on where the "non-ssa name" comes from in this scenario, and
>how exactly I can get _17 = (signed char *) _16 printed out? Thank you very
>much.
>
>Best,
>Shuai
>
>On Tue, Jun 16, 2020 at 12:52 AM Richard Biener
>
>wrote:
>
>> On June 15, 2020 6:31:38 PM GMT+02:00, Shuai Wang via Gcc
>
>> wrote:
>> >Hello,
>> >
>> >Suppose given the following SSA statement generated by the `sanopt`
>> >pass:
>> >
>> >   _17 = (signed char *) _16;
>> >   _18 = *_17;
>> >
>> >I am using the following code to identify that _17 depends on _16:
>> >
>> >// def_stmt refers to _18 = &_17;
>> >for (unsigned i = 1; i < gimple_num_ops(def_stmt); i++) {
>> > op1 = gimple_assign_rhs1(def_stmt);

op1 is not an SSA name here. 

>> > if (is_gimple_addressable(op1))

That predicate does not make sense on SSA names

 {
>> >  gimple* def_stmt = SSA_NAME_DEF_STMT(op1);
>> >  print_gimple_stmt(stderr, def_stmt, 0, TDF_SLIM); // crash
>at
>> >this point
>> > }
>> >
>> >It crashes with the following call stack:
>> >
>> >0xb5cd5f crash_signal
>> >../../gcc-10.1.0/gcc/toplev.c:328
>> >0x1452134 pp_format(pretty_printer*, text_info*)
>> >../../gcc-10.1.0/gcc/pretty-print.c:1828
>> >0x14533e4 pp_printf(pretty_printer*, char const*, ...)
>> >../../gcc-10.1.0/gcc/pretty-print.c:1773
>> >0x8dcc81 print_gimple_stmt(_IO_FILE*, gimple*, int, dump_flag)
>> >../../gcc-10.1.0/gcc/gimple-pretty-print.c:157
>> >
>> >I tried hard but just cannot understand why this would crash.
>Indeed,
>> >this
>> >code works pretty well when printing out other dependency
>statements,
>> >but
>> >just gets stuck in front of pointer dereference like _18 = *_17.
>> >
>> >Any suggestion would be appreciated. Thank you!
>>
>> Build your compiler with --enable-checking and you'll figure you
>> reference SSA_NAME_DEF_STMT of a NON-SSA_NAME. I suggest you learn to use a
>> debugger.
>>
>> Richard.
>>
>> >Best,
>> >Shuai
>>
>>



Re: SSA_NAME_DEF_STMT or print_gimple_stmt for MEM_REF seems mal-functional

2020-06-16 Thread Richard Biener via Gcc
On Tue, Jun 16, 2020 at 5:00 AM Shuai Wang  wrote:
>
> Yes, TREE_CODE (op1) != SSA_NAME shows that op1 is not an SSA name
> (although I don't know why). But how can I walk backwards to identify its
> initialization statement _17 = (signed char *) _16? Thanks!

You want to walk over the SSA operands of the stmt rather than over its
plain operands, using for example FOR_EACH_SSA_USE_OPERAND.
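
The difference can be sketched with a toy operand tree in plain C++ (not GCC
code).  Code, Tree and collect_ssa_uses are hypothetical names; the model
assumes that in `_18 = *_17` the right-hand side is a memory reference that
wraps `_17` rather than `_17` itself, which is why asking for the defining
statement of the whole operand goes wrong.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins, NOT the real GCC tree codes or types.
enum class Code { SsaName, MemRef, IntegerCst };

struct Tree {
  Code code;
  std::string name;                 // only meaningful for SsaName
  std::vector<const Tree *> ops;    // sub-operands, e.g. pointer and offset
};

// Collect every SSA name used anywhere inside an operand, looking through
// wrapping references instead of treating the operand as an SSA name itself.
void collect_ssa_uses (const Tree *t, std::vector<const Tree *> &out)
{
  assert (t != nullptr);
  if (t->code == Code::SsaName)
    {
      out.push_back (t);
      return;
    }
  for (const Tree *op : t->ops)
    collect_ssa_uses (op, out);
}
```

Applied to the model of `*_17`, the walk yields `_17`, and only that name is
something whose defining statement can sensibly be looked up.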

> Shuai
>
> On Tue, Jun 16, 2020 at 10:32 AM Shuai Wang  wrote:
>>
>> Got it. But in that sense, given a `op1` satisfies the 
>> "is_gimple_addressable" predicate (e.g., the _17 in my sample code), how can 
>> I find its def statement? Thank you very much.
>>
>> Shuai
>>


Re: Re-optimize instrumented GIMPLE code

2020-06-17 Thread Richard Biener via Gcc
On Wed, Jun 17, 2020 at 4:11 AM Shuai Wang via Gcc  wrote:
>
> Hello,
>
> Suppose I have changed certain if condition in the GIMPLE code (generated
> by the `sanopt` pass) into the following format:
>
> if (0 == 1)
> {
>
> }
>
> Then, in order to completely remove this unnecessary if condition and the
> guarded true branch, I want to leverage the dead code elimination
> optimization of gcc. However, I just cannot figure out a way of doing so. I
> use the following command to output the instrumented GIMPLE code:

A simple CFG cleanup would get rid of the above; if you insert a pass,
make sure to return TODO_cleanup_cfg from it.
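
What such a cleanup does to an `if (0 == 1)` can be sketched with a toy CFG
in plain C++ (not GCC code).  Cond, Block and reachable_after_cleanup are
hypothetical names, and the real cleanup in GCC does considerably more than
this; the sketch only shows that once a branch on a constant condition is
folded, the guarded block becomes unreachable and can be dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG, NOT GCC's basic_block/edge structures.
enum class Cond { False, True, Unknown };

struct Block {
  Cond cond;       // condition of the block's final branch
  int true_succ;   // successor index when cond holds, -1 if none
  int false_succ;  // successor index otherwise, -1 if none
};

// For each block, report whether it is still reachable from block 0 once
// branches on constant conditions are folded to a single successor.
std::vector<bool> reachable_after_cleanup (const std::vector<Block> &cfg)
{
  std::vector<bool> seen (cfg.size (), false);
  std::vector<int> worklist;
  if (!cfg.empty ())
    {
      seen[0] = true;
      worklist.push_back (0);
    }
  while (!worklist.empty ())
    {
      int b = worklist.back ();
      worklist.pop_back ();
      const Block &blk = cfg[b];
      // A constant condition keeps only one of the two outgoing edges.
      int succs[2] = { blk.cond != Cond::False ? blk.true_succ : -1,
                       blk.cond != Cond::True ? blk.false_succ : -1 };
      for (int s : succs)
        {
          if (s < 0)
            continue;
          assert (static_cast<std::size_t> (s) < cfg.size ());
          if (!seen[s])
            {
              seen[s] = true;
              worklist.push_back (s);
            }
        }
    }
  return seen;
}
```

With block 0 ending in a constant-false branch, the block on its true edge is
reported unreachable; with an unknown condition both successors stay.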

> gcc -fdump-tree-all -fplugin=./instrumentor.so -g -fsanitize=address test.c
>
> And notice that the instrumented gimple code is right there in the
> outputs: test.c.322t.instrumentor. Everything seems fine.
>
> Anyone could shed some light on how to re-optimize (e.g., with deadcode
> elimination or just use -O3 if possible) the instrumented GIMPLE code?
> Thank you very much.
>
> Shuai


Re: Exception at "need_ssa_update_p" during GIMPLE instrumentation

2020-06-21 Thread Richard Biener via Gcc
On June 21, 2020 11:38:49 AM GMT+02:00, Shuai Wang via Gcc  
wrote:
>OK, I think I know how to solve it. Just return TODO_update_ssa.

If you dump with -vops you'll likely see that virtual operands got out of sync. 
You can either manually copy them from the original function calls or as you 
do, make sure update_ssa runs. 

Richard. 

>On Sun, Jun 21, 2020 at 5:34 PM Shuai Wang 
>wrote:
>
>> Hello,
>>
>> I am doing instrumentation of GIMPLE code by adding extra coverage
>> counters at each basic block. Basically it's
>> mimicking -fsanitize-coverage=trace-pc, where the only difference is
>> that __sanitizer_cov_trace_pc (the default
>> handler of -fsanitize-coverage=trace-pc) has no input parameters, but my
>> coverage handler has a parameter of basic block id.
>>
>> My current issue is that after the instrumentation of one function, the
>> plugin throws an exception at the following gcc_assert and does not proceed
>> to instrument another function:
>>
>>   if (flags & TODO_cleanup_cfg)
>> cleanup_tree_cfg (flags & TODO_update_ssa_any);
>>   else if (flags & TODO_update_ssa_any)
>> update_ssa (flags & TODO_update_ssa_any);
>>   gcc_assert (!need_ssa_update_p (fn));  <--  line 1954 of gcc/passes.c for gcc 10.1.0
>>
>> This really confused me, because when I print out the instrumented
>> GIMPLE code and compare with -fsanitize-coverage=trace-pc, I don't see a
>> major difference here:
>>
>> == my instrumented GIMPLE code ===
>>
>> fun2 ()
>> {
>>   int D.2588;
>>   int _3;
>>
>>:
>>   __sanitizer_cov_trace_pc (2);  <--- my coverage handler with basic block id as the input
>>   __builtin_puts (&"fun2"[0]);
>>   _3 = 0;
>>
>>:
>> :
>>   __sanitizer_cov_trace_pc (3);
>>   return _3;
>>
>> }
>>
>> === the corresponding instrumented GIMPLE code by
>fsanitize-coverage=trace-pc =
>>
>> fun2 ()
>> {
>>   int D.2760;
>>   int _3;
>>
>>[0.00%]:
>>   __builtin___sanitizer_cov_trace_pc ();
>>   __builtin_puts (&"fun2"[0]);
>>   _3 = 0;
>>
>>  [0.00%]:
>>   __builtin___sanitizer_cov_trace_pc ();
>>   return _3;
>>
>> }
>>
>> There is no big difference here. Could anyone shed some light on why
>> an exception on "need_ssa_update_p" is thrown? I don't think there is
>> any need to update any "SSA" here. Thank you very much.
>>
>> Best,
>>
>> Shuai
>>
>>



Re: GIMPLE problem

2020-06-24 Thread Richard Biener via Gcc
On Wed, Jun 24, 2020 at 1:36 AM Gary Oblock via Gcc  wrote:
>
> I'm somehow misusing GIMPLE (probably in multiple ways) and I need
> some help in straightening out this little mess I've made.
>
> I'm trying to do the following:
>
> In an attempt at structure reorganization (instance interleaving) an
> array of structures is being transformed into a structure of arrays.
>
> for the simple example I'm using
> typedef struct type type_t;
> struct type {
>   double x;
>   double y;
> };
> .
> .
> type_t *data = (type_t *)malloc( len * sizeof(type_t));
> .
> .
> result = data[i].y;
>
> Is transformed into this or something close to it
>
> typedef long _reorg_SP_ptr_type_type_t;
> typedef struct _reorg_base_type_type_t _reorg_base_type_type_t;
>
> struct _reorg_base_type_type_t {
>  double *x;
>  double *y;
> };
>
> _reorg_SP_ptr_type_type_t data;
>
> _reorg_base_type_type_t _reorg_base_var_type_t;
>
> // Note I'm ignoring a bunch of stuff that needs to happen
> // when a malloc fails..
> _reorg_base_var_type_t.x = (double*)malloc( len*sizeof(double));
> _reorg_base_var_type_t.y = (double*)malloc( len*sizeof(double));
>
> data = 0;
> .
> .
> double *temp = _reorg_base_var_type_t.y;
> result = temp[i];
>
> Now, believe it or not, the whole bit above, except for "result = data[i].y",
> seems to work just fine.
>
> I attempted to do this (result = data[i].y) via basically two different
> ways. One is using ARRAY_REF and in the other faking an array access with
> INDIRECT_REF. The first approach chokes on the fact that temp is a pointer
> and the second dies in ssa operand scanning because it doesn't have a case
> for INDIRECT_REF.

On GIMPLE there's no INDIRECT_REF but you have to use a MEM_REF
instead.  I'd use an ARRAY_REF and what you need to build is, in
-fdump-tree-XYZ-gimple (aka GIMPLE frontend) syntax:

temp_2 = _reorg_base_var_type_t.y;
result_3 = __MEM <double[]> (temp_2)[i_4];

so for the ARRAY_REF you have to dereference temp but view it as
array type double[].  That is, the TREE_TYPE of the MEM_REF you
build should be the array type.  You can build an array type from
the component type via build_array_type (component_type, NULL_TREE).
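
At the source level the two statements amount to loading the field pointer
and then indexing it as an array of doubles.  A minimal sketch in plain C++
(not GCC code): type_t comes from the example in the question, while
reorg_base_t, load_y_aos and load_y_soa are hypothetical names used only to
compare the original and the transformed access.

```cpp
#include <cassert>
#include <cstddef>

// Original array-of-structures element, as in the question.
struct type_t { double x; double y; };

// Hypothetical structure-of-arrays counterpart: one array per field.
struct reorg_base_t { double *x; double *y; };

// Original access: data[i].y
double load_y_aos (const type_t *data, std::size_t i)
{
  return data[i].y;
}

// Transformed access: temp = base.y; result = temp[i]
double load_y_soa (const reorg_base_t &base, std::size_t i)
{
  assert (base.y != nullptr);
  const double *temp = base.y;   // corresponds to loading the field pointer
  return temp[i];                // the pointed-to memory viewed as double[]
}
```

Both functions return the same value for the same index when the two layouts
hold the same data, which is the property the GIMPLE rewrite has to preserve.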

> The code below shows both ways. What have I done wrong here and what do
> I need to do differently to get it to work?
>
> Thanks,
>
> Gary
>
> PS Please ignore the then case below.
>
> 
>  gimple_stmt_iterator gsi = gsi_for_stmt( stmt);
>
>  // Dump for debugging
>  print_gimple_stmt ( stderr, stmt, 0);
>
>  tree lhs = gimple_assign_lhs( stmt);
>  tree rhs = gimple_assign_rhs1( stmt);
>
>  bool ro_on_left = tree_contains_a_reorgtype_p ( lhs, info);
>
>  tree ro_side = ro_on_left ? lhs : rhs;
>  tree nonro_side = ro_on_left ? rhs : lhs;
>
>  switch ( recognize_op ( ro_side, info) )  // "a->f"
>{
>case ReorgOpT_Indirect:
>  {
>tree orig_field = TREE_OPERAND( ro_side, 1);
>tree field_type = TREE_TYPE( orig_field);
>tree base = ri->instance_interleave.base;
>
>tree base_field =
>find_coresponding_field ( base, orig_field);
>
>tree base_field_type = TREE_TYPE( base_field);
>
>tree field_val_temp =
>  make_temp_ssa_name( field_type, NULL, "field_val_temp");
>
>tree inner_op = TREE_OPERAND( ro_side, 0);
>
>// For either case generate common code:
>
>// field_array = _base.f
>tree field_arry_addr =
>make_temp_ssa_name( base_field_type, NULL, "field_arry_addr");
>
>tree rhs_faa = build3 ( COMPONENT_REF,
>   //base_field_type, // This doesn't work
>   ptr_type_node, // This seems bogus
>   base,
>  //base_field, // This doesn't work
>  orig_field, // This seems bogus
>  NULL_TREE);
>
>// Use this to access the array of element.
>gimple *get_field_arry_addr =
>gimple_build_assign( field_arry_addr, rhs_faa);
>
>   // index = a
>   tree index =
> make_temp_ssa_name( ri->pointer_rep, NULL, "index");
>   gimple *get_index =
> gimple_build_assign( index, inner_op);
>
>   gimple *temp_set;
>   gimple *final_set;
>
>   #if WITH_INDIRECT
>   // offset = index * size_of_field
>   tree size_of_field = TYPE_SIZE_UNIT ( base_field_type);
>   tree offset = make_temp_ssa_name( sizetype, NULL, "offset");
>
>   gimple *get_offset =
> gimple_build_assign ( offset, MULT_EXPR, index, size_of_field);
>
>   // field_addr = field_array + offset
>   // bug fix here (TBD) type must be *double not double
>   tree field_addr =
> make_te

Re: GIMPLE problem

2020-06-25 Thread Richard Biener via Gcc
On Wed, Jun 24, 2020 at 9:05 PM Gary Oblock via Gcc  wrote:
>
> Richard,
>
> First off I did suspect INDIRECT_REF wasn't supported, thanks for
> confirming that.
>
> I tried what you said in the original code before I posted
> but I suspect how I went at it is the problem. I'm probably
> doing something(s) in a glaringly stupid way.
>
> Can you spot it, because everything I'm doing makes total sense
> to me?

Well, read what I wrote ...

> Thanks Gary
>
> --
>
> Snippet from the code with MEM_REF:
>
>   tree lhs_ref = build1 ( MEM_REF, field_type, field_addr);

MEM_REF has two operands; the second is a constant byte offset whose
type also encodes the TBAA information.

>   final_set = gimple_build_assign( lhs_ref, field_val_temp);
>
> field_type is a double *
>
> field_addr is an address within an malloced array of doubles.
>
> --
>
> Snippet from the code with ARRAY_REF:
>
>   tree rhs_ref = build4 ( ARRAY_REF, field_type, field_arry_addr, index,
>   NULL_TREE, NULL_TREE);

You need to dereference field_arry_addr to produce an array you
can reference with the ARRAY_REF.

 tree arr =  build2 (MEM_REF, array_type, field_arry_addr,
build_int_cst (ptr_type_node, 0));
 rhs_ref = build4 (ARRAY_REF, field_type, arr, index, NULL, NULL);
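
As a rough source-level analogy (a sketch only, not GCC internals code), the
MEM_REF-then-ARRAY_REF combination above corresponds to first viewing the
pointed-to memory as an array object and then indexing that array. The helper
name load_element is hypothetical and does not come from the thread.

```c
#include <stddef.h>

/* Hypothetical helper: the C-level analogue of
   ARRAY_REF <MEM_REF <array_type, p, 0>, index>. */
double load_element(double *p, size_t index)
{
    /* Step 1: treat the pointer as a pointer to an array object
       (array of unknown bound), like the MEM_REF with array_type. */
    double (*arr)[] = (double (*)[])p;

    /* Step 2: dereference to get the array, then index it,
       like the ARRAY_REF on top of the MEM_REF. */
    return (*arr)[index];
}
```

Calling load_element(a, 2) on an array holding 1.0, 2.0, 3.0 reads the third
element; indexing the bare pointer would give the same value in C, but the
tree form needs the explicit array-typed dereference.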

>   temp_set = gimple_build_assign( field_val_temp, rhs_ref);
>
> field type is double
>
> field_arry_addr is the starting address of an array of malloced doubles.
>
> index is a pointer_rep (an integer)
>   details:
> tree pointer_rep = make_node ( INTEGER_TYPE);
> TYPE_PRECISION (pointer_rep) = TYPE_PRECISION (pointer_sized_int_node);
>


Re: Hoisting DFmode loads out of loops..

2020-06-25 Thread Richard Biener via Gcc
On June 26, 2020 3:24:24 AM GMT+02:00, Alan Lehotsky  wrote:
>On Jun 25, 2020, at 6:37 PM, Jeff Law <l...@redhat.com> wrote:
>
>On Thu, 2020-06-25 at 15:46 -0400, Alan Lehotsky wrote:
>I’m working on a GCC 8.3 port to a load/store architecture with a
>32-bit data-path between registers and memory;
>
>looking at the gcc.dg/loop-9.c test, I fail to pass because I have
>split the move of a double constant to memory into multiple moves (4 in
>fact, because I only have a 16-bit immediate mode.)
>
>The (define_insn_and_split “movdf” …) is conditioned on
>“reload_completed”.
>
>Is there some other trick I need to get the constant hoisted?  I have
>already set the rtx cost of the CONST_DOUBLE ridiculously high (like 10
>insns)
>Hi Alan, it's been a long time...
>
>We'd probably need to see the RTL.  A variety of things can get in the
>way of
>LICM.  For example, I'd expect subregs to be problematical because they
>can look
>like RMW operations.
>
>jeff
>
>
>
>Hello to you too, Jeff….   I’ve been lurking for the last decade or so,
>last port I actually did was GCC 4 based, so lots of new stuff to
>try and wrap my head around.  I certainly am grateful for anybody with
>suggestions as to how to track down this problem (I’m not terribly
>eager to step through an x86 gcc in parallel with my port to see where
>they diverge in the loop-invariant recognition.)
>
>Although in crafting this expanded email, I see that the x86 has
>already decided to store the constant 18.4242 in the .rodata section by
>the start of loop-invariance so there’s a
>
>(set (reg:DF…. ) (mem:DF  (symbol_ref ….)))
>
>and I bet that’s far easier to move out of the loop than it would be to
>split the original
>
>(set (mem:DF…) (const_double:DF ….))

Immediate operands are never moved or CSEd by either RTL or GIMPLE, so if you
do not have const_double immediates the best thing to do is not make them
legitimate.
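
A source-level sketch of the shape this advice aims for, under the assumption
that the constant ends up in memory (as in the x86 case described below) rather
than as an immediate: the value is loaded once before the loop and only the
store remains inside it. The names k_value and fill_from_memory are
hypothetical, and this only illustrates the idea; it is not what the RTL pass
literally does.

```c
#include <stddef.h>

/* Hypothetical constant living in read-only data, standing in for the
   (mem:DF (symbol_ref ...)) operand. */
static const double k_value = 18.4242;

void fill_from_memory(double *a, size_t n)
{
    /* The load is loop-invariant, so it can sit outside the loop. */
    const double v = k_value;

    for (size_t i = 0; i < n; i++)
        a[i] = v;   /* only the store stays in the loop body */
}
```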

Richard. 

>— Al
>
>==
>
>Source code is
>
>void f (double *a)
>{
>int i;
>for (i = 0; i < 100; i++)
>a[i] = 18.4242;
>}
>==
>
>Here’s the dump from loop-9.c.252r.loop2-invariant  (compiled -O1)
>
>
>;; Function f (f, funcdef_no=0, decl_uid=1458, cgraph_uid=0,
>symbol_order=0)
>
>*starting processing of loop 1 **
>starting the processing of deferred insns
>ending the processing of deferred insns
>setting blocks to analyze 3, 5
>starting the processing of deferred insns
>ending the processing of deferred insns
>df_analyze called
>df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 (
>0.33)
>df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 (
>0.33)
>df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 3 ( 
>0.5)
>
>
>starting region dump
>
>
>f
>
>Dataflow summary:
>def_info->table_size = 3, use_info->table_size = 23
>;;  invalidated by call 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6
>[d6] 7 [d7] 8 [d8] 9 [d9] 14 [d14] 15 [d15] 16 [a0] 19 [a3] 20 [a4] 24
>[acc0_hi] 25 [acc0_lo] 26 [acc1_hi] 27 [acc1_lo] 28 [source3] 30 [cc]
>31 [int_set0] 32 [int_set1] 33 [int_clr0] 34 [int_clr1] 35
>[scratchpad0] 36 [scratchpad1] 37 [scratchpad2] 38 [scratchpad3]
>;;  hardware regs used 23 [sp] 29 [arg] 39 [sfp]
>;;  regular block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
>;;  eh block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
>;;  entry block defs 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7
>[d7] 8 [d8] 9 [d9] 21 [a5] 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
>;;  exit block uses 22 [a6] 23 [sp] 39 [sfp]
>;;  regs ever live 0 [d0] 30 [cc]
>;;  ref usage r0={1d,1u} r1={1d} r2={1d} r3={1d} r4={1d} r5={1d}
>r6={1d} r7={1d} r8={1d} r9={1d} r21={1d} r22={1d,5u} r23={1d,5u}
>r29={1d,4u} r30={3d,1u} r39={1d,5u} r46={2d,4u} r48={1d,1u}
>;;total ref usage 47{21d,26u,0e} in 6{6 regular + 0 call} insns.
>;; Reaching defs:
>;;  sparse invalidated
>;;  dense invalidated 0, 1
>;;  reg->defs[] map: 30[0,1] 46[2,2]
>;; bb 3 artificial_defs: { }
>;; bb 3 artificial_uses: { u7(22){ }u8(23){ }u9(29){ }u10(39){ }}
>;; lr  in   22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
>;; lr  use 22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
>;; lr  def 30 [cc] 46
>;; live  in   46
>;; live  gen 30 [cc] 46
>;; live  kill 30 [cc]
>;; rd  in   (1) 46[2]
>;; rd  gen (2) 30[1],46[2]
>;; rd  kill (3) 30[0,1],46[2]
>;;  UD chains for artificial uses at top
>
>(code_label 11 7 8 3 2 (nil) [0 uses])
>(note 8 11 9 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
>;;   UD chains for insn luid 0 uid 9
>;;  reg 46 { d2(bb 3 insn 10) }
>(insn 9 8 10 3 (set (mem:DF (reg:SI 46 [ ivtmp___6 ]) [0 MEM[base: _15,
>offset: 0B]+0 S8 A32])
>(const_double:DF 1.842419990222931955941021442413330078125e+1
>[0x0.9364c2f837b4ap+5])) "loop-9.c":9 19 {movdf}
> (nil))
>;;   UD chains for insn luid 1 uid 10
>;;  reg 46 { d2(bb 3 insn 10) }
>(insn 10 9 12 3 (parallel [
>(set (reg:SI 46 [ ivtmp___6 ])
>(plus:SI (reg:SI 46 [ ivtmp___6 ])
>(const_int 8 [0x8])))
>

Re: Support for named address spaces in C++

2020-06-26 Thread Richard Biener via Gcc
On Fri, Jun 26, 2020 at 9:12 AM Georg-Johann Lay  wrote:
>
> Andrew Pinski via Gcc schrieb:
> > On Wed, Jun 3, 2020 at 2:32 PM Max Ruttenberg via Gcc  
> > wrote:
> >> Hi all,
> >>
> >> I’ve added a named address space to our backend and I noticed that it is 
> >> only supported in C.
> >> Has anyone had experience porting this feature to C++? Is there any 
> >> technical reason why it’s not supported?
> >
> > > The main issue is how it interacts with templates and then
> > > mangling.  There were a few other issues that have been posted about
> > > before every time it is raised.
> >
> > Thanks,
> > Andrew
> >
>
> AFAIK llvm / clang supports named address spaces in C++, so it is
> obviously possible and feasible.

I suppose restricting it to interfaces with extern "C" might side-step
most of the mangling and template issues.  Does clang document
its C++ language extension?

Richard.

>
> Johann
>


Re: Passing an string argument to a GIMPLE call

2020-06-27 Thread Richard Biener via Gcc
On June 27, 2020 6:21:12 AM GMT+02:00, Shuai Wang via Gcc  
wrote:
>Hello,
>
>I am writing the following statement to make a GIMPLE call:
>
>  tree function_fn_type = build_function_type_list(void_type_node,
>void_type_node, integer_type_node, NULL_TREE);
>  tree sancov_fndecl = build_fn_decl("my_instrumentation_function",
>function_fn_type);
>
> auto gcall = gimple_build_call(sancov_fndecl, 2,
>build_string_literal(3, "foo"), build_int_cst_type(integer_type_node,
>0));
>
>However, when executing the GIMPLE plugin, while inducing no internal
>crash, the following function call statement is generated:
>
>  my_instrumentation_function (&"foo"[0], 0);
>
>The first argument seems really strange. Can I somehow just put a
>"foo"
>there instead of the current form? Thank you very much.

It looks correct. You are passing the address of the string literal. 
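
For comparison, the same expression is valid at the C source level, where it
is just the address of the literal's first character. The sketch below is
plain C, not plugin code, and the function names are hypothetical. It also
shows that the literal occupies one byte more than its visible length because
of the terminating NUL.

```c
#include <stddef.h>

/* Address of the first character of the literal, spelled the way the
   GIMPLE dump prints the argument. */
const char *literal_address(void)
{
    return &"foo"[0];
}

/* Bytes occupied by the literal, including the terminating NUL. */
size_t literal_size_with_nul(void)
{
    return sizeof "foo";
}
```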

Richard. 

>Best,
>Shuai



Re: Passing an string argument to a GIMPLE call

2020-06-28 Thread Richard Biener via Gcc
On June 27, 2020 11:15:50 PM GMT+02:00, David Malcolm  
wrote:
>On Sat, 2020-06-27 at 21:27 +0800, Shuai Wang via Gcc wrote:
>> Dear Richard,
>> 
>> Thanks for the info. My bad, I will need to append "\0" at the end of
>> the
>> string. Also, a follow-up question which I just cannot find an
>> answer:
>> typically in the plugin entry point:
>> 
>> virtual unsigned int execute(function *fun)
>> 
>> How do I know which C files I am instrumenting? Can I somehow get the
>> name
>> of the C file? I don't find a corresponding pointer in the function
>> struct.
>
>fun->function_start_locus and fun->function_end_locus are the
>location_t for the start and end of the function; 

DECL_SOURCE_LOCATION of cfun->decl might be a more reliable source. 

Richard 

>also, each gimple
>stmt has a location_t (although this isn't always set for every stmt).
>
>Given a location_t, you can use LOCATION_FILE (loc) to get the source
>file (and various other macros and accessors, see input.h)
>
>Hope this is helpful
>Dave
>
>> Best,
>> Shuai
>> 
>> On Sat, Jun 27, 2020 at 9:12 PM Richard Biener <
>> richard.guent...@gmail.com>
>> wrote:
>> 
>> > On June 27, 2020 6:21:12 AM GMT+02:00, Shuai Wang via Gcc <
>> > gcc@gcc.gnu.org>
>> > wrote:
>> > > Hello,
>> > > 
>> > > I am writing the following statement to make a GIMPLE call:
>> > > 
>> > >  tree function_fn_type =
>> > > build_function_type_list(void_type_node,
>> > > void_type_node, integer_type_node, NULL_TREE);
>> > >  tree sancov_fndecl =
>> > > build_fn_decl("my_instrumentation_function",
>> > > function_fn_type);
>> > > 
>> > > auto gcall = gimple_build_call(sancov_fndecl, 2,
>> > > build_string_literal(3, "foo"),
>> > > build_int_cst_type(integer_type_node,
>> > > 0));
>> > > 
>> > > However, when executing the GIMPLE plugin, while inducing no
>> > > internal
>> > > crash, the following function call statement is generated:
>> > > 
>> > >  my_instrumentation_function (&"foo"[0], 0);
>> > > 
>> > > The first argument seems really strange. Can I somehow just put
>> > > a
>> > > "foo"
>> > > there instead of the current form? Thank you very much.
>> > 
>> > It looks correct. You are passing the address of the string
>> > literal.
>> > 
>> > Richard.
>> > 
>> > > Best,
>> > > Shuai



Re: RFC noipa sizeof function for record relayout at link time

2020-06-29 Thread Richard Biener via Gcc
On Mon, Jun 29, 2020 at 11:56 AM Erick Ochoa
 wrote:
>
> Hello,
>
> I have been working on link time optimization for C that may change the
> size of structs (at link time). We are close to sharing the results we
> have so far, but there are a couple of missing pieces left to work on:
>
> Implementations of sizeof and offsetof that support this change in
> struct layout at link time.
>
> == What is the problem? ==
>
> Currently, for both sizeof and offsetof, the C parser will replace these
> statements with trees that correspond to the value returned by sizeof
> and offsetof at parse time. For example:
>
> // source code
> struct astruct a;
> memset(a, 0, sizeof(a));
>
> // parse time
> memset(a, 0, 64);
>
> // after dead field elimination
> // struct astruct is now 56 bytes long
> memset(a, 0, 64); // <-- we are overwriting memory!
>
> At link time, we really shouldn't change the value 64 since we can't and
> shouldn't assume that the value 64 came from a sizeof statement. The
> source code could have been written this way:
>
> // source code
> struct astruct a;
> memset(a, 0, 64);
>
> regardless of whether the struct astruct has a length of 64.
>
> ** We need to identify which trees come from sizeof statements **
>
> == What do we want? ==
>
> What we really want is to make sure that our transformation performs the
> following changes (or no changes!) depending on the source code.
>
> If the value for memset's argument comes from a sizeof statement:
>
> // source code
> struct astruct a;
> memset(a, 0, sizeof(a));
>
> // parse time
> memset(a, 0, 64);
>
> // after dead field elimination
> memset(a, 0, 56);
>
> However, in the case in which no sizeof is used, we want to do the
> following:
>
> // source code
> struct astruct a;
> memset(a, 0, 64);
>
> // parse time
> memset(a, 0, 64);
>
> // after dead field elimination
> memset(a, 0, 64);
>
> == How do we get what we want? ==
>
> Ideally what we want is to:
>
> * Be able to change the value returned by sizeof and offsetof at link time:
>* possibly a global variable?
> * Identify which values come from sizeof statement:
>* matching identifiers?
> * Not redefine valid C identifiers:
>* in gimple we can have an identifier with a dot in it.
> * Disable constant propagation and other optimizations:
>* possibly __attribute__((noipa))
> * Be able to work with parallel compilation (make -j)
> * Be able to work with any Makefile
>* No C code generation and then compile and link gen code at the end.
>
> So, I've been thinking about multiple options:
>
> * Extending gimple to add support for a sizeof statement
> * A function per struct generated during compilation (sizeof & offsetof)
> * A variable per struct generated during compilation (sizeof and more
> for offsetof)
>
> I think extending gimple to add support for a sizeof statement gets us
> all that we want; however, this would involve rewriting possibly many
> parts of GCC. As such, I am somewhat opposed to this.
>
> I then thought of generating global variables at parse time, during
> compilation. In this scheme, I would replace sizeof statements with a
> reference to a global variable (or function) that is initialized with
> the value returned by the sizeof statement during parse time. At link
> time we can replace initialization value if needed. For example:
>
> // The parser is parsing a C file
> // it encounters a sizeof statement
> sizeof(struct astruct);
>
> // Parsing is paused.
> // Does a global variable that identifies this struct exist?
> // I.e. does size_t __lto.sizeof.astruct exist?
> // If it doesn't, create it.
>
> size_t __lto.sizeof.astruct = 64
>
> // Back to the parser
> // instead of replacing
> // sizeof(struct astruct) with 64
> // replace with the following gimple:
>
> __lto.sizeof.astruct
>
> // Continue parsing until the end of file compilation.
>
> // If at link time we detect that we will delete a field from astruct
> // Then we will have to look at the initialization value of
> // __lto.sizeof.astruct and replace it with the new value.
>
> size_t __lto.sizeof.$identifier = 56
>
> This strategy can be used with global functions instead of variables and
> it is similar. The only differences would be we would create a global
> function instead of a variable and we would call that function to obtain
> the value.
>
> For offsetof, we will need to change in the following way:
>
> // The parser encounters offsetof
> offsetof(struct astruct, b);
>
> // Parsing is paused.
> // Does a global variable that identifies this struct AND field exist?
>
> // The previous field has a size of 8
> size_t __lto.offsetof.astruct._8 = 8
>
> // Back to the parser
> // instead of replacing
> // offsetof(struct astruct, b) with 8
> // replace with the following gimple:
>
> __lto.offsetof.astruct._8
>
> // Continue parsing until the end of file compilation.
>
> // If at link time we detect that we will delete the previous field
> // then we can rewrite all the offsetof for this struct and which refer
>

Re: RFC noipa sizeof function for record relayout at link time

2020-06-29 Thread Richard Biener via Gcc
On Mon, Jun 29, 2020 at 1:05 PM Richard Biener
 wrote:
>
> On Mon, Jun 29, 2020 at 11:56 AM Erick Ochoa
>  wrote:
> >
> > Hello,
> >
> > I have been working on link time optimization for C that may change the
> > size of structs (at link time). We are close to sharing the results we
> > have so far, but there are a couple of missing pieces left to work on:
> >
> > Implementations of sizeof and offsetof that support this change in
> > struct layout at link time.
> >
> > == What is the problem? ==
> >
> > Currently, for both sizeof and offsetof, the C parser will replace these
> > statements with trees that correspond to the value returned by sizeof
> > and offsetof at parse time. For example:
> >
> > // source code
> > struct astruct a;
> > memset(a, 0, sizeof(a));
> >
> > // parse time
> > memset(a, 0, 64);
> >
> > // after dead field elimination
> > // struct astruct is now 56 bytes long
> > memset(a, 0, 64); // <-- we are overwriting memory!
> >
> > At link time, we really shouldn't change the value 64 since we can't and
> > shouldn't assume that the value 64 came from a sizeof statement. The
> > source code could have been written this way:
> >
> > // source code
> > struct astruct a;
> > memset(a, 0, 64);
> >
> > regardless of whether the struct astruct has a length of 64.
> >
> > ** We need to identify which trees come from sizeof statements **
> >
> > == What do we want? ==
> >
> > What we really want is to make sure that our transformation performs the
> > following changes (or no changes!) depending on the source code.
> >
> > If the value for memset's argument comes from a sizeof statement:
> >
> > // source code
> > struct astruct a;
> > memset(a, 0, sizeof(a));
> >
> > // parse time
> > memset(a, 0, 64);
> >
> > // after dead field elimination
> > memset(a, 0, 56);
> >
> > However, in the case in which no sizeof is used, we want to do the
> > following:
> >
> > // source code
> > struct astruct a;
> > memset(a, 0, 64);
> >
> > // parse time
> > memset(a, 0, 64);
> >
> > // after dead field elimination
> > memset(a, 0, 64);

But why do you think the difference of handling of sizeof(a) vs.
a constant is warranted?  Nothing requires the user to write
sizeof(a) whenever the size of 'a' is semantically needed; the
user can just as well write the literal 64 here.

It's the same with malloc sites btw.

So it seems you cannot use the presence or absence
of 'sizeof' to derive semantics.
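
A small C sketch of the point, assuming a hypothetical 64-byte layout for
struct astruct (the thread does not give its fields): both functions mean
"clear the whole object", so a transformation that shrinks the struct has to
treat them the same way, whichever spelling the user chose.

```c
#include <string.h>

/* Hypothetical layout; only the total size of 64 bytes matters here. */
struct astruct { char bytes[64]; };

/* Size written with sizeof. */
void clear_with_sizeof(struct astruct *a)
{
    memset(a, 0, sizeof *a);
}

/* Same intent, size written as a literal. */
void clear_with_literal(struct astruct *a)
{
    memset(a, 0, 64);
}
```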

> > == How do we get what we want? ==
> >
> > Ideally what we want is to:
> >
> > * Be able to change the value returned by sizeof and offsetof at link time:
> >* possibly a global variable?
> > * Identify which values come from sizeof statement:
> >* matching identifiers?
> > * Not redefine valid C identifiers:
> >* in gimple we can have an identifier with a dot in it.
> > * Disable constant propagation and other optimizations:
> >* possibly __attribute__((noipa))
> > * Be able to work with parallel compilation (make -j)
> > * Be able to work with any Makefile
> >* No C code generation and then compile and link gen code at the end.
> >
> > So, I've been thinking about multiple options:
> >
> > * Extending gimple to add support for a sizeof statement
> > * A function per struct generated during compilation (sizeof & offsetof)
> > * A variable per struct generated during compilation (sizeof and more
> > for offsetof)
> >
> > I think extending gimple to add support for a sizeof statement gets us
> > all that we want; however, this would involve rewriting possibly many
> > parts of GCC. As such, I am somewhat opposed to this.
> >
> > I then thought of generating global variables at parse time, during
> > compilation. In this scheme, I would replace sizeof statements with a
> > reference to a global variable (or function) that is initialized with
> > the value returned by the sizeof statement during parse time. At link
> > time we can replace initialization value if needed. For example:
> >
> > // The parser is parsing a C file
> > // it encounters a sizeof statement
> > sizeof(struct astruct);
> >
> > // Parsing is paused.
> > // Does a global variable that identifies this struct exist?
> > // I.e. does size_t __lto.sizeof.astruct exist?
> > // If it doesn't, create it.
> >
> > size_t __lto.sizeof.astruct = 64
> >
> > // Back to the parser
> > // instead of replacing
> > // sizeof(struct astruct) with 64
> > // replace with the following gimple:
> >
> > __lto.sizeof.astruct
> >
> > // Continue parsing until the end of file compilation.
> >
> > // If at link time we detect that we will delete a field from astruct
> > // Then we will have to look at the initialization value of
> > // __lto.sizeof.astruct and replace it with the new value.
> >
> > size_t __lto.sizeof.$identifier = 56
> >
> > This strategy can be used with global functions instead of variables and
> > it is similar. The only differences would be we would create a global
> > function instead of a variable and we would call that function to obt

Re: An problematic interaction between a call created by gimple_build_call and inlining

2020-07-01 Thread Richard Biener via Gcc
On Wed, Jul 1, 2020 at 7:49 AM Gary Oblock via Gcc  wrote:
>
> I'm trying to generate calls to "free" on the fly at ipa time.
>
> I've tried several things (given below) but they both fail
> in expand_call_inline in tree-inline.c on this gcc_checking_assert:
>
>   cg_edge = id->dst_node->get_edge (stmt);
>   gcc_checking_assert (cg_edge);

It simply means you are operating at a point where we expect
callgraph edges to be present but you fail to update the callgraph
for your added function call.  It might be as easy as calling

cgraph_node::get (cfun->decl)->create_edge (cgraph_node::get_create
(fndecl_free), free_call, gimple_bb (free_call)->count);

> Now, I've tried using the built in free via:
>
>   tree fndecl_free = builtin_decl_explicit( BUILT_IN_FREE);
>   // Note to_free is set between here and the call by an assign
>   tree to_free =
> make_temp_ssa_name( reorg_pointer_type, NULL, "malloc_to_free");
>   .
>   .
>   gcall *free_call = gimple_build_call( fndecl_free, 1, to_free);
>
> or building the fndecl from scratch:
>
>   tree fntype = build_function_type ( free_return_type, param_type_list);
>   tree fnname = get_identifier ( "free");
>   tree fndecl_free =
> build_decl ( input_location, FUNCTION_DECL, fnname, fntype);
>   gcall *free_call = gimple_build_call( fndecl_free, 1, to_free);
>
> Note, I was able to get something similar to work for "malloc" by
> using the fndecl I extracted from an existing malloc call.
>
> Your advice on how to build a fndecl that doesn't have this
> problem is appreciated.
>
> Thanks,
>
> Gary Oblock
>
>
> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is 
> for the sole use of the intended recipient(s) and contains information that 
> is confidential and proprietary to Ampere Computing or its subsidiaries. It 
> is to be used solely for the purpose of furthering the parties' business 
> relationship. Any review, copying, or distribution of this email (or any 
> attachments thereto) is strictly prohibited. If you are not the intended 
> recipient, please contact the sender immediately and permanently delete the 
> original and any copies of this email and any attachments thereto.


Re: Questions regarding control flow during IPA passes

2020-07-03 Thread Richard Biener via Gcc
On Fri, Jul 3, 2020 at 6:04 AM Gary Oblock via Gcc  wrote:
>
> At IPA time I'm creating GIMPLE statements. I've noticed during dumps
> that gotos and labels don't seem to exist. In fact when I tried
> introducing them, at least the gotos, failed.  I assume that at this
> point in compilation GCC relies on the control flow graph (which I'm
> updating as I create new BBs) so I actually shouldn't create them?
> Furthermore, I assume I should be setting the "gotos" in the condition
> statement to NULL?

Yes and Yes.

> Thanks,
>
> Gary Oblock
> Ampere Computing
> Santa Clara, California
>


Re: Local optimization options

2020-07-04 Thread Richard Biener via Gcc
On July 4, 2020 11:30:05 AM GMT+02:00, "Thomas König"  wrote:
>Hi,
>
>in Fortran, it would sometimes be useful to have a different
>optimization
>depending on whether we generate inlined code for intrinsics (where we
>know when it is OK to „go wild“) or user code, where we need to
>adhere (for example) to IEEE semantics unless otherwise instructed
>by the user.
>
>What could be a preferred way to achieve that? Could optimization
>options like -ffast-math be applied to blocks instead of functions?
>Could we set flags on the TREE codes to allow certain optimizations?
>Other things?

The middle end can handle those things on function granularity only. 
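
At the C level, function granularity looks roughly like the sketch below,
which uses GCC's optimize function attribute to request -ffast-math for one
function only. This is an illustration, not a recommendation: the GCC manual
describes the attribute as intended for debugging rather than production use,
and the macro and function names here are hypothetical. Other compilers get an
empty macro.

```c
#include <stddef.h>

/* Assumption: only GCC proper understands this attribute. */
#if defined(__GNUC__) && !defined(__clang__)
#define FAST_MATH_FN __attribute__((optimize("fast-math")))
#else
#define FAST_MATH_FN
#endif

/* Compiled with relaxed FP rules; the whole function is the unit. */
FAST_MATH_FN double sum_relaxed(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Compiled with the options given on the command line. */
double sum_strict(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

There is no corresponding way to mark a single block or expression inside
sum_strict, which is the limitation described above.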

Richard. 

>Regards, Thomas



Re: Local optimization options

2020-07-05 Thread Richard Biener via Gcc
On July 5, 2020 12:37:58 PM GMT+02:00, "Thomas König"  wrote:
>
>> Am 04.07.2020 um 19:11 schrieb Richard Biener
>:
>> 
>> On July 4, 2020 11:30:05 AM GMT+02:00, "Thomas König"
> wrote:
>>> 
>>> What could be a preferred way to achieve that? Could optimization
>>> options like -ffast-math be applied to blocks instead of functions?
>>> Could we set flags on the TREE codes to allow certain optimizations?
>>> Other things?
>> 
>> The middle end can handle those things on function granularity only. 
>> 
>> Richard. 
>
>OK, so that will not work (or not without a disproportionate
>amount of effort).  Would it be possible to set something like a
>TREE_FAST_MATH flag on TREEs? An operation could then be
>optimized according to these rules iff both operands
>had that flag, and would also have it then.

Since -ffast-math has effects on operations (-freciprocal-math) and on
operands (-fsignaling-nans), I think we'd need both; a single flag isn't
enough.

I guess parts of -ffast-math could be represented on a per stmt basis already, 
-fno-trapping-math for example could be TREE_NO_TRAP and the corresponding 
gimple flag.

And yes, it would be very desirable to have all semantics fully represented in 
the IL rather than influenced by global flags. But then also optimization 
passes have to be careful to track state on that level. 

Richard. 

>
>Regards, Thomas



Re: Local optimization options

2020-07-06 Thread Richard Biener via Gcc
On Sun, Jul 5, 2020 at 4:37 PM Marc Glisse  wrote:
>
> On Sun, 5 Jul 2020, Thomas König wrote:
>
> >
> >> Am 04.07.2020 um 19:11 schrieb Richard Biener :
> >>
> >> On July 4, 2020 11:30:05 AM GMT+02:00, "Thomas König"  
> >> wrote:
> >>>
> >>> What could be a preferred way to achieve that? Could optimization
> >>> options like -ffast-math be applied to blocks instead of functions?
> >>> Could we set flags on the TREE codes to allow certain optimizations?
> >>> Other things?
> >>
> >> The middle end can handle those things on function granularity only.
> >>
> >> Richard.
> >
> > OK, so that will not work (or not without a disproportionate
> > amount of effort).  Would it be possible to set something like a
> > TREE_FAST_MATH flag on TREEs? An operation could then be
> > optimized according to these rules iff both operands
> > had that flag, and would also have it then.
>
> In order to support various semantics on floating point operations, I was
> planning to replace some trees with internal functions, with an extra
> operand to specify various behaviors (rounding, exception, etc). Although
> at least in the beginning, I was thinking of only using those functions in
> safe mode, to avoid perf regressions.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2019-August/527040.html

Note this tackles the dependency on fesetround and friends which is
of course another issue (tracking FP control and exception state).

> This may never happen now, but it sounds similar to setting flags like
> TREE_FAST_MATH that you are suggesting. I was going with functions for
> more flexibility, and to avoid all the existing assumptions about trees.
> While I guess for fast-math, the worst the assumptions could do is clear
> the flag, which would make us optimize less than possible, not so bad.

Indeed going with tree/gimple stmt flags or alternate tree codes
(PLUS_NONTRAP_EXPR?) isn't likely to scale for the myriads of
FP behavior controls we have.  So using an internal function sounds
reasonable though, given your referenced patch above, one might
want to think about that extra input (FP env) and output (FP state)
those functions will have as well.  Also extracting the important
bits from "fast-math" and thoroughly documenting semantics of
what flags we use would be required.

To prevent too many bad effects on optimization one might think
of using regular PLUS_EXPR when global flags match the
specific ones on an internal function ...

Btw, instead of using the _Complex and __real/__imag trick
for multiple defs we might want to go with more general SSA projections
or allow multiple defs on functions at least.
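
For readers unfamiliar with the trick: as far as I understand it, an
operation that produces two results is modelled as returning one complex
value, with the two results read back from its real and imaginary parts. The
sketch below mimics that at the source level using GNU C extensions (complex
integer types, __real__/__imag__, __builtin_add_overflow); the function name
is hypothetical and this is an analogy, not the internal representation
itself.

```c
#include <limits.h>   /* INT_MAX, for callers that want to test overflow */

/* Packs two results into one value: the (possibly wrapped) sum in the
   real part and an overflow flag (0 or 1) in the imaginary part. */
_Complex int add_with_overflow_flag(int a, int b)
{
    int sum;
    int overflowed = __builtin_add_overflow(a, b, &sum);

    _Complex int result;
    __real__ result = sum;
    __imag__ result = overflowed;
    return result;
}
```

A caller unpacks the pair with __real__ and __imag__; for example, adding 1 to
INT_MAX sets the imaginary part to 1. Proper multiple definitions or SSA
projections would make this packing unnecessary.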

Richard.

> --
> Marc Glisse


Re: documentation of powerpc64{,le}-linux-gnu as primary platform

2020-07-09 Thread Richard Biener via Gcc
On July 9, 2020 3:43:19 PM GMT+02:00, David Edelsohn via Gcc  
wrote:
>On Thu, Jul 9, 2020 at 9:07 AM Matthias Klose  wrote:
>>
>> On 7/9/20 1:58 PM, David Edelsohn via Gcc wrote:
>> > On Thu, Jul 9, 2020 at 7:03 AM Matthias Klose 
>wrote:
>> >>
>> >> https://gcc.gnu.org/gcc-8/criteria.html lists the little endian
>platform first
>> >> as a primary target, however it's not mentioned for GCC 9 and GCC
>10. Just an
>> >> omission?
>> >>
>> >> https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg00854.html
>suggests that
>> >> the little endian platform should be mentioned, and maybe the big
>endian
>> >> platform should be dropped?
>> >>
>> >> Jakub suggested to fix that for GCC 9 and GCC 10, and get a
>consensus for GCC 11.
>> >
>> > Why are you so insistent to drop big endian?  No.  Please leave
>this alone.
>>
>> No, I don't leave this alone.  The little endian target is dropped in
>GCC 9 and
>> GCC 10.  Is this really what you intended to do?
>
>No, it's not dropped.  Some people are being pedantic about the name,
>which is why Bill added {,le}.  powerpc64-unknown-linux-gnu means
>everything.  If you want to add {,le} back, that's fine.  But there
>always is some variant omitted, and that doesn't mean it is ignored.
>The more that one over-specifies and enumerates some variants, the
>more that it implies the other variants intentionally are ignored.
>
>I would appreciate that we would separate the discussion about
>explicit reference to {,le} from the discussion about dropping the big
>endian platform.

I think for primary platforms it is important to be as specific as possible 
since certain regressions are supposed to block a release. That's less of an 
issue for secondary platforms but it's still a valid concern there as well for 
build issues. 

Richard. 

>Thanks, David



Re: New x86-64 micro-architecture levels

2020-07-12 Thread Richard Biener via Gcc
On Fri, Jul 10, 2020 at 11:45 PM H.J. Lu via Gcc  wrote:
>
> On Fri, Jul 10, 2020 at 10:30 AM Florian Weimer  wrote:
> >
> > Most Linux distributions still compile against the original x86-64
> > baseline that was based on the AMD K8 (minus the 3DNow! parts, for Intel
> > EM64T compatibility).
> >
> > There has been an attempt to use the existing AT_PLATFORM-based loading
> > mechanism in the glibc dynamic linker to enable a selection of optimized
> > libraries.  But the general selection mechanism in glibc is problematic:
> >
> >   hwcaps subdirectory selection in the dynamic loader
> >   
> >
> > We also have the problem that the glibc version of "haswell" is distinct
> > from GCC's -march=haswell (and presumably other compilers):
> >
> >   Definition of "haswell" platform is inconsistent with GCC
> >   
> >
> > And that the selection criteria are not what people expect:
> >
> >   Epyc and other current AMD CPUs do not select the "haswell" platform
> >   subdirectory
> >   
> >
> > Since the hwcaps-based selection does not work well regardless of
> > architecture (even in cases the kernel provides glibc with data), I
> > worked on a new mechanism that does not have the problems associated
> > with the old mechanism:
> >
> >   [PATCH 00/30] RFC: elf: glibc-hwcaps support
> >   
> >
> > (Don't be concerned that these patches have not been reviewed; we are
> > busy preparing the glibc 2.32 release, and these changes do not alter
> > the glibc ABI itself, so they do not have immediate priority.  I'm
> > fairly confident that a version of these changes will make it into glibc
> > 2.33, and I hope to backport them into Fedora 33, Fedora 32, and Red Hat
> > Enterprise Linux 8.4.  Debian as well, but I have never done anything
> > like it there, so I don't know if the patches will be accepted.)
> >
> > Out of the box, this should work fairly well for IBM POWER and Z, where
> > there is a clear progression of silicon versions (at least on paper
> > —virtualization may blur the picture somewhat).
> >
> > However, for x86, we do not have such a clear progression of
> > micro-architecture versions.  This is not just as a result of the
> > AMD/Intel competition, but also due to ongoing product differentiation
> > within one chip vendor.  I think we need these levels broadly for the
> > following reasons:
> >
> > * Selecting on individual CPU features (similar to the old hwcaps
> >   mechanism) in glibc has scalability issues, particularly for
> >   LD_LIBRARY_PATH processing.
> >
> > * Developers need guidance about useful targets for optimization.  I
> >   think there is value in limiting the choices, in the sense that “if
> >   you are able to test three builds in total, these are the things you
> >   should build”.
> >
> > * glibc and the compilers should align in their definition of the
> >   levels, so that developers can use an -march= option to build for a
> >   particular level that is recognized by glibc.  This is why I think the
> >   description of the levels should go into the psABI supplement.
> >
> > * A preference order for these levels avoids falling back to the K8
> >   baseline if the platform progresses to a new version due to
> >   glibc/kernel/hypervisor/hardware upgrades.
> >
> > I'm including a proposal for the levels below.  I use single letters for
> > them, but I expect that the concrete implementation of this proposal
> > will use names like “x86-100”, “x86-101”, like in the glibc patch
> > referenced above.  (But we can discuss other approaches.)
> >
> > I looked at various machines in the Red Hat labs and talked to Intel and
> > AMD engineers about this, but this concrete proposal is based on my own
> > analysis of the situation.  I excluded CPU features related to
> > cryptography and cache management, including hardware transactional
> > memory, and CPU timing.  I assume that we will see some of these
> > features being disabled by the firmware or the kernel over time.  That
> > would eliminate entire levels from selection, which is not desirable.
> > For cryptographic code, I expect that localized selection of an
> > optimized implementation works because such code tends to be isolated
> > blocks, running for dozens of cycles each time, not something that gets
> > scattered all over the place by the compiler.
> >
> > We previously discussed not emitting VZEROUPPER at later levels, but I
> > don't think this is beneficial because the ABI does not have
> > callee-saved vector registers, so it can only be useful with local
> > functions (or whatever LTO considers local), where there is no ABI
> > impact anyway.
> >
> > I did not include FSGSBASE because the FS base is already available at
> > %fs:0.  Changing the FS base in userspace breaks too much,

Re: New x86-64 micro-architecture levels

2020-07-13 Thread Richard Biener via Gcc
On Mon, Jul 13, 2020 at 9:40 AM Florian Weimer  wrote:
>
> * Richard Biener:
>
> >> Looks good.  I like it.
> >
> > Likewise.  Btw, did you check that VIA family chips slot into Level A
> > at least?
>
> Those seem to lack SSE4.2, so they land in the baseline.
>
> > Where do AMD bdverN slot in?
>
> bdver1 to bdver3 (as defined by GCC) should land in Level B (so Level A
> if that is dropped).  bdver4 and znver1 (and later) should land in
> Level C.
>
> >>  My only concerns are
> >>
> >> 1. Names like “x86-100”, “x86-101”, what features do they support?
> >
> > Indeed I didn't get the -100, -101 part.  On the GCC side I'd have
> > suggested -march=generic-{A,B,C,D} implying the respective
> > -mtune.
>
> With literal A, B, C, D, or are they just placeholders?  If not literal
> levels, then what we should use there?
>
> I like the simplicity of numbers.  I used letters in the proposal to
> avoid confusion if we alter the proposal by dropping or levels, shifting
> the meaning of those that come later.  I expect to switch back to
> numbers again for the final version.

They are indeed placeholders, though I somehow prefer letters to
numbers.  But this is really bike-shedding territory.  Good documentation
on the tools side will be more important, as well as consistent spelling
between tool sets, possibly driven by a good choice from within the
psABI document.

Richard.


Re: RISC-V: `ld.so' fails linking against `libgcc.a' built at `-O0'

2020-07-13 Thread Richard Biener via Gcc
On Tue, Jul 14, 2020 at 7:24 AM Andreas Schwab  wrote:
>
> On Jul 14 2020, Maciej W. Rozycki wrote:
>
> >  Arguably this might probably be called a deficiency in libgcc, however
> > the objects are built with `-fexceptions -fnon-call-exceptions'
>
> I consider that broken.  It doesn't make any sense to build a lowlevel
> runtime library like libgcc with exceptions.

Indeed - you only need to be able to unwind through those, so
-fasynchronous-unwind-tables should be used.

Richard.

> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."


Re: Understand pointer deferences in GIMPLE

2020-07-14 Thread Richard Biener via Gcc
On Tue, Jul 14, 2020 at 9:17 AM Shuai Wang via Gcc  wrote:
>
> Hello,
>
> I am trying to traverse the GIMPLE statements and identify all pointer
> dereferences (i.e., memory load and store). For instance, something like:
>
>   *_4 = 0;
>   ...
>   _108 = (signed char *) _107;
>   _109 = *_108;
>
> After some quick searches in the GCC codebase, I am thinking to use the
> following statements to identify variables like _4 and _108:
>
> tree op0 = gimple_op(stmt, 0);// get the left variable

Use gimple_get_lhs (stmt)

> if (TREE_CODE (op0) == SSA_NAME) {
>   struct ptr_info_def *pi = SSA_NAME_PTR_INFO (op0);
>   if (pi) {

That's the wrong thing to look at.  You can use gimple_store_p
which also can end up with DECL_P in position op0.

But what you are running into is that the LHS of *_4 = 0; is _not_
the SSA name _4 but a MEM_REF tree with tree operand zero
being the SSA name _4.
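
To illustrate the shape described above, here is a minimal standalone
sketch.  It is a toy model, not GCC's tree API: Code, Node and
dereferenced_ssa_name are hypothetical names made up for this example.
In an actual pass the corresponding checks would be
TREE_CODE (lhs) == MEM_REF followed by a look at TREE_OPERAND (lhs, 0).

```cpp
#include <string>

// Toy stand-ins for tree codes; not the real GCC enumeration.
enum class Code { SsaName, AddrExpr, MemRef };

struct Node {
    Code code;
    std::string name;   // e.g. "_4"; empty for a MemRef
    const Node *op0;    // operand zero, only meaningful for a MemRef

    Node(Code c, const std::string &n, const Node *o = nullptr)
        : code(c), name(n), op0(o) {}
};

// Returns the SSA name being dereferenced, or nullptr when `ref` is not a
// MemRef whose operand zero is an SSA name.
const Node *dereferenced_ssa_name(const Node &ref) {
    if (ref.code != Code::MemRef || ref.op0 == nullptr)
        return nullptr;
    return ref.op0->code == Code::SsaName ? ref.op0 : nullptr;
}
```

With this model, the left-hand side of *_4 = 0; is a MemRef node and
the lookup returns the node for _4; asking the same question of _4
itself returns nullptr, which mirrors why testing the LHS directly
for SSA_NAME never matches.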

> std::cerr << "find a store\n";
> return STORE;
>   }
> }
>
> However, to my surprise, variables like _4 just cannot be matched. Actually
> _4 and _108 will be both treated as "NOT" SSA_NAME, and therefore cannot
> satisfy the first if condition anyway.
>
> So here is my question:
>
> 1. How come variables like _4 and _108 are NOT ssa forms?
> 2. then, what would be the proper way of identifying pointer dereferences,
> something like *_4 = 0; and _109 = *_108 + 1?
>
> Best,
> Shuai


Re: GCC Plugin to insert new expressions/statements in the code

2020-07-15 Thread Richard Biener via Gcc
On Tue, Jul 14, 2020 at 11:23 PM Masoud Gholami  wrote:
>
> Hi,
>
> I am writing a plugin that  uses the PLUGIN_PRAGMAS event to register a 
> custom pragma that is expected to be before a function call as follows:
>
> int main() {
>
> char *filename = "path/to/file";
> #pragma inject_before_call
> FILE *f = fopen(filename, …);   // marked fopen (by the pragma)
> …
> fclose(f);
> char *filename2 = "path/to/file2";
> FILE *f2 = fopen(filename2, …); // non-marked fopen
> …
> fclose(f2);
> return 0;
>
> }
>
> In fact, I am using the inject_before_call pragma to mark some fopen calls in 
> the code (in this example, the first  fopen call is marked). Then, for each 
> marked fopen call, some extra expressions/statements/declarations are 
> injected into the code before calling the marked function. For example, the 
> above main function would be transformed as follows:
>
> int main() {
>
> char *filename = "/path/to/file";
> FILE *tmp_f = fopen("/path/to/another/file", "w+");
> fclose(tmp_f);
> FILE *f = fopen(filename, …);
> …
> fclose(f);
> char *filename2 = "path/to/file2";  // code not injected for the non-marked fopen
> FILE *f2 = fopen(filename2, …);
> …
> fclose(f2);
> return 0;
>
> }
>
> Here, because of the inject_before_call pragma, the grey code is injected 
> into the main function before calling the marked fopen. It simply opens a new 
> file (“/path/to/another/file”) and closes it.
> The thing about the injected code is that it should be inserted only if a 
> fopen call is marked by a inject_before_call pragma. And if after the 
> inject_before_call pragma no fopen calls are made, the user gets an error 
> (the pragma should be only inserted before a fopen call).
>
> I implemented this in 3 steps as follows:
>
> 1. detection of the marked fopen calls: I created a pragma_handler which 
> remembers the location_t of all inject_before_call pragmas. Then using a pass 
> (before ssa), I look for the statements/expressions that are in the next line 
> of each remembered location. If it’s a fopen call, it is considered as a 
> marked call and the code should be inserted before the fopen call. If it’s 
> something other than a fopen call, an error will be generated. However, I’m 
> not aware if there are any better ways to detect the marked calls.
>
> Here is the simplified pass to find the marked fopen calls (generating errors 
> not covered):
>
> unsigned int execute(function *func) {
>   basic_block bb;
>   FOR_EACH_BB_FN (bb, func) {
>     for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
>          gsi_next (&gsi)) {
>       gimple *stmt = gsi_stmt (gsi);
>       if (gimple_is_fopen(stmt)) {
>         if (marked_fopen(stmt)) {
>           handle_marked_fopen(stmt);
>         }
>       }
>     }
>   }
>   return 0;  // no TODO flags requested
> }
>
> 2. create the GIMPLE representation of the code to be injected: after finding 
> the marked fopen calls, I construct some declaration and expressions as 
> follows:
>
> // create the strings "/path/to/another/file" and "w+"
> // (22 bytes: 21 characters plus the terminating NUL)
> tree another_path = build_string (22, "/path/to/another/file");
> fix_string_type (another_path);
> tree mode = build_string (3, "w+\0");
> fix_string_type (mode);
>
> // create a call to the fopen function with the created strings
> tree fopen_decl = lookup_qualified_name (global_namespace, 
> get_identifier("fopen"), 0, true, false);
> gimple *new_open_call = gimple_build_call(fopen_decl, 2, another_path, mode);
>
> // create the tmp_f declaration
> f_decl = build_decl(UNKNOWN_LOCATION, VAR_DECL, get_identifier("tmp_f"),
> fileptr_type_node);
> pushdecl (f_decl);
> rest_of_decl_compilation (f_decl, 0, 0);

That's the wrong interface for GIMPLE code.  Is f_decl supposed to be
a global variable or a function local one?  For the latter simply use

 f_decl = create_tmp_var (fileptr_type_node, "tmp_f");

> // set the lhs of the fopen call to be f_decl
> gimple_call_set_lhs(new_open_call, f_decl)
>
> // create a call to the fclose function with the tmp_f variable
> tree fclose_decl = lookup_qualified_name (global_namespace, 
> get_identifier("fclose"), 0, true, false);

Likewise, lookup_qualified_name is a frontend-specific function.  Since
there's no builtin declaration for fclose, you'll have to build one
yourself.

> gimple *new_close_call = gimple_build_call(fclose_decl, 1, f_decl);
>
>
> 3. add the created GIMPLE trees to the code (basic-blocks):
>
> basic_block bb = gimple_bb(stmt);
> for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next 
> (&gsi)) { gimple *st = gsi_stmt (gsi);
> if (st == stmt) {  // the marked fopen call
> gsi_insert_before(&gsi, new_open_call, GSI_NEW_STMT);
> gsi_insert_after(&gsi

Re: Crash at gimple_code(gimple* )

2020-07-15 Thread Richard Biener via Gcc
On Wed, Jul 15, 2020 at 9:30 AM Shuai Wang via Gcc  wrote:
>
> Hello,
>
> I am using the following code to iterate different gimple statements:
>
> ...
>  gimple* stmt = gsi_stmt(gsi);
> if (gimple_assign_load_p(stmt)) {
>  tree rhs = gimple_assign_rhs1 (stmt);
>  if (!rhs) return;
>   gimple* def_stmt = SSA_NAME_DEF_STMT(rhs);
>   if (!def_stmt) return;
>
>  switch (gimple_code (def_stmt)) {
>  
>  }
> }
>
> While the above code works smoothly for most of the cases, to my surprise,
> the following statement (pointed by gsi) would cause a crash at
> gimple_code(def_stmt):
>
> stderr.9_1 = stderr;
>
> It seems that `stderr` is a special tree node; however, it successfully
> passes the two if checks and reaches the gimple_code(def_stmt), but still
> caused an exception:
>
> 0xb5cd5f crash_signal
> ../../gcc-10.1.0/gcc/toplev.c:328
> 0x7f4214557838 gimple_code
>
> /export/d1/shuaiw/gcc-build-10/gcc-install/lib/gcc/x86_64-pc-linux-gnu/10.1.0/plugin/include/gimple.h:1783
> 
>
> Am I missing anything?

I see you're working on 10.1; please make sure to configure your
development compiler with --enable-checking, which would have said
that SSA_NAME_DEF_STMT expects an SSA name argument but you are
passing it a VAR_DECL.
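
The pattern behind that advice can be sketched with a small standalone
model.  This is not GCC code: Code, Node, Stmt, def_stmt_checked and
def_stmt_or_null are hypothetical names for illustration only.  In the
real pass the guard would be TREE_CODE (rhs) == SSA_NAME before calling
SSA_NAME_DEF_STMT (rhs).

```cpp
#include <stdexcept>
#include <string>

// Toy stand-ins for tree codes; not the real GCC enumeration.
enum class Code { SsaName, VarDecl };

struct Stmt {
    std::string text;
};

struct Node {
    Code code;
    const Stmt *def_stmt;  // meaningful only when code == Code::SsaName
};

// Checked accessor: refuses nodes that are not SSA names, similar in
// spirit to what a checking-enabled build reports.
const Stmt *def_stmt_checked(const Node &n) {
    if (n.code != Code::SsaName)
        throw std::logic_error("tree check: expected ssa_name");
    return n.def_stmt;
}

// Caller-side guard: test the code first, then ask for the definition.
const Stmt *def_stmt_or_null(const Node &n) {
    return n.code == Code::SsaName ? def_stmt_checked(n) : nullptr;
}
```

An unchecked accessor on the wrong kind of node reads whatever happens
to live at that offset, which is why the failure only shows up later in
gimple_code; the checked variant fails at the point of misuse.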

Richard.


> Best,
> Shuai


Re: Default defs question

2020-07-15 Thread Richard Biener via Gcc
On July 16, 2020 7:09:21 AM GMT+02:00, Gary Oblock via Gcc  
wrote:
>Regarding the other question I asked today could somebody explain to
>me what the default_defs are all about. 

Default defs are SSA names without an explicit defining statement, for
example those representing values at function entry. They are also used
for uninitialized variables.
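
As a small source-level illustration (a sketch; the function name twice
is made up for this example): the incoming value of a parameter has no
defining statement inside the function body, so in SSA form it is
represented by a default def.  GCC's tree dumps mark such names with a
"(D)" suffix; the version number shown in the comment is only an
assumption and will vary.

```cpp
// The value of `n` on entry is not produced by any statement in this
// function.  In an SSA dump it would appear as something like n_2(D),
// where "(D)" marks a default definition.
int twice(int n) {
    return n + n;
}
```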

>I suspect I'm doing something
>wrong with regard of them. Note, I've isolated the failure in the last
>email
>down to this bit (in red):
>
>if (is_empty (*entry)
>|| (!is_deleted (*entry) && Descriptor::equal (*entry, comparable))
>
>Which doesn't make much sense to me.
>
>Thanks,
>
>Gary
>
>
>CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
>is for the sole use of the intended recipient(s) and contains
>information that is confidential and proprietary to Ampere Computing or
>its subsidiaries. It is to be used solely for the purpose of furthering
>the parties' business relationship. Any review, copying, or
>distribution of this email (or any attachments thereto) is strictly
>prohibited. If you are not the intended recipient, please contact the
>sender immediately and permanently delete the original and any copies
>of this email and any attachments thereto.



Re: New x86-64 micro-architecture levels

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  wrote:
>
> * Dongsheng Song:
>
> > I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> > recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> > python's platform tags (e.g. manylinux2010, manylinux2014).
>
> I started out with a year number, but that was before there was Level A.
> Too many new CPUs only fall under level A unfortunately because they do
> not even have AVX.  This even applies to some new server CPU designs
> released this year.
>
> I'm concerned that putting a year into the level name suggests that
> everything main-stream released after that year supports that level, and
> that's not true.  I think for manylinux, it's different, and it actually
> works out there.  No one is building a new GNU/Linux distribution that
> is based on glibc 2.12 today, for example.  But not so much for x86
> CPUs.
>
> If you think my worry is unfounded, then a year-based approach sounds
> compelling.

I think the main question is whether those levels are supposed to be
an implementation detail hidden from most software developers or
if people are expected to make conscious decisions between
-march=x86-100 and -march=x86-101.  Implementation detail
for system integrators, that is.

If it's not merely an implementation detail then names without
any chance of providing false hints (x86-2014 - oh, it will
run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
course I want avx2) are better.  But this also means this feature
should come with extensive documentation on how it is
supposed to be used.  For example we might suggest ISVs
provide binaries for all architecture levels or use IFUNCs
or other runtime CPU selection capabilities.  It's also required
to provide an (extensive?) list of SKUs that fall into the respective
categories (probably up to CPU vendors to amend those).
Since this is a feature crossing multiple projects - at least
glibc and GCC - sharing the source of said documentation
would be important.

So for the bike-shedding I indeed think x86-10{0,1,2,3}
or x86-{A,B,C,..}, eventually duplicating as x86_64- as
suggested by Jan is better than x86-2014 or x86-avx2.
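
To make the "runtime CPU selection" alternative concrete, here is a
minimal sketch; it is not part of the proposal.  It assumes one
representative feature per level (SSE4.2 for A, AVX for B, AVX2 for C,
AVX-512F for D).  The A and B choices follow remarks in this thread; the
C and D choices are assumptions, and the actual proposal lists more
features per level.  level_for and detect_level are hypothetical names;
__builtin_cpu_init and __builtin_cpu_supports are the GCC/Clang x86
builtins.

```cpp
// Pure classification: the highest level whose representative feature,
// and those of all lower levels, are present.  '0' stands for the
// original K8 baseline.
char level_for(bool sse42, bool avx, bool avx2, bool avx512f) {
    if (!sse42) return '0';
    if (!avx) return 'A';
    if (!avx2) return 'B';
    if (!avx512f) return 'C';
    return 'D';
}

// Query the running CPU.  On non-x86 targets or other compilers this
// conservatively reports the baseline.
char detect_level() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return level_for(__builtin_cpu_supports("sse4.2") != 0,
                     __builtin_cpu_supports("avx") != 0,
                     __builtin_cpu_supports("avx2") != 0,
                     __builtin_cpu_supports("avx512f") != 0);
#else
    return '0';
#endif
}
```

A program could call detect_level() once at startup and pick among
separately built code paths.  Keeping the classification separate from
the detection makes the level definition easy to test and to keep in
sync with whatever the psABI document ends up specifying.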

Richard.

> Thanks,
> Florian
>


Re: Three issues

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 12:51 AM Gary Oblock via Gcc  wrote:
>
> Some background:
>
> This is in the dreaded structure reorganization optimization that I'm
> working on. It's running at LTRANS time with '-flto-partition=one'.
>
> My issues in order of importance are:
>
> 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> has a segfault because the "var" field of "a" is (nil).
>
> struct ssa_name_hasher : ggc_ptr_hash<tree_node>
> {
>   /* Hash a tree in a uid_decl_map.  */
>
>   static hashval_t
>   hash (tree item)
>   {
> return item->ssa_name.var->decl_minimal.uid;
>   }
>
>   /* Return true if the DECL_UID in both trees are equal.  */
>
>   static bool
>   equal (tree a, tree b)
>   {
>   return (a->ssa_name.var->decl_minimal.uid == 
> b->ssa_name.var->decl_minimal.uid);
>   }
> };
>
> The parameter "a" is associated with "*entry" on the 2nd to last
> line shown (it's trimmed off after that.) This from hash-table.h:
>
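> template<typename Descriptor, bool Lazy,
>          template<typename Type> class Allocator>
> typename hash_table<Descriptor, Lazy, Allocator>::value_type &
> hash_table<Descriptor, Lazy, Allocator>
> ::find_with_hash (const compare_type &comparable, hashval_t hash)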
> {
>   m_searches++;
>   size_t size = m_size;
>   hashval_t index = hash_table_mod1 (hash, m_size_prime_index);
>
>   if (Lazy && m_entries == NULL)
> m_entries = alloc_entries (size);
>
> #if CHECKING_P
>   if (m_sanitize_eq_and_hash)
> verify (comparable, hash);
> #endif
>
>   value_type *entry = &m_entries[index];
>   if (is_empty (*entry)
>   || (!is_deleted (*entry) && Descriptor::equal (*entry, comparable)))
> return *entry;
>   .
>   .
>
> Is there any way this could happen other than by a memory corruption
> of some kind? This is a show stopper for me and I really need some help on
> this issue.
>
> 2) I tried to dump out all the gimple in the following way at the very
> beginning of my program:
>
> void
> print_program ( FILE *file, int leading_space )
> {
>   struct cgraph_node *node;
>   fprintf ( file, "%*sProgram:\n", leading_space, "");
>
>   // Print Global Decls
>   //
>   varpool_node *var;
>   FOR_EACH_VARIABLE ( var)
>   {
> tree decl = var->decl;
> fprintf ( file, "%*s", leading_space, "");
> print_generic_decl ( file, decl, (dump_flags_t)0);
> fprintf ( file, "\n");
>   }
>
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
>   {
> struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
> dump_function_header ( file, func->decl, (dump_flags_t)0);
> dump_function_to_file ( func->decl, file, (dump_flags_t)0);
>   }
> }
>
> When I run this the first two (out of three) functions print
> just fine. However, for the third, func->decl is (nil) and
> it segfaults.
>
> Now the really odd thing is that this works perfectly at the
> end or middle of my optimization.
>
> What gives?
>
> 3) For my bug in (1) I got so distraught that I ran valgrind which
> in my experience is an act of desperation for compilers.
>
> None of the errors it spotted are associated with my optimization
> (although it oh so cleverly pointed out the segfault) however it
> showed the following:
>
> ==18572== Invalid read of size 8
> ==18572==at 0x1079DC1: execute_one_pass(opt_pass*) (passes.c:2550)
> ==18572==by 0x107ABD3: execute_ipa_pass_list(opt_pass*) (passes.c:2929)
> ==18572==by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
> ==18572==by 0x9915A9: lto_main() (lto.c:653)
> ==18572==by 0x11EE4A0: compile_file() (toplev.c:458)
> ==18572==by 0x11F1888: do_compile() (toplev.c:2302)
> ==18572==by 0x11F1BA3: toplev::main(int, char**) (toplev.c:2441)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==  Address 0x5842880 is 16 bytes before a block of size 88 alloc'd
> ==18572==at 0x4C3017F: operator new(unsigned long) (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18572==by 0x21E00B7: make_pass_ipa_prototype(gcc::context*) 
> (ipa-prototype.c:329)
> ==18572==by 0x106E987: gcc::pass_manager::pass_manager(gcc::context*) 
> (pass-instances.def:178)
> ==18572==by 0x11EFCE8: general_init(char const*, bool) (toplev.c:1250)
> ==18572==by 0x11F1A86: toplev::main(int, char**) (toplev.c:2391)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==
>
> Are these known issues with lto or is this a valgrind issue?

It smells like you are modifying IL via APIs that rely on cfun being set to
the function you are modifying.  Note that such an API dependence might not
be obvious, so it's advisable to do

 push_cfun (function to modify);
 ... modify IL of function ...
 pop_cfun ();

Note that push/pop_cfun can be expensive, so try to batch function
modifications.  That said, the underlying issue is likely garbage collector
related - try building with --enable-valgrind-annotations, which makes
valgrind a bit more GCC GC aware.
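
The push/modify/pop pattern, and the advice to group modifications per
function, can be sketched with a standalone toy model.  Nothing here is
GCC code: Function, current, push_current, pop_current, edit_current and
apply_edits are hypothetical stand-ins for struct function, cfun,
push_cfun, pop_cfun and an API that silently depends on cfun.

```cpp
#include <vector>

struct Function {
    int edits = 0;
};

Function *current = nullptr;      // stand-in for cfun
std::vector<Function *> saved;    // stack of previously current functions
int switches = 0;                 // counts context switches (the costly part)

void push_current(Function *f) {
    saved.push_back(current);
    current = f;
    ++switches;
}

void pop_current() {
    current = saved.back();
    saved.pop_back();
    ++switches;
}

// An API that implicitly operates on whatever `current` points to.
void edit_current() {
    ++current->edits;
}

// Apply all edits for one function inside a single push/pop pair rather
// than switching context around each individual edit.
void apply_edits(Function &f, int n) {
    push_current(&f);
    for (int i = 0; i < n; ++i)
        edit_current();
    pop_current();
}
```

In this model, applying three edits to one function costs two context
switches; wrapping each edit in its own push/pop pair would cost six.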

Richard.

> Thanks,
>
> Gary

Re: New x86-64 micro-architecture levels

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 12:16 PM Florian Weimer  wrote:
>
> * Richard Biener:
>
> > On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  
> > wrote:
> >>
> >> * Dongsheng Song:
> >>
> >> > I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> >> > recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> >> > python's platform tags (e.g. manylinux2010, manylinux2014).
> >>
> >> I started out with a year number, but that was before there was Level A.
> >> Too many new CPUs only fall under level A unfortunately because they do
> >> not even have AVX.  This even applies to some new server CPU designs
> >> released this year.
> >>
> >> I'm concerned that putting a year into the level name suggests that
> >> everything main-stream released after that year supports that level, and
> >> that's not true.  I think for manylinux, it's different, and it actually
> >> works out there.  No one is building a new GNU/Linux distribution that
> >> is based on glibc 2.12 today, for example.  But not so much for x86
> >> CPUs.
> >>
> >> If you think my worry is unfounded, then a year-based approach sounds
> >> compelling.
> >
> > I think the main question is whether those levels are supposed to be
> > an implementation detail hidden from most software developers or
> > if people are expected to make conscious decisions between
> > -march=x86-100 and -march=x86-101.  Implementation detail
> > for system integrators, that is.
>
> Anyone who wants to optimize their software for something that's more
> current than what was available in 2003 has to think about this in some
> form.
>
> With these levels, I hope to provide a pre-packaged set of choices, with
> a consistent user interface, in the sense that -march= options and file
> system locations match.  Programmers will definitely encounter these
> strings, and they need to know what they mean for their users.  We need
> to provide them with the required information so that they can make
> decisions based on their knowledge of their user base.  But the ultimate
> decision really has to be a programmer choice.
>
> I'm not sure if GCC documentation or glibc documentation would be the
> right place for this.  An online resource that can be linked to directly
> seems more appropriate.
>
> Apart from that, there is the more limited audience of general purpose
> distribution builders.  I expect they will pick one of these levels to
> build all the distribution binaries, unless they want to be stuck in
> 2003.  But as long they do not choose the highest level defined,
> programmers might still want to provide optimized library builds for
> run-time selection, and then they need the same guidance as before.
>
> > If it's not merely an implementation detail then names without
> > any chance of providing false hints (x86-2014 - oh, it will
> > run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
> > course I want avx2) are better.  But this also means this feature
> > should come with extensive documentation on how it is
> > supposed to be used.  For example we might suggest ISVs
> > provide binaries for all architecture levels or use IFUNCs
> > or other runtime CPU selection capabilities.
>
> I think we should document the mechanism as best as we can, and provide
> intended use cases.  We shouldn't go as far as to tell programmers what
> library versions they must build, except that they should always include
> a fallback version if no optimized library can be selected.
>
> Describing the interactions with IFUNCs also makes sense.
>
> But I think we should not go overboard with this.  Historically, we've
> done not such a great job with documenting toolchain features, I know,
> and we should do better now.  I will try to write something helpful, but
> it should still match the relative importance of this feature.
>
> > It's also required to provide an (extensive?) list of SKUs that fall
> > into the respective categories (probably up to CPU vendors to amend
> > those).
>
> I'm afraid, but SKUs are not very useful in this context.
> Virtualization can disable features (e.g., some cloud providers
> advertise they use certain SKUs, but some features are not available to
> guests), and firmware updates have done so as well.  I think the only
> way is to document our selection criteria, and encourage CPU vendors to
> enhance their SKU browsers so that you can search by the (lack of)
> support for certain CPU features.
>
> The selection criteria I suggested should not be affected by firmware
> and microcode updates at least (I took that into consideration), but
> it's just not possible to achieve virtualization and kernel version
> independence, given that some features based on which we want to make
> library selections demand kernel and hypervisor support.
>
> > Since this is a feature crossing multiple projects - at least
> > glibc and GCC - sharing the source of said documentation
> > would be important.
>
> Technically, the GCC web site would work for me.

Re: Three issues

2020-07-22 Thread Richard Biener via Gcc
On Thu, Jul 23, 2020 at 5:32 AM Gary Oblock  wrote:
>
> Richard,
>
> My wolf fence failed to detect an issue at the end of my pass
> so I'm now hunting for a problem I caused in a following pass.
>
> Your thoughts?

Sorry - I'd look at the IL after your pass for obvious mistakes.
All default defs need to have a VAR_DECL associated as
SSA_NAME_VAR.

> Gary
>
> - Wolf Fence Follows -
> int
> wf_func ( tree *slot, tree *dummy)
> {
>   tree t_val = *slot;
>   gcc_assert( t_val->ssa_name.var);
>   return 0;
> }
>
> void
> wolf_fence (
> Info *info // Pass level global info (might not use it)
>   )
> {
>   struct cgraph_node *node;
>   fprintf( stderr,
>   "Wolf Fence: Find wolf via gcc_assert(t_val->ssa_name.var)\n");
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
> {
>   struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
>   push_cfun ( func);
>   DEFAULT_DEFS ( func)->traverse_noresize < tree *, wf_func> ( NULL);
>   pop_cfun ();
> }
>   fprintf( stderr, "Wolf Fence: Didn't find wolf!\n");
> }
> 
> From: Richard Biener 
> Sent: Wednesday, July 22, 2020 2:32 AM
> To: Gary Oblock 
> Cc: gcc@gcc.gnu.org 
> Subject: Re: Three issues
>
>
> On Wed, Jul 22, 2020 at 12:51 AM Gary Oblock via Gcc  wrote:
> >
> > Some background:
> >
> > This is in the dreaded structure reorganization optimization that I'm
> > working on. It's running at LTRANS time with '-flto-partition=one'.
> >
> > My issues in order of importance are:
> >
> > 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> > has a segfault because the "var" field of "a" is (nil).
> >
> > struct ssa_name_hasher : ggc_ptr_hash<tree_node>
> > {
> >   /* Hash a tree in a uid_decl_map.  */
> >
> >   static hashval_t
> >   hash (tree item)
> >   {
> > return item->ssa_name.var->decl_minimal.uid;
> >   }
> >
> >   /* Return true if the DECL_UID in both trees are equal.  */
> >
> >   static bool
> >   equal (tree a, tree b)
> >   {
> >   return (a->ssa_name.var->decl_minimal.uid == 
> > b->ssa_name.var->decl_minimal.uid);
> >   }
> > };
> >
> > The parameter "a" is associated with "*entry" on the 2nd to last
> > line shown (it's trimmed off after that.) This from hash-table.h:
> >
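> > template<typename Descriptor, bool Lazy,
> >          template<typename Type> class Allocator>
> > typename hash_table<Descriptor, Lazy, Allocator>::value_type &
> > hash_table<Descriptor, Lazy, Allocator>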
> > ::find_with_hash (const compare_type &comparable, hashval_t hash)
> > {
> >   m_searches++;
> >   size_t size = m_size;
> >   hashval_t index = hash_table_mod1 (hash, m_size_prime_index);
> >
> >   if (Lazy && m_entries == NULL)
> > m_entries = alloc_entries (size);
> >
> > #if CHECKING_P
> >   if (m_sanitize_eq_and_hash)
> > verify (comparable, hash);
> > #endif
> >
> >   value_type *entry = &m_entries[index];
> >   if (is_empty (*entry)
> >   || (!is_deleted (*entry) && Descriptor::equal (*entry, comparable)))
> > return *entry;
> >   .
> >   .
> >
> > Is there any way this could happen other than by a memory corruption
> > of some kind? This is a show stopper for me and I really need some help on
> > this issue.
> >
> > 2) I tried to dump out all the gimple in the following way at the very
> > beginning of my program:
> >
> > void
> > print_program ( FILE *file, int leading_space )
> > {
> >   struct cgraph_node *node;
> >   fprintf ( file, "%*sProgram:\n", leading_space, "");
> >
> >   // Print Global Decls
> >   //
> >   varpool_node *var;
> >   FOR_EACH_VARIABLE ( var)
> >   {
> > tree decl = var->decl;
> > fprintf ( file, "%*s", leading_space, "");
> > print_generic_decl ( file, decl, (dump_flags_t)0);
> > fprintf ( file, "\n");
> >   }
> >
> >   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
> >   {
> > struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
> > dump_function_header ( file, func->decl, (dump_flags_t)0);
> > dump_function_to_file ( func->decl, file, (dump_flags_t)0);
> >   }
> > }
> >
> > When I run this the first two (out of three) functions print
> > just fine. However, for the third, func->decl is (nil) and
> > it segfaults.
> >
> > Now the really odd thing is that this works perfectly at the
> > end or middle of my optimization.
> >
> > What gives?
> >
> > 3) For my bug in (1) I got so distraught that I ran valgrind which
> > in my experience is an act of desperation for compilers.
> >
> > None of the errors it spotted are associated with my optimization
> > (although it oh so cleverly pointed out the segfault) however it
> > showed the following:
> >
> > ==18572== Invalid read of size 8
> > ==18572==at 0x1079DC1: execute_one_pass(opt_pass*) (passes.c:2550)
> > ==18572==by 0x107ABD3: execute_ipa_pass_list(opt_pass*) (passes.c:2929)
> > ==18572==by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
> > ==18572==by 0x9915A9: lto_main() (lto.c:653)
> > ==18572==b

Re: Problems with changing the type of an ssa name

2020-07-24 Thread Richard Biener via Gcc
On July 25, 2020 7:30:48 AM GMT+02:00, Gary Oblock via Gcc  
wrote:
>If you've followed what I've been up to via my questions
>on the mailing list, I finally traced my latest big problem
>back to my own code. In a nutshell, here is what
>I'm doing.
>
>I'm creating a new type exactly like this:
>
>tree pointer_rep =
>  make_signed_type ( TYPE_PRECISION ( pointer_sized_int_node));
>TYPE_MAIN_VARIANT ( pointer_rep) =
>  TYPE_MAIN_VARIANT ( pointer_sized_int_node);
>const char *gcc_name =
>identifier_to_locale ( IDENTIFIER_POINTER ( TYPE_NAME (
>ri->gcc_type)));
>size_t len =
>  strlen ( REORG_SP_PTR_PREFIX) + strlen ( gcc_name);
>char *name = ( char *)alloca(len + 1);
>strcpy ( name, REORG_SP_PTR_PREFIX);
>strcat ( name, gcc_name);
>TYPE_NAME ( pointer_rep) = get_identifier ( name);
>
>I detect an ssa_name that I want to change to have this type
>and change it thusly. Note, this particular ssa_name is a
>default def, which seems to be very pertinent (since it's
>the only case that fails.)
>
>modify_ssa_name_type ( an_ssa_name, pointer_rep);
>
>void
>modify_ssa_name_type ( tree ssa_name, tree type)
>{
>  // This rips off the code in make_ssa_name_fn with a
>  // modification or two.
>
>  if ( TYPE_P ( type) )
>    {
>      TREE_TYPE ( ssa_name) = TYPE_MAIN_VARIANT ( type);
>      if ( ssa_defined_default_def_p ( ssa_name) )
>        {
>          // I'm guessing, which I know is a terrible thing to do...
>          SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, TYPE_MAIN_VARIANT ( type));
>        }
>      else
>        {
>          // The following breaks default defs, hence the check above.
>          SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, NULL_TREE);
>        }
>    }
>  else
>    {
>      TREE_TYPE ( ssa_name) = TREE_TYPE ( type);
>      SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, type);
>    }
>}
>
>After this it dies when trying to call print_generic_expr with the ssa
>name.
>
>Here's the bottom most complaint from the internal error:
>
>tree check: expected tree that contains ‘decl minimal’ structure, have
>‘integer_type’ in dump_generic_node, at tree-pretty-print.c:3154
>
>Can anybody tell what I'm doing wrong?

Do not modify existing SSA names; instead, create a new one and replace uses of
the old one.

Richard. 

>Thanks,
>
>Gary
>
>
>
>
>CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
>is for the sole use of the intended recipient(s) and contains
>information that is confidential and proprietary to Ampere Computing or
>its subsidiaries. It is to be used solely for the purpose of furthering
>the parties' business relationship. Any review, copying, or
>distribution of this email (or any attachments thereto) is strictly
>prohibited. If you are not the intended recipient, please contact the
>sender immediately and permanently delete the original and any copies
>of this email and any attachments thereto.



Re: Problems with changing the type of an ssa name

2020-07-25 Thread Richard Biener via Gcc
On July 25, 2020 10:47:59 PM GMT+02:00, Gary Oblock  
wrote:
>Richard,
>
>I suppose that might be doable but aren't there any ramifications
>from the fact that the problematic ssa_names are the default defs?
>I can imagine easily replacing all the ssa names except those that
>are default defs.

Well, changing the SSA names in place doesn't make the ramifications any
smaller. You have to know what you are doing.

So - what's the reason you need to change those SSA name types? 

Richard. 

>Gary
>
>From: Richard Biener 
>Sent: Friday, July 24, 2020 11:16 PM
>To: Gary Oblock ; Gary Oblock via Gcc
>; gcc@gcc.gnu.org 
>Subject: Re: Problems with changing the type of an ssa name
>
>[EXTERNAL EMAIL NOTICE: This email originated from an external sender.
>Please be mindful of safe email handling and proprietary information
>protection practices.]
>
>



Re: TImode for BITS_PER_WORD=32 targets

2020-07-27 Thread Richard Biener via Gcc
On Fri, Jul 24, 2020 at 5:38 PM Andrew Stubbs  wrote:
>
> Hi all,
>
> I want amdgcn to be able to support int128 types, partly because they
> might come up in code offloaded from x86_64 code, and partly because
> libgomp now requires at least some support (amdgcn builds have been
> failing since yesterday).
>
> But, amdgcn has 32-bit registers, and therefore defines BITS_PER_WORD to
> 32, which means that TImode doesn't Just Work, at least not for all
> operators. It already has TImode moves, for internal uses, so I can
> enable TImode and fix the libgomp build, but now libgfortran tries to
> use operators that don't exist, so I'm no better off.
>
> The expand pass won't emit libgcc calls, like it does for DImode, and
> libgcc doesn't have the routines for it anyway. Neither does it support
> synthesized shifts or rotates for more than double-word types.
> (Multiple-word add and subtract appear to work fine, however.)
>
> What would be the best (least effort) way to implement this?
>
> I think I need shift, rotate, multiply, divide, and modulus, but there's
> probably more.

You've figured out that TImode support for SImode word_mode targets
is not implemented in generic code.  So what you need to do is either
provide patterns for all of the operations you need or implement
generic support for libgcc fallbacks (which isn't there).  Joseph might
have an idea what's missing and how difficult it would be (I suppose
we do not want divti3 to end up calling divdi3, thus "stage" TImode
support on top of DImode ops eventually provided by libgcc only).
libgcc2.c uses LIBGCC2_UNITS_PER_WORD so it _might_ be
possible to somehow do this "staging" by providing two different
values here.  I guess you'd have to try.

Richard.

> Thanks, any advice will be appreciated.
>
> Andrew


Re: Problems with changing the type of an ssa name

2020-07-27 Thread Richard Biener via Gcc
On Sun, Jul 26, 2020 at 10:31 PM Gary Oblock  wrote:
>
> Richard,
>
> As you know I'm working on a structure reorganization optimization.
> The particular one I'm working on is called instance interleaving.
> For the particular case I'm working on now, there is a single array
> of structures being transformed, a pointer to an element of the
> array is transformed into an index into what is now a structure
> of arrays. Note, I did share my HL design document with you so
> there are more details in there if you need them. So what all this
> means is for this example
>
> typedef struct fu fu_t;
> struct fu {
>   char x;
>   int y;
>   double z;
> };
>   :
>   :
>   fu_t *fubar = (fu_t*)malloc(...);
>   fu_t *baz;
>
> That fubar and baz no longer are pointer types and need to be
> transformed into some integer type (say _index_fu_t.) Thus if
> I encounter an ssa_name of type "fu_t *", I'll need to modify its
> type to be _index_fu_t. This is of course equivalent to replacing
> that ssa name with a new one of type _index_fu_t.
>
> Now, how do I actually do either of these? My attempts at the
> former all failed and the latter seems equally difficult for
> the default defs. Note, I prefer modifying them to replacing
> them because it seems more reasonable and it also seems
> to work except for the default defs.
>
> I really need some help with this Richard.

OK, so modifying the SSA name in-place is really bad here
since you _have_ to adjust all uses and defs anyway.  Thus
please create a new SSA name here.

The default-def case you run into is either an uninitialized
value which can easily appear with conditionally initialized
pointers or the SSA name associated with the value of
a function argument.

Once you have to deal with a default def you have to create
a new underlying VAR_DECL (or PARM_DECL if it was a
parameter) with the new type and for the SSA replacement
create its default def (get_or_create_ssa_default_def).

Now for parameters this of course means you have to
adjust function signatures and calls.  For the function
boundary case you'll likely need to pass a pointer to
the structure as well which means you'll have to add
parameters.
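
At the source level the function boundary case looks roughly like the
sketch below. It is plain C, not GIMPLE, and struct fu_soa, get_z_before
and get_z_after are hypothetical names used only for illustration: once
the element pointer has become an index, the callee also needs the base
of the transformed structure, hence the extra parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Original layout: array of structures.  */
struct fu { char x; int y; double z; };

/* Hypothetical transformed layout: one array per field.  */
struct fu_soa { char *x; int *y; double *z; };

/* Before: the parameter is a pointer to one element.  */
static double
get_z_before (const struct fu *p)
{
  assert (p != NULL);
  return p->z;
}

/* After: the pointer became an index, so the base is passed as well.  */
static double
get_z_after (const struct fu_soa *base, size_t idx)
{
  assert (base != NULL && base->z != NULL);
  return base->z[idx];
}
```

Every caller has to be rewritten consistently with the callee, which is
the signature adjustment mentioned above.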

As for "replacing" uses you can use immediate uses
to walk them:

  FOR_EACH_IMM_USE_STMT (...)
    FOR_EACH_IMM_USE_ON_STMT (..)
      ...

also the SSA definition statement after your transform
cannot be the same so you have to create another
stmt anyway, no?

Richard.

> Thanks,
>
> Gary
> 
> From: Richard Biener 
> Sent: Saturday, July 25, 2020 10:48 PM
> To: Gary Oblock ; gcc@gcc.gnu.org 
> Subject: Re: Problems with changing the type of an ssa name

Re: Tar version being used

2020-07-27 Thread Richard Biener via Gcc
On Mon, Jul 27, 2020 at 12:59 PM CHIGOT, CLEMENT via Gcc
 wrote:
>
> Hi everyone,
>
> I'm wondering if someone knows which tar version / configuration was being 
> used when creating gcc-10.2.0 tarballs ?
>
> I'm getting some directory checksum errors while trying to unpack it with the
> AIX tar (which can be a bit old). But they disappear when I build
> these tarballs on Ubuntu 18.04, even with the latest tar version, 1.32.
>
> Note that gcc-10.1.0 doesn't have these problems, so maybe something has
> changed since.

I have used tar 1.30 as shipped by openSUSE Leap 15.1
(tar-1.30-lp151.2.1.x86_64)

Richard.

> Sincerely
>
>
> Clément Chigot
> ATOS Bull SAS
> 1 rue de Provence - 38432 Échirolles - France


Re: LTO Dead Field Elimination

2020-07-27 Thread Richard Biener via Gcc
On Fri, Jul 24, 2020 at 5:43 PM Erick Ochoa
 wrote:
>
> This patchset brings back struct reorg to GCC.
>
> We’ve been working on improving cache utilization recently and would
> like to share our current implementation to receive some feedback on it.
>
> Essentially, we’ve implemented the following components:
>
>  Type-based escape analysis to determine if we can reorganize a type
> at link-time
>
>  Dead-field elimination to remove unused fields of a struct at
> link-time
>
> The type-based escape analysis provides a list of types, that are not
> visible outside of the current linking unit (e.g. parameter types of
> external functions).
>
> The dead-field elimination pass analyses non-escaping structs for fields
> that are not used in the linking unit and thus can be removed. The
> resulting struct has a smaller memory footprint, which allows for a
> higher cache utilization.
>
> As a side-effect a couple of new infrastructure code has been written
> (e.g. a type walker, which we were really missing in GCC), which can be
> of course reused for other passes as well.
>
> We’ve prepared a patchset in the following branch:
>
>refs/vendors/ARM/heads/arm-struct-reorg-wip

Just had some time to peek into this.  Ugh.  The code doesn't look like
GCC code looks :/  It doesn't help to have one set of files per C++ class (25!).
The code itself is undocumented - it's hard to understand what the purpose
of all the Walker stuff is.

You also don't seem to know about walk_tree () or walk_gimple* ().

Take as example - I figured to look for IPA pass entries, then I see

+
+static void
+collect_types ()
+{
+  GimpleTypeCollector collector;
+  collector.walk ();
+  collector.print_collected ();
+  ptrset_t types = collector.get_pointer_set ();
+  GimpleCaster caster (types);
+  caster.walk ();
+  if (flag_print_cast_analysis)
+    caster.print_reasons ();
+  ptrset_t casting = caster.get_sets ();
+  fix_escaping_types_in_set (casting);
+  GimpleAccesser accesser;
+  accesser.walk ();
+  if (flag_print_access_analysis)
+    accesser.print_accesses ();
+  record_field_map_t record_field_map = accesser.get_map ();
+  TypeIncompleteEquality equality;
+  bool has_fields_that_can_be_deleted = false;
+  typedef std::set field_offsets_t;

There are no comments (not even file-level) that explain how type escape
is computed.

Sorry, but this isn't even close to being coarsely reviewable.

> We’ve also added a subsection in the GCC internals document to allow
> other compiler devs to better understand our design and implementation.
> A generated PDF can be found here:
>
> https://cloud.theobroma-systems.com/s/aWwxPiDJ3nCgc7F
> https://cloud.theobroma-systems.com/s/aWwxPiDJ3nCgc7F/download
>
> page: 719
>
> We’ve been testing the pass against a range of in-tree tests and
> real-life applications (e.g. all SPEC CPU2017 C benchmarks). For
> testing, please see testing subsection in the gcc internals we prepared.
>
> Currently we see the following limitations:
>
> * It is not a "true" ipa pass yet. That is, we can only succeed with
> -flto-partition=none.
> * Currently it is not safe to use -fipa-sra.
> * Brace constructors not supported now. We handle this gracefully.
> * Only C as of now.
> * Results of sizeof() and offsetof() are generated in the compiler
> frontend and thus can’t be changed later at link time. There are a
> couple of ideas to resolve this, but that’s currently unimplemented.
> * At this point we’d like to thank the GCC community for their patient
> help so far on the mailing list and in other channels. And we ask for
> your support in terms of feedback, comments and testing.
>
> Thanks!


Re: LTO Dead Field Elimination

2020-07-27 Thread Richard Biener via Gcc
On Mon, Jul 27, 2020 at 2:59 PM Christoph Müllner
 wrote:
>
> Hi Richard,
>
> On 7/27/20 2:36 PM, Richard Biener wrote:
> > On Fri, Jul 24, 2020 at 5:43 PM Erick Ochoa
> >  wrote:
> >>
> >> This patchset brings back struct reorg to GCC.
> >>
> >> We’ve been working on improving cache utilization recently and would
> >> like to share our current implementation to receive some feedback on it.
> >>
> >> Essentially, we’ve implemented the following components:
> >>
> >>  Type-based escape analysis to determine if we can reorganize a type
> >> at link-time
> >>
> >>  Dead-field elimination to remove unused fields of a struct at
> >> link-time
> >>
> >> The type-based escape analysis provides a list of types, that are not
> >> visible outside of the current linking unit (e.g. parameter types of
> >> external functions).
> >>
> >> The dead-field elimination pass analyses non-escaping structs for fields
> >> that are not used in the linking unit and thus can be removed. The
> >> resulting struct has a smaller memory footprint, which allows for a
> >> higher cache utilization.
> >>
> >> As a side-effect a couple of new infrastructure code has been written
> >> (e.g. a type walker, which we were really missing in GCC), which can be
> >> of course reused for other passes as well.
> >>
> >> We’ve prepared a patchset in the following branch:
> >>
> >>refs/vendors/ARM/heads/arm-struct-reorg-wip
> >
> > Just had some time to peek into this.  Ugh.  The code doesn't look like
> > GCC code looks :/  It doesn't help to have one set of files per C++ class 
> > (25!).
>
> Any suggestions how to best structure these?

As "bad" as it sounds, put everything into one file (maybe separate out
type escape analysis from the actual transform).  Add a toplevel comment
per file explaining things.

> Are there some coding guidelines in the GCC project,
> which can help us to match the expectation?

Look at existing passes, otherwise there's mostly conventions on
formatting.

> > The code itself is undocumented - it's hard to understand what the purpose
> > of all the Walker stuff is.
> >
> > You also didn't seem to know walk_tree () nor walk_gimple* ().
>
> True, we were not aware of that code.
> Thanks for pointing to that code.
> We will have a look.
>
> > Take as example - I figured to look for IPA pass entries, then I see
> >
> > +
> > +static void
> > +collect_types ()
> > +{
> > +  GimpleTypeCollector collector;
> > +  collector.walk ();
> > +  collector.print_collected ();
> > +  ptrset_t types = collector.get_pointer_set ();
> > +  GimpleCaster caster (types);
> > +  caster.walk ();
> > +  if (flag_print_cast_analysis)
> > +    caster.print_reasons ();
> > +  ptrset_t casting = caster.get_sets ();
> > +  fix_escaping_types_in_set (casting);
> > +  GimpleAccesser accesser;
> > +  accesser.walk ();
> > +  if (flag_print_access_analysis)
> > +    accesser.print_accesses ();
> > +  record_field_map_t record_field_map = accesser.get_map ();
> > +  TypeIncompleteEquality equality;
> > +  bool has_fields_that_can_be_deleted = false;
> > +  typedef std::set field_offsets_t;
> >
> > there's no comments (not even file-level) that explains how type escape
> > is computed.
> >
> > Sorry, but this isn't even close to be coarsely reviewable.
>
> Sad to hear.
> We'll work on the input that you provided and provide a new version.
>
> Thanks,
> Christoph
>
> >
> >> We’ve also added a subsection in the GCC internals document to allow
> >> other compiler devs to better understand our design and implementation.
> >> A generated PDF can be found here:
> >>
> >> https://cloud.theobroma-systems.com/s/aWwxPiDJ3nCgc7F
> >> https://cloud.theobroma-systems.com/s/aWwxPiDJ3nCgc7F/download
> >>
> >> page: 719
> >>
> >> We’ve been testing the pass against a range of in-tree tests and
> >> real-life applications (e.g. all SPEC CPU2017 C benchmarks). For
> >> testing, please see testing subsection in the gcc internals we prepared.
> >>
> >> Currently we see the following limitations:
> >>
> >> * It is not a "true" ipa pass yet. That is, we can only succeed with
> >> -flto-partition=none.
> >> * Currently it is not safe to use -fipa-sra.
> >> * Brace constructors not supported now. We handle this gracefully.
> >> * Only C as of now.
> >> * Results of sizeof() and offsetof() are generated in the compiler
> >> frontend and thus can’t be changed later at link time. There are a
> >> couple of ideas to resolve this, but that’s currently unimplemented.
> >> * At this point we’d like to thank the GCC community for their patient
> >> help so far on the mailing list and in other channels. And we ask for
> >> your support in terms of feedback, comments and testing.
> >>
> >> Thanks!


Re: Gcc Digest, Vol 5, Issue 52

2020-07-28 Thread Richard Biener via Gcc
On Tue, Jul 28, 2020 at 4:36 AM Gary Oblock via Gcc  wrote:
>
> Almost all of that makes sense to me.
>
> I'm not sure what a conditionally initialized pointer is.

{
  void *p;
  if (condition)
    p = ...;
  if (other condition)
    ... use p;
}

will end up with a PHI node after the conditional init with
one PHI argument being the default definition SSA name
for 'p'.
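
A compilable variant of that shape, as a minimal sketch: read_if and its
parameters are hypothetical names, and the PHI in the comment is only an
approximation of what a GIMPLE dump would show.

```c
#include <assert.h>
#include <stddef.h>

/* 'p' is assigned on only one path.  In SSA form the join after the
   first 'if' needs a PHI, roughly  p_3 = PHI <p_1(D), p_2>,  where
   p_1(D) is the default definition (the uninitialized value).  */
static int
read_if (int condition, int other_condition, int *src)
{
  assert (src != NULL);
  int *p;
  if (condition)
    p = src;
  /* Callers must not set other_condition without condition;
     otherwise the load below reads an uninitialized pointer.  */
  if (other_condition)
    return *p;
  return -1;
}
```

A compiler may warn that 'p' can be used uninitialized here; that warning
is reporting exactly the default-def PHI argument.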


> You mention VAR_DECL but I assume this is for
> completeness and not something I'll run across
> associated with a default def (but then again I don't
> understand notion of a conditionally initialized
> pointer.)
>
> I'm at the moment only dealing with a single malloced
> array of structures of the given type (though multiple types could have this
> property.) I intend to extend this to cover multiple arrays and static
> allocations but I need to get the easiest case working first. This means no
> side pointers are needed, and if and when I need them, pointers will get
> transformed into a base and index pair.
>
> I intend to do the creation of new ssa names as a separate pass from the 
> gimple transformations. So I will technically be creating for the duration of 
> the pass possibly two defs associated with a single gimple statement. Do I 
> need to delete the old ssa names
> via some mechanism?

When you remove the old definition do

   gsi_remove (&gsi, true); // gsi points at stmt
   release_defs (stmt);

Note that, as far as I understand, you need to modify the stmts using
the former pointer (since it's now an index), and I would not recommend
making the creation of new SSA names a separate pass. Instead, create
them when you alter the original definition and maintain a map
between old and new SSA names.
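
The bookkeeping for such a map can be sketched with a toy model in which
names are just small integers. Nothing here is GCC API; struct name_map,
name_map_init and lookup_or_create are hypothetical, and a real pass would
map tree nodes rather than ints.

```c
#include <assert.h>

#define MAX_NAMES 64
#define NO_NAME (-1)

/* Toy stand-in for SSA names: small integer version numbers.  */
struct name_map
{
  int new_for_old[MAX_NAMES];  /* NO_NAME until a replacement exists.  */
  int next_version;            /* Next unused version number.  */
};

static void
name_map_init (struct name_map *m, int first_free_version)
{
  for (int i = 0; i < MAX_NAMES; i++)
    m->new_for_old[i] = NO_NAME;
  m->next_version = first_free_version;
}

/* Return the replacement for OLD_NAME, creating it on first request,
   so definitions and uses can be rewritten in any order.  */
static int
lookup_or_create (struct name_map *m, int old_name)
{
  assert (old_name >= 0 && old_name < MAX_NAMES);
  if (m->new_for_old[old_name] == NO_NAME)
    m->new_for_old[old_name] = m->next_version++;
  return m->new_for_old[old_name];
}
```

Creating the replacement lazily means a use seen before its definition
still gets the same new name.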

I haven't dug deep enough into your design to figure out how you identify
things to modify (well, I fear you're just scanning for "uses" of the changed
type ...), but in the scheme I think should be implemented you'd
follow the SSA def->use links both for tracking an object's life
and for modifying the accesses.

With just scanning for types I am quite sure you'll run into
cases where you discover SSA uses that you did not modify
because you thought that's not necessary (debug stmts!).  Of
course you'll simply make more things "type escape points" then.

> By the way this is really helpful information. The only
> other person on the project, Erick, is a continent away
> and has about as much experience with gimple as
> me but a whole heck lot less compiler experience.
>
> Thanks,
>
> Gary
>


Re: Gcc Digest, Vol 5, Issue 52

2020-07-29 Thread Richard Biener via Gcc
On Tue, Jul 28, 2020 at 11:02 PM Gary Oblock  wrote:
>
> Richard,
>
> I wasn't aware of release_defs so I'll add that for certain.
>
> When I do a single transformation as part of the transformation pass
> each transformation uses the correct types internally but on the edges
> emits glue code that will be transformed via a dangling type fixup pass.
>
> For example when adding something to a pointer:
>
> _2 = _1 + k
>
> Where _1 & _2 are of the old pointer types, I'll
> emit
>
> new_3 = (type_convert)_1
> new_4 = (type_convert)k
> new_5 = new_4 / struct_size // truncating divide
> new_6 = new_3 + new_5
> _2    = (type_convert)new_6
>
> Note, the casting is done with CONVERT_EXPR
> which is harmless when I create new ssa names
> and set the appropriate operands in

OK, so you're funneling the new "index" values through
the original pointer variable _1?  But then I don't see
where the patching up of SSA names and the default
def issue happens.
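
For reference, the arithmetic in the quoted sequence amounts to the
following, as a minimal C sketch. advance_index is a hypothetical helper,
struct fu is the example type from earlier in the thread, and the sketch
assumes a non-negative byte offset that is a whole number of elements.

```c
#include <assert.h>
#include <stddef.h>

struct fu { char x; int y; double z; };

/* Index arithmetic standing in for "_2 = _1 + k" once _1 is an index:
   k is a byte offset, so it is scaled down by the structure size.  */
static size_t
advance_index (size_t index, size_t byte_offset)
{
  /* Assumption: the offset is a whole number of elements.  */
  assert (byte_offset % sizeof (struct fu) == 0);
  return index + byte_offset / sizeof (struct fu);
}
```

Offsets that are not a multiple of the structure size point into the
middle of an element and have no index equivalent.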

> new_3 = (type_convert)_1
> _2 = (type_convert)new_6
>
> to
>
> new_3 = new_7
> new_8 = new_6
>
> Now I might actually find via a look up that
> _1 and/or _2 were already mapped to
> new_7 and/or new_8 but that's irrelevant.
>
> To intermix the applications of the transformations and
> the patching of these dangling types seems like I'd
> need to do an insanely ugly recursive walk of each function's
> body.
>
> I'm curious when you mention def-use I'm not aware of
> GCC using def-use chains except at the RTL level.
> Is there a def-use mechanism in GIMPLE because
> in SSA form it's trivial to find the definition of
> a temp variable but non trivial to find the use of
> it. Which I think is a valid reason for fixing up the
> dangling types of temps in a scan.

In GIMPLE SSA we maintain a list of uses for each SSA
def, available via the so called immediate-uses.  You
can grep for uses of FOR_EACH_IMM_USE[_FAST]

>
> Note, I'll maintain a mapping like you suggest but not use
> it at transformation application time. Furthermore,
> I'll initialize the mapping with the default defs from
> the DECLs so I won't have to mess with them on the fly.
> Now at the time in the scan when I find uses and defs of
> a dangling type I'd like to simply modify the associated operands
> of the statement. What is the real advantage creating a new
> statement with the correct types? I'll be using SSA_NAME_DEF_STMT
> if the newly created ssa name is on the left hand side. Also, the
> ssa_name it replaces will no longer be referenced by the end of the
> scan pass.

Since you are replacing a[i].b with array_for_b[i] I am wondering
how you do the transform for non-pointer adjustments.

> Note, I do have an escape mechanism in a qualification
> pre-pass to the transformations. It's not intended as a
> catch-all for things I don't understand; rather, it's an
> aid to find possible new cases. However, there are
> legitimate cases at this point in the development
> of this optimization that I need to spot this way. Later,
> when points-to analysis is integrated, falling through to
> the default case behavior will likely cause an internal error.
>
> Thanks,
>
> Gary
>
> 
> From: Richard Biener 
> Sent: Tuesday, July 28, 2020 12:07 AM
> To: Gary Oblock 
> Cc: gcc@gcc.gnu.org 
> Subject: Re: Gcc Digest, Vol 5, Issue 52

Re: Gcc Digest, Vol 5, Issue 52

2020-07-29 Thread Richard Biener via Gcc
On Wed, Jul 29, 2020 at 9:39 PM Gary Oblock  wrote:
>
> Richard,
>
> Thanks, I had no idea about the immediate uses mechanism and
> using it will speed things up a bit and make them more reliable.
> However, I'll still have to scan the LHS of each assignment unless
> there's a mechanism to traverse all the SSAs for a function.

May I suggest that you give the GCC internals manual a read,
particularly the sections about GIMPLE, GENERIC and
'Analysis and Optimization of GIMPLE tuples'.  Most of the
info I provided is documented there.

> Note, I assume there is also a mechanism to add and remove
> immediate use instances. If I can find it I'll post a question to the list.
>
> I do the the patching on a per function basis immediately after
> applying the transforms. It was going to be a scan of all the
> GIMPLE. What you've told me might make it a bit of a misnomer
> to call what I intend to do now, a scan. The default defs problem
> happened when the original scan tried to simply modify the type
> of a default def. There didn't seem to be a way of doing this, and I've
> since learned that a default def is in fact associated not with a type
> but with a declaration. Note, just modifying the type of normal ssa names
> seemed to work, but I can't in fact know that it actually would have.
>
> I'm not sure I can do justice to the other transformations but
> here is one larger example. Note, since I'm currently only
> dealing with dynamically allocated arrays, I'll only see "a->f" and
> not "a[i].f" so you are getting the former.
>
>  _2 = _1->f
>
> turns into
>
> get_field_arry_addr: new_3 = array_base.f_array_field
> get_index   : new_4 = (sizetype)_1
> get_offset   : new_5  = new_4 * size_of_f_element
> get_field_addr: new_6 = new_3 + new_5   // uses pointer arith
> temp_set: new_7 = * new_6
> final_set  : _2   = new_7
>
> I hope that's sufficient to satisfy your curiosity because the only other
> large transformation currently coded is that for the malloc which would
> take me quite a while to put together an example of. Note, these are
> shown in the HL design doc which I sent you. Though like battle plans,
> no design no matter how good survives coding intact.
>
> Thanks again,
>
> Gary
>
>
>
>
> 
> From: Richard Biener 
> Sent: Wednesday, July 29, 2020 5:42 AM
> To: Gary Oblock 
> Cc: gcc@gcc.gnu.org 
> Subject: Re: Gcc Digest, Vol 5, Issue 52
>
> [EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please 
> be mindful of safe email handling and proprietary information protection 
> practices.]
>
>
> On Tue, Jul 28, 2020 at 11:02 PM Gary Oblock  wrote:
> >
> > Richard,
> >
> > I wasn't aware of release_defs so I'll add that for certain.
> >
> > When I do a single transformation as part of the transformation pass
> > each transformation uses the correct types internally but on the edges
> > emits glue code that will be transformed via a dangling type fixup pass.
> >
> > For example when adding something to a pointer:
> >
> > _2 = _1 + k
> >
> > Where _1 & _2 are the old pointer types, I'll emit
> >
> > new_3 = (type_convert)_1
> > new_4 = (type_convert)k
> > new_5 = new_4 / struct_size // truncating divide
> > new_6 = new_3 + new_5
> > _2    = (type_convert)new_6
> >
> > Note, the casting is done with CONVERT_EXPR
> > which is harmless when I create new ssa names
> > and set the appropriate operands in
>
> OK, so you're funneling the new "index" values through
> the original pointer variable _1?  But then I don't see
> where the patching up of SSA names and the default
> def issue happens.
>
> > new_3 = (type_convert)_1
> > _2 = (type_convert)new_6
> >
> > to
> >
> > new_3 = new_7
> > new_8 = new_6
> >
> > Now I might actually find via a look up that
> > _1 and/or _2 were already mapped to
> > new_7 and/or new_8 but that's irrelevant.
> >
> > To intermix the applications of the transformations and
> > the patching of these dangling types seems like I'd
> > need to do an insanely ugly recursive walk of each function's
> > body.
> >
> > I'm curious when you mention def-use I'm not aware of
> > GCC using def-use chains except at the RTL level.
> > Is there a def-use mechanism in GIMPLE because
> > in SSA form it's trivial to find the definition of
> > a temp variable but non trivial to find the use of
> > it. Which I think is a valid reason for fixing up the
> > dangling types of temps in a scan.
>
> In GIMPLE SSA we maintain a list of uses for each SSA
> def, available via the so called immediate-uses.  You
> can grep for uses of FOR_EACH_IMM_USE[_FAST]
>
> >
> > Note, I'll maintain a mapping like you suggest but not use
> > it at transformation application time. Furthermore,
> > I'll initialize the mapping with the default defs from
> > the DECLs so I won't have to mess with them on the fly.
> > Now at the time in the scan when I find uses and defs of
> > a dangling type I'd like to simply modify the a

Re: Define __attribute__((no_instrument_function)) but still got instrumented

2020-08-06 Thread Richard Biener via Gcc
On Fri, Aug 7, 2020 at 8:35 AM Shuai Wang via Gcc  wrote:
>
> Hello!
>
> I am working on an ARM GCC plugin which instruments each GIMPLE function
> with some new function calls.
>
> Currently I want to skip certain functions by adding the
> no_instrument_function attribute. However, I do see that in the
> disassembled code, all functions are still instrumented.
>
> Am I missing anything here? From this page (
> https://www.keil.com/support/man/docs/armcc/armcc_chr1359124976163.htm), I
> do see that no_instrument_function is used to skip --gnu_instrument, but
> might not be applicable to my case where I use the following command to
> compile:
>
> arm-none-eabi-g++ -fplugin=my_plugin.so -mcpu=cortex-m4 -mthumb
> -mfloat-abi=soft -Og -fmessage-length=0 -fsigned-char -ffunction-sections
> -fdata-sections -fno-move-loop-invariants -Wall -Wextra  -g3 -DDEBUG
> -DUSE_FULL_ASSERT -DOS_USE_SEMIHOSTING -DTRACE
> -DOS_USE_TRACE_SEMIHOSTING_DEBUG -DSTM32F429xx -DUSE_HAL_DRIVER -DHSE_VALUE=800
> -DLOS_KERNEL_DEBUG_OUT
>
> But overall, could anyone shed some light on: 1) how to skip instrumenting
> certain functions with a GCC plugin? 2) is it possible to check the function
> attribute in GIMPLE code? If so, I can simply check if certain functions
> are marked as "no_instrument_function" and skip by myself.

You can check lookup_attribute("no_instrument_function",
DECL_ATTRIBUTES (cfun->decl))
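
The shape of that check is "look the attribute up by name on the
function's attribute list and skip the function if it is found". A
minimal stand-alone sketch of that decision follows. It is a toy model
only: toy_attr, toy_fndecl, toy_lookup_attribute and
toy_should_instrument are hypothetical names invented for illustration
and are not GCC's tree data structures or plugin API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a declaration and its attribute chain.
   These are NOT GCC's real data structures.  */
struct toy_attr { const char *name; const struct toy_attr *next; };
struct toy_fndecl { const char *name; const struct toy_attr *attrs; };

/* Toy counterpart of an attribute lookup: return the first entry
   whose name matches, or NULL if there is none.  */
static const struct toy_attr *
toy_lookup_attribute (const char *name, const struct toy_attr *list)
{
  assert (name != NULL);
  for (; list != NULL; list = list->next)
    if (strcmp (list->name, name) == 0)
      return list;
  return NULL;
}

/* The pass-side decision: instrument unless the function carries
   the no_instrument_function attribute.  */
static bool
toy_should_instrument (const struct toy_fndecl *fn)
{
  assert (fn != NULL);
  return toy_lookup_attribute ("no_instrument_function", fn->attrs) == NULL;
}
```

In a real plugin the same test would sit at the top of the pass's
per-function hook, returning early when the attribute is present.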

> Thank you!
> Shuai


Re: Has FSF stopped processing copyright paperwork?

2020-08-07 Thread Richard Biener via Gcc
On Fri, Aug 7, 2020 at 3:14 PM H.J. Lu via Gcc  wrote:
>
> On Tue, May 5, 2020 at 6:42 PM Kaylee Blake  wrote:
> >
> > On 2/5/20 11:49 pm, H.J. Lu wrote:
> > > On Wed, Mar 18, 2020 at 6:46 PM Kaylee Blake via Binutils
> > >  wrote:
> > >>
> > >> On 19/3/20 12:02 pm, H.J. Lu wrote:
> > >>> Kaylee, is your paper work with FSF in order? I will submit the updated
> > >>> patch set after your paper is on file with FSF.
> > >>
> > >> I'm waiting on a response from them at the moment.
> > >>
> > >
> > > Hi Kaylee,
> > >
> > > Any update on your paper work with FSF?
> > >
> >
> > Still waiting; apparently their work process has been dramatically
> > slowed by the whole COVID-19 situation.
> >
> > --
> > Kaylee Blake 
> > C is the worst language, except for all the others.
>
> Hi,
>
> I submitted a set of binutils patches:
>
> https://sourceware.org/pipermail/binutils/2020-March/13.html
>
> including contribution from Kaylee Blake .
> Can someone check if Kaylee's paperwork is on file with FSF?

Don't see her in the list.

Richard.

> Thanks.
>
> --
> H.J.


Re: Silly question about pass numbers

2020-08-12 Thread Richard Biener via Gcc
On August 13, 2020 2:57:04 AM GMT+02:00, Gary Oblock via Gcc  
wrote:
>Segher,
>
>If this was on the mainline and not in the middle of a
>nontrivial optimization effort I would have filed a bug report
>and not asked a silly question. 😉
>
>I'm at a total loss as to how I could have caused the pass
>numbers to be backward... but at least I have confirmed that's
>what seems to be happening. It's not doing any harm to
>anything except the sanity of anybody looking at the pass
>dumps...

The inline dump is last written to during transform phase which is only carried 
out when the body is further optimized (thus again function at a time, not 
IPA). Which is why you see interleaving of dump appends. 

>Thanks,
>
>Gary
>
>From: Segher Boessenkool 
>Sent: Wednesday, August 12, 2020 5:45 PM
>To: Gary Oblock 
>Cc: gcc@gcc.gnu.org 
>Subject: Re: Silly question about pass numbers
>
>
>Hi!
>
>On Wed, Aug 12, 2020 at 08:26:34PM +, Gary Oblock wrote:
>> The files are from the same run:
>> -rw-rw-r-- 1 gary gary  3855 Aug 12 12:49 exe.ltrans0.ltrans.074i.cp
>> -rw-rw-r-- 1 gary gary 16747 Aug 12 12:49
>exe.ltrans0.ltrans.087i.structure-reorg
>>
>> By the time .cp was created inlining results in only main existing.
>> In the .structure-reorg file there are three functions.
>
>It does not matter what time the dump files were last opened (or created
>or written to).
>
>> Not only am I seeing things in .cp (beyond a shadow of a doubt)
>> that were created in structure  reorganization, inlining has also
>> been done and its pass number is 79!
>>
>> Note, this is not hurting me in any way other than violating my
>> beliefs about pass numbering.
>
>I cannot check on any of that because this is not in mainline GCC?
>It is a lot easier if you ask us about problems we may be able to
>reproduce ;-)  Like maybe something with only cp and inline?
>
>
>Segher



Re: RFC: -fno-share-inlines

2020-08-23 Thread Richard Biener via Gcc
On Mon, Aug 10, 2020 at 9:36 AM Allan Sandfeld Jensen
 wrote:
>
> Following the previous discussion, this is a proposal for a patch that adds
> the flag -fno-share-inlines that can be used when compiling singular source
> files with a different set of flags than the rest of the project.
>
> It basically turns off comdat for inline functions, as if you compiled without
> support for 'weak' symbols. Turning them all into "static" functions, even if
> that wouldn't normally be possible for that type of function. Not sure if it
> breaks anything, which is why I am not sending it to the patch list.
>
> I also considered alternatively to turn the comdat generation off later during
> assembler production to ensure all processing and optimization of comdat
> functions would occur as normal.

We already have -fvisibility-inlines-hidden so maybe call it
-fvisibility-inlines-static?
Does this option also imply 'static' vtables?

Richard.

> Best regards
> Allan


Re: Problem cropping up in Value Range Propogation

2020-08-24 Thread Richard Biener via Gcc
On Tue, Aug 11, 2020 at 6:15 AM Gary Oblock via Gcc  wrote:
>
> I'm trying to debug a problem cropping up in value range propagation.
> Ironically I probably own an original 1995 copy of the paper it's
> based on but that's not going to be much help since I'm lost in the
> weeds.  It's running on some optimization (my structure reorg
> optimization) generated GIMPLE statements.
>
> Here's the GIMPLE dump:
>
> Function max_of_y (max_of_y, funcdef_no=1, decl_uid=4391, cgraph_uid=2, 
> symbol_order=20) (executed once)
>
> max_of_y (unsigned long data, size_t len)
> {
>   double value;
>   double result;
>   size_t i;
>
>[local count: 118111600]:
>   field_arry_addr_14 = _reorg_base_var_type_t.y;
>   index_15 = (sizetype) data_27(D);
>   offset_16 = index_15 * 8;
>   field_addr_17 = field_arry_addr_14 + offset_16;
>   field_val_temp_13 = MEM  [(void *)field_addr_17];
>   result_8 = field_val_temp_13;
>   goto ; [100.00%]
>
>[local count: 955630225]:
>   _1 = i_3 * 16;
>   PPI_rhs1_cast_18 = (unsigned long) data_27(D);
>   PPI_rhs2_cast_19 = (unsigned long) _1;
>   PtrPlusInt_Adj_20 = PPI_rhs2_cast_19 / 16;
>   PtrPlusInt_21 = PPI_rhs1_cast_18 + PtrPlusInt_Adj_20;
>   dedangled_27 = (unsigned long) PtrPlusInt_21;
>   field_arry_addr_23 = _reorg_base_var_type_t.y;
>   index_24 = (sizetype) dedangled_27;
>   offset_25 = index_24 * 8;
>   field_addr_26 = field_arry_addr_23 + offset_25;
>   field_val_temp_22 = MEM  [(void *)field_addr_26];
>   value_11 = field_val_temp_22;
>   if (result_5 < value_11)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 477815112]:
>
>[local count: 955630225]:
>   # result_4 = PHI 
>   i_12 = i_3 + 1;
>
>[local count: 1073741824]:
>   # i_3 = PHI <1(2), i_12(5)>
>   # result_5 = PHI 
>   if (i_3 < len_9(D))
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 118111600]:
>   # result_10 = PHI 
>   return result_10;
> }
>
> The failure in VRP is occurring on
>
> offset_16 = data_27(D) * 8;
>
> which comes from the two adjacent statements above
>
>   index_15 = (sizetype) data_27(D);
>   offset_16 = index_15 * 8;
>
> being merged together.
>
> Note, the types of index_15 and offset_16 are sizetype and data_27 is unsigned
> long.
> The error message is:
>
> internal compiler error: tree check: expected class ‘type’, have 
> ‘exceptional’ (error_mark) in to_wide,

This means the SSA name being looked at has been released and should no
longer be referenced from the IL.

> Things only start to look broken in value_range::lower_bound in
> value-range.cc when
>
> return wi::to_wide (t);
>
> is passed error_mark_node in t. It's getting it from m_min just above.
> My observation is that m_min is not always error_mark_node. In fact, I
> seem to think you need to use set_varying to get this to even happen.
>
> Note, the ssa_propagation_engine processed the statement "offset_16 =
> data..."  multiple times before failing on it. What oh what is
> happening and how in the heck did I cause it???
>
> Please, somebody throw me a life preserver on this.
>
> Thanks,
>
> Gary
>
>
> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is 
> for the sole use of the intended recipient(s) and contains information that 
> is confidential and proprietary to Ampere Computing or its subsidiaries. It 
> is to be used solely for the purpose of furthering the parties' business 
> relationship. Any review, copying, or distribution of this email (or any 
> attachments thereto) is strictly prohibited. If you are not the intended 
> recipient, please contact the sender immediately and permanently delete the 
> original and any copies of this email and any attachments thereto.


Re: GCC Plugins and global_options

2020-08-24 Thread Richard Biener via Gcc
On Thu, Aug 13, 2020 at 10:39 AM Jakub Jelinek via Gcc  wrote:
>
> Hi!
>
> Any time somebody adds or removes an option in some *.opt file (which e.g.
> on the 10 branch after branching off 11 happened 5 times already), many
> offsets in global_options variable change.  It is true we don't guarantee
> ABI stability for plugins, but we change the most often used data structures
> on the release branches only very rarely and so the options changes are the
> most problematic for ABI stability of plugins.
>
> Annobin uses a way to remap accesses to some of the global_options.x_* by
> looking them up in the cl_options array where we have
> offsetof (struct gcc_options, x_flag_lto)
> etc. remembered, but sadly doesn't do it for all options (e.g. some flag_*
> etc. option accesses may be hidden in various macros like POINTER_SIZE),
> and more importantly some struct gcc_options offsets are not covered at all.
> E.g. there is no offsetof (struct gcc_options, x_optimize),
> offsetof (struct gcc_options, x_flag_sanitize) etc.  Those are usually:
> Variable
> int optimize
> in the *.opt files.
>
> So, couldn't our opt*.awk scripts generate another array that would either
> cover just the offsets not covered in struct cl_options that a plugin could
> use to remap struct global_options offsets at runtime, which would include
> e.g. the offsetof value and the name of the variable and perhaps sizeof for
> verification purposes?
> Or couldn't we in plugin/include/ install a modified version of options.h
> that instead of all the:
> #define flag_opts_finished global_options.x_flag_opts_finished
> will do:
> #define flag_opts_finished gcc_lookup_option (flag_opts_finished)
> where lookup_option would be a macro that does something like:
> __attribute__((__pure__))
> void *gcc_lookup_option_2 (unsigned short, const char *, unsigned short);
> template <typename T>
> T &gcc_lookup_option_1 (unsigned short offset, const char *name)
> {
>   T *ptr = static_cast <T *> (gcc_lookup_option_2 (offset, name, sizeof (T)));
>   return *ptr;
> }
> #define lookup_option(var) \
>   gcc_lookup_option_1  \
> (offsetof (struct gcc_options, x_##var), #var, \
>  sizeof (global_options.x_##var))
> where the gcc_lookup_option_2 function would lookup the variable in an
> opt*.awk generated table, containing entries like:
>   "ix86_stack_protector_guard_offset", NULL, NULL, NULL, NULL, NULL, NULL, 
> NULL,
>   "ix86_stack_protector_guard_reg", "", NULL, NULL,
>   "recip_mask", NULL, NULL, NULL,
> ...
> As struct gcc_options is around 5KB now, that table would need 5K entries,
> NULL and "" stand for no variable starts here, and "" additionally says that
> padding starts here.
> So, if no options have changed since the plugin has been built, it would be
> very cheap, it would just verify that at the given offset the table contains
> the corresponding string (i.e. non-NULL and strcmp == 0 and that the size
> matches (that size - 1 following entries are NULL and then there is
> non-NULL)). If not, it would keep looking around (one loop that looks in
> both directions in the table, so first it would check offsets -1 and +1 from
> the original, then -2 and +2, etc.
> And would gcc_unreachable () if it can't find it in the table, or can find
> it, but the size has changed.
>
> If that is unacceptable, at least having a table with variables not covered
> in struct cl_options offsets would allow the plugin to do it itself
> (basically by constructing a remapping table, original offsetof (struct 
> gcc_options, XXX)
> remaps to offset ABC (and have some value like (unsigned short) -1 to signal
> it is gone and there should be assertion failure.
>
> Thoughts on this?

I'd say we ignore this since we do not provide any ABI stability guarantees.
Instead we maybe want to "export" the genchecksum result and embed
it into plugin objects and refuse to load plugins that were not built against
the very same ABI [unless --force is given]?

Richard.
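
For reference, the remapping lookup Jakub describes in the quoted text
(check the build-time offset first, then search outwards at -1/+1,
-2/+2, and so on) can be sketched as below. This is only an
illustration of the proposal under stated assumptions: the table
layout (one entry per byte, NULL for "nothing starts here", "" for
padding) follows the description, but toy_entry_matches and
toy_lookup_option are hypothetical names, the end-of-table rule is an
added assumption, and the sketch returns -1 where the proposal would
call gcc_unreachable ().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if a variable called NAME that occupies SIZE bytes
   starts at offset OFF in TABLE (N entries, one per byte).  */
static int
toy_entry_matches (const char *const *table, size_t n, size_t off,
                   const char *name, size_t size)
{
  if (off >= n || size == 0 || size > n - off)
    return 0;
  if (table[off] == NULL || strcmp (table[off], name) != 0)
    return 0;
  /* The following SIZE - 1 entries must be NULL...  */
  for (size_t i = 1; i < size; i++)
    if (table[off + i] != NULL)
      return 0;
  /* ...and then something else (or the end of the table, which is an
     assumption of this sketch) must start.  */
  return off + size == n || table[off + size] != NULL;
}

/* Map the offset the plugin was built with to the offset in the
   running compiler's table, or return -1 if NAME is gone or its
   size has changed.  */
static long
toy_lookup_option (const char *const *table, size_t n, size_t built_off,
                   const char *name, size_t size)
{
  assert (table != NULL && name != NULL);
  /* Cheap path: nothing moved since the plugin was built.  */
  if (toy_entry_matches (table, n, built_off, name, size))
    return (long) built_off;
  /* Otherwise look in both directions, nearest candidates first.  */
  for (size_t d = 1; d <= n; d++)
    {
      if (d <= built_off
          && toy_entry_matches (table, n, built_off - d, name, size))
        return (long) (built_off - d);
      if (built_off < n && d < n - built_off
          && toy_entry_matches (table, n, built_off + d, name, size))
        return (long) (built_off + d);
    }
  return -1;
}
```

With a 4-byte option inserted in front of "optimize", a plugin built
against offset 0 would be remapped to offset 4 by this search.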

> Jakub
>


Re: Question about IPA-PTA and build_alias

2020-08-24 Thread Richard Biener via Gcc
On Mon, Aug 17, 2020 at 3:22 PM Erick Ochoa
 wrote:
>
> Hello,
>
> I'm looking to understand better the points-to analysis (IPA-PTA) and
> the alias analysis (build_alias).
>
> How is the information produced by IPA-PTA consumed?
>
> Are alias sets in build_alias computed by the intersections of the
> points_to_set(s) (computed by IPA-PTA)?
>
> My intuition tells me that it could be relatively simple to move
> build_alias to be a SIMPLE_IPA_PASS performed just after IPA-PTA, but I
> do not have enough experience in GCC to tell if this is correct. What
> could be some difficulties which I am not seeing? (Either move, or
> create a new IPA-ALIAS SIMPLE_IPA_PASS.) This pass would have the same
> sensitivity as IPA-PTA { flow-insensitive, context-insensitive,
> field-sensitive } because the alias sets could be computed by the
> intersection of points-to-sets.

Both IPA-PTA and build_alias do the same, they build PTA constraint
sets, solve them and attach points-to info to SSA names.  Just IPA-PTA
does this for the whole TU while build_alias does it for a function at a time.

So I guess I do not understand your question.

Richard.

>
> Thanks!


Re: Peephole optimisation: isWhitespace()

2020-08-24 Thread Richard Biener via Gcc
On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak  wrote:
>
> "Allan Sandfeld Jensen"  wrote:
>
> > On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote:
> >> Hi @ll,
> >>
> >> in his ACM queue article ,
> >> Matt Godbolt used the function
> >>
> >> | bool isWhitespace(char c)
> >> | {
> >> |
> >> | return c == ' '
> >> |
> >> |   || c == '\r'
> >> |   || c == '\n'
> >> |   || c == '\t';
> >> |
> >> | }
> >>
> >> as an example, for which GCC 9.1 emits the following assembly for AMD64
> >>
> >> processors (see ):
> >> |    xor    eax, eax              ; result = false
> >> |    cmp    dil, 32               ; is c > 32
> >> |    ja     .L4                   ; if so, exit with false
> >> |    movabs rax, 4294977024       ; rax = 0x100002600
> >> |    shrx   rax, rax, rdi         ; rax >>= c
> >> |    and    eax, 1                ; result = rax & 1
> >> |
> >> |.L4:
> >> |    ret
> >>
> > No it doesn't. As your example shows if you took the time to read it, it is
> > what gcc emit when generating code to run on a _haswell_ architecture.
>
> Matt's article does NOT specify the architecture for THIS example.
> He specified it for another example he named "(q)":
>
> | When targeting the Haswell microarchitecture, GCC 8.2 compiles this code
> | to the assembly in (q) (https://godbolt.org/z/acm19_bits):
>
> What about CAREFUL reading?
>
> > If you remove -march=haswell from the command line you get:
> >
> >xor eax, eax
> >cmp dil, 32
> >ja  .L1
> >movabs  rax, 4294977024
> >mov ecx, edi
> >shr rax, cl
> >and eax, 1
> >
> > It uses one mov more, but no shrx.
>
> The SHRX is NOT the point here; it's the avoidable conditional branch that
> matters!

Whether or not the conditional branch sequence is faster depends on whether
the branch is well-predicted which very much depends on the data you
feed the isWhitespace function with but I guess since this is the
c == ' ' test it _will_ be a well-predicted branch which means the
conditional branch sequence will be usually faster.  The proposed
change turns the control into a data dependence which constrains
instruction scheduling and retirement.  Indeed a mispredicted branch
will likely be more costly.

x86 CPUs do not perform data speculation.

Richard.
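
For readers who want to compare the two sequences at the source level,
here is a rough C rendering of both. It is a sketch, not what GCC
emits: the function names are made up for illustration, and a compiler
is free to turn either form into either kind of machine code. The
constant 4294977024 is 0x100002600, i.e. bits 9, 10, 13 and 32 set,
one per whitespace character code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit n is set iff character code n is whitespace:
   '\t' = 9, '\n' = 10, '\r' = 13, ' ' = 32.  */
#define WS_MASK ((UINT64_C (1) << 9) | (UINT64_C (1) << 10) \
                 | (UINT64_C (1) << 13) | (UINT64_C (1) << 32))

/* The function from the article, used as the reference.  */
static bool
is_ws_reference (char c)
{
  return c == ' ' || c == '\r' || c == '\n' || c == '\t';
}

/* Control dependence: range check first, bit test only if needed.  */
static bool
is_ws_branchy (char c)
{
  unsigned char uc = (unsigned char) c;
  if (uc > 32)
    return false;
  return (WS_MASK >> uc) & 1;
}

/* Data dependence: always shift, then AND with the range check.
   The "& 63" mirrors what a 64-bit shift by cl does in hardware and
   keeps the shift count valid in C.  */
static bool
is_ws_branchfree (char c)
{
  unsigned char uc = (unsigned char) c;
  return ((WS_MASK >> (uc & 63)) & 1) & (uc < 33);
}
```

Both variants agree with the reference for every character value; the
discussion in this thread is only about which one runs faster.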

>  mov ecx, edi
>  movabs  rax, 4294977024
>  shr rax, cl
>  xor edi, edi
>  cmp ecx, 33
>  setb    dil
>  and eax, edi
>
> Stefan


Re: Question about Gimple Variables named D.[0-9]*

2020-08-24 Thread Richard Biener via Gcc
On Thu, Aug 20, 2020 at 11:51 AM Erick Ochoa
 wrote:
>
> Hello,
>
> I am looking at the dump for the build_alias pass. I see a lot of
> variables with the naming convention D.[0-9]* in the points-to sets
> being printed.
>
> When I compile with
>
> -fdump-tree-all-all
>
> I can see that the suffix D.[0-9]* is appended to some gimple variables.
> I initially imagined that variables in the points-to variable set could
> map to a variable declaration in gimple, but this does not seem to be
> the case. I have confirmed this by searching for some known variable
> name in the points-to set and finding no matches in the gimple code, the
> other way around seems to also be true.
>
> Are these variables just constraint variables used to solve the
> points-to analysis? In other words, the variables in points-to sets
> printed out in build_alias do not have a simple map to variables in
> gimple. The only relation is that the intersection between the points-to
> set for variable A with the points-to set of variable B will yield an
> is_alias(A, B) relationship. Is the above true?

The points-to sets in SSA_NAME_POINTER_INFO record DECL_UIDs
which are those printed as D.[0-9]* which is appended to all variables
if you dump with -uid.  The points-to set "names" in the points-to dumps
are internal names created for the constraint variables - most of the time
based on the program variable names but only loosely coupled.

The translation between constraint variables and program variables is
done in set_uids_in_ptset.

Richard.

>
> Thanks!
>
>


Re: Peephole optimisation: isWhitespace()

2020-08-24 Thread Richard Biener via Gcc
On Mon, Aug 24, 2020 at 1:22 PM Stefan Kanthak  wrote:
>
> "Richard Biener"  wrote:
>
> > On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak  
> > wrote:
> >>
> >> "Allan Sandfeld Jensen"  wrote:
> >>
> >> > On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote:
> >> >> Hi @ll,
> >> >>
> >> >> in his ACM queue article ,
> >> >> Matt Godbolt used the function
> >> >>
> >> >> | bool isWhitespace(char c)
> >> >> | {
> >> >> |
> >> >> | return c == ' '
> >> >> |
> >> >> |   || c == '\r'
> >> >> |   || c == '\n'
> >> >> |   || c == '\t';
> >> >> |
> >> >> | }
> >> >>
> >> >> as an example, for which GCC 9.1 emits the following assembly for AMD64
> >> >>
> >> >> processors (see ):
> >> >> |    xor    eax, eax              ; result = false
> >> >> |    cmp    dil, 32               ; is c > 32
> >> >> |    ja     .L4                   ; if so, exit with false
> >> >> |    movabs rax, 4294977024       ; rax = 0x100002600
> >> >> |    shrx   rax, rax, rdi         ; rax >>= c
> >> >> |    and    eax, 1                ; result = rax & 1
> >> >> |
> >> >> |.L4:
> >> >> |    ret
>
> [...]
>
> > Whether or not the conditional branch sequence is faster depends on whether
> > the branch is well-predicted which very much depends on the data you
> > feed the isWhitespace function with
>
> Correct.
>
> > but I guess since this is the c == ' ' test it _will_ be a well-predicted 
> > branch
>
> Also correct, but you miss a point: the typical use case is
>
> while (isWhitespace(*ptr)) ptr++;
>
> > which means the conditional branch sequence will be usually faster.
>
> And this is wrong!
> The (well-predicted) branch is usually NOT taken, so both code variants
> usually execute (with one exception the same) 6 or 7 instructions.

Whether or not the branch is predicted taken does not matter, what
matters is that the continuation is not data dependent on the branch
target computation and thus can execute in parallel to it.

> > The proposed change turns the control into a data dependence which
> > constrains instruction scheduling and retirement.
>
> It doesn't matter: the branch has the same data dependency too!
>
> > Indeed a mispredicted branch will likely be more costly.
>
> And no branch is even better: the branch predictor has a limited capacity,
> so every removed branch instruction can help improve its efficiency.
>
> > x86 CPUs do not perform data speculation.
>
> >>  mov ecx, edi
> >>  movabs  rax, 4294977024
> >>  shr rax, cl
> >>  xor edi, edi
> >>  cmp ecx, 33
> >>  setb    dil
> >>  and eax, edi
>
> I already presented measured numbers: with random data, the branch-free
> code is faster, with ordered data the original code.
>
> Left column 1 billion sequential characters
> for (int i=10; i; --i) ...(i);
> right column 1 billion random characters, in cycles per character:

I guess feeding it Real Text (TM) is the only relevant benchmark,
doing something like

  for (;;)
 cnt[isWhitespace(*ptr++)]++;

> GCC:           2.4    3.4
> branch-free:   3.0    2.5

I'd call that inconclusive data - you also failed to show your test data
is somehow relevant.  We do know that mispredicted branches are bad.
You show well-predicted branches are good.  By simple statistics
singling out 4 out of 255 values will make the branches well-predicted.
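
A bounded version of that counting loop, as a sketch of what such a
benchmark kernel could look like (the names is_whitespace and
count_whitespace and the explicit length parameter are assumptions
added here; the timing harness and the choice of input text are left
out on purpose):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool
is_whitespace (char c)
{
  return c == ' ' || c == '\r' || c == '\n' || c == '\t';
}

/* Histogram over TEXT[0..LEN): cnt[0] counts non-whitespace
   characters, cnt[1] counts whitespace characters.  */
static void
count_whitespace (const char *text, size_t len, size_t cnt[2])
{
  assert (text != NULL || len == 0);
  cnt[0] = cnt[1] = 0;
  for (size_t i = 0; i < len; i++)
    cnt[is_whitespace (text[i])]++;
}
```

Feeding it real prose instead of sequential or random bytes is what
makes the ratio cnt[1] : cnt[0], and with it the behaviour of the
branch predictor, representative.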

> Now perform a linear interpolation and find the break-even point at
> p=0.4, with p=0 for ordered data and p=1 for random data, or just use
> the average of these numbers: 2.9 cycles vs. 2.75 cycles.
> That's small, but measurable!


Re: Do all global structure variables escape in IPA-PTA?

2020-08-25 Thread Richard Biener via Gcc
On August 25, 2020 3:09:13 PM GMT+02:00, Erick Ochoa 
 wrote:
>Hi,
>
>I'm trying to understand how the escape analysis in IPA-PTA works. I
>was 
>testing a hypothesis where if a structure contains an array of 
>characters and this array of characters is passed to fopen, the 
>structure and all subfields will escape.
>
>To do this, I made a program that has a global structure variable foo2 
>that has a field passed as an argument to fopen. I also made another
>
>variable foo whose array is initialized by the result of rand.
>
>However, after compiling this program with -flto -flto-partition=none 
>-fipa -fdump-ipa-pta -fdump-tree-all-all -Ofast (gcc --version 10.2.0)
>
>E.g.
>
>#include 
>#include 
>#include 
>
>struct foo_t {
>   char buffer1[100];
>   char buffer2[100];
>};
>
>struct foo_t foo;
>struct foo_t foo2;
>
>int
>main(int argc, char** argv)
>{
>
>   fopen(foo2.buffer1, "r");
>   for (int i = 0; i < 100; i++)
>   {
> foo.buffer1[i] = rand();
>   }
>   int i = rand();
>   int retval = foo.buffer1[i % 100];
>   return retval;
>}
>
>I see the PTA dump state the following:
>
>ESCAPED = { STRING ESCAPED NONLOCAL foo2 }
>foo = { ESCAPED NONLOCAL }
>foo2 = { ESCAPED NONLOCAL }
>
>which I understand as
>* something externally visible might point to foo2
>* foo2 might point to something externally visible
>* foo might point to something externally visible

Yes. So it's exactly as your hypothesis. 

>I have seen that global variables are stored in the .gnu.lto_.decls LTO
>
>file section. In the passes I have worked on I have ignored global 
>variables. But can foo and foo2 be marked as escaping because the 
>declarations are not streamed in yet? Or is there another reason I am 
>not seeing? I am aware of the several TODOs at the beginning
>of 
>gcc/tree-ssa-structalias.c but I am unsure if they contribute to these 
>variables being marked as escaping. (Maybe TODO 1 and TODO 2?)

Not sure what the problem is. Foo2 escapes because its address is passed to a 
function. 

? 

Richard. 
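
To make the "address passed to a function" point concrete, here is a
small stand-alone sketch of why an analysis has to be conservative
once a pointer into foo2 leaves the code it can see. It is only an
illustration: toy_opaque_call and toy_later_call are hypothetical
stand-ins for functions whose bodies are not visible (fopen in the
example above), and the sketch says nothing about how the constraint
solver represents this internally.

```c
#include <assert.h>
#include <stddef.h>

struct foo_t {
  char buffer1[100];
  char buffer2[100];
};

static struct foo_t foo2;

/* State owned by the "external" side.  */
static char *retained;

/* Stand-in for a call whose body the compiler cannot see:
   it is free to keep the pointer it was given.  */
static void
toy_opaque_call (char *p)
{
  retained = p;
}

/* A later external call that is never handed foo2 directly can still
   write to it through the retained pointer.  Because buffer1 is the
   first member, the pointer also gives access to the whole object,
   including buffer2.  */
static void
toy_later_call (void)
{
  if (retained != NULL)
    {
      struct foo_t *whole = (struct foo_t *) retained;
      retained[0] = 'x';
      whole->buffer2[0] = 'y';
    }
}
```

After toy_opaque_call (foo2.buffer1), any subsequent call into unknown
code may change both fields, which is the situation "escaped"
describes.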

>Just FYI, I've been reading:
>* Structure Aliasing in GCC
>* Gimple Alias Improvements for GCC 4.5
>* Memory SSA - A Unified Approach for Sparsely Representing Memory 
>Operations
>
>Thanks, I appreciate all help!



Re: Question about IPA-PTA and build_alias

2020-08-25 Thread Richard Biener via Gcc
On August 24, 2020 10:00:44 AM GMT+02:00, Erick Ochoa 
 wrote:
>
>
>On 24/08/2020 09:40, Richard Biener wrote:
>> On Mon, Aug 17, 2020 at 3:22 PM Erick Ochoa
>>  wrote:
>>>
>>> Hello,
>>>
>>> I'm looking to understand better the points-to analysis (IPA-PTA)
>and
>>> the alias analysis (build_alias).
>>>
>>> How is the information produced by IPA-PTA consumed?
>>>
>>> Are alias sets in build_alias computed by the intersections of the
>>> points_to_set(s) (computed by IPA-PTA)?
>>>
>>> My intuition tells me that it could be relatively simple to move
>>> build_alias to be a SIMPLE_IPA_PASS performed just after IPA-PTA,
>but I
>>> do not have enough experience in GCC to tell if this is correct.
>What
>>> could be some difficulties which I am not seeing? (Either move, or
>>> create a new IPA-ALIAS SIMPLE_IPA_PASS.) This pass would have the
>same
>>> sensitivity as IPA-PTA { flow-insensitive, context-insensitive,
>>> field-sensitive } because the alias sets could be computed by the
>>> intersection of points-to-sets.
>> 
>> Both IPA-PTA and build_alias do the same, they build PTA constraint
>> sets, solve them and attach points-to info to SSA names.  Just
>IPA-PTA
>> does this for the whole TU while build_alias does it for a function
>at a time.
>> 
>> So I guess I do not understand your question.
>
>Hi Richard,
>
>I'm just trying to imagine what a data-layout optimization would look 
>like if instead of using the type-escape analysis we used the points-to
>
>analysis to find out which variables/memory locations escape and what 
>that would mean for the transformation itself.

What I've said before is that for the object based approach you need precise 
following of pointers which covers escape analysis already. For non allocated 
objects you need to find possible address taken and accesses. 

I don't think the escape analysis included in IPA points-to analysis will help 
you in the end. The constraint solver does not do the precise analysis you need 
as well but the precise analysis will give you conservative escape results. 

>One of the things that I think would be needed are alias-sets. I
>thought 
>that build_alias was building alias sets but I was mistaken. However, 
>computing the alias sets should not be too difficult.
>
>Also continuing imagining what a data-layout optimization would look 
>like in GCC, since IPA-PTA is a SIMPLE_IPA_PASS and if alias sets are 
>indeed needed, I was asking what would be the reception to a 
>SIMPLE_IPA_PASS that computes alias sets just after IPA-PTA. (As
>opposed 
>to a full ipa pass).

If you look we skip simple analysis if IPA analysis was done to not overwrite 
its results. So forcing it (even earlier) would make IPA analysis moot which 
would of course not be welcome. 

Richard. 

>
>
>
>> 
>> Richard.
>> 
>>>
>>> Thanks!



Re: Do all global structure variables escape in IPA-PTA?

2020-08-25 Thread Richard Biener via Gcc
On August 25, 2020 6:36:19 PM GMT+02:00, Erick Ochoa 
 wrote:
>
>
>On 25/08/2020 17:19, Erick Ochoa wrote:
>> 
>> 
>> On 25/08/2020 17:10, Richard Biener wrote:
>>> On August 25, 2020 3:09:13 PM GMT+02:00, Erick Ochoa 
>>>  wrote:
>>>> Hi,
>>>>
>>>> I'm trying to understand how the escape analysis in IPA-PTA works. I
>>>> was testing a hypothesis where if a structure contains an array of
>>>> characters and this array of characters is passed to fopen, the
>>>> structure and all subfields will escape.
>>>>
>>>> To do this, I made a program that has a global structure variable foo2
>>>> that has a field passed as an argument to fopen. I also made another
>>>> variable foo whose array is initialized by the result of rand.
>>>>
>>>> However, after compiling this program with -flto -flto-partition=none
>>>> -fipa -fdump-ipa-pta -fdump-tree-all-all -Ofast (gcc --version 10.2.0)
>>>>
>>>> E.g.
>>>>
>>>> #include 
>>>> #include 
>>>> #include 
>>>>
>>>> struct foo_t {
>>>>    char buffer1[100];
>>>>    char buffer2[100];
>>>> };
>>>>
>>>> struct foo_t foo;
>>>> struct foo_t foo2;
>>>>
>>>> int
>>>> main(int argc, char** argv)
>>>> {
>>>>
>>>>    fopen(foo2.buffer1, "r");
>>>>    for (int i = 0; i < 100; i++)
>>>>    {
>>>>      foo.buffer1[i] = rand();
>>>>    }
>>>>    int i = rand();
>>>>    int retval = foo.buffer1[i % 100];
>>>>    return retval;
>>>> }
>>>>
>>>> I see the PTA dump state the following:
>>>>
>>>> ESCAPED = { STRING ESCAPED NONLOCAL foo2 }
>>>> foo = { ESCAPED NONLOCAL }
>>>> foo2 = { ESCAPED NONLOCAL }
>>>>
>>>> which I understand as
>>>> * something externally visible might point to foo2
>>>> * foo2 might point to something externally visible
>>>> * foo might point to something externally visible
>>>
>>> Yes. So it's exactly as your hypothesis.
>>>
>>>> I have seen that global variables are stored in the .gnu.lto_.decls LTO
>>>> file section. In the passes I have worked on I have ignored global
>>>> variables. But can foo and foo2 be marked as escaping because the
>>>> declarations are not streamed in yet? Or is there another reason I am
>>>> not seeing? I am aware of the several TODOs at the beginning of
>>>> gcc/tree-ssa-structalias.c but I am unsure if they contribute to these
>>>> variables being marked as escaping. (Maybe TODO 1 and TODO 2?)
>>>
>>> Not sure what the problem is. foo2 escapes because its address is 
>>> passed to a function.
>>>
>> 
>> foo2 is not the problem, it is foo. foo is not passed to a function
>and 
>> it is also escaping.
>
>
>Sorry, I meant: foo might point to something which is externally 
>visible. Which I don't think is the case in the program. I understand 
>this might be due to the imprecision in the escape-analysis and what
>I'm 
>trying to find out is the source of imprecision.

foo is exported, and thus any function call can store to it, making it point to 
escaped and nonlocal variables. 

Richard. 
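
The point in the reply above can be pictured with a small, self-contained
C sketch. It is not taken from the thread: the names holder, exported,
escaped_value and opaque_call are hypothetical, and the struct holds a
pointer (unlike foo_t) purely so that the store is observable. opaque_call
stands in for any function whose body the analysis cannot see.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: it holds a pointer so the store is observable. */
struct holder {
    int *slot;
};

/* External linkage, like foo in the thread: other code can name it. */
struct holder exported;

static int escaped_value = 7;

/* Stand-in for a function whose body the analysis cannot see.  Because
   'exported' is visible to it, it may store any escaped pointer there. */
void opaque_call(int *escaped)
{
    assert(escaped != NULL);
    exported.slot = escaped;
}

/* After the call, 'exported' points to escaped memory even though the
   address of 'exported' itself was never passed anywhere. */
int read_through_exported(void)
{
    opaque_call(&escaped_value);
    return *exported.slot;
}
```

Calling read_through_exported () shows the global ending up pointing at
memory that escaped through the call, which is the situation a
conservative analysis has to assume for every exported variable.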

>> 
>>> ?
>>>
>>> Richard.
>>>
>>>> Just FYI, I've been reading:
>>>> * Structure Aliasing in GCC
>>>> * Gimple Alias Improvements for GCC 4.5
>>>> * Memory SSA - A Unified Approach for Sparsely Representing Memory
>>>>   Operations
>>>>
>>>> Thanks, I appreciate all help!
>>>



Re: Do all global structure variables escape in IPA-PTA?

2020-08-26 Thread Richard Biener via Gcc
On Wed, Aug 26, 2020 at 11:45 AM Erick Ochoa
 wrote:
>
>
>
> On 26/08/2020 10:36, Erick Ochoa wrote:
> >
> >
> > On 25/08/2020 22:03, Richard Biener wrote:
> >> On August 25, 2020 6:36:19 PM GMT+02:00, Erick Ochoa
> >>  wrote:
> >>>
> >>>
> >>> On 25/08/2020 17:19, Erick Ochoa wrote:
> >>>>
> >>>>
> >>>> On 25/08/2020 17:10, Richard Biener wrote:
> >>>>> On August 25, 2020 3:09:13 PM GMT+02:00, Erick Ochoa
> >>>>>  wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm trying to understand how the escape analysis in IPA-PTA works. I was
> >>>>>> testing a hypothesis where if a structure contains an array of
> >>>>>> characters and this array of characters is passed to fopen, the
> >>>>>> structure and all subfields will escape.
> >>>>>>
> >>>>>> To do this, I made a program that has a global structure variable foo2
> >>>>>> that has a field passed as an argument to fopen. I also made another
> >>>>>> variable foo whose array is initialized by the result of rand.
> >>>>>>
> >>>>>> However, after compiling this program with -flto -flto-partition=none
> >>>>>> -fipa -fdump-ipa-pta -fdump-tree-all-all -Ofast (gcc --version 10.2.0)
> >>>>>>
> >>>>>> E.g.
> >>>>>>
> >>>>>> #include 
> >>>>>> #include 
> >>>>>> #include 
> >>>>>>
> >>>>>> struct foo_t {
> >>>>>>    char buffer1[100];
> >>>>>>    char buffer2[100];
> >>>>>> };
> >>>>>>
> >>>>>> struct foo_t foo;
> >>>>>> struct foo_t foo2;
> >>>>>>
> >>>>>> int
> >>>>>> main(int argc, char** argv)
> >>>>>> {
> >>>>>>
> >>>>>>    fopen(foo2.buffer1, "r");
> >>>>>>    for (int i = 0; i < 100; i++)
> >>>>>>    {
> >>>>>>      foo.buffer1[i] = rand();
> >>>>>>    }
> >>>>>>    int i = rand();
> >>>>>>    int retval = foo.buffer1[i % 100];
> >>>>>>    return retval;
> >>>>>> }
> >>>>>>
> >>>>>> I see the PTA dump state the following:
> >>>>>>
> >>>>>> ESCAPED = { STRING ESCAPED NONLOCAL foo2 }
> >>>>>> foo = { ESCAPED NONLOCAL }
> >>>>>> foo2 = { ESCAPED NONLOCAL }
> >>>>>>
> >>>>>> which I understand as
> >>>>>> * something externally visible might point to foo2
> >>>>>> * foo2 might point to something externally visible
> >>>>>> * foo might point to something externally visible
> >>>>>
> >>>>> Yes. So it's exactly as your hypothesis.
> >>>>>
> >>>>>> I have seen that global variables are stored in the .gnu.lto_.decls LTO
> >>>>>> file section. In the passes I have worked on I have ignored global
> >>>>>> variables. But can foo and foo2 be marked as escaping because the
> >>>>>> declarations are not streamed in yet? Or is there another reason I am
> >>>>>> not seeing? I am aware of the several TODOs at the beginning of
> >>>>>> gcc/tree-ssa-structalias.c but I am unsure if they contribute to these
> >>>>>> variables being marked as escaping. (Maybe TODO 1 and TODO 2?)
> >>>>>
> >>>>> Not sure what the problem is. foo2 escapes because its address is
> >>>>> passed to a function.
> >>>>>
> >>>>
> >>>> foo2 is not the problem, it is foo. foo is not passed to a function and
> >>>> it is also escaping.
> >>>
> >>>
> >>> Sorry, I meant: foo might point to something which is externally
> >>> visible. Which I don't think is the case in the program. I understand
> >>> this might be due to the imprecision in the escape-analysis and what
> >>> I'm
> >>> trying to find out is the source of imprecision.
> >>
> >> foo is exported, and thus any function call can store to it, making it
> >> point to escaped and nonlocal variables.
> >
> > Hi Richard,
> >
> > I'm still not sure why foo can point to escaped and nonlocal variables.
> >
> > My understanding is that ipa-visibility can mark variables and functions
> > as not externally visible. Which I think is equivalent to excluding
> > these symbols from the export table. I have printed the results of
> > vnode->externally_visible and vnode->externally_visible_p() of foo and
> > foo2 and both predicates return 0. (This is in a pass just before
> > IPA-PTA). I think this means that both variables are not exported.
> > IPA-PTA still says that foo can point to escaped memory.
> >
> > ESCAPED = { NULL STRING ESCAPED NONLOCAL foo2 } // correct
> > foo = { ESCAPED NONLOCAL } same as _4 // my question
> > foo2 = { ESCAPED NONLOCAL } // correct
> >
> > I then later declared foo as static. Which I think should mark the
> > variable for internal linkage only. IPA-PTA still says that foo can
> > point to escaped memory.
> >
> > I also used the -fwhole-program flag and IPA-PTA still says that foo can
> > point to escaped memory.
> >
> > Am I failing to specify that foo must *not* be exported? Or does IPA-PTA
> > not consider the visibility of symbols? Or is there something else
> > I'm missing?
>
> Hi Richard,
>
> I think the reason why the global variables escape is because probably
> is_ipa_escape_point is not being used in all the places. According to
> the comments in tree-ssa-structalias.c
>
> The is_global_var bit which marks escape

Re: LTO slows down calculix by more than 10% on aarch64

2020-08-26 Thread Richard Biener via Gcc
On Wed, Aug 26, 2020 at 12:34 PM Prathamesh Kulkarni via Gcc
 wrote:
>
> Hi,
> We're seeing a consistent regression >10% on calculix with -O2 -flto vs -O2
> on aarch64 in our validation CI. I tried to investigate this issue a
> bit, and it seems the regression comes from inlining of orthonl into
> e_c3d. Disabling that brings back the performance. However, inlining
> orthonl into e_c3d increases its size from 3187 to 3837, by around
> 16.9%, which isn't too large.
>
> I have attached two test-cases, e_c3d.f that has orthonl manually
> inlined into e_c3d to "simulate" LTO's inlining, and e_c3d-orig.f,
> which contains unmodified function.
> (gauss.f is included by e_c3d.f). For reproducing, just passing -O2 is
> sufficient.
>
> It seems that inlining orthonl, causes 20 hoistings into block 181,
> which are then hoisted to block 173, in particular hoistings of w(1,
> 1) ... w(3, 3), which wasn't
> possible without inlining. The hoistings happen because of basic block
> that computes orthonl in line 672 has w(1, 1) ... w(3, 3) and the
> following block in line 1035 in e_c3d.f:
>
> senergy=
>  &(s11*w(1,1)+s12*(w(1,2)+w(2,1))
>  &+s13*(w(1,3)+w(3,1))+s22*w(2,2)
>  &+s23*(w(2,3)+w(3,2))+s33*w(3,3))*weight
>
> Disabling hoisting into blocks 173 (and 181), brings back most of the
> performance. I am not able to understand why (if?) these hoistings of
> w(1, 1) ...
> w(3, 3) are causing slowdown however. Looking at assembly, the hot
> code-path from perf in e_c3d shows following code-gen diff:
> For inlined version:
> .L122:
>         ldr     d15, [x1, -248]
>         add     w0, w0, 1
>         add     x2, x2, 24
>         add     x1, x1, 72
>         fmul    d15, d17, d15
>         fmul    d15, d15, d18
>         fmul    d14, d15, d14
>         fmadd   d16, d14, d31, d16
>         cmp     w0, 4
>         beq     .L121
>         ldr     d14, [x2, -8]
>         b       .L122
>
> and for non-inlined version:
> .L118:
>         ldr     d0, [x1, -248]
>         add     w0, w0, 1
>         ldr     d2, [x2, -8]
>         add     x1, x1, 72
>         add     x2, x2, 24
>         fmul    d0, d3, d0
>         fmul    d0, d0, d5
>         fmul    d0, d0, d2
>         fmadd   d1, d4, d0, d1
>         cmp     w0, 4
>         bne     .L118

I wonder if you have profiles.  The inlined version has a
non-empty latch block (looks like some PRE is happening
there?).  Possibly your uarch does not like the closely spaced
branches (does your assembly show the layout as it is?).

> which corresponds to the following loop in line 1014.
> do n1=1,3
>   s(iii1,jjj1)=s(iii1,jjj1)
>  &  +anisox(m1,k1,n1,l1)
>  &  *w(k1,l1)*vo(i1,m1)*vo(j1,n1)
>  &  *weight
>
> I am not sure why hoisting would have any direct effect on this loop,
> except perhaps that hoisting allocated more registers and led to
> increased register pressure. Perhaps that's why it's using
> higher-numbered regs for code-gen in the inlined version? However,
> disabling hoisting in blocks 173 and 181 also leads to 6 extra spills
> overall (found by grepping for str to sp), so
> hoisting is also helping here? I am not sure how to proceed further,
> and would be grateful for suggestions.
>
> Thanks,
> Prathamesh


Re: Questions regarding update_stmt and release_ssa_name_fn.

2020-08-27 Thread Richard Biener via Gcc
On Wed, Aug 26, 2020 at 11:32 PM Gary Oblock via Gcc  wrote:
>
> I'm having some major grief with a few related things that I'm trying to
> do. They mostly revolve around trying to change the type of an SSA name
> (which I've given up in favor of creating new SSA names and replacing
> the ones I wanted to change.) However, this too seems to have its own
> issues.
>
> In one problematic case in particular, I'm seeing a sequence like:
>
> foo_3 = mumble_1 op mumble_2
>
> bar_5 = foo_3 op baz_4
>
> when I replace foo_3 with foo_4 (having the needed new type).
>
> I'm seeing a later verification phase think
>
> bar_5 = foo_4 op baz_4
>
> is still associated with the foo_3.
>
> Should the transformation above be associated with update_stmt and/or
> release_ssa_name_fn? And if they are both needed is there a proper
> order required.  Note, when I try using them, I'm seeing some malformed
> tree operands that die in horrible ways.
>
> By the way, I realize I can probably simply create a new GIMPLE stmt
> from scratch to replace the ones I'm modifying but this will cause
> some significant code bloat and I want to avoid that if at all
> possible.

You need to call update_stmt () if you change SSA operands to
sth else.

> There is an additional wrinkle to this problem with C code like this
>
> void
> whatever ( int x, .. )
> {
>   :
>   x++;
>   :
> }
>
> I'm seeing x_2 being thought of as default definition in the following
> GIMPLE stmt when it's clearly not since it's defined by the statement.
>
>   x_2 = X_1 + 4
>
> My approach has been to simply make the SSA name that replaces x_2 a
> normal SSA name and not a default def. Is this not reasonable and
> correct?

If x_2 is a default def then the IL isn't correct in the first place.  I doubt
it is that way, btw. - we have verifiers that would blow up if it were.

Richard.

>
> Thanks,
>
> Gary Oblock
>
> Gary
>
>
>
>
>
> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is 
> for the sole use of the intended recipient(s) and contains information that 
> is confidential and proprietary to Ampere Computing or its subsidiaries. It 
> is to be used solely for the purpose of furthering the parties' business 
> relationship. Any unauthorized review, copying, or distribution of this email 
> (or any attachments thereto) is strictly prohibited. If you are not the 
> intended recipient, please contact the sender immediately and permanently 
> delete the original and any copies of this email and any attachments thereto.


Re: Questions regarding update_stmt and release_ssa_name_fn.

2020-08-27 Thread Richard Biener via Gcc
On August 27, 2020 7:45:15 PM GMT+02:00, Gary Oblock  
wrote:
>Richard,
>
>>You need to call update_stmt () if you change SSA operands to
>>sth else.
>
>I'm having trouble parsing the "sth else" above. Could you
>please rephrase this if it's important to your point. I take
>what you mean is if you change any SSA operand to any
>statement then update that statement.

If you change any SSA operand of any stmt to a different SSA name or a constant 
then you need to update the stmt containing the SSA operand. 

In _1 = _2 + 3; _2 is an SSA operand of that stmt. 

Richard. 
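
As a rough illustration of why a stale statement can still look
"associated with" the old name, here is a toy model in plain C. It is not
GCC code: toy_stmt, toy_set_operand_raw, toy_update_stmt and toy_verify
are hypothetical names, and the model only assumes that a statement keeps
cached operand information that must be refreshed after an operand is
changed, which is the role update_stmt () plays in the reply above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_OPS = 2 };

/* Toy statement: 'ops' are the SSA version numbers written in the
   statement, 'cached_uses' is the bookkeeping a verifier consults. */
struct toy_stmt {
    int lhs;
    int ops[TOY_MAX_OPS];
    int cached_uses[TOY_MAX_OPS];
};

/* Change an operand in place without touching the cached information. */
void toy_set_operand_raw(struct toy_stmt *s, int idx, int name)
{
    assert(s != NULL && idx >= 0 && idx < TOY_MAX_OPS);
    s->ops[idx] = name;
}

/* Refresh the cache from the operands (the toy analogue of update_stmt). */
void toy_update_stmt(struct toy_stmt *s)
{
    assert(s != NULL);
    for (int i = 0; i < TOY_MAX_OPS; i++)
        s->cached_uses[i] = s->ops[i];
}

/* The toy verifier: the cache must agree with the operands. */
bool toy_verify(const struct toy_stmt *s)
{
    assert(s != NULL);
    for (int i = 0; i < TOY_MAX_OPS; i++)
        if (s->cached_uses[i] != s->ops[i])
            return false;
    return true;
}
```

In this model, replacing an operand with toy_set_operand_raw () alone
makes toy_verify () fail until toy_update_stmt () is called on the same
statement, mirroring the order "change the operand, then update the
statement that contains it".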

>Thanks,
>
>Gary
>
>From: Richard Biener 
>Sent: Thursday, August 27, 2020 2:04 AM
>To: Gary Oblock 
>Cc: gcc@gcc.gnu.org 
>Subject: Re: Questions regarding update_stmt and release_ssa_name_fn.
>
>
>On Wed, Aug 26, 2020 at 11:32 PM Gary Oblock via Gcc 
>wrote:
>>
>> I'm having some major grief with a few related things that I'm trying to
>> do. They mostly revolve around trying to change the type of an SSA name
>> (which I've given up in favor of creating new SSA names and replacing
>> the ones I wanted to change.) However, this too seems to have its own
>> issues.
>>
>> In one problematic case in particular, I'm seeing a sequence like:
>>
>> foo_3 = mumble_1 op mumble_2
>>
>> bar_5 = foo_3 op baz_4
>>
>> when I replace foo_3 with foo_4 (having the needed new type).
>>
>> I'm seeing a later verification phase think
>>
>> bar_5 = foo_4 op baz_4
>>
>> is still associated with the foo_3.
>>
>> Should the transformation above be associated with update_stmt and/or
>> release_ssa_name_fn? And if they are both needed is there a proper
>> order required.  Note, when I try using them, I'm seeing some
>malformed
>> tree operands that die in horrible ways.
>>
>> By the way, I realize I can probably simply create a new GIMPLE stmt
>> from scratch to replace the ones I'm modifying but this will cause
>> some significant code bloat and I want to avoid that if at all
>> possible.
>
>You need to call update_stmt () if you change SSA operands to
>sth else.
>
>> There is an additional wrinkle to this problem with C code like this
>>
>> void
>> whatever ( int x, .. )
>> {
>>   :
>>   x++;
>>   :
>> }
>>
>> I'm seeing x_2 being thought of as default definition in the following
>> GIMPLE stmt when it's clearly not since it's defined by the statement.
>>
>>   x_2 = X_1 + 4
>>
>> My approach has been to simply make the SSA name that replaces x_2 a
>> normal SSA name and not a default def. Is this not reasonable and
>> correct?
>
>If x_2 is a default def then the IL isn't correct in the first place.  I doubt
>it is that way, btw. - we have verifiers that would blow up if it were.
>
>Richard.
>
>>
>> Thanks,
>>
>> Gary Oblock
>>
>> Gary
>>
>>
>>
>>
>>



Re: Questions regarding update_stmt and release_ssa_name_fn.

2020-08-28 Thread Richard Biener via Gcc
On Fri, Aug 28, 2020 at 6:29 AM Gary Oblock  wrote:
>
> > If x_2 is a default def then the IL isn't correct in the first place.  I 
> > doubt
> > it is that way, btw. - we have verifiers that would blow up if it would.
>
> Richard,
>
> I'm just sharing this so you can tell me whether or not I'm going
> crazy. ;-)
>
> This little function is finding that arr_2 = PHI 
> is problematic.
>
> void
> wolf_fence (
>     Info *info // Pass-level global info (might not use it)
>   )
> {
>   struct cgraph_node *node;
>
>   fprintf ( stderr,
>             "Wolf Fence: Find wolf for default defs with non nop defines\n");
>
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
>     {
>       struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
>       push_cfun ( func);
>
>       unsigned int len = SSANAMES ( func)->length ();
>       for ( unsigned int i = 0; i < len; i++)
>         {
>           tree ssa_name = (*SSANAMES ( func))[i];
>           if ( ssa_name == NULL ) continue;
>           if ( ssa_defined_default_def_p ( ssa_name) )

That's because this function is supposed to be called only
on SSA default defs, but you call it on all SSA names.  Add
SSA_NAME_IS_DEFAULT_DEF (ssa_name) && before the call
and all will be well.

>             {
>               gimple *def_stmt = SSA_NAME_DEF_STMT ( ssa_name);
>               if ( !gimple_nop_p ( def_stmt) )
>                 {
>                   fprintf ( stderr, "Wolf fence caught :");
>                   print_gimple_stmt ( stderr, def_stmt, 0);
>                   gcc_assert (0);
>                 }
>             }
>         }
>       pop_cfun ();
>     }
>   fprintf ( stderr, "Wolf Fence: Didn't find wolf!\n");
> }
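
The guard suggested in the reply above can be sketched with a toy model
in plain C. This is not GCC code: toy_ssa_name and the toy_* functions
are hypothetical, and the model assumes, as the reply implies, that the
predicate only looks at the kind of the underlying variable (for example
a parameter), so it also answers "yes" for later versions such as arr_2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_var_kind { TOY_LOCAL_VAR, TOY_PARAM };

struct toy_ssa_name {
    enum toy_var_kind var_kind; /* kind of the underlying declaration */
    bool is_default_def;        /* true only for the entry value, e.g. arr_8(D) */
    bool has_real_def_stmt;     /* false when the defining statement is a no-op */
};

/* Assumed behaviour of the predicate: it inspects the variable only. */
bool toy_defined_default_def_p(const struct toy_ssa_name *n)
{
    assert(n != NULL);
    return n->var_kind == TOY_PARAM;
}

/* The check as written in the quoted loop: no default-def guard. */
bool toy_flag_unguarded(const struct toy_ssa_name *n)
{
    return toy_defined_default_def_p(n) && n->has_real_def_stmt;
}

/* The check with the guard placed in front of the predicate. */
bool toy_flag_guarded(const struct toy_ssa_name *n)
{
    assert(n != NULL);
    return n->is_default_def
           && toy_defined_default_def_p(n)
           && n->has_real_def_stmt;
}
```

With a name modelled on arr_2 (parameter-based, not a default def,
defined by a PHI), the unguarded check raises a false alarm while the
guarded one stays quiet; a name modelled on arr_8(D) is flagged by
neither.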
>
> This is run at the very start of the structure reorg pass
> before any of my code did anything at all (except initiate
> the structure info with a few flags and the like.)
>
> Here's C code:
>
> - aux.h ---
> #include "stdlib.h"
> typedef struct type type_t;
> struct type {
>   double x;
>   double y;
> };
>
> extern type_t *min_of_x( type_t *, size_t);
> - aux.c ---
> #include "aux.h"
> #include "stdlib.h"
>
> type_t *
> min_of_x( type_t *arr, size_t len)
> {
>   type_t *end_of = arr + len;
>   type_t *loc = arr;
>   double result = arr->x;
>   arr++;
>   for( ; arr < end_of ; arr++  ) {
>     double value = arr->x;
>     if (  value < result ) {
>       result = value;
>       loc = arr;
>     }
>   }
>   return loc;
> }
> - main.c --
> #include "aux.h"
> #include "stdio.h"
>
> int
> main(void)
> {
>   size_t len = 1;
>   type_t *data = (type_t *)malloc( len * sizeof(type_t));
>   int i;
>   for( i = 0; i < len; i++ ) {
>     data[i].x = drand48();
>   }
>
>   type_t *min_x;
>   min_x = min_of_x( data, len);
>
>   if ( min_x == 0 ) {
>     printf("min_x error\n");
>     exit(-1);
>   }
>
>   printf("min_x %e\n" , min_x->x);
> }
> ---
> Here's the GIMPLE coming into the structure reorganization pass:
>
> Program:
>
> ;; Function min_of_x (min_of_x, funcdef_no=0, decl_uid=4391, cgraph_uid=2, 
> symbol_order=23) (executed once)
>
> min_of_x (struct type_t * arr, size_t len)
> {
>   double value;
>   double result;
>   struct type_t * loc;
>   struct type_t * end_of;
>
>[local count: 118111600]:
>   _1 = len_7(D) * 16;
>   end_of_9 = arr_8(D) + _1;
>   result_11 = arr_8(D)->x;
>   arr_12 = arr_8(D) + 16;
>   goto ; [100.00%]
>
>[local count: 955630225]:
>   value_14 = arr_2->x;
>   if (result_6 > value_14)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 477815112]:
>
>[local count: 955630225]:
>   # loc_3 = PHI 
>   # result_5 = PHI 
>   arr_15 = arr_2 + 16;
>
>[local count: 1073741824]:
>   # arr_2 = PHI 
>   # loc_4 = PHI 
>   # result_6 = PHI 
>   if (arr_2 < end_of_9)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 118111600]:
>   # loc_13 = PHI 
>   return loc_13;
>
> }
>
>
>
> ;; Function main (main, funcdef_no=1, decl_uid=4389, cgraph_uid=1, 
> symbol_order=5) (executed once)
>
> main ()
> {
>   struct type_t * min_x;
>   int i;
>   struct type_t * data;
>
>[local count: 10737416]:
>   data_10 = malloc (16);
>   goto ; [100.00%]
>
>[local count: 1063004409]:
>   _1 = _4 * 16;
>   _2 = data_10 + _1;
>   _3 = drand48 ();
>   _2->x = _3;
>   i_18 = i_6 + 1;
>
>[local count: 1073741824]:
>   # i_6 = PHI <0(2), i_18(3)>
>   _4 = (long unsigned int) i_6;
>   if (_4 != 1)
> goto ; [99.00%]
>   else
> goto ; [1.00%]
>
>[local count: 10737416]:
>   min_x_12 = min_of_x (data_10, 1);
>   if (min_x_12 == 0B)
> goto ; [0.04%]
>   else
> goto ; [99.96%]
>
>[local count: 4295]:
>   __builtin_puts (&"min_x error"[0]);
>   exit (-1);
>
>[local count: 10733121]:
>   _5 = min_x_12->x;
>   printf ("min_x %e\n", _5);
>   return 0;
>
> }
>
> Am I crazy?
>
> Thanks,
>
> Gary
>
>
>
>
>
>
>
>
> 
> From: Richard Biener 
> Sent: Thursday, August 27,

Re: LTO slows down calculix by more than 10% on aarch64

2020-08-28 Thread Richard Biener via Gcc
On Fri, Aug 28, 2020 at 1:17 PM Prathamesh Kulkarni
 wrote:
>
> On Wed, 26 Aug 2020 at 16:50, Richard Biener  
> wrote:
> >
> > On Wed, Aug 26, 2020 at 12:34 PM Prathamesh Kulkarni via Gcc
> >  wrote:
> > >
> > > Hi,
> > > We're seeing a consistent regression >10% on calculix with -O2 -flto vs 
> > > -O2
> > > on aarch64 in our validation CI. I tried to investigate this issue a
> > > bit, and it seems the regression comes from inlining of orthonl into
> > > e_c3d. Disabling that brings back the performance. However, inlining
> > > orthonl into e_c3d increases its size from 3187 to 3837, by around
> > > 16.9%, which isn't too large.
> > >
> > > I have attached two test-cases, e_c3d.f that has orthonl manually
> > > inlined into e_c3d to "simulate" LTO's inlining, and e_c3d-orig.f,
> > > which contains unmodified function.
> > > (gauss.f is included by e_c3d.f). For reproducing, just passing -O2 is
> > > sufficient.
> > >
> > > It seems that inlining orthonl, causes 20 hoistings into block 181,
> > > which are then hoisted to block 173, in particular hoistings of w(1,
> > > 1) ... w(3, 3), which wasn't
> > > possible without inlining. The hoistings happen because of basic block
> > > that computes orthonl in line 672 has w(1, 1) ... w(3, 3) and the
> > > following block in line 1035 in e_c3d.f:
> > >
> > > senergy=
> > >  &(s11*w(1,1)+s12*(w(1,2)+w(2,1))
> > >  &+s13*(w(1,3)+w(3,1))+s22*w(2,2)
> > >  &+s23*(w(2,3)+w(3,2))+s33*w(3,3))*weight
> > >
> > > Disabling hoisting into blocks 173 (and 181), brings back most of the
> > > performance. I am not able to understand why (if?) these hoistings of
> > > w(1, 1) ...
> > > w(3, 3) are causing slowdown however. Looking at assembly, the hot
> > > code-path from perf in e_c3d shows following code-gen diff:
> > > For inlined version:
> > > .L122:
> > >         ldr     d15, [x1, -248]
> > >         add     w0, w0, 1
> > >         add     x2, x2, 24
> > >         add     x1, x1, 72
> > >         fmul    d15, d17, d15
> > >         fmul    d15, d15, d18
> > >         fmul    d14, d15, d14
> > >         fmadd   d16, d14, d31, d16
> > >         cmp     w0, 4
> > >         beq     .L121
> > >         ldr     d14, [x2, -8]
> > >         b       .L122
> > >
> > > and for non-inlined version:
> > > .L118:
> > >         ldr     d0, [x1, -248]
> > >         add     w0, w0, 1
> > >         ldr     d2, [x2, -8]
> > >         add     x1, x1, 72
> > >         add     x2, x2, 24
> > >         fmul    d0, d3, d0
> > >         fmul    d0, d0, d5
> > >         fmul    d0, d0, d2
> > >         fmadd   d1, d4, d0, d1
> > >         cmp     w0, 4
> > >         bne     .L118
> >
> > I wonder if you have profiles.  The inlined version has a
> > non-empty latch block (looks like some PRE is happening
> > there?).  Possibly your uarch does not like the closely spaced
> > branches (does your assembly show the layout as it is?).
> Hi Richard,
> I have uploaded profiles obtained by perf here:
> -O2: https://people.linaro.org/~prathamesh.kulkarni/o2_perf.data
> -O2 -flto: https://people.linaro.org/~prathamesh.kulkarni/o2_lto_perf.data
>
> For the above loop, it shows the following:
> -O2:
>   0.01 │ f1c:   ldur   d0, [x1, #-248]
>   3.53 │        add    w0, w0, #0x1
>        │        ldur   d2, [x2, #-8]
>   3.54 │        add    x1, x1, #0x48
>        │        add    x2, x2, #0x18
>   5.89 │        fmul   d0, d3, d0
>  14.12 │        fmul   d0, d0, d5
>  14.14 │        fmul   d0, d0, d2
>  14.13 │        fmadd  d1, d4, d0, d1
>   0.00 │        cmp    w0, #0x4
>   3.52 │      ↑ b.ne   f1c
>
> -O2 -flto:
>   5.47 │1124:   ldur   d15, [x1, #-248]
>   2.19 │        add    w0, w0, #0x1
>   1.10 │        add    x2, x2, #0x18
>   2.18 │        add    x1, x1, #0x48
>   4.37 │        fmul   d15, d17, d15
>  13.13 │        fmul   d15, d15, d18
>  13.13 │        fmul   d14, d15, d14
>  13.14 │        fmadd  d16, d14, d31, d16
>        │        cmp    w0, #0x4
>   3.28 │      ↓ b.eq   1154
>   0.00 │        ldur   d14, [x2, #-8]
>   2.19 │      ↑ b      1124
>
> IIUC, the biggest relative difference comes from load [x1, #-248]
> which in LTO's case takes 5.47% of overall samples:
> 5.47  |1124:   ldur   d15, [x1, #-248]
> while in case of -O2, it's just 0.01:
>  0.01 │ f1c:   ldur   d0, [x1, #-248]
>
> I wonder if that's (one of) the main factor(s) behind slowdown or it's
> not too relevant ?

This looks more like the branch, since branch costs are usually
attributed to the target rather than to the branch itself.  You could
try re-ordering the code so that the loop entry jumps around the
latch, which can then fall through, to see if that makes a difference.

Richard.
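
The re-ordering suggested above can be sketched in C with explicit gotos.
This is only an illustration under stated assumptions: the function names
and the dot-product-style loop are hypothetical, not the calculix code,
and a compiler is free to lay out the blocks differently, so the
generated assembly still has to be checked. Both functions require
n >= 1 and compute the same sum.

```c
#include <assert.h>
#include <stddef.h>

/* Shape of the inlined loop: the exit test sits in the middle and a
   non-empty latch (the reload of t) is followed by an unconditional
   jump back, so each iteration executes two branches. */
double sum_latch_at_bottom(const double *a, const double *b, int n)
{
    assert(a != NULL && b != NULL && n >= 1);
    double acc = 0.0;
    int i = 0;
    double t = b[0];            /* first value loaded before the loop */
loop:
    acc += a[i] * t;
    i++;
    if (i == n)
        goto done;              /* exit branch */
    t = b[i];                   /* non-empty latch */
    goto loop;                  /* second branch per iteration */
done:
    return acc;
}

/* Re-ordered shape: the loop entry jumps around the latch, and the
   latch falls through into the body, leaving a single backward
   conditional branch per iteration. */
double sum_latch_fallthrough(const double *a, const double *b, int n)
{
    assert(a != NULL && b != NULL && n >= 1);
    double acc = 0.0;
    int i = 0;
    double t = b[0];
    goto body;                  /* entry jumps around the latch */
latch:
    t = b[i];
body:
    acc += a[i] * t;
    i++;
    if (i != n)
        goto latch;             /* only branch in the loop */
    return acc;
}
```

Timing the two shapes on the target machine is one way to see whether
the extra branch, rather than the load, accounts for the samples.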

> Thanks,
> Prathamesh
> >
> > > which corresponds to the following loop in line 1014.
> > > do n1=1,3
> > >   s(iii1,jjj1)=s(iii1,jjj1)
> > >  &   
