Failing in generated file options.c

2021-03-16 Thread Erick Ochoa via Gcc
Hi Gary,

the options.c file is generated from the gcc/common.opt file. Report was a
keyword that could be given to optimization options, but it was dropped
sometime in December, I think. The patches I sent you should already have
the Report keyword dropped. Are you applying the patches to up-to-date
sources? If not, are you only applying a subset of the patches? If you look
at patch number 9, you will see that the Report keyword was removed. If you
are only applying patches before the 9th, then yes, I would expect that
error to trigger.

-Erick


Question on special constraint variables in points-to analysis

2021-03-16 Thread Erick Ochoa via Gcc
Hello,

I'm currently working on improving my understanding of the implementation
of the intraprocedural points-to analysis in GCC. I have already read the
papers by Daniel Berlin and have been looking at the source for some time,
but I understand that those references are a little bit old and might not
fully reflect the status of the current points-to analysis.

In order to familiarize myself with the analysis, I decided to take a look
at the constraints dumped in the logs and feed them into a different solver
which also implements a points-to analysis. I do this during GCC's run
time. In other words, before GCC computes the points-to solution using the
regular intraprocedural analysis, I look at the constraint vector and
translate the constraints into an interface the other solver can
understand. Therefore, after this other solver finishes, there are two sets
of solutions for the points-to analysis.
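Schematically, the cross-check reduces both analyses to a map from variable
id to a set of pointed-to ids and compares them entry by entry. A minimal
sketch of that idea (the names gcc_solution and external_solution and the
literal sets are illustrative, not GCC API):

```python
# Both solvers are reduced to: variable id -> set of pointed-to variable ids.
# The literal solutions below are made up for illustration only.
gcc_solution      = {0: {2, 3}, 1: {3}}
external_solution = {0: {2, 3}, 1: {3}}

# The verification is then just per-variable set equality.
mismatches = {v for v, pts in external_solution.items()
              if gcc_solution.get(v) != pts}
assert not mismatches, f"points-to sets disagree for: {mismatches}"
print("solutions agree")
```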

In order to verify the correctness of my analysis, I simply iterate over
GCC's solution and place an assertion verifying that the points-to sets for
each variable are equal. My implementation solves the constraints by
abstracting each varinfo into an integer, using the varinfo id; the
points-to relations are then expressed as "variable A's id points to
variable B's id". Since the varinfo ids are just offsets into the varinfo
array, there should be a one-to-one mapping between the solutions of this
solver and the solutions computed by GCC.

It's implemented something like the following:

```
for (auto output : points_to_analysis) {
   int from_i; // pointer variable index: get_varinfo (from_i); set from `output` (elided)
   int to_i;   // pointee variable index: get_varinfo (to_i); set from `output` (elided)

   if (!get_varinfo (from_i)->may_have_pointers) continue;
   bitmap vars = get_varinfo (from_i)->solution;
   if (!vars) continue;

   if (dump_file)
     fprintf (dump_file, "%s -> %s\n",
              get_varinfo (from_i)->name, get_varinfo (to_i)->name);

   bool same = bitmap_bit_p (vars, to_i);
   if (dump_file)
     fprintf (dump_file, "SAME = %s\n", same ? "true" : "false");
   gcc_assert (same);
}
```

This means that if the two analyses agree, all test cases exercising
-ftree-pta should succeed. At the moment, the assertion is being triggered.
I would like to understand why the assertion triggers without asking you to
debug my code, so instead I'm trying to understand why GCC's solution is
correct. (I gave all this exposition because I may be making assumptions
that are incorrect and that can quickly be verified by someone more
knowledgeable.) Now, I'll focus on the test case:

The code in question is a slightly modified version of the test case
ipa-pta-16.c
```
struct X
{
  long l1;
  struct Y
  {
    long l2;
    int *p;
  } y;
};
int i;
static int __attribute__((noinline))
foo (struct X *x)
{
  struct Y y = x->y;
  *y.p = 0;
  i = 1;
  return *y.p;
}
extern void abort (void);
int main()
{
  struct X x;
  struct X t;
  x.y.p = &i;
  long* e = &t.l1;
  if (foo(&x) != 1)
    abort ();
  return e;
}
```

and is being compiled with the following flags:

gcc -O2 -ftree-pta -ftree-pta-external-solver -fdump-tree-alias
-ftree-salias ipa-pta-16.c

I would like to focus on the foo function, which is translated to the
following gimple:
```
__attribute__((noinline))
int foo.constprop.isra (int * ISRA.4)
{
  int * y$p;
  struct X * x;
  struct X * x;
  int _2;

  <bb 2> [local count: 1073741824]:
  *ISRA.4_3(D) = 0;
  i = 1;
  _2 = *ISRA.4_3(D);
  return _2;

}
```

GCC generates the following constraints (as seen in the dump):

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
i = NONLOCAL
i = &NONLOCAL
ESCAPED = &NONLOCAL
_2 = *ISRA.4
```

The points-to sets generated by GCC are the following:

```
ANYTHING = { ANYTHING }
ESCAPED = { NULL ESCAPED NONLOCAL }
NONLOCAL = { ESCAPED NONLOCAL } same as i
STOREDANYTHING = { }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
i = { ESCAPED NONLOCAL }
_2 = { ESCAPED NONLOCAL }
foo.constprop.0.isra.0 = { }
```

I also noticed that there is more information about ISRA.4 which is not
shown in the points-to sets printed above. Instead, the dump continues and
adds this line:

```
Flow-insensitive points-to information

ISRA.4_3(D), points-to non-local, points-to NULL, points-to vars: { }
```

Now, I understand that the first set of solutions (i.e., `{ NONLOCAL }`) is
the solution of the constraint variable. That is, the constraint variable
ISRA.4, when assigned the set `{ NONLOCAL }`, is a solution to the
constraints... I also understand that ISRA.4_3(D) is the SSA variable in
the gimple code. There is technically a mapping between the two, but I
guess there's some distinction. I also understand that the first dump is
generated from the function dump_so

More questions on points-to analysis

2021-03-17 Thread Erick Ochoa via Gcc
Hello,

I'm still trying to compare the solution generated from the
intraprocedural points-to analysis in GCC against an external solver.

Yesterday it was pointed out that "NULL is not conservatively
correctly represented in the constraints". Can someone expand on this?
To me this sounds like one of a few things:
* even though malloc may return NULL, NULL is not added to the
points-to set of whatever variable is on the left-hand side of the
malloc call.
* the process in GCC that generates the constraints for NULL somehow
does not generate enough constraints to treat NULL conservatively, and
therefore there might be points-to sets which should contain NULL but
don't. (However, doesn't this mean that feeding the constraints to an
external solver should still give the same answers?)
* the process in GCC that generates the constraints for NULL works
fine (i.e., feeding the constraints generated by GCC to an external
solver should yield a conservatively correct answer) but the process
that solves the constraints relaxes the solutions for the NULL
constraint variable (i.e., GCC has deviated from the constraint
solving algorithm somehow)

Also, "at some point we decided to encode optimistic info into
pt->null which means points-to now has to compute a conservatively
correct pt->null." Doesn't this contradict itself? How can pt->null be
optimistic first and conservative now? Is this trying to say that:

* NULL constraints were conservative at first
* pt->null was optimistic at first
* then conversion to SSA happened, and NULL constraints became not
conservatively represented in the constraints (effectively becoming
somewhat optimistic)
* to avoid NULL and pt->null both being unsafe, pt->null was changed to
be conservative

I've been looking at find_what_var_points_to and have changed the code
which verifies the constraint points-to sets. Basically, I now find
which variables have been collapsed, and only for "real" constraint
pointer variables do I look at the points-to solution struct. Before
looking into vars, I look at the fields and compare null, anything,
escaped, etc., against the id of the pointee variable. Checking vars is
slightly confusing for me at the moment, since it appears that there
are at least 3 plausible ways of validating the solution (I haven't
actually gotten there because assertions are being triggered).

```
for (auto &output : *orel) {
   int from_i; // set from `output` (elided)
   int to_i;   // set from `output` (elided)

   // Since find_what_var_points_to doesn't change the solution for
   // collapsed variables, only verify the answer for the real ones.
   varinfo_t from_var = get_varinfo (from_i);
   varinfo_t vi = get_varinfo (find (from_i));
   if (from_var->id != vi->id) continue;
   if (!from_var->may_have_pointers) continue;

   // compute the pt_solution
   pt_solution solution = find_what_var_points_to (cfun->decl, from_var);

   // pointee variable according to the external analysis
   varinfo_t vi_to = get_varinfo (to_i);

   // Since some artificial variables are stored in fields instead of
   // the bitset, assert based on field values.  However, you can see
   // that I already had to disable some of the assertions.
   if (vi_to->is_artificial_var)
     {
       if (vi_to->id == nothing_id)
         {
           gcc_assert (solution.null && vi_to->id == nothing_id);
           continue;
         }
       else if (vi_to->id == escaped_id)
         {
           if (in_ipa_mode)
             gcc_assert (solution.ipa_escaped && vi_to->id == escaped_id);
           else
             {
               //gcc_assert (solution.escaped && vi_to->id == escaped_id);
             }
           continue;
           /* Expand some special vars of ESCAPED in-place here. ??*/
         }
       // More...
     }

   if (solution.anything) continue;

   bitmap vars = solution.vars;
   if (!vars) continue;

   if (dump_file)
     fprintf (dump_file, "SAME = %s\n",
              bitmap_bit_p (vars, DECL_PT_UID (vi_to->decl)) ? "true" : "false");
   if (dump_file)
     fprintf (dump_file, "SAME2 = %s\n",
              bitmap_bit_p (vars, to_i) ? "true" : "false");
   if (dump_file)
     fprintf (dump_file, "SAME3 = %s\n",
              bitmap_bit_p (from_var->solution, to_i) ? "true" : "false");
}
```

Can someone help me figure out why, even though I have a "real"
variable and I compute its solution with find_what_var_points_to,
the solution does not have the fields that I expect to be set?
(I would expect solution.escaped to be set if the pointee variable
vi_to has id == escaped_id.)

Also, how is DECL_PT_UID different from the varinfo id field?
Shouldn't they be the same? It seems that during
find_what_var_points_to, DECL_PT_UID is used to set the bit in the
bitmap, but in previous instances it was the varinfo id?


Re: More questions on points-to analysis

2021-03-17 Thread Erick Ochoa via Gcc
Hi Richard, I think I misunderstood yesterday's answer and deviated a
little bit. But now I want to focus on this:

> > * the process in GCC that generates the constraints for NULL works
> > fine (i.e., feeding the constraints generated by GCC to an external
> > solver should yield a conservatively correct answer) but the process
> > that solves the constraints relaxes the solutions for the NULL
> > constraint variable (i.e., GCC has deviated from the constraint
> > solving algorithm somehow)
>
> No, that part should work OK.
>

So, let's ignore the other solver for now and instead focus on the
concrete example I presented on the previous email. If GCC is solving
these constraints:

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
i = NONLOCAL
i = &NONLOCAL
ESCAPED = &NONLOCAL
_2 = *ISRA.4
```

What would a hand-calculated solution give us vs. the solution given by
GCC?

I am following the algorithm stated in Section 3.3 of Structure
Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
UNKNOWN constraint since there isn't any other field offset that needs
to be calculated.

First, I want to make some adjustments. I am going to use "=" to
signify the \supseteq symbol, and I will add curly braces to specify an
element of a set as opposed to the whole set. The constraints therefore
become (ordered slightly differently):

```
== direct constraints ==
ANYTHING = { ANYTHING }
ESCAPED = { NONLOCAL }
NONLOCAL = { NONLOCAL }
NONLOCAL = { ESCAPED }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
i = { NONLOCAL }

== complex constraints ==
ESCAPED = *ESCAPED
*ESCAPED = NONLOCAL
*ISRA.4 = derefaddrtmp(9)
_2 = *ISRA.4

== copy constraints ==
ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think it will matter...
i = NONLOCAL
```

Solution sets are basically the direct constraints at the moment.

Let's now create the graph

1. node ESCAPED has an edge going to itself (due to the copy constraint)
2. node ISRA.4 has no outgoing copy edges
3. node derefaddrtmp(9) has no outgoing edges
4. node _2 has no outgoing edges
5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
6. node NONLOCAL has no outgoing edge

Now, we can iterate over this set of nodes

1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no
edges, but it has complex-constraints. Let's modify the graph.
  1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy
edge from ESCAPED to NONLOCAL.
  2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy
edge from NONLOCAL to NONLOCAL

The graph is now transformed to

1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
2. node ISRA.4 has no outgoing copy edges
3. node derefaddrtmp(9) has no outgoing edges
4. node _2 has no outgoing edges
5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
6. node NONLOCAL has an edge going to NONLOCAL

The solution set of escaped is now Sol(ESCAPED) = Sol(ESCAPED) U
Sol(NONLOCAL) = {NONLOCAL, ESCAPED}

Now we continue walking

2. Walking over node ISRA.4. It has the solution set { NONLOCAL }.
There are no edges, but it has complex-constraints. Let's modify the
graph.
  1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add
a copy edge from NONLOCAL to derefaddrtmp(9).

The graph is now transformed to

1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
2. node ISRA.4 has no outgoing copy edges
3. node derefaddrtmp(9) has no outgoing edges
4. node _2 has no outgoing edges
5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)

Sol(NONLOCAL) now becomes Sol(NONLOCAL) U Sol(derefaddrtmp(9)) =
{NONLOCAL, ESCAPED, NULL}.

Now I could continue, but the hand calculation already shows something
that does not appear in the points-to sets in the dump: it yields

NONLOCAL = {NONLOCAL, ESCAPED, NULL}

Looking at the data that I showed yesterday:

```
NONLOCAL = { ESCAPED NONLOCAL } same as i
```

we see that NULL is not in the solution set of NONLOCAL.

Now, yesterday you said that NULL is not conservatively correctly
represented in the constraints. You also said today that the points-to
analysis should be solving the constraints fine. What I now understand
from this is that while NULL may appear in some points-to sets, its
absence from a set does not mean that the pointer cannot point to
NULL. However, NULL should still show up in the dumped points-to sets
of the constraint variables, since they are solved by the same
analysis? Is this correct? Am I somehow doing the points-to analysis
by hand wrong? Why would NULL not be in Sol(NONLOCAL) if GCC is
solving the same constraints that I am solving by hand?
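To make the hand calculation reproducible, here is a naive
Andersen-style solver in Python over exactly these constraints. Like the
calculation above it ignores the `+ UNKNOWN` constraint, and it has none
of GCC's special handling of artificial variables such as NONLOCAL and
ESCAPED, so its results are only a sketch of the textbook propagation,
not of what GCC actually does:

```python
from collections import defaultdict

# Each constraint is (lhs_is_deref, lhs, rhs_kind, rhs), where rhs_kind
# 'addr' means &x, 'copy' means x, and 'deref' means *x.
constraints = [
    (False, 'ANYTHING', 'addr',  'ANYTHING'),
    (False, 'ESCAPED',  'deref', 'ESCAPED'),        # ESCAPED = *ESCAPED
    (True,  'ESCAPED',  'copy',  'NONLOCAL'),       # *ESCAPED = NONLOCAL
    (False, 'NONLOCAL', 'addr',  'NONLOCAL'),
    (False, 'NONLOCAL', 'addr',  'ESCAPED'),
    (False, 'INTEGER',  'addr',  'ANYTHING'),
    (False, 'ISRA.4',   'addr',  'NONLOCAL'),
    (False, 'derefaddrtmp', 'addr', 'NULL'),
    (True,  'ISRA.4',   'copy',  'derefaddrtmp'),   # *ISRA.4 = derefaddrtmp
    (False, 'i',        'copy',  'NONLOCAL'),
    (False, 'i',        'addr',  'NONLOCAL'),
    (False, 'ESCAPED',  'addr',  'NONLOCAL'),
    (False, '_2',       'deref', 'ISRA.4'),         # _2 = *ISRA.4
]

sol = defaultdict(set)  # variable -> points-to set

def union_into(dst, new):
    """Add `new` to Sol(dst); report whether anything changed."""
    before = len(sol[dst])
    sol[dst] |= new
    return len(sol[dst]) != before

changed = True
while changed:              # iterate to a fixed point
    changed = False
    for lhs_deref, lhs, kind, rhs in constraints:
        if lhs_deref:       # *lhs = rhs: Sol(p) |= Sol(rhs) for p in Sol(lhs)
            for p in list(sol[lhs]):
                changed |= union_into(p, sol[rhs])
        elif kind == 'addr':    # lhs = &rhs
            changed |= union_into(lhs, {rhs})
        elif kind == 'copy':    # lhs = rhs
            changed |= union_into(lhs, sol[rhs])
        else:               # lhs = *rhs: Sol(lhs) |= Sol(p) for p in Sol(rhs)
            for p in list(sol[rhs]):
                changed |= union_into(lhs, sol[p])

print(sorted(sol['NONLOCAL']))   # NULL does end up in Sol(NONLOCAL) here
```

Run as-is, this produces a Sol(NONLOCAL) that contains NULL, matching
the hand calculation, which suggests the difference lies in GCC's
special treatment of the artificial variables rather than in the core
propagation.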


Re: More questions on points-to analysis

2021-03-17 Thread Erick Ochoa via Gcc
Mm... ignore this for now please, I think I messed up the analysis by
hand. I will try again. Thanks!


Re: More questions on points-to analysis

2021-03-17 Thread Erick Ochoa via Gcc
No, I think it is correct. Any help would be greatly appreciated.
Thanks! (Even if I'm wrong, of course I'd like to know.)


Re: More questions on points-to analysis

2021-03-18 Thread Erick Ochoa via Gcc
Perfect, thank you! This is exactly the answer that I was looking for.



Re: More questions on points-to analysis

2021-03-19 Thread Erick Ochoa via Gcc
Hello Richard,

I have a related question. I have the following C program (which is
again an altered version of ipa-pta-16.c):

```
struct X
{
  long l1;
  struct Y
  {
    long l2;
    int *p;
  } y;
};
int i;
static int __attribute__((noinline))
foo (struct X *x)
{
  struct Y y = x->y;
  *y.p = 0;
  // i = 1;
  return *y.p;
}
extern void abort (void);
int main()
{
  struct X x;
  struct X t;
  x.y.p = &i;
  long* e = &t.l1;
  if (foo(&x) != 1)
    abort ();
  return e;
}
```

I am compiling it with gcc -O2 -ftree-pta -fdump-tree-alias
-ftree-salias ipa-pta-16.c

and I get the following constraints for function foo.

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
```

I am just reordering the constraints to make this easier:

```
ANYTHING = &ANYTHING
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL

ESCAPED = ESCAPED + UNKNOWN

ESCAPED = *ESCAPED
*ESCAPED = NONLOCAL
*ISRA.4 = derefaddrtmp(9)
```

I now construct a constraint graph. It is basically a collection of
unconnected nodes, except for ESCAPED which points to itself.

I then walk over this graph and first encounter the constraint
"ESCAPED = ESCAPED + UNKNOWN". Since Sol(ESCAPED) is the empty set,
nothing happens. The same goes for the constraints "ESCAPED = *ESCAPED"
and "*ESCAPED = NONLOCAL".

I then walk to "*ISRA.4 = derefaddrtmp(9)". Sol(ISRA.4) = { NONLOCAL
}. Then there's possibly an edge made from NONLOCAL to derefaddrtmp.
(I am not sure of the implementation; since NONLOCAL doesn't change, it
might not require an added edge.) And because NONLOCAL is a special
variable, Sol(NONLOCAL) does not change.

Now, none of the solutions have changed. We have possibly added an
edge to NONLOCAL, so let's just process the graph once more. Again
Sol(ESCAPED) is still empty, so "ESCAPED = ESCAPED + UNKNOWN", "ESCAPED
= *ESCAPED", and "*ESCAPED = NONLOCAL" should have no effect. Then
"*ISRA.4 = derefaddrtmp(9)" also has no effect since nothing can
change.
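
For reference, here is a minimal sketch (mine, not GCC's solver) of the
fixed-point propagation I am doing by hand. Variables are the integer ids I
map varinfos to; the four constraint forms are a = &b, a = b, a = *b and
*a = b. It deliberately treats NONLOCAL and ESCAPED like ordinary variables
and ignores "+ UNKNOWN", so it reproduces my hand computation (ESCAPED stays
empty), not GCC's dump:

```cpp
#include <map>
#include <set>
#include <vector>

// a = &b, a = b, a = *b, *a = b
enum Kind { ADDR, COPY, LOAD, STORE };
struct Constraint { Kind kind; int lhs, rhs; };

typedef std::map<int, std::set<int>> Sol;

static Sol
solve (const std::vector<Constraint> &cs)
{
  Sol sol;
  bool changed = true;
  while (changed)  // naive: reprocess every constraint until a fixed point
    {
      changed = false;
      for (const Constraint &c : cs)
	{
	  std::set<int> add;
	  switch (c.kind)
	    {
	    case ADDR:   // Sol(lhs) += { rhs }
	      add.insert (c.rhs);
	      break;
	    case COPY:   // Sol(lhs) += Sol(rhs)
	      add = sol[c.rhs];
	      break;
	    case LOAD:   // Sol(lhs) += Sol(p) for each p in Sol(rhs)
	      for (int p : std::set<int> (sol[c.rhs]))
		add.insert (sol[p].begin (), sol[p].end ());
	      break;
	    case STORE:  // Sol(p) += Sol(rhs) for each p in Sol(lhs)
	      for (int p : std::set<int> (sol[c.lhs]))
		for (int t : std::set<int> (sol[c.rhs]))
		  changed |= sol[p].insert (t).second;
	      continue;  // nothing is added to Sol(lhs) itself
	    }
	  for (int t : add)
	    changed |= sol[c.lhs].insert (t).second;
	}
    }
  return sol;
}

// The foo constraints from above, minus the special-variable handling:
// 0 = NONLOCAL, 1 = ESCAPED, 2 = NULL, 3 = ISRA.4, 4 = derefaddrtmp(9)
static Sol
solve_foo_constraints ()
{
  std::vector<Constraint> cs = {
    { ADDR,  0, 0 },  // NONLOCAL = &NONLOCAL
    { ADDR,  0, 1 },  // NONLOCAL = &ESCAPED
    { ADDR,  3, 0 },  // ISRA.4 = &NONLOCAL
    { ADDR,  4, 2 },  // derefaddrtmp(9) = &NULL
    { STORE, 3, 4 },  // *ISRA.4 = derefaddrtmp(9)
    { LOAD,  1, 1 },  // ESCAPED = *ESCAPED
    { STORE, 1, 0 },  // *ESCAPED = NONLOCAL
  };
  return solve (cs);
}
```

Running this gives Sol(NONLOCAL) = {NONLOCAL, ESCAPED, NULL} via the
"*ISRA.4 = derefaddrtmp(9)" store, while Sol(ESCAPED) stays empty, which is
exactly the hand result above.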

My question here is: why, when we look at the points-to sets dumped by GCC, do we see

ESCAPED = { NULL }

Here are the full points-to sets computed for foo:

```
ANYTHING = { ANYTHING }
ESCAPED = { NULL }
NONLOCAL = { ESCAPED NONLOCAL }
STOREDANYTHING = { }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
foo.constprop.0.isra.0 = { }
```

I think this might have to do with "+ UNKNOWN" but I am unsure about
its semantics in the constraint set.

Thanks, any help is appreciated!


On Thu, 18 Mar 2021 at 13:36, Erick Ochoa  wrote:
>
> Perfect, thank you! This is exactly the answer that I was looking for.
>
>
> On Thu, 18 Mar 2021 at 13:27, Richard Biener  
> wrote:
> >
> > On Wed, Mar 17, 2021 at 4:17 PM Erick Ochoa  wrote:
> > >
> > > Hi Richard, I think I misunderstood yesterday's answer and deviated a
> > > little bit. But now I want to focus on this:
> > >
> > > > > * the process in GCC that generates the constraints for NULL works
> > > > > fine (i.e., feeding the constraints generated by GCC to an external
> > > > > solver should yield a conservatively correct answer) but the process
> > > > > that solves the constraints relaxes the solutions for the NULL
> > > > > constraint variable (i.e., GCC has deviated from the constraint
> > > > > solving algorithm somehow)
> > > >
> > > > No, that part should work OK.
> > > >
> > >
> > > So, let's ignore the other solver for now and instead focus on the
> > > concrete example I presented on the previous email. If GCC is solving
> > > these constraints:
> > >
> > > ```
> > > ANYTHING = &ANYTHING
> > > ESCAPED = *ESCAPED
> > > ESCAPED = ESCAPED + UNKNOWN
> > > *ESCAPED = NONLOCAL
> > > NONLOCAL = &NONLOCAL
> > > NONLOCAL = &ESCAPED
> > > INTEGER = &ANYTHING
> > > ISRA.4 = &NONLOCAL
> > > derefaddrtmp(9) = &NULL
> > > *ISRA.4 = derefaddrtmp(9)
> > > i = NONLOCAL
> > > i = &NONLOCAL
> > > ESCAPED = &NONLOCAL
> > > _2 = *ISRA.4
> > > ```
> > >
> > > What would a hand calculated solution gives us vs what was the
> > > solution given by GCC?
> > >
> > > I am following the algorithm stated on Section 3.3 of Structure
> > > Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
> > > UNKNOWN constraint since there isn't any other field offset that needs
> > > to be calculated.
> > >
> > > First, I want to make some adjustments. I am going to be using "=" to
> > > signify the \supseteq symbol and I will be adding curly braces to
> > > specify the element in a set as opposed to the whole set. Therefore
> > > the constraints will now become (ordered slightly differently):
> > >
> > > ```
> > > direct constraints
> > > ANYTHING = { ANYTHING }
> > > ESCAPED = { NONLOCAL }
> > > NONLOCAL = { NONLOCAL }
> > > NONLOCAL = { ESCAPED }
> > > INTEGER = { ANYTHING }
> > > ISRA.4 = { NONLOCAL }
> > > derefaddrtmp(9) = { NULL }
> > > i 

Question about reading LTO function summaries

2021-03-26 Thread Erick Ochoa via Gcc
Hello,

I already have some experience developing SIMPLE_IPA_PASSes, but I am
looking to understand IPA_PASSes better. I have made a hello world ipa
pass that stores "hello world $FUNCTION_NAME" in the function
summaries; however, I am having trouble reading this information back.
Can someone help me understand how to use these interfaces correctly?

At the moment, it **seems** to be writing information correctly.
(I.e., it doesn't segfault when attempting to write data.) However, in
my read summary function (ipa_hello_world_read_summary (void)) the
function `lto_get_summary_section_data (file_data,
LTO_section_ipa_hello_world, &len);` always returns NULL and
`file_data_vec` is of size 1. This means that at run time, there is
only one call to `lto_get_summary_section_data` and it returns NULL.

I am copy-pasting the relevant code below.

/* Map from cgraph_node* to "hello world $FUNCTION_NAME". */
static hash_map<cgraph_node *, const char *> *hello_summaries;

static void
ipa_hello_world_write_summary (void)
{
  gcc_assert(hello_summaries);
  struct output_block *ob = create_output_block (LTO_section_ipa_hello_world);
  gcc_assert(ob);
  if (dump_file) fprintf(dump_file, "hello world from write summary\n");
  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
  if (dump_file) fprintf(dump_file, "writing %d\n",
hello_summaries->elements());
  streamer_write_uhwi (ob, hello_summaries->elements());

  for (auto i = hello_summaries->begin(), e = hello_summaries->end();
i != e; ++i)
  {
if (dump_file) fprintf(dump_file, "writing %s\n", (*i).second);
streamer_write_uhwi(ob, lto_symtab_encoder_encode(encoder, (*i).first));
streamer_write_string (ob, ob->main_stream, (*i).second, true);
  }
  produce_asm (ob, NULL);
  destroy_output_block (ob);
  delete hello_summaries;
}

static void
ipa_hello_world_read_summary (void)
{
  struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
  struct lto_file_decl_data *file_data;
  unsigned int j = 0;
  if (dump_file) fprintf(dump_file, "hello from read summary\n");
  while ((file_data = file_data_vec[j++]))
  {
if (dump_file) fprintf(dump_file, "iteration = %d\n", j);
size_t len;
const char *data =
  lto_get_summary_section_data (file_data,
LTO_section_ipa_hello_world, &len);
if (!data) continue;

const struct lto_function_header *header = (const struct
lto_function_header*) data;
gcc_assert(header);
gcc_assert(header->cfg_size);
const int cfg_offset = sizeof (struct lto_function_header);
const int main_offset = cfg_offset + header->cfg_size;
const int string_offset = main_offset + header->main_size;
class data_in *data_in;

lto_input_block ib ((const char *) data + main_offset, header->main_size,
  file_data->mode_table);
data_in
  = lto_data_in_create (file_data, (const char *) data + string_offset,
  header->string_size, vNULL);
unsigned int n = streamer_read_uhwi (&ib);
//hello_summaries = new hash_map<cgraph_node *, const char *>;
for (unsigned i = 0; i < n; i++)
{
  unsigned int index = streamer_read_uhwi(&ib);
  lto_symtab_encoder_t encoder = file_data->symtab_node_encoder;
  struct cgraph_node *cnode = dyn_cast <cgraph_node *>
(lto_symtab_encoder_deref(encoder, index));
  gcc_assert(cnode);
  const char* string = streamer_read_string (data_in, &ib);
  if (dump_file) fprintf(dump_file, string);
}
  }

}


Re: Question about reading LTO function summaries

2021-03-30 Thread Erick Ochoa via Gcc
Hello,

just trying again to increase visibility of this question. Many thanks
in advance!


On Fri, 26 Mar 2021 at 13:49, Erick Ochoa  wrote:
>
> Hello,
>
> I already have some experience developing SIMPLE_IPA_PASSes, but I am
> looking to understand IPA_PASSes better. I have made a hello world ipa
> pass that stores "hello world $FUNCTION_NAME" in the function
> summaries; however, I am having trouble reading this information back.
> Can someone help me understand how to use these interfaces correctly?


Question about points-to analysis, global variables, IPA, and field sensitivity

2021-03-30 Thread Erick Ochoa via Gcc
Hi,

I am looking at the points-to analysis in GCC and I found the
following comment in tree-ssa-structalias.c:

  /* Collect field information.  */
  if (use_field_sensitive
  && var_can_have_subvars (decl)
  /* ???  Force us to not use subfields for globals in IPA mode.
 Else we'd have to parse arbitrary initializers.  */
  && !(in_ipa_mode
   && is_global_var (decl)))


From what I understand, the points-to analysis explicitly does not use
field sensitivity for global variables when in IPA mode. I am
wondering:
0) Is "initializers" here talking about brace initializers? Or are
"initializers" a little bit more abstract and refer to anything that
can initialize a field? Like struct->field = function()?
1) How much work would it be to parse initializers?
2) How would global structs which are not initialized using
initializers be handled? Would all fields get assigned NULL at the
global variable level, but then as other fields get assigned we just
create new constraints with the correct offsets?
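
To make question 0 concrete, here is the kind of brace initializer I have in
mind (the struct and variable names are made up for illustration):

```cpp
#include <cstddef>

int x;

struct S
{
  int *p;
  int *q;
};

// A static brace initializer: to stay field-sensitive for the global
// "g", the analysis would have to parse this and derive the constraints
// g.p = &x and g.q = &NULL before any statement runs.
struct S g = { &x, NULL };

// true iff the initializer established the points-to facts above
static bool
initializer_facts_hold ()
{
  return g.p == &x && g.q == NULL;
}
```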

Thanks!


Re: Question about points-to analysis, global variables, IPA, and field sensitivity

2021-03-30 Thread Erick Ochoa via Gcc
> If the global is module local we should initialize it with NULL, yes.  If it
> is not module local it should be initialized with NONLOCAL (that's both what
> should currently happen correctly - it's needed for non-field-sensitive init
> as well).
>

Awesome, thanks Richard! One more question: in the context of LTO,
"module local" means the current partition?

I understand that IPA-PTA is a late SIMPLE_IPA_PASS but my
understanding of the consequences of this is a little fuzzy. I think
that IPA-PTA also runs in multiple partitions (unless a flag like
-flto-partition=none is used), but there might be some issue with
reduced precision. Perhaps this reduced precision comes from NONLOCAL
constraints with symbols that cross partitions? (This is just
speculation on my part, as I mention, my understanding is a little bit
fuzzy.)


Re: Question about reading LTO function summaries

2021-04-01 Thread Erick Ochoa via Gcc
Hello,

just trying one more time. Any direction or hint is appreciated. Just
as a note, I looked at the ipa-devirt as an inspiration for these
small functions I wrote, but for some reason I can't read what I
assume have written to the summaries. (What I'm trying to do is simply
use LTO using a real IPA_PASS as opposed to SIMPLE_IPA_PASS, and
therefore I just want to write some simple summaries and read them).

I have wondered whether a bitpack_d bp = bitpack_create
(ob->main_stream); makes any difference, but when I tried it, that
didn't seem to be the case.

Many thanks in advance!


On Tue, 30 Mar 2021 at 10:18, Erick Ochoa  wrote:
>
> Hello,
>
> just trying again to increase visibility of this question. Many thanks
> in advance!
>
>
> On Fri, 26 Mar 2021 at 13:49, Erick Ochoa  wrote:
> >
> > Hello,
> >
> > I already have some experience developing SIMPLE_IPA_PASSes, but I am
> > looking to understand IPA_PASSes better. I have made a hello world ipa
> > pass that stores "hello world $FUNCTION_NAME" in the function
> > summaries; however, I am having trouble reading this information back.
> > Can someone help me understand how to use these interfaces correctly?


Question on changing IPA-PTA to an IPA_PASS

2021-04-01 Thread Erick Ochoa via Gcc
Hi,

just a high level question. I know that IPA-PTA is a SIMPLE_IPA_PASS
and that ideally it would be better as an IPA_PASS. I understand that
one of the biggest challenges of changing IPA-PTA to an IPA_PASS is
that on the current LTO framework, the LTRANS stage can happen at the
same time for multiple transformations. This means (I think) that if
IPA-PTA computed the points-to sets during WPA, during LTRANS there
can be new variables, or modified function bodies that therefore no
longer correspond to the computed IPA-PTA.

However, it seems that there's a simpler way to change IPA-PTA to an
IPA_PASS. This is by just introducing another section of
"all_regular_ipa_passes". Something like (not sure if the name inside
the brackets has to be unique)

  INSERT_PASSES_AFTER (all_regular_ipa_passes)
    // normal stuff
  TERMINATE_PASS_LIST (all_regular_ipa_passes)
  INSERT_PASSES_AFTER (all_regular_ipa_passes)
    NEXT_PASS (pass_ipa_pta);
  TERMINATE_PASS_LIST (all_regular_ipa_passes)

I understand that this still singles out IPA-PTA as being "special",
but I am wondering if this is something that has been discussed
previously. The changes to the implementation of IPA-PTA should be
something like:

1. LGEN: in summaries store the constraint variables that GCC is
finding by inspecting gimple
2. WPA: actually solve the constraint variables and store only the
solutions to the constraint variables in another summary
3. LTRANS: read from the summary and pass the information to gimple

The advantage I see to this is that the precision of IPA-PTA would be
equal to using -flto-partition=none without actually using that flag,
and possibly some compile-time speedup compared to
-flto-partition=none, because summaries can be read and written in
parallel. Again, this is just an idea and I just want to understand if
it has been discussed before and why it is or is not something that
you want.


Re: Question on changing IPA-PTA to an IPA_PASS

2021-04-01 Thread Erick Ochoa via Gcc
>
> I don't think this would remove any problem that is present.
>

I have a problem understanding what you mean here because later on you state:

> Now - the reason you think of is likely that IPA transform will instantiate
> IPA clones and do inlining and transfering the IPA PTA solution to the
> such transformed IL is at best complicated.  That's true as well, of course,
> in fact IPA inlining and cloning will improve context sensitivity (by
> duplicating).

So, I think you are agreeing with me that transferring the IPA PTA
solution to the transformed IL is a big complication. So, going back
to your previous statement "I don't think this would remove any
problem that is present," does this mean that:

1. this will not solve the "transfering the IPA PTA solution to the
such transformed IL" problem?
2. or "transfering the IPA PTA solution to the such transformed IL" is
not a present problem?

Sorry, just trying to understand what you are fully saying.

Just to be more explicit, I believe (and might be wrong) that placing
a new section of passes after the ipa-passes would now materialize all
the new IL, and therefore when creating a new section, IPA-PTA would
only see IL that will not be transformed. This gets rid of
"transfering the IPA PTA solution to transformed IL" problem, because
there is no more transformed IL. (But again, my understanding of some
things might be wrong).

>
> But the main reason we've not even tried to make IPA PTA a true IPA pass
> is because of the scalability issue.
>

I didn't know this was the main reason! Thanks! I'm wondering, let's
say that IPA PTA magically gets better performance. What would be the
next steps after that? Is looking into the summaries and updating the
constraints something that would be a good idea. (I think Hubicka
agrees on doing that.) Or do you have some other hints/ideas about
what would be a good implementation?


Re: Question about reading LTO function summaries

2021-04-06 Thread Erick Ochoa via Gcc
t;,
j, data ? "t" : "f");
+if (!data) continue;
+
+// This has not been executed with my simple example programs
+const struct lto_function_header *header = (const struct
lto_function_header*) data;
+gcc_assert(header);
+gcc_assert(header->cfg_size);
+const int cfg_offset = sizeof (struct lto_function_header);
+const int main_offset = cfg_offset + header->cfg_size;
+const int string_offset = main_offset + header->main_size;
+class data_in *data_in;
+
+lto_input_block ib ((const char *) data + main_offset, header->main_size,
+  file_data->mode_table);
+data_in
+  = lto_data_in_create (file_data, (const char *) data + string_offset,
+  header->string_size, vNULL);
+unsigned int n = streamer_read_uhwi (&ib);
+//hello_summaries = new hash_map;
+for (unsigned i = 0; i < n; i++)
+{
+  unsigned int index = streamer_read_uhwi(&ib);
+  lto_symtab_encoder_t encoder = file_data->symtab_node_encoder;
+  struct cgraph_node *cnode = dyn_cast
+(lto_symtab_encoder_deref(encoder, index));
+  gcc_assert(cnode);
+  const char* string = streamer_read_string (data_in, &ib);
+  if (dump_file) fprintf(dump_file, string);
+}
+  }
+
+}
+
+
+
+static unsigned int
+iphw_execute()
+{
+  if (dump_file) fprintf(dump_file, "hello from wpa\n");
+  return 0;
+}
+
+namespace {
+const pass_data pass_data_ipa_hello_world =
+{
+  IPA_PASS,
+  "hello-world",
+  OPTGROUP_NONE,
+  TV_NONE,
+  (PROP_cfg | PROP_ssa),
+  0,
+  0,
+  0,
+  0,
+};
+
+class pass_ipa_hello_world : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_hello_world (gcc::context *ctx)
+: ipa_opt_pass_d (pass_data_ipa_hello_world, ctx,
+   ipa_hello_world_generate_summary,
+   ipa_hello_world_write_summary,
+   ipa_hello_world_read_summary,
+   NULL,
+   NULL,
+   NULL,
+   0,
+   NULL,
+   NULL)
+  {}
+
+  virtual bool gate(function*) { return flag_ipa_hello_world; }
+  virtual unsigned execute (function*) { return iphw_execute(); }
+};
+} // anon namespace
+
+ipa_opt_pass_d*
+make_pass_ipa_hello_world (gcc::context *ctx)
+{
+  return new pass_ipa_hello_world (ctx);
+}
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 5c7cd84d46f..fb96cc357c6 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -228,6 +228,7 @@ enum lto_section_type
   LTO_section_ipa_sra,
   LTO_section_odr_types,
   LTO_section_ipa_modref,
+  LTO_section_hello_world,
   LTO_N_SECTION_TYPES/* Must be last.  */
 };

diff --git a/gcc/passes.def b/gcc/passes.def
index e9ed3c7bc57..54a7df36425 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -153,6 +153,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_profile);
   NEXT_PASS (pass_ipa_icf);
   NEXT_PASS (pass_ipa_devirt);
+  NEXT_PASS (pass_ipa_hello_world);
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_sra);
   NEXT_PASS (pass_ipa_cdtor_merge);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 15693fee150..5ab807f045d 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -510,6 +510,7 @@ extern ipa_opt_pass_d *make_pass_ipa_fn_summary
(gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_inline (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_free_lang_data (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_free_fn_summary (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_hello_world (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_sra (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
-- 
2.27.0

On Tue, 6 Apr 2021 at 14:07, Martin Jambor  wrote:
>
> Hi,
>
> On Fri, Mar 26 2021, Erick Ochoa via Gcc wrote:
> > I already have some experience developing SIMPLE_IPA_PASSes, but I am
> > looking to understand IPA_PASSes better. I have made a hello world ipa
> > pass that stores "hello world $FUNCTION_NAME" in the function
> > summaries; however, I am having trouble reading this information back.
> > Can someone help me understand how to use these interfaces correctly?
> >
> > At the moment, it **seems** to be writing information correctly.
> > (I.e., it doesn't segfault when attempting to write data.) However, in
> > my read summary function (ipa_hello_world_read_summary (void)) the
> > function `lto_get_summary_section_data (file_data,
> > LTO_section_ipa_hello_world, &len);` always returns NULL and
> > `file_data_vec` is of size 1. This means that at run time, there is
> > only one call to `lto_get_summary_section_data` and it returns NULL.
>
> I looked at the code you posted and compared it with streaming in
> ipa-sra.c and did not spot any difference that could result in this
> behavior.
>
> I guess you have checked that the functions are called from proper
> hooks?  (I.e. from write_summary and read_summary or
> write_optimization_summary and read_optimization_summary, depending on
> what you are trying to do, and not some mixture of these combinations?)
>
> You can try and send the whole patch (hopefully a hello world pass would
> not be too large) and I can have a look.
>
> Martin


Late SIMPLE_IPA_PASS, DECL_INITIAL and error mark nodes

2021-06-11 Thread Erick Ochoa via Gcc
Hi,

I am looking for a small clarification. I understand that during late
SIMPLE_IPA_PASSes some statically initialized global variables might
have error_mark_node trees in their DECL_INITIAL field.

I believe that I read something similar in the past about how to get
the tree expressions in these situations, and I believe it said one
had to stream in those symbols from the .gnu.lto_.decl section.

However, on the GCC Internals there is also the following mention: "If
the DECL_INITIAL is the error_mark_node, there is an initializer, but
it is given by an explicit statement later in the code; no bitwise
copy is required. " [0]

What (if any) is the correct way to get these expressions?

As a note, when running this pass as an IPA_PASS I am able to see the
DECL_INITIAL node of the variable of interest during LGEN.

Thanks!

[0] https://gcc.gnu.org/onlinedocs/gccint/Working-with-declarations.html


Semantics of OBJ_TYPE_REF

2021-06-18 Thread Erick Ochoa via Gcc
Hi,

I am having some trouble understanding the semantics of OBJ_TYPE_REF.
I understand that it is made of three operands:

1. OBJ_TYPE_REF_EXPR: An expression that evaluates the value to use.
2. OBJ_TYPE_REF_OBJECT: Is the object on whose behalf the lookup is
being performed
3. OBJ_TYPE_REF_TOKEN: An integer index to the virtual method table.

The language here is very general and makes it hard for me to understand. Is

1. OBJ_TYPE_REF_EXPR: The virtual function
2. OBJ_TYPE_REF_OBJECT: The "this" object

?

Why is the index needed if it looks like the virtual function is
already pointed to by the first operand? I am reading Hubicka's post
on devirtualization and Part II
(https://hubicka.blogspot.com/2014/01/devirtualization-in-c-part-2-low-level.html)
seems to agree with me on the example below. There is also no mention
of OBJ_TYPE_REF on part II, but there is on part III.

Hubicka's example:

int test() ()
{
  int (*__vtbl_ptr_type) () *tmp1;
  int (*__vtbl_ptr_type) () tmp2;
  struct A a;
  struct A *b;


  _ZN1AC2Ev (&a);
  b = &a;

  tmp1 = b->_vptr.A;
  tmp2 = *tmp1;

  // Notice no OBJ_TYPE_REF below.
  return tmp2 (b);
}

My example:

#include <cstdio>

class A
{
public:
  virtual void foo() { printf("hello\n"); };
  virtual void bar() { printf("world\n"); };
};

int
main(int argc, char**argv )
{
class A* a = new A();
a->foo();
a->bar();
return 0;
}

// Relevant gimple:

a = D.2922;
_1 = a->_vptr.A;
_2 = *_1; // first element of the virtual table?

OBJ_TYPE_REF(_2;(struct A)a->0) (a);
// wouldn't this just be equivalent to
// _2 (a)

_3 = a->_vptr.A;
_4 = _3 + 8;

_5 = *_4; // second element of the virtual table
OBJ_TYPE_REF(_5;(struct A)a->1) (a);
// same
// _5 (a)

On part III Hubicka says that OBJ_TYPE_REF is a wrapper and that it
represents the type and token. (My guess is that these are the
arguments 2 and 3). I fail to understand why these are needed if we
have the function already.

Thanks! Any help is appreciated.


Re: Semantics of OBJ_TYPE_REF

2021-06-23 Thread Erick Ochoa via Gcc
Cool, thanks! I think I understand now.

On Tue, 22 Jun 2021 at 23:58, Martin Jambor  wrote:
>
> Hi,
>
> On Fri, Jun 18 2021, Erick Ochoa via Gcc wrote:
> > Hi,
> >
> > I am having some trouble understanding the semantics of OBJ_TYPE_REF.
> > I understand that it is made of three operands:
> >
> > 1. OBJ_TYPE_REF_EXPR: An expression that evaluates the value to use.
> > 2. OBJ_TYPE_REF_OBJECT: Is the object on whose behalf the lookup is
> > being performed
> > 3. OBJ_TYPE_REF_TOKEN: An integer index to the virtual method table.
> >
> > The language here is very general and makes it hard for me to understand. Is
> >
> > 1. OBJ_TYPE_REF_EXPR: The virtual function
>
> well, it is never the actual function as in a FUNCTION_DECL, it is
> always a pointer to an unknown function.
>
> > 2. OBJ_TYPE_REF_OBJECT: The "this" object
> >
> > ?
> >
> > Why is the index needed if it looks like the virtual function is
> > already pointed to by the first operand?
>
> As I wrote above, the first operand is just a pointer.  Sometimes other
> optimizations can figure out where it leads but often not (or not early
> enough for inlining).
>
> In some situations we can get at the actual DECL (and subsequently have
> a direct call) using type based devirtualization and for that we use the
> other two arguments of OBJ_TYPE_REF.  See the big leading comment in
> ipa-devirt.c.
>
>
> > I am reading Hubicka's post
> > on devirtualization and Part II
> > (https://hubicka.blogspot.com/2014/01/devirtualization-in-c-part-2-low-level.html)
> > seems to agree with me on the example below. There is also no mention
> > of OBJ_TYPE_REF on part II, but there is on part III.
>
> I believe the gimple quoted there has been simplified to make it easier
> for the reader, since it does not describe the type-based
> devirtualization but how other optimizations can together deduce the
> call target, and so OBJ_TYPE_REF has been omitted not to confuse anyone.
>
> If you dump the IL for th example yourself, you will see OBJ_TYPE_REF in
> the earliest dumps and you will see that it disappears as soon as the
> call becomes direct.
>
> Hope this helps,
>
> Martin


Why does printing a pointer cause it to escape?

2021-06-23 Thread Erick Ochoa via Gcc
Hello,

I know that some BUILT_IN functions are treated in a special way by
the points-to analysis. These are functions that take pointers as
arguments or return them but do not change their points-to sets, and
similar cases. (E.g., strcpy returns a pointer to the same object as
its first argument points to.)

I notice that printf is nowhere to be found among these special
cases, so if one prints a pointer using printf, the memory it points
to is marked as escaped.

Why is this the case?

Thanks!


Re: Why does printing a pointer cause it to escape?

2021-06-23 Thread Erick Ochoa via Gcc
> I guess that to assume otherwise, one would have to make sure the
> pointer does not correspond to a "%n" (or similar, perhaps even future)
> conversion specifier.
>

Oh, wow, I didn't know about the %n specifier.

Thanks Martin!


Re: Why does printing a pointer cause it to escape?

2021-06-23 Thread Erick Ochoa via Gcc
Hi Liu,

thanks, this also seems correct. I originally thought that (in your
example) ptr would be marked as pointing to { ESCAPED NONLOCAL } and
that this would be enough to hinder any optimization that might
influence the variable's value, so it wasn't necessary to mark
"value" itself as pointing to { ESCAPED NONLOCAL }. But now I see
that that's not the case and value also needs to point to { ESCAPED
NONLOCAL }.

Thanks!


Unique identifier across different partitions (LTO)

2021-06-29 Thread Erick Ochoa via Gcc
Hello,

I'm trying to generate unique identifiers for some trees at link time.
I understand that there are already some unique identifiers in
declarations (DECL_UID) and perhaps others. Do all trees have unique
identifiers or only declarations?

Alternatively, if they don't have unique identifiers, again, I am
trying to generate my own at link time. I originally was thinking
about just having a counter and incrementing it every time I add a
tree of interest to this data structure that I use to keep track of
trees. However, with the parallel LTO framework, this would mean that
identifiers will be duplicated across different partitions. Has anyone
done something similar where information across partitions needs to be
communicated?

What I was thinking of as an alternative to communicating this
information across partitions is to record the process id then use
this information to generate a unique identifier based on the counter
and the process id that is processing the partition. This derived
identifier would be generated during WPA time.

Has anyone had any experience doing something similar? I would be
interested in seeing similar examples and use cases.

Thanks!


How to get the global namespace (DECL_CONTEXT) at LTO during LGEN?

2021-06-30 Thread Erick Ochoa via Gcc
Hello,

I am wondering if there's a way to get the global namespace at LTO
during LGEN in each of the partitions being processed. I believe that
during parse time for c/c++ there is a global_namespace macro or
variable that can be used, but I don't think that it is possible to
use at link time.

Thanks!


Are DECL_UIDs the same across partitions during LGEN?

2021-06-30 Thread Erick Ochoa via Gcc
Hi,

I am still working on understanding the LTO framework and how the
gimple representation works across multiple partitions. I found myself
printing all global variables and printing their DECL_UID. I noticed
that for some variables, their DECL_UIDs were different across
different partitions. That is, a single global variable has different
DECL_UIDs in different partitions. Is this the correct behaviour? My
understanding of DECL_UID is that for every declaration there is a
unique identifier, but is this not the case during LTO?

I am building a transformation on top of releases/gcc-11, maybe newer
commits address this, but I haven't seen a discussion in this mailing
list. Perhaps in patches, but I don't follow that mailing list too
closely.

Thanks, any help is appreciated!


Re: Are DECL_UIDs the same across partitions during LGEN?

2021-06-30 Thread Erick Ochoa via Gcc
On Wed, 30 Jun 2021 at 17:06, Richard Biener  wrote:
>
> On June 30, 2021 4:07:00 PM GMT+02:00, Erick Ochoa via Gcc  
> wrote:
> >Hi,
> >
> >I am still working on understanding the LTO framework and how the
> >gimple representation works across multiple partitions. I found myself
> >printing all global variables and printing their DECL_UID. I noticed
> >that for some variables, their DECL_UIDs were different across
> >different partitions. That is, a single global variable has different
> >DECL_UIDs in different partitions. Is this the correct behaviour?
>
> Yes.
>
> My
> >understanding of DECL_UID is that for every declaration there is a
> >unique identifier, but is this not the case during LTO?
>
> It's also true for WPA and LTRANS; a variable's UID is just not the same
> in all of them because there's no way to ensure that.
>

So how would I be able to say that two different declarations are the
same variable?


How to dump_decl_name to a string instead of to a file?

2021-07-01 Thread Erick Ochoa via Gcc
Hello,

I have a function that looks at identifiers of formal parameters. I
found that when I ran this function on larger projects, the compiler
segfaulted. I looked into this and I found that there are some formal
parameters which have NULL in their DECL_NAME and that triggered a
NULL dereference.

I also noticed that these parameters when printed with
dump_function_to_file were displayed using their DECL_UIDs. Digging a
little bit deeper, the function responsible for printing parameters
names is dump_decl_name. However, I am interested not in printing the
names but having them as strings.

I noticed that dump_decl_name takes a pretty_printer argument. Is it
possible to somehow pass a pretty_printer and get the string instead
of printing it?

Thanks!


tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-07 Thread Erick Ochoa via Gcc
Hi,

I am saving some tree declarations during LGEN that I will be later
analyzing at WPA time. I am able to read the decl from my summaries
and print it at WPA time. It corresponds to a global variable.
However, whenever I use symtab_node::get (decl) during WPA time I keep
getting NULL.

Does anyone know why that might be the case? Is it possible that other
optimizations are rewriting global variables during LGEN (or prior
WPA)? The variable I am looking at is a static const char typeinfo
name for a class in the program I am analyzing. I don't think this is
an issue since other type info names have an associated symtab_node.

Thanks!


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-09 Thread Erick Ochoa via Gcc
Hi, I noticed this is also happening for local variables. Again,
storing tree declarations on a summary during LGEN and then at WPA
time reading from those summaries. I can print the declaration, but
when I try to look for its node in the symtab I get NULL as the return
value.

Any help is appreciated. Thanks!

On Wed, 7 Jul 2021 at 11:27, Erick Ochoa  wrote:
>
> Hi,
>
> I am saving some tree declarations during LGEN that I will be later
> analyzing at WPA time. I am able to read the decl from my summaries
> and print it at WPA time. It corresponds to a global variable.
> However, whenever I use symtab_node::get (decl) during WPA time I keep
> getting NULL.
>
> Does anyone know why that might be the case? Is it possible that other
> optimizations are rewriting global variables during LGEN (or prior
> WPA)? The variable I am looking at is a static const char typeinfo
> name for a class in the program I am analyzing. I don't think this is
> an issue since other type info names have an associated symtab_node.
>
> Thanks!


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-12 Thread Erick Ochoa via Gcc
> I'm not too familiar with it but I think you're supposed to stream encoded
> symtab references during LGEN/WPA,

Thanks Richard, this happened to be the solution. I am now using
lto_symtab_encoder_t to encode the declarations during LGEN and decode
them during WPA.

Are there any more limitations of using stream_write_tree that one
should be aware of? Now I am looking into storing trees of the type
STRING_CST and I think this might be causing me a problem at WPA time.
I think it segfaults at the moment of creating the process, but I
still need more time to investigate. Perhaps you might know if storing
STRING_CST trees has to be handled in a special way? Not sure if it
also has something to do with LTO file sections. The tree is used to
initialize a global static variable.

Thanks!


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-13 Thread Erick Ochoa via Gcc
Hi,

Just to clarify a similar question: I am using stream_write_tree and
looking at the comments it says that it is assumed that the tree T is
already in the encoder cache. Does this mean that I have to use
lto_symtab_encoder_t for all trees I want to store in summaries? I
thought the encoder only works for trees which are stored on the
symbol table. Would this mean that the only trees that can be written
out to summaries are those that are declarations? Or are there any
other encoders?

I am trying to store SSA trees at LGEN and read them back during WPA.

Thanks! Any help is appreciated.

On Mon, 12 Jul 2021 at 12:55, Erick Ochoa  wrote:
>
> > I'm not too familiar with it but I think you're supposed to stream encoded
> > symtab references during LGEN/WPA,
>
> Thanks Richard, this happened to be the solution. I am now using
> lto_symtab_encoder_t to encode the declarations during LGEN and decode
> them during WPA.
>
> Are there any more limitations of using stream_write_tree that one
> should be aware of? Now I am looking into storing trees of the type
> STRING_CST and I think this might be causing me a problem at WPA time.
> I think it segfaults at the moment of creating the process, but I
> still need more time to investigate. Perhaps you might know if storing
> STRING_CST trees has to be handled in a special way? Not sure if it
> also has something to do with LTO file sections. The tree is used to
> initialize a global static variable.
>
> Thanks!


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-13 Thread Erick Ochoa via Gcc
> There are entities, like SSA names and STRING_CSTs which are specially
> encoded and if you stream those in your LGEN data you have to set up
> appropriate encoders.  In general streaming arbitrary trees isn't the
> best thing to do, usually you're interested in specific pieces only.  That's
> especially true for things "local" to a function (like SSA names), I think
> existing IPA passes only stream encoded references to global entities
> (or parameters) and all "local" info is some pass specific data streamed
> as raw data, not trees.

Thanks!

>
> Why do you want to stream SSA trees for example?
>

I am working on a prototype for a points-to analysis where the
implementation is an IPA_PASS as opposed to a SIMPLE_IPA_PASS. It is
still somewhat early in the prototype implementation stages but I
would like to have SSA trees at WPA time as a way to map constraints
variables back to Gimple. The idea is to generate most constraints at
LGEN time (some will have to be updated) and solve them at WPA time.

Do you have any other suggestions on how to map constraint variables
(similar to the index in the varinfo_t array) back to Gimple (but now
as an IPA_PASS)? We have had similar discussions before, but now I am
looking at things more concretely, from the point of view of: how
exactly does one map the analysis back to Gimple while staying within
what is possible in the LTO framework?

I do have a pass-specific data structure, but it contains references
to trees. It is similar to varinfo_t, which has a decl. I also have
another tree for expressions (the concrete use case right now is
having a global variable being assigned to a string literal), and a
field for ssa variables, which was intended to store a reference to
the corresponding tree. Again, as a way to map back the constraint
variable to Gimple.

> There can be no
> references to those from other IPA entities?

I don't follow this question. I think this may be a statement and not
a question. Please clarify this for me, but perhaps the following
might answer: Since I am working on an interprocedural analysis, SSA
variables may point to abstract memory locations allocated in
different functions. So I need to have a link between SSA variables
and whatever memory locations they might point to.


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-13 Thread Erick Ochoa via Gcc
On Tue, 13 Jul 2021 at 11:41, Richard Biener  wrote:

> There are entities, like SSA names and STRING_CSTs which are specially
> encoded and if you stream those in your LGEN data you have to set up
> appropriate encoders.

I forgot to ask, is there an example of these appropriate encoders
being used somewhere? I only see references to lto_symtab_encoder_t.

Thanks!


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-14 Thread Erick Ochoa via Gcc
> I guess the way to encode SSA trees would be to use sth like a
> <cgraph node, SSA-version> tuple much like PTA internally
> uses the varinfo array index as identifier for the variables in the
> constraints.  For local decls (as opposed to SSA names) it's a bit
> more difficult - you'd have to devise your own encoding here.
>
> What you can rely on I think is that for local variables UID relations
> are preserved, so you could sort cfun->local_decls and use the
> position in this array as encoding (in fact I see local_decls is
> streamed literally, so you don't even need to sort that for the
> start - but we could likely do that without harm to make searching
> for a UID O(log n)).

At the moment I am generating a unique id for each constraint variable
generated. I have assigned a unique LGEN number to each variable and
during WPA I have merged duplicates. The duplication of equivalent
gimple variables in distinct LGEN partitions happens for global
variables (as we have discussed before). Do you know if there are
other cases of duplication that might happen? For example, could a
single function be analyzed in different LGEN partitions?

I followed your example here and I am "encoding" the constraint
variables that relate to SSA variables by looking at the cgraph_node
and the SSA-version. The tree is not stored but at WPA we know the
SSA-version and the cgraph_node and I think this is enough to relate
back to the SSA variable in the gimple source.

You mention that I need to devise my own "encoder", but I am not sure
if we are conflating two notions:

1. encoding tree variables to constraint variables (i.e., a mapping of
some tuple (cgraph_node x symtab_node x ssa-version) to an integer
that represents the constraint variable)
2. encoding as an implementation of a data structure used during LTO
to stream in and stream out trees/symbols to and from partitions.
(e.g., lto_symtab_encoder_t).

So to be clear, when you say I need to devise my own "encoder" you are
referring to definition number 1, not definition number 2, right? And
at LTRANS using the relation (cgraph_node x symtab_node x ssa-version)
x constraint-variable-id one should be able to map to the interesting
pointer/pointee from the constraint variable id.

I am thinking a little bit ahead, but I will need a way to relate
memory allocation sites (e.g., malloc's) to some constraint variable
and perhaps generalize this to expressions (I would like to say that a
variable is pointing to a STRING_CST for example). Do you have an idea
on how to go and encode using the first definition of encoding tree
expressions? I have seen some papers that use instruction-id's
(potentially an integer that corresponds as a unique identifier for
the instruction) but I am unsure if there is something similar to this
in GCC. If what you meant is the second definition, can someone
elaborate on the precise steps for making my own encoder? While I am
somewhat familiar with using the LTO framework I am unfamiliar with
potentially extending it in these sorts of ways.

Thanks! Any help is appreciated.


tree that is both SSA_VAR_P and DECL_P?

2021-07-15 Thread Erick Ochoa via Gcc
Hi,

I was sorting SSA trees and DECL_P trees into different buckets. I
used DECL_P as a proxy for it being a local/global variable/function
(essentially a declaration). It seems that "probabilistically", I'm
kinda right. Normally there is no tree that is both an SSA_VAR_P and a
DECL_P at the same time, but there appears to be at least one in a
couple of large benchmarks.

What could lead to a tree being declared both SSA_VAR_P and DECL_P?

The first instance I've looked at where this happens the variable in
the source is a static class member. But are there other cases? And
why would it be the case that for smaller benchmarks there seems to be
no instance of trees that are both SSA_VAR_P and DECL_P at the same
time?

I've been interpreting DECL_P as any variable that actually takes up
memory. E.g., for every DECL_P variable there will be space allocated
on the stack. But SSA_VAR_P does not need space allocated since they
are just an abstraction to think about the flow of assignments. For
example, we might have the following pseudo-gimple:

PARM cond // variable cond is a boolean parameter (also DECL_P)
DECL int i // variable i is a local integer declaration
// stack space is reserved for int i
if (cond)
{
  i_0 = 0 // i_0 is an SSA variable that corresponds to i
}
else
{
  i_1 = 1 // same
}
i_2 = phi(i_0, i_1)

In the end i_x will refer to the stack location reserved for i. I
think that this is a somewhat naive view of the execution model but it
is not until now that I found a counter example.

Is the above not the case?

As another note, I have not gone through the details on how static
class member variables are implemented, so perhaps there are some
details there where it would make sense to have a variable be both
SSA_VAR_P and DECL_P.

Can someone shed some light on this? Thanks!



Re: tree that is both SSA_VAR_P and DECL_P?

2021-07-15 Thread Erick Ochoa via Gcc
Thanks! I'll have to see what conditions led to everything working
before and change them to meet reality.

On Thu, 15 Jul 2021 at 10:35, Richard Biener  wrote:
>
> On Thu, Jul 15, 2021 at 10:22 AM Erick Ochoa via Gcc  wrote:
> >
> > Hi,
> >
> > I was sorting SSA trees and DECL_P trees into different buckets. I
> > used DECL_P as a proxy for it being a local/global variable/function
> > (essentially a declaration). It seems that "probabilistically", I'm
> > kinda right. Normally there is no tree that is both an SSA_VAR_P and a
> > DECL_P at the same time, but there appears to be at least one in a
> > couple of large benchmarks.
> >
> > What could lead to a tree being declared both SSA_VAR_P and DECL_P?
>
> /* Nonzero if DECL represents an SSA name or a variable that can possibly
>    have an associated SSA name.  */
> #define SSA_VAR_P(DECL) \
>   (TREE_CODE (DECL) == VAR_DECL \
>    || TREE_CODE (DECL) == PARM_DECL \
>    || TREE_CODE (DECL) == RESULT_DECL \
>    || TREE_CODE (DECL) == SSA_NAME)
>
> there's your answer.  SSA_VAR_P and DECL_P are not related in a way
> you think.
>
> > The first instance I've looked at where this happens the variable in
> > the source is a static class member. But are there other cases? And
> > why would it be the case that for smaller benchmarks there seems to be
> > no instance of trees that are both SSA_VAR_P and DECL_P at the same
> > time?
> >
> > I've been interpreting DECL_P as any variable that actually takes up
> > memory. E.g., for every DECL_P variable there will be space allocated
> > on the stack. But SSA_VAR_P does not need space allocated since they
> > are just an abstraction to think about the flow of assignments. For
> > example, we might have the following pseudo-gimple:
> >
> > PARM cond // variable cond is a boolean parameter (also DECL_P)
> > DECL int i // variable i is a local integer declaration
> > // stack space is reserved for int i
> > if (cond)
> > {
> >   i_0 = 0 // i_0 is an SSA variable that corresponds to i
> > }
> > else
> > {
> >   i_1 = 1 // same
> > }
> > i_2 = phi(i_0, i_1)
> >
> > In the end i_x will refer to the stack location reserved for i. I
> > think that this is a somewhat naive view of the execution model but it
> > is not until now that I found a counter example.
> >
> > Is the above not the case?
> >
> > As another note, I have not gone through the details on how static
> > class member variables are implemented, so perhaps there are some
> > details there where it would make sense to have a variable be both
> > SSA_VAR_P and DECL_P.
> >
> > Can someone shed some light on this? Thanks!
> >
> > Thanks!


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-21 Thread Erick Ochoa via Gcc
Hello Richard, I need a little bit more help. In our previous messages
you mentioned the following:

> >
> > > I guess the way to encode SSA trees would be to use sth like a
> > > <cgraph node, SSA-version> tuple much like PTA internally
> > > uses the varinfo array index as identifier for the variables in the
> > > constraints.  For local decls (as opposed to SSA names) it's a bit
> > > more difficult - you'd have to devise your own encoding here.
> > >

There was a little confusion on my part about what "encoder" meant

>
> > You mention that I need to devise my own "encoder", but I am not sure
> > if we are conflating two notions:
> >
> > 1. encoding tree variables to constraint variables (i.e., a mapping of
> > some tuple (cgraph_node x symtab_node x ssa-version) to an integer
> > that represents the constraint variable)
> > 2. encoding as an implementation of a data structure used during LTO
> > to stream in and stream out trees/symbols to and from partitions.
> > (e.g., lto_symtab_encoder_t).
>

And you proceed with the following answer

> I meant 1) and streaming using the LTO cgraph encoder for the cgraph
> part and simply using the SSA version for the second part.
>

From this exchange I understood that I should develop my own mapping
for ssa variables and local declarations. However, when dealing with
encoding a node which is available in the symbol table, I could use
the function lto_symtab_encoder_encode to map symbols to an integer
which would later make the symbol available at WPA time. Implicitly,
for me, this meant that this integer is the same for every distinct
symbol in different LGEN partitions. For example, if we encode symbol
X from partition Y and get the number N, then encoding symbol X in
partition Z should also yield N.

I believe this is not the case, during WPA time I am printing:
1. pid of lgen process that generated the encoding
2. index returned by lto_symtab_encoder_encode
3. varpool_node->name ()
4. the pointer address being pointed by varpool node

I think we had a previous discussion where it was mentioned that the
only way to distinguish between these cases is to look at the
associated varpool_node/cgraph_node:

(From a different email edited for brevity)

On Wed, 30 Jun 2021 at 19:38, Richard Biener  wrote:
>
> On June 30, 2021 6:28:29 PM GMT+02:00, Erick Ochoa  wrote:
> >So how would I be able to say that two different declarations are the
> >same variable?
>
> By looking at the associated varpool node.
>
> Richard.
>

If this is the case, I can indeed get the varpool node's at WPA time
(as shown above), but comparing their pointer addresses will be
distinct. How can one find out that two varpool nodes/cgraph nodes are
the same at WPA time? Is just looking at the assembler name enough? I
of course want this to be safe.

Another question, how is this currently handled in other IPA passes?
Alternatively, do you have suggestions for encoding functions and
global variables in a similar way to how you suggested encoding ssa
variables and local declarations?


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-22 Thread Erick Ochoa via Gcc
> > If this is the case, I can indeed get the varpool node's at WPA time
> > (as shown above), but comparing their pointer addresses will be
> > distinct. How can one find out that two varpool nodes/cgraph nodes are
> > the same at WPA time? Is just looking at the assembler name enough? I
> > of course want this to be safe.
>
> If they are the same they are merged by the symtab merging process done
> at WPA time.

I trust you, but how would I verify it myself? For example, I
mentioned that I printed:

> 1. pid of lgen process that generated the encoding
> 2. index returned by lto_symtab_encoder_encode
> 3. varpool_node->name ()
> 4. the pointer address being pointed by varpool node

and I got the same "name" but different indices, and different varpool
node's addresses for each of the lgen partitions. And of course
there's only one global variable with that name. In other words, what
exactly does it mean that they are "merged" and in which cases would
they not get merged?

What I'm seeing is for example:

fopen $PID1 8 $ADDR1
fopen $PID2 7 $ADDR2

where $PID1 != $PID2 (expected since it was seen in two LGEN
partitions). They were encoded as "8" and "7" in each of their LGEN
partitions. And when reading them and printing the address of
cgraph_node $ADDR1 != $ADDR2.

So, previously when I thought that merged implied that $ADDR1 ==
$ADDR2 this print shows that not to be the case. Also when I thought
that merging implied they would have the same number, the different
encoding showed that not to be the case. What I essentially want is
the following:

fopen $PID1 $ID
fopen $PID2 $ID

Where $ID can either be derived at WPA time or is encoded during LGEN.

I'm wondering if I should "merge" these two constraint variables by
checking their asm_name? I was just about to run an experiment to see
different cases for this (i.e., same named static function and print
their assembler name, etc.). This would be an example of $ID being
derived at WPA time by doing an O(n^2) comparison in the number of
functions, partitioning them into equivalent functions and then
assigning each of the distinct partitions an incremental ID.

>
> > Another question, how is this currently handled in other IPA passes?
> > Alternatively, do you have suggestions for encoding functions and
> > global variables in a similar way to how you suggested encoding ssa
> > variables and local declarations?
>
> I don't think any other pass has to encode SSA vars because those are
> strictly function local.  They only handle IP invariants aka addresses of
> globals or so.
>

For SSA vars I followed your suggestion of having a function encoding
and then using the SSA_VERSION number, essentially <function encoding>
x ssa-version. However, since both varpool and cgraph_node are encoded
in the same way (for which I am having problems), the issue is with
the function encoding.


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-22 Thread Erick Ochoa via Gcc
>
> fopen $PID1 8 $ADDR1
> fopen $PID2 7 $ADDR2
>

Just to clarify a bit further. $PID is generated and stored during
LGEN. The encoding is obviously generated during LGEN.
These are read during WPA. And the encoding is decoded and dyn_casted
into a cgraph_node at WPA time.
All these are printed during WPA.


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-22 Thread Erick Ochoa via Gcc
> > > 1. pid of lgen process that generated the encoding
> > > 2. index returned by lto_symtab_encoder_encode
> > > 3. varpool_node->name ()
> > > 4. the pointer address being pointed by varpool node
>
> Well, yes, during LGEN no WPA has run.  Do you mean LTRANS after WPA?
> Sure, the encoder numbers do not have to match up between different
> LTRANS units but then they don't speak to each other so that shouldn't
> matter, no?

No. I mean during WPA. On a different e-mail I clarified the following:

```
>
> fopen $PID1 8 $ADDR1
> fopen $PID2 7 $ADDR2
>

Just to clarify a bit further. $PID is generated and stored during
LGEN. The encoding is obviously generated during LGEN.
These are read during WPA. And the encoding is decoded and dyn_casted
into a cgraph_node at WPA time.
All these are printed during WPA.
```

>
> I _think_ that it should work if you stream at LGEN constraint variables
> as their varpool node (using the varpool encoder), get the nodes merged
> at WPA time, and thus your constraints from different LGEN runs "merged"
> properly, then stream them the same way to the LTRANS units?
>

The only reference to a varpool encoder is in the ChangeLog. The only
encoder I know of is the symtab encoder (which I believe should cover
both, since varpool_nodes and cgraph_nodes are both symtab_nodes). But
again, I do not know what you mean by "merged" here, since they have
different addresses.

>
> And the same solution should exist.  For "merged" function definitions
> (like multiple same inline definitions) you'd simply drop all but one set of
> constraints.
>

Yes, this is what I would like, but I don't see how to detect "merged"
function definitions. I can get their cgraphs but as I mentioned for
every encoding I decode and dyn_cast all I get is a cgraph holding a
different address. What does "merged" concretely mean?


Re: tree decl stored during LGEN does not map to a symtab_node during WPA

2021-07-22 Thread Erick Ochoa via Gcc
>
> But the addresses are at LGEN time?

The following is what runs at WPA time

unsigned long pid = streamer_read_uhwi (&ib);
unsigned long id = streamer_read_uhwi (&ib);
lto_symtab_encoder_t encoder = file_data->symtab_node_encoder;
cgraph_node *cnode
  = dyn_cast <cgraph_node *> (lto_symtab_encoder_deref (encoder, id));
logger ("%s %ld %ld %p\n", cnode->name (), pid, id, cnode);

> Note the nodes are actually
> streamed to different instances by input_symtab, then decls are merged
> (lto_symtab_merge_decls), then I think the IPA
> pass summaries are read in (to different unmerged instances!), _then_
> the symtab merging process starts (lto_symtab_merge_symbols).
> I think the last step eventually calls the cgraph/varpool removal hook
> IPA passes registered.

Ah, so what you are saying is that during the read_summary stage they
will still be different, but they will finally be merged during
execute or write_optimization_summary ()? I think
maybe the terminology of LGEN/WPA/LTRANS should be expanded to be
lgen_gen, lgen_write, lwpa_read, lwpa_exec/lwpa_write, ltrans_read,
ltrans_exec?

So, just to be a bit more concrete, when initializing the
ipa_opt_pass_d instance one has to write functions which will be
called by a parent process. Normally I see the following comments with
them:

generate_summary
write_summary
read_summary
write_optimization_summary
read_optimization_summary

and finally there's the execute function that gets called.

I am doing the following:

generate_summary, /* generating pid */
write_summary /* generating id and writing pid and id */
read_summary /* reading and printing the info I told about */
write_optimization_summary /* nothing yet */
read_optimization_summary /* nothing yet */
execute /* nothing yet */

And I think these correspond to the following "LGEN/WPA/LTRANS" stages

1. lgen (multiple processes) generate_summary
2. lgen (multiple process) write_summary
3. wpa (single process) read_summary
4. wpa (single process) execute
5. wpa? (single process?) write_optimization_summary
6. ltrans (multiple processes) read_optimization_summary


And you are telling me that cgraph_node and varpool_nodes will have
the same address only after the beginning of the execute stage but not
before that?

Is the above correct?



I did try printing cnode->name() during execute and it segfaulted, so
perhaps those function bodies were merged into something else? Note,
that some names were successfully printed out. I'm wondering, can I
use the function lto_symtab_encoder_deref during execute? I think this
is unlikely... because in the past I've tried to use
lto_symtab_encoder_encode during generate_summary and it caused
segfaults. I'll still give it a try.

Perhaps this is still a bit of progress? But now I'm wondering: if I
can't use lto_symtab_encoder_deref and the nodes were indeed merged,
are some of the varpool_node* I saved during read_summary pointing to
random memory? How am I able to tell which ones survived?



>
> It might be that you need a replace hook to do what you want, I think
> that for example IPA CP encodes references to global vars aka &global
> as IPA_REF and those are transparently re-written.
>
> As said, I think it can be made work but the details, since this is the
> first IPA pass needing this, can be incomplete infra-structure-wise.
>
> Basically you have summaries like
>
>  'global = _3'
>
> where the  should eventually be implicit and the constraints
> grouped into constraints generated from the respective function body
> and constraints generated by call stmts (not sure here), and constraints
> for global variable init.  But for the above constraint the point is to
> make the 'global' references from different LGEN units the same by
> some means (but not streaming and comparing the actual assembler name).
>

I'll need some more time to read through how ipa-cp encodes references
to global variables. Thanks for the suggestion!

I don't really follow the paragraph that details what you think my
summaries look like. I'm thinking that for

global = _3

global is a variable? and _3 states that it is an SSA variable
in function 1? I think that can be a possible notation. I prefer to
just use integers.

What do you mean by implicit?

But the idea is to essentially "compile" down all
variables/functions/locals/ssa/etc into integers. And have the program
represented as integers and relation of integers. For example:

int* a

extern void foo (int* c);

int main ()
{
  int b;
  a = &b;
  foo (a); // defined in a different function
}

Should have the following at LGEN time (more specifically write_summary)

variable -> long integer encoding

abstract null -> $null_id
cgraph main -> 0
cgraph foo -> 1
varpool a -> 2
tree b -> 0 x 0  // corresponds to main position 0
real arg c -> 1 x 0 // corresponds to foo position 0
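Just to make the encoding concrete, here is a tiny standalone sketch of such a table in plain C++. All names here (sym_encoder, encode_global, encode_local) are made up for illustration, this is not GCC API, and locals get flat ids keyed by a (function id, slot) pair instead of the literal "0 x 0" notation above:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy model of the encoding table sketched above: global symbols
// (cgraph/varpool nodes) get a flat id, while locals and formal
// parameters are keyed by a (function id, slot) pair.  Hypothetical
// names only; nothing here is GCC API.
struct sym_encoder
{
  std::map<std::string, long> globals;          // symbol name -> id
  std::map<std::pair<long, long>, long> locals; // (fn id, slot) -> id
  long next_id = 0;

  long encode_global (const std::string &name)
  {
    auto it = globals.find (name);
    if (it != globals.end ())
      return it->second;
    return globals[name] = next_id++;
  }

  long encode_local (long fn_id, long slot)
  {
    auto key = std::make_pair (fn_id, slot);
    auto it = locals.find (key);
    if (it != locals.end ())
      return it->second;
    return locals[key] = next_id++;
  }
};
```

Mapping the other way around is then just a matter of keeping the reverse maps alongside these.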

Technically we can also map the other way around, we just need to know
in which "table" the information is stored (i.e., the symbol_table).

Can someone help me understand cgraph_nodes & cgraph_edges during WPA

2021-07-23 Thread Erick Ochoa via Gcc
Hello,

I've been working on an LTO points-to analysis pass for a little
while. Because of LTO's design, gimple bodies are inaccessible during
WPA. This essentially means that every LTO pass compiles down function
bodies into their own IR which gets stored in function summaries and
later read during WPA. This is also what I plan to do.

I recently started looking into how IPA-CP works. I noticed that
again, IPA-CP compiles down every function into its own function
summary. However, while reading functions, it selectively decides
which information to store by looking at symtab_node::prevailing_p. I
was not aware of this function but from what I understand it is a way
of deciding which symtab_node's bodies survive when removing
duplicates before the execute stage of the pass. Is this correct? For
IPA-CP, those cgraph_nodes for which the predicate
symtab_node::prevailing_p is false are just read and discarded. This makes
sense if it is a duplication of content.

I know that different cgraph_nodes might represent the same function
but maybe one of them has been specialized, another version has been
inlined. I also think that two different cgraph_nodes might represent
the same function implementation (i.e., they shared the same body and
the same information but this information is duplicated during LGEN
across partitions). I believe that it is not until the WPA/execute
(making a distinction between WPA/execute and WPA/read_summary) that
the distinct cgraph_nodes are merged. Would it be correct to say that
a more faithful representation of reality is that non-prevailing_p
nodes are eliminated while the other ones remain? However,
cgraph_nodes which represent the same function, but have been
specialized will be marked as prevailing_p. Is this correct? (Here, I
am not sure about the internals of the LTO, because in some sense, the
points-to analysis hasn't run, but it is possible that other analysis
have already run their WPA/execute stage and have said that some
function bodies need to be specialized but at the moment they are
still virtual clones? Related question, do virtual clones have
cgraph_node?)

I did a little experiment yesterday where I had the following control group:
1. encoded a cgraph_node during LGEN/write_summary
2. decoded a cgraph_node during WPA/read_summary and printed cnode->name ()

and compared it against the following experimental group:
1. encoded a cgraph_node during LGEN/write_summary
2. decoded a cgraph_node during WPA/read
3. during WPA/execute I printed cnode->name ()

What I found was that during the run of the "control group" I was able
to print all cnodes' names. However, during the run of the
"experimental group" only some of the names were printed before a
segmentation fault occurred. Again, this might have been because those
cgraph_node's were deleted. My theory is that these are
non-prevailing_p cgraph_nodes but I haven't confirmed it
experimentally, is this the case? I also do not know if all data being
pointed to by these cgraph_node* is corrupted or if only some parts of
the cgraph_node* have been removed from memory (like the name). Would
cgraph_node* during WPA/execute in the experimental run have some
valid fields or should it all be considered invalid and not even
accessed outside of WPA/read?

Looking at the definition of non-prevailing_p, it seems that all
functions without a gimple body will be marked as non-prevailing_p.
What does this mean though? There are definitely calls to external
functions and so having a call to a non-prevailing_p just means that
you are calling a function with no defined body. But what does that
mean for functions that were "merged" or removed because they are
duplicates? Can you have a cgraph_edge to a non-prevailing_p
cgraph_node whose function body was once available at LGEN/lwrite but
it is no longer available during WPA/execute? If that's the case how
does one know the target of the call?

Sorry if these are too many questions, I do greatly appreciate all the
support given to me in the mailing list.

In the meanwhile, I'll continue looking into how ipa-cp works to see
what I can learn from other sources.

Thanks
-Erick


Question about finding parameters in function bodies from SSA variables

2021-08-05 Thread Erick Ochoa via Gcc
Hello Richard,

I'm still working on the points-to analysis and I am happy to say that
after reviewing the ipa-cp code I was able to generate summaries for
local variables, ssa variables, heap variables, global variables and
functions. I am also using the callback hooks to find out if
cgraph_nodes and varpool_nodes are added or deleted between
read_summaries and execute. Even though I don't update the solutions
between execute and function_transform yet, I am reading the points-to
pairs and remapping the constraint variables back to trees during
function_transform and printing the name of pointer-pointee pairs.

This is still very much a work in progress and a very weak points-to
analysis. I have almost finished my Andersen's / field insensitive /
context insensitive / flow-insensitive / intraprocedural analysis with
the LTO framework (without interacting with other transformations
yet). The only thing that I am missing is assigning parameters to be
pointing to NONLOCAL memory upon entry to the function and perhaps
some corner cases where gimple is not exactly how I expect it to be.

I am wondering, none of the variables in
function->gimple_df->ssa_names and function->local_decls are
PARM_DECL. I'm also not entirely sure if I should be looking for
PARM_DECLs since looking at function bodies' gimple representation I
don't see the formal parameters being used inside the function.
Instead, it appears that some SSA variables are automatically
initialized with the parameter value. Is this the case?

For example, for a function:

foo (struct a* $NAME)

The variable $NAME is nowhere used inside the function. I also found
that there is an ssa variable in location X (in
function->gimple_df->ssa_names[X]) named with a variation like
$NAME_$X(D) and this seems to correspond to the parameter $NAME. How
can one (preferably looking only at
function->gimple_df->ssa_names[$X]) find out that this tree
corresponds to a parameter?
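If it helps make the question concrete: my current guess is that these SSA variables are the "default definitions" of the PARM_DECLs, so the check would be something along the lines of SSA_NAME_IS_DEFAULT_DEF plus SSA_NAME_VAR being a PARM_DECL, but please treat that as an assumption on my part. Here is the shape of the check with mock stand-in types (not GCC's trees):

```cpp
#include <cassert>

// Mock of the situation described above: a parameter decl and the SSA
// name that is implicitly initialized with it on function entry (the
// "$NAME_$X(D)" default definition).  These types are illustrative
// stand-ins, not GCC's tree structures.
enum decl_code { PARM_DECL_K, VAR_DECL_K };

struct mock_decl
{
  decl_code code;
};

struct mock_ssa_name
{
  mock_decl *var;   // the underlying decl, may be null
  bool default_def; // set for names live on entry
};

// The lookup being asked about: an SSA name corresponds to a formal
// parameter when it is a default definition of a PARM_DECL.
static bool
ssa_name_is_parameter (const mock_ssa_name &n)
{
  return n.default_def && n.var && n.var->code == PARM_DECL_K;
}
```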

Many thanks!
-Erick


Some questions on hash_set/hash_map traits

2021-09-08 Thread Erick Ochoa via Gcc
Hello,

I am refactoring some old code that runs as an IPA_PASS. This code is
intended to run at link-time using the LTO framework and so I am
always testing it that way. At the moment, I am trying to use
hash-maps but having some difficulties. I am adding some patches that
will help illustrate along the way.

The first patch simply adds a link-time optimization pass that will be
used to demo the problem that I am having. One can run with -flto
-fipa-hash. During the first patch, the pass does not do any
meaningful work but it just helps to set up the stage.

For the second patch I create a wrapper around a symtab_node. This
wrapper is intended to add some other functionality to elements that I
might want to insert in hash-sets or hash-maps.

class symtab_wrapper
{
private:
  symtab_node *_symtab_node;
public:
  friend symtab_wrapper_traits;
  symtab_wrapper ()
: _symtab_node (NULL)
  {}
  symtab_wrapper (symtab_node *s)
: _symtab_node (s)
  { gcc_assert (s); }

  void print (void)
  {
if (!dump_file) return;
gcc_assert (_symtab_node);
fprintf (dump_file, "%s\n", _symtab_node->name ());
  }
};

I also have a wrapper around a hash-set to add some things that might
be of interest to me in the future:

template <typename KeyId, typename Traits = default_hash_traits <KeyId> >
class my_set_t
{
private:
  hash_set <KeyId, false, Traits> _impl;
public:
  my_set_t () {}

  void insert (const KeyId &k)
  {
_impl.add (k);
  }

  bool contains (const KeyId &k)
  {
return _impl.contains (k);
  }

  void print (void)
  {
if (!dump_file) return;
for (auto i = _impl.begin (), e = _impl.end (); i != e; ++i)
{
  (*i).print ();
}
  }
};

I also created a symtab_wrapper_traits because I want to store the
object in the hash-map itself, and so I need to specify how to hash
it.

class symtab_wrapper_traits
{
public:
  typedef symtab_wrapper value_type;
  typedef symtab_wrapper compare_type;
  static const bool empty_zero_p = true;

  static inline hashval_t hash (const value_type &v)
  {
    return pointer_hash <symtab_node>::hash (v._symtab_node);
  }

  static bool equal (const value_type &l, const value_type &r)
  {
return l._symtab_node == r._symtab_node;
  }

  static void mark_deleted (value_type &v)
  {
v._symtab_node = NULL;
  }

  static void mark_empty (value_type &v)
  {
v._symtab_node = NULL;
  }

  static void remove (value_type &v)
  {
v._symtab_node = NULL;
  }

  static bool is_deleted (const value_type &v)
  {
 return !v._symtab_node;
  }

  static bool is_empty (const value_type &v)
  {
 return !v._symtab_node;
  }
};

I then create a global set with my traits and populate it with some
information and later print. It seems to be working:

my_set_t <symtab_wrapper, symtab_wrapper_traits> my_set;

 static void
 ipa_hash_generate_summary (void)
 {
  cgraph_node *cnode = NULL;
  FOR_EACH_FUNCTION (cnode)
  {
symtab_wrapper w (cnode);
my_set.insert (w);
  }

  my_set.print ();

  return;
 }

Here, I already have some questions, but this is not the biggest issue
that I am having: I believe that the hash_set implementation manages
its own memory, but I am unclear about the semantics of "removed".

* Is "removed" supposed to, for example, call the destructor? Since,
in this particular case, it is only a wrapper and cgraph_node is not
intended to be destroyed, I just assign the pointer to NULL. Not sure
if that's the intention.
* I don't understand the semantics of empty_zero_p. I think it is used
to zero out entries when they are marked as empty?
* I am assuming that its memory will be re-used, is this the case?
* I see that there is a typed_delete_remove function that is sometimes
used in traits, but I think that it doesn't apply to this case because
I am storing the whole object. Is that the case?

I later did some refactoring (patch 3), which did not seem to impact
the code now, but will impact hash_map later. The refactoring is as
follows, I create an abstract "printable" class

class printable
{
public:
  virtual char* str () = 0;
  void print (void);
};

void
printable::print (void)
{
  if (!dump_file) return;
  char* _str = str ();
  fprintf (dump_file, "%s\n", _str);
  free (_str);
}

Make symtab_wrapper a publically derived class

-class symtab_wrapper
+class symtab_wrapper : public printable

and implemented the str function in symtab_wrapper:

  virtual char* str (void)
   {
    if (!dump_file) return NULL;
gcc_assert (_symtab_node);
char *retval = NULL;
asprintf (&retval, "%s", _symtab_node->name ());
return retval;
  }

Again, this seems to be working correctly.

What is more interesting is moving from a hash-set to a hash-map. On
the fourth patch, I implemented the hash_map equivalent to the
hash_set that I am describing above:

template <typename KeyId, typename Value>
class my_map_t
{
private:
  hash_map <KeyId, Value> _impl;
public:
  my_map_t () {}

  void insert (const KeyId &k, const Value &v)
  {
_impl.put (k, v);
  }

  Value *select (const KeyId &k)
  {
return _impl.get (k);
  }

  void print (void)
  {
if (!dump_file) return;
for (auto i = _impl.begin (), e = 

Re: Can gcc itself be tested with ubsan? If so, how?

2021-09-27 Thread Erick Ochoa via Gcc
Hi,

just as a note. This is also of interest to me. I have wanted to
compile a single pass that I wrote using ubsan/other sanitizers for
testing purposes. I was wondering if someone has already modified the
build system to use ubsan to test their passes and if they could
document the process for doing so. For these purposes, I don't really
care if it can be done only without bootstrapping. I haven't
investigated enough to find out if it is possible, but I suspect it is
and may have already been done.

Thanks!

On Tue, 28 Sept 2021 at 01:01, Gary Oblock via Gcc  wrote:
>
> I tried just adding "-fsanitize=undefined" to my CXX_FLAGS and got
> a bunch of errors like this:
>
> /usr/bin/ld: ../libcody/libcody.a(server.o): in function 
> `std::__cxx11::basic_string, 
> std::allocator >::_Alloc_hider::~_Alloc_hider()':
> /usr/include/c++/9/bits/basic_string.h:150: undefined reference to 
> `__ubsan_handle_type_mismatch_v1'
>
> They all seemed library related.
>
> Can ubsan be used on the compiler itself?
>
> Thanks,
>
> Gary


Question about symtab_node::order during LTO

2021-10-12 Thread Erick Ochoa via Gcc
Hi,

I have an LTO pass which stores information collected during "generate
function summary" in a map which is symtab_node* -> data*. I know that
the symtab_node*s are encoded by an lto encoder and can be decoded
back during the "read function summary". I also am aware that other
optimizations might be removing or adding cgraph_node*s to the program
and my pass uses the insertion and removal hooks to analyze or drop
these accordingly. However, I am confused about different function
versions. My current understanding is that different versions of the
same function share the same cgraph_node*. Is this correct?

If that's the case, that would mean that storing information on a map
where the key is the symtab_node* is probably not a good idea as a
specific version of the function might be dropped and in that case I
may be dropping information for all cases.

This brings me to the field "order" in symtab_node. Is this field
constant through the linking process and can it be used to
differentiate between different versions of the same function?

Thanks!


Question about semantics of cgraph_node::prevailing_p

2021-10-13 Thread Erick Ochoa via Gcc
Hi,

My current understanding of LTO is that the callgraph is very dynamic
(i.e., optimizations might add or remove cgraph_nodes). A while back I
encountered an issue where I couldn't print the cgraph_node::name of a
function during the execute stage in LTO. I found that the only thing
different was that these cgraph_nodes had a prevailing_p () returning
false. There's only a few other places in the GCC sources that look at
prevailing_p (ipa-fnsummary.c and ipa-prop.c) and both of them seem to
drop information if the cgraph_node is not prevailing.

I took this as guidance and in my optimization I dropped information
from non-prevailing cgraph_nodes. (I figured out it must be something
similar to the remove cgraph_hooks.) However, I am now revisiting
whether this interpretation was correct or not. Can someone tell me
what does cgraph_node::prevailing_p actually means?

Also, is there a way to code an LTO test where a specific
cgraph_node's prevailing_p returns false? That way I could probably
verify the correctness of my analysis in the presence of
non-prevailing_p cgraph_nodes.

Thanks!


Question on calling possible_polymorphic_call_targets, side effects, and virtual tables.

2021-10-27 Thread Erick Ochoa via Gcc
Hello,

I have a SIMPLE_IPA_PASS that parses the program multiple times. As it
parses gimple, it builds a data structure with the information
collected that will provide some invariants to future iterations over
the program.

I was looking into adding a new feature that would take advantage of
devirtualization by calling possible_polymorphic_call_targets. All
looks good, a large program that I use for verifying changes seems to
compile nicely. However, as I generate a test file in which I discard
most optimizations (except the one I am currently working on) I
realize that an assertion on my pass is triggered.

I dig in and it looks like I am ignoring error_mark_nodes in varpools'
DECL_INITIAL. The first passes essentially encode the information that
these are error_mark_nodes. On the pass in which I call
possible_polymorphic_call_targets I find that suddenly, these
error_mark_nodes are gone and replaced with a virtual table, thus
triggering the assertion.

Is there a way I can get rid of the error_mark_nodes from the earlier
passes to match the changes brought by
possible_polymorphic_call_targets? I suppose that finding
polymorphic_call_targets at an earlier stage is a possibility, but I
was wondering exactly which function/statement is able to fix these
error_mark_nodes so that I can also learn about this.

Thanks!


How to run C++ IPA tests?

2021-10-27 Thread Erick Ochoa via Gcc
Hi,

I have been adding tests to the gcc/testsuite/gcc.dg/ipa folder
successfully for a while now. I am starting to add some tests into
gcc/testsuite/g++.dg/ipa now but I am having some issues.

1. Using `make check-g++` returns the following error message "No rule
to make target 'check-g++'".
2. While `make check-gcc` works, this seems to be only for C tests
(which is correct).
3. When I type `make check RUNTESTS="ipa.exp"`, the section "g++
Summary" looks empty.
4. I have added a test that I know will fail, but I don't see it
(because I don't think I have run g++ tests correctly).

To make sure that I haven't changed anything with my transformation I
actually checked out releases/gcc-11.2.0

Can someone tell me how to run the C++ IPA tests? I'm sure there is
something silly that I'm doing wrong, but can't find out what it is.

Thanks!


Question on cgraph_node::force_output

2021-11-02 Thread Erick Ochoa via Gcc
Hi,

I am looking at tree-ssa-structalias.c looking at what makes a
function nonlocal during IPA-PTA. I am having some problems
understanding force_output and when it is set or unset.

1. What is the meaning of force_output? cgraph.h gives an example that
force output means that the symbol might be used in an invisible way.
I believe this means some sort of "unanalyzable" way. However, for a
few tests I've made, all functions have the field force_output set to
true.
2. Does this value depend on some other pass?

At the moment, I am looking at this field within my own passes
(IPA_PASS and SIMPLE_IPA_PASS), but I would like to inspect the
dump_file(s) which show information about force_output to make sure
that it doesn't depend on pass order or even my own flags.

3. What flags should I use to inspect force_output?

Thanks!


Question on ipa_ref->referring and ipa_ref->stmt on all_late_ipa_passes

2022-02-14 Thread Erick Ochoa via Gcc
Hi,

I would like to use ipa_ref in the PASS_LIST all_late_ipa_passes to query
the statement (ref->stmt) of where a global variable is used. However, I am
having some problems achieving this.

What I do is:

1. Check that ipa_ref->referring has a body and is not inlined.
2. get_body
3. try to print out the gimple statement using print_gimple_stmt
(dump_file, ref->stmt, 0, TDF_NONE).

This all seems correct to me, but I have been receiving errors that print
is trying to print a tree with an incorrect TREE_CODE. I am assuming here
that ref->stmt is not updated after all_regular_ipa_passes, much like how
when looking at cgraph_edge the call statement is also not updated. Can
someone please tell me if this is indeed the case or what is happening here?

Also, while I think that the gimple statements might not be maintained, I
see that ipa_ref is still used in the ipa_pta pass during
all_late_ipa_passes. I see that ipa_ref->referring and ipa_ref->stmt are
not used. Instead the tree of the referred is obtained in the following
way: ref->referred->decl. I am assuming that it would be possible to use
ref->referred->decl and search for this tree everywhere in referring to
find the uses. Can someone confirm this?

Thanks!
-Erick


Re: Question on ipa_ref->referring and ipa_ref->stmt on all_late_ipa_passes

2022-02-14 Thread Erick Ochoa via Gcc
On Mon, 14 Feb 2022 at 10:57, Jan Hubicka  wrote:

> > Hi,
> >
> > I would like to use ipa_ref in the PASS_LIST all_late_ipa_passes to query
> > the statement (ref->stmt) of where a global variable is used. However, I
> am
> > having some problems achieving this.
> >
> > What I do is:
> >
> > 1. Check that ipa_ref->referring has a body and is not inlined.
> > 2. get_body
> > 3. try to print out the gimple statement using print_gimple_stmt
> > (dump_file, ref->stmt, 0, TDF_NONE).
> >
> > This all seems correct to me, but I have been receiving errors that print
> > is trying to print a tree with an incorrect TREE_CODE. I am assuming here
> > that ref->stmt is not updated after all_regular_ipa_passes, much like how
> > when looking at cgraph_edge the call statement is also not updated. Can
> > someone please tell me if this is indeed the case or what is happening
> here?
>
> Yes, while body materialization we keep cgraph edges up to date but we
> do not keep references.  We probably should remove them earlier.
> Keeping them up to date would need relatively some work. We do so for
> calls since they hold important information (i.e. where to redirect the
> call). For references we don't have such machinery in place (even though
> I was thinking implementing it). Main difficulty is that inlining also
> performs statement folding that moves references around statements quite
> freely
>

Hi Honza,

just a bit of clarification here, when you are saying that references are
not updated do you mean just ipa_ref->stmt or do you also include other
things like ipa_ref->referring, ipa_ref->use?

Thanks!


> >
> > Also, while I think that the gimple statements might not be maintained, I
> > see that ipa_ref is still used in the ipa_pta pass during
> > all_late_ipa_passes. I see that ipa_ref->referring and ipa_ref->stmt are
> > not used. Instead the tree of the referred is obtained in the following
> > way: ref->referred->decl. I am assuming that it would be possible to use
> > ref->referred->decl and search for this tree everywhere in referring to
> > find the uses. Can someone confirm this?
>
> I will check what ipa-pta uses here.  I suppose it works since you still
> have all references from the pre-IPA-transform stage, so it is
> consistent...
>
> Honza
> >
> > Thanks!
> > -Erick
>


Question on ipa-bit-cp and pointer_plus_expr

2022-02-17 Thread Erick Ochoa via Gcc
Hello,

I'm trying to understand ipa-bit-cp/ipa-cp and how the known bits are
propagated to the lattice in the case of a pointer_plus_expr.

I have a piece of code similar to the following (this being a simplified
example)

int main ()
{
  // a = some pointer.
  foo (a);
}

foo (void* a)
{
  bar (a + 1);
}

bar (void *a) {}

I have the following ipa-cp dump:

callsite main -> foo
param 0: UNKNOWN
 value: 0x0, mask 0xfff8


as it is passed to bar I have the following:

callsite foo -> bar
param 0: PASS THROUGH: 0, op pointer_plus_expr 8
  value: 0x0, mask 0x

My question is, shouldn't the mask to bar be 0xfff8?

If I understand correctly the mask being 0xfff8 implies that
the value will be a multiple of 8. By adding 8 (or any multiple of 8) to
it, it should still be a multiple of 8. Where I believe the mask becomes
0x is that during the addition there might be an overflow
of the masks and the operand 8. But I am still unsure. Can someone point
out what is happening here?
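As a standalone sanity check of the arithmetic claim above (this only models the alignment part of the lattice as a count of low known-zero bits; it is not GCC's bit_value_binop):

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of the alignment part of a known-bits lattice: we only
// track how many low bits of a value are known to be zero.  Sketch of
// the reasoning in the mail, not GCC's implementation.
static unsigned
trailing_zero_bits (uint64_t x)
{
  if (x == 0)
    return 64;
  unsigned n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

// pointer_plus_expr with a constant offset: no carry can be produced
// below the common known-zero low bits, so the result keeps the
// smaller of the two alignments.
static unsigned
align_after_add (unsigned known_zero_low_bits, uint64_t offset)
{
  unsigned off_zero = trailing_zero_bits (offset);
  return known_zero_low_bits < off_zero ? known_zero_low_bits : off_zero;
}
```

Under this model an 8-aligned pointer plus a constant 8 stays 8-aligned, which supports the intuition that the mask should remain 0xfff8 rather than become all-ones.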

Thanks!


Re: Question on ipa-bit-cp and pointer_plus_expr

2022-02-17 Thread Erick Ochoa via Gcc
Thanks Martin!

The example I showed is simplified and I had not specified that I was using
LTO, but I am. Thanks for asking me to clarify this.

The issue I have with the information in the Lattice section of the dump is
that it shows the information once the analysis has achieved a fixed point.
However, the example is a bit more complicated. There is a recursive call
in bar and some pointer manipulation which will for sure remove the static
information that the incoming parameter to bar is a multiple of 8.

bar (void *a) {
  // some pointer manipulation
  a = // pointer manipulation based on a, casts to (char*) arithmetic...
  bar (a);
}

However, I am mostly interested in the static information available at the
callsite foo -> bar and not on the incoming static information available at
the parameter "a" of bar. Does this make sense?

Is there a place where I can look at the static information available at
the argument and not the parameter?

Thanks!

On Thu, 17 Feb 2022 at 17:25, Martin Jambor  wrote:

> Hi,
>
> On Thu, Feb 17 2022, Erick Ochoa via Gcc wrote:
> > Hello,
> >
> > I'm trying to understand ipa-bit-cp/ipa-cp and how the known bits are
> > propagated to the lattice in the case of a pointer_plus_expr.
> >
> > I have a piece of code similar to the following (this being a simplified
> > example)
> >
> > int main ()
> > {
> >   // a = some pointer.
> >   foo (a);
> > }
> >
> > foo (void* a)
> > {
> >   bar (a + 1);
> > }
> >
> > bar (void *a) {}
> >
> > I have the following ipa-cp dump:
> >
> > callsite main -> foo
> > param 0: UNKNOWN
> >  value: 0x0, mask 0xfff8
> >
> >
> > as it is passed to bar I have the following:
> >
> > callsite foo -> bar
> > param 0: PASS THROUGH: 0, op pointer_plus_expr 8
> >   value: 0x0, mask 0x
> >
> > My question is, shouldn't the mask to bar be 0xfff8?
> >
> > If I understand correctly the mask being 0xfff8 implies that
> > the value will be a multiple of 8. By adding 8 (or any multiple of 8) to
> > it, it should still be a multiple of 8. Where I believe the mask becomes
> > 0x is that during the addition there might be an overflow
> > of the masks and the operand 8. But I am still unsure. Can someone point
> > out what is happening here?
>
> If you do not use LTO, foo and bar must be static for IPA-CP to
> determine that their arguments are aligned - and correctly so, if there
> are unknown callers passing unknown values, we cannot assume anything.
> With LTO or them being static, IPA-CP can do so.
>
> But make sure you understand that different things in the dump mean
> different stuff, you will not find any propagated values in the
> jump-function dump which you quoted, those really just describe the call
> without any larger context.  For results of propagation you need to look
> into the "Lattices" section of the dump.
>
> Martin
>


Re: Question on ipa-bit-cp and pointer_plus_expr

2022-02-17 Thread Erick Ochoa via Gcc
> If I understand you correctly, that is indeed the jump function,
> obtainable through ipa_get_ith_jump_func (args, i) where args is a
> result of ipa_edge_args_sum->get and i is the index of the parameter.
>

Thanks Martin!

So then, am I correct in thinking that

callsite foo -> bar
param 0: PASS THROUGH: 0, op pointer_plus_expr 8
  value: 0x0, mask 0x

should be

callsite foo -> bar
param 0: PASS THROUGH: 0, op pointer_plus_expr 8
  value: 0x0, mask 0xfff8

?


ASSERT_EXPR during SIMPLE_IPA_PASS and LTO

2022-03-01 Thread Erick Ochoa via Gcc
Hi,

I am working on an analysis that is able to determine some static
information about a specific variable. At the moment, I would like to avoid
much of the transformation by taking advantage of other GCC's passes. So, I
can imagine something like, create an ASSERT_EXPR and let other passes
change the CFG, and remove dead code, fold constants, etc.

At the moment I am only prototyping it, and so I have created a
SIMPLE_IPA_PASS during LTO that finds the variables of interest and then
creates an ASSERT_EXPR which contains the static information of interest.

This looks to be working somewhat correctly. When I print the functions'
gimple everything looks ok. The pass doesn't fail even if I use the
TODO_verify_all flag at the end of the SIMPLE_IPA_PASS. However, the
transformation triggers a segfault during fixup_cfg and I am unsure what
the cause might be.

My questions are:
1. Is it possible to insert new ASSERT_EXPR during LTO and have other tree
passes transform the function based on this?
2. How complex can the ASSERT_EXPR be? I can see that it must be a COND
expr, by which it can be EQ_EXPR, NE_EXPR, LT_EXPR... and others. But what
if I need to express something like a bitmask to guarantee that certain
bits are zero? Could I have a complicated ASSERT_EXPR where COND is
ASSERT_EXPR (a & 0xff == 0)?
Or should I break it down?
temp = a & 0xff;
ASSERT_EXPR (temp == 0);
3. Any other advice?
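Regarding question 2, a quick standalone check that the single-expression and broken-down forms are equivalent (plain C++, nothing GCC-specific):

```cpp
#include <cassert>
#include <cstdint>

// Single-expression form of the condition in question 2.
static bool
combined (uint64_t a)
{
  return (a & 0xff) == 0;
}

// Broken-down form: temp = a & 0xff; ASSERT_EXPR (temp == 0);
static bool
split (uint64_t a)
{
  uint64_t temp = a & 0xff;
  return temp == 0;
}

// Exhaustively check that the two forms agree over several full
// periods of the low byte.
static bool
forms_agree ()
{
  for (uint64_t a = 0; a < 4096; a++)
    if (combined (a) != split (a))
      return false;
  return true;
}
```

So breaking the condition down changes nothing semantically; the open question is only which shape the later passes can actually consume.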

Thanks!


Question on updating function body on specialized functions

2022-03-08 Thread Erick Ochoa via Gcc
Hi,

I have one function (F) that has been specialized for two different calling
contexts (F1 and F2) and two late SIMPLE_IPA_PASSes (A and B). Pass A
changes some MEM_REFs such that the type of MEM_REF is compatible with the
type of the first operand of the expression. Pass A changes both F1 and F2.
I have printed the function bodies of both F1 and F2 during Pass A and
everything looks correct. Pass B uses these changes.

However I noticed this interesting behaviour:

1. If I fix F1 first and then F2, then pass B will see F2 correctly but
some of F1 MEM_REFs will be incorrect.
2. If I fix F2 first and then F1, then pass B will see F1 correctly but
some of F2 MEM_REFs will be incorrect.

My question is do different specialized functions share the same trees? How
would I then change the bodies of specialized functions?

Thanks!


Re: Question on updating function body on specialized functions

2022-03-08 Thread Erick Ochoa via Gcc
Hi Martin!

Thanks for replying, turns out that while I was trying to reply to you I
was able to get the answer. Turns out there is indeed one tree node which
is shared across the two functions. And that is

TREE_OPERAND (MEM_REF, 1).

When I was assigning to

TREE_TYPE ( TREE_OPERAND (MEM_REF, 1) ) in one function, I was modifying
the other. The solution was to create a new tree and assign it directly to
TREE_OPERAND (MEM_REF, 1) in both functions.

Thanks!


Question on mapping old to specialized cgraph_edges

2022-03-09 Thread Erick Ochoa via Gcc
Hi,

I am trying to find a map between cgraph_edge*s before ipa-cp and after
ipa-cp has specialized nodes. Does anyone know if such a map is available?
Or an equivalent? I think technically it should be a map from:

(cgraph_node* caller, cgraph_edge* e, cgraph_node *callee) to (cgraph_node*
caller', cgraph_edge* e', cgraph_node *callee')

since both caller and callee may be updated during ipa-cp. But any help or
direction is appreciated.

Thanks!


Question on not sharing trees (tree_node_can_be_shared)

2022-03-21 Thread Erick Ochoa via Gcc
Hi,

I am interested in annotating INTEGER_CSTs. I have added a field to
typed_trees and set the value in some cases. However, INTEGER_CSTs can be
shared and copied across the function and even copied to other functions. I
don't want all INTEGER_CSTs with the same value to be shared as now they
have an annotation which may make them somewhat unique. My question is, is
there a way to disable sharing of INTEGER_CSTs?

I have tried modifying tree_node_can_be_shared to return false when the
argument is an INTEGER_CST. However, the verifier then fails stating that
there was an incorrect sharing of tree nodes. My intuition tells me that
this means that some pass assumes that INTEGER_CSTs can be shared and
therefore the information gets copied. However, during the verifier this
gets caught and returns an error. Is this correct? How would one go about
disabling the sharing of INTEGER_CSTs?

Is there a tree comparison for INTEGER_CSTs that I can set such that if an
INTEGER_CST has my field set then it registers as a different INTEGER_CST
even if they have the same constant value?

Thanks!


Question on path from C parser to DECL_INITIAL

2022-03-23 Thread Erick Ochoa via Gcc
Hi,

I am trying to understand what path is executed in GCC from parsing a C
expression (in a global variable declaration) to the value in DECL_INITIAL.
At the moment, I have annotated a tree during parsing. I have another
debugging pass that looks for this tree in subsequent passes. The
annotation persists if the tree is inside a function body. However, if the
tree is in a global variable initialization (i.e., DECL_INITIAL), then I am
unable to see the annotation even when placing the debugging pass at the
beginning of passes.def.

Can someone provide a little bit of guidance?

Thanks!


Re: Question on path from C parser to DECL_INITIAL

2022-03-23 Thread Erick Ochoa via Gcc
> I'm not sure I understand but no pass walks 'global variables', so you're
> not
> going to "see" the annotation from passes (whatever that means).
>
>
What I mean by walking global variables is that I have a GIMPLE_PASS that
ignores the function sent as an argument and instead just uses a
FOR_EACH_VARIABLE to look at varpool nodes and check if the DECL_INITIAL
has a field set or unset. I can print the variable, and DECL_INITIAL.
However the DECL_INITIAL tree always has the field unset even though I am
setting it during c_sizeof_or_alignof_type. The initialization is simply:

int a = sizeof (foo);


Question on cgraph_edge::call_stmt during LTO

2022-05-20 Thread Erick Ochoa via Gcc
Hi,

I'm working on a pass that looks into the estimated values during ipa-cp
and stores them for a later analysis/pass. I would like to store the real
arguments' estimates in a cgraph_edge::call_stmt or somewhere else that
makes similar sense. (Note, this is different from the formal parameters'
estimates which can be found in the lattice print out of ipa-cp).

I have already added a new field in the definition of gimple_call, made
sure that this field is streamed-out, streamed-in, and set the values
during ipa-cp. However, I am having problems with what I believe is the
inlining and cloning of cgraph_nodes. Whenever a cgraph_node is inlined or
cloned, I would need to copy this information and update if necessary. At
the moment, when I am trying to read the information during a late
SIMPLE_IPA_PASS, the information is simply not there. I believe that the
cgraph_edge is not the same since during ipa-cp the callee has been
specialized and during ipa-inline the caller has been inlined to a
different caller.

Also, for some cgraph_edge's the call_stmt is NULL. I believe this can also
be due to inlining, but I am not sure.

Can someone point out a good way to keep this information in sync with the
creation and deletion of cgraph_edges? Maybe an alternative to
cgraph_edge::call_stmt?

Thanks!

-Erick


Re: Question on cgraph_edge::call_stmt during LTO

2022-06-02 Thread Erick Ochoa via Gcc
Hi Martin,

Thanks for the tips! I have implemented an edge summary which:

* is allocated at IPA analysis phase
* streamed out in ipcp_write_transformation_summaries
* streamed in in ipcp_read_transformation_summaries

However, before the implementation of this edge summary we had another
mechanism of propagating the information all the way until it was used in a
SIMPLE_IPA_PASS executed after all LGEN stages were finished (after
all_regular_ipa_passes). After changing the implementation to use edge
summaries, I find that the information is conserved during inlining (the
duplication hook prints out the new edges that gets formed via inlining
with the correct information), however it is not found in the
SIMPLE_IPA_PASS that gets executed after all_regular_ipa_passes.

What is perhaps more interesting is that if I run with -fno-ipa-pure-const
and -fno-ipa-modref, I can still see the cgraph_nodes and edges of the
inlined methods, along with the information needed. But not in the ones
that have been inlined. I believe this could be just that when these
options are disabled, cgraph_nodes might not be reclaimed.

I understand that there are many differences between SIMPLE_IPA_PASSes and
regular IPA_PASSes, but at the moment I am unsure how to narrow down my
search for a fix. Is this something that could be caused by:

* memory management: (I am not familiar with the memory management in GCC
and it is a bit difficult to understand.) I have removed the bodies of
my_edge_summary::remove (cgraph_edge*) and my_edge_summary::remove
(cgraph_edge *, my_edge_summary_instance *) so I don't think this might be
it. However, the class my_edge_summary still copies some of the structure
in the other transformation summaries, so there is a macro GTY((for_user))
in the class declaration and the information is stored in a vec  *my_info.
* missing implementation details in the duplicate functions: Looking at
ipa_edge_args_sum_t::duplicate, it is a relatively complex function. I also
noticed that it does something else when the dst->caller has been inlined.
Should I also update the cgraph_edge that disappears when dst->caller is
inlined to its caller?
* something else?

Any direction is greatly appreciated!
Many thanks!

-Erick


On Sat, 21 May 2022 at 00:13, Martin Jambor  wrote:

> Hello,
>
> On Fri, May 20 2022, Erick Ochoa via Gcc wrote:
> > Hi,
> >
> > I'm working on a pass that looks into the estimated values during ipa-cp
> > and stores them for a later analysis/pass. I would like to store the real
> > arguments' estimates in a cgraph_edge::call_stmt or somewhere else that
> > makes similar sense. (Note, this is different from the formal parameters'
> > estimates which can be found in the lattice print out of ipa-cp).
>
> the statement is not the right place to store such pass-specific
> information, for reasons you described and more (especially simple
> memory use efficiency).
>
> Instead they should be placed into an "edge summary" (also sometimes
> called "call summary"), a structure similar to ipa_edge_args_sum (in
> ipa-prop.h and ipa-prop.cc).  Unlike ipa_edge_args_sum, which is
> allocated at analysis phase, then streamed out and in in case of LTO,
> and used and thrown away during the IPA analysis phase, your summary would
> need to be allocated at IPA analysis time, then streamed out in
> ipcp_write_transformation_summaries, streamed in in
> ipcp_read_transformation_summaries so that they can be used in the
> transformation phase.
>
> Usually a simple implementation of the duplication hook of an edge
> summary is enough for the data to survive cloning and inlining and the
> like.
>
> Martin
>


Questions on transition from IPA_PASS (LTRANS) to SIMPLE_IPA_PASS

2022-06-09 Thread Erick Ochoa via Gcc
Hi,

I understand some differences between IPA_PASSes and SIMPLE_IPA_PASSes.
However, I have questions about the cleanup processes that IPA_PASSes might
have.

* Would it be correct to assume that all node and edge summaries are
deleted after the last IPA_PASS? And would it also be correct to assume
that passes are responsible for releasing this information whenever it is
no longer needed? In other words, are there any methods / infrastructure
that run between passes and may deallocate these summaries?
* Is there any sort of processing done on cgraph_nodes and cgraph_edges
beyond what is found in the LTRANS stage of each of the passes? I am
imagining something similar to streaming in and streaming out trees /
summaries, but I don't believe that should be the case. In my
understanding, the process that runs LTRANS is the same that runs the late
SIMPLE_IPA_PASSes.

Thanks!

-Erick


Question about speculative make_edge_direct_to_target during LTO/IPA_PASS

2022-07-01 Thread Erick Ochoa via Gcc
Hi,

I have a pass that is able to speculate the target of indirect function
calls. This pass is an IPA_PASS. It:

1. generates summaries with the possible targets.
2. writes analysis summary
3. reads analysis summary
4. combines the results from multiple partitions and if needed fixes the
targets
5. writes opt summary
6. reads opt summary
7. calls ipa_make_edge_direct_to_target with speculative=true

This pass is successful in that I can observe the transformation in the
generated gimple. (A function test and a direct function call in one branch
and the indirect function call in the other.)  However, I would like to
make the edges in the call graph visible to other IPA passes, particularly
ipa-cp. For this, I would need to create virtual clones during the WPA
stage. I am a little unfamiliar with virtual clones. What kind of
information would I need to store in the analysis summary and is there a
way to create speculative virtual clones? Can someone point to a similar
piece of code in GCC where this happens?

Thanks!


Re: Question about speculative make_edge_direct_to_target during LTO/IPA_PASS

2022-07-07 Thread Erick Ochoa via Gcc
On Fri, 1 Jul 2022 at 14:48, Martin Jambor  wrote:

> Why so late, why not as part of number 4?
>

Hi Martin,

Thanks for the feedback. The original reason why the call to
make_edge_direct_to_target was done so late is because of the lack of
function bodies during WPA, summaries with insufficient information (I keep
track of only some information which when access to function bodies is
given during LTRANS I can finally have a cgraph_edge to use as a
parameter), and I am still figuring out things with edge summaries.

I have a couple of questions here to improve my summaries:

If I were to call make_edge_direct_to_target during WPA (or number 4) then
I need to have access to the indirect edge during WPA. This is because
make_edge_direct_to_target receives a cgraph_edge as a parameter. In order
to have access to the indirect edge during WPA, I need to generate
summaries which keep track of indirect edges.

I think this would be done with edge summaries. Again, my understanding of
edge summaries is incomplete. My understanding is that they contain
information to be transmitted along the edges of the call graph. However,
the implementation details are what is making it difficult for me to
understand. Unlike cgraph_nodes which can be encoded and decoded, it seems
to me that cgraph_edges have no encoding nor decoding. This makes it
somewhat hard to say that a piece of information belongs to a given edge.

Looking into ipa-cp / ipa-prop it looks like edge summaries are stored by
first encoding the caller cgraph_node and then iterating over the edges in
the node->callees and node->indirect_calls. However, I know that edges can
be removed. That's why there are edge removal hooks. Shouldn't this affect
the information that is written if an edge was removed somewhere else?
It is also not made explicit whether the order in node->callees and
node->indirect_calls is preserved from the moment the summary is written
to the moment it is read. Is this the case?

My understanding is also limited from the fact that ipa-cp does not have a
proper write_summary nor read_summary stages in its own class and instead
ipa-cp's summaries are written and read during ipa-prop. Is this an
implementation detail so that the edge information created for ipa-cp is
propagated correctly during inlining? (Or is there some other explanation
for this?) Would it also make sense to write and read my edge summaries at
the same time as ipa-cp's edge summaries? I know that the base class
(call_summary) also has virtual methods for removal and duplication of
edges (which should be triggered during inlining / cleaning up of the
callgraph). But somehow the virtual methods ::remove and ::duplicate seem
different to me from the callbacks which need to be registered in the
IPA_PASSes. Again, why would one need to write and read ipa-cp's summaries
during ipa-prop if these callbacks exist which are triggered when summaries
need to be updated?

Thanks!

-Erick


Creating a wrapper around a function at compile time

2022-07-14 Thread Erick Ochoa via Gcc
Hello,

I'm looking for some help in how to create a new function at compile time /
link time. The idea is an alternative form of constant propagation.

The current implementation of ipa-cp, may specialize functions for which
arguments may be known at compile time. Call graph edges from the caller to
the new specialized functions will replace the old call graph edges from
the caller to the original functions. Call graph edges which have no known
compile time constants will still point to the original unspecialized
function.

I would like to explore a different approach to function specialization.
Instead of only specializing functions which are guaranteed to have a
compile time constant, I would like to also attempt to specialize the edges
which do not have compile time constants with a parameter test. In other
words, for call graph edges with non-constant arguments at compile time,
create a wrapper function around the original function and do a switch
statement around parameters.

For example, let's say we have a function mul, which multiplies two
integers.

int
mul (int a, int b) {
  return a * b;
}

Function mul is called from three different callsites in the whole program:

A: mul (a, 2);
B: mul (b, 4);
C: mul (c, d);

At the moment, ipa-cp might specialize mul into 3 different versions:

// unoptimized original mul
int
mul (int a, int b) {
  return a * b;
}

// optimized for b = 2;
int
mul.constprop1 (int a) {
  // DEBUG b => 2
  return a << 1;
}

// optimized for b = 4;
int
mul.constprop2 (int a) {
  // DEBUG b => 4
  return a << 2;
}

and change the callsites to:

A: mul.constprop1 (a);
B: mul.constprop2 (b);
C: mul (c, d);

I would like instead to do the following:

Create a function mul_test_param

int
mul_test_param (int a, int b) {
  switch (b)
  {
case 2:
  return mul.constprop1 (a);
  break;
case 4:
  return mul.constprop2 (a);
  break;
default:
  return mul (a, b);
  break;
  }
}

The function mul_test_param will test each parameter and then call the
specialized function. The callsites can either be changed to:

A: mul.constprop1 (a);
B: mul.constprop2 (b);
C: mul_test_param (c, d);

or

A: mul_test_param (a, 2);
B: mul_test_param (b, 4);
C: mul_test_param (c, d);

The idea is that there exist some class of functions for which the
parameter test and the specialized version is less expensive than the
original function version. And if, at runtime, d might be a quasi-constant
with a good likelihood of being either 2 or 4, then it makes sense to have
this parameter test.

This is very similar to function tests for making direct to indirect
functions and to what could be done in value profiling.

I already know how to achieve most of this, but I have never created a
function from scratch. That is the bit that is challenging to me at the
moment. Any help is appreciated.

Thanks!

-Erick


Re: Question about speculative make_edge_direct_to_target during LTO/IPA_PASS

2022-07-14 Thread Erick Ochoa via Gcc
Hi Martin,

thanks a lot for your help! You were right!  I am now able to call
make_edge_direct_to_target during WPA.

-Erick


Re: Creating a wrapper around a function at compile time

2022-07-14 Thread Erick Ochoa via Gcc
Hi Richard,


>
> > So instead of wrapping the function why not transform the original
> function
> > to have a prologue doing a runtime check for the compile-time specialized
> > versions and perform tail-calls to them?
> >
> > What I'm missing is who would call mul_test_param in your case?
>

The idea is that either all callsites to mul call mul_test_param instead,
or only those for which there is no known compile-time constant. If we had
some more information about the distribution of the parameter values at
runtime, then we could even choose not to use the wrapper.


> Following your variant more closely would be doing value profiling
> of function parameters and then "speculative IPA-CP".
>

Yes, the idea of doing value profiling on function parameters certainly
fits this very well. I refrained from calling it "speculative IPA-CP" and
instead called it specialization with a parameter test but the idea I think
is the same. However, the main difference between "speculative IPA-CP" and
this idea is that (if I understand correctly how speculative IPA-CP works)
is that instead of changing the callsite C to the following version:

C: mul_test_param (c, d);

speculative IPA-CP will have the transformation

C: if (d == 2) mul.constprop1 (c)
else if (d == 4) mul.constprop2 (c)
else mul (c, d)

Which depending on how many non compile-time constant callsites there are,
will increase the size of the binary. That's why I prefer the wrapper
around the function. But this would be essentially inlining the wrapper.

As of now, the only "speculative IPA-CP" that I've seen is the speculation
on the target of the indirect edge. I could look into exactly how this
mechanism works to try to instead speculate on an arbitrary argument. Do
you think that would be a good first step instead of creating a wrapper and
replacing the edges to the wrapper?


Re: Creating a wrapper around a function at compile time

2022-07-14 Thread Erick Ochoa via Gcc
On Thu, 14 Jul 2022 at 15:51, Richard Biener 
wrote:

> With a value-profile it would be per call site and IPA-CP can use that
> to direct the cloning.  I'm not sure how many
> values a value histogram can track reliably here.
>

Last time I checked, value profiling can only track a single value per
statement. So that would need to be augmented if we wanted to specialize
for more than one parameter. However, this can also be a bad idea. Because
even if parameters a and b are quasi-constants, perhaps the cross product a
x b follows a more uniform distribution. This means that optimizing on a or
on b might be a good idea but optimizing on a x b is a bad idea. (There is
still some work to be done defining when specializing is a good idea or
not). Also, I do not believe that value profiling profiles arguments?
According to the comments only values regarding division and/or
indirect/virtual call specialization are tracked during value profiling.

However, even if value profiling would be a great addition to this, I would
like to explore the transformation independent of value profiling. There
are some heuristics that could be used, for example looking at the
callsites which do have constant arguments and which optimizations are
enabled by those constants.

You mention that: "IPA-CP can use that to direct the cloning". Can you
elaborate a bit further? I guess the idea here is augmenting ipa-cp with a
list of "likely values" and extend the infrastructure which deals with
speculation to support arguments. Am I following correctly? Any other
suggestions?


Re: Creating a wrapper around a function at compile time

2022-07-14 Thread Erick Ochoa via Gcc
On Thu, 14 Jul 2022 at 16:10, Martin Liška  wrote:

> On 7/14/22 16:08, Erick Ochoa via Gcc wrote:
> > Last time I checked, value profiling can only track a single value per
> > statement.
>
> Hi.
>
> Take a look at HIST_TYPE_INDIR_CALL which we use for tracking at maximum 32
> (#define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32) and we use for indirect call
> speculative calls which you can see for instance here:
>
> ./gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C
>

Thanks Martin,

I'll give it a read. However, I have mis-spoken. If my understanding is
correct: multiple values are tracked, but only the values of a single
variable/expression per statement are tracked. That means that for a gcall
(which is a single statement and) which has n argument expressions, I
believe that the naive way to track all argument expressions is not
possible without extending how histograms are associated to statements.
Perhaps canonicalizing how callsites work (i.e., allowing only variables
as arguments in call sites and then associating a histogram with the
definition of each variable used in a call site) would be enough, but I
haven't given much thought to the consequences that might follow from
this.


>
> Cheers,
> Martin
>


Re: Creating a wrapper around a function at compile time

2022-07-15 Thread Erick Ochoa via Gcc
Awesome! Thanks!

Let's go a little bit into the transformation mechanics itself. I am
somewhat familiar with LTO but I am always confused about the
transformations scheduled during WPA.

Let's assume that:

1. we have the profiling pass, which profiles each argument in each
callsite.
2. we are now running in the -fprofile-use so that means that we have
access to the value profiles for each of the arguments
3. there's an IPA/LTO pass which creates edge summaries storing "likely"
values for each callsite/argument pair
4. we have a heuristic that lets us decide to some extent which subset of
likely values will yield good results (heavily used and specialization is
greatly beneficial if found) and run it during WPA.

Then, what happens next? I know that the idea is to create some parameter
tests at the callsite and call the respective specialized version. The gap in my
understanding is that at this point I am assuming we are in WPA and
therefore have no function bodies to add these parameter tests. Would it be
sufficient to do the following:

1. Store in optimization (edge) summaries the (argument position x value x
specialized function version)
2. During LTRANS actually perform the transformation for the parameter test

Or is there some other way? I am still working on understanding how
ipa_make_edge_direct_to_target with speculative = true adds the edge and
function test during WPA. Otherwise, if we go with the two steps above, the
edges won't be visible to the rest of the IPA optimizations.

I'll be reading the cgraph_edge::make_speculative and the rest of the
infrastructure a bit more closely...

Any help or direction is greatly appreciated. Thanks!

-Erick