Question on special constraint variables in points-to analysis

Erick Ochoa via Gcc Tue, 16 Mar 2021 05:15:21 -0700

Hello,

I'm currently working on improving my understanding of the implementation
of the intraprocedural points-to analysis in GCC. I have already read the
papers by Daniel Berlin and have been looking at the source for some time,
but I understand that those references are a little bit old and might not
fully reflect the status of the current points-to analysis.


In order to familiarize myself with the analysis I decided to take a look
at the constraints dumped in the logs and input them into a different
solver which also implements a points-to analysis. I do this during GCC's
run time. In other words, before GCC computes the points-to analysis using
the regular intraprocedural points-to analysis, I take a look at the
constraint vector and translate them to an interface the other solver can
understand. Therefore, after this other solver finished, there are two sets
of solutions for the points-to analysis.

In order to verify the correctness of my analysis I simply iterate over
GCC's solution and place an assertion to verify that both points-to sets
for each variable are equal. The solution computed by my implementation
solves the constraints by abstracting the varinfo into integers by using
the varinfo id. And then the relationship between points-to variables are
expressed as variable A id points to variable B id. Since the variable info
ids are just offsets into the varinfo array, then there should be a one to
one mapping between the solutions in this solver to the solutions computed
by GCC.

Its implemented something like the following:

```
    for (auto output : points_to_analysis) {
       int from_i; // pointer variable index in get_varinfo(from_i)
       int to_i; // point-ee variable index in get_varinfo(toi)

       if(!get_varinfo(from_i)->may_have_pointers) continue;
       bitmap vars = get_varinfo(from_i)->solution;
       if (!vars) continue;

       if (dump_file) fprintf(dump_file, "%s -> %s\n",
get_varinfo(from_i)->name, get_varinfo(to_i)->name);

       bool same = bitmap_bit_p(vars, to_i);
       if (dump_file) fprintf(dump_file, "SAME = %s\n", same ? "true" :
"false");
       gcc_assert(same);

    }
```

This means that if the two analyses were equal, all test cases testing
-ftree-pta should succeed. At the moment, the assertion is being triggered.
Now, I would like to understand why the assertion is triggered without
asking you to debug my code, so instead I'm trying to understand why GCC's
solution is correct. (I gave all this exposition because I may be making
assumptions that are incorrect and that can quickly be verified by someone
more knowledgeable.) Now, I'll focus on the test case:

The code in question is a slightly modified version of the test case
ipa-pta-16.c
```
struct X
{
  long l1;
  struct Y
    {
      long l2;
      int *p;
    } y;
};
int i;
static int __attribute__((noinline))
foo (struct X *x)
{
  struct Y y = x->y;
  *y.p = 0;
  i = 1;
  return *y.p;
}
extern void abort (void);
int main()
{
  struct X x;
  struct X t;
  x.y.p = &i;
  long* e = &t.l1;
  if (foo(&x) != 1)
    abort ();
  return e;
}
```

and is being compiled with the following flags:

gcc -O2 -ftree-pta -ftree-pta-external-solver -fdump-tree-alias
-ftree-salias ipa-pta-16.c

I would like to focus on the foo function, which is translated to the
following gimple:
```
__attribute__((noinline))
int foo.constprop.isra (int * ISRA.4)
{
  int * y$p;
  struct X * x;
  struct X * x;
  int _2;

  <bb 2> [local count: 1073741824]:
  *ISRA.4_3(D) = 0;
  i = 1;
  _2 = *ISRA.4_3(D);
  return _2;

}
```

GCC generates the following constraints (as seen on the dump):

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
i = NONLOCAL
i = &NONLOCAL
ESCAPED = &NONLOCAL
_2 = *ISRA.4
```

The points-to set generated by GCC are the following:

```
ANYTHING = { ANYTHING }
ESCAPED = { NULL ESCAPED NONLOCAL }
NONLOCAL = { ESCAPED NONLOCAL } same as i
STOREDANYTHING = { }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
i = { ESCAPED NONLOCAL }
_2 = { ESCAPED NONLOCAL }
foo.constprop.0.isra.0 = { }
```

I also noticed that there is more information about ISRA.4 which is not
printed in the points-to sets printed above. Instead, the dump continues
and adds this line:

```
Flow-insensitive points-to information

ISRA.4_3(D), points-to non-local, points-to NULL, points-to vars: { }
```

Now, I understand that the first set of solutions (i.e, `{ NONLOCAL }`) is
the solution of the constraint variable. That is, the constraint variable
ISRA.4 when assigned the set `{ NONLOCAL }` is a solution to the
constraints... I also understand that ISRA.4_3(D) is the SSA variable on
the gimple code. There is technically a mapping between the two, but I
guess there's some distinction. I also understand that the first dump is
generated from the function dump_solution_for_var and just looks for a set
bit.

```
  EXECUTE_IF_SET_IN_BITMAP (vi->solution, 0, i, bi)
    fprintf (file, "%s ", get_varinfo (i)->name);
```

The second dump is generated from dump_points_to_solution
```
  if (pt->null)
    fprintf (file, ", points-to NULL");
```

which looks into the points to solution struct.

My question here is, why is NULL not in the solution in the bitmap but
instead is in this special field? Probably for optimizations somehow. But,
wouldn't removing this bit from the constraint variable could somehow
affect what solution gets propagated to other variables?

But let's go with the idea that ISRA.4 does not point to NULL as given by
the points-to set dumped by GCC. Is this correct? We can try to compute the
analysis by hand by solving the constraints according to Anderson's
points-to analysis... I think we get that NONLOCAL points to NULL (which is
not in the dump).

derefaddrtmp(9) = &NULL
ISRA.4 = &NONLOCAL
*ISRA.4 = derefaddrtmp(9)

So, how I understand it, there are 4 nodes in this subgraph of the
constraint graph and the first two facts gives us the following relations:

1. derefaddrtmp(9) points to NULL
2. ISRA.4 points to NONLOCAL

The third constraint says to propagate the subset constraint to all
variables which are in the ISRA.4 set. So effectively we are creating a new
edge from NONLOCAL to deref that says that whatever deref is pointing to,
now NONLOCAL is pointing to as well. Basically, NONLOCAL now points to NULL.

But as I mentioned, the NULL does not appear on the first dump for the
variable NONLOCAL.

```
NONLOCAL = { ESCAPED NONLOCAL } same as i
```

I think all of this is not a bug, it is just some conventions that GCC
makes because NULL is a special variable, but:

* I would like some reassurance that this is indeed the case of treating
some constraint variables in a special way
* Some guidance on how to extend my solver to also treat the special
variables in the same way (in order to generate exactly the same solution).
* Some idea of what else I can expect to be handled somewhat differently
that I cannot foresee at the moment.

Thanks!

Question on special constraint variables in points-to analysis

Reply via email to