Failing in generated file options.c
Hi Gary, the options.c file is generated from the gcc/common.opt file. Report was a keyword that could be given to optimization options but which was dropped sometime in December, I think. The patches I sent you should have the Report keyword dropped. Have you already applied all the patches to your sources? If not, are you applying only a subset of them? If you look at the patch numbered 9, you will see that the Report keyword was removed. If you are only applying patches before the 9th, then yes, I would expect that error to trigger. -Erick
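For reference, the change amounts to something like the following in gcc/common.opt (quoting from memory, so the exact option and help text may differ; the point is only that the Report keyword used to appear on the flags line of an option record):

```
 ftree-pta
-Common Report Var(flag_tree_pta) Optimization
+Common Var(flag_tree_pta) Optimization
 Perform function-local points-to analysis on trees.
```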
Question on special constraint variables in points-to analysis
Hello, I'm currently working on improving my understanding of the implementation of the intraprocedural points-to analysis in GCC. I have already read the papers by Daniel Berlin and have been looking at the source for some time, but I understand that those references are a little old and might not fully reflect the state of the current points-to analysis.

In order to familiarize myself with the analysis I decided to take the constraints dumped in the logs and feed them into a different solver which also implements a points-to analysis. I do this during GCC's run time. In other words, before GCC computes the points-to sets using the regular intraprocedural points-to analysis, I take the constraint vector and translate it into an interface the other solver can understand. After this other solver finishes, there are therefore two sets of solutions for the points-to analysis. To verify the correctness of my analysis I simply iterate over GCC's solution and place an assertion to verify that the points-to sets for each variable are equal.

The solution computed by my implementation solves the constraints by abstracting each varinfo into an integer, using the varinfo id. The relationship between points-to variables is then expressed as "variable A's id points to variable B's id". Since the varinfo ids are just offsets into the varinfo array, there should be a one-to-one mapping between the solutions in this solver and the solutions computed by GCC. It is implemented something like the following:

```
for (auto output : points_to_analysis)
{
  int from_i; // pointer variable index in get_varinfo(from_i)
  int to_i;   // pointee variable index in get_varinfo(to_i)

  if (!get_varinfo(from_i)->may_have_pointers)
    continue;

  bitmap vars = get_varinfo(from_i)->solution;
  if (!vars)
    continue;

  if (dump_file)
    fprintf(dump_file, "%s -> %s\n",
            get_varinfo(from_i)->name, get_varinfo(to_i)->name);

  bool same = bitmap_bit_p(vars, to_i);
  if (dump_file)
    fprintf(dump_file, "SAME = %s\n", same ? "true" : "false");
  gcc_assert(same);
}
```

This means that if the two analyses were equal, all test cases testing -ftree-pta should succeed. At the moment, the assertion is being triggered. Now, I would like to understand why the assertion is triggered without asking you to debug my code, so instead I'm trying to understand why GCC's solution is correct. (I gave all this exposition because I may be making assumptions that are incorrect and that can quickly be verified by someone more knowledgeable.)
Now I'll focus on the test case. The code in question is a slightly modified version of the test case ipa-pta-16.c:

```
struct X
{
  long l1;
  struct Y
  {
    long l2;
    int *p;
  } y;
};
int i;
static int __attribute__((noinline))
foo (struct X *x)
{
  struct Y y = x->y;
  *y.p = 0;
  i = 1;
  return *y.p;
}
extern void abort (void);
int main()
{
  struct X x;
  struct X t;
  x.y.p = &i;
  long* e = &t.l1;
  if (foo(&x) != 1)
    abort ();
  return e;
}
```

and it is being compiled with the following flags:

gcc -O2 -ftree-pta -ftree-pta-external-solver -fdump-tree-alias -ftree-salias ipa-pta-16.c

I would like to focus on the foo function, which is translated to the following gimple:

```
__attribute__((noinline))
int foo.constprop.isra (int * ISRA.4)
{
  int * y$p;
  struct X * x;
  struct X * x;
  int _2;

  [local count: 1073741824]:
  *ISRA.4_3(D) = 0;
  i = 1;
  _2 = *ISRA.4_3(D);
  return _2;
}
```

GCC generates the following constraints (as seen in the dump):

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
i = NONLOCAL
i = &NONLOCAL
ESCAPED = &NONLOCAL
_2 = *ISRA.4
```

The points-to sets generated by GCC are the following:

```
ANYTHING = { ANYTHING }
ESCAPED = { NULL ESCAPED NONLOCAL }
NONLOCAL = { ESCAPED NONLOCAL } same as i
STOREDANYTHING = { }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
i = { ESCAPED NONLOCAL }
_2 = { ESCAPED NONLOCAL }
foo.constprop.0.isra.0 = { }
```

I also noticed that there is more information about ISRA.4 which is not printed in the points-to sets above. Instead, the dump continues and adds this line:

```
Flow-insensitive points-to information

ISRA.4_3(D), points-to non-local, points-to NULL, points-to vars: { }
```

Now, I understand that the first set of solutions (i.e., `{ NONLOCAL }`) is the solution of the constraint variable. That is, the constraint variable ISRA.4, when assigned the set `{ NONLOCAL }`, is a solution to the constraints... I also understand that ISRA.4_3(D) is the SSA variable in the gimple code. There is technically a mapping between the two, but I guess there's some distinction. I also understand that the first dump is generated from the function dump_so
More questions on points-to analysis
Hello, I'm still trying to compare the solution generated by the intraprocedural points-to analysis in GCC against an external solver. Yesterday it was pointed out that "NULL is not conservatively correctly represented in the constraints". Can someone expand on this? To me this sounds like it could mean a couple of things:

* even though malloc may return NULL, NULL is not added to the points-to set of whatever variable is on the left-hand side of the malloc call.
* the process in GCC that generates the constraints for NULL somehow does not generate enough constraints to treat NULL conservatively, and therefore there might be points-to sets which should contain NULL but don't. (However, doesn't this mean that feeding the constraints to an external solver should still give the same answers?)
* the process in GCC that generates the constraints for NULL works fine (i.e., feeding the constraints generated by GCC to an external solver should yield a conservatively correct answer) but the process that solves the constraints relaxes the solutions for the NULL constraint variable (i.e., GCC has deviated from the constraint solving algorithm somehow).

Also, "at some point we decided to encode optimistic info into pt->null which means points-to now has to compute a conservatively correct pt->null." Doesn't this contradict itself? How can pt->null be optimistic first and conservative now? Is what this is trying to say that:

* NULL constraints were conservative first
* pt->null was optimistic first
* then the conversion to SSA happened and NULL constraints stopped being conservatively represented in the constraints (effectively becoming somewhat optimistic)
* to avoid NULL and pt->null both being unsafe, pt->null was changed to be conservative

I've been looking at find_what_var_points_to and have changed the code which verifies the constraint points-to sets. Basically, I now find which variables have been collapsed and only for "real" constraint pointer variables do I look at the points-to solution struct. Before looking into vars, I look at the fields and compare null, anything, escaped, etc. against the id of the pointee variable. Checking vars is slightly confusing for me at the moment, since it appears that there are at least 3 plausible ways of validating the solution (I haven't actually gotten there because assertions are being triggered).

```
for (auto &output : *orel)
{
  int from_i;
  int to_i;

  // Since find_what_var_points_to doesn't change the solution for
  // collapsed variables, only verify the answer for the real ones.
  varinfo_t from_var = get_varinfo(from_i);
  varinfo_t vi = get_varinfo (find (from_i));
  if (from_var->id != vi->id)
    continue;

  if (!from_var->may_have_pointers)
    continue;

  // compute the pt_solution
  pt_solution solution = find_what_var_points_to (cfun->decl, from_var);

  // pointee variable according to the external analysis
  varinfo_t vi_to = get_varinfo(to_i);

  // Since some artificial variables are stored in fields instead of the
  // bitset, assert based on field values.  However, you can see that I
  // already had to disable some of the assertions.
  if (vi_to->is_artificial_var)
  {
    if (vi_to->id == nothing_id)
    {
      gcc_assert(solution.null && vi_to->id == nothing_id);
      continue;
    }
    else if (vi_to->id == escaped_id)
    {
      if (in_ipa_mode)
      {
        gcc_assert(solution.ipa_escaped && vi_to->id == escaped_id);
      }
      else
      {
        //gcc_assert(solution.escaped && vi_to->id == escaped_id);
      }
      continue;
      /* Expand some special vars of ESCAPED in-place here. ??*/
    }
    // More...
  }

  if (solution.anything)
    continue;

  bitmap vars = solution.vars;
  if (!vars)
    continue;

  if (dump_file)
    fprintf(dump_file, "SAME = %s\n",
            bitmap_bit_p(vars, DECL_PT_UID(vi_to->decl)) ? "true" : "false");
  if (dump_file)
    fprintf(dump_file, "SAME2 = %s\n",
            bitmap_bit_p(vars, to_i) ? "true" : "false");
  if (dump_file)
    fprintf(dump_file, "SAME3 = %s\n",
            bitmap_bit_p(from_var->solution, to_i) ? "true" : "false");
}
```

Can someone help me figure out why, even though I have a "real" variable and I compute its solution with find_what_var_points_to, the solution does not have the fields that I expect to be set? (I would expect solution.escaped to be set if the pointee variable vi_to has id == escaped_id.) And also, how is DECL_PT_UID different from the varinfo id field? Shouldn't they be the same? It seems that during find_what_var_points_to DECL_PT_UID is used to set the bits in the bitmap, but in previous instances it was the varinfo id offset?
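For what it is worth, my current reading of how find_what_var_points_to translates the solution bitmap is roughly the following paraphrase (written from memory, not copied from the source, so it may well be wrong), which is also why I am no longer sure whether checking bits by varinfo id (SAME2/SAME3 above) can ever be right:

```
/* Rough paraphrase of my understanding, NOT the actual GCC source:
   artificial variables become flags in the pt_solution, while real
   decls are entered into pt->vars keyed by DECL_PT_UID rather than
   by the varinfo id.  */
unsigned i;
bitmap_iterator bi;
EXECUTE_IF_SET_IN_BITMAP (vi->solution, 0, i, bi)
  {
    varinfo_t v = get_varinfo (i);
    if (v->is_artificial_var)
      {
        if (v->id == nothing_id)
          pt->null = 1;
        else if (v->id == escaped_id)
          pt->escaped = 1;      /* or ipa_escaped in IPA mode */
        else if (v->id == nonlocal_id)
          pt->nonlocal = 1;
        else if (v->id == anything_id)
          pt->anything = 1;
      }
    else
      bitmap_set_bit (pt->vars, DECL_PT_UID (v->decl));
  }
```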
Re: More questions on points-to analysis
Hi Richard, I think I misunderstood yesterday's answer and deviated a little bit. But now I want to focus on this:

> > * the process in GCC that generates the constraints for NULL works
> > fine (i.e., feeding the constraints generated by GCC to an external
> > solver should yield a conservatively correct answer) but the process
> > that solves the constraints relaxes the solutions for the NULL
> > constraint variable (i.e., GCC has deviated from the constraint
> > solving algorithm somehow)
>
> No, that part should work OK.

So, let's ignore the other solver for now and instead focus on the concrete example I presented in the previous email. If GCC is solving these constraints:

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
i = NONLOCAL
i = &NONLOCAL
ESCAPED = &NONLOCAL
_2 = *ISRA.4
```

what would a hand-calculated solution give us versus the solution given by GCC?

I am following the algorithm stated in Section 3.3 of "Structure Aliasing in GCC", and I will be ignoring the ESCAPED = ESCAPED + UNKNOWN constraint since there isn't any other field offset that needs to be calculated.

First, I want to make some adjustments. I am going to be using "=" to signify the \supseteq symbol and I will be adding curly braces to denote individual elements of a set as opposed to a whole set. The constraints therefore become (ordered slightly differently):

```
== direct constraints ==
ANYTHING = { ANYTHING }
ESCAPED = { NONLOCAL }
NONLOCAL = { NONLOCAL }
NONLOCAL = { ESCAPED }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
i = { NONLOCAL }

== complex constraints ==
ESCAPED = *ESCAPED
*ESCAPED = NONLOCAL
*ISRA.4 = derefaddrtmp(9)
_2 = *ISRA.4

== copy constraints ==
ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think it will matter...
i = NONLOCAL
```

The solution sets are basically the direct constraints at the moment.

Let's now create the graph:

1. node ESCAPED has an edge going to itself (due to the copy constraint)
2. node ISRA.4 has no outgoing copy edges
3. node derefaddrtmp(9) has no outgoing edges
4. node _2 has no outgoing edges
5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
6. node NONLOCAL has no outgoing edge

Now, we can iterate over this set of nodes.

1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no edges, but it has complex constraints. Let's modify the graph.
   1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy edge from ESCAPED to NONLOCAL.
   2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy edge from NONLOCAL to NONLOCAL.

The graph is now transformed to:

1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
2. node ISRA.4 has no outgoing copy edges
3. node derefaddrtmp(9) has no outgoing edges
4. node _2 has no outgoing edges
5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
6. node NONLOCAL has an edge going to NONLOCAL

The solution set of ESCAPED is now Sol(ESCAPED) = Sol(ESCAPED) U Sol(NONLOCAL) = {NONLOCAL, ESCAPED}.

Now we continue walking.

2. Walking over node ISRA.4. It has the solution set { NONLOCAL }. There are no edges, but it has complex constraints. Let's modify the graph.
   1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add a copy edge from NONLOCAL to derefaddrtmp(9).

The graph is now transformed to:

1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
2. node ISRA.4 has no outgoing copy edges
3. node derefaddrtmp(9) has no outgoing edges
4. node _2 has no outgoing edges
5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)

Now Sol(NONLOCAL) = Sol(NONLOCAL) U Sol(derefaddrtmp(9)) = {NONLOCAL, ESCAPED, NULL}.

I could continue, but here is already something that is not shown in the points-to sets in the dump. The hand calculation shows that

NONLOCAL = {NONLOCAL, ESCAPED, NULL}

Looking at the data that I showed yesterday:

```
NONLOCAL = { ESCAPED NONLOCAL } same as i
```

we see that NULL is not in the solution set of NONLOCAL.

Now, yesterday you said that NULL is not conservatively correctly represented in the constraints. You also said today that the points-to analysis should be solving the constraints fine. What I now understand from this is that while NULL may be pointed to by some constraints, a pointer not having NULL in its set does not mean that it cannot point to NULL. However, shouldn't NULL still show up in the dumps where the points-to sets of the constraint variables are printed, since they are solved using the same analysis? Is this correct? Am I doing the points-to analysis by hand wrong somehow? Why would NULL not be in Sol(NONLOCAL) if GCC is solving the same constraints that I am solving by hand?
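To make the hand calculation concrete, here is a tiny stand-alone sketch of the naive inclusion-based algorithm over exactly these constraints (my own toy code, completely outside GCC, ignoring GCC's special handling of ESCAPED/NONLOCAL and the + UNKNOWN offset). Under those simplifying assumptions it reaches the same fixed point as the calculation above, with NULL ending up in Sol(NONLOCAL); I assume the special-variable handling inside GCC is what accounts for the difference in the dump:

```
// Naive inclusion-based solver for the constraints above (illustration only).
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Set = std::set<std::string>;

int main()
{
  std::map<std::string, Set> sol;
  // address-of constraints: x = &y  ==>  y in Sol(x)
  std::vector<std::pair<std::string, std::string>> addr = {
    {"ANYTHING", "ANYTHING"}, {"NONLOCAL", "NONLOCAL"}, {"NONLOCAL", "ESCAPED"},
    {"INTEGER", "ANYTHING"},  {"ISRA.4", "NONLOCAL"},   {"derefaddrtmp", "NULL"},
    {"i", "NONLOCAL"},        {"ESCAPED", "NONLOCAL"}};
  // copy constraints: x = y        ==>  Sol(x) >= Sol(y)
  std::vector<std::pair<std::string, std::string>> copy = {
    {"ESCAPED", "ESCAPED"}, {"i", "NONLOCAL"}};
  // load constraints: x = *y       store constraints: *x = y
  std::vector<std::pair<std::string, std::string>> load = {
    {"ESCAPED", "ESCAPED"}, {"_2", "ISRA.4"}};
  std::vector<std::pair<std::string, std::string>> store = {
    {"ESCAPED", "NONLOCAL"}, {"ISRA.4", "derefaddrtmp"}};

  for (auto &a : addr)
    sol[a.first].insert(a.second);

  bool changed = true;
  auto merge = [&](const std::string &dst, const std::string &src) {
    for (auto &v : sol[src])
      changed |= sol[dst].insert(v).second;
  };
  while (changed)
    {
      changed = false;
      for (auto &c : copy)
        merge(c.first, c.second);
      // x = *y: for every pointee p of y, Sol(x) >= Sol(p)
      for (auto &l : load)
        for (auto p : Set(sol[l.second]))
          merge(l.first, p);
      // *x = y: for every pointee p of x, Sol(p) >= Sol(y)
      for (auto &s : store)
        for (auto p : Set(sol[s.first]))
          merge(p, s.second);
    }

  for (auto &e : sol)
    {
      std::cout << e.first << " = {";
      for (auto &v : e.second)
        std::cout << ' ' << v;
      std::cout << " }\n";
    }
}
```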
Re: More questions on points-to analysis
Mm... ignore this for now please, I think I messed up the analysis by hand. I will try again. Thanks!

On Wed, 17 Mar 2021 at 16:16, Erick Ochoa wrote:
> [quoted text of the previous message snipped]
Re: More questions on points-to analysis
No, I think it is correct. Any help would be clearly appreciated. Thanks! (Even if I'm wrong, of course I'd like to know.)

On Wed, 17 Mar 2021 at 17:21, Erick Ochoa wrote:
> Mm... ignore this for now please, I think I messed up the analysis by hand. I will try again. Thanks!
>
> [rest of the quoted thread snipped]
Re: More questions on points-to analysis
Perfect, thank you! This is exactly the answer that I was looking for.

On Thu, 18 Mar 2021 at 13:27, Richard Biener wrote:
> [quoted thread snipped]
Re: More questions on points-to analysis
Hello Richard, I have a related question. I have the following program (which is again an altered version of ipa-pta-16.c):

```
struct X
{
  long l1;
  struct Y
  {
    long l2;
    int *p;
  } y;
};
int i;
static int __attribute__((noinline))
foo (struct X *x)
{
  struct Y y = x->y;
  *y.p = 0;
  //i = 1;
  return *y.p;
}
extern void abort (void);
int main()
{
  struct X x;
  struct X t;
  x.y.p = &i;
  long* e = &t.l1;
  if (foo(&x) != 1)
    abort ();
  return e;
}
```

I am compiling it with

gcc -O2 -ftree-pta -fdump-tree-alias -ftree-salias ipa-pta-16.c

and I get the following constraints for function foo:

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
```

I am just reordering them to make this easier:

```
ANYTHING = &ANYTHING
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
ESCAPED = ESCAPED + UNKNOWN
ESCAPED = *ESCAPED
*ESCAPED = NONLOCAL
*ISRA.4 = derefaddrtmp(9)
```

I now construct a constraint graph. It is basically a collection of unconnected nodes, except for ESCAPED which points to itself. I then walk over this graph and first encounter the constraint "ESCAPED = ESCAPED + UNKNOWN". Since Sol(ESCAPED) is the empty set, nothing happens. Similarly for the constraints "ESCAPED = *ESCAPED" and "*ESCAPED = NONLOCAL". I then walk to "*ISRA.4 = derefaddrtmp(9)". Sol(ISRA.4) = { NONLOCAL }. Then there's possibly an edge made from NONLOCAL to derefaddrtmp. (I am not sure of the implementation; since NONLOCAL doesn't change it might not require an added edge.) And because NONLOCAL is a special variable, Sol(NONLOCAL) does not change.

Now, none of the solutions have changed. We have possibly added an edge to NONLOCAL, so let's just process the graph once more. Again Sol(ESCAPED) is still empty, so "ESCAPED = ESCAPED + UNKNOWN", "ESCAPED = *ESCAPED", and "*ESCAPED = NONLOCAL" should have no effect. Then "*ISRA.4 = derefaddrtmp(9)" also has no effect since nothing can happen.

My question here is: why, if we look at the points-to sets dumped by GCC, do we see ESCAPED = { NULL }? Here are the full points-to sets computed for foo:

```
ANYTHING = { ANYTHING }
ESCAPED = { NULL }
NONLOCAL = { ESCAPED NONLOCAL }
STOREDANYTHING = { }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
foo.constprop.0.isra.0 = { }
```

I think this might have to do with "+ UNKNOWN", but I am unsure about its semantics in the constraint set. Thanks, any help is appreciated!

On Thu, 18 Mar 2021 at 13:36, Erick Ochoa wrote:
> [quoted thread snipped]
Question about reading LTO function summaries
Hello,

I already have some experience developing SIMPLE_IPA_PASSes, but I am looking to understand IPA_PASSes better. I have made a hello world IPA pass that stores "hello world $FUNCTION_NAME" in the function summaries; however, I am having trouble reading this information back. Can someone help me understand how to use these interfaces correctly?

At the moment, it **seems** to be writing information correctly. (I.e., it doesn't segfault when attempting to write data.) However, in my read summary function (ipa_hello_world_read_summary (void)) the function `lto_get_summary_section_data (file_data, LTO_section_ipa_hello_world, &len);` always returns NULL and `file_data_vec` is of size 1. This means that at run time, there is only one call to `lto_get_summary_section_data` and it returns NULL.

I am copy-pasting the relevant code below. (Note that the angle brackets of template arguments were eaten by the mail; the map is from cgraph_node* to the string.)

/* Map from cgraph_node* to "hello world $FUNCTION_NAME".  */
static hash_map<cgraph_node *, const char *> *hello_summaries;

static void
ipa_hello_world_write_summary (void)
{
  gcc_assert (hello_summaries);
  struct output_block *ob = create_output_block (LTO_section_ipa_hello_world);
  gcc_assert (ob);
  if (dump_file) fprintf (dump_file, "hello world from write summary\n");
  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
  if (dump_file) fprintf (dump_file, "writing %d\n", hello_summaries->elements ());
  streamer_write_uhwi (ob, hello_summaries->elements ());

  for (auto i = hello_summaries->begin (), e = hello_summaries->end (); i != e; ++i)
    {
      if (dump_file) fprintf (dump_file, "writing %s\n", (*i).second);
      streamer_write_uhwi (ob, lto_symtab_encoder_encode (encoder, (*i).first));
      streamer_write_string (ob, ob->main_stream, (*i).second, true);
    }
  produce_asm (ob, NULL);
  destroy_output_block (ob);
  delete hello_summaries;
}

static void
ipa_hello_world_read_summary (void)
{
  struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
  struct lto_file_decl_data *file_data;
  unsigned int j = 0;
  if (dump_file) fprintf (dump_file, "hello from read summary\n");
  while ((file_data = file_data_vec[j++]))
    {
      if (dump_file) fprintf (dump_file, "iteration = %d\n", j);
      size_t len;
      const char *data
        = lto_get_summary_section_data (file_data, LTO_section_ipa_hello_world, &len);
      if (!data)
        continue;

      const struct lto_function_header *header
        = (const struct lto_function_header *) data;
      gcc_assert (header);
      gcc_assert (header->cfg_size);
      const int cfg_offset = sizeof (struct lto_function_header);
      const int main_offset = cfg_offset + header->cfg_size;
      const int string_offset = main_offset + header->main_size;
      class data_in *data_in;

      lto_input_block ib ((const char *) data + main_offset, header->main_size,
                          file_data->mode_table);
      data_in
        = lto_data_in_create (file_data, (const char *) data + string_offset,
                              header->string_size, vNULL);
      unsigned int n = streamer_read_uhwi (&ib);
      //hello_summaries = new hash_map<cgraph_node *, const char *>;
      for (unsigned i = 0; i < n; i++)
        {
          unsigned int index = streamer_read_uhwi (&ib);
          lto_symtab_encoder_t encoder = file_data->symtab_node_encoder;
          struct cgraph_node *cnode
            = dyn_cast <cgraph_node *> (lto_symtab_encoder_deref (encoder, index));
          gcc_assert (cnode);
          const char *string = streamer_read_string (data_in, &ib);
          if (dump_file) fprintf (dump_file, string);
        }
    }
}
Re: Question about reading LTO function summaries
Hello, just trying again to increase visibility of this question. Many thanks in advance!

On Fri, 26 Mar 2021 at 13:49, Erick Ochoa wrote:
> [original message quoted in full, snipped]
Question about points-to analysis, global variables, IPA, and field sensitivity
Hi, I am looking at the points-to analysis in GCC and I found the following comment in tree-ssa-structalias.c:

  /* Collect field information.  */
  if (use_field_sensitive
      && var_can_have_subvars (decl)
      /* ???  Force us to not use subfields for globals in IPA mode.
         Else we'd have to parse arbitrary initializers.  */
      && !(in_ipa_mode
           && is_global_var (decl)))

From what I understand, the points-to analysis explicitly does not use field sensitivity for global variables when in IPA mode. I am wondering:

0) Is "initializers" here talking about brace initializers? Or are "initializers" a little more abstract, referring to anything that can initialize a field, like struct->field = function()? (See the sketch after this list for the case I have in mind.)
1) How much work would it be to parse initializers?
2) How would global structs which are not initialized using initializers be handled? Would all fields get assigned NULL at the global variable level, but then as other fields get assigned we just create new constraints with the correct offsets?

Thanks!
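To make question 0 concrete, this is the kind of brace initializer I have in mind (my own toy example, not taken from the GCC testsuite):

```
struct S { int *p; int *q; };
int a, b;

/* Statically initialized global: with field sensitivity the analysis
   would have to derive g.p = &a and g.q = &b from this initializer.  */
struct S g = { &a, &b };

/* No explicit initializer: the fields are implicitly zero, i.e. NULL,
   until some statement assigns them (the situation in question 2).  */
struct S h;
```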
Re: Question about points-to analysis, global variables, IPA, and field sensitivity
> If the global is module local we should initialize it with NULL, yes. If it
> is not module local it should be initialized with NONLOCAL (that's both what
> should currently happen correctly - it's needed for non-field-sensitive init
> as well).

Awesome, thanks Richard! One more question: in the context of LTO, does "module local" mean the current partition? I understand that IPA-PTA is a late SIMPLE_IPA_PASS, but my understanding of the consequences of this is a little fuzzy. I think that IPA-PTA also runs in multiple partitions (unless a flag like -flto-partition=none is used), but there might be some issue with reduced precision. Perhaps this reduced precision comes from NONLOCAL constraints with symbols that cross partitions? (This is just speculation on my part; as I mentioned, my understanding is a little bit fuzzy.)
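Just so we are talking about the same thing, here is the toy distinction I have in mind (my own example; whether "module" means a single translation unit, the whole LTO program, or one LTRANS partition is exactly the part I am unsure about):

```
/* Visible only in this translation unit: per the answer above, its
   initializer is fully known here, so starting it at NULL would be safe.  */
static int *local_ptr;

/* Externally visible: other code may have initialized or stored to it,
   so it would have to start at NONLOCAL.  */
int *extern_ptr;
```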
Re: Question about reading LTO function summaries
Hello, just trying one more time. Any direction or hint is appreciated.

Just as a note, I looked at ipa-devirt for inspiration for these small functions I wrote, but for some reason I can't read back what I assume I have written to the summaries. (What I'm trying to do is simply use LTO with a real IPA_PASS as opposed to a SIMPLE_IPA_PASS, and therefore I just want to write some simple summaries and read them back.) I have wondered whether `bitpack_d bp = bitpack_create (ob->main_stream);` makes any difference, but when I tried it that didn't seem to be the case.

Many thanks in advance!

On Tue, 30 Mar 2021 at 10:18, Erick Ochoa wrote:
> [quoted thread snipped]
Question on changing IPA-PTA to an IPA_PASS
Hi, just a high-level question. I know that IPA-PTA is a SIMPLE_IPA_PASS and that ideally it would be better as an IPA_PASS. I understand that one of the biggest challenges of changing IPA-PTA to an IPA_PASS is that in the current LTO framework the LTRANS stage can happen at the same time for multiple transformations. This means (I think) that if IPA-PTA computed the points-to sets during WPA, during LTRANS there can be new variables, or modified function bodies, that no longer correspond to the computed IPA-PTA solution.

However, it seems that there's a simpler way to change IPA-PTA to an IPA_PASS: just introducing another section of "all_regular_ipa_passes". Something like (not sure if the name inside the brackets has to be unique):

INSERT_PASSES_AFTER (all_regular_ipa_passes)
// normal stuff
TERMINATE_PASS_LIST (all_regular_ipa_passes)

INSERT_PASSES_AFTER (all_regular_ipa_passes)
NEXT_PASS (pass_ipa_pta);
TERMINATE_PASS_LIST (all_regular_ipa_passes)

I understand that this still singles out IPA-PTA as being "special", but I am wondering if this is something that has been discussed previously. The changes in the implementation of IPA-PTA should be something like:

1. LGEN: store in the summaries the constraint variables that GCC finds by inspecting gimple
2. WPA: actually solve the constraints and store only the solutions to the constraint variables in another summary
3. LTRANS: read from the summary and pass the information to gimple

(A sketch of how these three steps map onto the existing IPA_PASS hooks follows below.)

The advantage I see to this is that the precision of IPA-PTA would be equal to using -flto-partition=none, but without actually using that flag, and possibly with some compile-time speedup compared to -flto-partition=none because of parallel reading and writing of summaries. Again, just an idea; I just want to understand whether this has been discussed before and why it is or isn't something that you would want.
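For reference, this is roughly how I picture those three steps mapping onto the hooks that ipa_opt_pass_d already provides; the hook order follows my reading of tree-pass.h, while the pass name, the pta_* helpers, and the bodies are made up:

```
/* Sketch only -- names and bodies are hypothetical.  */
class pass_ipa_pta_wpa : public ipa_opt_pass_d
{
public:
  pass_ipa_pta_wpa (gcc::context *ctxt)
    : ipa_opt_pass_d (pass_data_ipa_pta_wpa, ctxt,
                      pta_generate_summary,      /* LGEN: record constraints */
                      pta_write_summary,         /* LGEN: stream them out */
                      pta_read_summary,          /* WPA: read all constraints */
                      pta_write_opt_summary,     /* WPA: stream the solved sets */
                      pta_read_opt_summary,      /* LTRANS: read solutions back */
                      NULL,                      /* stmt_fixup */
                      0,                         /* function_transform_todo_flags_start */
                      pta_function_transform,    /* LTRANS: attach pt info to the IL */
                      NULL)                      /* variable_transform */
  {}

  /* WPA: solve the whole-program constraint system here.  */
  virtual unsigned execute (function *) { return 0; }
};
```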
Re: Question on changing IPA-PTA to an IPA_PASS
> I don't think this would remove any problem that is present.

I have a problem understanding what you mean here, because later on you state:

> Now - the reason you think of is likely that IPA transform will instantiate
> IPA clones and do inlining and transfering the IPA PTA solution to the
> such transformed IL is at best complicated. That's true as well, of course,
> in fact IPA inlining and cloning will improve context sensitivity (by
> duplicating).

So, I think you are agreeing with me that transferring the IPA PTA solution to the transformed IL is a big complication. So, going back to your previous statement "I don't think this would remove any problem that is present", does this mean that:

1. this will not solve the "transfering the IPA PTA solution to the such transformed IL" problem?
2. or is "transfering the IPA PTA solution to the such transformed IL" not a present problem?

Sorry, just trying to understand what you are fully saying. Just to be more explicit, I believe (and might be wrong) that placing a new section of passes after the IPA passes would materialize all the new IL, and therefore when creating a new section, IPA-PTA would only see IL that will not be transformed. This gets rid of the "transfering the IPA PTA solution to transformed IL" problem, because there is no more transformed IL. (But again, my understanding of some things might be wrong.)

> But the main reason we've not even tried to make IPA PTA a true IPA pass
> is because of the scalability issue.

I didn't know this was the main reason! Thanks! I'm wondering, let's say that IPA PTA magically gets better performance. What would be the next steps after that? Is looking into the summaries and updating the constraints something that would be a good idea? (I think Hubicka agrees on doing that.) Or do you have some other hints/ideas about what would be a good implementation?
Re: Question about reading LTO function summaries
t;, j, data ? "t" : "f");
+if (!data) continue;
+
+// This has not been executed with my simple example programs
+const struct lto_function_header *header = (const struct lto_function_header*) data;
+gcc_assert(header);
+gcc_assert(header->cfg_size);
+const int cfg_offset = sizeof (struct lto_function_header);
+const int main_offset = cfg_offset + header->cfg_size;
+const int string_offset = main_offset + header->main_size;
+class data_in *data_in;
+
+lto_input_block ib ((const char *) data + main_offset, header->main_size,
+                    file_data->mode_table);
+data_in
+  = lto_data_in_create (file_data, (const char *) data + string_offset,
+                        header->string_size, vNULL);
+unsigned int n = streamer_read_uhwi (&ib);
+//hello_summaries = new hash_map;
+for (unsigned i = 0; i < n; i++)
+{
+  unsigned int index = streamer_read_uhwi(&ib);
+  lto_symtab_encoder_t encoder = file_data->symtab_node_encoder;
+  struct cgraph_node *cnode = dyn_cast
+    (lto_symtab_encoder_deref(encoder, index));
+  gcc_assert(cnode);
+  const char* string = streamer_read_string (data_in, &ib);
+  if (dump_file) fprintf(dump_file, string);
+}
+  }
+
+}
+
+
+
+static unsigned int
+iphw_execute()
+{
+  if (dump_file) fprintf(dump_file, "hello from wpa\n");
+  return 0;
+}
+
+namespace {
+const pass_data pass_data_ipa_hello_world =
+{
+  IPA_PASS,
+  "hello-world",
+  OPTGROUP_NONE,
+  TV_NONE,
+  (PROP_cfg | PROP_ssa),
+  0,
+  0,
+  0,
+  0,
+};
+
+class pass_ipa_hello_world : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_hello_world (gcc::context *ctx)
+    : ipa_opt_pass_d (pass_data_ipa_hello_world, ctx,
+                      ipa_hello_world_generate_summary,
+                      ipa_hello_world_write_summary,
+                      ipa_hello_world_read_summary,
+                      NULL,
+                      NULL,
+                      NULL,
+                      0,
+                      NULL,
+                      NULL)
+  {}
+
+  virtual bool gate(function*) { return flag_ipa_hello_world; }
+  virtual unsigned execute (function*) { return iphw_execute(); }
+};
+} // anon namespace
+
+ipa_opt_pass_d*
+make_pass_ipa_hello_world (gcc::context *ctx)
+{
+  return new pass_ipa_hello_world (ctx);
+}
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 5c7cd84d46f..fb96cc357c6 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -228,6 +228,7 @@ enum lto_section_type
   LTO_section_ipa_sra,
   LTO_section_odr_types,
   LTO_section_ipa_modref,
+  LTO_section_hello_world,
   LTO_N_SECTION_TYPES  /* Must be last. */
 };
diff --git a/gcc/passes.def b/gcc/passes.def
index e9ed3c7bc57..54a7df36425 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -153,6 +153,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_profile);
   NEXT_PASS (pass_ipa_icf);
   NEXT_PASS (pass_ipa_devirt);
+  NEXT_PASS (pass_ipa_hello_world);
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_sra);
   NEXT_PASS (pass_ipa_cdtor_merge);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 15693fee150..5ab807f045d 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -510,6 +510,7 @@ extern ipa_opt_pass_d *make_pass_ipa_fn_summary (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_inline (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_free_lang_data (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_free_fn_summary (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_hello_world (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_sra (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
--
2.27.0

On Tue, 6 Apr 2021 at 14:07, Martin Jambor wrote:
>
> Hi,
>
> On Fri, Mar 26 2021, Erick Ochoa via Gcc wrote:
> > [original message quoted in full, snipped]
>
> I looked at the code you posted and compared it with streaming in
> ipa-sra.c and did not spot any difference that could result in this
> behavior.
>
> I guess you have checked that the functions are called from proper
> hooks? (I.e. from write_summary and read_summary or
> write_optimization_summary and read_optimization_summary, depending on
> what you are trying to do, and not some mixture of these combinations?)
>
> You can try and send the whole patch (hopefully a hello world pass would
> not be too large) and I can have a look.
>
> Martin
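P.S. One thing I still want to double-check on my side (this is purely an assumption of mine, nothing anyone pointed out in the thread): whether a new lto_section_type also needs a matching name entry in lto_section_name[] in gcc/lto-section-in.c, since the summary section appears to be looked up by name. Roughly:

```
/* gcc/lto-section-in.c (sketch; the exact neighbouring entries may differ).  */
const char *lto_section_name[LTO_N_SECTION_TYPES] =
{
  /* ... existing entries ... */
  "ipa_modref",
  "hello_world",   /* must stay in sync with enum lto_section_type */
};
```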
Late SIMPLE_IPA_PASS, DECL_INITIAL and error mark nodes
Hi, I am looking for a small clarification. I understand that during late SIMPLE_IPA_PASSes some statically initialized global variables might have error_mark_node trees in their DECL_INITIAL field. I believe that I read something similar in the past about how to get the tree expressions in these situations, and I believe it said one had to stream in those symbols from the .gnu.lto_.decl section. However, on the GCC Internals there is also the following mention: "If the DECL_INITIAL is the error_mark_node, there is an initializer, but it is given by an explicit statement later in the code; no bitwise copy is required. " [0] What (if any) is the correct way to get these expressions? As a note, when running this pass as an IPA_PASS I am able to see the DECL_INITIAL node of the variable of interest during LGEN. Thanks! [0] https://gcc.gnu.org/onlinedocs/gccint/Working-with-declarations.html
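For concreteness, a rough sketch of the kind of check in question for a late SIMPLE_IPA_PASS. The ctor_for_folding fallback is an assumption on my part (varpool.c documents it as the way to obtain an initializer usable for folding), and it may itself return error_mark_node when the value is unknown:
```
/* Rough sketch for a late SIMPLE_IPA_PASS: skip, or try to recover,
   initializers that show up as error_mark_node.  The ctor_for_folding
   fallback is an assumption; it can also return error_mark_node.  */

static void
scan_static_initializers (void)
{
  varpool_node *vnode;
  FOR_EACH_VARIABLE (vnode)
    {
      tree init = DECL_INITIAL (vnode->decl);
      if (init == error_mark_node)
	init = ctor_for_folding (vnode->decl);
      if (!init || init == error_mark_node)
	continue;  /* Initializer is not available in this partition.  */
      /* ... analyze INIT here ...  */
    }
}
```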
Semantics of OBJ_TYPE_REF
Hi, I am having some trouble understanding the semantics of OBJ_TYPE_REF. I understand that it is made of three operands: 1. OBJ_TYPE_REF_EXPR: an expression that evaluates to the value to use. 2. OBJ_TYPE_REF_OBJECT: the object on whose behalf the lookup is being performed. 3. OBJ_TYPE_REF_TOKEN: an integer index into the virtual method table. The language here is very general and makes it hard for me to understand. Is 1. OBJ_TYPE_REF_EXPR: the virtual function 2. OBJ_TYPE_REF_OBJECT: the "this" object ? Why is the index needed if it looks like the virtual function is already pointed to by the first operand? I am reading Hubicka's posts on devirtualization, and part II (https://hubicka.blogspot.com/2014/01/devirtualization-in-c-part-2-low-level.html) seems to agree with me on the example below. There is also no mention of OBJ_TYPE_REF in part II, but there is in part III. Hubicka's example:
```
int test() ()
{
  int (*__vtbl_ptr_type) () *tmp1;
  int (*__vtbl_ptr_type) () tmp2;
  struct A a;
  struct A * b;
  _ZN1AC2Ev (&a);
  b = &a;
  tmp1 = b->_vptr.A;
  tmp2 = *tmp1;
  // Notice no OBJ_TYPE_REF below.
  return tmp2 (b);
}
```
My example:
```
#include <cstdio>
class A {
public:
  virtual void foo() { printf("hello\n"); };
  virtual void bar() { printf("world\n"); };
};

int main(int argc, char **argv)
{
  class A* a = new A();
  a->foo();
  a->bar();
  return 0;
}

// Relevant gimple:
a = D.2922;
_1 = a->_vptr.A;
_2 = *_1; // first element of the virtual table?
OBJ_TYPE_REF(_2;(struct A)a->0) (a); // wouldn't this just be equivalent to
                                     // _2 (a)
_3 = a->_vptr.A;
_4 = _3 + 8;
_5 = *_4; // second element of the virtual table
OBJ_TYPE_REF(_5;(struct A)a->1) (a); // same
                                     // _5 (a)
```
In part III Hubicka says that OBJ_TYPE_REF is a wrapper and that it represents the type and token. (My guess is that these are operands 2 and 3.) I fail to understand why these are needed if we have the function already. Thanks! Any help is appreciated.
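For reference, a minimal sketch of pulling the three operands out of a virtual call in GIMPLE, assuming the OBJ_TYPE_REF_* accessors from tree.h; it only prints, it does not devirtualize anything:
```
/* Minimal sketch: inspect the operands of a virtual call.  */

static void
inspect_virtual_call (gcall *call)
{
  tree fn = gimple_call_fn (call);
  if (!fn || TREE_CODE (fn) != OBJ_TYPE_REF)
    return;

  tree expr = OBJ_TYPE_REF_EXPR (fn);    /* pointer loaded from the vtable */
  tree obj = OBJ_TYPE_REF_OBJECT (fn);   /* the object ("this") pointer */
  tree token = OBJ_TYPE_REF_TOKEN (fn);  /* vtable slot, an INTEGER_CST */

  if (dump_file)
    fprintf (dump_file,
	     "indirect call through vtable slot " HOST_WIDE_INT_PRINT_DEC "\n",
	     tree_to_shwi (token));
  (void) expr;
  (void) obj;
}
```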
Re: Semantics of OBJ_TYPE_REF
Cool, thanks! I think I understand now. On Tue, 22 Jun 2021 at 23:58, Martin Jambor wrote: > > Hi, > > On Fri, Jun 18 2021, Erick Ochoa via Gcc wrote: > > Hi, > > > > I am having some trouble understanding the semantics of OBJ_TYPE_REF. > > I understand that it is made of three operands: > > > > 1. OBJ_TYPE_REF_EXPR: An expression that evaluates the value to use. > > 2. OBJ_TYPE_REF_OBJECT: Is the object on whose behalf the lookup is > > being performed > > 3. OBJ_TYPE_REF_TOKEN: An integer index to the virtual method table. > > > > The language here is very general and makes it hard for me to understand. Is > > > > 1. OBJ_TYPE_REF_EXPR: The virtual function > > well, it is never the actual function as in a FUNCTION_DECL, it is > always a pointer to an unknown function. > > > 2. OBJ_TYPE_REF_OBJECT: The "this" object > > > > ? > > > > Why is the index needed if it looks like the virtual function is > > already pointed to by the first operand? > > As I wrote above, the first operand is just a pointer. Sometimes other > optimizations can figure out where it leads but often not (or not early > enough for inlining). > > In some situations we can get at the actual DECL (and subsequently have > a direct call) using type based devirtualization and for that we use the > other two arguments of OBJ_TYPE_REF. See the big leading comment in > ipa-devirt.c. > > > > I am reading Hubicka's post > > on devirtualization and Part II > > (https://hubicka.blogspot.com/2014/01/devirtualization-in-c-part-2-low-level.html) > > seems to agree with me on the example below. There is also no mention > > of OBJ_TYPE_REF on part II, but there is on part III. > > I believe the gimple quoted there has been simplified to make it easier > for the reader, since it does not describe the type-based > devirtualization but how other optimizations can together deduce the > call target, and so OBJ_TYPE_REF has been omitted not to confuse anyone. > > If you dump the IL for th example yourself, you will see OBJ_TYPE_REF in > the earliest dumps and you will see that it disappears as soon as the > call becomes direct. > > Hope this helps, > > Martin
Why does printing a pointer cause it to escape?
Hello, I know that some BUILT_IN functions are treated in a special way by the points-to analysis. Those functions are those that take pointers as arguments or return them but do not change their points-to set and similar cases. (E.g. strcpy returns a pointer to the same object as their first argument points to.) I notice that in these special cases, the printf function is nowhere to be found, and if one prints a pointer using printf the pointer points to escaped memory. Why is this the case? Thanks!
Re: Why does printing a pointer cause it to escape?
> I guess that to assume otherwise, one would have to make sure the > pointer does not correspond to a "%n" (or similar, perhaps even future) > conversion specifier. > Oh, wow, I didn't know about the %n specifier. Thanks Martin!
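For the archive, a small standalone example of why the conservative treatment is needed: with "%n" the callee writes through the pointer argument.
```
/* %n stores the number of characters written so far through the
   corresponding pointer argument, so printf can modify caller memory.  */
#include <cstdio>

int
main (void)
{
  int count = 0;
  std::printf ("hello%n\n", &count);     /* writes 5 into count */
  std::printf ("count = %d\n", count);   /* prints "count = 5" */
  return 0;
}
```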
Re: Why does printing a pointer cause it to escape?
Hi Liu, thanks, this also seems correct. I originally thought that (in your example) ptr would be marked as pointing to { ESCAPED NONLOCAL } and that this would be enough to hinder any optimizations that might affect the variable "value". Therefore, it wasn't necessary to mark "value" itself as pointing to { ESCAPED NONLOCAL }. But now I see that that's not the case and "value" also needs to point to { ESCAPED NONLOCAL }. Thanks!
Unique identifier across different partitions (LTO)
Hello, I'm trying to generate unique identifiers for some trees at link time. I understand that there are already some unique identifiers in declarations (DECL_UID) and perhaps others. Do all trees have unique identifiers or only declarations? Alternatively, if they don't have unique identifiers, again, I am trying to generate my own at link time. I originally was thinking about just having a counter and incrementing it every time I add a tree of interest to this data structure that I use to keep track of trees. However, with the parallel LTO framework, this would mean that identifiers will be duplicated across different partitions. Has anyone done something similar where information across partitions needs to be communicated? What I was thinking of as an alternative to communicating this information across partitions is to record the process id then use this information to generate a unique identifier based on the counter and the process id that is processing the partition. This derived identifier would be generated during WPA time. Has anyone had any experience doing something similar? I would be interested in seeing similar examples and use cases. Thanks!
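To make the pid-plus-counter idea concrete, a rough sketch (names invented for illustration; it only guarantees uniqueness within one build, not stability across builds):
```
/* Hypothetical sketch: derive an identifier that is unique across the
   concurrently running partition processes by combining the process id
   with a per-process counter.  */
#include <unistd.h>
#include <stdint.h>

static uint64_t local_counter;

static uint64_t
make_partition_unique_id (void)
{
  /* High bits: pid of the process handling this partition;
     low bits: running counter within that process.  */
  return (static_cast<uint64_t> (getpid ()) << 32) | local_counter++;
}
```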
How to get the global namespace (DECL_CONTEXT) at LTO during LGEN?
Hello, I am wondering if there's a way to get the global namespace at LTO during LGEN in each of the partitions being processed. I believe that during parse time for c/c++ there is a global_namespace macro or variable that can be used, but I don't think that it is possible to use at link time. Thanks!
Are DECL_UIDs the same across partitions during LGEN?
Hi, I am still working on understanding the LTO framework and how the gimple representation works across multiple partitions. I found myself printing all global variables and printing their DECL_UID. I noticed that for some variables, their DECL_UIDs were different across different partitions. That is, a single global variable has different DECL_UIDs in different partitions. Is this the correct behaviour? My understanding of DECL_UID is that for every declaration there is a unique identifier, but is this not the case during LTO? I am building a transformation on top of releases/gcc-11, maybe newer commits address this, but I haven't seen a discussion in this mailing list. Perhaps in patches, but I don't follow that mailing list too closely. Thanks, any help is appreciated!
Re: Are DECL_UIDs the same across partitions during LGEN?
On Wed, 30 Jun 2021 at 17:06, Richard Biener wrote: > > On June 30, 2021 4:07:00 PM GMT+02:00, Erick Ochoa via Gcc > wrote: > >Hi, > > > >I am still working on understanding the LTO framework and how the > >gimple representation works across multiple partitions. I found myself > >printing all global variables and printing their DECL_UID. I noticed > >that for some variables, their DECL_UIDs were different across > >different partitions. That is, a single global variable has different > >DECL_UIDs in different partitions. Is this the correct behaviour? > > Yes. > > My > >understanding of DECL_UID is that for every declaration there is a > >unique identifier, but is this not the case during LTO? > > It's also true for WPA and LTrans a variables UID is just not the same in all > of them because there's no way to ensure that. > So how would I be able to say that two different declarations are the same variable?
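A minimal sketch of one reading of "look at the associated varpool node", assuming the symbols in question have already been through WPA symbol merging:
```
/* Sketch: after WPA symbol merging, two global decls denote the same
   variable iff they map to the same (merged) varpool node, even though
   their DECL_UIDs from different LGEN partitions differ.  */

static bool
same_global_variable_p (tree decl1, tree decl2)
{
  varpool_node *n1 = varpool_node::get (decl1);
  varpool_node *n2 = varpool_node::get (decl2);
  return n1 && n2
	 && n1->ultimate_alias_target () == n2->ultimate_alias_target ();
}
```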
How to dump_decl_name to a string instead to a file?
Hello, I have a function that looks at identifiers of formal parameters. I found that when I ran this function on larger projects, the compiler segfaulted. I looked into this and I found that there are some formal parameters which have NULL in their DECL_NAME and that triggered a NULL dereference. I also noticed that these parameters when printed with dump_function_to_file were displayed using their DECL_UIDs. Digging a little bit deeper, the function responsible for printing parameters names is dump_decl_name. However, I am interested not in printing the names but having them as strings. I noticed that dump_decl_name takes a pretty_printer argument. Is it possible to somehow pass a pretty_printer and get the string instead of printing it? Thanks!
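A sketch of the pretty_printer route, assuming dump_decl_name and pp_formatted_text behave as declared in tree-pretty-print.h and pretty-print.h (and that the usual GCC includes are in place):
```
/* Sketch: print a decl's name into a local pretty_printer and copy the
   text out, instead of writing it to a FILE.  dump_decl_name already
   falls back to a DECL_UID-based name when DECL_NAME is NULL.  */

static char *
decl_name_as_string (tree decl)
{
  pretty_printer pp;
  dump_decl_name (&pp, decl, TDF_NONE);
  /* The buffer is owned by PP, so duplicate it before PP goes away.  */
  return xstrdup (pp_formatted_text (&pp));
}
```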
tree decl stored during LGEN does not map to a symtab_node during WPA
Hi, I am saving some tree declarations during LGEN that I will be later analyzing at WPA time. I am able to read the decl from my summaries and print it at WPA time. It corresponds to a global variable. However, whenever I use symtab_node::get (decl) during WPA time I keep getting NULL. Does anyone know why that might be the case? Is it possible that other optimizations are rewriting global variables during LGEN (or prior WPA)? The variable I am looking at is a static const char typeinfo name for a class in the program I am analyzing. I don't think this is an issue since other type info names have an associated symtab_node. Thanks!
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
Hi, I noticed this is also happening for local variables. Again, I am storing tree declarations in a summary during LGEN and then reading from those summaries at WPA time. I can print the declaration, but when I try to look for its node in the symtab I get NULL as the return value. Any help is appreciated. Thanks! On Wed, 7 Jul 2021 at 11:27, Erick Ochoa wrote: > > Hi, > > I am saving some tree declarations during LGEN that I will be later > analyzing at WPA time. I am able to read the decl from my summaries > and print it at WPA time. It corresponds to a global variable. > However, whenever I use symtab_node::get (decl) during WPA time I keep > getting NULL. > > Does anyone know why that might be the case? Is it possible that other > optimizations are rewriting global variables during LGEN (or prior > WPA)? The variable I am looking at is a static const char typeinfo > name for a class in the program I am analyzing. I don't think this is > an issue since other type info names have an associated symtab_node. > > Thanks!
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
> I'm not too familiar with it but I think you're supposed to stream encoded > symtab references during LGEN/WPA, Thanks Richard, this happened to be the solution. I am now using lto_symtab_encoder_t to encode the declarations during LGEN and decode them during WPA. Are there any more limitations of using stream_write_tree that one should be aware of? Now I am looking into storing trees of the type STRING_CST and I think this might be causing me a problem at WPA time. I think it segfaults at the moment of creating the process, but I still need more time to investigate. Perhaps you might know if storing STRING_CST trees has to be handled in a special way? Not sure if it also has something to do with LTO file sections. The tree is used to initialize a global static variable. Thanks!
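For anyone finding this thread later, a sketch of the encoder-based pattern described above, with the summary-section boilerplate and error handling omitted:
```
/* LGEN (write_summary): stream a reference to NODE via the encoder
   rather than streaming the decl tree itself.  */
static void
write_node_reference (struct output_block *ob, symtab_node *node)
{
  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
  int index = lto_symtab_encoder_encode (encoder, node);
  streamer_write_uhwi (ob, index);
}

/* WPA (read_summary): map the streamed index back to a node.  */
static symtab_node *
read_node_reference (class lto_input_block *ib,
		     struct lto_file_decl_data *file_data)
{
  unsigned index = streamer_read_uhwi (ib);
  lto_symtab_encoder_t encoder = file_data->symtab_node_encoder;
  return lto_symtab_encoder_deref (encoder, index);
}
```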
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
Hi, Just to clarify a similar question: I am using stream_write_tree and looking at the comments it says that it is assumed that the tree T is already in the encoder cache. Does this mean that I have to use lto_symtab_encoder_t for all trees I want to store in summaries? I thought the encoder only works for trees which are stored on the symbol table. Would this mean that the only trees that can be written out to summaries are those that are declarations? Or are there any other encoders? I am trying to store SSA trees at LGEN and read them back during WPA. Thanks! Any help is appreciated. On Mon, 12 Jul 2021 at 12:55, Erick Ochoa wrote: > > > I'm not too familiar with it but I think you're supposed to stream encoded > > symtab references during LGEN/WPA, > > Thanks Richard, this happened to be the solution. I am now using > lto_symtab_encoder_t to encode the declarations during LGEN and decode > them during WPA. > > Are there any more limitations of using stream_write_tree that one > should be aware of? Now I am looking into storing trees of the type > STRING_CST and I think this might be causing me a problem at WPA time. > I think it segfaults at the moment of creating the process, but I > still need more time to investigate. Perhaps you might know if storing > STRING_CST trees has to be handled in a special way? Not sure if it > also has something to do with LTO file sections. The tree is used to > initialize a global static variable. > > Thanks!
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
> There are entities, like SSA names and STRING_CSTs which are specially > encoded and if you stream those in your LGEN data you have to set up > appropriate encoders. In general streaming arbitrary trees isn't the > best thing to do, usually you're interested in specific pieces only. That's > especially true for things "local" to a function (like SSA names), I think > existing IPA passes only stream encoded references to global entities > (or parameters) and all "local" info is some pass specific data streamed > as raw data, not trees. Thanks! > > Why do you want to stream SSA trees for example? > I am working on a prototype for a points-to analysis where the implementation is an IPA_PASS as opposed to a SIMPLE_IPA_PASS. It is still somewhat early in the prototype implementation stages but I would like to have SSA trees at WPA time as a way to map constraints variables back to Gimple. The idea is to generate most constraints at LGEN time (some will have to be updated) and solve them at WPA time. Do you have any other suggestions on how to map constraint variables (similar to the index in the varinfo_t array) back to Gimple (but now as an IPA_PASS)? We have had similar discussions before, but now I am looking at things more concretely, from the point of view of: how exactly does one map the analysis back to Gimple while staying within what is possible in the LTO framework? I do have a pass-specific data structure, but it contains references to trees. It is similar to varinfo_t, which has a decl. I also have another tree for expressions (the concrete use case right now is having a global variable being assigned to a string literal), and a field for ssa variables, which was intended to store a reference to the corresponding tree. Again, as a way to map back the constraint variable to Gimple. > There can be no > references to those from other IPA entities? I don't follow this question. I think this may be a statement and not a question. Please clarify this for me, but perhaps the following might answer: Since I am working on an interprocedural analysis, SSA variables may point to abstract memory locations allocated in different functions. So I need to have a link between SSA variables and whatever memory locations they might point to.
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
On Tue, 13 Jul 2021 at 11:41, Richard Biener wrote: > There are entities, like SSA names and STRING_CSTs which are specially > encoded and if you stream those in your LGEN data you have to set up > appropriate encoders. I forgot to ask, is there an example of these appropriate encoders being used somewhere? I only see references to lto_symtab_encoder_t. Thanks!
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
> I guess the way to encode SSA trees would be to use sth like a > , SSA-version tuple much like PTA internally > uses the varinfo array index as identifier for the variables in the > constraints. For local decls (as opposed to SSA names) it's a bit > more difficult - you'd have to devise your own encoding here. > > What you can rely on I think is that for local variables UID relations > are preserved, so you could sort cfun->local_decls and use the > position in this array as encoding (in fact I see local_decls is > streamed literally, so you don't even need to sort that for the > start - but we could likely do that without harm to make searching > for a UID O(log n)). At the moment I am generating a unique id for each constraint variable generated. I have assigned a unique LGEN number to each variable and during WPA I have merged duplicates. The duplication of equivalent gimple variables in distinct LGEN partitions happens for global variables (as we have discussed before). Do you know if there are other cases of duplication that might happen? For example, could a single function be analyzed in different LGEN partitions? I followed your example here and I am "encoding" the constraint variables that relate to SSA variables by looking at the cgraph_node and the SSA-version. The tree is not stored but at WPA we know the SSA-version and the cgraph_node and I think this is enough to relate back to the SSA variable in the gimple source. You mention that I need to devise my own "encoder", but I am not sure if we are conflating two notions: 1. encoding tree variables to constraint variables (i.e., a mapping of some tuple (cgraph_node x symtab_node x ssa-version) to an integer that represents the constraint variable) 2. encoding as an implementation of a data structure used during LTO to stream in and stream out trees/symbols to and from partitions. (e.g., lto_symtab_encoder_t). So to be clear, when you say I need to devise my own "encoder" you are referring to definition number 1, not definition number 2, right? And at LTRANS using the relation (cgraph_node x symtab_node x ssa-version) x constraint-variable-id one should be able to map to the interesting pointer/pointee from the constraint variable id. I am thinking a little bit ahead, but I will need a way to relate memory allocation sites (e.g., malloc's) to some constraint variable and perhaps generalize this to expressions (I would like to say that a variable is pointing to a STRING_CST for example). Do you have an idea on how to go and encode using the first definition of encoding tree expressions? I have seen some papers that use instruction-id's (potentially an integer that corresponds as a unique identifier for the instruction) but I am unsure if there is something similar to this in GCC. If what you meant is the second definition, can someone elaborate on the precise steps for making my own encoder? While I am somewhat familiar with using the LTO framework I am unfamiliar with potentially extending it in these sorts of ways. Thanks! Any help is appreciated.
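To make the tuple encoding concrete, a hypothetical sketch (the struct and helper names are invented for illustration):
```
/* Hypothetical representation of a constraint variable that stands for
   an SSA name: a function reference (here an encoder index) plus the
   SSA version, as suggested above.  */
struct ssa_constraint_var
{
  int fn_index;		    /* lto_symtab_encoder_encode of the cgraph node */
  unsigned ssa_version;	    /* SSA_NAME_VERSION of the name */
  unsigned constraint_id;   /* the constraint-variable id itself */
};

static ssa_constraint_var
encode_ssa_constraint (lto_symtab_encoder_t encoder, cgraph_node *node,
		       tree ssa_name, unsigned constraint_id)
{
  ssa_constraint_var v;
  v.fn_index = lto_symtab_encoder_encode (encoder, node);
  v.ssa_version = SSA_NAME_VERSION (ssa_name);
  v.constraint_id = constraint_id;
  return v;
}
```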
tree that is both SSA_VAR_P and DECL_P?
Hi, I was sorting SSA trees and DECL_P trees into different buckets. I used DECL_P as a proxy for it being a local/global variable/function (essentially a declaration). It seems that "probabilistically", I'm kinda right. Normally there is no tree that is both an SSA_VAR_P and a DECL_P at the same time, but there appears to be at least one in a couple of large benchmarks. What could lead to a tree being both SSA_VAR_P and DECL_P? In the first instance I've looked at where this happens, the variable in the source is a static class member. But are there other cases? And why would it be the case that for smaller benchmarks there seems to be no instance of trees that are both SSA_VAR_P and DECL_P at the same time? I've been interpreting DECL_P as any variable that actually takes up memory. E.g., for every DECL_P variable there will be space allocated on the stack. But an SSA_VAR_P does not need space allocated, since SSA names are just an abstraction to reason about the flow of assignments. For example, we might have the following pseudo-gimple:
```
PARM cond   // variable cond is a boolean parameter (also DECL_P)
DECL int i  // variable i is a local integer declaration
            // stack space is reserved for int i
if (cond)
  {
    i_0 = 0  // i_0 is an SSA variable that corresponds to i
  }
else
  {
    i_1 = 1  // same
  }
i_2 = phi(i_0, i_1)
```
In the end i_x will refer to the stack location reserved for i. I think that this is a somewhat naive view of the execution model, but it is not until now that I found a counterexample. Is the above not the case? As another note, I have not gone through the details of how static class member variables are implemented, so perhaps there are some details there where it would make sense to have a variable be both SSA_VAR_P and DECL_P. Can someone shed some light on this? Thanks!
Re: tree that is both SSA_VAR_P and DECL_P?
Thanks! I'll have to see what conditions led to everything working before and change them to meet reality. On Thu, 15 Jul 2021 at 10:35, Richard Biener wrote: > > On Thu, Jul 15, 2021 at 10:22 AM Erick Ochoa via Gcc wrote: > > > > Hi, > > > > I was sorting SSA trees and DECL_P trees into different buckets. I > > used DECL_P as a proxy for it being a local/global variable/function > > (essentially a declaration). It seems that "probabilistically", I'm > > kinda right. Normally there is no tree that is both an SSA_VAR_P and a > > DECL_P at the same time, but there appears to be at least one in a > > couple of large benchmarks. > > > > What could lead to a tree being declared both SSA_VAR_P and DECL_P? > > /* Nonzero if DECL represents an SSA name or a variable that can possibly >have an associated SSA name. */ > #define SSA_VAR_P(DECL) \ > (TREE_CODE (DECL) == VAR_DECL \ > || TREE_CODE (DECL) == PARM_DECL \ > || TREE_CODE (DECL) == RESULT_DECL \ > || TREE_CODE (DECL) == SSA_NAME) > > there's your answer. SSA_VAR_P and DECL_P are not related in a way > you think. > > > The first instance I've looked at where this happens the variable in > > the source is a static class member. But are there other cases? And > > why would it be the case that for smaller benchmarks there seems to be > > no instance of trees that are both SSA_VAR_P and DECL_P at the same > > time? > > > > I've been interpreting DECL_P as any variable that actually takes up > > memory. E.g., for every DECL_P variable there will be space allocated > > on the stack. But SSA_VAR_P does not need space allocated since they > > are just an abstraction to think about the flow of assignments. For > > example, we might have the following pseudo-gimple: > > > > PARM cond // variable cond is a boolean parameter (also DECL_P) > > DECL int i. // variable i is a local integer declaration > > // stack space is reserved for int i > > if (cond) > > { > > i_0 = 0 // i_0 is an SSA variable that corresponds to i > > } > > else > > { > > i_1 = 1 // same > > } > > i_2 = phi(i_0, i_1) > > > > In the end i_x will refer to the stack location reserved for i. I > > think that this is a somewhat naive view of the execution model but it > > is not until now that I found a counter example. > > > > Is the above not the case? > > > > As another note, I have not gone through the details on how static > > class member variables are implemented, so perhaps there are some > > details there where it would make sense to have a variable be both > > SSA_VAR_P and DECL_P. > > > > Can someone shed some light on this? Thanks! > > > > Thanks!
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
Hello Richard, I need a little bit more help. In our previous messages you mentioned "" > > > > > I guess the way to encode SSA trees would be to use sth like a > > > , SSA-version tuple much like PTA internally > > > uses the varinfo array index as identifier for the variables in the > > > constraints. For local decls (as opposed to SSA names) it's a bit > > > more difficult - you'd have to devise your own encoding here. > > > There was a little confusion on my part about what "encoder" meant > > > You mention that I need to devise my own "encoder", but I am not sure > > if we are conflating two notions: > > > > 1. encoding tree variables to constraint variables (i.e., a mapping of > > some tuple (cgraph_node x symtab_node x ssa-version) to an integer > > that represents the constraint variable) > > 2. encoding as an implementation of a data structure used during LTO > > to stream in and stream out trees/symbols to and from partitions. > > (e.g., lto_symtab_encoder_t). > And you proceed with the following answer > I meant 1) and streaming using the LTO cgraph encoder for the cgraph > part and simply using the SSA version for the second part. > >From this exchange I understood that I should develop my own mapping for ssa variables and local declarations. However, when dealing with encoding a node which is available in the symbol table, I could use the function lto_symtab_encoder_encode to map symbols to an integer which would later make the symbol available at WPA time. Implicitly, for me, this meant that this integer is the same for every distinct symbol in different LGEN partitions. For example, if we encode symbol X from partitions Y and we get the number N, then encoding symbol X in partition Z should also yield N. I believe this is not the case, during WPA time I am printing: 1. pid of lgen process that generated the encoding 2. index returned by lto_symtab_encoder_encode 3. varpool_node->name () 4. the pointer address being pointed by varpool node I think we had a previous discussion where it was mentioned that the only way to distinguish between these cases is to look at varpool_node cgraph_node: (From a different email edited for brevity) On Wed, 30 Jun 2021 at 19:38, Richard Biener wrote: > > On June 30, 2021 6:28:29 PM GMT+02:00, Erick Ochoa wrote: > >So how would I be able to say that two different declarations are the > >same variable? > > By looking at the associated varpool node. > > Richard. > If this is the case, I can indeed get the varpool node's at WPA time (as shown above), but comparing their pointer addresses will be distinct. How can one find out that two varpool nodes/cgraph nodes are the same at WPA time? Is just looking at the assembler name enough? I of course want this to be safe. Another question, how is this currently handled in other IPA passes? Alternatively, do you have suggestions for encoding functions and global variables in a similar way to how you suggested encoding ssa variables and local declarations?
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
> > If this is the case, I can indeed get the varpool node's at WPA time > > (as shown above), but comparing their pointer addresses will be > > distinct. How can one find out that two varpool nodes/cgraph nodes are > > the same at WPA time? Is just looking at the assembler name enough? I > > of course want this to be safe. > > If they are the same they are merged by the symtab merging process done > at WPA time. I trust you, but how would I verify it myself? For example, I mentioned that I printed: > 1. pid of lgen process that generated the encoding > 2. index returned by lto_symtab_encoder_encode > 3. varpool_node->name () > 4. the pointer address being pointed by varpool node and I got the same "name" but different indices, and different varpool node's addresses for each of the lgen partitions. And of course there's only one global variable with that name. In other words, what exactly does it mean that they are "merged" and in which cases would they not get merged? What I'm seeing is for example: fopen $PID1 8 $ADDR1 fopen $PID2 7 $ADDR2 where $PID1 != $PID2 (expected since it was seen in two LGEN partitions). They were encoded as "8" and "7" in each of their LGEN partitions. And when reading them and printing the address of cgraph_node $ADDR1 != $ADDR2. So, previously when I thought that merged implied that $ADDR1 == $ADDR2 this print shows that not to be the case. Also when I thought that merging implied they would have the same number, the different encoding showed that not to be the case. What I essentially want is the following: fopen $PID1 $ID fopen $PID2 $ID Where $ID can either be derived at WPA time or is encoded during LGEN. I'm wondering if I should "merge" these two constraint variables by checking their asm_name? I was just about to run an experiment to see different cases for this (i.e., same named static function and print their assembler name, etc.). This would be an example of $ID being derived at WPA time by doing an O(n^2) comparison in the number of functions, partitioning them into equivalent functions and then assigning each of the distinct partitions an incremental ID. > > > Another question, how is this currently handled in other IPA passes? > > Alternatively, do you have suggestions for encoding functions and > > global variables in a similar way to how you suggested encoding ssa > > variables and local declarations? > > I don't think any other pass has to encode SSA vars because those are > strictly function local. They only handle IP invariants aka addresses of > globals or so. > For SSA vars I followed your suggestion of having a function encoding and then using the SSA_VERSION number. Essentially x ssa version. However, since both varpool and cgraph_node are encoded in the same way (for which I am having problems), then the issue is with the function encoding.
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
> > fopen $PID1 8 $ADDR1 > fopen $PID2 7 $ADDR2 > Just to clarify a bit further. $PID is generated and stored during LGEN. The encoding is obviously generated during LGEN. These are read during WPA. And the encoding is decoded and dyn_casted into a cgraph_node at WPA time. All these are printed during WPA.
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
> > > 1. pid of lgen process that generated the encoding > > > 2. index returned by lto_symtab_encoder_encode > > > 3. varpool_node->name () > > > 4. the pointer address being pointed by varpool node > > Well, yes, during LGEN no WPA has run. Do you mean LTRANS after WPA? > Sure, the encoder numbers do not have to match up between different > LTRANS units but then they don't speak to each other so that shouldn't > matter, no? No. I mean during WPA. On a different e-mail I clarified the following: ``` > > fopen $PID1 8 $ADDR1 > fopen $PID2 7 $ADDR2 > Just to clarify a bit further. $PID is generated and stored during LGEN. The encoding is obviously generated during LGEN. These are read during WPA. And the encoding is decoded and dyn_casted into a cgraph_node at WPA time. All these are printed during WPA. ``` > > I _think_ that it should work if you stream at LGEN constraint variables > as their varpool node (using the varpool encoder), get the nodes merged > at WPA time, and thus your constraints from different LGEN runs "merged" > properly, then stream them the same way to the LTRANS units? > The only reference to a varpool encoder is on the Changelog. The only encoder I know of is the symtab_encoder (which I believe should be the same as varpool_nodes and cgraph_nodes are both symtab_nodes). But again, I do not know what you mean by "merged" here, since they have different addresses. > > And the same solution should exist. For "merged" function definitions > (like multiple same inline definitions) you'd simply drop all but one set of > constraints. > Yes, this is what I would like, but I don't see how to detect "merged" function definitions. I can get their cgraphs but as I mentioned for every encoding I decode and dyn_cast all I get is a cgraph holding a different address. What does "merged" concretely mean?
Re: tree decl stored during LGEN does not map to a symtab_node during WPA
> > But the addresses are at LGEN time? The following is what runs at WPA time unsigned long pid = streamer_read_uhwi (&ib); unsigned long id = streamer_read_uhwi (&ib); lto_symtab_encoder_t encoder = file_data->symtab_node_encoder; cgraph_node *cnode = dyn_cast(lto_symtab_encoder_deref(encoder, id)); logger ("%s %ld %ld %p\n", cnode->name (), pid, id, cnode); > Note the nodes are actually > streamed to different instances by input_symtab, then decls are merged > (lto_symtab_merge_decls), then I think the IPA > pass summaries are read in (to different unmerged instances!), _then_ > the symtab merging process starts (lto_symtab_merge_symbols). > I think the last step eventually calls the cgraph/varpool removal hook > IPA passes registered. Ah, so what you are saying is that during the read_summary stage they will still be different, but during execute or write_optimization_summary (), will they be finally merged? I think maybe the terminology of LGEN/WPA/LTRANS should be expanded to be lgen_gen, lgen_write, lwpa_read, lwpa_exec/lwpa_write, ltrans_read, ltrans_exec? So, just to be a bit more concrete, when initializing the ipa_opt_pass_d instance one has to write functions which will be called by a parent process. Normally I see the following comments with them: generate_summary write_summary read_summary write_optimization_summary read_optimization_summary and finally there's the execute function that gets called. I am doing the following: generate_summary, /* generating pid */ write_summary /* generating id and writing pid and id */ read_summary /* reading and printing the info I told about */ write_optimization_summary /* nothing yet */ read_optimization_summary /* nothing yet */ execute /* nothing yet */ And I think these correspond to the following "LGEN/WPA/LTRANS" stages 1. lgen (multiple processes) generate_summary 2. lgen (multiple process) write_summary 3. wpa (single process) read_summary 4. wpa (single process) execute 5. wpa? (single process?) write_optimization_summary 6 ltrans (multiple processes) read_optimization_summary And you are telling me that cgraph_node and varpool_nodes will have the same address only after the beginning of the execute stage but not before that? Is the above correct? I did try printing cnode->name() during execute and it segfaulted, so perhaps those function bodies where merged to something else? Note, that some names were successfully printed out. I'm wondering, can I use the function lto_symtab_encoder_deref during execute? I think this is unlikely... because in the past I've tried to use lto_symtab_encoder_encode during generate_summary and it caused segfaults. I'll still give it a try. Perhaps this is still a bit of progress? But now I'm wondering, if I can't use lto_symtab_encoder_deref and the nodes were indeed merged, do some of the varpool_node* I saved during read_summary are pointing to random memory? How am I able to tell which ones survived? > > It might be that you need a replace hook to do what you want, I think > that for example IPA CP encodes references to global vars aka &global > as IPA_REF and those are transparently re-written. > > As said, I think it can be made work but the details, since this is the > first IPA pass needing this, can be incomplete infra-structure-wise. 
> > Basically you have summaries like > > 'global = _3' > > where the should eventually be implicit and the constraints > grouped into constraints generated from the respective function body > and constraints generated by call stmts (not sure here), and constraints > for global variable init. But for the above constraint the point is to > make the 'global' references from different LGEN units the same by > some means (but not streaming and comparing the actual assembler name). > I'll need some more time to read through how ipa-cp encodes references to global variables. Thanks for the suggestion! I don't really follow the paragraph that details what you think my summaries look like. I'm thinking that for global = _3 global is a variable? and _3 states that it is an SSA variable in function 1? I think that can be a possible notation. I prefer to just use integers. What do you mean by implicit? But the idea is to essentially "compile" down all variables/functions/locals/ssa/etc into integers. And have the program represented as integers and relation of integers. For example: int* a extern void foo (int* c); int main () { int b; a = &b; foo (a) // defined in a different function } Should have the following at LGEN time (more specifically write_summary) variable -> long integer encoding abstract null -> $null_id cgraph main -> 0 cgraph foo -> 1 varpool a -> 2 tree b -> 0 x 0 // corresponds to main position 0 real arg c -> 1 x 0 // corresponds to foo position 0 Technically we can also map the other way around, we just need to know in which "table" the information is stored. (I.e., the symbol_table
Can someone help me understand cgraph_nodes & cgraph_edges during WPA
Hello, I've been working on an LTO points-to analysis pass for a little while. Because of LTO's design, gimple bodies are inaccessible during WPA. This essentially means that every LTO pass compiles down function bodies into their own IR which gets stored in function summaries and later read during WPA. This is also what I plan to do. I recently started looking into how IPA-CP works. I noticed that again, IPA-CP compiles down every function into its own function summary. However, while reading functions, it selectively decides which information to store by looking at symtab_node::prevailing_p. I was not aware of this function but from what I understand it is a way of deciding which symtab_node's bodies survive when removing duplicates before the execute stage of the pass. Is this correct? For IPA-CP, those cgraph_nodes for which the predicate symtab_node::prevailing_p are just read and discarded. This makes sense if it is a duplication of content. I know that different cgraph_nodes might represent the same function but maybe one of them has been specialized, another version has been inlined. I also think that two different cgraph_nodes might represent the same function implementation (i.e., they shared the same body and the same information but this information is duplicated during LGEN across partitions). I believe that it is not until the WPA/execute (making a distinction between WPA/execute and WPA/read_summary) that the distinct cgraph_nodes are merged. Would it be correct to say that a more faithful representation of reality is that non-prevailing_p nodes are eliminated while the other ones remain?) However, cgraph_nodes which represent the same function, but have been specialized will be marked as prevailing_p. Is this correct? (Here, I am not sure about the internals of the LTO, because in some sense, the points-to analysis hasn't run, but it is possible that other analysis have already run their WPA/execute stage and have said that some function bodies need to be specialized but at the moment they are still virtual clones? Related question, do virtual clones have cgraph_node?) I did a little experiment yesterday where I had the following control group: 1. encoded a cgraph_node during LGEN/write_summary 2. decoded a cgraph_node during WPA/read_summary and printed cnode->name () and compared it against the following experimental group: 1. encoded a cgraph_node during LGEN/write_summary 2. decoded a cgraph_node during WPA/read 3. during WPA/execute I printed cnode->name () What I found was that during the run of the "control group" I was able to print all cnodes' names. However, during the run of the "experimental group" only some of the names were printed before a segmentation fault occurred. Again, this might have been because those cgraph_node's were deleted. My theory is that these are non-prevailing_p cgraph_nodes but I haven't confirmed it experimentally, is this the case? I also do not know if all data being pointed to by these cgraph_node* is corrupted or if only some parts of the cgraph_node* have been removed from memory (like the name). Would cgraph_node* during WPA/execute in the experimental run have some valid fields or should it all be considered invalid and not even accessed outside of WPA/read? Looking at the definition of non-prevailing_p, it seems that all functions without a gimple body will be marked as non-prevailing_p. What does this mean though? 
There are definitely calls to external functions and so having a call to a non-prevailing_p just means that you are calling a function with no defined body. But what does that mean for functions that were "merged" or removed because they are duplicates? Can you have a cgraph_edge to a non-prevailing_p cgraph_node whose function body was once available at LGEN/lwrite but it is no longer available during WPA/execute? If that's the case how does one know the target of the call? Sorry if these are too many questions, I do greatly appreciate all the support given to me in the mailing list. In the meanwhile, I'll continue looking into how ipa-cp works to see what I can learn from other sources. Thanks -Erick
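For context, a sketch of the guard pattern visible in the existing passes when reading per-node summaries; my_summary_data, stream_in_my_data and attach_to_node are hypothetical placeholders for pass-specific code:
```
/* Sketch: always consume the streamed record so the input block stays
   in sync, but only keep it when the node's definition prevails.  */
struct my_summary_data { int whatever; };  /* hypothetical */

static void
read_one_node_summary (cgraph_node *node, class lto_input_block *ib,
		       class data_in *data_in)
{
  my_summary_data tmp;
  stream_in_my_data (&tmp, ib, data_in);   /* hypothetical helper */
  if (!node->prevailing_p ())
    return;				   /* non-prevailing copy: drop it */
  attach_to_node (node, tmp);		   /* hypothetical helper */
}
```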
Question about finding parameters in function bodies from SSA variables
Hello Richard, I'm still working on the points-to analysis and I am happy to say that after reviewing the ipa-cp code I was able to generate summaries for local variables, ssa variables, heap variables, global variables and functions. I am also using the callback hooks to find out if cgraph_nodes and varpool_nodes are added or deleted between read_summaries and execute. Even though I don't update the solutions between execute and function_transform yet, I am reading the points-to pairs and remapping the constraint variables back to trees during function_transform and printing the name of pointer-pointee pairs. This is still very much a work in progress and a very weak points-to analysis. I have almost finished my Andersen's / field insensitive / context insensitive / flow-insensitive / intraprocedural analysis with the LTO framework (without interacting with other transformations yet). The only thing that I am missing is assigning parameters to be pointing to NONLOCAL memory upon entry to the function and perhaps some corner cases where gimple is not exactly how I expect it to be. I am wondering, none of the variables in function->gimple_df->ssa_names and function->local_decls are PARM_DECL. I'm also not entirely sure if I should be looking for PARM_DECLs since looking at function bodies' gimple representation I don't see the formal parameters being used inside the function. Instead, it appears that some SSA variables are automatically initialized with the parameter value. Is this the case? For example, for a function: foo (struct a* $NAME) The variable $NAME is nowhere used inside the function. I also found that there is an ssa variable in location X ( in function->gimple_df->ssa_names[X]) named with a variation like $NAME_$X(D) and this seems to correspond to the parameter $NAME. How can one (preferably looking only at function->gimple_df->ssa_names[$X]) find out that this tree corresponds to a parameter? Many thanks! -Erick
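A sketch of how the "$NAME_$X(D)" names relate to parameters, assuming the usual SSA accessors (SSA_NAME_VAR, SSA_NAME_IS_DEFAULT_DEF, ssa_default_def):
```
/* Sketch: an incoming parameter shows up in the body as the default
   definition of an SSA name, not as a PARM_DECL in local_decls.  */

static bool
ssa_name_is_parameter_p (tree name)
{
  return TREE_CODE (name) == SSA_NAME
	 && SSA_NAME_IS_DEFAULT_DEF (name)
	 && SSA_NAME_VAR (name)
	 && TREE_CODE (SSA_NAME_VAR (name)) == PARM_DECL;
}

/* Or walk the parameters directly and look up their default-def names.  */
static void
visit_parameter_names (function *fun)
{
  for (tree parm = DECL_ARGUMENTS (fun->decl); parm; parm = DECL_CHAIN (parm))
    {
      tree name = ssa_default_def (fun, parm);	/* NULL if parm is unused */
      if (name && dump_file)
	print_generic_expr (dump_file, name, TDF_NONE);
    }
}
```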
Some questions on hash_set/hash_map traits
Hello, I am refactoring some old code that runs as an IPA_PASS. This code is intended to run at link-time using the LTO framework and so I am always testing it that way. At the moment, I am trying to use hash-maps but having some difficulties. I am adding some patches that will help illustrate along the way. The first patch simply adds a link-time optimization pass that will be used to demo the problem that I am having. One can run it with -flto -fipa-hash. In the first patch, the pass does not do any meaningful work; it just helps to set the stage. In the second patch I create a wrapper around a symtab_node. This wrapper is intended to add some other functionality to elements that I might want to insert into hash-sets or hash-maps.
```
class symtab_wrapper
{
private:
  symtab_node *_symtab_node;
public:
  friend symtab_wrapper_traits;
  symtab_wrapper () : _symtab_node (NULL) {}
  symtab_wrapper (symtab_node *s) : _symtab_node (s) { gcc_assert (s); }
  void print (void)
  {
    if (!dump_file) return;
    gcc_assert (_symtab_node);
    fprintf (dump_file, "%s\n", _symtab_node->name ());
  }
};
```
I also have a wrapper around a hash-set to add some things that might be of interest to me in the future:
```
template <typename KeyId, typename Traits = default_hash_traits <KeyId>>
class my_set_t
{
private:
  hash_set <KeyId, false, Traits> _impl;
public:
  my_set_t () {}
  void insert (const KeyId &k) { _impl.add (k); }
  bool contains (const KeyId &k) { return _impl.contains (k); }
  void print (void)
  {
    if (!dump_file) return;
    for (auto i = _impl.begin (), e = _impl.end (); i != e; ++i)
      {
        (*i).print ();
      }
  }
};
```
I also created a symtab_wrapper_traits because I want to store the object in the hash-map itself, and so I need to specify how to hash it.
```
class symtab_wrapper_traits
{
public:
  typedef symtab_wrapper value_type;
  typedef symtab_wrapper compare_type;
  static const bool empty_zero_p = true;
  static inline hashval_t hash (const value_type &v)
  { return pointer_hash <symtab_node>::hash (v._symtab_node); }
  static bool equal (const value_type &l, const value_type &r)
  { return l._symtab_node == r._symtab_node; }
  static void mark_deleted (value_type &v) { v._symtab_node = NULL; }
  static void mark_empty (value_type &v) { v._symtab_node = NULL; }
  static void remove (value_type &v) { v._symtab_node = NULL; }
  static bool is_deleted (const value_type &v) { return !v._symtab_node; }
  static bool is_empty (const value_type &v) { return !v._symtab_node; }
};
```
I then create a global set with my traits, populate it with some information and later print it. It seems to be working:
```
my_set_t <symtab_wrapper, symtab_wrapper_traits> my_set;

static void
ipa_hash_generate_summary (void)
{
  cgraph_node *cnode = NULL;
  FOR_EACH_FUNCTION (cnode)
  {
    symtab_wrapper w (cnode);
    my_set.insert (w);
  }
  my_set.print ();
  return;
}
```
Here, I already have some questions, but this is not the biggest issue that I am having. I believe that the hash_set implementation manages its own memory, but I am unclear about the semantics of "remove":
* Is "remove" supposed to, for example, call the destructor? Since, in this particular case, it is only a wrapper and the cgraph_node is not intended to be destroyed, I just assign the pointer to NULL. Not sure if that's the intention.
* I don't understand the semantics of empty_zero_p; I think it is used to zero out deleted entries?
* If I mark something as empty, I am assuming that its memory will be re-used, is this the case?
* I see that there is a typed_delete_remove function that is sometimes used in traits, but I think that it doesn't apply to this case because I am storing the whole object. Is that the case?
I later did some refactoring (patch 3), which did not seem to impact the code now, but will impact hash_map later. The refactoring is as follows. I create an abstract "printable" class:
```
class printable
{
public:
  virtual char* str () = 0;
  void print (void);
};

void
printable::print (void)
{
  if (!dump_file) return;
  char* _str = str ();
  fprintf (dump_file, "%s\n", _str);
  free (_str);
}
```
make symtab_wrapper a publicly derived class:
```
-class symtab_wrapper
+class symtab_wrapper : public printable
```
and implement the str function in symtab_wrapper:
```
virtual char* str (void)
{
  if (!dump_file) return NULL;
  gcc_assert (_symtab_node);
  char *retval = NULL;
  asprintf (&retval, "%s", _symtab_node->name ());
  return retval;
}
```
Again, this seems to be working correctly. What is more interesting is moving from a hash-set to a hash-map. In the fourth patch, I implemented the hash_map equivalent to the hash_set that I am describing above:
```
template <typename KeyId, typename Value, typename Traits>
class my_map_t
{
private:
  hash_map <KeyId, Value, Traits> _impl;
public:
  my_map_t () {}
  void insert (const KeyId &k, const Value &v) { _impl.put (k, v); }
  Value *select (const KeyId &k) { return _impl.get (k); }
  void print (void)
  {
    if (!dump_file) return;
    for (auto i = _impl.begin (), e =
```
Re: Can gcc itself be tested with ubsan? If so, how?
Hi, just as a note. This is also of interest to me. I have wanted to compile a single pass that I wrote using ubsan/other sanitizers for testing purposes. I was wondering if someone has already modified the build system to use ubsan to test their passes and if they could document the process for doing so. For these purposes, I don't really care if it can be done only without bootstrapping. I haven't investigated enough to find out if it is possible, but I suspect it is and may have already been done. Thanks! On Tue, 28 Sept 2021 at 01:01, Gary Oblock via Gcc wrote: > > I tried just adding "-fsanitize=undefined" to my CXX_FLAGS and got > a bunch of errors like this: > > /usr/bin/ld: ../libcody/libcody.a(server.o): in function > `std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> >::_Alloc_hider::~_Alloc_hider()': > /usr/include/c++/9/bits/basic_string.h:150: undefined reference to > `__ubsan_handle_type_mismatch_v1' > > They all seemed library related. > > Can ubsan be used on the compiler itself? > > Thanks, > > Gary
Question about symtab_node::order during LTO
Hi, I have an LTO pass which stores information collected during "generate function summary" in a map which is symtab_node* -> data*. I know that the symtab_node*s are encoded by an lto encoder and can be decoded back during the "read function summary". I also am aware that other optimizations might be removing or adding cgraph_node*s to the program and my pass uses the insertion and removal hooks to analyze or drop these accordingly. However, I am confused about different function versions. My current understanding is that different versions of the same function share the same cgraph_node*. Is this correct? If that's the case, that would mean that storing information on a map where the key is the symtab_node* is probably not a good idea as a specific version of the function might be dropped and in that case I may be dropping information for all cases. This brings me to the field "order" in symtab_node. Is this field constant through the linking process and can it be used to differentiate between different versions of the same function? Thanks!
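For comparison, a rough sketch of the summary machinery other IPA passes use instead of a hand-rolled symtab_node* -> data* map, assuming symbol-summary.h's function_summary interface as used by e.g. ipa-fnsummary (the data type and class names are invented):
```
/* Sketch: key pass data by cgraph_node through function_summary, which
   invokes the duplication hook for clones and the insertion hook for
   newly added functions, instead of managing raw pointers by hand.  */
#include "symbol-summary.h"   /* plus the usual GCC includes before it */

struct my_pass_data
{
  int something;   /* pass-specific fields */
};

class my_pass_summaries : public function_summary <my_pass_data *>
{
public:
  my_pass_summaries (symbol_table *table)
    : function_summary <my_pass_data *> (table, false) {}

  /* Called for functions added after the summary container was created.  */
  void insert (cgraph_node *, my_pass_data *data) override
  { data->something = 0; }

  /* Called when a node is cloned, i.e. a new version of the function.  */
  void duplicate (cgraph_node *, cgraph_node *,
		  my_pass_data *src, my_pass_data *dst) override
  { *dst = *src; }
};

/* Usage: summaries = new my_pass_summaries (symtab);
   my_pass_data *d = summaries->get_create (node);  */
static my_pass_summaries *summaries;
```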
Question about semantics of cgraph_node::prevailing_p
Hi, My current understanding of LTO is that the callgraph is very dynamic (i.e., optimizations might add or remove cgraph_nodes). A while back I encountered an issue where I couldn't print the cgraph_node::name of a function during the execute stage in LTO. I found that the only thing different was that these cgraph_nodes had a prevailing_p () returning false. There's only a few other places in the GCC sources that look at prevailing_p (ipa-fnsummary.c and ipa-prop.c) and both of them seem to drop information if the cgraph_node is not prevailing. I took this as guidance and in my optimization I dropped information from non-prevailing cgraph_nodes. (I figured out it must be something similar to the remove cgraph_hooks.) However, I am now revisiting whether this interpretation was correct or not. Can someone tell me what does cgraph_node::prevailing_p actually means? Also, is there a way to code an LTO test where a specific cgraph_node's prevailing_p returns false? That way I could probably verify the correctness of my analysis in the presence of non-prevailing_p cgraph_nodes. Thanks!
Question on calling possible_polymorphic_call_targets, side effects, and virtual tables.
Hello, I have a SIMPLE_IPA_PASS that parses the program multiple times. As it parses gimple, it builds a data structure with the information collected that will provide some invariants to future iterations over the program. I was looking into adding a new feature that would take advantage of devirtualization by calling possible_polymorphic_call_targets. All looks good, a large program that I use for verifying changes seems to compile nicely. However, as I generate a test file in which I discard most optimizations (except the one I am currently working on) I realize that an assertion on my pass is triggered. I dig in and it looks like I am ignoring error_mark_nodes in varpools' DECL_INITIAL. The first passes essentially encode the information that these are error_mark_nodes. On the pass in which I call possible_polymorphic_call_targets I find that suddenly, these error_mark_nodes are gone and replaced with a virtual table, thus triggering the assertion. Is there a way I can get rid of the error_mark_nodes from the earlier passes to match the changes brought by possible_polymorphic_call_targets? I suppose that finding polymorphic_call_targets at an earlier stage is a possibility, but I was wondering exactly which function/statement is able to fix these error_mark_nodes so that I can also learn about this. Thanks!
How to run C++ IPA tests?
Hi, I have been adding tests to the gcc/testsuite/gcc.dg/ipa folder successfully for a while now. I am starting to add some tests into gcc/testsuite/g++.dg/ipa now but I am having some issues. 1. Using `make check-g++` returns the following error message "No rule to make target 'check-g++'". 2. While `make check-gcc` works, this seems to be only for C tests (which is correct). 3. When I type `make check RUNTESTS="ipa.exp"` but the section "g++ Summary" looks empty. 4. I have added a test that I know will fail, but I don't see it (because I don't think I have run g++ tests correctly). To make sure that I haven't changed anything with my transformation I actually checked out releases/gcc-11.2.0 Can someone tell me how to run the C++ IPA tests? I'm sure there is something silly that I'm doing wrong, but can't find out what it is. Thanks!
Question on cgraph_node::force_output
Hi, I am looking at tree-ssa-structalias.c looking at what makes a function nonlocal during IPA-PTA. I am having some problems understanding force_output and when it is set or unset. 1. What is the meaning of force_output? cgraph.h gives an example that force output means that the symbol might be used in an invisible way. I believe this means some sort of "unanalyzable" way. However, for a few tests I've made, all functions have the field force_output set to true. 2. Does this value depend on some other pass? At the moment, I am looking at this field within my own passes (IPA_PASS and SIMPLE_IPA_PASS), but I would like to inspect the dump_file(s) which show information about force_output to make sure that it doesn't depend on pass order or even my own flags. 3. What flags should I use to inspect force_output? Thanks!
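Regarding question 3, a trivial sketch of dumping the flag from within a pass, so the output does not depend on another pass's dump format:
```
/* Sketch: dump the relevant symtab flags for every function.  */

static void
dump_force_output_flags (void)
{
  if (!dump_file)
    return;
  cgraph_node *node;
  FOR_EACH_FUNCTION (node)
    fprintf (dump_file,
	     "%s: force_output=%d used_from_other_partition=%d "
	     "externally_visible=%d\n",
	     node->dump_name (), (int) node->force_output,
	     (int) node->used_from_other_partition,
	     (int) node->externally_visible);
}
```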
Question on ipa_ref->referring and ipa_ref->stmt on all_late_ipa_passes
Hi, I would like to use ipa_ref in the PASS_LIST all_late_ipa_passes to query the statement (ref->stmt) of where a global variable is used. However, I am having some problems achieving this. What I do is: 1. Check that ipa_ref->referring has a body and is not inlined. 2. get_body 3. try to print out the gimple statement using print_gimple_stmt (dump_file, ref->stmt, 0, TDF_NONE). This all seems correct to me, but I have been receiving errors that print is trying to print a tree with an incorrect TREE_CODE. I am assuming here that ref->stmt is not updated after all_regular_ipa_passes, much like how when looking at cgraph_edge the call statement is also not updated. Can someone please tell me if this is indeed the case or what is happening here? Also, while I think that the gimple statements might not be maintained, I see that ipa_ref is still used in the ipa_pta pass during all_late_ipa_passes. I see that ipa_ref->referring and ipa_ref->stmt are not used. Instead the tree of the referred is obtained in the following way: ref->referred->decl. I am assuming that it would be possible to use ref->referred->decl and search for this tree everywhere in referring to find the uses. Can someone confirm this? Thanks! -Erick
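To illustrate the fallback described in the last paragraph, a sketch that rescans the materialized body for direct uses of the referred decl (nested references, e.g. inside a MEM_REF or ADDR_EXPR, would need a walk_tree-style traversal):
```
/* Sketch: instead of trusting ref->stmt after the regular IPA passes,
   rescan the body of the referring function for statements whose
   top-level operands mention ref->referred->decl.  */

static void
dump_uses_of_global (cgraph_node *user, varpool_node *var)
{
  function *fun = DECL_STRUCT_FUNCTION (user->decl);
  if (!fun)
    return;
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
    for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
	 gsi_next (&gsi))
      {
	gimple *stmt = gsi_stmt (gsi);
	for (unsigned i = 0; i < gimple_num_ops (stmt); i++)
	  if (gimple_op (stmt, i) == var->decl)
	    {
	      if (dump_file)
		print_gimple_stmt (dump_file, stmt, 0, TDF_NONE);
	      break;
	    }
      }
}
```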
Re: Question on ipa_ref->referring and ipa_ref->stmt on all_late_ipa_passes
On Mon, 14 Feb 2022 at 10:57, Jan Hubicka wrote: > > Hi, > > > > I would like to use ipa_ref in the PASS_LIST all_late_ipa_passes to query > > the statement (ref->stmt) of where a global variable is used. However, I am > > having some problems achieving this. > > > > What I do is: > > > > 1. Check that ipa_ref->referring has a body and is not inlined. > > 2. get_body > > 3. try to print out the gimple statement using print_gimple_stmt > > (dump_file, ref->stmt, 0, TDF_NONE). > > > > This all seems correct to me, but I have been receiving errors that print > > is trying to print a tree with an incorrect TREE_CODE. I am assuming here > > that ref->stmt is not updated after all_regular_ipa_passes, much like how > > when looking at cgraph_edge the call statement is also not updated. Can > > someone please tell me if this is indeed the case or what is happening > here? > > Yes, during body materialization we keep cgraph edges up to date but we > do not keep references. We probably should remove them earlier. > Keeping them up to date would need a fair amount of work. We do so for > calls since they hold important information (i.e. where to redirect the > call). For references we don't have such machinery in place (even though > I was thinking of implementing it). The main difficulty is that inlining also > performs statement folding that moves references around statements quite > freely. Hi Honza, just a bit of clarification here: when you say that references are not updated, do you mean just ipa_ref->stmt, or does that also include other things like ipa_ref->referring and ipa_ref->use? Thanks! > > > > Also, while I think that the gimple statements might not be maintained, I > > see that ipa_ref is still used in the ipa_pta pass during > > all_late_ipa_passes. I see that ipa_ref->referring and ipa_ref->stmt are > > not used. Instead the tree of the referred is obtained in the following > > way: ref->referred->decl. I am assuming that it would be possible to use > > ref->referred->decl and search for this tree everywhere in referring to > > find the uses. Can someone confirm this? > > I will check what ipa-pta uses here. I suppose it works since you still > have all references from the pre-IPA-transform stage, so it is > consistent... > > Honza > > > > Thanks! > > -Erick >
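Regarding the fallback mentioned in the question (searching the referring function's body for ref->referred->decl instead of trusting ref->stmt), a rough sketch under the assumption that the body has already been materialized could look like this; walk_gimple_op, FOR_EACH_BB_FN and the gsi_* iterators are real APIs, everything else is illustrative.
```
/* Hedged sketch: rescan REFERRING's body for any operand equal to
   REFERRED_DECL rather than relying on ref->stmt.  */
static tree
find_decl_cb (tree *tp, int *, void *data)
{
  walk_stmt_info *wi = (walk_stmt_info *) data;
  return (*tp == (tree) wi->info) ? *tp : NULL_TREE;
}

static void
find_uses_by_rescanning (cgraph_node *referring, tree referred_decl)
{
  function *fun = DECL_STRUCT_FUNCTION (referring->decl);
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
    for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
         gsi_next (&gsi))
      {
        walk_stmt_info wi;
        memset (&wi, 0, sizeof (wi));
        wi.info = referred_decl;
        if (walk_gimple_op (gsi_stmt (gsi), find_decl_cb, &wi))
          print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, TDF_NONE);
      }
}
```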
Question on ipa-bit-cp and pointer_plus_expr
Hello, I'm trying to understand ipa-bit-cp/ipa-cp and how the known bits are propagated to the lattice in the case of a pointer_plus_expr. I have a piece of code similar to the following (a simplified example):
```
int main ()
{
  // a = some pointer.
  foo (a);
}

foo (void* a)
{
  bar (a + 1);
}

bar (void *a) {}
```
I have the following ipa-cp dump:
```
callsite main -> foo
param 0: UNKNOWN
value: 0x0, mask 0xfff8
```
As it is passed to bar I have the following:
```
callsite foo -> bar
param 0: PASS THROUGH: 0, op pointer_plus_expr 8
value: 0x0, mask 0x
```
My question is: shouldn't the mask passed to bar be 0xfff8? If I understand correctly, the mask being 0xfff8 implies that the value will be a multiple of 8. By adding 8 (or any multiple of 8) to it, it should still be a multiple of 8. Where I believe the mask becomes 0x is that during the addition there might be an overflow of the masks and the operand 8. But I am still unsure. Can someone point out what is happening here? Thanks!
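Just to make the alignment argument above concrete, here is a stand-alone illustration (not GCC code), assuming the usual known-bits encoding where a set mask bit means "unknown" and clear mask bits are known to equal the corresponding value bit:
```
/* Stand-alone illustration of "a multiple of 8 plus 8 is still a multiple
   of 8" in the value/mask encoding assumed above: value 0x0 with mask ~0x7
   says the low three bits are known zero.  */
#include <cassert>
#include <cstdint>

int main ()
{
  const uint64_t mask = ~UINT64_C (0x7);  // unknown bits
  const uint64_t known = ~mask;           // bits known (to be zero here)
  for (uint64_t p = 0; p < 1024; p += 8)  // any value satisfying the facts
    assert (((p + 8) & known) == 0);      // low bits stay zero after + 8
  return 0;
}
```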
Re: Question on ipa-bit-cp and pointer_plus_expr
Thanks Martin! The example I showed is simplified, and I had not specified that I am using LTO, but I am. Thanks for asking me to clarify this. The issue I have with the information in the Lattice section of the dump is that it shows the information once the analysis has reached a fixed point. However, the example is a bit more complicated: there is a recursive call in bar, plus some pointer manipulation which will certainly destroy the static information that the incoming parameter of bar is a multiple of 8.
```
bar (void *a)
{
  // some pointer manipulation
  a = // pointer manipulation based on a, casts to (char*) arithmetic...
  bar (a);
}
```
However, I am mostly interested in the static information available at the callsite foo -> bar, and not in the incoming static information available at the parameter "a" of bar. Does this make sense? Is there a place where I can look at the static information available at the argument rather than the parameter? Thanks! On Thu, 17 Feb 2022 at 17:25, Martin Jambor wrote: > Hi, > > On Thu, Feb 17 2022, Erick Ochoa via Gcc wrote: > > Hello, > > > > I'm trying to understand ipa-bit-cp/ipa-cp and how the known bits are > > propagated to the lattice in the case of a pointer_plus_expr. > > > > I have a piece of code similar to the following (this being a simplified > > example) > > > > int main () > > { > > // a = some pointer. > > foo (a); > > } > > > > foo (void* a) > > { > > bar (a + 1); > > } > > > > bar (void *a) {} > > > > I have the following ipa-cp dump: > > > > callsite main -> foo > > param 0: UNKNOWN > > value: 0x0, mask 0xfff8 > > > > > > as it is passed to bar I have the following: > > > > callsite foo -> bar > > param 0: PASS THROUGH: 0, op pointer_plus_expr 8 > > value: 0x0, mask 0x > > > > My question is, shouldn't the mask to bar be 0xfff8? > > > > If I understand correctly the mask being 0xfff8 implies that > > the value will be a multiple of 8. By adding 8 (or any multiple of 8) to > > it, it should still be a multiple of 8. Where I believe the mask becomes > > 0x is that during the addition there might be an overflow > > of the masks and the operand 8. But I am still unsure. Can someone point > > out what is happening here? > > If you do not use LTO, foo and bar must be static for IPA-CP to > determine that their arguments are aligned - and correctly so, if there > are unknown callers passing unknown values, we cannot assume anything. > With LTO or them being static, IPA-CP can do so. > > But make sure you understand that different things in the dump mean > different stuff, you will not find any propagated values in the > jump-function dump which you quoted, those really just describe the call > without any larger context. For results of propagation you need to look > into the "Lattices" section of the dump. > > Martin >
Re: Question on ipa-bit-cp and pointer_plus_expr
> If I understand you correctly, that is indeed the jump function, > obtainable through ipa_get_ith_jump_func (args, i) where args is a > result of ipa_edge_args_sum->get and i is the index of the parameter. > Thanks Martin! So then, am I correct in thinking that
```
callsite foo -> bar
param 0: PASS THROUGH: 0, op pointer_plus_expr 8
value: 0x0, mask 0x
```
should instead be
```
callsite foo -> bar
param 0: PASS THROUGH: 0, op pointer_plus_expr 8
value: 0x0, mask 0xfff8
```
?
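Following Martin's pointer, a hedged sketch of reading that per-callsite information: ipa_edge_args, ipa_jump_func and the bits field (widest_int value/mask) are from ipa-prop.h and print_hex from wide-int-print.h; the wrapper function and its use are this example's own assumptions.
```
/* Hedged sketch: dump the value/mask recorded in the jump function of
   argument ARG_INDEX of call CS (e.g. the edge foo -> bar).  */
static void
dump_argument_bits (cgraph_edge *cs, int arg_index)
{
  ipa_edge_args *args = ipa_edge_args_sum->get (cs);
  if (!args)
    return;
  ipa_jump_func *jf = ipa_get_ith_jump_func (args, arg_index);
  if (jf && jf->bits && dump_file)
    {
      fprintf (dump_file, "value ");
      print_hex (jf->bits->value, dump_file);
      fprintf (dump_file, ", mask ");
      print_hex (jf->bits->mask, dump_file);
      fprintf (dump_file, "\n");
    }
}
```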
ASSERT_EXPR during SIMPLE_IPA_PASS and LTO
Hi, I am working on an analysis that is able to determine some static information about a specific variable. At the moment I would like to avoid writing much of the transformation myself by taking advantage of GCC's other passes. So I can imagine something like: create an ASSERT_EXPR and let the other passes change the CFG, remove dead code, fold constants, etc. I am only prototyping for now, so I have created a SIMPLE_IPA_PASS during LTO that finds the variables of interest and then creates an ASSERT_EXPR which encodes the static information of interest. This appears to work somewhat correctly: when I print the functions' gimple everything looks OK, and the pass does not fail even if I use the TODO_verify_all flag at the end of the SIMPLE_IPA_PASS. However, the transformation triggers a segfault during fixup_cfg and I am unsure what the cause can be. My questions are:
1. Is it possible to insert new ASSERT_EXPRs during LTO and have other tree passes transform the function based on them?
2. How complex can the ASSERT_EXPR be? I can see that its condition must be a comparison, e.g. EQ_EXPR, NE_EXPR, LT_EXPR and so on. But what if I need to express something like a bitmask to guarantee that certain bits are zero? Could I have a complicated ASSERT_EXPR whose condition is (a & 0xff) == 0? Or should I break it down into temp = a & 0xff; ASSERT_EXPR (temp == 0);? (A sketch of the broken-down variant follows below.)
3. Any other advice?
Thanks!
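On question 2, here is a heavily hedged sketch of the "break it down" variant (emit temp = a & 0xff and then assert temp == 0); it only mimics how tree-vrp materializes ASSERT_EXPRs as far as I can tell, and whether a late pass may legitimately introduce them at all is exactly question 1.
```
/* Hedged sketch, inserting after the statement *GSI points to.  */
static void
insert_mask_assertion (gimple_stmt_iterator *gsi, tree a)
{
  tree type = TREE_TYPE (a);

  /* temp = a & 0xff  */
  tree temp = make_ssa_name (type);
  gimple *g1 = gimple_build_assign (temp, BIT_AND_EXPR, a,
                                    build_int_cst (type, 0xff));
  gsi_insert_after (gsi, g1, GSI_NEW_STMT);

  /* asserted = ASSERT_EXPR <temp, temp == 0>  */
  tree cond = build2 (EQ_EXPR, boolean_type_node, temp,
                      build_int_cst (type, 0));
  tree assertion = build2 (ASSERT_EXPR, type, temp, cond);
  tree asserted = make_ssa_name (type);
  gimple *g2 = gimple_build_assign (asserted, assertion);
  gsi_insert_after (gsi, g2, GSI_NEW_STMT);

  /* Uses of TEMP dominated by G2 would still have to be rewritten to use
     ASSERTED for later passes to benefit from the assertion.  */
}
```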
Question on updating function body on specialized functions
Hi, I have one function (F) that has been specialized for two different calling contexts (F1 and F2), and two late SIMPLE_IPA_PASSes (A and B). Pass A changes some MEM_REFs so that the type of the MEM_REF is compatible with the type of the expression's first operand. Pass A changes both F1 and F2; I have printed the function bodies of both F1 and F2 during pass A and everything looks correct. Pass B uses these changes. However, I noticed this interesting behaviour:
1. If I fix F1 first and then F2, pass B sees F2 correctly but some of F1's MEM_REFs are incorrect.
2. If I fix F2 first and then F1, pass B sees F1 correctly but some of F2's MEM_REFs are incorrect.
My question is: do different specialized functions share the same trees? If so, how should I change the bodies of specialized functions? Thanks!
Re: Question on updating function body on specialized functions
Hi Martin! Thanks for replying. It turns out that while I was trying to reply to you I found the answer. There is indeed one tree node shared across the two functions, and that is TREE_OPERAND (MEM_REF, 1). When I assigned to TREE_TYPE (TREE_OPERAND (MEM_REF, 1)) in one function, I was also modifying the other. The solution was to create a new tree and assign it directly to TREE_OPERAND (MEM_REF, 1) in both functions. Thanks!
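For completeness, a hedged sketch of the fix described above: instead of retyping the operand shared between F1 and F2, build a fresh constant with the desired type and store it into the MEM_REF. wide_int_to_tree, wi::to_wide and TREE_OPERAND are real APIs; new_ptr_type stands for whatever type pass A computed, and the surrounding pass is assumed.
```
/* Hedged sketch of the fix: never mutate the shared operand itself.  */
static void
retype_mem_ref_offset (tree mem_ref, tree new_ptr_type)
{
  tree old_off = TREE_OPERAND (mem_ref, 1);
  tree new_off = wide_int_to_tree (new_ptr_type, wi::to_wide (old_off));
  TREE_OPERAND (mem_ref, 1) = new_off;
}
```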
Question on mapping old to specialized cgraph_edges
Hi, I am trying to find a map between cgraph_edge*s before ipa-cp and after ipa-cp has specialized nodes. Does anyone know if such a map is available? Or an equivalent? I think technically it should be a map between: (cgraph_node* caller, cgraph_edge* e, cgraph_node *callee) X (cgraph_node* caller', cgraph_edge* e', cgraph_node *callee') since both caller and callee may be updated during ipa-cp. But any help or direction is appreciated. Thanks!
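As far as I know there is no ready-made public map, but one way to build it yourself (a hedged sketch, not an existing facility) is to register an edge-duplication hook before ipa-cp runs and record each (old, new) pair as clones are created; add_edge_duplication_hook is from cgraph.h, while the map and its lifetime are this example's own choices.
```
/* Hedged sketch: record edge duplications performed during cloning.  */
static hash_map<cgraph_edge *, cgraph_edge *> *edge_clone_map;

static void
note_edge_duplication (cgraph_edge *src, cgraph_edge *dst, void *)
{
  edge_clone_map->put (src, dst);
}

static void
register_edge_clone_map (void)
{
  edge_clone_map = new hash_map<cgraph_edge *, cgraph_edge *>;
  symtab->add_edge_duplication_hook (note_edge_duplication, NULL);
}
```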
Question on not sharing trees (tree_node_can_be_shared)
Hi, I am interested in annotating INTEGER_CSTs. I have added a field to typed_trees and set its value in some cases. However, INTEGER_CSTs can be shared and copied across a function, and even copied to other functions. I don't want all INTEGER_CSTs with the same value to be shared, since they now carry an annotation which may make them somewhat unique. My question is: is there a way to disable the sharing of INTEGER_CSTs? I have tried modifying tree_node_can_be_shared to return false when the argument is an INTEGER_CST, but the verifier then fails, stating that there is incorrect sharing of tree nodes. My intuition tells me that some pass assumes INTEGER_CSTs can be shared and therefore the information gets copied, and the verifier then catches this and reports an error. Is this correct? How would one go about disabling the sharing of INTEGER_CSTs? Is there a comparison for INTEGER_CSTs that I could hook into, such that an INTEGER_CST with my field set registers as a different INTEGER_CST even when the constant value is the same? Thanks!
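One hedged alternative to disabling sharing globally would be to annotate a private copy of the constant rather than the shared node, so that other occurrences of the same value stay untouched; copy_node is a real API, set_my_annotation is a placeholder for the added field.
```
/* Hedged sketch: annotate an unshared copy of CST instead of the shared
   node.  Whether this fits depends on being able to replace the operand
   with the copy at the point of use.  */
static tree
annotated_int_cst (tree cst)
{
  gcc_assert (TREE_CODE (cst) == INTEGER_CST);
  tree copy = copy_node (cst);
  set_my_annotation (copy);   /* hypothetical setter for the added field */
  return copy;
}
```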
Question on path from C parser to DECL_INITIAL
Hi, I am trying to understand what path is executed in GCC from parsing a C expression (in a global variable declaration) to the value in DECL_INITIAL. At the moment, I have annotated a tree during parsing. I have another debugging pass that looks for this tree in subsequent passes. The annotation persists if the tree is inside a function body. However, if the tree is in a global variable initialization (i.e., DECL_INITIAL), then I am unable to see the annotation even when placing the debugging pass at the beginning of passes.def. Can someone provide a little bit of guidance? Thanks!
Re: Question on path from C parser to DECL_INITIAL
> I'm not sure I understand but no pass walks 'global variables', so you're > not going to "see" the annotation from passes (whatever that means). > > What I mean by walking global variables is that I have a GIMPLE_PASS that ignores the function passed as an argument and instead just uses FOR_EACH_VARIABLE to look at the varpool nodes and check whether the DECL_INITIAL has the field set or unset. I can print the variable and its DECL_INITIAL. However, the DECL_INITIAL tree always has the field unset, even though I am setting it during c_sizeof_or_alignof_type. The initialization is simply: int a = sizeof (foo);
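For what it is worth, here is a hedged sketch of the debugging walk described above that visits every subtree of each initializer instead of only the top-level DECL_INITIAL node, in case the annotated constant ended up nested or was rebuilt while the initializer was processed; has_my_annotation is a placeholder for the check on the added field.
```
/* Hedged sketch: walk every subtree of each global's DECL_INITIAL.  */
static tree
find_annotated_cst (tree *tp, int *, void *)
{
  if (TREE_CODE (*tp) == INTEGER_CST && has_my_annotation (*tp))
    fprintf (dump_file, "found annotated constant\n");
  return NULL_TREE;   /* keep walking */
}

static void
scan_global_initializers (void)
{
  varpool_node *vnode;
  FOR_EACH_VARIABLE (vnode)
    if (DECL_INITIAL (vnode->decl))
      walk_tree (&DECL_INITIAL (vnode->decl), find_annotated_cst,
                 NULL, NULL);
}
```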
Question on cgraph_edge::call_stmt during LTO
Hi, I'm working on a pass that looks at the estimated values during ipa-cp and stores them for a later analysis/pass. I would like to store the actual arguments' estimates in cgraph_edge::call_stmt, or somewhere else that makes similar sense. (Note that this is different from the formal parameters' estimates, which can be found in the lattice printout of ipa-cp.) I have already added a new field to the definition of gimple_call, made sure that this field is streamed out and streamed in, and set the values during ipa-cp. However, I am having problems with what I believe is the inlining and cloning of cgraph_nodes: whenever a cgraph_node is inlined or cloned, I would need to copy this information and update it if necessary. At the moment, when I try to read the information during a late SIMPLE_IPA_PASS, it is simply not there. I believe the cgraph_edge is not the same, since during ipa-cp the callee has been specialized and during ipa-inline the caller has been inlined into a different caller. Also, for some cgraph_edges the call_stmt is NULL; I believe this can also be due to inlining, but I am not sure. Can someone point out a good way to keep this information in sync with the creation and deletion of cgraph_edges? Maybe an alternative to cgraph_edge::call_stmt? Thanks! -Erick
Re: Question on cgraph_edge::call_stmt during LTO
Hi Martin, Thanks for the tips! I have implemented an edge summary which:
* is allocated at the IPA analysis phase
* is streamed out in ipcp_write_transformation_summaries
* is streamed in in ipcp_read_transformation_summaries
However, before the implementation of this edge summary we had another mechanism for propagating the information all the way until it was used in a SIMPLE_IPA_PASS executed after all LGEN stages were finished (after all_regular_ipa_passes). After changing the implementation to use edge summaries, I find that the information is preserved during inlining (the duplication hook prints out the new edges that get formed via inlining, with the correct information), but it is not found in the SIMPLE_IPA_PASS that is executed after all_regular_ipa_passes. What is perhaps more interesting is that if I run with -fno-ipa-pure-const and without -fno-ipa-modref, I can still see the cgraph_nodes and edges of the inlined methods, along with the information needed. But not in the ones that have been inlined. I believe this could simply be that when these options are disabled, cgraph_nodes might not be reclaimed. I understand that there are many differences between SIMPLE_IPA_PASSes and regular IPA_PASSes, but at the moment I am unsure how to narrow down my search for a fix. Is this something that could be caused by:
* memory management: (I am not familiar with the memory management in GCC and it is a bit difficult to understand.) I have removed the bodies of my_edge_summary::remove (cgraph_edge *) and my_edge_summary::remove (cgraph_edge *, my_edge_summary_instance *), so I don't think this is it. However, the class my_edge_summary still copies some of the structure of the other transformation summaries, so there is a GTY((for_user)) macro in the class declaration and the information is stored in a vec *my_info.
* missing implementation details in the duplicate functions: Looking at ipa_edge_args_sum_t::duplicate, it is a relatively complex function. I also noticed that it does something different when dst->caller has been inlined. Should I also update the cgraph_edge that disappears when dst->caller is inlined into its caller?
* something else?
Any direction is greatly appreciated! Many thanks! -Erick
On Sat, 21 May 2022 at 00:13, Martin Jambor wrote: > Hello, > > On Fri, May 20 2022, Erick Ochoa via Gcc wrote: > > Hi, > > > > I'm working on a pass that looks into the estimated values during ipa-cp > > and stores them for a later analyses/pass. I would like to store the real > > arguments' estimates in a cgraph_edge::call_stmt or somewhere else that > > makes similar sense. (Note, this is different from the formal parameters' > > estimates which can be found in the lattice print out of ipa-cp). > > the statement is not the right place to store such pass-specific > information, for reasons you described and more (especially simple > memory use efficiency). > > Instead they should be placed into an "edge summary" (also sometimes > called "call summary"), a structure similar to ipa_edge_args_sum (in > ipa-prop.h and ipa-prop.cc). Unlike ipa_edge_args_sum, which is > allocated at analysis phase, then streamed out and in in case of LTO, > and used and thrown away during the IPA analysis phase, your summary would > need to be allocated at IPA analysis time, then streamed out in > ipcp_write_transformation_summaries, streamed in in > ipcp_read_transformation_summaries so that they can be used in the > transformation phase.
> > Usually a simple implementation of the duplication hook of an edge > summary is enough for the data to survive cloning and inlining and the > like. > > Martin >
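To make the shape of such a summary concrete, here is a hedged sketch modelled on call_summary from symbol-summary.h; the payload and hook body are placeholders, and the LTO streaming in ipcp_write/read_transformation_summaries still has to be written separately, as discussed above.
```
/* Hedged sketch of an edge (call) summary carrying per-argument estimates.  */
struct arg_estimate_summary
{
  vec<widest_int> estimates;
};

class arg_estimate_sum_t : public call_summary<arg_estimate_summary *>
{
public:
  arg_estimate_sum_t (symbol_table *table)
    : call_summary<arg_estimate_summary *> (table) {}

  /* Called when an edge is duplicated (cloning, inlining); copying the
     payload here is what lets the data survive those transforms.  */
  void duplicate (cgraph_edge *, cgraph_edge *,
                  arg_estimate_summary *src,
                  arg_estimate_summary *dst) override
  {
    dst->estimates = src->estimates.copy ();
  }
};
```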
Questions on transition from IPA_PASS (LTRANS) to SIMPLE_IPA_PASS
Hi, I understand some of the differences between IPA_PASSes and SIMPLE_IPA_PASSes. However, I have questions about the cleanup processes that IPA_PASSes might have.
* Would it be correct to assume that all node and edge summaries are deleted after the last IPA_PASS? And would it also be correct to assume that passes are responsible for releasing this information whenever it is no longer needed? In other words, is there any method / infrastructure that runs between passes and may deallocate these summaries?
* Is there any sort of processing done on cgraph_nodes and cgraph_edges beyond what is found in the LTRANS stage of each of the passes? I am imagining something similar to streaming trees / summaries in and out, but I don't believe that should be the case. In my understanding, the process that runs LTRANS is the same one that runs the late SIMPLE_IPA_PASSes.
Thanks! -Erick
Question about speculative make_edge_direct_to_target during LTO/IPA_PASS
Hi, I have a pass that is able to speculate on the target of indirect function calls. This pass is an IPA_PASS. It:
1. generates summaries with the possible targets
2. writes the analysis summary
3. reads the analysis summary
4. combines the results from multiple partitions and, if needed, fixes the targets
5. writes the optimization summary
6. reads the optimization summary
7. calls ipa_make_edge_direct_to_target with speculative=true (a sketch follows below)
This pass is successful in that I can observe the transformation in the generated gimple (a function test and a direct function call in one branch, and the indirect function call in the other). However, I would like to make the new edges in the call graph visible to other IPA passes, particularly ipa-cp. For this, I would need to create virtual clones during the WPA stage. I am a little unfamiliar with virtual clones. What kind of information would I need to store in the analysis summary, and is there a way to create speculative virtual clones? Can someone point me to a similar piece of code in GCC where this happens? Thanks!
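For reference, a hedged sketch of step 7 as it might look once the indirect cgraph_edge and the speculated target decl are known during LTRANS; ipa_make_edge_direct_to_target is the ipa-prop entry point named above, and the dumping is just illustration.
```
/* Hedged sketch of step 7.  IE is the indirect call edge, TARGET_DECL the
   speculated target.  */
static void
speculate_indirect_call (cgraph_edge *ie, tree target_decl)
{
  cgraph_edge *direct
    = ipa_make_edge_direct_to_target (ie, target_decl,
                                      /*speculative=*/true);
  if (direct && dump_file)
    fprintf (dump_file, "speculatively directed a call in %s to %s\n",
             direct->caller->dump_name (), direct->callee->dump_name ());
}
```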
Re: Question about speculative make_edge_direct_to_target during LTO/IPA_PASS
On Fri, 1 Jul 2022 at 14:48, Martin Jambor wrote: > Why so late, why not as part of number 4? > Hi Martin, Thanks for the feedback. The original reason why the call to make_edge_direct_to_target was done so late is the lack of function bodies during WPA, summaries with insufficient information (I keep track of only some information; once access to function bodies is available during LTRANS I finally have a cgraph_edge to use as a parameter), and the fact that I am still figuring out edge summaries. I have a couple of questions here to improve my summaries. If I were to call make_edge_direct_to_target during WPA (i.e. as part of number 4), I would need access to the indirect edge during WPA, because make_edge_direct_to_target receives a cgraph_edge as a parameter. In order to have access to the indirect edge during WPA, I need to generate summaries which keep track of indirect edges. I think this would be done with edge summaries. Again, my understanding of edge summaries is incomplete. My understanding is that they contain information to be transmitted along the edges of the call graph. However, it is the implementation details that are making this difficult for me to understand. Unlike cgraph_nodes, which can be encoded and decoded, it seems to me that cgraph_edges have no encoding nor decoding. This makes it somewhat hard to say that a piece of information belongs to a given edge. Looking into ipa-cp / ipa-prop, it looks like edge summaries are stored by first encoding the caller cgraph_node and then iterating over the edges in node->callees and node->indirect_calls. However, I know that edges can be removed; that's why there are edge removal hooks. Shouldn't this affect the information that is written if an edge was removed somewhere else? Something else that is not explicit is whether the order of node->callees and node->indirect_calls is preserved from the moment the summary is written to the moment the summary is read. Is this the case? My understanding is also limited by the fact that ipa-cp does not have proper write_summary or read_summary stages in its own class; instead ipa-cp's summaries are written and read during ipa-prop. Is this an implementation detail so that the edge information created for ipa-cp is propagated correctly during inlining? (Or is there some other explanation for this?) Would it also make sense to write and read my edge summaries at the same time as ipa-cp's edge summaries? I know that the base class (call_summary) also has virtual methods for removal and duplication of edges (which should be triggered during inlining / clean-up of the callgraph). But somehow the virtual methods ::remove and ::duplicate seem different to me from the callbacks which need to be registered in the IPA_PASSes. Again, why would one need to write and read ipa-cp's summaries during ipa-prop if these callbacks exist and are triggered when summaries need to be updated? Thanks! -Erick
Creating a wrapper around a function at compile time
Hello, I'm looking for some help on how to create a new function at compile time / link time. The idea is an alternative form of constant propagation. The current implementation of ipa-cp may specialize functions for which arguments may be known at compile time. Call graph edges from the caller to the new specialized functions will replace the old call graph edges from the caller to the original functions. Call graph edges which have no known compile time constants will still point to the original unspecialized function. I would like to explore a different approach to function specialization. Instead of only specializing functions which are guaranteed to have a compile time constant, I would like to also attempt to specialize the edges which do not have compile time constants, using a parameter test. In other words, for call graph edges with non-constant arguments at compile time, create a wrapper function around the original function and do a switch statement over the parameters. For example, let's say we have a function mul, which multiplies two integers.
```
int mul (int a, int b) { return a * b; }
```
Function mul is called from three different callsites in the whole program:
```
A: mul (a, 2);
B: mul (b, 4);
C: mul (c, d);
```
At the moment, ipa-cp might specialize mul into 3 different versions:
```
// unoptimized original mul
int mul (int a, int b) { return a * b; }

// optimized for b = 2;
int mul.constprop1 (int a)
{
  // DEBUG b => 2
  return a << 1;
}

// optimized for b = 4;
int mul.constprop2 (int a)
{
  // DEBUG b => 4
  return a << 2;
}
```
and change the callsites to:
```
A: mul.constprop1 (a);
B: mul.constprop2 (b);
C: mul (c, d);
```
I would like instead to do the following: create a function mul_test_param
```
int mul_test_param (int a, int b)
{
  switch (b)
  {
    case 2:
      return mul.constprop1 (a);
      break;
    case 4:
      return mul.constprop2 (a);
      break;
    default:
      return mul (a, b);
      break;
  }
}
```
The function mul_test_param will test each parameter and then call the specialized function. The callsites can either be changed to:
```
A: mul.constprop1 (a);
B: mul.constprop2 (b);
C: mul_test_param (c, d);
```
or
```
A: mul_test_param (a, 2);
B: mul_test_param (b, 4);
C: mul_test_param (c, d);
```
The idea is that there exists some class of functions for which the parameter test plus the specialized version is less expensive than the original version of the function. And if, at runtime, d is a quasi-constant with a good likelihood of being either 2 or 4, then it makes sense to have this parameter test. This is very similar to the function tests used when turning indirect calls into direct calls, and to what could be done in value profiling. I already know how to achieve most of this, but I have never created a function from scratch. That is the bit that is challenging for me at the moment. Any help is appreciated. Thanks! -Erick
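Since the hard part is creating the function from scratch, here is a very rough, heavily hedged sketch of only the first hurdle: getting a fresh mul_test_param declaration into the symbol table. None of it is taken from an existing pass, and building the gimple body (PARM_DECLs, the switch, the three calls) is the larger part that is omitted; omp expansion and the target_clones/ifunc dispatcher code look like good places to crib from.
```
/* Heavily hedged sketch: create and register the wrapper's declaration.  */
static cgraph_node *
create_mul_test_param_decl (tree orig_fndecl)
{
  tree fntype = TREE_TYPE (orig_fndecl);   /* reuse mul's signature */
  tree decl = build_fn_decl ("mul_test_param", fntype);
  DECL_ARTIFICIAL (decl) = 1;
  DECL_EXTERNAL (decl) = 0;
  TREE_PUBLIC (decl) = 0;
  TREE_STATIC (decl) = 1;

  tree result = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
                            TREE_TYPE (fntype));
  DECL_CONTEXT (result) = decl;
  DECL_RESULT (decl) = result;

  /* ... create DECL_ARGUMENTS, build the gimple body containing the
     parameter switch and the calls to mul.constprop1/2 and mul, and
     attach it to DECL ...  */

  cgraph_node::add_new_function (decl, /*lowered=*/false);
  return cgraph_node::get_create (decl);
}
```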
Re: Question about speculative make_edge_direct_to_target during LTO/IPA_PASS
Hi Martin, thanks a lot for your help! You were right! I am now able to call make_edge_direct_to_target during WPA. -Erick
Re: Creating a wrapper around a function at compile time
Hi Richard, > > > So instead of wrapping the function why not transform the original function > > to have a prologue doing a runtime check for the compile-time specialized > > versions and perform tail-calls to them? > > > > What I'm missing is who would call mul_test_param in your case? > The idea is that either all callsites of mul call mul_test_param instead, or only those for which there is no known compile time constant. If we had some more information about the distribution of the parameter values at runtime, then we could even choose not to use the wrapper. > Following your variant more closely would be doing value profiling > of function parameters and then "speculative IPA-CP". > Yes, the idea of doing value profiling on function parameters certainly fits this very well. I refrained from calling it "speculative IPA-CP" and instead called it specialization with a parameter test, but I think the idea is the same. However, the main difference between "speculative IPA-CP" and this idea (if I understand correctly how speculative IPA-CP works) is that instead of changing the callsite C to
```
C: mul_test_param (c, d);
```
speculative IPA-CP would perform the transformation
```
C: if (d == 2) mul.constprop1 (c)
   else if (d == 4) mul.constprop2 (c)
   else mul (c, d)
```
which, depending on how many non-compile-time-constant callsites there are, will increase the size of the binary. That's why I prefer the wrapper around the function; the inline version is essentially what you would get by inlining the wrapper. As of now, the only "speculative IPA-CP" that I've seen is the speculation on the target of an indirect edge. I could look into exactly how this mechanism works and try to instead speculate on an arbitrary argument. Do you think that would be a good first step instead of creating a wrapper and redirecting the edges to the wrapper?
Re: Creating a wrapper around a function at compile time
On Thu, 14 Jul 2022 at 15:51, Richard Biener wrote: > With a value-profile it would be per call site and IPA-CP can use that > to direct the cloning. I'm not sure how many > values a value histogram can track reliably here. > Last time I checked, value profiling can only track a single value per statement. So that would need to be augmented if we wanted to specialize for more than one parameter. However, this can also be a bad idea: even if parameters a and b are quasi-constants individually, the cross product a x b may follow a more uniform distribution. This means that optimizing on a or on b might be a good idea while optimizing on a x b is a bad idea. (There is still some work to be done on defining when specializing is a good idea and when it is not.) Also, I do not believe that value profiling currently profiles arguments; according to the comments, only values used for divisions and for indirect/virtual call specialization are tracked during value profiling. However, even though value profiling would be a great addition to this, I would like to explore the transformation independently of value profiling. There are some heuristics that could be used, for example looking at the callsites which do have constant arguments and which optimizations are enabled by those constants. You mention that "IPA-CP can use that to direct the cloning". Can you elaborate a bit further? I guess the idea here is augmenting ipa-cp with a list of "likely values" and extending the infrastructure which deals with speculation to support arguments. Am I following correctly? Any other suggestions?
Re: Creating a wrapper around a function at compile time
On Thu, 14 Jul 2022 at 16:10, Martin Liška wrote: > On 7/14/22 16:08, Erick Ochoa via Gcc wrote: > > Last time I checked, value profiling can only track a single value per > > statement. > > Hi. > > Take a look at HIST_TYPE_INDIR_CALL which we use for tracking at maximum 32 > (#define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32) and we use for indirect call > speculative calls which you can see for instance here: > > ./gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C > Thanks Martin, I'll give it a read. However, I mis-spoke. If my understanding is correct, multiple values are tracked, but only the values of a single variable/expression per statement. That means that for a gcall (which is a single statement) with n argument expressions, I believe the naive way of tracking all argument expressions is not possible without extending how histograms are associated with statements. Perhaps canonicalizing how callsites work (i.e., only allowing variables as arguments at call sites, and then associating a histogram with the definition of each variable used at a call site) would be enough, but I haven't given much thought to the consequences that might follow from this. > > Cheers, > Martin >
Re: Creating a wrapper around a function at compile time
Awesome! Thanks! Let's go a little bit into the transformation mechanics itself. I am somewhat familiar with LTO, but I am always confused about the transformations scheduled during WPA. Let's assume that:
1. we have the profiling pass, which profiles each argument at each callsite
2. we are now running with -fprofile-use, which means we have access to the value profiles for each of the arguments
3. there is an IPA/LTO pass which creates edge summaries storing "likely" values for each callsite/argument pair
4. we have a heuristic that lets us decide, to some extent, which subset of likely values will yield good results (heavily used, and specialization is greatly beneficial if the value matches), and we run it during WPA
Then, what happens next? I know that the idea is to create parameter tests at the callsite and call the respective specialized version. The gap in my understanding is that at this point I am assuming we are in WPA and therefore have no function bodies in which to add these parameter tests. Would it be sufficient to do the following:
1. Store in the optimization (edge) summaries the (argument position x value x specialized function version) triples
2. During LTRANS, actually perform the transformation for the parameter test
Or is there some other way? I am still working on understanding how ipa_make_edge_direct_to_target with speculative = true adds the edge and the function test during WPA. Otherwise, if we go with the two steps above, the edges won't be visible to the rest of the IPA optimizations. I'll be reading cgraph_edge::make_speculative and the rest of the infrastructure a bit more closely... Any help or direction is greatly appreciated. Thanks! -Erick
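And since the last paragraph mentions it, here is a hedged sketch of the kind of call the existing indirect-call machinery makes (cgraph_edge::make_speculative, with the signature as I read it in cgraph.h around GCC 12); note this is target speculation, not argument speculation, and the 80% scale is an arbitrary placeholder.
```
/* Hedged sketch: during WPA, attach a speculative direct edge for a
   recorded likely target so later IPA passes can see it; the guarding test
   is materialized when the speculation is resolved later.  */
static void
speculate_on_recorded_target (cgraph_edge *ie, cgraph_node *likely_target)
{
  if (!ie->indirect_unknown_callee || !likely_target)
    return;
  ie->make_speculative (likely_target, ie->count.apply_scale (8, 10));
}
```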