External email: Use caution opening links or attachments
On 15/09/24 18:04, Jan Hubicka wrote:
External email: Use caution opening links or attachments
Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html).
Hi,
Hi,
Thanks for the feedback! A few comments:
* Proposed Solution *
* Extending IPA-CP:
1. Add return jump function data structures
- This involves updating ipa_node_params to contain information
regarding the
return statements of the function, namely the lattice and the jump
function
describing the return value, where both use existing data structures.
- The ipa_node_params class is reused to avoid introducing a new class
and a
corresponding function_summary type, though it is trivial to add if
deemed
a better solution. The ipa_return_value_summary class may also be a good
place for this information, however using it would involve moving it from
ipa-prop.cc to ipa-prop.h.
Yes, I think the jump functions for return values are not realy
different from jump functions for call parameters, so indeed we want to
reuse same datatstructure.
We want to track all the usual stuff we do (value ranges, constant
values, known fields in structures...)
In longer run we will want to extend jump functions to represent
properties like "value returned by call X is passed as argument n to
call Y". But for the first version we probably do not need to worry
about this.
- Additionally, ipa_edge_args is updated to track whether or not it is a
callgraph edge originating from a return statement. This enables the
propagation of information in the WPA phase.
What you exactly mean by callgraph edge originating from return
statement?
As a first step, there are three main types of expressions in return statements
we are analyzing:
- Direct returns of formal parameters
- PHI nodes (which can be nested/recursive)
- Call expressions (this is the information we are propagating in ipa-cp)
For the first version of our patch, we are using ipa-cp as a framework only for
analysis and propagation, not for transformations. So, we are detecting
candidates for attribution with "returns_arg", and propagating that information
from callee to caller in ipa-cp in the hopes that we can annotate callers as
well by using the information from the callee.
This is why we need to detect when a callgraph edge "originates" from a return
statement, so that we know which callees to propagate to callers from. We are
interested in walking from callees to callers, so this allows us to effectively
do that.
Consider the following example:
int foo (int x) { return x; } /* <- foo becomes returns_arg (1) */
int bar (int x) { return foo (x); }
In this example, bar calls foo in a return statement so the edge from bar to foo
becomes a "return edge". This allows us to propagate "returns_arg" information
from foo to bar, so we are able to annotate bar as well.
2. LGEN
- Set up return jump functions for each function similar to the parameter
jump function. When it cannot be conclusively determined that one formal
parameter is always returned (such as conditionally returning either of
two), the jump function is marked as invalid.
- This involves looking through phi nodes when return statements of
conditional branches are merged into a single exit block. Note that
returning a call expression does not count towards invalidating the
information for that function.
The analysis is essentially the same as for function parameters, just
looking at the return statements, so I think most of code can be shared
with what we have.
3. WPA
- Implement return value information propagation. It is not possible to do
this in the existing propagate_constants_topo function, because it
iterates
in reverse postorder i.e. from caller to callee. However return value
propagation needs to be done from callee to caller, thus there is a
need to
iterate in a postorder fashion.
- The lattice values are initialized using the jump functions computed
in the
LGEN phase. These values are then propagated over the callgraph.
- The existing lattices are reused, with three possible values like usual.
The possible values are:
return_lattice -> { TOP, param_decl, BOTTOM }
where TOP and BOTTOM are defined as usual. param_decl refers to the
tree of
either the parameter declaration or its type, whichever is available. The
meet operator is defined as usual for TOP and BOTTOM. When both are
param_decl, the following meet operation is defined:
meet(x, y) = x if x == y, BOTTOM otherwise
- Finally, nodes to which no information could be propagated are marked as
BOTTOM and the BOTTOM value is propagated to a fixed-point. This is
required
when information about a caller has been inferred but not about one of
its
callees. For example:
extern int foo(int);
__attribute__((noinline))
int bar (int z)
{
return foo (z);
}
__attribute__((noinline))
int baz (int z)
{
return z;
}
int quux (int z)
{
if (z & 1)
return bar (z);
else
return baz (z);
}
In this case, after propagation, the lattice values look like the
following:
- bar/ret -> TOP
- baz/ret -> baz/0 (first parameter being returned)
- quux/ret -> quux/0 (lattice partially inferred from baz)
Here, we are not able to infer any information about bar because it calls a
function foo that is not visible to the compiler. If we only set bar to
BOTTOM and do not propagate this BOTTOM value to quux, we will incorrectly
infer that quux always returns its 1st parameter.
Yep, it is essentially a simple dataflow. propagate_constants_topo
can probably be still saved by simply considering calls with interesting
return functions as bi-directional for discovery of strongly connected
component.
Somewhat tricky will be implementing cloning heuristics, but that is
also a next step.
Thanks, this looks like a good idea, and we would like to implement bi-
directional flow for existing propagations in a follow up patch. For now, would
it be OK to have a separate dataflow pass from callees to callers only for
propagating returns_arg attribute?
4. LTRANS
- The computed information is used to set a field denoting the returned
formal parameter in the ipcp_transformation class. This is streamed
out to
bitcode at the end of WPA and read back in the LTRANS phase.
- This information is currently used to annotate functions with the "fn
spec"
internal annotation in a prototype patch. This is only a stop-gap measure
given the limitations that exist on "fn spec" (detailed in the next
section).
Consider the following example:
__attribute__((noinline))
static int f3 (int x, int y)
{
use (y);
return x;
}
int f2 (int x, int y)
{
return f3 (x, y);
}
After LGEN, the jump functions will look like the following:
- f3/ret -> f3/0 (f3 returns the 0th parameter, i.e. x)
- f2/ret -> unknown (but not invalid, because it returns a call)
Therefore, the lattices are initialized at the start of WPA in the following
manner:
- f3/lat -> 0
- f2/lat -> TOP
Simple propagation gives f2/lat the final value of 0. After LTRANS, with the
current patch, the code would end up like the following:
__attribute__((fn spec ("1 "), noinline))
static int f3 (int x, int y)
{
use (y);
return x;
}
__attribute__((fn spec ("1 ")))
int f2 (int x, int y)
{
return f3 (x, y);
}
* Other Options *
* ipa-pure-const pass:
The "ipa-pure-const" pass can also be used to implement this analysis by using
the same local analysis and storing inferred information in the function
declaration using the "fn spec" annotation. However, this is quite restrictive
because it does not allow handling cases such as "return x + 1;", where the
formal parameter has an additional binary operation with a constant operand.
ipa-prop and ipa-cp is the right place to handle this problem.
The "fn spec" annotation is also quite limited as it only allows tracking
returns of up to the 4th formal parameter of the function.
* Existing Solutions *
Patch 53ba8d669550d3a1f809048428b97ca607f95cf5 introduces a new class,
ipa_return_value_summary, to store range information of the return value and
extends the tree-vrp pass to store inferred information in this class. This is
then used in fold_using_range. However, this only works within a single
translation unit.
Yep, it is useful to have the local propagation since there are cases we
can optimize before IPA passes cheaply, but we do want to have full
IPA-CP implementation at WPA stage too.
We can probably put the results to ipa_return_value_summary and make it
similar to summaries produced by ipa-prop.
Unfortunately, we were not able to reuse ipa_return_value_summary. Trying
to share the data structure across tree and ipa passes just proved too
difficult with memory management, as we ended up getting segmentation faults
from various cgraph hooks.
So we ended up introducing a new summary class for this information and making
it work like the other ipa-prop summary classes. The plan is to load this new
class with information from ipa_return_value_summary when ipa_analyze_node is
called.
- Does this sound reasonable?
Yes, it would be really nice to have this.
- Would it be better to reuse ipa_node_params, ipa_return_value_summary, or
introduce a new class entirely?
I would try to share datastructures since we will be adding more info
over time.
- What are other optimizations that could be extended with this information?
ipa-prop propagates aggreagete values that are interesting here too.
Also polymorphic call contextes can probably be handled.
- Could the return value range information be improved in any other way?
- Should build_int_cst be used to create the value stored in the lattice,
instead of the tree corresponding to the parameter?
- Should a new class be created to correspond to return jump functions,
something like ipa_return_jump_func?
- Should a constant jump function (instead of a pass-through one) be used to
represent cases where the returned parameter is known?
How the return jump function would differ from the class we have for
(argument) jump functions?
They would not, just wanted to clarify.
Also, after further discussion, we will be using pass-through jump functions for
representing returned parameters, as it allows us to represent expressions like
"x + 1" that constant jump functions cannot.
Honza
--
Regards,
Dhruv
--
Regards,
Dhruv
--
Regards,
Dhruv