[RFC] Return Value Propagation in IPA-CP
* Table Of Contents * - Introduction - Motivating Test Cases - Proposed Solution - Other Options - Existing Solutions - Primary Optimizations To Be Improved - Front-End Attribute For Recording Return-Value Information - Future Work - References - Questions * Introduction * Return jump functions offer a way of modeling return values of a function in terms of its formal parameters, similar to how regular jump functions can be used to model the actual parameters of a callee at a callsite in terms of the formal parameters of the caller. By discovering this information and propagating it across the program callgraph in LTO mode, various transformations like value-range propagation, sibling-call and chaining-call optimization can be augmented to take advantage of it and potentially perform better optimizations. Currently, ipa-cp models parameters using jump functions, propagates the values of those jump functions by iterating to a fixed-point using lattices, and clones functions by taking decisions from those lattice values. We propose extending ipa-cp with the analysis and propagation of return jump functions, starting with the basic case of a direct return of the form "return x;" where "x" is any formal parameter of the function. * Motivating Test Cases * * Optimizing tail calls: - PR92867 char buf[128] = { 1 }; char *foo (__SIZE_TYPE__ n) { return __builtin_memset (buf, ' ', n); } char *bar (__SIZE_TYPE__ n) { __builtin_memset (buf, ' ', n); return buf; } Here, the call to __builtin_memset in foo gets successfully converted to a tail-call, however the call in bar cannot be optimized to a tail-call. This is an optimization that LLVM is able to perform. Link to compiler explorer: https://godbolt.org/z/E81axqWTb - PR67797 #include void *my_func(void *s, size_t n) { memset(s, 0, n); return s; } This is a similar case to the above example. LLVM is able to optimize the call to memset to be a tail-call as well. Link to compiler explorer: https://godbolt.org/z/YzjGc59f8 * Optimizing chaining calls: #include void f (int x, int y) { std::cout << x; std::cout << y; } This function can be optimized to the following: #include void f (int x, int y) { std::cout << x << y; } LLVM is able to perform this optimization as well: https://godbolt.org/z/MW75on1o5 * Proposed Solution * * Extending IPA-CP: 1. Add return jump function data structures - This involves updating ipa_node_params to contain information regarding the return statements of the function, namely the lattice and the jump function describing the return value, where both use existing data structures. - The ipa_node_params class is reused to avoid introducing a new class and a corresponding function_summary type, though it is trivial to add if deemed a better solution. The ipa_return_value_summary class may also be a good place for this information, however using it would involve moving it from ipa-prop.cc to ipa-prop.h. - Additionally, ipa_edge_args is updated to track whether or not it is a callgraph edge originating from a return statement. This enables the propagation of information in the WPA phase. 2. LGEN - Set up return jump functions for each function similar to the parameter jump function. When it cannot be conclusively determined that one formal parameter is always returned (such as conditionally returning either of two), the jump function is marked as invalid. - This involves looking through phi nodes when return statements of conditional branches are merged into a single exit block. Note that returning a call expression does not count towards invalidating the information for that function. 3. WPA - Implement return value information propagation. It is not possible to do this in the existing propagate_constants_topo function, because it iterates in reverse postorder i.e. from caller to callee. However return value propagation needs to be done from callee to caller, thus there is a need to iterate in a postorder fashion. - The lattice values are initialized using the jump functions computed in the LGEN phase. These values are then propagated over the callgraph. - The existing lattices are reused, with three possible values like usual. The possible values are: return_lattice -> { TOP, param_decl, BOTTOM } where TOP and BOTTOM are defined as usual. param_decl refers to the tree of either the parameter declaration or its type, whichever is available. The meet operator is defined as usual for TOP and BOTTOM. When both are param_decl, the following meet operation is defined: meet(x, y) = x if x == y, BOTTOM otherwise - Finally, nodes to which no information could be propagated are marked as BOTTOM and the BOTTOM value is propagated to a fixed-point. This is required when information about a caller has been inferred but not about one of its callees. For example: ex
Re: [RFC] Return Value Propagation in IPA-CP
Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html). On 22/08/24 11:32, Dhruv Chawla via Gcc wrote: External email: Use caution opening links or attachments * Table Of Contents * - Introduction - Motivating Test Cases - Proposed Solution - Other Options - Existing Solutions - Primary Optimizations To Be Improved - Front-End Attribute For Recording Return-Value Information - Future Work - References - Questions * Introduction * Return jump functions offer a way of modeling return values of a function in terms of its formal parameters, similar to how regular jump functions can be used to model the actual parameters of a callee at a callsite in terms of the formal parameters of the caller. By discovering this information and propagating it across the program callgraph in LTO mode, various transformations like value-range propagation, sibling-call and chaining-call optimization can be augmented to take advantage of it and potentially perform better optimizations. Currently, ipa-cp models parameters using jump functions, propagates the values of those jump functions by iterating to a fixed-point using lattices, and clones functions by taking decisions from those lattice values. We propose extending ipa-cp with the analysis and propagation of return jump functions, starting with the basic case of a direct return of the form "return x;" where "x" is any formal parameter of the function. * Motivating Test Cases * * Optimizing tail calls: - PR92867 char buf[128] = { 1 }; char *foo (__SIZE_TYPE__ n) { return __builtin_memset (buf, ' ', n); } char *bar (__SIZE_TYPE__ n) { __builtin_memset (buf, ' ', n); return buf; } Here, the call to __builtin_memset in foo gets successfully converted to a tail-call, however the call in bar cannot be optimized to a tail-call. This is an optimization that LLVM is able to perform. Link to compiler explorer: https://godbolt.org/z/E81axqWTb - PR67797 #include void *my_func(void *s, size_t n) { memset(s, 0, n); return s; } This is a similar case to the above example. LLVM is able to optimize the call to memset to be a tail-call as well. Link to compiler explorer: https://godbolt.org/z/YzjGc59f8 * Optimizing chaining calls: #include void f (int x, int y) { std::cout << x; std::cout << y; } This function can be optimized to the following: #include void f (int x, int y) { std::cout << x << y; } LLVM is able to perform this optimization as well: https://godbolt.org/z/MW75on1o5 * Proposed Solution * * Extending IPA-CP: 1. Add return jump function data structures - This involves updating ipa_node_params to contain information regarding the return statements of the function, namely the lattice and the jump function describing the return value, where both use existing data structures. - The ipa_node_params class is reused to avoid introducing a new class and a corresponding function_summary type, though it is trivial to add if deemed a better solution. The ipa_return_value_summary class may also be a good place for this information, however using it would involve moving it from ipa-prop.cc to ipa-prop.h. - Additionally, ipa_edge_args is updated to track whether or not it is a callgraph edge originating from a return statement. This enables the propagation of information in the WPA phase. 2. LGEN - Set up return jump functions for each function similar to the parameter jump function. When it cannot be conclusively determined that one formal parameter is always returned (such as conditionally returning either of two), the jump function is marked as invalid. - This involves looking through phi nodes when return statements of conditional branches are merged into a single exit block. Note that returning a call expression does not count towards invalidating the information for that function. 3. WPA - Implement return value information propagation. It is not possible to do this in the existing propagate_constants_topo function, because it iterates in reverse postorder i.e. from caller to callee. However return value propagation needs to be done from callee to caller, thus there is a need to iterate in a postorder fashion. - The lattice values are initialized using the jump functions computed in the LGEN phase. These values are then propagated over the callgraph. - The existing lattices are reused, with three possible values like usual. The possible values are: return_lattice -> { TOP, param_decl, BOTTOM } where TOP and BOTTOM are defined as usual. param_decl refers to the tree of either the parameter declaration or its type, whichever is available. The meet operator is defined as usual for TOP and BOTTOM. When both are param_decl, the following meet operation is defined: meet(x, y) = x if x == y,
Re: [RFC] Return Value Propagation in IPA-CP
Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html). On 22/08/24 11:32, Dhruv Chawla via Gcc wrote: External email: Use caution opening links or attachments * Table Of Contents * - Introduction - Motivating Test Cases - Proposed Solution - Other Options - Existing Solutions - Primary Optimizations To Be Improved - Front-End Attribute For Recording Return-Value Information - Future Work - References - Questions * Introduction * Return jump functions offer a way of modeling return values of a function in terms of its formal parameters, similar to how regular jump functions can be used to model the actual parameters of a callee at a callsite in terms of the formal parameters of the caller. By discovering this information and propagating it across the program callgraph in LTO mode, various transformations like value-range propagation, sibling-call and chaining-call optimization can be augmented to take advantage of it and potentially perform better optimizations. Currently, ipa-cp models parameters using jump functions, propagates the values of those jump functions by iterating to a fixed-point using lattices, and clones functions by taking decisions from those lattice values. We propose extending ipa-cp with the analysis and propagation of return jump functions, starting with the basic case of a direct return of the form "return x;" where "x" is any formal parameter of the function. * Motivating Test Cases * * Optimizing tail calls: - PR92867 char buf[128] = { 1 }; char *foo (__SIZE_TYPE__ n) { return __builtin_memset (buf, ' ', n); } char *bar (__SIZE_TYPE__ n) { __builtin_memset (buf, ' ', n); return buf; } Here, the call to __builtin_memset in foo gets successfully converted to a tail-call, however the call in bar cannot be optimized to a tail-call. This is an optimization that LLVM is able to perform. Link to compiler explorer: https://godbolt.org/z/E81axqWTb - PR67797 #include void *my_func(void *s, size_t n) { memset(s, 0, n); return s; } This is a similar case to the above example. LLVM is able to optimize the call to memset to be a tail-call as well. Link to compiler explorer: https://godbolt.org/z/YzjGc59f8 * Optimizing chaining calls: #include void f (int x, int y) { std::cout << x; std::cout << y; } This function can be optimized to the following: #include void f (int x, int y) { std::cout << x << y; } LLVM is able to perform this optimization as well: https://godbolt.org/z/MW75on1o5 * Proposed Solution * * Extending IPA-CP: 1. Add return jump function data structures - This involves updating ipa_node_params to contain information regarding the return statements of the function, namely the lattice and the jump function describing the return value, where both use existing data structures. - The ipa_node_params class is reused to avoid introducing a new class and a corresponding function_summary type, though it is trivial to add if deemed a better solution. The ipa_return_value_summary class may also be a good place for this information, however using it would involve moving it from ipa-prop.cc to ipa-prop.h. - Additionally, ipa_edge_args is updated to track whether or not it is a callgraph edge originating from a return statement. This enables the propagation of information in the WPA phase. 2. LGEN - Set up return jump functions for each function similar to the parameter jump function. When it cannot be conclusively determined that one formal parameter is always returned (such as conditionally returning either of two), the jump function is marked as invalid. - This involves looking through phi nodes when return statements of conditional branches are merged into a single exit block. Note that returning a call expression does not count towards invalidating the information for that function. 3. WPA - Implement return value information propagation. It is not possible to do this in the existing propagate_constants_topo function, because it iterates in reverse postorder i.e. from caller to callee. However return value propagation needs to be done from callee to caller, thus there is a need to iterate in a postorder fashion. - The lattice values are initialized using the jump functions computed in the LGEN phase. These values are then propagated over the callgraph. - The existing lattices are reused, with three possible values like usual. The possible values are: return_lattice -> { TOP, param_decl, BOTTOM } where TOP and BOTTOM are defined as usual. param_decl refers to the tree of either the parameter declaration or its type, whichever is available. The meet operator is defined as usual for TOP and BOTTOM. When both are param_decl, the following meet operation is defined: meet(x, y) = x if x == y,
Re: [RFC] Return Value Propagation in IPA-CP
Ping! On 25/09/24 14:44, Dhruv Chawla via Gcc wrote: External email: Use caution opening links or attachments On 15/09/24 18:04, Jan Hubicka wrote: External email: Use caution opening links or attachments Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html). Hi, Hi, Thanks for the feedback! A few comments: * Proposed Solution * * Extending IPA-CP: 1. Add return jump function data structures - This involves updating ipa_node_params to contain information regarding the return statements of the function, namely the lattice and the jump function describing the return value, where both use existing data structures. - The ipa_node_params class is reused to avoid introducing a new class and a corresponding function_summary type, though it is trivial to add if deemed a better solution. The ipa_return_value_summary class may also be a good place for this information, however using it would involve moving it from ipa-prop.cc to ipa-prop.h. Yes, I think the jump functions for return values are not realy different from jump functions for call parameters, so indeed we want to reuse same datatstructure. We want to track all the usual stuff we do (value ranges, constant values, known fields in structures...) In longer run we will want to extend jump functions to represent properties like "value returned by call X is passed as argument n to call Y". But for the first version we probably do not need to worry about this. - Additionally, ipa_edge_args is updated to track whether or not it is a callgraph edge originating from a return statement. This enables the propagation of information in the WPA phase. What you exactly mean by callgraph edge originating from return statement? As a first step, there are three main types of expressions in return statements we are analyzing: - Direct returns of formal parameters - PHI nodes (which can be nested/recursive) - Call expressions (this is the information we are propagating in ipa-cp) For the first version of our patch, we are using ipa-cp as a framework only for analysis and propagation, not for transformations. So, we are detecting candidates for attribution with "returns_arg", and propagating that information from callee to caller in ipa-cp in the hopes that we can annotate callers as well by using the information from the callee. This is why we need to detect when a callgraph edge "originates" from a return statement, so that we know which callees to propagate to callers from. We are interested in walking from callees to callers, so this allows us to effectively do that. Consider the following example: int foo (int x) { return x; } /* <- foo becomes returns_arg (1) */ int bar (int x) { return foo (x); } In this example, bar calls foo in a return statement so the edge from bar to foo becomes a "return edge". This allows us to propagate "returns_arg" information from foo to bar, so we are able to annotate bar as well. 2. LGEN - Set up return jump functions for each function similar to the parameter jump function. When it cannot be conclusively determined that one formal parameter is always returned (such as conditionally returning either of two), the jump function is marked as invalid. - This involves looking through phi nodes when return statements of conditional branches are merged into a single exit block. Note that returning a call expression does not count towards invalidating the information for that function. The analysis is essentially the same as for function parameters, just looking at the return statements, so I think most of code can be shared with what we have. 3. WPA - Implement return value information propagation. It is not possible to do this in the existing propagate_constants_topo function, because it iterates in reverse postorder i.e. from caller to callee. However return value propagation needs to be done from callee to caller, thus there is a need to iterate in a postorder fashion. - The lattice values are initialized using the jump functions computed in the LGEN phase. These values are then propagated over the callgraph. - The existing lattices are reused, with three possible values like usual. The possible values are: return_lattice -> { TOP, param_decl, BOTTOM } where TOP and BOTTOM are defined as usual. param_decl refers to the tree of either the parameter declaration or its type, whichever is available. The meet operator is defined as usual for TOP and BOTTOM. When both are param_decl, the following meet operation is defined: meet(x, y) = x if x == y, BOTTOM otherwise - Finally, nodes to which no information could be propagated are marked as BOTTOM and the BOTTOM value is propagated to a fixed-point. This is required
Re: [RFC] Return Value Propagation in IPA-CP
Hi, Thanks for the reply! On 16/10/24 20:25, Martin Jambor wrote: External email: Use caution opening links or attachments Hello, first and foremost, sorry for a late reply. I needed to take a larger leave of absence for family reasons. Comments inline: On Thu, Aug 22 2024, Dhruv Chawla via Gcc wrote: * Table Of Contents * - Introduction - Motivating Test Cases - Proposed Solution - Other Options - Existing Solutions - Primary Optimizations To Be Improved - Front-End Attribute For Recording Return-Value Information - Future Work - References - Questions * Introduction * Return jump functions offer a way of modeling return values of a function in terms of its formal parameters, similar to how regular jump functions can be used to model the actual parameters of a callee at a callsite in terms of the formal parameters of the caller. By discovering this information and propagating it across the program callgraph in LTO mode, various transformations like value-range propagation, sibling-call and chaining-call optimization can be augmented to take advantage of it and potentially perform better optimizations. Currently, ipa-cp models parameters using jump functions, propagates the values of those jump functions by iterating to a fixed-point using lattices, and clones functions by taking decisions from those lattice values. We propose extending ipa-cp with the analysis and propagation of return jump functions, starting with the basic case of a direct return of the form "return x;" where "x" is any formal parameter of the function. * Motivating Test Cases * * Optimizing tail calls: - PR92867 char buf[128] = { 1 }; char *foo (__SIZE_TYPE__ n) { return __builtin_memset (buf, ' ', n); } char *bar (__SIZE_TYPE__ n) { __builtin_memset (buf, ' ', n); return buf; } Here, the call to __builtin_memset in foo gets successfully converted to a tail-call, however the call in bar cannot be optimized to a tail-call. This is an optimization that LLVM is able to perform. Link to compiler explorer: https://godbolt.org/z/E81axqWTb - PR67797 #include void *my_func(void *s, size_t n) { memset(s, 0, n); return s; } This is a similar case to the above example. LLVM is able to optimize the call to memset to be a tail-call as well. Link to compiler explorer: https://godbolt.org/z/YzjGc59f8 * Optimizing chaining calls: #include void f (int x, int y) { std::cout << x; std::cout << y; } This function can be optimized to the following: #include void f (int x, int y) { std::cout << x << y; } LLVM is able to perform this optimization as well: https://godbolt.org/z/MW75on1o5 * Proposed Solution * * Extending IPA-CP: 1. Add return jump function data structures - This involves updating ipa_node_params to contain information regarding the return statements of the function, namely the lattice and the jump function describing the return value, where both use existing data structures. - The ipa_node_params class is reused to avoid introducing a new class and a corresponding function_summary type, though it is trivial to add if deemed a better solution. The ipa_return_value_summary class may also be a good place for this information, however using it would involve moving it from ipa-prop.cc to ipa-prop.h. I believe that ipa_return_value_summary should become a member of ipa_node_params (perhaps we could think of a better name) and ipa_return_value_sum should be removed. All infrastructure there is to create ipa_return_value_summary should be re-used or extended to fulfill also the new needs. As per the reply at https://gcc.gnu.org/pipermail/gcc/2024- September/244856.html, we ended up splitting this out into a new structure. Having both tree and IPA passes accessing the same structure made memory management hard. - Additionally, ipa_edge_args is updated to track whether or not it is a callgraph edge originating from a return statement. This enables the propagation of information in the WPA phase. I don't understand. I have read your reply to Honza to his question about this and I still don't understand what you mean by this. I believe we want to have a return function for each call graph edge, certainly for each one that calls a non-void function. Yeah, I believe that could have been phrased better. It just means that we track whether or not an SSA_NAME created as a result of a gcall is used in the retval of a return statement, either directly or through a PHI node. This is achieved by tracking this information in ipa_edge_args (as there is one instance of this structure for each cgraph_edge), so each gcall will have an ipa_edge_args instance associated with it. The return jump function is stored per-node, not per-edge. The per-edge tracking is only done to enable propagation of the "returns_arg" attribute from cal
Re: [RFC] Return Value Propagation in IPA-CP
On 15/09/24 18:04, Jan Hubicka wrote: External email: Use caution opening links or attachments Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html). Hi, Hi, Thanks for the feedback! A few comments: * Proposed Solution * * Extending IPA-CP: 1. Add return jump function data structures - This involves updating ipa_node_params to contain information regarding the return statements of the function, namely the lattice and the jump function describing the return value, where both use existing data structures. - The ipa_node_params class is reused to avoid introducing a new class and a corresponding function_summary type, though it is trivial to add if deemed a better solution. The ipa_return_value_summary class may also be a good place for this information, however using it would involve moving it from ipa-prop.cc to ipa-prop.h. Yes, I think the jump functions for return values are not realy different from jump functions for call parameters, so indeed we want to reuse same datatstructure. We want to track all the usual stuff we do (value ranges, constant values, known fields in structures...) In longer run we will want to extend jump functions to represent properties like "value returned by call X is passed as argument n to call Y". But for the first version we probably do not need to worry about this. - Additionally, ipa_edge_args is updated to track whether or not it is a callgraph edge originating from a return statement. This enables the propagation of information in the WPA phase. What you exactly mean by callgraph edge originating from return statement? As a first step, there are three main types of expressions in return statements we are analyzing: - Direct returns of formal parameters - PHI nodes (which can be nested/recursive) - Call expressions (this is the information we are propagating in ipa-cp) For the first version of our patch, we are using ipa-cp as a framework only for analysis and propagation, not for transformations. So, we are detecting candidates for attribution with "returns_arg", and propagating that information from callee to caller in ipa-cp in the hopes that we can annotate callers as well by using the information from the callee. This is why we need to detect when a callgraph edge "originates" from a return statement, so that we know which callees to propagate to callers from. We are interested in walking from callees to callers, so this allows us to effectively do that. Consider the following example: int foo (int x) { return x; }/* <- foo becomes returns_arg (1) */ int bar (int x) { return foo (x); } In this example, bar calls foo in a return statement so the edge from bar to foo becomes a "return edge". This allows us to propagate "returns_arg" information from foo to bar, so we are able to annotate bar as well. 2. LGEN - Set up return jump functions for each function similar to the parameter jump function. When it cannot be conclusively determined that one formal parameter is always returned (such as conditionally returning either of two), the jump function is marked as invalid. - This involves looking through phi nodes when return statements of conditional branches are merged into a single exit block. Note that returning a call expression does not count towards invalidating the information for that function. The analysis is essentially the same as for function parameters, just looking at the return statements, so I think most of code can be shared with what we have. 3. WPA - Implement return value information propagation. It is not possible to do this in the existing propagate_constants_topo function, because it iterates in reverse postorder i.e. from caller to callee. However return value propagation needs to be done from callee to caller, thus there is a need to iterate in a postorder fashion. - The lattice values are initialized using the jump functions computed in the LGEN phase. These values are then propagated over the callgraph. - The existing lattices are reused, with three possible values like usual. The possible values are: return_lattice -> { TOP, param_decl, BOTTOM } where TOP and BOTTOM are defined as usual. param_decl refers to the tree of either the parameter declaration or its type, whichever is available. The meet operator is defined as usual for TOP and BOTTOM. When both are param_decl, the following meet operation is defined: meet(x, y) = x if x == y, BOTTOM otherwise - Finally, nodes to which no information could be propagated are marked as BOTTOM and the BOTTOM value is propagated to a fixed-point. This is required when information about a caller has been inferred but not about one of its callees. For example: extern int foo(int); __attribute__((noinline)) int bar (int z