Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html).

On 22/08/24 11:32, Dhruv Chawla via Gcc wrote:
External email: Use caution opening links or attachments


* Table Of Contents *

- Introduction
- Motivating Test Cases
- Proposed Solution
- Other Options
- Existing Solutions
- Primary Optimizations To Be Improved
- Front-End Attribute For Recording Return-Value Information
- Future Work
- References
- Questions

* Introduction *

Return jump functions offer a way of modeling return values of a function in
terms of its formal parameters, similar to how regular jump functions can be
used to model the actual parameters of a callee at a callsite in terms of the
formal parameters of the caller.

By discovering this information and propagating it across the program callgraph
in LTO mode, various transformations like value-range propagation, sibling-call
and chaining-call optimization can be augmented to take advantage of it and
potentially perform better optimizations.

Currently, ipa-cp models parameters using jump functions, propagates the values
of those jump functions by iterating to a fixed-point using lattices, and
clones functions by taking decisions from those lattice values.

We propose extending ipa-cp with the analysis and propagation of return jump
functions, starting with the basic case of a direct return of the form
"return x;" where "x" is any formal parameter of the function.

* Motivating Test Cases *

* Optimizing tail calls:

- PR92867

char buf[128] = { 1 };

char *foo (__SIZE_TYPE__ n)
{
   return __builtin_memset (buf, ' ', n);
}

char *bar (__SIZE_TYPE__ n)
{
   __builtin_memset (buf, ' ', n);
   return buf;
}

Here, the call to __builtin_memset in foo gets successfully converted to a
tail-call, however the call in bar cannot be optimized to a tail-call. This is
an optimization that LLVM is able to perform.

Link to compiler explorer: https://godbolt.org/z/E81axqWTb

- PR67797

#include <string.h>

void *my_func(void *s, size_t n)
{
     memset(s, 0, n);
     return s;
}

This is a similar case to the above example. LLVM is able to optimize the call
to memset to be a tail-call as well.

Link to compiler explorer: https://godbolt.org/z/YzjGc59f8

* Optimizing chaining calls:

#include <iostream>

void f (int x, int y)
{
   std::cout << x;
   std::cout << y;
}

This function can be optimized to the following:

#include <iostream>

void f (int x, int y)
{
   std::cout << x << y;
}

LLVM is able to perform this optimization as well:
https://godbolt.org/z/MW75on1o5

* Proposed Solution *

* Extending IPA-CP:

1. Add return jump function data structures
   - This involves updating ipa_node_params to contain information regarding the
     return statements of the function, namely the lattice and the jump function
     describing the return value, where both use existing data structures.
   - The ipa_node_params class is reused to avoid introducing a new class and a
     corresponding function_summary type, though it is trivial to add if deemed
     a better solution. The ipa_return_value_summary class may also be a good
     place for this information, however using it would involve moving it from
     ipa-prop.cc to ipa-prop.h.
   - Additionally, ipa_edge_args is updated to track whether or not it is a
     callgraph edge originating from a return statement. This enables the
     propagation of information in the WPA phase.

2. LGEN
   - Set up return jump functions for each function similar to the parameter
     jump function. When it cannot be conclusively determined that one formal
     parameter is always returned (such as conditionally returning either of
     two), the jump function is marked as invalid.
   - This involves looking through phi nodes when return statements of
     conditional branches are merged into a single exit block. Note that
     returning a call expression does not count towards invalidating the
     information for that function.

3. WPA
   - Implement return value information propagation. It is not possible to do
     this in the existing propagate_constants_topo function, because it iterates
     in reverse postorder i.e. from caller to callee. However return value
     propagation needs to be done from callee to caller, thus there is a need to
     iterate in a postorder fashion.
   - The lattice values are initialized using the jump functions computed in the
     LGEN phase. These values are then propagated over the callgraph.
   - The existing lattices are reused, with three possible values like usual.
     The possible values are:

       return_lattice -> { TOP, param_decl, BOTTOM }

     where TOP and BOTTOM are defined as usual. param_decl refers to the tree of
     either the parameter declaration or its type, whichever is available. The
     meet operator is defined as usual for TOP and BOTTOM. When both are
     param_decl, the following meet operation is defined:

       meet(x, y) = x if x == y, BOTTOM otherwise

   - Finally, nodes to which no information could be propagated are marked as
     BOTTOM and the BOTTOM value is propagated to a fixed-point. This is 
required
     when information about a caller has been inferred but not about one of its
     callees. For example:

   extern int foo(int);

   __attribute__((noinline))
   int bar (int z)
   {
     return foo (z);
   }

   __attribute__((noinline))
   int baz (int z)
   {
     return z;
   }

   int quux (int z)
   {
     if (z & 1)
       return bar (z);
     else
       return baz (z);
   }

   In this case, after propagation, the lattice values look like the following:
   - bar/ret  ->  TOP
   - baz/ret  ->  baz/0  (first parameter being returned)
   - quux/ret ->  quux/0 (lattice partially inferred from baz)

   Here, we are not able to infer any information about bar because it calls a
   function foo that is not visible to the compiler. If we only set bar to
   BOTTOM and do not propagate this BOTTOM value to quux, we will incorrectly
   infer that quux always returns its 1st parameter.

4. LTRANS
   - The computed information is used to set a field denoting the returned
     formal parameter in the ipcp_transformation class. This is streamed out to
     bitcode at the end of WPA and read back in the LTRANS phase.
   - This information is currently used to annotate functions with the "fn spec"
     internal annotation in a prototype patch. This is only a stop-gap measure
     given the limitations that exist on "fn spec" (detailed in the next
     section).

Consider the following example:

__attribute__((noinline))
static int f3 (int x, int y)
{
   use (y);
   return x;
}

int f2 (int x, int y)
{
   return f3 (x, y);
}

After LGEN, the jump functions will look like the following:
- f3/ret -> f3/0    (f3 returns the 0th parameter, i.e. x)
- f2/ret -> unknown (but not invalid, because it returns a call)

Therefore, the lattices are initialized at the start of WPA in the following
manner:
- f3/lat -> 0
- f2/lat -> TOP

Simple propagation gives f2/lat the final value of 0. After LTRANS, with the
current patch, the code would end up like the following:

__attribute__((fn spec ("1 "), noinline))
static int f3 (int x, int y)
{
   use (y);
   return x;
}

__attribute__((fn spec ("1 ")))
int f2 (int x, int y)
{
   return f3 (x, y);
}

* Other Options *

* ipa-pure-const pass:

The "ipa-pure-const" pass can also be used to implement this analysis by using
the same local analysis and storing inferred information in the function
declaration using the "fn spec" annotation. However, this is quite restrictive
because it does not allow handling cases such as "return x + 1;", where the
formal parameter has an additional binary operation with a constant operand.

The "fn spec" annotation is also quite limited as it only allows tracking
returns of up to the 4th formal parameter of the function.

* Existing Solutions *

Patch 53ba8d669550d3a1f809048428b97ca607f95cf5 introduces a new class,
ipa_return_value_summary, to store range information of the return value and
extends the tree-vrp pass to store inferred information in this class. This is
then used in fold_using_range. However, this only works within a single
translation unit.

We propose extending this solution by integrating it into the ipa-cp
infrastructure and propagating this information across the whole-program
callgraph.

* Primary Optimizations To Be Improved *

1. Value Range Propagation
2. Sibling Call Optimization
3. Chaining Call Optimization

* Front-End Attribute For Recording Return-Value Information *

A new front-end attribute, "returns_arg", is proposed to explicitly annotate
functions that directly return one of their formal parameters. This is useful
when only function declarations are visible, as is the case when function
definitions are present in a separate shared object file.

There is a patch previously implementing returns_arg, which can be used as a
base for implementing this:

https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg01199.html

* Future Work *

1. Handling polynomial cases, the most basic being a return statement of the
    form "x + c", where "x" is a formal parameter and "c" is a constant value.
    This is already representable in ipa_jump_func.
2. Handling pass-by-reference cases, where a formal parameter of the function
    is used as an out-value.

* References *

- Interprocedural Constant Propagation: A Study of Jump Function
   Implementations: https://dl.acm.org/doi/pdf/10.1145/155090.155099
- Interprocedural Constant Propagation:
https://courses.cs.washington.edu/courses/cse501/06wi/reading/callahan-pldi86.pdf

* Questions *

- Does this sound reasonable?
- Would it be better to reuse ipa_node_params, ipa_return_value_summary, or
   introduce a new class entirely?
- What are other optimizations that could be extended with this information?
- Could the return value range information be improved in any other way?
- Should build_int_cst be used to create the value stored in the lattice,
   instead of the tree corresponding to the parameter?
- Should a new class be created to correspond to return jump functions,
   something like ipa_return_jump_func?
- Should a constant jump function (instead of a pass-through one) be used to
   represent cases where the returned parameter is known?

--
Regards,
Dhruv


--
Regards,
Dhruv

Reply via email to