[RFC] Return Value Propagation in IPA-CP

2024-08-21 Thread Dhruv Chawla via Gcc

* Table Of Contents *

- Introduction
- Motivating Test Cases
- Proposed Solution
- Other Options
- Existing Solutions
- Primary Optimizations To Be Improved
- Front-End Attribute For Recording Return-Value Information
- Future Work
- References
- Questions

* Introduction *

Return jump functions offer a way of modeling return values of a function in
terms of its formal parameters, similar to how regular jump functions can be
used to model the actual parameters of a callee at a callsite in terms of the
formal parameters of the caller.

By discovering this information and propagating it across the program callgraph
in LTO mode, various transformations like value-range propagation, sibling-call
and chaining-call optimization can be augmented to take advantage of it and
potentially perform better optimizations.

Currently, ipa-cp models parameters using jump functions, propagates the values
of those jump functions by iterating to a fixed-point using lattices, and
clones functions by taking decisions from those lattice values.

We propose extending ipa-cp with the analysis and propagation of return jump
functions, starting with the basic case of a direct return of the form
"return x;" where "x" is any formal parameter of the function.

* Motivating Test Cases *

* Optimizing tail calls:

- PR92867

char buf[128] = { 1 };

char *foo (__SIZE_TYPE__ n)
{
  return __builtin_memset (buf, ' ', n);
}

char *bar (__SIZE_TYPE__ n)
{
  __builtin_memset (buf, ' ', n);
  return buf;
}

Here, the call to __builtin_memset in foo gets successfully converted to a
tail-call, however the call in bar cannot be optimized to a tail-call. This is
an optimization that LLVM is able to perform.

Link to compiler explorer: https://godbolt.org/z/E81axqWTb

- PR67797

#include 

void *my_func(void *s, size_t n)
{
memset(s, 0, n);
return s;
}

This is a similar case to the above example. LLVM is able to optimize the call
to memset to be a tail-call as well.

Link to compiler explorer: https://godbolt.org/z/YzjGc59f8

* Optimizing chaining calls:

#include 

void f (int x, int y)
{
  std::cout << x;
  std::cout << y;
}

This function can be optimized to the following:

#include 

void f (int x, int y)
{
  std::cout << x << y;
}

LLVM is able to perform this optimization as well:
https://godbolt.org/z/MW75on1o5

* Proposed Solution *

* Extending IPA-CP:

1. Add return jump function data structures
  - This involves updating ipa_node_params to contain information regarding the
return statements of the function, namely the lattice and the jump function
describing the return value, where both use existing data structures.
  - The ipa_node_params class is reused to avoid introducing a new class and a
corresponding function_summary type, though it is trivial to add if deemed
a better solution. The ipa_return_value_summary class may also be a good
place for this information, however using it would involve moving it from
ipa-prop.cc to ipa-prop.h.
  - Additionally, ipa_edge_args is updated to track whether or not it is a
callgraph edge originating from a return statement. This enables the
propagation of information in the WPA phase.

2. LGEN
  - Set up return jump functions for each function similar to the parameter
jump function. When it cannot be conclusively determined that one formal
parameter is always returned (such as conditionally returning either of
two), the jump function is marked as invalid.
  - This involves looking through phi nodes when return statements of
conditional branches are merged into a single exit block. Note that
returning a call expression does not count towards invalidating the
information for that function.

3. WPA
  - Implement return value information propagation. It is not possible to do
this in the existing propagate_constants_topo function, because it iterates
in reverse postorder i.e. from caller to callee. However return value
propagation needs to be done from callee to caller, thus there is a need to
iterate in a postorder fashion.
  - The lattice values are initialized using the jump functions computed in the
LGEN phase. These values are then propagated over the callgraph.
  - The existing lattices are reused, with three possible values like usual.
The possible values are:

  return_lattice -> { TOP, param_decl, BOTTOM }

where TOP and BOTTOM are defined as usual. param_decl refers to the tree of
either the parameter declaration or its type, whichever is available. The
meet operator is defined as usual for TOP and BOTTOM. When both are
param_decl, the following meet operation is defined:

  meet(x, y) = x if x == y, BOTTOM otherwise

  - Finally, nodes to which no information could be propagated are marked as
BOTTOM and the BOTTOM value is propagated to a fixed-point. This is required
when information about a caller has been inferred but not about one of its
callees. For example:

  ex

Re: [RFC] Return Value Propagation in IPA-CP

2024-08-28 Thread Dhruv Chawla via Gcc

Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html).

On 22/08/24 11:32, Dhruv Chawla via Gcc wrote:

External email: Use caution opening links or attachments


* Table Of Contents *

- Introduction
- Motivating Test Cases
- Proposed Solution
- Other Options
- Existing Solutions
- Primary Optimizations To Be Improved
- Front-End Attribute For Recording Return-Value Information
- Future Work
- References
- Questions

* Introduction *

Return jump functions offer a way of modeling return values of a function in
terms of its formal parameters, similar to how regular jump functions can be
used to model the actual parameters of a callee at a callsite in terms of the
formal parameters of the caller.

By discovering this information and propagating it across the program callgraph
in LTO mode, various transformations like value-range propagation, sibling-call
and chaining-call optimization can be augmented to take advantage of it and
potentially perform better optimizations.

Currently, ipa-cp models parameters using jump functions, propagates the values
of those jump functions by iterating to a fixed-point using lattices, and
clones functions by taking decisions from those lattice values.

We propose extending ipa-cp with the analysis and propagation of return jump
functions, starting with the basic case of a direct return of the form
"return x;" where "x" is any formal parameter of the function.

* Motivating Test Cases *

* Optimizing tail calls:

- PR92867

char buf[128] = { 1 };

char *foo (__SIZE_TYPE__ n)
{
   return __builtin_memset (buf, ' ', n);
}

char *bar (__SIZE_TYPE__ n)
{
   __builtin_memset (buf, ' ', n);
   return buf;
}

Here, the call to __builtin_memset in foo gets successfully converted to a
tail-call, however the call in bar cannot be optimized to a tail-call. This is
an optimization that LLVM is able to perform.

Link to compiler explorer: https://godbolt.org/z/E81axqWTb

- PR67797

#include 

void *my_func(void *s, size_t n)
{
     memset(s, 0, n);
     return s;
}

This is a similar case to the above example. LLVM is able to optimize the call
to memset to be a tail-call as well.

Link to compiler explorer: https://godbolt.org/z/YzjGc59f8

* Optimizing chaining calls:

#include 

void f (int x, int y)
{
   std::cout << x;
   std::cout << y;
}

This function can be optimized to the following:

#include 

void f (int x, int y)
{
   std::cout << x << y;
}

LLVM is able to perform this optimization as well:
https://godbolt.org/z/MW75on1o5

* Proposed Solution *

* Extending IPA-CP:

1. Add return jump function data structures
   - This involves updating ipa_node_params to contain information regarding the
     return statements of the function, namely the lattice and the jump function
     describing the return value, where both use existing data structures.
   - The ipa_node_params class is reused to avoid introducing a new class and a
     corresponding function_summary type, though it is trivial to add if deemed
     a better solution. The ipa_return_value_summary class may also be a good
     place for this information, however using it would involve moving it from
     ipa-prop.cc to ipa-prop.h.
   - Additionally, ipa_edge_args is updated to track whether or not it is a
     callgraph edge originating from a return statement. This enables the
     propagation of information in the WPA phase.

2. LGEN
   - Set up return jump functions for each function similar to the parameter
     jump function. When it cannot be conclusively determined that one formal
     parameter is always returned (such as conditionally returning either of
     two), the jump function is marked as invalid.
   - This involves looking through phi nodes when return statements of
     conditional branches are merged into a single exit block. Note that
     returning a call expression does not count towards invalidating the
     information for that function.

3. WPA
   - Implement return value information propagation. It is not possible to do
     this in the existing propagate_constants_topo function, because it iterates
     in reverse postorder i.e. from caller to callee. However return value
     propagation needs to be done from callee to caller, thus there is a need to
     iterate in a postorder fashion.
   - The lattice values are initialized using the jump functions computed in the
     LGEN phase. These values are then propagated over the callgraph.
   - The existing lattices are reused, with three possible values like usual.
     The possible values are:

   return_lattice -> { TOP, param_decl, BOTTOM }

     where TOP and BOTTOM are defined as usual. param_decl refers to the tree of
     either the parameter declaration or its type, whichever is available. The
     meet operator is defined as usual for TOP and BOTTOM. When both are
     param_decl, the following meet operation is defined:

   meet(x, y) = x if x == y, 

Re: [RFC] Return Value Propagation in IPA-CP

2024-09-09 Thread Dhruv Chawla via Gcc

Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html).

On 22/08/24 11:32, Dhruv Chawla via Gcc wrote:

External email: Use caution opening links or attachments


* Table Of Contents *

- Introduction
- Motivating Test Cases
- Proposed Solution
- Other Options
- Existing Solutions
- Primary Optimizations To Be Improved
- Front-End Attribute For Recording Return-Value Information
- Future Work
- References
- Questions

* Introduction *

Return jump functions offer a way of modeling return values of a function in
terms of its formal parameters, similar to how regular jump functions can be
used to model the actual parameters of a callee at a callsite in terms of the
formal parameters of the caller.

By discovering this information and propagating it across the program callgraph
in LTO mode, various transformations like value-range propagation, sibling-call
and chaining-call optimization can be augmented to take advantage of it and
potentially perform better optimizations.

Currently, ipa-cp models parameters using jump functions, propagates the values
of those jump functions by iterating to a fixed-point using lattices, and
clones functions by taking decisions from those lattice values.

We propose extending ipa-cp with the analysis and propagation of return jump
functions, starting with the basic case of a direct return of the form
"return x;" where "x" is any formal parameter of the function.

* Motivating Test Cases *

* Optimizing tail calls:

- PR92867

char buf[128] = { 1 };

char *foo (__SIZE_TYPE__ n)
{
   return __builtin_memset (buf, ' ', n);
}

char *bar (__SIZE_TYPE__ n)
{
   __builtin_memset (buf, ' ', n);
   return buf;
}

Here, the call to __builtin_memset in foo gets successfully converted to a
tail-call, however the call in bar cannot be optimized to a tail-call. This is
an optimization that LLVM is able to perform.

Link to compiler explorer: https://godbolt.org/z/E81axqWTb

- PR67797

#include 

void *my_func(void *s, size_t n)
{
     memset(s, 0, n);
     return s;
}

This is a similar case to the above example. LLVM is able to optimize the call
to memset to be a tail-call as well.

Link to compiler explorer: https://godbolt.org/z/YzjGc59f8

* Optimizing chaining calls:

#include 

void f (int x, int y)
{
   std::cout << x;
   std::cout << y;
}

This function can be optimized to the following:

#include 

void f (int x, int y)
{
   std::cout << x << y;
}

LLVM is able to perform this optimization as well:
https://godbolt.org/z/MW75on1o5

* Proposed Solution *

* Extending IPA-CP:

1. Add return jump function data structures
   - This involves updating ipa_node_params to contain information regarding the
     return statements of the function, namely the lattice and the jump function
     describing the return value, where both use existing data structures.
   - The ipa_node_params class is reused to avoid introducing a new class and a
     corresponding function_summary type, though it is trivial to add if deemed
     a better solution. The ipa_return_value_summary class may also be a good
     place for this information, however using it would involve moving it from
     ipa-prop.cc to ipa-prop.h.
   - Additionally, ipa_edge_args is updated to track whether or not it is a
     callgraph edge originating from a return statement. This enables the
     propagation of information in the WPA phase.

2. LGEN
   - Set up return jump functions for each function similar to the parameter
     jump function. When it cannot be conclusively determined that one formal
     parameter is always returned (such as conditionally returning either of
     two), the jump function is marked as invalid.
   - This involves looking through phi nodes when return statements of
     conditional branches are merged into a single exit block. Note that
     returning a call expression does not count towards invalidating the
     information for that function.

3. WPA
   - Implement return value information propagation. It is not possible to do
     this in the existing propagate_constants_topo function, because it iterates
     in reverse postorder i.e. from caller to callee. However return value
     propagation needs to be done from callee to caller, thus there is a need to
     iterate in a postorder fashion.
   - The lattice values are initialized using the jump functions computed in the
     LGEN phase. These values are then propagated over the callgraph.
   - The existing lattices are reused, with three possible values like usual.
     The possible values are:

   return_lattice -> { TOP, param_decl, BOTTOM }

     where TOP and BOTTOM are defined as usual. param_decl refers to the tree of
     either the parameter declaration or its type, whichever is available. The
     meet operator is defined as usual for TOP and BOTTOM. When both are
     param_decl, the following meet operation is defined:

   meet(x, y) = x if x == y, 

Re: [RFC] Return Value Propagation in IPA-CP

2024-10-13 Thread Dhruv Chawla via Gcc

Ping!

On 25/09/24 14:44, Dhruv Chawla via Gcc wrote:

External email: Use caution opening links or attachments


On 15/09/24 18:04, Jan Hubicka wrote:

External email: Use caution opening links or attachments



Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html).

Hi,


Hi,

Thanks for the feedback! A few comments:


* Proposed Solution *

* Extending IPA-CP:

1. Add return jump function data structures
    - This involves updating ipa_node_params to contain information 
regarding the
  return statements of the function, namely the lattice and the jump 
function

  describing the return value, where both use existing data structures.
    - The ipa_node_params class is reused to avoid introducing a new class 
and a
  corresponding function_summary type, though it is trivial to add if 
deemed

  a better solution. The ipa_return_value_summary class may also be a good
  place for this information, however using it would involve moving it from
  ipa-prop.cc to ipa-prop.h.


Yes, I think the jump functions for return values are not realy
different from jump functions for call parameters, so indeed we want to
reuse same datatstructure.
We want to track all the usual stuff we do (value ranges, constant
values, known fields in structures...)

In longer run we will want to extend jump functions to represent
properties like "value returned by call X is passed as argument n to
call Y". But for the first version we probably do not need to worry
about this.

    - Additionally, ipa_edge_args is updated to track whether or not it is a
  callgraph edge originating from a return statement. This enables the
  propagation of information in the WPA phase.


What you exactly mean by callgraph edge originating from return
statement?


As a first step, there are three main types of expressions in return statements
we are analyzing:
  - Direct returns of formal parameters
  - PHI nodes (which can be nested/recursive)
  - Call expressions (this is the information we are propagating in ipa-cp)

For the first version of our patch, we are using ipa-cp as a framework only for
analysis and propagation, not for transformations. So, we are detecting
candidates for attribution with "returns_arg", and propagating that information
from callee to caller in ipa-cp in the hopes that we can annotate callers as
well by using the information from the callee.

This is why we need to detect when a callgraph edge "originates" from a return
statement, so that we know which callees to propagate to callers from. We are
interested in walking from callees to callers, so this allows us to effectively
do that.

Consider the following example:

int foo (int x) { return x; }    /* <- foo becomes returns_arg (1) */
int bar (int x) { return foo (x); }

In this example, bar calls foo in a return statement so the edge from bar to foo
becomes a "return edge". This allows us to propagate "returns_arg" information
from foo to bar, so we are able to annotate bar as well.



2. LGEN
    - Set up return jump functions for each function similar to the parameter
  jump function. When it cannot be conclusively determined that one formal
  parameter is always returned (such as conditionally returning either of
  two), the jump function is marked as invalid.
    - This involves looking through phi nodes when return statements of
  conditional branches are merged into a single exit block. Note that
  returning a call expression does not count towards invalidating the
  information for that function.


The analysis is essentially the same as for function parameters, just
looking at the return statements, so I think most of code can be shared
with what we have.


3. WPA
    - Implement return value information propagation. It is not possible to do
  this in the existing propagate_constants_topo function, because it 
iterates

  in reverse postorder i.e. from caller to callee. However return value
  propagation needs to be done from callee to caller, thus there is a 
need to

  iterate in a postorder fashion.
    - The lattice values are initialized using the jump functions computed 
in the

  LGEN phase. These values are then propagated over the callgraph.
    - The existing lattices are reused, with three possible values like usual.
  The possible values are:

    return_lattice -> { TOP, param_decl, BOTTOM }

  where TOP and BOTTOM are defined as usual. param_decl refers to the 
tree of

  either the parameter declaration or its type, whichever is available. The
  meet operator is defined as usual for TOP and BOTTOM. When both are
  param_decl, the following meet operation is defined:

    meet(x, y) = x if x == y, BOTTOM otherwise

    - Finally, nodes to which no information could be propagated are marked as
  BOTTOM and the BOTTOM value is propagated to a fixed-point. This is 
required

Re: [RFC] Return Value Propagation in IPA-CP

2024-10-17 Thread Dhruv Chawla via Gcc

Hi,

Thanks for the reply!

On 16/10/24 20:25, Martin Jambor wrote:

External email: Use caution opening links or attachments


Hello,

first and foremost, sorry for a late reply.  I needed to take a larger
leave of absence for family reasons.

Comments inline:

On Thu, Aug 22 2024, Dhruv Chawla via Gcc wrote:

* Table Of Contents *

- Introduction
- Motivating Test Cases
- Proposed Solution
- Other Options
- Existing Solutions
- Primary Optimizations To Be Improved
- Front-End Attribute For Recording Return-Value Information
- Future Work
- References
- Questions

* Introduction *

Return jump functions offer a way of modeling return values of a function in
terms of its formal parameters, similar to how regular jump functions can be
used to model the actual parameters of a callee at a callsite in terms of the
formal parameters of the caller.

By discovering this information and propagating it across the program callgraph
in LTO mode, various transformations like value-range propagation, sibling-call
and chaining-call optimization can be augmented to take advantage of it and
potentially perform better optimizations.

Currently, ipa-cp models parameters using jump functions, propagates the values
of those jump functions by iterating to a fixed-point using lattices, and
clones functions by taking decisions from those lattice values.

We propose extending ipa-cp with the analysis and propagation of return jump
functions, starting with the basic case of a direct return of the form
"return x;" where "x" is any formal parameter of the function.

* Motivating Test Cases *

* Optimizing tail calls:

- PR92867

char buf[128] = { 1 };

char *foo (__SIZE_TYPE__ n)
{
return __builtin_memset (buf, ' ', n);
}

char *bar (__SIZE_TYPE__ n)
{
__builtin_memset (buf, ' ', n);
return buf;
}

Here, the call to __builtin_memset in foo gets successfully converted to a
tail-call, however the call in bar cannot be optimized to a tail-call. This is
an optimization that LLVM is able to perform.

Link to compiler explorer: https://godbolt.org/z/E81axqWTb

- PR67797

#include 

void *my_func(void *s, size_t n)
{
  memset(s, 0, n);
  return s;
}

This is a similar case to the above example. LLVM is able to optimize the call
to memset to be a tail-call as well.

Link to compiler explorer: https://godbolt.org/z/YzjGc59f8

* Optimizing chaining calls:

#include 

void f (int x, int y)
{
std::cout << x;
std::cout << y;
}

This function can be optimized to the following:

#include 

void f (int x, int y)
{
std::cout << x << y;
}

LLVM is able to perform this optimization as well:
https://godbolt.org/z/MW75on1o5

* Proposed Solution *

* Extending IPA-CP:

1. Add return jump function data structures
- This involves updating ipa_node_params to contain information regarding 
the
  return statements of the function, namely the lattice and the jump 
function
  describing the return value, where both use existing data structures.
- The ipa_node_params class is reused to avoid introducing a new class and a
  corresponding function_summary type, though it is trivial to add if deemed
  a better solution. The ipa_return_value_summary class may also be a good
  place for this information, however using it would involve moving it from
  ipa-prop.cc to ipa-prop.h.


I believe that ipa_return_value_summary should become a member of
ipa_node_params (perhaps we could think of a better name) and
ipa_return_value_sum should be removed.  All infrastructure there is to
create ipa_return_value_summary should be re-used or extended to fulfill
also the new needs.


As per the reply at https://gcc.gnu.org/pipermail/gcc/2024-
September/244856.html, we ended up splitting this out into a new structure.
Having both tree and IPA passes accessing the same structure made memory
management hard.





- Additionally, ipa_edge_args is updated to track whether or not it is a
  callgraph edge originating from a return statement. This enables the
  propagation of information in the WPA phase.


I don't understand.  I have read your reply to Honza to his question
about this and I still don't understand what you mean by this.

I believe we want to have a return function for each call graph edge,
certainly for each one that calls a non-void function.


Yeah, I believe that could have been phrased better. It just means that we track
whether or not an SSA_NAME created as a result of a gcall is used in the retval
of a return statement, either directly or through a PHI node. This is achieved
by tracking this information in ipa_edge_args (as there is one instance of this
structure for each cgraph_edge), so each gcall will have an ipa_edge_args
instance associated with it.

The return jump function is stored per-node, not per-edge. The per-edge tracking
is only done to enable propagation of the "returns_arg" attribute from cal

Re: [RFC] Return Value Propagation in IPA-CP

2024-09-25 Thread Dhruv Chawla via Gcc

On 15/09/24 18:04, Jan Hubicka wrote:

External email: Use caution opening links or attachments



Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html).

Hi,


Hi,

Thanks for the feedback! A few comments:


* Proposed Solution *

* Extending IPA-CP:

1. Add return jump function data structures
- This involves updating ipa_node_params to contain information regarding 
the
  return statements of the function, namely the lattice and the jump 
function
  describing the return value, where both use existing data structures.
- The ipa_node_params class is reused to avoid introducing a new class and a
  corresponding function_summary type, though it is trivial to add if deemed
  a better solution. The ipa_return_value_summary class may also be a good
  place for this information, however using it would involve moving it from
  ipa-prop.cc to ipa-prop.h.


Yes, I think the jump functions for return values are not realy
different from jump functions for call parameters, so indeed we want to
reuse same datatstructure.
We want to track all the usual stuff we do (value ranges, constant
values, known fields in structures...)

In longer run we will want to extend jump functions to represent
properties like "value returned by call X is passed as argument n to
call Y". But for the first version we probably do not need to worry
about this.

- Additionally, ipa_edge_args is updated to track whether or not it is a
  callgraph edge originating from a return statement. This enables the
  propagation of information in the WPA phase.


What you exactly mean by callgraph edge originating from return
statement?


As a first step, there are three main types of expressions in return statements
we are analyzing:
 - Direct returns of formal parameters
 - PHI nodes (which can be nested/recursive)
 - Call expressions (this is the information we are propagating in ipa-cp)

For the first version of our patch, we are using ipa-cp as a framework only for
analysis and propagation, not for transformations. So, we are detecting
candidates for attribution with "returns_arg", and propagating that information
from callee to caller in ipa-cp in the hopes that we can annotate callers as
well by using the information from the callee.

This is why we need to detect when a callgraph edge "originates" from a return
statement, so that we know which callees to propagate to callers from. We are
interested in walking from callees to callers, so this allows us to effectively
do that.

Consider the following example:

int foo (int x) { return x; }/* <- foo becomes returns_arg (1) */
int bar (int x) { return foo (x); }

In this example, bar calls foo in a return statement so the edge from bar to foo
becomes a "return edge". This allows us to propagate "returns_arg" information
from foo to bar, so we are able to annotate bar as well.



2. LGEN
- Set up return jump functions for each function similar to the parameter
  jump function. When it cannot be conclusively determined that one formal
  parameter is always returned (such as conditionally returning either of
  two), the jump function is marked as invalid.
- This involves looking through phi nodes when return statements of
  conditional branches are merged into a single exit block. Note that
  returning a call expression does not count towards invalidating the
  information for that function.


The analysis is essentially the same as for function parameters, just
looking at the return statements, so I think most of code can be shared
with what we have.


3. WPA
- Implement return value information propagation. It is not possible to do
  this in the existing propagate_constants_topo function, because it 
iterates
  in reverse postorder i.e. from caller to callee. However return value
  propagation needs to be done from callee to caller, thus there is a need 
to
  iterate in a postorder fashion.
- The lattice values are initialized using the jump functions computed in 
the
  LGEN phase. These values are then propagated over the callgraph.
- The existing lattices are reused, with three possible values like usual.
  The possible values are:

return_lattice -> { TOP, param_decl, BOTTOM }

  where TOP and BOTTOM are defined as usual. param_decl refers to the tree 
of
  either the parameter declaration or its type, whichever is available. The
  meet operator is defined as usual for TOP and BOTTOM. When both are
  param_decl, the following meet operation is defined:

meet(x, y) = x if x == y, BOTTOM otherwise

- Finally, nodes to which no information could be propagated are marked as
  BOTTOM and the BOTTOM value is propagated to a fixed-point. This is 
required
  when information about a caller has been inferred but not about one of its
  callees. For example:

extern int foo(int);

__attribute__((noinline))
int bar (int z