Early inlining and function references from static const struct (bug?)

2016-02-04 Thread Carlos Pita
Hi all,

I've been trying to understand some bizarre interaction between
optimizing passes I've observed while compiling a heavily nested
inlined numerical code of mine. I managed to reduce the issue down to
this simple code:

``` test.c

typedef struct F {
  int (*call)(int);
} F;

static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

void dosomething(int);

int h(int x) {
  dosomething(g(f, x));
  dosomething(g((F){sq}, x));
}

```

Here we have a driver function h calling the workhorse g, which
delegates a simple task to the inline-wannabe sq. The distinctive
aspect of this scheme is that sq is referenced through a struct
member. The first call to g passes a static const struct, while the
second call passes a compound literal (alternatively, a local version
of the struct has the same effect regarding what follows).
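
For reference, the "local version of the struct" mentioned above might look like the sketch below. The names h_local and local_f are illustrative and not part of the original test.c, g is shortened to two calls, and the result is returned instead of being passed to dosomething so the snippet can be checked on its own.

```c
/* Sketch of the local-struct variant; h_local and local_f are
   hypothetical names, not part of the original test.c. */
typedef struct F {
  int (*call)(int);
} F;

static int sq(int x) {
  return x * x;
}

/* Shortened workhorse: two indirect calls instead of eight. */
static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  return x;
}

int h_local(int x) {
  F local_f = {sq};      /* automatic storage, initialized on every call */
  return g(local_f, x);  /* per the message, behaves like the compound literal */
}
```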

Now, say I compile this code with:

gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c

The einline pass will not be able to inline calls to g with such a low
value for early-inlining-insns.

The inline_param2 pass still shows:

```
h (int x)
{
  struct F D.1847;
  int _4;
  int _8;

  :
  _4 = g (f, x_2(D));
  dosomething (_4);
  D.1847.call = sq;
  _8 = g (D.1847, x_2(D));
  dosomething (_8);
  return;

}

```

The next tree pass is fixup_cfg4, which shows sq inlined, but only
for the second call to g:

```
h (int x)
{
  

  :
  f = f;
  f$call_7 = MEM[(struct F *)&f];
  x_19 = f$call_7 (x_2(D));
  x_20 = f$call_7 (x_19);
  x_21 = f$call_7 (x_20);
  x_22 = f$call_7 (x_21);
  x_23 = f$call_7 (x_22);
  x_24 = f$call_7 (x_23);
  x_25 = f$call_7 (x_24);
  x_26 = f$call_7 (x_25);
  _43 = x_26;
  _4 = _43;
  dosomething (_4);
  D.1847.call = sq;
  f = D.1847;
  f$call_10 = MEM[(struct F *)&f];
  _33 = x_2(D) * x_2(D);
  _45 = _33;
  x_11 = _45;
  _32 = x_11 * x_11;
  _46 = _32;
  x_12 = _46;
  _31 = x_12 * x_12;
  _47 = _31;
  x_13 = _47;
  _30 = x_13 * x_13;
  _48 = _30;
  x_14 = _48;
  _29 = x_14 * x_14;
  _49 = _29;
  x_15 = _49;
  _28 = x_15 * x_15;
  _50 = _28;
  x_16 = _50;
  _27 = x_16 * x_16;
  _51 = _27;
  x_17 = _51;
  _3 = x_17 * x_17;
  _52 = _3;
  x_18 = _52;
  _53 = x_18;
  _8 = _53;
  dosomething (_8);
  return;

}
```

Now, say I recompile the code with a larger early-inlining-insns, so
that einline is able to early inline both calls to g:

gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c

After inline_param2 (that is, before fixup_cfg4), we now have:

```
h (int x)
{
  int x;
  int x;

  :
  x_13 = sq (x_2(D));
  x_14 = sq (x_13);
  x_15 = sq (x_14);
  x_16 = sq (x_15);
  x_17 = sq (x_16);
  x_18 = sq (x_17);
  x_19 = sq (x_18);
  x_20 = sq (x_19);
  dosomething (x_20);
  x_5 = sq (x_2(D));
  x_6 = sq (x_5);
  x_7 = sq (x_6);
  x_8 = sq (x_7);
  x_9 = sq (x_8);
  x_10 = sq (x_9);
  x_11 = sq (x_10);
  x_12 = sq (x_11);
  dosomething (x_12);
  return;

}
```

And fixup_cfg4 is able to do its job for both calls:

```
h (int x)
{
  

  :
  _36 = x_2(D) * x_2(D);
  _37 = _36;
  x_13 = _37;
  _35 = x_13 * x_13;
  _38 = _35;
  x_14 = _38;
  _34 = x_14 * x_14;
  _39 = _34;
  x_15 = _39;
  _33 = x_15 * x_15;
  _40 = _33;
  x_16 = _40;
  _32 = x_16 * x_16;
  _41 = _32;
  x_17 = _41;
  _31 = x_17 * x_17;
  _42 = _31;
  x_18 = _42;
  _30 = x_18 * x_18;
  _43 = _30;
  x_19 = _43;
  _29 = x_19 * x_19;
  _44 = _29;
  x_20 = _44;
  dosomething (x_20);
  _28 = x_2(D) * x_2(D);
  _45 = _28;
  x_5 = _45;
  _27 = x_5 * x_5;
  _46 = _27;
  x_6 = _46;
  _26 = x_6 * x_6;
  _47 = _26;
  x_7 = _47;
  _25 = x_7 * x_7;
  _48 = _25;
  x_8 = _48;
  _24 = x_8 * x_8;
  _49 = _24;
  x_9 = _49;
  _23 = x_9 * x_9;
  _50 = _23;
  x_10 = _50;
  _22 = x_10 * x_10;
  _51 = _22;
  x_11 = _51;
  _21 = x_11 * x_11;
  _52 = _21;
  x_12 = _52;
  dosomething (x_12);
  return;

}
```

The bottom line is that I get full inlining if einline manages to
early inline both calls to g, but incomplete inlining otherwise. I
guess the problem is that fixup_cfg4 cannot infer that f$call_7 is
just sq in disguise when f is the global static const struct, although
it can when f is a local or a compound literal. If einline expands the
code early, the subsequent passes let fixup_cfg4 see plain sq in both
cases, which makes inlining sq trivial. But if einline hits its hard
limits, fixup_cfg4 has to figure out by itself that f$call is sq.

I'm not sure whether this should be considered a proper bug or more of
a quirk of the inlining system one must learn to live with. In the
first case, I'll report it if you ask me to do it. In the second case,
I would like to ask for some advice about the best way to cope with
this scenario (besides blindly incrementing early-inlining-insns); I
can provide more background regarding my real use case if necessary.
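
One possible way to cope, not discussed in this thread and offered only as an assumption to test, is GCC's `always_inline` function attribute, which asks for g to be inlined independently of the usual size limits. Whether that is enough for the calls through the static const struct to resolve to sq would still need to be checked against the dumps. The sketch below shortens g to two calls and uses the hypothetical name h_forced.

```c
typedef struct F {
  int (*call)(int);
} F;

static int sq(int x) {
  return x * x;
}

/* always_inline is a GCC extension (also accepted by Clang); pairing it
   with "inline" avoids the "might not be inlinable" warning. */
static inline __attribute__((always_inline)) int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  return x;
}

static const F f = {sq};

/* h_forced is a hypothetical driver that returns the value rather than
   passing it to dosomething. */
int h_forced(int x) {
  return g(f, x);
}
```

This trades the global --param for a per-function annotation, so only the workhorse functions that matter are affected.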

Cheers
--
Carlos


Re: Early inlining and function references from static const struct (bug?)

2016-02-04 Thread Carlos Pita
PS: I framed the issue between the inline_param and fixup_cfg passes
because I was only looking at the tree passes, but the really relevant
passes are tree-einline (obviously) and ipa-inline, which happens
between tree-inline_param2 and tree-fixup_cfg4. So, restating the
problem: if early inlining does not happen, late inlining misses the
chance to inline the reference from the static const struct member.


Re: Early inlining and function references from static const struct (bug?)

2016-02-04 Thread Carlos Pita
PS 2 (last one, I swear): I've isolated what I think is the root of
the problem. When einline expands g, there are plenty of call sites for
f.call, so the full redundancy elimination (fre) pass replaces f.call
with sq, making things easy for the late ipa inliner. But when g is not
early inlined, there is only one call site for the global f.call and
another one for the local/literal f.call, so the fre pass just lets
them be. This is innocuous from the fre point of view, but it disables
further inlining as described above.
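
To make the described effect concrete, here is a source-level analogy of what the dumps suggest happens once g has been early inlined: the loads of f.call from the constant initializer are replaced by sq, turning indirect calls into direct ones. The names after_einline and after_fre are hypothetical, the call chain is shortened to two calls, and the code is hand-written, not compiler output.

```c
typedef struct F {
  int (*call)(int);
} F;

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

/* Roughly the shape of h once g has been inlined into it:
   every call still goes through the struct member. */
int after_einline(int x) {
  x = f.call(x);
  x = f.call(x);
  return x;
}

/* Roughly the shape once f.call is known to be sq:
   direct calls that a later inliner can expand. */
int after_fre(int x) {
  x = sq(x);
  x = sq(x);
  return x;
}
```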


Re: Early inlining and function references from static const struct (bug?)

2016-02-05 Thread Carlos Pita
Hi Richard,

I'm not quite following you. I understand the non-cp issue. But are
you saying it's somehow related to the non-inlining issue (in that
case I'm unable to see the relationship, because the compound literal
case always gets inlined)? Or maybe you consider there is no
non-inlining problem to address here, despite the different outcomes
for the static const and the compound literal cases, and you're just
pointing out a different, unrelated, issue.

Cheers
--
Carlos

On Fri, Feb 5, 2016 at 8:28 AM, Richard Biener
 wrote:

Re: Early inlining and function references from static const struct (bug?)

2016-02-05 Thread Carlos Pita
> I was saying that early inlining is not supposed to catch this case
> but IPA inlining.
> It shouldn't need to inline g early to end up inlining the calls to sq.
> IPA CP should clone g for the case of it calling sq and then inlining
> should just do its job.

Ok, I fully agree with that; forcing early inlining to trigger late
inlining is just a workaround. But there is still the fact that ipa
inlining does inline the compound literal case (despite ipa cp
failing there) while it is unable to inline the static const case,
which, as I understand it, you don't seem to consider a missed
opportunity for ipa cp. So say you fix the "aggregate D.1772"
scenario; I guess it won't change anything regarding ipa inlining.
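
The cloning described in the quoted text can be pictured at the source level roughly as follows. This is a hand-written analogy with hypothetical names (g_sq, h_generic, h_cloned) and a two-call g; GCC would create such a specialized copy internally rather than in the source.

```c
typedef struct F {
  int (*call)(int);
} F;

static int sq(int x) {
  return x * x;
}

/* Generic workhorse: the callee is only known through f.call. */
static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  return x;
}

/* Hand-written stand-in for a clone of g specialized for
   f.call == sq: the calls are now direct and easy to inline. */
static int g_sq(int x) {
  x = sq(x);
  x = sq(x);
  return x;
}

static const F f = {sq};

int h_generic(int x) { return g(f, x); }  /* original call shape */
int h_cloned(int x)  { return g_sq(x); }  /* call redirected to the clone */
```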

Cheers
--
Carlos


Re: Early inlining and function references from static const struct (bug?)

2016-02-06 Thread Carlos Pita
I've reported this at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69708.

Just to summarize:

1) If early inlining is forced then fre replaces the many references
to sq and ipa inlining is able to do its job.

2) If early inlining is disabled then ipa inlining only works for the
compound literal case. The cp pass (happening immediately before the
ipa inline one) results in:

```
h (int x)
{
  ...
  _4 = g (f, x_2(D));
  dosomething (_4);
  D.1847.call = sq;
  _8 = g (D.1847, x_2(D));
  dosomething (_8);
  
}
```

Nevertheless ipa inline seems clever enough to expand the second call to g.

3) The proper solution seems to be for cp to propagate sq to both
call sites, which would make things easy for ipa inline.