On Fri, Feb 5, 2016 at 3:12 PM, Carlos Pita <carlosjosep...@gmail.com> wrote: > Hi Richard, > > I'm not quite following you. I understand the non-cp issue. But are > you saying it's somehow related to the non-inlining issue (in that > case I'm unable to see the relationship, because the compound literal > case always gets inlined)? Or maybe you consider there is no > non-inlining problem to address here, despite the different outcomes > for the static const and the compound literal cases, and you're just > pointing out a different, unrelated, issue.
I was saying that early inlining is not supposed to catch this case but IPA inlining. it shouldn't need to inline g early to end up inlining the calls to sq. IPA CP should clone g for the case of it calling sq and then inlining should just do its job. Richard. > Cheers > -- > Carlos > > On Fri, Feb 5, 2016 at 8:28 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Thu, Feb 4, 2016 at 9:10 PM, Carlos Pita <carlosjosep...@gmail.com> wrote: >>> PS 2 (last one, I swear): I've isolated what I think is the root of >>> the problem. When einline expands g, there is plenty of call sites for >>> f.call, so the full redundancy elimination pass replaces sum for >>> f.call, making things easy for the late ipa inliner. But when g is not >>> early inlined, there is only one call site for the global f.call and >>> another one for the local/literal f.call, so the fre pass just lets >>> them be. This is innocuous from the fre point of view, but disables >>> further inlining as described above. >>> >>> On Thu, Feb 4, 2016 at 3:05 PM, Carlos Pita <carlosjosep...@gmail.com> >>> wrote: >>>> PS: I framed the issue between the inline_param and fixup_cfg passes >>>> because I was only looking at the tree passes, but the really relevant >>>> passes are tree-einline (obviously) and ipa-inline, which happens >>>> between tree-inline_param2 and tree-fixup_cfg2. So, restating the >>>> problem: if early inline is not happening, late inline will miss the >>>> chance to inline the reference from the static const struct member. >>>> >>>> On Thu, Feb 4, 2016 at 1:08 PM, Carlos Pita <carlosjosep...@gmail.com> >>>> wrote: >>>>> Hi all, >>>>> >>>>> I've been trying to understand some bizarre interaction between >>>>> optimizing passes I've observed while compiling a heavily nested >>>>> inlined numerical code of mine. I managed to reduce the issue down to >>>>> this simple code: >>>>> >>>>> ``` test.c >>>>> >>>>> typedef struct F { >>>>> int (*call)(int); >>>>> } F; >>>>> >>>>> static int g(F f, int x) { >>>>> x = f.call(x); >>>>> x = f.call(x); >>>>> x = f.call(x); >>>>> x = f.call(x); >>>>> x = f.call(x); >>>>> x = f.call(x); >>>>> x = f.call(x); >>>>> x = f.call(x); >>>>> return x; >>>>> } >>>>> >>>>> static int sq(int x) { >>>>> return x * x; >>>>> } >>>>> >>>>> static const F f = {sq}; >>>>> >>>>> void dosomething(int); >>>>> >>>>> int h(int x) { >>>>> dosomething(g(f, x)); >>>>> dosomething(g((F){sq}, x)); >>>>> } >>>>> >>>>> ``` >>>>> >>>>> Here we have a driver function h calling the workhorse g which >>>>> delegates some simple task to the inline-wannabe f. The distinctive >>>>> aspect of the above scheme is that f is referenced from a struct >>>>> member. The first call to g passes a static const struct while the >>>>> second call passes a compound literal (alternatively, a local version >>>>> of the struct will have the same effect regarding what follows). >>>>> >>>>> Now, say I compile this code with: >>>>> >>>>> gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c >>>>> >>>>> The einline pass will not be able to inline calls to g with such a low >>>>> value for early-inlining-insns. >>>>> >>>>> The inline_param2 pass still shows: >>>>> >>>>> ``` >>>>> h (int x) >>>>> { >>>>> struct F D.1847; >>>>> int _4; >>>>> int _8; >>>>> >>>>> <bb 2>: >>>>> _4 = g (f, x_2(D)); >>>>> dosomething (_4); >>>>> D.1847.call = sq; >>>>> _8 = g (D.1847, x_2(D)); >>>>> dosomething (_8); >>>>> return; >>>>> >>>>> } >>>>> >>>>> ``` >>>>> >>>>> The next tree pass is fixup_cfg4, which does the inline but just for >>>>> the second all to g: >>>>> >>>>> ``` >>>>> h (int x) >>>>> { >>>>> .... >>>>> >>>>> <bb 2>: >>>>> f = f; >>>>> f$call_7 = MEM[(struct F *)&f]; >>>>> x_19 = f$call_7 (x_2(D)); >>>>> x_20 = f$call_7 (x_19); >>>>> x_21 = f$call_7 (x_20); >>>>> x_22 = f$call_7 (x_21); >>>>> x_23 = f$call_7 (x_22); >>>>> x_24 = f$call_7 (x_23); >>>>> x_25 = f$call_7 (x_24); >>>>> x_26 = f$call_7 (x_25); >>>>> _43 = x_26; >>>>> _4 = _43; >>>>> dosomething (_4); >>>>> D.1847.call = sq; >>>>> f = D.1847; >>>>> f$call_10 = MEM[(struct F *)&f]; >>>>> _33 = x_2(D) * x_2(D); >>>>> _45 = _33; >>>>> x_11 = _45; >>>>> _32 = x_11 * x_11; >>>>> _46 = _32; >>>>> x_12 = _46; >>>>> _31 = x_12 * x_12; >>>>> _47 = _31; >>>>> x_13 = _47; >>>>> _30 = x_13 * x_13; >>>>> _48 = _30; >>>>> x_14 = _48; >>>>> _29 = x_14 * x_14; >>>>> _49 = _29; >>>>> x_15 = _49; >>>>> _28 = x_15 * x_15; >>>>> _50 = _28; >>>>> x_16 = _50; >>>>> _27 = x_16 * x_16; >>>>> _51 = _27; >>>>> x_17 = _51; >>>>> _3 = x_17 * x_17; >>>>> _52 = _3; >>>>> x_18 = _52; >>>>> _53 = x_18; >>>>> _8 = _53; >>>>> dosomething (_8); >>>>> return; >>>>> >>>>> } >>>>> ``` >>>>> >>>>> Now, say I recompile the code with a larger early-inlining-insns, so >>>>> that einline is able to early inline both calls to g: >>>>> >>>>> gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c >>>>> >>>>> After inline_param2 (that is, before fixup_cfg4), we now have: >>>>> >>>>> ``` >>>>> h (int x) >>>>> { >>>>> int x; >>>>> int x; >>>>> >>>>> <bb 2>: >>>>> x_13 = sq (x_2(D)); >>>>> x_14 = sq (x_13); >>>>> x_15 = sq (x_14); >>>>> x_16 = sq (x_15); >>>>> x_17 = sq (x_16); >>>>> x_18 = sq (x_17); >>>>> x_19 = sq (x_18); >>>>> x_20 = sq (x_19); >>>>> dosomething (x_20); >>>>> x_5 = sq (x_2(D)); >>>>> x_6 = sq (x_5); >>>>> x_7 = sq (x_6); >>>>> x_8 = sq (x_7); >>>>> x_9 = sq (x_8); >>>>> x_10 = sq (x_9); >>>>> x_11 = sq (x_10); >>>>> x_12 = sq (x_11); >>>>> dosomething (x_12); >>>>> return; >>>>> >>>>> } >>>>> ``` >>>>> >>>>> And fixup_cfg4 is able to do its job for both calls: >>>>> >>>>> ``` >>>>> h (int x) >>>>> { >>>>> .... >>>>> >>>>> <bb 2>: >>>>> _36 = x_2(D) * x_2(D); >>>>> _37 = _36; >>>>> x_13 = _37; >>>>> _35 = x_13 * x_13; >>>>> _38 = _35; >>>>> x_14 = _38; >>>>> _34 = x_14 * x_14; >>>>> _39 = _34; >>>>> x_15 = _39; >>>>> _33 = x_15 * x_15; >>>>> _40 = _33; >>>>> x_16 = _40; >>>>> _32 = x_16 * x_16; >>>>> _41 = _32; >>>>> x_17 = _41; >>>>> _31 = x_17 * x_17; >>>>> _42 = _31; >>>>> x_18 = _42; >>>>> _30 = x_18 * x_18; >>>>> _43 = _30; >>>>> x_19 = _43; >>>>> _29 = x_19 * x_19; >>>>> _44 = _29; >>>>> x_20 = _44; >>>>> dosomething (x_20); >>>>> _28 = x_2(D) * x_2(D); >>>>> _45 = _28; >>>>> x_5 = _45; >>>>> _27 = x_5 * x_5; >>>>> _46 = _27; >>>>> x_6 = _46; >>>>> _26 = x_6 * x_6; >>>>> _47 = _26; >>>>> x_7 = _47; >>>>> _25 = x_7 * x_7; >>>>> _48 = _25; >>>>> x_8 = _48; >>>>> _24 = x_8 * x_8; >>>>> _49 = _24; >>>>> x_9 = _49; >>>>> _23 = x_9 * x_9; >>>>> _50 = _23; >>>>> x_10 = _50; >>>>> _22 = x_10 * x_10; >>>>> _51 = _22; >>>>> x_11 = _51; >>>>> _21 = x_11 * x_11; >>>>> _52 = _21; >>>>> x_12 = _52; >>>>> dosomething (x_12); >>>>> return; >>>>> >>>>> } >>>>> ``` >>>>> >>>>> The bottom line is that I get full inlining if einline manages to >>>>> early inline both g calls, but I get incomplete inlining otherwise. I >>>>> guess the problem is that fixup_cfg4 is not able to infer that >>>>> f$call_7 is just sq in disguise when f is the global static const >>>>> struct but it is able to get it when it's a local or literal one. In >>>>> case einline expands the code early the successive passes will make >>>>> fixup_cfg4 see just sq in both cases, making inlining of sq a trivial >>>>> matter. But if einline hits its hard limits, fixup_cfg4 will have to >>>>> figure out that f$call is sq by itself. >>>>> >>>>> I'm not sure whether this should be considered a proper bug or more of >>>>> a quirk of the inlining system one must learn to live with. In the >>>>> first case, I'll report it if you ask me to do it. In the second case, >>>>> I would like to ask for some advice about the best way to cope with >>>>> this scenario (besides blindly incrementing early-inlining-insns); I >>>>> can provide more background regarding my real use case if necessary. >> >> I think the real missed optimization is that for the passed aggregate D.1772 >> >> D.1772.call = sq; >> _8 = g (D.1772, x_2(D)); >> >> IPA CP cannot figure out a jump-function with the sq function as >> value. It _can_ >> for the call passing f: >> >> callsite h/3 -> g/0 : >> param 0: UNKNOWN >> Aggregate passed by value: >> offset: 0, cst: sq >> Unknown alignment >> param 1: PASS THROUGH: 0, op nop_expr >> Unknown alignment >> callsite h/3 -> dosomething/4 : >> callsite h/3 -> g/0 : >> param 0: UNKNOWN >> Unknown alignment >> param 1: PASS THROUGH: 0, op nop_expr >> Unknown alignment >> >> not sure how hard to fix that is, but it can't be too difficult. Martin? >> >> You may want to open a bugreport. >> >> Richard. >> >>>>> Cheers >>>>> -- >>>>> Carlos