Hi Richard, I'm not quite following you. I understand the non-cp issue. But are you saying it's somehow related to the non-inlining issue (in that case I'm unable to see the relationship, because the compound literal case always gets inlined)? Or maybe you consider there is no non-inlining problem to address here, despite the different outcomes for the static const and the compound literal cases, and you're just pointing out a different, unrelated, issue.
Cheers -- Carlos On Fri, Feb 5, 2016 at 8:28 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Thu, Feb 4, 2016 at 9:10 PM, Carlos Pita <carlosjosep...@gmail.com> wrote: >> PS 2 (last one, I swear): I've isolated what I think is the root of >> the problem. When einline expands g, there is plenty of call sites for >> f.call, so the full redundancy elimination pass replaces sum for >> f.call, making things easy for the late ipa inliner. But when g is not >> early inlined, there is only one call site for the global f.call and >> another one for the local/literal f.call, so the fre pass just lets >> them be. This is innocuous from the fre point of view, but disables >> further inlining as described above. >> >> On Thu, Feb 4, 2016 at 3:05 PM, Carlos Pita <carlosjosep...@gmail.com> wrote: >>> PS: I framed the issue between the inline_param and fixup_cfg passes >>> because I was only looking at the tree passes, but the really relevant >>> passes are tree-einline (obviously) and ipa-inline, which happens >>> between tree-inline_param2 and tree-fixup_cfg2. So, restating the >>> problem: if early inline is not happening, late inline will miss the >>> chance to inline the reference from the static const struct member. >>> >>> On Thu, Feb 4, 2016 at 1:08 PM, Carlos Pita <carlosjosep...@gmail.com> >>> wrote: >>>> Hi all, >>>> >>>> I've been trying to understand some bizarre interaction between >>>> optimizing passes I've observed while compiling a heavily nested >>>> inlined numerical code of mine. I managed to reduce the issue down to >>>> this simple code: >>>> >>>> ``` test.c >>>> >>>> typedef struct F { >>>> int (*call)(int); >>>> } F; >>>> >>>> static int g(F f, int x) { >>>> x = f.call(x); >>>> x = f.call(x); >>>> x = f.call(x); >>>> x = f.call(x); >>>> x = f.call(x); >>>> x = f.call(x); >>>> x = f.call(x); >>>> x = f.call(x); >>>> return x; >>>> } >>>> >>>> static int sq(int x) { >>>> return x * x; >>>> } >>>> >>>> static const F f = {sq}; >>>> >>>> void dosomething(int); >>>> >>>> int h(int x) { >>>> dosomething(g(f, x)); >>>> dosomething(g((F){sq}, x)); >>>> } >>>> >>>> ``` >>>> >>>> Here we have a driver function h calling the workhorse g which >>>> delegates some simple task to the inline-wannabe f. The distinctive >>>> aspect of the above scheme is that f is referenced from a struct >>>> member. The first call to g passes a static const struct while the >>>> second call passes a compound literal (alternatively, a local version >>>> of the struct will have the same effect regarding what follows). >>>> >>>> Now, say I compile this code with: >>>> >>>> gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c >>>> >>>> The einline pass will not be able to inline calls to g with such a low >>>> value for early-inlining-insns. >>>> >>>> The inline_param2 pass still shows: >>>> >>>> ``` >>>> h (int x) >>>> { >>>> struct F D.1847; >>>> int _4; >>>> int _8; >>>> >>>> <bb 2>: >>>> _4 = g (f, x_2(D)); >>>> dosomething (_4); >>>> D.1847.call = sq; >>>> _8 = g (D.1847, x_2(D)); >>>> dosomething (_8); >>>> return; >>>> >>>> } >>>> >>>> ``` >>>> >>>> The next tree pass is fixup_cfg4, which does the inline but just for >>>> the second all to g: >>>> >>>> ``` >>>> h (int x) >>>> { >>>> .... >>>> >>>> <bb 2>: >>>> f = f; >>>> f$call_7 = MEM[(struct F *)&f]; >>>> x_19 = f$call_7 (x_2(D)); >>>> x_20 = f$call_7 (x_19); >>>> x_21 = f$call_7 (x_20); >>>> x_22 = f$call_7 (x_21); >>>> x_23 = f$call_7 (x_22); >>>> x_24 = f$call_7 (x_23); >>>> x_25 = f$call_7 (x_24); >>>> x_26 = f$call_7 (x_25); >>>> _43 = x_26; >>>> _4 = _43; >>>> dosomething (_4); >>>> D.1847.call = sq; >>>> f = D.1847; >>>> f$call_10 = MEM[(struct F *)&f]; >>>> _33 = x_2(D) * x_2(D); >>>> _45 = _33; >>>> x_11 = _45; >>>> _32 = x_11 * x_11; >>>> _46 = _32; >>>> x_12 = _46; >>>> _31 = x_12 * x_12; >>>> _47 = _31; >>>> x_13 = _47; >>>> _30 = x_13 * x_13; >>>> _48 = _30; >>>> x_14 = _48; >>>> _29 = x_14 * x_14; >>>> _49 = _29; >>>> x_15 = _49; >>>> _28 = x_15 * x_15; >>>> _50 = _28; >>>> x_16 = _50; >>>> _27 = x_16 * x_16; >>>> _51 = _27; >>>> x_17 = _51; >>>> _3 = x_17 * x_17; >>>> _52 = _3; >>>> x_18 = _52; >>>> _53 = x_18; >>>> _8 = _53; >>>> dosomething (_8); >>>> return; >>>> >>>> } >>>> ``` >>>> >>>> Now, say I recompile the code with a larger early-inlining-insns, so >>>> that einline is able to early inline both calls to g: >>>> >>>> gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c >>>> >>>> After inline_param2 (that is, before fixup_cfg4), we now have: >>>> >>>> ``` >>>> h (int x) >>>> { >>>> int x; >>>> int x; >>>> >>>> <bb 2>: >>>> x_13 = sq (x_2(D)); >>>> x_14 = sq (x_13); >>>> x_15 = sq (x_14); >>>> x_16 = sq (x_15); >>>> x_17 = sq (x_16); >>>> x_18 = sq (x_17); >>>> x_19 = sq (x_18); >>>> x_20 = sq (x_19); >>>> dosomething (x_20); >>>> x_5 = sq (x_2(D)); >>>> x_6 = sq (x_5); >>>> x_7 = sq (x_6); >>>> x_8 = sq (x_7); >>>> x_9 = sq (x_8); >>>> x_10 = sq (x_9); >>>> x_11 = sq (x_10); >>>> x_12 = sq (x_11); >>>> dosomething (x_12); >>>> return; >>>> >>>> } >>>> ``` >>>> >>>> And fixup_cfg4 is able to do its job for both calls: >>>> >>>> ``` >>>> h (int x) >>>> { >>>> .... >>>> >>>> <bb 2>: >>>> _36 = x_2(D) * x_2(D); >>>> _37 = _36; >>>> x_13 = _37; >>>> _35 = x_13 * x_13; >>>> _38 = _35; >>>> x_14 = _38; >>>> _34 = x_14 * x_14; >>>> _39 = _34; >>>> x_15 = _39; >>>> _33 = x_15 * x_15; >>>> _40 = _33; >>>> x_16 = _40; >>>> _32 = x_16 * x_16; >>>> _41 = _32; >>>> x_17 = _41; >>>> _31 = x_17 * x_17; >>>> _42 = _31; >>>> x_18 = _42; >>>> _30 = x_18 * x_18; >>>> _43 = _30; >>>> x_19 = _43; >>>> _29 = x_19 * x_19; >>>> _44 = _29; >>>> x_20 = _44; >>>> dosomething (x_20); >>>> _28 = x_2(D) * x_2(D); >>>> _45 = _28; >>>> x_5 = _45; >>>> _27 = x_5 * x_5; >>>> _46 = _27; >>>> x_6 = _46; >>>> _26 = x_6 * x_6; >>>> _47 = _26; >>>> x_7 = _47; >>>> _25 = x_7 * x_7; >>>> _48 = _25; >>>> x_8 = _48; >>>> _24 = x_8 * x_8; >>>> _49 = _24; >>>> x_9 = _49; >>>> _23 = x_9 * x_9; >>>> _50 = _23; >>>> x_10 = _50; >>>> _22 = x_10 * x_10; >>>> _51 = _22; >>>> x_11 = _51; >>>> _21 = x_11 * x_11; >>>> _52 = _21; >>>> x_12 = _52; >>>> dosomething (x_12); >>>> return; >>>> >>>> } >>>> ``` >>>> >>>> The bottom line is that I get full inlining if einline manages to >>>> early inline both g calls, but I get incomplete inlining otherwise. I >>>> guess the problem is that fixup_cfg4 is not able to infer that >>>> f$call_7 is just sq in disguise when f is the global static const >>>> struct but it is able to get it when it's a local or literal one. In >>>> case einline expands the code early the successive passes will make >>>> fixup_cfg4 see just sq in both cases, making inlining of sq a trivial >>>> matter. But if einline hits its hard limits, fixup_cfg4 will have to >>>> figure out that f$call is sq by itself. >>>> >>>> I'm not sure whether this should be considered a proper bug or more of >>>> a quirk of the inlining system one must learn to live with. In the >>>> first case, I'll report it if you ask me to do it. In the second case, >>>> I would like to ask for some advice about the best way to cope with >>>> this scenario (besides blindly incrementing early-inlining-insns); I >>>> can provide more background regarding my real use case if necessary. > > I think the real missed optimization is that for the passed aggregate D.1772 > > D.1772.call = sq; > _8 = g (D.1772, x_2(D)); > > IPA CP cannot figure out a jump-function with the sq function as > value. It _can_ > for the call passing f: > > callsite h/3 -> g/0 : > param 0: UNKNOWN > Aggregate passed by value: > offset: 0, cst: sq > Unknown alignment > param 1: PASS THROUGH: 0, op nop_expr > Unknown alignment > callsite h/3 -> dosomething/4 : > callsite h/3 -> g/0 : > param 0: UNKNOWN > Unknown alignment > param 1: PASS THROUGH: 0, op nop_expr > Unknown alignment > > not sure how hard to fix that is, but it can't be too difficult. Martin? > > You may want to open a bugreport. > > Richard. > >>>> Cheers >>>> -- >>>> Carlos