Early inlining and function references from static const struct (bug?)
Hi all,

I've been trying to understand a bizarre interaction between optimization passes that I've observed while compiling a heavily nested, inlined numerical code of mine. I managed to reduce the issue to this simple code:

``` test.c
typedef struct F {
  int (*call)(int);
} F;

static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

void dosomething(int);

void h(int x) {
  dosomething(g(f, x));
  dosomething(g((F){sq}, x));
}
```

Here we have a driver function h calling the workhorse g, which delegates a simple task to the inline-wannabe sq via f.call. The distinctive aspect of this scheme is that sq is referenced from a struct member. The first call to g passes a static const struct, while the second call passes a compound literal (alternatively, a local version of the struct has the same effect regarding what follows).

Now, say I compile this code with:

gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c

The einline pass will not be able to inline the calls to g with such a low value for early-inlining-insns.
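For reference, here is a self-contained sketch of the "local version of the struct" variant mentioned above, next to the static const case. It is only an illustration under assumptions: the driver names `h_static` and `h_local` and the helper `check_drivers` are hypothetical, and `dosomething` is replaced by a return value so the unit links on its own. Per the description above, the local-struct call should behave like the compound-literal one with respect to inlining; the sketch itself only checks computed values, not what gets inlined.

```c
#include <assert.h>

typedef struct F {
  int (*call)(int);
} F;

/* Workhorse: eight chained indirect calls through the struct member. */
static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

/* Hypothetical driver: passes the file-scope static const struct. */
int h_static(int x) {
  return g(f, x);
}

/* Hypothetical driver: passes a local automatic copy of the struct. */
int h_local(int x) {
  F local = {sq};
  return g(local, x);
}

/* Sanity check only: both drivers compute the same value. Whether sq
   gets inlined has to be read from the -fdump-tree-all output. */
void check_drivers(void) {
  assert(h_static(1) == h_local(1));
  assert(h_static(-1) == h_local(-1));
}
```

Compiling this unit with the same `-O3 -fdump-tree-all --param early-inlining-insns=10` flags and comparing the dumps for `h_static` and `h_local` is one way to reproduce the contrast.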
The inline_param2 pass still shows:

```
h (int x)
{
  struct F D.1847;
  int _4;
  int _8;

  :
  _4 = g (f, x_2(D));
  dosomething (_4);
  D.1847.call = sq;
  _8 = g (D.1847, x_2(D));
  dosomething (_8);
  return;

}
```

The next tree pass is fixup_cfg4, which does the inlining, but only for the second call to g:

```
h (int x)
{

  :
  f = f;
  f$call_7 = MEM[(struct F *)&f];
  x_19 = f$call_7 (x_2(D));
  x_20 = f$call_7 (x_19);
  x_21 = f$call_7 (x_20);
  x_22 = f$call_7 (x_21);
  x_23 = f$call_7 (x_22);
  x_24 = f$call_7 (x_23);
  x_25 = f$call_7 (x_24);
  x_26 = f$call_7 (x_25);
  _43 = x_26;
  _4 = _43;
  dosomething (_4);
  D.1847.call = sq;
  f = D.1847;
  f$call_10 = MEM[(struct F *)&f];
  _33 = x_2(D) * x_2(D);
  _45 = _33;
  x_11 = _45;
  _32 = x_11 * x_11;
  _46 = _32;
  x_12 = _46;
  _31 = x_12 * x_12;
  _47 = _31;
  x_13 = _47;
  _30 = x_13 * x_13;
  _48 = _30;
  x_14 = _48;
  _29 = x_14 * x_14;
  _49 = _29;
  x_15 = _49;
  _28 = x_15 * x_15;
  _50 = _28;
  x_16 = _50;
  _27 = x_16 * x_16;
  _51 = _27;
  x_17 = _51;
  _3 = x_17 * x_17;
  _52 = _3;
  x_18 = _52;
  _53 = x_18;
  _8 = _53;
  dosomething (_8);
  return;

}
```

Now, say I recompile the code with a larger early-inlining-insns, so that einline is able to early-inline both calls to g:

gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c

After inline_param2 (that is, before fixup_cfg4), we now have:

```
h (int x)
{
  int x;
  int x;

  :
  x_13 = sq (x_2(D));
  x_14 = sq (x_13);
  x_15 = sq (x_14);
  x_16 = sq (x_15);
  x_17 = sq (x_16);
  x_18 = sq (x_17);
  x_19 = sq (x_18);
  x_20 = sq (x_19);
  dosomething (x_20);
  x_5 = sq (x_2(D));
  x_6 = sq (x_5);
  x_7 = sq (x_6);
  x_8 = sq (x_7);
  x_9 = sq (x_8);
  x_10 = sq (x_9);
  x_11 = sq (x_10);
  x_12 = sq (x_11);
  dosomething (x_12);
  return;

}
```

And fixup_cfg4 is able to do its job for both calls:

```
h (int x)
{

  :
  _36 = x_2(D) * x_2(D);
  _37 = _36;
  x_13 = _37;
  _35 = x_13 * x_13;
  _38 = _35;
  x_14 = _38;
  _34 = x_14 * x_14;
  _39 = _34;
  x_15 = _39;
  _33 = x_15 * x_15;
  _40 = _33;
  x_16 = _40;
  _32 = x_16 * x_16;
  _41 = _32;
  x_17 = _41;
  _31 = x_17 * x_17;
  _42 = _31;
  x_18 = _42;
  _30 = x_18 * x_18;
  _43 = _30;
  x_19 = _43;
  _29 = x_19 * x_19;
  _44 = _29;
  x_20 = _44;
  dosomething (x_20);
  _28 = x_2(D) * x_2(D);
  _45 = _28;
  x_5 = _45;
  _27 = x_5 * x_5;
  _46 = _27;
  x_6 = _46;
  _26 = x_6 * x_6;
  _47 = _26;
  x_7 = _47;
  _25 = x_7 * x_7;
  _48 = _25;
  x_8 = _48;
  _24 = x_8 * x_8;
  _49 = _24;
  x_9 = _49;
  _23 = x_9 * x_9;
  _50 = _23;
  x_10 = _50;
  _22 = x_10 * x_10;
  _51 = _22;
  x_11 = _51;
  _21 = x_11 * x_11;
  _52 = _21;
  x_12 = _52;
  dosomething (x_12);
  return;

}
```

The bottom line is that I get full inlining if einline manages to early-inline both calls to g, but incomplete inlining otherwise. I guess the problem is that fixup_cfg4 is not able to infer that f$call_7 is just sq in disguise when f is the global static const struct, although it does manage when it's a local or literal one. If einline expands the code early, the subsequent passes let fixup_cfg4 see plain sq in both cases, making inlining of sq a trivial matter. But if einline hits its hard limits, fixup_cfg4 has to figure out by itself that f$call is sq.

I'm not sure whether this should be considered a proper bug or more of a quirk of the inlining system one must learn to live with. In the first case, I'll report it if you ask me to. In the second case, I would like to ask for advice about the best way to cope with this scenario (besides blindly increasing early-inlining-insns); I can provide more background on my real use case if necessary.

Cheers
--
Carlos
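One possible way to cope, offered as an untested assumption rather than a verified fix: GCC's `always_inline` function attribute asks the compiler to inline g at every call site regardless of the usual size limits, so the early inliner should expand it even with a low `early-inlining-insns`, which is the situation in which the dumps above show sq being fully inlined. The names `h_forced` and `check_forced` are hypothetical, and `dosomething` is replaced by a return value. Whether this scales to a heavily nested real code base is not established here; GCC's `flatten` attribute on the driver is a related option.

```c
#include <assert.h>

typedef struct F {
  int (*call)(int);
} F;

/* Assumption: always_inline makes the early inliner expand g at every
   call site, independent of --param early-inlining-insns. */
static inline __attribute__((always_inline)) int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

/* Hypothetical driver passing the static const struct, the case that
   was not fully inlined above. */
int h_forced(int x) {
  return g(f, x);
}

/* Checks computed values only, not the inlining decisions. */
void check_forced(void) {
  assert(h_forced(1) == 1);
  assert(h_forced(-1) == 1);
}
```

The trade-off is that always_inline removes the inliner's freedom to refuse, so code size growth becomes the programmer's responsibility.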
Re: Early inlining and function references from static const struct (bug?)
PS: I framed the issue between the inline_param and fixup_cfg passes because I was only looking at the tree passes, but the really relevant passes are tree-einline (obviously) and ipa-inline, which happens between tree-inline_param2 and tree-fixup_cfg2. So, restating the problem: if early inlining does not happen, late inlining will miss the chance to inline the function referenced from the static const struct member.

On Thu, Feb 4, 2016 at 1:08 PM, Carlos Pita wrote:
> [...]
Re: Early inlining and function references from static const struct (bug?)
PS 2 (last one, I swear): I've isolated what I think is the root of the problem. When einline expands g, there are plenty of call sites for f.call, so the full redundancy elimination (FRE) pass substitutes sq for f.call, making things easy for the late IPA inliner. But when g is not early-inlined, there is only one call site for the global f.call and another one for the local/literal f.call, so the FRE pass just lets them be. This is innocuous from FRE's point of view, but it disables further inlining as described above.

On Thu, Feb 4, 2016 at 3:05 PM, Carlos Pita wrote:
> [...]
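To make PS 2 concrete, here is a rough source-level approximation (not actual GCC output) of the two forms involved once g has been inlined into h: first, every step calls through the member of the static const f; second, the same code after FRE has substituted sq for f.call, where the calls are direct and trivially inlinable. The function names `expanded_indirect`, `expanded_direct` and `check_equivalent` are hypothetical.

```c
#include <assert.h>

typedef struct F {
  int (*call)(int);
} F;

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

/* Roughly h after g has been inlined: each step reads the function
   pointer from the static const struct and calls it indirectly. */
int expanded_indirect(int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

/* Roughly the same code after FRE has substituted sq for f.call:
   the calls are now direct, so the inliner can expand sq. */
int expanded_direct(int x) {
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  return x;
}

/* Both forms must compute the same value; this checks results only. */
void check_equivalent(void) {
  assert(expanded_indirect(1) == expanded_direct(1));
  assert(expanded_indirect(-1) == expanded_direct(-1));
}
```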
Re: Early inlining and function references from static const struct (bug?)
Hi Richard,

I'm not quite following you. I understand the non-cp issue. But are you saying it's somehow related to the non-inlining issue? (If so, I'm unable to see the relationship, because the compound literal case always gets inlined.) Or do you consider that there is no non-inlining problem to address here, despite the different outcomes for the static const and the compound literal cases, and you're just pointing out a different, unrelated issue?

Cheers
--
Carlos

On Fri, Feb 5, 2016 at 8:28 AM, Richard Biener wrote:
> On Thu, Feb 4, 2016 at 9:10 PM, Carlos Pita wrote:
>> [...]
Re: Early inlining and function references from static const struct (bug?)
> I was saying that early inlining is not supposed to catch this case
> but IPA inlining.
> it shouldn't need to inline g early to end up inlining the calls to sq. IPA CP
> should clone g for the case of it calling sq and then inlining should
> just do its job.

OK, I fully agree with that; forcing early inlining to trigger late inlining is just a workaround. But there is still the fact that IPA inlining does inline the compound literal case (despite IPA-CP failing there), while it is unable to inline the static const case, which, as I understand it, you don't seem to consider a missed opportunity for IPA-CP. So even if you fix the "aggregate D.1772" scenario, I guess it won't change anything regarding IPA inlining.

Cheers
--
Carlos
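Richard's suggestion can be pictured with a hand-written sketch of what an IPA-CP clone amounts to: a copy of g specialized for the case where f.call is the known constant sq, so the struct parameter disappears and every call is direct. This is only a source-level illustration; `g_sq`, `h_generic`, `h_clone` and `check_clone` are hypothetical names, and GCC's own clones exist only internally (in dumps they typically carry a `.constprop` suffix).

```c
#include <assert.h>

typedef struct F {
  int (*call)(int);
} F;

static int sq(int x) {
  return x * x;
}

/* Generic workhorse: the callee is only known through f.call. */
static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

/* Hypothetical hand-written equivalent of an IPA-CP clone of g for the
   case f.call == sq: the struct parameter is gone, calls are direct. */
static int g_sq(int x) {
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  x = sq(x);
  return x;
}

static const F f = {sq};

/* Hypothetical drivers: what h would call before and after cloning. */
int h_generic(int x) {
  return g(f, x);
}

int h_clone(int x) {
  return g_sq(x);
}

/* Checks that the specialization preserves results. */
void check_clone(void) {
  assert(h_generic(1) == h_clone(1));
  assert(h_generic(-1) == h_clone(-1));
}
```

Once the calls inside the clone are direct, inlining sq is the same trivial step the dumps above show for the compound literal case.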
Re: Early inlining and function references from static const struct (bug?)
I've reported this at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69708.

Just to summarize:

1) If early inlining is forced, FRE replaces the many f.call references with sq, and IPA inlining is able to do its job.

2) If early inlining is disabled, IPA inlining only works for the compound literal case. The cp pass (which runs immediately before the IPA inline pass) results in:

```
h (int x)
{
  ...
  _4 = g (f, x_2(D));
  dosomething (_4);
  D.1847.call = sq;
  _8 = g (D.1847, x_2(D));
  dosomething (_8);
}
```

Nevertheless, IPA inlining seems clever enough to expand the second call to g.

3) The proper solution seems to be for IPA-CP to propagate sq to both call sites, which would make things easy for IPA inlining.