Inconsistent initialization for pic_offset_table_rtx?
Hi,
I noticed that pic_offset_table_rtx is initialized twice in GCC. Take
x86_32 as an example.
The first initialization is done in emit_init_regs, with the code below:

  pic_offset_table_rtx = NULL_RTX;
  if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
    pic_offset_table_rtx = gen_raw_REG (Pmode, PIC_OFFSET_TABLE_REGNUM);

On x86_32 with pic, we have:

  (gdb) call debug_rtx(this_target_rtl->x_pic_offset_table_rtx)
  (reg:SI 3 bx)

The second initialization is in expand_used_vars, with the code below:

  if (targetm.use_pseudo_pic_reg ())
    pic_offset_table_rtx = gen_reg_rtx (Pmode);

On x86_32 with pic, we have:

  (gdb) call debug_rtx(this_target_rtl->x_pic_offset_table_rtx)
  (reg:SI 87)

So basically after expanding the first function, pic_offset_table_rtx
is set to a pseudo register, rather than the one initialized in
emit_init_regs.

Also this causes inconsistent compilation for the first/rest functions
in one compilation unit.

A bug?

Thanks,
bin
Re: Inconsistent initialization for pic_offset_table_rtx?
2016-02-04 17:12 GMT+03:00 Bin.Cheng:
> [...]
> So basically after expanding the first function, pic_offset_table_rtx
> is set to a pseudo register, rather than the one initialized in
> emit_init_regs.
>
> Also this causes inconsistent compilation for the first/rest functions
> in one compilation unit.
>
> A bug?

For the i386 target, PIC_OFFSET_TABLE_REGNUM actually checks
ix86_use_pseudo_pic_reg and is supposed to return INVALID_REGNUM
when we use a pseudo register for PIC. BUT we hit a case where PIC
code is generated for cost estimation via target hooks while some
GIMPLE pass is running. In this case we need to return some register to
generate the PIC usage, but we don't have one allocated yet, so we
return a hard register. We detect this situation by checking
pic_offset_table_rtx.

Thus if we use a pseudo PIC register but pic_offset_table_rtx is not
initialized yet, PIC_OFFSET_TABLE_REGNUM returns a hard register.

So I suppose we may consider the first assignment a bug.

Thanks,
Ilya
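To make the sequence described above easier to follow, here is a toy model in plain C. It is not GCC source: model_pic_regnum, model_init_regs, model_expand_function and the model_reg type are made-up stand-ins, and the model only covers the case where the target uses a pseudo PIC register. The register numbers 3 and 87 are taken from the gdb output earlier in the thread.

```c
#include <stddef.h>

/* Toy model only: none of these names exist in GCC. */
#define MODEL_INVALID_REGNUM 0xffffffffu
#define MODEL_HARD_PIC_REGNUM 3u   /* bx, as in the first gdb output */
#define MODEL_FIRST_PSEUDO 87u     /* pseudo seen in the second gdb output */

typedef struct { unsigned regno; } model_reg;

static model_reg model_hard_reg;
static model_reg model_pseudo_reg;
static model_reg *model_pic_reg = NULL;  /* stands in for pic_offset_table_rtx */

/* Stand-in for PIC_OFFSET_TABLE_REGNUM as described by Ilya: when no PIC
   register has been set up yet, hand out a hard register so that cost
   estimation has something to work with; otherwise report "invalid". */
static unsigned model_pic_regnum(void)
{
    return model_pic_reg ? MODEL_INVALID_REGNUM : MODEL_HARD_PIC_REGNUM;
}

/* Stand-in for the snippet quoted from emit_init_regs. */
static void model_init_regs(void)
{
    model_pic_reg = NULL;
    if (model_pic_regnum() != MODEL_INVALID_REGNUM) {
        model_hard_reg.regno = model_pic_regnum();
        model_pic_reg = &model_hard_reg;
    }
}

/* Stand-in for the snippet quoted from expand_used_vars. */
static void model_expand_function(void)
{
    static unsigned next_pseudo = MODEL_FIRST_PSEUDO;
    model_pseudo_reg.regno = next_pseudo++;
    model_pic_reg = &model_pseudo_reg;
}
```

Calling model_init_regs and then model_expand_function reproduces the two states Bin observed: the pointer first refers to the hard register and afterwards to a pseudo, and nothing resets it before the next function is processed.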
Early inlining and function references from static const struct (bug?)
Hi all,

I've been trying to understand some bizarre interaction between
optimization passes that I've observed while compiling some heavily
nested, inlined numerical code of mine. I managed to reduce the issue
down to this simple code:

``` test.c

typedef struct F {
  int (*call)(int);
} F;

static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

void dosomething(int);

int h(int x) {
  dosomething(g(f, x));
  dosomething(g((F){sq}, x));
}

```

Here we have a driver function h calling the workhorse g, which
delegates some simple task to the inline-wannabe f.call (sq here). The
distinctive aspect of this scheme is that the function is referenced
through a struct member. The first call to g passes a static const
struct while the second call passes a compound literal (alternatively,
a local copy of the struct has the same effect regarding what follows).

Now, say I compile this code with:

gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c

The einline pass will not be able to inline the calls to g with such a
low value for early-inlining-insns.

The inline_param2 pass still shows:

```
h (int x)
{
  struct F D.1847;
  int _4;
  int _8;

  :
  _4 = g (f, x_2(D));
  dosomething (_4);
  D.1847.call = sq;
  _8 = g (D.1847, x_2(D));
  dosomething (_8);
  return;

}

```

The next tree pass is fixup_cfg4, which shows the inlining done, but
sq is inlined only for the second call to g:

```
h (int x)
{


  :
  f = f;
  f$call_7 = MEM[(struct F *)&f];
  x_19 = f$call_7 (x_2(D));
  x_20 = f$call_7 (x_19);
  x_21 = f$call_7 (x_20);
  x_22 = f$call_7 (x_21);
  x_23 = f$call_7 (x_22);
  x_24 = f$call_7 (x_23);
  x_25 = f$call_7 (x_24);
  x_26 = f$call_7 (x_25);
  _43 = x_26;
  _4 = _43;
  dosomething (_4);
  D.1847.call = sq;
  f = D.1847;
  f$call_10 = MEM[(struct F *)&f];
  _33 = x_2(D) * x_2(D);
  _45 = _33;
  x_11 = _45;
  _32 = x_11 * x_11;
  _46 = _32;
  x_12 = _46;
  _31 = x_12 * x_12;
  _47 = _31;
  x_13 = _47;
  _30 = x_13 * x_13;
  _48 = _30;
  x_14 = _48;
  _29 = x_14 * x_14;
  _49 = _29;
  x_15 = _49;
  _28 = x_15 * x_15;
  _50 = _28;
  x_16 = _50;
  _27 = x_16 * x_16;
  _51 = _27;
  x_17 = _51;
  _3 = x_17 * x_17;
  _52 = _3;
  x_18 = _52;
  _53 = x_18;
  _8 = _53;
  dosomething (_8);
  return;

}
```

Now, say I recompile the code with a larger early-inlining-insns, so
that einline is able to early inline both calls to g:

gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c

After inline_param2 (that is, before fixup_cfg4), we now have:

```
h (int x)
{
  int x;
  int x;

  :
  x_13 = sq (x_2(D));
  x_14 = sq (x_13);
  x_15 = sq (x_14);
  x_16 = sq (x_15);
  x_17 = sq (x_16);
  x_18 = sq (x_17);
  x_19 = sq (x_18);
  x_20 = sq (x_19);
  dosomething (x_20);
  x_5 = sq (x_2(D));
  x_6 = sq (x_5);
  x_7 = sq (x_6);
  x_8 = sq (x_7);
  x_9 = sq (x_8);
  x_10 = sq (x_9);
  x_11 = sq (x_10);
  x_12 = sq (x_11);
  dosomething (x_12);
  return;

}
```

And fixup_cfg4 is able to do its job for both calls:

```
h (int x)
{


  :
  _36 = x_2(D) * x_2(D);
  _37 = _36;
  x_13 = _37;
  _35 = x_13 * x_13;
  _38 = _35;
  x_14 = _38;
  _34 = x_14 * x_14;
  _39 = _34;
  x_15 = _39;
  _33 = x_15 * x_15;
  _40 = _33;
  x_16 = _40;
  _32 = x_16 * x_16;
  _41 = _32;
  x_17 = _41;
  _31 = x_17 * x_17;
  _42 = _31;
  x_18 = _42;
  _30 = x_18 * x_18;
  _43 = _30;
  x_19 = _43;
  _29 = x_19 * x_19;
  _44 = _29;
  x_20 = _44;
  dosomething (x_20);
  _28 = x_2(D) * x_2(D);
  _45 = _28;
  x_5 = _45;
  _27 = x_5 * x_5;
  _46 = _27;
  x_6 = _46;
  _26 = x_6 * x_6;
  _47 = _26;
  x_7 = _47;
  _25 = x_7 * x_7;
  _48 = _25;
  x_8 = _48;
  _24 = x_8 * x_8;
  _49 = _24;
  x_9 = _49;
  _23 = x_9 * x_9;
  _50 = _23;
  x_10 = _50;
  _22 = x_10 * x_10;
  _51 = _22;
  x_11 = _51;
  _21 = x_11 * x_11;
  _52 = _21;
  x_12 = _52;
  dosomething (x_12);
  return;

}
```

The bottom line is that I get full inlining if einline manages to
early inline both g calls, but I get incomplete inlining otherwise. I
guess the problem is that fixup_cfg4 is not able to infer that
f$call_7 is just sq in disguise when f is the global static const
struct, but it is able to do so when f is a local or literal one. If
einline expands the code early, the successive passes make fixup_cfg4
see just sq in both cases, which makes inlining sq a trivial matter.
But if einline hits its hard limits, fixup_cfg4 has to figure out by
itself that f$call is sq.

I'm not sure whether this should be considered a proper bug or more of
a quirk of the inlining system that one must learn to live with. In
the first case, I'll report it if you ask me to. In the second case, I
would like to ask for some advice about the best way to cope with this
scenario (besides blindly increasing early-inlining-insns); I can
provide more background regarding my real use case if necessary.

Cheers
--
Carlos
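One way to avoid depending on the early-inlining-insns limit might be to mark g with GCC's always_inline function attribute, which the GCC manual describes as inlining the function independently of the usual inlining restrictions. The sketch below is a variant of test.c written under that assumption. Whether it actually yields the fully inlined sq chain for the static const case was not checked against the compiler used in this thread. The dosomething stub and the last_seen variable are hypothetical additions so that the file links and can be exercised on its own.

```c
typedef struct F {
    int (*call)(int);
} F;

/* always_inline is a documented GCC function attribute; its effect on
   this particular missed optimization is an untested assumption. */
static inline __attribute__((always_inline)) int g(F f, int x)
{
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    return x;
}

static int sq(int x)
{
    return x * x;
}

static const F f = {sq};

/* Hypothetical stub: records the last value it was given. */
static int last_seen;
void dosomething(int v)
{
    last_seen = v;
}

void h(int x)
{
    dosomething(g(f, x));        /* static const struct */
    dosomething(g((F){sq}, x));  /* compound literal */
}
```

Comparing the -fdump-tree-all output of this variant with the dumps above, at the same early-inlining-insns value, would show whether the first call still goes through f$call.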
Re: Inconsistent initialization for pic_offset_table_rtx?
On Thu, Feb 4, 2016 at 3:18 PM, Ilya Enkovich wrote:
> 2016-02-04 17:12 GMT+03:00 Bin.Cheng:
>> [...]
>> A bug?
>
> For the i386 target, PIC_OFFSET_TABLE_REGNUM actually checks
> ix86_use_pseudo_pic_reg and is supposed to return INVALID_REGNUM
> when we use a pseudo register for PIC. BUT we hit a case where PIC
> code is generated for cost estimation via target hooks while some
> GIMPLE pass is running. In this case we need to return some register to

Thanks Ilya. This is exactly the case I ran into. See PR69042.

> generate the PIC usage, but we don't have one allocated yet, so we
> return a hard register. We detect this situation by checking
> pic_offset_table_rtx.
>
> Thus if we use a pseudo PIC register but pic_offset_table_rtx is not
> initialized yet, PIC_OFFSET_TABLE_REGNUM returns a hard register.
>
> So I suppose we may consider the first assignment a bug.

But I don't quite follow. So a hard register is returned so that
GIMPLE passes can construct PIC-related addresses? If this is the
case, the first initialization is necessary.

Another question is about address cost:

  if (parts.index
      && (!REG_P (parts.index)
          || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)
      && (current_pass->type == GIMPLE_PASS
          || !pic_offset_table_rtx
          || !REG_P (parts.index)
          || REGNO (pic_offset_table_rtx) != REGNO (parts.index)))
    cost++;

Is it a bug in the second sub condition? Considering
"current_pass->type == GIMPLE_PASS" in the third sub condition, can I
assume the second is for non-GIMPLE passes only?

Thanks,
bin
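For readers who want to experiment with the condition quoted above, here is a small standalone restatement of it in C. It is only a sketch: struct index_query and index_adds_cost are hypothetical names, and each boolean field stands in for one of the RTL predicates in the quoted snippet. It makes no claim about which reading of the condition is the intended one.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the inputs to the quoted condition. */
struct index_query {
    bool has_index;        /* parts.index is non-null */
    bool index_is_reg;     /* REG_P (parts.index) */
    bool index_is_pseudo;  /* REGNO (parts.index) >= FIRST_PSEUDO_REGISTER */
    bool in_gimple_pass;   /* current_pass->type == GIMPLE_PASS */
    bool have_pic_reg;     /* pic_offset_table_rtx is non-null */
    bool index_is_pic_reg; /* REGNO (pic_offset_table_rtx) == REGNO (parts.index) */
};

/* Returns true exactly when the quoted snippet would execute cost++. */
static bool index_adds_cost(const struct index_query *q)
{
    if (!q->has_index)
        return false;
    /* A hard-register index fails the second clause of the condition. */
    if (q->index_is_reg && !q->index_is_pseudo)
        return false;
    /* Third clause: any one of these four alternatives is enough. */
    return q->in_gimple_pass
           || !q->have_pic_reg
           || !q->index_is_reg
           || !q->index_is_pic_reg;
}
```

In this restatement, a pseudo index register that is also the PIC register is free of the extra cost only outside GIMPLE passes and only once a PIC register exists. During a GIMPLE pass the first alternative alone makes the third clause true, so the remaining three alternatives only matter in other passes.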
Re: Early inlining and function references from static const struct (bug?)
PS: I framed the issue between the inline_param and fixup_cfg passes
because I was only looking at the tree passes, but the really relevant
passes are tree-einline (obviously) and ipa-inline, which happens
between tree-inline_param2 and tree-fixup_cfg4. So, restating the
problem: if early inlining does not happen, late inlining will miss the
chance to inline the reference from the static const struct member.

On Thu, Feb 4, 2016 at 1:08 PM, Carlos Pita wrote:
> [...]
Re: Early inlining and function references from static const struct (bug?)
PS 2 (last one, I swear): I've isolated what I think is the root of the
problem. When einline expands g, there are plenty of call sites for
f.call, so the full redundancy elimination (FRE) pass replaces f.call
with sq, making things easy for the late IPA inliner. But when g is not
early inlined, there is only one call site for the global f.call and
another one for the local/literal f.call, so the FRE pass just lets
them be. This is innocuous from FRE's point of view, but it disables
further inlining as described above.

On Thu, Feb 4, 2016 at 3:05 PM, Carlos Pita wrote:
> [...]
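As a hand-written illustration of the difference described in this PS (not compiler output), the two functions below compute the same thing. The first still loads the function pointer from the static const struct before calling it, which is the shape the late inliner is left with when the load is not resolved. The second is the shape it gets once the load has been replaced by its known value sq. The names twice_indirect and twice_direct are made up for this sketch.

```c
typedef struct F {
    int (*call)(int);
} F;

static int sq(int x)
{
    return x * x;
}

static const F f = {sq};

/* The callee is only known through a load from f, so a later pass must
   prove that the loaded pointer is sq before it can inline the calls. */
static int twice_indirect(int x)
{
    int (*fn)(int) = f.call;
    return fn(fn(x));
}

/* The same computation after the load has been replaced by sq: the
   callee is now visible directly at each call site. */
static int twice_direct(int x)
{
    return sq(sq(x));
}
```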