https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65478
Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de

--- Comment #13 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks, Martin. I now get Search unduplicated by ipa-cp. Funnily enough,
however, the fix to the vortex slowdown (caused by a bug in the inliner's LTO
inline_failed bookkeeping) caused differences in the inline decisions, and now
we do not inline FirstOne and LastOne.

This seems to be a very stupid implementation of clz:

int FirstOne(BITBOARD arg1)
{
    union doub {
      unsigned short i[4];
      BITBOARD d;
    };
#ifndef SPEC_CPU2000
    register union doub x;
#else
    union doub x;
#endif /* SPEC_CPU2000 */
    x.d=arg1;
#  if defined(LITTLE_ENDIAN_ARCH)
    if (x.i[3]) return (first_ones[x.i[3]]);
    if (x.i[2]) return (first_ones[x.i[2]]+16);
    if (x.i[1]) return (first_ones[x.i[1]]+32);
    if (x.i[0]) return (first_ones[x.i[0]]+48);
#  endif
#  if !defined(LITTLE_ENDIAN_ARCH)
    if (x.i[0]) return (first_ones[x.i[0]]);
    if (x.i[1]) return (first_ones[x.i[1]]+16);
    if (x.i[2]) return (first_ones[x.i[2]]+32);
    if (x.i[3]) return (first_ones[x.i[3]]+48);
#  endif
    return(64);
}

which unfortunately gets estimated as quite large by the inliner:

Analyzing function body size: FirstOne
  Accounting size:2.00, time:0.00 on new predicate:(not inlined)

 BB 2 predicate:(true)
  x.d = arg1_3(D);
                freq:1.00 size:  1 time:  1
                Accounting size:1.00, time:1.00 on predicate:(true)
  _5 = x.i[3];
                freq:1.00 size:  1 time:  1
                Accounting size:1.00, time:1.00 on predicate:(true)
  if (_5 != 0)
                freq:1.00 size:  2 time:  2
                Accounting size:2.00, time:2.00 on predicate:(true)

 BB 4 predicate:(true)
  _9 = x.i[2];
                freq:0.61 size:  1 time:  1
                Accounting size:1.00, time:0.61 on predicate:(true)
  if (_9 != 0)
                freq:0.61 size:  2 time:  2
                Accounting size:2.00, time:1.22 on predicate:(true)
...

so at this point we do not even see that x.d holds the value of the argument.
If this were implemented with view_convert_expr, the inliner would at least see
that the return value of FirstOne depends on its parameter.

Tree optimizers produce:

Removing basic block 11
FirstOne (BITBOARD arg1)
{
  union doub x;
  int _1;
  short unsigned int _5;
  int _6;
  unsigned char _7;
  int _8;
  short unsigned int _9;
  int _10;
  unsigned char _11;
  int _12;
  int _13;
  short unsigned int _14;
  int _15;
  unsigned char _16;
  int _17;
  int _18;
  short unsigned int _19;
  int _20;
  unsigned char _21;
  int _22;
  int _23;

  <bb 2>:
  x.d = arg1_3(D);
  _5 = x.i[3];
  if (_5 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  _6 = (int) _5;
  _7 = first_ones[_6];
  _8 = (int) _7;
  goto <bb 10>;
...

This is somewhat lame. Richard, I believed we should synthesize
view_convert_expr in this case?

I am checking how much I need to bump up inline unit growth.
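
For illustration only, here is a minimal sketch of a union-free variant of the lookup, where each 16-bit chunk is taken from the argument by shifting, so the result is visibly a function of the parameter and no endianness #if is needed. It is not a proposed patch to crafty or to GCC. It assumes that BITBOARD is a 64-bit unsigned integer and that first_ones[v] holds the number of leading zero bits of the 16-bit value v; neither definition appears in this comment. The names init_first_ones, first_one_shift, first_one_builtin and check_agree are hypothetical. The comparison uses GCC's __builtin_clzll, which is undefined for a zero argument, hence the explicit guard.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t BITBOARD;              /* assumption: 64-bit unsigned */

/* Assumption: first_ones[v] = number of leading zeros of 16-bit v. */
static unsigned char first_ones[65536];

/* Hypothetical helper that fills the table under the assumption above. */
void init_first_ones(void)
{
    first_ones[0] = 16;
    for (unsigned v = 1; v < 65536; v++) {
        unsigned char n = 0;
        for (unsigned bit = 0x8000u; !(v & bit); bit >>= 1)
            n++;
        first_ones[v] = n;
    }
}

/* Union-free variant: chunks are extracted with shifts, most
   significant first, independent of host byte order. */
int first_one_shift(BITBOARD arg1)
{
    uint16_t w;

    if ((w = (uint16_t)(arg1 >> 48)) != 0) return first_ones[w];
    if ((w = (uint16_t)(arg1 >> 32)) != 0) return first_ones[w] + 16;
    if ((w = (uint16_t)(arg1 >> 16)) != 0) return first_ones[w] + 32;
    if ((w = (uint16_t)arg1) != 0)         return first_ones[w] + 48;
    return 64;
}

/* The same value via the GCC builtin; zero needs a guard because
   __builtin_clzll(0) is undefined. */
int first_one_builtin(BITBOARD arg1)
{
    return arg1 ? __builtin_clzll(arg1) : 64;
}

/* Hypothetical self-check: both variants must agree on b. */
void check_agree(BITBOARD b)
{
    assert(first_one_shift(b) == first_one_builtin(b));
}
```

Call init_first_ones() once before using first_one_shift(). Whether the inliner's size and time estimates actually improve for such a variant has not been measured here.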