On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka <j...@suse.de> wrote:
>>> Did you try using FDO with -Os? FDO should make hot code parts
>>> optimized similar to -O3 but leave other pieces optimized for size.
>>> Using FDO with -O3 gives you the opposite, cold portions optimized
>>> for size while the rest is optimized for speed.
>
> FDO with -Os still optimizes for size, even in hot parts.

I don't think so. Or at least that would be a bug. Shouldn't 'hot'
BBs/functions be optimized for speed even at -Os? Hm, I see predict.c
indeed always returns false for optimize_size :(

I thought only the parts that are neither cold nor hot were optimized
according to optimize_size.

> So to get reasonable
> speedups you need -O3+FDO. -O3+FDO effectively defaults to -Os in cold
> portions of the program.

Well, but unless your training coverage is 100%, all parts with no coverage
get optimized with -O3 instead of -Os. And I bet coverage for Mozilla
isn't even close to 100%. Thus I think recommending -O3 for FDO is
usually a bad idea.

So - did you try FDO with -O2? ;)

> Still -Os+FDO should be somewhat faster than -Os alone, so a slowdown is
> a bug. It is not very thoroughly tested since it is not really used in
> practice.
>
>>> Also do you get any warnings on profile mismatches? Perhaps something
>>> is wrong to the degree that the relevant part of profile gets
>>> misapplied.
>>
>> I don't get any warning on profile mismatches. I only get a "few"
>> missing gcda files warning, but that's expected.
>
> Perhaps you could compile one of the less trivial files that you are sure
> are covered by the train run and send me -fdump-tree-all-blocks
> -fdump-ipa-all dumps of the compilation so I can double check the profile
> seems sane. This could be a good start to rule out something stupid.
>
> Honza
>>
>> Cheers,
>>
>> Mike
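
For reference, the size-versus-speed behaviour as it is described in this
thread can be written down as a small toy model. This is only a sketch: every
name in it (opt_level, profile_state, model_optimize_for_speed,
model_self_check) is hypothetical and none of it is taken from predict.c. It
assumes three profile states (no coverage, cold, hot) and encodes just the two
claims made above: at -Os nothing is treated as "optimize for speed", and at
-O2/-O3 only parts the profile marks cold fall back to size.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not GCC's predict.c interface. */

enum opt_level { OPT_OS, OPT_O2, OPT_O3 };

/* What the training run says about a piece of code. */
enum profile_state {
    PROFILE_NONE,  /* no coverage: the training run produced no data */
    PROFILE_COLD,  /* covered, and the profile marks it cold */
    PROFILE_HOT    /* covered, and the profile marks it hot */
};

/* Returns true if the model would optimize this code for speed. */
bool model_optimize_for_speed(enum opt_level level, enum profile_state state)
{
    /* Claim 1: at -Os the answer is "size" regardless of the profile. */
    if (level == OPT_OS)
        return false;

    /* Claim 2: otherwise only cold code is optimized for size, so code
       without any coverage is still optimized for speed. */
    return state != PROFILE_COLD;
}

/* Checks the model against the statements made in the thread. */
void model_self_check(void)
{
    /* -Os+FDO: even hot parts are optimized for size. */
    assert(!model_optimize_for_speed(OPT_OS, PROFILE_HOT));
    /* -O3+FDO: cold parts are effectively treated like -Os. */
    assert(!model_optimize_for_speed(OPT_O3, PROFILE_COLD));
    /* -O3+FDO: parts with no coverage still get the full -O3 treatment. */
    assert(model_optimize_for_speed(OPT_O3, PROFILE_NONE));
}
```

In this model the uncovered case is what makes -O2 look like the safer base
level for FDO: code without profile data gets whatever the base level says.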