On Tue, Jun 14, 2011 at 4:01 PM, Fang, Changpeng <changpeng.f...@amd.com> wrote:
> A similar argument applies to software prefetching, for which we observed a ~2%
> benefit on Greyhound (not as much for Bulldozer). We would also prefer turning
> on software prefetching at -O3 for -mtune=generic.
Sure, we can put everything on the table and take a look.

> Simply turning off the 32-byte unaligned load split, which introduces
> performance regressions on Intel Sandy Bridge processors, isn't an
> appropriate solution.
>
> I am proposing a different approach so that we can improve
> -mtune=generic performance on current Intel and AMD processors.
>
> The current default GCC tuning, -mtune=generic, was implemented in
> 2005 for Intel Pentium 4, Core 2 and AMD K8 processors. Many of its
> optimization choices are no longer applicable to current Intel or
> AMD processors.
>
> We should choose a set of optimization choices for -mtune=generic,
> including the 32-byte unaligned load split, for current Intel and AMD
> processors, which should improve performance with no performance
> regressions.
>
> -- H.J.
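For readers unfamiliar with the load split under discussion, here is a minimal C sketch, assuming an x86-64 GCC or Clang toolchain with AVX intrinsics. The helper names are hypothetical. The "split" variant roughly mirrors the vmovups xmm + vinsertf128 pair GCC emits under -mavx256-split-unaligned-load, instead of one 256-bit vmovups. Both produce the same eight floats; only the instruction sequence the processor executes differs, and that difference is what the per-CPU tuning question is about.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: one 32-byte unaligned load (a single vmovups ymm). */
__attribute__((target("avx")))
static __m256 load_256_unsplit(const float *p)
{
    return _mm256_loadu_ps(p);
}

/* Hypothetical helper: the same 32 bytes fetched as two 16-byte unaligned
   loads, the second inserted into the upper 128-bit lane. */
__attribute__((target("avx")))
static __m256 load_256_split(const float *p)
{
    /* Bytes 0..15 go to the low lane; the upper lane is undefined here. */
    __m256 v = _mm256_castps128_ps256(_mm_loadu_ps(p));
    /* Bytes 16..31 overwrite the upper lane (lane index 1). */
    return _mm256_insertf128_ps(v, _mm_loadu_ps(p + 4), 1);
}

/* Returns 1 if both load strategies yield identical vectors for p[0..7].
   Must only be called on a CPU that supports AVX. */
__attribute__((target("avx")))
int split_matches_unsplit(const float *p)
{
    float a[8], b[8];
    _mm256_storeu_ps(a, load_256_unsplit(p));
    _mm256_storeu_ps(b, load_256_split(p));
    return memcmp(a, b, sizeof a) == 0;
}
```

Compiling a loop over unaligned float arrays with -O2 -mavx and toggling -mavx256-split-unaligned-load (or its -mno- form) is one way to compare the two code shapes in the generated assembly.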