On Tue, Jun 14, 2011 at 3:16 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Jun 14, 2011 at 12:13:47PM +0200, Richard Guenther wrote:
>> On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng <changpeng.f...@amd.com> wrote:
>> > The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt ) which
>> > introduces splitting avx256 unaligned loads.
>> > However, we found that it causes significant regressions for cpu2006 (
>> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
>> >
>> > In this work, we introduce a tune option that sets splitting unaligned
>> > loads default only for such CPUs that such splitting
>> > is beneficial.
>> >
>> > The patch passed bootstrapping and regression tests on
>> > x86_64-unknown-linux-gnu system.
>> >
>> > Is it OK to commit?
>>
>> It probably should go to the 4.6 branch as well.  Note that I find the
>> X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
>> why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?
>
> I also wonder what we should do for -mtune=generic.  Should we split or not?
> How big improvement is it on Intel chips, how big degradation does it
> cause on AMD chips (I assume no other chip maker currently supports AVX)?
>

Simply turning off the 32-byte unaligned load split, which would introduce
performance regressions on Intel Sandy Bridge processors, isn't an
appropriate solution.  I am proposing a different approach so that we can
improve -mtune=generic performance on current Intel and AMD processors.

The current default GCC tuning, -mtune=generic, was implemented in 2005 for
Intel Pentium 4, Core 2 and AMD K8 processors.  Many of its optimization
choices are no longer applicable to current Intel or AMD processors.  We
should choose a set of optimization choices for -mtune=generic, including
the 32-byte unaligned load split, for the current Intel and AMD processors,
which should improve performance with no performance regressions.

-- 
H.J.
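
For readers following the thread, the per-CPU tuning idea discussed above can
be sketched roughly as a table that maps each tuning feature to a bitmask of
the processors that want it.  This is only an illustration, not GCC's actual
source: the feature name X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD is the one
suggested in the quoted review, but the processor enum, the CPU_MASK macro,
the table and tune_enabled() are all hypothetical, and the mask contents are
an assumption made for the example.

```c
#include <assert.h>

/* Hypothetical processor identifiers (not GCC's real enum). */
enum cpu { CPU_GENERIC, CPU_SANDY_BRIDGE, CPU_AMD_AVX, CPU_COUNT };

/* One entry per tuning feature; only the feature from this thread is shown. */
enum tune_feature { X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD, TUNE_COUNT };

/* Hypothetical helper: the bit that represents one processor. */
#define CPU_MASK(c) (1u << (c))

/* Each feature lists the processors for which it is enabled.
   Assumption for illustration: the split is enabled only for the
   Sandy Bridge entry.  Whether CPU_GENERIC belongs in this mask is
   exactly the open question raised in the thread. */
static const unsigned tune_masks[TUNE_COUNT] = {
    [X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD] = CPU_MASK(CPU_SANDY_BRIDGE),
};

/* Return 1 if feature f is enabled when tuning for processor c, else 0. */
int tune_enabled(enum tune_feature f, enum cpu c)
{
    assert((unsigned)f < TUNE_COUNT && (unsigned)c < CPU_COUNT);
    return (tune_masks[f] & CPU_MASK(c)) != 0;
}
```

With such a table, changing what -mtune=generic does is a one-line change to
the mask, for example adding CPU_MASK(CPU_GENERIC), rather than a change to
the code that emits the loads.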