On Tue, Jun 14, 2011 at 3:16 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Jun 14, 2011 at 12:13:47PM +0200, Richard Guenther wrote:
>> On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng <changpeng.f...@amd.com> wrote:
>> > The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt )
>> > introduced splitting of avx256 unaligned loads. However, we found that it
>> > causes significant regressions for cpu2006
>> > ( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
>> >
>> > In this patch, we introduce a tune option that enables splitting of
>> > unaligned loads by default only for CPUs on which such splitting is
>> > beneficial.
>> >
>> > The patch passed bootstrapping and regression tests on an
>> > x86_64-unknown-linux-gnu system.
>> >
>> > Is it OK to commit?
>>
>> It probably should go to the 4.6 branch as well.  Note that I find the
>> X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
>> why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?
>
> I also wonder what we should do for -mtune=generic.  Should we split or not?
> How big improvement is it on Intel chips, how big degradation does it
> cause on AMD chips (I assume no other chip maker currently supports AVX)?
>

Simply turning off the 32-byte unaligned load split, which would introduce
performance regressions on Intel Sandy Bridge processors, isn't an
appropriate solution.

I am proposing a different approach so that we can improve -mtune=generic
performance on current Intel and AMD processors.

The current default GCC tuning, -mtune=generic, was implemented in 2005 for
Intel Pentium 4, Core 2 and AMD K8 processors. Many of its optimization
choices no longer apply to current Intel or AMD processors.

We should choose a new set of optimization choices for -mtune=generic,
including the 32-byte unaligned load split, targeted at current Intel and
AMD processors. That should improve performance without introducing
performance regressions.
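For illustration, a minimal sketch in C of how a per-processor tuning bit of the kind discussed in this thread could gate the split. It assumes a table holding one bitmask per tuning feature, with one bit per processor. All names here (PROC_INTEL_SNB, PROC_AMD_BDVER1, tune_masks, tune_enabled) are hypothetical and are not GCC's actual identifiers.

```c
/* Illustrative sketch only: processor and feature names are hypothetical
   and do not reproduce GCC's real i386 tuning tables. */

/* Hypothetical processor identifiers. */
enum processor { PROC_GENERIC, PROC_INTEL_SNB, PROC_AMD_BDVER1, PROC_LAST };

/* One bit per processor, so a feature can be enabled for a set of CPUs. */
#define M_GENERIC     (1u << PROC_GENERIC)
#define M_INTEL_SNB   (1u << PROC_INTEL_SNB)
#define M_AMD_BDVER1  (1u << PROC_AMD_BDVER1)

/* Hypothetical tuning feature, named after the flag discussed above. */
enum tune_feature { TUNE_AVX256_SPLIT_UNALIGNED_LOAD, TUNE_LAST };

/* Assumed masks: split 32-byte unaligned loads only where the split is
   believed to help.  Whether M_GENERIC belongs in this mask is the open
   question for -mtune=generic. */
static const unsigned int tune_masks[TUNE_LAST] = {
  [TUNE_AVX256_SPLIT_UNALIGNED_LOAD] = M_INTEL_SNB,
};

/* Return nonzero if feature F is enabled when tuning for processor P. */
static int tune_enabled(enum tune_feature f, enum processor p)
{
  return (tune_masks[f] & (1u << p)) != 0;
}
```

With this mask, tuning for the Sandy Bridge entry enables the split while the AMD and generic entries do not. Changing the mask to M_INTEL_SNB | M_GENERIC would correspond to keeping the split in -mtune=generic.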


-- 
H.J.
