On Thu, May 22, 2014 at 10:38 AM, H.J. Lu wrote:
> On Thu, May 22, 2014 at 2:01 AM, Kirill Yukhin
> wrote:
>> Hello,
>> On 20 May 08:24, H.J. Lu wrote:
>>> ABI alignment should be sufficient for correctness. Bigger alignments
>>> are supposed to give better performance. Can you try this patch on
>>> HSW and SLM to see if it has any impact on performance?
On Thu, May 22, 2014 at 2:01 AM, Kirill Yukhin wrote:
> Hello,
> On 20 May 08:24, H.J. Lu wrote:
>> ABI alignment should be sufficient for correctness. Bigger alignments
>> are supposed to give better performance. Can you try this patch on
>> HSW and SLM to see if it has any impact on performance?
Hello,
On 20 May 08:24, H.J. Lu wrote:
> ABI alignment should be sufficient for correctness. Bigger alignments
> are supposed to give better performance. Can you try this patch on
> HSW and SLM to see if it has any impact on performance?
Here is the performance data for your patch.
Only HSW so far.
HSW, 64
On Tue, May 20, 2014 at 5:00 AM, Kirill Yukhin wrote:
> Hello,
> On 19 May 09:58, H.J. Lu wrote:
>> On Mon, May 19, 2014 at 9:45 AM, Uros Bizjak wrote:
>> > On Mon, May 19, 2014 at 6:42 PM, H.J. Lu wrote:
>> >
>> Uros,
>> I am looking into libreoffice size and the data alignment seems to make a
>> huge difference.
Hello,
On 19 May 09:58, H.J. Lu wrote:
> On Mon, May 19, 2014 at 9:45 AM, Uros Bizjak wrote:
> > On Mon, May 19, 2014 at 6:42 PM, H.J. Lu wrote:
> >
> Uros,
> I am looking into libreoffice size and the data alignment seems to make a
> huge difference. The data section has grown from 5.8MB to 6.3MB between GCC 4.8
> and 4.9, while clang produces 5.2MB.
On Mon, May 19, 2014 at 9:45 AM, Uros Bizjak wrote:
> On Mon, May 19, 2014 at 6:42 PM, H.J. Lu wrote:
>
Uros,
I am looking into libreoffice size and the data alignment seems to make a
huge difference. The data section has grown from 5.8MB to 6.3MB between GCC 4.8
and 4.9, while clang produces 5.2MB.
On Mon, May 19, 2014 at 6:42 PM, H.J. Lu wrote:
>>> Uros,
>>> I am looking into libreoffice size and the data alignment seems to make a huge
>>> difference. The data section has grown from 5.8MB to 6.3MB between GCC 4.8
>>> and 4.9,
>>> while clang produces 5.2MB.
>>>
>>> The two patches I posted t
On Mon, May 19, 2014 at 9:14 AM, Uros Bizjak wrote:
> On Mon, May 19, 2014 at 6:48 AM, Jan Hubicka wrote:
>>> > Thanks for the pointer, there is indeed the recommendation in
>>> > optimization manual [1], section 3.6.4, where it is said:
>>> >
>>> > --quote--
>>> > Misaligned data access can incur significant performance penalties.
On Mon, May 19, 2014 at 6:48 AM, Jan Hubicka wrote:
>> > Thanks for the pointer, there is indeed the recommendation in
>> > optimization manual [1], section 3.6.4, where it is said:
>> >
>> > --quote--
>> > Misaligned data access can incur significant performance penalties.
>> > This is particularly true for cache line splits.
> > Thanks for the pointer, there is indeed the recommendation in
> > optimization manual [1], section 3.6.4, where it is said:
> >
> > --quote--
> > Misaligned data access can incur significant performance penalties.
> > This is particularly true for cache line
> > splits. The size of a cache line
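As a minimal sketch of the cache-line split being described (an illustration added for context, not code from the manual or the patches; the buffer name and offsets are assumptions):

#include <stdint.h>
#include <string.h>

/* 128-byte buffer that starts exactly on a 64-byte cache line.  */
static char buf[128] __attribute__ ((aligned (64)));

uint64_t
split_load (void)
{
  uint64_t v;
  /* An 8-byte read starting at offset 60 covers bytes 60..67, so it straddles
     the cache-line boundary at byte 64 -- the "cache line split" the manual
     warns about.  Keeping data naturally aligned avoids such accesses.  */
  memcpy (&v, buf + 60, sizeof v);
  return v;
}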
On Fri, Jan 17, 2014 at 3:15 PM, Jakub Jelinek wrote:
> On Tue, Jan 14, 2014 at 08:12:41PM +0100, Jakub Jelinek wrote:
>> For 4.9, if what you've added is what you want to do for performance
>> reasons, then I'd do something like:
>
> Ok, here it is in the form of a patch, bootstrapped/regtested on x86_64-linux
> and i686-linux, ok for trunk?
On Tue, Jan 14, 2014 at 08:12:41PM +0100, Jakub Jelinek wrote:
> For 4.9, if what you've added is what you want to do for performance
> reasons, then I'd do something like:
Ok, here it is in the form of a patch, bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?
2014-01-17  Jakub Jelinek
On Tue, Jan 14, 2014 at 07:37:33PM +0100, Uros Bizjak wrote:
> OK, let's play safe. I'll revert these two changes (modulo size of
> nocona prefetch block).
Thanks.
> > opt we never return a smaller number from ix86_data_alignment than
> > we did in 4.8 and earlier, because otherwise if you have 4
On Tue, Jan 14, 2014 at 10:37 AM, Uros Bizjak wrote:
> On Tue, Jan 14, 2014 at 6:09 PM, Jakub Jelinek wrote:
>
>>> On a second thought, the crossing of 16-byte boundaries is mentioned
>>> for the data *access* (the instruction itself) if it is not naturally
>>> aligned (please see example 3-40 and fig 3-2), which is *NOT* in our
>>> case.
On Tue, Jan 14, 2014 at 6:09 PM, Jakub Jelinek wrote:
>> On a second thought, the crossing of 16-byte boundaries is mentioned
>> for the data *access* (the instruction itself) if it is not naturally
>> aligned (please see example 3-40 and fig 3-2), which is *NOT* in our
>> case.
>>
>> So, we don'
On Fri, Jan 03, 2014 at 05:04:39PM +0100, Uros Bizjak wrote:
> On a second thought, the crossing of 16-byte boundaries is mentioned
> for the data *access* (the instruction itself) if it is not naturally
> aligned (please see example 3-40 and fig 3-2), which is *NOT* in our
> case.
>
> So, we don'
On Fri, Jan 3, 2014 at 3:02 PM, Uros Bizjak wrote:
>>> Like in the patch below. Please note that the block_tune setting for
>>> the nocona is wrong, -march=native on my trusted old P4 returns:
>>>
>>> --param "l1-cache-size=16" --param "l1-cache-line-size=64" --param
>>> "l2-cache-size=2048" "-m
On Fri, Jan 03, 2014 at 03:35:11PM +0100, Uros Bizjak wrote:
> On Fri, Jan 3, 2014 at 3:13 PM, Jakub Jelinek wrote:
> > On Fri, Jan 03, 2014 at 03:02:51PM +0100, Uros Bizjak wrote:
> >> Please note that the previous value was based on an earlier (pre-P4)
> >> recommendation and was appropriate for older chips with a 32-byte cache line.
On Fri, Jan 3, 2014 at 3:13 PM, Jakub Jelinek wrote:
> On Fri, Jan 03, 2014 at 03:02:51PM +0100, Uros Bizjak wrote:
>> Please note that the previous value was based on an earlier (pre-P4)
>> recommendation and was appropriate for older chips with a 32-byte cache
>> line. The value should have been updated long ago, when 64-byte cache lines
>> were introduced,
On Fri, Jan 03, 2014 at 03:02:51PM +0100, Uros Bizjak wrote:
> Please note that the previous value was based on an earlier (pre-P4)
> recommendation and was appropriate for older chips with a 32-byte cache
> line. The value should have been updated long ago, when 64-byte cache lines
> were introduced, but was prob
On Fri, Jan 3, 2014 at 2:43 PM, Jakub Jelinek wrote:
> On Fri, Jan 03, 2014 at 02:35:36PM +0100, Uros Bizjak wrote:
>> Like in the patch below. Please note that the block_tune setting for
>> the nocona is wrong, -march=native on my trusted old P4 returns:
>>
>> --param "l1-cache-size=16" --param
On Fri, Jan 03, 2014 at 02:35:36PM +0100, Uros Bizjak wrote:
> Like in the patch below. Please note that the block_tune setting for
> the nocona is wrong, -march=native on my trusted old P4 returns:
>
> --param "l1-cache-size=16" --param "l1-cache-line-size=64" --param
> "l2-cache-size=2048" "-mt
On Fri, Jan 3, 2014 at 1:27 PM, Uros Bizjak wrote:
>>> I am testing a patch that removes the "max_align" part from ix86_data_alignment.
>>
>> That looks like unnecessary pessimization. Note the hunk in question is
>> guarded with opt, which means it is an optimization rather than an ABI issue;
>> it ca
On Fri, Jan 3, 2014 at 12:59 PM, Jakub Jelinek wrote:
> On Fri, Jan 03, 2014 at 12:25:00PM +0100, Uros Bizjak wrote:
>> I am testing a patch that removes the "max_align" part from ix86_data_alignment.
>
> That looks like unnecessary pessimization. Note the hunk in question is
> guarded with opt, which means it is an optimization rather than an ABI issue;
On Fri, Jan 03, 2014 at 12:25:00PM +0100, Uros Bizjak wrote:
> I am testing a patch that removes the "max_align" part from ix86_data_alignment.
That looks like unnecessary pessimization. Note the hunk in question is
guarded with opt, which means it is an optimization rather than an ABI issue;
it can inc
On Fri, Jan 3, 2014 at 12:20 PM, Eric Botcazou wrote:
>> When compiled with -m32 -mavx, we get:
>>
>> .align 32
>> .type a, @object
>> .size a, 32
>> a:
>>
>> so, the alignment was already raised elsewhere. We get .align 16 for
>> -msse -m32 when vectorizing.
>>
>> without -msse (and consequently without vectorizing), we get for -m
> When compiled with -m32 -mavx, we get:
>
> .align 32
> .type a, @object
> .size a, 32
> a:
>
> so, the alignment was already raised elsewhere. We get .align 16 for
> -msse -m32 when vectorizing.
>
> without -msse (and consequently without vectorizing), we get for -m
On Thu, Jan 2, 2014 at 11:18 PM, Eric Botcazou wrote:
>> Note that it has unexpected side-effects: previously, in 32-bit mode,
>> 256-bit aggregate objects would have been given 256-bit alignment; now,
>> they will fall back to default alignment, for example 32-bit only.
>
> In case this wasn't clear enough, just compile in 32-bit mode:
> Note that it has unexpected side-effects: previously, in 32-bit mode,
> 256-bit aggregate objects would have been given 256-bit alignment; now,
> they will fall back to default alignment, for example 32-bit only.
In case this wasn't clear enough, just compile in 32-bit mode:
int a[8] = { 1, 2,
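A self-contained version of Eric's test case (only "int a[8] = { 1, 2," survives in the archive; the remaining initializers below are a guess, though ".size a, 32" in the quoted assembly confirms a 32-byte, 8-element object):

/* test.c */
int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

Per the quotes above, compiling this with -m32 -mavx still emits ".align 32" for 'a', and -msse -m32 gives ".align 16" when vectorizing; Eric's point is that without such options the object can fall back to the default alignment (for example 32-bit) after the change.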
> > The x86-64 ABI has a clause about aligning static vars to a 128-bit boundary
> > at a given size. This was introduced to aid the compiler in generating aligned
> > vector stores/loads even if the object may bind to another object file.
> > This is set in stone and cannot be changed for AVX/SSE.
>
> Yes, but th
> The x86-64 ABI has a clause about aligning static vars to a 128-bit boundary
> at a given size. This was introduced to aid the compiler in generating aligned
> vector stores/loads even if the object may bind to another object file.
> This is set in stone and cannot be changed for AVX/SSE.
Yes, but that's irrelevant
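To illustrate the ABI clause quoted above (a sketch added for context, not code from the thread; the array name and options are assumptions): because a sufficiently large static array is guaranteed at least 16-byte alignment by the psABI, the compiler may use aligned vector loads on it even when the definition binds to another object file.

/* other.c (elsewhere) defines:  double v[4] = { ... };  -- 32 bytes, so the
   x86-64 psABI guarantees at least 16-byte alignment for the definition.  */
extern double v[4];

double
sum4 (void)
{
  double s = 0.0;
  /* With e.g. -O3 -ffast-math the loop may be vectorized using aligned
     16-byte loads (movapd) from v, relying only on the ABI guarantee.  */
  for (int i = 0; i < 4; i++)
    s += v[i];
  return s;
}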
> > Frankly speaking, I do not understand what's wrong here.
> > IMHO, this change is pretty mechanical: we just extend the maximal alignment
> > available. Because of 512-bit data types we now extend the maximal alignment
> > to 512 bits.
>
> Nothing wrong per se, but...
>
> > I suspect that an issue is
> Frankly speaking, I do not understand what's wrong here.
> IMHO, this change is pretty mechanical: we just extend the maximal alignment
> available. Because of 512-bit data types we now extend the maximal alignment
> to 512 bits.
Nothing wrong per se, but...
> I suspect that an issue is here:
> if (op
Hello Eric,
On 02 Jan 00:07, Eric Botcazou wrote:
> The change is actually to ix86_data_alignment, not to ix86_constant_alignment:
>
> @@ -26219,7 +26433,8 @@ ix86_constant_alignment (tree exp, int align)
> int
> ix86_data_alignment (tree type, int align, bool opt)
> {
> - int max_align = opti
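For context, a simplified sketch of the policy under discussion (not the actual i386.c code; the function name, signature, and constants below only mirror values mentioned in the thread): ix86_data_alignment may raise the alignment of large static objects as an optimization, with the cap moving from 256 to 512 bits for AVX-512, while the ABI-required 128-bit minimum for sufficiently large objects must be preserved independently of that optimization.

/* Simplified illustration only -- not the GCC implementation.  */
int
data_alignment_sketch (unsigned long long size_in_bytes, int align, int opt)
{
  int max_align = 512;  /* optimization cap; was 256 before the AVX-512 work */

  /* Optimization-only part (the "opt" guard discussed above): large objects
     are bumped to the cap for better vector performance.  */
  if (opt
      && size_in_bytes * 8 >= (unsigned long long) max_align
      && align < max_align)
    align = max_align;

  /* ABI part: static objects of at least 16 bytes must get at least 128-bit
     alignment; this has to hold even when opt is false.  */
  if (size_in_bytes >= 16 && align < 128)
    align = 128;

  return align;
}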
> gcc/
> 2013-12-30 Alexander Ivchenko
> Maxim Kuznetsov
> Sergey Lega
> Anna Tikhonova
> Ilya Tocar
> Andrey Turetskiy
> Ilya Verbin
> Kirill Yukhin
> Michael Zolotukhin
>
> * config/i386/i386
Hello Uroš, Jakub,
On 22 Dec 11:47, Uros Bizjak wrote:
> The x86 part is OK for mainline. You will also need approval from the
> middle-end reviewer for tree-* parts.
Thanks, I'm testing (in the agreed volume; bootstrap passed so far) the patch
at the bottom.
If there is no further input, I'll check it in to main
On Sun, Dec 22, 2013 at 11:47:52AM +0100, Uros Bizjak wrote:
> * tree-vect-stmts.c (vectorizable_load): Support AVX512's gathers.
> * tree-vectorizer.h (MAX_VECTORIZATION_FACTOR): Extend for 512
> bit vectors.
>
> I assumed the same testing procedure as described in the original su
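For readers unfamiliar with the gather support referenced in that ChangeLog line, here is a small loop of the kind vectorizable_load can turn into gather loads when the indices are not consecutive (an added illustration, not a test case from the patch; names and options are assumptions):

void
gather_example (float *restrict out, const float *restrict a,
                const int *restrict idx, int n)
{
  /* With -O3 -mavx512f the indexed load a[idx[i]] may be implemented with a
     gather instruction (e.g. vgatherdps), processing 16 floats per vector
     iteration.  */
  for (int i = 0; i < n; i++)
    out[i] = a[idx[i]] * 2.0f;
}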
On Wed, Dec 18, 2013 at 2:07 PM, Kirill Yukhin wrote:
> Hello,
>
> On 02 Dec 16:13, Kirill Yukhin wrote:
>> Hello,
>> On 19 Nov 12:14, Kirill Yukhin wrote:
>> > Hello,
>> > On 15 Nov 20:10, Kirill Yukhin wrote:
>> > > > Is it ok to commit to main trunk?
>> > > Ping.
>> > Ping.
>> Ping.
> Ping.
>
>
Hello,
On 02 Dec 16:13, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:14, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:10, Kirill Yukhin wrote:
> > > > Is it ok to commit to main trunk?
> > > Ping.
> > Ping.
> Ping.
Ping.
Updated patch at the bottom.
--
Thanks, K
---
gcc/config/i386/i386.c
Hello,
On 19 Nov 12:14, Kirill Yukhin wrote:
> Hello,
> On 15 Nov 20:10, Kirill Yukhin wrote:
> > > Is it ok to commit to main trunk?
> > Ping.
> Ping.
Ping.
--
Thanks, K
Hello,
On 15 Nov 20:10, Kirill Yukhin wrote:
> > Is it ok to commit to main trunk?
> Ping.
Ping.
--
Thanks, K
Hello,
On 12 Nov 15:36, Kirill Yukhin wrote:
> Hello,
> The patch at the bottom extends some hooks toward AVX-512 support.
> This patch decreases icount for the SPEC2006 FP suite (ref set):
>
> Optset was: -static -m64 -fstrict-aliasing -fno-prefetch-loop-arrays
> -Ofast -funroll-loops -flto -march=core-avx2 -mtune=core-avx2
Hello,
The patch at the bottom extends some hooks toward AVX-512 support.
This patch decreases icount for the SPEC2006 FP suite (ref set):
Optset was: -static -m64 -fstrict-aliasing -fno-prefetch-loop-arrays
-Ofast -funroll-loops -flto -march=core-avx2 -mtune=core-avx2
Lower is better.
Test\ArchI