On Tue, Jul 24, 2018 at 3:47 PM Martin Jambor wrote:
>
> Hi,
>
> I'd like to propose again a new variant of a fix that I sent here in
> November (https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00881.html) that
> avoids store-to-load forwarding stalls in the ImageMagick benchmark by
> expanding copi
Hi,
I'd like to propose again a new variant of a fix that I sent here in
November (https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00881.html) that
avoids store-to-load forwarding stalls in the ImageMagick benchmark by
expanding copies of very small simple aggregates element-wise rather
than "by pie
On Fri, Nov 24, 2017 at 2:00 PM, Martin Jambor wrote:
> On Fri, Nov 24 2017, Richard Biener wrote:
>> On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor wrote:
>>> Hi Richi,
>>>
>>> On Fri, Nov 24 2017, Richard Biener wrote:
On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
wrote:
> On
On Fri, Nov 24 2017, Richard Biener wrote:
> On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor wrote:
>> Hi Richi,
>>
>> On Fri, Nov 24 2017, Richard Biener wrote:
>>> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
>>> wrote:
On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
>>
>> ..
>>
>
On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor wrote:
> Hi Richi,
>
> On Fri, Nov 24 2017, Richard Biener wrote:
>> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
>> wrote:
>>> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
>
> ..
>
>> And yes, I've been worried about SRA as well here...
Hi Richi,
On Fri, Nov 24 2017, Richard Biener wrote:
> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
> wrote:
>> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
..
> And yes, I've been worried about SRA as well here... it _does_
> have some early outs when seeing VIEW_CONVERT_EXPR
On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
wrote:
> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
> wrote:
>> On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor wrote:
>>> Hi,
>>>
>>> On Mon, Nov 13 2017, Richard Biener wrote:
The main concern here is that GIMPLE is not very well defin
On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
wrote:
> On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor wrote:
>> Hi,
>>
>> On Mon, Nov 13 2017, Richard Biener wrote:
>>> The main concern here is that GIMPLE is not very well defined for
>>> aggregate copies and that gimple-fold.c happily optimiz
On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor wrote:
> Hi,
>
> On Mon, Nov 13 2017, Richard Biener wrote:
>> The main concern here is that GIMPLE is not very well defined for
>> aggregate copies and that gimple-fold.c happily optimizes
>> memcpy (&a, &b, sizeof (a)) into a = b;
>>
>> struct A { s
On Thu, Nov 23, 2017 at 04:32:43PM +0100, Martin Jambor wrote:
> > struct A { short s; long i; long j; };
> > struct A a, b;
> > void foo ()
> > {
> > struct A c;
> > __builtin_memcpy (&c, &b, sizeof (struct A));
> > __builtin_memcpy (&a, &c, sizeof (struct A));
> > }
> > int main()
> > {
> >
Hi,
On Mon, Nov 13 2017, Richard Biener wrote:
> The main concern here is that GIMPLE is not very well defined for
> aggregate copies and that gimple-fold.c happily optimizes
> memcpy (&a, &b, sizeof (a)) into a = b;
>
> struct A { short s; long i; long j; };
> struct A a, b;
> void foo ()
> {
>
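A minimal sketch, not taken from the thread, of one reason the difference between memcpy and an aggregate or element-wise copy can matter for the struct A quoted above. It assumes a typical ABI in which padding bytes follow the short member; the helper names padding_after_s, copy_bytes, copy_fields and demo are hypothetical and exist only for illustration. memcpy must transfer every byte of the object representation, padding included, whereas a member-by-member copy is only obliged to transfer the named members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout quoted in the thread. */
struct A { short s; long i; long j; };

/* Hypothetical helper: padding bytes between s and i (ABI dependent). */
size_t padding_after_s(void)
{
    return offsetof(struct A, i) - sizeof(short);
}

/* Hypothetical helper: byte-wise copy, padding included. */
void copy_bytes(struct A *dst, const struct A *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Hypothetical helper: element-wise copy of the named members only.
   The padding bytes of *dst are not guaranteed to match *src. */
void copy_fields(struct A *dst, const struct A *src)
{
    dst->s = src->s;
    dst->i = src->i;
    dst->j = src->j;
}

/* Hypothetical check: compares what each copy strategy guarantees. */
void demo(void)
{
    struct A src, by_bytes, by_fields;

    /* Give every byte of src, padding included, a known pattern. */
    memset(&src, 0xAA, sizeof src);
    memset(&by_fields, 0, sizeof by_fields);

    copy_bytes(&by_bytes, &src);
    copy_fields(&by_fields, &src);

    /* The byte-wise copy reproduces the whole object representation. */
    assert(memcmp(&by_bytes, &src, sizeof src) == 0);

    /* The element-wise copy only promises equal member values. */
    assert(by_fields.s == src.s);
    assert(by_fields.i == src.i);
    assert(by_fields.j == src.j);
}
```

Calling demo() exercises both copies. On an ABI where padding_after_s() is non-zero, a whole-object memcmp after copy_fields is not guaranteed to succeed, which is why the sketch asserts only on the members in that case.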
Hi,
I thought I sent the following email last Friday but found it in my
drafts folder right now, so let me send it now so that anybody
interested can see what the patch does on Haswell.
I have only skimmed through the new messages in the thread. I am
looking into something else right now but wil
> The chance here is, of course (find the PR, it exists...), that SRA then
> decomposes the char[] copy bytewise...
>
> That said, memcpy folding is easy to fix. The question is of course
> what the semantic of VIEW_CONVERTs is (SRA _does_ contain
> bail-outs on those). Like if you have
>
> str
On November 13, 2017 3:20:16 PM GMT+01:00, Michael Matz wrote:
>Hi,
>
>On Mon, 13 Nov 2017, Richard Biener wrote:
>
>> The chance here is, of course (find the PR, it exists...), that SRA
>then
>> decomposes the char[] copy bytewise...
>>
>> That said, memcpy folding is easy to fix. The question
Hi,
On Mon, 13 Nov 2017, Richard Biener wrote:
> The chance here is, of course (find the PR, it exists...), that SRA then
> decomposes the char[] copy bytewise...
>
> That said, memcpy folding is easy to fix. The question is of course
> what the semantic of VIEW_CONVERTs is (SRA _does_ contain
On Mon, Nov 13, 2017 at 2:46 PM, Michael Matz wrote:
> Hi,
>
> On Mon, 13 Nov 2017, Richard Biener wrote:
>
>> The main concern here is that GIMPLE is not very well defined for
>> aggregate copies and that gimple-fold.c happily optimizes
>> memcpy (&a, &b, sizeof (a)) into a = b;
>
> What you miss
Hi,
On Mon, 13 Nov 2017, Richard Biener wrote:
> The main concern here is that GIMPLE is not very well defined for
> aggregate copies and that gimple-fold.c happily optimizes
> memcpy (&a, &b, sizeof (a)) into a = b;
What you failed to mention is that we then discussed rectifying this
sit
On Fri, Nov 3, 2017 at 5:38 PM, Martin Jambor wrote:
> Hi,
>
> On Thu, Oct 26, 2017 at 02:43:02PM +0200, Richard Biener wrote:
>> On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor wrote:
>> >
>> > Nevertheless, I still intend to experiment with the limit, I sent out
>> > this RFC exactly so that I d
Hi,
On Thu, Oct 26, 2017 at 02:43:02PM +0200, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor wrote:
> >
> > Nevertheless, I still intend to experiment with the limit, I sent out
> > this RFC exactly so that I don't spend a lot of time benchmarking
> > something that is eve
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka wrote:
> >> I think the limit should be on the number of generated copies and not
> >> the overall size of the structure... If the struct were composed of
> >> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
> >>
> >> I wonder
On Thu, Oct 26, 2017 at 4:38 PM, Richard Biener
wrote:
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka wrote:
>>> I think the limit should be on the number of generated copies and not
>>> the overall size of the structure... If the struct were composed of
>>> 32 individual chars we wouldn't want
On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka wrote:
>> I think the limit should be on the number of generated copies and not
>> the overall size of the structure... If the struct were composed of
>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>
>> I wonder how rep; mov
Hi,
On Thu, 26 Oct 2017, Martin Jambor wrote:
> > 35 bytes seems to be much - what is the code-size impact?
>
> I will find out and report on that. I need at least 32 bytes (four
> long ints) to fix imagemagick, where the problematic structure is:
Surely the final heuristic should look at the
> I think the limit should be on the number of generated copies and not
> the overall size of the structure... If the struct were composed of
> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>
> I wonder how rep; movb; interacts with store to load forwarding? Is
> that ma
On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor wrote:
> Hi,
>
> On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
>> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor wrote:
>> > Hi,
>> >
>> > I'd like to request comments to the patch below which aims to fix PR
>> > 80689, which is an
Hi,
On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor wrote:
> > Hi,
> >
> > I'd like to request comments to the patch below which aims to fix PR
> > 80689, which is an instance of a store-to-load forwarding stall on
> > x86_64 CPUs i
On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor wrote:
> Hi,
>
> I'd like to request comments to the patch below which aims to fix PR
> 80689, which is an instance of a store-to-load forwarding stall on
> x86_64 CPUs in the Image Magick benchmark, which is responsible for a
> slow down of up to 9%
Hi,
I'd like to request comments to the patch below which aims to fix PR
80689, which is an instance of a store-to-load forwarding stall on
x86_64 CPUs in the Image Magick benchmark, which is responsible for a
slow down of up to 9% compared to gcc 6, depending on options and HW
used. (Actually, I