On 04/28/2015 08:36 AM, Alan Lawrence wrote:
Ah, yes, I'd not realized this was connected to the jump-threading
issue, but I see that now. As you say, the best heuristics are unclear,
and I'm not keen on trying *too hard* to predict what later phases
will/won't do or do/don't want...maybe if the
Richard Biener wrote:
Well. In this case we hit
  /* If one of the loop header's edge is an exit edge then do not
     apply if-conversion.  */
  FOR_EACH_EDGE (e, ei, loop->header->succs)
    if (loop_exit_edge_p (loop, e))
      return false;
which is simply because even after if-conversion
Ajit Kumar Agarwal wrote:
-----Original Message-----
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Richard Biener
Sent: Tuesday, April 28, 2015 4:12 PM
To: Jeff Law
Cc: Alan Lawrence; gcc@gcc.gnu.org
Subject: Re: dom1 prevents vectorization via partial loop peeling?
On Mon, Apr 27, 2015 at 7:06 PM, Jeff Law wrote:
> On 04/27/2015 10:12 AM, Alan Lawrence wrote:
On 04/27/2015 10:12 AM, Alan Lawrence wrote:
After copyrename3, immediately prior to dom1, the loop body looks like:
<bb 2>:

<bb 3>:
# i_11 = PHI <...>
_5 = a[i_11];
_6 = i_11 & _5;
if (_6 != 0)
  goto <bb 4>;
else
  goto <bb 5>;

<bb 4>:

<bb 5>:
# m_2 = PHI <5(4), 4(3)>
_7 = m_2 * _5;
b[i_11] = _7;
PHI of a
whole vector at a time, and so I'm wondering if anyone can give me any pointers
here - am I barking up the right tree - and is it reasonable to persuade
existing vectorizer loop-peeling code (e.g. for alignment) to do this for us
too, or would anyone recommend a different avenue?
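For reference, a plausible C source shape for a dump like the one above (my
reconstruction; the array names, element types, and loop bound are assumptions,
only the two-armed select feeding m and the multiply are visible in the dump):

  extern int a[], b[];

  void
  foo (int n)
  {
    for (int i = 0; i < n; i++)
      {
        int m = (a[i] & i) ? 5 : 4;   /* becomes m_2 = PHI <5(4), 4(3)> */
        b[i] = m * a[i];              /* _7 = m_2 * _5; b[i_11] = _7 */
      }
  }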
> On Tue, Oct 28, 2014 at 4:55 PM, Evandro Menezes
> wrote:
> > While doing some benchmark flag mining on AArch64, I noticed that
> > -fpeel-loops was often a mined option. As a matter of fact, when using it
> > always, even without FDO, it seemed to raise most benchmarks and to leave
> > almost
in code-size.
> >>> It
> >>> seems to me that it might be safe enough to be implied perhaps at -O3.
> >>> Is
> >>> there any reason why this never came into being?
> >
> >
> > Loop peeling is done by default on AArch64 unless, IIRC,
>
at it might be safe enough to be implied perhaps at -O3.
>>> Is
>>> there any reason why this never came into being?
>
>
> Loop peeling is done by default on AArch64 unless, IIRC,
> -fvect-cost-model=cheap is specified which switches it off. There was a
> general thread
benchmarks and to leave
almost all of the rest flat, with a barely noticeable cost in code-size. It
seems to me that it might be safe enough to be implied perhaps at -O3. Is
there any reason why this never came into being?
Loop peeling is done by default on AArch64 unless, IIRC,
-fvect-cost-model=cheap is specified which switches it off.
On Tue, Oct 28, 2014 at 4:55 PM, Evandro Menezes wrote:
> While doing some benchmark flag mining on AArch64, I noticed that
> -fpeel-loops was often a mined option. As a matter of fact, when using it
> always, even without FDO, it seemed to raise most benchmarks and to leave
> almost all of the rest flat, with a barely noticeable cost in code-size.
While doing some benchmark flag mining on AArch64, I noticed that
-fpeel-loops was often a mined option. As a matter of fact, when using it
always, even without FDO, it seemed to raise most benchmarks and to leave
almost all of the rest flat, with a barely noticeable cost in code-size. It
seems to me that it might be safe enough to be implied perhaps at -O3.
On Sun, Nov 17, 2013 at 04:42:18PM +0100, Richard Biener wrote:
> "Ondřej Bílka" wrote:
> >On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote:
> >> "Ondřej Bílka" wrote:
> >> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
> >>
> >> IIRC what can still be seen is store-buffer related slowdowns when you
> >> have a big unaligned store load in your loop.
"Ondřej Bílka" wrote:
>On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote:
>> "Ondřej Bílka" wrote:
>> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
>>
>> IIRC what can still be seen is store-buffer related slowdowns when
>> you have a big unaligned store load in your loop.
On 11/16/2013 04:25 AM, Tim Prince wrote:
Many decisions on compiler defaults still are based on an unscientific
choice of benchmarks, with gcc evidently more responsive to input from
the community.
I'm also quite convinced that we are hampered by the fact that there is
no IPA on alignment in
On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote:
> "Ondřej Bílka" wrote:
> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
>
> IIRC what can still be seen is store-buffer related slowdowns when you have a
> big unaligned store load in your loop. Thus aligning st
"Ondřej Bílka" wrote:
>On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
>> Also keep in mind that usually costs go up significantly if
>> misalignment causes cache line splits (processor will fetch 2 lines).
>> There are non-linear costs of filling up the store queue in modern
>> out-of-order processors (x86).
On 11/15/2013 2:26 PM, Ondřej Bílka wrote:
On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
Also keep in mind that usually costs go up significantly if
misalignment causes cache line splits (processor will fetch 2 lines).
There are non-linear costs of filling up the store queue in modern
out-of-order processors (x86).
On Fri, Nov 15, 2013 at 11:26:06PM +0100, Ondřej Bílka wrote:
Minor correction, a mutt read replaced a set1.s file by one that I later
used for avx2 variant. A correct file is following
.file "set1.c"
.text
.p2align 4,,15
.globl set
.type set, @function
On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
> Also keep in mind that usually costs go up significantly if
> misalignment causes cache line splits (processor will fetch 2 lines).
> There are non-linear costs of filling up the store queue in modern
> out-of-order processors (x86)
h tradeoff.
>
> Additionally, it seems hard to accurately estimate the costs. As Hendrik
> pointed out, misaligned access will affect cache performance for some
> processors. But for our processor, it is OK. Maybe just to pass a high cost
> for misaligned access for such processors.
guarantee to generate
loop peeling.
Bingfeng
-----Original Message-----
From: Xinliang David Li [mailto:davi...@google.com]
Sent: 15 November 2013 17:30
To: Bingfeng Mei
Cc: Richard Biener; gcc@gcc.gnu.org
Subject: Re: Vectorization: Loop peeling with misaligned support.
The right longer
values);
David
On Fri, Nov 15, 2013 at 7:21 AM, Bingfeng Mei wrote:
> Hi, Richard,
> Speed difference is 154 cycles (with workaround) vs. 198 cycles. So loop
> peeling is also slower for our processors.
>
> By vectorization_cost, do you mean
> TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST hook?
Richard,
> Speed difference is 154 cycles (with workaround) vs. 198 cycles. So loop
> peeling is also slower for our processors.
>
> By vectorization_cost, do you mean
> TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST hook?
>
> In our case, it is easy to make a decision. But generally,
Hi, Richard,
Speed difference is 154 cycles (with workaround) vs. 198 cycles. So loop
peeling is also slower for our processors.
By vectorization_cost, do you mean TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
hook?
In our case, it is easy to make a decision. But generally, if peeling loop is
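For concreteness, here is how a port might answer through that cost hook (a
hedged sketch: the port name "myport" is made up, and the exact signature
should be checked against your GCC version; default_builtin_vectorization_cost
is the target-hook default in targhooks.c):

  /* Report unaligned vector accesses as cheap as aligned ones, so the
     vectorizer has no cost reason to peel for alignment.  */
  static int
  myport_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
                                     tree vectype, int misalign)
  {
    switch (type_of_cost)
      {
      case unaligned_load:
      case unaligned_store:
        return 1;   /* assumed: HW misaligned access costs no extra */
      default:
        return default_builtin_vectorization_cost (type_of_cost, vectype,
                                                   misalign);
      }
  }

  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
    myport_builtin_vectorization_cost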
On Fri, Nov 15, 2013 at 2:16 PM, Bingfeng Mei wrote:
> Hi,
> In loop vectorization, I found that the vectorizer insists on loop peeling
> even though our target supports misaligned memory access. This results in
> much bigger code size for a very simple loop.
Hi,
In loop vectorization, I found that the vectorizer insists on loop peeling even
though our target supports misaligned memory access. This results in much
bigger code size for a very simple loop. I defined
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
and also TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
these patterns. Somehow this
hook doesn't seem to be used. vect_enhance_data_refs_alignment
is called regardless of whether the target has HW misaligned support
or not.
Shouldn't using HW misaligned memory access be better than
generating extra code for loop peeling/versioning? Or at least
i
supportable_dr_alignment to decide whether a specific misaligned
access is supported.
>
> Shouldn't using HW misaligned memory access be better than
> generating extra code for loop peeling/versioning? Or at least
> if for some architectures it is not the case, we should have
> a comp
seem to be used. vect_enhance_data_refs_alignment
is called regardless of whether the target has HW misaligned support
or not.
Shouldn't using HW misaligned memory access be better than
generating extra code for loop peeling/versioning? Or at least
if for some architectures it is not th
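In the same hedged spirit, the misalignment hook being discussed might be
wired up in a port roughly like this ("myport" is hypothetical;
default_builtin_support_vector_misalignment is the real target-hook default,
but verify the signature against your GCC version):

  /* Claim HW support for misaligned vector accesses up to 128 bits, so
     vect_enhance_data_refs_alignment can leave accesses unaligned
     instead of peeling/versioning.  */
  static bool
  myport_support_vector_misalignment (enum machine_mode mode,
                                      const_tree type, int misalignment,
                                      bool is_packed)
  {
    if (!is_packed && GET_MODE_SIZE (mode) <= 16)
      return true;   /* assumed: any misalignment handled in HW */
    return default_builtin_support_vector_misalignment (mode, type,
                                                        misalignment,
                                                        is_packed);
  }

  #define TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT \
    myport_support_vector_misalignment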
In unroll_loop_runtime_iterations() we emit a sequence of n_peel
compare/jump instructions. Why don't we honor
TARGET_CASE_VALUES_THRESHOLD here, and use a tablejump when n_peel is
too big?
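To make the suggestion concrete, here is the shape of the two alternatives in
hand-written C (my illustration, not RTL; body() and the factor-4 unroll are
placeholders):

  extern void body (int i);

  /* Duff-style entry: the switch can compile to a single tablejump,
     versus the chain of n_peel compare/jumps that
     unroll_loop_runtime_iterations currently emits.  */
  void
  unrolled (int niter)
  {
    int i = 0;
    switch (niter & 3)          /* n_peel = 3 peeled iterations */
      {
      case 3: body (i++);       /* fall through */
      case 2: body (i++);       /* fall through */
      case 1: body (i++);       /* fall through */
      default: break;
      }
    for (; i < niter; i += 4)
      {
        body (i); body (i + 1); body (i + 2); body (i + 3);
      }
  }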
2009/8/15 Sebastian Pop :
> You should put a TODO_update_ssa in the flags of prefetching pass.
> With the attached patch I don't see an error.
Thanks. I finally figured out that it's because, after copying the
loop, I should have created a preheader to ensure that new_loop's
preheader has onl
Hi,
> Seems that use info is not updated.
>
You should put a TODO_update_ssa in the flags of prefetching pass.
With the attached patch I don't see an error.
Also, why don't you use trunk for your developments?
Sebastian
diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index 1d2e69a..1320b5a 100644
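For context, the fix Sebastian suggests lands in the pass descriptor; a hedged
sketch in the GCC 4.4-era shape (field values here are from memory and
illustrative, not the verbatim pass_loop_prefetch definition):

  struct gimple_opt_pass pass_loop_prefetch =
  {
   {
    GIMPLE_PASS,
    "aprefetch",                        /* name */
    gate_tree_ssa_loop_prefetch,        /* gate */
    tree_ssa_loop_prefetch,             /* execute */
    NULL,                               /* sub */
    NULL,                               /* next */
    0,                                  /* static_pass_number */
    TV_TREE_PREFETCH,                   /* tv_id */
    PROP_cfg | PROP_ssa,                /* properties_required */
    0,                                  /* properties_provided */
    0,                                  /* properties_destroyed */
    0,                                  /* todo_flags_start */
    TODO_update_ssa                     /* todo_flags_finish: the fix */
   }
  };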
2009/8/13 Sebastian Pop :
> Could you please send the patch you are working on, together with
> a reduced testcase? This could help to reproduce the error.
Thanks.
I put the patch and a test below. The patch is based on 4.4.0. It's
just a toy; I don't have a nice design for now.
Actually, first_n
On Thu, Aug 13, 2009 at 4:41 PM, Sebastian Pop wrote:
> Hi,
>
> On Thu, Aug 13, 2009 at 04:02, Eric Fisher wrote:
>> The error is reported in build2_stat, by
>>
>> gcc_assert (POINTER_TYPE_P (tt) && POINTER_TYPE_P (TREE_TYPE (arg0))
>> && INTEGRAL_TYPE_P (TREE_TYPE (arg1))
>> && useless_type_conversion_p (sizetype, TREE_TYPE (arg1)));
Hi,
On Thu, Aug 13, 2009 at 04:02, Eric Fisher wrote:
> The error is reported in build2_stat, by
>
> gcc_assert (POINTER_TYPE_P (tt) && POINTER_TYPE_P (TREE_TYPE (arg0))
> && INTEGRAL_TYPE_P (TREE_TYPE (arg1))
> && useless_type_conversion_p (sizetype, TREE_TYPE (arg1)));
Hi,
I'm implementing a loop peeling function used in
tree-ssa-loop-prefetch.c according to the following comment,
  /* Step 5: unroll the loop.  TODO -- peeling of first and last few
     iterations so that we do not issue superfluous prefetches.  */
I take the functions slpeel_* in
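What that TODO asks for, rendered as hand-written C (my sketch of the intended
transformation, not what the pass emits; PF_DIST is a made-up prefetch
distance):

  #define PF_DIST 64   /* hypothetical prefetch distance, in elements */

  void
  scale (double *a, int n)
  {
    int i = 0;
    /* Main loop: prefetching is safe because i + PF_DIST stays in bounds. */
    for (; i < n - PF_DIST; i++)
      {
        __builtin_prefetch (&a[i + PF_DIST]);
        a[i] *= 2.0;
      }
    /* Peeled last iterations: prefetches here would run past the array --
       exactly the "superfluous prefetches" the TODO wants to avoid.  */
    for (; i < n; i++)
      a[i] *= 2.0;
  }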