Re: dom1 prevents vectorization via partial loop peeling?

2015-04-29 Thread Jeff Law
On 04/28/2015 08:36 AM, Alan Lawrence wrote: Ah, yes, I'd not realized this was connected to the jump-threading issue, but I see that now. As you say, the best heuristics are unclear, and I'm not keen on trying *too hard* to predict what later phases will/won't do or do/don't want...maybe if the

Re: dom1 prevents vectorization via partial loop peeling?

2015-04-29 Thread Alan Lawrence
Richard Biener wrote: Well. In this case we hit /* If one of the loop header's edge is an exit edge then do not apply if-conversion. */ FOR_EACH_EDGE (e, ei, loop->header->succs) if (loop_exit_edge_p (loop, e)) return false; which is simply because even after if-conversion

Re: dom1 prevents vectorization via partial loop peeling?

2015-04-28 Thread Alan Lawrence
Ajit Kumar Agarwal wrote: -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Richard Biener Sent: Tuesday, April 28, 2015 4:12 PM To: Jeff Law Cc: Alan Lawrence; gcc@gcc.gnu.org Subject: Re: dom1 prevents vectorization via partial loop peeling

RE: dom1 prevents vectorization via partial loop peeling?

2015-04-28 Thread Ajit Kumar Agarwal
-Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Richard Biener Sent: Tuesday, April 28, 2015 4:12 PM To: Jeff Law Cc: Alan Lawrence; gcc@gcc.gnu.org Subject: Re: dom1 prevents vectorization via partial loop peeling? On Mon, Apr 27, 2015 at 7

Re: dom1 prevents vectorization via partial loop peeling?

2015-04-28 Thread Richard Biener
On Mon, Apr 27, 2015 at 7:06 PM, Jeff Law wrote: > On 04/27/2015 10:12 AM, Alan Lawrence wrote: >> >> >> After copyrename3, immediately prior to dom1, the loop body looks like: >> >>: >> >>: >># i_11 = PHI >>_5 = a[i_11]; >>_6 = i_11 & _5; >>if (_6 != 0) >> goto ; >>

Re: dom1 prevents vectorization via partial loop peeling?

2015-04-27 Thread Jeff Law
On 04/27/2015 10:12 AM, Alan Lawrence wrote: After copyrename3, immediately prior to dom1, the loop body looks like: : : # i_11 = PHI _5 = a[i_11]; _6 = i_11 & _5; if (_6 != 0) goto ; else goto ; : : # m_2 = PHI <5(4), 4(3)> _7 = m_2 * _5; b[i_1

dom1 prevents vectorization via partial loop peeling?

2015-04-27 Thread Alan Lawrence
PHI of a whole vector at a time, and so I'm wondering if anyone can give me any pointers here - am I barking up the right tree - and is it reasonable to persuade existing vectorizer loop-peeling code (e.g. for alignment) to do this for us too, or would anyone recommend a different avenue? A
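For readers without the full thread, the GIMPLE quoted in the replies above (_6 = i_11 & _5, a PHI choosing between 5 and 4, then _7 = m_2 * _5) is consistent with a source loop roughly like the one below. This is only a sketch: the original test case is not shown in these excerpts, so the array size N, the function names, and which arm of the condition yields 5 versus 4 are all assumptions. The second function illustrates why a pass could resolve the test on the first iteration: when i is 0, (i & a[i]) is always 0.

```c
#define N 32              /* hypothetical array size */

int a[N], b[N];

/* Assumed shape of the loop behind the quoted GIMPLE dump. */
void loop_original(void)
{
    for (int i = 0; i < N; i++) {
        int m = (i & a[i]) ? 5 : 4;   /* which arm gives 5 is a guess */
        b[i] = m * a[i];
    }
}

/* Same computation with the first iteration's test resolved by hand:
   for i == 0 the condition (0 & a[0]) is always false, so m is 4.
   This peels the whole first iteration, which is a simplification of
   the partial peeling discussed in the thread. */
void loop_first_test_resolved(void)
{
    b[0] = 4 * a[0];
    for (int i = 1; i < N; i++) {
        int m = (i & a[i]) ? 5 : 4;
        b[i] = m * a[i];
    }
}
```

Both functions compute the same values in b; the second form simply has a loop that no longer starts at its natural first iteration.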

Re: Loop peeling

2014-10-29 Thread Jan Hubicka
> On Tue, Oct 28, 2014 at 4:55 PM, Evandro Menezes > wrote: > > While doing some benchmark flag mining on AArch64, I noticed that > > -fpeel-loops was a mined option often. As a matter of fact, when using it > > always, even without FDO, it seemed to raise most benchmarks and to leave > > almost

Re: Loop peeling

2014-10-29 Thread Jan Hubicka
in code-size. > >>> It > >>> seems to me that it might be safe enough to be implied perhaps at -O3. > >>> Is > >>> there any reason why this never came into being? > > > > > > Loop peeling is done by default on AArch64 unless, IIRC, >

Re: Loop peeling

2014-10-29 Thread Richard Biener
at it might be safe enough to be implied perhaps at -O3. >>> Is >>> there any reason why this never came into being? > > > Loop peeling is done by default on AArch64 unless, IIRC, > -fvect-cost-model=cheap is specified which switches it off. There was a > general thre

Re: Loop peeling

2014-10-29 Thread Tejas Belagod
benchmarks and to leave almost all of the rest flat, with a barely noticeable cost in code-size. It seems to me that it might be safe enough to be implied perhaps at -O3. Is there any reason why this never came into being? Loop peeling is done by default on AArch64 unless, IIRC, -fvect-cost-model

Re: Loop peeling

2014-10-29 Thread Richard Biener
On Tue, Oct 28, 2014 at 4:55 PM, Evandro Menezes wrote: > While doing some benchmark flag mining on AArch64, I noticed that > -fpeel-loops was a mined option often. As a matter of fact, when using it > always, even without FDO, it seemed to raise most benchmarks and to leave > almost all of the r

Loop peeling

2014-10-28 Thread Evandro Menezes
While doing some benchmark flag mining on AArch64, I noticed that -fpeel-loops was a mined option often. As a matter of fact, when using it always, even without FDO, it seemed to raise most benchmarks and to leave almost all of the rest flat, with a barely noticeable cost in code-size. It seems t

Re: Vectorization: Loop peeling with misaligned support.

2013-11-17 Thread Ondřej Bílka
On Sun, Nov 17, 2013 at 04:42:18PM +0100, Richard Biener wrote: > "Ondřej Bílka" wrote: > >On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote: > >> "Ondřej Bílka" wrote: > >> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote: > >> > >> IIRC what can still be seen is st

Re: Vectorization: Loop peeling with misaligned support.

2013-11-17 Thread Richard Biener
"Ondřej Bílka" wrote: >On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote: >> "Ondřej Bílka" wrote: >> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote: >> >> IIRC what can still be seen is store-buffer related slowdowns when >you have a big unaligned store load in yo

Re: Vectorization: Loop peeling with misaligned support.

2013-11-17 Thread Toon Moene
On 11/16/2013 04:25 AM, Tim Prince wrote: Many decisions on compiler defaults still are based on an unscientific choice of benchmarks, with gcc evidently more responsive to input from the community. I'm also quite convinced that we are hampered by the fact that there is no IPA on alignment in

Re: Vectorization: Loop peeling with misaligned support.

2013-11-16 Thread Ondřej Bílka
On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote: > "Ondřej Bílka" wrote: > >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote: > > IIRC what can still be seen is store-buffer related slowdowns when you have a > big unaligned store load in your loop. Thus aligning st

Re: Vectorization: Loop peeling with misaligned support.

2013-11-16 Thread Richard Biener
"Ondřej Bílka" wrote: >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote: >> Also keep in mind that usually costs go up significantly if >> misalignment causes cache line splits (processor will fetch 2 lines). >> There are non-linear costs of filling up the store queue in modern >> o

Re: Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Tim Prince
On 11/15/2013 2:26 PM, Ondřej Bílka wrote: On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote: Also keep in mind that usually costs go up significantly if misalignment causes cache line splits (processor will fetch 2 lines). There are non-linear costs of filling up the store queue i

Re: Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Ondřej Bílka
On Fri, Nov 15, 2013 at 11:26:06PM +0100, Ondřej Bílka wrote: Minor correction: a mutt read replaced the set1.s file with one that I later used for the avx2 variant. The correct file follows: .file "set1.c" .text .p2align 4,,15 .globl set .type set, @function

Re: Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Ondřej Bílka
On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote: > Also keep in mind that usually costs go up significantly if > misalignment causes cache line splits (processor will fetch 2 lines). > There are non-linear costs of filling up the store queue in modern > out-of-order processors (x86)

Re: Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Xinliang David Li
h tradeoff. > > Additionally, it seems hard to accurately estimate the costs. As Hendrik > pointed out, misaligned access will affect cache performance for some > processors. But for our processor, it is OK. Maybe just to pass a high cost > for misaligned access for such proces

RE: Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Bingfeng Mei
guarantee to generate loop peeling. Bingfeng -Original Message- From: Xinliang David Li [mailto:davi...@google.com] Sent: 15 November 2013 17:30 To: Bingfeng Mei Cc: Richard Biener; gcc@gcc.gnu.org Subject: Re: Vectorization: Loop peeling with misaligned support. The right longer

Re: Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Xinliang David Li
values); David On Fri, Nov 15, 2013 at 7:21 AM, Bingfeng Mei wrote: > Hi, Richard, > Speed difference is 154 cycles (with workaround) vs. 198 cycles. So loop > peeling is also slower for our processors. > > By vectorization_cost, do you mean > TARGET_VECTORIZE_BUILTIN_VECTORIZATION_CO

Re: Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Hendrik Greving
Richard, > Speed difference is 154 cycles (with workaround) vs. 198 cycles. So loop > peeling is also slower for our processors. > > By vectorization_cost, do you mean > TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST hook? > > In our case, it is easy to make decision. But generally,

RE: Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Bingfeng Mei
Hi, Richard, Speed difference is 154 cycles (with workaround) vs. 198 cycles. So loop peeling is also slower for our processors. By vectorization_cost, do you mean TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST hook? In our case, it is easy to make decision. But generally, if peeling loop is

Re: Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Richard Biener
On Fri, Nov 15, 2013 at 2:16 PM, Bingfeng Mei wrote: > Hi, > In loop vectorization, I found that vectorizer insists on loop peeling even > our target supports misaligned memory access. This results in much bigger > code size for a very simple loo

Vectorization: Loop peeling with misaligned support.

2013-11-15 Thread Bingfeng Mei
Hi, In loop vectorization, I found that the vectorizer insists on loop peeling even though our target supports misaligned memory access. This results in much bigger code size for a very simple loop. I defined TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT and also TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
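As a rough illustration of the trade-off in this thread, the sketch below writes out by hand what peeling for alignment amounts to, next to a version that relies on misaligned access being acceptable. It is not GCC's generated code; VEC_BYTES, peel_count, set_peeled and set_misaligned are hypothetical names, and the 4-element inner bodies merely stand in for a single vector store.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16   /* assumed vector width in bytes (hypothetical) */

/* Scalar iterations needed before p reaches a VEC_BYTES boundary,
   capped at n. */
size_t peel_count(const int32_t *p, size_t n)
{
    uintptr_t mis = (uintptr_t)p % VEC_BYTES;
    size_t peel = mis ? (VEC_BYTES - mis) / sizeof(int32_t) : 0;
    return peel < n ? peel : n;
}

/* Peeled form: prologue + aligned main loop + epilogue. */
void set_peeled(int32_t *dst, size_t n, int32_t value)
{
    size_t i = 0;
    size_t peel = peel_count(dst, n);

    for (; i < peel; i++)            /* prologue: reach alignment */
        dst[i] = value;
    for (; i + 4 <= n; i += 4) {     /* stands in for an aligned vector store */
        dst[i] = value;
        dst[i + 1] = value;
        dst[i + 2] = value;
        dst[i + 3] = value;
    }
    for (; i < n; i++)               /* epilogue: leftover elements */
        dst[i] = value;
}

/* No peeling: the main loop starts at dst regardless of alignment,
   which is only reasonable if misaligned stores are cheap. */
void set_misaligned(int32_t *dst, size_t n, int32_t value)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {     /* stands in for a misaligned vector store */
        dst[i] = value;
        dst[i + 1] = value;
        dst[i + 2] = value;
        dst[i + 3] = value;
    }
    for (; i < n; i++)
        dst[i] = value;
}
```

The peeled version carries an extra loop and the alignment computation, which is the code-size cost raised in the original message; whether that pays off depends on how expensive misaligned accesses are on the target.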

Re: Why doesn't vetorizer skips loop peeling/versioning for target supports hardware misaligned access?

2011-01-24 Thread Tim Prince
these patterns. Somehow this hook doesn't seem to be used. vect_enhance_data_refs_alignment is called regardless whether the target has HW misaligned support or not. Shouldn't using HW misaligned memory access be better than generating extra code for loop peeling/versioning? Or at least i

Re: Why doesn't vetorizer skips loop peeling/versioning for target supports hardware misaligned access?

2011-01-24 Thread Ira Rosen
pportable_dr_alignment to decide whether a specific misaligned access is supported. > > Shouldn't using HW misaligned memory access be better than > generating extra code for loop peeling/versioning? Or at least > if for some architectures it is not the case, we should have > a comp

Why doesn't vetorizer skips loop peeling/versioning for target supports hardware misaligned access?

2011-01-24 Thread Bingfeng Mei
seem to be used. vect_enhance_data_refs_alignment is called regardless whether the target has HW misaligned support or not. Shouldn't using HW misaligned memory access be better than generating extra code for loop peeling/versioning? Or at least if for some architectures it is not th

loop peeling vs TARGET_CASE_VALUES_THRESHOLD

2010-06-30 Thread DJ Delorie
In unroll_loop_runtime_iterations() we emit a sequence of n_peel compare/jump instructions. Why don't we honor TARGET_CASE_VALUES_THRESHOLD here, and use a tablejump when n_peel is too big?
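The difference being asked about can be pictured at the source level. The sketch below is an analogy only, not what unroll_loop_runtime_iterations() emits: sum_chain dispatches to the peeled copies with a sequence of compares and jumps, while sum_switch expresses the same dispatch as a switch, which a compiler is free to lower to a jump table when there are enough cases. The function names and the unroll factor of 4 are assumptions made for the example.

```c
#include <stddef.h>

/* Runtime unrolling by 4 with a compare/jump chain selecting how many
   peeled iterations to run first. */
long sum_chain(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t r = n % 4;

    if (r == 0) goto body;
    if (r == 1) goto peel1;
    if (r == 2) goto peel2;
    s += a[i++];                 /* r == 3 */
peel2:
    s += a[i++];
peel1:
    s += a[i++];
body:
    for (; i < n; i += 4)        /* n - r is a multiple of 4 */
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    return s;
}

/* Same dispatch written as a switch with fall-through. */
long sum_switch(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    switch (n % 4) {
    case 3: s += a[i++];         /* fall through */
    case 2: s += a[i++];         /* fall through */
    case 1: s += a[i++];         /* fall through */
    case 0: break;
    }
    for (; i < n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    return s;
}
```

With only three peeled copies the chain is short; the question in the message is about larger n_peel, where a linear chain of compares grows with the unroll factor.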

Re: Help on loop peeling

2009-08-17 Thread Eric Fisher
2009/8/15 Sebastian Pop : > You should put a TODO_update_ssa in the flags of prefetching pass. > With the attached patch I don't see an error. Thanks. Finally, I figured out that it's because, after copying the loop, I should have created a preheader to ensure that new_loop's preheader has onl

Re: Help on loop peeling

2009-08-14 Thread Sebastian Pop
Hi, > Seems that use info is not updated. > You should put a TODO_update_ssa in the flags of prefetching pass. With the attached patch I don't see an error. Also, why don't you use trunk for your developments? Sebastian diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h index 1d2e69a..1320b5a 10064

Re: Help on loop peeling

2009-08-14 Thread Eric Fisher
2009/8/13 Sebastian Pop : > Could you please send the patch you are working on, together with > a reduced testcase?  This could help to reproduce the error. Thanks. I put the patch and a test below. The patch is based on 4.4.0. It's just a toy, I haven't a nice design for now. Actually, first_n

Re: Help on loop peeling

2009-08-13 Thread Richard Guenther
On Thu, Aug 13, 2009 at 4:41 PM, Sebastian Pop wrote: > Hi, > > On Thu, Aug 13, 2009 at 04:02, Eric Fisher wrote: >> The error is reported in build2_stat, by >> >> gcc_assert (POINTER_TYPE_P (tt) && POINTER_TYPE_P (TREE_TYPE (arg0)) >>                && INTEGRAL_TYPE_P (TREE_TYPE (arg1)) >>        

Re: Help on loop peeling

2009-08-13 Thread Sebastian Pop
Hi, On Thu, Aug 13, 2009 at 04:02, Eric Fisher wrote: > The error is reported in build2_stat, by > > gcc_assert (POINTER_TYPE_P (tt) && POINTER_TYPE_P (TREE_TYPE (arg0)) >                && INTEGRAL_TYPE_P (TREE_TYPE (arg1)) >                && useless_type_conversion_p (sizetype, TREE_TYPE (arg1)

Help on loop peeling

2009-08-13 Thread Eric Fisher
Hi, I'm implementing a loop peeling function used in tree-ssa-loop-prefetch.c according to the following comment, /* Step 5: unroll the loop. TODO -- peeling of first and last few iterations so that we do not issue superfluous prefetches. */ I take the functions slpeel_* in