On Sat, May 25, 2024 at 3:08 PM Hanke Zhang via Gcc wrote:
Hi,
I'm trying to study the automatic vectorization optimizations in GCC,
but I found one case where the SLP vectorizer fails to do so.
Here is the sample code (also a simplified version of a function
from the 625/525.x264 source code in SPEC CPU 2017):
void pixel_sub_wxh(int16_t
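The excerpt is cut off here; the function has roughly the following shape (a
reconstruction based on the x264 sources, so parameter names may differ), and
the inner subtraction loop is the part one would expect the SLP/loop
vectorizer to handle:

#include <stdint.h>

/* Reconstructed sketch of x264's pixel_sub_wxh: element-wise difference of
   two 8-bit blocks into a 16-bit difference block.  */
void pixel_sub_wxh(int16_t *diff, int i_size,
                   uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2)
{
    for (int y = 0; y < i_size; y++) {
        for (int x = 0; x < i_size; x++)
            diff[x + y * i_size] = pix1[x] - pix2[x];
        pix1 += i_pix1;
        pix2 += i_pix2;
    }
}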
Hi, I've been working on vectorization-related optimizations lately.
GCC seems to miss some optimization opportunities here, and I would like to
ask whether they can be addressed.
For example, for the following program using AVX2:
#include
// reg->node2[i].state is an unsigned long long variable
// reg-
> > > > > I came across a more complex version of that and found that gcc
> > > > > doesn't seem to handle it, so wanted to write a pass myself to
> > > > > optimize it.
> > > > >
> > > > > I got two questions: [...] where should I put it? Put it behind
> > > > > pass_iv_optimize?
> > >
> > > GCC has the final value replacement pass (pass_scev_cprop) doing these
> > > kinds of transforms. Since 'ans' does not have an affine evolution, this
> > > case would need to be pattern matched (there are some existing pattern
> > > matchings in the pass).
> >
> > > Thanks
> > > Hanke Zhang
> > >
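For context, final value replacement rewrites a use of a value computed by a
loop with a closed-form expression for the loop's final value. A minimal
illustration (not the testcase from this thread, which is not shown here):

/* Before pass_scev_cprop (final value replacement): the loop exists only to
   compute the last value of s.  */
int twice (int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += 2;
  return s;   /* replaceable by the closed form (n > 0 ? 2 * n : 0),
                 after which the loop becomes dead code */
}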
On Thu, Oct 12, 2023 at 2:18 PM Hanke Zhang via Gcc wrote:
Hi, I've recently been working on vectorization in GCC. I'm stuck on a small
problem and would like to ask for advice.
For example, for the following code:

int main() {
  int size = 1000;
  int *foo = malloc(sizeof(int) * size);
  int c1 = rand(), t1 = rand();
  for (int i = 0; i
> > expected to change anything. Instead you can use -fno-tree-vectorize on
> > the second last one. Doing that I get 111s vs 41s, thus doing both
> > helps.
> >
> > Note parallelization hasn't seen any development in the last years.
> >
> > Richard.

Hi Richard:

Thank you for your sincere reply.
I get what you mean above. But I still see the following after I add
`-fopt-info-vec`:

gcc-10 test.c -O3 -flto -mavx2 -fopt-info-vec
> test.c:29:5: optimized: loop vectorized using 32 byte vectors
gcc-10 test.c -O3 -flto -mavx2 -fopt-info-vec -ftree-parallelize-loops=24
> nothing happened

[...] efficiency gain compared to doing nothing; at the same time, if I use
the option of `-ftree-parallelize-loops` alone, it will also bring a
big efficiency gain. But if I use both options, vectorization fails;
that is, I can't get the benefits of vectorization, I can only get the
benefits of parallelizing loops.
I know th
> V2QI
>
> There are 11 modes.
> Should I increase the number from 8 to 11?
It will just perform dynamic allocation, no need to adjust.
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>
From: Richard Biener
Date: 2023-08-31 19:29
To: juzhe.zh...@rivai.ai
CC: gcc; richard.sandiford
Subject: Re: Re: Question about dynamic choosing vectorization factor for RVV
On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:
> Hi. Thanks Richard and Richi.
>
> Now I have figured out how to choose a smaller LMUL.
>
> void
> costs::finish_cost (const vector_costs *scalar_costs)
>
,v4,v8
> sub a2,a2,a5
> vsetvli zero,a5,e32,m4,ta,ma
> vse32.v v4,0(a4)
> add a0,a0,a3
> add a1,a1,a3
> add a4,a4,a3
> bne a2,zero,.L3
>
> Fantastic architecture of GCC Vector Cost model!
>
> Thanks a lot.
>
>
> juzhe.zh...@rivai.ai
>
> From: Richard Biener
> vse32.v v4,0(a4)
> add a0,a0,a3
> add a1,a1,a3
> add a4,a4,a3
> bne a2,zero,.L3
> .L5:
> ret
>
> I am experimenting whether we can adjust cost statically to make loop
> vectorizer use LMUL = 4 even though preferred_simd_mode returns LMUL = 8.
> If we can do that, I think we can ap
"juzhe.zh...@rivai.ai" writes:
> Thanks Richi.
>
> I am trying to figure out how to adjust finish_cost to lower the LMUL
>
> For example:
>
> void
> foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> {
> for (int i = 0; i < n; i++)
> a[i] = a[i] + b[i];
> }
>
> preferred_simd_mode p
Hi, Richard and Richi.
Currently, we are statically returning the vectorization factor in
'TARGET_VECTORIZE_PREFERRED_SIMD_MODE'
according to the compile option.
For example:
void
foo (int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}
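For reference, the LMUL = 4 strategy visible in the e32,m4 vsetvli assembly
above corresponds roughly to this hand-written intrinsics form of foo (a
sketch, not compiler output; intrinsic names follow the __riscv_* RVV
intrinsics API):

#include <riscv_vector.h>
#include <stdint.h>

void foo_lmul4 (int32_t *restrict a, int32_t *restrict b, int n)
{
  while (n > 0)
    {
      size_t vl = __riscv_vsetvl_e32m4 (n);            /* elements handled this iteration */
      vint32m4_t va = __riscv_vle32_v_i32m4 (a, vl);   /* load a[0..vl) */
      vint32m4_t vb = __riscv_vle32_v_i32m4 (b, vl);   /* load b[0..vl) */
      __riscv_vse32_v_i32m4 (a, __riscv_vadd_vv_i32m4 (va, vb, vl), vl);
      a += vl;
      b += vl;
      n -= vl;
    }
}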
On 22/03/2023 13:56, Richard Biener wrote:
Basically, the -ffast-math instructions will always be the fastest way,
but the goal is that the default optimization shouldn't just disable
vectorization entirely for any loop that has a divide in it.
We try to express division as multiplication
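A small example of the case being described (flags are an assumption): with a
loop-invariant divisor and -ffast-math, the division can be rewritten as a
multiplication by the reciprocal and then vectorized without a hardware
vector-divide instruction.

/* With something like -O2 -ffast-math, a[i] / b can be computed as
   a[i] * (1.0f / b), which vectorizes on targets without vector division. */
void scale (float *restrict a, float b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] / b;
}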
ly an option, but I think there's
> quite a lot of code in those routines. I know how to do that option at
> least (except, maybe not the errno handling without making assumptions
> about the C runtime).
>
> Basically, the -ffast-math instructions will always be the fastest way,
> but the goal is that the default optimization shouldn't just disable
> vectorization entirely for any loop that has a divide in it.
Andrew
On Tue, Mar 21, 2023 at 6:00 PM Andrew Stubbs wrote:
Hi all,
I want to be able to vectorize divide operators (softfp and integer),
but amdgcn only has hardware instructions suitable for -ffast-math.
We have recently implemented vector versions of all the libm functions,
but the libgcc functions aren't builtins and therefore don't use those
hooks.
> > >> > Alternatively only enable loop vectorization at -O2 (the above checks
> > >> > flag_tree_slp_vectorize as well). At least the cost model kind
> > >> > does not have any influence on BB vectorization, that is, we get the
>-Original Message-
>From: Jason Merrill
>Sent: Monday, June 21, 2021 10:07 AM
>To: Liu, Hongtao
>Cc: gcc Mailing List ; Marek Polacek
>Subject: Hongtao Liu as x86 vectorization maintainer
>
I am pleased to announce that the GCC Steering Committee has appointed
Hongtao Liu as maintainer of the i386 vector extensions in GCC.
Hongtao, please update your listing in the MAINTAINERS file.
Cheers,
Jason
On Tue, Apr 14, 2020 at 4:39 PM Shubham Narlawar via Gcc wrote:
Hello,
I am working on gcc-4.9.4 and encountered different results of loop
vectorization on the arrays arr0, arr1 and arr2.
Testcase -

int main()
{
  int i;
  for (i = 0; i < 64; i++)
  {
    arr2[i] = (arr1[i] | arr0[i]);
  }
}

Using -O2 -ftree-vectorize, the above loop is vectorized
On March 24, 2020 5:45:05 PM GMT+01:00, Roger Martz via Gcc wrote:
I was glad to see that compiler flags such as -fopt-info-vec-missed ...
provide information about what is happening under the hood w.r.t code that
can and can't be vectorized.
Can anyone point me to a document, etc. that would be helpful in
understanding what the messages output from the compiler
2. The current version considers only PLUS reduction
   as it is encountered most often and therefore is the
   most practical;

3. While normally SLP transformation should operate
   inside a single basic block, this requirement greatly
   [...]; there will be vectorizable subexpressions
   defined in basic block(s) different from that where the
   reduction result resides. However, for the sake of
   simplicity only single uses in the same block are
   considered now;

4. For the same sake the current version does not deal
   with partial reductions, which would require partial sum
   merging and careful removal of the scalars that participate
   in the vector part. The latter gets done automatically
   by DCE in the case of full reduction vectorization;

5. There is no cost model yet, for the reasons mentioned
   in paragraphs 3 and 4.
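As a concrete illustration, the kind of single-block PLUS reduction targeted
here looks like the following (a minimal sketch, not taken from the patch's
testsuite):

/* A straight-line (basic-block) PLUS reduction: the four loads and adds can
   become one vector load followed by a horizontal add reduction.  */
int sum4 (const int *a)
{
  return a[0] + a[1] + a[2] + a[3];
}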
Thanks in advance.
--
Anton
From eb2644765d68ef1c629e584086355a8d66df7c73 Mon Sep 17 00:00:00 2001
From: Anton Youdkevitch
Date: Fri, 9 Nov 2018 20:50:05 +0300
Subject: [PATCH
On Mon, Jan 07, 2019 at 09:29:09AM +0100, Richard Biener wrote:
> On Sun, 6 Jan 2019, Jan Hubicka wrote:
> > Even though it is late in release cycle I wonder if we can do that for
> > GCC 9? Performance of vectorization is very architecture specific, I
> > would propose enabl
benchmark difference between
> > cost models.
>
> ; Alias to enable both -ftree-loop-vectorize and -ftree-slp-vectorize.
> ftree-vectorize
> Common Report Optimization
> Enable vectorization on trees.
Thanks! I would probably fall into that trap and run the same set of
benchmarks again.
Honza
>
> --
> Eric Botcazou
SPECfp2k6 zen generic +9.98%
SPECfp2k6 zen native +7.04%
SPECfp2k17 zen generic +6.11%
SPECfp2k17 zen native +5.46%
SPECint2k17 zen generic +3.61%
SPECint2k17 zen native +5.18%

The performance results seem surprisingly strongly in favor of
vectorization. Martin's setup is also checking code size, which goes up
by as much as 26% on leslie3d, but since many of the benchmarks ar
(resent because of mail issues on my end)
On Mon, 22 Oct 2018, Thomas Schwinge wrote:
I had a quick look at the difference, and a[j][i] remains in this form
throughout optimization. If I write instead *((*(a+j))+i) = 0; I get
j_10 = tmp_17 / 1025;
i_11 = tmp_17 % 1025;
_1 = (long unsi
On Mon, Oct 22, 2018 at 6:35 PM Thomas Schwinge wrote:
Hi!
Thanks for all your comments already! I continued looking into this for a
bit (but then got interrupted by a higher-priority task). Regarding this
one specifically:
On Fri, 12 Oct 2018 21:14:11 +0200, Marc Glisse wrote:
> On Fri, 12 Oct 2018, Thomas Schwinge wrote:
>
> > Hmm, and without a
Tamar Christina writes:
Hi Richard,
Thanks for all the help so far,
> > so I'd need 5 parameters and then I'm guessing the other expressions
> would be removed by DCE at some point?
>
> Are you planning to make the FCMLA behaviour directly available as an
> internal function or provide a higher-level one that does a fu
Tamar Christina writes:
Hi Richard,
> > [...]
> > 3) So I abandoned vec-patterns and instead tried to do it in
> > tree-vect-slp.c in vect_analyze_slp_instance just after the SLP tree
> > is created. Matching the SLP tree is quite simple and getting it to
> > emit the right SLP tree was simple enough, except that at this
Tamar Christina writes:
Hi All,
I am trying to add support to the auto-vectorizer for complex operations for
which a target has instructions.
The instructions I have are only available as vector instructions. The
operations are complex addition with a rotation or complex fmla with a
rotation for half floats, floats an
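For reference, the scalar form of one of these operations, complex addition
with a 90-degree rotation, looks roughly like this (a sketch of the pattern
the vectorizer would need to recognize; interleaved real/imaginary layout
assumed):

/* Complex addition of a and (b rotated by 90 degrees), i.e. a + b*i, over
   arrays of interleaved (re, im) pairs -- the kind of scalar code a single
   FCADD-style vector instruction can cover per pair of lanes.  */
void complex_add_rot90 (float *restrict r, const float *restrict a,
                        const float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    {
      r[2*i]     = a[2*i]     - b[2*i + 1];   /* re = re(a) - im(b) */
      r[2*i + 1] = a[2*i + 1] + b[2*i];       /* im = im(a) + re(b) */
    }
}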
> _3 = (sizetype) i_11;
> _4 = _2 + _3;
>
> and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least
> I think that's true).
>
>
If this folding is correct, the dependence analysis would not have
to handle array accesses with div and mod, and it would b
< m1; j++)
{
a[i][j] = omp_get_thread_num ();
}
if (m_tail1)
for (int j = 0; j < m_tail1; j++)
...
with appropriate start/end for the i/j loop and the "epilogue" loop?
> > That is, can we delay the actual collapsing until after vectorization
iteration space might be diagonal or otherwise not exactly
rectangular.
> That is, can we delay the actual collapsing until after vectorization
> for example?
No. We can come up with some way to propagate some of the original info to
the vectorizer if it helps (or teach vectorizer to recognize whate
> it more, etc.
> If we come up with some way to help the vectorizer with the collapsed loop,
> whether in the form of some loop flags, or internal fns, whatever, I'm all for
> it.
But isn't _actual_ collapsing an implementation detail? That is, isn't it
enough to interpret clauses in terms of the collapse result?
That is, can we delay the actual collapsing until after vectorization
for example?
Richard.
>
> Jakub
On Mon, Oct 15, 2018 at 10:55:26AM +0200, Richard Biener wrote:
> Yeah. Note this still makes the IVs not analyzable since i now effectively
> becomes wrapping in the inner loop. For some special values we might
> get away with a wrapping CHREC in a bit-precision type but we cannot
> represent wr
inner loop:
>:
> i = i.0;
> j = j.1;
> _1 = a[i][j];
> _2 = _1 + 1;
> a[i][j] = _2;
> .iter.4 = .iter.4 + 1;
> j.1 = j.1 + 1;
> D.2912 = j.1 < n.7 ? 0 : 1;
> i.0 = D.2912 + i.0;
> j.1 = j.1 < n.7 ? j.1 : 0;
>
>:
> if (.iter.
i.0 = D.2912 + i.0;
j.1 = j.1 < n.7 ? j.1 : 0;
:
if (.iter.4 < D.2902)
goto ; [87.50%]
else
goto ; [12.50%]
to make it more vectorization friendly (though, in this particular case it
isn't vectorized either) and not do the expensive % and / operations inside
On Fri, 12 Oct 2018, Thomas Schwinge wrote:
Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
also present when running the following code through the vectorizer:
for (int tmp = 0; tmp < N_J * N_I; ++tmp)
{
int j = tmp / N_I;
int i = tmp % N_I;
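The snippet is cut off here; based on the a[j][i] = 0 store mentioned earlier
in the thread, the complete loop presumably looks like this (reconstruction):

for (int tmp = 0; tmp < N_J * N_I; ++tmp)
  {
    int j = tmp / N_I;
    int i = tmp % N_I;
    a[j][i] = 0;   /* the div/mod in the index computation is what defeats
                      the scev/dependence analysis discussed here */
  }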
Hi!
I'm for the first time looking into the existing vectorization
functionality in GCC (yay!), and with that I'm also for the first time
encountering GCC's scalar evolution (scev) machinery (yay!), and the
chains of recurrences (chrec) used by that (yay!).
Obviously, I'm ri
On Thu, Oct 4, 2018 at 1:49 PM Steve Ellcey wrote:
I was looking at PR tree-optimization/61247, where a loop with an unsigned
int index on an LP64 platform was not getting vectorized and I noticed an
odd thing. In the function below, if I define N as 1000 or 1, the
loop does get vectorized, even in LP64 mode. But if I define N as 10,
the
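The function itself is not shown in this excerpt; the general shape of the
PR 61247 situation is a loop whose induction variable is a 32-bit unsigned
int on an LP64 target, e.g. (an assumed sketch, not the actual testcase):

#define N 1000              /* the value of N is what changes the outcome */
double a[N], b[N];

void add_one (void)
{
  /* An unsigned 32-bit induction variable can in principle wrap, which
     complicates the niter analysis the vectorizer relies on in LP64 mode. */
  for (unsigned int i = 0; i < N; i++)
    a[i] = b[i] + 1.0;
}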
+ s2[x];
d += 16;
}
}
If we change “for( int x = 0; x < 16; x++ )” to “for( int x = 0; x
< 32; x++ )”, much better vectorized code is generated:
test_loop:
.LFB0:
.cfi_startproc
adrp x2, g_s1
adrp x3, g_s2
ad
On Thu, May 3, 2018 at 8:43 PM, Toon Moene wrote:
Consider the attached Fortran code (the most expensive routine,
computation-wise, in our weather forecasting model).
verint.s.7.3 is the result of:
gfortran -g -O3 -S -march=native -mtune=native verint.f
using release 7.3.
verint.s.8.1 is the result of:
gfortran -g -O3 -S -march=native -mtun
On Fri, Feb 23, 2018 at 11:44:40AM -0800, Steve Ellcey wrote:
I have a question about loop vectorization, OpenMP, and libmvec. I am
experimenting with this on Aarch64 and looking at what exists on x86
and trying to understand the relationship (if there is one) between the
vector library (libmvec) and OpenMP (libgomp).
On x86, an OpenMP loop with a sin
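The excerpt is cut off here; the typical x86 setup being referred to is a
loop over a libm function such as sin(), where glibc's math headers declare
SIMD variants so the vectorized loop can call the libmvec routines. A sketch
(the exact flags, e.g. -O2 -ffast-math -fopenmp-simd, are an assumption):

#include <math.h>

/* When the math headers provide "omp declare simd" variants of sin(), the
   vectorizer can turn this loop into calls to a libmvec vector routine
   (e.g. _ZGVbN2v_sin on x86-64).  */
void apply_sin (double *restrict out, const double *restrict in, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = sin (in[i]);
}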
On October 21, 2017 9:50:13 PM GMT+02:00, Denis Bakhvalov wrote:
Hello Richard,
Thank you. I achieved vectorization with vf = 16, using
#pragma GCC optimize ("no-unroll-loops")
__attribute__ ((__target__ ("sse4.2")))
and options -march=core-avx2 -mprefer-avx-128
But now I have a question: Is it possible in gcc to have vectorization
with vf
> > "... unroll this vectorized loop by some defined factor."
> >
> > I was playing with #pragma omp simd with the safelen clause, and
> > #pragma GCC optimize("unroll-loops") with no success. Compiler option
> > -fmax-unroll-times is not suitable for me, because it will affect
> > other parts of the code.
> >
> > Is it possible to achieve this somehow?
>
> No.

#pragma omp simd has a simdlen clause, which is a hint on the preferred
vectorization factor, but the vectorizer doesn't use it so far;
pro
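For reference, the clause in question looks like this (illustrative only; as
noted above, the vectorizer did not honor it at the time):

/* simdlen(8) asks for a vectorization factor of 8, whereas safelen(8) only
   asserts that iterations closer than 8 apart are independent.  */
void saxpy (float *restrict y, const float *restrict x, float a, int n)
{
  #pragma omp simd simdlen(8)
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}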
On Thu, Oct 19, 2017 at 9:22 AM, Denis Bakhvalov wrote:
Hello!
I have a hot inner loop which was vectorized by gcc, but I also want the
compiler to unroll this loop by some factor.
It can be controlled in clang with this pragma:
#pragma clang loop vectorize(enable) vectorize_width(8)
Please see example here:
https://godbolt.org/g/UJoUJn
So I want to tell g
On Thu, Mar 9, 2017 at 9:12 AM, Jakub Jelinek wrote:
On Thu, Mar 09, 2017 at 09:02:38AM +0100, Richard Biener wrote:
> It would need to be done before graphite, and yes, the question is when
> to do this (given the non-trivial text size and runtime cost). One option is
> to do something similar to what we do with IFN_LOOP_VECTORIZED, that is, after
> followup
On Mon, 6 Mar 2017, Steve Ellcey wrote:
I was looking at the SPEC 456.hmmer benchmark and this email thread
from Jeff Law and Michael Matz:
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01970.html
and was wondering if anyone was looking at what more it would take
for GCC to vectorize the loop in P7Viterbi. There is a big performanc
> I am doing some experiments calculating coarse-grained register
> pressure for GIMPLE loops, but the motivation is not from the vectorizer
> but from predcom/pre, like PR77498.
>
>> Perhaps something we could/should fix in the s390 backend? (Probably
>> hard to tell without source)
>
> - Would it make sense to allow a backend to specify the minimal number
>   of loop iterations considered for vectorization? Is this
>   perhaps already possible somehow? I added a check to disable
>   vectorization for loops with <= 3 iterations that shows no regressions
>   and improves two SPEC benchmarks noticeably. I'm even con
On Tue, Jan 12, 2016 at 2:22 PM, Jim Wilson wrote:
> I see a number of places in tree-vect-generic.c that add a
> VIEW_CONVERT_EXPR if useless_type_conversion_p is false. That should
> work, except when I try this, I see that the VIEW_CONVERT_EXPR gets
> converted to a NOP_EXPR by gimplify_build
I'm looking at an ICE on SPEC 2006 464.h264ref slice.c that occurs
with -O3 for both aarch64 and armhf.
palantir:2080$ ./xgcc -B./ -O3 -S slice.i
slice.c: In function ‘poc_ref_pic_reorder’:
slice.c:838:6: error: incorrect type of vector CONSTRUCTOR elements
{_48, _55, _189, _59}
vect_no_reorder_
On Fri, Jan 8, 2016 at 5:11 PM, Alan Lawrence wrote:
> On Tue, Oct 27, 2015 at 2:39 PM, Richard Biener wrote:
>>
>> On Mon, Oct 26, 2015 at 6:59 AM, sameera wrote:
>>>
>>> Richard, we have defined the input language for convenience in the prototype
>>> implementation. However, we will be u