https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
Dominique d'Humieres changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
Dominique d'Humieres changed:
What|Removed |Added
Status|NEW |WAITING
--- Comment #47 from Domi
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
--- Comment #46 from Tobias Burnus 2011-07-29 07:12:47 UTC ---
(In reply to comment #45)
[Commit to inline MINLOC/MAXLOC for a rank-1 array, which returns a
single-element rank-1 array.]
On my ~5 year old Athlon64 x2, I get with "-Ofast -march=
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
--- Comment #45 from Jakub Jelinek 2011-07-28 20:56:57 UTC ---
Author: jakub
Date: Thu Jul 28 20:56:50 2011
New Revision: 176897
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176897
Log:
PR fortran/31067
* frontend-passes.c (opti
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
Jakub Jelinek changed:
What|Removed |Added
Attachment #24856|0 |1
is obsolete|
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
--- Comment #43 from Jakub Jelinek 2011-07-28 13:51:32 UTC ---
Created attachment 24856
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24856
gcc47-pr31067.patch
Patch to optimize a = minloc (b) for rank 1 b into a = minloc (b, dim = 1),
acc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
--- Comment #42 from Richard Guenther 2011-07-25 15:39:54 UTC ---
With gas_dyn changed to use MINLOC (DTEMP, 1) we now inline the intrinsic
(but not with MINLOC (DTEMP), even though we know it'll be a single-element
array result ...).
We complet
--- Comment #41 from irar at il dot ibm dot com 2009-07-28 08:12 ---
That requires pattern recognition. MIN/MAX_EXPR are recognized by the first
phiopt pass, so MIN/MAXLOC should be either also recognized there or in the
vectorizer. (The phiopt pass transforms if clause to MIN/MAX_EXPR.
--- Comment #40 from jakub at gcc dot gnu dot org 2009-07-27 14:51 ---
If the cond_expr compute a minimum or maximum and the other cond_exprs compute
something based on the IV at the extremum, then I don't see why it couldn't be
vectorized by computing extremes of odd/even and correspond
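
The odd/even idea in comment #40 can be illustrated with a scalar C sketch that emulates two vector lanes. This is only an illustration under stated assumptions, not code from the PR and not what the vectorizer emits: the function name `minloc_two_lanes` is hypothetical, the input is assumed to contain no NaNs, and positions are 1-based as in Fortran (0 means an empty array). Each lane tracks its own minimum and position, and a final merge step picks the smaller value, breaking ties in favour of the earlier position.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical helper (not GCC code): 1-based position of the first
   minimum of arr[0..n-1], or 0 when n == 0.  Assumes no NaNs.  */
unsigned int
minloc_two_lanes (const float *arr, size_t n)
{
  /* Lane 0 sees the even indices, lane 1 the odd indices.  */
  float min_lane[2] = { FLT_MAX, FLT_MAX };
  unsigned int pos_lane[2] = { 0, 0 };
  size_t i;

  for (i = 0; i < n; i++)
    {
      size_t lane = i & 1;
      /* Strict '<' keeps the first occurrence within each lane; the
         pos == 0 test makes the first element of a lane always win.  */
      if (pos_lane[lane] == 0 || arr[i] < min_lane[lane])
        {
          min_lane[lane] = arr[i];
          pos_lane[lane] = (unsigned int) i + 1;
        }
    }

  /* Merge: take the odd lane only if it holds a strictly smaller value,
     or an equal value at an earlier position.  */
  if (pos_lane[1] != 0
      && (min_lane[1] < min_lane[0]
          || (min_lane[1] == min_lane[0] && pos_lane[1] < pos_lane[0])))
    return pos_lane[1];
  return pos_lane[0];
}
```

The tie-break in the merge step is what preserves the "first minimum" result that MINLOC requires when the same value appears in both lanes.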
--- Comment #39 from burnus at gcc dot gnu dot org 2009-07-27 13:15 ---
(In reply to comment #38)
> However, the loop can be split: [..]
> making the first loop vectorizable (inner-most loop vectorization).
OK. I tried it with a Fortran program:
http://users.physik.fu-berlin.de/~tburnus
--- Comment #38 from irar at il dot ibm dot com 2009-07-27 12:44 ---
I am not sure that that kind of computation can be generated automatically,
since in general the order of calculation of cond_expr cannot be changed.
However, the loop can be split:
for (i = 0; i < end; i++)
if
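
Since the snippet above is cut off, here is a hedged C sketch of the loop splitting that comment #38 describes. It is a reconstruction under assumptions, not the commenter's code: the name `minloc_split`, the 1-based result, and the return value 0 for an empty array are all choices made for this illustration, and the input is assumed to contain no NaNs. The first loop is a plain minimum reduction with no position bookkeeping; the second loop searches for the first element equal to that minimum.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical helper (not GCC code): 1-based position of the first
   minimum of arr[0..n-1], or 0 when n == 0.  Assumes no NaNs.  */
unsigned int
minloc_split (const float *arr, size_t n)
{
  float limit = FLT_MAX;
  size_t i;

  /* Loop 1: minimum reduction only, a candidate for vectorization.  */
  for (i = 0; i < n; i++)
    if (arr[i] < limit)
      limit = arr[i];

  /* Loop 2: locate the first element equal to the minimum.  */
  for (i = 0; i < n; i++)
    if (arr[i] == limit)
      return (unsigned int) i + 1;

  return 0;
}
```

The trade-off is a second pass over the array in exchange for a first loop that carries only one reduction.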
--- Comment #37 from jakub at gcc dot gnu dot org 2009-07-27 11:10 ---
Oh, and on 64-bit arches and float or 32-bit arches and double there is another
complication - the comparison has different mode size from the cond_expr for
pos. For 32-bit pos and 64-bit double it could perhaps just
--- Comment #36 from jakub at gcc dot gnu dot org 2009-07-27 11:02 ---
Here is the loop in C and vectorized by hand as well:
#include
float arr[1024];
unsigned int
foo (unsigned int end)
{
unsigned int pos = 1;
unsigned int i;
float limit = __FLT_MAX__;
for (i = 0; i < end; i+
--- Comment #35 from burnus at gcc dot gnu dot org 2009-07-27 09:18 ---
(In reply to comment #34)
> Does ifort vectorize the exact same implementation of minloc?
I tried to convert the minloc implementation into Fortran loops - and the
result is at
http://users.physik.fu-berlin.de/~tburnu
--- Comment #34 from irar at il dot ibm dot com 2009-07-27 08:36 ---
(In reply to comment #33)
> Using the example from comment 23 with
...
> gfortran shows: test.f90:12: note: not vectorized: unsupported use in stmt.
> and needs 2.272s. (By comparison. 4.4 needs 3.688s.)
This is for th
--- Comment #33 from burnus at gcc dot gnu dot org 2009-07-26 09:50 ---
(In reply to comment #32)
> > Regarding the just committed inline version: It would be interesting to know
> > whether it is vectorizable (with/without -ffinite-math-only [i.e.
> > -ffast-math]).
>
> It depends on w
--- Comment #32 from irar at il dot ibm dot com 2009-07-26 07:48 ---
(In reply to comment #30)
> Regarding the just committed inline version: It would be interesting to know
> whether it is vectorizable (with/without -ffinite-math-only [i.e.
> -ffast-math]).
It depends on where it is in
--- Comment #31 from jakub at gcc dot gnu dot org 2009-07-24 08:30 ---
Vectorization questions I'll defer to Ira.
For !optimize I even had a change to forcibly use the function call instead of
inline version. But it didn't really work, as there are only array versions of
the library fu
--- Comment #30 from burnus at gcc dot gnu dot org 2009-07-24 08:19 ---
Regarding the just committed inline version: It would be interesting to know
whether it is vectorizable (with/without -ffinite-math-only [i.e.
-ffast-math]).
Additionally, for size-1 result arrays, the function shou
--- Comment #29 from jakub at gcc dot gnu dot org 2009-07-24 07:57 ---
Subject: Bug 31067
Author: jakub
Date: Fri Jul 24 07:57:13 2009
New Revision: 150041
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=150041
Log:
PR fortran/40643
PR fortran/31067
* tran
--- Comment #28 from irar at il dot ibm dot com 2009-07-20 12:03 ---
I've just committed a patch that adds support of cond_expr in reductions in
nested cycles (http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01124.html).
cond_expr cannot be vectorized in reduction of inner-most loop, beca
--- Comment #27 from irar at il dot ibm dot com 2009-07-05 06:48 ---
(In reply to comment #23)
> because there are two reductions in that loop which I think the vectorizer
> cannot handle:
Actually, the vectorizer can vectorize two reductions. I think, the problem is
in cond_expr in red
--- Comment #26 from burnus at gcc dot gnu dot org 2009-07-03 13:07 ---
> has a superfluous check || (pos.0 == 0 && (*D.1568)[S.3 + D.1569] == limit.2)
> at least for flag_finite_math_only. If the array cannot contain Inf or NaN
> then it either has all elements == FLT_MAX, so pos will
--- Comment #25 from rguenth at gcc dot gnu dot org 2009-07-03 12:57 ---
Btw, the inlined minloc
D.1570 = a.dim[0].lbound;
D.1571 = a.dim[0].ubound;
pos.0 = 0;
{
integer(kind=8) S.3;
({ S.3 = D.15
--- Comment #24 from burnus at gcc dot gnu dot org 2009-07-03 12:40 ---
> One issue is that
> ISET = MINLOC (DTEMP)
> will cause GCC to assume that DTEMP is clobbered.
The problem is that while "MINLOC" is pure, we cannot use DECL_PURE_P as the
result is passed by reference:
(voi
--- Comment #23 from rguenth at gcc dot gnu dot org 2009-07-03 12:19 ---
We are not able to vectorize the loop in
program main
implicit none
integer, volatile, dimension(1) :: n
real, allocatable :: a(:)
integer :: i
real :: t1, t2
allocate (a(100))
call random_number(a)
--- Comment #22 from rguenth at gcc dot gnu dot org 2009-07-03 10:00 ---
One issue is that
ISET = MINLOC (DTEMP)
will cause GCC to assume that DTEMP is clobbered.
--- Comment #21 from dominiq at lps dot ens dot fr 2008-01-02 20:27 ---
> MATMUL is one distinctly possible one
Paul,
If you are interested, I have a variant of induct.f90 in which I have replaced
three dot-products by the matrix-vector product for a total disaster on all
compilers I ha
--- Comment #20 from pault at gcc dot gnu dot org 2008-01-02 19:30 ---
(In reply to comment #19)
> gfortran does inline most array intrinsics, but only if the result is a
> scalar.
> For most array intrinsics this isn't that much of a problem since usually one
> uses the variant that re
--- Comment #19 from jb at gcc dot gnu dot org 2007-06-27 14:49 ---
gfortran does inline most array intrinsics, but only if the result is a scalar.
For most array intrinsics this isn't that much of a problem since usually one
uses the variant that returns a scalar, but MINLOC is differen
--- Comment #18 from tkoenig at gcc dot gnu dot org 2007-06-15 20:35 ---
Too little time right now.
Unassigning myself.
--
tkoenig at gcc dot gnu dot org changed:
What|Removed |Added
---
--- Comment #17 from jb at gcc dot gnu dot org 2007-05-18 21:20 ---
Or even better (duh):
REAL :: DTEMP
DT = HUGE(1.0d0)
DO I = 1, NODES
DTEMP = DX(I)/(ABS(VEL(I))+SOUND(I))
IF (DTEMP < DT) THEN
DT = DTEMP
END IF
END DO
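
For readers following the C examples elsewhere in this thread, the fused loop in comment #17 can be sketched in C as below. This is a hedged illustration, not code from gas_dyn: the function name `min_timestep` and its parameter names are hypothetical, and it assumes the intended expression is DX/(ABS(VEL)+SOUND) evaluated per node. The temporary array and the MINLOC call disappear, leaving a single minimum reduction.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical helper (not from gas_dyn): smallest value of
   dx[i] / (|vel[i]| + sound[i]) over all nodes, or DBL_MAX when
   nodes == 0.  No temporary array is needed.  */
double
min_timestep (const double *dx, const double *vel, const double *sound,
              size_t nodes)
{
  double dt = DBL_MAX;
  size_t i;

  for (i = 0; i < nodes; i++)
    {
      /* Absolute value written out by hand to keep the sketch free of
         any libm dependency.  */
      double abs_vel = vel[i] < 0.0 ? -vel[i] : vel[i];
      double dtemp = dx[i] / (abs_vel + sound[i]);

      if (dtemp < dt)
        dt = dtemp;
    }

  return dt;
}
```

Because only the minimum value is needed here, not its position, the loop contains a single reduction.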
--- Comment #16 from jb at gcc dot gnu dot org 2007-05-18 21:15 ---
The critical thing with inlining array intrinsics, IMHO is to give the
optimizer more data to work with allowing it to get rid of temp arrays, perform
loop fusion or fission etc. So with a trivial benchmark like #15, you
--- Comment #15 from tkoenig at gcc dot gnu dot org 2007-04-02 21:00 ---
The library version doesn't do too badly compared
to the inline version:
$ cat benchmark-inline.f90
program main
implicit none
integer, dimension(1) :: n
real, allocatable :: a(:)
integer :: i
allocate
--- Comment #14 from tkoenig at gcc dot gnu dot org 2007-04-02 17:44 ---
I'll give this another shot.
Maybe inlining isn't even necessary for good performance...
--
tkoenig at gcc dot gnu dot org changed:
What|Removed |Added
-
--- Comment #13 from pault at gcc dot gnu dot org 2007-03-26 12:43 ---
(In reply to comment #11)
> (In reply to comment #10)
Thomas,
It does not look too bad:
Look at the tail end of array_transfer -
gfc_trans_create_temp_array (&se->pre, &se->post, se->loop,
--- Comment #12 from pault at gcc dot gnu dot org 2007-03-26 11:37 ---
(In reply to comment #11)
> (In reply to comment #10)
> Do you have any idea what I could do to turn this into an array?
> All the "interesting" gfc_conv_intrinsic_* functions have the
> "if (se->ss)" statement on top
--- Comment #11 from tkoenig at gcc dot gnu dot org 2007-03-13 20:12 ---
(In reply to comment #10)
> Thomas, it's a bit kludgy, but why not add a constant expression = 1, if dim
> is not present?
Hi Paul,
unless I'm mistaken, this would also change the rank of the
function to 0, w
--- Comment #10 from pault at gcc dot gnu dot org 2007-03-12 19:04 ---
(In reply to comment #9)
>
> As a workaround, one could always use "minloc(...,dim=1)", then
> we get the inline version.
>
Thomas, it's a bit kludgy, but why not add a constant expression = 1, if dim is
not presen
--- Comment #9 from tkoenig at gcc dot gnu dot org 2007-03-11 19:43 ---
I have looked at this some more. Changing gfc_conv_intrinsic_function so that
we call gfc_conv_intrinsic_minmaxloc is easy enough:
@@ -3481,7 +3481,9 @@ gfc_conv_intrinsic_function (gfc_se * se
name = &expr->v
--- Comment #8 from tkoenig at gcc dot gnu dot org 2007-03-10 12:34 ---
(In reply to comment #7)
> (In reply to comment #6)
> > This makes minloc have rank 0, and allows for
> > inlining.
>
> No, it's wrong. See F95 13.14.71: "Result Characteristics. The result is of
> type default inte
--- Comment #7 from fxcoudert at gcc dot gnu dot org 2007-03-08 05:50 ---
(In reply to comment #6)
> This makes minloc have rank 0, and allows for
> inlining.
No, it's wrong. See F95 13.14.71: "Result Characteristics. The result is of
type default integer. If DIM is absent, the result
--- Comment #6 from tkoenig at gcc dot gnu dot org 2007-03-07 21:29 ---
Created an attachment (id=13165)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13165&action=view)
Setting the correct rank in minloc
This makes minloc have rank 0, and allows for
inlining. I guess we'll find
--- Comment #5 from tkoenig at gcc dot gnu dot org 2007-03-07 21:09 ---
(In reply to comment #3)
> In gfc_conv_intrinsic_function, expr->rank is 0 for minval
> and 1 for minloc (which is bogus). I wonder where this is
> set...
To answer my own question: This is set in gfc_resolve_mi
--- Comment #4 from fxcoudert at gmail dot com 2007-03-07 21:09 ---
Subject: Re: MINLOC should sometimes be inlined (gas_dyn is so slw)
> In gfc_conv_intrinsic_function, expr->rank is 0 for minval
> and 1 for minloc (which is bogus).
It's not bogus. The MINLOC is an array of
--- Comment #3 from tkoenig at gcc dot gnu dot org 2007-03-07 21:00 ---
(In reply to comment #2)
> No, because we never get into gfc_conv_intrinsic_minmaxloc. We translate the
> expression directly into a function call by calling
> gfc_conv_intrinsic_funcall() at the head of gfc_conv_in
--- Comment #2 from fxcoudert at gcc dot gnu dot org 2007-03-07 12:18 ---
(In reply to comment #1)
> We do this for minval, and from glancing at
> gfc_conv_intrinsic_minmaxval and gfc_conv_intrinsic_minmaxloc,
> it should happen already.
No, because we never get into gfc_conv_intrinsic
--- Comment #1 from tkoenig at gcc dot gnu dot org 2007-03-07 11:27 ---
(In reply to comment #0)
> Maybe we should have MINLOC inlined when there's no mask, stride 1 and
> one-dimensional?
Definitely. We do this for minval, and from glancing at
gfc_conv_intrinsic_minmaxval and gfc_con