------- Comment #2 from burnus at gcc dot gnu dot org  2009-11-20 14:20 -------
(In reply to comment #0)
> I think that forall statement must be at least as fast as equivalent
> do-end do construction.

The Fortran standardization committee thought likewise. In practice, however, it is sometimes not trivial for the compiler to see whether the RHS (right-hand side) depends on the LHS (left-hand side), so it may use a temporary array even when none is needed, and temporary arrays are slow (and memory hungry). Thus, a DO loop should always be at least as fast as a FORALL (assignment) statement (unless one does something really stupid in the DO loop). [At least that is what I gathered from the comments at comp.lang.fortran, and it matches my knowledge of how it is done in gfortran.]

Having said that, gfortran should still try to make your program as fast for FORALL as it is for the DO loop.

> But the next program (variant of LU-decomposition) shows that fragment
> containing forall statement is approximately at 2.5(!) times slower then
> fragment with do-end do.

You could check using -fdump-tree-original how the two versions are handled; my guess is that the FORALL version uses a temporary array. (-fdump-tree-original creates a <file.f90>.004* file, which contains a dump of the internal representation of your code and looks similar to C.)

Seemingly, Richard already looked at the dump and confirmed my suspicion.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118
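
To illustrate why a FORALL may need a temporary at all, here is a minimal sketch. It is not taken from the reporter's program; the program name, array names, and sizes are made up for the example. FORALL evaluates every right-hand side before it assigns anything, whereas a DO loop runs its iterations in order, so the two forms give different results when the RHS reads elements that the LHS overwrites.

```fortran
program forall_vs_do
  implicit none
  integer, parameter :: n = 5
  integer :: i
  real :: a(n), b(n)

  ! Both arrays start as 1, 2, 3, 4, 5.
  a = [(real(i), i = 1, n)]
  b = a

  ! FORALL: all right-hand sides are evaluated before any assignment,
  ! so a(i-1) always refers to the old value. To guarantee this, the
  ! compiler may copy the RHS values into a temporary array first.
  forall (i = 2:n) a(i) = 2.0 * a(i-1)

  ! DO: iterations run in order, so b(i-1) is the already updated value.
  do i = 2, n
    b(i) = 2.0 * b(i-1)
  end do

  print *, 'forall:', a   ! values 1, 2, 4, 6, 8
  print *, 'do:    ', b   ! values 1, 2, 4, 8, 16
end program forall_vs_do
```

When the RHS does not touch the array being assigned (for example, a(i) = 2.0 * c(i) with a separate array c), no temporary is needed, but the compiler has to prove that. Assuming the sketch is saved as forall_vs_do.f90 (a hypothetical file name), compiling it with gfortran -fdump-tree-original forall_vs_do.f90 should show in the dump whether a temporary was generated.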