The way Intel presents #pragma simd (to users, to the OpenMP committee, to the C and C++ committees, etc.) is that it is not a hint; it has a meaning. The meaning is defined in terms of evaluation order. Both C and C++ define an evaluation order for sequential programs. #pragma simd relaxes the sequential order into a partial order:

0. consecutive iterations of the loop are chunked together and execute in lockstep
1. there is no change in the order of evaluation of expressions within an iteration
2. if X and Y are expressions in the loop, and X(i) is the evaluation of X in iteration i, then for X sequenced before Y and iteration i evaluated before iteration j, X(i) is sequenced before Y(j).
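To make rule 2 concrete, here is a minimal sketch in plain C. It does not use the pragma at all; it only emulates one evaluation order that the rules above permit (all copies of X for a chunk, then all copies of Y). The chunk width w, the trip count N, and every function name are assumptions made for illustration, not anything taken from a compiler. In the forward case X writes a[i] and Y reads a[i-1], so the X(i)-before-Y(j) guarantee keeps the result identical to the sequential one. In the backward case the read comes first in the body, so the loop depends on Y(i) preceding X(j), which the partial order does not promise.

```c
#include <assert.h>
#include <string.h>

enum { N = 8 }; /* illustrative trip count, not from the original mail */

/* Forward case, sequential order: X writes a[i], then Y reads a[i-1]. */
static void forward_seq(int *a, const int *b, int *c) {
    for (int i = 1; i < N; i++) {
        a[i] = b[i] + 1;     /* X */
        c[i] = a[i - 1] * 2; /* Y */
    }
}

/* Same body in chunks of w iterations: all X copies, then all Y copies. */
static void forward_lockstep(int *a, const int *b, int *c, int w) {
    assert(w > 0); /* a non-positive width would never advance */
    for (int base = 1; base < N; base += w) {
        int end = base + w < N ? base + w : N;
        for (int i = base; i < end; i++) a[i] = b[i] + 1;
        for (int i = base; i < end; i++) c[i] = a[i - 1] * 2;
    }
}

/* Backward case, sequential order: X reads a[i-1], then Y writes a[i]. */
static void backward_seq(int *a, const int *b, int *c) {
    for (int i = 1; i < N; i++) {
        c[i] = a[i - 1] * 2; /* X */
        a[i] = b[i] + 1;     /* Y */
    }
}

/* Same body in lockstep: X(j) may now run before Y(i) even though i < j. */
static void backward_lockstep(int *a, const int *b, int *c, int w) {
    assert(w > 0);
    for (int base = 1; base < N; base += w) {
        int end = base + w < N ? base + w : N;
        for (int i = base; i < end; i++) c[i] = a[i - 1] * 2;
        for (int i = base; i < end; i++) a[i] = b[i] + 1;
    }
}

typedef void (*seq_fn)(int *, const int *, int *);
typedef void (*lock_fn)(int *, const int *, int *, int);

/* Returns 1 when the lockstep order reproduces the sequential result. */
static int matches(seq_fn seq, lock_fn lock, int w) {
    int b[N], a1[N] = {0}, a2[N] = {0}, c1[N] = {0}, c2[N] = {0};
    for (int i = 0; i < N; i++) b[i] = 10 * i;
    seq(a1, b, c1);
    lock(a2, b, c2, w);
    return memcmp(a1, a2, sizeof a1) == 0 && memcmp(c1, c2, sizeof c1) == 0;
}

int forward_matches(int w)  { return matches(forward_seq, forward_lockstep, w); }
int backward_matches(int w) { return matches(backward_seq, backward_lockstep, w); }
```

With a width of 4, forward_matches(4) returns 1 and backward_matches(4) returns 0; with a width of 1 the lockstep order degenerates to the sequential one and both agree. A loop shaped like the backward case is one the programmer should not mark, since under this definition the pragma is the programmer's statement that the relaxed order is acceptable.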
A corollary is that the sequential order is always allowed, since it satisfies the partial order. However, the partial order allows the compiler to group copies of the same expression next to each other, and then to combine the scalar instructions into a vector instruction.

There are other corollaries, such as that if multiple loop iterations write into an object defined outside of the loop, then it has to be undefined behavior, the vector moral equivalent of a data race. That is why induction variables and reductions are necessary exceptions to this rule and require explicit support.

As far as correctness goes, by this definition the programmer has expressed that the loop is correct, and the compiler should not try to prove correctness.

On the performance heuristics side, the Intel compiler tries not to second-guess the user. There are users who work much harder than just adding a #pragma simd to unmodified sequential loops. Various changes may be necessary, and users who worked hard to get their loops into good shape are unhappy if the compiler does second-guess them.

Robert.

-----Original Message-----
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Renato Golin
Sent: Monday, February 17, 2014 7:14 AM
To: tpri...@computer.org
Cc: gcc
Subject: Re: Vectorizer Pragmas

On 17 February 2014 14:47, Tim Prince <n...@aol.com> wrote:
> I'm continuing discussions with former Intel colleagues. If you are
> asking for insight into how Intel priorities vary over time, I don't
> expect much, unless the next beta compiler provides some inferences.
> They have talked about implementing all of OpenMP 4.0 except user
> defined reduction this year. That would imply more activity in that
> area than on cilkplus,

I'm expecting this. Any proposal to support Cilk in LLVM would be purely temporary and not endorsed in any way.

> although some fixes have come in the latter.
> On the other hand I had an issue on omp simd reduction(max: ) closed
> with the decision "will not be fixed."

We still haven't got pragmas for induction/reduction logic, so I'm not too worried about them.

> I have an icc problem report in on fixing omp simd safelen so it is
> more like the standard and less like the obsolete pragma simd vectorlength.

Our width metadata is slightly different in that it means "try to use that length" rather than "it's safe to use that length", which is why I'm holding off on using safelen for the moment.

> Also, I have some problem reports active attempting to get
> clarification of their omp target implementation.

Same here... RTFM is not enough in this case. ;)

> You may have noticed that omp parallel for simd in current Intel
> compilers can be used for combined thread and simd parallelism,
> including the case where the outer loop is parallelizable and
> vectorizable but the inner one is not.

That's my fear of going with omp simd directly. I don't want to be throwing threads all over the place when all I really want is vector code.

For the time being, my proposal is to use the legacy pragmas: vector/novector, unroll/nounroll and simd vectorlength, which map nicely to the metadata we already have and don't incur OpenMP overhead. Later on, if OpenMP ends up with simple non-threaded pragmas, we should use those and deprecate the legacy ones.

If GCC is trying to do the same thing regarding non-threaded vector code, I'd be glad to be involved in the discussion. Some LLVM folks think this should be an OpenMP discussion; I personally think it's pushing the boundaries a bit too much on an inherently threaded library extension.

cheers,
--renato