On Tue, 11 Sep 2012, Richard Guenther wrote:
On Tue, Sep 11, 2012 at 10:41 AM, Richard Guenther
<richard.guent...@gmail.com> wrote:
On Mon, Sep 10, 2012 at 6:37 PM, Richard Henderson <r...@redhat.com> wrote:
Whether or not the compiler creates a clone COULD BE totally up to the
compiler, based on whether or not vectorization is enabled, whether the
loop has been analyzed such that vectorization may proceed, or indeed
the phase of the moon.
But in order for that to happen, the clone must be totally private to
the module for which we are generating code (in the LTO sense, this is
the entire program or dll; without LTO, this is just the object file).
It means that we never attempt to generate clones for functions for
which the body of the function is not visible.
On the other hand, if you insist on assuming a clone exists merely
because a declaration bears an attribute, then you must address ALL
of the problems with respect to defining a stable ABI in the face of
different cpu revisions, different ISAs, and different vector lengths.
I've not seen you address ANY of these problems, despite having the
problem pointed out multiple times.
Indeed, if the definition of an elemental function is always visible to the
vectorizer, the vectorizer itself can instruct the creation of the clone
if it does not already exist (just make those clones managed by the
callgraph). Then the clones are visible to the current TU only and no
ABI issues exist (though you could say that the vectorizer or the inliner
could as well force inlining of elemental functions into places it wants to
vectorize - one complication even with local clones is that the x86 ABI
has no callee-saved XMM registers which makes function calls inside
loops especially expensive).
I thought gcc wouldn't use the x86 ABI for those private calls. I guess
what I remember were vague discussions and not a description of the
current status...
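To make the "clone visible to the current TU only" idea concrete, here is a minimal sketch in GNU C. All names (poly, poly_v2, apply) are hypothetical, and the 2-wide clone is written by hand only to stand in for what the compiler might generate; it assumes GCC or Clang with the vector_size extension. Because both functions are static, nothing about the clone leaks outside the object file.

```c
#include <string.h>

/* GNU C vector extension: two doubles in one 16-byte vector. */
typedef double v2df __attribute__((vector_size(16)));

/* Scalar "elemental" function; its body is visible in this TU. */
static double poly(double x) { return x * x + 1.0; }

/* Hand-written stand-in for a compiler-generated 2-wide clone.
   Internal linkage: no other TU can call it, so no stable ABI is implied. */
static v2df poly_v2(v2df x) { return x * x + 1.0; }

/* A loop over poly() rewritten the way a vectorizer could, using the
   local clone for full vectors and the scalar body for the tail. */
void apply(double *a, int n)
{
    int i = 0;
    for (; i + 2 <= n; i += 2) {
        v2df v;
        memcpy(&v, a + i, sizeof v);   /* unaligned-safe load */
        v = poly_v2(v);
        memcpy(a + i, &v, sizeof v);   /* unaligned-safe store */
    }
    for (; i < n; i++)
        a[i] = poly(a[i]);
}
```

Whether such a private call would follow the normal x86 calling convention is exactly the open question above; the sketch says nothing about that.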
Btw, this then happily fits into my suggestion that the "elementalness"
can be autodetected by the compiler simply by means of a proper IPA
pass and thus be fully LTO / whole-program aware. No need for an
attribute (where you'd need to handle the case that the attribute was placed
there by error).
Note that, apart from preventing external calls, it removes this use case:
  __attribute__((vector(4))) double mysqrt(double x) { return sqrt(x); }
  __m256d var;
  mysqrt(var);
I am not sure it is the best way to achieve this, but it is one way. I am
also planning a patch to turn {sqrt(a),sqrt(b)} into sqrt({a,b}) when the
target likes it. And there is a PR asking for a __builtin_math_sqrt.
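For illustration, the {sqrt(a),sqrt(b)} to sqrt({a,b}) idea looks roughly like this at the intrinsics level. This is only a sketch assuming an x86 target with SSE2; the function names sqrt_each and sqrt_packed are made up, and it is not the planned patch.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target */

/* Before: {sqrt(a), sqrt(b)} as two scalar square roots (sqrtsd),
   packed into one vector afterwards. */
static __m128d sqrt_each(double a, double b)
{
    __m128d ra = _mm_sqrt_sd(_mm_setzero_pd(), _mm_set_sd(a));
    __m128d rb = _mm_sqrt_sd(_mm_setzero_pd(), _mm_set_sd(b));
    return _mm_unpacklo_pd(ra, rb);         /* { ra[0], rb[0] } */
}

/* After: sqrt({a, b}) as a single packed square root (sqrtpd). */
static __m128d sqrt_packed(double a, double b)
{
    return _mm_sqrt_pd(_mm_set_pd(b, a));   /* _mm_set_pd takes high, then low */
}
```

Both functions return the same vector; the second one needs a single square-root instruction instead of two plus a shuffle.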
--
Marc Glisse