[Bug tree-optimization/103797] Clang vectorized LightPixel while GCC does not

ubizjak at gmail dot com via Gcc-bugs Thu, 23 Dec 2021 00:12:44 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797


--- Comment #12 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #10)
> At least on your short testcase clang doesn't use divps either.
> We do support mulv2sf3, addv2sf3 etc. but not divv2sf3 I bet because with
> TARGET_MMX_WITH_SSE it would divide by zero in the 3rd and 4th elts,
> but perhaps we could insert 1.0f, 1.0f into those elements of the divisor
> before using divps?

It could be done, but I was under the impression that the sequence to load 1.0f
into the topmost elements nullifies the benefit of a vector operation that
divides only two elements.  However, if the missing pattern prevents longer
vectorized chains, this is not entirely true.

The division can be implemented in the same way as sse_cvtps2pi, but using
a CONST1_RTX vector.
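
To illustrate the padding idea at the source level, here is a minimal sketch
using GCC's generic vector extensions.  It is not the machine description
pattern itself; the helper name div_v2sf_padded is hypothetical, and the
assumption is that the two unused upper lanes of the dividend hold zero.

```c
/* Four-float vector type, as provided by the GCC/Clang vector extensions. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: divide a two-element vector (a0, a1) by (b0, b1)
   using a single four-lane division.  */
static v4sf div_v2sf_padded(float a0, float a1, float b0, float b1)
{
    /* Assumption: the unused upper lanes of the dividend are zero.  */
    v4sf num = { a0, a1, 0.0f, 0.0f };

    /* Pad the upper lanes of the divisor with 1.0f, so that lanes 2 and 3
       compute 0.0f / 1.0f instead of 0.0f / 0.0f.  */
    v4sf den = { b0, b1, 1.0f, 1.0f };

    /* Only lanes 0 and 1 of the result are meaningful to the caller.  */
    return num / den;
}
```

With these inputs the upper lanes evaluate to 0.0f and raise no floating-point
exception; whether the extra work of materializing the 1.0f lanes pays off is
the trade-off discussed above.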
