https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58790
--- Comment #4 from Matthias Kretz (Vir) ---
I'm still not familiar with this part of GCC, but isn't `_2 == { -1, -1, -1, -1
}` equivalent to _1, i.e. it reverses VEC_COND_EXPR? However, if the `==` is
supposed to return a scalar boolean instead
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100716
--- Comment #3 from Matthias Kretz (Vir) ---
Created attachment 50877
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50877&action=edit
proposed patch
Ensure dump_template_decl for function templates never prints template
parameters after
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100763
--- Comment #1 from Matthias Kretz (Vir) ---
Created attachment 50876
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50876&action=edit
proposed patch
dump_type on 'const std::string' should not print 'const string' unless
TFF_UNQUALIFIED_
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
namespace A
{
struct B {};
using C = B;
}
void f(A::B&);
void f(A::C&);
void g(const A::B& b, const A::C& c) {
f(b);
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100716
--- Comment #2 from Matthias Kretz (Vir) ---
I'd like to revise my opinion above. dump_template_decl should never print the
template parameter list of functions. I.e. it should be 'template f()'
not 'template f()'. Because it's also declared wit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100716
--- Comment #1 from Matthias Kretz (Vir) ---
With -fno-pretty-templates both test cases do print the template_parms.
That's because in dump_function_decl, without flag_pretty_templates, t isn't
generalized and thus is not considered a primary t
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
CC: paolo.carlini at oracle dot com
Target Milestone: ---
template
struct A {
template
void f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728
--- Comment #10 from Matthias Kretz (Vir) ---
Is this the same issue:
struct A {
double v;
};
struct B {
double v;
B& operator=(const B& rhs) {
v = rhs.v;
return *this;
}
};
// 10 loads & stores
void f(A& a, const A& b) {
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728
--- Comment #6 from Matthias Kretz (Vir) ---
> I guess I need it for unaligned loads/stores, correct? Otherwise __v4df
> should work everywhere.
1. You can freely reinterpret_cast by value between all the different
[[gnu::vector_size(N)]] types
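A minimal sketch of such a by-value cast, assuming GCC or Clang (the `vector_size` attribute is a compiler extension) and two vector types of the same total size. The aliases `f4` and `i4` and the function `bits_of` are hypothetical names chosen for illustration; they do not come from the bug report.

```cpp
#include <cassert>

// Hypothetical aliases: both vector types occupy 16 bytes.
using f4 [[gnu::vector_size(16)]] = float;  // four floats
using i4 [[gnu::vector_size(16)]] = int;    // four ints

// Reinterpret the bytes of a float vector as an int vector, by value.
// No pointer or reference is involved in the cast.
i4 bits_of(f4 v) {
  return reinterpret_cast<i4>(v);
}
```

Assuming IEEE 754 single precision and 32-bit `int`, element 0 of `bits_of(f4{1.0f, 0.0f, 0.0f, 2.0f})` holds the bit pattern `0x3f800000`.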
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728
--- Comment #4 from Matthias Kretz (Vir) ---
FWIW, using std::experimental::native_simd also does not hoist the
stores out of the loop. However, if you pass d by value and return d, the issue
goes away. So I guess this is an aliasing pessimizatio
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99201
--- Comment #5 from Matthias Kretz (Vir) ---
I reduced it some more:
template
auto
make_tester(const RefF& reffun)
{
return [=](auto in) {
auto&& expected = [&](const auto&... vs) {
if constexpr (sizeof(in) > 0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99201
--- Comment #4 from Matthias Kretz (Vir) ---
Manual reduction which fails with 8-11 and compiles ok with 7:
template
void
test_values_2arg(F&&... fun_pack)
{
(fun_pack(V(), V()), ...);
}
template
auto
make_tester(const TestF&
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99201
--- Comment #3 from Matthias Kretz (Vir) ---
I'll try to find a better reduction.
++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Testcase (reduced with C-Vise from valid code):
template void test_values_2arg(int, int, F... fun_pack) {
[] {}(fun_pack()...);
}
template auto make_tester(TestF, RefF) {
return [](auto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98894
--- Comment #1 from Matthias Kretz (Vir) ---
I already posted a fix on the gcc-patches and libstdc++ lists:
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Remove unnecessary static
assertion. Allow sizeof(8) integer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98834
--- Comment #3 from Matthias Kretz (Vir) ---
Created attachment 50055
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50055&action=edit
unreduced test case
This is the test case I gave to C-Vise. It's already reduced from a more
confusing t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98834
--- Comment #2 from Matthias Kretz (Vir) ---
This is reduced from a larger (4MB) testcase which doesn't have any unused
arguments.
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-pc-linux-gnu
Created attachment 50054
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50054&action=edit
test case
The a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949
--- Comment #8 from Matthias Kretz (Vir) ---
I've been doing a lot of research into the numeric_limits intent/meaning
recently. I also implemented and used alternative interpretations of "has NaN"
and "is IEC559". My conclusion: std::numeric_limi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96600
--- Comment #3 from Matthias Kretz (Vir) ---
I should be more precise. Take this test case:
int e = 69;
int main() {
__ibm128 a = -__builtin_ldexpl(
1.9446689187403240306919491832695730985733566864714824565497322973045558e+00l,
e);
__i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96600
--- Comment #2 from Matthias Kretz (Vir) ---
The runtime modf actually returns a large number. This is not about precision
but about completely bogus values. You can adjust the testcase to:
int e = 69;
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: powerpc64le-*-*
Test case:
int e = 69;
int main
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95493
--- Comment #10 from Matthias Kretz (Vir) ---
(In reply to Richard Biener from comment #7)
> Fixed on trunk sofar.
Is there anything I can do to help get this backported to 10? I applied your patch
on my GCC 10 checkout since you committed it to ma
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95713
--- Comment #6 from Matthias Kretz (Vir) ---
Thank you! I applied the patch (with the necessary context) to the GCC 10
branch and was able to verify that it also fixes my unreduced test cases.
: 10.1.0
Status: UNCONFIRMED
Keywords: ice-on-valid-code
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target
Keywords: wrong-code
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (https://godbolt.org/z/egnkd7), compile with `-O2 -std=c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38470
--- Comment #22 from Matthias Kretz (Vir) ---
(In reply to Matthias Kretz (Vir) from comment #21)
> However, -O2 would still show the warning.
I meant -O0 of course.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38470
Matthias Kretz (Vir) changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case (`-O3`, cf. https://godbolt.org/z/jdfv3r):
#include
#include
#include
using f4 [[gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94343
--- Comment #9 from Matthias Kretz (Vir) ---
(In reply to Jakub Jelinek from comment #8)
> Created attachment 48128 [details]
> gcc10-pr94343.patch
The avx512vl-pr94343.c test should ideally fail because `_mm_andnot_si128
((__m128i) (~v ^ a), (_
: missed-optimization, wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: i386,x86-64
Test case (`-O1 -march=knl`, cf. https
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case `-O1 -march=skylake-avx512`:
int main
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90993
--- Comment #3 from Matthias Kretz (Vir) ---
IIUC, AVX512 only allows overriding the rounding-mode from div instructions. So
that wouldn't help.
What standard requires that "integer division is not permitted to raise the
"inexact" exception flag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919
Matthias Kretz (Vir) changed:
What|Removed |Added
Resolution|FIXED |DUPLICATE
--- Comment #6 from Mat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843
--- Comment #11 from Matthias Kretz (Vir) ---
*** Bug 93919 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919
Matthias Kretz (Vir) changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843
--- Comment #7 from Matthias Kretz (Vir) ---
This one exhibits the issue without -ftree-vectorize (`-O1` suffices) (cf.
https://godbolt.org/z/Swx-jW):
using M [[gnu::vector_size(2)]] = char;
using MM [[gnu::vector_size(4)]] = short;
MM
cvt(M x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93843
Matthias Kretz (Vir) changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93919
--- Comment #4 from Matthias Kretz (Vir) ---
Yes, this is the same issue.
FWIW, a vectorization with SSE4.1 could do:
pxor xmm0, xmm0
pinsrw xmm0, WORD PTR in[rip], 0
pmovsxbw xmm0, xmm0
movd DWORD PTR out[rip], xmm0
Whether that's fast
: wrong-code
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case (https://godbolt.org/z/8QYarZ
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*
Test case (https://godbolt.org/z/ramAe3):
using float2 [[gnu::vector_size(8
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case (https://godbolt.org/z/ic8eXp):
#include
using V [[gnu::vector_size(32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45414
Matthias Kretz (Vir) changed:
What|Removed |Added
Keywords||rejects-valid
--- Comment #2 from
Keywords: rejects-valid
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (https://godbolt.org/z/VaSCCA):
template concept foo =
requires(T&
: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (-std=c++2a):
template concept foo = [](auto) constexpr -> bool { return true;
}(N);
bool a = foo<2>;
Extended test case (use
Keywords: ice-on-valid-code
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (https://godbolt.org/z/_ErsXE):
struct simd {
using _Short8
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (no flags required):
template struct a {
using b = a;
void c() alignas(b::d);
};
This test case fell out of creduce while trying to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357
--- Comment #10 from Matthias Kretz (Vir) ---
(In reply to Jason Merrill from comment #9)
> Fixed for GCC 9.3/10. The patch doesn't apply cleanly to the GCC 8 branch,
> is it important to fix there?
Not important for me.
Thank you for resolvin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838
--- Comment #6 from Matthias Kretz (Vir) ---
FWIW, I'd prefer gnu::vector_size(N) to not introduce any additional UB over
the scalar arithmetic types. I.e. behave as if promotion happened, just
with final assignment back to T (truncation).
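The scalar rule this comment refers to can be sketched as follows. This is an illustration only, not code from the report: the helper name `shl_scalar` is hypothetical, and the sketch assumes 8-bit `char` and 32-bit `int`. The `unsigned char` operand is promoted to `int` before the shift, so a shift count of 8 is well-defined, and converting the result back to `unsigned char` truncates it.

```cpp
// Hypothetical helper (not from the report): shift an unsigned char the way
// scalar C++ arithmetic does it.
constexpr unsigned char shl_scalar(unsigned char x, int n) {
  // x is promoted to int before the shift, so n == 8 stays within the width
  // of the promoted type. The cast back keeps only the low 8 bits.
  return static_cast<unsigned char>(x << n);
}
```

Under these assumptions `shl_scalar(0xff, 8)` yields 0 without undefined behaviour, because the intermediate value `0xff00` fits in `int`.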
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838
--- Comment #4 from Matthias Kretz (Vir) ---
Good point. Since gnu::vector_size(N) types are defined by you, you might be
able to say that for char and short this is also UB. After all the left operand
isn't actually promoted to int. Consequently
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Testcase (cf. https://godbolt.org/z/DMQf9-):
#include
// missed optimization:
__m512 f
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: aarch64-*-*
Compile the following test case for aarch64 with -O2:
#include
int f(float x, float y)
{
std
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case (cf. https://godbolt.org/z/z3TH9F):
#include
using V [[gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91841
--- Comment #4 from Matthias Kretz ---
(In reply to Uroš Bizjak from comment #3)
> [f]emms should be emitted by an intrinsic (_mm_empty), inserted by the
> programmer. The programmer can mix FP and MMX instructions in the same
> function, so ther
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91841
--- Comment #2 from Matthias Kretz ---
Ah, because of:
typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
? To be pedantic only `int [[gnu::vector_size(8)]]` equals __m64. But I see
your point.
I guess clang interprets th
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: i?86-*-*
Test case `g++ -O2 -m32` (cf. https://godbolt.org/z/RDUZo9):
#include
using T = unsigned short;
using V [[gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838
--- Comment #1 from Matthias Kretz ---
https://godbolt.org/z/zxmCTz
Keywords: missed-optimization, wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*
Test case:
using T = unsigned char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85482
--- Comment #3 from Matthias Kretz ---
Seems like trunk (10.0.0 20190910) resolves the issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538
Matthias Kretz changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #5 from Matthias Kretz ---
> So for #c3 you are essentially asking for a .rodata size optimization.
Comment #1 also does so, no? But yes, this is a .rodata optimization and thus
potentially a visible reduction on cache pressure. Cons
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #3
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case (cf. https://godbolt.org/z/IfL1mF):
using V [[gnu
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Testcase (cf. https://godbolt.org/z/xBEtqT
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case (https://godbolt.org/z/CYipz7):
template using V [[gnu::vector_size(16)]] = T;
V f(V a, V b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88918
Bug 88918 depends on bug 56253, which changed state.
Bug 56253 Summary: fp-contract does not work with SSE and AVX FMAs (neither
FMA4 nor FMA3)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
What|Removed |A
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
Matthias Kretz changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752
Matthias Kretz changed:
What|Removed |Added
Known to work||7.4.0, 9.1.0
Known to fail|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58790
Matthias Kretz changed:
What|Removed |Added
Version|4.9.0 |10.0
--- Comment #2 from Matthias Kretz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90483
--- Comment #1 from Matthias Kretz ---
https://godbolt.org/z/7BFMdG (for quick verification)
ssed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Testcase (cf. https://godbolt.org/z/7NiU7O):
#inc
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
The (V)PTEST instruction of SSE4.1/AVX produces ZF = `(a & b) == 0` and CF =
`(~a & b) == 0`. Generic usage
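The two flag definitions quoted above can be modelled in plain C++ on a single 64-bit word. This is only a sketch of the semantics as stated in the report, not of the instruction itself, which operates on 128-bit or 256-bit registers. The names `PtestFlags` and `ptest_model` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical scalar model of the (V)PTEST flag results on one 64-bit word.
struct PtestFlags {
  bool zf;  // set when (a & b) == 0
  bool cf;  // set when (~a & b) == 0
};

constexpr PtestFlags ptest_model(std::uint64_t a, std::uint64_t b) {
  return {(a & b) == 0, (~a & b) == 0};
}
```

In this model, `ptest_model(x, x).zf` is true exactly when `x` is all zeros, and `ptest_model(x, ~std::uint64_t{0}).cf` is true exactly when `x` is all ones.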
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90460
--- Comment #1 from Matthias Kretz ---
PR85048 and PR77399 are related
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90424
--- Comment #2 from Matthias Kretz ---
FWIW, I agree that "bit-inserting into a default-def" isn't a good idea. My
code, in the meantime, looks more like this (https://godbolt.org/z/D-yfZJ):
template
using V [[gnu::vector_size(16)]] = T;
templ
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Testcase (cf. https://godbolt.org/z/LsKcii):
template
using V [[gnu::vector_size(16)]] = T;
template
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88152
Matthias Kretz changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (https://godbolt.org/z/34KB20):
struct Z {
int y
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88066
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357
--- Comment #2 from Matthias Kretz ---
I agree. The corresponding C test case produces equivalent f0 and f1:
void g(int*);
void f0() {
__attribute__((aligned(128))) int x;
g(&x);
}
void f1() {
_Alignas(128) int x;
g(&x);
}
And I agree
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: aarch64-*-*, arm-*-*
Test case (cf. https://godbolt.org/z/ubJge4):
void g(int &);
auto f0() {
__attribu
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (cf. https://godbolt.org/z/RFrftn):
#include
template
void g(T &&x) {
x = 1;
}
auto f(const __Int8x8_t &x) {
g(x[0]);
//x[0] = 1; // ill-formed
}
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Testcase `-O2 -msse2`, further missed optimization with SSSE3 / SSE4.1 (cf.
https://godbolt.org/z
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24073
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88854
--- Comment #7 from Matthias Kretz ---
(In reply to rguent...@suse.de from comment #5)
> Yeah, we do not perform this kind of "flow-sensitive" TBAA. So
> when trying to DSE *a = x; we only look at
>
> int x = *a;
> *b = 1;
> *a =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88854
--- Comment #6 from Matthias Kretz ---
Regarding gcc.dg/tree-ssa/ssa-pre-30.c
I'd argue that for `bar`, GCC may assume b == 0, because otherwise f would be
read both via int and float pointer, which is UB. So bar can be optimized to
`foo` shows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88854
--- Comment #4 from Matthias Kretz ---
Another test case, which the patch doesn't optimize:
short f(int *a, short *b) {
short y = *b; // 1
int x = *a; // 2
*b = 1;
*a = x;
return y;
}
The loads in 1+2 are either UB or a an
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
CC: rguenth at gcc dot gnu.org
Target Milestone: ---
Test cases:
This is optimized at -O1 and with GCC 5 at -O2. -fdisable-tree-fre1 and
-fno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #10 from Matthias Kretz ---
Experience from testing my simd implementation:
I had failures (2 ULP deviation from long double result) when using
auto __xx = abs(__x);
auto __yy = abs(__y);
auto __zz = abs(__z
-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case (https://godbolt.org/z/gyCN12):
#include
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517
--- Comment #4 from Matthias Kretz ---
A similar test case showing that something is still missing
(https://gcc.godbolt.org/z/t1DT7E):
#include
inline __m128i cmp(__m128i x, __m128i y) {
return _mm_cmpeq_epi16(x, y);
}
inline unsigned to_b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80517
Matthias Kretz changed:
What|Removed |Added
Version|8.0 |9.0
--- Comment #3 from Matthias Kretz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #9 from Matthias Kretz ---
(In reply to emsr from comment #7)
> What does this do?
>
> auto __hi_exp =
> __hi & simd<_T, _Abi>(std::numeric_limits<_T>::infinity()); // no error
component-wise bitwise and of __hi and +inf. Or i
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case:
```
#include
__m128 f(__m128 x, __m128 &y) {
y = _mm_fixupimm_ps(x, _mm_set1_epi32(0x), 0x00);
retu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #6 from Matthias Kretz ---
(In reply to Marc Glisse from comment #4)
> Your "reference" number seems strange. Why not do the computation with
> double (or long double or mpfr) or use __builtin_hypotf? Note that it
> changes the value.
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Created attachment 45398
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45398&action=edit
reduced test case
Compile the attached test case with `-g -O2 -std=gnu++17 -march=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752
Matthias Kretz changed:
What|Removed |Added
Attachment #45376|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752
Matthias Kretz changed:
What|Removed |Added
Attachment #45375|0 |1
is obsolete|
: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Created attachment 45375
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45375&action=edit
not-reduced test case
Compile attached test case with `-std=gnu++17
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052
--- Comment #12 from Matthias Kretz ---
(In reply to Jakub Jelinek from comment #11)
> [...] though for 8x conversions we
> are e.g. on x86 already outside of the realm of natively supported vectors
> (we don't really want MMX and for 1024 bit an
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052
--- Comment #9 from Matthias Kretz ---
(In reply to Devin Hussey from comment #7)
> Wait, silly me, this isn't about optimizations, this is about patterns.
Regarding optimizations, PR85048 is a first step (it lists all x86
single-instruction SIM