[Bug tree-optimization/94718] New: Failure to optimize opposite signs check

2020-04-22 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- bool f(int x, int y) { return (x < 0) != (y < 0); } `(x < 0) != (y < 0)` can be optimized to `(x ^ y) < 0`. This transformation is done by clang,
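A minimal sketch of the equivalence claimed in this report, assuming two's-complement `int`. The function names `opposite_signs_cmp` and `opposite_signs_xor` are hypothetical and used only for illustration.

```c
#include <stdbool.h>

/* Form from the report: compare the two sign tests. */
bool opposite_signs_cmp(int x, int y) { return (x < 0) != (y < 0); }

/* Suggested form: the sign bit of x ^ y is set exactly when
   the sign bits of x and y differ. */
bool opposite_signs_xor(int x, int y) { return (x ^ y) < 0; }
```

Both functions should agree for every pair of inputs, including zero, which counts as non-negative.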

[Bug tree-optimization/94779] New: Bad optimization of simple switch

2020-04-26 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f1(unsigned x) { switch (x) { case 0: return 1; case 1: return 2; } } gcc fails to optimize this to `return x + 1

[Bug tree-optimization/94782] New: Simple multiplication-related arithmetic not optimized to direct multiplication

2020-04-26 Thread gabravier at gmail dot com
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int a, int b) { return (int)((a - 1U) * b) + b; } Can be optimized to `a * b`. LLVM does this
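A small sketch of why the expression reduces to a plain multiplication: in modular unsigned arithmetic, (a - 1) * b + b equals a * b. The names `mul_roundabout` and `mul_direct` are hypothetical; the conversion back to `int` is assumed to wrap, as it does with GCC and Clang.

```c
/* Form from the report: the multiplication is done in unsigned
   arithmetic, then b is added back. */
int mul_roundabout(int a, int b) { return (int)((a - 1U) * b) + b; }

/* Suggested simplification. */
int mul_direct(int a, int b) { return a * b; }
```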

[Bug tree-optimization/94783] New: Abs-equivalent pattern is not recognized as abs

2020-04-26 Thread gabravier at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- unsigned r(int v) { const int mask = v >> (sizeof(int) * CHAR_BIT - 1); return (v + mask) ^ mask; } This can be optimized to `return abs(v)`
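A sketch of the mask-based pattern next to the library call, assuming an arithmetic right shift on negative `int` values (implementation-defined in C, but what GCC and Clang do). The names `abs_mask` and `abs_lib` are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Form from the report: mask is 0 for non-negative v and -1 for negative v,
   so (v + mask) ^ mask negates v only when it is negative. */
unsigned abs_mask(int v)
{
    const int mask = v >> (sizeof(int) * CHAR_BIT - 1);
    return (v + mask) ^ mask;
}

/* Suggested form. */
unsigned abs_lib(int v) { return abs(v); }
```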

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779 --- Comment #2 from Gabriel Ravier --- It's fully optimized ? I don't see how. This is exactly what I was complaining about : It could be further optimized to leal 1(%rdi), %eax ret but it isn't

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779 --- Comment #3 from Gabriel Ravier --- Just fyi : When I said "gcc fails to optimize this to `return x + 1`, instead opting for some rather weird code generation (involving `sbb` on x86)" the "weird code generation" I was referring to is the exac

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779 --- Comment #5 from Gabriel Ravier --- Going to take a quick look at how it gets optimized in the tree passes. This is the first case : int f1(unsigned x) { if (x >= 2) __builtin_unreachable(); switch (x) { case 0:

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779 --- Comment #6 from Gabriel Ravier --- There is another thing I realised : This code : int f1(unsigned x) { switch (x) { case 0: return 1; case 1: return 2; case 2: return 3;

[Bug tree-optimization/94779] Bad optimization of simple switch

2020-04-26 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779 --- Comment #8 from Gabriel Ravier --- Also, this code : int f1(unsigned x) { if (x >= 3) __builtin_unreachable(); switch (x) { case 0: return 1; case 1: return 2; case 2:

[Bug tree-optimization/94785] New: Failure to detect abs pattern using multiplication

2020-04-27 Thread gabravier at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- unsigned r(int v) { return (1 | -(v < 0)) * v; } `r` is equivalent to `abs(v)`. GCC does not make the transformation to an `abs`. Example of

[Bug tree-optimization/94786] New: Missed min/max pattern using xor+and+less

2020-04-27 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int r1(int x, int y) { return y ^ ((x ^ y) & -(x < y)); } int r2(int x, int y) { return x ^ ((x ^ y) & -(x < y)); } `r1` can be optimized to

[Bug tree-optimization/94787] New: Failure to detect single bit popcount pattern

2020-04-27 Thread gabravier at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- bool f(unsigned v) { return v && !(v & (v - 1)); } Depending on the supported architecture, we may want to optimize this to `__builtin_popc

[Bug tree-optimization/94787] Failure to detect single bit popcount pattern

2020-04-27 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94787 --- Comment #1 from Gabriel Ravier --- Conversely, I'd also suggest the opposite transformation. That is, if there is no hardware popcount instruction, `__builtin_popcount(v) == 1` should be optimized to `v && !(v & (v - 1))`
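The two forms discussed in this thread can be sketched side by side. `__builtin_popcount` is the GCC/Clang builtin named in the report; the wrapper names `single_bit_mask` and `single_bit_popcount` are hypothetical.

```c
#include <stdbool.h>

/* Bit-trick form: v & (v - 1) clears the lowest set bit, so the result
   is zero only when v has at most one bit set; the v && test rejects 0. */
bool single_bit_mask(unsigned v) { return v && !(v & (v - 1)); }

/* Popcount form: exactly one bit set. */
bool single_bit_popcount(unsigned v) { return __builtin_popcount(v) == 1; }
```

Which form is cheaper presumably depends on whether the target has a hardware popcount instruction, which is the point the comment makes.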

[Bug tree-optimization/94789] New: Failure to take advantage of shift operand semantics to turn subtraction into negate

2020-04-27 Thread gabravier at gmail dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int r(int x, unsigned b) { int const m = CHAR_BIT * sizeof(x) - b; return (x << m); } `CH

[Bug tree-optimization/94790] New: Failure to use andn in specific pattern in which it is available

2020-04-27 Thread gabravier at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- unsigned r1(unsigned a, unsigned b, unsigned mask) { return a ^ ((a ^ b) & mask); } unsigned r2(unsigned a, unsign

[Bug tree-optimization/94793] New: Failure to optimize clz idiom

2020-04-27 Thread gabravier at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- unsigned r1(unsigned v) { unsigned r = 0; while (v >>= 1) r++; return r; } This can be optimized to `32 - __builtin_clz(v >> 1);`. LL

[Bug rtl-optimization/94795] New: Failure to use fast sbb method on x86 for spreading any set bit to all bits

2020-04-27 Thread gabravier at gmail dot com
: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int isNonzero(int x) { if (x == 0) return 0x; else return 0x; } On x86

[Bug target/94789] Failure to take advantage of shift operand semantics to turn subtraction into negate

2020-04-27 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94789 --- Comment #3 from Gabriel Ravier --- From what I've seen, this optimisation could be useful on at least these targets : - x86_64 - i686 - aarch64 On other architectures I've looked at, either the optimization can't be done and/or it's useles

[Bug rtl-optimization/94796] New: Failure to reuse flags from subtraction

2020-04-27 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int a, int b) { return ((a == b) & (a - b)); } The `a == b` is able to use condition flags resulting from `a - b`, and thus avoid an extra compare. LLVM

[Bug rtl-optimization/94798] New: Failure to optimize subtraction and 0 literal properly

2020-04-27 Thread gabravier at gmail dot com
Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int a, int b) { return (b >= a) ? (b - a) : 0; } Generates some *really* bad code with GCC right now, it seems to forget such basic things

[Bug tree-optimization/94800] New: Failure to optimize yet another popcount idiom

2020-04-27 Thread gabravier at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int populationCount(uint32_t x) { x = x - ((x >> 1) & 0x); x = (x & 0x) + ((x >> 2) & 0x); x = (x

[Bug tree-optimization/94801] New: Failure to optimize narrowed __builtin_clz

2020-04-27 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int a) { return __builtin_clz(a) >> 5; } Can be optimized to `return 0;`. This transformation is done by LLVM, but not by GCC. Comparison here :

[Bug tree-optimization/94802] New: Failure to recognize identities with __builtin_clz

2020-04-27 Thread gabravier at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- bool f(int a, int b) { return __builtin_clz(a - b); } This is equivalent to `return a >= b`. This transformation is done by LLVM, but not by

[Bug tree-optimization/94802] Failure to recognize identities with __builtin_clz

2020-04-27 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94802 --- Comment #1 from Gabriel Ravier --- There are also patterns like `__builtin_clz(a - b) == 31`, which can be optimized to `(a - b) == 1`

[Bug tree-optimization/94801] Failure to optimize narrowed __builtin_clz

2020-04-27 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94801 --- Comment #4 from Gabriel Ravier --- Isn't `__builtin_clz(0)` undefined ?

[Bug rtl-optimization/94798] Failure to optimize subtraction and 0 literal properly

2020-04-27 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94798 --- Comment #2 from Gabriel Ravier --- Ok, will do that in the future. Considering I was just linking to godbolt every time for the assembly code, should I go back to all the other bug reports that I've made to upload assembly code there too ?

[Bug rtl-optimization/94795] Failure to use fast sbb method on x86 for spreading any set bit to all bits

2020-04-27 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94795 --- Comment #2 from Gabriel Ravier --- I can also provide a very similar function for which such an optimization could be helpful : int f(int x) { return -(x == 0); } LLVM optimises that function to this : f(int): cmp edi, 1

[Bug rtl-optimization/94804] New: Failure to elide useless movs in 128-bit addition

2020-04-27 Thread gabravier at gmail dot com
: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- using i128 = __int128; i128 add128(i128 a, i128 b) { return a + b; } This is how LLVM handles this code : add128(__int128, __int128): mov rax, rdi

[Bug rtl-optimization/94806] New: Failure to optimize unary minus for 128-bit operand

2020-04-27 Thread gabravier at gmail dot com
Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- __int128 f(__int128 x) { return -x; } It appears that unary minus is badly optimized by GCC. This is what LLVM outputs for this : f(__int128

[Bug rtl-optimization/94804] Failure to elide useless movs in 128-bit addition

2020-04-27 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804 --- Comment #1 from Gabriel Ravier --- For subtraction, it's even worse. using i128 = __int128; i128 sub128(i128 a, i128 b) { return a - b; } results in sub128(__int128, __int128): mov rax, rdi sub rax, rdx sbb rsi, rcx mov rdx,

[Bug rtl-optimization/94804] Failure to elide useless movs in 128-bit addition

2020-04-28 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804 --- Comment #3 from Gabriel Ravier --- So, things like uint64_t swap64(uint64_t x) { uint64_t a = __builtin_bswap32(x); x >>= 32; a <<= 32; return __builtin_bswap32(x) | a; } Having similar problems with useless movs is from th

[Bug tree-optimization/94824] New: Failure to optimize with __builtin_bswap32 as well as with a function recognized as such

2020-04-28 Thread gabravier at gmail dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- uint32_t swap32(uint32_t x) { return ((x << 24) | ((x << 8) & 0x00FF

[Bug tree-optimization/94828] New: Failure to merge merge-able loops

2020-04-28 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- void f(int *__restrict a, int *__restrict b, size_t sz) { for (int i = 0; i < sz; ++i) a[i] += b[i]; for (int i = 0; i < sz; ++i) a[i]

[Bug tree-optimization/94834] New: Failure to optimize loop bswap pattern

2020-04-28 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- uint32_t load(const uint8_t* data) { uint32_t val = 0; for (int i = 0; i < sizeof(val) * CHAR_BIT; i += CHAR_BIT) { val |= *data++ &

[Bug tree-optimization/94836] New: Failure to optimize condition based on known value of static variable

2020-04-28 Thread gabravier at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x) { static int s; if (s) s = x; return s; } This can be optimized to `return 0`. This

[Bug rtl-optimization/94837] New: Failure to optimize out spurious movbe into bswap

2020-04-28 Thread gabravier at gmail dot com
: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- float swapFloat(float x) { union { float f; uint32_t u32; } swapper; swapper.f = x; swapper.u32 = __builtin_bswap32

[Bug rtl-optimization/94838] New: Failure to optimize out useless zero-ing after register was already zero-ed

2020-04-28 Thread gabravier at gmail dot com
: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(bool b, int *p) { return b && *p; } GCC generates this with -O3: f(bool, int*): xor eax, ea

[Bug target/94838] Failure to optimize out useless zero-ing after register was already zero-ed

2020-04-28 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94838 Gabriel Ravier changed: What|Removed |Added Target|x86_64-linux-gnu|x86_64-* i?86-*-* --- Comment #2 from G

[Bug target/94838] Failure to optimize out useless zero-ing after register was already zero-ed

2020-04-28 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94838 --- Comment #3 from Gabriel Ravier --- This also occurs on i68* : f(bool, int*): xor eax, eax ; Already 0 cmp BYTE PTR [esp+4], 0 je .L1 mov eax, DWORD PTR [esp+8] ; Could use different caller-saved register such as ecx or edx mov eax

[Bug target/94838] Failure to optimize out useless zero-ing after register was already zero-ed

2020-04-28 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94838 --- Comment #4 from Gabriel Ravier --- Oops, seems like there was a weird collision. Don't pay attention to the second to last comment before this one, it's identical to the last comment before this one except for a single comment being added in

[Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap

2020-04-29 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94837 --- Comment #2 from Gabriel Ravier --- This is what I get with `-O3 -mmovbe -mtune=intel` : swapFloat(float): movd DWORD PTR [rsp-4], xmm0 movbe eax, DWORD PTR [rsp-4] movd xmm0, eax ret This seems erroneous

[Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap

2020-04-29 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94837 --- Comment #3 from Gabriel Ravier --- Also, I've tested the code from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593 and the optimization in question is no longer applied with `-mtune=generic`, only with specific architectures like `-mtune=k8`

[Bug tree-optimization/94834] Failure to optimize loop bswap pattern

2020-04-29 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94834 --- Comment #2 from Gabriel Ravier --- Now I wonder why the unrolling happens too late since there was 1 ecp check that should happen after the unrolling, from my understanding. Are the multiple ecp passes detecting different things?

[Bug tree-optimization/94846] New: Failure to optimize jnc+inc into adc

2020-04-29 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- unsigned f(unsigned *p, unsigned x) { unsigned u = *p; *p += x; if (u > *p) ++*p; return *p; } This is what LLVM outputs with -O3 : f(unsig

[Bug rtl-optimization/94846] Failure to optimize jnc+inc into adc

2020-04-29 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94846 Gabriel Ravier changed: What|Removed |Added Target|x86_64-* i?86-*-* | --- Comment #1 from Gabriel Ravier --

[Bug rtl-optimization/94846] Failure to optimize jnc+inc into adc

2020-04-29 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94846 --- Comment #2 from Gabriel Ravier --- More notes : This seems to be generic to all targets, I've also been able to verify it on ARM. This only occurs when p is a pointer. This code : unsigned f(unsigned p, unsigned x) { unsigned u = p;

[Bug rtl-optimization/94850] New: Failure to optimize operation corresponding to shrd to shrd

2020-04-29 Thread gabravier at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- struct testStruct { uint64_t a; uint64_t b; }; uint64_t f(testStruct t, int x) { return ((t.a << (

[Bug rtl-optimization/94850] Failure to optimize operation corresponding to shrd to shrd

2020-04-29 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94850 --- Comment #1 from Gabriel Ravier --- PS : The same optimization can apply to i686, just replace all occurrences of "64" with "32" and you could use shld/shrd there too

[Bug rtl-optimization/94857] New: Failure to optimize load+add+store into add on memory when getting carry flag afterwards on x86

2020-04-29 Thread gabravier at gmail dot com
Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- bool f(unsigned *p, unsigned x) { unsigned u = *p; *p += x; return u > *p; } W

[Bug rtl-optimization/94860] New: Failure to recognize bzhi pattern

2020-04-29 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- uint32_t bzhi32(uint32_t x, uint32_t y) { return ((x << (32 - y)) >> (32 - y)); } LLVM with -O3 -mbmi2 optimizes this to : bzhi32(unsigned int, unsigned int

[Bug tree-optimization/94861] New: Don't make undefined values 0

2020-04-29 Thread gabravier at gmail dot com
ation Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f() { int x = x; return x; } LLVM compiles this to a return instruction, not bothering to initialize the result register as its value is undefined. GCC instead

[Bug rtl-optimization/94863] New: Failure to use blendps over mov when possible

2020-04-29 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- typedef double v2df __attribute__((vector_size(16))); v2df move_sd(v2df a, v2df b) { v2df result = a; result[0] = b[0]; return result; } LLVM -O3

[Bug tree-optimization/94864] New: Failure to combine vunpckhpd+movsd into single vunpckhpd

2020-04-29 Thread gabravier at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- typedef double v2df __attribute__((vector_size(16))); v2df move_sd(v2df a, v2df b) { v2df result = a; result[0] = b[1

[Bug tree-optimization/94865] New: Failure to combine unpckhpd+unpcklpd into blendps

2020-04-29 Thread gabravier at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- typedef double v2df __attribute__((vector_size(16))); v2df move_sd(v2df a, v2df b) { v2df result = a; result[1] = b[1]; return result; } With

[Bug tree-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

2020-04-29 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864 --- Comment #1 from Gabriel Ravier --- Note : The compilation options were `-O3 -mavx`

[Bug rtl-optimization/94863] Failure to use blendps over mov when possible

2020-04-29 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94863 --- Comment #1 from Gabriel Ravier --- Note: The given outputs for LLVM and GCC are when compiling with `-O3 -msse4.1`

[Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq

2020-04-29 Thread gabravier at gmail dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- typedef int64_t v2di __attribute__((vector_size(16))); typedef int32_t v2si __attribute__((vector_size(8))); v2di _mm_move_epi64(v2di a) { return v2di{a[0], 0LL

[Bug target/94863] Failure to use blendps over mov when possible

2020-04-30 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94863 --- Comment #3 from Gabriel Ravier --- For binary size, the `movsd` takes 4 bytes and the `blendps` takes 6 bytes The port allocations for the instructions are as such (same formatting as for the throughputs) : Wolfdale: p5, p015 Nehalem: p5,

[Bug target/94870] New: Failure to use movhlps instead of separated mov+unpckhpd

2020-04-30 Thread gabravier at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- typedef double v2df __attribute__((vector_size(16))); v2df _mm_sqrt_sd(v2df a, v2df b) { v2df c = __builtin_ia32_sqrtpd((v2df){b[0], b[1

[Bug target/94871] New: Failure to convert cmpeqpd+pxor with -1 into cmpneqpd

2020-04-30 Thread gabravier at gmail dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- typedef double v2df __attribute__((vector_size(16))); typedef int64_t v2di __attribute__((vector_size(16))); typedef int8_t v16qi __attribute__((vector_size(16

[Bug tree-optimization/94872] New: Failure to optimize shuffle from u32 array into u64 array properly

2020-04-30 Thread gabravier at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- union u64Elems { uint64_t as_u64; int32_t as_i32[2]; }; uint64_t f(u64Elems m1, u64Elems m2) { u64Elems res

[Bug tree-optimization/94877] New: Failure to simplify ~(x + 1) to -2 - x

2020-04-30 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x) { return ~(x + 1); } With -O3, LLVM outputs this : f(int): # @f(int) mov eax, -2 sub eax, edi ret GCC outputs this : f(int): lea eax, [rdi+1

[Bug tree-optimization/94878] New: Failure to optimize div with bls/or pattern

2020-04-30 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- unsigned y(unsigned x) { unsigned s = x & -x; return s | (x % s); } With -O3, LLVM outputs : y(unsigned int): blsi ecx, edi lea eax, [rcx - 1]

[Bug tree-optimization/94878] Failure to optimize div with bls/or pattern

2020-04-30 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94878 --- Comment #1 from Gabriel Ravier --- Also, the assembly outputs are for when compiling with `-mbmi` but that should not affect the bug itself

[Bug tree-optimization/94880] New: Failure to recognize andn pattern

2020-04-30 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x, int y) { return (x | y) - y; } This can be optimized to a single andn : f(int, int): # @f(int, int) andn eax, esi, edi ret (LLVM output with -O3 -mbmi
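A sketch of the identity behind this report: subtracting `y` from `x | y` removes exactly the bits contributed by `y`, leaving `x & ~y`, which is what a single `andn` computes. The names `or_minus` and `and_not` are hypothetical.

```c
/* Form from the report. */
int or_minus(int x, int y) { return (x | y) - y; }

/* Equivalent and-not form: the bits of x that are not set in y. */
int and_not(int x, int y) { return x & ~y; }
```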

[Bug tree-optimization/94882] New: Failure to optimize and+or+sub into xor+not

2020-04-30 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x, int y) { return (x & y) - (x | y) - 1; } This can be optimized to `~(x ^ y)`. LLVM does this transformation, but GCC does not.
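A sketch of the claimed simplification. Since `(x | y) - (x & y)` equals `x ^ y`, the expression is `-(x ^ y) - 1`, which is `~(x ^ y)` in two's complement. The function names are hypothetical.

```c
/* Form from the report. */
int and_or_sub(int x, int y) { return (x & y) - (x | y) - 1; }

/* Suggested form. */
int xor_not(int x, int y) { return ~(x ^ y); }
```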

[Bug tree-optimization/94884] New: Failure to recognize that result of or is always superior to operands

2020-04-30 Thread gabravier at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- bool decide() __attribute((const)); inline unsigned getXOrY(unsigned x, unsigned y) { return decide() ? y : x

[Bug tree-optimization/94884] Failure to recognize that result of or is always superior to operands

2020-04-30 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94884 --- Comment #1 from Gabriel Ravier --- `f` can also be translated to `return true;` when it's this : bool f(unsigned x, unsigned y) { return (x & y) <= getXOrY(x, y); }

[Bug tree-optimization/94884] Failure to recognize that result of or is always superior to operands

2020-04-30 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94884 --- Comment #2 from Gabriel Ravier --- Also when it is this : bool f(unsigned x, unsigned y) { return x <= (x | ~y); }

[Bug c/94889] Negate function not getting optimised to negate call

2020-04-30 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94889 Gabriel Ravier changed: What|Removed |Added CC||gabravier at gmail dot com --- Comment

[Bug c/94889] Negate function not getting optimised to bitwise not

2020-04-30 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94889 --- Comment #4 from Gabriel Ravier --- Investigated it a bit. It looks like with `-mavx2` the pcom pass decides to vectorize the loop, and it then later gets mowed down into a `~`.

[Bug tree-optimization/94878] Failure to optimize div with bls/or pattern

2020-04-30 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94878 Gabriel Ravier changed: What|Removed |Added Target|x86_64-*-* i?86-*-* | --- Comment #3 from Gabriel Ravier --

[Bug tree-optimization/94892] New: (x >> 31) + 1 not getting narrowed to compare

2020-04-30 Thread gabravier at gmail dot com
ponent: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- inline int sign(int x) { return (x >> 31) | ((unsigned)-x >> 31); } bool f(int x) { return sign(x) > -1; } With -O3, LLVM produces t

[Bug tree-optimization/94893] New: Sign function not getting optimized to simple compare

2020-04-30 Thread gabravier at gmail dot com
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- inline int sign(int x) { return (x >> 31) | ((unsigned)-x >> 31); } bool f(int x) { return sign(x) < 1; } With -O3, LLVM o

[Bug target/94892] (x >> 31) + 1 not getting narrowed to compare

2020-04-30 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94892 --- Comment #2 from Gabriel Ravier --- In that case, then, GCC is generating sub-optimal code for `(x >> 31) + 1` alone since it optimises that to the same thing as LLVM

[Bug tree-optimization/94898] New: Failure to optimize compare plus sub of same operands into compare

2020-04-30 Thread gabravier at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- bool f(int x, int y) { if (x >= y) return x - y; return 0; } This can be optimized to `x > y`

[Bug tree-optimization/94899] New: Failure to optimize out add before compare

2020-04-30 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x, int y) { return x + 0x8000 < y + 0x8000; } This can be optimized to `return x < y`. LLVM does this transformation, but GCC does not.

[Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns

2020-05-01 Thread gabravier at gmail dot com
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- typedef float v4sf __attribute__((vector_size(16))); v4sf g(); v4sf f(v4sf a, v4sf b) { return (v4sf){g()[1], a[1], a[2], a[3]}; } With -O3, LLVM

[Bug tree-optimization/94911] New: Failure to optimize comparisons of VLA sizes

2020-05-01 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- inline void assume(_Bool b) { if (!b) __builtin_unreachable(); } _Bool f(int n) { assume(n >= 1); typedef int A[n]; ++n; A a; in

[Bug c++/94912] New: Non-consistent behaviour of VLAs compared to C

2020-05-01 Thread gabravier at gmail dot com
++ Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- #include #include bool f(int n) { typedef int A[n]; ++n; A a; int b[n]; n -= 2; typedef int C[n]; C c; return (sizeof(a) < sizeo

[Bug tree-optimization/94913] New: Failure to optimize not+cmp into overflow check

2020-05-01 Thread gabravier at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- bool f(unsigned x, unsigned y) { return ~x < y; } With -O3, LLVM outputs this : f(unsigned int, unsigned int): add edi, esi setb al ret GCC outp
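A sketch of why `~x < y` is an overflow check: `~x` is the largest value that can be added to `x` without wrapping, so `y` exceeds it exactly when `x + y` overflows. The comparison uses the GCC/Clang builtin `__builtin_add_overflow`; the wrapper names are hypothetical.

```c
#include <stdbool.h>

/* Form from the report. */
bool overflow_via_not(unsigned x, unsigned y) { return ~x < y; }

/* Explicit unsigned-addition overflow check. */
bool overflow_via_builtin(unsigned x, unsigned y)
{
    unsigned sum;
    return __builtin_add_overflow(x, y, &sum);
}
```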

[Bug tree-optimization/94913] Failure to optimize not+cmp into overflow check

2020-05-01 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94913 --- Comment #1 from Gabriel Ravier --- The same thing happens for this code : bool f(unsigned x, unsigned y) { return (x - y - 1) >= x; } LLVM outputs this : f(unsigned int, unsigned int): cmp esi, edi setae al ret GCC outputs this

[Bug tree-optimization/94914] New: Failure to optimize check of high part of 64-bit result of 32 by 32 multiplication into overflow check

2020-05-01 Thread gabravier at gmail dot com
: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- bool f(uint32_t x, uint32_t y) { return (((uint64_t)x * y) >> 32) != 0; }
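A sketch of the equivalence the title describes: a non-zero high half of the 64-bit product means the 32-bit multiplication overflowed. `__builtin_mul_overflow` is the GCC/Clang builtin; the wrapper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form from the report: widen, multiply, inspect the high 32 bits. */
bool mul_overflows_high(uint32_t x, uint32_t y)
{
    return (((uint64_t)x * y) >> 32) != 0;
}

/* Explicit 32-bit multiplication overflow check. */
bool mul_overflows_builtin(uint32_t x, uint32_t y)
{
    uint32_t product;
    return __builtin_mul_overflow(x, y, &product);
}
```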

[Bug tree-optimization/94898] Failure to optimize compare plus sub of same operands into compare

2020-05-01 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94898 --- Comment #1 from Gabriel Ravier --- Also, if this function is changed to return `int`, it can then be optimized to a conditional move, which GCC fails to do

[Bug target/94915] New: MAX_EXPR weirdly optimized on x86 with -mtune=core2

2020-05-02 Thread gabravier at gmail dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x, int y) { return x > y ? x : y; } When compiling with -O3 -mtune=core2 -msse4.1, GCC outputs this : f(int, int): movd xmm0, edi movd xmm1,

[Bug tree-optimization/94916] New: Failure to optimize pattern into difference or zero selector

2020-05-02 Thread gabravier at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x, int y) { return (x - y) & -(x >= y); } This can be optimized to return x >= y ? x - y : 0. LLVM
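
A minimal sketch of the equivalence: `-(x >= y)` is either `0` or `-1` (all bits set), so the mask either clears the difference or passes it through. The names are hypothetical, and the check stays in a small range so that `x - y` never overflows.

```c
#include <assert.h>

/* Form quoted in the report. */
static int masked_diff(int x, int y) { return (x - y) & -(x >= y); }

/* Difference-or-zero selector. */
static int select_diff(int x, int y) { return x >= y ? x - y : 0; }

/* Hypothetical helper: exhaustive check over a small, overflow-free range. */
static void check_diff(void) {
    for (int x = -8; x <= 8; ++x)
        for (int y = -8; y <= 8; ++y)
            assert(masked_diff(x, y) == select_diff(x, y));
}
```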

[Bug tree-optimization/94919] New: Failure to recognize max pattern

2020-05-02 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x, int y) { return ((x ^ y) & -(x >= y)) ^ y; } This can be optimized to `x >= y ? x : y`. LLVM makes this transformation, but GCC does not.
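
The pattern can be checked with a small sketch: when `x >= y` the mask is all ones and `(x ^ y) ^ y` gives `x`; otherwise the mask is zero and the result is `y`. The helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Form quoted in the report. */
static int masked_max(int x, int y) { return ((x ^ y) & -(x >= y)) ^ y; }

/* Plain max selector. */
static int select_max(int x, int y) { return x >= y ? x : y; }

/* Hypothetical helper: compares both forms on a few boundary values. */
static void check_max(void) {
    const int v[] = {INT_MIN, -9, -1, 0, 1, 7, INT_MAX};
    const size_t n = sizeof v / sizeof v[0];
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            assert(masked_max(v[i], v[j]) == select_max(v[i], v[j]));
}
```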

[Bug tree-optimization/94920] New: Failure to optimize abs pattern from arithmetic with selected operands based on comparisons with 0

2020-05-02 Thread gabravier at gmail dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- unsigned f(int x) { return (x >= 0 ? x : 0) + (x <= 0 ? -x : 0); } This

[Bug tree-optimization/94921] New: Failure to optimize nots with sub into single add

2020-05-02 Thread gabravier at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x, int y) { return ~(~x - y); } This can be optimized to `x + y`. This transformation is done by LLVM, but not by GCC
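
The "single add" in the title follows from `~v == -v - 1` in two's complement: `~x - y` is `-x - 1 - y`, and negating that bitwise gives `x + y`. A minimal sketch with hypothetical helper names, restricted to a small range so no intermediate value overflows:

```c
#include <assert.h>

/* Form quoted in the report. */
static int nots_with_sub(int x, int y) { return ~(~x - y); }

/* Single addition. */
static int single_add(int x, int y) { return x + y; }

/* Hypothetical helper: exhaustive check over a small, overflow-free range. */
static void check_nots(void) {
    for (int x = -8; x <= 8; ++x)
        for (int y = -8; y <= 8; ++y)
            assert(nots_with_sub(x, y) == single_add(x, y));
}
```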

[Bug tree-optimization/94930] New: Failure to optimize out subvsi in expansion of __builtin_memcmp with 1 as the operand with -ftrapv

2020-05-02 Thread gabravier at gmail dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int memcmp1(const void *s, const void *c) { return __builtin_memcmp(s, c, 1); } With

[Bug tree-optimization/94934] New: Failure to inline addv

2020-05-03 Thread gabravier at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x, int y) { return x + y; } With -O3 -ftrapv, LLVM outputs this : f(int, int): # @f(int, int) mov eax, edi add eax, esi jo .LBB0_1 ret .LBB0_1: ud2 GCC outputs

[Bug middle-end/94935] New: Failure to emit call to absvsi2 for __builtin_abs with -ftrapv

2020-05-03 Thread gabravier at gmail dot com
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- unsigned f(int x) { return __builtin_abs(x); } This should emit a call to __absvsi2, not get "inlined" into a call to __subvsi3

[Bug tree-optimization/95034] New: Pattern for xor not converted to xor

2020-05-10 Thread gabravier at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- bool combine(bool a, bool b) { return (a || b) && !(a && b); } This can be converted to `a ^ b`. LLVM does this transformation, but GCC does not.
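
Since `bool` has only two values, the equivalence can be checked exhaustively. A minimal sketch, where `xor_form` and `check_combine` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

/* Form quoted in the report. */
static bool combine(bool a, bool b) { return (a || b) && !(a && b); }

/* Exclusive-or form. */
static bool xor_form(bool a, bool b) { return a ^ b; }

/* Hypothetical helper: checks all four input combinations. */
static void check_combine(void) {
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            assert(combine(a, b) == xor_form(a, b));
}
```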

[Bug target/95076] New: Failure to optimize out stack alignment on function call of different type

2020-05-12 Thread gabravier at gmail dot com
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- long long f(); int g() { return f(); } With -O3, LLVM outputs : g(): # @g() jmp f() # TAILCALL GCC outputs : g

[Bug c/95142] ICE when compiling certain logic with -Ofast and -mpretend-cmove when dealing with floats

2020-05-14 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95142 Gabriel Ravier changed: What: CC; Removed: (none); Added: gabravier at gmail dot com --- Comment

[Bug tree-optimization/95176] New: Failure to optimize division followed by multiplication to modulo followed by subtraction

2020-05-17 Thread gabravier at gmail dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int a, int b) { return a * (b / a); } This is equivalent to `return b - (b % a);`. This
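
The identity follows from C's truncating division, which guarantees `(b / a) * a + b % a == b` whenever `b / a` is representable. A minimal sketch with hypothetical helper names; the check skips `a == 0` and stays in a small range, away from the `INT_MIN / -1` case:

```c
#include <assert.h>

/* Form quoted in the report. */
static int mul_div(int a, int b) { return a * (b / a); }

/* Modulo-and-subtract form. */
static int sub_mod(int a, int b) { return b - (b % a); }

/* Hypothetical helper: checks a small range, skipping division by zero. */
static void check_mul_div(void) {
    for (int a = -9; a <= 9; ++a) {
        if (a == 0)
            continue;
        for (int b = -9; b <= 9; ++b)
            assert(mul_div(a, b) == sub_mod(a, b));
    }
}
```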

[Bug c++/95180] New: Failure to reject invalid code with attempted redefinition of symbol with different linkage

2020-05-17 Thread gabravier at gmail dot com
Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- void g(int i) { extern int i; } `extern int i;` redefines the `i` parameter and is thus invalid (Clang also

[Bug target/94934] Failure to inline addv

2020-05-18 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94934 --- Comment #3 from Gabriel Ravier --- In that case, it looks really easy to reimplement `-ftrapv` as literally just enabling `-fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error`.

[Bug tree-optimization/94919] Failure to recognize max pattern

2020-05-18 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94919 --- Comment #2 from Gabriel Ravier --- Essentially, what I've been doing in my spare time for the past few weeks is looking at random pieces of code all over the internet, looking at the results trunk gcc/clang give (usually on x86-64 (though I'v

[Bug tree-optimization/95185] New: Failure to optimize specific kind of sign comparison check

2020-05-18 Thread gabravier at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gabravier at gmail dot com Target Milestone: --- int f(int x, int y) { return (x >= 0) == (y <= 0); } https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94718 was resolved an
