-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
bool f(int x, int y)
{
return (x < 0) != (y < 0);
}
`(x < 0) != (y < 0)` can be optimized to `(x ^ y) < 0`.
This transformation is done by clang,
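The equivalence claimed above can be checked on a few boundary values. This is only a sketch: the names `signs_differ_cmp`, `signs_differ_xor` and `check_sign_forms` are made up for illustration and do not come from the report.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Form from the report: compare the two sign tests. */
bool signs_differ_cmp(int x, int y)
{
    return (x < 0) != (y < 0);
}

/* Proposed form: the sign bit of x ^ y is set exactly when the signs differ. */
bool signs_differ_xor(int x, int y)
{
    return (x ^ y) < 0;
}

/* Hypothetical helper: assert that both forms agree on boundary values. */
void check_sign_forms(void)
{
    static const int samples[] = {INT_MIN, -2, -1, 0, 1, 2, INT_MAX};
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            assert(signs_differ_cmp(samples[i], samples[j]) ==
                   signs_differ_xor(samples[i], samples[j]));
}
```

Calling `check_sign_forms()` from a test driver exercises all 49 sample pairs.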
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f1(unsigned x)
{
switch (x)
{
case 0:
return 1;
case 1:
return 2;
}
}
gcc fails to optimize this to `return x + 1
: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int a, int b)
{
return (int)((a - 1U) * b) + b;
}
Can be optimized to `a * b`. LLVM does this
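The identity follows from `(a - 1) * b + b == a * b`, with the multiplication carried out in wrapping unsigned arithmetic. A minimal sketch that compares both forms on a small range is given below; `mul_roundabout`, `mul_direct` and `check_mul_forms` are illustrative names, and the range is kept small so that `a * b` cannot overflow.

```c
#include <assert.h>

/* Form from the report. The unsigned product wraps; converting it back to
   int is implementation-defined in C, and GCC and Clang wrap modulo 2^N. */
int mul_roundabout(int a, int b)
{
    return (int)((a - 1U) * b) + b;
}

/* Proposed simplified form. */
int mul_direct(int a, int b)
{
    return a * b;
}

/* Hypothetical helper: compare both forms where a * b cannot overflow. */
void check_mul_forms(void)
{
    for (int a = -50; a <= 50; ++a)
        for (int b = -50; b <= 50; ++b)
            assert(mul_roundabout(a, b) == mul_direct(a, b));
}
```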
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
unsigned r(int v)
{
const int mask = v >> (sizeof(int) * CHAR_BIT - 1);
return (v + mask) ^ mask;
}
This can be optimized to `return abs(v)`
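A rough check of the claim, assuming the right shift of a negative `int` is arithmetic (true for GCC and Clang, implementation-defined in ISO C). The names `abs_via_mask` and `check_abs_via_mask` are hypothetical, and `INT_MIN` is left out because `v + mask` would overflow there.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Form from the report, renamed for illustration. */
unsigned abs_via_mask(int v)
{
    /* mask is -1 for negative v and 0 otherwise (arithmetic shift assumed). */
    const int mask = v >> (sizeof(int) * CHAR_BIT - 1);
    return (v + mask) ^ mask;
}

/* Hypothetical helper: compare against the library abs() on a small range. */
void check_abs_via_mask(void)
{
    for (int v = -1000; v <= 1000; ++v)
        assert(abs_via_mask(v) == (unsigned)abs(v));
    assert(abs_via_mask(INT_MAX) == (unsigned)INT_MAX);
}
```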
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779
--- Comment #2 from Gabriel Ravier ---
It's fully optimized ? I don't see how. This is exactly what I was complaining
about : It could be further optimized to
leal 1(%rdi), %eax
ret
but it isn't
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779
--- Comment #3 from Gabriel Ravier ---
Just fyi : When I said "gcc fails to optimize this to `return x + 1`, instead
opting for some rather weird code generation (involving `sbb` on x86)" the
"weird code generation" I was referring to is the exac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779
--- Comment #5 from Gabriel Ravier ---
Going to take a quick look at how it gets optimized in the tree passes.
This is the first case :
int f1(unsigned x)
{
if (x >= 2)
__builtin_unreachable();
switch (x)
{
case 0:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779
--- Comment #6 from Gabriel Ravier ---
There is another thing I realised : This code :
int f1(unsigned x)
{
switch (x)
{
case 0:
return 1;
case 1:
return 2;
case 2:
return 3;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94779
--- Comment #8 from Gabriel Ravier ---
Also, this code :
int f1(unsigned x)
{
if (x >= 3)
__builtin_unreachable();
switch (x)
{
case 0:
return 1;
case 1:
return 2;
case 2:
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
unsigned r(int v)
{
return (1 | -(v < 0)) * v;
}
`r` is equivalent to `abs(v)`. GCC does not make the transformation to an
`abs`.
Example of
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int r1(int x, int y)
{
return y ^ ((x ^ y) & -(x < y));
}
int r2(int x, int y)
{
return x ^ ((x ^ y) & -(x < y));
}
`r1` can be optimized to
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
bool f(unsigned v)
{
return v && !(v & (v - 1));
}
Depending on the supported architecture, we may want to optimize this to
`__builtin_popc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94787
--- Comment #1 from Gabriel Ravier ---
Inversely, I'd also suggest doing the opposite. That is, if there is no
hardware popcount instruction, `__builtin_popcount(v) == 1` should be optimized
to `v && !(v & (v - 1))`
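Both directions of this suggestion rely on the two tests being interchangeable. A small sketch that compares them follows; `is_pow2_bits`, `is_pow2_popcount` and `check_pow2_forms` are illustrative names, and the large constants assume a 32-bit `unsigned`.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit-trick form: nonzero, and clearing the lowest set bit leaves nothing. */
bool is_pow2_bits(unsigned v)
{
    return v && !(v & (v - 1));
}

/* Popcount form: exactly one bit set. */
bool is_pow2_popcount(unsigned v)
{
    return __builtin_popcount(v) == 1;
}

/* Hypothetical helper: compare both forms on a range plus two extremes. */
void check_pow2_forms(void)
{
    for (unsigned v = 0; v < 70000u; ++v)
        assert(is_pow2_bits(v) == is_pow2_popcount(v));
    assert(is_pow2_bits(0x80000000u) && is_pow2_popcount(0x80000000u));
    assert(!is_pow2_bits(0xFFFFFFFFu) && !is_pow2_popcount(0xFFFFFFFFu));
}
```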
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int r(int x, unsigned b)
{
int const m = CHAR_BIT * sizeof(x) - b;
return (x << m);
}
`CH
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
unsigned r1(unsigned a, unsigned b, unsigned mask)
{
return a ^ ((a ^ b) & mask);
}
unsigned r2(unsigned a, unsign
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
unsigned r1(unsigned v)
{
unsigned r = 0;
while (v >>= 1)
r++;
return r;
}
This can be optimized to `32 - __builtin_clz(v >> 1);`. LL
: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int isNonzero(int x)
{
if (x == 0)
return 0x;
else
return 0x;
}
On x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94789
--- Comment #3 from Gabriel Ravier ---
From what I've seen, this optimisation could be useful on at least these
targets :
- x86_64
- i686
- aarch64
On other architectures I've looked at, either the optimization can't be done
and/or it's useles
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int a, int b)
{
return ((a == b) & (a - b));
}
The `a == b` is able to use condition flags resulting from `a - b`, and thus
avoid an extra compare. LLVM
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int a, int b)
{
return (b >= a) ? (b - a) : 0;
}
Generates some *really* bad code with GCC right now, it seems to forget such
basic things
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int populationCount(uint32_t x)
{
x = x - ((x >> 1) & 0x);
x = (x & 0x) + ((x >> 2) & 0x);
x = (x
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int a)
{
return __builtin_clz(a) >> 5;
}
Can be optimized to `return 0;`. This transformation is done by LLVM, but not
by GCC.
Comparison here :
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
bool f(int a, int b)
{
return __builtin_clz(a - b);
}
This is equivalent to `return a >= b`. This transformation is done by LLVM, but
not by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94802
--- Comment #1 from Gabriel Ravier ---
There are also patterns like `__builtin_clz(a - b) == 31`, which can be
optimized to `(a - b) == 1`
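The pattern holds because, for a 32-bit operand, 31 leading zeros leave only bit 0 set. Below is a sketch of the comparison on nonzero inputs only, since `__builtin_clz(0)` is undefined. The names `clz_is_31`, `equals_one` and `check_clz_forms` are hypothetical, `d` stands in for `a - b`, and a 32-bit `unsigned` is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* clz form; d must be nonzero because __builtin_clz(0) is undefined. */
bool clz_is_31(unsigned d)
{
    return __builtin_clz(d) == 31;
}

/* Simplified form. */
bool equals_one(unsigned d)
{
    return d == 1;
}

/* Hypothetical helper: compare both forms on nonzero values. */
void check_clz_forms(void)
{
    for (unsigned d = 1; d < 70000u; ++d)
        assert(clz_is_31(d) == equals_one(d));
    assert(!clz_is_31(0x80000000u));
}
```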
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94801
--- Comment #4 from Gabriel Ravier ---
Isn't `__builtin_clz(0)` undefined ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94798
--- Comment #2 from Gabriel Ravier ---
Ok, will do that in the future. Considering I was just linking to godbolt every
time for the assembly code, should I go back to all the other bug reports that
I've made to upload assembly code there too ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94795
--- Comment #2 from Gabriel Ravier ---
Also, I can provide a very similar function for which such an
optimization could be helpful :
int f(int x)
{
return -(x == 0);
}
LLVM optimises that function to this :
f(int):
cmp edi, 1
: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
using i128 = __int128;
i128 add128(i128 a, i128 b)
{
return a + b;
}
This is how LLVM handles this code :
add128(__int128, __int128):
mov rax, rdi
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
__int128 f(__int128 x)
{
return -x;
}
It would appear like unary minus is badly optimized by GCC.
This is what LLVM outputs for this :
f(__int128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804
--- Comment #1 from Gabriel Ravier ---
For subtraction, it's even worse.
using i128 = __int128;
i128 sub128(i128 a, i128 b)
{
return a - b;
}
results in
sub128(__int128, __int128):
mov rax, rdi
sub rax, rdx
sbb rsi, rcx
mov rdx,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804
--- Comment #3 from Gabriel Ravier ---
So, things like
uint64_t swap64(uint64_t x)
{
uint64_t a = __builtin_bswap32(x);
x >>= 32;
a <<= 32;
return __builtin_bswap32(x) | a;
}
Having similar problems with useless movs is from th
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
uint32_t swap32(uint32_t x)
{
return ((x << 24) | ((x << 8) & 0x00FF
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
void f(int *__restrict a, int *__restrict b, size_t sz)
{
for (int i = 0; i < sz; ++i)
a[i] += b[i];
for (int i = 0; i < sz; ++i)
a[i]
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
uint32_t load(const uint8_t* data)
{
uint32_t val = 0;
for (int i = 0; i < sizeof(val) * CHAR_BIT; i += CHAR_BIT)
{
val |= *data++ &
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x)
{
static int s;
if (s)
s = x;
return s;
}
This can be optimized to `return 0`. This
: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
float swapFloat(float x)
{
union
{
float f;
uint32_t u32;
} swapper;
swapper.f = x;
swapper.u32 = __builtin_bswap32
: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(bool b, int *p)
{
return b && *p;
}
GCC generates this with -O3:
f(bool, int*):
xor eax, ea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94838
Gabriel Ravier changed:
What|Removed |Added
Target|x86_64-linux-gnu|x86_64-* i?86-*-*
--- Comment #2 from G
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94838
--- Comment #3 from Gabriel Ravier ---
This also occurs on i68* :
f(bool, int*):
xor eax, eax ; Already 0
cmp BYTE PTR [esp+4], 0
je .L1
mov eax, DWORD PTR [esp+8] ; Could use different caller-saved register such
as ecx or edx
mov eax
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94838
--- Comment #4 from Gabriel Ravier ---
Oops, seems like there was a weird collision. Don't pay attention to the second
to last comment before this one, it's identical to the last comment before this
one except for a single comment being added in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94837
--- Comment #2 from Gabriel Ravier ---
This is what I get with `-O3 -mmovbe -mtune=intel` :
swapFloat(float):
movd DWORD PTR [rsp-4], xmm0
movbe eax, DWORD PTR [rsp-4]
movd xmm0, eax
ret
This seems erroneous
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94837
--- Comment #3 from Gabriel Ravier ---
Also, I've tested the code from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593 and the optimization in
question is no longer in `-mtune=generic`, only with specific architectures
like `-mtune=k8`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94834
--- Comment #2 from Gabriel Ravier ---
Now I wonder why the unrolling happens too late since there was 1 ecp check
that should happen after the unrolling, from my understanding. Are the multiple
ecp passes detecting different things?
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
unsigned f(unsigned *p, unsigned x)
{
unsigned u = *p;
*p += x;
if (u > *p)
++*p;
return *p;
}
This is what LLVM outputs with -O3 :
f(unsig
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94846
Gabriel Ravier changed:
What|Removed |Added
Target|x86_64-* i?86-*-* |
--- Comment #1 from Gabriel Ravier --
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94846
--- Comment #2 from Gabriel Ravier ---
More notes :
This seems to be generic to all targets, I've also been able to verify it on
ARM.
This only occurs when p is a pointer. This code :
unsigned f(unsigned p, unsigned x)
{
unsigned u = p;
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
struct testStruct {
uint64_t a;
uint64_t b;
};
uint64_t f(testStruct t, int x)
{
return ((t.a << (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94850
--- Comment #1 from Gabriel Ravier ---
PS : The same optimization can apply to i686, just replace all occurrences of
"64" with "32" and you could use shld/shrd there too
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
bool f(unsigned *p, unsigned x)
{
unsigned u = *p;
*p += x;
return u > *p;
}
W
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
uint32_t bzhi32(uint32_t x, uint32_t y)
{
return ((x << (32 - y)) >> (32 - y));
}
LLVM with -O3 -mbmi2 optimizes this to :
bzhi32(unsigned int, unsigned int
ation
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f()
{
int x = x;
return x;
}
LLVM compiles this to a return instruction, not bothering to initialize the
result register as its value is undefined. GCC instead
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
typedef double v2df __attribute__((vector_size(16)));
v2df move_sd(v2df a, v2df b)
{
v2df result = a;
result[0] = b[0];
return result;
}
LLVM -O3
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
typedef double v2df __attribute__((vector_size(16)));
v2df move_sd(v2df a, v2df b)
{
v2df result = a;
result[0] = b[1
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
typedef double v2df __attribute__((vector_size(16)));
v2df move_sd(v2df a, v2df b)
{
v2df result = a;
result[1] = b[1];
return result;
}
With
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
--- Comment #1 from Gabriel Ravier ---
Note : The compilation options were `-O3 -mavx`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94863
--- Comment #1 from Gabriel Ravier ---
Note: The given outputs for LLVM and GCC are when compiling with `-O3 -msse4.1`
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
typedef int64_t v2di __attribute__((vector_size(16)));
typedef int32_t v2si __attribute__((vector_size(8)));
v2di _mm_move_epi64(v2di a)
{
return v2di{a[0], 0LL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94863
--- Comment #3 from Gabriel Ravier ---
For binary size, the `movsd` takes 4 bytes and the `blendps` takes 6 bytes
The port allocations for the instructions are as such (same formatting as for
the throughputs) :
Wolfdale: p5, p015
Nehalem: p5,
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
typedef double v2df __attribute__((vector_size(16)));
v2df _mm_sqrt_sd(v2df a, v2df b)
{
v2df c = __builtin_ia32_sqrtpd((v2df){b[0], b[1
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
typedef double v2df __attribute__((vector_size(16)));
typedef int64_t v2di __attribute__((vector_size(16)));
typedef int8_t v16qi __attribute__((vector_size(16
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
union u64Elems
{
uint64_t as_u64;
int32_t as_i32[2];
};
uint64_t f(u64Elems m1, u64Elems m2)
{
u64Elems res
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x)
{
return ~(x + 1);
}
With -O3, LLVM outputs this :
f(int): # @f(int)
mov eax, -2
sub eax, edi
ret
GCC outputs this :
f(int):
lea eax, [rdi+1
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
unsigned y(unsigned x)
{
unsigned s = x & -x;
return s | (x % s);
}
With -O3, LLVM outputs :
y(unsigned int):
blsi ecx, edi
lea eax, [rcx - 1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94878
--- Comment #1 from Gabriel Ravier ---
Also, the assembly outputs are for when compiling with `-mbmi` but that
should not affect the bug itself
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x, int y)
{
return (x | y) - y;
}
This can be optimized to a single andn :
f(int, int): # @f(int, int)
andn eax, esi, edi
ret
(LLVM output with -O3 -mbmi
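The single `andn` works because `x | y` is `y` plus the bits of `x` that are not in `y`, so subtracting `y` leaves `x & ~y`. A short sketch comparing the two forms is given here, with `or_minus`, `and_not` and `check_andn_forms` as illustrative names.

```c
#include <assert.h>

/* Form from the report. */
int or_minus(int x, int y)
{
    return (x | y) - y;
}

/* What a single andn computes: the bits of x that are clear in y. */
int and_not(int x, int y)
{
    return x & ~y;
}

/* Hypothetical helper: compare both forms on a small signed range. */
void check_andn_forms(void)
{
    for (int x = -64; x <= 64; ++x)
        for (int y = -64; y <= 64; ++y)
            assert(or_minus(x, y) == and_not(x, y));
}
```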
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x, int y)
{
return (x & y) - (x | y) - 1;
}
This can be optimized to `~(x ^ y)`. LLVM does this transformation, but GCC
does not.
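Since `(x | y) - (x & y)` equals `x ^ y`, the expression is `-(x ^ y) - 1`, which is `~(x ^ y)` in two's complement. A sketch that compares both forms on a small range (chosen so the subtraction cannot overflow) follows; the function names are illustrative.

```c
#include <assert.h>

/* Form from the report. */
int xnor_roundabout(int x, int y)
{
    return (x & y) - (x | y) - 1;
}

/* Proposed simplified form. */
int xnor_direct(int x, int y)
{
    return ~(x ^ y);
}

/* Hypothetical helper: compare both forms where no overflow can occur. */
void check_xnor_forms(void)
{
    for (int x = -64; x <= 64; ++x)
        for (int y = -64; y <= 64; ++y)
            assert(xnor_roundabout(x, y) == xnor_direct(x, y));
}
```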
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
bool decide() __attribute((const));
inline unsigned getXOrY(unsigned x, unsigned y)
{
return decide() ? y : x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94884
--- Comment #1 from Gabriel Ravier ---
`f` can also be translated to `return true;` when it's this :
bool f(unsigned x, unsigned y)
{
return (x & y) <= getXOrY(x, y);
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94884
--- Comment #2 from Gabriel Ravier ---
Also when it is this :
bool f(unsigned x, unsigned y)
{
return x <= (x | ~y);
}
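Both comparisons quoted in these comments are always true: `x & y` can never exceed either operand, and `x | ~y` can never be smaller than `x`. A sketch that checks this on sample values is shown below. Here `decide()` is modelled as an explicit `pick_y` parameter, which is an assumption made for illustration, as are all function names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Stand-in for (x & y) <= getXOrY(x, y), with decide() replaced by pick_y. */
bool and_le_either(unsigned x, unsigned y, bool pick_y)
{
    return (x & y) <= (pick_y ? y : x);
}

/* Form from the later comment. */
bool le_or_not(unsigned x, unsigned y)
{
    return x <= (x | ~y);
}

/* Hypothetical helper: both predicates should hold for every sample pair. */
void check_always_true(void)
{
    static const unsigned samples[] = {0u, 1u, 2u, 5u, 255u, 4096u,
                                       UINT_MAX - 1u, UINT_MAX};
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            assert(and_le_either(samples[i], samples[j], false));
            assert(and_le_either(samples[i], samples[j], true));
            assert(le_or_not(samples[i], samples[j]));
        }
}
```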
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94889
Gabriel Ravier changed:
What|Removed |Added
CC||gabravier at gmail dot com
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94889
--- Comment #4 from Gabriel Ravier ---
Investigated it a bit.
It looks like with `-mavx2` the pcom pass decides to vectorize the loop, and it
then later gets mowed down into a `~`.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94878
Gabriel Ravier changed:
What|Removed |Added
Target|x86_64-*-* i?86-*-* |
--- Comment #3 from Gabriel Ravier --
ponent: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
inline int sign(int x)
{
return (x >> 31) | ((unsigned)-x >> 31);
}
bool f(int x)
{
return sign(x) > -1;
}
With -O3, LLVM produces t
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
inline int sign(int x)
{
return (x >> 31) | ((unsigned)-x >> 31);
}
bool f(int x)
{
return sign(x) < 1;
}
With -O3, LLVM o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94892
--- Comment #2 from Gabriel Ravier ---
In that case, then, GCC is generating sub-optimal code for `(x >> 31) + 1`
alone since it optimises that to the same thing as LLVM
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
bool f(int x, int y)
{
if (x >= y)
return x - y;
return 0;
}
This can be optimized to `x > y`
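The reasoning is that the function returns `bool`, so `x - y` is true exactly when `x != y`, and combined with `x >= y` that leaves `x > y`. A sketch comparing the two forms follows, with illustrative names and a small range so that `x - y` cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>

/* Form from the report; the int difference converts to bool on return. */
bool diff_if_ge(int x, int y)
{
    if (x >= y)
        return x - y;
    return 0;
}

/* Proposed simplified form. */
bool greater_than(int x, int y)
{
    return x > y;
}

/* Hypothetical helper: compare both forms where x - y cannot overflow. */
void check_greater_forms(void)
{
    for (int x = -64; x <= 64; ++x)
        for (int y = -64; y <= 64; ++y)
            assert(diff_if_ge(x, y) == greater_than(x, y));
}
```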
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x, int y)
{
return x + 0x8000 < y + 0x8000;
}
This can be optimized to `return x < y`. LLVM does this transformation, but GCC
does not.
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
typedef float v4sf __attribute__((vector_size(16)));
v4sf g();
v4sf f(v4sf a, v4sf b)
{
return (v4sf){g()[1], a[1], a[2], a[3]};
}
With -O3, LLVM
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
inline void assume(_Bool b)
{
if (!b)
__builtin_unreachable();
}
_Bool f(int n)
{
assume(n >= 1);
typedef int A[n];
++n;
A a;
in
++
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
#include
#include
bool f(int n)
{
typedef int A[n];
++n;
A a;
int b[n];
n -= 2;
typedef int C[n];
C c;
return (sizeof(a) < sizeo
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
bool f(unsigned x, unsigned y)
{
return ~x < y;
}
With -O3, LLVM outputs this :
f(unsigned int, unsigned int):
add edi, esi
setb al
ret
GCC outp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94913
--- Comment #1 from Gabriel Ravier ---
The same thing happens for this code :
bool f(unsigned x, unsigned y)
{
return (x - y - 1) >= x;
}
LLVM outputs this :
f(unsigned int, unsigned int):
cmp esi, edi
setae al
ret
GCC outputs this
: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
bool f(uint32_t x, uint32_t y)
{
return (((uint64_t)x * y) >> 32) != 0;
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94898
--- Comment #1 from Gabriel Ravier ---
Also, if this function is changed to return `int`, it can then be optimized to
a conditional move, which GCC fails to do
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x, int y)
{
return x > y ? x : y;
}
When compiling with -O3 -mtune=core2 -msse4.1, GCC outputs this :
f(int, int):
movd xmm0, edi
movd xmm1,
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x, int y)
{
return (x - y) & -(x >= y);
}
This can be optimized to return x >= y ? x - y : 0. LLVM
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x, int y)
{
return ((x ^ y) & -(x >= y)) ^ y;
}
This can be optimized to `x >= y ? x : y`. LLVM makes this transformation, but
GCC does not.
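Here `-(x >= y)` is an all-ones or all-zeros mask, so the expression selects `x` when the comparison holds and `y` otherwise. A minimal sketch checking this against the conditional form is given below; `max_bitwise`, `max_select` and `check_max_forms` are illustrative names.

```c
#include <assert.h>
#include <limits.h>

/* Form from the report: mask is -1 when x >= y, 0 otherwise. */
int max_bitwise(int x, int y)
{
    return ((x ^ y) & -(x >= y)) ^ y;
}

/* Proposed simplified form. */
int max_select(int x, int y)
{
    return x >= y ? x : y;
}

/* Hypothetical helper: compare both forms on boundary values. */
void check_max_forms(void)
{
    static const int samples[] = {INT_MIN, -7, -4, -1, 0, 1, 3, 9, INT_MAX};
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            assert(max_bitwise(samples[i], samples[j]) ==
                   max_select(samples[i], samples[j]));
}
```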
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
unsigned f(int x)
{
return (x >= 0 ? x : 0) + (x <= 0 ? -x : 0);
}
This
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x, int y)
{
return ~(~x - y);
}
This can be optimized to `x + y`. This transformation is done by LLVM, but not
by GCC
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int memcmp1(const void *s, const void *c)
{
return __builtin_memcmp(s, c, 1);
}
With
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x, int y)
{
return x + y;
}
With -O3 -ftrapv, LLVM outputs this :
f(int, int): # @f(int, int)
mov eax, edi
add eax, esi
jo .LBB0_1
ret
.LBB0_1:
ud2
GCC outputs
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
unsigned f(int x)
{
return __builtin_abs(x);
}
This should emit a call to __absvsi2, not get "inlined" into a call to
__subvsi3
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
bool combine(bool a, bool b)
{
return (a || b) && !(a && b);
}
This can be converted to `a ^ b`. LLVM does this transformation, but GCC does
not.
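With only four possible inputs, the equivalence can be checked exhaustively. The sketch below does that; `combine_xor` and `check_combine_forms` are illustrative names that do not come from the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Form from the report: at least one is true, but not both. */
bool combine(bool a, bool b)
{
    return (a || b) && !(a && b);
}

/* Proposed simplified form. */
bool combine_xor(bool a, bool b)
{
    return a ^ b;
}

/* Hypothetical helper: check all four input combinations. */
void check_combine_forms(void)
{
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            assert(combine(a != 0, b != 0) == combine_xor(a != 0, b != 0));
}
```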
: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
long long f();
int g()
{
return f();
}
With -O3, LLVM outputs :
g(): # @g()
jmp f() # TAILCALL
GCC outputs :
g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95142
Gabriel Ravier changed:
What|Removed |Added
CC||gabravier at gmail dot com
--- Comment
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int a, int b)
{
return a * (b / a);
}
This is equivalent to `return b - (b % a);`. This
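The equivalence comes from the C division identity `b == (b / a) * a + b % a`. A sketch that checks it on a small range follows, skipping `a == 0`, where both forms are undefined. The names `mul_div`, `sub_rem` and `check_div_forms` are made up for illustration.

```c
#include <assert.h>

/* Form from the report. */
int mul_div(int a, int b)
{
    return a * (b / a);
}

/* Equivalent form using the remainder. */
int sub_rem(int a, int b)
{
    return b - (b % a);
}

/* Hypothetical helper: compare both forms for nonzero divisors. */
void check_div_forms(void)
{
    for (int a = -40; a <= 40; ++a) {
        if (a == 0)
            continue; /* division by zero is undefined in both forms */
        for (int b = -40; b <= 40; ++b)
            assert(mul_div(a, b) == sub_rem(a, b));
    }
}
```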
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
void g(int i)
{
extern int i;
}
`extern int i;` redefines the `i` parameter and is thus invalid (Clang also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94934
--- Comment #3 from Gabriel Ravier ---
In that case, it looks really easy to reimplement `-ftrapv` as literally just
enabling `-fsanitize=signed-integer-overflow
-fsanitize-undefined-trap-on-error`.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94919
--- Comment #2 from Gabriel Ravier ---
Essentially, what I've been doing in my spare time for the past few weeks is
looking at random pieces of code all over the internet, looking at the results
trunk gcc/clang give (usually on x86-64 (though I'v
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
int f(int x, int y)
{
return (x >= 0) == (y <= 0);
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94718 was resolved an