[Bug tree-optimization/24696] New: missing optimization in comparison of results of bit operations
Take this little program:

    int f (unsigned long a, unsigned long b, unsigned long c)
    {
      return (a & (c - 1)) != 0 || (b & (c - 1)) != 0;
    }

Compiled on x86-64 with gcc 4.0.2 (but I think also with the current mainline), -O2 yields the following code:

     0: 48 ff ca          dec    %rdx
     3: 48 85 d7          test   %rdx,%rdi
     6: 75 07             jne    f
     8: 31 c0             xor    %eax,%eax
     a: 48 85 d6          test   %rdx,%rsi
     d: 74 05             je     14
     f: b8 01 00 00 00    mov    $0x1,%eax
    14: f3 c3             repz retq

As can be seen, both comparisons are executed individually. This is unnecessarily slow. Since the right operand of & is the same in both tests and this is a pure bit test, it is perfectly fine to compile the code to the equivalent of

    int f (unsigned long a, unsigned long b, unsigned long c)
    {
      return ((a | b) & (c - 1)) != 0;
    }

This would be significantly faster. On archs like x86-64 no conditional jump (just a setne) would be needed.

-- 
Summary: missing optimization in comparison of results of bit operations
Product: gcc
Version: 4.0.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24696
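The equivalence claimed in this report can be checked mechanically. The following is only a minimal sketch: the function names and the brute-force helper forms_agree are hypothetical and appear in neither the report nor gcc. It compares the two forms over a small range of inputs.

```c
#include <stdbool.h>

/* Form from the report: two separate bit tests joined by ||. */
bool any_bits_split (unsigned long a, unsigned long b, unsigned long c)
{
  return (a & (c - 1)) != 0 || (b & (c - 1)) != 0;
}

/* Proposed form: OR the inputs first, then test against the mask once. */
bool any_bits_merged (unsigned long a, unsigned long b, unsigned long c)
{
  return ((a | b) & (c - 1)) != 0;
}

/* Hypothetical helper: returns 1 if both forms agree for all
   a, b, c in [0, limit), and 0 as soon as they differ. */
int forms_agree (unsigned long limit)
{
  for (unsigned long a = 0; a < limit; ++a)
    for (unsigned long b = 0; b < limit; ++b)
      for (unsigned long c = 0; c < limit; ++c)
        if (any_bits_split (a, b, c) != any_bits_merged (a, b, c))
          return 0;
  return 1;
}
```

The two forms agree because (a & m) | (b & m) equals (a | b) & m for any mask m, and an OR of two values is non-zero exactly when at least one of them is.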
[Bug middle-end/25521] New: change semantics of const volatile variables
In math code we often have to make sure the compiler does not fold operations at compile time. In glibc we use variables declared as

    static const volatile double foo = 42.0;

The problem is that gcc moves such variables into .data. But we could achieve that easily by leaving out the 'const'. What is needed is a method to achieve volatile behavior while having the variable in .rodata (and .rodata.cst8 etc).

I therefore would like to ask for a change in the compiler which preserves the 'const' in the presence of 'volatile' and places the variable in read-only memory.

-- 
Summary: change semantics of const volatile variables
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25521
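For readers unfamiliar with the idiom, here is a small sketch of how such a variable is typically used to block constant folding. The names two and half_of are made up for illustration and are not taken from glibc. Which section the variable ends up in depends on the compiler, which is exactly the point of this report.

```c
/* Reads of a volatile-qualified object must happen at run time, so the
   compiler cannot fold the division below into a constant even when the
   argument is known at compile time.  The const only forbids writes
   through this name.  */
static const volatile double two = 2.0;

/* Hypothetical example function: divides by the run-time value of two. */
double half_of (double x)
{
  return x / two;
}
```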
[Bug middle-end/25522] New: zero-initialized constants are place in .bss
Compile this code:

    struct foo { int a, b; };
    const struct foo f;

The compiler will put the variable f into .bss instead of, as the const indicates, into .rodata. This can be a security problem. In glibc we deliberately use const wherever possible (as should everybody) to prevent anybody from changing the value. Allowing changes would allow an intruder to modify the variable and influence the semantics of the program.

Yes, this means that binaries get larger. But that's what the programmer requested.

-- 
Summary: zero-initialized constants are place in .bss
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25522
[Bug middle-end/25521] change semantics of const volatile variables
--- Comment #2 from drepper at redhat dot com 2005-12-21 19:38 ---

Using gcc's section attributes won't fully work either. Using __attribute((section(".rodata"))) is OK in the compiler, although the assembler (correctly) complains. But what is really needed is __attribute((section(".rodata.cst8"))). This will cause gcc to fail with an ICE.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25521
[Bug c++/25541] New: invalid warning about unused variable
The -Wunused warning generation doesn't take modifications of global variables into account. Compiling the following code with -Wunused -Werror fails although this is perfectly reasonable code. Some registered exit handler could check the value of the variable.

    int global;
    struct monitor
    {
      ~monitor() { global = 1; }
    };
    int main ()
    {
      monitor m;
    }

-- 
Summary: invalid warning about unused variable
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25541
[Bug middle-end/25522] zero-initialized constants are place in .bss
--- Comment #5 from drepper at redhat dot com 2005-12-26 05:52 ---

> What happens if you use -fno-common?

In this case the variable gets the index of .bss in the symbol table instead of using SHN_COMMON.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25522
[Bug rtl-optimization/25609] New: too agressive printf optimization
At least glibc's printf, and maybe others as well, prints (null) for code like

    printf ("%s", NULL)

gcc doesn't consider this when optimizing code where the pointer passed for a %s format specifier can be NULL. Example:

    #include <stdio.h>
    int main (int argc, char *argv[])
    {
      printf ("%s\n", argc > 1 ? argv[1] : NULL);
      return 0;
    }

Compiling and running this code (I use gcc 4.0.2) will result in a program which crashes because the printf is transformed into a puts() call and puts() does not allow NULL pointers.

There should at least be a mode in which gcc does not perform the transformation if it cannot be sure the pointer is not NULL. The default for Linux and maybe other platforms should be to not perform this optimization if the pointer can be NULL.

-- 
Summary: too agressive printf optimization
Product: gcc
Version: 4.0.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609
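Until the compiler behavior is settled, code that may pass NULL can sidestep the question by substituting the string itself. This is only a sketch of a workaround, not something proposed in the report. The helper names str_or_null and print_first_arg are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns the text glibc prints for a NULL %s
   argument, so the pointer reaching printf/puts is never NULL. */
const char *str_or_null (const char *s)
{
  return s != NULL ? s : "(null)";
}

/* Variant of the example from the report that stays well defined even if
   the compiler turns the printf call into puts.  Returns printf's result. */
int print_first_arg (int argc, char *argv[])
{
  return printf ("%s\n", str_or_null (argc > 1 ? argv[1] : NULL));
}
```

The cost is one extra comparison per call. The output matches what glibc's printf produces for a NULL argument.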
[Bug rtl-optimization/25609] too agressive printf optimization
--- Comment #4 from drepper at redhat dot com 2005-12-30 23:06 ---

No, it's *NOT* undefined. The libc interface decides what is defined and what is not and it is *EXPLICITLY* documented that NULL pointers are printed as (null). The standard might leave it undefined but this does *NOT* mean the implementation cannot define it.

-- 
drepper at redhat dot com changed:

    What       |Removed   |Added
    Status     |RESOLVED  |UNCONFIRMED
    Resolution |DUPLICATE |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609
[Bug rtl-optimization/25609] too agressive printf optimization
--- Comment #6 from drepper at redhat dot com 2005-12-30 23:08 --- This is NOT a dup of 15574. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609
[Bug rtl-optimization/25609] too agressive printf optimization
--- Comment #8 from drepper at redhat dot com 2005-12-30 23:14 ---

> That is true but GCC is a C compiler and not a glibc implemention C compiler.

This doesn't mean anything. As soon as you configure gcc to target Linux, the behavior of the runtime is as defined by the C library. gcc doesn't come with its own C library so it cannot possibly override any decisions made about undefined behavior.

I explicitly said that this optimization need only be disabled for platforms using glibc. I don't give a rat's ass what other platforms do.

-- 
drepper at redhat dot com changed:

    What       |Removed   |Added
    Status     |RESOLVED  |UNCONFIRMED
    Resolution |DUPLICATE |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609
[Bug rtl-optimization/25609] too agressive printf optimization
--- Comment #10 from drepper at redhat dot com 2005-12-30 23:44 --- glibc *is* the world as far as Linux is concerned. You consistently and deliberately misinterpret what I write: I'm not talking about any platform which does not use glibc or glibc's behavior. And RTH already concurred in private that this is a problem. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609
[Bug rtl-optimization/25609] too agressive printf optimization
--- Comment #12 from drepper at redhat dot com 2005-12-31 00:19 ---

> That is not true at all and you know that. There is uclibc.

Now you've completely given up on logic?

First of all, uclibc and whatever other libc imitation is out there does not define the Linux API. glibc *is* the world, all the others are just replacements of varying degrees of conformance. This can be seen in the fact that even uclibc implements printf with the behavior in question.

But more importantly here: even if there were one piece of code which behaves differently, this does not disqualify the argument that the API for Linux defines the behavior in question. This is an OR operation, not AND. glibc defines the behavior and this means the compiler must handle such code appropriately if compiled for Linux.

-- 
drepper at redhat dot com changed:

    What       |Removed   |Added
    Status     |RESOLVED  |UNCONFIRMED
    Resolution |DUPLICATE |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609
[Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics
The implementation of the SSE intrinsics for x86 and x86-64 in gcc is tied to the use of an appropriate -m option, such as -mssse3 or -mavx. This is different from what icc does and it prevents code from being written in the most natural form. This is nothing new in gcc 4.4; it has been the behavior of gcc forever, as far as I can see. But the introduction of AVX in particular brings this problem to the foreground.

As an example, assume I want to write a vector class with the usual operations. I can write code like this:

    #ifdef __AVX__
    vec operator+(vec &a, vec &b) {
      ... use AVX intrinsics ...
    }
    #elif defined __SSE4__
    vec operator+(vec &a, vec &b) {
      ... use SSE4 intrinsics ...
    }
    #elif defined __SSE2__
    vec operator+(vec &a, vec &b) {
      ... use SSE2 intrinsics ...
    }
    #else
    vec operator+(vec &a, vec &b) {
      ... generic implementation ...
    }
    #endif

But this means, of course, that the binary has to be compiled for every single target and the correct one has to be chosen. This is not attractive or practical. Chances are that only a generic implementation will be available.

It would be better to have a self-optimizing implementation:

    vec operator+(vec &a, vec &b) {
      if (AVX is available)
        ... use AVX intrinsics ...
      else if (SSE4 is available)
        ... use SSE4 intrinsics ...
      else if (SSE2 is available)
        ... use SSE2 intrinsics ...
      else
        ... generic implementation ...
    }

This is possible with icc. It is not possible with gcc at the moment. For gcc I would have to split the implementation of all the variants into individual files and then, in the template function as seen above, these implementations would have to be called. Even if, as in this case, it might be doable (but terribly inconvenient), there are situations where this is really impractical or impossible.

The problem is that to be able to use the AVX intrinsics the compiler has to be passed -mavx (all other extensions are implied in -mavx). But this flag has another consequence: the compiler will now take advantage of the new instructions in AVX and use them in unrelated code not associated with intrinsics (e.g., an inlined memset implementation). The result is that such a binary will fail to run on anything but an AVX-enabled machine.

In icc the -mavx flag exclusively controls the code generation (i.e., whether AVX is used in inlined memset etc). The SSE intrinsics and all the associated data types are _always_ defined as soon as the intrinsics header is included. This means the example code above could be compiled with an -m parameter for the minimum ISA to support and still the AVX, SSE4, ... intrinsics are available.

gcc should follow icc's way of handling the intrinsics. Since all this intrinsic business comes from icc I consider this a bug in gcc's implementation instead of an enhancement request.

-- 
Summary: Non-optimal (or wrong) implementation of SSE intrinsics
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com
GCC target triplet: i?86-* x86_64-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840
[Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
--- Comment #2 from drepper at redhat dot com 2009-04-21 19:37 ---

[I couldn't attach the code as an attachment, bugzilla has a bug.]

The program below has to be compiled with -mavx to allow the AVX intrinsics to be used. But this also triggers the use of the vmovss instruction to load the parameter for the sin() call from memory. (Forget the reference to memset in the original report, it's as simple as passing floating point parameters that triggers the problem.)

    #include <immintrin.h>
    #include <math.h>
    #include <stdio.h>

    static unsigned int eax, ebx, ecx, edx;

    static int
    has_avx (void)
    {
      if ((ecx & (1 << 27)) == 0)
        /* No OSXSAVE.  */
        return 0;

      unsigned int feat_eax, feat_edx;
      asm ("xgetbv" : "=a" (feat_eax), "=d" (feat_edx) : "c" (0));
      if ((feat_eax & 6) != 6)
        return 0;

      return (ecx & (1 << 28)) != 0;
    }

    template <typename T, int N>
    struct vec
    {
      union
      {
        T n[N];
        __v4sf f[N / (sizeof (__v4sf) / sizeof (T))];
        __v8sf fa[N / (sizeof (__v8sf) / sizeof (T))];
      };
    };

    template <typename T, int N>
    T
    optscalar(const vec<T, N> &src1, const vec<T, N> &src2)
    {
      T r = 0;
      for (int i = 0; i < N; ++i)
        r += src1[i] * src2[i];
      return r;
    }

    template <int N>
    float
    optscalar(const vec<float, N> &src1, const vec<float, N> &src2)
    {
      if (has_avx ())
        {
          __m256 tmp = _mm256_setzero_ps ();
          for (int i = 0; i < N / 8; ++i)
            tmp = _mm256_add_ps (tmp, _mm256_mul_ps (src1.fa[i], src2.fa[i]));
          tmp = _mm256_hadd_ps (tmp, tmp);
          tmp = _mm256_hadd_ps (tmp, tmp);
          tmp = _mm256_hadd_ps (tmp, tmp);
          union
          {
            __m256 v;
            float a[8];
          } cvt = { tmp };
          return cvt.a[0];
        }
      else
        {
          __m128 tmp = _mm_setzero_ps ();
          for (int i = 0; i < N / 4; ++i)
            tmp = _mm_add_ps (tmp, _mm_mul_ps (src1.f[i], src2.f[i]));
          tmp = _mm_hadd_ps (tmp, tmp);
          tmp = _mm_hadd_ps (tmp, tmp);
          return __builtin_ia32_vec_ext_v4sf (tmp, 0);
        }
    }

    #define N 10
    #define DEF(type) vec<type, N> v##type##1, v##type##2; type type##res, type##cmp
    DEF(float);

    float g;

    int
    main ()
    {
      float f = sinf (g);
      printf ("%g\n", f);

      asm volatile ("cpuid"
                    : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                    : "0" (1));

      float floatres = optscalar (vfloat1, vfloat2);
      printf ("%g\n", floatres);
      return 0;
    }

-- 
drepper at redhat dot com changed:

    What   |Removed |Added
    Status |WAITING |UNCONFIRMED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840
[Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics
--- Comment #4 from drepper at redhat dot com 2009-04-21 19:51 ---

(In reply to comment #3)
> Gcc 4.4 and above supports different target options on the function
> level but not on a basic block level. So you can create an interneral
> version for AVX.

This doesn't work either, aside from also being impractical.

First, you'd have to switch to AVX mode, in this case, to include the intrinsics header. How do you switch back to what was used before? How do you even determine it?

Even if you can, try it, and you'll see that gcc is horribly broken when it comes to the target("...") attributes. In the current Fedora 11 compiler (4.4) all target options are apparently turned off and none of the intrinsics work at all.

Even if the necessary support were added and the bugs fixed, it would still differ from icc (where all this comes from), and not in a nice way. To the contrary, it's much, much more complicated.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840
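For context, the per-function approach suggested in comment #3 can be sketched as below. This assumes a considerably newer GCC (or a compatible compiler) than the 4.4 release discussed in this thread. That compiler has to accept the target attribute on individual functions, provide __builtin_cpu_supports, and let the intrinsics header be included without -mavx. The function names dot_generic, dot_avx and dot are hypothetical, and non-x86 builds simply fall back to the generic path.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Generic fallback: plain scalar dot product. */
static float dot_generic (const float *a, const float *b, size_t n)
{
  float r = 0.0f;
  for (size_t i = 0; i < n; ++i)
    r += a[i] * b[i];
  return r;
}

#if HAVE_X86_DISPATCH
/* AVX variant.  Only this function is compiled with AVX enabled, so the
   rest of the file can still target the minimum ISA. */
__attribute__ ((target ("avx")))
static float dot_avx (const float *a, const float *b, size_t n)
{
  __m256 acc = _mm256_setzero_ps ();
  size_t i = 0;
  for (; i + 8 <= n; i += 8)
    acc = _mm256_add_ps (acc, _mm256_mul_ps (_mm256_loadu_ps (a + i),
                                             _mm256_loadu_ps (b + i)));
  /* Sum the eight lanes, then handle the leftover elements. */
  float lanes[8];
  _mm256_storeu_ps (lanes, acc);
  float r = 0.0f;
  for (int k = 0; k < 8; ++k)
    r += lanes[k];
  for (; i < n; ++i)
    r += a[i] * b[i];
  return r;
}
#endif

/* Run-time dispatch: pick the AVX variant only if the CPU reports AVX. */
float dot (const float *a, const float *b, size_t n)
{
#if HAVE_X86_DISPATCH
  if (__builtin_cpu_supports ("avx"))
    return dot_avx (a, b, n);
#endif
  return dot_generic (a, b, n);
}
```

This still differs from the icc model described in the report: every variant has to live in its own function, and the dispatch is written by hand.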
[Bug middle-end/23221] New: -fstack-protector does not protect tail call functions
Compiling this little bit of code with -fstack-protector-all

    extern int foo (int);
    int bar (int a, int b)
    {
      return foo (a + b);
    }

produces on x86-64 the following object code:

     0: 01 f7                   add    %esi,%edi
     2: 64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
     9: 00 00
     b: 48 89 44 24 f8          mov    %rax,0xfff8(%rsp)
    10: 31 c0                   xor    %eax,%eax
    12: e9 00 00 00 00          jmpq   17

The canary is set up but not tested. Before the jump to the next function the value must be checked.

This also applies to -fstack-protector (with appropriate input) and to all architectures.

-- 
Summary: -fstack-protector does not protect tail call functions
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23221
[Bug middle-end/23221] -fstack-protector does not protect tail call functions
-- 
    What |Removed |Added
    CC   |        |rth at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23221
[Bug target/34475] New: TLS and PIE don't mix on x86-64
Compiling this little program as a PIE leads to problems on x86-64:

    $ cat w.c
    __thread int a;
    int main(void) { return a; }

Using

    gcc -o w -g -O2 -pie -fpie w.c

one sees

    /usr/bin/ld: /tmp/ccU3JvLp.o: relocation R_X86_64_TPOFF32 against `a' can not be used when making a shared object; recompile with -fPIC

R_X86_64_TPOFF32 is the correct relocation to use for non-PIC binaries but PIEs must be PIC. It's probably just a simple mistake where instead of testing for PIC vs non-PIC the test checks for executable vs DSO.

This is no regression. It also exists in gcc 4.1 (the oldest version available here).

-- 
Summary: TLS and PIE don't mix on x86-64
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com
GCC host triplet: x86_64-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34475
[Bug target/14625] tail call optimization missed
--- Additional Comments From drepper at redhat dot com 2005-01-31 23:34 ---

> /* If this function requires more stack slots than the current
>    function, we cannot change it into a sibling call.  */
> || args_size.constant > current_function_args_size
>
> args_size.constant == 8 (2 ints) and current_function_args_size == 0
> because nothing gets passed on the stack.

Correct. But this does not take the stdcall attribute into account. It should.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14625
[Bug tree-optimization/30306] New: printf->puts optimization prevented by %%
If %% is used in a printf format that contains no conversion actually requiring substitution, gcc still does not perform the optimization.

    #include <stdio.h>
    int main (void)
    {
      printf ("hello %%!\n");
      return 0;
    }

This code is compiled to call printf even though it should lead to code calling puts with the string containing "hello %!".

-- 
Summary: printf->puts optimization prevented by %%
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30306
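The check the optimizer would need here is small. The sketch below is a hypothetical helper named format_to_literal, not gcc's actual implementation. It decides whether a format string consists only of literal text and %% pairs and, if so, produces the text that would be printed.

```c
#include <stddef.h>

/* Hypothetical helper.  Returns 1 and stores the literal output text in
   out if fmt contains no conversion other than "%%".  Returns 0 if a real
   conversion (or a trailing lone '%') is found, or if out is too small. */
int format_to_literal (const char *fmt, char *out, size_t outsz)
{
  size_t n = 0;

  if (outsz == 0)
    return 0;
  for (const char *p = fmt; *p != '\0'; ++p)
    {
      if (*p == '%')
        {
          if (p[1] != '%')
            return 0;       /* a real conversion: keep calling printf */
          ++p;              /* "%%" prints a single '%' */
        }
      if (n + 1 >= outsz)
        return 0;           /* no room for this char plus the terminator */
      out[n++] = *p;
    }
  out[n] = '\0';
  return 1;
}
```

For the format in the report the helper yields "hello %!" followed by a newline. Since puts appends a newline itself, a real transformation would additionally have to drop the trailing one before emitting the puts call.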