[Bug tree-optimization/24696] New: missing optimization in comparison of results of bit operations

2005-11-06 Thread drepper at redhat dot com
Take this little program:

int
f (unsigned long a, unsigned long b, unsigned long c)
{
  return (a & (c - 1)) != 0 || (b & (c - 1)) != 0;
}

Compiling this on x86-64 with gcc 4.0.2 (and, I think, also with the current
mainline) at -O2 yields the following code:

<f>:
   0:   48 ff ca                dec    %rdx
   3:   48 85 d7                test   %rdx,%rdi
   6:   75 07                   jne    f <f+0xf>
   8:   31 c0                   xor    %eax,%eax
   a:   48 85 d6                test   %rdx,%rsi
   d:   74 05                   je     14 <f+0x14>
   f:   b8 01 00 00 00          mov    $0x1,%eax
  14:   f3 c3                   repz retq

As can be seen, both comparisons are executed individually.  This is
unnecessarily slow.  Since the right operand for & is the same and this is a
pure bit-test it is perfectly fine to compile the code to the equivalent of

int
f (unsigned long a, unsigned long b, unsigned long c)
{
  return ((a | b) & (c - 1)) != 0;
}

This would be significantly faster.  On archs like x86-64 no conditional jump
(just a setne) would be needed.
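
The transformation rests on the identity (a & m) | (b & m) == (a | b) & m, so
"either masked value is non-zero" is the same test as "the masked OR is
non-zero".  A minimal sketch that spot-checks this on a few values; the names
f_separate, f_merged and forms_agree are made up for illustration and are not
part of the report:

```c
/* The form from the report: two separate bit tests joined by ||.  */
int
f_separate (unsigned long a, unsigned long b, unsigned long c)
{
  return (a & (c - 1)) != 0 || (b & (c - 1)) != 0;
}

/* The hand-merged form: OR the inputs first, then test once.  */
int
f_merged (unsigned long a, unsigned long b, unsigned long c)
{
  return ((a | b) & (c - 1)) != 0;
}

/* Return 1 if both forms agree on every combination of the sample
   values below, 0 otherwise.  This is a spot check, not a proof.  */
int
forms_agree (void)
{
  static const unsigned long samples[] =
    { 0, 1, 2, 3, 8, 16, 255, 256, ~0UL };
  const int n = sizeof samples / sizeof samples[0];

  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < n; ++k)
        if (f_separate (samples[i], samples[j], samples[k])
            != f_merged (samples[i], samples[j], samples[k]))
          return 0;
  return 1;
}
```

Unsigned wrap-around makes c == 0 well defined here (the mask becomes all
ones), so no special case is needed.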


-- 
   Summary: missing optimization in comparison of results of bit operations
   Product: gcc
   Version: 4.0.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24696



[Bug middle-end/25521] New: change semantics of const volatile variables

2005-12-21 Thread drepper at redhat dot com
In math code we often have to make sure the compiler does not fold operations
at compile time.  In glibc we use variables declared as

  static const volatile double foo = 42.0;

The problem is that gcc moves such variables into .data.  But we could achieve
that easily by leaving out the 'const'.  What is needed is a method to achieve
volatile behavior while having the variable in .rodata (and .rodata.cst8 etc).

I therefore would like to ask for a change in the compiler which preserves the
'const' in the presence of 'volatile' and places the variable in read-only
memory.
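
For illustration, a minimal sketch of the idiom; the variable and function
names are hypothetical.  With the plain const the compiler is free to fold
the multiplication at compile time, while the volatile qualifier forces a
load at each use, so the operation is performed at run time.  Which section
the variable ends up in depends on the compiler and is not shown here.

```c
/* Plain const: the compiler may fold 42.0 * 2.0 at compile time.  */
static const double plain_value = 42.0;

/* const volatile: every use is a real memory read, so the
   multiplication below cannot be folded.  */
static const volatile double barrier_value = 42.0;

double
twice_plain (void)
{
  return plain_value * 2.0;
}

double
twice_barrier (void)
{
  return barrier_value * 2.0;
}
```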


-- 
   Summary: change semantics of const volatile variables
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: drepper at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25521



[Bug middle-end/25522] New: zero-initialized constants are place in .bss

2005-12-21 Thread drepper at redhat dot com
Compile this code:

struct foo { int a, b; };
const struct foo f;

The compiler will place the variable f in .bss instead of, as the const
indicates, in .rodata.  This can be a security problem.  In glibc we
deliberately use const wherever possible (as should everybody) to prevent
anybody from changing the value.  Allowing changes would allow an intruder to
modify the variable and influence the semantics of the program.

Yes, this means that binaries get larger.  But that's what the programmer
requested.


-- 
   Summary: zero-initialized constants are place in .bss
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25522



[Bug middle-end/25521] change semantics of const volatile variables

2005-12-21 Thread drepper at redhat dot com


--- Comment #2 from drepper at redhat dot com  2005-12-21 19:38 ---
Using gcc's section attributes won't fully work either.

Using __attribute((section(".rodata"))) is OK in the compiler, although the
assembler (correctly) complains.  But what is really needed is

__attribute((section(".rodata.cst8"))).  This will cause gcc to fail with an
ICE.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25521



[Bug c++/25541] New: invalid warning about unused variable

2005-12-22 Thread drepper at redhat dot com
The -Wunused warning generation doesn't take modifications of global variables
into account.  Compiling the following code with -Wunused -Werror fails
although this is perfectly reasonable code.  Some registered exit handler could
check the value of the variable.

int global;

struct monitor
{
  ~monitor() { global = 1; }
};

int
main ()
{
  monitor m;
}


-- 
   Summary: invalid warning about unused variable
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25541



[Bug middle-end/25522] zero-initialized constants are place in .bss

2005-12-25 Thread drepper at redhat dot com


--- Comment #5 from drepper at redhat dot com  2005-12-26 05:52 ---
> What happens if you use -fno-common?

In this case the variable gets the index of .bss in the symbol table instead of
using SHN_COMMON.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25522



[Bug rtl-optimization/25609] New: too agressive printf optimization

2005-12-30 Thread drepper at redhat dot com
At least glibc's printf, maybe others as well, prints (null) for code like

  printf ("%s", NULL)

gcc doesn't consider this when optimizing code where the pointer passed for a
%s format specifier can be NULL.  Example:

#include <stdio.h>
int
main (int argc, char *argv[])
{
  printf ("%s\n", argc > 1 ? argv[1] : NULL);
  return 0;
}

Compiling and running this code (I use gcc 4.0.2) will result in a program
which crashes because the printf is transformed into a puts() call and puts()
does not allow NULL pointers.

There should at least be a mode in which gcc does not perform the
transformation if it cannot be sure the pointer is not NULL.  The default for
Linux and maybe other platforms should be to not perform this optimization if
the pointer can be NULL.
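
Until such a mode exists, code that cannot rely on the libc extension can
guard the pointer itself.  A minimal sketch, with hypothetical helper names
str_or_null and print_line; it mimics glibc's "(null)" output and does not
depend on whether the compiler rewrites the call to puts():

```c
#include <stdio.h>

/* Return S, or the text glibc's printf prints for a NULL %s argument.  */
const char *
str_or_null (const char *s)
{
  return s != NULL ? s : "(null)";
}

/* Print S followed by a newline.  Safe for NULL because the pointer
   handed to puts() is never NULL.  */
int
print_line (const char *s)
{
  return puts (str_or_null (s));
}
```

In the example program above this would be used as
print_line (argc > 1 ? argv[1] : NULL).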


-- 
   Summary: too agressive printf optimization
   Product: gcc
   Version: 4.0.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: drepper at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609




[Bug rtl-optimization/25609] too agressive printf optimization

2005-12-30 Thread drepper at redhat dot com


--- Comment #4 from drepper at redhat dot com  2005-12-30 23:06 ---
No, it's *NOT* undefined.  The libc interface decides what is defined and what
is not and it is *EXPLICITLY* documented that NULL pointers are printed as
(null).

The standard might leave it undefined but this does *NOT* mean the
implementation cannot define it.


-- 

drepper at redhat dot com changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|DUPLICATE   |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609




[Bug rtl-optimization/25609] too agressive printf optimization

2005-12-30 Thread drepper at redhat dot com


--- Comment #6 from drepper at redhat dot com  2005-12-30 23:08 ---
This is NOT a dup of 15574.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609




[Bug rtl-optimization/25609] too agressive printf optimization

2005-12-30 Thread drepper at redhat dot com


--- Comment #8 from drepper at redhat dot com  2005-12-30 23:14 ---
> That is true but GCC is a C compiler and not a glibc implemention C compiler.

This doesn't mean anything.  As soon as you configure gcc to target it to Linux
the behavior of the runtime is as defined by the C library.  gcc doesn't come
with its own C library so it cannot possibly override any decisions made about
undefined behavior.  I explicitly said that this optimization need only be
disabled for platforms using glibc.  I don't give a rats ass what other
platforms do.


-- 

drepper at redhat dot com changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|DUPLICATE   |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609




[Bug rtl-optimization/25609] too agressive printf optimization

2005-12-30 Thread drepper at redhat dot com


--- Comment #10 from drepper at redhat dot com  2005-12-30 23:44 ---
glibc *is* the world as far as Linux is concerned.  You consistently and
deliberately misinterpret what I write: I'm not talking about any platform
which does not use glibc or glibc's behavior.

And RTH already concurred in private that this is a problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609




[Bug rtl-optimization/25609] too agressive printf optimization

2005-12-30 Thread drepper at redhat dot com


--- Comment #12 from drepper at redhat dot com  2005-12-31 00:19 ---
> That is not true at all and you know that.  There is uclibc.

Now you've completely given up on logic?  First of all, uclibc and whatever
other libc imitation is out there does not define the Linux API.  glibc *is*
the world, all the others are just replacements of varying degrees of
conformance.  This can be seen in the fact that even uclibc implements printf
with the behavior in question.

But more importantly here: even if there were one piece of code which behaves
differently, this does not disqualify the argument that the API for Linux
defines the behavior in question.  This is an OR operation, not AND.  glibc
defines the behavior and this means the compiler must handle such code
appropriately if compiled for Linux.


-- 

drepper at redhat dot com changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|DUPLICATE   |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609




[Bug middle-end/39840] New: Non-optimal (or wrong) implementation of SSE intrinsics

2009-04-21 Thread drepper at redhat dot com
The implementation of the SSE intrinsics for x86 and x86-64 in gcc is tied to
the use of an appropriate -m option, such as -mssse3 or -mavx.  This is
different from what icc does and it prevents code from being written in the
most natural form.  This is nothing new in gcc 4.4, it has been the behavior of
gcc forever, as far as I can see.  But especially the introduction of AVX
brings this problem to the foreground.

As an example, assume I want to write a vector class with the usual operations.
 I can write code like this:

#ifdef __AVX__
vec operator+(vec &a, vec &b) {
  ... use AVX intrinsics ...
}
#elif defined __SSE4__
vec operator+(vec &a, vec &b) {
  ... use SSE4 intrinsics ...
}
#elif defined __SSE2__
vec operator+(vec &a, vec &b) {
  ... use SSE2 intrinsics ...
}
#else
vec operator+(vec &a, vec &b) {
  ... generic implementation ...
}
#endif

But this means, of course, that the binary has to be compiled for every single
target and the correct one has to be chosen.  This is not attractive or
practical.  Chances are that only a generic implementation will be available.

It would be better to have a self-optimizing implementation:

vec operator+(vec &a, vec &b) {
  if (AVX is available)
    ... use AVX intrinsics ...
  else if (SSE4 is available)
    ... use SSE4 intrinsics ...
  else if (SSE2 is available)
    ... use SSE2 intrinsics ...
  else
    ... generic implementation ...
}

This is possible with icc.  It is not possible with gcc at the moment.  For gcc
I would have to split the implementation of all the variants in individual
files and then, in the template function as seen above, these implementations
would have to be called.  Even if as in this case it might be doable (but
terribly inconvenient) there are situations where this is really impractical or
impossible.


The problem is that to be able to use the AVX intrinsics the compiler has to be
passed -mavx (all other extensions are implied by -mavx).  But this flag has
another consequence: the compiler will now take advantage of the new AVX
instructions and generate them for unrelated code not associated with
intrinsics (e.g., an inlined memset implementation).  The result is that such a
binary will fail to run on anything but an AVX-enabled machine.


In icc the -mavx flag exclusively controls the code generation (i.e., whether
AVX is used in inlined memset etc).  The SSE intrinsics and all the associated
data types are _always_ defined as soon as <immintrin.h> is included.


This means the example code above would be compiled with an -m parameter for
the minimum ISA to support and still the AVX, SSE4, ... intrinsics are
available.


gcc should follow icc's way of handling the intrinsics.  Since all this
intrinsic business comes from icc I consider this a bug in gcc's implementation
instead of an enhancement request.


-- 
   Summary: Non-optimal (or wrong) implementation of SSE intrinsics
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com
GCC target triplet: i?86-* x86_64-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840



[Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics

2009-04-21 Thread drepper at redhat dot com


--- Comment #2 from drepper at redhat dot com  2009-04-21 19:37 ---
[I couldn't attach the code as an attachment, bugzilla has a bug.]

The program below has to be compiled with -mavx to allow the AVX intrinsics
to be used.  But this also triggers the use of the vmovss instruction to
load the parameter for the sin() call from memory.

(Forget the reference to memset in the original report, it's as simple as
passing floating point parameters that triggers the problem.)

#include <math.h>
#include <stdio.h>
#include <immintrin.h>


static unsigned int eax, ebx, ecx, edx;


static int
has_avx (void)
{
  if ((ecx & (1 << 27)) == 0)
    /* No OSXSAVE.  */
    return 0;

  unsigned int feat_eax, feat_edx;
  asm ("xgetbv" : "=a" (feat_eax), "=d" (feat_edx) : "c" (0));
  if ((feat_eax & 6) != 6)
    return 0;

  return (ecx & (1 << 28)) != 0;
}


template<typename T, int N>
struct vec {
  union {
    T n[N];
    __v4sf f[N / (sizeof (__v4sf) / sizeof (T))];
    __v8sf fa[N / (sizeof (__v8sf) / sizeof (T))];
  };
};


template<typename T, int N>
T
optscalar(const vec<T, N> &src1, const vec<T, N> &src2)
{
  T r = 0;
  for (int i = 0; i < N; ++i)
    r += src1.n[i] * src2.n[i];
  return r;
}


template<int N>
float
optscalar(const vec<float, N> &src1, const vec<float, N> &src2)
{
  if (has_avx ())
    {
      __m256 tmp = _mm256_setzero_ps ();
      for (int i = 0; i < N / 8; ++i)
        tmp = _mm256_add_ps (tmp, _mm256_mul_ps (src1.fa[i], src2.fa[i]));
      tmp = _mm256_hadd_ps (tmp, tmp);
      tmp = _mm256_hadd_ps (tmp, tmp);
      tmp = _mm256_hadd_ps (tmp, tmp);
      union
      {
        __m256 v;
        float a[8];
      } cvt = { tmp };
      return cvt.a[0];
    }
  else
    {
      __m128 tmp = _mm_setzero_ps ();
      for (int i = 0; i < N / 4; ++i)
        tmp = _mm_add_ps (tmp, _mm_mul_ps (src1.f[i], src2.f[i]));
      tmp = _mm_hadd_ps (tmp, tmp);
      tmp = _mm_hadd_ps (tmp, tmp);
      return __builtin_ia32_vec_ext_v4sf (tmp, 0);
    }
}


#define N 10
#define DEF(type) vec<type, N> v##type##1, v##type##2; type type##res, type##cmp
DEF(float);

float g;

int
main ()
{
  float f = sinf (g);
  printf ("%g\n", f);

  asm volatile ("cpuid"
                : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                : "0" (1));

  float floatres = optscalar (vfloat1, vfloat2);
  printf ("%g\n", floatres);

  return 0;
}


-- 

drepper at redhat dot com changed:

   What|Removed |Added

 Status|WAITING |UNCONFIRMED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840



[Bug middle-end/39840] Non-optimal (or wrong) implementation of SSE intrinsics

2009-04-21 Thread drepper at redhat dot com


--- Comment #4 from drepper at redhat dot com  2009-04-21 19:51 ---
(In reply to comment #3)
> Gcc 4.4 and above supports different target options on the function  
> level but not on a basic block level. So you can create an interneral  
> version for AVX.

This doesn't work either.  Aside from being also impractical.

First, you'd have to switch to AVX mode, in this case, to include
<immintrin.h>.  How do you switch back to what was used before?  How to even
determine it?

Even if you can, try it, and you'll see that gcc is horribly broken when it
comes to the target("...") attributes.  In the current Fedora 11 compiler (4.4)
all target options are apparently turned off and none of the intrinsics work at
all.

Even if the necessary support were added and the bugs fixed, it would still
differ from icc (where all this comes from) and not in a nice way.  To the
contrary, it's much, much more complicated.
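
For reference, a minimal sketch of what the function-level approach from
comment #3 looks like.  It assumes a compiler in which the target("avx")
attribute makes the AVX intrinsics usable inside the attributed function and
in which __builtin_cpu_supports is available; neither is claimed for the 4.4
compiler discussed above.  The names dot, dot_avx and dot_generic are
hypothetical.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
# include <immintrin.h>
# define DOT_HAVE_X86 1
#endif

/* Generic implementation, usable on any target.  */
static float
dot_generic (const float *a, const float *b, size_t n)
{
  float r = 0.0f;
  for (size_t i = 0; i < n; ++i)
    r += a[i] * b[i];
  return r;
}

#ifdef DOT_HAVE_X86
/* AVX variant.  Only this function is compiled with AVX enabled; the
   rest of the file keeps the baseline ISA from the command line.  */
static float __attribute__ ((target ("avx")))
dot_avx (const float *a, const float *b, size_t n)
{
  __m256 acc = _mm256_setzero_ps ();
  size_t i = 0;

  /* Eight floats per iteration, unaligned loads.  */
  for (; i + 8 <= n; i += 8)
    acc = _mm256_add_ps (acc, _mm256_mul_ps (_mm256_loadu_ps (a + i),
                                             _mm256_loadu_ps (b + i)));

  /* Sum the eight lanes, then handle the remaining elements.  */
  float lanes[8];
  _mm256_storeu_ps (lanes, acc);
  float r = 0.0f;
  for (int k = 0; k < 8; ++k)
    r += lanes[k];
  for (; i < n; ++i)
    r += a[i] * b[i];
  return r;
}
#endif

/* Pick the implementation at run time.  */
float
dot (const float *a, const float *b, size_t n)
{
#ifdef DOT_HAVE_X86
  if (__builtin_cpu_supports ("avx"))
    return dot_avx (a, b, n);
#endif
  return dot_generic (a, b, n);
}
```

This still needs one function per ISA variant plus a dispatcher, which is the
extra complexity objected to above; with icc's model the branches could live
in a single function body.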


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39840



[Bug middle-end/23221] New: -fstack-protector does not protect tail call functions

2005-08-03 Thread drepper at redhat dot com
Compiling this little bit of code with -fstack-protector-all

extern int foo (int);
int bar (int a, int b)
{
  return foo (a + b);
}

produces on x86-64 the following object code:

   0:   01 f7                   add    %esi,%edi
   2:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
   9:   00 00
   b:   48 89 44 24 f8          mov    %rax,0xfffffffffffffff8(%rsp)
  10:   31 c0                   xor    %eax,%eax
  12:   e9 00 00 00 00          jmpq   17 <bar+0x17>

The canary is set up but not tested.  Before the jump to the next function the
value must be checked.  This also applies to -fstack-protector (with appropriate
input) and to all architectures.

-- 
   Summary: -fstack-protector does not protect tail call functions
   Product: gcc
   Version: 4.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: drepper at redhat dot com
CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23221


[Bug middle-end/23221] -fstack-protector does not protect tail call functions

2005-08-03 Thread drepper at redhat dot com


-- 
   What|Removed |Added

 CC||rth at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23221


[Bug target/34475] New: TLS and PIE don't mix on x86-64

2007-12-14 Thread drepper at redhat dot com
Compiling this little program as a PIE leads to problems on x86-64:

$ cat w.c
__thread int a;

int
main(void)
{
  return a;
}

Using

  gcc -o w -g -O2 -pie -fpie w.c


one sees

/usr/bin/ld: /tmp/ccU3JvLp.o: relocation R_X86_64_TPOFF32 against `a' can not
be used when making a shared object; recompile with -fPIC


R_X86_64_TPOFF32 is the correct relocation to use for non-PIC binaries but PIEs
must be PIC.  It's probably just a simple mistake where instead of testing for
PIC vs non-PIC the test checks for executable vs DSO.

This is no regression.  It also exists in gcc 4.1 (the oldest version available
here).


-- 
   Summary: TLS and PIE don't mix on x86-64
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: drepper at redhat dot com
  GCC host triplet: x86_64-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34475



[Bug target/14625] tail call optimization missed

2005-01-31 Thread drepper at redhat dot com

--- Additional Comments From drepper at redhat dot com  2005-01-31 23:34 ---
>  /* If this function requires more stack slots than the current 
> function, we cannot change it into a sibling call.  */ 
>  || args_size.constant > current_function_args_size 
> 
> args_size.constant == 8 (2 ints) and current_function_args_size == 0 
> because nothing gets passed on the stack. 

Correct.  But this does not take the stdcall attribute into account.  It 
should. 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14625


[Bug tree-optimization/30306] New: printf->puts optimization prevented by %%

2006-12-26 Thread drepper at redhat dot com
If %% is used in printf formats without any actual format requiring
substitution being used, gcc still does not perform the optimization.

#include <stdio.h>
int
main (void)
{
  printf ("hello %%!\n");
  return  0;
}

This code is compiled to call printf even though it should lead to code calling
puts with the string "hello %!".
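
The requested folding can be sketched as a small string transformation.  The
helper below is hypothetical and is not gcc's implementation: it accepts a
format only if every '%' is part of a "%%" pair and the format ends in a
newline, and it produces the argument puts() would need.

```c
#include <stddef.h>

/* If FMT contains no conversion other than "%%" and ends in '\n',
   store the literal text for puts() in OUT (each "%%" collapsed to
   "%", trailing newline dropped) and return 1.  Return 0 if the
   format is not eligible or OUT is too small.  */
int
format_to_puts_arg (const char *fmt, char *out, size_t outsz)
{
  size_t o = 0;

  for (size_t i = 0; fmt[i] != '\0'; ++i)
    {
      if (fmt[i] == '%')
        {
          if (fmt[i + 1] != '%')
            return 0;           /* A real conversion: printf is needed.  */
          ++i;                  /* Skip the second '%' of the pair.  */
        }
      if (o + 1 >= outsz)
        return 0;               /* Not enough room in OUT.  */
      out[o++] = fmt[i];
    }

  if (o == 0 || out[o - 1] != '\n')
    return 0;                   /* puts() always appends a newline.  */
  out[o - 1] = '\0';
  return 1;
}
```

For the format in the example program this yields "hello %!", the string
passed to puts().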


-- 
   Summary: printf->puts optimization prevented by %%
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: drepper at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30306