[Bug c++/35553] New: -fkeep-inline-functions and -O errors out in SSE headers

2008-03-12 Thread gpiez at web dot de
#include 

int main(int argc, char** argv) {
return 0;
}

---

If compiled with g++ -O -fkeep-inline-functions, this errors out with 

/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1-pre20080306/include/emmintrin.h: In
function ‘long long int __vector__ _mm_shuffle_epi32(long long int
__vector__, int)’:
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1-pre20080306/include/emmintrin.h:1382:
error: mask must be an immediate
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1-pre20080306/include/emmintrin.h: In
function ‘long long int __vector__ _mm_shufflelo_epi16(long long int
__vector__, int)’:
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1-pre20080306/include/emmintrin.h:1376:
error: mask must be an immediate
...
and many more lines follow.

This did not happen with 4.2.3. I am not able to make sure there are no bogus
headers on the host involved, so I attached the preprocessed source.


-- 
   Summary: -fkeep-inline-functions and -O errors out in SSE headers
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: gpiez at web dot de
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35553



[Bug c++/35553] -fkeep-inline-functions and -O errors out in SSE headers

2008-03-12 Thread gpiez at web dot de


--- Comment #1 from gpiez at web dot de  2008-03-12 14:49 ---
Created an attachment (id=15303)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15303&action=view)
preprocessed source





[Bug middle-end/36041] Speed up builtin_popcountll

2012-10-26 Thread gpiez at web dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041



Gunther Piez  changed:

   What|Removed |Added
     CC|        |gpiez at web dot de

--- Comment #10 from Gunther Piez  2012-10-26 15:51:24 UTC ---
Just noted the exceptional slowness of the provided __builtin_popcountll() even
on ARMv5.

I already used the above parallel bit count algorithm in the case that a native
bit count instruction (like the SSE popcnt or NEON vcnt) is not present, but
native 64 bit registers are available.

But on a 32 bit architecture like ARM I figured it made sense to just use the
__builtin_popcountll() because the many 64 bit instructions in the algorithm
may be very slow without NEON or similar support on a pure 32 bit architecture.

But "optimizing" my code with some macro magic to make it use the library
popcount made the whole program 25% slower, although only a minor part of it
actually does use the popcount instruction.
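
For reference, a minimal sketch of the kind of parallel (SWAR) bit count meant here. This is the textbook variant and is only assumed to match the algorithm discussed earlier in this bug; the function names are made up for illustration. The second variant counts the two 32 bit halves separately, which might suit a pure 32 bit target better, but that has not been measured here.

```cpp
#include <cassert>
#include <cstdint>

// Textbook SWAR ("parallel") bit count on one 64-bit word.
// Assumption: not necessarily the exact code referred to in this bug.
inline int popcount64_swar(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;                            // 8-bit sums
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);            // add all bytes
}

// Same idea restricted to 32-bit arithmetic.
inline int popcount32_swar(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return static_cast<int>((x * 0x01010101u) >> 24);
}

// Hypothetical 32-bit-friendly wrapper: count each half, then add.
inline int popcount64_halves(std::uint64_t x) {
    return popcount32_swar(static_cast<std::uint32_t>(x)) +
           popcount32_swar(static_cast<std::uint32_t>(x >> 32));
}
```

Both functions should agree with __builtin_popcountll() for every input; which one is faster depends on the target and has to be benchmarked.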


[Bug c/50168] New: __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64

2011-08-23 Thread gpiez at web dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168

 Bug #: 50168
   Summary: __builtin_ctz() and intrinsics __bsr(), __bsf()
generate suboptimal code on x86_64
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: gp...@web.de


Testcase:


#include 

static inline long my_bsfq(long x) __attribute__((__always_inline__));
static inline long my_bsfq(long x) {
long result;
asm(" bsfq %1, %0 \n"
: "=r"(result)
: "r"(x)
);
return result;
}

long c[64];

long f(long i) {
return c[ __bsfq(i) ];
}

long g(long i) {
return c[ __builtin_ctzll(i) ];
}

long h(long i) {
return c[ my_bsfq(i) ];
}
--



When I compile this with 'gcc -O3 -g testcase.c -c -o testcase.o
&& objdump -d testcase.o', I get



--
0000 <f>:
   0:   48 0f bc ff             bsf    %rdi,%rdi
   4:   48 63 ff                movslq %edi,%rdi
   7:   48 8b 04 fd 00 00 00    mov    0x0(,%rdi,8),%rax
   e:   00
   f:   c3                      retq

0010 <g>:
  10:   48 0f bc ff             bsf    %rdi,%rdi
  14:   48 63 ff                movslq %edi,%rdi
  17:   48 8b 04 fd 00 00 00    mov    0x0(,%rdi,8),%rax
  1e:   00
  1f:   c3                      retq

0020 <h>:
  20:   48 0f bc ff             bsf    %rdi,%rdi
  24:   48 8b 04 fd 00 00 00    mov    0x0(,%rdi,8),%rax
  2b:   00
  2c:   c3                      retq
---



Please note the unneeded 32 to 64 bit conversion 'movslq ...' inserted by the
compiler in functions f() and g(). It should look like h() instead.

I suspect the source is the prototype of the builtin, whose return type 'int'
does not match the "natural" return type on x86_64, which is 64 bit, the same
register size as the input register.

If I replace the builtin/intrinsic with the self-made asm one, I get a nice
speedup of 2% in my chess engine.
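
A small sketch of the return-type point, without inline asm. It only shows that the builtin is declared to return int, and converts through unsigned so that the widening to a 64 bit index is a zero extension rather than a sign extension. Whether a given GCC version then drops the extra instruction is not verified here; the names table and lookup_by_ctz are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

long table[64];  // stand-in for the array c[] in the testcase

// The builtin is declared to return int, not a 64-bit type.
static_assert(std::is_same<decltype(__builtin_ctzll(1ULL)), int>::value,
              "__builtin_ctzll returns int");

// Sketch: go through unsigned, so the index is zero-extended.
// Assumption: x != 0, because the builtin's result is undefined for zero.
inline long lookup_by_ctz(unsigned long long x) {
    return table[static_cast<unsigned>(__builtin_ctzll(x))];
}
```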


[Bug c/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64

2011-08-23 Thread gpiez at web dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168

--- Comment #3 from Gunther Piez  2011-08-23 21:54:40 UTC 
---
On 23.08.2011 19:58, jakub at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
>
> Jakub Jelinek  changed:
>
>What|Removed |Added
> 
>  CC||uros at gcc dot gnu.org
>
> --- Comment #2 from Jakub Jelinek  2011-08-23 
> 17:58:52 UTC ---
> Those aren't equivalent unfortunately, because bsf and bsr insns on x86 have
> undefined value if the source is zero.  While __builtin_c[lt]z* documentation
> says that the result is undefined in that case, I wonder if it would be fine
> even if long l = (int) __builtin_c[lt]z* (x); gave a value that wasn't 
> actually
> sign-extended to 64 bits.
> The combiner already simplifies zero or sign extension of popcount/parity/ffs
> and, if ctz or clz value is defined at zero, also those, but if it is 
> undefined
> it assumes anything in any of the bits and thus can't optimize the sign/zero
> extension away.  With -mbmi it will be optimized just fine, because for tzcnt
> (and lzcnt for -mlzcnt) insns are well defined even for source operand zero.
>


[Bug c/50168] __builtin_ctz() and intrinsics __bsr(), __bsf() generate suboptimal code on x86_64

2011-08-23 Thread gpiez at web dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168

--- Comment #4 from Gunther Piez  2011-08-23 22:00:31 UTC 
---
On 23.08.2011 19:58, jakub at gcc dot gnu.org wrote:
> While __builtin_c[lt]z* documentation
> says that the result is undefined in that case, I wonder if it would be fine
> even if long l = (int) __builtin_c[lt]z* (x); gave a value that wasn't 
> actually
> sign-extended to 64 bits.

So software operating on the assumption that the value returned by
__builtin_c[lt]z* is always an int, even in the undefined case, would break
as soon as it sees a value outside the int range. Which could very well
be the case, AFAIK in the zero case the value of the target register is
just unchanged.

IMHO this is ok, I doubt that such code exists and even if it does, it is very
broken by design :-)
Just my 2 cents.
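
Code that does need a defined answer for zero can guard the call itself instead of relying on whatever the instruction leaves in the register. A minimal sketch, with a made-up helper name and a caller-chosen fallback value:

```cpp
#include <cassert>

// Hypothetical helper: returns if_zero for x == 0, so the undefined case
// of __builtin_ctzll is never reached.
inline int ctz64_or(unsigned long long x, int if_zero) {
    return x != 0 ? __builtin_ctzll(x) : if_zero;
}
```

The extra test costs a compare and a branch or conditional move, so it only makes sense where zero can really occur.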


[Bug lto/48246] New: ICE in lto_wpa_write_files

2011-03-22 Thread gpiez at web dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48246

   Summary: ICE in lto_wpa_write_files
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: gp...@web.de


Created attachment 23754
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23754
testcase

I get an ICE when compiling the testcase with

g++ -r  -nostdlib testcase.ii -O3  -flto  -o /dev/null

Error message is lto1: internal compiler error: in lto_wpa_write_files, at
lto/lto.c:1518
This is gcc-4.6.0-rc2.


[Bug c++/44500] [C++0x] Bogus narrowing conversion error

2011-03-24 Thread gpiez at web dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44500



--- Comment #18 from Gunther Piez  2011-03-24 11:45:47 UTC 
---

I have chosen the "recommended" way and added a cast; -fpermissive would allow
too many other dubious constructs to pass. Still, I think C++ should get rid of
implicit integer conversions :-)
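
A sketch of what such a cast can look like, modelled on the testcase from this bug. The function name make_a is made up, and this is only one way to write it, not necessarily the exact change made in the real code.

```cpp
#include <cassert>

struct A {
    char x;
};

template<char C>
A make_a() {
    char y = 42;
    // y + C is promoted to int; the explicit cast makes the conversion
    // back to char intentional, so the braced initializer is accepted.
    A a = { static_cast<char>(y + C) };
    return a;
}
```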


[Bug c++/44500] New: Bogus narrowing conversion error

2010-06-11 Thread gpiez at web dot de
Compiling with g++ -std=c++0x, using gcc-4.5.0 :

struct A {
char x;
};

template<char C> void f() {
char y = 42;
A a = { y+C };
}

int main() {
f<1>();
}

yields an "error: narrowing conversion of ‘(((int)y) + 8)’ from ‘int’
to ‘char’ inside { }".
If I change the template parameter type from "char C" to "int C" the error
message persists, this seems wrong too, but I am not quite sure.

If I leave out the "y", everything is fine.


-- 
   Summary: Bogus narrowing conversion error
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: gpiez at web dot de
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44500



[Bug c++/44500] [C++0x] Bogus narrowing conversion error

2010-06-11 Thread gpiez at web dot de


--- Comment #2 from gpiez at web dot de  2010-06-11 11:34 ---
Sorry for the unicode mess. The error message is 'error: narrowing conversion
of "(((int)y) + 1)" from "int" to "char" inside { }'.

The same error happens with a non templated function, but if I use two template
parameters, the error disappears, even if they are too large. So this is at
least very inconsistent.


no error:

struct A {
    char x;
};

templatevoid f() {
    A a = { C+D };
}

int main() {
    f<1,2>();
}




still no error:

struct A {
    char x;
};

templatevoid f() {
    A a = { C+D };
}

int main() {
    f<1,2>();
}


error:

struct A {
    char x;
};

void f(char C, char D) {
    A a = { C+D };
}

int main() {
    f(1,2);
}



I believe I should not get an error, even if the template parameter type is
larger than a char, as long as the template parameter value fits in a char, so

template<int C> void f() {
char y = 42;
A a = { y+C };
}

should give no error, as long as C fits in a char. IMHO ;-)





[Bug c++/44500] [C++0x] Bogus narrowing conversion error

2010-06-11 Thread gpiez at web dot de


--- Comment #5 from gpiez at web dot de  2010-06-11 12:09 ---
So is it provable that for a "T op T" to be stored in T no narrowing takes
place?

If the answer for T == char is no and for T == int it is yes this is rather
fishy ;-)
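
The asymmetry can be checked directly: for T == char the type of "T op T" is int, not T, while for T == int it stays int. A minimal sketch using static_assert; the constant names are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Integer promotion: operands narrower than int are promoted before the
// arithmetic, so char + char has type int.
constexpr bool char_plus_char_is_int =
    std::is_same<decltype(char() + char()), int>::value;

// int operands are not promoted any further, so int + int has type int.
constexpr bool int_plus_int_is_int =
    std::is_same<decltype(int() + int()), int>::value;

static_assert(char_plus_char_is_int, "char + char yields int");
static_assert(int_plus_int_is_int, "int + int yields int");
```

So storing "char + char" in a char converts from int to char, which is what the narrowing check sees, while storing "int + int" in an int involves no conversion at all, even though the addition itself can overflow.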





[Bug c++/44500] [C++0x] Bogus narrowing conversion error

2010-06-11 Thread gpiez at web dot de


--- Comment #9 from gpiez at web dot de  2010-06-11 13:27 ---
I understand now that after the implicit promotion to int of a non-constant
value, the result of the narrowing operation can't be guaranteed to fit in the
original type. But I still think it shouldn't give an error, and if the
standard says so, I think it is flawed in this regard ;-)

Consider

g();  // Warning, but no Error

even though it can be proven that the value will not fit and this is very likely
an error. As opposed to

char c,d;
A a = { c+d };

which is very likely not an error and would only require a mild warning. IMHO.

Manuel, in your testcase, you do not only warn, you error out if compiled with
-std=c++0x.





[Bug c++/44500] [C++0x] Bogus narrowing conversion error

2010-06-12 Thread gpiez at web dot de


--- Comment #13 from gpiez at web dot de  2010-06-12 08:47 ---
...


-- 

gpiez at web dot de changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||INVALID


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44500



[Bug c++/44500] [C++0x] Bogus narrowing conversion error

2010-06-12 Thread gpiez at web dot de


--- Comment #12 from gpiez at web dot de  2010-06-12 08:46 ---
I am closing this, as it isn't a gcc bug, as it behaves according to the
standard.

The bug is in the standard, as it mandates

f<1,1>  // ok
f() // error
g()   // no error, but undefined behaviour

f(char, char)   // error
g(int, int) // ok

which is inconsistent and surprising. C++0x should really have got rid of the
implicit integer promotion. Wasn't the intent of the implicit promotion to be
able to write 

char a,b,c,d;
a = b*c/d;

and get a correct result even if b*c > CHAR_MAX? I believe nobody writes
code like this anymore, and even if they did, you could simply say "undefined
behaviour" ;-) It doesn't work for ints anyway.

Instead I now have an implicit integer promotion which forces me to use an
explicit cast in compound initializers, where narrowing conversion isn't
allowed, while in a simple assignment of course it is allowed (or else all hell
would break loose... ). Why not make -Wconversion an error, at least this would
be consistent ;-)
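
A small sketch of the b*c/d case. The function names are made up; the second function emulates what would happen if the intermediate product were kept in a char, and its result relies on the usual wrap-around conversion that GCC implements.

```cpp
#include <cassert>

// With promotion: b * c is computed in int, so 100 * 100 = 10000 is
// representable and 10000 / 125 = 80 fits back into a char.
inline char scale_promoted(char b, char c, char d) {
    return static_cast<char>(b * c / d);
}

// Emulated "no promotion": the product is squeezed into a char first,
// which loses the high bits before the division happens.
inline char scale_truncated(char b, char c, char d) {
    char product = static_cast<char>(b * c);
    return static_cast<char>(product / d);
}
```

With b = 100, c = 100, d = 125 the promoted version gives 80, while the truncated one does not. For int operands there is no wider type to promote to, which is the "doesn't work for ints" point.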





[Bug c++/44811] New: non controlable bogus warning: right/left shift count is negative

2010-07-04 Thread gpiez at web dot de
#include <stdint.h>

template<int N>
uint64_t shift(uint64_t b) {
if (N > 0)
return b << N;
else
return b >> -N;
}

int main() {
int a = shift<-5>(0x100);
int b = shift<5>(0x100);
return a+b;
}

---
I am using this function template in a header, and other warnings and even
errors tend to get cluttered by the output of bogus "shift count is negative"
warnings. I understand that dead code elimination happens only in the optimizer
and probably too late to see that the negative shift count branch is never
executed, and I could live with that if the warning was controllable with some
"-W" option. Alas, it seems, it is not.
The only way to suppress this warning is using "-w", which also inhibits other,
potentially useful warnings, so using "-w" everywhere is not really an option.
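
One possible source-level workaround, as a sketch only: clamp both shift counts so that neither branch ever contains a negative constant, whatever N is. The name shift_by is made up, and it has not been checked here which GCC versions stay quiet with this form.

```cpp
#include <cassert>
#include <cstdint>

template<int N>
std::uint64_t shift_by(std::uint64_t b) {
    // Each count is a non-negative constant expression for every N,
    // so the unused branch no longer shifts by a negative amount.
    return N > 0 ? b << (N > 0 ? N : 0)
                 : b >> (N < 0 ? -N : 0);
}
```

Usage is the same as before, for example shift_by<-5>(0x100) and shift_by<5>(0x100).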


-- 
   Summary: non controlable bogus warning: right/left shift count is
negative
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: gpiez at web dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44811