I have encountered a strange difference in behavior with gcc optimizations. I am
testing this with gcc-4.5.0 on an x86_64 target. The following code defines two
template functions, a "find_first" and a "find_last" for a bitset. Each returns
the size of the bitset if no bits are set; otherwise it returns the 0-based
index of the first/last set bit. There is a generic version that loops, and a
specialized version which uses builtins to avoid loops:
#include <bitset>
#include <iostream>
#include <string>
namespace util {

template <std::size_t N>
int find_last(const std::bitset<N> &bs) {
    if(bs.none()) {
        return N;
    }

    int i = bs.size();
    while(i >= 0) {
        if(bs[i]) {
            break;
        }
        --i;
    }
    return i;
}

template <std::size_t N>
int find_first(const std::bitset<N> &bs) {
    if(bs.none()) {
        return N;
    }

    int i;
    for(i = 0; i < bs.size(); ++i) {
        if(bs[i]) {
            break;
        }
    }
    return i;
}

// optimized versions for 32-bit bitsets
template <>
int find_first(const std::bitset<32> &bs) {
    const int x = __builtin_ffsl(bs.to_ulong());
    return x ? x - 1 : 32;
}

template <>
int find_last(const std::bitset<32> &bs) {
    const int x = __builtin_clz(bs.to_ulong());
    return (x != 31) ? 32 - x - 1 : 32;
}

}
int main() {
    std::bitset<32> bs(0);
    std::cout << util::find_last(bs) << std::endl;
    std::cout << util::find_first(bs) << std::endl;
    std::cout << bs.to_string<char, std::char_traits<char>,
                              std::allocator<char> >() << std::endl;
}
I receive the following output from this command:
$ g++ -W -Wall -ansi -pedantic -O0 test.cpp -o test && ./test
32
32
00000000000000000000000000000000
but if I enable optimizations, I get this instead:
$ g++ -W -Wall -ansi -pedantic -O1 test.cpp -o test && ./test
-1217882104
32
00000000000000000000000000000000
Here's where it gets interesting: if I put some code in the specialized
version of find_last, it goes back to behaving as expected:
template <>
int find_last(const std::bitset<32> &bs) {
    const int x = __builtin_clz(bs.to_ulong());
    std::cout << "\n";
    return (x != 31) ? 32 - x - 1 : 32;
}
$ g++ -W -Wall -ansi -pedantic -O1 test.cpp -o test && ./test
32
32
00000000000000000000000000000000
I am having a hard time figuring out which optimization flag is causing this.
Inspecting with -Q shows me that the following flags are enabled by -O1
(-fcprop-registers -fdefer-pop -fforward-propagate -fguess-branch-probability
-fif-conversion -fif-conversion2 -fipa-pure-const -fipa-reference
-fmerge-constants -fomit-frame-pointer -fsplit-wide-types -ftree-ccp -ftree-ch
-ftree-copy-prop -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse
-ftree-fre -ftree-sink -ftree-sra -ftree-ter), yet manually enabling all of
these does not reproduce the behavior.
--
Summary: __builtin_clz is behaving differently with different
optimization levels
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: eteran at alum dot rit dot edu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44330