https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201
--- Comment #24 from Joel Yliluoma <bisqwit at iki dot fi> ---
The simple horizontal 8-bit add seems to work nicely. Very nice work.
However, the original bug report, namely that the code snippet quoted below no
longer receives love from the SIMD optimization unless you explicitly say
“#pragma omp simd”, still seems unaddressed.
#define num_words 2
typedef unsigned long long E;
E bytes[num_words];
unsigned char sum()
{
    E b[num_words] = {};
    //#pragma omp simd
    for(unsigned n=0; n<num_words; ++n)
    {
        // Calculate the sum of all bytes in a word
        E temp = bytes[n];
        temp += (temp >> 32);
        temp += (temp >> 16);
        temp += (temp >> 8);
        // Save that number in an array
        b[n] = temp;
    }
    // Calculate sum of those sums
    unsigned char result = 0;
    //#pragma omp simd
    for(unsigned n=0; n<num_words; ++n) result += b[n];
    return result;
}
Compiler Explorer link: https://godbolt.org/z/XL3cIK
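
For comparison, here is a minimal sketch of the same snippet with the pragmas enabled, next to a pragma-free copy. The names sum_with_pragma, sum_plain and check_sums_agree are hypothetical and only serve this illustration. The reduction(+:result) clause is my own assumption and is not part of the snippet above; I added it so that the accumulation into result is declared to OpenMP as a reduction. In GCC the pragmas only take effect with -fopenmp-simd (or -fopenmp); without those flags they are ignored and both functions are plain scalar code.

```c
#include <assert.h>

#define num_words 2
typedef unsigned long long E;
E bytes[num_words];

/* Hypothetical name: the snippet from this report with the pragmas enabled. */
unsigned char sum_with_pragma(void)
{
    E b[num_words] = {0};
    #pragma omp simd
    for (unsigned n = 0; n < num_words; ++n)
    {
        /* Same shift-and-add folding as in the original snippet */
        E temp = bytes[n];
        temp += (temp >> 32);
        temp += (temp >> 16);
        temp += (temp >> 8);
        b[n] = temp;
    }
    unsigned char result = 0;
    /* Assumption: reduction clause added here, not present in the original */
    #pragma omp simd reduction(+:result)
    for (unsigned n = 0; n < num_words; ++n) result += b[n];
    return result;
}

/* Hypothetical name: identical arithmetic, no pragmas at all. */
unsigned char sum_plain(void)
{
    E b[num_words] = {0};
    for (unsigned n = 0; n < num_words; ++n)
    {
        E temp = bytes[n];
        temp += (temp >> 32);
        temp += (temp >> 16);
        temp += (temp >> 8);
        b[n] = temp;
    }
    unsigned char result = 0;
    for (unsigned n = 0; n < num_words; ++n) result += b[n];
    return result;
}

/* Hypothetical helper: both variants must return the same value. */
void check_sums_agree(void)
{
    bytes[0] = 0x0101010101010101ULL;
    bytes[1] = 0x0101010101010101ULL;
    assert(sum_with_pragma() == sum_plain());
}
```

Compiling this with -O3 -fopenmp-simd and comparing the assembly of the two functions is one way to see whether sum_plain is still left unvectorized while sum_with_pragma is not.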