https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117001
Bug ID: 117001
Summary: O3 auto tree loop vectorization produces incorrect output on armv8.2-a+sve
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: Robert.Hardwick at arm dot com
Target Milestone: ---

We have seen incorrect numbers being produced when -O3 is enabled on Arm Neoverse V1 (armv8.2-a+sve). I have reduced the problem to a small reproducer and found that adding -fno-tree-loop-vectorize to the GCC options produces the correct output.

It seems to happen when a C-style array is contained within a std::array structure and auto loop vectorization is enabled. This has been observed with GCC 10.2.1 and 11.4.1.

Reproducible example

#include <array>
#include <cstdint>   // uint64_t
#include <iostream>  // std::ostream, std::cout, std::endl

typedef std::array<uint64_t[4], 2> my_type;

// helpful to print output to stdout: prints all 8 elements as one flat list
std::ostream& operator<<(std::ostream& stream, const my_type& vec) {
    stream << "[";
    for (int j = 0; j < 2; j++) {
        for (int i = 0; i != 4; i++) {
            if (i != 0 || j != 0) {
                stream << ", ";
            }
            stream << vec[j][i];
        }
    }
    stream << "]";
    return stream;
}

int main() {
    my_type a = {{0, 0, 0, 1, 0, 0, 1, 0}};
    my_type b = {{1, 1, 1, 1, 1, 1, 1, 1}};
    my_type mask = {{0, 0, 0, 0, 0, 1, 0, 0}};
    my_type result = {{0, 0, 0, 0, 0, 0, 0, 0}};

    // Element-wise select: take b where mask is non-zero, otherwise a.
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 4; j++) {
            if (mask[i][j] != 0) {
                result[i][j] = b[i][j];
            } else {
                result[i][j] = a[i][j];
            }
        }
    }
    std::cout << result << std::endl;
}

Observations

With -O3 -march=armv8.2-a+sve the output is INCORRECT:
[0, 0, 0, 1, 0, 0, 1, 0]

With -O3 -fno-tree-loop-vectorize -march=armv8.2-a+sve the output is CORRECT:
[0, 0, 0, 1, 0, 1, 1, 0]

The operation should be the element-wise equivalent of

result[i] = mask[i] ? b[i] : a[i]

The only non-zero mask element is the 6th one (at i=1, j=1), so that element should be taken from b and be 1, not 0.