https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117001

            Bug ID: 117001
           Summary: O3 auto tree loop vectorization produces incorrect
                    output on armv8.2-a+sve
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Robert.Hardwick at arm dot com
  Target Milestone: ---

We have seen incorrect results when -O3 is enabled on Arm Neoverse V1
(armv8.2-a+sve). I have reduced the problem to a small reproducer and found
that adding -fno-tree-loop-vectorize to the gcc options produces the correct
output.

It seems to happen when a C-style array is contained within a std::array
structure and auto loop vectorization is enabled.

This has been observed on 10.2.1 and 11.4.1

Reproducible example

  #include <array>
  #include <cstdint>   // uint64_t
  #include <iostream>  // std::ostream, std::cout

  typedef std::array<uint64_t[4], 2> my_type;

  // helpful to print output to stdout
  std::ostream& operator<<(std::ostream& stream, const my_type& vec) {
    stream << "[";
    for ( int j = 0; j < 2; j++){
      for (int i = 0; i != 4; i++) {
        if (i != 0 || j != 0) {
          stream << ", ";
        }
        stream << vec[j][i];
      }
    }
    stream << "]";
    return stream;
  }

  int main() {
    my_type a = {{0, 0, 0, 1, 0, 0, 1, 0}};
    my_type b = {{1, 1, 1, 1, 1, 1, 1, 1}};
    my_type mask = {{0, 0, 0, 0, 0, 1, 0, 0}};

    my_type result = {{0, 0, 0, 0, 0, 0, 0, 0}};

    for (int i = 0; i < 2; i++) {
      for (int j = 0; j < 4; j++) {
        if ( mask[i][j] != 0 )
        {
          result[i][j] = b[i][j];
        } else {
          result[i][j] = a[i][j];
        }
      }
    }

    std::cout << result << std::endl;
  }


Observations

With -O3 -march=armv8.2-a+sve output is INCORRECT

[0, 0, 0, 1, 0, 0, 1, 0]

With -O3 -fno-tree-loop-vectorize -march=armv8.2-a+sve output is CORRECT

[0, 0, 0, 1, 0, 1, 1, 0]


The operation should be doing the equivalent of

result[i][j] = mask[i][j] ? b[i][j] : a[i][j]

So the 6th element ( at i=1, j=1 ) should be 1, not 0.
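
A self-checking variant may be easier to use when bisecting or testing other
compiler versions, since it does not rely on reading the printed output. The
following is a minimal sketch, not part of the original reproducer: the helper
names select_elements and matches_expected are hypothetical, and the expected
values are the ones listed above as correct.

```cpp
#include <array>
#include <cstdint>

using my_type = std::array<std::uint64_t[4], 2>;

// Hypothetical helper (not in the original reproducer): the same
// element-wise select as the loop in main(), wrapped in a function.
inline my_type select_elements(const my_type& mask, const my_type& a,
                               const my_type& b) {
  my_type result = {};
  for (int i = 0; i < 2; i++) {
    for (int j = 0; j < 4; j++) {
      result[i][j] = (mask[i][j] != 0) ? b[i][j] : a[i][j];
    }
  }
  return result;
}

// Hypothetical helper: returns true when `got` matches the output
// reported as correct for the inputs used in the reproducer.
inline bool matches_expected(const my_type& got) {
  const std::uint64_t expected[2][4] = {{0, 0, 0, 1}, {0, 1, 1, 0}};
  for (int i = 0; i < 2; i++) {
    for (int j = 0; j < 4; j++) {
      if (got[i][j] != expected[i][j]) {
        return false;
      }
    }
  }
  return true;
}
```

Calling select_elements(mask, a, b) with the inputs from main() and returning
a non-zero exit status when matches_expected fails would turn the reproducer
into a pass/fail test. Whether the miscompile still triggers once the loop is
moved into a separate function is an assumption that would need checking.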
