https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96133

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  The i == 1 lane is different.  We're using standard interleaving
vectorization here; the innermost two loops are unrolled and rgb_cam is elided.

Note that we eventually optimize the whole loop at compile time to

  <bb 2> [local count: 89478486]:
  MEM <vector(2) double> [(double *)&xyz_cam] = {
2.97789709999999985257090884260833263397216796875e+0,
3.94211709999999992959374139900319278240203857421875e+0 };
  MEM <vector(2) double> [(double *)&xyz_cam + 16B] = {
4.9063371000000000066165739553980529308319091796875e+0,
3.291832700000000055950977184693329036235809326171875e+0 };
  MEM <vector(2) double> [(double *)&xyz_cam + 32B] = {
4.06932820000000017301999832852743566036224365234375e+0,
4.8468236999999998459998096222989261150360107421875e+0 };
  MEM <vector(2) double> [(double *)&xyz_cam + 48B] = {
5.40156330000000028945805752300657331943511962890625e+0,
6.2267732999999996224005371914245188236236572265625e+0 };
  xyz_cam[2][2] = 7.051983299999999843521436559967696666717529296875e+0;
