https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114435

--- Comment #2 from jchrist at linux dot ibm.com ---
I tried this, but it seems that pcom (predictive commoning) does not handle
vector accesses at all. In the GIMPLE input I have

  vectp.5_32 = r_26(D);
  # VUSE <.MEM_52>
  vect__51.6_1 = MEM <vector(2) doubleD.32> [(doubleD.32 *)vectp.5_32];
  # PT = nonlocal null 
  # ALIGN = 8, MISALIGN = 0
  vectp.5_2 = vectp.5_32 + 16;
  # VUSE <.MEM_52>
  vect__51.7_3 = MEM <vector(2) doubleD.32> [(doubleD.32 *)vectp.5_2];
[...]
  vectp.15_12 = r_26(D);
  # .MEM_13 = VDEF <.MEM_52>
  MEM <vector(2) doubleD.32> [(doubleD.32 *)vectp.15_12] = vect__45.13_11;
  # PT = nonlocal null 
  # ALIGN = 8, MISALIGN = 0
  vectp.15_14 = vectp.15_12 + 16;
  # .MEM_15 = VDEF <.MEM_13>
  MEM <vector(2) doubleD.32> [(doubleD.32 *)vectp.15_14] = vect__45.13_29;

But the analyzed data dependencies are:

(Data Dep: 
#(Data Ref: 
#  bb: 9 
#  stmt: vect__51.6_1 = MEM <vector(2) double> [(double *)vectp.5_32];
#  ref: MEM <vector(2) double> [(double *)vectp.5_32];
#  base_object: MEM <vector(2) double> [(double *)vectp.5_32];
#)
#(Data Ref: 
#  bb: 9 
#  stmt: MEM <vector(2) double> [(double *)vectp.15_12] = vect__45.13_11;
#  ref: MEM <vector(2) double> [(double *)vectp.15_12];
#  base_object: MEM <vector(2) double> [(double *)vectp.15_12];
#)
    (don't know)
)

(Data Dep: 
#(Data Ref: 
#  bb: 9 
#  stmt: vect__51.7_3 = MEM <vector(2) double> [(double *)vectp.5_2];
#  ref: MEM <vector(2) double> [(double *)vectp.5_2];
#  base_object: MEM <vector(2) double> [(double *)vectp.5_2];
#)
#(Data Ref: 
#  bb: 9 
#  stmt: MEM <vector(2) double> [(double *)vectp.15_14] = vect__45.13_29;
#  ref: MEM <vector(2) double> [(double *)vectp.15_14];
#  base_object: MEM <vector(2) double> [(double *)vectp.15_14];
#)
    (don't know)
)
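Going by the GIMPLE above, both refs in each pair are based on copies of
r_26(D): vectp.5_32 and vectp.15_12 are both r_26(D), and vectp.5_2 and
vectp.15_14 are both that pointer + 16. Each pair therefore loads and stores
the same 16 bytes (one vector(2) double). The toy model below is purely
hypothetical and is not GCC's tree-data-ref logic. It shows that a test that
only compares the SSA base names has to answer "don't know" for such a pair,
while the same test applied after looking through the copies to r_26(D) can
classify it precisely:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reference model: symbolic base name, constant byte offset,
 * access size in bytes.  Not GCC's struct data_reference. */
struct mem_ref {
    const char *base;   /* e.g. "vectp.5_32", or "r_26(D)" after looking through copies */
    long offset;        /* constant byte offset from base */
    long size;          /* bytes accessed; vector(2) double is 16 */
};

enum dep_kind { DEP_DONT_KNOW, DEP_INDEPENDENT, DEP_SAME_ADDRESS, DEP_OVERLAP };

static enum dep_kind classify_dep(const struct mem_ref *a, const struct mem_ref *b)
{
    assert(a->size > 0 && b->size > 0);
    if (strcmp(a->base, b->base) != 0)
        return DEP_DONT_KNOW;        /* different bases: offsets are not comparable */
    if (a->offset == b->offset && a->size == b->size)
        return DEP_SAME_ADDRESS;     /* e.g. the load and store of one accumulator */
    if (a->offset + a->size <= b->offset || b->offset + b->size <= a->offset)
        return DEP_INDEPENDENT;      /* disjoint byte ranges */
    return DEP_OVERLAP;              /* partial overlap */
}
```

With the raw names, {"vectp.5_32", 0, 16} versus {"vectp.15_12", 0, 16} gives
DEP_DONT_KNOW. With both resolved to {"r_26(D)", 0, 16}, the result is
DEP_SAME_ADDRESS.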

Is this expected? I think this is why the generated code is still not optimal:
on s390x, every loop iteration still loads the two accumulation vectors, uses
them only for an fma, and then stores them again. If I understand predictive
commoning correctly, this is a case it should handle, improving the code by
loading the accumulators once before the loop and storing them once after it.
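The loop's source isn't shown here, so the following is only a hypothetical C
sketch of the transformation I would expect. It assumes four double
accumulators at r[0..3], matching the two vector(2) double accesses at r and
r + 16. The function names and the loop body are made up for illustration.
The register version is only equivalent when r does not overlap x or y. The
sketch expresses that assumption with restrict:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: the accumulators live in memory, so every iteration
 * loads r[j], does a multiply-add, and stores r[j] back. */
void accumulate_in_memory(double *r, const double *x, const double *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < 4; j++)
            r[j] += x[i] * y[4 * i + j];
}

/* Same computation with the accumulators kept in locals: r is loaded once
 * before the loop and stored once after it.  Valid only if r does not alias
 * x or y (assumed here via restrict). */
void accumulate_in_registers(double *restrict r, const double *restrict x,
                             const double *restrict y, size_t n)
{
    if (n == 0)
        return;                 /* the original loop never touches r in this case */
    assert(r != NULL);
    double acc0 = r[0], acc1 = r[1], acc2 = r[2], acc3 = r[3];
    for (size_t i = 0; i < n; i++) {
        acc0 += x[i] * y[4 * i + 0];
        acc1 += x[i] * y[4 * i + 1];
        acc2 += x[i] * y[4 * i + 2];
        acc3 += x[i] * y[4 * i + 3];
    }
    r[0] = acc0;                /* single store per accumulator after the loop */
    r[1] = acc1;
    r[2] = acc2;
    r[3] = acc3;
}
```

The n == 0 guard matters because the original loop does not touch r when it
runs zero times. Moving the loads and stores out of the loop must not
introduce accesses that were not there before.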
