https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

--- Comment #10 from luoxhu at gcc dot gnu.org ---
During expand, rs6000_emit_le_vsx_move makes Power8 emit two permute
instructions per little-endian vector move to swap the contents.

P9:
    5: NOTE_INSN_BASIC_BLOCK 2
    2: r129:TF=%1:TF
    3: r130:TF=%3:TF
    4: NOTE_INSN_FUNCTION_BEG
    7: r117:DF=unspec[r129:TF,0] 70
    8: r131:V2DF=r121:V2DF
    9: r133:DF=vec_select(r131:V2DF,parallel)
   10: r131:V2DF=vec_concat(r117:DF,r133:DF)
   11: r122:V2DF=r131:V2DF
   12: r118:DF=unspec[r129:TF,0x1] 70
   13: r119:DF=unspec[r130:TF,0] 70
   14: r134:V2DF=r124:V2DF
   15: r136:DF=vec_select(r134:V2DF,parallel)
   16: r134:V2DF=vec_concat(r119:DF,r136:DF)
   17: r125:V2DF=r134:V2DF
   18: r120:DF=unspec[r130:TF,0x1] 70
   19: r137:V2DF=r122:V2DF
   20: r139:DF=vec_select(r137:V2DF,parallel)
   21: r137:V2DF=vec_concat(r139:DF,r118:DF)
   22: [r112:DI]=r137:V2DF
   23: r140:V2DF=r125:V2DF
   24: r142:DF=vec_select(r140:V2DF,parallel)
   25: r140:V2DF=vec_concat(r142:DF,r120:DF)
   26: [r112:DI+0x10]=r140:V2DF
   27: r143:V4SI=[r112:DI]
   28: r144:V4SI=[r112:DI+0x10]
   29: r127:V4SI=r143:V4SI
   30: r128:V4SI=r144:V4SI
   34: %2:V4SI=r127:V4SI
   35: %3:V4SI=r128:V4SI
   36: use %2:V4SI
   37: use %3:V4SI

P8:
    5: NOTE_INSN_BASIC_BLOCK 2
    2: r129:TF=%1:TF
    3: r130:TF=%3:TF
    4: NOTE_INSN_FUNCTION_BEG
    7: r117:DF=unspec[r129:TF,0] 70
    8: r131:V2DF=r121:V2DF
    9: r133:DF=vec_select(r131:V2DF,parallel)
   10: r131:V2DF=vec_concat(r117:DF,r133:DF)
   11: r122:V2DF=r131:V2DF
   12: r118:DF=unspec[r129:TF,0x1] 70
   13: r119:DF=unspec[r130:TF,0] 70
   14: r134:V2DF=r124:V2DF
   15: r136:DF=vec_select(r134:V2DF,parallel)
   16: r134:V2DF=vec_concat(r119:DF,r136:DF)
   17: r125:V2DF=r134:V2DF
   18: r120:DF=unspec[r130:TF,0x1] 70
   19: r137:V2DF=r122:V2DF
   20: r139:DF=vec_select(r137:V2DF,parallel)
   21: r137:V2DF=vec_concat(r139:DF,r118:DF)
   22: r140:V2DF=vec_select(r137:V2DF,parallel)
   23: [r112:DI]=vec_select(r140:V2DF,parallel)
   24: r141:V2DF=r125:V2DF
   25: r143:DF=vec_select(r141:V2DF,parallel)
   26: r141:V2DF=vec_concat(r143:DF,r120:DF)
   27: r144:V2DF=vec_select(r141:V2DF,parallel)
   28: [r112:DI+0x10]=vec_select(r144:V2DF,parallel)
   29: r146:V4SI=vec_select([r112:DI],parallel)
   30: r145:V4SI=vec_select(r146:V4SI,parallel)
   31: r148:V4SI=vec_select([r112:DI+0x10],parallel)
   32: r147:V4SI=vec_select(r148:V4SI,parallel)
   33: r127:V4SI=r145:V4SI
   34: r128:V4SI=r147:V4SI
   38: %2:V4SI=r127:V4SI
   39: %3:V4SI=r128:V4SI
   40: use %2:V4SI
   41: use %3:V4SI

The difference starts at insn 22. Power8 emits two vec_select operations for
each stack store and load (insns 22-23, 27-28, 29-30 and 31-32), whereas
Power9 needs only a single plain move (insns 22, 26, 27 and 28).
