https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493
--- Comment #10 from luoxhu at gcc dot gnu.org ---
In expand, Power8 emits two register permute instructions, via rs6000_emit_le_vsx_move, to byte-swap the contents.

P9:
    5: NOTE_INSN_BASIC_BLOCK 2
    2: r129:TF=%1:TF
    3: r130:TF=%3:TF
    4: NOTE_INSN_FUNCTION_BEG
    7: r117:DF=unspec[r129:TF,0] 70
    8: r131:V2DF=r121:V2DF
    9: r133:DF=vec_select(r131:V2DF,parallel)
   10: r131:V2DF=vec_concat(r117:DF,r133:DF)
   11: r122:V2DF=r131:V2DF
   12: r118:DF=unspec[r129:TF,0x1] 70
   13: r119:DF=unspec[r130:TF,0] 70
   14: r134:V2DF=r124:V2DF
   15: r136:DF=vec_select(r134:V2DF,parallel)
   16: r134:V2DF=vec_concat(r119:DF,r136:DF)
   17: r125:V2DF=r134:V2DF
   18: r120:DF=unspec[r130:TF,0x1] 70
   19: r137:V2DF=r122:V2DF
   20: r139:DF=vec_select(r137:V2DF,parallel)
   21: r137:V2DF=vec_concat(r139:DF,r118:DF)
   22: [r112:DI]=r137:V2DF
   23: r140:V2DF=r125:V2DF
   24: r142:DF=vec_select(r140:V2DF,parallel)
   25: r140:V2DF=vec_concat(r142:DF,r120:DF)
   26: [r112:DI+0x10]=r140:V2DF
   27: r143:V4SI=[r112:DI]
   28: r144:V4SI=[r112:DI+0x10]
   29: r127:V4SI=r143:V4SI
   30: r128:V4SI=r144:V4SI
   34: %2:V4SI=r127:V4SI
   35: %3:V4SI=r128:V4SI
   36: use %2:V4SI
   37: use %3:V4SI

P8:
    5: NOTE_INSN_BASIC_BLOCK 2
    2: r129:TF=%1:TF
    3: r130:TF=%3:TF
    4: NOTE_INSN_FUNCTION_BEG
    7: r117:DF=unspec[r129:TF,0] 70
    8: r131:V2DF=r121:V2DF
    9: r133:DF=vec_select(r131:V2DF,parallel)
   10: r131:V2DF=vec_concat(r117:DF,r133:DF)
   11: r122:V2DF=r131:V2DF
   12: r118:DF=unspec[r129:TF,0x1] 70
   13: r119:DF=unspec[r130:TF,0] 70
   14: r134:V2DF=r124:V2DF
   15: r136:DF=vec_select(r134:V2DF,parallel)
   16: r134:V2DF=vec_concat(r119:DF,r136:DF)
   17: r125:V2DF=r134:V2DF
   18: r120:DF=unspec[r130:TF,0x1] 70
   19: r137:V2DF=r122:V2DF
   20: r139:DF=vec_select(r137:V2DF,parallel)
   21: r137:V2DF=vec_concat(r139:DF,r118:DF)
   22: r140:V2DF=vec_select(r137:V2DF,parallel)
   23: [r112:DI]=vec_select(r140:V2DF,parallel)
   24: r141:V2DF=r125:V2DF
   25: r143:DF=vec_select(r141:V2DF,parallel)
   26: r141:V2DF=vec_concat(r143:DF,r120:DF)
   27: r144:V2DF=vec_select(r141:V2DF,parallel)
   28: [r112:DI+0x10]=vec_select(r144:V2DF,parallel)
   29: r146:V4SI=vec_select([r112:DI],parallel)
   30: r145:V4SI=vec_select(r146:V4SI,parallel)
   31: r148:V4SI=vec_select([r112:DI+0x10],parallel)
   32: r147:V4SI=vec_select(r148:V4SI,parallel)
   33: r127:V4SI=r145:V4SI
   34: r128:V4SI=r147:V4SI
   38: %2:V4SI=r127:V4SI
   39: %3:V4SI=r128:V4SI
   40: use %2:V4SI
   41: use %3:V4SI

The difference starts at insn #22. Power8 emits two vec_select instructions for each stack store and each stack load, whereas Power9 needs only a single plain move for each.
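To illustrate why the paired vec_selects in the P8 dump are candidates for removal, here is a minimal C model. It is only a sketch under an assumption: the dump elides the contents of each "parallel", so the model takes both permutes to be the usual doubleword swap. The names vec128, swap_doublewords, store_p8 and store_p9 are hypothetical and are not GCC identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit VSX register: two doublewords. */
typedef struct { uint64_t dw[2]; } vec128;

/* Models one vec_select, ASSUMING the elided parallel is [1 0],
   i.e. a swap of the two doublewords. */
static vec128 swap_doublewords(vec128 v)
{
    vec128 r = { { v.dw[1], v.dw[0] } };
    return r;
}

/* Models the P8 store (insns 22-23): a register permute followed by
   a store that permutes again. */
static void store_p8(vec128 *mem, vec128 v)
{
    vec128 tmp = swap_doublewords(v);
    *mem = swap_doublewords(tmp);
}

/* Models the P9 store (insn 22): a single plain move to memory. */
static void store_p9(vec128 *mem, vec128 v)
{
    *mem = v;
}

/* Returns 1 when both store sequences leave the same bytes in memory. */
static int stores_agree(vec128 v)
{
    vec128 a, b;
    store_p8(&a, v);
    store_p9(&b, v);
    return a.dw[0] == b.dw[0] && a.dw[1] == b.dw[1];
}

/* Small self-check: the swap is its own inverse, so the pair cancels. */
static void check_equivalence(void)
{
    vec128 v = { { 1u, 2u } };
    assert(stores_agree(v));
}
```

Under that assumption the two swaps cancel, so in this model the P8 sequence stores exactly what the single P9 move stores. The same reasoning applies to the paired vec_selects on the loads in insns 29-32.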