As I mentioned in the intro, when we optimize the extract of a variable element from a vector in memory, the current code takes a regular address and the temporary that holds the byte offset, and tries to generate a new address. In particular, it failed when the vector was at a PC-relative address: there were not enough temporary registers, so the temporary meant to hold the byte offset was also used to hold the address.
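
As a rough illustration of the address arithmetic involved, here is a small C model of a variable V2DF extract from memory. It is only a sketch and not what the compiler emits. The function names are hypothetical, and wrapping the index with a power-of-two mask is an assumption of the sketch. The point is that once the vector's address sits in a single register, the only other value needed is the byte offset, and the load address is simply register + register.

```c
#include <stddef.h>
#include <string.h>

/* Byte offset of element `n` in a vector of `nelts` elements, each
   `elt_size` bytes wide.  `nelts` is assumed to be a power of two, so the
   index can be wrapped with a mask (an assumption of this sketch).  */
static size_t
element_byte_offset (size_t n, size_t nelts, size_t elt_size)
{
  return (n & (nelts - 1)) * elt_size;
}

/* Model of a variable extract from a V2DF vector in memory.  `base` plays
   the role of the address that is already in a register, and `offset`
   plays the role of the temporary holding the byte offset.  */
static double
extract_v2df_var (const double *vec, size_t n)
{
  const unsigned char *base = (const unsigned char *) vec;
  size_t offset = element_byte_offset (n, 2, sizeof (double));
  double result;

  /* base + offset is the register + register (X-FORM style) address.  */
  memcpy (&result, base + offset, sizeof (result));
  return result;
}
```

In this model the two values never compete for the same temporary, which is the situation the PC-relative case ran into.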

Initially in doing these patches, I reworked the constraints for prefixed and non-prefixed memory so we could identify when we needed a second temporary. Then I realized that eventually we will want to generate an X-FORM (register + register) address, and it was just simpler to use the 'Q' constraint and have the register allocator put the address into a register.

I have verified that the bug is indeed fixed (patch #15 will include the new tests for this). I have also bootstrapped the compiler on a little endian power8 machine and there were no regressions in the test suite. Can I check this patch into the trunk?

2019-12-20  Michael Meissner  <meiss...@linux.ibm.com>

	* config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator):
	Use 'Q' for memory constraints because we need to do an X-FORM
	load with the variable index.
	(vsx_extract_v4sf_var): Use 'Q' for memory constraints because we
	need to do an X-FORM load with the variable index.
	(vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Use 'Q' for
	memory constraints because we need to do an X-FORM load with the
	variable index.
	(vsx_extract_<mode>_<VS_scalar>mode_var): Use 'Q' for memory
	constraints because we need to do an X-FORM load with the variable
	index.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 279597)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3245,10 +3245,11 @@ (define_insn "vsx_vslo_<mode>"
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-;; Variable V2DI/V2DF extract
+;; Variable V2DI/V2DF extract.  Use 'Q' for the memory because we will
+;; ultimately have to convert the address into base + index.
 (define_insn_and_split "vsx_extract_<mode>_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
-        (unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m")
+        (unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
                              (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
                             UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
@@ -3318,7 +3319,7 @@ (define_insn_and_split "*vsx_extract_v4s
 ;; Variable V4SF extract
 (define_insn_and_split "vsx_extract_v4sf_var"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-        (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m")
+        (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,Q,Q")
                     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
                    UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
@@ -3681,7 +3682,7 @@ (define_insn_and_split "*vsx_extract_<mo
 (define_insn_and_split "vsx_extract_<mode>_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
         (unspec:<VS_scalar>
-         [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+         [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
           (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
          UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
@@ -3701,7 +3702,7 @@ (define_insn_and_split "*vsx_extract_<mo
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
         (zero_extend:<VS_scalar>
          (unspec:<VSX_EXTRACT_I:VS_scalar>
-          [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+          [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
            (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
           UNSPEC_VSX_EXTRACT)))
    (clobber (match_scratch:DI 3 "=r,r,&b"))

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797