On 14/11/2019 12:43, Kwok Cheung Yeung wrote:
Hello
This patch fixes an issue seen in the following test cases on AMD GCN:
libgomp.oacc-fortran/gemm.f90
libgomp.oacc-fortran/gemm-2.f90
libgomp.c/for-5-test_ttdpfs_ds128_auto.c
libgomp.c/for-5-test_ttdpfs_ds128_guided32.c
libgomp.c/for-5-test_ttdpfs_ds128_runtime.c
libgomp.c/for-5-test_ttdpfs_ds128_static.c
libgomp.c/for-5-test_ttdpfs_ds128_static32.c
libgomp.c/for-6-test_tdpfs_ds128_auto.c
libgomp.c/for-6-test_tdpfs_ds128_guided32.c
libgomp.c/for-6-test_tdpfs_ds128_runtime.c
libgomp.c/for-6-test_tdpfs_ds128_static.c
libgomp.c/for-6-test_tdpfs_ds128_static32.c
libgomp.c-c++-common/for-5.c
libgomp.c-c++-common/for-6.c
The compiler is generating code like this:
v_cmp_gt_i32 vcc, s3, v1
v_writelane_b32 v3, vcc_lo, 0 ; Move VCC to v[3:4]
v_writelane_b32 v4, vcc_hi, 0
...
v_cmp_eq_f32 vcc, v20, v0 ; Clobber VCC
...
; Move old VCC_LO into s62 - this is okay as v3 has not
; been clobbered.
v_readlane_b32 s62, v3, 0
; Move old VCC_HI into s63 - this is _not_ okay as VCC_HI
; has been clobbered by the intervening v_cmp_eq_f32 instruction.
s_mov_b32 s63, vcc_hi
GCC is failing to notice that the v_cmp_eq_f32 instruction clobbers both
vcc_lo and vcc_hi. This is because gcn_hard_regno_nregs uses
REGNO_REG_CLASS to determine the register class of the queried hard reg,
but vcc_lo and vcc_hi are classified as GENERAL_REGS by
gcn_regno_reg_class. gcn_hard_regno_nregs therefore returns 1 for the
BImode store into vcc, so vcc_hi is regarded as untouched by the operation.
This is fixed by making gcn_regno_reg_class classify vcc_lo and vcc_hi
into VCC_CONDITIONAL_REG (REGNO_REG_CLASS is supposed to return the
smallest class anyway). I have also added a spill class for
VCC_CONDITIONAL_REG (into SGPR_REGS) to avoid expensive spills into memory.
Built for and tested on the AMD GCN target with no regressions.
Okay for trunk?
Kwok
2019-11-14 Kwok Cheung Yeung <k...@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_regno_reg_class): Return VCC_CONDITIONAL_REG
register class for VCC_LO and VCC_HI.
(gcn_spill_class): Use SGPR_REGS to spill registers in
VCC_CONDITIONAL_REG.
---
gcc/config/gcn/gcn.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 976843b..dd89c26 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -470,6 +470,9 @@ gcn_regno_reg_class (int regno)
{
case SCC_REG:
return SCC_CONDITIONAL_REG;
+ case VCC_LO_REG:
+ case VCC_HI_REG:
+ return VCC_CONDITIONAL_REG;
case VCCZ_REG:
return VCCZ_CONDITIONAL_REG;
case EXECZ_REG:
@@ -637,7 +640,8 @@ gcn_can_split_p (machine_mode, rtx op)
static reg_class_t
gcn_spill_class (reg_class_t c, machine_mode /*mode */ )
{
- if (reg_classes_intersect_p (ALL_CONDITIONAL_REGS, c))
+ if (reg_classes_intersect_p (ALL_CONDITIONAL_REGS, c)
+ || c == VCC_CONDITIONAL_REG)
return SGPR_REGS;
else
return NO_REGS;
OK.
Andrew