Sorry for the slow reply. luoxhu <luo...@linux.ibm.com> writes: > Hi Richard, > > This is the updated version that could pass all regression test on > Power9-LE. > Just need another "maybe_lt (GET_MODE_SIZE (new_mode), access_size)" > before generating shift for store_info->const_rhs to ensure correct > constant is generated, take testsuite/gfortran1/equiv_2.x for example: > > equiv_2.x-equiv_2.f90.264r.cse2: > 5: r119:DI=0x6867666564636261 > 6: [sfp:DI+0x20]=r119:DI > 18: r123:SI=0x65646362 > 19: r124:SI=[sfp:DI+0x21] > 20: r125:CC=cmp(r124:SI,r123:SI) > > Bad code due to get SImode subreg0 and shift right 8 bits got 0x646362 in > insn #32, equiv_2.x-equiv_2.f90.265r.dse1: > 5: r119:DI=0x6867666564636261 > 32: r126:SI=0x646362 > 6: [sfp:DI+0x20]=r119:DI > 18: r123:SI=0x65646362 > 19: r124:SI=r126:SI > 20: r125:CC=cmp(r124:SI,r123:SI) > > Good code, get 0x65646362 in insn #32, equiv_2.x-equiv_2.f90.265r.dse1: > 5: r119:DI=0x6867666564636261 > 32: r126:SI=0x65646362 > 6: [sfp:DI+0x20]=r119:DI > 18: r123:SI=0x65646362 > 19: r124:SI=r126:SI > 20: r125:CC=cmp(r124:SI,r123:SI) > > So the new_mode size should also be GE than access_size for const_rhs shift. > It seems a bit more complicated now...
Yeah. The structure of this code makes it hard to make changes to it. :-/ Since the const_rhs code is only trying to fold the operation at compile time, without regard for what the target supports at runtime, I think it *does* make sense to use smallest_int_mode_for_size there. I.e. we should be able to hoist the code out of the loop and do: if (store_info->const_rhs) { auto new_mode = smallest_int_mode_for_size (access_size * BITS_PER_UNIT); auto byte = subreg_lowpart_offset (new_mode, store_mode); rtx ret = simplify_subreg (new_mode, store_info->const_rhs, store_mode, byte); if (ret && CONSTANT_P (ret)) … } if (require_cst) return NULL_RTX; …the current loop… I don't see any reason to restrict the const_rhs case to BITS_PER_WORD. And if the simplify_subreg above fails for new_mode, I can't think it would succeed for a wider mode (which would have to provide *more* constant data than the failed subreg). > @@ -1728,8 +1728,7 @@ find_shift_sequence (poly_int64 access_size, > the machine. */ > > opt_scalar_int_mode new_mode_iter; > - FOR_EACH_MODE_FROM (new_mode_iter, > - smallest_int_mode_for_size (access_size * BITS_PER_UNIT)) > + FOR_EACH_MODE_IN_CLASS (new_mode_iter, MODE_INT) Sorry, I now realise it would be better to start at: smallest_int_mode_for_size (GET_MODE_BITSIZE (read_mode)) Thanks, Richard