Richard Sandiford <richard.sandif...@arm.com> writes: > luoxhu <luo...@linux.ibm.com> writes: >> Hi, >> >> On 2020/7/22 19:05, Richard Sandiford wrote: >>> This wasn't really what I meant. Using subregs is fine, but I was >>> thinking of: >>> >>> /* Also try a wider mode if the necessary punning is either not >>> desirable or not possible. */ >>> if (!CONSTANT_P (store_info->rhs) >>> && !targetm.modes_tieable_p (new_mode, store_mode)) >>> continue; >>> >>> if (multiple_p (shift, GET_MODE_BITSIZE (new_mode))) >>> { >>> /* Try to implement the shift using a subreg. */ >>> poly_int64 offset = subreg_offset_from_lsb (new_mode, store_mode, >>> shift); >>> rhs_subreg = simplify_gen_subreg (new_mode, store_info->rhs, >>> store_mode, offset); >>> if (rhs_subreg) >>> { >>> ... >>> break; >>> } >>> } >>> >>> where the rhs_subreg is from your original patch. >>> >>> The multiple_p should be that way round: the shift needs to be a >>> multiple of the new_mode for the subreg to be valid. >>> >>> I think this should also avoid the BITS_PER_WORD problem. On the >>> other hand, I agree BITS_PER_UNIT isn't a very sensible limit if >>> we're using subregs, so maybe moving it to after the multiple_p >>> if block would still make sense. >>> >> >> Thanks, I took that rhs_subreg part back for the v3 patch and updated a bit >> based on your prototype, shift should be put in op1 as multiple_p requires >> op0 >= op1. >> >> Then, new_mode is still TImode same to store_mode, offset will return 8 when >> shift is 64, simplify_gen_subreg needs an additional inner_mode(DImode) >> generated from "smallest_int_mode_for_size (shift)" to get rhs_subreg, >> otherwise it will return NULL if new_mode is equal to store_mode. >> >> Lastly, move the BITS_PER_UNIT after multiple_p as it still need generate >> shift_seq for other circumstances. :) > > I don't understand why my version doesn't work though. The point > is that we're using the containing: > > FOR_EACH_MODE_FROM (new_mode_iter, > smallest_int_mode_for_size (access_size * BITS_PER_UNIT)) > > to find a suitable mode. In the existing code it's searching for a mode > that is suitable for the shift. In the new code it's finding a mode that > is suitable for the outer mode of the subreg (hence using new_mode as the > first argument to simplify_gen_subreg above). It shouldn't be necessary > to use smallest_int_mode_for_size to find a different mode.
I now realise the reason is that the starting mode is too wide. I think we should fix that by doing: FOR_EACH_MODE_IN_CLASS (new_mode_iter, MODE_INT) { … and then add: if (maybe_lt (GET_MODE_SIZE (new_mode), access_size)) continue; after your optimisation, so that the shift code proper still only considers modes that are wide enough to hold the unshifted value. I don't think there are any efficiency concerns with that, since smallest_int_mode_for_size does its own similar iteration internally. Sorry for not picking up on that first time. Thanks, Richard