Richard Sandiford <richard.sandif...@arm.com> writes:
> luoxhu <luo...@linux.ibm.com> writes:
>> Hi,
>>
>> On 2020/7/22 19:05, Richard Sandiford wrote:
>>> This wasn't really what I meant.  Using subregs is fine, but I was
>>> thinking of:
>>> 
>>>        /* Also try a wider mode if the necessary punning is either not
>>>      desirable or not possible.  */
>>>        if (!CONSTANT_P (store_info->rhs)
>>>       && !targetm.modes_tieable_p (new_mode, store_mode))
>>>     continue;
>>> 
>>>        if (multiple_p (shift, GET_MODE_BITSIZE (new_mode)))
>>>     {
>>>       /* Try to implement the shift using a subreg.  */
>>>       poly_int64 offset = subreg_offset_from_lsb (new_mode, store_mode,
>>>                                                   shift);
>>>       rhs_subreg = simplify_gen_subreg (new_mode, store_info->rhs,
>>>                                         store_mode, offset);
>>>       if (rhs_subreg)
>>>         {
>>>           ...
>>>           break;
>>>         }
>>>     }
>>> 
>>> where the rhs_subreg is from your original patch.
>>> 
>>> The multiple_p should be that way round: the shift needs to be a
>>> multiple of the new_mode for the subreg to be valid.
>>> 
>>> I think this should also avoid the BITS_PER_WORD problem.  On the
>>> other hand, I agree BITS_PER_UNIT isn't a very sensible limit if
>>> we're using subregs, so maybe moving it to after the multiple_p
>>> if block would still make sense.
>>> 
>>
>> Thanks, I took that rhs_subreg part back for the v3 patch and updated a bit
>> based on your prototype, shift should be put in op1 as multiple_p requires
>> op0 >= op1. 
>>
>> Then, new_mode is still TImode same to store_mode, offset will return 8 when
>> shift is 64,  simplify_gen_subreg needs an additional inner_mode(DImode) 
>> generated from "smallest_int_mode_for_size (shift)" to get rhs_subreg, 
>> otherwise it will return NULL if new_mode is equal to store_mode.
>>
>> Lastly, move the BITS_PER_UNIT after multiple_p as it still need generate
>> shift_seq for other circumstances. :)
>
> I don't understand why my version doesn't work though.  The point
> is that we're using the containing:
>
>   FOR_EACH_MODE_FROM (new_mode_iter,
>                     smallest_int_mode_for_size (access_size * BITS_PER_UNIT))
>
> to find a suitable mode.  In the existing code it's searching for a mode
> that is suitable for the shift.  In the new code it's finding a mode that
> is suitable for the outer mode of the subreg (hence using new_mode as the
> first argument to simplify_gen_subreg above).  It shouldn't be necessary
> to use smallest_int_mode_for_size to find a different mode.

I now realise the reason is that the starting mode is too wide.
I think we should fix that by doing:

  FOR_EACH_MODE_IN_CLASS (new_mode_iter, MODE_INT)
    {
      …

and then add:

      if (maybe_lt (GET_MODE_SIZE (new_mode), access_size))
        continue;

after your optimisation, so that the shift code proper still only
considers modes that are wide enough to hold the unshifted value.

I don't think there are any efficiency concerns with that, since
smallest_int_mode_for_size does its own similar iteration internally.

Sorry for not picking up on that first time.

Thanks,
Richard

Reply via email to