> On 16 Sep 2024, at 16:32, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> "Pengxuan Zheng (QUIC)" <quic_pzh...@quicinc.com> writes:
>>> On Thu, Sep 12, 2024 at 2:53 AM Pengxuan Zheng
>>> <quic_pzh...@quicinc.com> wrote:
>>>> 
>>>> SVE's INDEX instruction can be used to populate vectors with values
>>>> starting from "base" and incremented by "step" for each subsequent
>>>> value. We can take advantage of it to generate vector constants if
>>>> TARGET_SVE is available and the base and step values are within [-16, 15].
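As a rough illustration of the condition described above, the sketch below tests whether a constant vector is a linear series `base + i * step` whose base and step both fit INDEX's immediate range [-16, 15]. The helper name `single_index_p` is hypothetical. It is not GCC's implementation (the patch relies on `const_vec_series_p`), and for simplicity it treats elements as exact integers, ignoring wrap-around at the element width.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code.  Returns true if v[0..n-1] equals
   base + i * step with base and step both in [-16, 15], i.e. the vector
   could be built by a single "index zd.<T>, #base, #step".
   Elements are treated as exact integers (no element-width wrap).  */
static bool
single_index_p (const int64_t *v, size_t n, int64_t *base, int64_t *step)
{
  /* Need at least two elements to determine a step.  */
  if (n < 2)
    return false;

  int64_t b = v[0];
  int64_t s = v[1] - v[0];

  /* Every element must lie on the same arithmetic series.  */
  for (size_t i = 2; i < n; i++)
    if (v[i] != b + (int64_t) i * s)
      return false;

  /* Both values must fit INDEX's 5-bit signed immediate fields.  */
  if (b < -16 || b > 15 || s < -16 || s > 15)
    return false;

  *base = b;
  *step = s;
  return true;
}
```

With this check, `{ 0, 1, 2, 3 }` qualifies, while `{ 16, 17, 18, 19 }` does not, because its base is out of range.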
>>> 
>>> Are there multiplication by or addition of scalar immediate instructions to
>>> enhance this with two-instruction sequences?
>> 
>> No, Richard, I can't think of any equivalent two-instruction sequences.
> 
> There are some.  E.g.:
> 
>     { 16, 17, 18, 19, ... }
> 
> could be:
> 
>        index   z0.b, #0, #1
>        add     z0.b, z0.b, #16
> 
> or, alternatively:
> 
>        mov     w0, #16
>        index   z0.b, w0, #1
> 
> But these cases are less obviously a win, so I think it's ok to handle
> single instructions only for now.
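To check that the two sequences above really build the same constant, the sketch below models them in C for byte elements. It assumes a 128-bit vector length (`VL_BYTES = 16`), although real SVE vectors may be longer. It also assumes modulo-256 wrap-around for `.b` lanes and an unsigned 8-bit ADD immediate. The function names are hypothetical and only mirror the assembly.

```c
#include <stddef.h>
#include <stdint.h>

#define VL_BYTES 16  /* assumption: 128-bit SVE vector length */

/* Model of "index zd.b, base, step": lane i = base + i * step (mod 256).
   The base may come from an immediate or a W register; the value is
   the same either way.  */
static void
index_b (uint8_t zd[VL_BYTES], int base, int step)
{
  for (size_t i = 0; i < VL_BYTES; i++)
    zd[i] = (uint8_t) (base + (int) i * step);
}

/* Model of "add zd.b, zd.b, #imm" with an unsigned 8-bit immediate.  */
static void
add_imm_b (uint8_t zd[VL_BYTES], unsigned imm)
{
  for (size_t i = 0; i < VL_BYTES; i++)
    zd[i] = (uint8_t) (zd[i] + imm);
}

/* Sequence 1: index z0.b, #0, #1 ; add z0.b, z0.b, #16  */
static void
seq_index_add (uint8_t z0[VL_BYTES])
{
  index_b (z0, 0, 1);
  add_imm_b (z0, 16);
}

/* Sequence 2: mov w0, #16 ; index z0.b, w0, #1  */
static void
seq_mov_index (uint8_t z0[VL_BYTES])
{
  int w0 = 16;
  index_b (z0, w0, 1);
}
```

Both functions fill `z0` with `{ 16, 17, 18, ... }`. Which sequence is better therefore depends only on cost, not on correctness.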

(Not related to this patch, this work is great, thanks Pengxuan!)
Looking at some Software Optimization Guides (SWOGs), such as the one for Neoverse V2, the first sequence looks preferable.
On that core the immediates-only form of INDEX has latency 4 and throughput 2, and the SVE ADD is as cheap as SIMD operations get.
In the second sequence, however, the INDEX with a register operand has latency 7 and throughput 1, as the core seems to treat it as a GP <-> SIMD transfer of some sort.
In the future we may want to optimize the second sequence into the first (if it comes from intrinsics code, for example).
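The rewrite suggested above (turning `mov wN, #base; index z.b, wN, #step` into an immediates-only INDEX plus an ADD) could be planned roughly as in the sketch below. This is a hypothetical sketch, not existing GCC code. It covers only byte elements and assumes SVE ADD (immediate) takes an unsigned value in [0, 255] for `.b`. Per the latencies quoted above, whether the rewrite pays off is core-specific.

```c
#include <stdbool.h>

/* Hypothetical plan for materialising { base, base+step, ... } in byte
   elements as "index z.b, #index_base, #index_step", optionally followed
   by "add z.b, z.b, #add_imm".  Not GCC code.  */
typedef struct
{
  int index_base;    /* INDEX immediate base, in [-16, 15] */
  int index_step;    /* INDEX immediate step, in [-16, 15] */
  unsigned add_imm;  /* ADD immediate, in [0, 255]; 0 means no ADD */
} index_add_plan;

/* Try to replace "mov wN, #base; index z.b, wN, #step" with the
   immediates-only INDEX (plus an ADD if needed).  Byte lanes wrap
   modulo 256, so any base can be split as index_base + add_imm.  */
static bool
plan_index_add_b (int base, int step, index_add_plan *plan)
{
  /* The step must still fit INDEX's immediate field.  */
  if (step < -16 || step > 15)
    return false;

  /* Reduce the base to its byte value in [0, 255].  */
  int c = ((base % 256) + 256) % 256;

  plan->index_step = step;
  if (c <= 15 || c >= 240)
    {
      /* Signed byte value is in [-16, 15]: a single INDEX suffices.  */
      plan->index_base = c >= 240 ? c - 256 : c;
      plan->add_imm = 0;
    }
  else
    {
      /* Start the series at 0 and shift it with an unsigned ADD.  */
      plan->index_base = 0;
      plan->add_imm = (unsigned) c;
    }
  return true;
}
```

For the `{ 16, 17, 18, 19, ... }` example, `plan_index_add_b (16, 1, &plan)` yields `index z.b, #0, #1` followed by `add z.b, z.b, #16`, which is the first sequence.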
Thanks,
Kyrill


> 
> The patch is ok for trunk, thanks, but:
> 
>>>> @@ -22991,7 +22991,7 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
>>>>   if (CONST_VECTOR_P (op)
>>>>       && CONST_VECTOR_DUPLICATE_P (op))
>>>>     n_elts = CONST_VECTOR_NPATTERNS (op);
>>>> -  else if ((vec_flags & VEC_SVE_DATA)
>>>> +  else if (which == AARCH64_CHECK_MOV && TARGET_SVE
>>>>           && const_vec_series_p (op, &base, &step))
> 
> ...the convention is to have one && condition per line if the whole
> expression doesn't fit on a single line:
> 
>  else if (which == AARCH64_CHECK_MOV
>           && TARGET_SVE
>           && const_vec_series_p (op, &base, &step))
> 
> Richard
