> Hmm, I think it's the other way around and it is being shifted left.  In that 
> case we're only keeping the lower bits and the "overflow bits" don't matter.  
> That means we should indeed be OK here.

Yes, will add some comments here for series2.

> I guess the worst that can happen theoretically is i = 2 ^ 32 - 1,
> step = 2 ^ 32 - 1, and base = 2 ^ 32 - 1?  But i is bound by our "largest"
> mode so IMHO a 64-bit value should be sufficient.

> We always have at least two elements BTW, no need to check that.

> Please also adjust the comment above the code regarding overflow.

Sure thing, your are right, the last element is good enough here.

> Have you checked how the more generic version (last else branch) compares?  I 
> could imagine it's close or even better for our case.

If you mean the last branch of interleave, I think it is safe because it 
leverage the
merge to generate the result, instead of IOR. Only the IOR for final result have
this issue.

Pan

-----Original Message-----
From: Robin Dapp <rdapp....@gmail.com> 
Sent: Thursday, February 27, 2025 12:23 AM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin 
Dapp <rdapp....@gmail.com>
Subject: Re: [PATCH v2] RISC-V: Fix bug for expand_const_vector interleave 
[PR118931]

> Thanks Robin.
>
>> - IMHO we need to check both series for overflow, if step2 overflows in the
>> smaller type isn't the result equally wrong?
>
> The series2 will shift right before IOR, thus the overflow bits never effect
> on the final result.
> For example, the series2 will be similar as below after shift.
>
> v2.b = {0, 17, 0, 33, 0, 49, 0, 65, 0, 81, 0, 97, 0, 113, 0, 129}

Hmm, I think it's the other way around and it is being shifted left.  In that 
case we're only keeping the lower bits and the "overflow bits" don't matter.  
That means we should indeed be OK here.

> For i32 interleave, we will extend to i64 for series1 and series2, I think we
> can design
> a series like base + i * step with last element overflow to bit 65 but have
> bits 32-64 all zero, and then
> the element before the last one will have overflow bits pollute the result.
>
> It seems checking the last 2 elements is good enough here.

I guess the worst that can happen theoretically is i = 2 ^ 32 - 1,
step = 2 ^ 32 - 1, and base = 2 ^ 32 - 1?  But i is bound by our "largest"
mode so IMHO a 64-bit value should be sufficient.

We always have at least two elements BTW, no need to check that.

Please also adjust the comment above the code regarding overflow.

>> We need to either forbid variable-length vectors for this scheme altogether 
>> or
>> determine the maximum runtime number of elements possible for the current 
>> mode.
>> We don't support more than 65536 bits in a vector which would "naturally" 
>> limit
>> the number of elements.  I suppose variable-length vectors are not too common
>> for this scheme and falling back to a less optimized one shouldn't be too
>> costly.  That's just a gut feeling, though.
>
> Make sense, will fall back to less optimized for VLA.

Have you checked how the more generic version (last else branch) compares?  I 
could imagine it's close or even better for our case.

Reply via email to