On Wed, Jul 17, 2019 at 05:22:38PM +0800, Kewen.Lin wrote:
> Good question, the vector rotation for byte looks like (others are similar):
> 
> vrlb VRT,VRA,VRB
>   do i=0 to 127 by 8
>    sh = (VRB)[i+5:i+7]
>    VRT[i:i+7] = (VRA)[i:i+7] <<< sh
>   end
> 
> It only takes care of the counts from 0 to prec-1 (inclusive) [log2(prec)
> bits], so it's fine even if operands[2] is zero or negative.
> 
> Take byte as example, prec is 8.
>   - rot count is 0, then minus res gets 8. (out of 3 bits range), same as 0.
>   - rot count is 9, then minus res gets -1. (3 bits parsed as 7), the
>     original rot count 9 was parsed as 1 (in 3 bits range).
>   - rot count is -1, then minus res gets 9, (3 bits parsed as 1), the original
>     rot count was parsed as 7 (in 3 bits range).
> 
> It's a good idea to just use negate!  Thanks!!
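
To make the three cases above concrete, here is a rough scalar model of a
single byte lane in C.  It is only a sketch of the quoted pseudocode, not
anything from the rs6000 backend; the helper names vrlb_lane, rotr8_ref and
rotr8_via_vrlb are made up for illustration.

```c
#include <stdint.h>

/* One byte lane of vrlb as described by the pseudocode: only the low
   3 bits of the count are used, the rest is ignored.  */
static uint8_t vrlb_lane(uint8_t a, uint8_t count)
{
    unsigned sh = count & 7u;               /* sh = (VRB)[i+5:i+7] */
    return (uint8_t)((a << sh) | (a >> ((8u - sh) & 7u)));
}

/* Reference rotate right by n modulo 8; n may be negative or >= 8.  */
static uint8_t rotr8_ref(uint8_t a, int n)
{
    unsigned sh = (unsigned)n & 7u;
    return (uint8_t)((a >> sh) | (a << ((8u - sh) & 7u)));
}

/* Rotate right expressed as a left rotate by the negated count.
   The negation is done in unsigned arithmetic so it cannot overflow.  */
static uint8_t rotr8_via_vrlb(uint8_t a, int n)
{
    return vrlb_lane(a, (uint8_t)(0u - (unsigned)n));
}
```

For counts 0, 9 and -1 the negated count has low 3 bits 0, 7 and 1
respectively, so rotr8_via_vrlb agrees with rotr8_ref in all three cases.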

Ok, so the hw for the vectors truncates; the question is how happy the
generic RTL code will be with that.  rs6000 defines SHIFT_COUNT_TRUNCATED to 0,
so the generic code can't assume there is a truncation going on.  Either it
will punt on some optimizations when it sees, say, a negative or too large
shift/rotate count (that is the better case), or it might just assume there
is UB.
As the documentation says, when SHIFT_COUNT_TRUNCATED is zero there is the
option of having a pattern with the truncation made explicit.  In your case
that would be *vrotl<mode>3_and or similar, with an explicit AND on the shift
operand with, say, a {7, 7...} vector for the byte shifts etc., which in the
end emits the identical instruction to vrotl<mode>3; vrotr<mode>3 would then
use the MINUS plus that pattern.  If the rotate argument is a CONST_VECTOR,
you can of course just canonicalize, i.e. compute -operands[2] & mask, fold
that into a constant, and keep using vrotl<mode>3 in that case.

        Jakub
