On Wed, Jul 17, 2019 at 05:22:38PM +0800, Kewen.Lin wrote:
> Good question, the vector rotation for byte looks like (others are similar):
>
>   vrlb VRT,VRA,VRB
>     do i=0 to 127 by 8
>       sh = (VRB)[i+5:i+7]
>       VRT[i:i+7] = (VRA)[i:i+7] <<< sh
>     end
>
> It only takes care of the counts from 0 to prec-1 (inclusive) [log2(prec) bits]
> So it's fine even operands[2] are zero or negative.
>
> Take byte as example, prec is 8.
>   - rot count is 0, then minus res gets 8. (out of 3 bits range), same as 0.
>   - rot count is 9, then minus res gets -1. (3 bits parsed as 7), the original
>     rot count 9 was parsed as 1 (in 3 bits range).
>   - rot count is -1, then minus res gets 9, (3 bits parsed as 1), the original
>     rot count was parsed as 7 (in 3 bits range).
>
> It's a good idea to just use negate!  Thanks!!
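
To make the arithmetic above concrete, here is a minimal C sketch of a
single byte lane.  It assumes the 3-bit truncation described in the quoted
pseudo-code; vrlb_lane, rotr8_ref and rotr8_via_negate are hypothetical
names used only for illustration, not anything in rs6000.md or the ISA.

```c
#include <stdint.h>

/* Hypothetical model of one byte lane of vrlb: only the low 3 bits of
   the count are looked at, as in the quoted pseudo-code.  */
static uint8_t
vrlb_lane (uint8_t a, int count)
{
  unsigned sh = (unsigned) count & 7;   /* truncate to log2(8) bits */
  return (uint8_t) ((a << sh) | (a >> ((8 - sh) & 7)));
}

/* Reference right rotate, with the count reduced modulo 8.  */
static uint8_t
rotr8_ref (uint8_t a, int count)
{
  unsigned sh = (unsigned) count & 7;
  return (uint8_t) ((a >> sh) | (a << ((8 - sh) & 7)));
}

/* Right rotate expressed as a left rotate by the negated count.  */
static uint8_t
rotr8_via_negate (uint8_t a, int count)
{
  return vrlb_lane (a, -count);
}
```

With this model, counts of 0, 9 and -1 all give the same result as a right
rotate by the count taken modulo 8, which is why a plain negate is enough
at the hardware level.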

Ok, so the hw for the vectors truncates; the question is how happy the
generic RTL code will be with that.  rs6000 defines SHIFT_COUNT_TRUNCATED
to 0, so the generic code can't assume there is a truncation going on.
Either it will punt on some optimizations when it sees, say, a negative or
too large shift/rotate count (that is the better case), or it might just
assume there is UB.

As the documentation says, for zero SHIFT_COUNT_TRUNCATED there is an
option of having a pattern with the truncation being explicit, so in your
case *vrotl<mode>3_and or similar, which would have an explicit AND on the
shift operand with, say, a {7, 7...} vector for the byte shifts etc., but
would in the end emit the identical instruction to vrotl<mode>3.  Then use
MINUS + that pattern for vrotr<mode>3.

If the rotate argument is a CONST_VECTOR, you can of course just
canonicalize, i.e. perform -operands[2] & mask, fold that into a constant
and keep using vrotl<mode>3 in that case.

	Jakub