On 13.04.19 02:29, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +static void gen_rim_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, int32_t c)
>> +{
>> +    TCGv_i32 t0 = tcg_temp_new_i32();
>> +    TCGv_i32 t1 = tcg_temp_new_i32();
>> +
>> +    tcg_gen_andc_i32(t0, a, b);
>> +    tcg_gen_rotli_i32(t1, a, c & 31);
>> +    tcg_gen_and_i32(t1, t1, b);
>> +    tcg_gen_or_i32(d, t0, t1);
>
> The ANDC and ROTL look to be in the wrong order.
>
> "For each bit in the third operand (b) that is one,
> the corresponding bit *of the rotated elements* in
> the second operand replaces the corresponding bit in
> the first operand".
>
> I think you need
>
>     tcg_gen_rotli_i32(a, a, c & 31);
>     tcg_gen_and_i32(a, a, b);
>     tcg_gen_andc_i32(d, d, b);
>     tcg_gen_or_i32(d, d, a);
>
> with
>
>     { .fni4 = gen_rim_32, .load_dest = true },
>
>> +        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);          \
>> +        const uint##BITS##_t mask = s390_vec_read_element##BITS(v3, i);       \
>> +        const uint##BITS##_t d = (a & ~mask) | (rotl##BITS(a, count) & mask); \
>
> Again, this seems to be missing the insert into "the first operand", i.e.
> loading from v1 as well.
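
For reference, the per-element semantics described in the quoted PoP text can be sketched in plain C roughly as follows. This is only an illustration of the rotate-then-insert-under-mask logic for one 32-bit element, not the TCG or helper code from the patch; the names rotl32_sketch, rim32_sketch and rim32_selfcheck are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is reduced modulo 32). */
static inline uint32_t rotl32_sketch(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/*
 * Hypothetical one-element model of "rotate and insert under mask":
 *   d     - current value of the first operand (v1), which must be read
 *   a     - element of the second operand (v2), which gets rotated
 *   mask  - element of the third operand (v3)
 * Bits where mask is 1 come from the rotated a; all other bits keep d.
 */
static inline uint32_t rim32_sketch(uint32_t d, uint32_t a,
                                    uint32_t mask, unsigned count)
{
    return (d & ~mask) | (rotl32_sketch(a, count) & mask);
}

/* Small self-check: the upper half of d survives, the lower half is inserted. */
static inline void rim32_selfcheck(void)
{
    assert(rim32_sketch(0xAAAAAAAAu, 0x12345678u, 0x0000FFFFu, 8)
           == 0xAAAA7812u);
}
```

Compared with the expression in the quoted helper, the only difference is that the unmasked bits come from d (the old v1 element) instead of from a, which is why the destination has to be loaded first.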

Yes indeed, I misinterpreted/misread the PoP. Nice catch! (as usual,
excellent review)

>
>
> r~
>

-- 

Thanks,

David / dhildenb