On 5/15/19 1:31 PM, David Hildenbrand wrote:
> +#define DEF_VFEE(BITS)
>
Same comment wrt inline functions applies.

Here, because there's one result, writing to byte 7, I wonder if it isn't
clearer to write the loop

    first_equal = n;
    first_zero = n;
    for (i = n - 1; i >= 0; --i) {
        if (data1 == data2) {
            first_equal = i;
        }
        if (data1 == 0) {
            first_zero = i;
        }
    }
    // As an aside, there are bit tricks for the above,
    // but let's stay simple(r) for now.
    if (zs) {
        if (first_equal < first_zero) {
            cc = (first_zero < n ? 2 : 1);
        } else {
            first_equal = first_zero;
            cc = (first_zero < n ? 0 : 3);
        }
    } else {
        cc = (first_equal < n ? 1 : 3);
    }
    s390_vec_write_element64(v1, 0, first_equal);
    s390_vec_write_element64(v1, 1, 0);

Note that you don't need S390Vector tmp, since the result is written after
all of the inputs are consumed.


r~
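
For anyone who wants to experiment with the scan outside of QEMU, here is a
minimal standalone sketch of the loop suggested above. It is not the helper
from the patch: the function name vfee8_sketch, the plain byte arrays data1
and data2, and returning the index instead of writing it to v1 are all
assumptions made for illustration. It returns the element index exactly as
the loop computes it and makes no attempt to model anything else about the
instruction.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical standalone helper (not QEMU code).  Scans n byte-sized
 * elements of data1/data2 as in the loop above, stores the condition
 * code in *cc, and returns the index that would be written to v1.
 */
int vfee8_sketch(const uint8_t *data1, const uint8_t *data2, int n,
                 int zs, int *cc)
{
    int first_equal = n;
    int first_zero = n;
    int i;

    assert(n >= 0);

    /* Walk backwards so that the lowest matching index is kept. */
    for (i = n - 1; i >= 0; --i) {
        if (data1[i] == data2[i]) {
            first_equal = i;
        }
        if (data1[i] == 0) {
            first_zero = i;
        }
    }

    if (zs) {
        if (first_equal < first_zero) {
            /* Match comes first; cc says whether a zero follows it. */
            *cc = (first_zero < n ? 2 : 1);
        } else {
            /* Zero comes first (or nothing was found at all). */
            first_equal = first_zero;
            *cc = (first_zero < n ? 0 : 3);
        }
    } else {
        *cc = (first_equal < n ? 1 : 3);
    }
    return first_equal;
}
```

Returning n when nothing is found mirrors the "first_equal = n" initializer
in the loop, so the caller can tell "no match" apart from any valid index.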