On 8/21/19 10:29 AM, Jan Bobek wrote:
> +    for (intptr_t i = 0; i * sizeof(uint8_t) < oprsz; ++i) {
> +        const uint8_t t = a->B(i) & (1 << 7);
> +        ret |= i < 8 ? t >> (7 - i) : t << (i - 7);

You can avoid this variable shift by doing

    uint32_t t = a->B(i) >> 7;
    ret |= t << i;

> +uint64_t glue(helper_pmovmskbq, SUFFIX)(Reg *a, uint32_t desc)
> +{
> +    return glue(helper_pmovmskbd, SUFFIX)(a, desc);
>  }
...
> +DEF_GEN_INSN2_GVEC(vpmovmskb, Gd, Uqq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ,
> pmovmskbd_xmm)
> +DEF_GEN_INSN2_GVEC(vpmovmskb, Gq, Uqq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ,
> pmovmskbq_xmm)

What is the difference between these two?  Given that we aren't attempting
avx512, uint32_t is sufficient for all of the bytes of a YMM register.

I have a feeling that some of this should simply use target_ulong, so that a
direct assignment to the general register can be done without extra extensions
within the generated code.


r~
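For reference, a minimal standalone sketch comparing the two-direction shift from the quoted loop with the single-shift form suggested above. This is not the helper from the patch: the function names are hypothetical, a plain byte array stands in for Reg and a->B(i), and oprsz is assumed to be at most 32 so the mask fits in a uint32_t.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the quoted loop. The sign bit of each byte is
 * isolated in place (bit 7) and then moved down or up to bit i.
 * t is widened to uint32_t here so the left shift stays well defined for
 * i up to 31; assumes oprsz <= 32.
 */
static uint32_t movmskb_two_way(const uint8_t *bytes, size_t oprsz)
{
    uint32_t ret = 0;

    for (size_t i = 0; i < oprsz; ++i) {
        const uint32_t t = bytes[i] & (1u << 7);
        ret |= i < 8 ? t >> (7 - i) : t << (i - 7);
    }
    return ret;
}

/*
 * Suggested form: bring the sign bit down to bit 0 with a constant shift,
 * then place it at bit i with a single left shift. Assumes oprsz <= 32.
 */
static uint32_t movmskb_one_shift(const uint8_t *bytes, size_t oprsz)
{
    uint32_t ret = 0;

    for (size_t i = 0; i < oprsz; ++i) {
        uint32_t t = bytes[i] >> 7;
        ret |= t << i;
    }
    return ret;
}
```

Both functions should produce the same mask for any input of up to 32 bytes; the second simply drops the per-iteration branch on i.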