On 2/26/19 3:38 AM, David Hildenbrand wrote:
> +static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
> +{
> +    const uint16_t i2 = get_field(s->fields, i2);
> +    TCGv_i32 ones = tcg_const_i32(-1u);
> +    TCGv_i32 zeroes = tcg_const_i32(0);
> +    int i;
> +
> +    for (i = 0; i < 16; i++) {
> +        if (extract32(i2, 15 - i, 1)) {
> +            write_vec_element_i32(ones, get_field(s->fields, v1), i, MO_8);
> +        } else {
> +            write_vec_element_i32(zeroes, get_field(s->fields, v1), i, MO_8);
> +        }
> +    }
> +    tcg_temp_free_i32(ones);
> +    tcg_temp_free_i32(zeroes);
> +    return DISAS_NEXT;
> +}

While this works, it's not in the spirit of

> Programming Note: VECTOR GENERATE BYTE
> MASK is the preferred method for setting a vector
> register to all zeroes or ones.

Better, I think, with

    uint64_t generate_byte_mask(uint8_t mask)
    {
        uint64_t r = 0;
        int i;

        for (i = 0; i < 8; i++) {
            if ((mask >> i) & 1) {
                r |= 0xffull << (i * 8);
            }
        }
        return r;
    }

    if (i2 == (i2 & 0xff) * 0x0101) {
        /* Masks for both halves of the vector are the same.
           Trust tcg to produce a good constant loading.  */
        tcg_gen_gvec_dup64i(vec_full_reg_offset(s, v1), 16, 16,
                            generate_byte_mask(i2 & 0xff));
    } else {
        TCGv_i64 t = tcg_temp_new_i64();

        tcg_gen_movi_i64(t, generate_byte_mask(i2 >> 8));
        write_vec_element_i64(t, v1, 0, MO_64);
        tcg_gen_movi_i64(t, generate_byte_mask(i2 & 0xff));
        write_vec_element_i64(t, v1, 1, MO_64);
        tcg_temp_free_i64(t);
    }

Somewhere behind tcg_gen_gvec_dup64i, I check to see if the constant can be
decomposed further, which will eventually bottom out at

    vpxor   %xmm0,%xmm0,%xmm0    // all zeros
    vpcmpeq %xmm0,%xmm0,%xmm0    // all ones

and even more interesting combinations for tcg/aarch64.


r~
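
For anyone who wants to check the mask arithmetic outside of QEMU, here is a minimal standalone sketch. It assumes only plain C99; vgbm_halves_equal and vgbm_expand are hypothetical helper names made up for illustration and are not QEMU APIs. It shows only the constant computation, not the TCG code generation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Expand each bit of an 8-bit mask into one byte of a 64-bit value:
 * if bit i is set, byte i (counting from the least significant byte)
 * becomes 0xff, otherwise it stays 0x00. */
uint64_t generate_byte_mask(uint8_t mask)
{
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        if ((mask >> i) & 1) {
            r |= 0xffull << (i * 8);
        }
    }
    return r;
}

/* Hypothetical helper: true when the high and low bytes of i2 are equal,
 * so both 64-bit halves of the vector receive the same constant. */
bool vgbm_halves_equal(uint16_t i2)
{
    return i2 == (i2 & 0xff) * 0x0101;
}

/* Hypothetical helper: compute the constants for element 0 (from the
 * high byte of i2) and element 1 (from the low byte of i2). */
void vgbm_expand(uint16_t i2, uint64_t *elem0, uint64_t *elem1)
{
    *elem0 = generate_byte_mask(i2 >> 8);
    *elem1 = generate_byte_mask(i2 & 0xff);
}
```

With i2 = 0x8001, vgbm_halves_equal() is false, element 0 becomes 0xff00000000000000 and element 1 becomes 0x00000000000000ff, which matches the per-byte loop in the original patch (bit 15 of i2 selects the leftmost byte).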