On 26.02.19 20:23, David Hildenbrand wrote: > On 26.02.19 20:12, Richard Henderson wrote: >> On 2/26/19 3:38 AM, David Hildenbrand wrote: >>> +static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o) >>> +{ >>> + const uint16_t i2 = get_field(s->fields, i2); >>> + TCGv_i32 ones = tcg_const_i32(-1u); >>> + TCGv_i32 zeroes = tcg_const_i32(0); >>> + int i; >>> + >>> + for (i = 0; i < 16; i++) { >>> + if (extract32(i2, 15 - i, 1)) { >>> + write_vec_element_i32(ones, get_field(s->fields, v1), i, MO_8); >>> + } else { >>> + write_vec_element_i32(zeroes, get_field(s->fields, v1), i, >>> MO_8); >>> + } >>> + } >>> + tcg_temp_free_i32(ones); >>> + tcg_temp_free_i32(zeroes); >>> + return DISAS_NEXT; >>> +} >> >> While this works, it's not in the spirit of >> >>> Programming Note: VECTOR GENERATE BYTE >>> MASK is the preferred method for setting a vector >>> register to all zeroes or ones. > > Good point, I skipped that note so far. > >> >> Better, I think, with > > Many instructions to implement, so little time to fine tune stuff so > far. However I have tests for VGBM, so I can easily get it working. Will > play with it! > >> >> uint64_t generate_byte_mask(uint8_t mask) >> { >> uint64_t r = 0; >> int i; >> for (i = 0; i < 8; i++) { >> if ((mask >> i) & 1) { >> r |= 0xffull << (i * 8); >> } >> } >> return r; >> } >> >> if (i2 == (i2 & 0xff) * 0x0101) { >> /* masks for both halves of the vector are the same. >> trust tcg to produce a good constant loading. */ >> tcg_gen_gvec_dup64i(vec_full_reg_offset(s, v1), 16, 16, >> generate_byte_mask(i2 & 0xff)); >> } else { >> TCGv_i64 t = tcg_temp_new_i64(); >> tcg_gen_movi_i64(t, generate_byte_mask(i2 >> 8)); >> write_vec_element_i64(t, v1, 0, MO_64); >> tcg_gen_movi_i64(t, generate_byte_mask(i2 & 0xff)); >> write_vec_element_i64(t, v1, 1, MO_64); >> tcg_temp_free_i64(); >> } >> >> Somewhere behind tcg_gen_gvec_dup64i, I check to see if the constant can be >> decomposed further, which will eventually bottom out at >> >> vpxor %xmm0,%xmm0,%xmm0 // all zeros >> vpcmpeq %xmm0,%xmm0,%xmm0 // all ones >>
Just tested with minor adaptions, works like a charm! -- Thanks, David / dhildenb