On 23/01/2019 05:09, Richard Henderson wrote:

> On 1/7/19 5:11 AM, Mark Cave-Ayland wrote:
>> #7  0x0000555555852e53 in expand_4_vec (vece=2, dofs=197872,
>> aofs=198288, bofs=197776, cofs=197792, oprsz=16, tysz=16,
>> type=TCG_TYPE_V128, write_aofs=true, fni=0x55555599182a
>> <gen_vaddsws_vec>) at
>> /home/hsp/src/qemu-altivec-55/tcg/tcg-op-gvec.c:903
>>         t0 = 0x1848
>>         t1 = 0x1880
>>         t2 = 0x18b8
>>         t3 = 0x18f0
>>         i = 0
>> #8  0x0000555555853cc4 in tcg_gen_gvec_4 (dofs=197872, aofs=198288,
>> bofs=197776, cofs=197792, oprsz=16, maxsz=16, g=0x5555562d33c0 <g>) at
>> /home/hsp/src/qemu-altivec-55/tcg/tcg-op-gvec.c:1211
>>         type = TCG_TYPE_V128
>>         some = 21845
>>         __PRETTY_FUNCTION__ = "tcg_gen_gvec_4"
>>         __func__ = "tcg_gen_gvec_4"
>> #9  0x0000555555991987 in gen_vaddsws (ctx=0x7fffe3ffe5f0) at
>> /home/hsp/src/qemu-altivec-55/target/ppc/translate/vmx-impl.inc.c:597
>>         g = {fni8 = 0x0, fni4 = 0x0, fniv = 0x55555599182a
>> <gen_vaddsws_vec>, fno = 0x5555559601a1 <gen_helper_vaddsws>, opc =
>> INDEX_op_add_vec, data = 0, vece = 2 '\002', prefer_i64 = false,
>> write_aofs = true}
>>
>>
>> Certainly according to patch 7 of the series only 8-bit and 16-bit accesses are
>> supported on i386 hosts, but shouldn't we be falling back to the previous
>> implementations rather than hitting an assert()?
>
> In here:
>
> #define GEN_VXFORM_SAT(NAME, VECE, NORM, SAT, OPC2, OPC3)               \
> static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
>                                          TCGv_vec sat, TCGv_vec a,      \
>                                          TCGv_vec b)                    \
> {                                                                       \
>     TCGv_vec x = tcg_temp_new_vec_matching(t);                          \
>     glue(glue(tcg_gen_, NORM), _vec)(VECE, x, a, b);                    \
>     glue(glue(tcg_gen_, SAT), _vec)(VECE, t, a, b);                     \
>     tcg_gen_cmp_vec(TCG_COND_NE, VECE, x, x, t);                        \
>     tcg_gen_or_vec(VECE, sat, sat, x);                                  \
>     tcg_temp_free_vec(x);                                               \
> }                                                                       \
> static void glue(gen_, NAME)(DisasContext *ctx)                         \
> {                                                                       \
>     static const GVecGen4 g = {                                         \
>         .fniv = glue(glue(gen_, NAME), _vec),                           \
>         .fno = glue(gen_helper_, NAME),                                 \
>         .opc = glue(glue(INDEX_op_, NORM), _vec),                       \
>
> s/NORM/SAT/, so that we query whether the saturated opcode is supported.  The
> normal arithmetic, cmp, and or opcodes are mandatory; we don't need to do
> anything with those.
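
As a rough illustration of why the one-word change matters, here is a toy model in plain C. Every name in it (GenDesc, host_supports, choose_expansion and the two enums) is made up for the example and is not the real TCG API; the host rule simply encodes the assumption from the discussion above that the saturating op is only available for 8-bit and 16-bit elements, while the plain add is always available.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch only; not QEMU definitions. */
typedef enum { OP_ADD_VEC, OP_SSADD_VEC } Opcode;
typedef enum { EXPAND_INLINE_VEC, EXPAND_HELPER } Expansion;

typedef struct {
    Opcode opc;      /* opcode the host is asked about before going inline */
    bool has_fniv;   /* an inline vector expander is provided */
    bool has_fno;    /* an out-of-line helper is provided */
} GenDesc;

/*
 * Toy host rule (assumption): plain add works for every element size,
 * saturating add only for 8-bit (vece 0) and 16-bit (vece 1) elements.
 */
static bool host_supports(Opcode opc, unsigned vece)
{
    if (opc == OP_ADD_VEC) {
        return true;
    }
    return vece <= 1;
}

/* Use the inline vector expander only if the queried opcode is supported. */
static Expansion choose_expansion(const GenDesc *g, unsigned vece)
{
    assert(g->has_fno);  /* the helper is the fallback, so it must exist */
    if (g->has_fniv && host_supports(g->opc, vece)) {
        return EXPAND_INLINE_VEC;
    }
    return EXPAND_HELPER;
}
```

With .opc set to the normal add, a 32-bit (vece=2) request is sent down the inline path even though the saturating op it is about to emit isn't available, which is the situation in the backtrace. With .opc set to the saturating op, the same request falls back to the helper.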
Now that this and the other prerequisite patches have been merged into master,
I've rebased the outstanding PPC parts of your "tcg, target/ppc vector
improvements" series on master, including the above fix, and pushed the result
to https://github.com/mcayland/qemu/commits/ppc-altivec-v6.

The good news is that the graphics corruption I originally noticed, caused by
the patch introducing the saturating add/sub vector ops, has now gone. With my
little-endian vsplt fix included, both OS X and MacOS 9 appear to run without
any obvious issues on an x86 host, and certainly feel smoother than before.

The only minor question I had with the patchset in its current form is whether
to use the new VsrD() macro for vscr_sat, or whether we don't really care
enough for it to matter?


ATB,

Mark.