Re: [Qemu-devel] [PATCH v2 00/10] tcg vector improvements

Mark Cave-Ayland Tue, 05 Feb 2019 13:45:00 -0800

On 23/01/2019 05:09, Richard Henderson wrote:

> On 1/7/19 5:11 AM, Mark Cave-Ayland wrote:
>> #7  0x0000555555852e53 in expand_4_vec (vece=2, dofs=197872,
>> aofs=198288, bofs=197776, cofs=197792, oprsz=16, tysz=16,
>> type=TCG_TYPE_V128, write_aofs=true, fni=0x55555599182a
>> <gen_vaddsws_vec>) at
>> /home/hsp/src/qemu-altivec-55/tcg/tcg-op-gvec.c:903
>>         t0 = 0x1848
>>         t1 = 0x1880
>>         t2 = 0x18b8
>>         t3 = 0x18f0
>>         i = 0
>> #8  0x0000555555853cc4 in tcg_gen_gvec_4 (dofs=197872, aofs=198288,
>> bofs=197776, cofs=197792, oprsz=16, maxsz=16, g=0x5555562d33c0 <g>) at
>> /home/hsp/src/qemu-altivec-55/tcg/tcg-op-gvec.c:1211
>>         type = TCG_TYPE_V128
>>         some = 21845
>>         __PRETTY_FUNCTION__ = "tcg_gen_gvec_4"
>>         __func__ = "tcg_gen_gvec_4"
>> #9  0x0000555555991987 in gen_vaddsws (ctx=0x7fffe3ffe5f0) at
>> /home/hsp/src/qemu-altivec-55/target/ppc/translate/vmx-impl.inc.c:597
>>         g = {fni8 = 0x0, fni4 = 0x0, fniv = 0x55555599182a
>> <gen_vaddsws_vec>, fno = 0x5555559601a1 <gen_helper_vaddsws>, opc =
>> INDEX_op_add_vec, data = 0, vece = 2 '\002', prefer_i64 = false,
>> write_aofs = true}
>>
>>
>> Certainly according to patch 7 of the series only 8-bit and 16-bit accesses
>> are supported on i386 hosts, but shouldn't we be falling back to the previous
>> implementations rather than hitting an assert()?
> 
> In here:
> 
> #define GEN_VXFORM_SAT(NAME, VECE, NORM, SAT, OPC2, OPC3)               \
> static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
>                                          TCGv_vec sat, TCGv_vec a,      \
>                                          TCGv_vec b)                    \
> {                                                                       \
>     TCGv_vec x = tcg_temp_new_vec_matching(t);                          \
>     glue(glue(tcg_gen_, NORM), _vec)(VECE, x, a, b);                    \
>     glue(glue(tcg_gen_, SAT), _vec)(VECE, t, a, b);                     \
>     tcg_gen_cmp_vec(TCG_COND_NE, VECE, x, x, t);                        \
>     tcg_gen_or_vec(VECE, sat, sat, x);                                  \
>     tcg_temp_free_vec(x);                                               \
> }                                                                       \
> static void glue(gen_, NAME)(DisasContext *ctx)                         \
> {                                                                       \
>     static const GVecGen4 g = {                                         \
>         .fniv = glue(glue(gen_, NAME), _vec),                           \
>         .fno = glue(gen_helper_, NAME),                                 \
>         .opc = glue(glue(INDEX_op_, NORM), _vec),                       \
> 
> s/NORM/SAT/, so that we query whether the saturated opcode is supported.  The
> normal arithmetic, cmp, and or opcodes are mandatory; we don't need to do
> anything with those.

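To convince myself why the .opc field matters here, a minimal standalone
sketch of the selection logic as I understand it. This is not QEMU code:
every toy_* name is hypothetical, and the host model simply assumes (per
patch 7) that the saturating op exists only for 8-bit and 16-bit elements
while the plain add exists for all sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode ids standing in for QEMU's INDEX_op_* values. */
enum toy_opc { TOY_OP_ADD_VEC, TOY_OP_SSADD_VEC };

/* Which expansion the front end ends up using. */
enum toy_path { TOY_PATH_VECTOR, TOY_PATH_HELPER };

/* Cut-down stand-in for GVecGen4: only the fields discussed above. */
struct toy_gvec_gen {
    void (*fniv)(void);   /* inline vector expansion */
    void (*fno)(void);    /* out-of-line helper, always usable */
    enum toy_opc opc;     /* opcode the host must support to use fniv */
    unsigned vece;        /* log2 of the element size in bytes */
};

/*
 * Assumed host model: plain add works for every element size,
 * saturating add only for 8-bit (vece 0) and 16-bit (vece 1).
 */
static bool toy_host_supports(enum toy_opc opc, unsigned vece)
{
    if (opc == TOY_OP_SSADD_VEC) {
        return vece <= 1;
    }
    return true;
}

/* Use the vector expansion only if the queried opcode is supported. */
static enum toy_path toy_pick_path(const struct toy_gvec_gen *g)
{
    assert(g->fno != NULL);   /* the helper is the mandatory fallback */
    if (g->fniv != NULL && toy_host_supports(g->opc, g->vece)) {
        return TOY_PATH_VECTOR;
    }
    return TOY_PATH_HELPER;
}

static void toy_expand(void) {}
static void toy_helper(void) {}

/* Path chosen for a 32-bit saturating add when .opc names `opc`. */
static enum toy_path toy_path_for_opc(enum toy_opc opc)
{
    struct toy_gvec_gen g = { toy_expand, toy_helper, opc, 2 };
    return toy_pick_path(&g);
}
```

With .opc naming the plain add (the NORM case), the 32-bit query succeeds and
the vector expansion is chosen even though the saturating op it emits is
missing on the host; with the saturating opcode (the SAT case) the same query
fails and the helper is used instead.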

Now that this and the other pre-requisite patches have been merged into master,
I've rebased the outstanding PPC parts of your "tcg, target/ppc vector
improvements" on master including the above fix and pushed the result to
https://github.com/mcayland/qemu/commits/ppc-altivec-v6.

The good news is that the graphics corruption I originally noticed, caused by the
patch introducing the saturating add/sub vector ops, has now gone. With my
little-endian vsplt fix included, both OS X and MacOS 9 appear to run without any
obvious issues on an x86 host, and certainly feel smoother than before.

The only minor question I had with the patchset in its current form is whether to
use the new VsrD() macro for vscr_sat, or whether we don't really care enough?


ATB,

Mark.
