hexagon) add Hexagon Vector eXtensions (HVX) to core

Taylor Simpson Sun, 25 Jul 2021 21:02:56 -0700


> -----Original Message-----
> From: Richard Henderson <richard.hender...@linaro.org>
> Sent: Sunday, July 25, 2021 8:08 AM
> To: Taylor Simpson <tsimp...@quicinc.com>; qemu-devel@nongnu.org
> Cc: phi...@redhat.com; a...@rev.ng; Brian Cain <bc...@quicinc.com>;
> peter.mayd...@linaro.org
> Subject: Re: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon
> Vector eXtensions (HVX) to core
> 
> On 7/5/21 1:34 PM, Taylor Simpson wrote:
> > HVX is a set of wide vector instructions.  Machine state includes
> >      vector registers (VRegs)
> >      vector predicate registers (QRegs)
> >      temporary registers for intermediate values
> >      store buffer (masked stores and scatter/gather)
> >
> > Signed-off-by: Taylor Simpson <tsimp...@quicinc.com>
> > ---
> >   target/hexagon/cpu.h            | 35 ++++++++++++++++-
> >   target/hexagon/hex_arch_types.h |  5 +++
> >   target/hexagon/insn.h           |  3 ++
> >   target/hexagon/internal.h       |  3 ++
> >   target/hexagon/mmvec/mmvec.h    | 83
> +++++++++++++++++++++++++++++++++++++++++
> >   target/hexagon/cpu.c            | 72
> ++++++++++++++++++++++++++++++++++-
> >   6 files changed, 198 insertions(+), 3 deletions(-)
> >   create mode 100644 target/hexagon/mmvec/mmvec.h
> >
> > diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h index
> > 2855dd3..0b377c3 100644
> > --- a/target/hexagon/cpu.h
> > +++ b/target/hexagon/cpu.h
> > @@ -26,6 +26,7 @@ typedef struct CPUHexagonState CPUHexagonState;
> >   #include "qemu-common.h"
> >   #include "exec/cpu-defs.h"
> >   #include "hex_regs.h"
> > +#include "mmvec/mmvec.h"
> >
> >   #define NUM_PREGS 4
> >   #define TOTAL_PER_THREAD_REGS 64
> > @@ -34,6 +35,7 @@ typedef struct CPUHexagonState CPUHexagonState;
> >   #define STORES_MAX 2
> >   #define REG_WRITES_MAX 32
> >   #define PRED_WRITES_MAX 5                   /* 4 insns + endloop */
> > +#define VSTORES_MAX 2
> >
> >   #define TYPE_HEXAGON_CPU "hexagon-cpu"
> >
> > @@ -52,6 +54,13 @@ typedef struct {
> >       uint64_t data64;
> >   } MemLog;
> >
> > +typedef struct {
> > +    target_ulong va;
> > +    int size;
> > +    MMVector mask QEMU_ALIGNED(16);
> > +    MMVector data QEMU_ALIGNED(16);
> > +} VStoreLog;
> 
> Do you really need a MMVector mask, or should this be a QRegMask?


You are correct.  I'll change this.

> 
> > -    target_ulong gather_issued;
> > +    bool gather_issued;
> 
> Surely unrelated to adding state.

This was unintentionally included in the patch series for the scalar core.  
Based on previous feedback, I know it should be a bool.  However, this can 
actually be removed altogether.  So, in the next iteration of this series, 
you'll see it removed.

> 
> > +    MMVector VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> > +    MMVector future_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> > +    MMVector tmp_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> 
> Ok, this is where I'm not keen on how you handle this for integer code, and
> for vector code has got to be past the realm of acceptable.
> 
> You have exactly 4 slots in your vliw packet.  You cannot possibly use 32
> future or tmp slots.  For integers this wastage was at least small, but for
> vectors these waste just shy of 8k.
> 
> All you need to do is to be smarter about mapping e.g. 5 to 8 temp slots
> during translation.

OK

> 
> > +    MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> > +    MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> 
> Likewise.
> 
> > +    /* Temporaries used within instructions */
> > +    MMVector zero_vector QEMU_ALIGNED(16);
> 
> You must be doing something wrong to need zero in memory.

The architecture specifies that if you use a .new in a packet where the vector 
register isn't  defined, it gets zero.  So, the generator produces the 
following for .new references
    const intptr_t OsN_off =
        test_bit(OsN_num, ctx->vregs_updated)
            ? offsetof(CPUHexagonState, future_VRegs[OsN_num])
            : offsetof(CPUHexagonState, zero_vector);

> 
> > +/*
> > + * The HVX register dump takes up a ton of space in the log
> > + * Don't print it unless it is needed  */ #define DUMP_HVX 0 #if
> > +DUMP_HVX
> > +    qemu_fprintf(f, "Vector Registers = {\n");
> > +    for (int i = 0; i < NUM_VREGS; i++) {
> > +        print_vreg(f, env, i);
> > +    }
> > +    for (int i = 0; i < NUM_QREGS; i++) {
> > +        print_qreg(f, env, i);
> > +    }
> > +    qemu_fprintf(f, "}\n");
> > +#endif
> 
> Use CPU_DUMP_FPU, controlled by -d fpu.

These aren't FP registers, but OK.


> 
> > -    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
> > +    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS
> +
> > + NUM_QREGS;
> 
> They're not really core regs though are they?
> Surely gdb_register_coprocessor is a better representation.

I'll look into this.


Thanks,
Taylor

RE: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core

Reply via email to