> -----Original Message----- > From: Richard Henderson <richard.hender...@linaro.org> > Sent: Sunday, July 25, 2021 8:08 AM > To: Taylor Simpson <tsimp...@quicinc.com>; qemu-devel@nongnu.org > Cc: phi...@redhat.com; a...@rev.ng; Brian Cain <bc...@quicinc.com>; > peter.mayd...@linaro.org > Subject: Re: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon > Vector eXtensions (HVX) to core > > On 7/5/21 1:34 PM, Taylor Simpson wrote: > > HVX is a set of wide vector instructions. Machine state includes > > vector registers (VRegs) > > vector predicate registers (QRegs) > > temporary registers for intermediate values > > store buffer (masked stores and scatter/gather) > > > > Signed-off-by: Taylor Simpson <tsimp...@quicinc.com> > > --- > > target/hexagon/cpu.h | 35 ++++++++++++++++- > > target/hexagon/hex_arch_types.h | 5 +++ > > target/hexagon/insn.h | 3 ++ > > target/hexagon/internal.h | 3 ++ > > target/hexagon/mmvec/mmvec.h | 83 > +++++++++++++++++++++++++++++++++++++++++ > > target/hexagon/cpu.c | 72 > ++++++++++++++++++++++++++++++++++- > > 6 files changed, 198 insertions(+), 3 deletions(-) > > create mode 100644 target/hexagon/mmvec/mmvec.h > > > > diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h index > > 2855dd3..0b377c3 100644 > > --- a/target/hexagon/cpu.h > > +++ b/target/hexagon/cpu.h > > @@ -26,6 +26,7 @@ typedef struct CPUHexagonState CPUHexagonState; > > #include "qemu-common.h" > > #include "exec/cpu-defs.h" > > #include "hex_regs.h" > > +#include "mmvec/mmvec.h" > > > > #define NUM_PREGS 4 > > #define TOTAL_PER_THREAD_REGS 64 > > @@ -34,6 +35,7 @@ typedef struct CPUHexagonState CPUHexagonState; > > #define STORES_MAX 2 > > #define REG_WRITES_MAX 32 > > #define PRED_WRITES_MAX 5 /* 4 insns + endloop */ > > +#define VSTORES_MAX 2 > > > > #define TYPE_HEXAGON_CPU "hexagon-cpu" > > > > @@ -52,6 +54,13 @@ typedef struct { > > uint64_t data64; > > } MemLog; > > > > +typedef struct { > > + target_ulong va; > > + int size; > > + MMVector mask QEMU_ALIGNED(16); > > + MMVector data QEMU_ALIGNED(16); > > +} VStoreLog; > > Do you really need a MMVector mask, or should this be a QRegMask?
You are correct. I'll change this. > > > - target_ulong gather_issued; > > + bool gather_issued; > > Surely unrelated to adding state. This was unintentionally included in the patch series for the scalar core. Based on previous feedback, I know it should be a bool. However, this can actually be removed altogether. So, in the next iteration of this series, you'll see it removed. > > > + MMVector VRegs[NUM_VREGS] QEMU_ALIGNED(16); > > + MMVector future_VRegs[NUM_VREGS] QEMU_ALIGNED(16); > > + MMVector tmp_VRegs[NUM_VREGS] QEMU_ALIGNED(16); > > Ok, this is where I'm not keen on how you handle this for integer code, and > for vector code has got to be past the realm of acceptable. > > You have exactly 4 slots in your vliw packet. You cannot possibly use 32 > future or tmp slots. For integers this wastage was at least small, but for > vectors these waste just shy of 8k. > > All you need to do is to be smarter about mapping e.g. 5 to 8 temp slots > during translation. OK > > > + MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16); > > + MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16); > > Likewise. > > > + /* Temporaries used within instructions */ > > + MMVector zero_vector QEMU_ALIGNED(16); > > You must be doing something wrong to need zero in memory. The architecture specifies that if you use a .new in a packet where the vector register isn't defined, it gets zero. So, the generator produces the following for .new references const intptr_t OsN_off = test_bit(OsN_num, ctx->vregs_updated) ? offsetof(CPUHexagonState, future_VRegs[OsN_num]) : offsetof(CPUHexagonState, zero_vector); > > > +/* > > + * The HVX register dump takes up a ton of space in the log > > + * Don't print it unless it is needed */ #define DUMP_HVX 0 #if > > +DUMP_HVX > > + qemu_fprintf(f, "Vector Registers = {\n"); > > + for (int i = 0; i < NUM_VREGS; i++) { > > + print_vreg(f, env, i); > > + } > > + for (int i = 0; i < NUM_QREGS; i++) { > > + print_qreg(f, env, i); > > + } > > + qemu_fprintf(f, "}\n"); > > +#endif > > Use CPU_DUMP_FPU, controlled by -d fpu. These aren't FP registers, but OK. > > > - cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS; > > + cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS > + > > + NUM_QREGS; > > They're not really core regs though are they? > Surely gdb_register_coprocessor is a better representation. I'll look into this. Thanks, Taylor