On Mon, Sep 28, 2020 at 03:47:56PM +0100, Dave Martin wrote:
> On Mon, Sep 28, 2020 at 02:59:34PM +0100, André Przywara wrote:
> > On 28/09/2020 14:21, Dave Martin wrote:
> >
> > Hi Dave,
> >
> > > On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
> > >> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
> > >> that introduces very long vector operations (up to 2048 bits).
> > >
> > > (8192, in fact, though don't expect to see that on real hardware any
> > > time soon... qemu and the Arm fast model can do it, though.)
> > >
> > >> The SPE profiling feature can tag SVE instructions with additional
> > >> properties like predication or the effective vector length.
> > >>
> > >> Decode the new operation type bits in the SPE decoder to allow the perf
> > >> tool to correctly report about SVE instructions.
> > >
> > >
> > > I don't know anything about SPE, so just commenting on a few minor
> > > things that catch my eye here.
> >
> > Many thanks for taking a look!
> > Please note that I actually missed a prior submission by Wei, so the
> > code changes here will end up in:
> > https://lore.kernel.org/patchwork/patch/1288413/
> >
> > But your two points below magically apply to his patch as well, so....
> >
> >
> > >> Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
> > >> ---
> > >> .../arm-spe-decoder/arm-spe-pkt-decoder.c | 48 ++++++++++++++++++-
> > >> 1 file changed, 47 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > >> b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > >> index a033f34846a6..f0c369259554 100644
> > >> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > >> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > >> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt
> > >> *packet, char *buf,
> > >> }
> > >> case ARM_SPE_OP_TYPE:
> > >> switch (idx) {
> > >> - case 0: return snprintf(buf, buf_len, "%s", payload &
> > >> 0x1 ?
> > >> + case 0: {
> > >> + size_t blen = buf_len;
> > >> +
> > >> + if ((payload & 0x89) == 0x08) {
> > >> + ret = snprintf(buf, buf_len, "SVE");
> > >> + buf += ret;
> > >> + blen -= ret;
> > >
> > > (Nit: can ret be < 0 ? I've never been 100% clear on this myself for
> > > the s*printf() family -- if this assumption is widespread in perf tool
> > > already that I guess just go with the flow.)
> >
> > Yeah, some parts of the code in here check for -1, actually, but doing
> > this on every call to snprintf would push this current code over the
> > edge - and I cowardly avoided a refactoring ;-)
> >
> > Please note that his is perf userland, and also we are printing constant
> > strings here.
> > Although admittedly this starts to sounds like an excuse now ...
> >
> > > I wonder if this snprintf+increment+decrement sequence could be wrapped
> > > up as a helper, rather than having to be repeated all over the place.
> >
> > Yes, I was hoping nobody would notice ;-)
>
> It's probably not worth losing sleep over.
>
> snprintf(3) says, under NOTES:
>
> Until glibc 2.0.6, they would return -1 when the output was
> truncated.
>
> which is probably ancient enough history that we don't care. C11 does
> say that a negative return value can happen "if an encoding error
> occurred". _Probably_ not a problem if perf tool never calls
> setlocale(), but ...
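
To make the helper idea concrete, here is a rough sketch of how the snprintf+increment+decrement sequence could be wrapped. The name buf_appendf() and its return convention are only assumptions for illustration; nothing like it exists in the perf tree as far as this thread shows. It treats both a negative return (the C11 "encoding error" case) and truncation as failure:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (name and contract are illustrative only):
 * append formatted text at *buf_p and shrink *blen by what was written.
 * Returns 0 on success, -1 if vsnprintf() failed or the output did not
 * fit. On failure the cursor and the remaining length are left untouched.
 */
static int buf_appendf(char **buf_p, size_t *blen, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(*buf_p, *blen, fmt, ap);
	va_end(ap);

	/* ret < 0: encoding error; ret >= *blen: output was truncated */
	if (ret < 0 || (size_t)ret >= *blen)
		return -1;

	*buf_p += ret;
	*blen -= ret;
	return 0;
}
```

With something along these lines, each "ret = snprintf(...); buf += ret; blen -= ret;" triple in the decoder would collapse into a single call, and the error check would live in one place.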
I have one patch which tries to fix the snprintf+increment sequence [1];
to be honest, the change seems ugly to me. I agree it's better to wrap
the sequence up in a helper.

[1] https://lore.kernel.org/patchwork/patch/1288410/

> > >> + if (payload & 0x2)
> > >> + ret = snprintf(buf, buf_len, "
> > >> FP");
> > >> + else
> > >> + ret = snprintf(buf, buf_len, "
> > >> INT");
> > >> + buf += ret;
> > >> + blen -= ret;
> > >> + if (payload & 0x4) {
> > >> + ret = snprintf(buf, buf_len, "
> > >> PRED");
> > >> + buf += ret;
> > >> + blen -= ret;
> > >> + }
> > >> + /* Bits [7..4] encode the vector length
> > >> */
> > >> + ret = snprintf(buf, buf_len, " EVLEN%d",
> > >> + 32 << ((payload >> 4) &
> > >> 0x7));
> > >
> > > Isn't this just extracting 3 bits (0x7)?
> >
> > Ah, right, the comment is wrong. It's actually bits [6:4].
> >
> > > And what unit are we aiming
> > > for here: is it the number of bytes per vector, or something else? I'm
> > > confused by the fact that this will go up in steps of 32, which doesn't
> > > seem to match up to the architecure.
> >
> > So this is how SPE encodes the effective vector length in its payload:
> > the format is described in section "D10.2.7 Operation Type packet" in a
> > (recent) ARMv8 ARM. I put the above statement in a C file and ran all
> > input values through it, it produced the exact *bit* length values as in
> > the spec.
> >
> > Is there any particular pattern you are concerned about?
> > I admit this is somewhat hackish, I can do an extra function to put some
> > comments in there.
>
> Mostly I'm curious because the encoding doesn't match the SVE
> architecture: SVE requires 4 bits to specify the vector length, not 3.
> This might have been a deliberate limitation in the SPE spec., but it
> raises questions about what should happen when 3 bits is not enough.
>
> For SVE, valid vector lengths are 16 bytes * n
> (or equivalently 128 bits * n), where 1 <= n <= 16.
>
> The code here though cannot print EVLEN16 or EVLEN48 etc.
> This might not be a bug, but I'd like to understand where it comes
> from...

In the SPE spec, the defined values for EVL are:

  0b'000 -> EVLEN: 32 bits.
  0b'001 -> EVLEN: 64 bits.
  0b'010 -> EVLEN: 128 bits.
  0b'011 -> EVLEN: 256 bits.
  0b'100 -> EVLEN: 512 bits.
  0b'101 -> EVLEN: 1024 bits.
  0b'110 -> EVLEN: 2048 bits.

Note that 0b'111 is reserved. In theory, I think the SPE Operation Type
packet could express up to 4096 bits (32 << 7) if the EVL field were
0b'111, but it cannot express a vector length of 8192 bits as you
mentioned.

Thanks,
Leo

> > > I notice that bit 7 has to be zero to get into this if() though.
> > >
> > >> + buf += ret;
> > >> + blen -= ret;
> > >> + return buf_len - blen;
> > >> + }
> > >> +
> > >> + return snprintf(buf, buf_len, "%s", payload &
> > >> 0x1 ?
> > >> "COND-SELECT" : "INSN-OTHER");
> > >> + }
> > >> case 1: {
> > >> size_t blen = buf_len;
> > >>
> > >> @@ -403,6 +430,25 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt
> > >> *packet, char *buf,
> > >> ret = snprintf(buf, buf_len, "
> > >> NV-SYSREG");
> > >> buf += ret;
> > >> blen -= ret;
> > >> + } else if ((payload & 0x0a) == 0x08) {
> > >> + ret = snprintf(buf, buf_len, " SVE");
> > >> + buf += ret;
> > >> + blen -= ret;
> > >> + if (payload & 0x4) {
> > >> + ret = snprintf(buf, buf_len, "
> > >> PRED");
> > >> + buf += ret;
> > >> + blen -= ret;
> > >> + }
> > >> + if (payload & 0x80) {
> > >> + ret = snprintf(buf, buf_len, "
> > >> SG");
> > >> + buf += ret;
> > >> + blen -= ret;
> > >> + }
> > >> + /* Bits [7..4] encode the vector length
> > >> */
> > >> + ret = snprintf(buf, buf_len, " EVLEN%d",
> > >> + 32 << ((payload >> 4) &
> > >> 0x7));
> > >
> > > Same comment as above. Maybe have a common helper for decoding the
> > > vector length bits so it can be fixed in a single place?
> >
> > Yup. Although I wonder if this is the smallest of the problems with this
> > function going forward.
> >
> > Cheers,
> > Andre
>
> Fair enough.
>
> Cheers
> ---Dave
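
For the common vector-length helper suggested above, something like the following might do. This is only a sketch: spe_op_evlen_bits() is a made-up name, the payload is assumed to be a 64-bit value, and returning 0 for the reserved 0b'111 encoding is my own choice rather than anything the spec or the patch mandates.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the posted patch): decode the EVL
 * field, bits [6:4] of the Operation Type payload, into a length in
 * bits. The reserved encoding 0b111 is reported as 0 here; that
 * convention is an assumption made for this sketch.
 */
static unsigned int spe_op_evlen_bits(uint64_t payload)
{
	unsigned int evl = (payload >> 4) & 0x7;

	if (evl == 0x7)
		return 0;	/* reserved encoding */

	return 32u << evl;	/* 0b000 -> 32 bits ... 0b110 -> 2048 bits */
}
```

Both SVE branches could then print " EVLEN%d" from this one function, so the mask and the comment about bits [6:4] only need to be right in a single place.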