The first patch has been seen before. https://patchwork.ozlabs.org/patch/1115039/
It had a bug and I didn't fix it right away and then forgot.
Fixed now; I had mixed up the operand ordering for aarch32.

The next 3 are things that I noticed while doing other stuff.

In particular, pmull is used heavily during https transfers.
While cloning a repository, the old code peaks at 27% of the
total runtime, as measured by perf top.  The new code does
not quite reach 3% when repeating the same clone.

In addition, the new helper functions are in the form that
will be required for the implementation of SVE2.

The comment in patch 2 about ARMv8.4-DIT is perhaps a stretch,
but re-reading the pmull instruction description in the current
ARM ARM brought it to mind.

Since TCG is officially not in the security domain, it's
probably not a bug to just claim to support DIT without
actually doing anything to ensure the algorithms used are in
fact timing independent of the data.

On the other hand, I expect the bit distribution of stuff
going through this sort of hashing algorithm to approach
50% 1's and 0's, so I also don't think we gain anything on
average by terminating the loop early.

Thoughts on DIT specifically?


r~


Richard Henderson (4):
  target/arm: Vectorize USHL and SSHL
  target/arm: Convert PMUL.8 to gvec
  target/arm: Convert PMULL.64 to gvec
  target/arm: Convert PMULL.8 to gvec

 target/arm/helper-sve.h    |   2 +
 target/arm/helper.h        |  21 ++-
 target/arm/translate.h     |   6 +
 target/arm/neon_helper.c   | 117 -------------
 target/arm/translate-a64.c |  83 ++++-----
 target/arm/translate.c     | 350 ++++++++++++++++++++++++++++++++-----
 target/arm/vec_helper.c    | 211 ++++++++++++++++++++++
 7 files changed, 562 insertions(+), 228 deletions(-)

-- 
2.17.1
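
To make the DIT question concrete, here is a minimal sketch of the two
loop shapes under discussion, for an 8x8->16 carry-less multiply.  The
function names are hypothetical and this is not the code from the
patches; it only illustrates a fixed-trip-count loop (no data-dependent
branch) next to one that exits once the remaining multiplier bits are
zero.

```c
#include <stdint.h>

/*
 * Hypothetical illustration, not the helper from this series.
 * Carry-less (polynomial) multiply of two 8-bit values.
 * Always runs 8 iterations and selects with a mask, so control
 * flow does not depend on the operand values.
 */
static uint16_t clmul_8x8_fixed(uint8_t n, uint8_t m)
{
    uint16_t r = 0;

    for (int i = 0; i < 8; i++) {
        /* mask is 0xffff if bit i of n is set, else 0 */
        uint16_t mask = -(uint16_t)((n >> i) & 1);
        r ^= ((uint16_t)m << i) & mask;
    }
    return r;
}

/*
 * Same result, but the loop ends as soon as no set bits remain
 * in n, so the trip count depends on the data.
 */
static uint16_t clmul_8x8_early(uint8_t n, uint8_t m)
{
    uint16_t r = 0;
    uint16_t mm = m;

    while (n) {
        if (n & 1) {
            r ^= mm;
        }
        mm <<= 1;
        n >>= 1;
    }
    return r;
}
```

With roughly half the bits set, the top bit of n is set about half the
time, so the early-exit form would rarely save more than an iteration
or two over the fixed form.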