https://bugs.kde.org/show_bug.cgi?id=432801
--- Comment #17 from Julian Seward <[email protected]> ---
Interesting analysis, and a plausible patch; thank you for that. This seems
like a new trick from LLVM. I'm still struggling to understand what's going
on, though.

I can see that

   for (size_t i = 0; i < plen; ++i)
      hp += pattern[i];

could be vectorised as you say, so that it loads 4 bytes at a time, and uses
punpcklbw twice to interleave them as described in comment 12. But:

* where's the addition instruction that merges the lanes together? I don't
  see that.

* what is the purpose of the pcmpgtd instruction? The original sources
  contain a scalar comparison against zero

     if (hp==j) { j++; }

  Is that related? If so, how does a scalar 32-bit equality test against
  zero get translated into a vector 32x4 signed-greater-than operation?

In the patch, there's mention of biasing:

+ // From here on out, we're dealing with biased integers instead of 2's
+ // complement.

What does that mean, in this context?

Regarding the test:

* you put it in memcheck/tests/x86; "x86" here means 32-bit only. Is that
  what you intended? I would have expected it to go in the "amd64"
  directory.

* because the test is written in C, whether or not it tests what you expect
  it to test depends entirely on the compiler used to compile it. And most
  likely, it won't be vectorised, or won't be vectorised in the same way.
  This kind of test really needs to be written in assembly (inline assembly)
  so we know what we're testing.
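For reference, the question about the missing lane-merging addition can be made concrete with a portable C sketch. This is only an illustration, not the code LLVM emits: the function names sum_scalar, sum_four_lanes and check_agreement are hypothetical, the pattern bytes are assumed to be unsigned (zero-extended), and the four "lanes" are emulated with an ordinary array rather than an SSE register. It shows the two additions a vectorised form of the loop would be expected to need: a lane-wise add per 4-byte group, and a final horizontal add that merges the lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar form of the loop quoted in the comment: sum the bytes one by one. */
uint32_t sum_scalar(const unsigned char *pattern, size_t plen)
{
    uint32_t hp = 0;
    for (size_t i = 0; i < plen; ++i)
        hp += pattern[i];
    return hp;
}

/* Hypothetical emulation of a 32x4 vectorised form (assumption: bytes are
   zero-extended, as interleaving with a zero register would do). */
uint32_t sum_four_lanes(const unsigned char *pattern, size_t plen)
{
    uint32_t lane[4] = {0, 0, 0, 0};
    size_t i = 0;

    /* Main loop: take 4 bytes, widen each to 32 bits, add lane-wise. */
    for (; i + 4 <= plen; i += 4) {
        for (int k = 0; k < 4; ++k)
            lane[k] += (uint32_t)pattern[i + k];
    }

    /* Horizontal step: merge the four lanes into one scalar. */
    uint32_t hp = lane[0] + lane[1] + lane[2] + lane[3];

    /* Scalar tail for the remaining 0..3 bytes. */
    for (; i < plen; ++i)
        hp += pattern[i];
    return hp;
}

/* Sanity check: both forms must agree on any input. */
void check_agreement(const unsigned char *pattern, size_t plen)
{
    assert(sum_scalar(pattern, plen) == sum_four_lanes(pattern, plen));
}
```

Calling check_agreement on a few buffers whose lengths are not multiples of 4 exercises both the lane loop and the scalar tail. A compiler is free to vectorise either function differently, which is the same caveat raised about the C test above.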
