On Wed, May 28, 2025 at 10:18 AM David Rowley <dgrowle...@gmail.com> wrote:
> I think we'll end up needing some SWAR code. There are plenty of
> places where 16 bytes is too much to do at once. e.g. looking for the
> delimiter character in a COPY FROM, 16 is likely too many when you're
> importing a bunch of smallish ints. A 4 or 8 byte SWAR search is
> likely better for that. With 16 you're probably going to find a
> delimiter every time you look and do byte-at-a-time processing to find
> that delimiter.

I believe I saw an implementation or paper where the delimiter
locations for an input segment are turned into a bitmap, and in that
case the stride wouldn't matter so much.

I've seen a use of SWAR in a hash table that intentionally allowed
bytes to overflow into the next higher byte, which doesn't happen in
SIMD lanes, so it can be useful on all platforms.

> Isn't that mostly a performance regression?

D'oh! My brain was imagining time, not TPS.

> How does it do with ANSI chars where the high bit is set?

It would always fail in that case.

> I had in mind we'd have a swar.h header and have a bunch of inline
> functions for this in there. I've not yet studied how well compilers
> would inline multiple such SWAR functions to de-duplicate the common
> parts.

That would be a good step when we have a use case, and with that we
might also be able to clean up some odd-looking code in simd.h.

--
John Naylor
Amazon Web Services
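
For illustration only, and not taken from any patch in this thread: a
minimal sketch of what an 8-byte SWAR delimiter search that produces a
bitmap could look like. The names swar_load8, swar_zero_bytes, and
swar_delim_bitmap8 are hypothetical and do not exist in PostgreSQL. The
sketch assumes a fixed 8-byte chunk and uses the carry-free zero-byte
test, so bytes with the high bit set do not produce false matches in
this particular variant.

```c
#include <stdint.h>

#define SWAR_LOW7 UINT64_C(0x7F7F7F7F7F7F7F7F)
#define SWAR_ONES UINT64_C(0x0101010101010101)

/*
 * Hypothetical helper: assemble 8 bytes into a word with p[0] in the
 * least significant byte, independent of host endianness.
 */
static inline uint64_t
swar_load8(const unsigned char *p)
{
	uint64_t	w = 0;

	for (int i = 0; i < 8; i++)
		w |= (uint64_t) p[i] << (8 * i);
	return w;
}

/*
 * Hypothetical helper: return a word with 0x80 in every byte position
 * where v holds a zero byte, and 0x00 elsewhere. Adding 0x7F to the low
 * seven bits of each byte cannot carry into the neighboring byte, so
 * the result is exact for every byte.
 */
static inline uint64_t
swar_zero_bytes(uint64_t v)
{
	uint64_t	t = (v & SWAR_LOW7) + SWAR_LOW7;

	return ~(t | v | SWAR_LOW7);
}

/*
 * Hypothetical helper: bit i of the result is set when p[i] == delim.
 */
static inline uint8_t
swar_delim_bitmap8(const unsigned char *p, unsigned char delim)
{
	/* XOR with the broadcast delimiter turns matching bytes into zero */
	uint64_t	x = swar_load8(p) ^ (SWAR_ONES * delim);

	/* move each per-byte flag from bit 7 down to bit 0 of its byte */
	uint64_t	m = swar_zero_bytes(x) >> 7;

	/* the multiply gathers the eight flags into the top byte */
	return (uint8_t) ((m * UINT64_C(0x0102040810204080)) >> 56);
}
```

With the input "12,345,6" and ',' as the delimiter, bits 2 and 6 of the
result are set. A caller could walk the set bits to find field
boundaries instead of rescanning the chunk byte by byte; whether that
is a win over a plain loop for short fields would need measuring.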