On Wed, May 28, 2025 at 10:18 AM David Rowley <dgrowle...@gmail.com> wrote:

> I think we'll end up needing some SWAR code. There are plenty of
> places where 16 bytes is too much to do at once. e.g. looking for the
> delimiter character in a COPY FROM, 16 is likely too many when you're
> importing a bunch of smallish ints. A 4 or 8 byte SWAR search is
> likely better for that. With 16 you're probably going to find a
> delimiter every time you look and do byte-at-a-time processing to find
> that delimiter.

I believe I saw an implementation or paper where the delimiter
locations for an input segment are turned into a bitmap, and in that
case the stride wouldn't matter so much. I've seen a use of SWAR in a
hash table that intentionally allowed bytes to overflow into the next
higher byte, which doesn't happen in SIMD lanes, so it can be useful
on all platforms.
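
To make the bitmap idea concrete, here is a rough sketch of an 8-byte
SWAR search that returns one bit per matching byte. It is not taken from
the implementation or paper I'm remembering, and it is not existing
PostgreSQL code: the names load_le64 and swar_delim_mask are made up for
illustration. The zero-byte test is the exact variant, so bytes with the
high bit set don't produce false matches. Adding 0x7f to the low seven
bits of each byte can't carry into the neighboring byte, which is the
property that keeps the lanes independent here.

```c
#include <stdint.h>

#define SWAR_LOW7  UINT64_C(0x7f7f7f7f7f7f7f7f)
#define SWAR_ONES  UINT64_C(0x0101010101010101)

/*
 * Hypothetical helper: assemble 8 bytes so that p[0] lands in the low
 * byte regardless of host endianness. A real version would more likely
 * use memcpy plus a byte swap on big-endian machines.
 */
static inline uint64_t
load_le64(const unsigned char *p)
{
    uint64_t    w = 0;

    for (int i = 0; i < 8; i++)
        w |= (uint64_t) p[i] << (8 * i);
    return w;
}

/*
 * Hypothetical helper: return an 8-bit mask where bit i is set iff
 * byte i of 'chunk' equals 'delim'.
 */
static inline unsigned
swar_delim_mask(uint64_t chunk, unsigned char delim)
{
    /* Bytes equal to delim become zero. */
    uint64_t    x = chunk ^ (SWAR_ONES * delim);

    /* Set the high bit of each byte whose low 7 bits are nonzero. */
    uint64_t    t = (x & SWAR_LOW7) + SWAR_LOW7;

    /* Leave 0x80 in each byte that was entirely zero, 0x00 elsewhere. */
    t = ~(t | x | SWAR_LOW7);

    /* Gather the eight per-byte flags into the top byte, then shift down. */
    return (unsigned) (((t >> 7) * UINT64_C(0x0102040810204080)) >> 56);
}
```

With the mask in hand, the caller can walk the set bits (lowest set bit
first, then clear it) instead of rescanning the chunk a byte at a time,
so a hit in the first few bytes doesn't waste the rest of the load.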

> Isn't that mostly a performance regression?

D'oh! My brain was imagining time, not TPS.

> How does it do with ANSI chars where the high bit is set?

It would always fail in that case.

> I had in mind we'd have a swar.h header and have a bunch of inline
> functions for this in there. I've not yet studied how well compilers
> would inline multiple such SWAR functions to de-duplicate the common
> parts.

That would be a good step when we have a use case, and with that we
might also be able to clean up some odd-looking code in simd.h.

-- 
John Naylor
Amazon Web Services

