On Sun, Aug 28, 2022 at 10:58 AM Nathan Bossart <nathandboss...@gmail.com> wrote: > > Here is a new patch set in which I've attempted to address all feedback.
Looks in pretty good shape. Some more comments: + uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32); + uint32 nelem_per_iteration = 4 * nelem_per_vector; Using local #defines would be my style. I don't have a reason to object to this way, but adding const makes these vars more clear. Speaking of const: - const __m128i tmp1 = _mm_or_si128(result1, result2); - const __m128i tmp2 = _mm_or_si128(result3, result4); - const __m128i result = _mm_or_si128(tmp1, tmp2); + tmp1 = vector32_or(result1, result2); + tmp2 = vector32_or(result3, result4); + result = vector32_or(tmp1, tmp2); Any reason to throw away the const declarations? +static inline bool +vector32_is_highbit_set(const Vector32 v) +{ +#ifdef USE_SSE2 + return (_mm_movemask_epi8(v) & 0x8888) != 0; +#endif +} I'm not sure why we need this function -- AFAICS it just adds more work on x86 for zero benefit. For our present application, can we just cast to Vector8 (for Arm's sake) and call the 8-bit version? Aside from that, I plan on rewriting some comments for commit, some of which pre-date this patch: - * operations using bitwise operations on unsigned integers. + * operations using bitwise operations on unsigned integers. Note that many + * of the functions in this file presently do not have non-SIMD + * implementations. It's unclear to the reader whether this is a matter of 'round-to-it's. I'd like to document what I asserted in this thread, that it's likely not worthwhile to do anything with a uint64 representing two 32-bit ints. (It *is* demonstrably worth it for handling 8 byte-values at a time) * Use saturating subtraction to find bytes <= c, which will present as - * NUL bytes in 'sub'. + * NUL bytes. I'd like to to point out that the reason to do it this way is to workaround SIMD architectures frequent lack of unsigned comparison. + * Return the result of subtracting the respective elements of the input + * vectors using saturation. I wonder if we should explain briefly what saturating arithmetic is. I had never encountered it outside of a SIMD programming context. -- John Naylor EDB: http://www.enterprisedb.com