Re: use SSE2 for is_valid_ascii

John Naylor Wed, 10 Aug 2022 21:11:00 -0700

On Thu, Aug 11, 2022 at 5:31 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> This is a neat patch.  I don't know that we need an entirely separate code
> block for the USE_SSE2 path, but I do think that a little bit of extra
> commentary would improve the readability.  IMO the existing comment for the
> zero accumulator has the right amount of detail.
>
> +               /*
> +                * Set all bits in each lane of the error accumulator where input
> +                * bytes are zero.
> +                */
> +               error_cum = _mm_or_si128(error_cum,
> +                                        _mm_cmpeq_epi8(chunk, _mm_setzero_si128()));


Okay, I will think about the comments, thanks for looking.

> I wonder if reusing a zero vector (instead of creating a new one every
> time) has any noticeable effect on performance.

Creating a zeroed register is just PXOR FOO, FOO (a register XORed with
itself), which should get hoisted out of the (unrolled in this case)
loop, and which a recent CPU will just map to a hard-coded zero in the
register file, in which case the execution latency is 0 cycles. :-)
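
To make that concrete, here is a minimal sketch of the idea. It is not the
code from the patch: the function name ascii_chunks_ok and the scalar tail
loop are made up for illustration. The zero vector is named once outside the
loop, which is roughly what I would expect the compiler to do on its own with
the inline _mm_setzero_si128() call.

```c
#include <emmintrin.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): returns true if all "len" bytes
 * are nonzero 7-bit ASCII.
 */
static bool
ascii_chunks_ok(const unsigned char *s, size_t len)
{
	/* Materialized once; this compiles to a single PXOR of a register with itself. */
	const __m128i zero = _mm_setzero_si128();
	__m128i		error_cum = _mm_setzero_si128();
	__m128i		highbit_cum = _mm_setzero_si128();
	size_t		i = 0;

	for (; i + 16 <= len; i += 16)
	{
		__m128i		chunk = _mm_loadu_si128((const __m128i *) (s + i));

		/* Set all bits in each lane of error_cum where the input byte is zero. */
		error_cum = _mm_or_si128(error_cum, _mm_cmpeq_epi8(chunk, zero));

		/* Accumulate the input so any set high bit survives to the end. */
		highbit_cum = _mm_or_si128(highbit_cum, chunk);
	}

	/* movemask gathers the high bit of every byte lane into an int. */
	if (_mm_movemask_epi8(_mm_or_si128(error_cum, highbit_cum)) != 0)
		return false;

	/* Scalar check for the bytes that do not fill a whole 16-byte chunk. */
	for (; i < len; i++)
	{
		if (s[i] == 0 || s[i] >= 0x80)
			return false;
	}

	return true;
}
```

Whether the compiler actually hoists the zeroing in any given build would need
to be confirmed by looking at the generated assembly.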

-- 
John Naylor
EDB: http://www.enterprisedb.com
