Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Richard Biener
On Wed, Aug 7, 2024 at 1:37 PM Alexander Monakov wrote:
>
>
> On Wed, 7 Aug 2024, Richard Biener wrote:
>
> > > > This is probably to work around bugs in older compiler versions? If
> > > > not I agree.
> > >
> > > This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> > > macro-

Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Alexander Monakov
On Wed, 7 Aug 2024, Richard Biener wrote:

> > > This is probably to work around bugs in older compiler versions? If
> > > not I agree.
> >
> > This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> > macro-fused on Intel, so with propagation it is two uops early in the
> > CPU

Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2024 at 01:16:20PM +0200, Richard Biener wrote:
> Well, merging the memory operand into the pshufb would be wrong - embedded
> memory ops are always considered aligned, no?

Depends. For VEX/EVEX encoded can be unaligned, for the pre-AVX encoding aligned except when in explicitly u

Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Richard Biener
On Wed, Aug 7, 2024 at 11:08 AM Alexander Monakov wrote:
>
>
> On Wed, 7 Aug 2024, Richard Biener wrote:
>
> > > > + data = *(const v16qi_u *)s;
> > > > + /* Prevent propagation into pshufb and pcmp as memory operand.
> > > > */
> > > > + __asm__ ("" : "+x" (data));
> > >
> > > It

Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Alexander Monakov
On Wed, 7 Aug 2024, Richard Biener wrote:

> > > + data = *(const v16qi_u *)s;
> > > + /* Prevent propagation into pshufb and pcmp as memory operand. */
> > > + __asm__ ("" : "+x" (data));
> >
> > It would probably make sense to a file a PR on this separately,
> > to eventually fi

Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Richard Biener
On Tue, Aug 6, 2024 at 8:50 PM Andi Kleen wrote:
>
> > - s += 16;
> > + v16qi data, t;
> > + /* Unaligned load. Reading beyond the final newline is safe, since
> > + files.cc:read_file_guts pads the allocation. */
>
> You need to change that function to use 32 byte padding as

Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-06 Thread Andi Kleen
On Tue, Aug 06, 2024 at 11:50:00AM -0700, Andi Kleen wrote:
> > - s += 16;
> > + v16qi data, t;
> > + /* Unaligned load. Reading beyond the final newline is safe, since
> > +files.cc:read_file_guts pads the allocation. */
>
> You need to change that function to use 32 byte pad

Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-06 Thread Andi Kleen
> - s += 16;
> + v16qi data, t;
> + /* Unaligned load. Reading beyond the final newline is safe, since
> + files.cc:read_file_guts pads the allocation. */

You need to change that function to use 32 byte padding as Jakub pointed out (I forgot that too)

> + data = *(const

[PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-06 Thread Alexander Monakov
Since the characters we are searching for (CR, LF, '\', '?') all have
distinct ASCII codes mod 16, PSHUFB can help match them all at once.

libcpp/ChangeLog:

	* lex.cc (search_line_sse42): Replace with...
	(search_line_ssse3): ... this new function.  Adjust the use...
	(init_v
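
For readers following the thread, the "distinct ASCII codes mod 16" idea
from the patch description can be illustrated with a portable scalar model.
This is only a sketch, not the code from the patch: the names pshufb_lane,
needles and match_mask16 are hypothetical, and storing 1 in slot 0 of the
table (so that a NUL byte never matches its own lookup) is an assumption
made for this illustration. The real helper would use a single 16-byte
shuffle and compare instead of a loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one PSHUFB byte lane: if the index byte has its high
   bit set the result is 0, otherwise the low four bits of the index
   select one of the 16 table bytes.  */
static uint8_t pshufb_lane(const uint8_t table[16], uint8_t index)
{
    return (index & 0x80) ? 0 : table[index & 0x0F];
}

/* Hypothetical lookup table indexed by low nibble:
   '\n' = 0x0A, '\\' = 0x5C, '\r' = 0x0D, '?' = 0x3F occupy slots
   0xA, 0xC, 0xD and 0xF.  Slot 0 holds 1 (an assumption of this sketch)
   so that a NUL byte does not compare equal to its own lookup.  */
static const uint8_t needles[16] = {
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?'
};

/* Return a mask with bit i set when block[i] is CR, LF, '\\' or '?'.
   Each byte indexes the table with itself; it matches only when the
   table entry for its low nibble is exactly that byte.  */
static unsigned match_mask16(const uint8_t block[16])
{
    unsigned mask = 0;
    for (size_t i = 0; i < 16; i++) {
        if (pshufb_lane(needles, block[i]) == block[i])
            mask |= 1u << i;
    }
    return mask;
}
```

Because the four low nibbles (0xA, 0xC, 0xD, 0xF) are all different, one
table lookup followed by one equality comparison covers all four
characters; bytes such as 0x1A or 0x2F share a low nibble with a needle but
fail the comparison.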