On Wed, Aug 7, 2024 at 1:37 PM Alexander Monakov wrote:
>
>
> On Wed, 7 Aug 2024, Richard Biener wrote:
>
> > > > This is probably to work around bugs in older compiler versions? If
> > > > not I agree.
> > >
> > > This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> > > macro-
On Wed, 7 Aug 2024, Richard Biener wrote:
> > > This is probably to work around bugs in older compiler versions? If
> > > not I agree.
> >
> > This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> > macro-fused on Intel, so with propagation it is two uops early in the
> > CPU
On Wed, Aug 07, 2024 at 01:16:20PM +0200, Richard Biener wrote:
> Well, merging the memory operand into the pshufb would be wrong - embedded
> memory ops are always considered aligned, no?
Depends. For VEX/EVEX encoded can be unaligned, for the pre-AVX encoding
aligned except when in explicitly u
On Wed, Aug 7, 2024 at 11:08 AM Alexander Monakov wrote:
>
>
> On Wed, 7 Aug 2024, Richard Biener wrote:
>
> > > > + data = *(const v16qi_u *)s;
> > > > + /* Prevent propagation into pshufb and pcmp as memory operand.
> > > > */
> > > > + __asm__ ("" : "+x" (data));
> > >
> > > It
On Wed, 7 Aug 2024, Richard Biener wrote:
> > > + data = *(const v16qi_u *)s;
> > > + /* Prevent propagation into pshufb and pcmp as memory operand. */
> > > + __asm__ ("" : "+x" (data));
> >
> > It would probably make sense to file a PR on this separately,
> > to eventually fi
On Tue, Aug 6, 2024 at 8:50 PM Andi Kleen wrote:
>
> > - s += 16;
> > + v16qi data, t;
> > + /* Unaligned load. Reading beyond the final newline is safe, since
> > + files.cc:read_file_guts pads the allocation. */
>
> You need to change that function to use 32 byte padding as
On Tue, Aug 06, 2024 at 11:50:00AM -0700, Andi Kleen wrote:
> > - s += 16;
> > + v16qi data, t;
> > + /* Unaligned load. Reading beyond the final newline is safe, since
> > + files.cc:read_file_guts pads the allocation. */
>
> You need to change that function to use 32 byte pad
> - s += 16;
> + v16qi data, t;
> + /* Unaligned load. Reading beyond the final newline is safe, since
> + files.cc:read_file_guts pads the allocation. */
You need to change that function to use 32 byte padding as Jakub
pointed out (I forgot that too)
> + data = *(const
Since the characters we are searching for (CR, LF, '\', '?') all have
distinct ASCII codes mod 16, PSHUFB can help match them all at once.
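
To illustrate the "distinct mod 16" observation, here is a minimal scalar
sketch in C that models what a PSHUFB-based lookup computes per byte. It is
not the code from the patch: the names nibble_lut, pshufb_lane and
match_mask16 are hypothetical, and the table contents are only one possible
choice. Each table slot indexed by a low nibble holds the one searched-for
character with that nibble; every other slot holds a value that cannot equal
any byte selecting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table.  Slot i holds the searched-for character whose
   low nibble is i: LF = 0x0A, '\\' = 0x5C, CR = 0x0D, '?' = 0x3F.  Every
   other slot holds a value whose low nibble differs from i, so it can never
   compare equal to a byte that selected it.  */
static const uint8_t nibble_lut[16] = {
  1, 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, '\\', '\r', 0, '?'
};

/* Scalar model of one PSHUFB lane: an index byte with the high bit set
   yields zero, otherwise the low four bits select a table entry.  */
static uint8_t
pshufb_lane (const uint8_t table[16], uint8_t idx)
{
  return (idx & 0x80) ? 0 : table[idx & 15];
}

/* Return a 16-bit mask with bit i set when s[i] is CR, LF, '\\' or '?'.
   This models a shuffle followed by a byte-wise equality compare and a
   move-mask; the caller must supply 16 readable bytes.  */
static uint16_t
match_mask16 (const unsigned char *s)
{
  uint16_t mask = 0;
  assert (s != NULL);
  for (size_t i = 0; i < 16; i++)
    {
      uint8_t c = s[i];
      /* The looked-up value equals the input byte only for the four
         searched-for characters.  */
      if (pshufb_lane (nibble_lut, c) == c)
        mask |= (uint16_t) (1u << i);
    }
  return mask;
}
```

With a real vector unit the loop body collapses into one shuffle, one compare
and one move-mask over all 16 bytes; the scalar form is only meant to show why
a single table suffices for all four characters.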
libcpp/ChangeLog:
* lex.cc (search_line_sse42): Replace with...
(search_line_ssse3): ... this new function. Adjust the use...
(init_v