On Tue, Jul 30, 2024 at 08:41:59AM -0700, Andi Kleen wrote:
> From: Andi Kleen <a...@gcc.gnu.org>
> 
> AVX2 is widely available on x86 and it allows to do the scanner line
> check with 32 bytes at a time. The code is similar to the SSE2 code
> path, just using AVX and 32 bytes at a time instead of SSE2 16 bytes.
> 
> Also adjust the code to allow inlining when the compiler
> is built for an AVX2 host, following what other architectures
> do.
> 
> I see about a ~0.6% compile time improvement for compiling i386
> insn-recog.i with -O0.
> 
> libcpp/ChangeLog:
> 
>       * config.in (HAVE_AVX2): Add.
>       * configure: Regenerate.
>       * configure.ac: Add HAVE_AVX2 check.
>       * lex.cc (repl_chars): Extend to 32 bytes.
>       (search_line_avx2): New function to scan line using AVX2.
>       (init_vectorized_lexer): Check for AVX2 in CPUID.

I'd just like to mention that libcpp/files.cc (read_file_guts)
has
  /* The + 16 here is space for the final '\n' and 15 bytes of padding,
     used to quiet warnings from valgrind or Address Sanitizer, when the
     optimized lexer accesses aligned 16-byte memory chunks, including
     the bytes after the malloced, area, and stops lexing on '\n'.  */
  buf = XNEWVEC (uchar, size + 16);
So, if the AVX2 path handles 32 bytes at a time rather than 16, this padding
would need to change (at least conditionally, for arches where the AVX2 code
could be used).
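
A rough sketch of the kind of change I mean, not taken from the patch or from
libcpp: LEXER_MAX_CHUNK, padded_size and alloc_file_buf are made-up names, and
plain malloc stands in for XNEWVEC. The assumption is that an aligned N-byte
load covering the final '\n' can touch up to N - 1 bytes after it, so the
buffer needs size + N bytes for the widest chunk any search_line_* variant
might use on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: widest aligned chunk the vectorized lexer may load.
   32 where an AVX2 path could be selected at run time, else 16.  */
#if defined(__x86_64__) || defined(__i386__)
# define LEXER_MAX_CHUNK 32
#else
# define LEXER_MAX_CHUNK 16
#endif

/* Bytes to allocate for SIZE bytes of file contents: one byte for the
   final '\n' plus LEXER_MAX_CHUNK - 1 bytes of padding.  */
static size_t
padded_size (size_t size)
{
  assert (size <= SIZE_MAX - LEXER_MAX_CHUNK);
  return size + LEXER_MAX_CHUNK;
}

/* Allocate a buffer for SIZE bytes of file contents, zero the tail and
   store the terminating '\n' right after the contents.  Returns NULL
   on allocation failure.  */
static unsigned char *
alloc_file_buf (size_t size)
{
  unsigned char *buf = (unsigned char *) malloc (padded_size (size));
  if (buf != NULL)
    {
      memset (buf + size, 0, LEXER_MAX_CHUNK);
      buf[size] = '\n';
    }
  return buf;
}
```

Keying the constant on the architecture rather than on the CPUID result keeps
the allocation independent of which search_line_* variant is picked at run
time, at the cost of 16 extra bytes per file on x86 hosts without AVX2.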

        Jakub
