On Tue, Jul 30, 2024 at 08:41:59AM -0700, Andi Kleen wrote:
> From: Andi Kleen <a...@gcc.gnu.org>
>
> AVX2 is widely available on x86 and it allows to do the scanner line
> check with 32 bytes at a time. The code is similar to the SSE2 code
> path, just using AVX and 32 bytes at a time instead of SSE2 16 bytes.
>
> Also adjust the code to allow inlining when the compiler
> is built for an AVX2 host, following what other architectures
> do.
>
> I see about a ~0.6% compile time improvement for compiling i386
> insn-recog.i with -O0.
>
> libcpp/ChangeLog:
>
>       * config.in (HAVE_AVX2): Add.
>       * configure: Regenerate.
>       * configure.ac: Add HAVE_AVX2 check.
>       * lex.cc (repl_chars): Extend to 32 bytes.
>       (search_line_avx2): New function to scan line using AVX2.
>       (init_vectorized_lexer): Check for AVX2 in CPUID.

I'd just like to mention that in libcpp/files.cc (read_file_guts) we have

  /* The + 16 here is space for the final '\n' and 15 bytes of padding,
     used to quiet warnings from valgrind or Address Sanitizer, when the
     optimized lexer accesses aligned 16-byte memory chunks, including
     the bytes after the malloced, area, and stops lexing on '\n'.  */
  buf = XNEWVEC (uchar, size + 16);

So, if for AVX2 we handle 32 bytes at a time rather than 16, this would
need to change (at least conditionally for arches where the AVX2 code
could be used).

        Jakub
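
To illustrate the kind of change meant above, here is a minimal standalone
sketch, not the actual libcpp code. The names LEXER_VECTOR_BYTES,
padded_size and alloc_file_buffer are hypothetical, and keying the width
on the x86 predefined macros is only an assumption about how "arches
where the AVX2 code could be used" might be selected. The point is that
the padding follows the widest aligned load: one byte for the final '\n'
plus (width - 1) bytes so the chunk containing that '\n' stays inside
the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constant: the widest aligned chunk the vectorized lexer
   may load.  Assumption: 32 wherever an AVX2 path could be picked at
   run time (x86), 16 elsewhere.  */
#if defined (__x86_64__) || defined (__i386__)
# define LEXER_VECTOR_BYTES 32
#else
# define LEXER_VECTOR_BYTES 16
#endif

/* Space for SIZE bytes of file contents, the final '\n', and
   LEXER_VECTOR_BYTES - 1 bytes of padding after it.  */
static size_t
padded_size (size_t size)
{
  return size + LEXER_VECTOR_BYTES;
}

/* Allocate a buffer for a file of SIZE bytes.  Zeroing the tail is a
   choice made for this sketch so that the padding bytes are defined;
   the caller is expected to fill buf[0 .. size-1].  */
static unsigned char *
alloc_file_buffer (size_t size)
{
  unsigned char *buf = (unsigned char *) malloc (padded_size (size));
  if (buf == NULL)
    return NULL;
  memset (buf + size, 0, LEXER_VECTOR_BYTES);
  buf[size] = '\n';
  assert (padded_size (size) - size == LEXER_VECTOR_BYTES);
  return buf;
}
```

With a width of 16 this reduces to the existing size + 16; with 32 an
aligned 32-byte load that contains the terminating '\n' ends at most 31
bytes past it, which is still within the buffer.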