On Wed, Aug 7, 2024 at 7:41 AM Alexander Monakov <amona...@ispras.ru> wrote:
>
>
> On Tue, 6 Aug 2024, Alexander Monakov wrote:
>
> > --- a/libcpp/files.cc
> > +++ b/libcpp/files.cc
> [...]
> > +      pad = HAVE_AVX2 ? 32 : 16;
>
> This should have been
>
>   #ifdef HAVE_AVX2
>       pad = 32;
>   #else
>       pad = 16;
>   #endif

OK with that change.

Did you think about an AVX512 version (possibly with 32 byte vectors)?
In case there's a more efficient variant of pshufb/pmovmskb available
there - possibly the load on the branch unit could be lessened by
using masking.

Thanks,
Richard.

> Alexander
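
For reference, the difference between the two forms can be sketched in
isolation. This is a minimal illustration, not code from the patch: it
assumes HAVE_AVX2 is a configure-style macro that is either defined or
absent, and pad_for_build is a hypothetical helper name. Under that
assumption, `HAVE_AVX2 ? 32 : 16` fails to compile when the macro is
absent, because the name is then an undeclared identifier, while the
#ifdef form handles both cases.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch: choose the buffer padding
   from whether HAVE_AVX2 is defined at all.  Assumes the macro is either
   defined by configure or left undefined, never defined to 0.  */
static size_t
pad_for_build (void)
{
#ifdef HAVE_AVX2
  return 32;  /* room for a 32-byte (256-bit) vector load */
#else
  return 16;  /* room for a 16-byte (128-bit) vector load */
#endif
}
```

Building this with -DHAVE_AVX2 selects the 32-byte padding; building it
without the macro selects 16.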