wc -l AVX code 10%+10% speedup

2024-03-30 Thread Evgeny Nizhibitsky
Dear GNU coreutils maintainers, It seems that I have found a way to both speed up (~10%) and simplify (13 insertions, 43 deletions) the wc -l AVX code while playing with it, at least on the files of several million to 1 billion rows that I tested on my CPU. It mostly involves using _mm256_movemask_epi8 and __bui
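
For readers unfamiliar with the approach, here is a minimal sketch of counting newlines with _mm256_movemask_epi8. It is not the submitted patch. The second identifier in the message above is cut off, so the use of __builtin_popcount here is an assumption, as are the function names count_newlines, count_newlines_avx2 and count_newlines_scalar. It assumes GCC or Clang; on non-x86 targets only the scalar path is compiled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_HAVE_X86 1
#endif

/* Portable fallback: count '\n' bytes one at a time. */
static size_t count_newlines_scalar(const char *buf, size_t len)
{
    size_t lines = 0;
    for (size_t i = 0; i < len; i++)
        lines += (buf[i] == '\n');
    return lines;
}

#ifdef SKETCH_HAVE_X86
/* AVX2 path (hypothetical helper): compare 32 bytes against '\n',
   collapse the comparison result to a 32-bit mask with
   _mm256_movemask_epi8, then count the set bits.
   __builtin_popcount is an assumption; the original message is truncated. */
__attribute__((target("avx2")))
static size_t count_newlines_avx2(const char *buf, size_t len)
{
    const __m256i newline = _mm256_set1_epi8('\n');
    size_t lines = 0;
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
        __m256i eq = _mm256_cmpeq_epi8(chunk, newline);
        uint32_t mask = (uint32_t)_mm256_movemask_epi8(eq);
        lines += (size_t)__builtin_popcount(mask);
    }

    /* Fewer than 32 bytes remain; finish with the scalar loop. */
    return lines + count_newlines_scalar(buf + i, len - i);
}
#endif

/* Dispatch at run time so the code also works on CPUs without AVX2. */
size_t count_newlines(const char *buf, size_t len)
{
#ifdef SKETCH_HAVE_X86
    if (__builtin_cpu_supports("avx2"))
        return count_newlines_avx2(buf, len);
#endif
    return count_newlines_scalar(buf, len);
}
```

Each 32-byte block costs one compare, one movemask and one popcount, with no per-lane accumulator to flush or reduce. Whether that is where the reported ~10% comes from is not stated in the snippet.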

[PATCH] wc: speed-up by simplifying avx code

2024-03-30 Thread Evgeny Nizhibitsky
Here is the proposed patch, which both simplifies the AVX version of wc -l and consistently speeds it up by 10% on inputs of up to 1 billion rows on a 7800X3D (it should probably be tested on other data samples and CPUs). --- src/wc_avx2.c | 56 --- 1 fil

Re: [PATCH] wc: speed-up by simplifying avx code

2024-03-31 Thread Evgeny Nizhibitsky
On Sun, 31 Mar 2024 at 18:17, Pádraig Brady wrote:
> On 31/03/2024 13:12, Pádraig Brady wrote:
> > On 31/03/2024 00:18, Evgeny Nizhibitsky wrote:
> >> Here is the proposed patch for both simplifying and consistently speeding up the avx version of wc -l by 10% in up