Dear GNU coreutils maintainers,
While playing with the wc -l AVX code, I seem to have found a way to
both speed it up (~10%) and simplify it (13 insertions, 43 deletions),
at least on the files of several million to 1 billion rows that I
tested on my CPU.
It mostly involves using _mm256_movemask_epi8 and __bui
Here is the proposed patch, which both simplifies the AVX version of
wc -l and consistently speeds it up by 10% on inputs of up to 1 billion
rows on a 7800X3D (it should probably be tested on other data samples
and CPUs as well).
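
For readers following the thread, the movemask idea can be sketched roughly as below. This is an illustration only, not the patch itself. The second name in the first message is cut off, so the use of GCC/Clang's __builtin_popcount is an assumption. The function names (count_newlines, count_newlines_avx2, count_newlines_scalar) and the runtime dispatch through __builtin_cpu_supports are likewise hypothetical. The sketch assumes an x86-64 target and GCC or Clang.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Scalar reference: count '\n' bytes one at a time. */
static size_t count_newlines_scalar(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (buf[i] == '\n');
    return n;
}

/* AVX2 sketch: compare 32 bytes at once, collapse the comparison result
   to a 32-bit mask (one bit per byte), then count the set bits.
   The target attribute lets this compile without -mavx2. */
__attribute__((target("avx2")))
static size_t count_newlines_avx2(const char *buf, size_t len)
{
    const __m256i nl = _mm256_set1_epi8('\n');
    size_t n = 0;
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
        __m256i eq = _mm256_cmpeq_epi8(chunk, nl);   /* 0xFF where byte == '\n' */
        unsigned int mask = (unsigned int)_mm256_movemask_epi8(eq);
        n += (size_t)__builtin_popcount(mask);       /* assumed popcount step */
    }

    /* Fewer than 32 bytes remain: finish with the scalar loop. */
    return n + count_newlines_scalar(buf + i, len - i);
}

/* Hypothetical entry point: use AVX2 only when the CPU reports it. */
size_t count_newlines(const char *buf, size_t len)
{
    assert(buf != NULL);
    if (__builtin_cpu_supports("avx2"))
        return count_newlines_avx2(buf, len);
    return count_newlines_scalar(buf, len);
}
```

Calling count_newlines(buf, n) on each buffer returned by read() and summing the results would give the line count. Whether this shape matches the speedup reported above would need to be measured separately.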
---
src/wc_avx2.c | 56 ---
1 fil
On Sun, 31 Mar 2024 at 18:17, Pádraig Brady wrote:
> On 31/03/2024 13:12, Pádraig Brady wrote:
> > On 31/03/2024 00:18, Evgeny Nizhibitsky wrote:
> >> Here is the proposed patch for both simplifying and consistently
> >> speeding up the avx version of wc -l by 10% in up