Re: [PATCH 0/2] x86: Optimize memchr() for x86-64

2022-05-28 Thread Andi Kleen
On 5/28/2022 1:12 AM, Yu-Jen Chang wrote: *** BLURB HERE *** This patch series adds an optimized "memchr()" for x86-64 and User-Mode Linux (UML). An assembly implementation exists for x86-32, but x86-64 has no optimized version. We implement word-wise comparison so

Re: [PATCH 1/2] x86/lib: Optimize memchr()

2022-05-28 Thread Tao Zhou
On Sat, May 28, 2022 at 04:12:35PM +0800, Yu-Jen Chang wrote: > The original assembly version of memchr() is implemented with > byte-wise comparison, which does not fully > use the 64-bit registers of x86_64 CPUs. We use word-wise > comparison so that 8 characters can be compared at the sa

[PATCH 2/2] x86/um: Use x86_64-optimized memchr

2022-05-28 Thread Yu-Jen Chang
Add the x86_64-optimized memchr(), which is 4x faster than the original implementation, to UML. Signed-off-by: Yu-Jen Chang Signed-off-by: Ching-Chun (Jim) Huang --- arch/x86/um/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/um/Makefile b/arch/x86/um/Makefile i

[PATCH 1/2] x86/lib: Optimize memchr()

2022-05-28 Thread Yu-Jen Chang
The original assembly version of memchr() is implemented with byte-wise comparison, which does not fully use the 64-bit registers of x86_64 CPUs. We use word-wise comparison so that 8 characters can be compared at the same time on x86_64 CPUs. First we align the input and then use word-wise
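
For readers skimming the thread, the align-then-compare-by-word idea described above can be sketched in portable C roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the function name memchr_wordwise is hypothetical, and the zero-byte test used here is the well-known SWAR "has zero byte" trick, which may differ from what the patch actually does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name for illustration; NOT the function added by the patch. */
void *memchr_wordwise(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char ch = (unsigned char)c;
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    const uint64_t pattern = ones * ch;  /* target byte copied into all 8 lanes */

    /* Step 1: compare byte by byte until p is 8-byte aligned. */
    while (n && ((uintptr_t)p & 7)) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    /* Step 2: compare 8 bytes per iteration. */
    while (n >= 8) {
        uint64_t w;

        memcpy(&w, p, sizeof(w));        /* aligned 8-byte load */
        w ^= pattern;                    /* bytes equal to ch become 0x00 */
        if ((w - ones) & ~w & highs)     /* nonzero iff some byte of w is 0x00 */
            break;                       /* match is somewhere in this word */
        p += 8;
        n -= 8;
    }

    /* Step 3: locate the exact byte in the matching word, or scan the tail. */
    while (n) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }
    return NULL;
}
```

The sketch falls back to a byte loop to pinpoint the match inside the word, which keeps it independent of endianness; a tuned version would more likely derive the offset from the mask directly.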

[PATCH 0/2] x86: Optimize memchr() for x86-64

2022-05-28 Thread Yu-Jen Chang
*** BLURB HERE *** This patch series adds an optimized "memchr()" for x86-64 and User-Mode Linux (UML). An assembly implementation exists for x86-32, but x86-64 has no optimized version. We implement word-wise comparison so that 8 characters can be compared at the sam