Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-25 Thread Nathan Bossart
On Wed, Jan 22, 2025 at 10:58:09AM, chiranmoy.bhattacha...@fujitsu.com wrote: >> The functions that test the length before potentially calling a function >> pointer should probably be inlined (see pg_popcount() in pg_bitutils.h). >> I wouldn't be surprised if some compilers are inlining this
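The pattern referred to here keeps the short-input length test in a `static inline` function, so only longer inputs pay for the indirect call through a function pointer. A minimal sketch, assuming hypothetical names (`hex_encode_dispatch`, `hex_encode_optimized`, `HEX_SIMD_THRESHOLD`) and an illustrative threshold; it is not the code in pg_bitutils.h or in the patch:

```c
#include <stddef.h>

/* Hypothetical cutoff; the real break-even point would have to be measured. */
#define HEX_SIMD_THRESHOLD 16

static const char hexdigits[] = "0123456789abcdef";

/* Portable scalar encoder used for short inputs (and as the default target). */
static size_t
hex_encode_scalar(const char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char b = (unsigned char) src[i];

        dst[2 * i] = hexdigits[b >> 4];
        dst[2 * i + 1] = hexdigits[b & 0x0F];
    }
    return 2 * len;
}

/*
 * Function pointer that startup code would aim at an optimized (e.g. SVE)
 * implementation; in this sketch it simply points at the scalar code.
 */
static size_t (*hex_encode_optimized) (const char *src, size_t len, char *dst) = hex_encode_scalar;

/*
 * Because this wrapper is inline, the length test is compiled into each
 * caller and short inputs never go through the indirect call.
 */
static inline size_t
hex_encode_dispatch(const char *src, size_t len, char *dst)
{
    if (len < HEX_SIMD_THRESHOLD)
        return hex_encode_scalar(src, len, dst);
    return hex_encode_optimized(src, len, dst);
}
```

If the wrapper were an ordinary out-of-line function, every call would pay for a direct call plus the indirect one, which is the cost inlining avoids.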

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-25 Thread Nathan Bossart
On Wed, Jan 22, 2025 at 11:10:10AM, chiranmoy.bhattacha...@fujitsu.com wrote: > I realized I didn't attach the patch. Thanks. Would you mind creating a commitfest entry for this one? -- nathan

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-22 Thread chiranmoy.bhattacha...@fujitsu.com
I realized I didn't attach the patch. v2-0001-SVE-support-for-hex-encode-and-hex-decode.patch

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-22 Thread chiranmoy.bhattacha...@fujitsu.com
> The approach looks generally reasonable to me, but IMHO the code needs much more commentary to explain how it works. Added comments to explain the SVE implementation. > I would be interested to see how your bytea test compares with the improvements added in commit e24d770 and with sending the

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-17 Thread Nathan Bossart
With commit e24d770 in place, I took a closer look at hex_decode(), and I concluded that doing anything better without intrinsics would likely require either a huge lookup table or something with complexity rivalling the intrinsics approach (while also not rivalling its performance). So, I took a
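For comparison with the intrinsics approach, a plain table-driven decoder looks roughly like the following. It is a sketch with hypothetical names (`hexlookup`, `hex_decode_scalar`), not PostgreSQL's hex_decode(): whitespace handling and PostgreSQL's error reporting are omitted, and invalid input simply returns -1.

```c
#include <stddef.h>
#include <stdint.h>

/* Maps an ASCII character to its nibble value, or -1 if it is not a hex digit. */
static int8_t hexlookup[256];

static void
init_hexlookup(void)
{
    for (int c = 0; c < 256; c++)
        hexlookup[c] = -1;
    for (int d = 0; d < 10; d++)
        hexlookup['0' + d] = (int8_t) d;
    for (int d = 0; d < 6; d++)
    {
        hexlookup['a' + d] = (int8_t) (10 + d);
        hexlookup['A' + d] = (int8_t) (10 + d);
    }
}

/*
 * Decode len hex digits from src into dst.  Returns the number of bytes
 * written, or -1 on an invalid digit or an odd number of digits.
 */
static long
hex_decode_scalar(const char *src, size_t len, unsigned char *dst)
{
    if (len % 2 != 0)
        return -1;
    for (size_t i = 0; i < len; i += 2)
    {
        int hi = hexlookup[(unsigned char) src[i]];
        int lo = hexlookup[(unsigned char) src[i + 1]];

        if (hi < 0 || lo < 0)
            return -1;
        dst[i / 2] = (unsigned char) ((hi << 4) | lo);
    }
    return (long) (len / 2);
}
```

Each output byte needs two dependent table loads plus a validity check, which is the per-byte work a vector implementation can do for many bytes at once.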

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-15 Thread Tom Lane
David Rowley writes: > I agree that the evidence you (John) gathered is enough reason to use > memcpy(). Okay ... doesn't quite match my intuition, but intuition is a poor guide to such things. regards, tom lane

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-15 Thread David Rowley
On Wed, 15 Jan 2025 at 23:57, John Naylor wrote: > > On Wed, Jan 15, 2025 at 2:14 PM Tom Lane wrote: > > Compilers that inline memcpy() may arrive at the same machine code, > > but why rely on the compiler to make that optimization? If the > > compiler fails to do so, an out-of-line memcpy() cal

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-15 Thread Ranier Vilela
Hi. On Wed, Jan 15, 2025 at 07:57, John Naylor wrote: > On Wed, Jan 15, 2025 at 2:14 PM Tom Lane wrote: > > > Couple of thoughts: > > > > 1. I was actually hoping for a comment on the constant's definition, > > perhaps along the lines of > > > > /* > > * The hex expansion of each pos

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-15 Thread John Naylor
On Wed, Jan 15, 2025 at 2:14 PM Tom Lane wrote: > Couple of thoughts: > > 1. I was actually hoping for a comment on the constant's definition, > perhaps along the lines of > > /* > * The hex expansion of each possible byte value (two chars per value). > */ Works for me. With that, did you mean
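A sketch of the byte-lookup approach under discussion: a 512-character constant carrying the suggested comment, indexed once per input byte, with the two-character pair copied by a fixed-size memcpy(). The names `hextbl` and `hex_encode_bytewise` are assumptions for illustration; this is not the committed code.

```c
#include <stddef.h>
#include <string.h>

/*
 * The hex expansion of each possible byte value (two chars per value).
 */
static const char hextbl[] =
"000102030405060708090a0b0c0d0e0f"
"101112131415161718191a1b1c1d1e1f"
"202122232425262728292a2b2c2d2e2f"
"303132333435363738393a3b3c3d3e3f"
"404142434445464748494a4b4c4d4e4f"
"505152535455565758595a5b5c5d5e5f"
"606162636465666768696a6b6c6d6e6f"
"707172737475767778797a7b7c7d7e7f"
"808182838485868788898a8b8c8d8e8f"
"909192939495969798999a9b9c9d9e9f"
"a0a1a2a3a4a5a6a7a8a9aaabacadaeaf"
"b0b1b2b3b4b5b6b7b8b9babbbcbdbebf"
"c0c1c2c3c4c5c6c7c8c9cacbcccdcecf"
"d0d1d2d3d4d5d6d7d8d9dadbdcdddedf"
"e0e1e2e3e4e5e6e7e8e9eaebecedeeef"
"f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff";

static size_t
hex_encode_bytewise(const char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char b = (unsigned char) src[i];

        /* The entry for byte b starts at offset 2 * b; copy both chars. */
        memcpy(dst + 2 * i, &hextbl[2 * b], 2);
    }
    return 2 * len;
}
```

Because the copy length is the constant 2, compilers commonly inline this memcpy() as a single 2-byte store; the concern raised in the thread is that this is not guaranteed on every compiler.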

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-14 Thread Tom Lane
John Naylor writes: > Okay, I added a comment. I also agree with Michael that my quick > one-off was a bit hard to read so I've cleaned it up a bit. I plan to > commit the attached by Friday, along with any bikeshedding that > happens by then. Couple of thoughts: 1. I was actually hoping for a c

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-14 Thread John Naylor
On Tue, Jan 14, 2025 at 11:57 PM Nathan Bossart wrote: > > On Tue, Jan 14, 2025 at 12:59:04AM -0500, Tom Lane wrote: > > John Naylor writes: > >> We can do about as well simply by changing the nibble lookup to a byte > >> lookup, which works on every compiler and architecture: > > Nice. I tried

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-14 Thread Nathan Bossart
On Tue, Jan 14, 2025 at 12:59:04AM -0500, Tom Lane wrote: > John Naylor writes: >> We can do about as well simply by changing the nibble lookup to a byte >> lookup, which works on every compiler and architecture: Nice. I tried enabling auto-vectorization and loop unrolling on top of this patch,

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-13 Thread Tom Lane
John Naylor writes: > We can do about as well simply by changing the nibble lookup to a byte > lookup, which works on every compiler and architecture: I didn't attempt to verify your patch, but I do prefer addressing this issue in a machine-independent fashion. I also like the brevity of the pat

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-13 Thread Michael Paquier
On Tue, Jan 14, 2025 at 12:27:30PM +0700, John Naylor wrote: > We can do about as well simply by changing the nibble lookup to a byte > lookup, which works on every compiler and architecture: > > select hex_encode_test(100, 1024); > master: > Time: 1158.700 ms > v2: > Time: 777.443 ms > > If

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-13 Thread John Naylor
On Sat, Jan 11, 2025 at 3:46 AM Nathan Bossart wrote: > > I was able to get auto-vectorization to take effect on Apple clang 16 with > the following addition to src/backend/utils/adt/Makefile: > > encode.o: CFLAGS += ${CFLAGS_VECTORIZE} -mllvm -force-vector-width=8 > > This gave the follow
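Whether a loop auto-vectorizes depends heavily on its shape. As a hedged illustration (not the code in encode.c), the loop below avoids data-dependent table loads by computing each digit arithmetically and marks the buffers `restrict`, which is the kind of straight-line loop vectorizers tend to handle; whether a particular compiler actually vectorizes it is not guaranteed. The names `nibble_to_hex` and `hex_encode_arith` are hypothetical.

```c
#include <stddef.h>

/* Convert a nibble (0..15) to its lowercase ASCII hex digit without a table. */
static inline char
nibble_to_hex(unsigned int n)
{
    /* '0'..'9' for 0..9; for 10..15 add the gap 'a' - '0' - 10 (= 39). */
    return (char) (n + '0' + (n > 9) * ('a' - '0' - 10));
}

static size_t
hex_encode_arith(const unsigned char *restrict src, size_t len, char *restrict dst)
{
    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = nibble_to_hex(src[i] >> 4);
        dst[2 * i + 1] = nibble_to_hex(src[i] & 0x0F);
    }
    return 2 * len;
}
```

With clang, adding `-Rpass=loop-vectorize` to the compile flags reports which loops were vectorized, which helps distinguish "vectorized but no faster" from "not vectorized at all".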

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-13 Thread Nathan Bossart
On Mon, Jan 13, 2025 at 03:48:49PM, chiranmoy.bhattacha...@fujitsu.com wrote: > There is a 30% improvement using auto-vectorization. It might be worth enabling auto-vectorization independently of any patches that use intrinsics, then. > Currently, it is assumed that all aarch64 machine sup
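Not every aarch64 machine implements SVE, so an SVE path generally needs a runtime check. A minimal sketch, not taken from the patch: on Linux the kernel exposes CPU features through `getauxval(AT_HWCAP)`, and glibc's `<sys/auxv.h>` defines `HWCAP_SVE` on aarch64. The function pointer starts at a "choose" function that resolves the implementation on first call. All function names here are hypothetical, and the SVE entry is only a placeholder that forwards to scalar code.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>           /* getauxval(), AT_HWCAP; HWCAP_SVE on glibc */
#endif

typedef size_t (*hex_encode_fn) (const char *src, size_t len, char *dst);

/* True if the kernel reports SVE for this CPU; false on other platforms. */
static bool
cpu_has_sve(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

static size_t
hex_encode_scalar(const char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++)
    {
        unsigned char b = (unsigned char) src[i];

        dst[2 * i] = digits[b >> 4];
        dst[2 * i + 1] = digits[b & 0x0F];
    }
    return 2 * len;
}

/* Hypothetical placeholder: a real build would put SVE code here. */
static size_t
hex_encode_sve_placeholder(const char *src, size_t len, char *dst)
{
    return hex_encode_scalar(src, len, dst);
}

static size_t hex_encode_choose(const char *src, size_t len, char *dst);

/* Starts at the chooser; the first call replaces it with a concrete version. */
static hex_encode_fn hex_encode_ptr = hex_encode_choose;

static size_t
hex_encode_choose(const char *src, size_t len, char *dst)
{
    hex_encode_ptr = cpu_has_sve() ? hex_encode_sve_placeholder : hex_encode_scalar;
    return hex_encode_ptr(src, len, dst);
}
```

A real build would also need a configure-time check that the compiler can emit SVE code at all, since the runtime check only answers whether the CPU supports it.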

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-13 Thread chiranmoy.bhattacha...@fujitsu.com
On Fri, Jan 10, 2025 at 09:38:14AM -0600, Nathan Bossart wrote: > Do you mean that the auto-vectorization worked and you observed no > performance improvement, or the auto-vectorization had no effect on the > code generated? Auto-vectorization is working now with the following addition on Graviton

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-10 Thread Nathan Bossart
On Fri, Jan 10, 2025 at 09:38:14AM -0600, Nathan Bossart wrote: > On Fri, Jan 10, 2025 at 11:10:03AM, chiranmoy.bhattacha...@fujitsu.com > wrote: >> We tried auto-vectorization and observed no performance improvement. > > Do you mean that the auto-vectorization worked and you observed no >

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-10 Thread Nathan Bossart
On Fri, Jan 10, 2025 at 11:10:03AM, chiranmoy.bhattacha...@fujitsu.com wrote: > We tried auto-vectorization and observed no performance improvement. Do you mean that the auto-vectorization worked and you observed no performance improvement, or the auto-vectorization had no effect on the cod

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-10 Thread chiranmoy.bhattacha...@fujitsu.com
Hello Nathan, We tried auto-vectorization and observed no performance improvement. The instructions in src/include/port/simd.h are based on older SIMD architectures like NEON, whereas the patch uses the newer SVE, so some of the instructions used in the patch may not have direct equivalents in N

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-09 Thread Nathan Bossart
On Thu, Jan 09, 2025 at 11:22:05AM, devanga.susmi...@fujitsu.com wrote: > This email aims to discuss the contribution of optimized hex_encode and > hex_decode functions for ARM (aarch64) machines. These functions are > widely used for encoding and decoding binary data in the bytea data type.
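The patch's own code does not appear in this listing. As an independent, hedged illustration of how SVE intrinsics from `<arm_sve.h>` can express hex encoding: a `svwhilelt` predicate covers the tail without a scalar remainder loop, `svtbl` maps nibbles to digits, and `svst2` interleaves high and low digits on store. It assumes a compiler targeting SVE (for example `-march=armv8-a+sve`, which defines `__ARM_FEATURE_SVE`) and falls back to scalar code otherwise; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

static const uint8_t hexdigits[16] = {
    '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
};

static size_t
hex_encode_sve_sketch(const uint8_t *src, size_t len, char *dst)
{
#if defined(__ARM_FEATURE_SVE)
    /* Load the 16 digits into the low lanes of a vector for svtbl lookups. */
    svuint8_t lut = svld1_u8(svwhilelt_b8_u64(0, 16), hexdigits);

    for (uint64_t i = 0; i < len; i += svcntb())
    {
        /* Predicate covering only the remaining input bytes. */
        svbool_t pg = svwhilelt_b8_u64(i, len);
        svuint8_t bytes = svld1_u8(pg, src + i);
        svuint8_t hi = svtbl_u8(lut, svlsr_n_u8_x(pg, bytes, 4));
        svuint8_t lo = svtbl_u8(lut, svand_n_u8_x(pg, bytes, 0x0F));

        /* svst2 interleaves the two vectors: hi[0], lo[0], hi[1], lo[1], ... */
        svst2_u8(pg, (uint8_t *) dst + 2 * i, svcreate2_u8(hi, lo));
    }
#else
    /* Scalar fallback so the sketch also builds without SVE. */
    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = (char) hexdigits[src[i] >> 4];
        dst[2 * i + 1] = (char) hexdigits[src[i] & 0x0F];
    }
#endif
    return 2 * len;
}
```

Because SVE is vector-length agnostic, the same loop processes `svcntb()` bytes per iteration on whatever vector width the hardware provides.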