RE: CRC32C Parallel Computation Optimization on ARM

2023-12-03 Thread Xiang Gao
On Date: Thu, 30 Nov 2023 14:54:26PM -0600, Nathan Bossart wrote: >>pg_crc32c_armv8.o: CFLAGS += ${CFLAGS_CRC} ${CFLAGS_CRYPTO} >> >> It does not work correctly. CFLAGS ='-march=armv8-a+crc, >> -march=armv8-a+crypto', what actually works is '-march=armv8-a+crypto'. >> >> We set a new variable CLAGS

RE: Question about the Implementation of vector32_is_highbit_set on ARM

2023-11-23 Thread Xiang Gao
On Date: Mon, 20 Nov 2023 16:05:43PM +0700, John Naylor wrote: >On Wed, Nov 8, 2023 at 2:44 PM Xiang Gao wrote: >> * function. We could instead adopt the behavior of Arm's vmaxvq_u32(), i.e. >> * check each 32-bit element, but that would require an additional

RE: CRC32C Parallel Computation Optimization on ARM

2023-11-23 Thread Xiang Gao
On Date: Wed, 22 Nov 2023 15:06:18PM -0600, Nathan Bossart wrote: >> On Date: Fri, 10 Nov 2023 10:36:08AM -0600, Nathan Bossart wrote: >>>+__attribute__((target("+crc+crypto"))) >>> >>>I'm not sure we can assume that all compilers will understand this, and I'm >>>not sure we need it. >> >> CFLAGS_

RE: CRC32C Parallel Computation Optimization on ARM

2023-11-22 Thread Xiang Gao
On Date: Fri, 10 Nov 2023 10:36:08AM -0600, Nathan Bossart wrote: >-# all versions of pg_crc32c_armv8.o need CFLAGS_CRC >-pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC) >-pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC) >-pg_crc32c_armv8_srv.o: CFLAGS+=$(CFLAGS_CRC) > >Why are these lines deleted? > >- [

Question about the Implementation of vector32_is_highbit_set on ARM

2023-11-07 Thread Xiang Gao
Hi all, I have some questions about the implementation of vector32_is_highbit_set on arm. Below is the comment and the implementation for this function. /* * Exactly like vector8_is_highbit_set except for the input type, so it * looks at each byte separately. * * XXX x86 uses the same underly

RE: CRC32C Parallel Computation Optimization on ARM

2023-11-07 Thread Xiang Gao
On Mon, 6 Nov 2023 13:16:13PM -0600, Nathan Bossart wrote: >>> The idea is that we don't want to start forcing runtime checks on builds >>>where we aren't already doing runtime checks. IOW if the compiler can use >>>the ARMv8 CRC instructions with the default compiler flags, we should only >>>use

RE: CRC32C Parallel Computation Optimization on ARM

2023-11-03 Thread Xiang Gao
On Date: Thu, 2 Nov 2023 09:35:50AM -0500, Nathan Bossart wrote: >On Thu, Nov 02, 2023 at 06:17:20AM +0000, Xiang Gao wrote: >> After reading the discussion, I understand that in order to avoid performance >> regression in some instances, we need to try our best to avoid runtime

RE: CRC32C Parallel Computation Optimization on ARM

2023-11-01 Thread Xiang Gao
On Tue, 31 Oct 2023 15:48:21PM -0500, Nathan Bossart wrote: >> Thanks. I went ahead and split this prerequisite part out to a separate >> thread [0] since it's sort-of unrelated to your proposal here. It's not >> really a prerequisite, but I do think it will simplify things a bit. >Per the other

RE: CRC32C Parallel Computation Optimization on ARM

2023-10-27 Thread Xiang Gao
On Thu, 26 Oct, 2023 11:37:52AM -0500, Nathan Bossart wrote: >> We consider that a runtime check needs to be done in any scenario. >> Here we only confirm that the compilation can be successful. > >A runtime check will be done when choosing which algorithm. > >You can think of us as merging USE_ARM

RE: CRC32C Parallel Computation Optimization on ARM

2023-10-26 Thread Xiang Gao
On Tue, 24 Oct, 2023 20:45:39PM -0500, Nathan Bossart wrote: >I tried this. pg_waldump on 2 million ~8kB records took around 8.1 seconds >without the patch and around 7.4 seconds with it (an 8% improvement). >pg_waldump on 1 million ~16kB records took around 3.2 seconds without the >patch and a

RE: CRC32C Parallel Computation Optimization on ARM

2023-10-26 Thread Xiang Gao
On Wed, 25 Oct, 2023 at 10:43:25 -0500, Nathan Bossart wrote: >+pg_crc32c >+pg_comp_crc32c_with_vmull_armv8(pg_crc32c crc, const void *data, size_t len) >It looks like most of this function is duplicated from >pg_comp_crc32c_armv8(). I understand that we probably need a separate >function becau

RE: CRC32C Parallel Computation Optimization on ARM

2023-10-24 Thread Xiang Gao
Thanks for your suggestion, this is the modified patch and two test files. -Original Message- From: Michael Paquier Sent: Friday, October 20, 2023 4:19 PM To: Xiang Gao Cc: pgsql-hackers@lists.postgresql.org Subject: Re: CRC32C Parallel Computation Optimization on ARM On Fri, Oct 20

CRC32C Parallel Computation Optimization on ARM

2023-10-20 Thread Xiang Gao
Hi all, This patch uses a parallel computing optimization algorithm to improve crc32c computing performance on ARM. The algorithm comes from the Intel whitepaper: crc-iscsi-polynomial-crc32-instruction-paper. Input data is divided into three equal-sized blocks. Three parallel blocks (crc0, crc1, crc2
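
For readers skimming this index, the three-block idea described in the message above can be sketched in portable C. This is only an illustration, not the patch itself. The function names (crc32c_raw, crc32c_shift, crc32c_three_way_sketch) are hypothetical. The three block CRCs are computed one after another here, whereas an optimized version would interleave hardware CRC instructions. The recombination step feeds zero bytes through the register, whereas an optimized version would presumably use a carry-less multiply by precomputed constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Raw CRC-32C register update (reflected polynomial 0x82F63B78),
 * bit by bit, with no initial or final XOR applied. */
static uint32_t crc32c_raw(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Advance a CRC register as if n zero bytes had been appended.
 * Assumption: a slow stand-in for the multiply-based shift. */
static uint32_t crc32c_shift(uint32_t crc, size_t n)
{
    while (n--)
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Hypothetical sketch: split the input into three equal-sized blocks,
 * compute crc0, crc1, crc2 independently, then recombine them. */
static uint32_t crc32c_three_way_sketch(const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t blk = len / 3;

    assert(data != NULL || len == 0);

    /* Only crc0 carries the initial value; crc1 and crc2 start at 0. */
    uint32_t crc0 = crc32c_raw(0xFFFFFFFFu, p, blk);
    uint32_t crc1 = crc32c_raw(0, p + blk, blk);
    uint32_t crc2 = crc32c_raw(0, p + 2 * blk, blk);

    /* The register update is linear over GF(2), so shifting a block CRC
     * past the following block and XORing gives the CRC of both. */
    uint32_t crc = crc32c_shift(crc32c_shift(crc0, blk) ^ crc1, blk) ^ crc2;

    /* Handle the 0 to 2 leftover bytes, then apply the final XOR. */
    crc = crc32c_raw(crc, p + 3 * blk, len - 3 * blk);
    return crc ^ 0xFFFFFFFFu;
}
```

Calling crc32c_three_way_sketch("123456789", 9) should return the standard CRC-32C check value 0xE3069283, the same as a single pass over the whole buffer.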

Optimize Arm64 crc32 implementation in PostgreSQL

2023-08-21 Thread Xiang Gao
Hi all, Currently PostgreSQL has three different variants of a 32-bit CRC calculation: CRC-32C, CRC-32 (Ethernet polynomial), and a legacy CRC-32 version that uses the lookup table. Some ARMv8 (AArch64) CPUs implement the CRC32 extension which is equivalent to CRC-32 (Ethernet polynomial), so th