(Unfortunately, I'm posting this too late for the November commitfest, but I'm hoping this will be the first in a series of proposed improvements involving SIMD instructions for v17.)
Presently, we ask compilers to autovectorize checksum.c and numeric.c. The page checksum code actually lives in checksum_impl.h, and checksum.c just includes it. But checksum_impl.h is also included by pg_upgrade/file.c and pg_checksums.c, and since we don't ask compilers to autovectorize those files, the page checksum code there may remain un-vectorized.

The attached patch is a quick attempt at adding CFLAGS_UNROLL_LOOPS and CFLAGS_VECTORIZE to the CFLAGS for the aforementioned objects. The gains are modest (i.e., some system CPU and/or a few percentage points on the total time), but it seemed like a no-brainer.

Separately, I'm wondering whether we should consider using CFLAGS_VECTORIZE on the whole tree. Commit fdea253 seems to be responsible for introducing this targeted autovectorization strategy, and AFAICT this was just done to minimize the impact elsewhere while optimizing page checksums [0]. Are there fundamental problems with adding CFLAGS_VECTORIZE everywhere? Or is it just waiting on someone to do the analysis/benchmarking?

[0] https://postgr.es/m/1367013190.11576.249.camel%40sussancws0025

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/bin/pg_checksums/Makefile b/src/bin/pg_checksums/Makefile
index ac736b2260..3f946ee9d6 100644
--- a/src/bin/pg_checksums/Makefile
+++ b/src/bin/pg_checksums/Makefile
@@ -22,6 +22,8 @@ OBJS = \
 	$(WIN32RES) \
 	pg_checksums.o
 
+pg_checksums.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} ${CFLAGS_VECTORIZE}
+
 all: pg_checksums
 
 pg_checksums: $(OBJS) | submake-libpgport
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index bde91e2beb..b344b59da2 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -28,6 +28,8 @@ OBJS = \
 	util.o \
 	version.o
 
+file.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} ${CFLAGS_VECTORIZE}
+
 override CPPFLAGS := -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
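
For anyone wondering why these flags matter for this code in particular, here is a minimal sketch of the kind of loop that benefits. It is not the code in checksum_impl.h: the names (sketch_hash, sketch_mix, SKETCH_N_LANES, and so on), the seed values, and the constants are assumptions chosen for illustration. The point is only the shape: a fixed number of independent partial sums updated in an inner loop with a constant trip count, which is what an autovectorizer can map onto SIMD lanes. The second function computes the same result one lane at a time, which shows that the lanes do not depend on each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and constants for illustration; not PostgreSQL's. */
#define SKETCH_N_LANES 32
#define SKETCH_PRIME 16777619u

/* One mixing step for a single lane. */
static inline uint32_t
sketch_mix(uint32_t sum, uint32_t word)
{
	uint32_t	tmp = sum ^ word;

	return (tmp * SKETCH_PRIME) ^ (tmp >> 17);
}

/* XOR the per-lane sums together into one value. */
static uint32_t
sketch_fold(const uint32_t *sums)
{
	uint32_t	result = 0;

	for (size_t j = 0; j < SKETCH_N_LANES; j++)
		result ^= sums[j];
	return result;
}

/*
 * Row-major version: each pass of the inner loop updates all lanes with
 * consecutive input words.  The inner loop has a constant trip count and
 * no dependency between lanes, so a compiler may vectorize it.
 */
uint32_t
sketch_hash(const uint32_t *words, size_t nwords)
{
	uint32_t	sums[SKETCH_N_LANES];

	assert(nwords % SKETCH_N_LANES == 0);

	for (size_t j = 0; j < SKETCH_N_LANES; j++)
		sums[j] = (uint32_t) j + 1;		/* arbitrary per-lane seeds */

	for (size_t i = 0; i < nwords; i += SKETCH_N_LANES)
		for (size_t j = 0; j < SKETCH_N_LANES; j++)
			sums[j] = sketch_mix(sums[j], words[i + j]);

	return sketch_fold(sums);
}

/*
 * Lane-major reference: processes one lane to completion before starting
 * the next.  It visits the same words in the same per-lane order, so it
 * must return the same value as sketch_hash().
 */
uint32_t
sketch_hash_lane_major(const uint32_t *words, size_t nwords)
{
	uint32_t	sums[SKETCH_N_LANES];

	assert(nwords % SKETCH_N_LANES == 0);

	for (size_t j = 0; j < SKETCH_N_LANES; j++)
	{
		sums[j] = (uint32_t) j + 1;
		for (size_t i = j; i < nwords; i += SKETCH_N_LANES)
			sums[j] = sketch_mix(sums[j], words[i]);
	}

	return sketch_fold(sums);
}
```

With GCC, building a file like this with -ftree-vectorize and -funroll-loops (which, as far as I can tell, is what CFLAGS_VECTORIZE and CFLAGS_UNROLL_LOOPS expand to when the compiler supports them) and adding -fopt-info-vec should report whether the inner loop was vectorized. Whether the default optimization level already does this depends on the compiler and version, which is why the same header can end up vectorized in one object and not in another.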