On Wed, Nov 22, 2023 at 02:54:13PM +0200, Ants Aasma wrote:
> On Wed, 22 Nov 2023 at 11:44, John Naylor <johncnaylo...@gmail.com> wrote:
>> Poking in those files a bit, I also see references to building with
>> SSE 4.1. Maybe that's an avenue that we should pursue? (an indirect
>> function call is surely worth it for page-sized data)

Yes, I think we should, but we also need to be careful not to hurt
performance on platforms that aren't able to benefit [0] [1].

There are a couple of other threads about adding support for newer
instructions [2] [3], and properly detecting the availability of these
instructions seems to be a common obstacle.  We have a path forward for
stuff that's already using a runtime check (e.g., CRC32C), but I think
we're still trying to figure out what to do for things that must be inlined
(e.g., simd.h).

One half-formed idea I have is to introduce some sort of ./configure flag
that enables all the newer instructions that your CPU understands.  It
would also remove any existing runtime checks.  This option would make it
easy to take advantage of the newer instructions if you are building
Postgres for only your machine (or others just like it).

> For reference, executing the page checksum 10M times on a AMD 3900X CPU:
>
> clang-14 -O2                 4.292s (17.8 GiB/s)
> clang-14 -O2 -msse4.1        2.859s (26.7 GiB/s)
> clang-14 -O2 -msse4.1 -mavx2 1.378s (55.4 GiB/s)

Nice.  I've noticed similar improvements with AVX2 intrinsics in simd.h.

[0] https://postgr.es/m/2613682.1698779776%40sss.pgh.pa.us
[1] https://postgr.es/m/36329.1699325578%40sss.pgh.pa.us
[2] https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
[3] https://postgr.es/m/db9pr08mb6991329a73923bf8ed4b3422f5...@db9pr08mb6991.eurprd08.prod.outlook.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
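
To make the runtime-check approach mentioned above a bit more concrete, here
is a rough sketch of dispatching through a function pointer that is resolved
on first use.  Everything in it is made up for illustration: the byte-sum
"checksum" is a toy stand-in (not the real page checksum), the names
checksum_portable, checksum_avx2, checksum_choose, and HAVE_AVX2_DISPATCH are
hypothetical, and it assumes GCC or clang on x86-64 for the
__builtin_cpu_supports() check and the target("avx2") attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy checksum; NOT the real Postgres page checksum. */
typedef uint32_t (*checksum_fn) (const uint8_t *data, size_t len);

/* Baseline version, built with whatever flags the whole file uses. */
static uint32_t
checksum_portable(const uint8_t *data, size_t len)
{
	uint32_t	sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += data[i];
	return sum;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_AVX2_DISPATCH 1

/*
 * Same loop, but the target attribute lets the compiler use AVX2
 * instructions in this one function without passing -mavx2 globally.
 */
__attribute__((target("avx2")))
static uint32_t
checksum_avx2(const uint8_t *data, size_t len)
{
	uint32_t	sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += data[i];
	return sum;
}
#endif

static uint32_t checksum_choose(const uint8_t *data, size_t len);

/* Callers go through this pointer; it starts out aimed at the chooser. */
static checksum_fn checksum = checksum_choose;

/*
 * First call lands here: pick an implementation once, repoint the
 * function pointer, then forward the call to the chosen version.
 */
static uint32_t
checksum_choose(const uint8_t *data, size_t len)
{
	checksum = checksum_portable;
#ifdef HAVE_AVX2_DISPATCH
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		checksum = checksum_avx2;
#endif
	return checksum(data, len);
}
```

Callers would simply write checksum(page, len).  The cost is one indirect
call per invocation, which is easy to amortize over page-sized input but is
exactly what does not work for code that must be inlined.  A build-time flag
along the lines of the ./configure idea would amount to replacing the pointer
with a direct call to one fixed version.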