On Sat, May 31, 2025 at 2:41 AM Eduard Stefes <eduard.ste...@ibm.com> wrote: > > On Thu, 2025-05-08 at 05:23 +0700, John Naylor wrote: > > > This case is a bit different, since Arm can compute hardware CRC on > > any input size. The fast path here is only guaranteed to be taken at > > inputs of 79 bytes or bigger, with the 64-byte main loop and 16-byte > > alignment. For larger inputs, an indirect call shouldn't matter, so > > to > > keep it simple I'd recommend having only the runtime check. And for > > smaller inputs call sb8 directly -- this will avoid the indirect call > > which in turn just jumps to sb8. This is important because the server > > computes CRC on a 20-byte input for the WAL record header while > > holding a lock. > > I worked on the code and got it working on 16 byte chunks so the WAL > writes will directly benefit from this. I will try to calculate and add > the polynomials for the smaller chunks. Maybe we where wrong and we > will still see a speed improvement, also for the smaller pieces. > However that is something that I cannot tackle immediately. I'll come > back to this in the summer.
The measurements from your previous message used 16-byte chunks -- did that have the necessary constants to get the correct answer? I'm not sure what you're in doubt about, but I guess it will be clear with your next patch. > > Something like: > > > > #define COMP_CRC32C(crc, data, len) \ > > ((crc) = pg_comp_crc32c_dispatch((crc), (data), (len))) > > > > static inline > > pg_crc32c > > pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len) > > { > > /* > > if (len < VX_MIN_LEN + VX_ALIGN_MASK) > > { > > /* > > * For inputs small enough that we may not take the vector path, > > * just call sb8 directly. > > */ > > return pg_comp_crc32c_sb8(crc, data, len);; > > } > > else > > /* Otherwise, use a runtime check for S390_VX instructions. */ > > return pg_comp_crc32c(crc, data, len); > > } > > Is static inline generally preferred here or should I do something > like: > > #define COMP_CRC32C(crc, data, len) \ > ((crc) = (len) < MAX_S390X_CRC ? \ > pg_comp_crc32c_sb8((crc),(data),(len)) : \ > pg_comp_crc32c((crc), (data), (len))) That macro is actually better -- the static inline was used for x86 because it has looping which is not the case for S390X. > > Does the build still fall back to sb8 with a warning, or does it > > fail? > > It'd simpler if the guard against certain clang versions went into > > the > > choose function, so it can fall back to sb8. > > My current implementation will silently fall back to sb8 and not > compile any of the s390x code. That seems like a good design. > Other possible strategies are: > > - print a warning and fall back to sb0 > - fail the compile and inform the user to switch compiler or to disable > crc32vx in the build system > - move all the checks into the .c file and to this with with macros > > indeed for zlib-ng we went the way to put all the checks into the code > and fail the build if compiled with a known broken compiler. > > My arguments to not do the same in postgres are: > > - postgres uses sb8 on s390x already so by NOT changing it to another > algorithm we do not change the expected behavior. > - imho, this is something that the build system should check and take > care of, not the code that's getting compiled. I'm inclined to agree, but I wonder if these guards should go inside the link check. I.e. the separate config machinery for "have bad compiler" seems like unnecessary clutter. -- John Naylor Amazon Web Services