On Sat, May 31, 2025 at 2:41 AM Eduard Stefes <eduard.ste...@ibm.com> wrote:
>
> On Thu, 2025-05-08 at 05:23 +0700, John Naylor wrote:
>
> > This case is a bit different, since Arm can compute hardware CRC on
> > any input size. The fast path here is only guaranteed to be taken at
> > inputs of 79 bytes or bigger, with the 64-byte main loop and 16-byte
> > alignment. For larger inputs, an indirect call shouldn't matter, so
> > to
> > keep it simple I'd recommend having only the runtime check.  And for
> > smaller inputs call sb8 directly -- this will avoid the indirect call
> > which in turn just jumps to sb8. This is important because the server
> > computes CRC on a 20-byte input for the WAL record header while
> > holding a lock.
>
> I worked on the code and got it working on 16 byte chunks so the WAL
> writes will directly benefit from this. I will try to calculate and add
> the polynomials for the smaller chunks. Maybe we where wrong and we
> will still see a speed improvement, also for the smaller pieces.
> However that is something that I cannot tackle immediately. I'll come
> back to this in the summer.

The measurements from your previous message used 16-byte chunks -- did
that have the necessary constants to get the correct answer?

I'm not sure what you're in doubt about, but I guess it will be clear
with your next patch.

> > Something like:
> >
> > #define COMP_CRC32C(crc, data, len)  \
> >   ((crc) = pg_comp_crc32c_dispatch((crc), (data), (len)))
> >
> > static inline
> > pg_crc32c
> > pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
> > {
> >   /*
> >   if (len < VX_MIN_LEN + VX_ALIGN_MASK)
> >   {
> >     /*
> >      * For inputs small enough that we may not take the vector path,
> >      * just call sb8 directly.
> >      */
> >     return pg_comp_crc32c_sb8(crc, data, len);;
> >   }
> >   else
> >     /* Otherwise, use a runtime check for S390_VX instructions. */
> >     return pg_comp_crc32c(crc, data, len);
> > }
>
> Is static inline generally preferred here or should I do something
> like:
>
> #define COMP_CRC32C(crc, data, len)          \
>   ((crc) = (len) < MAX_S390X_CRC ?           \
>     pg_comp_crc32c_sb8((crc),(data),(len)) : \
>     pg_comp_crc32c((crc), (data), (len)))

That macro is actually better -- the static inline was used for x86
because it has looping which is not the case for S390X.

> > Does the build still fall back to sb8 with a warning, or does it
> > fail?
> > It'd simpler if the guard against certain clang versions went into
> > the
> > choose function, so it can fall back to sb8.
>
> My current implementation will silently fall back to sb8 and not
> compile any of the s390x code.

That seems like a good design.

> Other possible strategies are:
>
> - print a warning and fall back to sb0
> - fail the compile and inform the user to switch compiler or to disable
> crc32vx in the build system
> - move all the checks into the .c file and to this with with macros
>
> indeed for zlib-ng we went the way to put all the checks into the code
> and fail the build if compiled with a known broken compiler.
>
> My arguments to not do the same in postgres are:
>
> - postgres uses sb8 on s390x already so by NOT changing it to another
> algorithm we do not change the expected behavior.
> - imho, this is something that the build system should check and take
> care of, not the code that's getting compiled.

I'm inclined to agree, but I wonder if these guards should go inside
the link check. I.e. the separate config machinery for "have bad
compiler" seems like unnecessary clutter.

--
John Naylor
Amazon Web Services


Reply via email to