On Wed, 16 Aug 2023, Philipp Tomsich wrote:

> > > I fully expect that latency to drop within the next 12-18 months.  In that
> > > world, there's not going to be much benefit to using hand-coded libraries vs
> > > just letting the compiler do it.
> 
> I would also hope that the hand-coded libraries would eventually have
> a code path for compilers that support the built-in.

You seem to be working with the false assumption that the interface of the
proposed builtin matches how high-performance CRC computation is structured.
It does not. State-of-the-art CRC code keeps the intermediate residual
unreduced and split over multiple temporaries, which lets the CPU overlap
CLMULs. The intermediate residuals are reduced only once, when the final
CRC value is needed. In contrast, the proposed builtin has data dependencies
between all adjacent instructions, and cannot allow the CPU to work at
IPC > 1.0.
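
To make the structural difference concrete, here is a minimal portable
sketch, not taken from any real library. All names (mulx, xpow_mod,
clmul32, reduce64, rem_serial, rem_folded) are hypothetical, clmul32 is a
software stand-in for a hardware CLMUL, and the value computed is the plain
remainder M(x) mod P(x) for the CRC-32 polynomial, without the initial
value, bit reflection, final XOR or x^32 augmentation of a standard CRC-32.
rem_serial models a chain where every step depends on the previous one (it
is not the interface of the proposed builtin); rem_folded keeps two
unreduced 64-bit residuals and reduces only at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY 0x04C11DB7u /* low 32 bits of P(x); the x^32 term is implicit */

/* Multiply a reduced 32-bit residue by x, modulo P. */
static uint32_t mulx(uint32_t r)
{
    return (r << 1) ^ ((r & 0x80000000u) ? POLY : 0u);
}

/* x^n mod P, by repeated multiplication (fine for a sketch). */
static uint32_t xpow_mod(unsigned n)
{
    uint32_t r = 1;
    while (n--)
        r = mulx(r);
    return r;
}

/* Carry-less 32x32 -> 64 multiply; stand-in for a hardware CLMUL. */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            acc ^= (uint64_t)a << i;
    return acc;
}

/* Reduce a 64-bit polynomial modulo P, one bit at a time. */
static uint32_t reduce64(uint64_t v)
{
    uint32_t r = 0;
    for (int i = 63; i >= 0; i--)
        r = mulx(r) ^ (uint32_t)((v >> i) & 1u);
    return r;
}

/* Serial form: a single residue, every step depends on the previous one.
 * The message is w[0..n-1], most significant word and bit first. */
static uint32_t rem_serial(const uint64_t *w, size_t n)
{
    uint32_t r = 0;
    for (size_t i = 0; i < n; i++)
        for (int b = 63; b >= 0; b--)
            r = mulx(r) ^ (uint32_t)((w[i] >> b) & 1u);
    return r;
}

/* Folded form: two unreduced residuals a and b with no dependency on
 * each other. Invariant: processed prefix == a*x^64 + b (mod P). */
static uint32_t rem_folded(const uint64_t *w, size_t n)
{
    assert(n >= 2 && n % 2 == 0);
    const uint32_t k_hi = xpow_mod(160); /* x^(128+32) mod P */
    const uint32_t k_lo = xpow_mod(128); /* x^128 mod P */
    uint64_t a = w[0], b = w[1];

    for (size_t i = 2; i + 1 < n; i += 2) {
        /* Shift each lane by 128 bits without reducing it to 32 bits. */
        a = clmul32((uint32_t)(a >> 32), k_hi) ^ clmul32((uint32_t)a, k_lo) ^ w[i];
        b = clmul32((uint32_t)(b >> 32), k_hi) ^ clmul32((uint32_t)b, k_lo) ^ w[i + 1];
    }
    /* Single final reduction: fold lane a onto lane b, then reduce. */
    return reduce64(clmul32(reduce64(a), xpow_mod(64)) ^ b);
}
```

Both functions return the same remainder for any even number of 64-bit
words. In rem_folded the four multiplies per iteration are independent of
each other, so a CPU with pipelined CLMUL can overlap them; real
implementations use wider registers and more lanes, but the shape is the
same.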

Shame how little you apparently understand of the "mindbending math".

Alexander
