On Wed, 16 Aug 2023 at 21:10, Alexander Monakov <amona...@ispras.ru> wrote:
>
> On Tue, 15 Aug 2023, Jeff Law wrote:
>
> > Because if the compiler can optimize it automatically, then the projects have
> > to do literally nothing to take advantage of it. They just compile normally
> > and their bitwise CRC gets optimized down to either a table lookup or a clmul
> > variant. That's the real goal here.
>
> The only high-profile FOSS project that carries a bitwise CRC implementation
> I'm aware of is the 'xz' compression library. There bitwise CRC is used for
> populating the lookup table under './configure --enable-small':
>
> https://github.com/tukaani-project/xz/blob/2b871f4dbffe3801d0da3f89806b5935f758d5f3/src/liblzma/check/crc64_small.c
>
> It's a well-reasoned choice and your compiler would be undoing it
> (reintroducing the table when the bitwise CRC is employed specifically
> to avoid carrying the table).
>
> > One final note. Elsewhere in this thread you described performance concerns.
> > Right now clmuls can be implemented in 4c, fully piped.
>
> Pipelining doesn't matter in the implementation being proposed here, because
> the builtin is expanded to
>
>   li      a4,quotient
>   li      a5,polynomial
>   xor     a0,a1,a0
>   clmul   a0,a0,a4
>   srli    a0,a0,crc_size
>   clmul   a0,a0,a5
>   slli    a0,a0,GET_MODE_BITSIZE (word_mode) - crc_size
>   srli    a0,a0,GET_MODE_BITSIZE (word_mode) - crc_size
>
> making CLMULs data-dependent, so the second can only be started one cycle
> after the first finishes, and consecutive invocations of __builtin_crc
> are likewise data-dependent (with three cycles between CLMUL). So even
> when you get CLMUL down to 3c latency, you'll have two CLMULs and 10 cycles
> per input block, while state of the art is one widening CLMUL per input block
> (one CLMUL per 32-bit block on a 64-bit CPU) limited by throughput, not
> latency.
>
> > I fully expect that latency to drop within the next 12-18 months. In that
> > world, there's not going to be much benefit to using hand-coded libraries vs
> > just letting the compiler do it.

I would also hope that the hand-coded libraries would eventually have a code
path for compilers that support the built-in.

For what it's worth, there now is CRC in Boost:
https://www.boost.org/doc/libs/1_83_0/doc/html/crc.html

Cheers,
philipp.
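Alexander's point about xz's `--enable-small` build is easier to see in code. The following is a sketch of the general pattern, not xz's actual source: a bitwise CRC-64 loop whose only job is to fill a 256-entry table at run time, so the binary does not ship 2 KiB of constants, plus a fully bitwise variant for comparison. It assumes the CRC-64/XZ parameters (reflected polynomial 0xC96C5795D7870F42, initial value and final XOR of all ones); `crc64_init_table`, `crc64_update` and `crc64_bitwise` are hypothetical names. A compiler pass that recognised the inner loop of `crc64_init_table` and replaced it with a precomputed table would reintroduce exactly the data this configuration is meant to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected ECMA-182 polynomial, as used by CRC-64/XZ. */
#define CRC64_POLY UINT64_C(0xC96C5795D7870F42)

static uint64_t crc64_table[256];
static int crc64_table_ready = 0;

/* Populate the table at run time with a bitwise loop instead of storing
   2 KiB (256 x 8 bytes) of precomputed constants in the binary. */
static void crc64_init_table(void)
{
    for (uint64_t i = 0; i < 256; i++) {
        uint64_t crc = i;
        for (int j = 0; j < 8; j++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC64_POLY : crc >> 1;
        crc64_table[i] = crc;
    }
    crc64_table_ready = 1;
}

/* Table-driven update, one byte per step; pass 0 as the initial crc. */
uint64_t crc64_update(const uint8_t *buf, size_t len, uint64_t crc)
{
    assert(buf != NULL || len == 0);
    if (!crc64_table_ready)          /* lazy init; not thread-safe */
        crc64_init_table();
    crc = ~crc;
    for (size_t i = 0; i < len; i++)
        crc = crc64_table[(crc ^ buf[i]) & 0xFF] ^ (crc >> 8);
    return ~crc;
}

/* Fully bitwise variant: no table at all, eight shift/xor steps per byte. */
uint64_t crc64_bitwise(const uint8_t *buf, size_t len, uint64_t crc)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int j = 0; j < 8; j++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC64_POLY : crc >> 1;
    }
    return ~crc;
}
```

Both functions should return the CRC-64/XZ check value 0x995DC9BBDF1939FA for the nine bytes "123456789", and a CRC over concatenated buffers can be chained by feeding the previous result back in, e.g. `crc64_update(b, nb, crc64_update(a, na, 0))`.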