> On Aug 9, 2023, at 2:32 AM, Alexander Monakov <amona...@ispras.ru> wrote:
>
>
> On Tue, 8 Aug 2023, Jeff Law wrote:
>
>> If the compiler can identify a CRC and collapse it down to a table or clmul,
>> that's a major win and such code does exist in the real world. That was the
>> whole point behind the Fedora experiment -- to determine if these things are
>> showing up in the real world or if this is just a benchmarking exercise.
>
> Can you share the results of the experiment and give your estimate of what
> sort of real-world improvement is expected? I already listed the popular
> FOSS projects where CRC performance is important: the Linux kernel and
> a few compression libraries. Those projects do not use a bitwise CRC loop,
> except sometimes for table generation on startup (which needs less time
> than a page fault that may be necessary to bring in a hardcoded table).
>
> For those projects that need a better CRC, why is the chosen solution
> to optimize it in the compiler instead of offering them a library they
> could use with any compiler?
>
> Was there any thought given to embedded projects that use bitwise CRC
> exactly because they have little space to spare for a hardcoded table?
Or those that use smaller tables -- for example, the classic VAX microcode
approach with a 16-entry table, doing CRC 4 bits at a time.
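For reference, that 4-bit-at-a-time scheme can be sketched in C roughly as follows. This is a reflected CRC-32 with the polynomial 0xEDB88320 purely for illustration (the VAX microcode computed a different CRC); the point is only that the lookup table shrinks to 16 entries, at the cost of two lookups per byte:

```c
#include <stdint.h>
#include <stddef.h>

/* 16-entry table for a reflected CRC-32 (poly 0xEDB88320),
 * processing one nibble per lookup instead of one byte. */
static uint32_t crc_tbl[16];

static void crc_init(void)
{
    for (uint32_t i = 0; i < 16; i++) {
        uint32_t c = i;
        for (int k = 0; k < 4; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_tbl[i] = c;
    }
}

static uint32_t crc32_nibble(const uint8_t *p, size_t n)
{
    uint32_t c = 0xFFFFFFFFu;
    while (n--) {
        c = crc_tbl[(c ^ *p) & 0xFu] ^ (c >> 4);         /* low nibble  */
        c = crc_tbl[(c ^ (*p >> 4)) & 0xFu] ^ (c >> 4);  /* high nibble */
        p++;
    }
    return c ^ 0xFFFFFFFFu;
}
```

A 16-entry uint32_t table is 64 bytes versus 1 KiB for the usual 256-entry byte-wise table, which is exactly the trade-off small embedded targets care about.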
I agree that this seems an odd thing to optimize. CRC is a well known CPU hog
with well established efficient solutions, and it's hard to see why anyone who
needs good performance would fail to understand and apply that knowledge.
paul