Fwd: RFA: crc builtin functions & optimizations

Joern Rennecke Tue, 15 Mar 2022 05:52:27 -0700

Oops, that was meant to go to the list too.

On Tue, 15 Mar 2022 at 01:04, Andrew Pinski <pins...@gmail.com> wrote:
>
> On Mon, Mar 14, 2022 at 5:33 PM Joern Rennecke
> <joern.renne...@embecosm.com> wrote:
> >
> > Most microprocessors have efficient ways to perform CRC operations, be
> > that with lookup tables, rotates, or even special instructions.
> > However, because we lack a representation for CRC in the compiler, we
> > can't do proper instruction selection.  With this patch I seek out to
> > rectify this,
> > I've avoided using a mode name for the built-in functions because that
> > would tie the semantics to the size of the addressable unit.  We
> > generally use abbreviations like s/l/ll for type names, which is all
> > right when the type can be widened without changing semantics.  For
> > the data input, however, we also have to consider the shift count that
> > is tied to it.  That is why I used a number to designate the width of
> > the data input and shift.
> >
> > For machine support, I made a start with 8 and 16 bit little-endian
> > CRC for RISCV using a
> > lookup table.  I am sure once we have the basic infrastructure in the
> > tree, we'll get more
> > contributions of suitable named patterns for various ports.
>
>
> A few points.
> There are at least 9 different polynomials for the CRC-8 in common use today.
> For CRC-32 there are 5 different polynomials used.
> You don't have a patch to invoke.texi adding the descriptions of the builtins.

You are correct that the documentation could use some work, but that part
would go into extend.texi .

> How is your polynom 3rd argument described? Is it similar to how it is
> done on the wiki for the CRC?

It's a constant integer.
I haven't found a CRC in https://gcc.gnu.org/wiki .
If you mean wikipedia.org, they focus mainly on big endian CRC.  I've added
a function code IFN_CRC_BE for it because for completeness it should be
there, but haven't fleshed out anything further around that.  IFN_CRC and
its associated built-in functions are little-endian.  If you look at the start
of the confg/riscv/crc.md patch, there is a comment with a simple C
implementation of crchihi4.

> Does it make sense to have to list the most common polynomials in the
> documentation?

Maybe.  You could give advice on what makes cryptographic sense for
people who want to use CRCs in their code for integrity checks.
Or once some ports with special-purpose instructons are supported,
there could be comments on which polynoms will result in faster operation
because of the specialized expansion for the respective targets.

> Also I am sorry but micro-optimizing coremarks is just wrong.

The claim for that benchmark is that it tests a set of common
operations, including CRC calculations.  Without compiler support,
what we test instead is how well this particular implementation of CRC is
compiled for the target CPU, which can be very different from the actual
CRC computation performance.  So recognizing the CRC computation
helps the benchmark archive the stated goal of gauging CRC computation
performance.

Moreover, since the benchmark is commonly used, this also makes
it a commonly used idiom, and the license allows to copy the code
into your own programs to a large extent.

> Maybe it
> is better to pick the CRC32 that is inside zip instead for a testcase
> and benchmarking against?
> Or even the CRC32C for iSCSI/ext4.

I'm not sure what's inside there, but in principle, the more the merrier.
I had a look at the bzip2 CRC computation, but that's just a table
lookup.  We can recognize table lookups that compute a CRC if the
array is a constant, but there is no point if you haven't either a faster
implementation or want further optimization to be enabled.  Going
there was beyond the scope of my work at this time.

In principle, it would be interesting to do reduction / vectorization of
block CRC computations.  But you have to start with having a
representation for the CRC computations first.

> I see you also don't optimize the case where you have three other
> variants of polynomials that are reversed, reciprocal and reversed
> reciocal.

Do you want to contribute that?

> Also a huge problem, you don't check to make sure the third argument
> to the crc builtin function is constant in the rsicv backend.
Why is that a huge problem?  I see it as a further refinement not yet
added.  Strictly speaking, there is a check, but it's an assert, OTOH
it shouldn't be triggered with the infrstructure as it is now because the
optimizer only looks for a computation with a constant polynom, and
the third argument of the builtin crc functions is BT_CONST_SIZE for
now.  Variable polynoms are interesting, but before we introduce them,
we must make sure that constants remain inside the builtin function,
lest we get severe perfromance degradation if table lookups and
special instructions are not available.

> Plus
> since you expose the crc builtins as a non target specific builtin, I
> assume there should be a libcall right and therefore either you need
> to have a generic expansion for it or a function added to libgcc?

I thought we had other builtin functions that are not supported everywhere?
If we must have a default implementation, I suppose a looping one
in libgcc is the simplest one, and it can cover variable polynoms
(even if it's not needed right away).

> Also why not expand generically to the table or a loop based on a
> target hook instead of in the backend? This way a target can choose
> without even doing much.

I thought the optabs interface was already flexible enough.  Yes,
the work of the target author is simpler if you put pre-made functions
for expanding a shft loop / rotate loop / table lookup into the backend.

At any rate, I don't know when I next work on this, so further contributions
are welcome.

Fwd: RFA: crc builtin functions & optimizations

Reply via email to