On 5/28/24 12:44 AM, Richard Biener wrote:
On Mon, May 27, 2024 at 5:16 PM Jeff Law <jeffreya...@gmail.com> wrote:
On 5/27/24 12:38 AM, Richard Biener wrote:
On Fri, May 24, 2024 at 10:44 AM Mariam Arutunian
<mariamarutun...@gmail.com> wrote:
This patch introduces new built-in functions to GCC for computing bit-forward
and bit-reversed CRCs.
These builtins aim to provide efficient CRC calculation capabilities.
When the target architecture supports CRC operations (as indicated by the
presence of a CRC optab),
the builtins will utilize the expander to generate CRC code.
In the absence of hardware support, the builtins default to generating code for
a table-based CRC calculation.
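
For readers unfamiliar with the fallback, here is a minimal sketch of what a table-based calculation looks like for a bit-reversed 32-bit CRC. It is an illustration only, not the code the patch emits. The names REV_POLY, rev_table, init_rev_table and crc32_rev_update are hypothetical, and the polynomial (the reflected form of 0x04C11DB7) is just a familiar example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: reflected form of the polynomial 0x04C11DB7, chosen only
   because it is widely known.  Any other polynomial works the same way. */
#define REV_POLY 0xEDB88320u

/* 256 entries of 4 bytes each; entry i is the CRC of the single byte i. */
uint32_t rev_table[256];

/* Fill the lookup table by running the bitwise algorithm on every byte. */
void init_rev_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1u) ? (c >> 1) ^ REV_POLY : c >> 1;
        rev_table[i] = c;
    }
}

/* Bit-reversed (LSB-first) update: one table lookup per input byte. */
uint32_t crc32_rev_update(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        crc = rev_table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return crc;
}
```

A caller would run init_rev_table() once, then pass the running CRC value through crc32_rev_update() for each buffer. The compiler-generated variant would presumably use a constant table rather than one filled at run time.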
I wonder whether for embedded target use we should arrange for the
table-based CRC calculation to be out-of-line and implemented in a
way so uses across TUs can be merged? I guess a generic
implementation inside libgcc is difficult?
I think the difficulty is the table is dependent upon the polynomial.
So we'd have to arrange to generate, then pass in the table.
In theory we could have the linker fold away duplicate tables as those
should be in read only sections without relocations to internal members.
So much like we do for constant pool entries. Though this hasn't been
tested.
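
A small sketch of the polynomial dependence, assuming a bit-forward 32-bit CRC with a byte-indexed table. The helper names build_fwd_table and tables_identical are hypothetical and not part of the patch. It shows that the table contents are a pure function of the polynomial, so two users of the same polynomial get byte-identical tables (candidates for folding), while different polynomials do not.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 256-entry table for a bit-forward
   (MSB-first) 32-bit CRC with the given polynomial. */
void build_fwd_table(uint32_t poly, uint32_t table[256])
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i << 24;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 0x80000000u) ? (c << 1) ^ poly : c << 1;
        table[i] = c;
    }
}

/* Returns nonzero when the two polynomials yield byte-identical tables. */
int tables_identical(uint32_t poly_a, uint32_t poly_b)
{
    uint32_t a[256], b[256];

    build_fwd_table(poly_a, a);
    build_fwd_table(poly_b, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Entry 1 of such a table is always the polynomial itself, so tables for distinct polynomials necessarily differ. Whether a given linker actually merges the identical ones is a separate question, untested as noted above.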
The CRC implementation itself could be subject to ICF if it's off in its
own function. If it's inlined (and that's a real possibility), then
there's little hope of ICF helping on the codesize.
I was wondering about doing some "standard" mangling in the implementation
namespace and using comdat groups for both code and data?
But I'm not sure how that really solves anything given the dependencies
on the polynomial. I.e., the contents of the table vary based on that
polynomial and the polynomial can (and will) differ across CRC
implementations.
Or we could just not do any of this for -Os/-Oz if the target doesn't
have a carryless multiply or CRC instruction with the appropriate polynomial.
Given the CRC table is probably larger than all the code in a bitwise
implementation, disabling for -Os/-Oz seems like a very reasonable choice.
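
To put rough numbers on the size argument, here is a sketch of a bitwise bit-forward 32-bit CRC. The function name crc32_fwd_bitwise and the constant CRC32_TABLE_BYTES are hypothetical. The figures assume a 32-bit CRC with a byte-indexed table; other widths and table layouts change them.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-forward (MSB-first) 32-bit CRC computed one bit at a time.
   The polynomial is a parameter and no lookup table is required. */
uint32_t crc32_fwd_bitwise(uint32_t crc, const uint8_t *data, size_t len,
                           uint32_t poly)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ poly : crc << 1;
    }
    return crc;
}

/* Read-only data a byte-indexed table variant would add: 256 entries of
   4 bytes each, i.e. 1024 bytes per distinct polynomial. */
enum { CRC32_TABLE_BYTES = 256 * sizeof(uint32_t) };
```

The loop above typically compiles to a few dozen bytes of code on most targets, against 1 KiB of table, at the cost of eight iterations per input byte.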
I was mainly thinking about the case where the user uses the new builtins,
but yes, when optimizing for size we can disable the recognition of open-coded
variants.
Turns out Mariam's patch already disables this for -Os. :-)
For someone directly using the builtin, they're going to have to pass
the polynomial as a constant to the builtin, with the possible exception
of when the target has a CRC instruction where the polynomial is defined
by the hardware.
Jeff