On 1/12/25 11:47 AM, Jakub Jelinek wrote:
Hi!

As mentioned in the second PR, using table names like
crc_table_for_crc_8_polynomial_0x12
in the user namespace is wrong, user could have defined such variables
in their code and as can be seen on the last testcase, then it just
misbehaves.
At minimum such names should start with 2 underscores, moving it into
implementation namespace, and if possible have some dot or dollar in the
name if target supports it.
Totally agreed. At some point the namespace and related issues came up, but I think that was a much earlier version and I (obviously incorrectly) thought those issues had been addressed when Mariam rewrote this code.

It might be nice to share the tables between TUs in the same binary or
shared library, but it in that case should have hidden visibility if
possible, so that it isn't exported from the libraries or binaries, we don't
want the optimization to affect set of exported symbols from libraries.
Likewise. Note that it's probably not *that* likely that we'll have multiple instances of the table in an executable/DSO. I don't think that ever got tested, it was as much a "nice to have" as much as anything.



Now, all of the above (except sharing between multiple TUs) is already
implemented in output_constant_def, so I think it is much better to just
use that function.
As I mentioned in the PR.  Agreed.


And, if we want to share it between multiple TUs, we could extend the
SHF_MERGE usage in gcc, currently we only use it for constant pool
entries with same size as alignment, from 1 to 32 bytes, using .rodata.cstN
sections.  We could just use say .rodata.cstM.N sections where M would be
alignment and N would be the entity size.  We could use that for all
constant pool entries say up to 2048 bytes.
Though, as the current code doesn't share between multiple TUs, I think it
can be done incrementally (either still for GCC 15, or GCC 16+).
If it's not going to "just work", then I'd probably defer trying to make sharing across multiple TUs work until we see it being useful in practice.


Bootstrapped/regtested on {x86_64,i686,aarch64,powerpc64le,s390x}-linux, on
aarch64 it also fixes
-FAIL: crypto/rsa
-FAIL: hash
ok for trunk?

2025-01-12  Jakub Jelinek  <ja...@redhat.com>

        PR tree-optimization/117997
        PR middle-end/118415
        * expr.cc (assemble_crc_table): Make static, remove id argument,
        use output_constant_def.  Emit note if -fdump-rtl-expand-details
        about which table has been emitted.
        (generate_crc_table): Make static, adjust assemble_crc_table
        caller, call it always.
        (calculate_table_based_CRC): Make static.
        * internal-fn.cc (expand_crc_optab_fn): Emit note if
        -fdump-rtl-expand-details about using optab for crc.  Formatting fix.

        * gcc.dg/crc-builtin-target32.c: Add -fdump-rtl-expand-details
        as dg-additional-options.  Scan expand dump rather than assembly,
        adjust the regexps.
        * gcc.dg/crc-builtin-target64.c: Likewise.
        * gcc.dg/crc-builtin-rev-target32.c: Likewise.
        * gcc.dg/crc-builtin-rev-target64.c: Likewise.
        * gcc.dg/pr117997.c: New test.
        * gcc.dg/pr118415.c: New test.
OK. Thanks a ton for taking care of this. I hadn't been able to spend more than a few minutes here an there on it. Not sure what it's priority was, but I'd always assumed it'd end up as a P1 regression due to it being a code correctness regression, so figured it'd be at the top of the list once we got into stage4. It's obviously better to have it out of the way sooner rather than later.

Jeff

Reply via email to