On Fri, May 24, 2024, 04:42 Mariam Arutunian <mariamarutun...@gmail.com> wrote:
> Hello! > This patch set detects bitwise CRC implementation loops (with branches) in > the GIMPLE optimizers and replaces them with more optimal CRC > implementations in RTL. These patches introduce new internal functions, > built-in functions, and expanders for CRC generation, leveraging hardware > instructions where available. Additionally, various tests are included to > check CRC detection and generation. Main Features: > > 1. > > CRC Loop Detection and Replacement: > - Detection of CRC loops involves two stages: fast checks to identify > potential candidates and verification using symbolic execution. The > algorithm detects only CRCs (8, 16, 32, and 64 bits, both bit-forward > and > bit-reversed) with constant polynomials used without the leading 1. This > part can be improved to detect more implementation types. > - Once identified, the CRC loops are replaced with calls to newly > added internal functions. These internal functions use target-specific > expanders if available, otherwise generating table-based CRCs. > 2. > > Architecture-Specific Expanders: > - Expanders are added for RISC-V, aarch64, and i386 architectures. > - These expanders generate CRCs using either carry-less > multiplication instructions or direct CRC instructions, based on the > target > architecture's capabilities. > 3. > > New Internal and Built-In Functions: > - Introduces internal functions and built-in functions for CRC > generation, supporting various CRC and data sizes (8, 16, 32, and 64 > bits). > > I presented this work during the GNU Tools Cauldron 2023. You can view the > presentation here: GCC CRC optimization presentation > <https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=GCC+CRC+optimization.pdf> > . > > Previously, I submitted a patch to GCC upstream that included built-in > parts and expanders for RISC-V. However, the main component of the > previously sent patch has been changed. You can find the patch here: > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626279.html > > > Best regards, > Mariam > Could this detect a specific CRC32C calculation as well (vs any urge CRC) and replace it with optimized calls to the dedicated crc32c hardware instructions via __builtin_ia32_crc32*i? I'm asking because ironically I was just a few days ago trying to see how current GCC optimizes simpler but slower approaches compared to hand tuning. For example, https://github.com/htot/crc32c. Almost every implementation I found was ultimately based on Mark Adler's (of zlib and that fun Mars mission fame) implementation posted to stack overflow here: https://stackoverflow.com/a/17646775 Because so many current libraries are based on that, it would be nice to see if GCC can improve upon it with your patch. Or if they are no longer necessary, because your patch optimizes the simpler software approach to basically yield Mark's solution. >