https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
Bug ID: 100363
Summary: gcc generating wider load/store than warranted at -O3
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: vgupta at synopsys dot com
Target Milestone: ---

Created attachment 50722
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50722&action=edit
test case with an additional nop to annotate codegen

In the Linux kernel's initramfs gzip inflate code, an inner copy loop using unsigned short pointers (src/dst) is compiled to wider 8- or 16-byte accesses (instead of 2 bytes at a time). This causes extra, unintended bytes to be copied, which corrupts inflated files on the target.

This showed up on the upstream v5.6 Linux kernel built for ARC (which defaults to -O3). The issue doesn't happen at -O2.

The full test case is attached, but the gist of it is:

lib/zlib_inflate/inffast.c

    if (dist > 2) {
        unsigned short *sfrom;

        sfrom = (unsigned short *)(from);
        loops = len >> 1;
        do
            *sout++ = *sfrom++;
        while (--loops);
        out = (unsigned char *)sout;
        from = (unsigned char *)sfrom;
    }
    ...

@sfrom and @sout are unsigned short pointers and are thus expected to work on 2 bytes at a time. However, at -O3 gcc generates wide loads and stores (8-byte LDD/STD on ARCv2, 16-byte LDR q0 on aarch64).

For aarch64, it seems that code is generated for the 16-byte access as well as for the 2-byte access. I haven't verified whether the 16-byte path is skipped at run time based on size etc., but the code is generated nonetheless. On ARC the 8-byte loop is certainly executed, causing the corruption described above.

The issue was originally seen with mainline gcc 10.2 (again on both ARC and aarch64) at -O3, and I can confirm it exists in gcc 9.3 as well.

The attached preprocessed source file is from an ARC Linux build (but it builds for aarch64 too, since none of the arch-specific functions are used here).
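To make the expected 2-bytes-at-a-time behaviour concrete, here is a minimal standalone sketch. It is not the attached test case. It assumes the source and destination regions overlap, as they can for LZ77-style back-references in inflate, and it uses a naturally aligned unsigned short array, which is an assumption of this sketch and not something the report states about the kernel buffers. The helper names copy_u16 and overlap_copy_replicates_pattern are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the kernel source): the same do/while
 * copy loop as in the report, moving one unsigned short per iteration.
 * Like the original loop, it requires loops >= 1.
 */
static void copy_u16(unsigned short *sout, const unsigned short *sfrom,
                     size_t loops)
{
    do
        *sout++ = *sfrom++;
    while (--loops);
}

/*
 * Copy within one buffer with the destination 2 elements (4 bytes)
 * ahead of the source. Element-by-element semantics mean each store
 * is visible to later loads, so the 2-element pattern is replicated.
 * Returns 1 if the whole buffer ends up holding the repeated pattern.
 */
static int overlap_copy_replicates_pattern(void)
{
    unsigned short buf[8] = { 0x1111, 0x2222, 0, 0, 0, 0, 0, 0 };
    size_t i;

    copy_u16(buf + 2, buf, 6);

    for (i = 0; i < 8; i++) {
        unsigned short want = (i % 2 == 0) ? 0x1111 : 0x2222;
        if (buf[i] != want)
            return 0;
    }
    return 1;
}
```

With one element copied per iteration, the buffer ends up as 1111 2222 repeated four times. A copy that loaded 8 or 16 bytes from the source before storing anything would read the still-zero elements and break the pattern, which is one way a wider access could produce the kind of corruption described above.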