https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113467
Kacper Słomiński <kacper.slominski72 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kacper.slominski72 at gmail dot com

--- Comment #26 from Kacper Słomiński <kacper.slominski72 at gmail dot com> ---
Created attachment 57232
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57232&action=edit
reduced standalone failing part of the libgcrypt test

I've isolated the failing part of the test, inlined the relevant libgcrypt
code and reduced it.

Worth noting is that in the bad output, limbs 1-8 (inclusive) are all off by 1
vs the good output (limbs 0 and 9-15 stay the same). This is because, when
computing the carry, the faulty code adds a whole 8-element vector of 1s to
the output, instead of only a single 1 to the output limb (because the carry
loop terminates early after one iteration).

The faulty generated code is:

 8049280:       b8 01 00 00 00          mov    $0x1,%eax
...
      x = *s1_ptr++ + 1;
 804928d:       c5 f9 6e c0             vmovd  %eax,%xmm0
...
      x = *s1_ptr++ + 1;
 8049296:       c4 e2 7d 58 c0          vpbroadcastd %xmm0,%ymm0
 804929b:       c5 fd fe 42 e0          vpaddd -0x20(%edx),%ymm0,%ymm0
      *res_ptr++ = x;
 80492a0:       c5 fe 7f 42 e0          vmovdqu %ymm0,-0x20(%edx)

While in the working binary (produced by gcc 13) the relevant code is:

      x = *s1_ptr++ + 1;
    1270:       8b 54 81 04             mov    0x4(%ecx,%eax,4),%edx
    1274:       42                      inc    %edx
      *res_ptr++ = x;
    1275:       89 54 81 04             mov    %edx,0x4(%ecx,%eax,4)
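
For readers without the attachment at hand, here is a rough sketch of the kind of carry loop the listings above correspond to. It is not the attached reproducer and not libgcrypt's actual source: the names add_1_sketch, broadcast_add_model, count_diff and limb_t are hypothetical, and 32-bit limbs are assumed because the listings are 32-bit x86. The second function only models "add 1 to 8 consecutive limbs" in plain C, to contrast it with the early-exit loop; it does not try to reproduce the exact limb indices of the real failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t; /* assumption: 32-bit limbs */

/* Add the single limb v to the n-limb number s1, store into res.
   Returns the carry out of the top limb. Requires n >= 1. */
static limb_t add_1_sketch(limb_t *res, const limb_t *s1, size_t n, limb_t v)
{
    assert(n >= 1);

    limb_t x = s1[0];
    limb_t sum = x + v;
    size_t i = 1;
    res[0] = sum;

    if (sum < x) {                 /* limb 0 wrapped: propagate a carry */
        int absorbed = 0;
        while (i < n) {
            x = s1[i] + 1;         /* x = *s1_ptr++ + 1; */
            res[i] = x;            /* *res_ptr++ = x;    */
            i++;
            if (x != 0) {          /* no wrap: carry absorbed, stop here */
                absorbed = 1;
                break;
            }
        }
        if (!absorbed)
            return 1;              /* carry ran off the top limb */
    }

    if (res != s1)                 /* remaining limbs are copied unchanged */
        for (; i < n; i++)
            res[i] = s1[i];
    return 0;
}

/* Model of the faulty behaviour described above: 1 is added to 8
   consecutive limbs starting at `start`, ignoring the early exit. */
static void broadcast_add_model(limb_t *res, const limb_t *s1,
                                size_t n, size_t start)
{
    for (size_t k = 0; k < 8 && start + k < n; k++)
        res[start + k] = s1[start + k] + 1;
}

/* Number of limbs in which a and b differ. */
static size_t count_diff(const limb_t *a, const limb_t *b, size_t n)
{
    size_t d = 0;
    for (size_t i = 0; i < n; i++)
        d += (a[i] != b[i]);
    return d;
}
```

With a 16-limb input whose limb 0 is 0xFFFFFFFF and whose other limbs are small, add_1_sketch(res, in, 16, 1) should change only limbs 0 and 1, since the carry is absorbed on the first iteration of the loop. Applying broadcast_add_model on top of that bumps a whole run of limbs, which is the shape of the discrepancy reported above.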