https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107051
--- Comment #2 from absoler at smail dot nju.edu.cn ---
(In reply to Richard Biener from comment #1)
> With -O2 I see
> 
> func_1:
> .LFB0:
>         .cfi_startproc
>         movl    e(%rip), %eax
>         testl   %eax, %eax
>         je      .L2
> .L3:
>         jmp     .L3
>         .p2align 4,,10
>         .p2align 3
> .L2:
>         movq    g_284+8(%rip), %rax
>         movq    %rax, g_284(%rip)
>         ret
> 
> note that with -O1 we retain
> 
>   c = g_284[1];
>   c$f0_3 = g_284[1].f0;
>   c.f0 = c$f0_3;
>   g_284[0] = c;
> 
> after GIMPLE optimization which possibly explains this compared to
> 
>   c = g_284[1];
>   g_284[0] = c;
> 
> with -O2.

With gcc-13.2.0 at -O2, GCC still seems to miss removing the redundant load for this reduced case:

```
union U0 {
  short f1;
  int f2;
};
union U0 g1, g2;
volatile int flag;
void func_1() {
  union U0 d[1] = {{.f1 = 1}};
  for (; flag;)
    ;
  d[0] = g2;
  g1 = d[0];
}
```

Here `movw g2(%rip), %ax` re-reads the low 16 bits that the preceding `movl g2(%rip), %eax` has already loaded:

```
func_1:
.LFB0:
        .cfi_startproc
        .p2align 4,,10
        .p2align 3
.L2:
        movl    flag(%rip), %eax
        testl   %eax, %eax
        jne     .L2
        movl    g2(%rip), %eax
        movw    g2(%rip), %ax
        movl    %eax, g1(%rip)
        ret
        .cfi_endproc
```
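A minimal driver sketch for the reduced case, assuming a hosted C environment; the helper `copy_via_func_1` is hypothetical and not part of the original report. Whatever code GCC emits, `g1` must end up bitwise identical to `g2` once `func_1` returns, so the extra 16-bit reload of `g2` cannot change the result and is a pure missed optimization.

```c
#include <assert.h>

/* Reduced testcase from the report, unchanged in behavior. */
union U0 {
  short f1;
  int f2;
};
union U0 g1, g2;
volatile int flag; /* zero-initialized, so the loop exits immediately */

void func_1(void) {
  union U0 d[1] = {{.f1 = 1}};
  for (; flag;)
    ;
  d[0] = g2; /* whole-union copy overwrites the .f1 initializer */
  g1 = d[0];
}

/* Hypothetical helper: store a full 32-bit value into g2, run func_1,
   and return what landed in g1. */
int copy_via_func_1(int value) {
  g2.f2 = value;
  func_1();
  return g1.f2;
}
```

Building this with `-O2 -S` alongside a small `main` lets you compare the emitted code for `func_1` with the listing above while checking that the copied value is preserved.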