https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #26 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The above slightly simplified (dead var removal, preprocessing etc.):

typedef struct __attribute__((__packed__)) _Atom { float x, y, z; int type; } Atom;
typedef struct __attribute__((__packed__)) _FFParams { int hbtype; float radius; float hphb; float elsc; } FFParams;

void
fasten_main (unsigned long group, unsigned long natlig, unsigned long natpro, const Atom *protein, const Atom *ligand, const FFParams *forcefield, float *energies)
{
  float etot[64];
  float lpos_x[64];
  for (int l = 0; l < 64; l++)
    {
      etot[l] = 0.f;
      lpos_x[l] = 0.f;
    }
  for (int il = 0; il < natlig; il++)
    {
      const Atom l_atom = ligand[il];
      const FFParams l_params = forcefield[l_atom.type];
      for (int ip = 0; ip < natpro; ip++)
        {
          const Atom p_atom = protein[ip];
          const FFParams p_params = forcefield[p_atom.type];
          const float radij = p_params.radius + l_params.radius;
          const float elcdst = (p_params.hbtype == 70 && l_params.hbtype == 70) ? 4.0f : 2.0f;
          const float elcdst1 = (p_params.hbtype == 70 && l_params.hbtype == 70) ? 0.25f : 0.5f;
          const int type_E = ((p_params.hbtype == 69 || l_params.hbtype == 69));
          const float chrg_init = l_params.elsc * p_params.elsc;
          for (int l = 0; l < 64; l++)
            {
              const float x = lpos_x[l] - p_atom.x;
              const float distij = (x * x);
              const float distbb = distij - radij;
              const int zone1 = (distbb < 0.0f);
              float chrg_e = chrg_init * ((zone1 ? 1.0f : (1.0f - distbb * elcdst1)) * (distbb < elcdst ? 1.0f : 0.0f));
              float neg_chrg_e = -__builtin_fabsf(chrg_e);
              chrg_e = type_E ? neg_chrg_e : chrg_e;
              etot[l] += chrg_e * 45.0f;
            }
        }
    }
  for (int l = 0; l < 64; l++)
    energies[group * 64 + l] = etot[l] * 0.5f;
}

The r13-2266 to r13-2267 diff indeed starts during threadfull1, the dump says:
...
  Registering killing_def (path_oracle) distbb_75
- Registering value_relation (path_oracle) (iftmp.0_32 <= distbb_75) (root: bb16)
-path: 16->18->xx REJECTED
-Checking profitability of path (backwards): bb:18 (3 insns) bb:16 (6 insns) bb:23
-  Control statement insns: 2
-  Overall: 7 insns
-
- Registering killing_def (path_oracle) distbb_75
- Registering value_relation (path_oracle) (iftmp.0_32 <= distbb_75) (root: bb23)
-path: 23->16->18->xx REJECTED
-Checking profitability of path (backwards): bb:18 (3 insns) bb:16 (6 insns) bb:23 (3 insns) bb:15
-  Control statement insns: 2
-  Overall: 10 insns
-  FAIL: Did not thread around loop and would copy too many statements.
-Checking profitability of path (backwards): bb:18 (3 insns) bb:16 (6 insns) bb:23 (3 insns) bb:22 (latch)
-  Control statement insns: 2
-  Overall: 10 insns
-  FAIL: Did not thread around loop and would copy too many statements.
+Checking profitability of path (backwards):
+  [10] Registering jump thread: (16, 18) incoming edge; (18, 20) nocopy;
+path: 16->18->20 SUCCESS
 Checking profitability of path (backwards): bb:20 (7 insns) bb:18
   Control statement insns: 2
   Overall: 5 insns
...
etc.
Though, I know nothing about the threader and don't see suspect ranges in the decisions there.
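
A possible way to look at the innermost-loop expression in isolation is sketched below. It only lifts the chrg_e computation out of the l loop into a standalone function, so the conditions on distbb that the threader dump mentions can be exercised directly. The function name chrg_term and its parameter list are made up for illustration and do not appear in the testcase; this sketch is not claimed to reproduce the r13-2266 to r13-2267 threading difference.

```c
/* Hypothetical helper, not part of the reduced testcase: the body of the
   innermost loop with its loop-invariant inputs passed as parameters.  */
float
chrg_term (float chrg_init, float distbb, float elcdst, float elcdst1,
           int type_E)
{
  /* zone1 is true when distbb is negative.  */
  const int zone1 = (distbb < 0.0f);
  /* Same expression as in the testcase: the second factor zeroes the
     term once distbb reaches elcdst.  */
  float chrg_e = chrg_init * ((zone1 ? 1.0f : (1.0f - distbb * elcdst1))
                              * (distbb < elcdst ? 1.0f : 0.0f));
  /* For type_E the result is forced non-positive.  */
  float neg_chrg_e = -__builtin_fabsf (chrg_e);
  chrg_e = type_E ? neg_chrg_e : chrg_e;
  /* Value that the testcase accumulates into etot[l].  */
  return chrg_e * 45.0f;
}
```

For example, chrg_term (2.0f, -1.0f, 2.0f, 0.5f, 0) takes the zone1 arm, while chrg_term (2.0f, 3.0f, 2.0f, 0.5f, 0) has distbb >= elcdst and so contributes zero.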