https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68541
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---

It also looks like the "inner loop" will now always have two exits: one to the outer loop and one to the original exit block. That doesn't sound desirable in any way (unless we can jump-thread one of the exits).

The single testcase added points at followup transforms in the duplicated tail, which would again mean that having the CFG transform separate from any analysis and actual optimization is bad. In this case it fits more into FSM "threading" territory.

Not yet suggesting to remove this pass again, but close... at _least_ suggesting to limit it to -O3 and optimize_function_for_speed (), and to limit the duplication to the existing DOM threading param.

Btw, the added testcase doesn't show any difference in generated assembly that looks like an improvement vs. -fno-split-paths.
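
For illustration only, here is a minimal sketch of the loop shape this kind of path splitting is concerned with: an if-then-else inside a loop whose join block flows into the loop latch, so that duplicating the tail into each arm gives two separate paths back to the loop header. The function name `sum_abs_diff` and its body are hypothetical; this is not the testcase referred to above.

```c
/* Hypothetical example, not the testcase from the bug report.
   The loop body is an if-then-else "diamond"; the statement after
   the diamond is the shared tail that a path-splitting transform
   would duplicate into both arms. */
int sum_abs_diff(const int *a, const int *b, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        int d;
        if (a[i] > b[i])
            d = a[i] - b[i];   /* "then" arm */
        else
            d = b[i] - a[i];   /* "else" arm */
        /* Join block: shared tail that leads to the loop latch. */
        total += d;
    }
    return total;
}
```

One way to check whether the transform helps on a loop like this is to compare the assembly from `gcc -O2 -S -fsplit-paths` with that from `gcc -O2 -S -fno-split-paths`; whether the output differs at all depends on the GCC version and target.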