https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120996
--- Comment #10 from Dhruv Chawla <dhruvc at nvidia dot com> ---
Looks like there's not much codegen change, but the difference does show up in the thread1 pass: https://godbolt.org/z/xz5qo1xTd (which is what I was comparing to create the minimized source)