https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, for me module_configure.fppized.f90 is much more problematic, compiling
longest and using the most memory.  IIRC that one has long series of
initialization expressions.  And

 load CSE after reload             : 143.49 ( 41%)   0.02 (  3%) 143.52 ( 41%)    1001 kB (  0%)

(known issue I think)

Then there's module_alloc_space_0.fppized.f90 with a similar

 load CSE after reload             :  55.97 ( 61%)   0.00 (  0%)  55.97 ( 60%)     341 kB (  0%)

and more of these... :/  And module_domain.fppized.f90 with

 machine dep reorg                 :  89.07 ( 95%)   0.02 ( 18%)  89.10 ( 95%)      54 kB (  0%)

that's probably STV ... same for module_dm.fppized.f90.

The module_first_rk_step_part1.fppized.f90 compile is also slow, with

 callgraph ipa passes              :  21.30 ( 14%)   0.13 (  9%)  21.44 ( 14%)   95303 kB ( 11%)
 alias stmt walking                :  17.93 ( 12%)   0.12 (  8%)  18.16 ( 12%)     136 kB (  0%)
 tree FRE                          :  14.19 (  9%)   0.03 (  2%)  14.32 (  9%)    2744 kB (  0%)
 complete unrolling                :   6.07 (  4%)   0.02 (  1%)   6.08 (  4%)   95401 kB ( 11%)
 load CSE after reload             :  33.62 ( 22%)   0.01 (  1%)  33.63 ( 22%)     174 kB (  0%)

and solve_em.fppized.f90 might be similar.

Looking at the .original dump of module_first_rk_step_part1.fppized.f90, it is
the decomposed "grid" that gets passed along, causing all the re-packs.
So the caller has

SUBROUTINE first_rk_step_part1 ( grid , ...
   TYPE ( domain ), INTENT(INOUT) :: grid
...
      CALL phy_prep ( config_flags,                                    &
                      grid%mut, grid%muu, grid%muv, grid%u_2,          &
                      grid%v_2, grid%p, grid%pb, grid%alt,             &
                      grid%ph_2, grid%phb, grid%t_2, grid%tsk, moist, num_moist, &
                      grid%rho, th_phy, p_phy, pi_phy, grid%u_phy, grid%v_phy,   &
                      p8w, t_phy, t8w, grid%z, grid%z_at_w, dz8w,      &
                      grid%p_hyd, grid%p_hyd_w, grid%dnw,              &
                      grid%fnm, grid%fnp, grid%znw, grid%p_top,        &
                      grid%rthraten,                                   &
                      grid%rthblten, grid%rublten, grid%rvblten,       &
                      grid%rqvblten, grid%rqcblten, grid%rqiblten,     &
                      grid%rucuten, grid%rvcuten, grid%rthcuten,       &
                      grid%rqvcuten, grid%rqccuten, grid%rqrcuten,     &
                      grid%rqicuten, grid%rqscuten,                    &
                      grid%rushten, grid%rvshten, grid%rthshten,       &
                      grid%rqvshten, grid%rqcshten, grid%rqrshten,     &
                      grid%rqishten, grid%rqsshten, grid%rqgshten,     &
                      grid%rthften, grid%rqvften,                      &
                      grid%RUNDGDTEN, grid%RVNDGDTEN, grid%RTHNDGDTEN, &
                      grid%RPHNDGDTEN, grid%RQVNDGDTEN, grid%RMUNDGDTEN, &
!jdf                  grid%landmask, grid%xland,                       &
!jdf
                      ids, ide, jds, jde, kds, kde,                    &
                      ims, ime, jms, jme, kms, kme,                    &
                      grid%i_start(ij), grid%i_end(ij),                &
                      grid%j_start(ij), grid%j_end(ij),                &
                      k_start, k_end )

and more of that, while TYPE (domain) has

      real ,DIMENSION(:,:,:) ,POINTER :: rucuten
      real ,DIMENSION(:,:)   ,POINTER :: mut
      ...

so here are the assumed-shape arrays.  Note the packing is done
conditionally, like

      contiguous.11171 = (D.83839.dim[0].stride == 1
                          && D.83839.dim[1].stride == D.83839.dim[0].stride
                             * ((D.83839.dim[0].ubound - D.83839.dim[0].lbound) + 1))
                         && D.83839.dim[2].stride == D.83839.dim[1].stride
                            * ((D.83839.dim[1].ubound - D.83839.dim[1].lbound) + 1);
      if (__builtin_expect ((integer(kind=8)) contiguous.11171, 1, 50))
        {
          arg_ptr.11170 = (real(kind=4)[0:] * restrict) grid->u_phy.data;
        }
      else
        {
          D.83779 = (real(kind=4)[0:] *) grid->u_phy.data;
          ... repack ...
        }

so this simply exposes quite a number of loop nests in this file where
there were no loops but only calls before (the repacks plus the actual calls).
Given the calls might be inlined, it seems to be worth expanding the
repacking inline.  IIRC the original motivation for adding the inline
expansion was exactly such a case, correct?

So a testcase for the "regression" would be a function with a single call
statement with a _lot_ of arguments, all in need of repacking.
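
For illustration, here is a minimal sketch of what such a testcase could look
like (the names, extents and number of arguments are made up, not taken from
WRF or from anything attached to this PR): the actuals are pointer arrays
whose contiguity is unknown at compile time, while the callee's dummies have
explicit shape, so gfortran has to emit a contiguity check plus pack/unpack
code per argument at the single call statement.

      module m
        implicit none
      contains
        ! Callee with explicit-shape dummies: the actuals must be contiguous,
        ! so possibly non-contiguous actuals are packed/unpacked at the call.
        subroutine callee (a1, a2, a3, a4, n)   ! in practice dozens of arguments
          integer, intent(in) :: n
          real, intent(inout) :: a1(n,n,n), a2(n,n,n), a3(n,n,n), a4(n,n,n)
          a1 = a1 + 1.0
          a2 = a2 + 1.0
          a3 = a3 + 1.0
          a4 = a4 + 1.0
        end subroutine callee

        ! Caller mimicking first_rk_step_part1: pointer arrays of unknown
        ! contiguity, all passed in one call statement.
        subroutine caller (p1, p2, p3, p4, n)
          integer, intent(in) :: n
          real, dimension(:,:,:), pointer :: p1, p2, p3, p4
          call callee (p1, p2, p3, p4, n)
        end subroutine caller
      end module m

With the inline expansion of the packing, that one CALL presumably turns into
a contiguity check plus a pack loop nest (and an unpack loop nest for the
INTENT(INOUT) dummies) per argument, which would match the blow-up of loop
nests seen in the WRF files above.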