https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512
--- Comment #31 from Michael Meissner <meissner at gcc dot gnu.org> ---
For the SPEC 2017 521.wrf_r benchmark on little-endian PowerPC power9 systems, there was no difference in runtime between a normal run using -Ofast -mcpu=power9 and one using -Ofast -mcpu=power9 -fno-inline-arg-packing.

Of the seven rate benchmarks in SPEC 2017 that use Fortran (548.exchange2_r, 503.bwaves_r, 507.cactuBSSN_r, 521.wrf_r, 527.cam4_r, 549.fotonik3d_r, and 554.roms_r), none varies by more than 0.7% depending on whether the switch is used.

I used the compiler checked out from the master branch on March 27, 2020 to build and run the benchmarks.

As others have said, using -fno-inline-arg-packing does dramatically reduce the time it takes to compile 521.wrf_r.