Hi Rafael, On Tue, Jun 30, 2020 at 6:11 PM Rafael J. Wysocki <raf...@kernel.org> wrote: > On Tue, Jun 30, 2020 at 5:39 PM Greg Kroah-Hartman > <gre...@linuxfoundation.org> wrote: > > On Tue, Jun 30, 2020 at 03:50:58PM +0200, Rafael J. Wysocki wrote: > > > On Fri, Jun 26, 2020 at 10:53 PM Geert Uytterhoeven > > > <ge...@linux-m68k.org> wrote: > > > > On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <sarava...@google.com> > > > > wrote: > > > > > On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <raf...@kernel.org> > > > > > wrote: > > > > > > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan > > > > > > <sarava...@google.com> wrote: > > > > > > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki > > > > > > > <raf...@kernel.org> wrote: > > > > > > > > Note that deferred probing gets in the way here and so the > > > > > > > > problem is > > > > > > > > related to it. > > > > > > > > > > > > > > I mean, we officially support deferred probing. Shouldn't we fix > > > > > > > it so > > > > > > > that it doesn't break suspend/resume? > > > > > > > > > > > > Yes, we should fix deferred probing. > > > > > > > > Please take into account that breakage is an actual regression. > > > > > > > > > > > Also, it's pretty easy to have > > > > > > > cases where one module probes multiple device instances and > > > > > > > loading it > > > > > > > in one order would break dpm_list order for one device and > > > > > > > loading it > > > > > > > in another order would break it for another device. And there > > > > > > > would be > > > > > > > no "proper" order to load modules (because module order != device > > > > > > > order). > > > > > > > > > > > > I'm not saying that the current code is perfect. I'm saying that > > > > > > the > > > > > > fix as proposed adds too much cost for everybody who may not care > > > > > > IMO. > > > > > > > > > > Ok, how about I don't do this reordering until we see the first > > > > > deferred probe request? Will that work for you? In that case, systems > > > > > with no deferred probing will not incur any reordering cost. Or if > > > > > reordering starts only towards the end, all the previous probes won't > > > > > incur reordering cost. > > > > > > > > That first deferred probe request is more or less as of the first probe, > > > > since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when > > > > adding all top level devices"), at least on DT systems. > > > > > > The deferred probe reordering of devices to the end of dpm_list > > > started in 2012, so it is nothing new, and it demonstrably works for > > > devices where the dependencies are known to the driver core. > > > > > > That said, in the cases when the dependencies are known to the driver > > > core, it is also unnecessary to reorder dpm_list in > > > deferred_probe_work_func(), because the right ordering of it is going > > > to be determined elsewhere. > > > > > > Also commit 494fd7b7ad10 ("PM / core: fix deferred probe breaking > > > suspend resume order") is not the source of the problem here, because > > > the problem would have still been there without it, due to the > > > device_pm_move_last() that was there before, so the Fixes: tag > > > pointing to that commit is misleading. > > > > > > Now, because 716a7a259690 ("driver core: fw_devlink: Add support for > > > batching fwnode parsing") is an optimization and the regression is > > > present because of it AFAICS, the best way to address it at that point > > > would be to revert commit 716a7a259690 for 5.8 and maybe do the > > > optimization more carefully. > > > > > > Greg, what do you think? > > > > I've been ignoreing this and letting you all sort it out :) > > > > But if you think that patch should be reverted, I'll not object and will > > be glad to to it if this solves the issue. > > Well, if Geert can confirm that reverting commit 716a7a259690 makes > the problem go away, IMO this would be the most reasonable thing to do > at this stage of the cycle without risking that more regressions will > be introduced. > > Geert?
Reverting commit 716a7a25969003d8 ("driver core: fw_devlink: Add support for batching fwnode parsing") requires reverting commits fefcfc968723caf9 ("driver core: Remove check in driver_deferred_probe_force_trigger()") and 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when adding all top level devices"), too. While reverting the latter is sufficient to fix the regression for me, I can confirm that reverting all three fixes the issue, too. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds