On Thu, Jan 19, 2017 at 03:29:34PM -0800, Guenter Roeck wrote: > On Fri, Jan 20, 2017 at 12:00:36AM +0100, Sebastian Reichel wrote: > > Hi Thierry, > > > > > > > [...] > > > > > > > > Please use register_restart_handler() instead. It has support for > > > > priorities, is not arm specific and properly supports unregistering > > > > (some handler with lower priority will take over). You can check > > > > gpio-restart as an example for the API. > > > > > > > > If you have some other arm_pm_restart handler (those are tried first) > > > > I suggest to convert that to the restart handler framework. Actually > > > > it may be a good idea to convert all of them and drop arm_pm_restart, > > > > since there are only 4 left: > > > > > > > > $ git grep "arm_pm_restart =" > > > > drivers/firmware/psci.c: arm_pm_restart = psci_sys_reset; > > > > arch/arm/mach-prima2/rstc.c: arm_pm_restart = sirfsoc_restart; > > > > arch/arm/xen/enlighten.c: arm_pm_restart = xen_restart; > > > > arch/arm/kernel/setup.c: arm_pm_restart = mdesc->restart; > > > > > > > > The first 3 should be easy to update and the last one could > > > > be solved by registering a wrapper function as restart handler, > > > > which calls mdesc->restart(). > > > > > > I remember this not being the first time that this confuses me. And from > > > looking around a little it seems like I'm not the only one. > > > > > > Do you know if there's ever been any attempts to formalize all of this > > > by adding some sort of framework for this? That way some of the > > > confusion may be removed and architecture-specific bits could be > > > eliminated more easily. > > > > We have a framework for restart handlers, which has been written > > by Guenter Roeck [0] in 2014. You just have to call > > register_restart_handler() > > during probe and unregister_restart_handler() during removal of > > your reboot driver. All reboot drivers in drivers/power/reset use > > that framework. > > > > The restart handlers are invoked by calling do_kernel_restart() > > based on their configured priority. That function is called by > > the architecture's machine_restart() function. It's currently > > used by mips, ppc, arm & arm64 as far as I can see. ARM's > > implementation looks like this: > > > > if (arm_pm_restart) > > arm_pm_restart() > > else > > do_kernel_restart() /* reboot handler framework */ > > > I actually have a set of patches somewhere which transforms the remaining > direct users of arm_pm_restart to use the framework (unless I removed it > from my trees - I don't really recall). I just never got around to submit > it, and then [2] happened and I lost interest. > > > No such thing exists for poweroff. Guenter also wrote something for > > that [1], but Linus intervened [2]. Anyways, pm_power_off is at > > least architecture independent. > > > That was by far the most frustrating kernel development experience > I ever head :-(.
It was a little difficult to track down all the discussion around this, but reading through what I could find, I can understand why this would have been frustrating. You obviously spent an enormous amount of work only to get it shot down. I have to say that I have similar concerns to those voiced by Linus and Rafael. Notifier chains and priority seem too little restrictive for this kind of functionality, in my opinion. I think those concerns could be removed if things got formalized using a framework. Without having spent the amount of time on the topic that you have, the following straw-man proposal is perhaps a little naive: struct system_power_controller; struct system_power_ops { int (*power_off)(struct system_power_controller *spc); int (*restart)(struct system_power_controller *spc, enum reboot_mode mode, const char *cmd); }; struct system_power_controller { struct device *dev; const struct system_power_ops *ops; struct list_head list; ... }; int system_power_controller_add(struct system_power_controller *spc); void system_power_controller_remove(struct system_power_controller *spc); int system_power_off(void); int system_restart(enum reboot_mode mode, const char *cmd); The above is based on what other subsystems use. Drivers would embed the struct system_power_controller in a driver-specific data structure and use container_of() to get at that from the callbacks. Controllers can be added and removed to the subsystem. Internally it may be worth to keep all of the controllers in a list, but only use the most appropriate ones. The above would need some sort of flag or type list that drivers can set in addition to operations to indicate what your proposal had done via priorities. I'm thinking of something along the lines of: enum system_power_controller_type { SYSTEM_POWER_CONTROLLER_TYPE_CPU, SYSTEM_POWER_CONTROLLER_TYPE_SOC, SYSTEM_POWER_CONTROLLER_TYPE_BOARD, }; struct system_power_controller { ... enum system_power_controller_type type; ... }; There's a fairly natural ordering in the above "levels". Obviously the board-level controller would be highest priority because it controls power or reset of the complete board. This would likely be implemented by GPIO-based or PMIC type of drivers. SoC-level controllers would use mechanisms provided by the SoC in order to power off or reset only the SoC. This is obviously lower priority than board-level because some of the components on the board may end up remaining powered on or not getting reset. But it could be a useful fallback in case a board-level controller isn't present or fails. Finally there's the CPU-level code that would most likely be implemented by architecture code. In the most basic version this could be a WFI kind of instruction, or a busy loop, or perhaps even a simple call to panic(). One complication might be that there are things like PSCI that have an architecture-specific implementation (CPU-level) but could actually be a board-level "controller". To keep things simple, I think it would be okay to allow only one of each type of controller in any running system. It's very unlikely that board designers would devise two different ways of powering off or restarting a system, while in a similar way an SoC or CPU would only ever provide one way to do so. Even if theoretically multiple possibilities exist, I think the board code should pick which ones are appropriate. Another missing feature in the above is that sometimes a mechanism exists to record in persistent storage (registers, memory, ...) what kind of reset was executed and which boot code will inspect and use to run different paths during boot. One such example is Tegra where the PMC has "scratch" registers that don't get reset on reboot. Software sets some of the bits to enable things like forced-recovery mode or Android-type recovery booting. I'm sure other similar mechanisms exist on other SoCs. Upon board-level reset, we would still want to have the SoC-level reset prep code execute, but not reset the SoC, otherwise the board-level code wouldn't have a chance of running anymore. This could possibly be solved by adding more fine-grained operations in the struct system_power_ops. One other possible solution that comes to mind is that system power controllers could be explicitly stacked. This would be complicated to achieve in code, but I'm thinking it could have interesting use-cases in device tree based systems, for example. I'm just brainstorming here, this may not be a good idea at all: pmc: pmc@... { system-power-controller; }; i2c@... { pmic: pmic@... { ... system-power-parent = <&pmc>; ... }; }; The PMIC would in the above call into PMC in order to reset on a SoC level before performing the board-level reset. I suppose it could use a separate callback (->prepare({RESET,POWER})) instead of the usual ->restart() and ->power_off(). Does the above sound at all sane to you, given your broad experience in this field? Anything I missed that I'd need to consider in a design for such a framework? Sebastian, any thoughts from you on the above? Thierry
signature.asc
Description: PGP signature