Hi,

**** RFC not for inclusion ****

When we perform a CPU-Offline operation today, we do not put the CPU into the most energy efficient state. On x86, it loops in hlt as opposed to going to one of the low-power C-states. On pSeries, we call rtas_stop_self() and hand the vCPU back to the resource pool, thereby deallocating the vCPU.

Thus, when applications or platforms desire to put a particular CPU into an extended low-power state for a short while, currently they have to piggy-back on scheduler heuristics such as sched_mc_powersavings or play with exclusive cpusets. The former does a good job based on the workload, but fails to provide any guarantee that the CPU won't be used for the next <> seconds, while the latter might conflict with the existing cpuset configurations.

There were efforts to alleviate these problems and various proposals have been put forth. They include putting the CPU into the deepest possible idle-state when offlined [1], removing the desired CPU from the topmost cpuset [2], and a driver which forces a high-priority idle thread to run on the desired CPU, thereby putting it to idle [3].

In this patch-series, we propose to extend the CPU-Hotplug infrastructure and allow the system administrator to choose the desired state the CPU should go to when it is offlined. We think this approach addresses the concerns about determinism as well as transparency, since CPU-Hotplug already provides a notification mechanism which userspace can listen to for any change in the configuration and correspondingly readjust any previously set cpu-affinities.

Also, approaches such as [1] can make use of this extended infrastructure instead of putting the CPU into an arbitrary C-state when it is offlined, thereby providing the system administrator a rope to hang himself with should he feel the need to do so.

This patch-series tries to achieve this by implementing an architecture independent framework that exposes sysfs tunables to allow the system-administrator to choose the offline-state of a CPU.
    /sys/devices/system/cpu/cpu<number>/available_offline_states
and
    /sys/devices/system/cpu/cpu<number>/preferred_offline_states

For the purpose of proof-of-concept, we've implemented the backend for pSeries. For pSeries, we define two available_offline_states. They are:

    deallocate: This is the default behaviour which, on an offline,
    deallocates the vCPU by invoking rtas_stop_self() and hands it back
    to the resource pool.

    deactivate: This calls H_CEDE, which requests the hypervisor to idle
    the vCPU in the lowest power mode and give it back as soon as we
    need it.

Any feedback on the patchset will be immensely valuable.

References:
-----------
[1] Pallipadi, Venkatesh: x86: Make offline cpus to go to deepest idle state using mwait
    (URL: http://lkml.org/lkml/2009/5/22/431)

[2] Li, Shaohua: cpuset: add new API to change cpuset top group's cpus
    (URL: http://lkml.org/lkml/2009/5/19/54)

[3] Li, Shaohua: new ACPI processor driver to force CPUs idle
    (URL: http://www.spinics.net/lists/linux-acpi/msg22863.html)

Changelog:
---

Gautham R Shenoy (3):
      pSeries: cpu: Cede CPU during a deactivate-offline
      cpu: Implement cpu-offline-state callbacks for pSeries.
      cpu: Offline state Framework.

 arch/powerpc/platforms/pseries/hotplug-cpu.c    |  160 ++++++++++++++++++++++-
 arch/powerpc/platforms/pseries/offline_driver.h |   17 ++
 arch/powerpc/platforms/pseries/plpar_wrappers.h |    6 +
 arch/powerpc/platforms/pseries/smp.c            |   18 ++-
 drivers/base/cpu.c                              |  111 ++++++++++++++++
 include/linux/cpu.h                             |   15 ++
 6 files changed, 319 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

-- 
Thanks and Regards
gautham.
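For illustration only, here is a rough userspace sketch (not part of the patchset) of how an administrator-side tool might use the two tunables described above: read available_offline_states, check that the wanted state is advertised, and then write it to preferred_offline_states. The helper names offline_state_supported() and set_preferred_offline_state() are hypothetical, and the sketch assumes that available_offline_states contains whitespace-separated state names, which this mail does not specify.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if 'state' appears as a whole token in
 * 'list', else 0.
 * ASSUMPTION: the list is whitespace-separated state names; the actual
 * file format is not specified in the cover letter.
 */
int offline_state_supported(const char *list, const char *state)
{
    assert(list != NULL && state != NULL);

    size_t want = strlen(state);
    if (want == 0)
        return 0;

    const char *p = list;
    while (*p) {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " \t\r\n");
        size_t len = strcspn(p, " \t\r\n");
        if (len == want && strncmp(p, state, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/*
 * Hypothetical helper: select 'state' as the preferred offline state of
 * the CPU whose sysfs directory is 'cpu_dir', for example
 * "/sys/devices/system/cpu/cpu3". The directory is a parameter so the
 * sketch can be pointed at a scratch directory.
 * Returns 0 on success, -1 on I/O error or if the state is not advertised.
 */
int set_preferred_offline_state(const char *cpu_dir, const char *state)
{
    char path[512];
    char avail[256];
    FILE *f;

    snprintf(path, sizeof(path), "%s/available_offline_states", cpu_dir);
    f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(avail, 1, sizeof(avail) - 1, f);
    fclose(f);
    avail[n] = '\0';

    /* Refuse states the platform backend does not advertise. */
    if (!offline_state_supported(avail, state))
        return -1;

    snprintf(path, sizeof(path), "%s/preferred_offline_states", cpu_dir);
    f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(state, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With something like this, a tool on pSeries could call set_preferred_offline_state("/sys/devices/system/cpu/cpu3", "deactivate") and then offline the CPU through the existing hotplug interface (writing 0 to /sys/devices/system/cpu/cpu3/online), so that the offline operation takes the H_CEDE path rather than rtas_stop_self().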