On Mon, 2016-11-21 at 22:34 +0100, Thomas Gleixner wrote:
> On Mon, 21 Nov 2016, Pandruvada, Srinivas wrote:
[...]
> Stupid me. I tested putting a socket offline, which works, but did not
> check what happens on module removal. Delta fix below. That needs to be
> folded into the series as the wreckage already happens before the last
> patch.

Your change below fixes the crash issue.

I have now tested the case where the last CPU of a package is offlined:
the thermal zone was removed, and it was added back once any CPU from the
package came online. So this is working.

I want to run some workload on those CPUs to bump up the temperature and
check interrupts, but I am hitting an issue that may be unrelated to this
change.

I onlined three CPUs from package 1:

[189443.567728] smpboot: Booting Node 1 Processor 15 APIC 0x2e
[189656.625947] smpboot: Booting Node 1 Processor 8 APIC 0x20
[189829.545851] smpboot: Booting Node 1 Processor 24 APIC 0x21

But I can't schedule anything on those CPUs. For example, I can't run
turbostat now; it complains:

turbostat: re-initialized with num_cpus 19
Could not migrate to CPU 8

Same with taskset:

# taskset 0x100 stress -c 1
taskset: failed to set pid 0's affinity: Invalid argument

I am on the latest linux-pm/linux-next tree on this server. I will switch
to the latest mainline and try.
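For reference, taskset reads its argument as a hexadecimal bitmask in which
bit N selects CPU N, so 0x100 requests CPU 8, the same CPU turbostat could
not migrate to. A minimal sketch of that decoding follows; the helper name
mask_to_cpus() is hypothetical and only illustrates the bit layout, it is
not taken from taskset or turbostat.

```c
#include <stddef.h>

/*
 * Hypothetical helper: decode a taskset-style affinity mask into CPU
 * numbers. Bit N of 'mask' stands for CPU N. Stores at most 'max'
 * entries in 'cpus' and returns how many were stored.
 */
size_t mask_to_cpus(unsigned long mask, int *cpus, size_t max)
{
	size_t n = 0;
	int cpu = 0;

	while (mask != 0 && n < max) {
		if (mask & 1UL)
			cpus[n++] = cpu;
		mask >>= 1;
		cpu++;
	}
	return n;
}
```

Calling mask_to_cpus(0x100UL, cpus, 8) stores a single entry, CPU 8.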
Thanks,
Srinivas

8<--------------------
--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -63,6 +63,7 @@ struct pkg_device {
 	u32			msr_pkg_therm_high;
 	struct delayed_work	work;
 	struct thermal_zone_device *tzone;
+	struct cpumask		cpumask;
 };
 
 static struct thermal_zone_params pkg_temp_tz_params = {
@@ -391,6 +392,7 @@ static int pkg_temp_thermal_device_add(u
 	rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, pkgdev->msr_pkg_therm_low,
 	      pkgdev->msr_pkg_therm_high);
 
+	cpumask_set_cpu(cpu, &pkgdev->cpumask);
 	spin_lock_irq(&pkg_temp_lock);
 	packages[pkgid] = pkgdev;
 	spin_unlock_irq(&pkg_temp_lock);
@@ -399,13 +401,15 @@ static int pkg_temp_thermal_device_add(u
 static int pkg_thermal_cpu_offline(unsigned int cpu)
 {
-	int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
 	struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
 	bool lastcpu, was_target;
+	int target;
 
 	if (!pkgdev)
 		return 0;
 
+	target = cpumask_any_but(&pkgdev->cpumask, cpu);
+	cpumask_clear_cpu(cpu, &pkgdev->cpumask);
 	lastcpu = target >= nr_cpu_ids;
 	/*
@@ -492,8 +496,10 @@ static int pkg_thermal_cpu_online(unsign
 		return -ENODEV;
 
 	/* If the package exists, nothing to do */
-	if (pkgdev)
+	if (pkgdev) {
+		cpumask_set_cpu(cpu, &pkgdev->cpumask);
 		return 0;
+	}
 	return pkg_temp_thermal_device_add(cpu);
 }
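
The bookkeeping this delta adds can be modelled in userspace as a rough
sketch. It assumes at most 64 CPUs and stands in a plain uint64_t for
struct cpumask; every model_* name and MODEL_NR_CPUS are hypothetical, and
picking the lowest remaining CPU is a choice of this model only. The point
it illustrates is the ordering in the offline path: look for another
tracked CPU first, then clear the outgoing one, and treat "no other CPU
found" as the last CPU of the package going away.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 64	/* hypothetical limit of this model */

struct pkg_model {
	uint64_t cpumask;	/* one bit per tracked CPU of the package */
};

/*
 * Return a CPU set in 'mask' other than 'cpu' (here: the lowest one),
 * or MODEL_NR_CPUS if there is none, similar to how the delta checks
 * the cpumask_any_but() result against nr_cpu_ids.
 */
int model_any_but(uint64_t mask, int cpu)
{
	int i;

	mask &= ~(UINT64_C(1) << cpu);
	for (i = 0; i < MODEL_NR_CPUS; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return MODEL_NR_CPUS;
}

/* Online path: record the CPU in the package mask. */
void model_online(struct pkg_model *pkg, int cpu)
{
	pkg->cpumask |= UINT64_C(1) << cpu;
}

/*
 * Offline path: pick a replacement target before dropping 'cpu' from
 * the mask. Returns 1 if 'cpu' was the last tracked CPU, else 0.
 */
int model_offline(struct pkg_model *pkg, int cpu)
{
	int target = model_any_but(pkg->cpumask, cpu);

	pkg->cpumask &= ~(UINT64_C(1) << cpu);
	return target >= MODEL_NR_CPUS;
}
```

With CPUs 8, 15 and 24 tracked, taking 8 and 15 offline reports "not last"
and taking 24 offline reports "last", which is the point where the real
driver would tear down the thermal zone.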