On 09/19/2013 11:41 PM, Srivatsa S. Bhat wrote: > On 09/19/2013 06:28 PM, Srivatsa S. Bhat wrote: >> On 09/19/2013 06:25 PM, Linus Walleij wrote: >>> On Thu, Sep 19, 2013 at 2:46 PM, Srivatsa S. Bhat >>> <srivatsa.b...@linux.vnet.ibm.com> wrote: >>> >>>>>> I don't really know if this is the right solution at all, so please >>>>>> help me out here... if you want that patch I can send it once >>>>>> I understand this properly. >>>> >>>> IIRC, recent kernels didn't return 0 or any error code when the !policy >>>> condition was matched. So can you check whether this problem occurs with >>>> 3.11 or 3.10 as well? >>> >>> v3.11 works fine. >>> > > Ok, now that I got a chance to look at the code and the git logs, I think > I have a clearer idea of what is happening. > > Basically commit 474deff74 (cpufreq: remove cpufreq_policy_cpu per-cpu > variable) exposed a hidden issue. Before that commit, the code looked like > this: > > static DEFINE_PER_CPU(int, cpufreq_policy_cpu); > > [...] > > static int lock_policy_rwsem_##mode(int cpu) \ > { \ > int policy_cpu = per_cpu(cpufreq_policy_cpu, cpu); \ > BUG_ON(policy_cpu == -1); \ > down_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \ > \ > return 0; \ > } > > But there was no code to set the per-cpu values to -1 to begin with. Since > the per-cpu variable was defined as static, it would have been initialized > to zero. Thus, we would never actually hit the BUG_ON() condition, since > policy_cpu didn't turn out to be -1. > > (The per-cpu variable was set to -1 only during error in __cpufreq_add_dev > and during cpufreq_remove_dev, both of which are irrelevant to the scenario > we are dealing with here). > > So, code ended up accessing and locking CPU 0's rwsemaphore. But how was > its initialization taken care of? The answer is the initcall sequence. > Cpufreq core code initialized itself at the core_initcall stage, and the > sa1100 cpufreq driver initialized itself at the arch_initcall stage, like > Viresh mentioned. And core-initcall comes before arch-initcall. So the > cpufreq core would have finished initializing the per-cpu rwsemaphores, > so that locking/unlocking them wouldn't crash the system or get complaints > from lockdep. > > To summarize, I think before commit 474deff74, we were accessing CPU0's > rwsems (perhaps incorrectly) and due to the lacking initialization of the > per-cpu vars, the BUG_ON() would never fire. > > To confirm this, you can perhaps try out one commit before 474deff74 and see > if it works for you. Or equivalently, you can apply the following patch > (revert of 474deff74) over mainline and see if it works.
Just before sending, I rebased that patch on top of Rafael's linux-next branch, but forgot to update the above sentence. However I think the same patch might apply cleanly over mainline as well. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/