On 02/06/17 11:40, Daniel Lezcano wrote:
> On 02/06/2017 12:14, Sudeep Holla wrote:
>>
>> On 02/06/17 11:06, Daniel Lezcano wrote:
>>> On 02/06/2017 11:39, Sudeep Holla wrote:
>>>>
>>>> On 02/06/17 10:25, Daniel Lezcano wrote:
>>>>> On 02/06/2017 11:20, Sudeep Holla wrote:
>>>>>>
>>>>>> On 01/06/17 12:39, Daniel Lezcano wrote:
>>>>>>> Some hardware has clusters with different idle states. The current
>>>>>>> code does not support this and fails, as it expects all the idle
>>>>>>> states to be identical.
>>>>>>>
>>>>>>> Because of this, the Mediatek mtk8173 had to create the same idle
>>>>>>> state for a big.LITTLE system, and now the Hisilicon 960 is facing
>>>>>>> the same situation.
>>>>>>>
>>>>>>> Solve this by simply assuming multiple driver support will be
>>>>>>> needed for all the platforms using the ARM generic cpuidle driver,
>>>>>>> which makes sense because of the different topologies we can
>>>>>>> support with a single kernel for ARM32 or ARM64.
>>>>>>>
>>>>>>> Every CPU has its own driver, so every single CPU can specify its
>>>>>>> idle states in the DT.
>>>>>>>
>>>>>>> This simple approach allows us to support future DynamIQ systems
>>>>>>> as well as current SMP and HMP.
>>>>>>>
>>>>>>> It is suboptimal from a memory point of view for a system with a
>>>>>>> large number of CPUs, but nowadays there is no such system with a
>>>>>>> cpuidle driver on ARM.
>>>>>>
>>>>>> While I agree this may be a simple solution, it is just not
>>>>>> necessary for systems with symmetric idle states, especially ones
>>>>>> with a large number of CPUs. I don't like to see 96 cpuidle drivers
>>>>>> on, say, ThunderX. So we *must* have some basic distinction done
>>>>>> here.
>>>>>>
>>>>>> IMO, we can't punish large SMP systems just because they don't have
>>>>>> asymmetric idle states.
>>>>>
>>>>> Can you point me to a DTS in the upstream kernel with 96 CPUs using
>>>>> the cpuidle-arm driver?
>>>>
>>>> The bindings are upstream, right? Not all DTS are upstream; firmware
>>>> generates them, especially for large systems.
>>>>
>>>> Check arch/arm64/boot/dts/cavium/thunder{,2}-{88,99}xx.dtsi; it
>>>> supports PSCI, and firmware can update the DTB to add the idle
>>>> states. Those are systems with 96 and 128 CPUs.
>>>
>>> Ok, so I have to assume there are out-of-tree DTBs with idle state
>>> definitions. In other circumstances I would have just ignored this
>>> argument, but I can admit the DTB blob thing is in the grey area
>>> between what we have to support upstream and out-of-tree changes.
>>
>> Not entirely true. It's clear that we support anything whose binding
>> is upstream. I do agree that there are lots of out-of-tree bindings,
>> but I am not referring to them. I am just referring to out-of-tree DTS
>> with already upstreamed bindings.
>
> That is what I meant :)
>
>>> However, even if it is suboptimal, I'm convinced doing a per-CPU
>>> driver makes more sense.
>>
>> OK, but I think a simple check to decide not to on SMP systems is not
>> too much to ask?
>
> Yes, it is :)
>
If you are objecting to out-of-tree DTS whose bindings are upstream, then
that's simply wrong:

1. Firmware can generate/update the DTB.
2. There is, or was, a plan to move DTS out of the tree.

In short, the bindings are all that matter.

> We end up with a hack by parsing and analyzing the DT based on the idea
> that different idle states mean different clusters, where you stated
> right before in the topology comment that this is not necessarily true.
> So Lorenzo's patch adds a heuristic which may be wrong.

Completely disagree. While it may look hacky, it's not a heuristic. The
cpuidle bindings don't and need not link with the CPU topology. I
reiterate that "different idle states doesn't mean different clusters".
It just means they are probably in different power domains. Power
domains need not share topological boundaries.

> If we decide to consider all CPUs with their own set of idle states,
> everything is solved with a cpuidle driver per CPU.
>
> The only drawback is the memory consumption.

Yes, I get that. I just feel that we need not do that for systems that
don't need it.

> But on the other side, as the idle states are per CPU, we don't thrash
> the cache.
>
> Moreover, we already had that, an array of idle states per CPU (see
> commit 46bcfad7): it was in struct cpuidle_device instead of struct
> cpuidle_driver. It was moved to the latter with the assumption that SMP
> systems all have the same idle states, but now we see that is no longer
> true.
>
> Creating a cpuidle_driver per CPU is like using the previous approach.
> The only thing we must improve is the memory usage, that's all (e.g. a
> pointer to allocated idle states rather than a hardcoded array).
>
> We can live with a driver per CPU and do the memory optimization.
>
> If we observe a regression on large SMP systems, we fix it.

OK, I am fine with that if you agree that we can fix it later; a rough
sketch of the per-CPU registration idea is appended below.

--
Regards,
Sudeep
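
For reference, a minimal sketch of the "one driver per CPU" registration
discussed above, loosely modelled on the existing
drivers/cpuidle/cpuidle-arm.c and the dt_idle_states helper. The template
contents, the arm_idle_state_match table and the error handling are
illustrative and simplified, not the actual patch:

/*
 * Illustrative sketch only: one cpuidle_driver instance per CPU, each
 * parsing its own idle states from the DT. Details (WFI state setup,
 * arm_cpuidle_init(), full cleanup on error) are elided or simplified.
 */
#include <linux/cpuidle.h>
#include <linux/cpumask.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/slab.h>
#include <linux/string.h>

#include "dt_idle_states.h"

static struct cpuidle_driver arm_idle_driver_template = {
	.name = "arm_idle",
	.owner = THIS_MODULE,
	/* state[0] (WFI) would be filled in here as in cpuidle-arm.c */
};

static const struct of_device_id arm_idle_state_match[] = {
	{ .compatible = "arm,idle-state" },
	{ },
};

static int __init arm_idle_init(void)
{
	struct cpuidle_driver *drv;
	int cpu, ret;

	for_each_possible_cpu(cpu) {
		/* One driver instance per CPU instead of one shared driver. */
		drv = kmemdup(&arm_idle_driver_template, sizeof(*drv),
			      GFP_KERNEL);
		if (!drv)
			return -ENOMEM;

		/* Restrict this driver instance to a single CPU ... */
		drv->cpumask = (struct cpumask *)cpumask_of(cpu);

		/*
		 * ... so dt_init_idle_driver() only parses the idle states
		 * of that CPU's DT node (start index 1; state 0 is WFI).
		 * CPUs in different power domains can thus expose
		 * different states.
		 */
		ret = dt_init_idle_driver(drv, arm_idle_state_match, 1);
		if (ret <= 0) {
			kfree(drv);
			return ret ? : -ENODEV;
		}

		ret = cpuidle_register(drv, NULL);
		if (ret) {
			kfree(drv);
			return ret;
		}
	}

	return 0;
}
device_initcall(arm_idle_init);

Note that the memory optimization mentioned above (turning the embedded
CPUIDLE_STATE_MAX-sized states array in struct cpuidle_driver into a
pointer sized to the actual state count) is not shown here; each
kmemdup'ed copy above still carries the full array.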