Re: [PATCH V4 11/18] powerpc/mm: Hugetlbfs is book3s_64 and fsl_book3e (32 or 64)
Paul Mackerras writes: > On Tue, Feb 23, 2016 at 10:18:13AM +0530, Aneesh Kumar K.V wrote: >> We move large part of fsl related code to hugetlbpage-book3e.c. >> Only code movement. This also avoid #ifdef in the code. >> >> Eventhough we allow hugetlbfs only for book3s 64 and fsl book3e, I am >> still retaining the #ifdef in hugetlbpage-book3e.c. It looks like there >> was an attempt to support hugetlbfs on other non hash platforms. I >> didn't want to loose that work. > > Well... that stuff is always in git. It seems that the only 64-bit > non-Freescale Book E platform we had was A2, and that code got deleted > because the product was never released. > > You get rid of some but not all of the ifdefs in hugetlbpage-book3e.c, > and the result is a bit confusing (see below). In fact that file will > only get compiled for Freescale Book E processors, because the > Makefile has an ifdef on HUGETLB_PAGE, which ultimately depends on > SYS_SUPPORTS_HUGETLBFS, which is only set for Freescale Book E and > 64-bit server. > > However, a larger question is why this cleanup is necessary for > preparing for the radix tree. Nothing in the following patches in > this series seems to depend on this patch. I am putting together radix changes on top of the PTE rewrite patch series and I thought of retaining most of the code cleanup because that makes the applying further patches easy. This was mostly needed so that we can update hugetlb for radix easily. https://github.com/kvaneesh/linux/commit/31e7c351e8a2fd9e347f9837e9e765b648786e7a I haven't got that change on top of my current tree yet. IMHO I found the cleanup easier. I agree that I was focusing mostly on book3s and was trying to keep nonhash as close to the original version. I can move this cleanup as a separate series if it is confusing to look at it related to radix preparation. If the cleanup is not helping, let me know, we can add them later along with rest of radix changes if needed. The most important change in this series that we need to get merged in the next merge window is the 4 level page table. Rest of the patches are all code movement/cleanups. -aneesh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V4 15/18] powerpc/mm: Move hash page table related functions to pgtable-hash64.c
Scott Wood writes: Hi Scott, Thanks for testing this series. > On Tue, 2016-02-23 at 10:18 +0530, Aneesh Kumar K.V wrote: >> >> diff --git a/arch/powerpc/mm/pgtable-book3e.c b/arch/powerpc/mm/pgtable >> -book3e.c >> new file mode 100644 >> index ..bdaa4376f838 >> --- /dev/null >> +++ b/arch/powerpc/mm/pgtable-book3e.c >> @@ -0,0 +1,163 @@ >> + >> +/* >> + * Copyright IBM Corporation, 2015 >> + * Author Aneesh Kumar K.V >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of version 2 of the GNU Lesser General Public License >> + * as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope that it would be useful, but >> + * WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. >> + * >> + */ >> + >> +/* >> + * PPC64 THP Support for hash based MMUs >> + */ > > This comment doesn't match the filename. > > Regarding the rest of the file's contents, why is this being code being > copied? The very terse changelog doesn't cover it. > While doing the cleanup/code movement, I was trying to keep nonhash code as much closer to as it is now, because there is no way for me to test it and I wanted to avoid any regression. I agree that in some case it is creating confusion like above. If we agree that some of these cleanup/code movement are helping, then I can redo the series making sure changes like above are taken care of. -aneesh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Problems with swapping in v4.5-rc on POWER
On Thu, 25 Feb 2016, Hugh Dickins wrote: > On Wed, 24 Feb 2016, Hugh Dickins wrote: > > On Thu, 25 Feb 2016, Aneesh Kumar K.V wrote: > > > > > > Can you test the impact of the merge listed below ?(ie, revert the merge > > > and see if > > > we can reproduce and also verify with merge applied). This will give us a > > > set of commits to look closer. We had quiet a lot of page table > > > related changes going in this merge window. > > > > > > f689b742f217b2ffe7 ("Pull powerpc updates from Michael Ellerman:") > > > > > > That is the merge commit that added _PAGE_PTE. > > > > Another experiment running on it at the moment, I'd like to give that > > a few more hours, and then will try the revert you suggest. But does > > that merge revert cleanly, did you try? I'm afraid of interactions, > > whether obvious or subtle, with the THP refcounting rework. Oh, since > > I don't have THP configured on, maybe I can ignore any issues from that. > > That revert worked painlessly, only a very few and simple conflicts, > I ran that under load for 12 hours, no problem seen. > > I've now checked out an f689b742 tree and started on that, just to > confirm that it fails fairly quickly I hope; and will then proceed > to git bisect, giving that as bad and 37cea93b as good. > > Given the uncertainty of whether 12 hours is really long enough to be > sure, and perhaps difficulties along the way, I don't rate my chances > of a reliable bisection higher than 60%, but we'll see. I'm sure you won't want a breathless report from me on each bisection step, but I ought to report that: contrary to our expectations, the f689b742 survived without error for 12 hours, so appears to be good. I'll bisect between there and v4.5-rc1. Hugh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 07/12] powerpc/ftrace: FTRACE_WITH_REGS implementation for ppc64le
On Thu, 2016-02-25 at 16:11 +0100, Torsten Duwe wrote: > On Thu, Feb 25, 2016 at 11:48:59AM +1100, Balbir Singh wrote: > > > @@ -608,6 +621,9 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, > > > return -ENOENT; > > > if (!restore_r2((u32 *)location + 1, me)) > > > return -ENOEXEC; > > > + /* Squash the TOC saver for profiler calls */ > > > + if (!strcmp("_mcount", strtab+sym->st_name)) > > > + SQUASH_TOC_SAVE_INSN(value); > > I don't think we need this anymore, do we? > > I'm not sure. Once a module is loaded, are all the "bl _mcount"s NOPed out > before any of its functions are run? If not, the _mcount trampoline will > be used, and it must not save R2! With dynamic ftrace, yes they are all nop'ed out before the module runs. See ftrace_module_init() called from load_module(). But with static ftrace they are just left as-is. As this series is currently written you can't enable mprofile-kernel with static ftrace. But that's a bit fragile, someone could easily send a patch to enable it for static ftrace and we'd probably merge it without thinking about this code. So I'll leave this as is for now, and we will clean it up once the series is in. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 04/12] powerpc/ftrace: Prepare for -mprofile-kernel
On Thu, 2016-02-25 at 14:52 +0100, Torsten Duwe wrote: > On Thu, Feb 25, 2016 at 01:28:27AM +1100, Michael Ellerman wrote: > > @@ -450,17 +448,44 @@ static unsigned long stub_for_addr(const Elf64_Shdr > > *sechdrs, > > return (unsigned long)&stubs[i]; > > } > > > > +#ifdef CC_USING_MPROFILE_KERNEL > > +static int is_early_mcount_callsite(u32 *instruction) > > +{ > > + /* -mprofile-kernel sequence starting with > > +* mflr r0 and maybe std r0, LRSAVE(r1). > > +*/ > > + if ((instruction[-3] == PPC_INST_MFLR && > > +instruction[-2] == PPC_INST_STD_LR) || > > + instruction[-2] == PPC_INST_MFLR) { > > + /* Nothing to be done here, it's an _mcount > > +* call location and r2 will have to be > > +* restored in the _mcount function. > > +*/ > > + return 1; > > + } > > + return 0; > > +} > > +#else > > *You* said this might page fault :) It does :) - I fixed it up in patch 6, sorry I realise that's hard to review. > Did we agree yet whether we insist on a streamlined compiler? > (GCC commit e95d0248dace required)? No we didn't really. I think you're right that it's not /too/ hard to support both sequences. But we may change our mind in future :) cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 03/12] powerpc/module: Create a special stub for ftrace_caller()
On Thu, 2016-02-25 at 14:31 +0100, Torsten Duwe wrote: > On Thu, Feb 25, 2016 at 11:08:54AM +1100, Balbir Singh wrote: > > How about some comments on r2 > > r2 is still pointing to the module's toc, will be saved by ftrace_caller > > and restored by the instruction following bl ftrace_caller (after patching > > _mcount/nop) > > To be precise: ftrace_caller needs to save _and_ restore r2 in case of > -mprofile-kernel. > > > + /* Stub uses address relative to kernel_toc */ > > > + reladdr = (unsigned long)ftrace_caller - get_paca()->kernel_toc; Yeah true. I originally wrote it with the address of ftrace_caller passed as an argument, so it had to be computed at runtime, and getting it from the paca is ~= to getting it out of the GOT. > kernel_toc is a compile time constant; do you really want to look it up in > memory at runtime each time? It's a bit tricky to get the +- 0x8000 right > OTOH... > > I wrote: > extern unsigned long __toc_start; > reladdr = addr - ((unsigned long)(&__toc_start) + 0x8000UL); > > looks a bit odd, but evaluates to a constant for ftrace_caller. Yeah that makes sense. I'll add a helper to do the + 32k. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 02/12] powerpc/module: Mark module stubs with a magic value
On Thu, 2016-02-25 at 14:17 +0100, Torsten Duwe wrote: > On Thu, Feb 25, 2016 at 01:28:25AM +1100, Michael Ellerman wrote: > > > > We can make that process easier by marking the generated stubs with a > > magic value, and then looking for that magic value. Altough this is not > > as rigorous as the current method, I believe it is sufficient in > > practice. > > The actual magic value is sort of debatable; it should be "improbable" > enough. But this can be changed easily, for each kernel compile, even. Yeah. Given the locations we're trying to patch are computed in the first place from the mcount call sites, I feel like we don't need to be super paranoid here. The only time I've heard of this code (the current version) tripping up is when folks are hacking on ftrace. > > Signed-off-by: Michael Ellerman > Reviewed-by: Torsten Duwe > > [for reference:] > > +#define STUB_MAGIC 0x73747562 /* stub */ Which is one of: ori r21,r19,29811 andi. r20,r27,30050 Both of which are pretty improbable. They don't appear in any kernel I have around here. I have more plans for this code, which would hopefully mean we can get rid of the magic checking entirely. But I think this is OK for now. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v10 0/2] cpufreq: powernv: Export throttle stat attributes and a bug fix
Patch[1] solves a bug in the module{init/exit} path and Patch[2] exports the throttle stats for the chip. This patchset is on top of linux-pm/linux-next. Changes from v9: - Patch[1/2] is newly added to correctly handle error path in powernv_cpufreq_init() and unregistration path in powernv_cpufreq_exit() as suggested by Viresh. - Patch[2/2] is rebased on top of Patch[1/2]. Shilpasri G Bhat (2): cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit} cpufreq: powernv: Add sysfs attributes to show throttle stats Documentation/ABI/testing/sysfs-devices-system-cpu | 69 + drivers/cpufreq/powernv-cpufreq.c | 162 ++--- 2 files changed, 212 insertions(+), 19 deletions(-) -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v10 1/2] cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}
Unregister the notifiers if cpufreq_driver_register() fails in powernv_cpufreq_init(). Re-arrange the unregistration and cleanup routines in powernv_cpufreq_exit() to free all the resources after the driver has unregistered. Signed-off-by: Shilpasri G Bhat --- drivers/cpufreq/powernv-cpufreq.c | 40 --- 1 file changed, 29 insertions(+), 11 deletions(-) diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c index 1bbc10a..50bf120 100644 --- a/drivers/cpufreq/powernv-cpufreq.c +++ b/drivers/cpufreq/powernv-cpufreq.c @@ -595,6 +595,19 @@ out: return ret; } +static inline void clean_chip_info(void) +{ + kfree(chips); + kfree(core_to_chip_map); +} + +static inline void unregister_all_notifiers(void) +{ + opal_message_notifier_unregister(OPAL_MSG_OCC, +&powernv_cpufreq_opal_nb); + unregister_reboot_notifier(&powernv_cpufreq_reboot_nb); +} + static int __init powernv_cpufreq_init(void) { int rc = 0; @@ -605,30 +618,35 @@ static int __init powernv_cpufreq_init(void) /* Discover pstates from device tree and init */ rc = init_powernv_pstates(); - if (rc) { - pr_info("powernv-cpufreq disabled. System does not support PState control\n"); - return rc; - } + if (rc) + goto out; /* Populate chip info */ rc = init_chip_info(); if (rc) - return rc; + goto out; register_reboot_notifier(&powernv_cpufreq_reboot_nb); opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb); - return cpufreq_register_driver(&powernv_cpufreq_driver); + + rc = cpufreq_register_driver(&powernv_cpufreq_driver); + if (!rc) + return 0; + + pr_info("Failed to register the cpufreq driver (%d)\n", rc); + unregister_all_notifiers(); + clean_chip_info(); +out: + pr_info("Platform driver disabled. System does not support PState control\n"); + return rc; } module_init(powernv_cpufreq_init); static void __exit powernv_cpufreq_exit(void) { - unregister_reboot_notifier(&powernv_cpufreq_reboot_nb); - opal_message_notifier_unregister(OPAL_MSG_OCC, -&powernv_cpufreq_opal_nb); - kfree(chips); - kfree(core_to_chip_map); cpufreq_unregister_driver(&powernv_cpufreq_driver); + unregister_all_notifiers(); + clean_chip_info(); } module_exit(powernv_cpufreq_exit); -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v10 2/2] cpufreq: powernv: Add sysfs attributes to show throttle stats
Create sysfs attributes to export throttle information in /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats directory. The newly added sysfs files are as follows: 1)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat 2)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub-turbo_stat 3)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/unthrottle 4)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/powercap 5)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overtemp 6)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/supply_fault 7)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overcurrent 8)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/occ_reset Detailed explanation of each attribute is added to Documentation/ABI/testing/sysfs-devices-system-cpu CC: linux-...@vger.kernel.org Signed-off-by: Shilpasri G Bhat --- Changes from v9: - Modified documentation. - s/throttle_nominal/throttle_sub_turbo Changes from v8: - Moved the sysfs attributes from cpu/cpufreq/chipX to cpuX/cpufreq/throttle_stats - Adhering to one-value-per-file, replace throttle_table with multiple sysfs files. - Using CPUFREQ_POLICY_NOTIFIER to add/remove attribute_group. Documentation/ABI/testing/sysfs-devices-system-cpu | 69 +++ drivers/cpufreq/powernv-cpufreq.c | 127 +++-- 2 files changed, 187 insertions(+), 9 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index b683e8e..eaa2c87 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -271,3 +271,72 @@ Description: Parameters for the CPU cache attributes - WriteBack: data is written only to the cache line and the modified cache line is written to main memory only when it is replaced + +What: /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats + /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat + /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub_turbo_stat + /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/unthrottle + /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/powercap + /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overtemp + /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/supply_fault + /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overcurrent + /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/occ_reset +Date: Feb 2016 +Contact: Linux kernel mailing list + Linux for PowerPC mailing list +Description: POWERNV CPUFreq driver's frequency throttle stats directory and + attributes + + 'cpuX/cpufreq/throttle_stats' directory contains the CPU frequency + throttle stat attributes for the chip. The throttle stats of a cpu + is common across all the cpus belonging to a chip. Below are the + throttle attributes exported in the 'throttle_stats' directory: + + - turbo_stat : This file gives the total number of times the max + frequency is throttled to lower frequency in turbo (at and above + nominal frequency) range of frequencies. + + - sub_turbo_stat : This file gives the total number of times the + max frequency is throttled to lower frequency in sub-turbo(below + nominal frequency) range of frequencies. + + - unthrottle : This file gives the total number of times the max + frequency is unthrottled after being throttled. + + - powercap : This file gives the total number of times the max + frequency is throttled due to 'Power Capping'. + + - overtemp : This file gives the total number of times the max + frequency is throttled due to 'CPU Over Temperature'. + + - supply_fault : This file gives the total number of times the + max frequency is throttled due to 'Power Supply Failure'. + + - overcurrent : This file gives the total number of times the + max frequency is throttled due to 'Overcurrent'. + + - occ_reset : This file gives the total number of times the max + frequency is throttled due to 'OCC Reset'. + + The sysfs attributes representing different throttle reasons like + powercap, overtemp, supply_fault, overcurrent and occ_reset map to + the reasons provided by OCC firmware for throttling the frequency. + +What: /sys/devices/system/cpu/cpufreq/policyX/throttle_stats + /sys/devices/system/cpu/cpufreq/policyX/throttle_stats/turbo_stat + /sys/devices/system/c
Re: [PATCH v10 1/2] cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}
On 26-02-16, 16:06, Shilpasri G Bhat wrote: > Unregister the notifiers if cpufreq_driver_register() fails in > powernv_cpufreq_init(). Re-arrange the unregistration and cleanup routines > in powernv_cpufreq_exit() to free all the resources after the driver > has unregistered. > > Signed-off-by: Shilpasri G Bhat > --- > drivers/cpufreq/powernv-cpufreq.c | 40 > --- > 1 file changed, 29 insertions(+), 11 deletions(-) Acked-by: Viresh Kumar -- viresh ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V3] cpufreq: qoriq: Register cooling device based on device tree
On Fri, Dec 18, 2015 at 4:32 PM, Arnd Bergmann wrote: > On Tuesday 15 December 2015 00:58:26 Rafael J. Wysocki wrote: >> On Thursday, November 26, 2015 05:21:11 PM Jia Hongtao wrote: >> > Register the qoriq cpufreq driver as a cooling device, based on the >> > thermal device tree framework. When temperature crosses the passive trip >> > point cpufreq is used to throttle CPUs. >> > >> > Signed-off-by: Jia Hongtao >> > Reviewed-by: Viresh Kumar >> >> Applied, thanks! >> > > I got a randconfig build error today: > > drivers/built-in.o: In function `qoriq_cpufreq_ready': > debugfs.c:(.text+0x1f4688): undefined reference to > `of_cpufreq_cooling_register' > > CONFIG_OF=y > CONFIG_QORIQ_CPUFREQ=y > CONFIG_THERMAL=m > CONFIG_THERMAL_OF=y > > I think you need a 'depends on THERMAL' to prevent the driver from > being built-in when THERMAL=m. Maybe this is not the best approach. The cpufreq feature itself should be working independently without thermal framework. I think we should make the qoriq_cpufreq_ready() defined as null function if THERMAL is not defined. Regards, Leo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 5/5] cpufreq: qoriq: Don't look at clock implementation details
On Fri, Sep 25, 2015 at 4:50 PM, Rafael J. Wysocki wrote: > On Friday, September 25, 2015 04:17:07 PM Scott Wood wrote: >> On Fri, 2015-09-25 at 23:42 +0200, Rafael J. Wysocki wrote: >> > On Tuesday, September 22, 2015 12:46:54 PM Viresh Kumar wrote: >> > > On 19-09-15, 23:29, Scott Wood wrote: >> > > > Get the CPU clock's potential parent clocks from the clock interface >> > > > itself, rather than manually parsing the clocks property to find a >> > > > phandle, looking at the clock-names property of that, and assuming that >> > > > those are valid parent clocks for the cpu clock. >> > > > >> > > > This is necessary now that the clocks are generated based on the clock >> > > > driver's knowledge of the chip rather than a fragile device-tree >> > > > description of the mux options. >> > > > >> > > > We can now rely on the clock driver to ensure that the mux only exposes >> > > > options that are valid. The cpufreq driver was currently being overly >> > > > conservative in some cases -- for example, the "min_cpufreq = >> > > > get_bus_freq()" restriction only applies to chips with erratum >> > > > A-004510, and whether the freq_mask used on p5020 is needed depends on >> > > > the actual frequencies of the PLLs (FWIW, p5040 has a similar >> > > > limitation but its .freq_mask was zero) -- and the frequency mask >> > > > mechanism made assumptions about particular parent clock indices that >> > > > are no longer valid. >> > > > >> > > > Signed-off-by: Scott Wood >> > > > --- >> > > > v3: was patch 1/5 and patch 4/5, plus blacklist e6500 and changes >> > > > to clk api usage >> > > > >> > > > drivers/cpufreq/qoriq-cpufreq.c | 137 >> > > > - >> > > > --- >> > > > 1 file changed, 40 insertions(+), 97 deletions(-) >> > > >> > > Acked-by: Viresh Kumar >> > >> > I'm wondering who's supposed to be merging this set? >> >> As I noted in the cover letter, I'm looking for acks so that I can apply >> these to a topic branch which can be pulled through the PPC and ARM trees, >> each of which will have patches that depend on it. > > OK, so no objections from the cpufreq side and you have the ACK from Viresh. Hi Scott, Did you drop this patch later? I can not find it in 4.5-rc still. Regards, Leo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 5/5] cpufreq: qoriq: Don't look at clock implementation details
On Fri, 2016-02-26 at 12:14 -0600, Li Yang wrote: > On Fri, Sep 25, 2015 at 4:50 PM, Rafael J. Wysocki > wrote: > > On Friday, September 25, 2015 04:17:07 PM Scott Wood wrote: > > > On Fri, 2015-09-25 at 23:42 +0200, Rafael J. Wysocki wrote: > > > > On Tuesday, September 22, 2015 12:46:54 PM Viresh Kumar wrote: > > > > > On 19-09-15, 23:29, Scott Wood wrote: > > > > > > Get the CPU clock's potential parent clocks from the clock > > > > > > interface > > > > > > itself, rather than manually parsing the clocks property to find a > > > > > > phandle, looking at the clock-names property of that, and assuming > > > > > > that > > > > > > those are valid parent clocks for the cpu clock. > > > > > > > > > > > > This is necessary now that the clocks are generated based on the > > > > > > clock > > > > > > driver's knowledge of the chip rather than a fragile device-tree > > > > > > description of the mux options. > > > > > > > > > > > > We can now rely on the clock driver to ensure that the mux only > > > > > > exposes > > > > > > options that are valid. The cpufreq driver was currently being > > > > > > overly > > > > > > conservative in some cases -- for example, the "min_cpufreq = > > > > > > get_bus_freq()" restriction only applies to chips with erratum > > > > > > A-004510, and whether the freq_mask used on p5020 is needed > > > > > > depends on > > > > > > the actual frequencies of the PLLs (FWIW, p5040 has a similar > > > > > > limitation but its .freq_mask was zero) -- and the frequency mask > > > > > > mechanism made assumptions about particular parent clock indices > > > > > > that > > > > > > are no longer valid. > > > > > > > > > > > > Signed-off-by: Scott Wood > > > > > > --- > > > > > > v3: was patch 1/5 and patch 4/5, plus blacklist e6500 and changes > > > > > > to clk api usage > > > > > > > > > > > > drivers/cpufreq/qoriq-cpufreq.c | 137 --- > > > > > > -- > > > > > > --- > > > > > > 1 file changed, 40 insertions(+), 97 deletions(-) > > > > > > > > > > Acked-by: Viresh Kumar > > > > > > > > I'm wondering who's supposed to be merging this set? > > > > > > As I noted in the cover letter, I'm looking for acks so that I can apply > > > these to a topic branch which can be pulled through the PPC and ARM > > > trees, > > > each of which will have patches that depend on it. > > > > OK, so no objections from the cpufreq side and you have the ACK from > > Viresh. > > Hi Scott, > > Did you drop this patch later? I can not find it in 4.5-rc still. I was asked to get an ack from Russell King patch 4/5, which this patch requires. Despite repeated pings, I could not get a reply from Russell King. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 01/14] powerpc: Add simple cache inhibited MMIO accessors
Add simple cache inhibited accessors for memory mapped I/O. Unlike the accessors built from the DEF_MMIO_* macros, these don't include any hardware memory barriers, callers need to manage memory barriers on their own. These can only be called in hypervisor mode. Signed-off-by: Suresh Warrier --- arch/powerpc/include/asm/io.h | 28 1 file changed, 28 insertions(+) diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index 6c1297e..d329a01 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -241,6 +241,34 @@ static inline void out_be64(volatile u64 __iomem *addr, u64 val) #endif #endif /* __powerpc64__ */ + +/* + * Simple Cache inhibited accessors + * Unlike the DEF_MMIO_* macros, these don't include any h/w memory + * barriers, callers need to manage memory barriers on their own. + */ + +static inline u32 _lwzcix(unsigned long addr) +{ + u32 ret; + + __asm__ __volatile__("lwzcix %0,0, %1" +: "=r" (ret) : "r" (addr) : "memory"); + return ret; +} + +static inline void _stbcix(u64 addr, u8 val) +{ + __asm__ __volatile__("stbcix %0,0,%1" + : : "r" (val), "r" (addr) : "memory"); +} + +static inline void _stwcix(u64 addr, u32 val) +{ + __asm__ __volatile__("stwcix %0,0,%1" + : : "r" (val), "r" (addr) : "memory"); +} + /* * Low level IO stream instructions are defined out of line for now */ -- 1.8.3.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 05/14] KVM: PPC: Book3S HV: Enable IRQ bypass
Add the irq_bypass_add_producer and irq_bypass_del_producer functions. These functions get called whenever a GSI is being defined for a guest. They create/remove the mapping between host real IRQ numbers and the guest GSI. Add the following helper functions to manage the passthrough IRQ map. kvmppc_set_passthru_irq() Creates a mapping in the passthrough IRQ map that maps a host IRQ to a guest GSI. It allocates the structure (one per guest VM) the first time it is called. kvmppc_clr_passthru_irq() Removes the passthrough IRQ map entry given a guest GSI. The passthrough IRQ map structure is not freed even when the number of mapped entries goes to zero. It is only freed when the VM is destroyed. Signed-off-by: Suresh Warrier --- arch/powerpc/kvm/book3s_hv.c | 158 ++- 1 file changed, 157 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 22d3054..4d802b8 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -55,6 +55,8 @@ #include #include #include +#include +#include #include #include "book3s.h" @@ -3246,6 +3248,8 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm) kvmppc_free_vcores(kvm); kvmppc_free_hpt(kvm); + + kvmppc_free_pimap(kvm); } /* We don't need to emulate any privileged instructions or dcbz */ @@ -3282,7 +3286,7 @@ void kvmppc_free_pimap(struct kvm *kvm) kfree(kvm->arch.pimap); } -struct kvmppc_passthru_irqmap *kvmppc_alloc_pimap(struct irq_desc *desc) +static struct kvmppc_passthru_irqmap *kvmppc_alloc_pimap(struct irq_desc *desc) { struct kvmppc_passthru_irqmap *pimap; @@ -3292,6 +3296,154 @@ struct kvmppc_passthru_irqmap *kvmppc_alloc_pimap(struct irq_desc *desc) return pimap; } + +static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) +{ + struct irq_desc *desc; + struct kvmppc_irq_map *irq_map; + struct kvmppc_passthru_irqmap *pimap; + struct irq_chip *chip; + int i; + + desc = irq_to_desc(host_irq); + if (!desc) + return -EIO; + + mutex_lock(&kvm->lock); + + if (kvm->arch.pimap == NULL) { + /* First call, allocate structure to hold IRQ map */ + pimap = kvmppc_alloc_pimap(desc); + if (pimap == NULL) { + mutex_unlock(&kvm->lock); + return -ENOMEM; + } + } else + pimap = kvm->arch.pimap; + + /* +* For now, we support only a single IRQ chip +*/ + chip = irq_data_get_irq_chip(&desc->irq_data); + if (!chip || (strcmp(chip->name, pimap->irq_chip->name) != 0)) { + pr_warn("kvmppc_set_passthru_irq_hv: Could not assign IRQ map for (%d,%d)\n", + host_irq, guest_gsi); + mutex_unlock(&kvm->lock); + return -ENOENT; + } + + if (pimap->n_mapped == KVMPPC_PIRQ_MAPPED) { + mutex_unlock(&kvm->lock); + return -EAGAIN; + } + + for (i = 0; i < pimap->n_mapped; i++) { + if (guest_gsi == pimap->mapped[i].v_hwirq) { + mutex_unlock(&kvm->lock); + return -EINVAL; + } + } + + irq_map = &pimap->mapped[pimap->n_mapped]; + + irq_map->v_hwirq = guest_gsi; + irq_map->r_hwirq = desc->irq_data.hwirq; + irq_map->desc = desc; + + pimap->n_mapped++; + + if (!kvm->arch.pimap) + kvm->arch.pimap = pimap; + + mutex_unlock(&kvm->lock); + + return 0; +} + +static int kvmppc_clr_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) +{ + struct irq_desc *desc; + struct kvmppc_passthru_irqmap *pimap; + int i; + + desc = irq_to_desc(host_irq); + if (!desc) + return -EIO; + + mutex_lock(&kvm->lock); + + if (kvm->arch.pimap == NULL) { + mutex_unlock(&kvm->lock); + return 0; + } + pimap = kvm->arch.pimap; + + WARN_ON(pimap->n_mapped < 1); + + for (i = 0; i < pimap->n_mapped; i++) { + if (guest_gsi == pimap->mapped[i].v_hwirq) + break; + } + + if (i == pimap->n_mapped) { + mutex_unlock(&kvm->lock); + return -ENODEV; + } + + /* +* Replace mapped entry to be cleared with highest entry (unless +* this is already the highest) so as to not leave any holes in +* the array of mapped. +*/ + pimap->n_mapped--; + if (i != pimap->n_mapped) + pimap->mapped[i] = pimap->mapped[pimap->n_mapped]; + + /* +* We don't free this structure even when the count goes to +* zero. The structure is freed when we destroy the VM. +*/ + + mutex_unlock(&kvm->lock)
[PATCH 00/14] PCI Passthrough Interrupt Optimizations
This patch set adds support for handling interrupts for PCI adapters entirely in the guest under the right conditions. When an interrupt is received by KVM in real mode, if the interrupt is from a PCI passthrough adapter owned by the guest, KVM will update the virtual ICP for the VCPU that is the target of the interrupt entirely in real mode and generate the virtual interrupt. If the VCPU is not running in the guest, it will wake up the VCPU. It will also update the affinity of the interrupt to directly target the CPU (core) where this VCPU is being scheduled as an optimization. KVM needs the mapping between hardware interrupt numbers in the host to the virtual hardware interrupt (GSI) that needs to get injected into the guest. This patch set takes advantage of the IRQ bypass manager feature to create this mapping. For now, we allocate and manage a separate mapping structure per VM. Although a mapping is created for every passthrough IRQ requested in the guest, we also maintain a cache of mappings that is used to speed up search. For now, KVM real mode code only looks in the cache for a mapping. If no mapping is found, we fall back on the usual interrupt routing mechanism - switch back to host and run the VFIO interrupt handler. This is based on 4.5-rc1 plus the patch set in http://www.spinics.net/lists/kvm-ppc/msg11131.html since it has dependencies on vmalloc_to_phys() being public. Suresh Warrier (14): powerpc: Add simple cache inhibited MMIO accessors KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function KVM: PPC: select IRQ_BYPASS_MANAGER KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap KVM: PPC: Book3S HV: Enable IRQ bypass KVM: PPC: Book3S HV: Caching for passthrough IRQ map KVM: PPC: Book3S HV: Handle passthrough interrupts in guest KVM: PPC: Book3S HV: Complete passthrough interrupt in host KVM: PPC: Book3S HV: Enable KVM real mode handling of passthrough IRQs KVM: PPC: Book3S HV: Dump irqmap in debugfs KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode KVM: PPC: Book3S HV: Change affinity for passthrough IRQ KVM: PPC: Book3S HV: Counters for passthrough IRQ stats arch/powerpc/include/asm/io.h | 28 +++ arch/powerpc/include/asm/kvm_asm.h| 10 + arch/powerpc/include/asm/kvm_book3s.h | 1 + arch/powerpc/include/asm/kvm_host.h | 25 +++ arch/powerpc/include/asm/kvm_ppc.h| 28 +++ arch/powerpc/include/asm/pnv-pci.h| 1 + arch/powerpc/kvm/Kconfig | 2 + arch/powerpc/kvm/book3s.c | 45 + arch/powerpc/kvm/book3s_hv.c | 318 +- arch/powerpc/kvm/book3s_hv_builtin.c | 157 +++ arch/powerpc/kvm/book3s_hv_rm_xics.c | 181 + arch/powerpc/kvm/book3s_hv_rmhandlers.S | 226 - arch/powerpc/kvm/book3s_xics.c| 68 ++- arch/powerpc/kvm/book3s_xics.h| 3 + arch/powerpc/platforms/powernv/pci-ioda.c | 14 +- 15 files changed, 1013 insertions(+), 94 deletions(-) -- 1.8.3.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 06/14] KVM: PPC: Book3S HV: Caching for passthrough IRQ map
Add the following functions to support caching of the IRQ mapped entries: kvmppc_cache_passthru_irq() Caches an existing mapping in the cached array. _uncache_passthru_irq() Uncaches a cached entry. This is an internal function and is only invoked when unmapping a passthrough IRQ mapping. There is no support to just uncache an entry. Signed-off-by: Suresh Warrier --- arch/powerpc/include/asm/kvm_ppc.h | 1 + arch/powerpc/kvm/book3s_hv.c | 97 ++ 2 files changed, 98 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 75d4c64..4107f7f 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -291,6 +291,7 @@ struct kvmppc_ops { struct irq_bypass_producer *); void (*irq_bypass_del_producer)(struct irq_bypass_consumer *, struct irq_bypass_producer *); + int (*cache_passthru_irq)(struct kvm *kvm, int irq); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 4d802b8..97150f0 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3297,6 +3297,95 @@ static struct kvmppc_passthru_irqmap *kvmppc_alloc_pimap(struct irq_desc *desc) return pimap; } +/* + * Cache a passthrough IRQ + * This is accomplished by copying the IRQ details from the + * mapped array to the cached array. + * + * Return: + * 0: if this was accomplished successfully + * 1: if the caching could not be done + */ +static int kvmppc_cache_passthru_irq_hv(struct kvm *kvm, int irq) +{ + struct kvmppc_passthru_irqmap *pimap; + int cidx, midx; + + mutex_lock(&kvm->lock); + + if (kvm->arch.pimap == NULL) + goto err_out; + + pimap = kvm->arch.pimap; + + /* Look for first empty slot */ + for (cidx = 0; cidx < KVMPPC_PIRQ_CACHED; cidx++) + if (pimap->cached[cidx].r_hwirq == 0) + break; + + /* Out of empty cache slots */ + if (cidx == KVMPPC_PIRQ_CACHED) + goto err_out; + + /* Find entry in the mapped array */ + for (midx = 0; midx < pimap->n_mapped; midx++) { + if (irq == pimap->mapped[midx].v_hwirq) + break; + } + + /* IRQ not found */ + if (midx == pimap->n_mapped) + goto err_out; + + if (pimap->mapped[midx].r_hwirq == 0) + /* Someone beat us to caching the IRQ */ + goto err_out; + + pimap->cached[cidx].v_hwirq = pimap->mapped[midx].v_hwirq; + pimap->cached[cidx].desc = pimap->mapped[midx].desc; + pimap->cached[cidx].r_hwirq = pimap->mapped[midx].r_hwirq; + + if (cidx >= pimap->n_cached) + pimap->n_cached = cidx + 1; + + /* r_hwirq == 0 in mapped array to indicate a cached IRQ */ + pimap->mapped[midx].r_hwirq = 0; + + mutex_unlock(&kvm->lock); + return 0; + +err_out: + mutex_unlock(&kvm->lock); + return 1; +} + +/* Called with kvm->lock already acquired */ +static void _uncache_passthru_irq(struct kvmppc_passthru_irqmap *pimap, int irq) +{ + int i; + + for (i = 0; i < pimap->n_cached; i++) { + if (irq == pimap->cached[i].v_hwirq) { + + /* +* Zero out the IRQ being uncached. +*/ + pimap->cached[i].r_hwirq = 0; + pimap->cached[i].v_hwirq = 0; + pimap->cached[i].desc = NULL; + + /* +* Only need to decrement maximum cached count if +* this is the highest entry being uncached. +*/ + if (i + 1 == pimap->n_cached) + pimap->n_cached--; + return; + } + } + +} + static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) { struct irq_desc *desc; @@ -3391,6 +3480,13 @@ static int kvmppc_clr_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) } /* +* If this is a cached IRQ, remove it from the cached array also +* mapped.r_hwirq is set to zero when we cache an entry. +*/ + if (!pimap->mapped[i].r_hwirq) + _uncache_passthru_irq(pimap, guest_gsi); + + /* * Replace mapped entry to be cleared with highest entry (unless * this is already the highest) so as to not leave any holes in * the array of mapped. @@ -3567,6 +3663,7 @@ static struct kvmppc_ops kvm_ops_hv = { #ifdef CONFIG_KVM_XICS .irq_bypass_add_producer = kvmppc_irq_bypass_add_producer_hv, .irq_bypass_del_producer = kvmppc_irq_bypass_del_producer_hv, + .cache_pas
[PATCH 02/14] KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function
Modify kvmppc_read_intr to make it a C function. This also adds in the optimization of clearing saved_xirr in the case where we completely handle and EOI an IPI. Without this, the next device interrupt will require two trips through the host interrupt handling code. Signed-off-by: Suresh Warrier --- arch/powerpc/kvm/book3s_hv_builtin.c| 84 +++ arch/powerpc/kvm/book3s_hv_rmhandlers.S | 184 +--- 2 files changed, 179 insertions(+), 89 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 5f0380d..5db386a 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -25,6 +25,7 @@ #include #include #include +#include #define KVM_CMA_CHUNK_ORDER18 @@ -286,3 +287,86 @@ void kvmhv_commence_exit(int trap) struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv; EXPORT_SYMBOL_GPL(kvmppc_host_rm_ops_hv); + +/* + * Determine what sort of external interrupt is pending (if any). + * Returns: + * 0 if no interrupt is pending + * 1 if an interrupt is pending that needs to be handled by the host + * -1 if there was a guest wakeup IPI (which has now been cleared) + */ + +long kvmppc_read_intr(struct kvm_vcpu *vcpu, int path) +{ + unsigned long xics_phys; + u32 h_xirr; + __be32 xirr; + u32 xisr; + u8 host_ipi; + + /* see if a host IPI is pending */ + host_ipi = local_paca->kvm_hstate.host_ipi; + if (host_ipi) + return 1; + + /* Now read the interrupt from the ICP */ + xics_phys = local_paca->kvm_hstate.xics_phys; + if (unlikely(!xics_phys)) + return 1; + + /* +* Save XIRR for later. Since we get control in reverse endian +* on LE systems, save it byte reversed and fetch it back in +* host endian. Note that xirr is the value read from the +* XIRR register, while h_xirr is the host endian version. +*/ + xirr = _lwzcix(xics_phys + XICS_XIRR); + h_xirr = be32_to_cpu(xirr); + local_paca->kvm_hstate.saved_xirr = h_xirr; + xisr = h_xirr & 0xff; + /* +* Ensure that the store/load complete to guarantee all side +* effects of loading from XIRR has completed +*/ + smp_mb(); + + /* if nothing pending in the ICP */ + if (!xisr) + return 0; + + /* We found something in the ICP... +* +* If it is an IPI, clear the MFRR and EOI it. +*/ + if (xisr == XICS_IPI) { + _stbcix(xics_phys + XICS_MFRR, 0xff); + _stwcix(xics_phys + XICS_XIRR, xirr); + /* +* Need to ensure side effects of above stores +* complete before proceeding. +*/ + smp_mb(); + + /* +* We need to re-check host IPI now in case it got set in the +* meantime. If it's clear, we bounce the interrupt to the +* guest +*/ + host_ipi = local_paca->kvm_hstate.host_ipi; + if (unlikely(host_ipi != 0)) { + /* We raced with the host, +* we need to resend that IPI, bummer +*/ + _stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY); + /* Let side effects complete */ + smp_mb(); + return 1; + } + + /* OK, it's an IPI for us */ + local_paca->kvm_hstate.saved_xirr = 0; + return -1; + } + + return 1; +} diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index ed16182..29e6a8a 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -220,6 +220,13 @@ kvmppc_primary_no_guest: li r3, 0 /* Don't wake on privileged (OS) doorbell */ b kvm_do_nap +/* + * kvm_novcpu_wakeup + * Entered from kvm_start_guest if kvm_hstate.napping is set + * to NAPPING_NOVCPU + * r2 = kernel TOC + * r13 = paca + */ kvm_novcpu_wakeup: ld r1, HSTATE_HOST_R1(r13) ld r5, HSTATE_KVM_VCORE(r13) @@ -227,8 +234,18 @@ kvm_novcpu_wakeup: stb r0, HSTATE_NAPPING(r13) /* check the wake reason */ + ld r3, HSTATE_KVM_VCPU(r13) bl kvmppc_check_wake_reason + /* +* Restore volatile registers since we could have called +* a C routine in kvmppc_check_wake_reason. +* Wake reason (trap) is returned through r31 +* r5 = VCORE +*/ + ld r5, HSTATE_KVM_VCORE(r13) + mr r12, r31 + /* see if any other thread is already exiting */ lwz r0, VCORE_ENTRY_EXIT(r5) cmpwi r0, 0x100 @@ -320
[PATCH 04/14] KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap
This patch introduces an IRQ mapping structure, the kvmppc_passthru_irqmap structure that is to be used to map the real hardware IRQ in the host with the virtual hardware IRQ (gsi) that is injected into a guest by KVM for passthrough adapters. Currently, we assume a separate IRQ mapping structure for each guest. Each kvmppc_passthru_irqmap has two mapping arrays, the mapped array contains all defined real<->virtual IRQs, the cached array can be used to limit the real<->virtual IRQs to a smaller subset, like a cache of the most frequently used mappings. Signed-off-by: Suresh Warrier --- arch/powerpc/include/asm/kvm_host.h | 21 + arch/powerpc/include/asm/kvm_ppc.h | 15 +++ arch/powerpc/kvm/book3s_hv.c| 19 +++ 3 files changed, 55 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index ffdbc2d..fc10248 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -191,6 +191,8 @@ struct kvmppc_spapr_tce_table { struct kvmppc_xics; struct kvmppc_icp; +struct kvmppc_passthru_irqmap; + /* * The reverse mapping array has one entry for each HPTE, * which stores the guest's view of the second word of the HPTE @@ -261,6 +263,7 @@ struct kvm_arch { #endif #ifdef CONFIG_KVM_XICS struct kvmppc_xics *xics; + struct kvmppc_passthru_irqmap *pimap; #endif struct kvmppc_ops *kvm_ops; #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE @@ -391,6 +394,24 @@ struct kvmhv_tb_accumulator { u64 tb_max; /* max time */ }; +#ifdef CONFIG_PPC_BOOK3S_64 +struct kvmppc_irq_map { + u32 r_hwirq; + u32 v_hwirq; + struct irq_desc *desc; +}; + +#defineKVMPPC_PIRQ_CACHED 16 +#defineKVMPPC_PIRQ_MAPPED 1024 +struct kvmppc_passthru_irqmap { + int n_cached; + int n_mapped; + struct irq_chip *irq_chip; + struct kvmppc_irq_map cached[KVMPPC_PIRQ_CACHED]; + struct kvmppc_irq_map mapped[KVMPPC_PIRQ_MAPPED]; +}; +#endif + # ifdef CONFIG_PPC_FSL_BOOK3E #define KVMPPC_BOOKE_IAC_NUM 2 #define KVMPPC_BOOKE_DAC_NUM 2 diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 780a017..75d4c64 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -457,8 +457,19 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu) { return vcpu->arch.irq_type == KVMPPC_IRQ_XICS; } + +static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap( + struct kvm_vcpu *vcpu) +{ + if (vcpu) + return vcpu->kvm->arch.pimap; + else + return NULL; +} + extern void kvmppc_alloc_host_rm_ops(void); extern void kvmppc_free_host_rm_ops(void); +extern void kvmppc_free_pimap(struct kvm *kvm); extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu); extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server); extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args); @@ -470,8 +481,12 @@ extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev, extern void kvmppc_xics_ipi_action(void); extern int h_ipi_redirect; #else +static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap( + struct kvm_vcpu *vcpu) + { return NULL; } static inline void kvmppc_alloc_host_rm_ops(void) {}; static inline void kvmppc_free_host_rm_ops(void) {}; +static inline void kvmppc_free_pimap(struct kvm *kvm) {}; static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu) { return 0; } static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { } diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index f47fffe..22d3054 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3275,6 +3275,25 @@ static int kvmppc_core_check_processor_compat_hv(void) return 0; } +#ifdef CONFIG_KVM_XICS + +void kvmppc_free_pimap(struct kvm *kvm) +{ + kfree(kvm->arch.pimap); +} + +struct kvmppc_passthru_irqmap *kvmppc_alloc_pimap(struct irq_desc *desc) +{ + struct kvmppc_passthru_irqmap *pimap; + + pimap = kzalloc(sizeof(struct kvmppc_passthru_irqmap), GFP_KERNEL); + if (pimap != NULL) + pimap->irq_chip = irq_data_get_irq_chip(&desc->irq_data); + + return pimap; +} +#endif + static long kvm_arch_vm_ioctl_hv(struct file *filp, unsigned int ioctl, unsigned long arg) { -- 1.8.3.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 03/14] KVM: PPC: select IRQ_BYPASS_MANAGER
Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set. Add the PPC producer functions for add and del producer. Signed-off-by: Suresh Warrier --- arch/powerpc/include/asm/kvm_ppc.h | 4 arch/powerpc/kvm/Kconfig | 2 ++ arch/powerpc/kvm/book3s.c | 32 3 files changed, 38 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 197a8ac..780a017 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -287,6 +287,10 @@ struct kvmppc_ops { long (*arch_vm_ioctl)(struct file *filp, unsigned int ioctl, unsigned long arg); int (*hcall_implemented)(unsigned long hcall); + int (*irq_bypass_add_producer)(struct irq_bypass_consumer *, + struct irq_bypass_producer *); + void (*irq_bypass_del_producer)(struct irq_bypass_consumer *, + struct irq_bypass_producer *); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index c2024ac..7ac0569 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -22,6 +22,8 @@ config KVM select ANON_INODES select HAVE_KVM_EVENTFD select SRCU + select IRQ_BYPASS_MANAGER + select HAVE_KVM_IRQ_BYPASS config KVM_BOOK3S_HANDLER bool diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index b34220d..2492b7e 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -35,6 +35,8 @@ #include #include #include +#include +#include #include "book3s.h" #include "trace.h" @@ -921,6 +923,36 @@ int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hcall) return kvm->arch.kvm_ops->hcall_implemented(hcall); } +/* + * irq_bypass_add_producer and irq_bypass_del_producer are only + * useful if the architecture supports PCI passthrough. + * irq_bypass_stop and irq_bypass_start are not needed and so + * kvm_ops are not defined for them. + */ +int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons, +struct irq_bypass_producer *prod) +{ + struct kvm_kernel_irqfd *irqfd = + container_of(cons, struct kvm_kernel_irqfd, consumer); + struct kvm *kvm = irqfd->kvm; + + if (kvm->arch.kvm_ops->irq_bypass_add_producer) + return kvm->arch.kvm_ops->irq_bypass_add_producer(cons, prod); + + return 0; +} + +void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons, + struct irq_bypass_producer *prod) +{ + struct kvm_kernel_irqfd *irqfd = + container_of(cons, struct kvm_kernel_irqfd, consumer); + struct kvm *kvm = irqfd->kvm; + + if (kvm->arch.kvm_ops->irq_bypass_del_producer) + kvm->arch.kvm_ops->irq_bypass_del_producer(cons, prod); +} + static int kvmppc_book3s_init(void) { int r; -- 1.8.3.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 08/14] KVM: PPC: Book3S HV: Complete passthrough interrupt in host
In existing real mode ICP code, when updating the virtual ICP state, if there is a required action that cannot be completely handled in real mode, as for instance, a VCPU needs to be woken up, flags are set in the ICP to indicate the required action. This is checked when returning from hypercalls to decide whether the call needs switch back to the host where the action can be performed in virtual mode. Note that if h_ipi_redirect is enabled, real mode code will first try to message a free host CPU to complete this job instead of returning the host to do it ourselves. Currently, the real mode PCI passthrough interrupt handling code checks if any of these flags are set and simply returns to the host. This is not good enough as the trap value (0x500) is treated as an external interrupt by the host code. It is only when the trap value is a hypercall that the host code searches for and acts on unfinished work by calling kvmppc_xics_rm_complete. This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD which is returned by KVM if there is unfinished business to be completed in host virtual mode after handling a PCI passthrough interrupt. The host checks for this special interrupt condition and calls into the kvmppc_xics_rm_complete, which is made an exported function for this reason. Signed-off-by: Suresh Warrier --- arch/powerpc/include/asm/kvm_asm.h | 10 ++ arch/powerpc/include/asm/kvm_ppc.h | 3 +++ arch/powerpc/kvm/book3s_hv.c| 8 +++- arch/powerpc/kvm/book3s_hv_builtin.c| 1 + arch/powerpc/kvm/book3s_hv_rm_xics.c| 2 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 26 ++ arch/powerpc/kvm/book3s_xics.c | 3 ++- 7 files changed, 50 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index 5bca220..05cabed 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -105,6 +105,15 @@ #define BOOK3S_INTERRUPT_FAC_UNAVAIL 0xf60 #define BOOK3S_INTERRUPT_H_FAC_UNAVAIL 0xf80 +/* book3s_hv */ + +/* + * Special trap used to indicate to host that this is a + * passthrough interrupt that could not be handled + * completely in the guest. + */ +#define BOOK3S_INTERRUPT_HV_RM_HARD0x + #define BOOK3S_IRQPRIO_SYSTEM_RESET0 #define BOOK3S_IRQPRIO_DATA_SEGMENT1 #define BOOK3S_IRQPRIO_INST_SEGMENT2 @@ -136,6 +145,7 @@ #define RESUME_FLAG_NV (1<<0) /* Reload guest nonvolatile state? */ #define RESUME_FLAG_HOST(1<<1) /* Resume host? */ #define RESUME_FLAG_ARCH1 (1<<2) +#define RESUME_FLAG_ARCH2 (1<<3) #define RESUME_GUEST0 #define RESUME_GUEST_NV RESUME_FLAG_NV diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index c5c7386..b19bb30 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -474,6 +474,7 @@ static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap( extern void kvmppc_alloc_host_rm_ops(void); extern void kvmppc_free_host_rm_ops(void); extern void kvmppc_free_pimap(struct kvm *kvm); +extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall); extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu); extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server); extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args); @@ -491,6 +492,8 @@ static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap( static inline void kvmppc_alloc_host_rm_ops(void) {}; static inline void kvmppc_free_host_rm_ops(void) {}; static inline void kvmppc_free_pimap(struct kvm *kvm) {}; +static inline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall) + { return 0; } static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu) { return 0; } static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { } diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 8504a5d..cc5aea96 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -70,6 +70,8 @@ /* Used to indicate that a guest page fault needs to be handled */ #define RESUME_PAGE_FAULT (RESUME_GUEST | RESUME_FLAG_ARCH1) +/* Used to indicate that a guest passthrough interrupt needs to be handled */ +#define RESUME_PASSTHROUGH (RESUME_GUEST | RESUME_FLAG_ARCH2) /* Used as a "null" value for timebase values */ #define TB_NIL (~(u64)0) @@ -991,6 +993,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_core_queue_program(vcpu, SRR1_PROGILL); r = RESUME_GUEST; break; + case BOOK3S_INTERRUPT_HV_RM_HARD: + r = RESUME_PASSTHROUGH; + break; default: kvmppc_dump_regs(vcpu); printk(KERN_EMERG "trap=0x%x | pc=0x%lx
[PATCH 07/14] KVM: PPC: Book3S HV: Handle passthrough interrupts in guest
Currently, KVM switches back to the host to handle any external interrupt (when the interrupt is received while running in the guest). This patch updates real-mode KVM to check if an interrupt is generated by a passthrough adapter that is owned by this guest. If so, the real mode KVM will directly inject the corresponding virtual interrupt to the guest VCPU's ICS and also EOI the interrupt in hardware. In short, the interrupt is handled entirely in real mode in the guest context without switching back to the host. In some rare cases, the interrupt cannot be completely handled in real mode, for instance, a VCPU that is sleeping needs to be woken up. In this case, KVM simply switches back to the host with trap reason set to 0x500. This works, but it is clearly not very efficient. A following patch will distinguish this case and handle it correctly in the host. Note that we can use the existing check_too_hard() routine even though we are not in a hypercall to determine if there is unfinished business that needs to be completed in host virtual mode. The patch assumes that the mapping between hardware interrupt IRQ and virtual IRQ to be injected to the guest already exists for the PCI passthrough interrupts that need to be handled in real mode. If the mapping does not exist, KVM falls back to the default existing behavior. The KVM real mode code only reads mappings from the cached array in the passthrough IRQ map. The caching code fails if there are no more cache slots available and the uncaching code is only called a mapping is removed. We also carefully orders the loads and stores of the fields in the kvmppc_irq_map data structure using memory barriers to avoid an inconsistent mapping being seen by the reader. Thus, although it is possible to miss a cache entry, it is not possible to read a stale value. One additional complication involves HOT plugging of SRIOV functions. If a SRIOV function gets removed and then re-added through HOT plug to the same guest, it is possible for the HW IRQ to be assigned a new value for the guest GSI. To ensure that the KVM real mode handlers do not read a stale value for this case, we call kick_all_cpus_sync() after unmapping which does not return until every vcpu executing in the guest has come back to the host at least once. Signed-off-by: Suresh Warrier --- arch/powerpc/include/asm/kvm_ppc.h| 3 ++ arch/powerpc/include/asm/pnv-pci.h| 1 + arch/powerpc/kvm/book3s_hv.c | 21 ++ arch/powerpc/kvm/book3s_hv_builtin.c | 64 +++ arch/powerpc/kvm/book3s_hv_rm_xics.c | 44 + arch/powerpc/kvm/book3s_hv_rmhandlers.S | 16 arch/powerpc/platforms/powernv/pci-ioda.c | 14 +-- 7 files changed, 160 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 4107f7f..c5c7386 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -226,6 +226,9 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority); extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq); extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq); +extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr, +struct kvmppc_irq_map *irq_map, +struct kvmppc_passthru_irqmap *pimap); void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu); void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h index 6f77f71..f0564ee 100644 --- a/arch/powerpc/include/asm/pnv-pci.h +++ b/arch/powerpc/include/asm/pnv-pci.h @@ -20,6 +20,7 @@ int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num); void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num); int pnv_cxl_get_irq_count(struct pci_dev *dev); struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev); +int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq); #ifdef CONFIG_CXL_BASE int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs, diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 97150f0..8504a5d 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3343,6 +3343,12 @@ static int kvmppc_cache_passthru_irq_hv(struct kvm *kvm, int irq) pimap->cached[cidx].v_hwirq = pimap->mapped[midx].v_hwirq; pimap->cached[cidx].desc = pimap->mapped[midx].desc; + + /* +* Order the above two stores before the next to serialize with +* the KVM real mode handler. +*/ + smp_wmb(); pimap->cached[cidx].r_hwirq = pimap->mapped[midx].r_hwirq; if (cidx >= pimap->n_cached) @@ -3369,6 +3375,10 @@ static void _uncache_passthru_irq(struct kvmppc_passthru_irqmap *pimap, int irq) /*
[PATCH 09/14] KVM: PPC: Book3S HV: Enable KVM real mode handling of passthrough IRQs
The KVM real mode passthrough handling code only searches for "cached" IRQ maps in the passthrough IRQ map when checking for passthrough IRQs that can be redirected to the guest. This patch enables KVM real mode handling of passthrough IRQs by turning on caching of selected passthrough IRQs. Currently, we follow a simple method and cache any passthrough IRQ when its virtual IRQ is first injected into the guest. Since we have a limit of 16 cache entries per guest, this will limit passthrough IRQs that are handled in KVM real mode to 16. This should work well for the general case for VMs with small number of passthrough adapters or SRIOV VFs. In the future, we can increase the number of cached entries, but we would then need to come up with faster search/filtering mechanisms for an IRQ in the map of cached passthrough IRQs. Signed-off-by: Suresh Warrier --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/include/asm/kvm_ppc.h | 2 ++ arch/powerpc/kvm/book3s.c | 10 + arch/powerpc/kvm/book3s_hv.c| 4 arch/powerpc/kvm/book3s_xics.c | 41 + arch/powerpc/kvm/book3s_xics.h | 2 ++ 6 files changed, 60 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index fc10248..558d195 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -63,6 +63,7 @@ extern int kvm_unmap_hva_range(struct kvm *kvm, extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end); extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); +extern int kvmppc_cache_passthru_irq(struct kvm *kvm, int guest_gsi); static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm, unsigned long address) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index b19bb30..93531cc 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -484,6 +484,8 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval); extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu, u32 cpu); extern void kvmppc_xics_ipi_action(void); +extern void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long irq); +extern void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long irq); extern int h_ipi_redirect; #else static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap( diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 2492b7e..1b4f5bd 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -953,6 +953,16 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons, kvm->arch.kvm_ops->irq_bypass_del_producer(cons, prod); } +int kvmppc_cache_passthru_irq(struct kvm *kvm, int irq) +{ + int r = 0; + + if (kvm->arch.kvm_ops->cache_passthru_irq) + r = kvm->arch.kvm_ops->cache_passthru_irq(kvm, irq); + + return r; +} + static int kvmppc_book3s_init(void) { int r; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index cc5aea96..487657f 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3468,6 +3468,8 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) pimap->n_mapped++; + kvmppc_xics_set_mapped(kvm, guest_gsi); + if (!kvm->arch.pimap) kvm->arch.pimap = pimap; @@ -3522,6 +3524,8 @@ static int kvmppc_clr_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) if (i != pimap->n_mapped) pimap->mapped[i] = pimap->mapped[pimap->n_mapped]; + kvmppc_xics_clr_mapped(kvm, guest_gsi); + /* * We don't free this structure even when the count goes to * zero. The structure is freed when we destroy the VM. diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c index be23f88..b90570c 100644 --- a/arch/powerpc/kvm/book3s_xics.c +++ b/arch/powerpc/kvm/book3s_xics.c @@ -88,6 +88,18 @@ static int ics_deliver_irq(struct kvmppc_xics *xics, u32 irq, u32 level) return -EINVAL; /* +* If this is a mapped passthrough IRQ that is not cached, +* add this to the IRQ cached map so that real mode KVM +* will redirect this directly to the guest where possible. +* Currently, we will cache a passthrough IRQ the first time +* we inject it into the guest. +*/ + if (state->pmapped && !state->pcached) { + if (kvmppc_cache_passthru_irq(xics->kvm, irq) == 0) + state->pcached = 1; + } + + /* * We set state->asserted locklessly. This should be fine as * we ar
[PATCH 10/14] KVM: PPC: Book3S HV: Dump irqmap in debugfs
Dump the passthrough irqmap structure associated with a guest as part of /sys/kernel/debug/powerpc/kvm-xics-*. Signed-off-by: Suresh Warrier --- arch/powerpc/kvm/book3s_xics.c | 17 + 1 file changed, 17 insertions(+) diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c index b90570c..855d669 100644 --- a/arch/powerpc/kvm/book3s_xics.c +++ b/arch/powerpc/kvm/book3s_xics.c @@ -905,6 +905,21 @@ EXPORT_SYMBOL_GPL(kvmppc_xics_hcall); /* -- Initialisation code etc. -- */ +static void xics_debugfs_irqmap(struct seq_file *m, + struct kvmppc_passthru_irqmap *pimap) +{ + int i; + + if (!pimap) + return; + seq_printf(m, "\nPIRQMAP Cache: %d maps\n===\n", + pimap->n_cached); + for (i = 0; i < pimap->n_cached; i++) { + seq_printf(m, "r_hwirq=%x, v_hwirq=%x\n", + pimap->cached[i].r_hwirq, pimap->cached[i].v_hwirq); + } +} + static int xics_debug_show(struct seq_file *m, void *private) { struct kvmppc_xics *xics = m->private; @@ -926,6 +941,8 @@ static int xics_debug_show(struct seq_file *m, void *private) t_check_resend = 0; t_reject = 0; + xics_debugfs_irqmap(m, kvm->arch.pimap); + seq_printf(m, "=\nICP state\n=\n"); kvm_for_each_vcpu(i, vcpu, kvm) { -- 1.8.3.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 12/14] KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode
When a passthrough IRQ is handled completely within KVM real mode code, it has to also update the IRQ stats since this does not go through the generic IRQ handling code. However, the per CPU kstat_irqs field is an allocated (not static) field and so cannot be directly accessed in real mode safely. The function this_cpu_inc_rm() is introduced to safely increment per CPU fields (currently coded for unsigned integers only) that are allocated and could thus be vmalloced also. Signed-off-by: Suresh Warrier --- arch/powerpc/kvm/book3s_hv_rm_xics.c | 50 1 file changed, 50 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 97a09c2..f33c7cc 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -18,6 +19,7 @@ #include #include #include +#include #include #include @@ -734,6 +736,53 @@ static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr) _stwcix(xics_phys + XICS_XIRR, xirr); } +/* + * Increment a per-CPU 32-bit unsigned integer variable. + * Safe to call in real-mode. Handles vmalloc'ed addresses + * + * ToDo: Make this work for any integral type + */ + +static inline void this_cpu_inc_rm(unsigned int __percpu *addr) +{ + unsigned long l; + unsigned int *raddr; + int cpu = smp_processor_id(); + + raddr = per_cpu_ptr(addr, cpu); + l = (unsigned long)raddr; + + if (REGION_ID(l) == VMALLOC_REGION_ID) { + l = vmalloc_to_phys(raddr); + raddr = (unsigned int *)l; + } + ++*raddr; +} + +/* + * We don't try to update the flags in the irq_desc 'istate' field in + * here as would happen in the normal IRQ handling path for several reasons: + * - state flags represent internal IRQ state and are not expected to be + *updated outside the IRQ subsystem + * - more importantly, these are useful for edge triggered interrupts, + *IRQ probing, etc., but we are only handling MSI/MSIx interrupts here + *and these states shouldn't apply to us. + * + * However, we do update irq_stats - we somewhat duplicate the code in + * kstat_incr_irqs_this_cpu() for this since this function is defined + * in irq/internal.h which we don't want to include here. + * The only difference is that desc->kstat_irqs is an allocated per CPU + * variable and could have been vmalloc'ed, so we can't directly + * call __this_cpu_inc() on it. The kstat structure is a static + * per CPU variable and it should be accessible by real-mode KVM. + * + */ +static void kvmppc_rm_handle_irq_desc(struct irq_desc *desc) +{ + this_cpu_inc_rm(desc->kstat_irqs); + __this_cpu_inc(kstat.irqs_sum); +} + long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr, struct kvmppc_irq_map *irq_map, @@ -747,6 +796,7 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, xics = vcpu->kvm->arch.xics; icp = vcpu->arch.icp; + kvmppc_rm_handle_irq_desc(irq_map->desc); icp_rm_deliver_irq(xics, icp, irq); /* EOI the interrupt */ -- 1.8.3.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 13/14] KVM: PPC: Book3S HV: Change affinity for passthrough IRQ
Change the affinity in the host for a passthrough interrupt to the hardware thread running the VCPU which has affinity to this interrupt in the guest. Since the cores run in single threaded mode on a PowerKVM host, the affinity is actually changed to the hardware thread's first sibling thread in its core. This is only done for IRQs that have been mapped for IRQ bypass since in this case if the interrupt occurs while the core is in the guest, the real mode KVM will be able to simply redirect the interrupt to the appropriate sibling hardware thread. Signed-off-by: Suresh Warrier --- arch/powerpc/kvm/book3s_hv_builtin.c | 14 +-- arch/powerpc/kvm/book3s_hv_rm_xics.c | 78 arch/powerpc/kvm/book3s_xics.c | 7 arch/powerpc/kvm/book3s_xics.h | 3 +- 4 files changed, 98 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 62252da..f95aa63 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -288,8 +288,16 @@ void kvmhv_commence_exit(int trap) struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv; EXPORT_SYMBOL_GPL(kvmppc_host_rm_ops_hv); -static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap, -u32 xisr) +/* + * This returns the v_hwirq -> r_hwirq mapping, if any, + * when the r_hwirq is passed in as input + * There is also the similar get_irqmap_gsi() routine + * defined elsewhere, which returns the mapping when passed + * the v_hwirq as input. + */ +static struct kvmppc_irq_map *get_irqmap_xisr( + struct kvmppc_passthru_irqmap *pimap, + u32 xisr) { int i; @@ -425,7 +433,7 @@ long kvmppc_read_intr(struct kvm_vcpu *vcpu, int path) */ pimap = kvmppc_get_passthru_irqmap(vcpu); if (pimap) { - irq_map = get_irqmap(pimap, xisr); + irq_map = get_irqmap_xisr(pimap, xisr); if (irq_map) { r = kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap); diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index f33c7cc..e2bbfdf 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -661,6 +661,80 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr) return check_too_hard(xics, icp); } +/* + * This returns the v_hwirq -> r_hwirq mapping, if any, + * when the v_hwirq is passed in as input + * There is also the similar get_irqmap_xisr() routine + * defined elsewhere, which returns the mapping when passed + * the r_hwirq as input. + */ + +static struct kvmppc_irq_map *get_irqmap_gsi( + struct kvmppc_passthru_irqmap *pimap, + u32 gsi) +{ + int i; + + /* +* We access this array unsafely. +* Read comments in get_irqmap_xisr for details of this +* as well as the need for the memory barrier used below. +*/ + for (i = 0; i < pimap->n_cached; i++) { + if (gsi == pimap->cached[i].v_hwirq) { + /* +* Order subsequent reads in the caller to serialize +* with the writer. +*/ + smp_rmb(); + return &pimap->cached[i]; + } + } + return NULL; +} + +unsigned long irq_map_err; + +/* + * Change affinity to CPU running the target VCPU. + */ +static void ics_set_affinity_passthru(struct ics_irq_state *state, + struct kvm_vcpu *vcpu, + u32 irq) +{ + struct kvmppc_passthru_irqmap *pimap; + struct kvmppc_irq_map *irq_map; + struct irq_data *d; + s16 intr_cpu; + u32 pcpu; + + intr_cpu = state->intr_cpu; + + if (intr_cpu == -1) + return; + + state->intr_cpu = -1; + + pcpu = cpu_first_thread_sibling(raw_smp_processor_id()); + if (intr_cpu == pcpu) + return; + + pimap = kvmppc_get_passthru_irqmap(vcpu); + if (likely(pimap)) { + irq_map = get_irqmap_gsi(pimap, irq); + if (unlikely(!irq_map)) { + irq_map_err++; + return; + } + d = irq_desc_get_irq_data(irq_map->desc); + if (unlikely(!d->chip->irq_set_affinity)) + return; + d->chip->irq_set_affinity(d, cpumask_of(pcpu), false); + } else + irq_map_err++; + +} + int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr) { struct kvmppc_xics *xics = vcpu->kvm->arch.xics; @@ -713,6 +787,10
[PATCH 14/14] KVM: PPC: Book3S HV: Counters for passthrough IRQ stats
Add VCPU stat counters to track affinity for passthrough interrupts. pthru_all: Counts all passthrough interrupts whose IRQ mappings have been cached in the kvmppc_passthru_irq_map cache. pthru_host: Counts all cached passthrough interrupts that were injected from the host through kvm_set_irq. pthru_bad_aff: Counts how many cached passthrough interrupts have bad affinity (receiving CPU is not running VCPU that is the target of the virtual interrupt in the guest). Signed-off-by: Suresh Warrier --- arch/powerpc/include/asm/kvm_host.h | 3 +++ arch/powerpc/kvm/book3s.c| 3 +++ arch/powerpc/kvm/book3s_hv_rm_xics.c | 7 +++ 3 files changed, 13 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 558d195..9230b1a 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -128,6 +128,9 @@ struct kvm_vcpu_stat { u32 ld_slow; u32 st_slow; #endif + u32 pthru_all; + u32 pthru_host; + u32 pthru_bad_aff; }; enum kvm_exit_types { diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 1b4f5bd..b3d44b1 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -65,6 +65,9 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { { "ld_slow", VCPU_STAT(ld_slow) }, { "st", VCPU_STAT(st) }, { "st_slow", VCPU_STAT(st_slow) }, + { "pthru_all", VCPU_STAT(pthru_all) }, + { "pthru_host", VCPU_STAT(pthru_host) }, + { "pthru_bad_aff", VCPU_STAT(pthru_bad_aff) }, { NULL } }; diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index e2bbfdf..4004a35 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -696,6 +696,7 @@ static struct kvmppc_irq_map *get_irqmap_gsi( unsigned long irq_map_err; /* + * Count affinity for passthrough IRQs. * Change affinity to CPU running the target VCPU. */ static void ics_set_affinity_passthru(struct ics_irq_state *state, @@ -708,17 +709,23 @@ static void ics_set_affinity_passthru(struct ics_irq_state *state, s16 intr_cpu; u32 pcpu; + vcpu->stat.pthru_all++; + intr_cpu = state->intr_cpu; if (intr_cpu == -1) return; + vcpu->stat.pthru_host++; + state->intr_cpu = -1; pcpu = cpu_first_thread_sibling(raw_smp_processor_id()); if (intr_cpu == pcpu) return; + vcpu->stat.pthru_bad_aff++; + pimap = kvmppc_get_passthru_irqmap(vcpu); if (likely(pimap)) { irq_map = get_irqmap_gsi(pimap, irq); -- 1.8.3.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 11/14] KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass
Add a module parameter kvm_irq_bypass for kvm_hv.ko to disable IRQ bypass for passthrough interrupts. The default value of this tunable is 1 - that is enable the feature. Since the tunable is used by built-in kernel code, we use the module_param_cb macro to achieve this. Signed-off-by: Suresh Warrier --- arch/powerpc/include/asm/kvm_book3s.h | 1 + arch/powerpc/include/asm/kvm_ppc.h| 2 +- arch/powerpc/kvm/book3s_hv.c | 13 + arch/powerpc/kvm/book3s_hv_rm_xics.c | 2 ++ 4 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 8f39796..8e5fac6 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -191,6 +191,7 @@ extern void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu, struct kvm_vcpu *vcpu); extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu, struct kvmppc_book3s_shadow_vcpu *svcpu); +extern int kvm_irq_bypass; static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu) { diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 93531cc..a13fd2b 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -465,7 +465,7 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu) static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap( struct kvm_vcpu *vcpu) { - if (vcpu) + if (vcpu && kvm_irq_bypass) return vcpu->kvm->arch.pimap; else return NULL; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 487657f..2d82c4d 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -91,6 +91,10 @@ static struct kernel_param_ops module_param_ops = { .get = param_get_int, }; +module_param_cb(kvm_irq_bypass, &module_param_ops, &kvm_irq_bypass, + S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(kvm_irq_bypass, "Bypass passthrough interrupt optimization"); + module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core"); @@ -3317,6 +3321,9 @@ static int kvmppc_cache_passthru_irq_hv(struct kvm *kvm, int irq) struct kvmppc_passthru_irqmap *pimap; int cidx, midx; + if (!kvm_irq_bypass) + return 1; + mutex_lock(&kvm->lock); if (kvm->arch.pimap == NULL) @@ -3421,6 +3428,9 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) struct irq_chip *chip; int i; + if (!kvm_irq_bypass) + return 0; + desc = irq_to_desc(host_irq); if (!desc) return -EIO; @@ -3484,6 +3494,9 @@ static int kvmppc_clr_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) struct kvmppc_passthru_irqmap *pimap; int i; + if (!kvm_irq_bypass) + return 0; + desc = irq_to_desc(host_irq); if (!desc) return -EIO; diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 8390c50..97a09c2 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -27,6 +27,8 @@ int h_ipi_redirect = 1; EXPORT_SYMBOL(h_ipi_redirect); +int kvm_irq_bypass = 1; +EXPORT_SYMBOL(kvm_irq_bypass); static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, u32 new_irq); -- 1.8.3.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 00/14] KVM: PPC: Book3S HV: PCI Passthrough Interrupt Optimizations
On 02/26/2016 12:40 PM, Suresh Warrier wrote: > This patch set adds support for handling interrupts for PCI adapters > entirely in the guest under the right conditions. When an interrupt > is received by KVM in real mode, if the interrupt is from a PCI > passthrough adapter owned by the guest, KVM will update the virtual > ICP for the VCPU that is the target of the interrupt entirely in > real mode and generate the virtual interrupt. If the VCPU is not > running in the guest, it will wake up the VCPU. It will also update > the affinity of the interrupt to directly target the CPU (core) > where this VCPU is being scheduled as an optimization. > > KVM needs the mapping between hardware interrupt numbers in the host > to the virtual hardware interrupt (GSI) that needs to get injected > into the guest. This patch set takes advantage of the IRQ bypass > manager feature to create this mapping. For now, we allocate and > manage a separate mapping structure per VM. > > Although a mapping is created for every passthrough IRQ requested > in the guest, we also maintain a cache of mappings that is used to > speed up search. For now, KVM real mode code only looks in the cache for > a mapping. If no mapping is found, we fall back on the usual interrupt > routing mechanism - switch back to host and run the VFIO interrupt > handler. > > This is based on 4.5-rc1 plus the patch set in > http://www.spinics.net/lists/kvm-ppc/msg11131.html since it has > dependencies on vmalloc_to_phys() being public. > > Suresh Warrier (14): > powerpc: Add simple cache inhibited MMIO accessors > KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function > KVM: PPC: select IRQ_BYPASS_MANAGER > KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap > KVM: PPC: Book3S HV: Enable IRQ bypass > KVM: PPC: Book3S HV: Caching for passthrough IRQ map > KVM: PPC: Book3S HV: Handle passthrough interrupts in guest > KVM: PPC: Book3S HV: Complete passthrough interrupt in host > KVM: PPC: Book3S HV: Enable KVM real mode handling of passthrough IRQs > KVM: PPC: Book3S HV: Dump irqmap in debugfs > KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass > KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode > KVM: PPC: Book3S HV: Change affinity for passthrough IRQ > KVM: PPC: Book3S HV: Counters for passthrough IRQ stats > > arch/powerpc/include/asm/io.h | 28 +++ > arch/powerpc/include/asm/kvm_asm.h| 10 + > arch/powerpc/include/asm/kvm_book3s.h | 1 + > arch/powerpc/include/asm/kvm_host.h | 25 +++ > arch/powerpc/include/asm/kvm_ppc.h| 28 +++ > arch/powerpc/include/asm/pnv-pci.h| 1 + > arch/powerpc/kvm/Kconfig | 2 + > arch/powerpc/kvm/book3s.c | 45 + > arch/powerpc/kvm/book3s_hv.c | 318 > +- > arch/powerpc/kvm/book3s_hv_builtin.c | 157 +++ > arch/powerpc/kvm/book3s_hv_rm_xics.c | 181 + > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 226 - > arch/powerpc/kvm/book3s_xics.c| 68 ++- > arch/powerpc/kvm/book3s_xics.h| 3 + > arch/powerpc/platforms/powernv/pci-ioda.c | 14 +- > 15 files changed, 1013 insertions(+), 94 deletions(-) > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V3] cpufreq: qoriq: Register cooling device based on device tree
On Friday 26 February 2016 12:04:59 Li Yang wrote: > On Fri, Dec 18, 2015 at 4:32 PM, Arnd Bergmann wrote: > > On Tuesday 15 December 2015 00:58:26 Rafael J. Wysocki wrote: > >> On Thursday, November 26, 2015 05:21:11 PM Jia Hongtao wrote: > >> > Register the qoriq cpufreq driver as a cooling device, based on the > >> > thermal device tree framework. When temperature crosses the passive trip > >> > point cpufreq is used to throttle CPUs. > >> > > >> > Signed-off-by: Jia Hongtao > >> > Reviewed-by: Viresh Kumar > >> > >> Applied, thanks! > >> > > > > I got a randconfig build error today: > > > > drivers/built-in.o: In function `qoriq_cpufreq_ready': > > debugfs.c:(.text+0x1f4688): undefined reference to > > `of_cpufreq_cooling_register' > > > > CONFIG_OF=y > > CONFIG_QORIQ_CPUFREQ=y > > CONFIG_THERMAL=m > > CONFIG_THERMAL_OF=y > > > > I think you need a 'depends on THERMAL' to prevent the driver from > > being built-in when THERMAL=m. > > Maybe this is not the best approach. The cpufreq feature itself > should be working independently without thermal framework. I think we > should make the qoriq_cpufreq_ready() defined as null function if > THERMAL is not defined. It already does this when CONFIG_THERMAL is not defined, and my patch doesn't change that. I'm not sure what you are asking for now. Do you want to allow using the cpufreq driver as a built-in driver even when the thermal code is in a module, and then silently skip all thermal management as if it was turned off? That would be this patch: diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h index c156f5082758..a8d9241fc1bb 100644 --- a/include/linux/cpu_cooling.h +++ b/include/linux/cpu_cooling.h @@ -48,7 +48,7 @@ cpufreq_power_cooling_register(const struct cpumask *clip_cpus, * @np: a valid struct device_node to the cooling device device tree node. * @clip_cpus: cpumask of cpus where the frequency constraints will happen */ -#ifdef CONFIG_THERMAL_OF +#if IS_REACHABLE(CONFIG_THERMAL_OF) struct thermal_cooling_device * of_cpufreq_cooling_register(struct device_node *np, const struct cpumask *clip_cpus); but my feeling is that this would cause more surprises when users find their thermal management is not active even though it was enabled in Kconfig and the thermal module gets loaded. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 5/5] cpufreq: qoriq: Don't look at clock implementation details
On Fri, Feb 26, 2016 at 12:16 PM, Scott Wood wrote: > On Fri, 2016-02-26 at 12:14 -0600, Li Yang wrote: >> On Fri, Sep 25, 2015 at 4:50 PM, Rafael J. Wysocki >> wrote: >> > On Friday, September 25, 2015 04:17:07 PM Scott Wood wrote: >> > > On Fri, 2015-09-25 at 23:42 +0200, Rafael J. Wysocki wrote: >> > > > On Tuesday, September 22, 2015 12:46:54 PM Viresh Kumar wrote: >> > > > > On 19-09-15, 23:29, Scott Wood wrote: >> > > > > > Get the CPU clock's potential parent clocks from the clock >> > > > > > interface >> > > > > > itself, rather than manually parsing the clocks property to find a >> > > > > > phandle, looking at the clock-names property of that, and assuming >> > > > > > that >> > > > > > those are valid parent clocks for the cpu clock. >> > > > > > >> > > > > > This is necessary now that the clocks are generated based on the >> > > > > > clock >> > > > > > driver's knowledge of the chip rather than a fragile device-tree >> > > > > > description of the mux options. >> > > > > > >> > > > > > We can now rely on the clock driver to ensure that the mux only >> > > > > > exposes >> > > > > > options that are valid. The cpufreq driver was currently being >> > > > > > overly >> > > > > > conservative in some cases -- for example, the "min_cpufreq = >> > > > > > get_bus_freq()" restriction only applies to chips with erratum >> > > > > > A-004510, and whether the freq_mask used on p5020 is needed >> > > > > > depends on >> > > > > > the actual frequencies of the PLLs (FWIW, p5040 has a similar >> > > > > > limitation but its .freq_mask was zero) -- and the frequency mask >> > > > > > mechanism made assumptions about particular parent clock indices >> > > > > > that >> > > > > > are no longer valid. >> > > > > > >> > > > > > Signed-off-by: Scott Wood >> > > > > > --- >> > > > > > v3: was patch 1/5 and patch 4/5, plus blacklist e6500 and changes >> > > > > > to clk api usage >> > > > > > >> > > > > > drivers/cpufreq/qoriq-cpufreq.c | 137 --- >> > > > > > -- >> > > > > > --- >> > > > > > 1 file changed, 40 insertions(+), 97 deletions(-) >> > > > > >> > > > > Acked-by: Viresh Kumar >> > > > >> > > > I'm wondering who's supposed to be merging this set? >> > > >> > > As I noted in the cover letter, I'm looking for acks so that I can apply >> > > these to a topic branch which can be pulled through the PPC and ARM >> > > trees, >> > > each of which will have patches that depend on it. >> > >> > OK, so no objections from the cpufreq side and you have the ACK from >> > Viresh. >> >> Hi Scott, >> >> Did you drop this patch later? I can not find it in 4.5-rc still. > > I was asked to get an ack from Russell King patch 4/5, which this patch > requires. Despite repeated pings, I could not get a reply from Russell King. This patch? I think you should try to get ACK from clock maintainers instead of Russell. Commit fc4a05d4b0eb ("clk: Remove unused provider APIs") removed __clk_get_num_parents() and clk_hw_get_parent_by_index(), leaving only true provider API versions that operate on struct clk_hw. qoriq-cpufreq needs these functions in order to determine the options it has for calling clk_set_parent() and thus populate the cpufreq table, so revive them as legitimate consumer APIs. Signed-off-by: Scott Wood --- v3: new patch drivers/clk/clk.c | 19 +++ include/linux/clk.h | 31 +++ 2 files changed, 50 insertions(+) Regards, Leo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 5/5] cpufreq: qoriq: Don't look at clock implementation details
On Fri, 2016-02-26 at 15:01 -0600, Li Yang wrote: > On Fri, Feb 26, 2016 at 12:16 PM, Scott Wood wrote: > > On Fri, 2016-02-26 at 12:14 -0600, Li Yang wrote: > > > On Fri, Sep 25, 2015 at 4:50 PM, Rafael J. Wysocki > > > wrote: > > > > On Friday, September 25, 2015 04:17:07 PM Scott Wood wrote: > > > > > On Fri, 2015-09-25 at 23:42 +0200, Rafael J. Wysocki wrote: > > > > > > On Tuesday, September 22, 2015 12:46:54 PM Viresh Kumar wrote: > > > > > > > On 19-09-15, 23:29, Scott Wood wrote: > > > > > > > > Get the CPU clock's potential parent clocks from the clock > > > > > > > > interface > > > > > > > > itself, rather than manually parsing the clocks property to > > > > > > > > find a > > > > > > > > phandle, looking at the clock-names property of that, and > > > > > > > > assuming > > > > > > > > that > > > > > > > > those are valid parent clocks for the cpu clock. > > > > > > > > > > > > > > > > This is necessary now that the clocks are generated based on > > > > > > > > the > > > > > > > > clock > > > > > > > > driver's knowledge of the chip rather than a fragile device > > > > > > > > -tree > > > > > > > > description of the mux options. > > > > > > > > > > > > > > > > We can now rely on the clock driver to ensure that the mux > > > > > > > > only > > > > > > > > exposes > > > > > > > > options that are valid. The cpufreq driver was currently > > > > > > > > being > > > > > > > > overly > > > > > > > > conservative in some cases -- for example, the "min_cpufreq = > > > > > > > > get_bus_freq()" restriction only applies to chips with erratum > > > > > > > > A-004510, and whether the freq_mask used on p5020 is needed > > > > > > > > depends on > > > > > > > > the actual frequencies of the PLLs (FWIW, p5040 has a similar > > > > > > > > limitation but its .freq_mask was zero) -- and the frequency > > > > > > > > mask > > > > > > > > mechanism made assumptions about particular parent clock > > > > > > > > indices > > > > > > > > that > > > > > > > > are no longer valid. > > > > > > > > > > > > > > > > Signed-off-by: Scott Wood > > > > > > > > --- > > > > > > > > v3: was patch 1/5 and patch 4/5, plus blacklist e6500 and > > > > > > > > changes > > > > > > > > to clk api usage > > > > > > > > > > > > > > > > drivers/cpufreq/qoriq-cpufreq.c | 137 --- > > > > > > > > > > > > > > > > -- > > > > > > > > --- > > > > > > > > 1 file changed, 40 insertions(+), 97 deletions(-) > > > > > > > > > > > > > > Acked-by: Viresh Kumar > > > > > > > > > > > > I'm wondering who's supposed to be merging this set? > > > > > > > > > > As I noted in the cover letter, I'm looking for acks so that I can > > > > > apply > > > > > these to a topic branch which can be pulled through the PPC and ARM > > > > > trees, > > > > > each of which will have patches that depend on it. > > > > > > > > OK, so no objections from the cpufreq side and you have the ACK from > > > > Viresh. > > > > > > Hi Scott, > > > > > > Did you drop this patch later? I can not find it in 4.5-rc still. > > > > I was asked to get an ack from Russell King patch 4/5, which this patch > > requires. Despite repeated pings, I could not get a reply from Russell > > King. > > This patch? I think you should try to get ACK from clock maintainers > instead of Russell. A clock maintainer was who asked me to get an ACK from Russell. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V3] cpufreq: qoriq: Register cooling device based on device tree
On Fri, Feb 26, 2016 at 2:20 PM, Arnd Bergmann wrote: > On Friday 26 February 2016 12:04:59 Li Yang wrote: >> On Fri, Dec 18, 2015 at 4:32 PM, Arnd Bergmann wrote: >> > On Tuesday 15 December 2015 00:58:26 Rafael J. Wysocki wrote: >> >> On Thursday, November 26, 2015 05:21:11 PM Jia Hongtao wrote: >> >> > Register the qoriq cpufreq driver as a cooling device, based on the >> >> > thermal device tree framework. When temperature crosses the passive trip >> >> > point cpufreq is used to throttle CPUs. >> >> > >> >> > Signed-off-by: Jia Hongtao >> >> > Reviewed-by: Viresh Kumar >> >> >> >> Applied, thanks! >> >> >> > >> > I got a randconfig build error today: >> > >> > drivers/built-in.o: In function `qoriq_cpufreq_ready': >> > debugfs.c:(.text+0x1f4688): undefined reference to >> > `of_cpufreq_cooling_register' >> > >> > CONFIG_OF=y >> > CONFIG_QORIQ_CPUFREQ=y >> > CONFIG_THERMAL=m >> > CONFIG_THERMAL_OF=y >> > >> > I think you need a 'depends on THERMAL' to prevent the driver from >> > being built-in when THERMAL=m. >> >> Maybe this is not the best approach. The cpufreq feature itself >> should be working independently without thermal framework. I think we >> should make the qoriq_cpufreq_ready() defined as null function if >> THERMAL is not defined. > > It already does this when CONFIG_THERMAL is not defined, and my > patch doesn't change that. I'm not sure what you are asking for now. Oh. Actually I didn't see you just sent a patch for this. I accidentally get into this topic when I tried to find out why cpufreq is not working on ARMv8 platforms. I didn't notice that of_cpufreq_cooling_register() is already ifdef-ed. > > Do you want to allow using the cpufreq driver as a built-in driver > even when the thermal code is in a module, and then silently skip > all thermal management as if it was turned off? Having thermal code to be built as module and qoriq_cpufreq to be built-in is a valid situation. Making cpufreq not possible to be used when thermal is a module doesn't seem to be right. > > That would be this patch: > > diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h > index c156f5082758..a8d9241fc1bb 100644 > --- a/include/linux/cpu_cooling.h > +++ b/include/linux/cpu_cooling.h > @@ -48,7 +48,7 @@ cpufreq_power_cooling_register(const struct cpumask > *clip_cpus, > * @np: a valid struct device_node to the cooling device device tree node. > * @clip_cpus: cpumask of cpus where the frequency constraints will happen > */ > -#ifdef CONFIG_THERMAL_OF > +#if IS_REACHABLE(CONFIG_THERMAL_OF) > struct thermal_cooling_device * > of_cpufreq_cooling_register(struct device_node *np, > const struct cpumask *clip_cpus); > > > but my feeling is that this would cause more surprises when users > find their thermal management is not active even though it was > enabled in Kconfig and the thermal module gets loaded. I don't have a perfect solution either. But I think this is still better than making cpufreq not usable. The cpufreq driver will print out an error message if thermal is not reachable. Maybe this can relief the confusion a little bit? Regards, Leo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V3] cpufreq: qoriq: Register cooling device based on device tree
On Friday 26 February 2016 17:07:09 Li Yang wrote: > > I don't have a perfect solution either. But I think this is still > better than making cpufreq not usable. The cpufreq driver will print > out an error message if thermal is not reachable. Maybe this can > relief the confusion a little bit? With my patch, the configuration will just force the cpufreq driver to be a loadable module as well if thermal is a module, so the dependency can be resolved by loading the thermal module first. I think that is really the best way around the problem, and it matches what other platforms do for the same problem. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V3] cpufreq: qoriq: Register cooling device based on device tree
On Fri, Feb 26, 2016 at 5:16 PM, Arnd Bergmann wrote: > On Friday 26 February 2016 17:07:09 Li Yang wrote: >> >> I don't have a perfect solution either. But I think this is still >> better than making cpufreq not usable. The cpufreq driver will print >> out an error message if thermal is not reachable. Maybe this can >> relief the confusion a little bit? > > With my patch, the configuration will just force the cpufreq > driver to be a loadable module as well if thermal is a module, > so the dependency can be resolved by loading the thermal module first. It would be perfect if this it true. But I tried with the following change, it just makes QORIQ_CPUFREQ non-selectable if THERMAL=m. diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index dcb972a38fbc..ca05037dd565 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -297,6 +297,7 @@ endif config QORIQ_CPUFREQ tristate "CPU frequency scaling driver for Freescale QorIQ SoCs" depends on OF && COMMON_CLK && (PPC_E500MC || ARM) + depends on !CPU_THERMAL || THERMAL=y select CLK_QORIQ help This adds the CPUFreq driver support for Freescale QorIQ SoCs Regards, Leo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V3] cpufreq: qoriq: Register cooling device based on device tree
On Fri, Feb 26, 2016 at 5:31 PM, Li Yang wrote: > On Fri, Feb 26, 2016 at 5:16 PM, Arnd Bergmann wrote: >> On Friday 26 February 2016 17:07:09 Li Yang wrote: >>> >>> I don't have a perfect solution either. But I think this is still >>> better than making cpufreq not usable. The cpufreq driver will print >>> out an error message if thermal is not reachable. Maybe this can >>> relief the confusion a little bit? >> >> With my patch, the configuration will just force the cpufreq >> driver to be a loadable module as well if thermal is a module, >> so the dependency can be resolved by loading the thermal module first. > > It would be perfect if this it true. But I tried with the following > change, it just makes QORIQ_CPUFREQ non-selectable if THERMAL=m. > > diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig > index dcb972a38fbc..ca05037dd565 100644 > --- a/drivers/cpufreq/Kconfig > +++ b/drivers/cpufreq/Kconfig > @@ -297,6 +297,7 @@ endif > config QORIQ_CPUFREQ > tristate "CPU frequency scaling driver for Freescale QorIQ SoCs" > depends on OF && COMMON_CLK && (PPC_E500MC || ARM) > + depends on !CPU_THERMAL || THERMAL=y > select CLK_QORIQ > help > This adds the CPUFreq driver support for Freescale QorIQ SoCs I find we can achieve your desired result with the following change instead: + depends on (THERMAL=m && m) || THERMAL=y || THERMAL=n Regards, Leo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v10 1/2] cpufreq: powernv: Fix bugs in powernv_cpufreq_{init/exit}
On Friday, February 26, 2016 04:15:00 PM Viresh Kumar wrote: > On 26-02-16, 16:06, Shilpasri G Bhat wrote: > > Unregister the notifiers if cpufreq_driver_register() fails in > > powernv_cpufreq_init(). Re-arrange the unregistration and cleanup routines > > in powernv_cpufreq_exit() to free all the resources after the driver > > has unregistered. > > > > Signed-off-by: Shilpasri G Bhat > > --- > > drivers/cpufreq/powernv-cpufreq.c | 40 > > --- > > 1 file changed, 29 insertions(+), 11 deletions(-) > > Acked-by: Viresh Kumar Queued up for 4.6, thanks! Rafael ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V3] cpufreq: qoriq: Register cooling device based on device tree
On Fri, 2016-02-26 at 18:04 -0600, Li Yang wrote: > On Fri, Feb 26, 2016 at 5:31 PM, Li Yang wrote: > > On Fri, Feb 26, 2016 at 5:16 PM, Arnd Bergmann wrote: > > > On Friday 26 February 2016 17:07:09 Li Yang wrote: > > > > > > > > I don't have a perfect solution either. But I think this is still > > > > better than making cpufreq not usable. The cpufreq driver will print > > > > out an error message if thermal is not reachable. Maybe this can > > > > relief the confusion a little bit? > > > > > > With my patch, the configuration will just force the cpufreq > > > driver to be a loadable module as well if thermal is a module, > > > so the dependency can be resolved by loading the thermal module first. > > > > It would be perfect if this it true. But I tried with the following > > change, it just makes QORIQ_CPUFREQ non-selectable if THERMAL=m. > > > > diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig > > index dcb972a38fbc..ca05037dd565 100644 > > --- a/drivers/cpufreq/Kconfig > > +++ b/drivers/cpufreq/Kconfig > > @@ -297,6 +297,7 @@ endif > > config QORIQ_CPUFREQ > > tristate "CPU frequency scaling driver for Freescale QorIQ SoCs" > > depends on OF && COMMON_CLK && (PPC_E500MC || ARM) > > + depends on !CPU_THERMAL || THERMAL=y > > select CLK_QORIQ > > help > > This adds the CPUFreq driver support for Freescale QorIQ SoCs > > > I find we can achieve your desired result with the following change instead: > > + depends on (THERMAL=m && m) || THERMAL=y || THERMAL=n "depends on THERMAL || !THERMAL" should also work. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V3] cpufreq: qoriq: Register cooling device based on device tree
On Fri, Feb 26, 2016 at 6:08 PM, Scott Wood wrote: > On Fri, 2016-02-26 at 18:04 -0600, Li Yang wrote: >> On Fri, Feb 26, 2016 at 5:31 PM, Li Yang wrote: >> > On Fri, Feb 26, 2016 at 5:16 PM, Arnd Bergmann wrote: >> > > On Friday 26 February 2016 17:07:09 Li Yang wrote: >> > > > >> > > > I don't have a perfect solution either. But I think this is still >> > > > better than making cpufreq not usable. The cpufreq driver will print >> > > > out an error message if thermal is not reachable. Maybe this can >> > > > relief the confusion a little bit? >> > > >> > > With my patch, the configuration will just force the cpufreq >> > > driver to be a loadable module as well if thermal is a module, >> > > so the dependency can be resolved by loading the thermal module first. >> > >> > It would be perfect if this it true. But I tried with the following >> > change, it just makes QORIQ_CPUFREQ non-selectable if THERMAL=m. >> > >> > diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig >> > index dcb972a38fbc..ca05037dd565 100644 >> > --- a/drivers/cpufreq/Kconfig >> > +++ b/drivers/cpufreq/Kconfig >> > @@ -297,6 +297,7 @@ endif >> > config QORIQ_CPUFREQ >> > tristate "CPU frequency scaling driver for Freescale QorIQ SoCs" >> > depends on OF && COMMON_CLK && (PPC_E500MC || ARM) >> > + depends on !CPU_THERMAL || THERMAL=y >> > select CLK_QORIQ >> > help >> > This adds the CPUFreq driver support for Freescale QorIQ SoCs >> >> >> I find we can achieve your desired result with the following change instead: >> >> + depends on (THERMAL=m && m) || THERMAL=y || THERMAL=n > > "depends on THERMAL || !THERMAL" should also work. Right. And this is more simpler. Regards, Leo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev