Re: [PATCH v15 2/7] power: add power sequence library
On Wed, Jun 14, 2017 at 10:53:29AM +0200, Ulf Hansson wrote:
> On 14 June 2017 at 03:53, Peter Chen wrote:
> > On Tue, Jun 13, 2017 at 12:24:42PM +0200, Ulf Hansson wrote:
> >> [...]
> >>
> >> > +
> >> > +/**
> >> > + * of_pwrseq_on - Carry out power sequence on for device node
> >> > + *
> >> > + * @np: the device node would like to power on
> >> > + *
> >> > + * Carry out a single device power on. If multiple devices
> >> > + * need to be handled, use of_pwrseq_on_list() instead.
> >> > + *
> >> > + * Return a pointer to the power sequence instance on success,
> >> > + * or an error code otherwise.
> >> > + */
> >> > +struct pwrseq *of_pwrseq_on(struct device_node *np)
> >> > +{
> >> > +	struct pwrseq *pwrseq;
> >> > +	int ret;
> >> > +
> >> > +	pwrseq = pwrseq_find_available_instance(np);
> >> > +	if (!pwrseq)
> >> > +		return ERR_PTR(-ENOENT);
> >>
> >> In case the pwrseq instance hasn't been registered yet, then there is
> >> no way to deal with -EPROBE_DEFER properly here.
> >>
> >> I haven't been following the discussions in-depth during all
> >> iterations, so perhaps you have already discussed why doing it like
> >> this.
> >
> > Yes, it has been discussed. In order to compare with compatible string
> > at dts, we need to have one registered pwrseq instance for each
> > pwrseq library, this pre-registered one is allocated using
> > postcore_initcall, and the new (eg, second) instance is registered
> > after pwrseq_get has succeeded.
>
> I understand you need one compatible per pwrseq library, but how does
> that have anything to do with -EPROBE_DEFER?
>
> My point is that, if a driver calls of_pwrseq_on() (which calls
> pwrseq_find_available_instance()), but the corresponding pwrseq
> library and instance has not yet been registered for that device. Then
> how will you handle -EPROBE_DEFER? I guess you simply can't, which is
> why *all* pwrseq libraries needs to be registered in early boot phase,
> like at postcore_initcall(). Right?
>
> If that is the case, I really don't like it.
>

Yes, you are right. This is a limitation of this power sequence library:
registration of the first power sequence instance must finish before any
device driver uses it. I would appreciate any suggestions for addressing
it.

> Moreover, I have found yet another severe problem by reviewing the code:
> In the struct pwrseq, you have a "bool used", which you are setting to
> "true" once the pwrseq has been hooked up with the device, when a
> driver calls of_pwrseq_on(). Setting that variable to true, will also
> prevent another driver from using the same instance of the pwrseq for
> its device. So, to cope with multiple users, you register a new
> instance of the same pwrseq library that got hooked up, once the
> ->get() callback is about to complete.
>
> The problem then occurs, when there is another driver calling
> of_pwrseq_on() in between, meaning that the new instance has not yet
> been registered. This will simply fail, won't it?

Yes, you are right, thanks for pointing that out. I will add a
mutex_lock for of_pwrseq_on.

>
> Sorry for jumping in late, however to me it seems like there is still
> some pieces missing to make this work.
>
> [...]
>
> Kind regards
> Uffe

-- 

Best Regards,
Peter Chen
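The two concerns in this exchange (no way to return -EPROBE_DEFER when nothing is registered yet, and the window between claiming an instance and registering its successor) can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the patch's code and not necessarily the fix Peter has in mind: `model_pwrseq`, `model_register`, `model_pwrseq_on` and `add_instance_locked` are hypothetical names, a pthread mutex stands in for a kernel mutex, and `EPROBE_DEFER` is defined locally because userspace headers do not provide it.

```c
/* Userspace model only: none of these names are the real pwrseq API. */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, absent from userspace headers */
#endif

#define MAX_INSTANCES 8

struct model_pwrseq {
	const char *compatible;	/* which pwrseq library this instance belongs to */
	bool used;		/* already claimed by a device */
};

static struct model_pwrseq instances[MAX_INSTANCES];
static size_t nr_instances;
static pthread_mutex_t pwrseq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append a fresh, unused instance; the caller must hold pwrseq_lock. */
static int add_instance_locked(const char *compatible)
{
	if (nr_instances == MAX_INSTANCES)
		return -ENOMEM;
	instances[nr_instances].compatible = compatible;
	instances[nr_instances].used = false;
	nr_instances++;
	return 0;
}

/* What a pwrseq library would call when it registers itself. */
static int model_register(const char *compatible)
{
	int ret;

	pthread_mutex_lock(&pwrseq_lock);
	ret = add_instance_locked(compatible);
	pthread_mutex_unlock(&pwrseq_lock);
	return ret;
}

/*
 * Claim a free instance and add its successor under the same lock, so a
 * concurrent caller never observes the window in which no free instance
 * exists. If nothing is free, the caller is asked to retry later.
 */
static int model_pwrseq_on(const char *compatible, struct model_pwrseq **out)
{
	int ret = -EPROBE_DEFER;
	size_t i;

	pthread_mutex_lock(&pwrseq_lock);
	for (i = 0; i < nr_instances; i++) {
		if (instances[i].used || strcmp(instances[i].compatible, compatible))
			continue;
		ret = add_instance_locked(compatible);
		if (ret == 0) {
			instances[i].used = true;
			*out = &instances[i];
		}
		break;
	}
	pthread_mutex_unlock(&pwrseq_lock);
	return ret;
}
```

In this model a caller that arrives before `model_register()` gets -EPROBE_DEFER rather than -ENOENT, which is the behaviour Ulf is asking about; whether the real library can defer this way depends on how instances are matched against the device tree, which the sketch does not cover.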
Re: [PATCH] usb: host: ehci: workaround PME bug on AMD EHCI controller
On Thu, Jun 15, 2017 at 2:55 AM, Alan Stern wrote: > On Tue, 13 Jun 2017, Bjorn Helgaas wrote: > >> [+cc Rafael, linux-pm] >> >> On Tue, Jun 13, 2017 at 12:21:15PM +0800, Kai-Heng Feng wrote: >> > On Mon, Jun 12, 2017 at 10:18 PM, Alan Stern >> > wrote: >> > > Let's get some help from people who understand PCI well. >> > > >> > > Here's the general problem: Kai-Heng has a PCI-based USB host >> > > controller that advertises wakeup capability from D3, but it doesn't >> > > assert PME# from D3 when it should. For "lspci -vv" output, see >> > > >> > > http://marc.info/?l=linux-usb&m=149570231732519&w=2 >> > > >> > > On Mon, 12 Jun 2017, Kai-Heng Feng wrote: >> > > >> > >> On Mon, Jun 12, 2017 at 3:04 PM, Kai-Heng Feng >> > >> wrote: >> > >> > On Fri, Jun 9, 2017 at 10:43 PM, Alan Stern >> > >> > wrote: >> > >> >> On Fri, 9 Jun 2017, Kai-Heng Feng wrote: >> > >> >> >> > >> >> Is this really the right solution? Maybe it would be better to allow >> > >> >> the controller to go into D3 provided no wakeup signal is needed. >> > >> >> You >> > >> >> could do: >> > >> >> >> > >> >> device_set_wakeup_capable(&pdev->dev, 0); >> > >> > >> > >> > This doesn't work. >> > >> > After applying this function, still nothing happens when devices get >> > >> > plugged in. >> > >> > IIUC this function disable the wakeup function, but what I want to do >> > >> > here is to have PME signal works even when runtime PM is enabled. >> > > >> > > This may indicate a bug in either the PCI or USB stacks (or both!). If >> > > a driver requires wakeup capability from runtime suspend but the device >> > > does not provide it, the PCI core should not allow the device to go >> > > into runtime suspend. Or is that the driver's responsibility? >> > > >> > >> > I also saw some legacy PCI PM stuff, so I also tried: >> > >> > device_set_wakeup_capable(&pdev->dev, 1); >> > >> > ...doesn't work either. 
>> > >> > >> > >> >> >> > >> >> Another alternative is to put the controller into D2 instead of D3, >> > >> >> but >> > >> >> (1) I don't know how to do that, and (2) we don't know if wakeup >> > >> >> signalling works any better in D2 than it does in D3. >> > >> > >> > >> > I'll try if D2 works. >> > >> >> > >> Put the device into D2 instead of D3 can make the wakeup signaling >> > >> work, i.e. USB devices can be correctly detected after plugged into >> > >> EHCI port. >> > >> >> > >> Do you think this alternative an acceptable workaround? >> > > >> > > Yes, it is. The difficulty is that I don't know how to tell the PCI >> > > core that the device should go in D2 during runtime suspend instead of >> > > D3. Some sort of quirk may be needed -- perhaps Bjorn can help. > >> The lspci output [1] shows: >> >> 00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI >> Controller (rev 39) (prog-if 20 [EHCI]) >> Capabilities: [c0] Power Management version 2 >> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA >> PME(D0+,D1+,D2+,D3hot+,D3cold+) >> Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME- >> Bridge: PM- B3+ >> >> The device claims it can assert PME# from D3hot. If it can't, that >> sounds like a hardware defect that should be addressed with a quirk. >> Ideally we would also have a pointer to the AMD hardware erratum. >> >> Is the following path involved here? >> >> pci_finish_runtime_suspend >> target_state = pci_target_state() >> if (device_may_wakup()) >> if (dev->pme_support) >> ... >> pci_set_power_state(..., target_state) >> >> If so, I would naively expect that a quirk could clear the >> PCI_PM_CAP_PME_D3 and PCI_PM_CAP_PME_D3cold bits in dev->pme_support, >> and pci_target_state() would then avoid selecting D3 or D3cold. But >> I'm not an expert in power management. > > That's a good idea. However, we should apply the quirk only when it is > needed. Which means we need to know the numeric values for the PCI > IDs. 
Also, this will help searching for published errata. > > Kai-Heng, what does "lspci -nvs 00:12.0" show? 00:12.0 0c03: 1022:7808 (rev 39) (prog-if 20 [EHCI]) Subsystem: 1028:0732 Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 18 Memory at fe769000 (32-bit, non-prefetchable) [size=256] Capabilities: [c0] Power Management version 2 Capabilities: [e4] Debug port: BAR=1 offset=00e0 Kernel driver in use: ehci-pci Here's the diff that can make it work: diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 1a14ca8965e6..7bd278535ab3 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2208,6 +2208,12 @@ void pci_pm_init(struct pci_dev *dev) } pmc &= PCI_PM_CAP_PME_MASK; + + if (unlikely(dev->vendor == 0x1022 && dev->device == 0x7808)) { + dev_info(&dev->dev, "PME# does not work under D3, disabling it\n"); + pmc &= ~(PCI_PM_CAP_PME_D3 | PCI_PM_CAP_PME_D3cold); + } + if (pmc) { dev_
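The effect of the mask manipulation in the diff above can be checked in isolation with a small userspace sketch. It is only an illustration under assumptions: the `PCI_PM_CAP_PME_*` values are reproduced here as I understand them from `include/uapi/linux/pci_regs.h` of that era and should be verified against the tree in use, and `model_filter_pme` is a hypothetical helper, not a kernel function. The vendor and device IDs are the ones from the `lspci -nvs` output.

```c
#include <stdint.h>

/* Assumed to match include/uapi/linux/pci_regs.h; verify against your tree. */
#define PCI_PM_CAP_PME_MASK	0xF800	/* all PME support bits */
#define PCI_PM_CAP_PME_D0	0x0800
#define PCI_PM_CAP_PME_D1	0x1000
#define PCI_PM_CAP_PME_D2	0x2000
#define PCI_PM_CAP_PME_D3	0x4000	/* PME# from D3hot */
#define PCI_PM_CAP_PME_D3cold	0x8000

/* IDs reported by "lspci -nvs 00:12.0" in the thread. */
#define MODEL_VENDOR_AMD	0x1022
#define MODEL_DEVICE_FCH_EHCI	0x7808

/*
 * Hypothetical helper mirroring the proposed hunk: keep only the PME
 * bits of the PM capabilities word, then drop the D3hot/D3cold claims
 * for the affected controller.
 */
static uint16_t model_filter_pme(uint16_t vendor, uint16_t device, uint16_t pmc)
{
	pmc &= PCI_PM_CAP_PME_MASK;
	if (vendor == MODEL_VENDOR_AMD && device == MODEL_DEVICE_FCH_EHCI)
		pmc &= ~(PCI_PM_CAP_PME_D3 | PCI_PM_CAP_PME_D3cold);
	return pmc;
}
```

With every PME bit advertised, the affected controller is left with D0, D1 and D2 only, which is consistent with Kai-Heng's observation that wakeup works from D2. Alan's point about applying the workaround only where needed suggests the same masking would more naturally live in a PCI fixup keyed on these IDs than in `pci_pm_init()` itself.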
Re: linux-next: build warning after merge of the i2c tree
> > drivers/i2c/i2c-stub.c:18:0: warning: "DEBUG" redefined
> >  #define DEBUG
> >  ^
> > :0:0: note: this is the location of the previous definition
> >
> > Introduced by commit
> >
> >   6c42778780c4 ("i2c: stub: use pr_fmt")
>
> I am still getting this ...

Sorry, that slipped through the cracks, will fix today! Thanks for the
reminder.
Re: [PATCH v2 11/11] kasan: rework Kconfig settings
On Wed, Jun 14, 2017 at 11:15 PM, Arnd Bergmann wrote:
> We get a lot of very large stack frames using gcc-7.0.1 with the default
> -fsanitize-address-use-after-scope --param asan-stack=1 options, which
> can easily cause an overflow of the kernel stack, e.g.
>
> drivers/acpi/nfit/core.c:2686:1: warning: the frame size of 4080 bytes is
> larger than 2048 bytes [-Wframe-larger-than=]
> drivers/gpu/drm/amd/amdgpu/si.c:1756:1: warning: the frame size of 7304 bytes
> is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/gpu/drm/i915/gvt/handlers.c:2200:1: warning: the frame size of 43752
> bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:952:1: warning: the frame size of 6032
> bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/isdn/hardware/avm/b1.c:637:1: warning: the frame size of 13200 bytes
> is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/media/dvb-frontends/stv090x.c:3089:1: warning: the frame size of 5880
> bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/media/i2c/cx25840/cx25840-core.c:4964:1: warning: the frame size of
> 93992 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:4994:1: warning: the frame
> size of 23928 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/staging/dgnc/dgnc_tty.c:2788:1: warning: the frame size of 7072 bytes
> is larger than 2048 bytes [-Wframe-larger-than=]
> fs/ntfs/mft.c:2762:1: warning: the frame size of 7432 bytes is larger than
> 2048 bytes [-Wframe-larger-than=]
> lib/atomic64_test.c:242:1: warning: the frame size of 12648 bytes is larger
> than 2048 bytes [-Wframe-larger-than=]
>
> To reduce this risk, -fsanitize-address-use-after-scope is now split out
> into a separate Kconfig option, which cannot be selected at the same
> time as KMEMCHECK, leading to stack frames that are smaller than 2
> kilobytes most of the time on x86_64. An earlier version of this
> patch also prevented combining KASAN_EXTRA with KASAN_INLINE, but that
> is no longer necessary with gcc-7.0.1.
>
> A lot of warnings with KASAN_EXTRA go away if we disable KMEMCHECK,
> as -fsanitize-address-use-after-scope seems to understand the builtin
> memcpy, but adds checking code around an extern memcpy call. I had
> to work around a circular dependency, as DEBUG_SLAB/SLUB depended
> on !KMEMCHECK, while KASAN did it the other way round. Now we handle
> both the same way.
>
> All patches to get the frame size below 2048 bytes with CONFIG_KASAN=y
> and CONFIG_KASAN_EXTRA=n have been submitted along with this patch,
> so we can bring back that default now. KASAN_EXTRA=y still causes lots
> of warnings but now defaults to !COMPILE_TEST to disable it in
> allmodconfig, and it remains disabled in all other defconfigs since
> it is a new option.
>
> This reverts parts of commit 3f181b4 ("lib/Kconfig.debug:
> disable -Wframe-larger-than warnings with KASAN=y").
>
> I experimented a bit more with smaller stack frames and have another
> follow-up series that reduces the warning limit for 64-bit architectures
> to 1280 bytes and 1536 when CONFIG_KASAN (but not KASAN_EXTRA) is
> enabled, this requires another ~25 patches to address the additional
> warnings. I also have patches for all KASAN_EXTRA warnings, but we
> should look at those separately and then decide whether to remove
> it completely, leaving out -fsanitize-address-use-after-scope.
> > Signed-off-by: Arnd Bergmann > --- > lib/Kconfig.debug | 4 ++-- > lib/Kconfig.kasan | 11 ++- > lib/Kconfig.kmemcheck | 1 + > scripts/Makefile.kasan | 3 +++ > 4 files changed, 16 insertions(+), 3 deletions(-) > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index ddbef2cac189..02ec4a4da7b1 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -217,7 +217,7 @@ config ENABLE_MUST_CHECK > config FRAME_WARN > int "Warn for stack frames larger than (needs gcc 4.4)" > range 0 8192 > - default 0 if KASAN > + default 3072 if KASAN_EXTRA > default 2048 if GCC_PLUGIN_LATENT_ENTROPY > default 1024 if !64BIT > default 2048 if 64BIT > @@ -500,7 +500,7 @@ config DEBUG_OBJECTS_ENABLE_DEFAULT > > config DEBUG_SLAB > bool "Debug slab memory allocations" > - depends on DEBUG_KERNEL && SLAB && !KMEMCHECK > + depends on DEBUG_KERNEL && SLAB && !KMEMCHECK && !KASAN > help > Say Y here to have the kernel do limited verification on memory > allocation as well as poisoning memory on free to catch use of freed > diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan > index bd38aab05929..4d17a8f4742f 100644 > --- a/lib/Kconfig.kasan > +++ b/lib/Kconfig.kasan > @@ -5,7 +5,7 @@ if HAVE_ARCH_KASAN > > config KASAN > bool "KASan: runtime memory debugger" > - depends on SLUB || (SLAB && !DEBUG_SLAB) > + depends on SLUB || SLAB > select CONSTRUCTORS >
Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers
On Thu, 15 Jun 2017, "J. R. Okajima" wrote:
> Thanx, I got linux-v4.12-rc4 and it contains
> 4681ee2 2017-05-18 drm/i915: Do not sync RCU during shrinking
>
> How about v4.11.x series?
> I got v4.11.5, but it doesn't contain the fix.
> Do you have a plan?

The upstream commit has the proper Cc: stable and Fixes: tags in place,
it just takes a while for the patches to trickle to stable kernels.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center
Re: [PATCH v3 1/9] ARM: dts: imx6ul-isiot: Add Sound card with codec node
On Thu, Jun 15, 2017 at 10:21:43AM +0530, Jagan Teki wrote:
> On Thu, Jun 15, 2017 at 7:50 AM, Shawn Guo wrote:
> > On Wed, Jun 14, 2017 at 08:17:04PM +0530, Jagan Teki wrote:
> >> On Fri, Apr 7, 2017 at 6:46 PM, Shawn Guo wrote:
> >> > On Thu, Apr 06, 2017 at 11:32:07PM +0530, Jagan Teki wrote:
> >> >> From: Jagan Teki
> >> >>
> >> >> Add support for Sound card and related codec(via i2c1) nodes
> >> >> on Engicam Is.IoT MX6UL variant module boards.
> >> >>
> >> >> Cc: Shawn Guo
> >> >> Cc: Matteo Lisi
> >> >> Cc: Michael Trimarchi
> >> >> Signed-off-by: Jagan Teki
> >> >> ---
> >> >> Changes for v3:
> >> >> - Replace fsl,imx-audio-sgtl5000 and use simple-audio-card
> >> >> Changes for v2:
> >> >> - Use proper [label:] node-name[@unit-address] for codec
> >> >> - Remove incorrect codec property 'wlf,shared-lrclk'
> >> >> - Remove 'gpr' from sound card node
> >> >>
> >> >> arch/arm/boot/dts/imx6ul-isiot-common.dtsi | 10 +++
> >> >> arch/arm/boot/dts/imx6ul-isiot.dtsi| 44
> >> >> ++
> >> >
> >> > Can you help me understand how these two files are related? Why is
> >> > sgtl5000 added into one and sound node added into the other?
> >>
> >> lcdif, ts and sound card which may differ based on the base-board
> >> connected with SOM, So I moved these stuff which are related to
> >> Starter kit supported once's and used with SOM dts files. if some
> >> other board with same SOM can have different lcdif and etc so they can
> >> define locally to dts.
> >
> > I do not follow how these stuff are organized. So far we have the
> > following isiot dts files.
> >
> > - imx6ul-isiot-common.dtsi
> > - imx6ul-isiot.dtsi
> > - imx6ul-isiot-emmc.dts and imx6ul-isiot-nand.dts
> >
> > How are they mapping to SoM and base-board?
>
> isiot is a modules class, with that emmc and nand are two separate
> SOM's. the current support is for mounting these SOM's on Development
> base board[1]. So, for isiot module class we have imx6ul-isiot.dtsi
> and emmc and nand SOM's have imx6ul-isiot-emmc.dts and
> imx6ul-isiot-nand.dts. There are some Carrier boards[1] which were
> used with different lcdif and other changes, So
> imx6ul-isiot-common.dtsi have changes common across emmc and nand,
> instead of adding them into individual dts files I moved in
> -common.dtsi. So in future if isiot SOM mounted on carrier board
> which should have a separate dts and which may or may not use
> imx6ul-isiot-common.dtsi

So you are not sure whether imx6ul-isiot-common.dtsi will be used by the
carrier board. Then what's the point of having it now? You are saying
imx6ul-isiot-common.dtsi is created to accommodate the things common
across the emmc and nand SoMs, but it actually contains base-board level
stuff such as LCD and touch. Confusing.

I feel the abstraction is wrong from the beginning. Ideally, we should
have something like below.

 - imx6ul-isiot.dtsi
 - imx6ul-isiot-kit.dts and imx6ul-isiot-carrier.dts

The -isiot should have everything on the SoM plus the stuff common to
the -kit and -carrier boards, while -kit and -carrier include -isiot and
contain the base-board specific things. The -isiot can have both emmc
and nand devices with "disabled" status, and let firmware turn on the
device per the SoM it boots. In that case, there are fewer abstraction
levels and they are clearer.

Thoughts?

Shawn
[PATCH 2/6] Documentation: rockchip-dw-mshc: add description for rk3228
From: Shawn Lin Add "rockchip,rk3228-dw-mshc", "rockchip,rk3288-dw-mshc" for dwmmc on rk322x platform. Signed-off-by: Shawn Lin --- Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt index 520d61d..ce30dff 100644 --- a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt +++ b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt @@ -15,6 +15,7 @@ Required Properties: - "rockchip,rk3288-dw-mshc": for Rockchip RK3288 - "rockchip,rv1108-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RV1108 - "rockchip,rk3036-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RK3036 + - "rockchip,rk3228-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RK322X - "rockchip,rk3368-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RK3368 - "rockchip,rk3399-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RK3399 -- 2.0.0
[PATCH 4/6] ARM: dts: rockchip: add sdmmc and sdio nodes for rk3228 SoC
From: Shawn Lin This patch adds sdmmc/sdio controller nodes for rk3228 SoC. Signed-off-by: Shawn Lin --- arch/arm/boot/dts/rk322x.dtsi | 60 +++ 1 file changed, 60 insertions(+) diff --git a/arch/arm/boot/dts/rk322x.dtsi b/arch/arm/boot/dts/rk322x.dtsi index a812422..5e7b54c 100644 --- a/arch/arm/boot/dts/rk322x.dtsi +++ b/arch/arm/boot/dts/rk322x.dtsi @@ -500,6 +500,32 @@ status = "disabled"; }; + sdmmc: dwmmc@3000 { + compatible = "rockchip,rk3228-dw-mshc", "rockchip,rk3288-dw-mshc"; + reg = <0x3000 0x4000>; + interrupts = ; + clocks = <&cru HCLK_SDMMC>, <&cru SCLK_SDMMC>, +<&cru SCLK_SDMMC_DRV>, <&cru SCLK_SDMMC_SAMPLE>; + clock-names = "biu", "ciu", "ciu_drv", "ciu_sample"; + fifo-depth = <0x100>; + pinctrl-names = "default"; + pinctrl-0 = <&sdmmc_clk &sdmmc_cmd &sdmmc_bus4>; + status = "disabled"; + }; + + sdio: dwmmc@3001 { + compatible = "rockchip,rk3228-dw-mshc", "rockchip,rk3288-dw-mshc"; + reg = <0x3001 0x4000>; + interrupts = ; + clocks = <&cru HCLK_SDIO>, <&cru SCLK_SDIO>, +<&cru SCLK_SDIO_DRV>, <&cru SCLK_SDIO_SAMPLE>; + clock-names = "biu", "ciu", "ciu_drv", "ciu_sample"; + fifo-depth = <0x100>; + pinctrl-names = "default"; + pinctrl-0 = <&sdio_clk &sdio_cmd &sdio_bus4>; + status = "disabled"; + }; + emmc: dwmmc@3002 { compatible = "rockchip,rk3228-dw-mshc", "rockchip,rk3288-dw-mshc"; reg = <0x3002 0x4000>; @@ -710,6 +736,40 @@ drive-strength = <12>; }; + sdmmc { + sdmmc_clk: sdmmc-clk { + rockchip,pins = <1 16 RK_FUNC_1 &pcfg_pull_none_drv_12ma>; + }; + + sdmmc_cmd: sdmmc-cmd { + rockchip,pins = <1 15 RK_FUNC_1 &pcfg_pull_none_drv_12ma>; + }; + + sdmmc_bus4: sdmmc-bus4 { + rockchip,pins = <1 18 RK_FUNC_1 &pcfg_pull_none_drv_12ma>, + <1 19 RK_FUNC_1 &pcfg_pull_none_drv_12ma>, + <1 20 RK_FUNC_1 &pcfg_pull_none_drv_12ma>, + <1 21 RK_FUNC_1 &pcfg_pull_none_drv_12ma>; + }; + }; + + sdio { + sdio_clk: sdio-clk { + rockchip,pins = <3 0 RK_FUNC_1 &pcfg_pull_none_drv_12ma>; + }; + + sdio_cmd: sdio-cmd { + rockchip,pins = <3 1 RK_FUNC_1 
&pcfg_pull_none_drv_12ma>; + }; + + sdio_bus4: sdio-bus4 { + rockchip,pins = <3 2 RK_FUNC_1 &pcfg_pull_none_drv_12ma>, + <3 3 RK_FUNC_1 &pcfg_pull_none_drv_12ma>, + <3 4 RK_FUNC_1 &pcfg_pull_none_drv_12ma>, + <3 5 RK_FUNC_1 &pcfg_pull_none_drv_12ma>; + }; + }; + emmc { emmc_clk: emmc-clk { rockchip,pins = <2 7 RK_FUNC_2 &pcfg_pull_none>; -- 2.0.0
[PATCH 0/6] add some device nodes support for rk322x SoC
This series adds sdmmc, sdio, and other device nodes for rk322x SoCs,
and also introduces a basic dtsi file specifically for rk3229.

David Wu (1):
  ARM: dts: rockchip: Add io-domain node for rk3228

Finley Xiao (1):
  ARM: dts: rockchip: add efuse device node for rk3228

Frank Wang (1):
  ARM: dts: rockchip: add basic dtsi file for RK3229 SoC

Shawn Lin (3):
  Documentation: rockchip-dw-mshc: add description for rk3228
  ARM: dts: rockchip: fix compatible string for eMMC node of rk3228 SoC
  ARM: dts: rockchip: add sdmmc and sdio nodes for rk3228 SoC

 .../devicetree/bindings/mmc/rockchip-dw-mshc.txt |   1 +
 arch/arm/boot/dts/rk3229-evb.dts                 |   2 +-
 arch/arm/boot/dts/rk3229.dtsi                    | 110 +
 arch/arm/boot/dts/rk322x.dtsi                    |  84 +++-
 4 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/boot/dts/rk3229.dtsi

-- 
2.0.0
[PATCH 3/6] ARM: dts: rockchip: fix compatible string for eMMC node of rk3228 SoC
From: Shawn Lin

This amends the compatible string for the eMMC node of the RK3228 SoC.

Signed-off-by: Shawn Lin
---
 arch/arm/boot/dts/rk322x.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/rk322x.dtsi b/arch/arm/boot/dts/rk322x.dtsi
index f3e4ffd..a812422 100644
--- a/arch/arm/boot/dts/rk322x.dtsi
+++ b/arch/arm/boot/dts/rk322x.dtsi
@@ -501,7 +501,7 @@
 	};
 
 	emmc: dwmmc@3002 {
-		compatible = "rockchip,rk3288-dw-mshc";
+		compatible = "rockchip,rk3228-dw-mshc", "rockchip,rk3288-dw-mshc";
 		reg = <0x3002 0x4000>;
 		interrupts = ;
 		clock-frequency = <3750>;
-- 
2.0.0
[PATCH 1/6] ARM: dts: rockchip: add basic dtsi file for RK3229 SoC
Due to some tiny differences between RK3228 and RK3229, this patch adds a basic dtsi file which includes a new CPU opp table and PSCI brought up support for RK3229. Signed-off-by: Frank Wang --- arch/arm/boot/dts/rk3229-evb.dts | 2 +- arch/arm/boot/dts/rk3229.dtsi| 110 +++ 2 files changed, 111 insertions(+), 1 deletion(-) create mode 100644 arch/arm/boot/dts/rk3229.dtsi diff --git a/arch/arm/boot/dts/rk3229-evb.dts b/arch/arm/boot/dts/rk3229-evb.dts index 1b55192..82e8a53 100644 --- a/arch/arm/boot/dts/rk3229-evb.dts +++ b/arch/arm/boot/dts/rk3229-evb.dts @@ -40,7 +40,7 @@ /dts-v1/; -#include "rk322x.dtsi" +#include "rk3229.dtsi" / { model = "Rockchip RK3229 Evaluation board"; diff --git a/arch/arm/boot/dts/rk3229.dtsi b/arch/arm/boot/dts/rk3229.dtsi new file mode 100644 index 000..d43d133 --- /dev/null +++ b/arch/arm/boot/dts/rk3229.dtsi @@ -0,0 +1,110 @@ +/* + * Copyright (c) 2017 Fuzhou Rockchip Electronics Co., Ltd + * + * This file is dual-licensed: you can use it either under the terms + * of the GPL or the X11 license, at your option. Note that this dual + * licensing only applies to this file, and not this project as a + * whole. + * + * a) This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * Or, alternatively, + * + * b) Permission is hereby granted, free of charge, to any person + * obtaining a copy of this software and associated documentation + * files (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, + * copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following + * conditions: + * + * The above copyright notice and this permission notice shall be + * included in all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES + * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT + * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, + * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ */ + +#include "rk322x.dtsi" + +/ { + compatible = "rockchip,rk3229"; + + /delete-node/ opp-table0; + + cpu0_opp_table: opp_table0 { + compatible = "operating-points-v2"; + opp-shared; + + opp-40800 { + opp-hz = /bits/ 64 <40800>; + opp-microvolt = <95>; + clock-latency-ns = <4>; + opp-suspend; + }; + opp-6 { + opp-hz = /bits/ 64 <6>; + opp-microvolt = <975000>; + }; + opp-81600 { + opp-hz = /bits/ 64 <81600>; + opp-microvolt = <100>; + }; + opp-100800 { + opp-hz = /bits/ 64 <100800>; + opp-microvolt = <1175000>; + }; + opp-12 { + opp-hz = /bits/ 64 <12>; + opp-microvolt = <1275000>; + }; + opp-129600 { + opp-hz = /bits/ 64 <129600>; + opp-microvolt = <1325000>; + }; + opp-139200 { + opp-hz = /bits/ 64 <139200>; + opp-microvolt = <1375000>; + }; + opp-146400 { + opp-hz = /bits/ 64 <146400>; + opp-microvolt = <140>; + }; + }; + + psci { + compatible = "arm,psci-1.0", "arm,psci-0.2"; + method = "smc"; + }; +}; + +&cpu0 { + enable-method = "psci"; +}; + +&cpu1 { + enable-method = "psci"; +}; + +&cpu2 { + enable-method = "psci"; +}; + +&cpu3 { + enable-method = "psci"; +}; -- 2.0.0
Re: [PATCH 4.11 049/150] efi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot
On Thu, Jun 15, 2017 at 01:34:38AM +0200, Maniaxx wrote: > On 12.06.2017 at 17:24 wrote Greg Kroah-Hartman: > > 4.11-stable review patch. If anyone has any objections, please let me know. > > > > -- > > > > From: Dave Young > > > > commit 7425826f4f7ac60f2538b06a7f0a5d1006405159 upstream. > > > > Sabrina Dubroca reported an early panic: > > > > BUG: unable to handle kernel paging request at ff240001 > > IP: efi_bgrt_init+0xdc/0x134 > > > > [...] > > > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! > > > > ... which was introduced by: > > > > 7b0a911478c7 ("efi/x86: Move the EFI BGRT init code to early init code") > > > > The cause is that on this machine the firmware provides the EFI ACPI BGRT > > table even on legacy non-EFI bootups - which table should be EFI only. > > > > The garbage BGRT data causes the efi_bgrt_init() panic. > > > > Add a check to skip efi_bgrt_init() in case non-EFI bootup to work around > > this firmware bug. > > > > Tested-by: Sabrina Dubroca > > Signed-off-by: Dave Young > > Signed-off-by: Ard Biesheuvel > > Signed-off-by: Matt Fleming > > Cc: Linus Torvalds > > Cc: Peter Zijlstra > > Cc: Thomas Gleixner > > Cc: linux-...@vger.kernel.org > > Fixes: 7b0a911478c7 ("efi/x86: Move the EFI BGRT init code to early init > > code") > > Link: > > http://lkml.kernel.org/r/20170526113652.21339-6-m...@codeblueprint.co.uk > > [ Rewrote the changelog to be more readable. 
] > > Signed-off-by: Ingo Molnar > > Signed-off-by: Greg Kroah-Hartman > > > > --- > > arch/x86/platform/efi/efi-bgrt.c |3 +++ > > 1 file changed, 3 insertions(+) > > > > --- a/arch/x86/platform/efi/efi-bgrt.c > > +++ b/arch/x86/platform/efi/efi-bgrt.c > > @@ -36,6 +36,9 @@ void __init efi_bgrt_init(struct acpi_ta > > if (acpi_disabled) > > return; > > > > + if (!efi_enabled(EFI_BOOT)) > > + return; > > + > > if (table->length < sizeof(bgrt_tab)) { > > pr_notice("Ignoring BGRT: invalid length %u (expected %zu)\n", > >table->length, sizeof(bgrt_tab)); > > > > > > > > The patch is ok but it only fixes BIOS systems. > To fix the regression above (commit 7b0a911478c7) for EFI systems > it needs this patch as well: > commit 792ef14df5c585c19b2831673a077504a09e5203 master > (efi: Fix boot panic because of invalid BGRT image address) Thanks for letting me know, now queued up. greg k-h
[PATCH 5/6] ARM: dts: rockchip: Add io-domain node for rk3228
From: David Wu This patch adds io-domain support for rk3228 SoC. Signed-off-by: David Wu Signed-off-by: Frank Wang --- arch/arm/boot/dts/rk322x.dtsi | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/arm/boot/dts/rk322x.dtsi b/arch/arm/boot/dts/rk322x.dtsi index 5e7b54c..c2a78f4 100644 --- a/arch/arm/boot/dts/rk322x.dtsi +++ b/arch/arm/boot/dts/rk322x.dtsi @@ -215,6 +215,11 @@ #address-cells = <1>; #size-cells = <1>; + io_domains: io-domains { + compatible = "rockchip,rk3228-io-voltage-domain"; + status = "disabled"; + }; + u2phy0: usb2-phy@760 { compatible = "rockchip,rk3228-usb2phy"; reg = <0x0760 0x0c>; -- 2.0.0
[PATCH 6/6] ARM: dts: rockchip: add efuse device node for rk3228
From: Finley Xiao

Add an efuse node in the device tree for the rk3228 SoC.

Signed-off-by: Finley Xiao
---
 arch/arm/boot/dts/rk322x.dtsi | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/arm/boot/dts/rk322x.dtsi b/arch/arm/boot/dts/rk322x.dtsi
index c2a78f4..dad195e 100644
--- a/arch/arm/boot/dts/rk322x.dtsi
+++ b/arch/arm/boot/dts/rk322x.dtsi
@@ -314,6 +314,23 @@
 		status = "disabled";
 	};
 
+	efuse: efuse@1104 {
+		compatible = "rockchip,rk322x-efuse";
+		reg = <0x1104 0x20>;
+		#address-cells = <1>;
+		#size-cells = <1>;
+		clocks = <&cru PCLK_EFUSE_256>;
+		clock-names = "pclk_efuse";
+
+		/* Data cells */
+		efuse_id: id@7 {
+			reg = <0x7 0x10>;
+		};
+		cpu_leakage: cpu_leakage@17 {
+			reg = <0x17 0x1>;
+		};
+	};
+
 	i2c0: i2c@1105 {
 		compatible = "rockchip,rk3228-i2c";
 		reg = <0x1105 0x1000>;
-- 
2.0.0
Re: [PATCH v5 0/7] ARM: Fix dma_alloc_coherent() and friends for NOMMU
Greg? On 08/06/17 17:25, Russell King - ARM Linux wrote: > Well, I've no objection to this, but it does need acks from other > people before I can apply it. > > There's two patches that touch drivers/base that need Greg's ack. > > I'm not sure what's happening with lib/dma-noop.c, there doesn't > appear to be a maintainer list for it, so I guess that's a > free-for-all. > > On Thu, Jun 08, 2017 at 09:28:30AM +0100, Vladimir Murzin wrote: >> Ping! >> >> On 24/05/17 11:24, Vladimir Murzin wrote: >>> Short story: >>> >>> Without these patches coherent DMA is broken for András and Alexandre, >>> so they cannot safely enable DMA on their platforms. >>> >>> Patches have been circulated on a list since last year without much >>> attention to changes in dma-coherent.c and dma-noop.c. Meanwhile, ARM >>> bits have been reviewed and there is no strict objection to get them >>> merged. Unfortunately, applying only ARM bits doesn't help much and >>> the original issue would still exist. >>> >>> Please, let me know how to move with this fix forward? >>> >>> Long story: >>> >>> It seems that addition of cache support for M-class CPUs uncovered >>> latent bug in DMA usage. NOMMU memory model has been treated as being >>> always consistent; however, for R/M CPU classes memory can be covered >>> by MPU which in turn might configure RAM as Normal i.e. bufferable and >>> cacheable. It breaks dma_alloc_coherent() and friends, since data can >>> stuck in caches now or be buffered. >>> >>> This patch set is trying to address the issue by providing region of >>> memory suitable for consistent DMA operations. It is supposed that >>> such region is marked by MPU as non-cacheable. Robin suggested to >>> advertise such memory as reserved shared-dma-pool, rather then using >>> homebrew command line option, and extend dma-coherent to provide >>> default DMA area in the similar way as it is done for CMA (PATCH >>> 4/7). 
It allows us to offload all bookkeeping on generic coherent DMA >>> framework, and it seems that it might be reused by other architectures >>> like c6x and blackfin. >>> >>> While reviewing/testing previous versions of the patch set it turned >>> out that dma-coherent does not take into account "dma-ranges" device >>> tree property, so it is addressed in PATCH 3/7. >>> >>> For ARM, dedicated DMA region is required for cases other than: >>> - MMU/MPU is off >>> - cpu is v7m w/o cache support >>> - device is coherent >>> >>> In case any of the above conditions is true dma operations are forced >>> to be coherent and wired with dma_noop_ops. >>> >>> To make life easier NOMMU dma operations are kept in separate >>> compilation unit. >>> >>> Since the issue was reported at the same time as Benjamin sent his >>> patch [1] to allow mmap for NOMMU, his case is also addressed in this >>> series (PATCH 1/7 and PATCH 2/7). >>> >>> Thanks! >>> >>> [1] http://www.armlinux.org.uk/developer/patches/viewpatch.php?id=8633/1 >>> >>> Cc: Joerg Roedel >>> Cc: Christian Borntraeger >>> Cc: Michal Nazarewicz >>> Cc: Marek Szyprowski >>> Cc: Alan Stern >>> Cc: Yoshinori Sato >>> Cc: Rich Felker >>> Cc: Roger Quadros >>> Cc: Greg Kroah-Hartman >>> Cc: Rob Herring >>> Cc: Mark Rutland >>> Cc: Doug Ledford >>> >>> Changelog: >>> v4 -> v5 >>>- rebased on v4.12-rc2 >>>- updated description for CONFIG_ARM_DMA_MEM_BUFFERABLE >>> >>> v3 -> v4 >>>- rebased on v4.11-rc7 >>>- made CONFIG_ARM_DMA_MEM_BUFFERABLE optional for CPU_V7M >>>- added Arnd's Acked-by >>> >>> v2 -> v3 >>>- fixed warnings reported by Alexandre and kbuild robot >>> >>> v1 -> v2 >>>- rebased on v4.11-rc1 >>>- added Robin's Reviewed-by >>>- dedicated flag is introduced to use dev->dma_pfn_offset >>> rather than mem->device_base in case memory region is >>> configured via device tree (so Tested-by discarded there) >>> >>> RFC v6 -> v1 >>>- dropped RFC tag >>>- added Alexandre's Tested-by >>> >>> Vladimir Murzin (7): >>> dma: Take 
into account dma_pfn_offset >>> dma: Add simple dma_noop_mmap >>> drivers: dma-coherent: Account dma_pfn_offset when used with device >>> tree >>> drivers: dma-coherent: Introduce default DMA pool >>> ARM: NOMMU: Introduce dma operations for noMMU >>> ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus >>> ARM: dma-mapping: Remove traces of NOMMU code >>> >>> .../bindings/reserved-memory/reserved-memory.txt | 3 + >>> arch/arm/Kconfig | 1 + >>> arch/arm/include/asm/dma-mapping.h | 2 +- >>> arch/arm/mm/Kconfig| 8 +- >>> arch/arm/mm/Makefile | 5 +- >>> arch/arm/mm/dma-mapping-nommu.c| 253 >>> + >>> arch/arm/mm/dma-mapping.c | 29 +-- >>> d
Re: [BISECTED, REGRESSION] v4.12-rc: omapdrm fails to probe on Nokia N900
On 2017-06-15 01:11, Aaro Koskinen wrote: > Hi, > > When booting v4.12-rc5 on Nokia N900, omapdrm fails to probe and there > is no display. > > Bisected to: > > a09d2bc1503508c17ef3a71c6b1905e3660f3029 is the first bad commit > commit a09d2bc1503508c17ef3a71c6b1905e3660f3029 > Author: Peter Ujfalusi > Date: Tue May 3 22:08:01 2016 +0300 > > drm/omap: Use omapdss_stack_is_ready() to check that the display stack is > up > > Instead of 'guessing' based on aliases of the status of the DSS drivers, > use the new interface to check that all needed drivers are loaded. > In this way we can be sure that all needed drivers are loaded so it is > safe to continue the probing of omapdrm. > This method will allow the omapdrm to be probed 'headless', without > outputs. > > Signed-off-by: Peter Ujfalusi > Signed-off-by: Tomi Valkeinen > > Reverting the commit seems to fix the issue. When you revert this patch do you see a warning saying: "could not connect display: blah" ? if so what is 'blah'? n900 have two displays afaik, LCD and TVout. omapdss_stack_is_ready() is to ensure that we have all the drivers loaded for both displays, while by reverting it it is enough if one of them is loaded at the time we do the check and omapdrm would continue to probe, but the missing display (even if it is going to be probed a bit later) will not work. - Péter
Re: [PATCH v2] PCI: dwc: dra7xx: Fix compilation warning.
Hi, On Thursday 15 June 2017 11:52 AM, Arvind Yadav wrote: > drivers/pci/dwc/pci-dra7xx.c: In function ‘dra7xx_pcie_enable_msi_interrupts’: > drivers/pci/dwc/pci-dra7xx.c:177:7: warning: large integer implicitly > truncated to unsigned type [-Woverflow] >~LEG_EP_INTERRUPTS & ~MSI); >^ > drivers/pci/dwc/pci-dra7xx.c: In function > ‘dra7xx_pcie_enable_wrapper_interrupts’: > drivers/pci/dwc/pci-dra7xx.c:187:7: warning: large integer implicitly > truncated to unsigned type [-Woverflow] >~INTERRUPTS); Er.. actually both PCIECTRL_TI_CONF_IRQSTATUS_MSI and PCIECTRL_TI_CONF_IRQSTATUS_MAIN are "write 1 to clear" registers. So writing '0' here means no action. So the right fix should be diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c index 8decf46cf525..aab0187cdf87 100644 --- a/drivers/pci/dwc/pci-dra7xx.c +++ b/drivers/pci/dwc/pci-dra7xx.c @@ -174,7 +174,7 @@ static int dra7xx_pcie_establish_link(struct dw_pcie *pci) static void dra7xx_pcie_enable_msi_interrupts(struct dra7xx_pcie *dra7xx) { dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, - ~LEG_EP_INTERRUPTS & ~MSI); + MSI | LEG_EP_INTERRUPTS); dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQENABLE_SET_MSI, @@ -184,7 +184,7 @@ static void dra7xx_pcie_enable_msi_interrupts(struct dra7xx_pcie *dra7xx) static void dra7xx_pcie_enable_wrapper_interrupts(struct dra7xx_pcie *dra7xx) { dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MAIN, - ~INTERRUPTS); + INTERRUPTS); dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQENABLE_SET_MAIN, INTERRUPTS); } > > Signed-off-by: Arvind Yadav > > Changes in v2: > Add casts in the definitions. please move the change log below "---" Thanks Kishon
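For reference, the "write 1 to clear" behaviour described above can be modelled in a few lines of user-space C. This is only a sketch: `w1c_write()` and the `FAKE_*` bit values are made up for illustration and are not part of the dra7xx driver. Only the W1C semantics are taken from the mail.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define FAKE_IRQ_A	(1u << 0)
#define FAKE_IRQ_B	(1u << 1)
#define FAKE_IRQS	(FAKE_IRQ_A | FAKE_IRQ_B)

/*
 * Model of a "write 1 to clear" status register: every bit written as 1
 * clears the matching pending bit, every bit written as 0 is left alone.
 * Returns the pending bits that remain after the write.
 */
static uint32_t w1c_write(uint32_t pending, uint32_t val)
{
	return pending & ~val;
}
```

With both bits pending, `w1c_write(FAKE_IRQS, FAKE_IRQS)` leaves nothing pending, while `w1c_write(FAKE_IRQS, ~FAKE_IRQS)` leaves both bits set. That is why writing the inverted mask does not acknowledge the interrupts it was meant to acknowledge.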
Re: [PATCH v3 1/9] ARM: dts: imx6ul-isiot: Add Sound card with codec node
Hi Shawn, On Thu, Jun 15, 2017 at 12:45 PM, Shawn Guo wrote: > On Thu, Jun 15, 2017 at 10:21:43AM +0530, Jagan Teki wrote: >> On Thu, Jun 15, 2017 at 7:50 AM, Shawn Guo wrote: >> > On Wed, Jun 14, 2017 at 08:17:04PM +0530, Jagan Teki wrote: >> >> On Fri, Apr 7, 2017 at 6:46 PM, Shawn Guo wrote: >> >> > On Thu, Apr 06, 2017 at 11:32:07PM +0530, Jagan Teki wrote: >> >> >> From: Jagan Teki >> >> >> >> >> >> Add support for Sound card and related codec(via i2c1) nodes >> >> >> on Engicam Is.IoT MX6UL variant module boards. >> >> >> >> >> >> Cc: Shawn Guo >> >> >> Cc: Matteo Lisi >> >> >> Cc: Michael Trimarchi >> >> >> Signed-off-by: Jagan Teki >> >> >> --- >> >> >> Changes for v3: >> >> >> - Replace fsl,imx-audio-sgtl5000 and use simple-audio-card >> >> >> Changes for v2: >> >> >> - Use proper [label:] node-name[@unit-address] for codec >> >> >> - Remove incorrect codec property 'wlf,shared-lrclk' >> >> >> - Remove 'gpr' from sound card node >> >> >> >> >> >> arch/arm/boot/dts/imx6ul-isiot-common.dtsi | 10 +++ >> >> >> arch/arm/boot/dts/imx6ul-isiot.dtsi| 44 >> >> >> ++ >> >> > >> >> > Can you help me understand how these two files are related? Why is >> >> > sgtl5000 added into one and sound node added into the other? >> >> >> >> lcdif, ts and sound card which may differ based on the base-board >> >> connected with SOM, So I moved these stuff which are related to >> >> Starter kit supported once's and used with SOM dts files. if some >> >> other board with same SOM can have different lcdif and etc so they can >> >> define locally to dts. >> > >> > I do not follow how these stuff are organized. So far we have the >> > following isiot dts files. >> > >> > - imx6ul-isiot-common.dtsi >> > - imx6ul-isiot.dtsi >> > - imx6ul-isiot-emmc.dts and imx6ul-isiot-nand.dts >> > >> > How are they mapping to SoM and base-board? >> >> isiot is a modules class, with that emmc and nand are two separate >> SOM's. 
the current support is for mounting these SOM's on Development >> base board[1]. So, for isiot module class we have imx6ul-isiot.dtsi >> and emmc and nand SOM's have imx6ul-isiot-emmc.dts and >> imx6ul-isiot-nand.dts. There are some Carrier boards[1] which were >> used with different lcdif and other changes, So >> imx6ul-isiot-common.dtsi have changes common across emmc and nand, >> instead of adding them into individual dts files I moved in >> -common.dtsi. So in future if isiot SOM mounted on carrier board >> which should have a separate dts and which may or may not use >> imx6ul-isiot-common.dtsi > > So you are not sure if imx6ul-isiot-common.dtsi will be used by carrier > board. Then what's the point to have it now? > > You are saying imx6ul-isiot-common.dtsi is created to accommodate the > common things across emmc and nand SoMs, but it actually contains LCD > and Touch such base-board level of stuff. Confusing. > > I feel the abstraction is wrong from the beginning. Ideally, we should > have something like below. > > - imx6ul-isiot.dtsi > - imx6ul-isiot-kit.dts and imx6ul-isiot-carrier.dts > > The -isiot should have everything on SoM and common stuff between -kit > and -carrier boards, while -kit and -carrier include -isiot and contains > the base-board specific things. The -isiot can have both emmc and nand > devices with "disabled" status, and let firmware turn device on per SoM > it boots. In that case, the abstraction level can be less and clearer. > > Thoughts? So,even the common stuff (lcdif, ts and etc) should be in -isiot.dtsi and make it "disabled" and let them enabled on respective dts. this what you mentioned here? thank! -- Jagan Teki Senior Linux Kernel Engineer | Amarula Solutions U-Boot, Linux | Upstream Maintainer Hyderabad, India.
RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
Hi, Peter > From: Peter Hutterer [mailto:peter.hutte...@who-t.net] > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch > exported by ACPI > > On Thu, Jun 15, 2017 at 02:52:57AM +, Zheng, Lv wrote: > > Hi, Benjamin > > > > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com] > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID > > > switch exported by ACPI > > > > > > Hi, > > > > > > [Sorry for the delay, I have been sidetracked from this] > > > > > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote: > > > > On Thu, 01.06.17 20:46, Benjamin Tissoires > > > > (benjamin.tissoi...@redhat.com) wrote: > > > > > > > > > Hi, > > > > > > > > > > Sending this as a WIP as it still need a few changes, but it mostly > > > > > works as > > > > > expected (still not fully compliant yet). > > > > > > > > > > So this is based on Lennart's comment in [1]: if the LID state is not > > > > > reliable, > > > > > the kernel should not export the LID switch device as long as we are > > > > > not sure > > > > > about its state. > > > > > > > > Ah nice! I (obviously) like this approach. > > > > > > Heh. Now I just need to convince Lv that it's the right approach. > > > > I feel we don't have big conflicts. > > And I already took part of your idea into this patchset: > > https://patchwork.kernel.org/patch/9771121/ > > https://patchwork.kernel.org/patch/9771119/ > > I tested my surface pros with Ubuntu, they are working as expected. > > > > > > > Note that systemd currently doesn't sync the state when the input > > > > > node just > > > > > appears. This is a systemd bug, and it should not be handled by the > > > > > kernel > > > > > community. > > > > > > > > Uh if this is borked, we should indeed fix this in systemd. Is there > > > > already a systemd github bug about this? If not, please create one, > > > > and we'll look into it! > > > > > > I don't think there is. 
I haven't raised it yet because I am not so sure > > > this will not break again those worthless unreliable LID, and if we play > > > whack a mole between the kernel and user space, things are going to be > > > nasty. So I'd rather have this fixed in systemd along with the > > > unreliable LID switch knowledge, so we are sure that the kernel behaves > > > the way we expect it to be. > > > > This is my feeling: > > We needn't go that far. > > We can interpret "input node appears" into "default input node state". > > Sorry, can you clarify this bit please? I'm not sure what you mean here. > Note that there's an unknown amount of time between "device node appearing > in the system" and when a userspace process actually opens it and looks at > its state. By then, the node may have changed state again. We can see: "logind" has already implemented a timeout, and will not respond lid state unless it can be stable within this timeout period. I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"? I feel "removing the input node for a period where its state is not trustful" is technically identical to this mechanism. Cheers, Lv > > Cheers, >Peter > > > That's what you want for acpi button driver - we now defaults to "method" > > mode. > > > > What's your opinion? > > > > Thanks > > Lv > >
Re: [PATCH v2 1/2] libsas: Don't process sas events in static works
On 2017/6/14 21:08, John Garry wrote: > On 14/06/2017 10:04, wangyijing wrote: static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event event) >> { >> +struct sas_ha_event *ev; >> + >> BUG_ON(event >= HA_NUM_EVENTS); >> >> -sas_queue_event(event, &sas_ha->pending, >> -&sas_ha->ha_events[event].work, sas_ha); >> +ev = kzalloc(sizeof(*ev), GFP_ATOMIC); >> +if (!ev) >> +return; >>> > GFP_ATOMIC allocations can fail and then no events will be queued *and* we >>> > don't report the error back to the caller. >>> > >> Yes, it's really a problem, but I don't find a better solution, do you have >> some suggestion ? >> > > Dan raised an issue with this approach, regarding a malfunctioning PHY which > spews out events. I still don't think we're handling it safely. Here's the > suggestion: > - each asd_sas_phy owns a finite-sized pool of events > - when the event pool becomes exhausted, libsas stops queuing events > (obviously) and disables the PHY in the LLDD > - upon attempting to re-enable the PHY from sysfs, libsas first checks that > the pool is still not exhausted > > If you cannot find a good solution, then let us know and we can help. Hi John and Dan, which event did you see on the malfunctioning PHY? If the event is PORTE_BROADCAST_RCVD, then since libsas always calls sas_revalidate_domain() for every PORTE_BROADCAST_RCVD, what about keeping one broadcast waiting (not queued in the workqueue) and discarding the others? If the event is of another type, things may become knotty. > > John > > > . >
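John's three-step suggestion above (finite per-PHY pool, disable on exhaustion, check the pool before re-enabling) can be sketched in user-space C as follows. This is a simplified model under stated assumptions, not libsas code: `struct phy_model`, `phy_event_get()`, `phy_event_put()`, `phy_try_enable()` and the pool size are all hypothetical, and the locking a real implementation would need is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define PHY_EVENT_POOL_SIZE 4	/* hypothetical limit, for illustration */

struct phy_event {
	int type;
	bool in_use;
};

struct phy_model {
	struct phy_event pool[PHY_EVENT_POOL_SIZE];
	bool enabled;
};

static void phy_model_init(struct phy_model *phy)
{
	size_t i;

	for (i = 0; i < PHY_EVENT_POOL_SIZE; i++)
		phy->pool[i].in_use = false;
	phy->enabled = true;
}

/* Take a free slot; on exhaustion disable the PHY and report failure. */
static struct phy_event *phy_event_get(struct phy_model *phy, int type)
{
	size_t i;

	if (!phy->enabled)
		return NULL;

	for (i = 0; i < PHY_EVENT_POOL_SIZE; i++) {
		if (!phy->pool[i].in_use) {
			phy->pool[i].in_use = true;
			phy->pool[i].type = type;
			return &phy->pool[i];
		}
	}

	/* Pool exhausted: stop queuing and mark the PHY disabled. */
	phy->enabled = false;
	return NULL;
}

/* Return a slot to the pool once its event has been processed. */
static void phy_event_put(struct phy_event *ev)
{
	ev->in_use = false;
}

/* Re-enable only if at least one slot has been returned to the pool. */
static bool phy_try_enable(struct phy_model *phy)
{
	size_t i;

	for (i = 0; i < PHY_EVENT_POOL_SIZE; i++) {
		if (!phy->pool[i].in_use) {
			phy->enabled = true;
			return true;
		}
	}
	return false;
}
```

Unlike the kzalloc(GFP_ATOMIC) path in the quoted patch, a failed `phy_event_get()` here is visible to the caller and has a defined consequence (the PHY is disabled), so a PHY that spews events cannot consume unbounded memory.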
Re: [PATCH 23/28] mbcache: make mbcache more generic
On Wed 31-05-17 01:15:12, Tahsin Erdogan wrote: > Large xattr feature would like to use the mbcache for xattr value > deduplication. Current implementation is geared towards xattr block > deduplication. Make it more generic so that it can be used by both. Can you explain a bit more what do you mean by "make it more generic" as it seems you just rename a couple of things here... Honza > > Signed-off-by: Tahsin Erdogan > --- > fs/ext2/xattr.c | 18 +- > fs/ext4/xattr.c | 10 +- > fs/mbcache.c| 43 +-- > include/linux/mbcache.h | 14 -- > 4 files changed, 43 insertions(+), 42 deletions(-) > > diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c > index fbdb8f171893..1e5f76070580 100644 > --- a/fs/ext2/xattr.c > +++ b/fs/ext2/xattr.c > @@ -493,8 +493,8 @@ bad_block:ext2_error(sb, "ext2_xattr_set", >* This must happen under buffer lock for >* ext2_xattr_set2() to reliably detect modified block >*/ > - mb_cache_entry_delete_block(EXT2_SB(sb)->s_mb_cache, > - hash, bh->b_blocknr); > + mb_cache_entry_delete(EXT2_SB(sb)->s_mb_cache, hash, > + bh->b_blocknr); > > /* keep the buffer locked while modifying it. */ > } else { > @@ -721,8 +721,8 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head > *old_bh, >* This must happen under buffer lock for >* ext2_xattr_set2() to reliably detect freed block >*/ > - mb_cache_entry_delete_block(ext2_mb_cache, > - hash, old_bh->b_blocknr); > + mb_cache_entry_delete(ext2_mb_cache, hash, > + old_bh->b_blocknr); > /* Free the old block. 
*/ > ea_bdebug(old_bh, "freeing"); > ext2_free_blocks(inode, old_bh->b_blocknr, 1); > @@ -795,8 +795,8 @@ ext2_xattr_delete_inode(struct inode *inode) >* This must happen under buffer lock for ext2_xattr_set2() to >* reliably detect freed block >*/ > - mb_cache_entry_delete_block(EXT2_SB(inode->i_sb)->s_mb_cache, > - hash, bh->b_blocknr); > + mb_cache_entry_delete(EXT2_SB(inode->i_sb)->s_mb_cache, hash, > + bh->b_blocknr); > ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1); > get_bh(bh); > bforget(bh); > @@ -907,11 +907,11 @@ ext2_xattr_cache_find(struct inode *inode, struct > ext2_xattr_header *header) > while (ce) { > struct buffer_head *bh; > > - bh = sb_bread(inode->i_sb, ce->e_block); > + bh = sb_bread(inode->i_sb, ce->e_value); > if (!bh) { > ext2_error(inode->i_sb, "ext2_xattr_cache_find", > "inode %ld: block %ld read error", > - inode->i_ino, (unsigned long) ce->e_block); > + inode->i_ino, (unsigned long) ce->e_value); > } else { > lock_buffer(bh); > /* > @@ -931,7 +931,7 @@ ext2_xattr_cache_find(struct inode *inode, struct > ext2_xattr_header *header) > } else if (le32_to_cpu(HDR(bh)->h_refcount) > > EXT2_XATTR_REFCOUNT_MAX) { > ea_idebug(inode, "block %ld refcount %d>%d", > - (unsigned long) ce->e_block, > + (unsigned long) ce->e_value, > le32_to_cpu(HDR(bh)->h_refcount), > EXT2_XATTR_REFCOUNT_MAX); > } else if (!ext2_xattr_cmp(header, HDR(bh))) { > diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c > index 886d06e409b6..772948f168c3 100644 > --- a/fs/ext4/xattr.c > +++ b/fs/ext4/xattr.c > @@ -678,7 +678,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode > *inode, >* This must happen under buffer lock for >* ext4_xattr_block_set() to reliably detect freed block >*/ > - mb_cache_entry_delete_block(ext4_mb_cache, hash, bh->b_blocknr); > + mb_cache_entry_delete(ext4_mb_cache, hash, bh->b_blocknr); > get_bh(bh); > unlock_buffer(bh); > ext4_free_blocks(handle, inode, bh, 0, 1, > @@ -1115,8 +1115,8 @@ ext4_xattr_block_set(handle_t *handle, struct 
inode > *inode, >
Re: [PATCH 0/4] firmware: fix fallback mechanism by ignoring SIGCHLD
On 15/06/17 00:20, Luis R. Rodriguez wrote:
> Martin reported an issue with Android where if sysfs is used to trigger a
> sync fw load which *relies* on the fallback mechanism and a background job
> completes while the trigger is ongoing in the foreground it will
> immediately fail the fw request. The issue can be observed in this simple
> test script using the test_firmware driver:
>
>   set -e
>   /etc/init.d/udev stop
>   modprobe test_firmware
>   DIR=/sys/devices/virtual/misc/test_firmware
>   echo 10 >/sys/class/firmware/timeout
>   sleep 2 &
>   echo -n "does-not-exist-file.bin" > "$DIR"/trigger_request
>
> The background sleep triggers the SIGCHLD signal and we fail the firmware
> request on the fallback mechanism. This was due to the type of wait used
> which
...
> Note that although I *feared* this might implicate any use of non-killable
> waits on other system calls, such as finit_module(), initial testing
> confirms this to not be the case. For instance replacing the echo with
> modprobe on a module which does the same on init does not present the same
> issues. This could be due to the special SA_RESTART flag case on write()
> as noted above and sysfs... however, it's not perfectly clear yet to me.

The reason the problem does not occur with modprobe is that in that case the process triggering the firmware load (modprobe) and the process dying (sleep) are *siblings* rather than father and child. So the modprobe process does *not* receive a SIGCHLD when its *brother* dies.

echo is a shell built-in, so the process triggering the firmware load (the shell) and the process dying (sleep) *are* father and child.

Martin
Re: [PATCH v3 1/9] ARM: dts: imx6ul-isiot: Add Sound card with codec node
On Thu, Jun 15, 2017 at 01:01:22PM +0530, Jagan Teki wrote: > > I feel the abstraction is wrong from the beginning. Ideally, we should > > have something like below. > > > > - imx6ul-isiot.dtsi > > - imx6ul-isiot-kit.dts and imx6ul-isiot-carrier.dts > > > > The -isiot should have everything on SoM and common stuff between -kit > > and -carrier boards, while -kit and -carrier include -isiot and contains > > the base-board specific things. The -isiot can have both emmc and nand > > devices with "disabled" status, and let firmware turn device on per SoM > > it boots. In that case, the abstraction level can be less and clearer. > > > > Thoughts? > > So,even the common stuff (lcdif, ts and etc) should be in -isiot.dtsi Yes, anything common can be in -isiot.dtsi. > and make it "disabled" and let them enabled on respective dts. this > what you mentioned here? It doesn't matter. If the lcd/touch is same on -kit and -carrier, you can even have them enabled by default in -isiot.dtsi. The -kit.dts and -carrier.dts are there to accommodate base-board specific differences. Shawn
Re: [PATCH v4] mfd: lp87565: Add lp87565 PMIC support
On Mon, 12 Jun 2017, Javier Martinez Canillas wrote: > Hello Lee and Keerthy, > > On Mon, Jun 12, 2017 at 11:17 AM, Keerthy wrote: > > > > > > On Monday 12 June 2017 02:41 PM, Lee Jones wrote: > >> On Sun, 11 Jun 2017, Keerthy wrote: > >> > >>> > >>> > >>> On Sunday 11 June 2017 10:36 AM, Keerthy wrote: > > > On Friday 09 June 2017 07:58 PM, Rob Herring wrote: > > On Thu, Jun 08, 2017 at 09:38:14AM +0530, Keerthy wrote: > >> The LP87565 chip is a power management IC for Portable Navigation > >> Systems > >> and Tablet Computing devices. It contains the following components: > >> > >> - Configurable Bucks(Single and multi-phase). > >> - Configurable General Purpose Output Signals (GPO). > >> > >> The LP87565-Q1 variant device uses two 2-phase outputs configuration, > >> Buck0 is master for Buck0/1 output and Buck2 is master for Buck2/3 > >> output. > >> > >>> > >>> Lee Jones, > >>> > >>> Shall i add back i2c_device_id as pointed here: > >>> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1417316.html > >> > >> Hasn't that been fixed yet? > >> > > There are only 2 patch series remaining to be merged so we can finally > fix this in the I2C core, making sure that no drivers will be > regressed. One of the series is for MFD and have been around for a > while (it already contains all the relevant acks AFAICT), it would be > very helpful if you can look at it and merge if you think is correct: > > https://lkml.org/lkml/2017/5/4/11 Please use the normal procedure and [RESEND]. > >> I guess so then. :( > > > > Okay. So with that i assume i should reintroduce probe instead of probe_new. > > > > It's orthogonal, you can have probe_new and also the I2C device ID > table (the OF table will be used for matching, you just need the I2C > table to export the aliases with MODULE_DEVICE_TABLE(i2c,.. ). > > Best regards, > Javier -- Lee Jones Linaro STMicroelectronics Landing Team Lead Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
Re: [PATCH 2/6] Documentation: rockchip-dw-mshc: add description for rk3228
Hi Frank, Am Donnerstag, 15. Juni 2017, 15:16:16 CEST schrieb Frank Wang: > From: Shawn Lin > > Add "rockchip,rk3228-dw-mshc", "rockchip,rk3288-dw-mshc" for > dwmmc on rk322x platform. > > Signed-off-by: Shawn Lin > --- > Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt > b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt index > 520d61d..ce30dff 100644 > --- a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt > +++ b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt > @@ -15,6 +15,7 @@ Required Properties: > - "rockchip,rk3288-dw-mshc": for Rockchip RK3288 > - "rockchip,rv1108-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip > RV1108 - "rockchip,rk3036-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip > RK3036 + - "rockchip,rk3228-dw-mshc", "rockchip,rk3288-dw-mshc": for > Rockchip RK322X - "rockchip,rk3368-dw-mshc", "rockchip,rk3288-dw-mshc": for > Rockchip RK3368 - "rockchip,rk3399-dw-mshc", "rockchip,rk3288-dw-mshc": for > Rockchip RK3399 you might want to rebase this patch on top of Ulfs next branch [0], as there is also the support for the rk3328 in there now. Otherwise looks good to me, so Reviewed-by: Heiko Stuebner [0] https://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git/log/?h=next
[RFC][PATCH 0/2] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if existed
Our customer reported that kernel text may be located in a non-mirror region (movable zone) when both the address range mirroring feature and KASLR are enabled.

The address range mirroring feature works as follows:
- The physical memory regions whose descriptors in the EFI memory map have the EFI_MEMORY_MORE_RELIABLE attribute (bit 16) are mirrored.
- The feature arranges such mirror regions into the normal zone and other regions into the movable zone, in order to locate kernel code and data in mirror regions.

So we need to restrict the kernel to be located inside a mirror region if one exists.

The method is very simple. If EFI is enabled, just iterate over the EFI memory map and pick up the mirror regions to process as slot candidates. If EFI is disabled or no mirror region exists, still process the e820 memory map. This won't cost much efficiency: at worst we go through the whole EFI memory map and find no mirror region.

One question: From the code, even though mirror regions exist, they are meaningful only if the kernelcore=mirror kernel option is specified. Not sure if my understanding is correct.

NOTE: I haven't got a machine with EFI mirror regions enabled, so I have only tested the e820 map processing case and the case of no mirror region on an EFI machine. So I am sending this as an RFC patchset, and will post a formal one after the above question is cleared up and the mirror case has been tested.

Baoquan He (2):
  x86/boot/KASLR: Adapt process_e820_entry for all kinds of memory map
  x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if existed

 arch/x86/boot/compressed/kaslr.c | 129 +++
 1 file changed, 104 insertions(+), 25 deletions(-)

-- 
2.5.5
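As a rough illustration of the selection step described in this cover letter, here is a user-space model. It is a sketch only: the `model_*` types, `collect_mirror_regions()` and the 4 KiB page shift are assumptions made for the example and do not come from the kernel sources. Only the use of attribute bit 16 and the fallback to e820 are taken from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MEMORY_MORE_RELIABLE	(1ULL << 16)	/* bit 16, as in the cover letter */
#define MODEL_PAGE_SHIFT		12		/* assumes 4 KiB EFI pages */

/* Simplified stand-in for an EFI memory descriptor. */
struct model_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

struct model_region {
	uint64_t start;
	uint64_t size;
};

/*
 * Copy every mirrored descriptor into out[] (at most max entries) and
 * return how many were copied. A return value of 0 means the caller
 * should fall back to the e820 map.
 */
static size_t collect_mirror_regions(const struct model_desc *map, size_t nr,
				     struct model_region *out, size_t max)
{
	size_t i, found = 0;

	for (i = 0; i < nr && found < max; i++) {
		if (!(map[i].attribute & MODEL_MEMORY_MORE_RELIABLE))
			continue;
		out[found].start = map[i].phys_addr;
		out[found].size = map[i].num_pages << MODEL_PAGE_SHIFT;
		found++;
	}

	return found;
}
```

The zero return plays the role of the "no mirror region found" case: the caller would then walk the e820 table exactly as before.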
[PATCH 1/2] x86/boot/KASLR: Adapt process_e820_entry for all kinds of memory map
Now function process_e820_entry is only used to process e820 memory entry. Adapt it for memory region only, not just for e820. Later we will use it to process efi mirror regions. So rename the original process_e820_entry to process_mem_region, and extract and wrap the e820 specific processing code into process_e820_entry. Signed-off-by: Baoquan He --- arch/x86/boot/compressed/kaslr.c | 60 ++-- 1 file changed, 33 insertions(+), 27 deletions(-) diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index fe318b4..c2ed051 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -479,35 +479,31 @@ static unsigned long slots_fetch_random(void) return 0; } -static void process_e820_entry(struct boot_e820_entry *entry, +static void process_mem_region(struct mem_vector *entry, unsigned long minimum, unsigned long image_size) { struct mem_vector region, overlap; struct slot_area slot_area; unsigned long start_orig, end; - struct boot_e820_entry cur_entry; - - /* Skip non-RAM entries. */ - if (entry->type != E820_TYPE_RAM) - return; + struct mem_vector cur_entry; /* On 32-bit, ignore entries entirely above our maximum. */ - if (IS_ENABLED(CONFIG_X86_32) && entry->addr >= KERNEL_IMAGE_SIZE) + if (IS_ENABLED(CONFIG_X86_32) && entry->start >= KERNEL_IMAGE_SIZE) return; /* Ignore entries entirely below our minimum. */ - if (entry->addr + entry->size < minimum) + if (entry->start + entry->size < minimum) return; /* Ignore entries above memory limit */ - end = min(entry->size + entry->addr, mem_limit); - if (entry->addr >= end) + end = min(entry->size + entry->start, mem_limit); + if (entry->start >= end) return; - cur_entry.addr = entry->addr; - cur_entry.size = end - entry->addr; + cur_entry.start = entry->start; + cur_entry.size = end - entry->start; - region.start = cur_entry.addr; + region.start = cur_entry.start; region.size = cur_entry.size; /* Give up if slot area array is full. 
*/ @@ -522,7 +518,7 @@ static void process_e820_entry(struct boot_e820_entry *entry, region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN); /* Did we raise the address above this e820 region? */ - if (region.start > cur_entry.addr + cur_entry.size) + if (region.start > cur_entry.start + cur_entry.size) return; /* Reduce size by any delta from the original address. */ @@ -562,12 +558,31 @@ static void process_e820_entry(struct boot_e820_entry *entry, } } -static unsigned long find_random_phys_addr(unsigned long minimum, - unsigned long image_size) +static void process_e820_entry(unsigned long minimum,unsigned long image_size) { int i; - unsigned long addr; + struct mem_vector region; + struct boot_e820_entry *entry; + + /* Verify potential e820 positions, appending to slots list. */ +for (i = 0; i < boot_params->e820_entries; i++) { +entry = &boot_params->e820_table[i]; +/* Skip non-RAM entries. */ +if (entry->type != E820_TYPE_RAM) +continue; +region.start = entry->addr; +region.size = entry->size; +process_mem_region(&region, minimum, image_size); +if (slot_area_index == MAX_SLOT_AREA) { +debug_putstr("Aborted e820 scan (slot_areas full)!\n"); +break; +} +} +} +static unsigned long find_random_phys_addr(unsigned long minimum, + unsigned long image_size) +{ /* Check if we had too many memmaps. */ if (memmap_too_large) { debug_putstr("Aborted e820 scan (more than 4 memmap= args)!\n"); @@ -577,16 +592,7 @@ static unsigned long find_random_phys_addr(unsigned long minimum, /* Make sure minimum is aligned. */ minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN); - /* Verify potential e820 positions, appending to slots list. */ - for (i = 0; i < boot_params->e820_entries; i++) { - process_e820_entry(&boot_params->e820_table[i], minimum, - image_size); - if (slot_area_index == MAX_SLOT_AREA) { - debug_putstr("Aborted e820 scan (slot_areas full)!\n"); - break; - } - } - + process_e820_entry(minimum, image_size); return slots_fetch_random(); } -- 2.5.5
[PATCH 2/2] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if existed
Kernel text may be located on non-mirror region (movable zone) when both address range mirroring feature and KASLR are enabled. The functions of address range mirroring feature arranges such mirror region into normal zone and other region into movable zone in order to locate kernel code and data on mirror region. The physical memory region whose descriptors in EFI memory map have EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are mirrored. If efi is detected, iterate efi memory map and pick up the mirror region to process for adding candidate of randomization slot. If efi is disabled or no mirror region found, still process e820 memory map. Signed-off-by: Baoquan He --- arch/x86/boot/compressed/kaslr.c | 73 1 file changed, 73 insertions(+) diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index c2ed051..a6aa69e 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -37,7 +37,9 @@ #include #include #include +#include #include +#include /* Macros used by the included decompressor code below. */ #define STATIC @@ -558,6 +560,73 @@ static void process_mem_region(struct mem_vector *entry, } } +/* This variable marks if efi mirror regions have been handled. */ +bool efi_mirror_found = false; + +static void process_efi_entry(unsigned long minimum, unsigned long image_size) +{ + struct efi_info *e = &boot_params->efi_info; + efi_memory_desc_t *md; + struct mem_vector region; + unsigned long pmap; + bool is_efi = false; + u32 nr_desc; + int i; + unsigned long addr; + u64 end; + char *cmdline = (char *)get_cmd_line_ptr(); + char *str; + char *signature; + + +#ifdef CONFIG_EFI + signature = (char *)&boot_params->efi_info.efi_loader_signature; +#endif + if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) && + strncmp(signature, EFI64_LOADER_SIGNATURE, 4)) + return; + + /* +* Mirrored regions are meaningful only if "kernelcore=mirror" +* specified. 
+*/ + str = strstr(cmdline, "kernelcore="); + if (!str) + return; + str += strlen("kernelcore="); + if (strncmp(str, "mirror", 6)) + return; + +#ifdef CONFIG_X86_32 + /* Can't handle data above 4GB at this time */ + if (e->efi_memmap_hi) { +warn("Memory map is above 4GB, disabling EFI.\n"); +return -EINVAL; +} +pmap = e->efi_memmap; +#else +pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32)); +#endif + + nr_desc = e->efi_memmap_size / e->efi_memdesc_size; + for (i = 0; i < nr_desc; i++) { + md = (efi_memory_desc_t *)(pmap + (i * e->efi_memdesc_size)); + if (md->attribute & EFI_MEMORY_MORE_RELIABLE) { + region.start = md->phys_addr; + region.size = md->num_pages << EFI_PAGE_SHIFT; + process_mem_region(&region, minimum, image_size); + efi_mirror_found = true; + } + debug_putaddr(i); + debug_putaddr(md->attribute); + debug_putaddr(md->phys_addr); + end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1; + debug_putaddr(end); + } + + return; +} + static void process_e820_entry(unsigned long minimum,unsigned long image_size) { int i; @@ -592,6 +661,10 @@ static unsigned long find_random_phys_addr(unsigned long minimum, /* Make sure minimum is aligned. */ minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN); + process_efi_entry(minimum, image_size); + if (efi_mirror_found) + return slots_fetch_random(); + process_e820_entry(minimum, image_size); return slots_fetch_random(); } -- 2.5.5
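The mirrored-region selection described in the commit message above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: struct mem_desc, struct region and collect_mirrored() are hypothetical names, not the kernel's efi_memory_desc_t or process_mem_region(); only the attribute bit (16) and the 4 KiB EFI page size are taken from the patch and the EFI definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Attribute bit 16 marks mirrored ("more reliable") memory, as in the patch. */
#define MORE_RELIABLE (1ULL << 16)
/* EFI pages are 4 KiB. */
#define PAGE_SHIFT_EFI 12

/* Hypothetical, trimmed-down stand-in for an EFI memory descriptor. */
struct mem_desc {
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

/* Hypothetical candidate region for a randomization slot. */
struct region {
	uint64_t start;
	uint64_t size;
};

/*
 * Copy every mirrored descriptor into out[] and return how many were found.
 * A return value of 0 tells the caller to fall back to the e820 map.
 */
static size_t collect_mirrored(const struct mem_desc *map, size_t nr,
			       struct region *out, size_t max_out)
{
	size_t found = 0;

	for (size_t i = 0; i < nr && found < max_out; i++) {
		if (!(map[i].attribute & MORE_RELIABLE))
			continue;
		out[found].start = map[i].phys_addr;
		out[found].size = map[i].num_pages << PAGE_SHIFT_EFI;
		found++;
	}
	return found;
}
```

A caller would treat a zero return the way find_random_phys_addr() treats efi_mirror_found == false in the patch: go on to the e820 entries.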
RE: [RFC 0/3] WhiteEgret LSM module
Hi Mehmet, Thank you for your suggestion to use IMA appraisal. I'm sorry for the delay in replying to you. I'm studying IMA appraisal, and there are some things I don't understand yet. Could you please explain the following items? We assume that "fixing" has already finished and that IMA appraisal is running in "enforce" mode. - I have a question about the procedure for labeling and appraising a new or updated executable file. Suppose that we want to create a new executable file (covered by the policy) and have it measured and appraised. What procedure should I follow? Similarly, how do I update an appraised file so that it continues to be permitted to execute? - When we copy (cp command with -a option) or move an appraised executable file somewhere else, is the copied or moved executable file permitted to execute as well? - (related to the above question) What kind of data is hashed into security.ima? Thanks in advance, Masanobu Koike > -Original Message- > > > On May 31, 2017, at 6:59 AM, Peter Dolding wrote: > > > > Number 1 we need to split the idea of signed and whitelisted. IMA is > > signed should not be confused with white-listed. You will find > > policies stating whitelist and signed as two different things. > > IMA-appraisal can do both. If the security.ima extended attribute > of the file is a hash and not a signature, then it is whitelisting. > > > Like you see here in Australian government policy there is another > > thing called whitelisted. > > > https://www.asd.gov.au/publications/protect/top_4_mitigations_linux.ht > m > > Matthew Garrett you might want to call IMA whitelisting Australian > > government for one does not agree. IMA is signed. The difference > > between signed and white-listed is you might have signed a lot more > > than what a particular system is white-listed to allowed used. > > I doubt the Australian government is an authority on Linux features. > IMA-appraisal can be set to "fix" mode with a boot parameter. 
If the > policy covers what you want to whitelist (e.g. files opened by user x), > and then when those files are accessed, the kernel writes out the hash. > Then, you can switch to "enforce" mode to allow only files with hashes. > > Also, you can achieve the same thing by signing all whitelisted > files and add the certificate to .ima keyring and throwing away the > signing key. > > > The feature need to include in it name whitelisting or just like the > > Australian Department of Defence other parties will mark Linux has not > > having this feature. > > I guess we need to advertise IMA-appraisal better. > > > Whitelist is program name/path and checksum/s. If the file any more > > than that is now not a Whitelist but a Security Policy Enforcement or > > signing. Whitelist and blacklists are meant to be simple things. > > This is also why IMA fails and is signed to too complete to be a basic > > Whitelist. > > When you work out all the little details, you arrive at IMA-appraisal. > You have to consider how the scheme is bootstrapped and how it > is protected against the root. IMA-appraisal either relies on a boot > parameter and write-once policy, or the trusted keyrings. > > > Peter Dolding. > > Mehmet >
Re: [PATCH 28/28] quota: add extra inode count to dquot transfer functions
On Wed 31-05-17 01:15:17, Tahsin Erdogan wrote: > Ext4 ea_inode feature allows storing xattr values in external inodes to > be able to store values that are bigger than a block in size. Ext4 also > has deduplication support for these type of inodes. With deduplication, > the actual storage waste is eliminated but the users of such inodes are > still charged full quota for the inodes as if there was no sharing > happening in the background. > > This design requires ext4 to manually charge the users because the > inodes are shared. > > An implication of this is that, if someone calls chown on a file that > has such references we need to transfer the quota for the file and xattr > inodes. Current dquot_transfer() function implicitly transfers one inode > charge. In our case, we would like to specify additional inodes to be > transferred. Hum, rather handle this similarly to how we handle delalloc reserved space. Add a callback to dq_ops to get "inode usage" of an inode and then use it in dquot_transfer(), dquot_free_inode(), dquot_alloc_inode(). 
Honza > Signed-off-by: Tahsin Erdogan > --- > fs/ext2/inode.c | 2 +- > fs/ext4/inode.c | 8 ++- > fs/ext4/ioctl.c | 13 +++- > fs/ext4/xattr.c | 54 > > fs/ext4/xattr.h | 2 ++ > fs/jfs/file.c| 2 +- > fs/ocfs2/file.c | 2 +- > fs/quota/dquot.c | 16 +++--- > fs/reiserfs/inode.c | 2 +- > include/linux/quotaops.h | 8 --- > 10 files changed, 93 insertions(+), 16 deletions(-) > > diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c > index 2dcbd5698884..a13ba5dcb355 100644 > --- a/fs/ext2/inode.c > +++ b/fs/ext2/inode.c > @@ -1656,7 +1656,7 @@ int ext2_setattr(struct dentry *dentry, struct iattr > *iattr) > } > if ((iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, > inode->i_uid)) || > (iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, > inode->i_gid))) { > - error = dquot_transfer(inode, iattr); > + error = dquot_transfer(inode, iattr, 0); > if (error) > return error; > } > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 6f5872197d6c..28abbbdbbb80 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5267,6 +5267,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr > *attr) > int error, rc = 0; > int orphan = 0; > const unsigned int ia_valid = attr->ia_valid; > + int ea_inode_refs; > > if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb > return -EIO; > @@ -5293,7 +5294,12 @@ int ext4_setattr(struct dentry *dentry, struct iattr > *attr) > error = PTR_ERR(handle); > goto err_out; > } > - error = dquot_transfer(inode, attr); > + > + down_read(&EXT4_I(inode)->xattr_sem); > + error = ea_inode_refs = ext4_xattr_inode_count(inode); > + if (ea_inode_refs >= 0) > + error = dquot_transfer(inode, attr, ea_inode_refs); > + up_read(&EXT4_I(inode)->xattr_sem); > if (error) { > ext4_journal_stop(handle); > return error; > diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c > index dde8deb11e59..9938dc8e24c8 100644 > --- a/fs/ext4/ioctl.c > +++ b/fs/ext4/ioctl.c > @@ -21,6 +21,7 @@ > #include "ext4.h" > #include > #include "fsmap.h" > +#include "xattr.h" > #include > > 
/** > @@ -319,6 +320,7 @@ static int ext4_ioctl_setproject(struct file *filp, __u32 > projid) > struct ext4_iloc iloc; > struct ext4_inode *raw_inode; > struct dquot *transfer_to[MAXQUOTAS] = { }; > + int ea_inode_refs; > > if (!ext4_has_feature_project(sb)) { > if (projid != EXT4_DEF_PROJID) > @@ -371,9 +373,17 @@ static int ext4_ioctl_setproject(struct file *filp, > __u32 projid) > if (err) > goto out_stop; > > + down_read(&EXT4_I(inode)->xattr_sem); > + ea_inode_refs = ext4_xattr_inode_count(inode); > + if (ea_inode_refs < 0) { > + up_read(&EXT4_I(inode)->xattr_sem); > + err = ea_inode_refs; > + goto out_stop; > + } > + > transfer_to[PRJQUOTA] = dqget(sb, make_kqid_projid(kprojid)); > if (!IS_ERR(transfer_to[PRJQUOTA])) { > - err = __dquot_transfer(inode, transfer_to); > + err = __dquot_transfer(inode, transfer_to, ea_inode_refs); > dqput(transfer_to[PRJQUOTA]); > if (err) > goto out_dirty; > @@ -382,6 +392,7 @@ static int ext4_ioctl_setproject(struct file *filp, __u32 > projid) > EXT4_I(inode)->i_projid = kprojid; > inode->i_ctime = current_time(inode); > out_dirty: > + up_read(&EXT4_I(inode)->xattr_sem); >
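Honza's suggestion earlier in this message (a dq_ops callback that reports the "inode usage" of an inode, consulted by the generic quota code) could look roughly like the sketch below. Everything here is an assumption made for illustration: struct quota_ops, its get_inode_usage member, inode_usage() and xattr_aware_usage() are hypothetical stand-ins and not the signatures that were eventually merged.

```c
#include <stddef.h>

typedef long long qsize_t;

struct inode;

/* Hypothetical subset of a dq_ops-style table with the proposed callback. */
struct quota_ops {
	/* Report how many inodes this inode is charged for. */
	int (*get_inode_usage)(struct inode *inode, qsize_t *usage);
};

/* Hypothetical, trimmed-down inode. */
struct inode {
	const struct quota_ops *ops;
	int ea_inode_refs;	/* external xattr inodes referenced */
};

/* Generic helper: one inode unless the filesystem says otherwise. */
static int inode_usage(struct inode *inode, qsize_t *usage)
{
	*usage = 1;
	if (inode->ops && inode->ops->get_inode_usage)
		return inode->ops->get_inode_usage(inode, usage);
	return 0;
}

/* Filesystem-specific callback: the inode itself plus its xattr inodes. */
static int xattr_aware_usage(struct inode *inode, qsize_t *usage)
{
	if (inode->ea_inode_refs < 0)
		return -1;
	*usage = 1 + inode->ea_inode_refs;
	return 0;
}

static const struct quota_ops xattr_aware_ops = {
	.get_inode_usage = xattr_aware_usage,
};
```

With a helper of this shape, the transfer, alloc and free paths can all ask for the usage themselves, so callers in other filesystems would not need the extra argument that the quoted patch adds to dquot_transfer().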
Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
On Thu, Jun 15, 2017 at 07:33:58AM +, Zheng, Lv wrote: > Hi, Peter > > > From: Peter Hutterer [mailto:peter.hutte...@who-t.net] > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID > > switch exported by ACPI > > > > On Thu, Jun 15, 2017 at 02:52:57AM +, Zheng, Lv wrote: > > > Hi, Benjamin > > > > > > > From: Benjamin Tissoires [mailto:benjamin.tissoi...@redhat.com] > > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID > > > > switch exported by ACPI > > > > > > > > Hi, > > > > > > > > [Sorry for the delay, I have been sidetracked from this] > > > > > > > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote: > > > > > On Thu, 01.06.17 20:46, Benjamin Tissoires > > > > > (benjamin.tissoi...@redhat.com) wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > Sending this as a WIP as it still need a few changes, but it mostly > > > > > > works as > > > > > > expected (still not fully compliant yet). > > > > > > > > > > > > So this is based on Lennart's comment in [1]: if the LID state is > > > > > > not reliable, > > > > > > the kernel should not export the LID switch device as long as we > > > > > > are not sure > > > > > > about its state. > > > > > > > > > > Ah nice! I (obviously) like this approach. > > > > > > > > Heh. Now I just need to convince Lv that it's the right approach. > > > > > > I feel we don't have big conflicts. > > > And I already took part of your idea into this patchset: > > > https://patchwork.kernel.org/patch/9771121/ > > > https://patchwork.kernel.org/patch/9771119/ > > > I tested my surface pros with Ubuntu, they are working as expected. > > > > > > > > > Note that systemd currently doesn't sync the state when the input > > > > > > node just > > > > > > appears. This is a systemd bug, and it should not be handled by the > > > > > > kernel > > > > > > community. > > > > > > > > > > Uh if this is borked, we should indeed fix this in systemd. 
Is there > > > > > already a systemd github bug about this? If not, please create one, > > > > > and we'll look into it! > > > > > > > > I don't think there is. I haven't raised it yet because I am not so sure > > > > this will not break again those worthless unreliable LID, and if we play > > > > whack a mole between the kernel and user space, things are going to be > > > > nasty. So I'd rather have this fixed in systemd along with the > > > > unreliable LID switch knowledge, so we are sure that the kernel behaves > > > > the way we expect it to be. > > > > > > This is my feeling: > > > We needn't go that far. > > > We can interpret "input node appears" into "default input node state". > > > > Sorry, can you clarify this bit please? I'm not sure what you mean here. > > Note that there's an unknown amount of time between "device node appearing > > in the system" and when a userspace process actually opens it and looks at > > its state. By then, the node may have changed state again. > > We can see: > "logind" has already implemented a timeout, and will not respond lid state > unless it can be stable within this timeout period. > I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"? > > I feel "removing the input node for a period where its state is not trustful" > is technically identical to this mechanism. but you'd be making kernel policy based on one userspace implementation. e.g. libinput doesn't have a timeout period, it assumes the state is correct when an input node is present. Cheers, Peter
Re: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'
On 06/14/17 21:12, Guenter Roeck wrote: < snip > > Good (v4.12-rc4): > < snip > > OF: Checking node /soc@e000/pic@4 > OF: type match > OF: node '/soc@e000/pic@4' compatible '' type 'open-pic' name '' > score 2 > OF: node '/soc@e000/pic@4' compatible 'open-pic' type '' name '' > score 0 < snip > > > bad: < snip > > OF: Checking node /soc@e000/pic@4 > OF: node '/soc@e000/pic@4' compatible '' type 'open-pic' name '' > score 0 > OF: node '/soc@e000/pic@4' compatible 'open-pic' type '' name '' > score 0 < snip > > No matching open-pic node > [ cut here ] > kernel BUG at arch/powerpc/platforms/85xx/corenet_generic.c:50! > > So, in __of_device_is_compatible(), the difference is in > __of_device_is_compatible() after > > /* Matching type is better than matching name */ > > Further debugging shows that device->type is NULL in the bad case. > > OF: Checking node /soc@e000/pic@4 > OF: trying type match open-pic - > OF: node '/soc@e000/pic@4' compatible '' type 'open-pic' name '' > score 0 > OF: node '/soc@e000/pic@4' compatible 'open-pic' type '' name '' > score 0 > > Do you need more information ? I think I know what part of my patch is causing the problem. Can you try the following patch to see if it fixes the failure in __of_device_is_compatible()? If this fixes the failure, then I know what is going on. If it works then I will have to rework my original patch in a different way than this quick hack. -Frank --- drivers/of/dynamic.c | 14 ++ 1 file changed, 14 insertions(+) Index: b/drivers/of/dynamic.c === --- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -218,6 +218,20 @@ int of_property_notify(int action, struc static void __of_attach_node(struct device_node *np) { + const __be32 *phandle; + int sz; + + /* use "" to be consistent with populate_node() */ + np->name = __of_get_property(np, "name", NULL) ? : ""; + np->type = __of_get_property(np, "device_type", NULL) ? 
: ""; + + phandle = __of_get_property(np, "phandle", &sz); + if (!phandle) + phandle = __of_get_property(np, "linux,phandle", &sz); + if (IS_ENABLED(CONFIG_PPC_PSERIES) && !phandle) + phandle = __of_get_property(np, "ibm,phandle", &sz); + np->phandle = (phandle && (sz >= 4)) ? be32_to_cpup(phandle) : 0; + np->child = NULL; np->sibling = np->parent->child; np->parent->child = np;
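Why a NULL device->type turns the "score 2" case into "score 0" can be seen in a simplified sketch of the type/name part of the matching. This is not the kernel function: struct node and match_score() are hypothetical, compatible-string matching is left out entirely, and plain strcmp() stands in for the kernel's node comparison helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down device node. */
struct node {
	const char *name;
	const char *type;	/* from "device_type"; NULL if never filled in */
};

/*
 * Simplified scoring: a requested type must match (+2), a requested
 * name must match (+1); any mismatch or missing field yields 0.
 * Compatible-string matching is left out.
 */
static int match_score(const struct node *np, const char *type,
		       const char *name)
{
	int score = 0;

	if (type && type[0]) {
		if (!np->type || strcmp(type, np->type))
			return 0;
		score += 2;
	}
	if (name && name[0]) {
		if (!np->name || strcmp(name, np->name))
			return 0;
		score += 1;
	}
	return score;
}
```

A node whose type field was never populated fails the type check outright, which matches the "bad" log above; the quick hack in the patch populates name and type again in __of_attach_node().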
Re: [git pull] first batch of ufs fixes
On Wed, Jun 14, 2017 at 08:11:33AM +0100, Al Viro wrote: > NOTE: all I have is your image *after* it had counters buggered; I don't > know the exact sequence of operations that fucked it in your case. One > way to trigger it is to mount/umount on OpenBSD, then mount/modify/umount > on Linux, then mount/umount on OpenBSD, then fsck on OpenBSD. This patch > apparently fixes that, but your reproducer might be something different. FWIW, it seems to work here. Said that, *BSD fsck_ffs is not worth much - play a bit with redundancy in UFS superblock (starting with fragment and block sizes, their ratio, logarithms, bitmasks, etc.) and you can screw at least 10.3 into the ground when mounting an image that passes their fsck. Sure, anyone who mounts untrusted images is a cretin who deserves everything they get, fsck or no fsck, but... no complaints from fsck is not a reliable indicator of image being in good condition and that's PITA for testing. Another pile of fun: "reserve ->s_minfree percents of total" logics had been broken. * using hardwired 5% is wrong - especially for ufs2, where it's not even the default * ufs_freespace() returns u64; testing for <= 0 is not doing the right thing * no capability checks before we need them, TYVM... * ufs2 needs 64bit uspi->s_dsize (and ->s_size, while we are at it). 64bit variants were even calculated - and never used. * while we are at it, doing "multiply the total data frags by s_minfree and divide by 100" every time we allocate a block is bloody dumb - that should be calculated once. We really need to get the sodding tail unpacking moved up from the place where it's buried - turns out that my doubts about that code managing to avoid deadlocks had been correct. Long-term we need to move that thing to iomap-based ->write_iter() and do unpacking there and in truncate(). For now I've slapped together something that is easier to backport - avoiding ->truncate_mutex when possible and not holding ->s_lock over ufs_change_blocknr(). 
Another bug in the same area: ufs_get_locked_page() doesn't guarantee that buffer_heads are attached (race with vmscan trying to evict the page in question can end with buffer_heads freed and page left alive and uptodate). Callers do expect buffer_heads to be there, so we either need to do create_empty_buffers() in those callers or in ufs_get_locked_page(); I went for the latter for now. Off-by-one in ufs_truncate_blocks(): the logics when deciding whether we need to do anything with direct blocks is broken when new size is within the last direct block. It's better to find the path to the last byte _not_ to be removed and use that instead of the path to the beginning of the first block to be freed. I've pushed fixes for those into vfs.git#ufs-fixes; they do need more testing before I send a pull request, though.
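Two of the ->s_minfree points above (testing an unsigned free-space value for <= 0, and recomputing the reserve on every allocation) can be illustrated with a small sketch. This shows the general pitfall only and is not the actual ufs code: struct fs_limits, fs_limits_init(), below_reserve() and may_allocate() are hypothetical names, and the capability check is reduced to a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-superblock limits, computed once at mount time. */
struct fs_limits {
	uint64_t dsize;		/* total data fragments */
	uint64_t root_blocks;	/* reserve derived from the minfree percentage */
};

/* Compute the reserve once; split the multiply so huge sizes cannot overflow. */
static void fs_limits_init(struct fs_limits *l, uint64_t dsize,
			   unsigned int minfree_pct)
{
	l->dsize = dsize;
	l->root_blocks = dsize / 100 * minfree_pct +
			 dsize % 100 * minfree_pct / 100;
}

/* Buggy pattern: the subtraction wraps, so "<= 0" only catches exactly 0. */
static bool below_reserve_buggy(uint64_t free_blocks, uint64_t reserved)
{
	uint64_t avail = free_blocks - reserved;

	return avail <= 0;
}

/* Safer pattern: compare the two unsigned values directly. */
static bool below_reserve(uint64_t free_blocks, uint64_t reserved)
{
	return free_blocks <= reserved;
}

/* Unprivileged callers may not dip into the reserve. */
static bool may_allocate(const struct fs_limits *l, uint64_t free_blocks,
			 uint64_t want, bool privileged)
{
	if (want > free_blocks)
		return false;
	if (privileged)
		return true;
	return free_blocks - want >= l->root_blocks;
}
```

With 5 free blocks and a reserve of 10, the buggy check reports that there is still room because the difference wraps to a huge positive value, while the direct comparison reports the shortage.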
[PATCH v9 1/7] drm/i915/gvt: Extend the GVT-g architecture
This patch extends the GVT-g architecture to support vfio device region. Later we will add a vfio device region for the vgpu to support OpRegion. Signed-off-by: Xiaoguang Chen --- drivers/gpu/drm/i915/gvt/kvmgt.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index 1ae0b40..3c6a02b 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -53,11 +53,21 @@ static const struct intel_gvt_ops *intel_gvt_ops; #define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT) #define VFIO_PCI_OFFSET_MASK(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1) +struct vfio_region; +struct intel_vgpu_regops { + size_t (*rw)(struct intel_vgpu *vgpu, char *buf, + size_t count, loff_t *ppos, bool iswrite); + void (*release)(struct intel_vgpu *vgpu, + struct vfio_region *region); +}; + struct vfio_region { u32 type; u32 subtype; size_t size; u32 flags; + const struct intel_vgpu_regops *ops; + void*data; }; struct kvmgt_pgfn { @@ -642,7 +652,7 @@ static ssize_t intel_vgpu_rw(struct mdev_device *mdev, char *buf, int ret = -EINVAL; - if (index >= VFIO_PCI_NUM_REGIONS) { + if (index >= VFIO_PCI_NUM_REGIONS + vgpu->vdev.num_regions) { gvt_vgpu_err("invalid index: %u\n", index); return -EINVAL; } @@ -676,8 +686,11 @@ static ssize_t intel_vgpu_rw(struct mdev_device *mdev, char *buf, case VFIO_PCI_BAR5_REGION_INDEX: case VFIO_PCI_VGA_REGION_INDEX: case VFIO_PCI_ROM_REGION_INDEX: + break; default: - gvt_vgpu_err("unsupported region: %u\n", index); + index -= VFIO_PCI_NUM_REGIONS; + return vgpu->vdev.region[index].ops->rw(vgpu, buf, count, + ppos, is_write); } return ret == 0 ? 
count : ret; @@ -940,7 +953,8 @@ static long intel_vgpu_ioctl(struct mdev_device *mdev, unsigned int cmd, info.flags = VFIO_DEVICE_FLAGS_PCI; info.flags |= VFIO_DEVICE_FLAGS_RESET; - info.num_regions = VFIO_PCI_NUM_REGIONS; + info.num_regions = VFIO_PCI_NUM_REGIONS + + vgpu->vdev.num_regions; info.num_irqs = VFIO_PCI_NUM_IRQS; return copy_to_user((void __user *)arg, &info, minsz) ? @@ -1061,6 +1075,7 @@ static long intel_vgpu_ioctl(struct mdev_device *mdev, unsigned int cmd, } if (caps.size) { + info.flags |= VFIO_REGION_INFO_FLAG_CAPS; if (info.argsz < sizeof(info) + caps.size) { info.argsz = sizeof(info) + caps.size; info.cap_offset = 0; -- 2.7.4
[PATCH v9 0/7] drm/i915/gvt: Dma-buf support for GVT-g
v8->v9: 1) refine the dma-buf ioctl definition 2) add a lock to protect the dmabuf list 3) move drm format change to a separate patch 4) code cleanup v7->v8: 1) refine framebuffer decoder code 2) fix a bug in decoding primary plane v6->v7: 1) release dma-buf related allocations in dma-buf's associated release function. 2) refine ioctl interface for querying plane info or creating a dma-buf 3) refine framebuffer decoder code 4) the patch series is based on 4.12.0-rc1 v5->v6: 1) align the dma-buf life cycle with the vfio device. 2) add the dma-buf related operations in a separate patch. 3) i915 related changes. v4->v5: 1) fix a bug in checking whether the gem obj is gvt's dma-buf when the user changes caching mode or domains. Add a helper function to do it. 2) add definitions for the query plane and create dma-buf operations. v3->v4: 1) fix a bug in checking whether the gem obj is gvt's dma-buf when setting caching mode or domains. v2->v3: 1) add a field gvt_plane_info in the drm_i915_gem_obj structure to save the decoded plane information, to avoid a lookup whenever the plane info is needed. 2) declare a new flag I915_GEM_OBJECT_IS_GVT_DMABUF in drm_i915_gem_object to represent the gem obj for gvt's dma-buf. The tiling mode, caching mode and domains cannot be changed for this kind of gem object. 3) change dma-buf related information to be more generic, so other vendors can use the same interface. v1->v2: 1) create a management fd for dma-buf operations. 2) alloc gem object's backing storage in gem obj's get_pages() callback. This patch set adds dma-buf support for Intel GVT-g. dma-buf is a uniform mechanism to share DMA buffers across different devices and sub-systems. dma-buf for Intel GVT-g is mainly used to share the vgpu's framebuffer with other users or sub-systems so they can use the dma-buf to show the desktop of a VM which uses an Intel vgpu. The main idea is that we create a gem object and set the vgpu's framebuffer as the backing storage of this gem object. 
We then associate this gem object with a dma-buf object, export the dma-buf, and generate a file descriptor for it. Finally, this file descriptor is delivered to user space, where it can be used for rendering or other operations. The user first needs to create a management fd (for Intel GVT-g dma-buf support this is the dma-buf management fd) and can then use it to query the plane information or create a dma-buf. The life cycle of this fd is managed by GVT-g, so the user does not need to care about it. We have an example program showing how to use the dma-buf. You can download the program and give it a try. Good luck :) git repo: https://github.com/01org/igvtg-qemu branch: kvmgt_dmabuf_example Xiaoguang Chen (7): drm/i915/gvt: Extend the GVT-g architecture drm/i915/gvt: OpRegion support for GVT-g drm: Extend the drm format drm/i915/gvt: Frame buffer decoder support for GVT-g vfio: Define vfio based dma-buf operations drm/i915/gvt: Dmabuf support for GVT-g drm/i915/gvt: Adding user interface for dma-buf drivers/gpu/drm/i915/gvt/Makefile | 3 +- drivers/gpu/drm/i915/gvt/display.c | 2 +- drivers/gpu/drm/i915/gvt/display.h | 2 + drivers/gpu/drm/i915/gvt/dmabuf.c | 307 drivers/gpu/drm/i915/gvt/dmabuf.h | 42 drivers/gpu/drm/i915/gvt/fb_decoder.c | 425 + drivers/gpu/drm/i915/gvt/fb_decoder.h | 171 + drivers/gpu/drm/i915/gvt/gvt.c | 3 + drivers/gpu/drm/i915/gvt/gvt.h | 8 + drivers/gpu/drm/i915/gvt/hypercall.h | 4 + drivers/gpu/drm/i915/gvt/kvmgt.c | 245 ++- drivers/gpu/drm/i915/gvt/mpt.h | 45 drivers/gpu/drm/i915/gvt/opregion.c| 26 +- drivers/gpu/drm/i915/gvt/vgpu.c| 6 + drivers/gpu/drm/i915/i915_gem.c| 26 +- drivers/gpu/drm/i915/i915_gem_object.h | 9 + drivers/gpu/drm/i915/i915_gem_tiling.c | 5 + include/uapi/drm/drm_fourcc.h | 4 + include/uapi/linux/vfio.h | 57 + 19 files changed, 1378 insertions(+), 12 deletions(-) create mode 100644 drivers/gpu/drm/i915/gvt/dmabuf.c create mode 100644 drivers/gpu/drm/i915/gvt/dmabuf.h create mode 100644 drivers/gpu/drm/i915/gvt/fb_decoder.c create mode 
100644 drivers/gpu/drm/i915/gvt/fb_decoder.h -- 2.7.4
[PATCH v9 2/7] drm/i915/gvt: OpRegion support for GVT-g
OpRegion is needed to support display related operation for intel vgpu. A vfio device region is added to intel vgpu to deliver the host OpRegion information to user space so user space can construct the OpRegion for vgpu. Signed-off-by: Bing Niu Signed-off-by: Xiaoguang Chen --- drivers/gpu/drm/i915/gvt/hypercall.h | 1 + drivers/gpu/drm/i915/gvt/kvmgt.c | 88 drivers/gpu/drm/i915/gvt/mpt.h | 15 ++ drivers/gpu/drm/i915/gvt/opregion.c | 26 --- drivers/gpu/drm/i915/gvt/vgpu.c | 4 ++ 5 files changed, 128 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gvt/hypercall.h b/drivers/gpu/drm/i915/gvt/hypercall.h index df7f33a..32c345c 100644 --- a/drivers/gpu/drm/i915/gvt/hypercall.h +++ b/drivers/gpu/drm/i915/gvt/hypercall.h @@ -55,6 +55,7 @@ struct intel_gvt_mpt { unsigned long mfn, unsigned int nr, bool map); int (*set_trap_area)(unsigned long handle, u64 start, u64 end, bool map); + int (*set_opregion)(void *vgpu); }; extern struct intel_gvt_mpt xengt_mpt; diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index 3c6a02b..6b4652a 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -53,6 +53,8 @@ static const struct intel_gvt_ops *intel_gvt_ops; #define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT) #define VFIO_PCI_OFFSET_MASK(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1) +#define OPREGION_SIGNATURE "IntelGraphicsMem" + struct vfio_region; struct intel_vgpu_regops { size_t (*rw)(struct intel_vgpu *vgpu, char *buf, @@ -436,6 +438,91 @@ static void kvmgt_protect_table_del(struct kvmgt_guest_info *info, } } +static size_t intel_vgpu_reg_rw_opregion(struct intel_vgpu *vgpu, char *buf, + size_t count, loff_t *ppos, bool iswrite) +{ + unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - + VFIO_PCI_NUM_REGIONS; + void *base = vgpu->vdev.region[i].data; + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK; + + if (pos >= vgpu->vdev.region[i].size || iswrite) { + gvt_vgpu_err("invalid op or 
offset for Intel vgpu OpRegion\n"); + return -EINVAL; + } + count = min(count, (size_t)(vgpu->vdev.region[i].size - pos)); + memcpy(buf, base + pos, count); + + return count; +} + +static void intel_vgpu_reg_release_opregion(struct intel_vgpu *vgpu, + struct vfio_region *region) +{ + memunmap(region->data); +} + +static const struct intel_vgpu_regops intel_vgpu_regops_opregion = { + .rw = intel_vgpu_reg_rw_opregion, + .release = intel_vgpu_reg_release_opregion, +}; + +static int intel_vgpu_register_reg(struct intel_vgpu *vgpu, + unsigned int type, unsigned int subtype, + const struct intel_vgpu_regops *ops, + size_t size, u32 flags, void *data) +{ + struct vfio_region *region; + + region = krealloc(vgpu->vdev.region, + (vgpu->vdev.num_regions + 1) * sizeof(*region), + GFP_KERNEL); + if (!region) + return -ENOMEM; + + vgpu->vdev.region = region; + vgpu->vdev.region[vgpu->vdev.num_regions].type = type; + vgpu->vdev.region[vgpu->vdev.num_regions].subtype = subtype; + vgpu->vdev.region[vgpu->vdev.num_regions].ops = ops; + vgpu->vdev.region[vgpu->vdev.num_regions].size = size; + vgpu->vdev.region[vgpu->vdev.num_regions].flags = flags; + vgpu->vdev.region[vgpu->vdev.num_regions].data = data; + vgpu->vdev.num_regions++; + + return 0; +} + +static int kvmgt_set_opregion(void *p_vgpu) +{ + struct intel_vgpu *vgpu = (struct intel_vgpu *)p_vgpu; + unsigned int addr; + void *base; + int ret; + + addr = vgpu->gvt->opregion.opregion_pa; + if (!addr || !(~addr)) + return -ENODEV; + + base = memremap(addr, OPREGION_SIZE, MEMREMAP_WB); + if (!base) + return -ENOMEM; + + if (memcmp(base, OPREGION_SIGNATURE, 16)) { + memunmap(base); + return -EINVAL; + } + + ret = intel_vgpu_register_reg(vgpu, + PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE, + VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION, + &intel_vgpu_regops_opregion, OPREGION_SIZE, + VFIO_REGION_INFO_FLAG_READ, base); + if (ret) + memunmap(base); + + return ret; +} + static int intel_vgpu_create(struct kobject *kobj, struct 
mdev_device *mdev) { struct intel_vgpu *vgpu = NULL; @@ -1524,6 +1611,7 @@ struct intel_gvt_mpt kvmgt_mpt = { .read_gpa = kvmgt_read_gpa, .write_gpa = kvmgt_write_gpa, .gfn_to_mfn = kvmgt_gfn_to_pfn, + .set_opregion = kvmgt_set_o
Re: [PATCH v2 1/2] libsas: Don't process sas events in static works
On 15/06/2017 08:37, wangyijing wrote: On 2017/6/14 21:08, John Garry wrote: On 14/06/2017 10:04, wangyijing wrote: static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event event) { +struct sas_ha_event *ev; + BUG_ON(event >= HA_NUM_EVENTS); -sas_queue_event(event, &sas_ha->pending, -&sas_ha->ha_events[event].work, sas_ha); +ev = kzalloc(sizeof(*ev), GFP_ATOMIC); +if (!ev) +return; GFP_ATOMIC allocations can fail and then no events will be queued *and* we don't report the error back to the caller. Yes, it's really a problem, but I haven't found a better solution. Do you have any suggestions? Dan raised an issue with this approach, regarding a malfunctioning PHY which spews out events. I still don't think we're handling it safely. Here's the suggestion: - each asd_sas_phy owns a finite-sized pool of events - when the event pool becomes exhausted, libsas stops queuing events (obviously) and disables the PHY in the LLDD - upon attempting to re-enable the PHY from sysfs, libsas first checks that the pool is still not exhausted If you cannot find a good solution, then let us know and we can help. Hi John and Dan, which event did you see on the malfunctioning PHY? If the event is PORTE_BROADCAST_RCVD, since for every PORTE_BROADCAST_RCVD libsas always calls sas_revalidate_domain(), what about keeping one broadcast waiting (not queued in the workqueue) and discarding the others? If the event is of another type, things may become knotty. As I mentioned in the v1 series discussion, I found that a poorly connected expander PHY was spewing out PHY up and loss of signal events continuously. This is the sort of situation we should protect against. The current solution is ok, as it uses a static event per port/PHY/HA. The point is that we cannot allow a PHY to continuously send events to libsas, which may lead to memory exhaustion. John
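The finite per-PHY event pool suggested in this thread can be sketched as below. This is a minimal single-threaded illustration, not libsas code: struct phy, struct phy_event, the pool size of 4 and the helper names are all hypothetical, locking is omitted, and clearing the enabled flag stands in for asking the LLDD to disable the PHY.

```c
#include <stdbool.h>
#include <stddef.h>

#define PHY_EVENT_POOL_SIZE 4	/* hypothetical pool size */

/* Hypothetical event slot owned by a PHY. */
struct phy_event {
	int type;
	bool in_use;
};

/* Hypothetical, trimmed-down PHY with its own fixed event pool. */
struct phy {
	struct phy_event pool[PHY_EVENT_POOL_SIZE];
	bool enabled;
};

/* Take a free slot; on exhaustion stop queuing and disable the PHY. */
static struct phy_event *phy_event_get(struct phy *phy, int type)
{
	if (!phy->enabled)
		return NULL;
	for (size_t i = 0; i < PHY_EVENT_POOL_SIZE; i++) {
		if (!phy->pool[i].in_use) {
			phy->pool[i].in_use = true;
			phy->pool[i].type = type;
			return &phy->pool[i];
		}
	}
	phy->enabled = false;	/* stand-in for disabling the PHY in the LLDD */
	return NULL;
}

/* Return a slot to the pool once the event has been processed. */
static void phy_event_put(struct phy_event *ev)
{
	ev->in_use = false;
}

/* Re-enable (e.g. from sysfs) only if the pool is no longer exhausted. */
static bool phy_try_enable(struct phy *phy)
{
	for (size_t i = 0; i < PHY_EVENT_POOL_SIZE; i++) {
		if (!phy->pool[i].in_use) {
			phy->enabled = true;
			return true;
		}
	}
	return false;
}
```

Because the pool is bounded, a PHY that keeps generating events can consume at most PHY_EVENT_POOL_SIZE slots before it is shut off, which avoids both the GFP_ATOMIC allocation and the unbounded memory use discussed above.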
[PATCH v9 6/7] drm/i915/gvt: Dmabuf support for GVT-g
A dmabuf for GVT-g can be exported to users, who can use it to show the desktop of a VM that uses an Intel vgpu. Currently we provide operations to query plane information and to create a new dmabuf. Users of the dmabuf can cache created dmabufs along with related information such as the framebuffer's address, size, tiling mode, width and height. When refreshing the screen, first query the current vgpu framebuffer and compare it with the cached ones (address, size, tiling, width, height, etc.); if a match is found, reuse that dmabuf to improve performance. If no dmabuf has been created yet, or none of the cached dmabufs match, a new dmabuf needs to be created. To create a dmabuf, a gem object is created first, with the vgpu's framebuffer (primary/cursor) as its backing storage. Setting the caching mode, changing the tiling mode and setting domains are not supported for this gem object. The gem object is then associated with a dmabuf, and the dmabuf is exported. A file descriptor is generated for this dmabuf and can be sent to user space for display. 
Signed-off-by: Xiaoguang Chen Tested-by: Kechen Lu --- drivers/gpu/drm/i915/gvt/Makefile | 2 +- drivers/gpu/drm/i915/gvt/dmabuf.c | 264 + drivers/gpu/drm/i915/gvt/dmabuf.h | 37 + drivers/gpu/drm/i915/gvt/gvt.h | 1 + drivers/gpu/drm/i915/i915_gem.c| 26 +++- drivers/gpu/drm/i915/i915_gem_object.h | 9 ++ drivers/gpu/drm/i915/i915_gem_tiling.c | 5 + 7 files changed, 342 insertions(+), 2 deletions(-) create mode 100644 drivers/gpu/drm/i915/gvt/dmabuf.c create mode 100644 drivers/gpu/drm/i915/gvt/dmabuf.h diff --git a/drivers/gpu/drm/i915/gvt/Makefile b/drivers/gpu/drm/i915/gvt/Makefile index 192ca26..e480f7d 100644 --- a/drivers/gpu/drm/i915/gvt/Makefile +++ b/drivers/gpu/drm/i915/gvt/Makefile @@ -2,7 +2,7 @@ GVT_DIR := gvt GVT_SOURCE := gvt.o aperture_gm.o handlers.o vgpu.o trace_points.o firmware.o \ interrupt.o gtt.o cfg_space.o opregion.o mmio.o display.o edid.o \ execlist.o scheduler.o sched_policy.o render.o cmd_parser.o \ - fb_decoder.o + fb_decoder.o dmabuf.o ccflags-y += -I$(src) -I$(src)/$(GVT_DIR) -Wall i915-y += $(addprefix $(GVT_DIR)/, $(GVT_SOURCE)) diff --git a/drivers/gpu/drm/i915/gvt/dmabuf.c b/drivers/gpu/drm/i915/gvt/dmabuf.c new file mode 100644 index 000..6ef4f60 --- /dev/null +++ b/drivers/gpu/drm/i915/gvt/dmabuf.c @@ -0,0 +1,264 @@ +/* + * Copyright 2017 Intel Corporation. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Authors: + *Zhiyuan Lv + * + * Contributors: + *Xiaoguang Chen + */ + +#include +#include +#include + +#include "i915_drv.h" +#include "gvt.h" + +#define GEN8_DECODE_PTE(pte) (pte & GENMASK_ULL(63, 12)) + +static struct sg_table *intel_vgpu_gem_get_pages( + struct drm_i915_gem_object *obj) +{ + struct drm_i915_private *dev_priv = to_i915(obj->base.dev); + struct sg_table *st; + struct scatterlist *sg; + int i, ret; + gen8_pte_t __iomem *gtt_entries; + struct intel_vgpu_fb_info *fb_info; + + fb_info = (struct intel_vgpu_fb_info *)obj->gvt_info; + if (WARN_ON(!fb_info)) + return ERR_PTR(-ENODEV); + + st = kmalloc(sizeof(*st), GFP_KERNEL); + if (!st) + return ERR_PTR(-ENOMEM); + + ret = sg_alloc_table(st, fb_info->fb_size, GFP_KERNEL); + if (ret) { + kfree(st); + return ERR_PTR(ret); + } + gtt_entries = (gen8_pte_t __iomem *)dev_priv->ggtt.gsm + + (fb_info->fb_addr >> PAGE_SHIFT); + for_each_sg(st->sgl, sg, fb_info->fb_size, i) { + sg->offset = 0; + sg->length = PAGE_SIZE; + sg_dma_address(sg) = + GEN8_DECODE_PTE(readq(>
[PATCH v9 7/7] drm/i915/gvt: Adding user interface for dma-buf
User space should create the management fd for the dma-buf operation first. Then user can query the plane information and create dma-buf if necessary using the management fd. Signed-off-by: Xiaoguang Chen Tested-by: Kechen Lu --- drivers/gpu/drm/i915/gvt/dmabuf.c| 45 +++- drivers/gpu/drm/i915/gvt/dmabuf.h| 5 ++ drivers/gpu/drm/i915/gvt/gvt.c | 3 + drivers/gpu/drm/i915/gvt/gvt.h | 6 ++ drivers/gpu/drm/i915/gvt/hypercall.h | 3 + drivers/gpu/drm/i915/gvt/kvmgt.c | 136 +++ drivers/gpu/drm/i915/gvt/mpt.h | 30 drivers/gpu/drm/i915/gvt/vgpu.c | 2 + 8 files changed, 229 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gvt/dmabuf.c b/drivers/gpu/drm/i915/gvt/dmabuf.c index 6ef4f60..a6a6f6d 100644 --- a/drivers/gpu/drm/i915/gvt/dmabuf.c +++ b/drivers/gpu/drm/i915/gvt/dmabuf.c @@ -81,6 +81,31 @@ static void intel_vgpu_gem_put_pages(struct drm_i915_gem_object *obj, static void intel_vgpu_gem_release(struct drm_i915_gem_object *obj) { + struct intel_vgpu_dmabuf_obj *dmabuf_obj; + struct intel_vgpu_fb_info *fb_info; + struct intel_vgpu *vgpu; + struct list_head *pos; + + fb_info = (struct intel_vgpu_fb_info *)obj->gvt_info; + if (WARN_ON(!fb_info || !fb_info->vgpu)) { + gvt_vgpu_err("gvt info is invalid\n"); + goto out; + } + + vgpu = fb_info->vgpu; + mutex_lock(&vgpu->dmabuf_list_lock); + list_for_each(pos, &vgpu->dmabuf_obj_list_head) { + dmabuf_obj = container_of(pos, struct intel_vgpu_dmabuf_obj, + list); + if ((dmabuf_obj != NULL) && (dmabuf_obj->obj == obj)) { + kfree(dmabuf_obj); + list_del(pos); + break; + } + } + mutex_unlock(&vgpu->dmabuf_list_lock); + intel_gvt_hypervisor_put_vfio_device(vgpu); +out: kfree(obj->gvt_info); } @@ -216,6 +241,7 @@ int intel_vgpu_create_dmabuf(struct intel_vgpu *vgpu, void *args) struct vfio_dmabuf_mgr_create_dmabuf *gvt_dmabuf = args; struct intel_vgpu_fb_info *fb_info; int ret; + struct intel_vgpu_dmabuf_obj *dmabuf_obj; ret = intel_vgpu_get_plane_info(dev, vgpu, &gvt_dmabuf->plane_info, gvt_dmabuf->plane_id); @@ -238,6 
+264,18 @@ int intel_vgpu_create_dmabuf(struct intel_vgpu *vgpu, void *args) fb_info->vgpu = vgpu; obj->gvt_info = fb_info; + dmabuf_obj = kmalloc(sizeof(*dmabuf_obj), GFP_KERNEL); + if (!dmabuf_obj) { + gvt_vgpu_err("alloc dmabuf_obj failed\n"); + ret = -ENOMEM; + goto out_free_info; + } + dmabuf_obj->obj = obj; + INIT_LIST_HEAD(&dmabuf_obj->list); + mutex_lock(&vgpu->dmabuf_list_lock); + list_add_tail(&dmabuf_obj->list, &vgpu->dmabuf_obj_list_head); + mutex_unlock(&vgpu->dmabuf_list_lock); + dmabuf = i915_gem_prime_export(dev, &obj->base, DRM_CLOEXEC | DRM_RDWR); if (IS_ERR(dmabuf)) { @@ -251,11 +289,16 @@ int intel_vgpu_create_dmabuf(struct intel_vgpu *vgpu, void *args) gvt_vgpu_err("create dma-buf fd failed ret:%d\n", ret); goto out_free; } - + if (intel_gvt_hypervisor_get_vfio_device(vgpu)) { + gvt_vgpu_err("get vfio device failed\n"); + goto out_free; + } gvt_dmabuf->fd = ret; return 0; out_free: + kfree(dmabuf_obj); +out_free_info: kfree(fb_info); out: i915_gem_object_put(obj); diff --git a/drivers/gpu/drm/i915/gvt/dmabuf.h b/drivers/gpu/drm/i915/gvt/dmabuf.h index 8be9979..cafa781 100644 --- a/drivers/gpu/drm/i915/gvt/dmabuf.h +++ b/drivers/gpu/drm/i915/gvt/dmabuf.h @@ -31,6 +31,11 @@ struct intel_vgpu_fb_info { uint32_t fb_size; }; +struct intel_vgpu_dmabuf_obj { + struct drm_i915_gem_object *obj; + struct list_head list; +}; + int intel_vgpu_query_plane(struct intel_vgpu *vgpu, void *args); int intel_vgpu_create_dmabuf(struct intel_vgpu *vgpu, void *args); diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c index 2032917..d589830 100644 --- a/drivers/gpu/drm/i915/gvt/gvt.c +++ b/drivers/gpu/drm/i915/gvt/gvt.c @@ -54,6 +54,9 @@ static const struct intel_gvt_ops intel_gvt_ops = { .vgpu_reset = intel_gvt_reset_vgpu, .vgpu_activate = intel_gvt_activate_vgpu, .vgpu_deactivate = intel_gvt_deactivate_vgpu, + .vgpu_query_plane = intel_vgpu_query_plane, + .vgpu_create_dmabuf = intel_vgpu_create_dmabuf, + }; /** diff --git 
a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h index 763a8c5..df7e216 100644 --- a/drivers/gpu/drm/i915/gvt/gvt.h +++ b/drivers/gpu/drm/i915/gvt/gvt.h @@ -185,8 +185,12 @@ struct intel_vgpu { struct kv
[PATCH v9 3/7] drm: Extend the drm format
Add new drm format which will be used by GVT-g. Signed-off-by: Xiaoguang Chen --- include/uapi/drm/drm_fourcc.h | 4 1 file changed, 4 insertions(+) diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index 55e3010..2681862 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -113,6 +113,10 @@ extern "C" { #define DRM_FORMAT_AYUVfourcc_code('A', 'Y', 'U', 'V') /* [31:0] A:Y:Cb:Cr 8:8:8:8 little endian */ +/* 64 bpp RGB */ +#define DRM_FORMAT_XRGB161616 fourcc_code('X', 'R', '4', '8') /* [63:0] x:R:G:B 16:16:16:16 little endian */ +#define DRM_FORMAT_XBGR161616 fourcc_code('X', 'B', '4', '8') /* [63:0] x:B:G:R 16:16:16:16 little endian */ + /* * 2 plane RGB + A * index 0 = RGB plane, same format as the corresponding non _A8 format has -- 2.7.4
[PATCH v9 5/7] vfio: Define vfio based dma-buf operations
Here we defined a new ioctl to create a fd for a vfio device based on the input type. Now only one type is supported that is a dma-buf management fd. Two ioctls are defined for the dma-buf management fd: query the vfio vgpu's plane information and create a dma-buf for a plane. Signed-off-by: Xiaoguang Chen --- include/uapi/linux/vfio.h | 57 +++ 1 file changed, 57 insertions(+) diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index ae46105..7d86101 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -502,6 +502,63 @@ struct vfio_pci_hot_reset { #define VFIO_DEVICE_PCI_HOT_RESET _IO(VFIO_TYPE, VFIO_BASE + 13) +/** + * VFIO_DEVICE_GET_FD - _IO(VFIO_TYPE, VFIO_BASE + 14, __u32) + * + * Create a fd for a vfio device based on the input type + * Vendor driver should handle this ioctl to create a fd and manage the + * life cycle of this fd. + * + * Return: a fd if vendor support that type, -errno if not supported + */ + +#define VFIO_DEVICE_GET_FD _IO(VFIO_TYPE, VFIO_BASE + 14) + +#define VFIO_DEVICE_DMABUF_MGR_FD 0 /* Supported fd types */ + +struct vfio_dmabuf_mgr_plane_info { + __u64 start; + __u64 drm_format_mod; + __u32 drm_format; + __u32 width; + __u32 height; + __u32 stride; + __u32 size; + __u32 x_pos; + __u32 y_pos; + __u32 padding; +}; + +/* + * VFIO_DMABUF_MGR_QUERY_PLANE - _IO(VFIO_TYPE, VFIO_BASE + 15, + * struct vfio_dmabuf_mgr_query_plane) + * Query plane information + */ +struct vfio_dmabuf_mgr_query_plane { + __u32 argsz; + __u32 flags; + struct vfio_dmabuf_mgr_plane_info plane_info; + __u32 plane_id; +}; + +#define VFIO_DMABUF_MGR_QUERY_PLANE _IO(VFIO_TYPE, VFIO_BASE + 15) + +/* + * VFIO_DMABUF_MGR_CREATE_DMABUF - _IO(VFIO, VFIO_BASE + 16, + * struct vfio_dmabuf_mgr_create_dmabuf) + * + * Create a dma-buf for a plane + */ +struct vfio_dmabuf_mgr_create_dmabuf { + __u32 argsz; + __u32 flags; + struct vfio_dmabuf_mgr_plane_info plane_info; + __u32 plane_id; + __s32 fd; +}; + +#define VFIO_DMABUF_MGR_CREATE_DMABUF 
_IO(VFIO_TYPE, VFIO_BASE + 16) + /* API for Type1 VFIO IOMMU */ /** -- 2.7.4
[PATCH v9 4/7] drm/i915/gvt: Frame buffer decoder support for GVT-g
Decode framebuffer attributes of the primary, cursor and sprite planes. Signed-off-by: Xiaoguang Chen --- drivers/gpu/drm/i915/gvt/Makefile | 3 +- drivers/gpu/drm/i915/gvt/display.c| 2 +- drivers/gpu/drm/i915/gvt/display.h| 2 + drivers/gpu/drm/i915/gvt/fb_decoder.c | 425 ++ drivers/gpu/drm/i915/gvt/fb_decoder.h | 171 ++ drivers/gpu/drm/i915/gvt/gvt.h| 1 + 6 files changed, 602 insertions(+), 2 deletions(-) create mode 100644 drivers/gpu/drm/i915/gvt/fb_decoder.c create mode 100644 drivers/gpu/drm/i915/gvt/fb_decoder.h diff --git a/drivers/gpu/drm/i915/gvt/Makefile b/drivers/gpu/drm/i915/gvt/Makefile index b123c20..192ca26 100644 --- a/drivers/gpu/drm/i915/gvt/Makefile +++ b/drivers/gpu/drm/i915/gvt/Makefile @@ -1,7 +1,8 @@ GVT_DIR := gvt GVT_SOURCE := gvt.o aperture_gm.o handlers.o vgpu.o trace_points.o firmware.o \ interrupt.o gtt.o cfg_space.o opregion.o mmio.o display.o edid.o \ - execlist.o scheduler.o sched_policy.o render.o cmd_parser.o + execlist.o scheduler.o sched_policy.o render.o cmd_parser.o \ + fb_decoder.o ccflags-y += -I$(src) -I$(src)/$(GVT_DIR) -Wall i915-y += $(addprefix $(GVT_DIR)/, $(GVT_SOURCE)) diff --git a/drivers/gpu/drm/i915/gvt/display.c b/drivers/gpu/drm/i915/gvt/display.c index e0261fc..f5f63c5 100644 --- a/drivers/gpu/drm/i915/gvt/display.c +++ b/drivers/gpu/drm/i915/gvt/display.c @@ -67,7 +67,7 @@ static int edp_pipe_is_enabled(struct intel_vgpu *vgpu) return 1; } -static int pipe_is_enabled(struct intel_vgpu *vgpu, int pipe) +int pipe_is_enabled(struct intel_vgpu *vgpu, int pipe) { struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv; diff --git a/drivers/gpu/drm/i915/gvt/display.h b/drivers/gpu/drm/i915/gvt/display.h index d73de22..b46b868 100644 --- a/drivers/gpu/drm/i915/gvt/display.h +++ b/drivers/gpu/drm/i915/gvt/display.h @@ -179,4 +179,6 @@ int intel_vgpu_init_display(struct intel_vgpu *vgpu, u64 resolution); void intel_vgpu_reset_display(struct intel_vgpu *vgpu); void intel_vgpu_clean_display(struct intel_vgpu *vgpu); +int 
pipe_is_enabled(struct intel_vgpu *vgpu, int pipe); + #endif diff --git a/drivers/gpu/drm/i915/gvt/fb_decoder.c b/drivers/gpu/drm/i915/gvt/fb_decoder.c new file mode 100644 index 000..a4614b5 --- /dev/null +++ b/drivers/gpu/drm/i915/gvt/fb_decoder.c @@ -0,0 +1,425 @@ +/* + * Copyright(c) 2011-2016 Intel Corporation. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * Authors: + *Kevin Tian + * + * Contributors: + *Bing Niu + *Xu Han + *Ping Gao + *Xiaoguang Chen + *Yang Liu + * + */ + +#include +#include "i915_drv.h" +#include "gvt.h" + +#define PRIMARY_FORMAT_NUM 16 +struct pixel_format { + int drm_format; /* Pixel format in DRM definition */ + int bpp;/* Bits per pixel, 0 indicates invalid */ + char *desc; /* The description */ +}; + +/* non-supported format has bpp default to 0 */ +static struct pixel_format bdw_pixel_formats[PRIMARY_FORMAT_NUM] = { + [0x2] = {DRM_FORMAT_C8, 8, "8-bit Indexed"}, + [0x5] = {DRM_FORMAT_RGB565, 16, "16-bit BGRX (5:6:5 MSB-R:G:B)"}, + [0x6] = {DRM_FORMAT_XRGB, 32, + "32-bit BGRX (8:8:8:8 MSB-X:R:G:B)"}, + [0x8] = {DRM_FORMAT_XBGR2101010, 32, + "32-bit RGBX (2:10:10:10 MSB-X:B:G:R)"}, + [0xa] = {DRM_FORMAT_XRGB2101010, 32, + "32-bit BGRX (2:10:10:10 MSB-X:R:G:B)"}, + [0xc] = {DRM_FORMAT_XRGB161616, 64, + "64-bit RGBX Floating Point(16:16:16:16 MSB-X:B:G:R)"}, + [0xe] = {DRM_FORMAT_XBGR, 32, + "32-bit RGBX (8:8:8:8 MSB-X:B:G:R)"}, +}; + +/* non-supported format has bpp default to 0 */ +static struct pixel_format skl_pixel_
Re: [RFC][PATCH 0/2] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if existed
Sorry, I forgot to add Taku to the list.

Hi Taku,

On 06/15/17 at 03:52pm, Baoquan He wrote:
> Our customer reported that Kernel text may be located on non-mirror
> region (movable zone) when both address range mirroring feature and
> KASLR are enabled.
>
> The functions of address range mirroring feature are as follows.
> - The physical memory region whose descriptors in EFI memory map have
>   EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are mirrored
> - The function arranges such mirror region into normal zone and other region
>   into movable zone in order to locate kernel code and data on mirror region
>
> So we need restrict kernel to be located inside mirror region if it
> is existed.
>
> The method is very simple. If efi is enabled, just iterate all efi
> memory map and pick up mirror region to process for adding candidate
> of slot. If efi disabled or no mirror region existed, still process
> e820 memory map. This won't bring much efficiency loss, at worst we
> just go through all efi memory maps and found no mirror.
>
> One question:
> From code, though mirror regions are existed, they are meaningful only
> if kernelcore=mirror kernel option is specified. Not sure if my understanding
> is correct.

Since you are the author of the kernelcore=mirror related code and an
expert on the mirror feature, could you help answer the above question?

Thanks
Baoquan

>
> NOTE:
> I haven't got a machine with efi mirror region enabled, so only test the
> e820 map processing case and the case of no mirror region on efi machine.
> So set this as a RFC patchset, will post formal one after above question
> is made clear and mirror issue test passed.
>
> Baoquan He (2):
>   x86/boot/KASLR: Adapt process_e820_entry for all kinds of memory map
>   x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if
>     existed
>
>  arch/x86/boot/compressed/kaslr.c | 129
> +++
>  1 file changed, 104 insertions(+), 25 deletions(-)
>
> --
> 2.5.5
>
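To make the selection rule in the quoted description concrete, here is a rough, self-contained sketch. It is not the code from the patchset: struct mem_region and pick_candidate_regions() are hypothetical, and the fallback to the e820 map is modelled by simply accepting every entry of the same array.

```c
#include <stddef.h>
#include <stdint.h>

/* EFI_MEMORY_MORE_RELIABLE is attribute bit 16 of an EFI memory descriptor. */
#define MEM_MORE_RELIABLE (1ULL << 16)

/* Simplified, hypothetical stand-in for a memory map entry. */
struct mem_region {
	uint64_t start;
	uint64_t size;
	uint64_t attribute;
};

/*
 * Copy the regions that may hold the kernel into @out (which must have
 * room for @n entries) and return how many were selected.  If at least
 * one region is mirrored, only mirrored regions are candidates; otherwise
 * every region is, which stands in for the e820 fallback.
 */
size_t pick_candidate_regions(const struct mem_region *map, size_t n,
			      struct mem_region *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (map[i].attribute & MEM_MORE_RELIABLE)
			out[count++] = map[i];
	}
	if (count)
		return count;

	/* No mirrored region found: fall back to all regions. */
	for (size_t i = 0; i < n; i++)
		out[count++] = map[i];
	return count;
}
```

Note that the sketch ignores the open question about kernelcore=mirror; it treats mirrored regions as meaningful whenever they exist.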
Re: [virtio-dev] Re: [PATCH v11 3/6] virtio-balloon: VIRTIO_BALLOON_F_PAGE_CHUNKS
On 06/14/2017 01:56 AM, Michael S. Tsirkin wrote: On Fri, Jun 09, 2017 at 06:41:38PM +0800, Wei Wang wrote: Add a new feature, VIRTIO_BALLOON_F_PAGE_CHUNKS, which enables the transfer of the ballooned (i.e. inflated/deflated) pages in chunks to the host. so now these chunks are just s/g list entry. So let's rename this VIRTIO_BALLOON_F_SG with a comment: * Use standard virtio s/g instead of PFN lists * Actually, it's not using the standard s/g list in the implementation, because: using the standard s/g will need kmalloc() the indirect table on demand (i.e. when virtqueue_add() converts s/g to indirect table); The implementation directly pre-allocates an indirect desc table, and uses a entry (i.e. vring_desc) to describe a chunk. This avoids the overhead of kmalloc() the indirect table. +/* + * Callulates how many pfns can a page_bmap record. A bit corresponds to a + * page of PAGE_SIZE. + */ +#define VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP \ + (VIRTIO_BALLOON_PAGE_BMAP_SIZE * BITS_PER_BYTE) + +/* The number of page_bmap to allocate by default. */ +#define VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM 1 It's not by default, it's at probe time, right? It is the number of page bitmap being kept throughout the whole lifecycle of the driver. The page bmap will be temporarily extended due to insufficiency during a ballooning process, but when that ballooning finishes, the extended part will be freed. +/* The maximum number of page_bmap that can be allocated. */ Not really, this is the size of the array we use to keep them. This is the max number of the page bmap that can be extended temporarily. +#define VIRTIO_BALLOON_PAGE_BMAP_MAX_NUM 32 + So you still have a home-grown bitmap. I'd like to know why isn't xbitmap suggested for this purpose by Matthew Wilcox appropriate. Please add a comment explaining the requirements from the data structure. I didn't find his xbitmap being upstreamed, did you? 
+/* + * QEMU virtio implementation requires the desc table size less than + * VIRTQUEUE_MAX_SIZE, so minus 1 here. I think it doesn't, the issue is probably that you add a header as a separate s/g. In any case see below. + */ +#define VIRTIO_BALLOON_MAX_PAGE_CHUNKS (VIRTQUEUE_MAX_SIZE - 1) This is wrong, virtio spec says s/g size should not exceed VQ size. If you want to support huge VQ sizes, you can add a fallback to smaller sizes until it fits in 1 page. Probably no need for huge VQ size, 1024 queue size should be enough. And we can have 1024 descriptors in the indirect table, so the above size doesn't exceed the vq size, right? +static unsigned int extend_page_bmap_size(struct virtio_balloon *vb, + unsigned long pfn_num) what's this API doing? Pls add comments. this seems to assume it will only be called once. OK, I will add some comments here. This is the function to extend the number of page bitmap when the original 1 page bmap is not sufficient during a ballooning process. As mentioned above, at the end of this ballooning process, the extended part will be freed. it would be better to avoid making this assumption, just look at what has been allocated and extend it. Actually it's not an assumption. The rule here is that we always keep "1" page bmap. "1" is defined by the VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM. So when freeing, it also references VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM (not assuming any number) +} + +/* Add a chunk to the buffer. */ +static void add_one_chunk(struct virtio_balloon *vb, struct virtqueue *vq, + u64 base_addr, u32 size) +{ + unsigned int *num = &vb->balloon_page_chunk.chunk_num; + struct vring_desc *desc = &vb->balloon_page_chunk.desc_table[*num]; + + desc->addr = cpu_to_virtio64(vb->vdev, base_addr); + desc->len = cpu_to_virtio32(vb->vdev, size); + *num += 1; + if (*num == VIRTIO_BALLOON_MAX_PAGE_CHUNKS) + send_page_chunks(vb, vq); +} + Poking at virtio internals like this is not nice. Pls move to virtio code. 
Also, pages must be read descriptors as host might modify them. This also lacks viommu support but this is not mandatory as that is borken atm anyway. I'll send a patch to at least fail cleanly. OK, thanks. +static void convert_bmap_to_chunks(struct virtio_balloon *vb, + struct virtqueue *vq, + unsigned long *bmap, + unsigned long pfn_start, + unsigned long size) +{ + unsigned long next_one, next_zero, pos = 0; + u64 chunk_base_addr; + u32 chunk_size; + + while (pos < size) { + next_one = find_next_bit(bmap, size, pos); + /* +* No "1" bit found, which means that there is no pfn +* recorded in the rest of this bmap. +*/ + if (next_one == size) + break; +
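As a side note for readers following the bitmap discussion: the idea behind convert_bmap_to_chunks(), turning each run of set bits into one (base, length) chunk, can be sketched in plain C as below. This is only an illustration, not the driver code; struct chunk, test_bit_at() and bmap_to_chunks() are invented names, and the sketch stores chunks in an array instead of filling a descriptor table.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical chunk record: a run of contiguous page frames. */
struct chunk {
	unsigned long first_pfn;
	unsigned long nr_pages;
};

static int test_bit_at(const unsigned long *bmap, unsigned long pos)
{
	return (bmap[pos / BITS_PER_LONG] >> (pos % BITS_PER_LONG)) & 1UL;
}

/*
 * Walk @size bits of @bmap, where bit i stands for page frame
 * @pfn_start + i, and record each run of consecutive set bits as one
 * chunk.  At most @max chunks are stored; the return value is the number
 * of chunks written to @out.
 */
size_t bmap_to_chunks(const unsigned long *bmap, unsigned long size,
		      unsigned long pfn_start, struct chunk *out, size_t max)
{
	unsigned long pos = 0;
	size_t n = 0;

	while (pos < size && n < max) {
		/* Skip clear bits to find the start of the next run. */
		while (pos < size && !test_bit_at(bmap, pos))
			pos++;
		if (pos == size)
			break;

		unsigned long run_start = pos;

		/* Extend the run while bits stay set. */
		while (pos < size && test_bit_at(bmap, pos))
			pos++;

		out[n].first_pfn = pfn_start + run_start;
		out[n].nr_pages = pos - run_start;
		n++;
	}
	return n;
}
```

The bit-by-bit scan is chosen for clarity; the patch uses find_next_bit() to jump between runs.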
Re: [PATCH] mfd: intel_soc_pmic: use 'depends on' instead of 'select'
On Fri, 09 Jun 2017, Arnd Bergmann wrote:

> I ran into a build error on ARM with a platform that has a non-standard
> clk implementation:
>
> drivers/clk/clk.o: In function `clk_disable':
> clk.c:(.text.clk_disable+0x0): multiple definition of `clk_disable'
> arch/arm/mach-omap1/clock.o:clock.c:(.text.clk_disable+0x0): first defined
> here
> drivers/clk/clk.o: In function `clk_enable':
> clk.c:(.text.clk_enable+0x0): multiple definition of `clk_enable'
> arch/arm/mach-omap1/clock.o:clock.c:(.text.clk_enable+0x0): first defined here
>
> The problem is a device driver that uses 'select COMMON_CLK', which is
> generally a bad idea: selecting a subsystem should only be done from
> a platform, otherwise we run into circular dependencies. The same driver
> also selects 'GPIOLIB' and 'I2C', which has a similar effect.
>
> This turns all three into 'depends on', as it should be. The same pattern
> exists for INTEL_SOC_PMIC and INTEL_SOC_PMIC_CHTWC, so we fix both the
> same way to keep them in sync. INTEL_SOC_PMIC does not depend on ACPI,
> so we don't need to 'select' the I2C master driver when ACPI is disabled.
>
> Finally, we can limit the build to x86, unless we are compile testing.
>
> Fixes: 2f91ded5f8f4 ("mfd: Add Cherry Trail Whiskey Cove PMIC driver")
> Fixes: 5f125f1f5705 ("mfd: intel_soc_pmic: Select designware i2c-bus driver")
> Signed-off-by: Arnd Bergmann
> ---
>  drivers/mfd/Kconfig | 15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)

I need 2 patches, one for each of the Fixes above. The plan being to
squash them into the original commits (keeping sign-off credits of
course) to prevent bisectability breakage.

If that does not happen, I will have to remove both offending patches
until they are fixed.

> diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
> index ea5daa935518..74fa52582f06 100644
> --- a/drivers/mfd/Kconfig
> +++ b/drivers/mfd/Kconfig
> @@ -454,14 +454,12 @@ config LPC_SCH
>
>  config INTEL_SOC_PMIC
>  	bool "Support for Crystal Cove PMIC"
> -	depends on HAS_IOMEM
> -	select GPIOLIB
> -	select I2C
> +	depends on HAS_IOMEM && I2C=y && GPIOLIB && COMMON_CLK
> +	depends on X86 || COMPILE_TEST
>  	select MFD_CORE
>  	select REGMAP_I2C
>  	select REGMAP_IRQ
> -	select COMMON_CLK
> -	select I2C_DESIGNWARE_PLATFORM
> +	select I2C_DESIGNWARE_PLATFORM if ACPI
>  	help
>  	  Select this option to enable support for Crystal Cove PMIC
>  	  on some Intel SoC systems. The PMIC provides ADC, GPIO,
> @@ -484,13 +482,12 @@ config INTEL_SOC_PMIC_BXTWC
>  	  on these systems.
>
>  config INTEL_SOC_PMIC_CHTWC
> -	bool "Support for Intel Cherry Trail Whiskey Cove PMIC"
> -	depends on ACPI && HAS_IOMEM
> +	tristate "Support for Intel Cherry Trail Whiskey Cove PMIC"
> +	depends on ACPI && HAS_IOMEM && I2C=y && COMMON_CLK
> +	depends on X86 || COMPILE_TEST
>  	select MFD_CORE
> -	select I2C
>  	select REGMAP_I2C
>  	select REGMAP_IRQ
> -	select COMMON_CLK
>  	select I2C_DESIGNWARE_PLATFORM
>  	help
>  	  Select this option to enable support for the Intel Cherry Trail

--
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog
Re: [PATCH v4 13/14] libnvdimm, pmem: gate cache management on QUEUE_FLAG_WC in pmem_dax_flush()
On Wed 14-06-17 16:11:26, Dan Williams wrote:
> Some platforms arrange for cpu caches to be flushed on power-fail. On
> those platforms there is no requirement that the kernel track and flush
> potentially dirty cache lines. Given that we still insert entries into
> the radix for locking purposes this patch only disables the cache flush
> loop, not the dirty tracking.
>
> Userspace can override the default cache setting via the block device
> queue "write_cache" attribute in sysfs.
>
> Cc: Jeff Moyer
> Cc: Christoph Hellwig
> Cc: Matthew Wilcox
> Cc: Ross Zwisler
> Suggested-by: Jan Kara
> Signed-off-by: Dan Williams

Looks good. You can add:

Reviewed-by: Jan Kara

								Honza

> ---
> Changes since v3:
> * move the check of QUEUE_FLAG_WC into the pmem driver directly (Jan)
>
>  drivers/nvdimm/pmem.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 06f6c27ec1e9..49938b246a7b 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -244,7 +244,16 @@ static size_t pmem_copy_from_iter(struct dax_device
> *dax_dev, pgoff_t pgoff,
>  static void pmem_dax_flush(struct dax_device *dax_dev, pgoff_t pgoff,
>  		void *addr, size_t size)
>  {
> -	arch_wb_cache_pmem(addr, size);
> +	struct pmem_device *pmem = dax_get_private(dax_dev);
> +	struct gendisk *disk = pmem->disk;
> +	struct request_queue *q = disk->queue;
> +
> +	/*
> +	 * Only perform cache management when the queue has caching
> +	 * enabled.
> +	 */
> +	if (test_bit(QUEUE_FLAG_WC, &q->queue_flags))
> +		arch_wb_cache_pmem(addr, size);
>  }
>
>  static const struct dax_operations pmem_dax_ops = {
>
--
Jan Kara
SUSE Labs, CR
Re: [PATCH v3 08/14] x86, dax, libnvdimm: move wb_cache_pmem() to libnvdimm
On Wed 14-06-17 09:49:29, Dan Williams wrote:
> On Wed, Jun 14, 2017 at 3:54 AM, Jan Kara wrote:
> >> -/**
> >> - * arch_wb_cache_pmem - write back a cache range with CLWB
> >> - * @vaddr: virtual start address
> >> - * @size: number of bytes to write back
> >> - *
> >> - * Write back a cache range using the CLWB (cache line write back)
> >> - * instruction. Note that @size is internally rounded up to be cache
> >> - * line size aligned.
> >> - */
> >>  static inline void arch_wb_cache_pmem(void *addr, size_t size)
> >>  {
> >> -	u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
> >> -	unsigned long clflush_mask = x86_clflush_size - 1;
> >> -	void *vend = addr + size;
> >> -	void *p;
> >> -
> >> -	for (p = (void *)((unsigned long)addr & ~clflush_mask);
> >> -	     p < vend; p += x86_clflush_size)
> >> -		clwb(p);
> >> +	clean_cache_range(addr,size);
> >>  }
> >
> > So this will make compilation break on 32-bit x86 as it does not define
> > clean_cache_range(). Do we somewhere force we are on x86_64 when pmem is
> > enabled?
>
> Yes, this is enforced by:
>
>     select ARCH_HAS_PMEM_API if X86_64
>
> ...in arch/x86/Kconfig. We fallback to a dummy arch_wb_cache_pmem()
> implementation and emit this warning for !ARCH_HAS_PMEM_API archs:
>
>     "nd_pmem namespace0.0: unable to guarantee persistence of writes"

Aha, right. Feel free to add:

Reviewed-by: Jan Kara

								Honza
--
Jan Kara
SUSE Labs, CR
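For reference, the rounding done by the removed loop can be reproduced in user space. The sketch below only counts the cache lines that the loop would visit; cache_lines_in_range() is a made-up helper, and a real implementation would issue a write-back instruction such as CLWB per iteration instead of incrementing a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many cache lines of @line_size bytes the range
 * [addr, addr + size) touches.  The start address is rounded down to a
 * line boundary, as in the removed loop.
 */
size_t cache_lines_in_range(uintptr_t addr, size_t size, size_t line_size)
{
	/* The mask trick only works for a power-of-two line size. */
	assert(line_size && (line_size & (line_size - 1)) == 0);

	uintptr_t mask = line_size - 1;
	uintptr_t end = addr + size;
	size_t lines = 0;

	for (uintptr_t p = addr & ~mask; p < end; p += line_size)
		lines++;
	return lines;
}
```

With a 64-byte line, a 64-byte range starting 16 bytes into a line spans two lines, which is why the start has to be rounded down before stepping.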
Re: [PATCH v15 2/7] power: add power sequence library
On 15 June 2017 at 08:58, Peter Chen wrote:
> On Wed, Jun 14, 2017 at 10:53:29AM +0200, Ulf Hansson wrote:
>> On 14 June 2017 at 03:53, Peter Chen wrote:
>> > On Tue, Jun 13, 2017 at 12:24:42PM +0200, Ulf Hansson wrote:
>> >> [...]
>> >>
>> >> > +
>> >> > +/**
>> >> > + * of_pwrseq_on - Carry out power sequence on for device node
>> >> > + *
>> >> > + * @np: the device node would like to power on
>> >> > + *
>> >> > + * Carry out a single device power on. If multiple devices
>> >> > + * need to be handled, use of_pwrseq_on_list() instead.
>> >> > + *
>> >> > + * Return a pointer to the power sequence instance on success,
>> >> > + * or an error code otherwise.
>> >> > + */
>> >> > +struct pwrseq *of_pwrseq_on(struct device_node *np)
>> >> > +{
>> >> > +	struct pwrseq *pwrseq;
>> >> > +	int ret;
>> >> > +
>> >> > +	pwrseq = pwrseq_find_available_instance(np);
>> >> > +	if (!pwrseq)
>> >> > +		return ERR_PTR(-ENOENT);
>> >>
>> >> In case the pwrseq instance hasn't been registered yet, then there is
>> >> no way to deal with -EPROBE_DEFER properly here.
>> >>
>> >> I haven't been following the discussions in-depth during all
>> >> iterations, so perhaps you have already discussed why doing it like
>> >> this.
>> >
>> > Yes, it has been discussed. In order to compare with compatible string
>> > at dts, we need to have one registered pwrseq instance for each
>> > pwrseq library, this pre-registered one is allocated using
>> > postcore_initcall, and the new (eg, second) instance is registered
>> > after pwrseq_get has succeeded.
>>
>> I understand you need one compatible per pwrseq library, but how does
>> that have anything to do with -EPROBE_DEFER?
>>
>> My point is that, if a driver calls of_pwrseq_on() (which calls
>> pwrseq_find_available_instance()), but the corresponding pwrseq
>> library and instance has not yet been registered for that device. Then
>> how will you handle -EPROBE_DEFER? I guess you simply can't, which is
>> why *all* pwrseq libraries needs to be registered in early boot phase,
>> like at postcore_initcall(). Right?
>>
>> If that is the case, I really don't like it.
>>
>
> Yes, you are right. This is the limitation for this power sequence
> library, the registration for the 1st power sequence instance must
> be finished before device driver uses it. I am appreciated that
> you can supply some suggestions for it.

In general this kind of problem is solved by first parsing the DTB,
which tells you whether a resource (a pwrseq) is required for the
device. Then you try to fetch that resource, and if that fails, it
means the resource is not yet available, and hence you want to retry
later and should return -EPROBE_DEFER.

In this case, of_pwrseq_on() needs to be converted to start looking
for a pwrseq compatible in its child node - I guess. Then, if that is
found, you try to fetch the instance of the corresponding library.
Failing to fetch the library instance should then cause a return of
-EPROBE_DEFER.

>
>> Moreover, I have found yet another severe problem but reviewing the code:
>> In the struct pwrseq, you have a "bool used", which you are setting to
>> "true" once the pwrseq has been hooked up with the device, when a
>> driver calls of_pwrseq_on(). Setting that variable to true, will also
>> prevent another driver from using the same instance of the pwrseq for
>> its device. So, to cope with multiple users, you register a new
>> instance of the same pwrseq library that got hooked up, once the
>> ->get() callback is about to complete.
>>
>> The problem the occurs, when there is another driver calling
>> of_pwrseq_on() in between, meaning that the new instance has not yet
>> been registered. This will simply fail, won't it?
>
> Yes, you are right, thanks for pointing that, I will add mutex_lock for
> of_pwrseq_on.

Another option is to skip the two-step approach entirely. In other
words, make the library cope with multiple users via the same
registered library instance.

[...]

Kind regards
Uffe
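A small user-space model of the deferral pattern suggested above, for illustration only. Everything here is hypothetical (struct pwrseq_lib, pwrseq_lib_register(), pwrseq_lookup() and the compatible string are invented for the sketch), and EPROBE_DEFER is defined locally because it is a kernel-internal error code that user-space headers do not provide.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal value, assumed for this sketch */
#endif

/* Hypothetical power sequence library, identified by its compatible. */
struct pwrseq_lib {
	const char *compatible;
};

#define MAX_LIBS 8
static const struct pwrseq_lib *registered[MAX_LIBS];
static size_t nr_registered;

int pwrseq_lib_register(const struct pwrseq_lib *lib)
{
	if (nr_registered == MAX_LIBS)
		return -ENOMEM;
	registered[nr_registered++] = lib;
	return 0;
}

/*
 * @compatible is what the device tree asks for, or NULL if the node
 * describes no power sequence at all.
 *
 * Returns 0 and sets *@out on success, -ENOENT if nothing is required,
 * and -EPROBE_DEFER if a sequence is required but its library has not
 * been registered yet, so the caller should be probed again later.
 */
int pwrseq_lookup(const char *compatible, const struct pwrseq_lib **out)
{
	if (!compatible)
		return -ENOENT;

	for (size_t i = 0; i < nr_registered; i++) {
		if (strcmp(registered[i]->compatible, compatible) == 0) {
			*out = registered[i];
			return 0;
		}
	}
	return -EPROBE_DEFER;
}
```

The point of the split is that "not described in DT" and "described but not ready yet" become distinguishable, so libraries no longer have to be registered before every consumer probes.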
Re: [RFC PATCH 2/4] hugetlb: add support for preferred node to alloc_huge_page_nodemask
On Wed 14-06-17 17:12:31, Mike Kravetz wrote:
> On 06/14/2017 03:12 PM, Mike Kravetz wrote:
> > On 06/13/2017 02:00 AM, Michal Hocko wrote:
> >> From: Michal Hocko
> >>
> >> alloc_huge_page_nodemask tries to allocate from any numa node in the
> >> allowed node mask starting from lower numa nodes. This might lead to
> >> filling up those low NUMA nodes while others are not used. We can reduce
> >> this risk by introducing a concept of the preferred node similar to what
> >> we have in the regular page allocator. We will start allocating from the
> >> preferred nid and then iterate over all allowed nodes in the zonelist
> >> order until we try them all.
> >>
> >> This is mimicking the page allocator logic except it operates on
> >> per-node mempools. dequeue_huge_page_vma already does this so distill
> >> the zonelist logic into a more generic dequeue_huge_page_nodemask
> >> and use it in alloc_huge_page_nodemask.
> >>
> >> Signed-off-by: Michal Hocko
> >> ---
> >
> >
> > I built attempts/hugetlb-zonelists, threw it on a test machine, ran the
> > libhugetlbfs test suite and saw failures. The failures started with this
> > patch: commit 7e8b09f14495 in your tree. I have not yet started to look
> > into the failures. It is even possible that the tests are making bad
> > assumptions, but there certainly appears to be changes in behavior visible
> > to the application(s).
>
> nm. The failures were the result of dequeue_huge_page_nodemask() always
> returning NULL. Vlastimil already noticed this issue and provided a
> solution.

I have pushed my current version to the same branch.
--
Michal Hocko
SUSE Labs
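For readers who want the preferred-node idea from the quoted changelog in compact form, here is a simplified model. It is an assumption-laden sketch, not the hugetlb code: pick_node() is invented, node pools are plain counters, and nodes are visited in wrap-around order starting at the preferred one, whereas the kernel walks them in zonelist order.

```c
#include <stdint.h>

#define MAX_NODES 64

/*
 * Pick a node to take a free huge page from.  @allowed is a bitmask of
 * permitted nodes and @free_pages[] holds each node's pool size.  The
 * search starts at @preferred and wraps around.  Returns the chosen node
 * id, or -1 if no allowed node has a free page.
 */
int pick_node(int preferred, uint64_t allowed,
	      const unsigned int *free_pages, int nr_nodes)
{
	if (nr_nodes <= 0 || nr_nodes > MAX_NODES ||
	    preferred < 0 || preferred >= nr_nodes)
		return -1;

	for (int i = 0; i < nr_nodes; i++) {
		int nid = (preferred + i) % nr_nodes;

		if (!(allowed & (UINT64_C(1) << nid)))
			continue;
		if (free_pages[nid] > 0)
			return nid;
	}
	return -1;
}
```

Starting at the preferred node rather than at node 0 is what keeps the low-numbered nodes from being drained first.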
Re: [PATCH v2 2/2] phy: Add stingray SATA phy support
Hi Kishon, I have re-based this patch to "linux-phy -next". Please review this. Thank you. Regards, Srinath.
Re: [PATCH] mm, memory_hotplug: support movable_node for hotplugable nodes
On Thu 15-06-17 11:13:54, Wei Yang wrote: > On Mon, Jun 12, 2017 at 08:45:02AM +0200, Michal Hocko wrote: > >On Mon 12-06-17 12:28:32, Wei Yang wrote: > >> On Thu, Jun 08, 2017 at 02:23:18PM +0200, Michal Hocko wrote: > >> >From: Michal Hocko > >> > > >> >movable_node kernel parameter allows to make hotplugable NUMA > >> >nodes to put all the hotplugable memory into movable zone which > >> >allows more or less reliable memory hotremove. At least this > >> >is the case for the NUMA nodes present during the boot (see > >> >find_zone_movable_pfns_for_nodes). > >> > > >> > >> When movable_node is enabled, we would have overlapped zones, right? > > > >It won't based on this patch. See movable_pfn_range > > > > Ok, I went through the code and here maybe a question not that close related > to this patch. Please start a new thread with unrelated questions > I did some experiment with qemu+kvm and see this. > > Guest config: 8G RAM, 2 nodes with 4G on each > Guest kernel: 4.11 > Guest kernel command: kernelcore=1G > > The log message in kernel is: > > [0.00] Zone ranges: > [0.00] DMA [mem 0x1000-0x00ff] > [0.00] DMA32[mem 0x0100-0x] > [0.00] Normal [mem 0x0001-0x00023fff] > [0.00] Movable zone start for each node > [0.00] Node 0: 0x0001 > [0.00] Node 1: 0x00014000 > > We see on node 2, ZONE_NORMAL overlap with ZONE_MOVABLE. > [0x00014000 - 0x00023fff] belongs to both ZONE. Not really. The above output is just confusing a bit. Zone ranges print arch_zone_{lowest,highest}_possible_pfn range while the Movable zone is excluded from that in adjust_zone_range_for_zone_movable -- Michal Hocko SUSE Labs
Re: [patch 1/2] staging: speakup: add function to convert dev name to number
Hi, On Wed, Jun 14, 2017 at 9:23 AM, Dan Carpenter wrote: [...] > > Could you call it "dev_name" instead? I normally expect "dev" to be a > device struct. Thanks for the feedback. Will keep these in mind for next version of the patch. Okash
Re: [PATCH v2 1/2] libsas: Don't process sas events in static works
On 2017/6/15 16:00, John Garry wrote: > On 15/06/2017 08:37, wangyijing wrote: >> >> >> On 2017/6/14 21:08, John Garry wrote: >>> On 14/06/2017 10:04, wangyijing wrote: >> static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event >> event) { +struct sas_ha_event *ev; + BUG_ON(event >= HA_NUM_EVENTS); -sas_queue_event(event, &sas_ha->pending, -&sas_ha->ha_events[event].work, sas_ha); +ev = kzalloc(sizeof(*ev), GFP_ATOMIC); +if (!ev) +return; >> GFP_ATOMIC allocations can fail and then no events will be queued *and* >> we >> don't report the error back to the caller. >> Yes, it's really a problem, but I don't find a better solution, do you have some suggestion ? >>> >>> Dan raised an issue with this approach, regarding a malfunctioning PHY >>> which spews out events. I still don't think we're handling it safely. >>> Here's the suggestion: >>> - each asd_sas_phy owns a finite-sized pool of events >>> - when the event pool becomes exhausted, libsas stops queuing events >>> (obviously) and disables the PHY in the LLDD >>> - upon attempting to re-enable the PHY from sysfs, libsas first checks that >>> the pool is still not exhausted >>> >>> If you cannot find a good solution, then let us know and we can help. >> >> Hi John and Dan, what's event you found on malfunctioning PHY, if the event >> is PORTE_BROADCAST_RCVD, since >> every PORTE_BROADCAST_RCVD libsas always call sas_revalidate_domain(), what >> about keeping a broadcast waiting(not queued in workqueue) >> and discard others. If the event is other types, things may become knotty. >> > > As I mentioned in the v1 series discussion, I found a poorly connected > expander PHY was spewing out PHY up and loss of signal events continuously. > This is the sort of situation we should protect against. Current solution is > ok, as it uses a static event per port/PHY/HA. > > The point is that we cannot allow a PHY to continuously send events to > libsas, which may lead to memory exhaustion.
The current solution won't cause memory exhaustion, but it's not OK either, since the root of this issue is that it may lose events during normal operation. If we cannot identify the abnormal PHY, I think your mem pool idea is a candidate solution.
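The finite per-PHY event pool proposed in the quoted suggestion could be sketched along these lines. This is a userspace illustration under assumptions, not libsas code: phy_sketch, phy_event_sketch, the pool size of 4 and the "enabled" flag (standing in for asking the LLDD to disable the PHY) are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define PHY_EVENT_POOL_SIZE_SKETCH 4

/* Hypothetical event slot owned by one PHY. */
struct phy_event_sketch {
	int type;
	bool in_use;
};

/* Hypothetical PHY with a fixed-size event pool and an enable flag. */
struct phy_sketch {
	struct phy_event_sketch pool[PHY_EVENT_POOL_SIZE_SKETCH];
	bool enabled;
};

static void phy_init_sketch(struct phy_sketch *phy)
{
	size_t i;

	for (i = 0; i < PHY_EVENT_POOL_SIZE_SKETCH; i++)
		phy->pool[i].in_use = false;
	phy->enabled = true;
}

/*
 * Claim a slot for a new event. When the pool is exhausted, stop queuing
 * and mark the PHY disabled instead of allocating more memory.
 */
static struct phy_event_sketch *phy_event_get_sketch(struct phy_sketch *phy,
						     int type)
{
	size_t i;

	if (!phy->enabled)
		return NULL;

	for (i = 0; i < PHY_EVENT_POOL_SIZE_SKETCH; i++) {
		if (!phy->pool[i].in_use) {
			phy->pool[i].in_use = true;
			phy->pool[i].type = type;
			return &phy->pool[i];
		}
	}

	phy->enabled = false;	/* pool exhausted: disable the noisy PHY */
	return NULL;
}

/* Return a slot to the pool once its event has been processed. */
static void phy_event_put_sketch(struct phy_event_sketch *ev)
{
	ev->in_use = false;
}

/* Re-enable request (e.g. from sysfs): refuse while the pool is still full. */
static bool phy_try_enable_sketch(struct phy_sketch *phy)
{
	size_t i;

	for (i = 0; i < PHY_EVENT_POOL_SIZE_SKETCH; i++) {
		if (!phy->pool[i].in_use) {
			phy->enabled = true;
			return true;
		}
	}
	return false;
}
```

The point of the fixed pool is that memory use per PHY is bounded by construction, while a well-behaved PHY that stays under the pool size never loses an event.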
Re: [PATCH] mm, memory_hotplug: support movable_node for hotplugable nodes
On Thu 15-06-17 11:29:27, Wei Yang wrote: [...] > >+static inline bool movable_pfn_range(int nid, struct zone *default_zone, > >+unsigned long start_pfn, unsigned long nr_pages) > >+{ > >+if (!allow_online_pfn_range(nid, start_pfn, nr_pages, > >+MMOP_ONLINE_KERNEL)) > >+return true; > >+ > >+if (!movable_node_is_enabled()) > >+return false; > >+ > >+return !zone_intersects(default_zone, start_pfn, nr_pages); > >+} > >+ > > To be honest, I don't understand this clearly. > > move_pfn_range() will choose and move the range to a zone based on the > online_type, where we have two cases: > 1. ONLINE_MOVABLE -> ZONE_MOVABLE will be chosen > 2. ONLINE_KEEP-> ZONE_NORMAL is the default while ZONE_MOVABLE will be > chosen in case movable_pfn_range() returns true. > > There are three conditions in movable_pfn_range(): > 1. Not allowed in kernel_zone, returns true > 2. Movable_node not enabled, return false > 3. Range [start_pfn, start_pfn + nr_pages) doesn't intersect with > default_zone, return true > > The first one is inherited from original code, so lets look at the other two. > > Number 3 is easy to understand, if the hot-added range is already part of > ZONE_NORMAL, use it. > > Number 2 makes me confused. If movable_node is not enabled, ZONE_NORMAL will > be chosen. If movable_node is enabled, it still depends on other two > condition. So how a memory_block is onlined to ZONE_MOVABLE because > movable_node is enabled? This is simple. If the movable_node is set then ONLINE_KEEP defaults to the movable zone unless the range is already covered by a kernel zone (read Normal zone most of the time). > What I see is you would forbid a memory_block to be > onlined to ZONE_MOVABLE when movable_node is not enabled. Please note that this is ONLINE_KEEP not ONLINE_MOVABLE and as such the movable zone is used only if we are withing the movable zone range already (test 1). 
> Instead of you would > online a memory_block to ZONE_MOVABLE when movable_node is enabled, which is > implied in your change log. > > BTW, would you mind giving me these two information? > 1. Which branch your code is based on? I have cloned your > git(//git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git), while still see > some difference. yes this is based on the mmotm tree (use since-4.11 or auto-latest branch) > 2. Any example or test case I could try your patch and see the difference? It > would be better if it could run in qemu+kvm. See http://lkml.kernel.org/r/20170421120512.23960-1-mho...@kernel.org -- Michal Hocko SUSE Labs
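The three tests in the quoted movable_pfn_range() and their effect on the default online policy can be written out as a small truth function. This is only a sketch: the three booleans in range_facts_sketch stand in for allow_online_pfn_range(..., MMOP_ONLINE_KERNEL), movable_node_is_enabled() and zone_intersects(default_zone, ...), and the enum names are hypothetical.

```c
#include <stdbool.h>

enum zone_choice_sketch {
	ZONE_KERNEL_SKETCH,	/* the default (usually Normal) zone */
	ZONE_MOVABLE_SKETCH
};

/* Hypothetical summary of the three checks made on a hot-added range. */
struct range_facts_sketch {
	bool allowed_in_kernel_zone;	/* test 1 */
	bool movable_node_enabled;	/* test 2 */
	bool intersects_default_zone;	/* test 3 */
};

/* Mirrors the order of the checks in the quoted helper. */
static bool movable_pfn_range_sketch(const struct range_facts_sketch *f)
{
	if (!f->allowed_in_kernel_zone)
		return true;		/* already inside the movable range */
	if (!f->movable_node_enabled)
		return false;		/* no movable_node: keep kernel zone */
	return !f->intersects_default_zone;
}

/* Zone picked for the default "keep" online type. */
static enum zone_choice_sketch online_keep_zone_sketch(const struct range_facts_sketch *f)
{
	return movable_pfn_range_sketch(f) ? ZONE_MOVABLE_SKETCH
					   : ZONE_KERNEL_SKETCH;
}
```

Read this way, movable_node only changes the answer for ranges that could go to a kernel zone but are not yet covered by one; everything else behaves as before.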
Re: [PATCH 2/6] drivers base/arch_topology: frequency-invariant load-tracking support
Hi, On 14/06/17 15:08, Vincent Guittot wrote: > On 14 June 2017 at 09:55, Dietmar Eggemann wrote: > > > > On 06/12/2017 04:27 PM, Vincent Guittot wrote: > > > On 8 June 2017 at 09:55, Dietmar Eggemann > > > wrote: > > > > Hi Vincent, > > > > Thanks for the review! > > > > [...] > > > > >> @@ -225,8 +265,14 @@ static int __init register_cpufreq_notifier(void) > > >> > > >> cpumask_copy(cpus_to_visit, cpu_possible_mask); > > >> > > >> - return cpufreq_register_notifier(&init_cpu_capacity_notifier, > > >> -CPUFREQ_POLICY_NOTIFIER); > > >> + ret = cpufreq_register_notifier(&init_cpu_capacity_notifier, > > >> + CPUFREQ_POLICY_NOTIFIER); > > >> + > > >> + if (ret) > > > > > > Don't you have to free memory allocated for cpus_to_visit in case of > > > errot ? it was not done before your patch as well > > > > Yes, we should free cpus_to_visit if the policy notifier registration > > fails. But IMHO also, once the parsing of the capacity-dmips-mhz property > > is done. free cpus_to_visit is only used in the notifier call > > init_cpu_capacity_callback() after being allocated and initialized in > > register_cpufreq_notifier(). > > > > We could add something like this as the first patch of this set. Only > > mildly tested on Juno. Juri, what do you think? > > > > Author: Dietmar Eggemann > > Date: Tue Jun 13 23:21:59 2017 +0100 > > > > drivers base/arch_topology: free cpumask cpus_to_visit > > > > Free cpumask cpus_to_visit in case registering > > init_cpu_capacity_notifier has failed or the parsing of the cpu > > capacity-dmips-mhz property is done. The cpumask cpus_to_visit is > > only used inside the notifier call init_cpu_capacity_callback. > > > > Reported-by: Vincent Guittot > > Signed-off-by: Dietmar Eggemann > > your proposal for freeing cpus_to_visit looks good for me > > Acked-by: Vincent Guittot > Yep, looks good to me too. Thanks for fixing! Best, - Juri
Re: [PATCH] pci: iov: use device lock to protect IOV sysfs accesses
On Wed, Jun 14, 2017 at 09:47:26PM -0500, Bjorn Helgaas wrote: > > Signed-off-by: Jakub Kicinski > > Applied with Christoph's reviewed-by to pci/virtualization for v4.13, > thanks! Btw, given how you wanted the comments on locking for the reset methods it might be worth to comment the locking here as well.
Re: [PATCH 1/1] Support PTP Stick and Touchpad device
On Jun 14 2017 or thereabouts, Masaki Ota wrote: > From Masaki Ota > Support PTP Stick and Touchpad device. > This Touchpad is Precision Touchpad(PTP), > and Stick Pointer data is the same as Mouse. > So Stick Pointer works as Mouse. > > Signed-off-by: Masaki Ota > --- > drivers/hid/hid-ids.h| 2 ++ > drivers/hid/hid-multitouch.c | 24 ++-- > 2 files changed, 24 insertions(+), 2 deletions(-) > > diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h > index 8ca1e8ce0af2..d36d4ac508f6 100644 > --- a/drivers/hid/hid-ids.h > +++ b/drivers/hid/hid-ids.h > @@ -75,6 +75,8 @@ > > #define USB_VENDOR_ID_ALPS_JP0x044E > #define HID_DEVICE_ID_ALPS_U1_DUAL 0x120B > +#define HID_DEVICE_ID_ALPS_U1_DUAL_PTP 0x121F > +#define HID_DEVICE_ID_ALPS_U1_DUAL_3BTN_PTP 0x1220 > > #define USB_VENDOR_ID_AMI0x046b > #define USB_DEVICE_ID_AMI_VIRT_KEYBOARD_AND_MOUSE0xff10 > diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c > index 24d5b6deb571..4ffdda9d80da 100644 > --- a/drivers/hid/hid-multitouch.c > +++ b/drivers/hid/hid-multitouch.c > @@ -161,6 +161,7 @@ static void mt_post_parse(struct mt_device *td); > #define MT_CLS_GENERALTOUCH_PWT_TENFINGERS 0x0109 > #define MT_CLS_LG0x010a > #define MT_CLS_VTL 0x0110 > +#define MT_CLS_WIN_8_DUAL0x0111 This is not vendor specific, so we should use 0x0014 and move this above in the list. > > #define MT_DEFAULT_MAXCONTACT10 > #define MT_MAX_MAXCONTACT250 > @@ -278,6 +279,13 @@ static struct mt_class mt_classes[] = { > MT_QUIRK_CONTACT_CNT_ACCURATE | > MT_QUIRK_FORCE_GET_FEATURE, > }, > + { .name = MT_CLS_WIN_8_DUAL, > + .quirks = MT_QUIRK_ALWAYS_VALID | > + MT_QUIRK_IGNORE_DUPLICATES | > + MT_QUIRK_HOVERING | > + MT_QUIRK_CONTACT_CNT_ACCURATE, > + .export_all_inputs = true > + }, Same than above, please move this just after MT_CLS_EXPORT_ALL_INPUTS. 
> { } > }; > > @@ -512,7 +520,8 @@ static int mt_touch_input_mapping(struct hid_device > *hdev, struct hid_input *hi, > mt_store_field(usage, td, hi); > return 1; > case HID_DG_CONFIDENCE: > - if (cls->name == MT_CLS_WIN_8 && > + if ((cls->name == MT_CLS_WIN_8 || > + cls->name == MT_CLS_WIN_8_DUAL) && > field->application == HID_DG_TOUCHPAD) > cls->quirks |= MT_QUIRK_CONFIDENCE; > mt_store_field(usage, td, hi); > @@ -579,7 +588,8 @@ static int mt_touch_input_mapping(struct hid_device > *hdev, struct hid_input *hi, >* MS PTP spec says that external buttons left and right have >* usages 2 and 3. >*/ > - if (cls->name == MT_CLS_WIN_8 && > + if ((cls->name == MT_CLS_WIN_8 || > + cls->name == MT_CLS_WIN_8_DUAL) && > field->application == HID_DG_TOUCHPAD && > (usage->hid & HID_USAGE) > 1) > code--; > @@ -1290,6 +1300,16 @@ static const struct hid_device_id mt_devices[] = { > MT_USB_DEVICE(USB_VENDOR_ID_3M, > USB_DEVICE_ID_3M3266) }, > > + /* Alps devices */ > + { .driver_data = MT_CLS_WIN_8_DUAL, > + HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8, > + USB_VENDOR_ID_ALPS_JP, > + HID_DEVICE_ID_ALPS_U1_DUAL_PTP) }, > + { .driver_data = MT_CLS_WIN_8_DUAL, > + HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8, > + USB_VENDOR_ID_ALPS_JP, > + HID_DEVICE_ID_ALPS_U1_DUAL_3BTN_PTP) }, > + > /* Anton devices */ > { .driver_data = MT_CLS_EXPORT_ALL_INPUTS, > MT_USB_DEVICE(USB_VENDOR_ID_ANTON, > -- > 2.11.0 > Rest looks good to me. Cheers, Benjamin
Re: Sleeping BUG in khugepaged for i586
On Wed 14-06-17 18:12:06, David Rientjes wrote: > On Thu, 8 Jun 2017, Michal Hocko wrote: > > > collapse_huge_page > > pte_offset_map > > kmap_atomic > > kmap_atomic_prot > > preempt_disable > > __collapse_huge_page_copy > > pte_unmap > > kunmap_atomic > > __kunmap_atomic > > preempt_enable > > > > I suspect, so cond_resched seems indeed inappropriate on 32b systems. > > > > Seems to be an issue for i386 and arm with ARM_LPAE. I'm slightly > surprised we can get away with __collapse_huge_page_swapin() for > VM_FAULT_RETRY, unless that hasn't been encountered yet. I do not see what you mean here or how is it related. __collapse_huge_page_swapin is called outside of pte_offset_map/pte_unmap section > I think the cond_resched() in __collapse_huge_page_copy() could be > done only for !in_atomic() if we choose. in_atomic() depends on having PREEMPT_COUNT enabled to work properly AFAIR. I haven't double checked and something might have changed since I've looked the last time. -- Michal Hocko SUSE Labs
Re: udf: allow implicit blocksize specification during mount
On Wed 14-06-17 21:36:45, Pali Rohár wrote: > On Tuesday 13 June 2017 14:59:55 Jan Kara wrote: > > Hi, > > > > On Mon 12-06-17 22:40:14, Pali Rohár wrote: > > > Hi! I found that following UDF patch was included into linus tree: > > > https://patchwork.kernel.org/patch/9524557/ > > > > > > It is really a good improvement to recognize UDF file system which > > > have block size different from disk sector size and also different > > > from 2048. > > > > > > But should not detection on 4K native disks (4096/4096) try to also > > > use block size of 512 bytes? Because current loop is from logical > > > sector size to 4096. > > > > By definition, bdev_logical_block_size() is the smallest block size a > > device can support. So if it is larger than 512, the device driver > > had explicitely declared that it cannot handle smaller blocks... > > Ok, but it is a really problem when trying to read data from filesystem > which has smaller blocks as the smallest block size of a device? > > In the worst case filesystem driver needs to read 512 bytes, but device > can send only block of 4096 bytes (as it does not support smaller > block). Driver receives 4096 bytes, then it process just first 512 bytes > and do not care about remaining data... Well, as much as I agree this is possible in principle, the block layer, block device page cache etc. don't handle this so it would be a non-trivial effort to support this. Honza -- Jan Kara SUSE Labs, CR
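The probing loop under discussion, starting at the device's logical block size and going up to 4096, can be illustrated as below. This is a sketch under assumptions rather than the UDF driver: the doubling step, the probe callback and the names probe_blocksize_sketch and matches_size_sketch are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKSIZE_SKETCH 4096u

/* Hypothetical callback: "does the filesystem mount with this block size?" */
typedef bool (*probe_fn_sketch)(unsigned int blocksize, void *ctx);

/*
 * Try each candidate block size from the logical block size upwards.
 * Sizes below the logical block size are never tried, which is the
 * limitation raised in the question above.
 * Returns the first size that probes successfully, or 0 if none does.
 */
static unsigned int probe_blocksize_sketch(unsigned int logical,
					   probe_fn_sketch probe, void *ctx)
{
	unsigned int bs;

	for (bs = logical; bs != 0 && bs <= MAX_BLOCKSIZE_SKETCH; bs <<= 1)
		if (probe(bs, ctx))
			return bs;
	return 0;
}

/* Example probe: succeeds only for the size the image was formatted with. */
static bool matches_size_sketch(unsigned int blocksize, void *ctx)
{
	return blocksize == *(const unsigned int *)ctx;
}
```

For a filesystem formatted with 512-byte blocks on a 4096/4096 device the loop starts at 4096 and returns 0, which is exactly the case being asked about.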
[PATCH] kbuild: fix header installation under fakechroot environment
Since commit fcc8487d477a ("uapi: export all headers under uapi
directories") fakechroot make bindeb-pkg fails, mismatching files for
directories:

touch: cannot touch 'usr/include/video/uvesafb.h/.install': Not a directory

This is due to a bug in fakechroot: when using the function
$(wildcard $(srcdir)/*/.) in a makefile, under a fakechroot environment,
not only directories but also files are returned.

To circumvent that, we are using the functions:
$(sort $(dir $(wildcard $(srcdir)/*/)))

And thanks to Yamada Masahiro who figured out the right
filter-out/patsubst order!

Fixes: fcc8487d477a ("uapi: export all headers under uapi directories")
Signed-off-by: Richard Genoud
---
 scripts/Makefile.headersinst | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/scripts/Makefile.headersinst b/scripts/Makefile.headersinst
index ce753a408c56..c583a1e1bd3c 100644
--- a/scripts/Makefile.headersinst
+++ b/scripts/Makefile.headersinst
@@ -14,7 +14,15 @@ __headers:
 include scripts/Kbuild.include
 
 srcdir        := $(srctree)/$(obj)
-subdirs       := $(patsubst $(srcdir)/%/.,%,$(wildcard $(srcdir)/*/.))
+
+# When make is run under a fakechroot environment, the function
+# $(wildcard $(srcdir)/*/.) doesn't only return directories, but also regular
+# files. So, we are using a combination of sort/dir/wildcard which works
+# with fakechroot.
+subdirs       := $(patsubst $(srcdir)/%/,%,\
+		 $(filter-out $(srcdir)/,\
+		 $(sort $(dir $(wildcard $(srcdir)/*/)))))
+
 # caller may set destination dir (when installing to asm/)
 _dst          := $(if $(dst),$(dst),$(obj))
Re: [PATCH v4 2/2] amd: uncore: Get correct number of cores sharing last level cache
On Wed, Jun 14, 2017 at 11:26:58AM -0500, Janakarajan Natarajan wrote: > In Family 17h, the number of cores sharing a cache level is obtained > from the Cache Properties CPUID leaf (0x8000001d) by passing in the > cache level in ECX. In prior families, a cache level of 2 was used to > determine this information. > > To get the right information, irrespective of Family, iterate over > the cache levels using CPUID 0x8000001d. The last level cache is the > last value to return a non-zero value in EAX. > > Signed-off-by: Janakarajan Natarajan > --- > arch/x86/events/amd/uncore.c | 19 --- > 1 file changed, 16 insertions(+), 3 deletions(-) Reviewed-by: Borislav Petkov -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
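The iteration described in the quoted changelog can be sketched as follows. This is not the uncore patch: it assumes the leaf is the AMD Cache Properties leaf 0x8000001d, that the sharing count is encoded in EAX bits 25:14 as "count minus one", and it takes CPUID through a function pointer (cpuid_count_fn_sketch, hypothetical) so that the fake_cpuid_count_sketch() table can stand in for real hardware.

```c
#include <stdint.h>

#define CACHE_PROPERTIES_LEAF_SKETCH 0x8000001dU
#define MAX_CACHE_LEVELS_SKETCH 16	/* arbitrary bound for the loop */

struct cpuid_regs_sketch {
	uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical indirection so the sketch does not need real CPUID. */
typedef struct cpuid_regs_sketch (*cpuid_count_fn_sketch)(uint32_t leaf,
							  uint32_t subleaf);

/*
 * Walk the cache levels until EAX reads back as zero and remember the
 * sharing count of the last valid level, i.e. the last level cache.
 * Returns 0 if the leaf reports no cache at all.
 */
static unsigned int llc_sharing_threads_sketch(cpuid_count_fn_sketch cpuid_count)
{
	unsigned int sharing = 0;
	uint32_t level;

	for (level = 0; level < MAX_CACHE_LEVELS_SKETCH; level++) {
		struct cpuid_regs_sketch r =
			cpuid_count(CACHE_PROPERTIES_LEAF_SKETCH, level);

		if (r.eax == 0)
			break;		/* no more cache levels */

		/* assumed layout: EAX[25:14] = threads sharing cache - 1 */
		sharing = ((r.eax >> 14) & 0xfff) + 1;
	}
	return sharing;
}

/* Fake CPUID: L1D, L1I and L2 shared by 2 threads, L3 shared by 8. */
static struct cpuid_regs_sketch fake_cpuid_count_sketch(uint32_t leaf,
							uint32_t subleaf)
{
	static const uint32_t eax_by_subleaf[] = {
		1u | (1u << 5) | (1u << 14),	/* L1 data */
		2u | (1u << 5) | (1u << 14),	/* L1 instruction */
		3u | (2u << 5) | (1u << 14),	/* L2 unified */
		3u | (3u << 5) | (7u << 14),	/* L3 unified */
	};
	struct cpuid_regs_sketch r = { 0, 0, 0, 0 };

	if (leaf == CACHE_PROPERTIES_LEAF_SKETCH && subleaf < 4)
		r.eax = eax_by_subleaf[subleaf];
	return r;
}
```

Because the loop keeps overwriting the result until the first empty level, it does not need to know in advance whether the last level cache is L2 or L3 on a given family.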
Re: [PATCH v2] [media] mtk-mdp: Fix g_/s_selection capture/compose logic
On 06/15/17 08:29, Minghsiu Tsai wrote: > Hi, Hans, > > Would you have time to review this patch v2? > The patch v1 violates v4l2 spec. I have fixed it in v2. I plan to review it Friday or Monday. Regards, Hans > > > Sincerely, > Ming Hsiu > > On Fri, 2017-05-12 at 10:42 +0800, Minghsiu Tsai wrote: >> From: Daniel Kurtz >> >> Experiments show that the: >> (1) mtk-mdp uses the _MPLANE form of CAPTURE/OUTPUT >> (2) CAPTURE types use CROP targets, and OUTPUT types use COMPOSE targets >> >> Signed-off-by: Daniel Kurtz >> Signed-off-by: Minghsiu Tsai >> Signed-off-by: Houlong Wei >> >> --- >> Changes in v2: >> . Can not use *_MPLANE type in g_/s_selection >> --- >> drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c | 10 +- >> 1 file changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c >> b/drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c >> index 13afe48..e18ac626 100644 >> --- a/drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c >> +++ b/drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c >> @@ -838,10 +838,10 @@ static int mtk_mdp_m2m_g_selection(struct file *file, >> void *fh, >> bool valid = false; >> >> if (s->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) { >> -if (mtk_mdp_is_target_compose(s->target)) >> +if (mtk_mdp_is_target_crop(s->target)) >> valid = true; >> } else if (s->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) { >> -if (mtk_mdp_is_target_crop(s->target)) >> +if (mtk_mdp_is_target_compose(s->target)) >> valid = true; >> } >> if (!valid) { >> @@ -908,10 +908,10 @@ static int mtk_mdp_m2m_s_selection(struct file *file, >> void *fh, >> bool valid = false; >> >> if (s->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) { >> -if (s->target == V4L2_SEL_TGT_COMPOSE) >> +if (s->target == V4L2_SEL_TGT_CROP) >> valid = true; >> } else if (s->type == V4L2_BUF_TYPE_VIDEO_OUTPUT) { >> -if (s->target == V4L2_SEL_TGT_CROP) >> +if (s->target == V4L2_SEL_TGT_COMPOSE) >> valid = true; >> } >> if (!valid) { >> @@ -925,7 +925,7 @@ static int mtk_mdp_m2m_s_selection(struct file 
*file, >> void *fh, >> if (ret) >> return ret; >> >> -if (mtk_mdp_is_target_crop(s->target)) >> +if (mtk_mdp_is_target_compose(s->target)) >> frame = &ctx->s_frame; >> else >> frame = &ctx->d_frame; > >
[PATCH v3 02/11] ARC: send ipi to all cpus sharing task mm in case of page fault
From: Noam Camus This patch is derived due to performance issue. The use case is a page fault that resides on more than the local cpu. Trying to broadcast all CPUs results on performance degradation. So we try to avoid this by sending only to the relevant CPUs. Signed-off-by: Noam Camus Reviewed-by: Alexey Brodkin --- arch/arc/include/asm/cacheflush.h |3 ++- arch/arc/mm/cache.c | 12 ++-- arch/arc/mm/tlb.c |2 +- 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h index fc662f4..716dba1 100644 --- a/arch/arc/include/asm/cacheflush.h +++ b/arch/arc/include/asm/cacheflush.h @@ -33,7 +33,8 @@ void flush_icache_range(unsigned long kstart, unsigned long kend); void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len); -void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr); +void __inv_icache_page(struct vm_area_struct *vma, + phys_addr_t paddr, unsigned long vaddr); void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr); #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1 diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c index bdb5227..bfad0fa 100644 --- a/arch/arc/mm/cache.c +++ b/arch/arc/mm/cache.c @@ -934,9 +934,17 @@ void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len) } /* wrapper to compile time eliminate alignment checks in flush loop */ -void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr) +void __inv_icache_page(struct vm_area_struct *vma, + phys_addr_t paddr, unsigned long vaddr) { - __ic_line_inv_vaddr(paddr, vaddr, PAGE_SIZE); + struct ic_inv_args ic_inv = { + .paddr = paddr, + .vaddr = vaddr, + .sz = PAGE_SIZE + }; + + on_each_cpu_mask(mm_cpumask(vma->vm_mm), +__ic_line_inv_vaddr_helper, &ic_inv, 1); } /* diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c index 2b6da60..e298da9 100644 --- a/arch/arc/mm/tlb.c +++ b/arch/arc/mm/tlb.c @@ -626,7 +626,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long 
vaddr_unaligned, /* invalidate any existing icache lines (U-mapping) */ if (vma->vm_flags & VM_EXEC) - __inv_icache_page(paddr, vaddr); + __inv_icache_page(vma, paddr, vaddr); } } } -- 1.7.1
[PATCH v3 04/11] ARC: Add CPU topology
From: Noam Camus Now it is used for NPS SoC for multi-core of 256 cores and SMT of 16 HW threads per core. This way with topology the scheduler is much efficient in creating domains and later using them. Signed-off-by: Noam Camus --- arch/arc/Kconfig| 27 arch/arc/include/asm/Kbuild |1 - arch/arc/include/asm/topology.h | 34 +++ arch/arc/kernel/Makefile|1 + arch/arc/kernel/setup.c |4 +- arch/arc/kernel/smp.c |5 ++ arch/arc/kernel/topology.c | 125 +++ 7 files changed, 194 insertions(+), 3 deletions(-) create mode 100644 arch/arc/include/asm/topology.h create mode 100644 arch/arc/kernel/topology.c diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index f464f97..08a9003 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -202,6 +202,33 @@ config ARC_SMP_HALT_ON_RESET at designated entry point. For other case, all jump to common entry point and spin wait for Master's signal. +config NPS_CPU_TOPOLOGY + bool "Support cpu topology definition" + depends on EZNPS_MTM_EXT + default y + help + Support NPS cpu topology definition. + NPS400 got 16 clusters of cores. + NPS400 cluster got 16 cores. + NPS core got 16 symetrical threads. + Totally there are such 4096 threads (NR_CPUS=4096) + +config SCHED_MC + bool "Multi-core scheduler support" + depends on NPS_CPU_TOPOLOGY + help + Multi-core scheduler support improves the CPU scheduler's decision + making when dealing with multi-core CPU chips at a cost of slightly + increased overhead in some places. If unsure say N here. + +config SCHED_SMT + bool "SMT scheduler support" + depends on NPS_CPU_TOPOLOGY + help + Improves the CPU scheduler's decision making when dealing with + MultiThreading at a cost of slightly increased overhead in some + places. If unsure say N here. 
+ endif #SMP config ARC_MCIP diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index 7bee4e4..d8cb607 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -43,7 +43,6 @@ generic-y += stat.h generic-y += statfs.h generic-y += termbits.h generic-y += termios.h -generic-y += topology.h generic-y += trace_clock.h generic-y += types.h generic-y += ucontext.h diff --git a/arch/arc/include/asm/topology.h b/arch/arc/include/asm/topology.h new file mode 100644 index 000..a9be3f8 --- /dev/null +++ b/arch/arc/include/asm/topology.h @@ -0,0 +1,34 @@ +#ifndef _ASM_ARC_TOPOLOGY_H +#define _ASM_ARC_TOPOLOGY_H + +#ifdef CONFIG_NPS_CPU_TOPOLOGY + +#include + +struct cputopo_nps { + int thread_id; + int core_id; + cpumask_t thread_sibling; + cpumask_t core_sibling; +}; + +extern struct cputopo_nps cpu_topology[NR_CPUS]; + +#define topology_core_id(cpu) (cpu_topology[cpu].core_id) +#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling) +#define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling) + +void init_cpu_topology(void); +void store_cpu_topology(unsigned int cpuid); +const struct cpumask *cpu_coregroup_mask(int cpu); + +#else + +static inline void init_cpu_topology(void) { } +static inline void store_cpu_topology(unsigned int cpuid) { } + +#endif + +#include + +#endif /* _ASM_ARC_TOPOLOGY_H */ diff --git a/arch/arc/kernel/Makefile b/arch/arc/kernel/Makefile index 8942c5c..46af80a 100644 --- a/arch/arc/kernel/Makefile +++ b/arch/arc/kernel/Makefile @@ -23,6 +23,7 @@ obj-$(CONFIG_ARC_EMUL_UNALIGNED) += unaligned.o obj-$(CONFIG_KGDB) += kgdb.o obj-$(CONFIG_ARC_METAWARE_HLINK) += arc_hostlink.o obj-$(CONFIG_PERF_EVENTS) += perf_event.o +obj-$(CONFIG_NPS_CPU_TOPOLOGY) += topology.o obj-$(CONFIG_ARC_FPU_SAVE_RESTORE) += fpu.o CFLAGS_fpu.o += -mdpfp diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c index de29ea9..379ebda 100644 --- a/arch/arc/kernel/setup.c +++ b/arch/arc/kernel/setup.c @@ -571,14 
+571,14 @@ static void c_stop(struct seq_file *m, void *v) .show = show_cpuinfo }; -static DEFINE_PER_CPU(struct cpu, cpu_topology); +static DEFINE_PER_CPU(struct cpu, cpu_topo_info); static int __init topology_init(void) { int cpu; for_each_present_cpu(cpu) - register_cpu(&per_cpu(cpu_topology, cpu), cpu); + register_cpu(&per_cpu(cpu_topo_info, cpu), cpu); return 0; } diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c index f462671..91668c5 100644 --- a/arch/arc/kernel/smp.c +++ b/arch/arc/kernel/smp.c @@ -67,6 +67,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus) { int i; + init_cpu_topology(); + store_cpu_topology(smp_processor_id()); + /* * if platform didn't set the present map already, do it n
[PATCH v3 06/11] ARC: [NUMA] added CONFIG_NUMA for plat-eznps
From: Noam Camus This is needed for NPS400 where high memory is assigned to node1 where the associated addresses are lower than node0. This use case is not typical and just using discontigmem is not enough since nodes assumed to have increasing address range. i.e. address range of node0 assumed to be lower than node1. Signed-off-by: Noam Camus --- arch/arc/Kconfig|9 + arch/arc/include/asm/topology.h |6 ++ arch/arc/kernel/setup.c |3 +++ arch/arc/mm/init.c |6 ++ 4 files changed, 24 insertions(+), 0 deletions(-) diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index 982bd18..18c37de 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -378,6 +378,15 @@ config ARC_HUGEPAGE_16M endchoice +config NUMA + bool "NUMA Memory Allocation and Scheduler Support" + depends on SMP && DISCONTIGMEM + default y if ARC_PLAT_EZNPS + ---help--- + NUMA memory allocation is required for NPS400 processors. + The reason is that node1 in NPS400 is assigned to lower + addresses than node0, which is not typical scenario. 
+ config NODES_SHIFT int "Maximum NUMA Nodes (as a power of 2)" default "0" if !DISCONTIGMEM diff --git a/arch/arc/include/asm/topology.h b/arch/arc/include/asm/topology.h index a9be3f8..dfbc2ab 100644 --- a/arch/arc/include/asm/topology.h +++ b/arch/arc/include/asm/topology.h @@ -1,6 +1,12 @@ #ifndef _ASM_ARC_TOPOLOGY_H #define _ASM_ARC_TOPOLOGY_H +#ifdef CONFIG_NUMA +#define cpu_to_node(cpu) ((void)(cpu), 0) +#define parent_node(node) (node) +#define cpumask_of_node(node) ((void)node, cpu_online_mask) +#endif + #ifdef CONFIG_NPS_CPU_TOPOLOGY #include diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c index 379ebda..3d1509b 100644 --- a/arch/arc/kernel/setup.c +++ b/arch/arc/kernel/setup.c @@ -577,6 +577,9 @@ static int __init topology_init(void) { int cpu; + for_each_online_node(cpu) + register_one_node(cpu); + for_each_present_cpu(cpu) register_cpu(&per_cpu(cpu_topo_info, cpu), cpu); diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c index 8c9415e..f9f80d9 100644 --- a/arch/arc/mm/init.c +++ b/arch/arc/mm/init.c @@ -113,6 +113,10 @@ void __init setup_arch_memory(void) init_mm.end_data = (unsigned long)_edata; init_mm.brk = (unsigned long)_end; + node_set_online(0); + node_set_state(0, N_MEMORY); + node_set_state(0, N_NORMAL_MEMORY); + /* first page of system - kernel .vector starts here */ min_low_pfn = ARCH_PFN_OFFSET; @@ -182,6 +186,8 @@ void __init setup_arch_memory(void) * populated with normal memory zone while node 1 only has highmem */ node_set_online(1); + node_set_state(1, N_MEMORY); + node_set_state(1, N_HIGH_MEMORY); min_high_pfn = PFN_DOWN(high_mem_start); max_high_pfn = PFN_DOWN(high_mem_start + high_mem_sz); -- 1.7.1
[PATCH v3 11/11] ARC: [plat-eznps] avoid toggling of DPC register
From: Elad Kanfi HW bug description: in case of HW thread context switch the dpc configuration of the exiting thread is dragged one cycle into the next thread. In order to avoid the consequences of this bug, the DPC register is set to an initial value, and not changed afterwards. Signed-off-by: Elad Kanfi Signed-off-by: Noam Camus --- arch/arc/plat-eznps/include/plat/ctop.h |1 + arch/arc/plat-eznps/mtm.c | 12 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/arc/plat-eznps/include/plat/ctop.h b/arch/arc/plat-eznps/include/plat/ctop.h index 7729d3d..0c7d110 100644 --- a/arch/arc/plat-eznps/include/plat/ctop.h +++ b/arch/arc/plat-eznps/include/plat/ctop.h @@ -39,6 +39,7 @@ #define CTOP_AUX_LOGIC_CORE_ID (CTOP_AUX_BASE + 0x018) #define CTOP_AUX_MT_CTRL (CTOP_AUX_BASE + 0x020) #define CTOP_AUX_HW_COMPLY (CTOP_AUX_BASE + 0x024) +#define CTOP_AUX_DPC (CTOP_AUX_BASE + 0x02C) #define CTOP_AUX_LPC (CTOP_AUX_BASE + 0x030) #define CTOP_AUX_EFLAGS(CTOP_AUX_BASE + 0x080) #define CTOP_AUX_IACK (CTOP_AUX_BASE + 0x088) diff --git a/arch/arc/plat-eznps/mtm.c b/arch/arc/plat-eznps/mtm.c index 9c78ad6..909bbd4 100644 --- a/arch/arc/plat-eznps/mtm.c +++ b/arch/arc/plat-eznps/mtm.c @@ -110,6 +110,18 @@ void mtm_enable_core(unsigned int cpu) int i; struct nps_host_reg_aux_mt_ctrl mt_ctrl; struct nps_host_reg_mtm_cfg mtm_cfg; + struct nps_host_reg_aux_dpc dpc; + + /* +* Initializing dpc register in each CPU. +* Overwriting the init value of the DPC +* register so that CMEM and FMT virtual address +* spaces are accessible, and Data Plane HW +* facilities are enabled. +*/ + dpc.ien = 1; + dpc.men = 1; + write_aux_reg(CTOP_AUX_DPC, dpc.value); if (NPS_CPU_TO_THREAD_NUM(cpu) != 0) return; -- 1.7.1
[PATCH v3 05/11] ARC: Support more than one PGDIR for KVADDR
From: Noam Camus

This way FIXMAP can have 2 PTEs per CPU even for NR_CPUS=4096.
For an extreme case such as the eznps platform, we use the whole
gutter between kernel and user.

Signed-off-by: Noam Camus
---
 arch/arc/Kconfig                 |   11 +++
 arch/arc/include/asm/highmem.h   |    8 +---
 arch/arc/include/asm/pgtable.h   |    9 +
 arch/arc/include/asm/processor.h |    5 +++--
 arch/arc/mm/fault.c              |    8
 arch/arc/mm/highmem.c            |   16 +++-
 arch/arc/mm/tlbex.S              |   31 +++
 7 files changed, 78 insertions(+), 10 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 08a9003..982bd18 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -477,6 +477,17 @@ config ARC_HAS_PAE40
 	  Enable access to physical memory beyond 4G, only supported on
 	  ARC cores with 40 bit Physical Addressing support
 
+config HIGHMEM_PGDS_SHIFT
+	int "log num of PGDs for HIGHMEM"
+	range 0 5
+	default "0" if !ARC_PLAT_EZNPS || !HIGHMEM
+	default "5" if ARC_PLAT_EZNPS
+	help
+	  This way we can map more pages for HIGHMEM.
+	  Single PGD (2M) is supporting 256 PTEs (8K PAGE_SIZE)
+	  For FIXMAP where at least 2 PTEs are needed per CPU
+	  large NR_CPUS e.g. 4096 will consume 32 PGDs
+
 config ARCH_PHYS_ADDR_T_64BIT
 	def_bool ARC_HAS_PAE40
 
diff --git a/arch/arc/include/asm/highmem.h b/arch/arc/include/asm/highmem.h
index b1585c9..c5cb473 100644
--- a/arch/arc/include/asm/highmem.h
+++ b/arch/arc/include/asm/highmem.h
@@ -17,13 +17,13 @@
 
 /* start after vmalloc area */
 #define FIXMAP_BASE		(PAGE_OFFSET - FIXMAP_SIZE - PKMAP_SIZE)
-#define FIXMAP_SIZE		PGDIR_SIZE	/* only 1 PGD worth */
-#define KM_TYPE_NR		((FIXMAP_SIZE >> PAGE_SHIFT)/NR_CPUS)
+#define FIXMAP_SIZE		(PGDIR_SIZE * _BITUL(CONFIG_HIGHMEM_PGDS_SHIFT))
+#define KM_TYPE_NR		(((FIXMAP_SIZE >> PAGE_SHIFT)/NR_CPUS) > 2 ?: 2)
 #define FIXMAP_ADDR(nr)		(FIXMAP_BASE + ((nr) << PAGE_SHIFT))
 
 /* start after fixmap area */
 #define PKMAP_BASE		(FIXMAP_BASE + FIXMAP_SIZE)
-#define PKMAP_SIZE		PGDIR_SIZE
+#define PKMAP_SIZE		(PGDIR_SIZE * _BITUL(CONFIG_HIGHMEM_PGDS_SHIFT))
 #define LAST_PKMAP		(PKMAP_SIZE >> PAGE_SHIFT)
 #define LAST_PKMAP_MASK		(LAST_PKMAP - 1)
 #define PKMAP_ADDR(nr)		(PKMAP_BASE + ((nr) << PAGE_SHIFT))
@@ -32,6 +32,7 @@
 
 #define kmap_prot		PAGE_KERNEL
 
+#ifndef __ASSEMBLY__
 #include
 
 extern void *kmap(struct page *page);
@@ -54,6 +55,7 @@ static inline void kunmap(struct page *page)
 		return;
 	kunmap_high(page);
 }
+#endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index 08fe338..d08e207 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -224,6 +224,8 @@
 #define	PTRS_PER_PTE	_BITUL(BITS_FOR_PTE)
 #define	PTRS_PER_PGD	_BITUL(BITS_FOR_PGD)
 
+#define PTRS_HMEM_PTE	_BITUL(BITS_FOR_PTE + CONFIG_HIGHMEM_PGDS_SHIFT)
+
 /*
  * Number of entries a user land program use.
  * TASK_SIZE is the maximum vaddr that can be used by a userland program.
@@ -285,7 +287,14 @@ static inline void pmd_set(pmd_t *pmdp, pte_t *ptep)
 /* Don't use virt_to_pfn for macros below: could cause truncations for PAE40*/
 #define pte_pfn(pte)		(pte_val(pte) >> PAGE_SHIFT)
 
+#if CONFIG_HIGHMEM_PGDS_SHIFT
+#define __pte_index(addr)	(((addr) >= VMALLOC_END) ?		\
+		(((addr) >> PAGE_SHIFT) & (PTRS_HMEM_PTE - 1))	\
+		:						\
+		(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)))
+#else
 #define __pte_index(addr)	(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
+#endif
 
 /*
  * pte_offset gets a @ptr to PMD entry (PGD in our 2-tier paging system)
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 6e1242d..fd7bdfa 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -121,8 +121,9 @@ extern void start_thread(struct pt_regs * regs, unsigned long pc,
 
 #define VMALLOC_START	(PAGE_OFFSET - (CONFIG_ARC_KVADDR_SIZE << 20))
 
-/* 1 PGDIR_SIZE each for fixmap/pkmap, 2 PGDIR_SIZE gutter (see asm/highmem.h) */
-#define VMALLOC_SIZE	((CONFIG_ARC_KVADDR_SIZE << 20) - PGDIR_SIZE * 4)
+/* 1 << CONFIG_HIGHMEM_PGDS_SHIFT PGDIR_SIZE each for fixmap/pkmap */
+#define VMALLOC_SIZE	((CONFIG_ARC_KVADDR_SIZE << 20) - \
+			 PGDIR_SIZE * _BITUL(CONFIG_HIGHMEM_PGDS_SHIFT) * 2)
 
 #define VMALLOC_END	(VMALLOC_START + VMALLOC_SIZE)
 
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index a0b7bd6..fd89c9a 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -17,6 +17,7 @@
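The sizing behind the default of 5 on eznps can be checked with a small
user-space sketch. It assumes only the figures given in the Kconfig help
text above (8K pages, 2M per PGD, hence 256 PTEs per PGD, and 2 FIXMAP
PTEs per CPU). The helper names are hypothetical and are not part of the
patch. The last helper writes the floor of 2 for KM_TYPE_NR explicitly:
in the macro as posted, ">" binds tighter than "?:", so the expression
appears to yield the result of the comparison rather than the quotient.

```c
#include <assert.h>

/* Figures from the Kconfig help text: 8K pages, 2M covered per PGD. */
#define SKETCH_PAGE_SHIFT	13
#define SKETCH_PGDIR_SHIFT	21

/* PTEs covered by one PGD entry: 2M / 8K = 256. */
static unsigned long ptes_per_pgd(void)
{
	return 1UL << (SKETCH_PGDIR_SHIFT - SKETCH_PAGE_SHIFT);
}

/*
 * Smallest shift such that (1 << shift) PGDs provide at least
 * nr_cpus * ptes_per_cpu PTEs. Hypothetical helper.
 */
static unsigned int highmem_pgds_shift(unsigned long nr_cpus,
				       unsigned long ptes_per_cpu)
{
	unsigned long needed = nr_cpus * ptes_per_cpu;
	unsigned int shift = 0;

	while ((ptes_per_pgd() << shift) < needed)
		shift++;
	return shift;
}

/* Per-CPU kmap slots with a floor of 2, without the GNU "?:" shorthand. */
static unsigned long km_type_nr(unsigned long fixmap_pages,
				unsigned long nr_cpus)
{
	unsigned long per_cpu;

	assert(nr_cpus > 0);
	per_cpu = fixmap_pages / nr_cpus;
	return per_cpu > 2 ? per_cpu : 2;
}
```

With NR_CPUS=4096 and 2 PTEs per CPU this gives 8192 PTEs, i.e. 32 PGDs
and a shift of 5, which matches the help text.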
Re: [PATCH v2] perf: libdw support for powerpc [ping]
On Tuesday, June 13, 2017 5:55:09 PM CEST Ravi Bangoria wrote: > Hi Mark, > > On Tuesday 13 June 2017 05:14 PM, Mark Wielaard wrote: > > I see the same on very short runs. But when doing a slightly longer run, > > even just using ls -lahR, which does some more work, then I do see user > > backtraces. They are still missing for some of the early samples though. > > It is as if there is a stack/memory address mismatch when the probe is > > "too early" in ld.so. > > > > Could you do a test run on some program that does some more work to see > > if you never get any user stack traces, or if you only not get them for > > some specific probes? > > Thanks for checking. I tried a proper workload this time, but I still > don't see any userspace callchain getting unwound. > > $ ./perf record --call-graph=dwarf -- zip -q -r temp.zip . > [ perf record: Woken up 2891 times to write data ] > [ perf record: Captured and wrote 723.290 MB perf.data (87934 samples) ] > > > With libdw: > > $ LD_LIBRARY_PATH=/home/ravi/elfutils-git/usr/local/lib:\ > /home/ravi/elfutils-git/usr/local/lib/elfutils/:$LD_LIBRARY_PATH\ > ./perf script > > zip 16699 6857.354633: 37371 cycles:u: >ecedc xmon_core > (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) 8c4fc > __hash_page_64K (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) > 83450 hash_preload > (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) 7cc34 > update_mmu_cache (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) > 330064 alloc_set_pte > (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) 330efc do_fault > (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) 334580 > __handle_mm_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) > 335040 handle_mm_fault > (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) 7bf94 > do_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) > 7bec4 do_page_fault > (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) 7be78 > 
do_page_fault (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) > 1a4f8 handle_page_fault > (/usr/lib/debug/lib/modules/4.11.0-3.el7.ppc64le/vmlinux) > > zip 16699 6857.354663: 300677 cycles:u: > > zip 16699 6857.354895: 584131 cycles:u: > > zip 16699 6857.355312: 589687 cycles:u: > > zip 16699 6857.355606: 560142 cycles:u: Just a quick question: Have you guys applied my recent patch: commit 5ea0416f51cc93436bbe497c62ab49fd9cb245b6 Author: Milian Wolff Date: Thu Jun 1 23:00:21 2017 +0200 perf report: Include partial stacks unwound with libdw So far the whole stack was thrown away when any error occurred before the maximum stack depth was unwound. This is actually a very common scenario though. The stacks that got unwound so far are still interesting. This removes a large chunk of differences when comparing perf script output for libunwind and libdw perf unwinding. If not, then this could explain the issue you are seeing. Cheers -- Milian Wolff | milian.wo...@kdab.com | Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts smime.p7s Description: S/MIME cryptographic signature
[PATCH v3 07/11] ARC: [plat-eznps] new command line argument for HW scheduler at MTM
From: Noam Camus

We add the ability for all cores of the NPS SoC to control the number
of cycles a HW thread can execute before it is replaced with another
eligible HW thread within the same core. The replacement is done by
the HW scheduler.

Signed-off-by: Noam Camus
---
 Documentation/admin-guide/kernel-parameters.txt |    9
 arch/arc/plat-eznps/mtm.c                       |   46 ++-
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 15f79c2..5b551f7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2693,6 +2693,15 @@
 			If the dependencies are under your control, you can
 			turn on cpu0_hotplug.
 
+	nps_mtm_hs_ctr=	[KNL,ARC]
+			This parameter sets the maximum duration, in
+			cycles, each HW thread of the CTOP can run
+			without interruptions, before HW switches it.
+			The actual maximum duration is 16 times this
+			parameter's value.
+			Format: integer between 1 and 255
+			Default: 255
+
 	nptcg=		[IA-64] Override max number of concurrent global TLB
 			purges which is reported from either PAL_VM_SUMMARY or
 			SAL PALO.
diff --git a/arch/arc/plat-eznps/mtm.c b/arch/arc/plat-eznps/mtm.c
index dcbf8f6..9c78ad6 100644
--- a/arch/arc/plat-eznps/mtm.c
+++ b/arch/arc/plat-eznps/mtm.c
@@ -21,10 +21,13 @@
 #include
 #include
 
-#define MT_CTRL_HS_CNT		0xFF
+#define MT_HS_CNT_MIN		0x01
+#define MT_HS_CNT_MAX		0xFF
 #define MT_CTRL_ST_CNT		0xF
 #define NPS_NUM_HW_THREADS	0x10
 
+static int mtm_hs_ctr = MT_HS_CNT_MAX;
+
 #ifdef CONFIG_EZNPS_MEM_ERROR_ALIGN
 int do_memory_error(unsigned long address, struct pt_regs *regs)
 {
@@ -127,7 +130,7 @@ void mtm_enable_core(unsigned int cpu)
 	/* Enable HW schedule, stall counter, mtm */
 	mt_ctrl.value = 0;
 	mt_ctrl.hsen = 1;
-	mt_ctrl.hs_cnt = MT_CTRL_HS_CNT;
+	mt_ctrl.hs_cnt = mtm_hs_ctr;
 	mt_ctrl.mten = 1;
 	write_aux_reg(CTOP_AUX_MT_CTRL, mt_ctrl.value);
 
@@ -138,3 +141,42 @@ void mtm_enable_core(unsigned int cpu)
 	 */
 	cpu_relax();
 }
+
+/* Handle an out of bounds mtm hs counter value */
+static void __init handle_mtm_hs_ctr_out_of_bounds_error(uint8_t val)
+{
+	pr_err("** The value must be in range [%d,%d] (inclusive)\n",
+	       MT_HS_CNT_MIN, MT_HS_CNT_MAX);
+
+	mtm_hs_ctr = val;
+}
+
+/* Verify and set the value of the mtm hs counter */
+static int __init set_mtm_hs_ctr(char *ctr_str)
+{
+	int ret;
+	long hs_ctr;
+
+	ret = kstrtol(ctr_str, 0, &hs_ctr);
+	if (ret) {
+		pr_err("** Out of range mtm_hs_ctr, using default value %d\n",
+		       MT_HS_CNT_MAX);
+		mtm_hs_ctr = MT_HS_CNT_MAX;
+		return -EINVAL;
+	}
+
+	if (hs_ctr > MT_HS_CNT_MAX) {
+		handle_mtm_hs_ctr_out_of_bounds_error(MT_HS_CNT_MAX);
+		return -EDOM;
+	}
+
+	if (hs_ctr < MT_HS_CNT_MIN) {
+		handle_mtm_hs_ctr_out_of_bounds_error(MT_HS_CNT_MIN);
+		return -EDOM;
+	}
+
+	mtm_hs_ctr = hs_ctr;
+
+	return 0;
+}
+early_param("nps_mtm_hs_ctr", set_mtm_hs_ctr);
-- 
1.7.1
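For anyone who wants to try the parsing rules outside the kernel, here
is a minimal user-space sketch of the same policy: fall back to the
default on a parse error, clamp to the nearest bound when out of range.
It uses strtol() as a stand-in for kstrtol(), and the function names
are hypothetical. The factor of 16 comes from the documentation text
in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MT_HS_CNT_MIN	0x01
#define MT_HS_CNT_MAX	0xFF

/*
 * User-space stand-in for set_mtm_hs_ctr(): stores the value to use in
 * *out and returns 0, -EINVAL (unparsable) or -EDOM (clamped).
 */
static int parse_hs_ctr(const char *s, int *out)
{
	char *end;
	long v;

	assert(s && out);

	errno = 0;
	v = strtol(s, &end, 0);		/* base 0: accepts 10, 0x10, 010 */
	if (errno || end == s || *end != '\0') {
		*out = MT_HS_CNT_MAX;	/* default on parse failure */
		return -EINVAL;
	}
	if (v > MT_HS_CNT_MAX) {
		*out = MT_HS_CNT_MAX;	/* clamp to upper bound */
		return -EDOM;
	}
	if (v < MT_HS_CNT_MIN) {
		*out = MT_HS_CNT_MIN;	/* clamp to lower bound */
		return -EDOM;
	}
	*out = (int)v;
	return 0;
}

/* Maximum uninterrupted run, in cycles, per the documentation text. */
static int hs_max_cycles(int hs_ctr)
{
	return 16 * hs_ctr;
}
```

So the default of 255 corresponds to at most 4080 cycles before the HW
scheduler switches threads.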
RE: [RFC][PATCH 0/2] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if existed
Dear Baoquan,

> > Our customer reported that Kernel text may be located on non-mirror
> > region (movable zone) when both address range mirroring feature and
> > KASLR are enabled.

I know your customer :)

> > The functions of address range mirroring feature are as follows.
> > - The physical memory region whose descriptors in EFI memory map have
> >   EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are mirrored
> > - The function arranges such mirror region into normal zone and other
> >   region into movable zone in order to locate kernel code and data on
> >   mirror region
> >
> > So we need restrict kernel to be located inside mirror region if it is
> > existed.
> >
> > The method is very simple. If efi is enabled, just iterate all efi
> > memory map and pick up mirror region to process for adding candidate
> > of slot. If efi disabled or no mirror region existed, still process
> > e820 memory map. This won't bring much efficiency loss, at worst we
> > just go through all efi memory maps and found no mirror.
> >
> > One question:
> > From code, though mirror regions are existed, they are meaningful only
> > if kernelcore=mirror kernel option is specified. Not sure if my
> > understanding is correct.

Your understanding is almost correct. The above procedure works only
when "kernelcore=mirror" is specified. But if mirrored regions exist,
the bootmem allocator tries to allocate from the mirrored region
regardless of the "kernelcore=mirror" option.

So, IMHO, kernel text is important, and putting it in the mirrored
(more reliable) region is reasonable whether or not "kernelcore=mirror"
is specified.

Anyway, thanks for submitting the patch. We have an Address Range
Mirroring capable machine, so we'll test your patch.

Sincerely,
Taku Izumi

> Since you are the author of kernelcore=mirror related code and expert on
> mirror feature, could you help answer above question?
>
> Thanks
> Baoquan
>
> > NOTE:
> > I haven't got a machine with efi mirror region enabled, so only test
> > the e820 map processing case and the case of no mirror region on efi
> > machine. So set this as a RFC patchset, will post formal one after
> > above question is made clear and mirror issue test passed.
> >
> > Baoquan He (2):
> >   x86/boot/KASLR: Adapt process_e820_entry for all kinds of memory map
> >   x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if
> >     existed
> >
> >  arch/x86/boot/compressed/kaslr.c | 129 +++
> >  1 file changed, 104 insertions(+), 25 deletions(-)
> >
> > --
> > 2.5.5
[PATCH v3 08/11] ARC: [plat-eznps] Update the init sequence of aux regs per cpu.
From: Liav Rehana

This commit adds a new configuration option that lets us distinguish
between building the kernel for platforms that have a separate set of
auxiliary registers for each cpu and platforms that share one set of
auxiliary registers across all threads in each core.

On platforms that implement a separate set of auxiliary registers,
disabling this option ensures that we initialize the registers on
every cpu and not just for the first thread of the core.

An example of non-shared registers is working with EZsim (non-silicon).

Signed-off-by: Liav Rehana
Signed-off-by: Noam Camus
---
 arch/arc/plat-eznps/Kconfig |   11 +++
 arch/arc/plat-eznps/entry.S |    2 +-
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/arc/plat-eznps/Kconfig b/arch/arc/plat-eznps/Kconfig
index b36afb1..e151e20 100644
--- a/arch/arc/plat-eznps/Kconfig
+++ b/arch/arc/plat-eznps/Kconfig
@@ -43,3 +43,14 @@ config EZNPS_MEM_ERROR_ALIGN
 	  simulator platform for NPS, is handled as a Level 2 interrupt
 	  (just a stock ARC700) which is recoverable. This option makes
 	  simulator behave like hardware.
+
+config EZNPS_SHARED_AUX_REGS
+	bool "ARC-EZchip Shared Auxiliary Registers Per Core"
+	depends on ARC_PLAT_EZNPS
+	default y
+	help
+	  On the real chip of the NPS, auxiliary registers are shared between
+	  all the cpus of the core, whereas on simulator platform for NPS,
+	  each cpu has a different set of auxiliary registers. Configuration
+	  should be unset if auxiliary registers are not shared between the cpus
+	  of the core, so there will be a need to initialize them per cpu.
diff --git a/arch/arc/plat-eznps/entry.S b/arch/arc/plat-eznps/entry.S
index 328261c..091c92c 100644
--- a/arch/arc/plat-eznps/entry.S
+++ b/arch/arc/plat-eznps/entry.S
@@ -27,7 +27,7 @@
 	.align 1024	; HW requierment for restart first PC
 
 ENTRY(res_service)
-#ifdef CONFIG_EZNPS_MTM_EXT
+#if defined(CONFIG_EZNPS_MTM_EXT) && defined(CONFIG_EZNPS_SHARED_AUX_REGS)
 	; There is no work for HW thread id != 0
 	lr	r3, [CTOP_AUX_THREAD_ID]
 	cmp	r3, 0
-- 
1.7.1
[PATCH v3 03/11] ARC: Allow irq threading
From: Noam Camus

Working with NPS400, we noticed that L1 interrupts can nest and run
out of kernel stack. The scenario involves serving invoke_softirqs()
from irq_exit(): once local_irq_enable() is called, another interrupt
can hit before we have restored the previous one and popped its space
from the kernel stack.

Serving softirqs in a dedicated kernel thread may mitigate this.
Many architectures, including x86, behave like this.

Note 1: All interrupts that must not be threaded should be marked
IRQF_NO_THREAD.
Note 2: The kernel parameter "threadirqs" is needed to actually turn
this on. This configuration is only a preparation.

Signed-off-by: Noam Camus
---
 arch/arc/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index a545969..f464f97 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -33,6 +33,7 @@ config ARC
 	select HAVE_OPROFILE
 	select HAVE_PERF_EVENTS
 	select HANDLE_DOMAIN_IRQ
+	select IRQ_FORCED_THREADING
 	select IRQ_DOMAIN
 	select MODULES_USE_ELF_RELA
 	select NO_BOOTMEM
-- 
1.7.1
[PATCH v3 09/11] ARC: [plat-eznps] Save/Restore extra auxiliary registers
From: Noam Camus thread_struct got new field for data plane of eznps platform. This field got place for data plane auxiliary registers and for any extra registers that might be changed in kernel code. We save EFLAGS, and GPA1 auxiliary registers since they may be changed by the new task while using atomic operations e.g. cmpxchg. Signed-off-by: Noam Camus --- arch/arc/include/asm/arcregs.h |7 +++ arch/arc/include/asm/processor.h |3 +++ arch/arc/include/asm/switch_to.h | 11 +++ arch/arc/plat-eznps/Makefile |2 +- arch/arc/plat-eznps/ctop.c | 33 + 5 files changed, 55 insertions(+), 1 deletions(-) create mode 100644 arch/arc/plat-eznps/ctop.c diff --git a/arch/arc/include/asm/arcregs.h b/arch/arc/include/asm/arcregs.h index ba8e802..9437d42 100644 --- a/arch/arc/include/asm/arcregs.h +++ b/arch/arc/include/asm/arcregs.h @@ -123,6 +123,13 @@ #define PAGES_TO_MB(n_pages) (PAGES_TO_KB(n_pages) >> 10) +#ifdef CONFIG_ARC_PLAT_EZNPS +struct eznps_dp { + unsigned int eflags; + unsigned int gpa1; +}; +#endif + /* *** * Build Configuration Registers, with encoded hardware config diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h index fd7bdfa..130bb55 100644 --- a/arch/arc/include/asm/processor.h +++ b/arch/arc/include/asm/processor.h @@ -38,6 +38,9 @@ struct thread_struct { #ifdef CONFIG_ARC_FPU_SAVE_RESTORE struct arc_fpu fpu; #endif +#ifdef CONFIG_ARC_PLAT_EZNPS + struct eznps_dp dp; +#endif }; #define INIT_THREAD { \ diff --git a/arch/arc/include/asm/switch_to.h b/arch/arc/include/asm/switch_to.h index 1b171ab..4c53080 100644 --- a/arch/arc/include/asm/switch_to.h +++ b/arch/arc/include/asm/switch_to.h @@ -26,13 +26,24 @@ #endif /* !CONFIG_ARC_FPU_SAVE_RESTORE */ +#ifdef CONFIG_ARC_PLAT_EZNPS +extern void dp_save_restore(struct task_struct *p, struct task_struct *n); +#define ARC_DP_PREV(p, n) dp_save_restore(p, n) +#define ARC_DP_NEXT(t) +#else +#define ARC_DP_PREV(p, n) +#define ARC_DP_NEXT(n) +#endif /* !CONFIG_ARC_PLAT_EZNPS */ + struct 
task_struct *__switch_to(struct task_struct *p, struct task_struct *n); #define switch_to(prev, next, last)\ do { \ + ARC_DP_PREV(prev, next);\ ARC_FPU_PREV(prev, next); \ last = __switch_to(prev, next);\ ARC_FPU_NEXT(next); \ + ARC_DP_NEXT(next); \ mb(); \ } while (0) diff --git a/arch/arc/plat-eznps/Makefile b/arch/arc/plat-eznps/Makefile index 21091b1..8d43717 100644 --- a/arch/arc/plat-eznps/Makefile +++ b/arch/arc/plat-eznps/Makefile @@ -2,6 +2,6 @@ # Makefile for the linux kernel. # -obj-y := entry.o platform.o +obj-y := entry.o platform.o ctop.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_EZNPS_MTM_EXT) += mtm.o diff --git a/arch/arc/plat-eznps/ctop.c b/arch/arc/plat-eznps/ctop.c new file mode 100644 index 000..8b13a08 --- /dev/null +++ b/arch/arc/plat-eznps/ctop.c @@ -0,0 +1,33 @@ +/* + * Copyright(c) 2015 EZchip Technologies. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + */ + +#include +#include +#include + +void dp_save_restore(struct task_struct *prev, struct task_struct *next) +{ + struct eznps_dp *prev_task_dp = &prev->thread.dp; + struct eznps_dp *next_task_dp = &next->thread.dp; + + /* Here we save all Data Plane related auxiliary registers */ + prev_task_dp->eflags = read_aux_reg(CTOP_AUX_EFLAGS); + write_aux_reg(CTOP_AUX_EFLAGS, next_task_dp->eflags); + + prev_task_dp->gpa1 = read_aux_reg(CTOP_AUX_GPA1); + write_aux_reg(CTOP_AUX_GPA1, next_task_dp->gpa1); +} + -- 1.7.1
[PATCH v3 10/11] ARC: [plat-eznps] handle dedicated AUX registers
From: Liav Rehana Preserve eflags and gpa1 auxiliaries during exception Registers used by compare exchange instructions. GPA1 is used for compare value, and EFLAGS got bit reflects atomic operation response. EFLAGS is zeroed for each new user task so it won't get its parent value. Signed-off-by: Noam Camus --- arch/arc/include/asm/entry-compact.h | 24 arch/arc/include/asm/ptrace.h|5 + arch/arc/kernel/process.c|4 3 files changed, 33 insertions(+), 0 deletions(-) diff --git a/arch/arc/include/asm/entry-compact.h b/arch/arc/include/asm/entry-compact.h index 14c310f..9e4458a 100644 --- a/arch/arc/include/asm/entry-compact.h +++ b/arch/arc/include/asm/entry-compact.h @@ -192,6 +192,12 @@ PUSHAX lp_start PUSHAX erbta +#ifdef CONFIG_ARC_PLAT_EZNPS + .word CTOP_INST_SCHD_RW + PUSHAX CTOP_AUX_GPA1 + PUSHAX CTOP_AUX_EFLAGS +#endif +` lr r9, [ecr] st r9, [sp, PT_event]/* EV_Trap expects r9 to have ECR */ .endm @@ -208,6 +214,12 @@ * by hardware and that is not good. *-*/ .macro EXCEPTION_EPILOGUE +#ifdef CONFIG_ARC_PLAT_EZNPS + .word CTOP_INST_SCHD_RW + POPAX CTOP_AUX_EFLAGS + POPAX CTOP_AUX_GPA1 +#endif + POPAX erbta POPAX lp_start POPAX lp_end @@ -265,6 +277,12 @@ PUSHAX lp_end PUSHAX lp_start PUSHAX bta_l\LVL\() + +#ifdef CONFIG_ARC_PLAT_EZNPS + .word CTOP_INST_SCHD_RW + PUSHAX CTOP_AUX_GPA1 + PUSHAX CTOP_AUX_EFLAGS +#endif .endm /*-- @@ -277,6 +295,12 @@ * by hardware and that is not good. 
*-*/ .macro INTERRUPT_EPILOGUE LVL +#ifdef CONFIG_ARC_PLAT_EZNPS + .word CTOP_INST_SCHD_RW + POPAX CTOP_AUX_EFLAGS + POPAX CTOP_AUX_GPA1 +#endif + POPAX bta_l\LVL\() POPAX lp_start POPAX lp_end diff --git a/arch/arc/include/asm/ptrace.h b/arch/arc/include/asm/ptrace.h index 5297faa..5a8cb22 100644 --- a/arch/arc/include/asm/ptrace.h +++ b/arch/arc/include/asm/ptrace.h @@ -19,6 +19,11 @@ #ifdef CONFIG_ISA_ARCOMPACT struct pt_regs { +#ifdef CONFIG_ARC_PLAT_EZNPS + unsigned long eflags; /* Extended FLAGS */ + unsigned long gpa1; /* General Purpose Aux */ +#endif + /* Real registers */ unsigned long bta; /* bta_l1, bta_l2, erbta */ diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index 5c631a1..5ac3b54 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -234,6 +234,10 @@ void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long usp) */ regs->status32 = STATUS_U_MASK | STATUS_L_MASK | ISA_INIT_STATUS_BITS; +#ifdef CONFIG_EZNPS_MTM_EXT + regs->eflags = 0; +#endif + /* bogus seed values for debugging */ regs->lp_start = 0x10; regs->lp_end = 0x80; -- 1.7.1
Re: [PATCH 3/3] mm, thp: Do not loose dirty bit in __split_huge_pmd_locked()
On Wed, Jun 14, 2017 at 05:31:31PM +0200, Andrea Arcangeli wrote:
> Hello,
>
> On Wed, Jun 14, 2017 at 04:18:57PM +0200, Martin Schwidefsky wrote:
> > Could we change pmdp_invalidate to make it return the old pmd entry?
>
> That to me seems the simplest fix to avoid losing the dirty bit.
>
> I earlier suggested to replace pmdp_invalidate with something like
> old_pmd = pmdp_establish(pmd_mknotpresent(pmd)) (then tlb flush could
> then be conditional to the old pmd being present). Making
> pmdp_invalidate return the old pmd entry would be mostly equivalent to
> that.
>
> The advantage of not changing pmdp_invalidate is that we could skip a
> xchg which is more costly in __split_huge_pmd_locked and
> madvise_free_huge_pmd so perhaps there's a point to keep a variant of
> pmdp_invalidate that doesn't use xchg internally (and in turn can't
> return the old pmd value atomically).
>
> If we don't want new messy names like pmdp_establish we could have a
> __pmdp_invalidate that returns void, and pmdp_invalidate that returns
> the old pmd and uses xchg (and it'd also be backwards compatible as
> far as the callers are concerned). So those places that don't need the
> old value returned and can skip the xchg, could simply
> s/pmdp_invalidate/__pmdp_invalidate/ to optimize.

We have only a few pmdp_invalidate() callers:

 - clear_soft_dirty_pmd();
 - madvise_free_huge_pmd();
 - change_huge_pmd();
 - __split_huge_pmd_locked();

Only madvise_free_huge_pmd() doesn't care about the old pmd.

__split_huge_pmd_locked() actually needs to check dirty after
pmdp_invalidate(), see patch 3/3 of the patchset.

I don't think it's worth introducing one more primitive only for
madvise_free_huge_pmd().

I'll stick with a single pmdp_invalidate() that returns the old value.

-- 
 Kirill A. Shutemov
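To make the point about returning the old value concrete, here is a
user-space sketch using C11 atomics. It is not kernel code: the bit
positions and the name pmd_establish_sketch() are made up for
illustration, and atomic_exchange() stands in for the xchg discussed
above. The idea is that the exchange hands back whatever was in the
entry at the moment of the swap, so a dirty bit set by "hardware"
after the caller's initial read is still visible in the return value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define PMD_PRESENT_SKETCH	(UINT64_C(1) << 0)
#define PMD_DIRTY_SKETCH	(UINT64_C(1) << 6)

/*
 * Install new_val and return the previous entry in one atomic step.
 * Any bit set concurrently before the exchange shows up in the
 * returned value instead of being silently overwritten.
 */
static uint64_t pmd_establish_sketch(_Atomic uint64_t *pmdp, uint64_t new_val)
{
	assert(pmdp);
	return atomic_exchange(pmdp, new_val);
}

/* Invalidate built on top of it: clear "present", report the old entry. */
static uint64_t pmd_invalidate_sketch(_Atomic uint64_t *pmdp)
{
	uint64_t snapshot = atomic_load(pmdp);

	return pmd_establish_sketch(pmdp, snapshot & ~PMD_PRESENT_SKETCH);
}
```

A caller such as the split path would then test the dirty bit on the
returned value rather than on a copy read before the invalidation.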
Re: linux-next: build warning after merge of the i2c tree
Hi Wolfram,

On Thu, 15 Jun 2017 09:02:44 +0200 Wolfram Sang wrote:
>
> > > drivers/i2c/i2c-stub.c:18:0: warning: "DEBUG" redefined
> > >  #define DEBUG
> > >  ^
> > > :0:0: note: this is the location of the previous definition
> > >
> > > Introduced by commit
> > >
> > >   6c42778780c4 ("i2c: stub: use pr_fmt")
> >
> > I am still getting this ...
>
> Sorry, that slipped through the cracks, will fix today!

Thanks.

> Thanks for the reminder.

I am trying to be more proactive with these things.

-- 
Cheers,
Stephen Rothwell
[PATCH v3 01/11] ARC: set level of log per CPU during boot to be info level
From: Noam Camus

Now it can be hidden by passing a higher loglevel severity on the
command line. The reasons are:
1) speeding up boot time, which becomes critical for machines with
   many CPUs, e.g. NPS400 with 4K CPUs
2) shortening the kernel log at boot time, which makes it easier to
   scan on large-scale machines such as NPS400

Signed-off-by: Noam Camus
---
 arch/arc/kernel/setup.c |    6 +++---
 arch/arc/mm/cache.c     |    2 +-
 arch/arc/mm/tlb.c       |    2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
index fc8211f..de29ea9 100644
--- a/arch/arc/kernel/setup.c
+++ b/arch/arc/kernel/setup.c
@@ -385,13 +385,13 @@ void setup_processor(void)
 	read_arc_build_cfg_regs();
 	arc_init_IRQ();
 
-	printk(arc_cpu_mumbojumbo(cpu_id, str, sizeof(str)));
+	pr_info("%s", arc_cpu_mumbojumbo(cpu_id, str, sizeof(str)));
 
 	arc_mmu_init();
 	arc_cache_init();
 
-	printk(arc_extn_mumbojumbo(cpu_id, str, sizeof(str)));
-	printk(arc_platform_smp_cpuinfo());
+	pr_info("%s", arc_extn_mumbojumbo(cpu_id, str, sizeof(str)));
+	pr_info("%s", arc_platform_smp_cpuinfo());
 
 	arc_chk_core_config();
 }
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index a867575..bdb5227 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -1188,7 +1188,7 @@ void __ref arc_cache_init(void)
 	unsigned int __maybe_unused cpu = smp_processor_id();
 	char str[256];
 
-	printk(arc_cache_mumbojumbo(0, str, sizeof(str)));
+	pr_info("%s", arc_cache_mumbojumbo(0, str, sizeof(str)));
 
 	/*
 	 * Only master CPU needs to execute rest of function:
diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index d0126fd..2b6da60 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -814,7 +814,7 @@ void arc_mmu_init(void)
 	char str[256];
 	struct cpuinfo_arc_mmu *mmu = &cpuinfo_arc700[smp_processor_id()].mmu;
 
-	printk(arc_mmu_mumbojumbo(0, str, sizeof(str)));
+	pr_info("%s", arc_mmu_mumbojumbo(0, str, sizeof(str)));
 
 	/*
 	 * Can't be done in processor.h due to header include depenedencies
-- 
1.7.1
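A side effect of the conversion worth spelling out: the old calls passed
the generated string as the format itself, while the new ones pass it as
an argument to a constant "%s" format. A user-space sketch of the second
form, with snprintf() standing in for the kernel's printing functions
and a hypothetical helper name, shows that the text is then copied
verbatim even if it happens to contain a '%' character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy msg into out through a constant "%s" format, so that nothing
 * inside msg is ever interpreted as a conversion specifier.
 * Returns what snprintf() returns (the length msg would need).
 */
static int format_verbatim(char *out, size_t size, const char *msg)
{
	assert(out && msg && size > 0);
	return snprintf(out, size, "%s", msg);
}
```

This is only an illustration of the calling convention; whether any of
the ARC info strings can actually contain a '%' is not something the
patch states.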
[PATCH v3 00/11] plat-eznps upstream cont. set 2
From: Noam Camus

Change Log:

V2 -> V3
1) Turn ARC printk's into pr_info as suggested by Vineet.
2) For the new command line argument (hs counter), shorten the error
   message to a single line, again as Vineet commented.

V1 -> V2
1) I added the "Handle memory error as an exception" patch from the
   previous set. It now turns do_memory_error() into a weak symbol,
   which is then overridden by the NPS400 platform to simply call die().
2) This set is now based on the arc-next branch.

Summary:
With this patch set I continue the effort of upstreaming the eznps
platform for arch/arc. It comprises a couple of patches from the last
set that were not yet accepted, patches for HW errata, and some misc
extensions such as HIGHMEM / NUMA.

This set has more generic ARC changes than the previous set. The
additional ifdefs seem unavoidable, although they may look ugly.
Let's see if we need to do it more elegantly.

Elad Kanfi (1):
  ARC: [plat-eznps] avoid toggling of DPC register

Liav Rehana (2):
  ARC: [plat-eznps] Update the init sequence of aux regs per cpu.
  ARC: [plat-eznps] handle dedicated AUX registers

Noam Camus (8):
  ARC: set level of log per CPU during boot to be info level
  ARC: send ipi to all cpus sharing task mm in case of page fault
  ARC: Allow irq threading
  ARC: Add CPU topology
  ARC: Support more than one PGDIR for KVADDR
  ARC: [NUMA] added CONFIG_NUMA for plat-eznps
  ARC: [plat-eznps] new command line argument for HW scheduler at MTM
  ARC: [plat-eznps] Save/Restore extra auxiliary registers

 Documentation/admin-guide/kernel-parameters.txt |    9 ++
 arch/arc/Kconfig                                |   48 +
 arch/arc/include/asm/Kbuild                     |    1 -
 arch/arc/include/asm/arcregs.h                  |    7 ++
 arch/arc/include/asm/cacheflush.h               |    3 +-
 arch/arc/include/asm/entry-compact.h            |   24 +
 arch/arc/include/asm/highmem.h                  |    8 +-
 arch/arc/include/asm/pgtable.h                  |    9 ++
 arch/arc/include/asm/processor.h                |    8 +-
 arch/arc/include/asm/ptrace.h                   |    5 +
 arch/arc/include/asm/switch_to.h                |   11 ++
 arch/arc/include/asm/topology.h                 |   40 +++
 arch/arc/kernel/Makefile                        |    1 +
 arch/arc/kernel/process.c                       |    4 +
 arch/arc/kernel/setup.c                         |   13 ++-
 arch/arc/kernel/smp.c                           |    5 +
 arch/arc/kernel/topology.c                      |  125 +++
 arch/arc/mm/cache.c                             |   14 ++-
 arch/arc/mm/fault.c                             |    8 ++
 arch/arc/mm/highmem.c                           |   16 ++-
 arch/arc/mm/init.c                              |    6 +
 arch/arc/mm/tlb.c                               |    4 +-
 arch/arc/mm/tlbex.S                             |   31 ++
 arch/arc/plat-eznps/Kconfig                     |   11 ++
 arch/arc/plat-eznps/Makefile                    |    2 +-
 arch/arc/plat-eznps/ctop.c                      |   33 ++
 arch/arc/plat-eznps/entry.S                     |    2 +-
 arch/arc/plat-eznps/include/plat/ctop.h         |    1 +
 arch/arc/plat-eznps/mtm.c                       |   58 ++-
 29 files changed, 481 insertions(+), 26 deletions(-)
 create mode 100644 arch/arc/include/asm/topology.h
 create mode 100644 arch/arc/kernel/topology.c
 create mode 100644 arch/arc/plat-eznps/ctop.c
Re: [HELP-NEEDED, PATCH 0/3] Do not loose dirty bit on THP pages
On Thu, Jun 15, 2017 at 06:35:21AM +0530, Aneesh Kumar K.V wrote: > > > On Wednesday 14 June 2017 10:25 PM, Will Deacon wrote: > > Hi Aneesh, > > > > On Wed, Jun 14, 2017 at 08:55:26PM +0530, Aneesh Kumar K.V wrote: > > > On Wednesday 14 June 2017 07:21 PM, Kirill A. Shutemov wrote: > > > > Vlastimil noted that pmdp_invalidate() is not atomic and we can loose > > > > dirty and access bits if CPU sets them after pmdp dereference, but > > > > before set_pmd_at(). > > > > > > > > The bug doesn't lead to user-visible misbehaviour in current kernel, but > > > > fixing this would be critical for future work on THP: both huge-ext4 > > > > and THP > > > > swap out rely on proper dirty tracking. > > > > > > > > Unfortunately, there's no way to address the issue in a generic way. We > > > > need to > > > > fix all architectures that support THP one-by-one. > > > > > > > > All architectures that have THP supported have to provide atomic > > > > pmdp_invalidate(). If generic implementation of pmdp_invalidate() is > > > > used, > > > > architecture needs to provide atomic pmdp_mknonpresent(). > > > > > > > > I've fixed the issue for x86, but I need help with the rest. > > > > > > > > So far THP is supported on 8 architectures. Power and S390 already > > > > provides > > > > atomic pmdp_invalidate(). x86 is fixed by this patches, so 5 > > > > architectures > > > > left: > > > > > > > > - arc; > > > > - arm; > > > > - arm64; > > > > - mips; > > > > - sparc -- it has custom pmdp_invalidate(), but it's racy too; > > > > > > > > Please, help me with them. > > > > > > > > Kirill A. Shutemov (3): > > > >x86/mm: Provide pmdp_mknotpresent() helper > > > >mm: Do not loose dirty and access bits in pmdp_invalidate() > > > >mm, thp: Do not loose dirty bit in __split_huge_pmd_locked() > > > > > > > > > > > > > But in __split_huge_pmd_locked() we collected the dirty bit early. 
So even > > > if we made pmdp_invalidate() atomic, if we had marked the pmd pte entry > > > dirty after we collected the dirty bit, we still loose it right ? > > > > > > > > > May be we should relook at pmd PTE udpate interface. We really need an > > > interface that can update pmd entries such that we don't clear it in > > > between. IMHO, we can avoid the pmdp_invalidate() completely, if we can > > > switch from a pmd PTE entry to a pointer to PTE page (pgtable_t). We also > > > need this interface to avoid the madvise race fixed by > > > > There's a good chance I'm not following your suggestion here, but it's > > probably worth me pointing out that swizzling a page table entry from a > > block mapping (e.g. a huge page mapped at the PMD level) to a table entry > > (e.g. a pointer to a page of PTEs) can lead to all sorts of horrible > > problems on ARM, including amalgamation of TLB entries and fatal aborts. > > > > So we really need to go via an invalid entry, with appropriate TLB > > invalidation before installing the new entry. > > > > I am not suggesting we don't do the invalidate (the need for that is > documented in __split_huge_pmd_locked(). I am suggesting we need a new > interface, something like Andrea suggested. > > old_pmd = pmdp_establish(pmd_mknotpresent()); > > instead of pmdp_invalidate(). We can then use this in scenarios where we > want to update pmd PTE entries, where right now we go through a pmdp_clear > and set_pmd path. We should really not do that for THP entries. Which cases are you talking about? When do we need to clear pmd and set later? -- Kirill A. Shutemov
Re: [PATCH 2/3] mm: Do not loose dirty and access bits in pmdp_invalidate()
Hi Kirill,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.12-rc5 next-20170614]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/Do-not-loose-dirty-bit-on-THP-pages/20170615-115540
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64

All errors (new ones prefixed by >>):

   mm/pgtable-generic.c: In function 'pmdp_invalidate':
>> mm/pgtable-generic.c:185:2: error: implicit declaration of function 'pmdp_mknotpresent' [-Werror=implicit-function-declaration]
     pmdp_mknotpresent(pmdp);
     ^
   cc1: some warnings being treated as errors

vim +/pmdp_mknotpresent +185 mm/pgtable-generic.c

   179	#endif
   180
   181	#ifndef __HAVE_ARCH_PMDP_INVALIDATE
   182	void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
   183			     pmd_t *pmdp)
   184	{
 > 185		pmdp_mknotpresent(pmdp);
   186		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
   187	}
   188	#endif

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[PATCH v3] PCI: dwc: dra7xx: Fix compilation warning.
drivers/pci/dwc/pci-dra7xx.c: In function ‘dra7xx_pcie_enable_msi_interrupts’:
drivers/pci/dwc/pci-dra7xx.c:177:7: warning: large integer implicitly truncated to unsigned type [-Woverflow]
       ~LEG_EP_INTERRUPTS & ~MSI);
       ^
drivers/pci/dwc/pci-dra7xx.c: In function ‘dra7xx_pcie_enable_wrapper_interrupts’:
drivers/pci/dwc/pci-dra7xx.c:187:7: warning: large integer implicitly truncated to unsigned type [-Woverflow]
       ~INTERRUPTS);

Signed-off-by: Arvind Yadav
---
Changes in v2:
  Add casts in the definitions.
Changes in v3:
  Change logic instead of casting.

 drivers/pci/dwc/pci-dra7xx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index 8decf46..668dc15 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -174,7 +174,7 @@ static int dra7xx_pcie_establish_link(struct dw_pcie *pci)
 static void dra7xx_pcie_enable_msi_interrupts(struct dra7xx_pcie *dra7xx)
 {
 	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
-			   ~LEG_EP_INTERRUPTS & ~MSI);
+			   LEG_EP_INTERRUPTS | MSI);
 
 	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQENABLE_SET_MSI,
@@ -184,7 +184,7 @@ static void dra7xx_pcie_enable_msi_interrupts(struct dra7xx_pcie *dra7xx)
 static void dra7xx_pcie_enable_wrapper_interrupts(struct dra7xx_pcie *dra7xx)
 {
 	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MAIN,
-			   ~INTERRUPTS);
+			   INTERRUPTS);
 	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQENABLE_SET_MAIN,
 			   INTERRUPTS);
 }
-- 
1.9.1
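As a side note on the warning quoted in this patch, the userspace sketch below shows one way such a -Woverflow can arise. It assumes the interrupt masks are built from an unsigned long BIT() macro (the shape of the kernel's BIT()) and that the register writer only has room for 32 bits. DEMO_MSI, DEMO_LEG_EP_INTERRUPTS, to_reg32() and loses_bits() are stand-ins invented here, not the driver's definitions, and the bit positions are arbitrary.

```c
#include <stdint.h>

/* Same shape as the kernel's BIT(): the result is an unsigned long. */
#define BIT(n)	(1UL << (n))

/* Stand-in masks; the real driver's bit assignments may differ. */
#define DEMO_LEG_EP_INTERRUPTS	(BIT(0) | BIT(1) | BIT(2) | BIT(3))
#define DEMO_MSI		BIT(4)

/* Models a register write that only has room for 32 bits. */
uint32_t to_reg32(unsigned long val)
{
	return (uint32_t)val;	/* upper bits, if any, are dropped here */
}

/* Returns nonzero if converting 'val' to 32 bits would discard set bits. */
int loses_bits(unsigned long val)
{
	return (unsigned long)to_reg32(val) != val;
}
```

Where unsigned long is 64 bits wide, an inverted mask such as ~DEMO_MSI has bits 32-63 set, so it cannot be represented in a 32-bit argument; the plain OR of the masks fits without any truncation, which is the form v3 of the patch switches to.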
Re: [PATCH v3] xen/mce: don't issue error message for failed /dev/mcelog registration
On Wed, Jun 14, 2017 at 10:40:59AM +0200, Juergen Gross wrote: > When running under Xen as dom0 /dev/mcelog is being registered by Xen > instead of the normal mcelog driver. Avoid an error message being > issued by the mcelog driver in this case. Instead issue an informative > message that Xen has registered the device. > > Signed-off-by: Juergen Gross > --- > arch/x86/kernel/cpu/mcheck/dev-mcelog.c | 7 ++- > drivers/xen/mcelog.c| 2 ++ > 2 files changed, 8 insertions(+), 1 deletion(-) Applied, thanks. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
[PATCH v4 1/1] f2fs: dax: implement direct access
From: Qiuyang Sun This patch implements Direct Access (DAX) in F2FS. Signed-off-by: Qiuyang Sun --- Changelog v3 -> v4: In f2fs_iomap_begin(): - For the write branch, if f2fs_map_blocks() returns error (probably due to ENOSPC), the allocated blocks beyond original_i_size are truncated. - For the read branch, use F2FS_GET_BLOCK_FIEMAP instead of READ for f2fs_map_blocks(), so that contiguous unwritten blocks can be treated in a batch. Accordingly, judge F2FS_MAP_UNWRITTEN before F2FS_MAP_MAPPED for iomap->type. - Add a call of f2fs_update_time() in f2fs_iomap_end(). - In f2fs_move_file_range() and f2fs_ioc_defragment(), return -EINVAL for DAX files, as the current implementation uses page cache. - Call f2fs_bug_on() in f2fs_ioc_commit_atomic_write() and f2fs_ioc_(release|abort)_volatile_write() when the inode is DAX, which should not happen. - Optimize the logic in dax_move_data_page(). - Enable setting the S_DAX flag for an inode in f2fs_set_inode_flags(). The v4 patch is at f2fs-dev-test. 
--- fs/f2fs/data.c | 100 + fs/f2fs/f2fs.h | 8 +++ fs/f2fs/file.c | 192 ++- fs/f2fs/gc.c | 104 -- fs/f2fs/inline.c | 4 ++ fs/f2fs/inode.c | 8 ++- fs/f2fs/namei.c | 5 ++ fs/f2fs/super.c | 15 + 8 files changed, 429 insertions(+), 7 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 7d3af48..58efce0 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2257,3 +2257,103 @@ int f2fs_migrate_page(struct address_space *mapping, .migratepage= f2fs_migrate_page, #endif }; + +#ifdef CONFIG_FS_DAX +#include +#include + +static int f2fs_iomap_begin(struct inode *inode, loff_t offset, + loff_t length, unsigned int flags, struct iomap *iomap) +{ + struct block_device *bdev; + unsigned long first_block = F2FS_BYTES_TO_BLK(offset); + unsigned long last_block = F2FS_BYTES_TO_BLK(offset + length - 1); + struct f2fs_map_blocks map; + int ret; + + if (WARN_ON_ONCE(f2fs_has_inline_data(inode))) + return -ERANGE; + + map.m_lblk = first_block; + map.m_len = last_block - first_block + 1; + map.m_next_pgofs = NULL; + + if (!(flags & IOMAP_WRITE)) + ret = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_FIEMAP); + else { + /* i_size should be kept here and changed later in f2fs_iomap_end */ + loff_t original_i_size = i_size_read(inode); + + ret = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_DIO); + if (i_size_read(inode) > original_i_size) { + f2fs_i_size_write(inode, original_i_size); + if (ret) { + truncate_pagecache(inode, original_i_size); + truncate_blocks(inode, original_i_size, true); + } + } + } + + if (ret) + return ret; + + iomap->flags = 0; + bdev = inode->i_sb->s_bdev; + iomap->bdev = bdev; + if (blk_queue_dax(bdev->bd_queue)) + iomap->dax_dev = dax_get_by_host(bdev->bd_disk->disk_name); + else + iomap->dax_dev = NULL; + iomap->offset = F2FS_BLK_TO_BYTES((u64)first_block); + + if (map.m_len == 0) { + iomap->type = IOMAP_HOLE; + iomap->blkno = IOMAP_NULL_BLOCK; + iomap->length = F2FS_BLKSIZE; + } else { + if (map.m_flags & F2FS_MAP_UNWRITTEN) { + iomap->type = 
IOMAP_UNWRITTEN; + } else if (map.m_flags & F2FS_MAP_MAPPED) { + iomap->type = IOMAP_MAPPED; + } else { + WARN_ON_ONCE(1); + return -EIO; + } + iomap->blkno = + (sector_t)map.m_pblk << F2FS_LOG_SECTORS_PER_BLOCK; + iomap->length = F2FS_BLK_TO_BYTES((u64)map.m_len); + } + + if (map.m_flags & F2FS_MAP_NEW) + iomap->flags |= IOMAP_F_NEW; + return 0; +} + +static int f2fs_iomap_end(struct inode *inode, loff_t offset, loff_t length, + ssize_t written, unsigned int flags, struct iomap *iomap) +{ + put_dax(iomap->dax_dev); + if (!(flags & IOMAP_WRITE) || (flags & IOMAP_FAULT)) + return 0; + + if (offset + written > i_size_read(inode)) + f2fs_i_size_write(inode, offset + written); + + if (iomap->offset + iomap->length > + ALIGN(i_size_read(inode), F2FS_BLKSIZE)) { + block_t written_blk = F2FS_BYTES_TO_BLK(offset + written); + block_t end_blk = F2FS_BYTES_TO_BLK(offset + length); + + if (written_blk < end_blk) + f2fs_write_failed(inode->i_mapping, offset + length); + } + + f2fs_update
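The block-range arithmetic at the top of f2fs_iomap_begin() in the patch above can be tried out in userspace. The sketch below assumes a 4 KiB block size (a shift of 12) and uses local stand-ins BYTES_TO_BLK() / BLK_TO_BYTES(), struct blk_range and bytes_to_blk_range() rather than the F2FS macros; it only mirrors the first_block / last_block / m_len computation shown in the diff.

```c
#include <stdint.h>

#define BLK_SHIFT		12	/* assumed 4 KiB blocks */
#define BYTES_TO_BLK(b)		((uint64_t)(b) >> BLK_SHIFT)
#define BLK_TO_BYTES(blk)	((uint64_t)(blk) << BLK_SHIFT)

struct blk_range {
	uint64_t first;	/* first logical block touched */
	uint64_t len;	/* number of blocks touched */
};

/*
 * Map a byte range [offset, offset + length) to the blocks it touches.
 * 'length' must be at least 1: as in the patch, the last byte of the
 * range is taken to be at offset + length - 1.
 */
struct blk_range bytes_to_blk_range(uint64_t offset, uint64_t length)
{
	struct blk_range r;
	uint64_t last = BYTES_TO_BLK(offset + length - 1);

	r.first = BYTES_TO_BLK(offset);
	r.len = last - r.first + 1;
	return r;
}
```

A two-byte range that straddles a block boundary (offset 4095, length 2) therefore maps to two blocks, while a full aligned block (offset 0, length 4096) maps to exactly one.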
[PATCH] staging: fsl-mc/dpio: Propagate error code
dpaa2_io_service_register() returns zero even if
qbman_swp_CDAN_set() encountered an error. Fix this
by propagating the error code so the caller is informed
data availability notifications are not properly set
for a channel.

Signed-off-by: Ioana Radulescu
---
 drivers/staging/fsl-mc/bus/dpio/dpio-service.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c
index e5d66749614c..762f045f53f7 100644
--- a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c
+++ b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c
@@ -260,9 +260,9 @@ int dpaa2_io_service_register(struct dpaa2_io *d,
 
 	/* Enable the generation of CDAN notifications */
 	if (ctx->is_cdan)
-		qbman_swp_CDAN_set_context_enable(d->swp,
-						  (u16)ctx->id,
-						  ctx->qman64);
+		return qbman_swp_CDAN_set_context_enable(d->swp,
+							 (u16)ctx->id,
+							 ctx->qman64);
 	return 0;
 }
 EXPORT_SYMBOL(dpaa2_io_service_register);
-- 
2.11.0
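The before/after behaviour of this change can be illustrated with a small userspace sketch. Everything in it is hypothetical: enable_notifications() stands in for the hardware call, the 'fail' argument exists only to force an error, and register_ctx_old() / register_ctx_new() merely mimic the control flow of the function before and after the patch.

```c
#include <errno.h>

/* Hypothetical stand-in for the hardware call; fails when 'fail' is set. */
int enable_notifications(int fail)
{
	return fail ? -EIO : 0;
}

/* Before: the callee's result is dropped, so the caller always sees 0. */
int register_ctx_old(int is_cdan, int fail)
{
	if (is_cdan)
		enable_notifications(fail);
	return 0;
}

/* After: the callee's result becomes the function's result. */
int register_ctx_new(int is_cdan, int fail)
{
	if (is_cdan)
		return enable_notifications(fail);
	return 0;
}
```

Only the second form lets a caller notice that notifications were never enabled and react to it.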
Re: [PATCH v2] nvme: use uuid_t in nvme_ns
Reviewed-by: Sagi Grimberg
Re: [PATCH v2] livepatch/rcu: Fix stacking of patches when RCU infrastructure is patched
On Wed, 14 Jun 2017, Petr Mladek wrote: > rcu_read_(un)lock(), list_*_rcu(), and synchronize_rcu() are used for > a secure access and manipulation of the list of patches that modify > the same function. In particular, it is the variable func_stack that > is accessible from the ftrace handler via struct ftrace_ops and klp_ops. > > Of course, it synchronizes also some states of the patch on the top > of the stack, e.g. func->transition in klp_ftrace_handler. > > At the same time, this mechanism guards also the manipulation > of task->patch_state. It is modified according to the state of > the transition and the state of the process. > > Now, all this works well as long as RCU works well. Sadly livepatching > might get into some corner cases when this is not true. For example, > RCU is not watching when rcu_read_lock() is taken in idle threads. > It is because they might sleep and prevent reaching the grace period > for too long. > > There are ways how to make RCU watching even in idle threads, > see rcu_irq_enter(). But there is a small location inside RCU > infrastructure when even this does not work. > > This small problematic location can be detected either before > calling rcu_irq_enter() by rcu_irq_enter_disabled() or later by > rcu_is_watching(). Sadly, there is no safe way how to handle it. > Once we detect that RCU was not watching, we might see inconsistent > state of the function stack and the related variables in > klp_ftrace_handler(). Then we could do a wrong decision, > use an incompatible implementation of the function and > break the consistency of the system. We could warn but > we could not avoid the damage. > > Fortunately, ftrace has similar problems and they seem to > be solved well there. It uses a heavy weight implementation > of some RCU operations. 
In particular, it replaces: > > + rcu_read_lock() with preempt_disable_notrace() > + rcu_read_unlock() with preempt_enable_notrace() > + synchronize_rcu() with schedule_on_each_cpu(sync_work) > > My understanding is that this is RCU implementation from > a stone age. It meets the core RCU requirements but it is > rather ineffective. Especially, it does not allow to batch > or speed up the synchronize calls. > > On the other hand, it is very trivial. It allows to safely > trace and/or livepatch even the RCU core infrastructure. > And the effectiveness is a not a big issue because using ftrace > or livepatches on productive systems is a rare operation. > The safety is much more important than a negligible extra > load. > > Note that the alternative implementation follows the RCU > principles. Therefore, we could and actually must use > list_*_rcu() variants when manipulating the func_stack. > These functions allow to access the pointers in > the right order and with the right barriers. But they > do not use any other information that would be set > only by rcu_read_lock(). > > Also note that there are actually two problems solved in ftrace: > > First, it cares about the consistency of RCU read sections. > It is being solved the way as described and used in this patch. > > Second, ftrace needs to make sure that nobody is inside > the dynamic trampoline when it is being freed. For this, it also > calls synchronize_rcu_tasks() in preemptive kernel in > ftrace_shutdown(). > > Livepatch has similar problem but it is solved by ftrace for free. > klp_ftrace_handler() is a good guy and newer sleeps. In addition, s/newer/never/ > it is registered with FTRACE_OPS_FL_DYNAMIC. It causes that > unregister_ftrace_function() calls: > > * schedule_on_each_cpu(ftrace_sync) - always > * synchronize_rcu_tasks() - in preemptive kernel > > The effect is that nobody is neither inside the dynamic trampoline > nor inside the ftrace handler after unregister_ftrace_function() > returns. 
> > Signed-off-by: Petr Mladek Acked-by: Miroslav Benes > +/* > + * We allow to patch also functions where RCU is not watching, > + * e.g. before user_exit(). We can not rely on the RCU infrastructure > + * to do the synchronization. Instead hard force the sched synchronization. > + * > + * This approach allows to use RCU functions for manipulating func_stack > + * a safe way . s/a safe way /safely/. Miroslav
RE: [PATCH] staging: fsl-mc/dpio: Propagate error code
> -Original Message- > From: Ioana Radulescu [mailto:ruxandra.radule...@nxp.com] > Sent: Thursday, June 15, 2017 11:55 AM > To: gre...@linuxfoundation.org > Cc: de...@driverdev.osuosl.org; linux-kernel@vger.kernel.org; > ag...@suse.de; a...@arndb.de; linux-arm-ker...@lists.infradead.org; Bogdan > Purcareata ; stuyo...@gmail.com; Laurentiu Tudor > ; Ruxandra Ioana Radulescu > ; Roy Pledge ; Haiying Wang > > Subject: [PATCH] staging: fsl-mc/dpio: Propagate error code > > dpaa2_io_service_register() returns zero even if > qbman_swp_CDAN_set() encountered an error. Fix this > by propagating the error code so the caller is informed > data availability notifications are not properly set > for a channel. > > Signed-off-by: Ioana Radulescu Acked-by: Bogdan Purcareata > --- > drivers/staging/fsl-mc/bus/dpio/dpio-service.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c > b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c > index e5d66749614c..762f045f53f7 100644 > --- a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c > +++ b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c > @@ -260,9 +260,9 @@ int dpaa2_io_service_register(struct dpaa2_io *d, > > /* Enable the generation of CDAN notifications */ > if (ctx->is_cdan) > - qbman_swp_CDAN_set_context_enable(d->swp, > - (u16)ctx->id, > - ctx->qman64); > + return qbman_swp_CDAN_set_context_enable(d->swp, > + (u16)ctx->id, > + ctx->qman64); > return 0; > } > EXPORT_SYMBOL(dpaa2_io_service_register); > -- > 2.11.0
Re: Crypto Fixes for 4.12
On Thu, Jun 15, 2017 at 9:54 AM, Herbert Xu wrote:
>
> This push fixes a bug on sparc where we may dereference freed stack
> memory.

Ugh, that's a particularly ugly fix for a random gcc bug on a random
architecture that almost nobody tests.

In other words, it's nasty. It's nasty because nobody sane will ever
realize this pattern, and the code will either bit-rot or just happen
again somewhere else.

I'd have been *much* happier if this had been some nicer abstraction
that is built up around the use of SHASH_DESC_ON_STACK(), and just
have some rule that "SHASH_DESC_ON_STACK()" needs to be paired with
retrieving the final value and then a SHASH_DESC_DEALLOC() or
whatever.

Then you *could* implement SHASH_DESC_ON_STACK() as a kmalloc, and
SHASH_DESC_DEALLOC() would be a kfree - but with an alloca()-like
allocation the SHASH_DESC_DEALLOC() would be that "barrier_data()".

At that point the interface would make _sense_ at some conceptual
level, rather than being a random hack for a small collection of
random users of this thing.

There's a fair number of SHASH_DESC_ON_STACK users, are all the others
safe for some random reason that just happens to be about code
generation? Did people actually verify that?

                Linus
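A rough userspace sketch of the paired allocate/release shape described above, for illustration only. DEMO_DESC_ON_STACK(), DEMO_DESC_DEALLOC(), struct demo_desc and demo_digest() are made-up names, not an existing kernel interface, and the "hash" is a toy byte sum. The barrier uses GCC/Clang inline asm in the style of the kernel's barrier_data(), so it assumes one of those compilers.

```c
#include <stddef.h>

/* Same idea as the kernel's barrier_data(); GCC/Clang inline asm. */
#define demo_barrier_data(ptr) \
	__asm__ __volatile__("" : : "r"(ptr) : "memory")

struct demo_desc {
	unsigned int state;
};

/* Hypothetical paired macros; the names are not an existing kernel API. */
#define DEMO_DESC_ON_STACK(name) \
	struct demo_desc name##_storage = { 0 }; \
	struct demo_desc *name = &name##_storage

/* Tells the compiler the descriptor may still be accessed at this point. */
#define DEMO_DESC_DEALLOC(name) \
	demo_barrier_data(name)

/* Toy "hash": sums the bytes, using a descriptor that lives on the stack. */
unsigned int demo_digest(const unsigned char *data, size_t len)
{
	unsigned int out;
	size_t i;
	DEMO_DESC_ON_STACK(desc);

	for (i = 0; i < len; i++)
		desc->state += data[i];
	out = desc->state;		/* retrieve the final value first */
	DEMO_DESC_DEALLOC(desc);	/* then release the descriptor */
	return out;
}
```

The point of the pairing is that the release step is an explicit part of the interface: a heap-backed variant could allocate in the first macro and free in the second without any caller changing.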
Re: Crypto Fixes for 4.12
On Thu, Jun 15, 2017 at 6:04 PM, Linus Torvalds wrote:
>
> Ugh, that's a particularly ugly fix for a random gcc bug on a random
> architecture that almost nobody tests.

.. anyway, I pulled it, but I don't have to like it.

                Linus
Re: [PATCH v1 1/1] gpio: gpio-wcove: Fix GPIO control register offset calculation
On Thu, Jun 15, 2017 at 12:39 AM, wrote: > From: Kuppuswamy Sathyanarayanan > > According to Whiskey Cove PMIC GPIO controller specification, for GPIO > pins 0-12, GPIO input and output register control address range from, > > 0x4e44-0x4e50 for GPIO outputs control register > > 0x4e51-0x4e5d for GPIO input control register > > But, currently when calculating the GPIO register offsets in to_reg() > function, all GPIO pins in the same bank uses the same GPIO control > register address. This logic is incorrect. This patch fixes this > issue. > > This patch also adds support to selectively skip register modification > for virtual GPIOs. > > In case of Whiskey Cove PMIC, ACPI code may use up 94 virtual GPIOs. > These virtual GPIOs are used by the ACPI code as means to access various > non GPIO bits of PMIC. So for these virtual GPIOs, we don't need to > manipulate the physical GPIO pin register. A similar patch has been > merged recently by Hans for Crystal Cove PMIC GPIO driver. You can > find more details about it in Commit 9a752b4c9ab9 ("gpio: crystalcove: > Do not write regular gpio registers for virtual GPIOs") > > Signed-off-by: Kuppuswamy Sathyanarayanan > > Reported-by: Jukka Laitinen It seems it should have a Fixes tag. > static inline unsigned int to_reg(int gpio, enum ctrl_register reg_type) > { > unsigned int reg; > - int bank; > > - if (gpio < BANK0_NR_PINS) > - bank = 0; > - else if (gpio < BANK0_NR_PINS + BANK1_NR_PINS) > - bank = 1; > - else > - bank = 2; > + if (gpio >= WCOVE_GPIO_NUM) > + return -EOPNOTSUPP; How this can happen? > > if (reg_type == CTRL_IN) > - reg = GPIO_IN_CTRL_BASE + bank; > + /* > +* GPIO input control registers > +* (one per pin): 0x4e51 - 0x4e5d > +*/ Noise. > + reg = GPIO_IN_CTRL_BASE + gpio; > else > - reg = GPIO_OUT_CTRL_BASE + bank; > + /* GPIO output control registers > +* (one per pin): 0x4e44 - 0x4e50 > +*/ Wrong multi-line comment and noise overall. 
If you wish to leave the comments, put them on top of the function as its description. > + reg = GPIO_OUT_CTRL_BASE + gpio; > > return reg; > } > @@ -145,7 +147,10 @@ static void wcove_update_irq_mask(struct wcove_gpio *wg, > int gpio) > > static void wcove_update_irq_ctrl(struct wcove_gpio *wg, int gpio) > { > - unsigned int reg = to_reg(gpio, CTRL_IN); > + int reg = to_reg(gpio, CTRL_IN); > + > + if (reg < 0) > + return; Since above comment this change would gone. > + int reg = to_reg(gpio, CTRL_OUT); > + > + if (reg < 0) > + return 0; > > - return regmap_write(wg->regmap, to_reg(gpio, CTRL_OUT), > - CTLO_INPUT_SET); > + return regmap_write(wg->regmap, reg, CTLO_INPUT_SET); Ditto. > + int reg = to_reg(gpio, CTRL_OUT); > > - return regmap_write(wg->regmap, to_reg(gpio, CTRL_OUT), > - CTLO_OUTPUT_SET | value); > + if (reg < 0) > + return 0; > + > + return regmap_write(wg->regmap, reg, CTLO_OUTPUT_SET | value); Ditto. > + int ret, reg = to_reg(gpio, CTRL_OUT); Don't fit such variable on one line. > + > + if (reg < 0) > + return 0; > > - ret = regmap_read(wg->regmap, to_reg(gpio, CTRL_OUT), &val); > + ret = regmap_read(wg->regmap, reg, &val); This would gone after addressing first comment. > - int ret; > + int ret, reg = to_reg(gpio, CTRL_IN); > + > + if (reg < 0) > + return 0; > > - ret = regmap_read(wg->regmap, to_reg(gpio, CTRL_IN), &val); > + ret = regmap_read(wg->regmap, reg, &val); Ditto. > + int reg = to_reg(gpio, CTRL_OUT); > + > + if (reg < 0) > + return; > > if (value) > - regmap_update_bits(wg->regmap, to_reg(gpio, CTRL_OUT), 1, 1); > + regmap_update_bits(wg->regmap, reg, 1, 1); > else > - regmap_update_bits(wg->regmap, to_reg(gpio, CTRL_OUT), 1, 0); > + regmap_update_bits(wg->regmap, reg, 1, 0); Ditto. 
> + int reg = to_reg(gpio, CTRL_OUT); > + > + if (reg < 0) > + return 0; > > switch (pinconf_to_config_param(config)) { > case PIN_CONFIG_DRIVE_OPEN_DRAIN: > - return regmap_update_bits(wg->regmap, to_reg(gpio, CTRL_OUT), > - CTLO_DRV_MASK, CTLO_DRV_OD); > + return regmap_update_bits(wg->regmap, reg, CTLO_DRV_MASK, > + CTLO_DRV_OD); > case PIN_CONFIG_DRIVE_PUSH_PULL: > - return regmap_update_bits(wg->regmap, to_reg(gpio, CTRL_OUT), > - CTLO_DR
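For reference, the per-pin register mapping under review can be checked in isolation with a small userspace sketch. The two base addresses come from the ranges quoted in the commit message (0x4e44-0x4e50 for outputs, 0x4e51-0x4e5d for inputs); DEMO_NR_PHYS_PINS is an assumed stand-in for the driver's pin-count constant, whose value is not shown in this thread, and the return type is int here so a negative error code can be represented.

```c
#include <errno.h>

#define GPIO_OUT_CTRL_BASE	0x4e44	/* outputs: 0x4e44 - 0x4e50 */
#define GPIO_IN_CTRL_BASE	0x4e51	/* inputs:  0x4e51 - 0x4e5d */
#define DEMO_NR_PHYS_PINS	13	/* pins 0-12, per the ranges above */

enum ctrl_register { CTRL_IN, CTRL_OUT };

/*
 * One control register per physical pin. Anything past the physical
 * pins is treated as a virtual GPIO with no register; the sketch
 * returns -EOPNOTSUPP for those, as the patch under review does.
 */
int to_reg(int gpio, enum ctrl_register reg_type)
{
	if (gpio < 0 || gpio >= DEMO_NR_PHYS_PINS)
		return -EOPNOTSUPP;

	return (reg_type == CTRL_IN ? GPIO_IN_CTRL_BASE : GPIO_OUT_CTRL_BASE)
		+ gpio;
}
```

Pin 12 lands on 0x4e50 (output) and 0x4e5d (input), matching the upper ends of the documented ranges, so base plus pin number covers both banks of registers exactly.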
Re: [PATCH v6 25/34] swiotlb: Add warnings for use of bounce buffers with SME
On Wed, Jun 14, 2017 at 02:49:02PM -0500, Tom Lendacky wrote: > I guess I don't need the sme_active() check since the second part of the > if statement can only ever be true if SME is active (since mask is > unsigned). ... and you can define sme_me_mask as an u64 directly (it is that already, practically, as we don't do SME on 32-bit) and then get rid of the cast. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [PATCH 03/27] VFS: Make get_mnt_ns() return the namespace [ver #5]
On Wed, Jun 14, 2017 at 04:15:42PM +0100, David Howells wrote: > Make get_mnt_ns() return the namespace it got a ref on for consistency with > other namespace ref getting functions. Is there any point in doing that? I mean, it's not used in your patchset anymore and existing callers are a mixed bag...
[PATCH] usb: gadget: bdc: 64-bit pointer capability check
Correct the register used to check the 64-bit pointer capability state.
The 64-bit pointer implementation capability was being checked in the
wrong register, which caused BDC enumeration to fail with 64-bit memory
addresses.

Fixes: efed421a94e6 ("usb: gadget: Add UDC driver for Broadcom USB3.0 device controller IP BDC")
Signed-off-by: Srinath Mannam
---
 drivers/usb/gadget/udc/bdc/bdc_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/udc/bdc/bdc_core.c b/drivers/usb/gadget/udc/bdc/bdc_core.c
index ccb9c21..e9bd8d4 100644
--- a/drivers/usb/gadget/udc/bdc/bdc_core.c
+++ b/drivers/usb/gadget/udc/bdc/bdc_core.c
@@ -475,7 +475,7 @@ static int bdc_probe(struct platform_device *pdev)
 	bdc->dev = dev;
 	dev_dbg(bdc->dev, "bdc->regs: %p irq=%d\n", bdc->regs, bdc->irq);
 
-	temp = bdc_readl(bdc->regs, BDC_BDCSC);
+	temp = bdc_readl(bdc->regs, BDC_BDCCAP1);
 	if ((temp & BDC_P64) &&
 			!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) {
 		dev_dbg(bdc->dev, "Using 64-bit address\n");
-- 
2.7.4
Re: [PATCH 30/31] ext4: eliminate xattr entry e_hash recalculation for removes
On Wed 14-06-17 10:23:40, Tahsin Erdogan wrote: > When an extended attribute block is modified, ext4_xattr_hash_entry() > recalculates e_hash for the entry that is pointed by s->here. This is > unnecessary if the modification is to remove an entry. > > Currently, if the removed entry is the last one and there are other > entries remaining, hash calculation targets the just erased entry which > has been filled with zeroes and effectively does nothing. If the removed > entry is not the last one and there are more entries, this time it will > recalculate hash on the next entry which is totally unnecessary. > > Fix these by moving the decision on when to recalculate hash to > ext4_xattr_set_entry(). I agree with moving ext4_xattr_rehash_entry() out of ext4_xattr_rehash(). However how about just keeping ext4_xattr_rehash() in ext4_xattr_block_set() (so that you don't have to pass aditional argument to ext4_xattr_set_entry()) and calling ext4_xattr_rehash_entry() when i->value != NULL? That would seem easier and cleaner as well... 
Honza > > Signed-off-by: Tahsin Erdogan > --- > fs/ext4/xattr.c | 50 ++ > 1 file changed, 26 insertions(+), 24 deletions(-) > > diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c > index c9579d220a0c..6c6dce2f874e 100644 > --- a/fs/ext4/xattr.c > +++ b/fs/ext4/xattr.c > @@ -77,8 +77,9 @@ static void ext4_xattr_block_cache_insert(struct mb_cache *, > static struct buffer_head * > ext4_xattr_block_cache_find(struct inode *, struct ext4_xattr_header *, > struct mb_cache_entry **); > -static void ext4_xattr_rehash(struct ext4_xattr_header *, > - struct ext4_xattr_entry *); > +static void ext4_xattr_hash_entry(struct ext4_xattr_entry *entry, > + void *value_base); > +static void ext4_xattr_rehash(struct ext4_xattr_header *); > > static const struct xattr_handler * const ext4_xattr_handler_map[] = { > [EXT4_XATTR_INDEX_USER] = &ext4_xattr_user_handler, > @@ -1436,7 +1437,8 @@ static int ext4_xattr_inode_lookup_create(handle_t > *handle, struct inode *inode, > > static int ext4_xattr_set_entry(struct ext4_xattr_info *i, > struct ext4_xattr_search *s, > - handle_t *handle, struct inode *inode) > + handle_t *handle, struct inode *inode, > + bool is_block) > { > struct ext4_xattr_entry *last; > struct ext4_xattr_entry *here = s->here; > @@ -1500,8 +1502,8 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info > *i, >* attribute block so that a long value does not occupy the >* whole space and prevent futher entries being added. 
>*/ > - if (ext4_has_feature_ea_inode(inode->i_sb) && new_size && > - (s->end - s->base) == i_blocksize(inode) && > + if (ext4_has_feature_ea_inode(inode->i_sb) && > + new_size && is_block && > (min_offs + old_size - new_size) < > EXT4_XATTR_BLOCK_RESERVE(inode)) { > ret = -ENOSPC; > @@ -1631,6 +1633,13 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info > *i, > } > here->e_value_size = cpu_to_le32(i->value_len); > } > + > + if (is_block) { > + if (s->not_found || i->value) > + ext4_xattr_hash_entry(here, s->base); > + ext4_xattr_rehash((struct ext4_xattr_header *)s->base); > + } > + > ret = 0; > out: > iput(old_ea_inode); > @@ -1720,14 +1729,11 @@ ext4_xattr_block_set(handle_t *handle, struct inode > *inode, > mb_cache_entry_delete(ext4_mb_cache, hash, > bs->bh->b_blocknr); > ea_bdebug(bs->bh, "modifying in-place"); > - error = ext4_xattr_set_entry(i, s, handle, inode); > - if (!error) { > - if (!IS_LAST_ENTRY(s->first)) > - ext4_xattr_rehash(header(s->base), > - s->here); > + error = ext4_xattr_set_entry(i, s, handle, inode, > + true /* is_block */); > + if (!error) > ext4_xattr_block_cache_insert(ext4_mb_cache, > bs->bh); > - } > ext4_xattr_block_csum_set(inode, bs->bh); > unlock_buffer(bs->bh); > if (error == -EFSCORRUPTED) > @@ -1787,7 +1793,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode > *inode, > s->end
Re: [PATCH v15 2/7] power: add power sequence library
On Thu, Jun 15, 2017 at 10:11:45AM +0200, Ulf Hansson wrote: > > Yes, you are right. This is the limitation for this power sequence > > library, the registration for the 1st power sequence instance must > > be finished before device driver uses it. I am appreciated that > > you can supply some suggestions for it. > > In general this kind of problems is solved by first parsing the DTB, > which means you will find out whether there is a resource (a pwrseq) > required for the device. Then you try to fetch that resource, and if > that fails, it means the resource is not yet available, and hence you > want to retry later and should return -EPROBE_DEFER. > > In this case, of_pwrseq_on() needs to be converted to start looking > for a pwrseq compatible in it's child node - I guess. Then if that is > found, you try to fetch the instance of the corresponding library. > Failing to fetch the library instance should then cause a return > -EPROBE_DEFER. The most difficulty for this is we can't know whether the requested pwrseq instance will be registered or not, the kernel configuration for this pwrseq library may not be chosen at all. > > > > >> Moreover, I have found yet another severe problem but reviewing the code: > >> In the struct pwrseq, you have a "bool used", which you are setting to > >> "true" once the pwrseq has been hooked up with the device, when a > >> driver calls of_pwrseq_on(). Setting that variable to true, will also > >> prevent another driver from using the same instance of the pwrseq for > >> its device. So, to cope with multiple users, you register a new > >> instance of the same pwrseq library that got hooked up, once the > >> ->get() callback is about to complete. > >> > >> The problem the occurs, when there is another driver calling > >> of_pwrseq_on() in between, meaning that the new instance has not yet > >> been registered. This will simply fail, won't it? 
> > > > Yes, you are right, thanks for pointing that, I will add mutex_lock for > > of_pwrseq_on. > > Another option is to entirely skip to two step approach. > > In other words, make the library to cope with multiple users via the > same registered library instance. > No, the pwrseq instance stores dtb information (clock, gpio, etc), it needs to be per device. -- Best Regards, Peter Chen
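The lookup-then-defer pattern suggested in this thread can be sketched in userspace as follows. This is only a model: demo_pwrseq_register() and demo_pwrseq_get() are hypothetical names, the registry is a fixed-size array of compatible strings, and DEMO_EPROBE_DEFER is defined locally with the value the kernel uses for EPROBE_DEFER (517).

```c
#include <stddef.h>
#include <string.h>

#define DEMO_EPROBE_DEFER	517	/* value of the kernel's EPROBE_DEFER */
#define DEMO_MAX_LIBS		4

static const char *registered[DEMO_MAX_LIBS];
static size_t nr_registered;

/* Register a power-sequence library by compatible string. */
int demo_pwrseq_register(const char *compatible)
{
	if (nr_registered == DEMO_MAX_LIBS)
		return -1;
	registered[nr_registered++] = compatible;
	return 0;
}

/*
 * Look up a library for the compatible found in the device tree.
 * Returns 0 if it is registered, -DEMO_EPROBE_DEFER if it is not
 * (yet), so that the caller's probe can be retried later.
 */
int demo_pwrseq_get(const char *compatible)
{
	size_t i;

	for (i = 0; i < nr_registered; i++)
		if (strcmp(registered[i], compatible) == 0)
			return 0;
	return -DEMO_EPROBE_DEFER;
}
```

The sketch also makes the objection raised above visible: the lookup cannot tell "not registered yet" from "never will be registered" (for example when the library is not configured in), so a caller that always defers could end up retrying indefinitely.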
Re: [PATCH] virtio_balloon: disable VIOMMU support
On 2017-06-14 02:00, Michael S. Tsirkin wrote:

virtio balloon bypasses the DMA API entirely so does not support the
VIOMMU right now. It's not clear we need that support, for now let's
just make sure we don't pretend to support it.

Cc: sta...@vger.kernel.org
Cc: Wei Wang
Fixes: 1a937693993f ("virtio: new feature to detect IOMMU device quirk")
Signed-off-by: Michael S. Tsirkin
---
 drivers/virtio/virtio_balloon.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 408c174..22caf80 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -663,6 +663,12 @@ static int virtballoon_restore(struct virtio_device *vdev)
 }
 #endif
 
+static int virtballoon_validate(struct virtio_device *vdev)
+{
+	__virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM);
+	return 0;
+}
+
 static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
@@ -675,6 +681,7 @@ static struct virtio_driver virtio_balloon_driver = {
 	.driver.name =	KBUILD_MODNAME,
 	.driver.owner =	THIS_MODULE,
 	.id_table =	id_table,
+	.validate =	virtballoon_validate,
 	.probe =	virtballoon_probe,
 	.remove =	virtballoon_remove,
 	.config_changed = virtballoon_changed,

Acked-by: Jason Wang
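What the new validate hook does to the feature set can be modelled in userspace with a plain 64-bit mask. The sketch is an assumption-laden illustration: struct demo_vdev and the demo_*() helpers are invented here, and the only value borrowed from virtio is the bit number 33 for VIRTIO_F_IOMMU_PLATFORM.

```c
#include <stdint.h>

#define DEMO_F_IOMMU_PLATFORM	33	/* VIRTIO_F_IOMMU_PLATFORM bit number */

struct demo_vdev {
	uint64_t features;	/* one bit per offered feature */
};

/* Returns 1 if feature bit 'fbit' is set, 0 otherwise. */
int demo_has_feature(const struct demo_vdev *vdev, unsigned int fbit)
{
	return (int)((vdev->features >> fbit) & 1);
}

/* Clears feature bit 'fbit', leaving all other bits untouched. */
void demo_clear_feature(struct demo_vdev *vdev, unsigned int fbit)
{
	vdev->features &= ~(UINT64_C(1) << fbit);
}

/* Mirrors the shape of virtballoon_validate(): drop the bit, accept. */
int demo_validate(struct demo_vdev *vdev)
{
	demo_clear_feature(vdev, DEMO_F_IOMMU_PLATFORM);
	return 0;
}
```

Because the bit is cleared before feature negotiation completes, the device never sees the driver acknowledge it; the other offered features are left as they were.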
Re: [PATCH v1 1/1] gpio: gpio-crystalcove: Skip IRQ CTRL register update for virtual GPIOs
On Thu, Jun 15, 2017 at 2:21 AM, wrote: > From: Kuppuswamy Sathyanarayanan > > Commit 9a752b4c9ab9 ("gpio: crystalcove: Do not write regular gpio > registers for virtual GPIOs") added support to skip GPIO register > update for virtual GPIOs, but it missed to add skip logic in > crystalcove_update_irq_ctrl() function. This patch fixes it. > @@ -134,6 +134,9 @@ static void crystalcove_update_irq_ctrl(struct > crystalcove_gpio *cg, int gpio) > { > int reg = to_reg(gpio, CTRL_IN); > > + if (reg < 0) > + return; > + > regmap_update_bits(cg->regmap, reg, CTLI_INTCNT_BE, cg->intcnt_value); > } Shouldn't it have been done using irq_valid_mask flag in the first place? -- With Best Regards, Andy Shevchenko
Re: [RFC][PATCH 0/2] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if existed
On 06/15/17 at 08:34am, Izumi, Taku wrote:
> Dear Baoquan,
>
> > > Our customer reported that Kernel text may be located on non-mirror
> > > region (movable zone) when both address range mirroring feature and
> > > KASLR are enabled.
>
>I know your customer :)

LOL, have to agree.

> > > The method is very simple. If efi is enabled, just iterate all efi
> > > memory map and pick up mirror region to process for adding candidate
> > > of slot. If efi disabled or no mirror region existed, still process
> > > e820 memory map. This won't bring much efficiency loss, at worst we
> > > just go through all efi memory maps and found no mirror.
> > >
> > > One question:
> > > From code, though mirror regions are existed, they are meaningful only
> > > if kernelcore=mirror kernel option is specified. Not sure if my
> > > understanding is correct.
>
>Your understanding is almost correct.
>Only when "kernelcore=mirror" specified, the above procedure works.
>But, if mirrored regions are existed, bootmem allocator tries to
>allocate from mirrored region independently of "kernelcore=mirror" option.
>
>So, IMHO, kernel text is important, so putting it to mirrored (more
> reliable)
>region is reasonable whether or not "kernelcore=mirror" is specified.

Ah, yeah, thanks for telling. So at boot time memblock will put mirror
region in highest priority to allocate. Then let me remove the
kernelcore=mirror handling code, the process_efi_entry will be simpler.

commit a3f5bafcc04aaf62990e0cf3ced1cc6d8dc6fe95
Author: Tony Luck
Date: Wed Jun 24 16:58:12 2015 -0700

    mm/memblock: allocate boot time data structures from mirrored memory

>
>Anyway thanks for submitting patch.
>We have Address Range Mirroring capable machine, so we'll test your patch.

Thanks a lot for help, Yasuaki Ishimatsu said he also will loan me a
testing machine when it's available.

Thanks
Baoquan

>
>
> > >
> > > NOTE:
> > > I haven't got a machine with efi mirror region enabled, so only test
> > > the
> > > e820 map processing case and the case of no mirror region on efi machine.
> > > So set this as a RFC patchset, will post formal one after above
> > > question is made clear and mirror issue test passed.
> > >
> > > Baoquan He (2):
> > > x86/boot/KASLR: Adapt process_e820_entry for all kinds of memory map
> > > x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if
> > > existed
> > >
> > > arch/x86/boot/compressed/kaslr.c | 129
> > > +++
> > > 1 file changed, 104 insertions(+), 25 deletions(-)
> > >
> > > --
> > > 2.5.5
> > >
>
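The selection order described in the quoted cover letter (prefer mirrored EFI regions, otherwise fall back to the e820 map) can be sketched in plain C as below. This is an illustration under stated assumptions, not the code from the patchset: the `demo_` types and functions are hypothetical, the memory maps are simple arrays, and memory-type filtering is omitted. Only the attribute bit value matches the UEFI "more reliable" memory attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* UEFI memory attribute bit that marks mirrored ("more reliable") memory. */
#define DEMO_EFI_MEMORY_MORE_RELIABLE 0x10000ULL

/* Hypothetical, simplified memory map entry shared by both maps. */
struct demo_region {
	uint64_t start;
	uint64_t size;
	uint64_t attr;
};

/* Callback that turns one region into slot candidates (hypothetical). */
typedef void (*demo_slot_fn)(const struct demo_region *r, void *ctx);

/* Feed only mirrored EFI regions to fn; report whether any were found. */
static bool demo_process_efi(const struct demo_region *map, size_t n,
			     demo_slot_fn fn, void *ctx)
{
	bool found = false;

	for (size_t i = 0; i < n; i++) {
		if (map[i].attr & DEMO_EFI_MEMORY_MORE_RELIABLE) {
			fn(&map[i], ctx);
			found = true;
		}
	}
	return found;
}

/* Mirrored EFI regions win; with no EFI map or no mirror, use e820. */
static void demo_collect_candidates(const struct demo_region *efi, size_t n_efi,
				    const struct demo_region *e820, size_t n_e820,
				    demo_slot_fn fn, void *ctx)
{
	assert(fn != NULL);

	if (efi && demo_process_efi(efi, n_efi, fn, ctx))
		return;

	for (size_t i = 0; i < n_e820; i++)
		fn(&e820[i], ctx);
}

/* Example callback: add up the bytes offered as candidates. */
static void demo_count_bytes(const struct demo_region *r, void *ctx)
{
	*(uint64_t *)ctx += r->size;
}
```

The worst case mentioned in the cover letter corresponds to one full pass over the EFI map that finds nothing, followed by the usual e820 pass.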
[PATCH] drm: atmel-hlcdc: sama5d4 does not have overlay2
From: Peter Rosin

Remove the layer.

Fixes: 5b9fb5e6c6c7 ("drm: atmel-hlcdc: add support for sama5d4 SoCs")
Signed-off-by: Peter Rosin
---
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c | 20 +---
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
index 30dbffd..888524a 100644
--- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
+++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
@@ -295,28 +295,10 @@ static const struct atmel_hlcdc_layer_desc atmel_hlcdc_sama5d4_layers[] = {
 		},
 	},
 	{
-		.name = "overlay2",
-		.formats = &atmel_hlcdc_plane_rgb_formats,
-		.regs_offset = 0x240,
-		.id = 2,
-		.type = ATMEL_HLCDC_OVERLAY_LAYER,
-		.cfgs_offset = 0x2c,
-		.layout = {
-			.pos = 2,
-			.size = 3,
-			.xstride = { 4 },
-			.pstride = { 5 },
-			.default_color = 6,
-			.chroma_key = 7,
-			.chroma_key_mask = 8,
-			.general_config = 9,
-		},
-	},
-	{
 		.name = "high-end-overlay",
 		.formats = &atmel_hlcdc_plane_rgb_and_yuv_formats,
 		.regs_offset = 0x340,
-		.id = 3,
+		.id = 2,
 		.type = ATMEL_HLCDC_OVERLAY_LAYER,
 		.cfgs_offset = 0x4c,
 		.layout = {
-- 
2.1.4
Re: [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
Alexey Kardashevskiy writes:
> From: Yongji Xie
>
> Any IODA host bridge have the capability of IRQ remapping.
> So we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host bridge
> is detected.

Where's the code that actually enforces this property?

It would be good to have a comment in pnv_pci_ioda_root_bridge_prepare()
(probably), pointing to that code, so that we can remember the
relationship between the two.

cheers