[PATCH] Documentation: move Documentation/virtual to Documentation/virt
Renaming docs seems to be en vogue at the moment, so fix on of the grossly misnamed directories. We usually never use "virtual" as a shortcut for virtualization in the kernel, but always virt, as seen in the virt/ top-level directory. Fix up the documentation to match that. Fixes: ed16648eb5b8 ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:") Signed-off-by: Christoph Hellwig --- Documentation/admin-guide/kernel-parameters.txt | 2 +- Documentation/{virtual => virt}/index.rst | 0 .../{virtual => virt}/kvm/amd-memory-encryption.rst | 0 Documentation/{virtual => virt}/kvm/api.txt | 2 +- Documentation/{virtual => virt}/kvm/arm/hyp-abi.txt | 0 Documentation/{virtual => virt}/kvm/arm/psci.txt| 0 Documentation/{virtual => virt}/kvm/cpuid.rst | 0 Documentation/{virtual => virt}/kvm/devices/README | 0 .../{virtual => virt}/kvm/devices/arm-vgic-its.txt | 0 Documentation/{virtual => virt}/kvm/devices/arm-vgic-v3.txt | 0 Documentation/{virtual => virt}/kvm/devices/arm-vgic.txt| 0 Documentation/{virtual => virt}/kvm/devices/mpic.txt| 0 Documentation/{virtual => virt}/kvm/devices/s390_flic.txt | 0 Documentation/{virtual => virt}/kvm/devices/vcpu.txt| 0 Documentation/{virtual => virt}/kvm/devices/vfio.txt| 0 Documentation/{virtual => virt}/kvm/devices/vm.txt | 0 Documentation/{virtual => virt}/kvm/devices/xics.txt| 0 Documentation/{virtual => virt}/kvm/devices/xive.txt| 0 Documentation/{virtual => virt}/kvm/halt-polling.txt| 0 Documentation/{virtual => virt}/kvm/hypercalls.txt | 4 ++-- Documentation/{virtual => virt}/kvm/index.rst | 0 Documentation/{virtual => virt}/kvm/locking.txt | 0 Documentation/{virtual => virt}/kvm/mmu.txt | 2 +- Documentation/{virtual => virt}/kvm/msr.txt | 0 Documentation/{virtual => virt}/kvm/nested-vmx.txt | 0 Documentation/{virtual => virt}/kvm/ppc-pv.txt | 0 Documentation/{virtual => virt}/kvm/review-checklist.txt| 2 +- Documentation/{virtual => virt}/kvm/s390-diag.txt | 0 Documentation/{virtual => 
virt}/kvm/timekeeping.txt | 0 Documentation/{virtual => virt}/kvm/vcpu-requests.rst | 0 Documentation/{virtual => virt}/paravirt_ops.rst| 0 Documentation/{virtual => virt}/uml/UserModeLinux-HOWTO.txt | 0 MAINTAINERS | 6 +++--- arch/powerpc/include/uapi/asm/kvm_para.h| 2 +- arch/x86/kvm/mmu.c | 2 +- include/uapi/linux/kvm.h| 4 ++-- tools/include/uapi/linux/kvm.h | 4 ++-- virt/kvm/arm/arm.c | 2 +- virt/kvm/arm/vgic/vgic-mmio-v3.c| 2 +- virt/kvm/arm/vgic/vgic.h| 4 ++-- 40 files changed, 19 insertions(+), 19 deletions(-) rename Documentation/{virtual => virt}/index.rst (100%) rename Documentation/{virtual => virt}/kvm/amd-memory-encryption.rst (100%) rename Documentation/{virtual => virt}/kvm/api.txt (99%) rename Documentation/{virtual => virt}/kvm/arm/hyp-abi.txt (100%) rename Documentation/{virtual => virt}/kvm/arm/psci.txt (100%) rename Documentation/{virtual => virt}/kvm/cpuid.rst (100%) rename Documentation/{virtual => virt}/kvm/devices/README (100%) rename Documentation/{virtual => virt}/kvm/devices/arm-vgic-its.txt (100%) rename Documentation/{virtual => virt}/kvm/devices/arm-vgic-v3.txt (100%) rename Documentation/{virtual => virt}/kvm/devices/arm-vgic.txt (100%) rename Documentation/{virtual => virt}/kvm/devices/mpic.txt (100%) rename Documentation/{virtual => virt}/kvm/devices/s390_flic.txt (100%) rename Documentation/{virtual => virt}/kvm/devices/vcpu.txt (100%) rename Documentation/{virtual => virt}/kvm/devices/vfio.txt (100%) rename Documentation/{virtual => virt}/kvm/devices/vm.txt (100%) rename Documentation/{virtual => virt}/kvm/devices/xics.txt (100%) rename Documentation/{virtual => virt}/kvm/devices/xive.txt (100%) rename Documentation/{virtual => virt}/kvm/halt-polling.txt (100%) rename Documentation/{virtual => virt}/kvm/hypercalls.txt (97%) rename Documentation/{virtual => virt}/kvm/index.rst (100%) rename Documentation/{virtual => virt}/kvm/locking.txt (100%) rename Documentation/{virtual => virt}/kvm/mmu.txt (99%) rename Documentation/{virtual 
=> virt}/kvm/msr.txt (100%) rename Documentation/{virtual => virt}/kvm/nested-vmx.txt (100%) rename Documentation/{virtual => virt}/kvm/ppc-pv.txt (100%) rename Documentation/{virtual => virt}/kvm/review-checklist.txt (95%) rename Documentation/{virtual => virt}/kvm/s390-diag.txt (100%) rename Documentation/{virtual => virt}/kvm/timekeeping.txt (100%) rename Documen
Re: [PATCH] Documentation: move Documentation/virtual to Documentation/virt
On 24/07/19 09:24, Christoph Hellwig wrote: > Renaming docs seems to be en vogue at the moment, so fix on of the > grossly misnamed directories. We usually never use "virtual" as > a shortcut for virtualization in the kernel, but always virt, > as seen in the virt/ top-level directory. Fix up the documentation > to match that. > > Fixes: ed16648eb5b8 ("Move kvm, uml, and lguest subdirectories under a common > "virtual" directory, I.E:") > Signed-off-by: Christoph Hellwig Queued, thanks. I can't count how many times I said "I really should rename that directory". Paolo > --- > Documentation/admin-guide/kernel-parameters.txt | 2 +- > Documentation/{virtual => virt}/index.rst | 0 > .../{virtual => virt}/kvm/amd-memory-encryption.rst | 0 > Documentation/{virtual => virt}/kvm/api.txt | 2 +- > Documentation/{virtual => virt}/kvm/arm/hyp-abi.txt | 0 > Documentation/{virtual => virt}/kvm/arm/psci.txt| 0 > Documentation/{virtual => virt}/kvm/cpuid.rst | 0 > Documentation/{virtual => virt}/kvm/devices/README | 0 > .../{virtual => virt}/kvm/devices/arm-vgic-its.txt | 0 > Documentation/{virtual => virt}/kvm/devices/arm-vgic-v3.txt | 0 > Documentation/{virtual => virt}/kvm/devices/arm-vgic.txt| 0 > Documentation/{virtual => virt}/kvm/devices/mpic.txt| 0 > Documentation/{virtual => virt}/kvm/devices/s390_flic.txt | 0 > Documentation/{virtual => virt}/kvm/devices/vcpu.txt| 0 > Documentation/{virtual => virt}/kvm/devices/vfio.txt| 0 > Documentation/{virtual => virt}/kvm/devices/vm.txt | 0 > Documentation/{virtual => virt}/kvm/devices/xics.txt| 0 > Documentation/{virtual => virt}/kvm/devices/xive.txt| 0 > Documentation/{virtual => virt}/kvm/halt-polling.txt| 0 > Documentation/{virtual => virt}/kvm/hypercalls.txt | 4 ++-- > Documentation/{virtual => virt}/kvm/index.rst | 0 > Documentation/{virtual => virt}/kvm/locking.txt | 0 > Documentation/{virtual => virt}/kvm/mmu.txt | 2 +- > Documentation/{virtual => virt}/kvm/msr.txt | 0 > Documentation/{virtual => virt}/kvm/nested-vmx.txt | 0 
> Documentation/{virtual => virt}/kvm/ppc-pv.txt | 0 > Documentation/{virtual => virt}/kvm/review-checklist.txt| 2 +- > Documentation/{virtual => virt}/kvm/s390-diag.txt | 0 > Documentation/{virtual => virt}/kvm/timekeeping.txt | 0 > Documentation/{virtual => virt}/kvm/vcpu-requests.rst | 0 > Documentation/{virtual => virt}/paravirt_ops.rst| 0 > Documentation/{virtual => virt}/uml/UserModeLinux-HOWTO.txt | 0 > MAINTAINERS | 6 +++--- > arch/powerpc/include/uapi/asm/kvm_para.h| 2 +- > arch/x86/kvm/mmu.c | 2 +- > include/uapi/linux/kvm.h| 4 ++-- > tools/include/uapi/linux/kvm.h | 4 ++-- > virt/kvm/arm/arm.c | 2 +- > virt/kvm/arm/vgic/vgic-mmio-v3.c| 2 +- > virt/kvm/arm/vgic/vgic.h| 4 ++-- > 40 files changed, 19 insertions(+), 19 deletions(-) > rename Documentation/{virtual => virt}/index.rst (100%) > rename Documentation/{virtual => virt}/kvm/amd-memory-encryption.rst (100%) > rename Documentation/{virtual => virt}/kvm/api.txt (99%) > rename Documentation/{virtual => virt}/kvm/arm/hyp-abi.txt (100%) > rename Documentation/{virtual => virt}/kvm/arm/psci.txt (100%) > rename Documentation/{virtual => virt}/kvm/cpuid.rst (100%) > rename Documentation/{virtual => virt}/kvm/devices/README (100%) > rename Documentation/{virtual => virt}/kvm/devices/arm-vgic-its.txt (100%) > rename Documentation/{virtual => virt}/kvm/devices/arm-vgic-v3.txt (100%) > rename Documentation/{virtual => virt}/kvm/devices/arm-vgic.txt (100%) > rename Documentation/{virtual => virt}/kvm/devices/mpic.txt (100%) > rename Documentation/{virtual => virt}/kvm/devices/s390_flic.txt (100%) > rename Documentation/{virtual => virt}/kvm/devices/vcpu.txt (100%) > rename Documentation/{virtual => virt}/kvm/devices/vfio.txt (100%) > rename Documentation/{virtual => virt}/kvm/devices/vm.txt (100%) > rename Documentation/{virtual => virt}/kvm/devices/xics.txt (100%) > rename Documentation/{virtual => virt}/kvm/devices/xive.txt (100%) > rename Documentation/{virtual => virt}/kvm/halt-polling.txt (100%) > rename 
Documentation/{virtual => virt}/kvm/hypercalls.txt (97%) > rename Documentation/{virtual => virt}/kvm/index.rst (100%) > rename Documentation/{virtual => virt}/kvm/locking.txt (100%) > rename Documentation/{virtual => virt}/kvm/mmu.txt (99%) > rename Documentation/{virtual => virt}/kvm/msr.txt (100%) > rename Documentation/{virtual => virt}/kvm/neste
Re: [PATCH v3 02/12] fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.
On Tue, Jul 23, 2019 at 12:51:25PM +0800, Wu Hao wrote:
> +/**
> + * dfl_fpga_cdev_config_port - configure a port feature dev
> + * @cdev: parent container device.
> + * @port_id: id of the port feature device.
> + * @release: release port or assign port back.
> + *
> + * This function allows user to release port platform device or assign it back.
> + * e.g. to safely turn one port from PF into VF for PCI device SRIOV support,
> + * release port platform device is one necessary step.
> + */
> +int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev, int port_id,
> +			      bool release)
> +{
> +	return release ? detach_port_dev(cdev, port_id) :
> +			 attach_port_dev(cdev, port_id);
> +}
> +EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port);

That's a horrible api. Every time you see this call in code, you have
to go and look up what "bool" means here. There's no reason for it.

Just have 2 different functions, one that attaches a port, and one that
detaches it. That way when you read the code that calls this function,
you know what it does instantly without having to go look up some api
function somewhere else.

Write code for people to read first. And you are saving nothing here by
trying to do two different things in the same exact function.

thanks,

greg k-h
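
A minimal sketch of the split asked for in this review, written as plain user-space C so it compiles on its own. Everything in it is hypothetical: struct cdev_sketch stands in for struct dfl_fpga_cdev, the error codes are illustrative, and cdev_release_port() / cdev_assign_port() are made-up names, not the ones the driver uses. The only point is that each call site states what it does, with no bool to decode.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_PORTS 4

/* Hypothetical stand-in for struct dfl_fpga_cdev: tracks attached ports only. */
struct cdev_sketch {
	bool attached[MAX_PORTS];
};

/* Release (detach) one port. The name carries the meaning the bool used to. */
int cdev_release_port(struct cdev_sketch *cdev, int port_id)
{
	if (port_id < 0 || port_id >= MAX_PORTS)
		return -EINVAL;
	if (!cdev->attached[port_id])
		return -EBUSY;	/* illustrative: port is already released */
	cdev->attached[port_id] = false;
	return 0;
}

/* Assign (re-attach) one port that was released earlier. */
int cdev_assign_port(struct cdev_sketch *cdev, int port_id)
{
	if (port_id < 0 || port_id >= MAX_PORTS)
		return -EINVAL;
	if (cdev->attached[port_id])
		return -EBUSY;	/* illustrative: port is already assigned */
	cdev->attached[port_id] = true;
	return 0;
}
```

A caller then reads cdev_release_port(cdev, id) or cdev_assign_port(cdev, id), rather than a call whose last argument is an unexplained true or false.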
Re: [PATCH v3 01/12] fpga: dfl: fme: support 512bit data width PR
On Tue, Jul 23, 2019 at 12:51:24PM +0800, Wu Hao wrote:
> In early partial reconfiguration private feature, it only
> supports 32bit data width when writing data to hardware for
> PR. 512bit data width PR support is an important optimization
> for some specific solutions (e.g. XEON with FPGA integrated),
> it allows driver to use AVX512 instruction to improve the
> performance of partial reconfiguration. e.g. programming one
> 100MB bitstream image via this 512bit data width PR hardware
> only takes ~300ms, but 32bit revision requires ~3s per test
> result.
>
> Please note now this optimization is only done on revision 2
> of this PR private feature which is only used in integrated
> solution that AVX512 is always supported. This revision 2
> hardware doesn't support 32bit PR.
>
> Signed-off-by: Ananda Ravuri
> Signed-off-by: Xu Yilun
> Signed-off-by: Wu Hao
> Acked-by: Alan Tull
> Signed-off-by: Moritz Fischer
> ---
> v2: remove DRV/MODULE_VERSION modifications
> ---
>  drivers/fpga/dfl-fme-mgr.c | 110 ++---
>  drivers/fpga/dfl-fme-pr.c  |  43 +++---
>  drivers/fpga/dfl-fme.h     |   2 +
>  drivers/fpga/dfl.h         |   5 +++
>  4 files changed, 129 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
> index b3f7eee..46e17f0 100644
> --- a/drivers/fpga/dfl-fme-mgr.c
> +++ b/drivers/fpga/dfl-fme-mgr.c
> @@ -22,6 +22,7 @@
>  #include
>  #include
>
> +#include "dfl.h"
>  #include "dfl-fme-pr.h"
>
>  /* FME Partial Reconfiguration Sub Feature Register Set */
> @@ -30,6 +31,7 @@
>  #define FME_PR_STS		0x10
>  #define FME_PR_DATA		0x18
>  #define FME_PR_ERR		0x20
> +#define FME_PR_512_DATA	0x40 /* Data Register for 512bit datawidth PR */
>  #define FME_PR_INTFC_ID_L	0xA8
>  #define FME_PR_INTFC_ID_H	0xB0
>
> @@ -67,8 +69,43 @@
>  #define PR_WAIT_TIMEOUT	800
>  #define PR_HOST_STATUS_IDLE	0
>
> +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> +
> +#include
> +#include
> +
> +static inline int is_cpu_avx512_enabled(void)
> +{
> +	return cpu_feature_enabled(X86_FEATURE_AVX512F);
> +}

That's a very arch specific function, why would a driver ever care about
this?

> +
> +static inline void copy512(const void *src, void __iomem *dst)
> +{
> +	kernel_fpu_begin();
> +
> +	asm volatile("vmovdqu64 (%0), %%zmm0;"
> +		     "vmovntdq %%zmm0, (%1);"
> +		     :
> +		     : "r"(src), "r"(dst)
> +		     : "memory");
> +
> +	kernel_fpu_end();
> +}

Shouldn't this be an arch-specific function somewhere? Burying this in
a random driver is not ok. Please make this generic for all systems.

> +#else
> +static inline int is_cpu_avx512_enabled(void)
> +{
> +	return 0;
> +}
> +
> +static inline void copy512(const void *src, void __iomem *dst)
> +{
> +	WARN_ON_ONCE(1);

Are you trying to get reports from syzbot? :)

Please fix this all up.

greg k-h
Re: [PATCH v3 03/12] fpga: dfl: pci: enable SRIOV support.
On Tue, Jul 23, 2019 at 12:51:26PM +0800, Wu Hao wrote:
> This patch enables the standard sriov support. It allows user to
> enable SRIOV (and VFs), then user could pass through accelerators
> (VFs) into virtual machine or use VFs directly in host.
>
> Signed-off-by: Zhang Yi Z
> Signed-off-by: Xu Yilun
> Signed-off-by: Wu Hao
> Acked-by: Alan Tull
> Acked-by: Moritz Fischer
> Signed-off-by: Moritz Fischer
> ---
> v2: remove DRV/MODULE_VERSION modifications.
> ---
>  drivers/fpga/dfl-pci.c | 39 +++
>  drivers/fpga/dfl.c     | 41 +
>  drivers/fpga/dfl.h     |  1 +
>  3 files changed, 81 insertions(+)
>
> diff --git a/drivers/fpga/dfl-pci.c b/drivers/fpga/dfl-pci.c
> index 66b5720..0e65d81 100644
> --- a/drivers/fpga/dfl-pci.c
> +++ b/drivers/fpga/dfl-pci.c
> @@ -223,8 +223,46 @@ int cci_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pcidevid)
>  	return ret;
>  }
>
> +static int cci_pci_sriov_configure(struct pci_dev *pcidev, int num_vfs)
> +{
> +	struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
> +	struct dfl_fpga_cdev *cdev = drvdata->cdev;
> +	int ret = 0;
> +
> +	mutex_lock(&cdev->lock);
> +
> +	if (!num_vfs) {
> +		/*
> +		 * disable SRIOV and then put released ports back to default
> +		 * PF access mode.
> +		 */
> +		pci_disable_sriov(pcidev);
> +
> +		__dfl_fpga_cdev_config_port_vf(cdev, false);
> +
> +	} else if (cdev->released_port_num == num_vfs) {
> +		/*
> +		 * only enable SRIOV if cdev has matched released ports, put
> +		 * released ports into VF access mode firstly.
> +		 */
> +		__dfl_fpga_cdev_config_port_vf(cdev, true);
> +
> +		ret = pci_enable_sriov(pcidev, num_vfs);
> +		if (ret)
> +			__dfl_fpga_cdev_config_port_vf(cdev, false);
> +	} else {
> +		ret = -EINVAL;
> +	}
> +
> +	mutex_unlock(&cdev->lock);
> +	return ret;
> +}
> +
>  static void cci_pci_remove(struct pci_dev *pcidev)
>  {
> +	if (dev_is_pf(&pcidev->dev))
> +		cci_pci_sriov_configure(pcidev, 0);
> +
>  	cci_remove_feature_devs(pcidev);
>  	pci_disable_pcie_error_reporting(pcidev);
>  }
> @@ -234,6 +272,7 @@ static void cci_pci_remove(struct pci_dev *pcidev)
>  	.id_table = cci_pcie_id_tbl,
>  	.probe = cci_pci_probe,
>  	.remove = cci_pci_remove,
> +	.sriov_configure = cci_pci_sriov_configure,
>  };
>
>  module_pci_driver(cci_pci_driver);
> diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
> index e04ed45..c3a8e1d 100644
> --- a/drivers/fpga/dfl.c
> +++ b/drivers/fpga/dfl.c
> @@ -1112,6 +1112,47 @@ int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev, int port_id,
>  }
>  EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port);
>
> +static void config_port_vf(struct device *fme_dev, int port_id, bool is_vf)
> +{
> +	void __iomem *base;
> +	u64 v;
> +
> +	base = dfl_get_feature_ioaddr_by_id(fme_dev, FME_FEATURE_ID_HEADER);
> +
> +	v = readq(base + FME_HDR_PORT_OFST(port_id));
> +
> +	v &= ~FME_PORT_OFST_ACC_CTRL;
> +	v |= FIELD_PREP(FME_PORT_OFST_ACC_CTRL,
> +			is_vf ? FME_PORT_OFST_ACC_VF : FME_PORT_OFST_ACC_PF);
> +
> +	writeq(v, base + FME_HDR_PORT_OFST(port_id));
> +}
> +
> +/**
> + * __dfl_fpga_cdev_config_port_vf - configure port to VF access mode
> + *
> + * @cdev: parent container device.
> + * @if_vf: true for VF access mode, and false for PF access mode
> + *
> + * Return: 0 on success, negative error code otherwise.
> + *
> + * This function is needed in sriov configuration routine. It could be used to
> + * configures the released ports access mode to VF or PF.
> + * The caller needs to hold lock for protection.
> + */
> +void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf)
> +{
> +	struct dfl_feature_platform_data *pdata;
> +
> +	list_for_each_entry(pdata, &cdev->port_dev_list, node) {
> +		if (device_is_registered(&pdata->dev->dev))
> +			continue;
> +
> +		config_port_vf(cdev->fme_dev, pdata->id, is_vf);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(__dfl_fpga_cdev_config_port_vf);

Why are you exporting a function with a leading __?

You are expecting someone else, in who knows what code, to do locking
correctly? If so, and the caller always has to have a local lock, then
it's not a big deal, just drop the '__', otherwise if you have to have a
specific lock for a specific device, then you have a really complex and
probably broken api here :(

thanks,

greg k-h
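
One possible reading of the locking concern raised here, sketched in user-space C: keep the "caller must hold the lock" helper static, and export only a wrapper that takes the lock itself. This is an illustration under loud assumptions, not the driver's design. The toy single-threaded lock stands in for the kernel mutex, the assert() plays the role the kernel's lockdep_assert_held() would, and every name is hypothetical. In the quoted patch the SR-IOV path also holds cdev->lock across pci_enable_sriov(), which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

#define NPORTS 4

/* Toy stand-in for a mutex: it only records whether it is held. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical stand-in for struct dfl_fpga_cdev. */
struct cdev_sketch {
	struct toy_lock lock;
	bool released[NPORTS];	/* port has been released from the PF */
	bool vf_mode[NPORTS];	/* port is in VF access mode */
};

/* Internal helper: static, so the locking rule never leaves this file. */
static void config_released_ports_locked(struct cdev_sketch *cdev, bool is_vf)
{
	assert(cdev->lock.held);	/* caller must already hold the lock */

	for (int i = 0; i < NPORTS; i++)
		if (cdev->released[i])
			cdev->vf_mode[i] = is_vf;
}

/* The only entry point other code sees: it does its own locking. */
void cdev_config_released_ports(struct cdev_sketch *cdev, bool is_vf)
{
	toy_lock_acquire(&cdev->lock);
	config_released_ports_locked(cdev, is_vf);
	toy_lock_release(&cdev->lock);
}
```

The review also allows the simpler route: if every caller already holds a suitable lock, keep the function as it is and just drop the leading underscores.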
Re: [PATCH v3 04/12] fpga: dfl: afu: add AFU state related sysfs interfaces
On Tue, Jul 23, 2019 at 12:51:27PM +0800, Wu Hao wrote:
> This patch introduces more sysfs interfaces for Accelerated
> Function Unit (AFU). These interfaces allow users to read
> current AFU Power State (APx), read / clear AFU Power (APx)
> events which are sticky to identify transient APx state,
> and manage AFU's LTR (latency tolerance reporting).
>
> Signed-off-by: Ananda Ravuri
> Signed-off-by: Xu Yilun
> Signed-off-by: Wu Hao
> Acked-by: Alan Tull
> Signed-off-by: Moritz Fischer
> ---
> v2: rebased, and remove DRV/MODULE_VERSION modifications
> v3: update kernel version and date in sysfs doc
> ---
>  Documentation/ABI/testing/sysfs-platform-dfl-port |  30 +
>  drivers/fpga/dfl-afu-main.c                       | 137 ++
>  drivers/fpga/dfl.h                                |  11 ++
>  3 files changed, 178 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
> index 6a92dda..5961fb2 100644
> --- a/Documentation/ABI/testing/sysfs-platform-dfl-port
> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
> @@ -14,3 +14,33 @@ Description:	Read-only. User can program different PR bitstreams to FPGA
>  		Accelerator Function Unit (AFU) for different functions. It
>  		returns uuid which could be used to identify which PR bitstream
>  		is programmed in this AFU.
> +
> +What:		/sys/bus/platform/devices/dfl-port.0/power_state
> +Date:		July 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao
> +Description:	Read-only. It reports the APx (AFU Power) state, different APx
> +		means different throttling level. When reading this file, it
> +		returns "0" - Normal / "1" - AP1 / "2" - AP2 / "6" - AP6.
> +
> +What:		/sys/bus/platform/devices/dfl-port.0/ap1_event
> +Date:		July 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao
> +Description:	Read-write. Read or set 1 to clear AP1 (AFU Power State 1)
> +		event. It's used to indicate transient AP1 state.

So reading the value changes the state of the system? That's almost
always never a good idea.

Force userspace to write the value to change something. Otherwise all
libraries that use sysfs will be accidentally changing the state of your
system without you ever knowing it.

> +
> +What:		/sys/bus/platform/devices/dfl-port.0/ap2_event
> +Date:		July 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao
> +Description:	Read-write. Read or set 1 to clear AP2 (AFU Power State 2)
> +		event. It's used to indicate transient AP2 state.
> +
> +What:		/sys/bus/platform/devices/dfl-port.0/ltr
> +Date:		July 2019
> +KernelVersion:	5.4
> +Contact:	Wu Hao
> +Description:	Read-write. Read and set AFU latency tolerance reporting value.
> +		Set ltr to 1 if the AFU can tolerate latency >= 40us or set it
> +		to 0 if it is latency sensitive.
> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> index 68b4d08..cb3f73e 100644
> --- a/drivers/fpga/dfl-afu-main.c
> +++ b/drivers/fpga/dfl-afu-main.c
> @@ -141,8 +141,145 @@ static int port_get_id(struct platform_device *pdev)
>  }
>  static DEVICE_ATTR_RO(id);
>
> +static ssize_t
> +ltr_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> +	void __iomem *base;
> +	u64 v;
> +
> +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> +
> +	mutex_lock(&pdata->lock);
> +	v = readq(base + PORT_HDR_CTRL);
> +	mutex_unlock(&pdata->lock);

Why do you need a lock to call readq()? What are you protecting here?

> +
> +	return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_CTRL_LATENCY, v));
> +}
> +
> +static ssize_t
> +ltr_store(struct device *dev, struct device_attribute *attr,
> +	  const char *buf, size_t count)
> +{
> +	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> +	void __iomem *base;
> +	u8 ltr;
> +	u64 v;
> +
> +	if (kstrtou8(buf, 0, &ltr) || ltr > 1)
> +		return -EINVAL;

Are you doing anything with this value? If not, how about just using
the sysfs boolean read function and if it is 1, then do the clearing?

Same for all other show/store functions in here.

thanks,

greg k-h
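
To make the two sysfs points in this review concrete, here is a hedged user-space sketch of a "write 1 to clear" attribute whose read side never changes state. The names are hypothetical and struct port_sketch stands in for the real device state. parse_bool() is a simplified imitation of what the kernel's kstrtobool() does for input parsing (it looks only at the first character); it is not that function.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state: one sticky event flag. */
struct port_sketch {
	bool ap1_event;
};

/* Simplified boolean parser, loosely modelled on kstrtobool(). */
static int parse_bool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case '1': case 'y': case 'Y':
		*res = true;
		return 0;
	case '0': case 'n': case 'N':
		*res = false;
		return 0;
	default:
		return -EINVAL;
	}
}

/* show: reports the flag and leaves it untouched, however often it is read. */
int ap1_event_show(const struct port_sketch *port)
{
	return port->ap1_event ? 1 : 0;
}

/* store: only an explicit "1" clears the event; anything else is rejected. */
long ap1_event_store(struct port_sketch *port, const char *buf, size_t count)
{
	bool clear;

	if (parse_bool(buf, &clear) || !clear)
		return -EINVAL;

	port->ap1_event = false;
	return (long)count;	/* sysfs convention: bytes consumed on success */
}
```

With this shape a monitoring tool can read the attribute repeatedly without side effects, and clearing the event is always a deliberate write.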
doc: mds: nitpicking and typo fix
Consistently end sentences, fix typo.

Signed-off-by: Pavel Machek

commit 310cb17613f46db97cebbd9dc11187961e4b1c69
Author: Pavel
Date:   Mon May 20 10:46:35 2019 +0200

    doc: typo fix, consistency in mds.

diff --git a/Documentation/x86/mds.rst b/Documentation/x86/mds.rst
index 5d4330b..9983b50 100644
--- a/Documentation/x86/mds.rst
+++ b/Documentation/x86/mds.rst
@@ -54,13 +54,13 @@ needed for exploiting MDS requires:
    - to control the load to trigger a fault or assist
 
    - to have a disclosure gadget which exposes the speculatively accessed
-     data for consumption through a side channel.
+     data for consumption through a side channel
 
    - to control the pointer through which the disclosure gadget exposes the
      data
 
 The existence of such a construct in the kernel cannot be excluded with
-100% certainty, but the complexity involved makes it extremly unlikely.
+100% certainty, but the complexity involved makes it extremely unlikely.
 
 There is one exception, which is untrusted BPF. The functionality of
 untrusted BPF is limited, but it needs to be thoroughly investigated

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH v3 09/12] fpga: dfl: afu: add STP (SignalTap) support
On Tue, Jul 23, 2019 at 12:51:32PM +0800, Wu Hao wrote:
> STP (SignalTap) is one of the private features under the port for
> debugging. This patch adds private feature driver support for it
> to allow userspace applications to mmap related mmio region and
> provide STP service.
>
> Signed-off-by: Xu Yilun
> Signed-off-by: Wu Hao
> Acked-by: Moritz Fischer
> Acked-by: Alan Tull
> Signed-off-by: Moritz Fischer
> ---
>  drivers/fpga/dfl-afu-main.c | 34 ++
>  1 file changed, 34 insertions(+)
>
> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> index 15dd4cb..395f96e 100644
> --- a/drivers/fpga/dfl-afu-main.c
> +++ b/drivers/fpga/dfl-afu-main.c
> @@ -514,6 +514,36 @@ static void port_afu_uinit(struct platform_device *pdev,
>  	.uinit = port_afu_uinit,
>  };
>
> +static int port_stp_init(struct platform_device *pdev,
> +			 struct dfl_feature *feature)
> +{
> +	struct resource *res = &pdev->resource[feature->resource_index];
> +
> +	dev_dbg(&pdev->dev, "PORT STP Init.\n");

ftrace is your friend, no need to do a lot of "look I am here!"
messages.

> +
> +	return afu_mmio_region_add(dev_get_platdata(&pdev->dev),
> +				   DFL_PORT_REGION_INDEX_STP,
> +				   resource_size(res), res->start,
> +				   DFL_PORT_REGION_MMAP | DFL_PORT_REGION_READ |
> +				   DFL_PORT_REGION_WRITE);
> +}
> +
> +static void port_stp_uinit(struct platform_device *pdev,
> +			   struct dfl_feature *feature)
> +{
> +	dev_dbg(&pdev->dev, "PORT STP UInit.\n");

Same here.

Why have this function at all if it does not do anything?

thanks,

greg k-h
Re: [PATCH v3 09/12] fpga: dfl: afu: add STP (SignalTap) support
On Wed, Jul 24, 2019 at 12:11:09PM +0200, Greg KH wrote:
> On Tue, Jul 23, 2019 at 12:51:32PM +0800, Wu Hao wrote:
> > STP (SignalTap) is one of the private features under the port for
> > debugging. This patch adds private feature driver support for it
> > to allow userspace applications to mmap related mmio region and
> > provide STP service.
> >
> > Signed-off-by: Xu Yilun
> > Signed-off-by: Wu Hao
> > Acked-by: Moritz Fischer
> > Acked-by: Alan Tull
> > Signed-off-by: Moritz Fischer
> > ---
> >  drivers/fpga/dfl-afu-main.c | 34 ++
> >  1 file changed, 34 insertions(+)
> >
> > diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> > index 15dd4cb..395f96e 100644
> > --- a/drivers/fpga/dfl-afu-main.c
> > +++ b/drivers/fpga/dfl-afu-main.c
> > @@ -514,6 +514,36 @@ static void port_afu_uinit(struct platform_device *pdev,
> >  	.uinit = port_afu_uinit,
> >  };
> >
> > +static int port_stp_init(struct platform_device *pdev,
> > +			 struct dfl_feature *feature)
> > +{
> > +	struct resource *res = &pdev->resource[feature->resource_index];
> > +
> > +	dev_dbg(&pdev->dev, "PORT STP Init.\n");
>
> ftrace is your friend, no need to do a lot of "look I am here!"
> messages.

Hi Greg,

Thanks for the code review! Sure, let me remove them.

>
> > +
> > +	return afu_mmio_region_add(dev_get_platdata(&pdev->dev),
> > +				   DFL_PORT_REGION_INDEX_STP,
> > +				   resource_size(res), res->start,
> > +				   DFL_PORT_REGION_MMAP | DFL_PORT_REGION_READ |
> > +				   DFL_PORT_REGION_WRITE);
> > +}
> > +
> > +static void port_stp_uinit(struct platform_device *pdev,
> > +			   struct dfl_feature *feature)
> > +{
> > +	dev_dbg(&pdev->dev, "PORT STP UInit.\n");
>
> Same here.
>
> Why have this function at all if it does not do anything?

Let me remove them in the next version. Actually, the uinit callback is
always required in the current code, so I will add one more patch to
change that and remove all uinit functions that do nothing; it does
save code.

Thanks for the comments.

Hao

>
> thanks,
>
> greg k-h
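
The follow-up change described in this reply (making the uinit callback optional so that empty uinit functions can go away) could look roughly like the sketch below. It is a self-contained illustration with hypothetical types; struct feature_sketch and struct feature_ops_sketch are stand-ins, not the driver's struct dfl_feature or its ops table, and the core code shown is assumed rather than taken from the patch series.

```c
#include <stddef.h>

struct feature_sketch;

/* Hypothetical ops table: init is mandatory, uinit may be left NULL. */
struct feature_ops_sketch {
	int  (*init)(struct feature_sketch *feature);
	void (*uinit)(struct feature_sketch *feature);
};

struct feature_sketch {
	const struct feature_ops_sketch *ops;
	int initialized;
};

/* Core teardown path: call uinit only when the feature provides one. */
void feature_uinit(struct feature_sketch *feature)
{
	if (feature->ops && feature->ops->uinit)
		feature->ops->uinit(feature);

	feature->initialized = 0;
}

/* A feature with nothing to undo simply omits .uinit. */
static int stp_init_sketch(struct feature_sketch *feature)
{
	feature->initialized = 1;
	return 0;
}

const struct feature_ops_sketch stp_ops_sketch = {
	.init = stp_init_sketch,
	/* no .uinit: the core skips it */
};
```

The NULL check lives in one place in the core, so each feature driver no longer needs an empty function just to satisfy the interface.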
Re: [PATCH v3 04/12] fpga: dfl: afu: add AFU state related sysfs interfaces
On Wed, Jul 24, 2019 at 11:41:10AM +0200, Greg KH wrote: > On Tue, Jul 23, 2019 at 12:51:27PM +0800, Wu Hao wrote: > > This patch introduces more sysfs interfaces for Accelerated > > Function Unit (AFU). These interfaces allow users to read > > current AFU Power State (APx), read / clear AFU Power (APx) > > events which are sticky to identify transient APx state, > > and manage AFU's LTR (latency tolerance reporting). > > > > Signed-off-by: Ananda Ravuri > > Signed-off-by: Xu Yilun > > Signed-off-by: Wu Hao > > Acked-by: Alan Tull > > Signed-off-by: Moritz Fischer > > --- > > v2: rebased, and remove DRV/MODULE_VERSION modifications > > v3: update kernel version and date in sysfs doc > > --- > > Documentation/ABI/testing/sysfs-platform-dfl-port | 30 + > > drivers/fpga/dfl-afu-main.c | 137 > > ++ > > drivers/fpga/dfl.h| 11 ++ > > 3 files changed, 178 insertions(+) > > > > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port > > b/Documentation/ABI/testing/sysfs-platform-dfl-port > > index 6a92dda..5961fb2 100644 > > --- a/Documentation/ABI/testing/sysfs-platform-dfl-port > > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port > > @@ -14,3 +14,33 @@ Description: Read-only. User can program different > > PR bitstreams to FPGA > > Accelerator Function Unit (AFU) for different functions. It > > returns uuid which could be used to identify which PR bitstream > > is programmed in this AFU. > > + > > +What: /sys/bus/platform/devices/dfl-port.0/power_state > > +Date: July 2019 > > +KernelVersion: 5.4 > > +Contact: Wu Hao > > +Description: Read-only. It reports the APx (AFU Power) state, > > different APx > > + means different throttling level. When reading this file, it > > + returns "0" - Normal / "1" - AP1 / "2" - AP2 / "6" - AP6. > > + > > +What: /sys/bus/platform/devices/dfl-port.0/ap1_event > > +Date: July 2019 > > +KernelVersion: 5.4 > > +Contact: Wu Hao > > +Description: Read-write. Read or set 1 to clear AP1 (AFU Power State > > 1) > > + event. 
It's used to indicate transient AP1 state. > > So reading the value changes the state of the system? That's almost > always never a good idea. > > Force userspace to write the value to change something. Otherwise all > libraries that use sysfs will be accidentally changing the state of your > system without you ever knowing it. Oh.. I think the description makes some misunderstanding here, will fix it in the next version. This AP1/AP2 event will only be cleared by write 1 to it, read will not change the state. > > > + > > +What: /sys/bus/platform/devices/dfl-port.0/ap2_event > > +Date: July 2019 > > +KernelVersion: 5.4 > > +Contact: Wu Hao > > +Description: Read-write. Read or set 1 to clear AP2 (AFU Power State > > 2) > > + event. It's used to indicate transient AP2 state. > > + > > +What: /sys/bus/platform/devices/dfl-port.0/ltr > > +Date: July 2019 > > +KernelVersion: 5.4 > > +Contact: Wu Hao > > +Description: Read-write. Read and set AFU latency tolerance > > reporting value. > > + Set ltr to 1 if the AFU can tolerate latency >= 40us or set it > > + to 0 if it is latency sensitive. > > diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c > > index 68b4d08..cb3f73e 100644 > > --- a/drivers/fpga/dfl-afu-main.c > > +++ b/drivers/fpga/dfl-afu-main.c > > @@ -141,8 +141,145 @@ static int port_get_id(struct platform_device *pdev) > > } > > static DEVICE_ATTR_RO(id); > > > > +static ssize_t > > +ltr_show(struct device *dev, struct device_attribute *attr, char *buf) > > +{ > > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); > > + void __iomem *base; > > + u64 v; > > + > > + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); > > + > > + mutex_lock(&pdata->lock); > > + v = readq(base + PORT_HDR_CTRL); > > + mutex_unlock(&pdata->lock); > > Why do you need a lock to call readq()? What are you protecting here? If this code is running on 32bit machine, readq will be replaced with 2 readl operation. 
If that is the case, should we protect the code against it? > > > > + > > + return sprintf(buf, "%x\n", (u8)FIELD_GET(PORT_CTRL_LATENCY, v)); > > +} > > + > > +static ssize_t > > +ltr_store(struct device *dev, struct device_attribute *attr, > > + const char *buf, size_t count) > > +{ > > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); > > + void __iomem *base; > > + u8 ltr; > > + u64 v; > > + > > + if (kstrtou8(buf, 0,1) > > + return -EINVAL; > > Are you doing anything with this value? If not, how about just using > the sysfs boolean read function and if it is 1, then do the clearin
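
For readers following the 32-bit question raised in this reply: on a machine without native 64-bit MMIO access, a 64-bit register read is typically composed from two 32-bit reads (the kernel ships helpers of this kind in include/linux/io-64-nonatomic-lo-hi.h, for example lo_hi_readq()). The user-space sketch below shows that composition. The names are hypothetical, an ordinary array stands in for the register, and a little-endian layout with the low word at the lower address is assumed.

```c
#include <stdint.h>

/* Hypothetical 32-bit register read; stands in for readl(). */
static uint32_t read32(const volatile uint32_t *addr)
{
	return *addr;
}

/*
 * 64-bit read built from two 32-bit reads, low half first.
 * The two accesses are separate, so the combined value is not atomic:
 * a write landing between them could yield a mixed result.
 */
uint64_t read64_lo_hi(const volatile uint32_t *addr)
{
	uint32_t lo = read32(addr);
	uint32_t hi = read32(addr + 1);

	return ((uint64_t)hi << 32) | lo;
}
```

Whether that window justifies a lock depends on whether anything can change the register between the two halves; the sketch only shows where the window comes from and does not settle the question for this driver.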
Re: [PATCH v3 03/12] fpga: dfl: pci: enable SRIOV support.
On Wed, Jul 24, 2019 at 11:37:44AM +0200, Greg KH wrote: > On Tue, Jul 23, 2019 at 12:51:26PM +0800, Wu Hao wrote: > > This patch enables the standard sriov support. It allows user to > > enable SRIOV (and VFs), then user could pass through accelerators > > (VFs) into virtual machine or use VFs directly in host. > > > > Signed-off-by: Zhang Yi Z > > Signed-off-by: Xu Yilun > > Signed-off-by: Wu Hao > > Acked-by: Alan Tull > > Acked-by: Moritz Fischer > > Signed-off-by: Moritz Fischer > > --- > > v2: remove DRV/MODULE_VERSION modifications. > > --- > > drivers/fpga/dfl-pci.c | 39 +++ > > drivers/fpga/dfl.c | 41 + > > drivers/fpga/dfl.h | 1 + > > 3 files changed, 81 insertions(+) > > > > diff --git a/drivers/fpga/dfl-pci.c b/drivers/fpga/dfl-pci.c > > index 66b5720..0e65d81 100644 > > --- a/drivers/fpga/dfl-pci.c > > +++ b/drivers/fpga/dfl-pci.c > > @@ -223,8 +223,46 @@ int cci_pci_probe(struct pci_dev *pcidev, const struct > > pci_device_id *pcidevid) > > return ret; > > } > > > > +static int cci_pci_sriov_configure(struct pci_dev *pcidev, int num_vfs) > > +{ > > + struct cci_drvdata *drvdata = pci_get_drvdata(pcidev); > > + struct dfl_fpga_cdev *cdev = drvdata->cdev; > > + int ret = 0; > > + > > + mutex_lock(&cdev->lock); > > + > > + if (!num_vfs) { > > + /* > > +* disable SRIOV and then put released ports back to default > > +* PF access mode. > > +*/ > > + pci_disable_sriov(pcidev); > > + > > + __dfl_fpga_cdev_config_port_vf(cdev, false); > > + > > + } else if (cdev->released_port_num == num_vfs) { > > + /* > > +* only enable SRIOV if cdev has matched released ports, put > > +* released ports into VF access mode firstly. 
> > +*/ > > + __dfl_fpga_cdev_config_port_vf(cdev, true); > > + > > + ret = pci_enable_sriov(pcidev, num_vfs); > > + if (ret) > > + __dfl_fpga_cdev_config_port_vf(cdev, false); > > + } else { > > + ret = -EINVAL; > > + } > > + > > + mutex_unlock(&cdev->lock); > > + return ret; > > +} > > + > > static void cci_pci_remove(struct pci_dev *pcidev) > > { > > + if (dev_is_pf(&pcidev->dev)) > > + cci_pci_sriov_configure(pcidev, 0); > > + > > cci_remove_feature_devs(pcidev); > > pci_disable_pcie_error_reporting(pcidev); > > } > > @@ -234,6 +272,7 @@ static void cci_pci_remove(struct pci_dev *pcidev) > > .id_table = cci_pcie_id_tbl, > > .probe = cci_pci_probe, > > .remove = cci_pci_remove, > > + .sriov_configure = cci_pci_sriov_configure, > > }; > > > > module_pci_driver(cci_pci_driver); > > diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c > > index e04ed45..c3a8e1d 100644 > > --- a/drivers/fpga/dfl.c > > +++ b/drivers/fpga/dfl.c > > @@ -1112,6 +1112,47 @@ int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev > > *cdev, int port_id, > > } > > EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port); > > > > +static void config_port_vf(struct device *fme_dev, int port_id, bool is_vf) > > +{ > > + void __iomem *base; > > + u64 v; > > + > > + base = dfl_get_feature_ioaddr_by_id(fme_dev, FME_FEATURE_ID_HEADER); > > + > > + v = readq(base + FME_HDR_PORT_OFST(port_id)); > > + > > + v &= ~FME_PORT_OFST_ACC_CTRL; > > + v |= FIELD_PREP(FME_PORT_OFST_ACC_CTRL, > > + is_vf ? FME_PORT_OFST_ACC_VF : FME_PORT_OFST_ACC_PF); > > + > > + writeq(v, base + FME_HDR_PORT_OFST(port_id)); > > +} > > + > > +/** > > + * __dfl_fpga_cdev_config_port_vf - configure port to VF access mode > > + * > > + * @cdev: parent container device. > > + * @if_vf: true for VF access mode, and false for PF access mode > > + * > > + * Return: 0 on success, negative error code otherwise. > > + * > > + * This function is needed in sriov configuration routine. 
It could be > > used to > > + * configures the released ports access mode to VF or PF. > > + * The caller needs to hold lock for protection. > > + */ > > +void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf) > > +{ > > + struct dfl_feature_platform_data *pdata; > > + > > + list_for_each_entry(pdata, &cdev->port_dev_list, node) { > > + if (device_is_registered(&pdata->dev->dev)) > > + continue; > > + > > + config_port_vf(cdev->fme_dev, pdata->id, is_vf); > > + } > > +} > > +EXPORT_SYMBOL_GPL(__dfl_fpga_cdev_config_port_vf); > > Why are you exporting a function with a leading __? > > You are expecting someone else, in who knows what code, to do locking > correctly? If so, and the caller always has to have a local lock, then > it's not a big deal, just drop the '__', otherwise if you have to have a > specific lock for a specific device, then you have a really complex and > probably broken api here :( Yes, I just want to remind the user of this API, caller needs to
Re: [PATCH v3 02/12] fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.
On Wed, Jul 24, 2019 at 11:33:57AM +0200, Greg KH wrote:
> On Tue, Jul 23, 2019 at 12:51:25PM +0800, Wu Hao wrote:
> > +/**
> > + * dfl_fpga_cdev_config_port - configure a port feature dev
> > + * @cdev: parent container device.
> > + * @port_id: id of the port feature device.
> > + * @release: release port or assign port back.
> > + *
> > + * This function allows user to release port platform device or assign it back.
> > + * e.g. to safely turn one port from PF into VF for PCI device SRIOV support,
> > + * release port platform device is one necessary step.
> > + */
> > +int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev, int port_id,
> > +			      bool release)
> > +{
> > +	return release ? detach_port_dev(cdev, port_id) :
> > +			 attach_port_dev(cdev, port_id);
> > +}
> > +EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port);
>
> That's a horrible api. Every time you see this call in code, you have
> to go and look up what "bool" means here. There's no reason for it.
>
> Just have 2 different functions, one that attaches a port, and one that
> detaches it. That way when you read the code that calls this function,
> you know what it does instantly without having to go look up some api
> function somewhere else.
>
> Write code for people to read first. And you are saving nothing here by
> trying to do two different things in the same exact function.

I see, you're right, it saves everybody's time on reading, very important.
I will fix this and keep it in mind. Thank you.

Hao

>
> thanks,
>
> greg k-h
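For illustration only, the split suggested in the review could look roughly like the following userspace sketch. It is not the code from any later revision of this series: struct port_table, both function names, and the chosen error codes are hypothetical stand-ins for struct dfl_fpga_cdev and the detach/attach helpers.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_PORTS 4

/* Hypothetical stand-in for struct dfl_fpga_cdev: tracks released ports. */
struct port_table {
	bool released[MAX_PORTS];
};

/* Release (detach) a port so that it can later be handed to a VF. */
int port_table_release(struct port_table *t, int port_id)
{
	if (port_id < 0 || port_id >= MAX_PORTS)
		return -ENODEV;
	if (t->released[port_id])
		return -EBUSY;		/* already released */
	t->released[port_id] = true;
	return 0;
}

/* Assign (attach) a previously released port back to the PF. */
int port_table_assign(struct port_table *t, int port_id)
{
	if (port_id < 0 || port_id >= MAX_PORTS)
		return -ENODEV;
	if (!t->released[port_id])
		return -EINVAL;		/* port was not released */
	t->released[port_id] = false;
	return 0;
}
```

A call site then reads port_table_release(&t, 1) or port_table_assign(&t, 1), so the reader does not have to look up what a trailing true or false means.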
Re: [PATCH v1 1/2] mm/page_idle: Add support for per-pid page_idle using virtual indexing
On Wed, Jul 24, 2019 at 01:28:42PM +0900, Minchan Kim wrote: > On Tue, Jul 23, 2019 at 10:20:49AM -0400, Joel Fernandes wrote: > > On Tue, Jul 23, 2019 at 03:13:58PM +0900, Minchan Kim wrote: > > > Hi Joel, > > > > > > On Mon, Jul 22, 2019 at 05:32:04PM -0400, Joel Fernandes (Google) wrote: > > > > The page_idle tracking feature currently requires looking up the pagemap > > > > for a process followed by interacting with /sys/kernel/mm/page_idle. > > > > This is quite cumbersome and can be error-prone too. If between > > > > > > cumbersome: That's the fair tradeoff between idle page tracking and > > > clear_refs because idle page tracking could check even though the page > > > is not mapped. > > > > It is a fair tradeoff, but could be made simpler. The userspace code got > > reduced by a good amount as well. > > > > > error-prone: What's the error? > > > > We see in normal Android usage, that some of the time pages appear not to be > > idle even when they really are idle. Reproducing this is a bit unpredictable > > and happens at random occasions. With this new interface, we are seeing this > > happen much less often. > > I don't know how you tested. Maybe that could be contributed by > swapping out or shared pages touched by other processes or some kernel > behavior not to keep access bit of their operation. It could be something along these lines is my thinking as well. So we know it already has issues due to what you mentioned, I am not sure what else needs investigation? > Please investigate more what's the root cause. That would be important > point to justify for the patch motivation. The motivation is security. I am dropping the 'accuracy' factor I mentioned from the patch description since it created a lot of confusion. > > > > Moreover looking up PFN from pagemap in Android devices is not > > > > supported by unprivileged process and requires SYS_ADMIN and gives 0 for > > > > the PFN.
> > > > > > > > This patch adds support to directly interact with page_idle tracking at > > > > the PID level by introducing a /proc/<pid>/page_idle file. This > > > > eliminates the need for userspace to calculate the mapping of the page. > > > > It follows the exact same semantics as the global > > > > /sys/kernel/mm/page_idle, however it is easier to use for some usecases > > > > where looking up PFN is not needed and also does not require SYS_ADMIN. > > > > > > Ah, so the primary goal is to provide a convenience interface and it would > > > help accuracy, too. IOW, accuracy is not your main goal? > > > > There are a couple of primary goals: Security, convenience and also solving > > the accuracy/reliability problem we are seeing. Do keep in mind looking up > > PFN has security implications. The PFN field in pagemap is zeroed if the user > > does not have CAP_SYS_ADMIN. > > Maybe you don't need PFN. is it? With the traditional idle tracking, PFN is needed which has the mentioned security issues. This patch solves it. And the interface is identical and familiar to the existing page_idle bitmap interface. > > > > In Android, we are using this for the heap profiler (heapprofd) which > > > > profiles and pin points code paths which allocates and leaves memory > > > > idle for long periods of time. > > > > > > So the goal is to detect idle pages with idle memory tracking? > > > > Isn't that what idle memory tracking does? > > To me, it's rather misleading. Please read motivation section in document. > The feature would be good to detect workingset pages, not idle pages > because workingset pages are never freed, swapped out and even we could > count on newly allocated pages. > > Motivation > == > > The idle page tracking feature allows to track which memory pages are being > accessed by a workload and which are idle.
This information can be useful for > estimating the workload's working set size, which, in turn, can be taken into > account when configuring the workload parameters, setting memory cgroup limits, > or deciding where to place the workload within a compute cluster. As we discussed by chat, we could collect additional metadata to check if pages were swapped or freed ever since the time we marked them as idle. However this can be incremental improvement. > > > It couldn't work well because such idle pages could finally swap out and > > > lose every flag of the page descriptor which is working mechanism of > > > idle page tracking. It should have named "workingset page tracking", > > > not "idle page tracking". > > > > The heap profiler that uses page-idle tracking is not to measure working set, > > but to look for pages that are idle for long periods of time. > > It's important part. Please include it in the description so that people > understand what the usecase is. As I said above, if it aims for finding > idle pages during the period, current idle page tracking feature is not > good ironically. Ok, I will mention. > > Thanks for bringing up the swapping c
Re: [PATCH v3 01/12] fpga: dfl: fme: support 512bit data width PR
On Wed, Jul 24, 2019 at 11:35:32AM +0200, Greg KH wrote: > On Tue, Jul 23, 2019 at 12:51:24PM +0800, Wu Hao wrote: > > In early partial reconfiguration private feature, it only > > supports 32bit data width when writing data to hardware for > > PR. 512bit data width PR support is an important optimization > > for some specific solutions (e.g. XEON with FPGA integrated), > > it allows driver to use AVX512 instruction to improve the > > performance of partial reconfiguration. e.g. programming one > > 100MB bitstream image via this 512bit data width PR hardware > > only takes ~300ms, but 32bit revision requires ~3s per test > > result. > > > > Please note now this optimization is only done on revision 2 > > of this PR private feature which is only used in integrated > > solution that AVX512 is always supported. This revision 2 > > hardware doesn't support 32bit PR. > > > > Signed-off-by: Ananda Ravuri > > Signed-off-by: Xu Yilun > > Signed-off-by: Wu Hao > > Acked-by: Alan Tull > > Signed-off-by: Moritz Fischer > > --- > > v2: remove DRV/MODULE_VERSION modifications > > --- > > drivers/fpga/dfl-fme-mgr.c | 110 > > ++--- > > drivers/fpga/dfl-fme-pr.c | 43 +++--- > > drivers/fpga/dfl-fme.h | 2 + > > drivers/fpga/dfl.h | 5 +++ > > 4 files changed, 129 insertions(+), 31 deletions(-) > > > > diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c > > index b3f7eee..46e17f0 100644 > > --- a/drivers/fpga/dfl-fme-mgr.c > > +++ b/drivers/fpga/dfl-fme-mgr.c > > @@ -22,6 +22,7 @@ > > #include > > #include > > > > +#include "dfl.h" > > #include "dfl-fme-pr.h" > > > > /* FME Partial Reconfiguration Sub Feature Register Set */ > > @@ -30,6 +31,7 @@ > > #define FME_PR_STS 0x10 > > #define FME_PR_DATA0x18 > > #define FME_PR_ERR 0x20 > > +#define FME_PR_512_DATA0x40 /* Data Register for 512bit > > datawidth PR */ > > #define FME_PR_INTFC_ID_L 0xA8 > > #define FME_PR_INTFC_ID_H 0xB0 > > > > @@ -67,8 +69,43 @@ > > #define PR_WAIT_TIMEOUT 800 > > #define PR_HOST_STATUS_IDLE0 
> > > > +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512) > > + > > +#include > > +#include > > + > > +static inline int is_cpu_avx512_enabled(void) > > +{ > > + return cpu_feature_enabled(X86_FEATURE_AVX512F); > > +} > > That's a very arch specific function, why would a driver ever care about > this? Yes, this is only applied to a specific FPGA solution, which FPGA has been integrated with XEON. Hardware indicates this using register to software. As it's cpu integrated solution, so CPU always has this AVX512 capability. The only check we do, is make sure this is not manually disabled by kernel. With this hardware, software could use AVX512 to accelerate the FPGA partial reconfiguration as mentioned in the patch commit message. It brings performance benifits to people who uses it. This is only one optimization (512 vs 32bit data write to hw) for a specific hardware. For other discrete solutions, e.g. FPGA PCIe Card, this is not used at all as driver does check hardware register to avoid any AVX512 code. > > > + > > +static inline void copy512(const void *src, void __iomem *dst) > > +{ > > + kernel_fpu_begin(); > > + > > + asm volatile("vmovdqu64 (%0), %%zmm0;" > > +"vmovntdq %%zmm0, (%1);" > > +: > > +: "r"(src), "r"(dst) > > +: "memory"); > > + > > + kernel_fpu_end(); > > +} > > Shouldn't this be an arch-specific function somewhere? Burying this in > a random driver is not ok. Please make this generic for all systems. If more people need the same avx operation like this in kernel, then maybe this can be moved to some arch-specific lib code somewhere as some common functions to everybody, but i am not very sure if this is the case. Let me think about this more. > > > +#else > > +static inline int is_cpu_avx512_enabled(void) > > +{ > > + return 0; > > +} > > + > > +static inline void copy512(const void *src, void __iomem *dst) > > +{ > > + WARN_ON_ONCE(1); > > Are you trying to get reports from syzbot? :) Oh.. no.. I will remove it. :) Thank you very much! 
Hao > > Please fix this all up. > > greg k-h
[PATCH] hung_task: Allow printing warnings every check interval
The hung task detector has one timeout and two associated actions:
- issuing warnings with names and stacks of blocked tasks
- panic()

We want switches to panic (and reboot) if there's a task in uninterruptible sleep for some minutes - at that moment something ugly has happened and the box needs a reboot. But we also want to detect conditions that are "out of range" or approaching the point of failure. Under such conditions we want to issue an "early warning" of an impending failure, minutes before the switch is going to panic.

Those "early warnings" serve a purpose while monitoring the network infrastructure. They are also valuable for post-mortem analysis, when the logs from userspace applications aren't enough. Furthermore, we have a test pool of long-running DUTs that are constantly under close to real-world load for weeks. Such early warnings allowed us to find some bottlenecks without much engineering effort.

There are also not-yet-upstream patches for other kinds of "early warnings", such as prints whenever a mutex/semaphore is released after being held for a long time, but those patches are much more intricate and have a runtime cost.

It seems rather easy to add printing of tasks and their stacks for notification and debugging purposes to the hung task detector without complicating the code or adding major cost (prints use the KERN_INFO loglevel and so don't go to the console, only into the dmesg log).

Since commit a2e514453861 ("kernel/hung_task.c: allow to set checking interval separately from timeout") it's possible to set the checking interval for the hung task detector with `hung_task_check_interval_secs`.

Provide a `hung_task_interval_warnings` sysctl that allows printing hung tasks every detection interval. It's not ratelimited, so root should be cautious when configuring it.
Cc: Andrew Morton Cc: Dmitry Vyukov Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Tetsuo Handa Cc: Thomas Gleixner Cc: "Peter Zijlstra (Intel)" Cc: Vasiliy Khoruzhick Cc: linux-doc@vger.kernel.org Cc: linux-fsde...@vger.kernel.org Signed-off-by: Dmitry Safonov --- Documentation/admin-guide/sysctl/kernel.rst | 20 - include/linux/sched/sysctl.h| 1 + kernel/hung_task.c | 50 ++--- kernel/sysctl.c | 8 4 files changed, 62 insertions(+), 17 deletions(-) diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 032c7cd3cede..2e36620ec1e4 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -45,6 +45,7 @@ show up in /proc/sys/kernel: - hung_task_timeout_secs - hung_task_check_interval_secs - hung_task_warnings +- hung_task_interval_warnings - hyperv_record_panic_msg - kexec_load_disabled - kptr_restrict @@ -383,14 +384,29 @@ Possible values to set are in range {0..LONG_MAX/HZ}. hung_task_warnings: === -The maximum number of warnings to report. During a check interval -if a hung task is detected, this value is decreased by 1. +The maximum number of warnings to report. If after timeout a hung +task is present, this value is decreased by 1 every check interval, +producing a warning. When this value reaches 0, no more warnings will be reported. This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. -1: report an infinite number of warnings. +hung_task_interval_warnings: +=== + +The same as hung_task_warnings, but set the number of interval +warnings to be issued about detected hung tasks during check +interval. That will produce warnings *before* the timeout happens. +If a hung task is detected during check interval, this value is +decreased by 1. When this value reaches 0, only timeout warnings +will be reported. +This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. + +-1: report an infinite number of check interval warnings. 
+ + hyperv_record_panic_msg: diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index d4f6215ee03f..89f55e914673 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -12,6 +12,7 @@ extern unsigned int sysctl_hung_task_panic; extern unsigned long sysctl_hung_task_timeout_secs; extern unsigned long sysctl_hung_task_check_interval_secs; extern int sysctl_hung_task_warnings; +extern int sysctl_hung_task_interval_warnings; extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); diff --git a/kernel/hung_task.c b/kernel/hung_task.c index 14a625c16cb3..cd971eef8226 100644 --- a/kernel/hung_task.c +++ b/kernel/hung_task.c @@ -49,6 +49,7 @@ unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_ unsigned long __read_mostly sysctl_hung_task_check_interval_secs; int __read_mostly sysctl_hung_ta
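As a rough illustration of the counter semantics described in the documentation hunk above (a positive value is consumed one warning at a time, 0 silences warnings, -1 means unlimited), here is a small userspace model. It is not the code from this patch, whose kernel/hung_task.c hunk is cut off here; the struct, the function names, and the rule used to pick between the two budgets are assumptions made for the sketch.

```c
#include <stdbool.h>

/*
 * Consume one unit of a warning budget.
 * budget > 0: that many warnings are left; 0: stay silent;
 * negative (documented as -1): unlimited, never decremented.
 */
bool consume_warning(int *budget)
{
	if (*budget == 0)
		return false;
	if (*budget > 0)
		(*budget)--;
	return true;
}

/* Hypothetical model of the two sysctls as independent budgets. */
struct hung_task_budgets {
	int timeout_warnings;	/* models hung_task_warnings */
	int interval_warnings;	/* models hung_task_interval_warnings */
};

/*
 * Assumption: a task blocked for at least timeout_secs draws from the
 * timeout budget; a task seen blocked earlier, at a check interval,
 * draws from the interval budget.
 */
bool should_report(struct hung_task_budgets *b,
		   unsigned long blocked_secs, unsigned long timeout_secs)
{
	if (blocked_secs >= timeout_secs)
		return consume_warning(&b->timeout_warnings);
	return consume_warning(&b->interval_warnings);
}
```

With interval_warnings set to -1 and timeout_warnings set to 1, every early check reports, the first timeout reports, and later timeouts are silent.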
[PATCH v15 00/13] TCU patchset v15
Hi,

This is V15 of my Ingenic TCU patchset.

The big change since V14 is that the custom MFD driver (ex patch 04/13) was dropped in favor of a small patch to syscon and a "simple-mfd" compatible.

The patchset was based on mips/mips-next, but all patches except the last one will apply cleanly on v5.3-rc1.

Changelog:

* [02/13]: Remove info about MFD driver
* [03/13]: Add "simple-mfd" compatible string
* [04/13]: New patch
* [05/13]: - Use CLK_OF_DECLARE_DRIVER since we use "simple-mfd"
           - Use device_node_to_regmap()
* [06/13]: Use device_node_to_regmap()
* [07/13]: Use device_node_to_regmap()
* [09/13]: Add "simple-mfd" compatible string

Cheers,
-Paul
[PATCH v15 01/13] dt-bindings: ingenic: Add DT bindings for TCU clocks
This header provides clock numbers for the ingenic,tcu DT binding.

Signed-off-by: Paul Cercueil
Tested-by: Mathieu Malaterre
Tested-by: Artur Rojek
Reviewed-by: Rob Herring
Acked-by: Stephen Boyd
---

Notes:
    v2: Use SPDX identifier for the license
    v3/v4: No change
    v5: s/JZ47*_/TCU_/ and dropped *_CLK_LAST defines
    v6-v15: No change

 include/dt-bindings/clock/ingenic,tcu.h | 20
 1 file changed, 20 insertions(+)
 create mode 100644 include/dt-bindings/clock/ingenic,tcu.h

diff --git a/include/dt-bindings/clock/ingenic,tcu.h b/include/dt-bindings/clock/ingenic,tcu.h
new file mode 100644
index ..d569650a7945
--- /dev/null
+++ b/include/dt-bindings/clock/ingenic,tcu.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This header provides clock numbers for the ingenic,tcu DT binding.
+ */
+
+#ifndef __DT_BINDINGS_CLOCK_INGENIC_TCU_H__
+#define __DT_BINDINGS_CLOCK_INGENIC_TCU_H__
+
+#define TCU_CLK_TIMER0	0
+#define TCU_CLK_TIMER1	1
+#define TCU_CLK_TIMER2	2
+#define TCU_CLK_TIMER3	3
+#define TCU_CLK_TIMER4	4
+#define TCU_CLK_TIMER5	5
+#define TCU_CLK_TIMER6	6
+#define TCU_CLK_TIMER7	7
+#define TCU_CLK_WDT	8
+#define TCU_CLK_OST	9
+
+#endif /* __DT_BINDINGS_CLOCK_INGENIC_TCU_H__ */
--
2.21.0.593.g511ec345e18
[PATCH v15 02/13] doc: Add doc for the Ingenic TCU hardware
Add documentation about the Timer/Counter Unit (TCU) present in the Ingenic JZ47xx SoCs.

The Timer/Counter Unit (TCU) in Ingenic JZ47xx SoCs is a multi-function hardware block. It features up to eight channels, that can be used as counters, timers, or PWM.

- JZ4725B, JZ4750, JZ4755 only have six TCU channels. The other SoCs all have eight channels.

- JZ4725B introduced a separate channel, called Operating System Timer (OST). It is a 32-bit programmable timer. On JZ4770 and above, it is 64-bit.

- Each one of the TCU channels has its own clock, which can be reparented to three different clocks (pclk, ext, rtc), gated, and reclocked, through their TCSR register.
  * The watchdog and OST hardware blocks also feature a TCSR register with the same format in their register space.
  * The TCU registers used to gate/ungate can also gate/ungate the watchdog and OST clocks.

- Each TCU channel works in one of two modes:
  * mode TCU1: channels cannot work in sleep mode, but are easier to operate.
  * mode TCU2: channels can work in sleep mode, but the operation is a bit more complicated than with TCU1 channels.

- The mode of each TCU channel depends on the SoC used:
  * On the oldest SoCs (up to JZ4740), all of the eight channels operate in TCU1 mode.
  * On JZ4725B, channel 5 operates as TCU2, the others operate as TCU1.
  * On newest SoCs (JZ4750 and above), channels 1-2 operate as TCU2, the others operate as TCU1.

- Each channel can generate an interrupt. Some channels share an interrupt line, some don't, and this changes between SoC versions:
  * On older SoCs (JZ4740 and below), channel 0 and channel 1 have their own interrupt line; channels 2-7 share the last interrupt line.
  * On JZ4725B, channel 0 has its own interrupt; channels 1-5 share one interrupt line; the OST uses the last interrupt line.
  * On newer SoCs (JZ4750 and above), channel 5 has its own interrupt; channels 0-4 and (if eight channels) 6-7 all share one interrupt line; the OST uses the last interrupt line.
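The channel-mode rules listed above could be captured in a small lookup, sketched here in plain C. The enum and the function name are hypothetical and not part of this patchset; the three SoC groups simply mirror the three cases in the list, and the sketch does not validate the channel number against the per-SoC channel count.

```c
#include <stdbool.h>

/* Hypothetical grouping of SoCs, following the three cases listed above. */
enum tcu_soc_group {
	TCU_SOC_UP_TO_JZ4740,
	TCU_SOC_JZ4725B,
	TCU_SOC_JZ4750_AND_ABOVE,
};

/*
 * Return true if the given channel operates in TCU2 mode (usable in
 * sleep mode), false if it operates in TCU1 mode.
 */
bool tcu_channel_is_tcu2(enum tcu_soc_group soc, unsigned int channel)
{
	switch (soc) {
	case TCU_SOC_UP_TO_JZ4740:
		return false;			/* all channels are TCU1 */
	case TCU_SOC_JZ4725B:
		return channel == 5;		/* only channel 5 is TCU2 */
	case TCU_SOC_JZ4750_AND_ABOVE:
		return channel == 1 || channel == 2;
	}
	return false;
}
```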
Signed-off-by: Paul Cercueil Tested-by: Mathieu Malaterre Tested-by: Artur Rojek --- Notes: v4: New patch in this series v5: Added information about number of channels, and improved documentation about channel modes v6: Add info about OST (can be 32-bit on older SoCs) v7-v11: No change v12: Add details about new implementation v13: No change v14: Convert to ReStructured Text v15: Remove info about MFD driver Documentation/index.rst| 1 + Documentation/mips/index.rst | 11 + Documentation/mips/ingenic-tcu.rst | 71 ++ 3 files changed, 83 insertions(+) create mode 100644 Documentation/mips/index.rst create mode 100644 Documentation/mips/ingenic-tcu.rst diff --git a/Documentation/index.rst b/Documentation/index.rst index 70ae148ec980..87214feda41f 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -143,6 +143,7 @@ implementation. arm64/index ia64/index m68k/index + mips/index riscv/index s390/index sh/index diff --git a/Documentation/mips/index.rst b/Documentation/mips/index.rst new file mode 100644 index ..321b4794f3b8 --- /dev/null +++ b/Documentation/mips/index.rst @@ -0,0 +1,11 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=== +MIPS-specific Documentation +=== + +.. toctree:: + :maxdepth: 1 + :numbered: + + ingenic-tcu diff --git a/Documentation/mips/ingenic-tcu.rst b/Documentation/mips/ingenic-tcu.rst new file mode 100644 index ..c4ef4c45aade --- /dev/null +++ b/Documentation/mips/ingenic-tcu.rst @@ -0,0 +1,71 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=== +Ingenic JZ47xx SoCs Timer/Counter Unit hardware +=== + +The Timer/Counter Unit (TCU) in Ingenic JZ47xx SoCs is a multi-function +hardware block. It features up to to eight channels, that can be used as +counters, timers, or PWM. + +- JZ4725B, JZ4750, JZ4755 only have six TCU channels. The other SoCs all + have eight channels. + +- JZ4725B introduced a separate channel, called Operating System Timer + (OST). It is a 32-bit programmable timer. On JZ4760B and above, it is + 64-bit. 
+ +- Each one of the TCU channels has its own clock, which can be reparented to three + different clocks (pclk, ext, rtc), gated, and reclocked, through their TCSR register. + +- The watchdog and OST hardware blocks also feature a TCSR register with the same + format in their register space. +- The TCU registers used to gate/ungate can also gate/ungate the watchdog and + OST clocks. + +- Each TCU channel works in one of two modes: + +- mode TCU1: channels cannot work in sleep mode, but are easier to + operate. +- mode TCU2: channels can work in sleep mode, but the operation
[PATCH v15 03/13] dt-bindings: Add doc for the Ingenic TCU drivers
Add documentation about how to properly use the Ingenic TCU (Timer/Counter Unit) drivers from devicetree. Signed-off-by: Paul Cercueil Reviewed-by: Rob Herring Tested-by: Mathieu Malaterre Tested-by: Artur Rojek --- Notes: v4: New patch in this series. Corresponds to V2 patches 3-4-5 with added content. v5: - Edited PWM/watchdog DT bindings documentation to point to the new document. - Moved main document to Documentation/devicetree/bindings/timer/ingenic,tcu.txt - Updated documentation to reflect the new devicetree bindings. v6: - Removed PWM/watchdog documentation files as asked by upstream - Removed doc about properties that should be implicit - Removed doc about ingenic,timer-channel / ingenic,clocksource-channel as they are gone - Fix WDT clock name in the binding doc - Fix lengths of register areas in watchdog/pwm nodes v7: No change v8: - Fix address of the PWM node - Added doc about system timer and clocksource children nodes v9: - Remove doc about system timer and clocksource children nodes... - Add doc about ingenic,pwm-channels-mask property v10: No change v11: Fix info about default value of ingenic,pwm-channels-mask v12: Drop sub-nodes for now; they will be introduced in a follow-up patchset. v13: - Revert back to v11. Turns out it was okay. - Remove 'interrupt-parent' of the list of required properties. 
v14: No change v15: Add "simple-mfd" compatible string .../bindings/pwm/ingenic,jz47xx-pwm.txt | 22 --- .../devicetree/bindings/timer/ingenic,tcu.txt | 137 ++ .../bindings/watchdog/ingenic,jz4740-wdt.txt | 17 --- 3 files changed, 137 insertions(+), 39 deletions(-) delete mode 100644 Documentation/devicetree/bindings/pwm/ingenic,jz47xx-pwm.txt create mode 100644 Documentation/devicetree/bindings/timer/ingenic,tcu.txt delete mode 100644 Documentation/devicetree/bindings/watchdog/ingenic,jz4740-wdt.txt diff --git a/Documentation/devicetree/bindings/pwm/ingenic,jz47xx-pwm.txt b/Documentation/devicetree/bindings/pwm/ingenic,jz47xx-pwm.txt deleted file mode 100644 index 493bec80d59b.. --- a/Documentation/devicetree/bindings/pwm/ingenic,jz47xx-pwm.txt +++ /dev/null @@ -1,22 +0,0 @@ -Ingenic JZ47xx PWM Controller -= - -Required properties: -- compatible: Should be "ingenic,jz4740-pwm" -- #pwm-cells: Should be 3. See pwm.txt in this directory for a description - of the cells format. -- clocks : phandle to the external clock. -- clock-names : Should be "ext". - - -Example: - - pwm: pwm@10002000 { - compatible = "ingenic,jz4740-pwm"; - reg = <0x10002000 0x1000>; - - #pwm-cells = <3>; - - clocks = <&ext>; - clock-names = "ext"; - }; diff --git a/Documentation/devicetree/bindings/timer/ingenic,tcu.txt b/Documentation/devicetree/bindings/timer/ingenic,tcu.txt new file mode 100644 index ..5a4b9ddd9470 --- /dev/null +++ b/Documentation/devicetree/bindings/timer/ingenic,tcu.txt @@ -0,0 +1,137 @@ +Ingenic JZ47xx SoCs Timer/Counter Unit devicetree bindings +== + +For a description of the TCU hardware and drivers, have a look at +Documentation/mips/ingenic-tcu.txt. + +Required properties: + +- compatible: Must be one of: + * ingenic,jz4740-tcu + * ingenic,jz4725b-tcu + * ingenic,jz4770-tcu + followed by "simple-mfd". +- reg: Should be the offset/length value corresponding to the TCU registers +- clocks: List of phandle & clock specifiers for clocks external to the TCU. 
+ The "pclk", "rtc" and "ext" clocks should be provided. The "tcu" clock + should be provided if the SoC has it. +- clock-names: List of name strings for the external clocks. +- #clock-cells: Should be <1>; + Clock consumers specify this argument to identify a clock. The valid values + may be found in . +- interrupt-controller : Identifies the node as an interrupt controller +- #interrupt-cells : Specifies the number of cells needed to encode an + interrupt source. The value should be 1. +- interrupts : Specifies the interrupt the controller is connected to. + +Optional properties: + +- ingenic,pwm-channels-mask: Bitmask of TCU channels reserved for PWM use. + Default value is 0xfc. + + +Children nodes +== + + +PWM node: +- + +Required properties: + +- compatible: Must be one of: + * ingenic,jz4740-pwm + * ingenic,jz4725b-pwm +- #pwm-cells: Should be 3. See ../pwm/pwm.txt for a description of the cell + format. +- clocks: List of phandle & clock specifiers for the TCU clocks. +- clock-names: List of name strings for the TCU clocks. + + +Watchdog node: +-- + +Required properties: + +- compatible: Must be "ingenic,jz4740-w
[PATCH v15 04/13] mfd/syscon: Add device_node_to_regmap()
device_node_to_regmap() is exactly like syscon_node_to_regmap(), but it does not check that the node is compatible with "syscon", and won't attach the first clock it finds to the regmap. The rationale behind this, is that one device node with a standard compatible string "foo,bar" can be covered by multiple drivers sharing a regmap, or by a single driver doing all the job without a regmap, but these are implementation details which shouldn't reflect on the devicetree. Signed-off-by: Paul Cercueil --- Notes: v15: New patch drivers/mfd/syscon.c | 46 +- include/linux/mfd/syscon.h | 6 + 2 files changed, 36 insertions(+), 16 deletions(-) diff --git a/drivers/mfd/syscon.c b/drivers/mfd/syscon.c index b65e585fc8c6..660723276481 100644 --- a/drivers/mfd/syscon.c +++ b/drivers/mfd/syscon.c @@ -40,7 +40,7 @@ static const struct regmap_config syscon_regmap_config = { .reg_stride = 4, }; -static struct syscon *of_syscon_register(struct device_node *np) +static struct syscon *of_syscon_register(struct device_node *np, bool check_clk) { struct clk *clk; struct syscon *syscon; @@ -51,9 +51,6 @@ static struct syscon *of_syscon_register(struct device_node *np) struct regmap_config syscon_config = syscon_regmap_config; struct resource res; - if (!of_device_is_compatible(np, "syscon")) - return ERR_PTR(-EINVAL); - syscon = kzalloc(sizeof(*syscon), GFP_KERNEL); if (!syscon) return ERR_PTR(-ENOMEM); @@ -117,16 +114,18 @@ static struct syscon *of_syscon_register(struct device_node *np) goto err_regmap; } - clk = of_clk_get(np, 0); - if (IS_ERR(clk)) { - ret = PTR_ERR(clk); - /* clock is optional */ - if (ret != -ENOENT) - goto err_clk; - } else { - ret = regmap_mmio_attach_clk(regmap, clk); - if (ret) - goto err_attach; + if (check_clk) { + clk = of_clk_get(np, 0); + if (IS_ERR(clk)) { + ret = PTR_ERR(clk); + /* clock is optional */ + if (ret != -ENOENT) + goto err_clk; + } else { + ret = regmap_mmio_attach_clk(regmap, clk); + if (ret) + goto err_attach; + } } syscon->regmap = regmap; 
@@ -150,7 +149,8 @@ static struct syscon *of_syscon_register(struct device_node *np) return ERR_PTR(ret); } -struct regmap *syscon_node_to_regmap(struct device_node *np) +static struct regmap *device_node_get_regmap(struct device_node *np, +bool check_clk) { struct syscon *entry, *syscon = NULL; @@ -165,13 +165,27 @@ struct regmap *syscon_node_to_regmap(struct device_node *np) spin_unlock(&syscon_list_slock); if (!syscon) - syscon = of_syscon_register(np); + syscon = of_syscon_register(np, check_clk); if (IS_ERR(syscon)) return ERR_CAST(syscon); return syscon->regmap; } + +struct regmap *device_node_to_regmap(struct device_node *np) +{ + return device_node_get_regmap(np, false); +} +EXPORT_SYMBOL_GPL(device_node_to_regmap); + +struct regmap *syscon_node_to_regmap(struct device_node *np) +{ + if (!of_device_is_compatible(np, "syscon")) + return ERR_PTR(-EINVAL); + + return device_node_get_regmap(np, true); +} EXPORT_SYMBOL_GPL(syscon_node_to_regmap); struct regmap *syscon_regmap_lookup_by_compatible(const char *s) diff --git a/include/linux/mfd/syscon.h b/include/linux/mfd/syscon.h index 8cfda0554381..112dc66262cc 100644 --- a/include/linux/mfd/syscon.h +++ b/include/linux/mfd/syscon.h @@ -17,12 +17,18 @@ struct device_node; #ifdef CONFIG_MFD_SYSCON +extern struct regmap *device_node_to_regmap(struct device_node *np); extern struct regmap *syscon_node_to_regmap(struct device_node *np); extern struct regmap *syscon_regmap_lookup_by_compatible(const char *s); extern struct regmap *syscon_regmap_lookup_by_phandle( struct device_node *np, const char *property); #else +static inline struct regmap *device_node_to_regmap(struct device_node *np) +{ + return ERR_PTR(-ENOTSUPP); +} + static inline struct regmap *syscon_node_to_regmap(struct device_node *np) { return ERR_PTR(-ENOTSUPP); -- 2.21.0.593.g511ec345e18
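To make the difference between the two entry points easier to see, here is a simplified userspace model of the wrapper structure introduced by this patch. It is only a sketch: the mock_ types and functions are hypothetical stand-ins, the lookup list and locking of the real code are left out, and a NULL return replaces the ERR_PTR(-EINVAL) used by the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct device_node and struct regmap. */
struct mock_node {
	const char *compatible[4];	/* NULL-terminated list of strings */
	bool has_clock;
};

struct mock_regmap {
	bool clk_attached;
};

static bool mock_node_is_compatible(const struct mock_node *np, const char *name)
{
	for (size_t i = 0; i < 4 && np->compatible[i]; i++)
		if (strcmp(np->compatible[i], name) == 0)
			return true;
	return false;
}

/* Shared helper: the node's clock is only attached when check_clk is set. */
static struct mock_regmap *mock_node_get_regmap(const struct mock_node *np,
						struct mock_regmap *map,
						bool check_clk)
{
	map->clk_attached = check_clk && np->has_clock;
	return map;
}

/* No "syscon" compatible check, no clock attached to the regmap. */
struct mock_regmap *mock_device_node_to_regmap(const struct mock_node *np,
					       struct mock_regmap *map)
{
	return mock_node_get_regmap(np, map, false);
}

/* Requires a "syscon" compatible and attaches the first clock, if any. */
struct mock_regmap *mock_syscon_node_to_regmap(const struct mock_node *np,
					       struct mock_regmap *map)
{
	if (!mock_node_is_compatible(np, "syscon"))
		return NULL;	/* the kernel code returns ERR_PTR(-EINVAL) */
	return mock_node_get_regmap(np, map, true);
}
```

In this model a node that lists only "ingenic,jz4740-tcu" and "simple-mfd" is rejected by the syscon variant but accepted by the device-node variant, which is the case the TCU drivers in this series rely on.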
[PATCH v15 05/13] clk: ingenic: Add driver for the TCU clocks
Add driver to support the clocks provided by the Timer/Counter Unit (TCU) of the JZ47xx SoCs from Ingenic. Signed-off-by: Paul Cercueil Tested-by: Mathieu Malaterre Tested-by: Artur Rojek --- Notes: v12: New patch v13: - Don't enable/disable the TCU clock on demand. Enable it in the probe and call it a day. - Register suspend callbacks to gate/ungate the TCU clock on suspend/resume. - Use pr_fmt and pr_crit instead of custom TCU_ERR() macro - Remove useless dependency on COMMON_CLK in Kconfig - Remove registration of clkdev v14: Change %i to %d v15: - Use CLK_OF_DECLARE_DRIVER macro since we use "simple-mfd" - Use device_node_to_regmap() drivers/clk/ingenic/Kconfig | 10 +- drivers/clk/ingenic/Makefile | 1 + drivers/clk/ingenic/tcu.c| 474 +++ 3 files changed, 484 insertions(+), 1 deletion(-) create mode 100644 drivers/clk/ingenic/tcu.c diff --git a/drivers/clk/ingenic/Kconfig b/drivers/clk/ingenic/Kconfig index fe8db93cf21a..1cb489959a99 100644 --- a/drivers/clk/ingenic/Kconfig +++ b/drivers/clk/ingenic/Kconfig @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only -menu "Ingenic JZ47xx CGU drivers" +menu "Ingenic SoCs drivers" depends on MIPS config INGENIC_CGU_COMMON @@ -45,4 +45,12 @@ config INGENIC_CGU_JZ4780 If building for a JZ4780 SoC, you want to say Y here. +config INGENIC_TCU_CLK + bool "Ingenic JZ47xx TCU clocks driver" + default MACH_INGENIC + select MFD_SYSCON + help + Support the clocks of the Timer/Counter Unit (TCU) of the Ingenic + JZ47xx SoCs. 
+ endmenu diff --git a/drivers/clk/ingenic/Makefile b/drivers/clk/ingenic/Makefile index 250570a809d3..097220b05131 100644 --- a/drivers/clk/ingenic/Makefile +++ b/drivers/clk/ingenic/Makefile @@ -4,3 +4,4 @@ obj-$(CONFIG_INGENIC_CGU_JZ4740)+= jz4740-cgu.o obj-$(CONFIG_INGENIC_CGU_JZ4725B) += jz4725b-cgu.o obj-$(CONFIG_INGENIC_CGU_JZ4770) += jz4770-cgu.o obj-$(CONFIG_INGENIC_CGU_JZ4780) += jz4780-cgu.o +obj-$(CONFIG_INGENIC_TCU_CLK) += tcu.o diff --git a/drivers/clk/ingenic/tcu.c b/drivers/clk/ingenic/tcu.c new file mode 100644 index ..a1a5f9cb439e --- /dev/null +++ b/drivers/clk/ingenic/tcu.c @@ -0,0 +1,474 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * JZ47xx SoCs TCU clocks driver + * Copyright (C) 2019 Paul Cercueil + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +/* 8 channels max + watchdog + OST */ +#define TCU_CLK_COUNT 10 + +#undef pr_fmt +#define pr_fmt(fmt) "ingenic-tcu-clk: " fmt + +enum tcu_clk_parent { + TCU_PARENT_PCLK, + TCU_PARENT_RTC, + TCU_PARENT_EXT, +}; + +struct ingenic_soc_info { + unsigned int num_channels; + bool has_ost; + bool has_tcu_clk; +}; + +struct ingenic_tcu_clk_info { + struct clk_init_data init_data; + u8 gate_bit; + u8 tcsr_reg; +}; + +struct ingenic_tcu_clk { + struct clk_hw hw; + unsigned int idx; + struct ingenic_tcu *tcu; + const struct ingenic_tcu_clk_info *info; +}; + +struct ingenic_tcu { + const struct ingenic_soc_info *soc_info; + struct regmap *map; + struct clk *clk; + + struct clk_hw_onecell_data *clocks; +}; + +static struct ingenic_tcu *ingenic_tcu; + +static inline struct ingenic_tcu_clk *to_tcu_clk(struct clk_hw *hw) +{ + return container_of(hw, struct ingenic_tcu_clk, hw); +} + +static int ingenic_tcu_enable(struct clk_hw *hw) +{ + struct ingenic_tcu_clk *tcu_clk = to_tcu_clk(hw); + const struct ingenic_tcu_clk_info *info = tcu_clk->info; + struct ingenic_tcu *tcu = tcu_clk->tcu; + + regmap_write(tcu->map, TCU_REG_TSCR, BIT(info->gate_bit)); + + return 0; +} 
+ +static void ingenic_tcu_disable(struct clk_hw *hw) +{ + struct ingenic_tcu_clk *tcu_clk = to_tcu_clk(hw); + const struct ingenic_tcu_clk_info *info = tcu_clk->info; + struct ingenic_tcu *tcu = tcu_clk->tcu; + + regmap_write(tcu->map, TCU_REG_TSSR, BIT(info->gate_bit)); +} + +static int ingenic_tcu_is_enabled(struct clk_hw *hw) +{ + struct ingenic_tcu_clk *tcu_clk = to_tcu_clk(hw); + const struct ingenic_tcu_clk_info *info = tcu_clk->info; + unsigned int value; + + regmap_read(tcu_clk->tcu->map, TCU_REG_TSR, &value); + + return !(value & BIT(info->gate_bit)); +} + +static bool ingenic_tcu_enable_regs(struct clk_hw *hw) +{ + struct ingenic_tcu_clk *tcu_clk = to_tcu_clk(hw); + const struct ingenic_tcu_clk_info *info = tcu_clk->info; + struct ingenic_tcu *tcu = tcu_clk->tcu; + bool enabled = false; + + /* +* If the SoC has no global TCU clock, we must ungate the channel's +* clock to be able to access its registers. +* If we have a TCU clock, it will be enabled automatically as it has +*
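The enable/disable/is_enabled callbacks quoted above follow a set/clear register pattern, which can be modelled in user space. This is a sketch under an assumption: that a write to TSSR sets bits in TSR and a write to TSCR clears them, which is how the register names and the negation in ingenic_tcu_is_enabled() read. `struct mock_tcu` and the `mock_*` helpers are hypothetical and do not touch real MMIO.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the TCU stop register; a set bit means "clock stopped". */
struct mock_tcu {
	uint32_t tsr;
};

/* Models a write to TCU_REG_TSSR: the written bits are set in TSR. */
static void mock_write_tssr(struct mock_tcu *tcu, uint32_t bits)
{
	tcu->tsr |= bits;
}

/* Models a write to TCU_REG_TSCR: the written bits are cleared in TSR. */
static void mock_write_tscr(struct mock_tcu *tcu, uint32_t bits)
{
	tcu->tsr &= ~bits;
}

/* Mirrors ingenic_tcu_enable(): clear the stop bit to ungate the clock. */
static void mock_clk_enable(struct mock_tcu *tcu, unsigned int gate_bit)
{
	mock_write_tscr(tcu, UINT32_C(1) << gate_bit);
}

/* Mirrors ingenic_tcu_disable(): set the stop bit to gate the clock. */
static void mock_clk_disable(struct mock_tcu *tcu, unsigned int gate_bit)
{
	mock_write_tssr(tcu, UINT32_C(1) << gate_bit);
}

/* Mirrors ingenic_tcu_is_enabled(): enabled means the stop bit is clear. */
static bool mock_clk_is_enabled(const struct mock_tcu *tcu, unsigned int gate_bit)
{
	return !(tcu->tsr & (UINT32_C(1) << gate_bit));
}
```

A design point visible even in the model: since each write only names the bits to change, no read-modify-write cycle is needed, so different channels can be gated independently.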
[PATCH v15 06/13] irqchip: Add irq-ingenic-tcu driver
This driver handles the interrupt controller built in the Timer/Counter Unit (TCU) of the JZ47xx SoCs from Ingenic. Signed-off-by: Paul Cercueil Tested-by: Mathieu Malaterre Tested-by: Artur Rojek Reviewed-by: Thomas Gleixner --- Notes: v12: New patch v13: No change v14: Remove empty lines in structure definitions v15: Use device_node_to_regmap() drivers/irqchip/Kconfig | 11 ++ drivers/irqchip/Makefile | 1 + drivers/irqchip/irq-ingenic-tcu.c | 182 ++ 3 files changed, 194 insertions(+) create mode 100644 drivers/irqchip/irq-ingenic-tcu.c diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig index 80e10f4e213a..3c8308e6b3a7 100644 --- a/drivers/irqchip/Kconfig +++ b/drivers/irqchip/Kconfig @@ -315,6 +315,17 @@ config INGENIC_IRQ depends on MACH_INGENIC default y +config INGENIC_TCU_IRQ + bool "Ingenic JZ47xx TCU interrupt controller" + default MACH_INGENIC + depends on MIPS || COMPILE_TEST + select MFD_SYSCON + help + Support for interrupts in the Timer/Counter Unit (TCU) of the Ingenic + JZ47xx SoCs. + + If unsure, say N. 
+ config RENESAS_H8300H_INTC bool select IRQ_DOMAIN diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile index 8d0fcec6ab23..cc7c43932f16 100644 --- a/drivers/irqchip/Makefile +++ b/drivers/irqchip/Makefile @@ -75,6 +75,7 @@ obj-$(CONFIG_RENESAS_H8300H_INTC) += irq-renesas-h8300h.o obj-$(CONFIG_RENESAS_H8S_INTC) += irq-renesas-h8s.o obj-$(CONFIG_ARCH_SA1100) += irq-sa11x0.o obj-$(CONFIG_INGENIC_IRQ) += irq-ingenic.o +obj-$(CONFIG_INGENIC_TCU_IRQ) += irq-ingenic-tcu.o obj-$(CONFIG_IMX_GPCV2)+= irq-imx-gpcv2.o obj-$(CONFIG_PIC32_EVIC) += irq-pic32-evic.o obj-$(CONFIG_MSCC_OCELOT_IRQ) += irq-mscc-ocelot.o diff --git a/drivers/irqchip/irq-ingenic-tcu.c b/drivers/irqchip/irq-ingenic-tcu.c new file mode 100644 index ..6d05cefe9d79 --- /dev/null +++ b/drivers/irqchip/irq-ingenic-tcu.c @@ -0,0 +1,182 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * JZ47xx SoCs TCU IRQ driver + * Copyright (C) 2019 Paul Cercueil + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +struct ingenic_tcu { + struct regmap *map; + struct clk *clk; + struct irq_domain *domain; + unsigned int nb_parent_irqs; + u32 parent_irqs[3]; +}; + +static void ingenic_tcu_intc_cascade(struct irq_desc *desc) +{ + struct irq_chip *irq_chip = irq_data_get_irq_chip(&desc->irq_data); + struct irq_domain *domain = irq_desc_get_handler_data(desc); + struct irq_chip_generic *gc = irq_get_domain_generic_chip(domain, 0); + struct regmap *map = gc->private; + uint32_t irq_reg, irq_mask; + unsigned int i; + + regmap_read(map, TCU_REG_TFR, &irq_reg); + regmap_read(map, TCU_REG_TMR, &irq_mask); + + chained_irq_enter(irq_chip, desc); + + irq_reg &= ~irq_mask; + + for_each_set_bit(i, (unsigned long *)&irq_reg, 32) + generic_handle_irq(irq_linear_revmap(domain, i)); + + chained_irq_exit(irq_chip, desc); +} + +static void ingenic_tcu_gc_unmask_enable_reg(struct irq_data *d) +{ + struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d); + struct irq_chip_type *ct = 
irq_data_get_chip_type(d); + struct regmap *map = gc->private; + u32 mask = d->mask; + + irq_gc_lock(gc); + regmap_write(map, ct->regs.ack, mask); + regmap_write(map, ct->regs.enable, mask); + *ct->mask_cache |= mask; + irq_gc_unlock(gc); +} + +static void ingenic_tcu_gc_mask_disable_reg(struct irq_data *d) +{ + struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d); + struct irq_chip_type *ct = irq_data_get_chip_type(d); + struct regmap *map = gc->private; + u32 mask = d->mask; + + irq_gc_lock(gc); + regmap_write(map, ct->regs.disable, mask); + *ct->mask_cache &= ~mask; + irq_gc_unlock(gc); +} + +static void ingenic_tcu_gc_mask_disable_reg_and_ack(struct irq_data *d) +{ + struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d); + struct irq_chip_type *ct = irq_data_get_chip_type(d); + struct regmap *map = gc->private; + u32 mask = d->mask; + + irq_gc_lock(gc); + regmap_write(map, ct->regs.ack, mask); + regmap_write(map, ct->regs.disable, mask); + irq_gc_unlock(gc); +} + +static int __init ingenic_tcu_irq_init(struct device_node *np, + struct device_node *parent) +{ + struct irq_chip_generic *gc; + struct irq_chip_type *ct; + struct ingenic_tcu *tcu; + struct regmap *map; + unsigned int i; + int ret, irqs; + + map = device_node_to_regmap(np); + if (IS_ERR(map)) + ret
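The core of ingenic_tcu_intc_cascade() is a small piece of bit arithmetic: flag bits that are currently masked are dropped, and each remaining bit is dispatched. A user-space sketch of just that step follows; `mock_dispatch` and its `handle` callback are hypothetical names, the callback standing in for generic_handle_irq().

```c
#include <stdint.h>

/*
 * Models the pending computation of the cascade handler:
 * pending = flags & ~mask, then every set bit is handled in
 * ascending order. Returns the number of interrupts dispatched.
 */
static unsigned int mock_dispatch(uint32_t flags, uint32_t mask,
				  void (*handle)(unsigned int hwirq))
{
	uint32_t pending = flags & ~mask;
	unsigned int count = 0;

	for (unsigned int i = 0; i < 32; i++) {
		if (pending & (UINT32_C(1) << i)) {
			if (handle)
				handle(i);	/* stand-in for generic_handle_irq() */
			count++;
		}
	}
	return count;
}
```

For example, with flags 0x7 and mask 0x2 only bits 0 and 2 are dispatched, because bit 1 is flagged but masked.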
[PATCH v15 08/13] clk: jz4740: Add TCU clock
Add the missing TCU clock to the list of clocks supplied by the CGU for the JZ4740 SoC. Signed-off-by: Paul Cercueil Tested-by: Mathieu Malaterre Tested-by: Artur Rojek Acked-by: Stephen Boyd Acked-by: Rob Herring --- Notes: v5: New patch v6-v15: No change drivers/clk/ingenic/jz4740-cgu.c | 6 ++ include/dt-bindings/clock/jz4740-cgu.h | 1 + 2 files changed, 7 insertions(+) diff --git a/drivers/clk/ingenic/jz4740-cgu.c b/drivers/clk/ingenic/jz4740-cgu.c index 4c0a20949c2c..67f8a0e14284 100644 --- a/drivers/clk/ingenic/jz4740-cgu.c +++ b/drivers/clk/ingenic/jz4740-cgu.c @@ -222,6 +222,12 @@ static const struct ingenic_cgu_clk_info jz4740_cgu_clocks[] = { .parents = { JZ4740_CLK_EXT, -1, -1, -1 }, .gate = { CGU_REG_CLKGR, 5 }, }, + + [JZ4740_CLK_TCU] = { + "tcu", CGU_CLK_GATE, + .parents = { JZ4740_CLK_EXT, -1, -1, -1 }, + .gate = { CGU_REG_CLKGR, 1 }, + }, }; static void __init jz4740_cgu_init(struct device_node *np) diff --git a/include/dt-bindings/clock/jz4740-cgu.h b/include/dt-bindings/clock/jz4740-cgu.h index 6ed83f926ae7..e82d77028581 100644 --- a/include/dt-bindings/clock/jz4740-cgu.h +++ b/include/dt-bindings/clock/jz4740-cgu.h @@ -34,5 +34,6 @@ #define JZ4740_CLK_ADC 19 #define JZ4740_CLK_I2C 20 #define JZ4740_CLK_AIC 21 +#define JZ4740_CLK_TCU 22 #endif /* __DT_BINDINGS_CLOCK_JZ4740_CGU_H__ */ -- 2.21.0.593.g511ec345e18
[PATCH v15 07/13] clocksource: Add a new timer-ingenic driver
This driver handles the TCU (Timer Counter Unit) present on the Ingenic JZ47xx SoCs, and provides the kernel with a system timer, a clocksource and a sched_clock. Signed-off-by: Paul Cercueil Tested-by: Mathieu Malaterre Tested-by: Artur Rojek Reviewed-by: Thomas Gleixner --- Notes: v2: Use SPDX identifier for the license v3: - Move documentation to its own patch - Search the devicetree for PWM clients, and use all the TCU channels that won't be used for PWM v4: - Add documentation about why we search for PWM clients - Verify that the PWM clients are for the TCU PWM driver v5: Major overhaul. Too many changes to list. Consider it's a new patch. v6: - Add two API functions ingenic_tcu_request_channel and ingenic_tcu_release_channel. To be used by the PWM driver to request the use of a TCU channel. The driver will now dynamically move away the system timer or clocksource to a new TCU channel. - The system timer now defaults to channel 0, the clocksource now defaults to channel 1 and is no more optional. The ingenic,timer-channel and ingenic,clocksource-channel devicetree properties are now gone. - Fix round_rate / set_rate not calculating the prescale divider the same way. This caused problems when (parent_rate / div) would give a non-integer result. The behaviour is correct now. - The clocksource clock is turned off on suspend now. v7: Fix section mismatch by using builtin_platform_driver_probe() v8: - Removed ingenic_tcu_[request,release]_channel, and the mechanism to dynamically change the TCU channel of the system timer or the clocksource. - The driver's devicetree node can now have two more children nodes, that correspond to the system timer and clocksource. For these two, the driver will use the TCU timer that correspond to the memory resource supplied in their respective node. v9: - Removed support for clocksource / timer children devicetree nodes. 
Now, we use a property "ingenic,pwm-channels-mask" to know which PWM channels are reserved for PWM use and should not be used as OS timers. v10: - Use CLK_SET_RATE_UNGATE instead of CLK_SET_RATE_GATE + manually un-gating the clock before changing rate. Same for re-parenting. - Unconditionally create the clocksource and sched_clock even if the SoC possesses a OS Timer. That gives the choice back to the user which clocksource should be selected. - Use subsys_initcall() instead of builtin_platform_driver_probe(). The OS Timer driver calls builtin_platform_driver_probe, which requires the device to be created before that. - Cosmetic cleanups v11: - Change prototype of exported function ingenic_tcu_pwm_can_use_chn(), use a struct device * as first argument. - Read clocksource using the regmap instead of bypassing it. Bypassing the regmap makes sense only for the sched_clock where the read operation must be as fast as possible. - Fix incorrect format in pr_crit() macro v12: - Clock handling and IRQ handling are gone, and are now handled in their own driver. - Obtain regmap from the ingenic-tcu MFD driver. As a result, we cannot bypass the regmap anymore for the sched_clock. v13: No change v14: Remove empty lines in structure definitions v15: Use device_node_to_regmap() drivers/clocksource/Kconfig | 11 + drivers/clocksource/Makefile| 1 + drivers/clocksource/ingenic-timer.c | 356 3 files changed, 368 insertions(+) create mode 100644 drivers/clocksource/ingenic-timer.c diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig index 5e9317dc3d39..a9cdc2c4f8bd 100644 --- a/drivers/clocksource/Kconfig +++ b/drivers/clocksource/Kconfig @@ -685,4 +685,15 @@ config MILBEAUT_TIMER help Enables the support for Milbeaut timer driver. 
+config INGENIC_TIMER + bool "Clocksource/timer using the TCU in Ingenic JZ SoCs" + default MACH_INGENIC + depends on MIPS || COMPILE_TEST + depends on COMMON_CLK + select MFD_SYSCON + select TIMER_OF + select IRQ_DOMAIN + help + Support for the timer/counter unit of the Ingenic JZ SoCs. + endmenu diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile index 2e7936e7833f..4dfe4225ece7 100644 --- a/drivers/clocksource/Makefile +++ b/drivers/clocksource/Makefile @@ -80,6 +80,7 @@ obj-$(CONFIG_ASM9260_TIMER) += asm9260_timer.o obj-$(CONFIG_H8300_TMR8) += h8300_timer8.o obj-$(CONFIG_H8300_TMR16) += h8300_timer16.o obj-$(CON
[PATCH v15 12/13] MIPS: GCW0: Reduce system timer and clocksource to 750 kHz
The default clock (12 MHz) is too fast for the system timer.

Signed-off-by: Paul Cercueil
Tested-by: Mathieu Malaterre
Tested-by: Artur Rojek
---

Notes:
    v8: New patch

    v9: Don't configure clock timer1, as the OS Timer is used as
        clocksource on this SoC

    v10: Revert back to v8 behaviour. Let the user choose what
         clocksource should be used.

    v11: No change

    v12: Move clocksource to channel 2, as channel 1 is used as PWM
         for the backlight.

    v13-v15: No change

 arch/mips/boot/dts/ingenic/gcw0.dts | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/mips/boot/dts/ingenic/gcw0.dts b/arch/mips/boot/dts/ingenic/gcw0.dts
index 35f0291e8d38..f58d239c2058 100644
--- a/arch/mips/boot/dts/ingenic/gcw0.dts
+++ b/arch/mips/boot/dts/ingenic/gcw0.dts
@@ -2,6 +2,7 @@
 /dts-v1/;

 #include "jz4770.dtsi"
+#include

 / {
 	compatible = "gcw,zero", "ingenic,jz4770";
@@ -60,3 +61,12 @@
 	/* The WiFi module is connected to the UHC. */
 	status = "okay";
 };
+
+&tcu {
+	/* 750 kHz for the system timer and clocksource */
+	assigned-clocks = <&tcu TCU_CLK_TIMER0>, <&tcu TCU_CLK_TIMER2>;
+	assigned-clock-rates = <750000>, <750000>;
+
+	/* PWM1 is in use, so reserve channel #2 for the clocksource */
+	ingenic,pwm-channels-mask = <0xfa>;
+};
--
2.21.0.593.g511ec345e18
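The mask value 0xfa used in this patch is easier to read bit by bit: 0xfa is 1111 1010 in binary, so bits 0 and 2 are clear and every other channel is reserved for PWM. The sketch below decodes such a mask. It assumes eight TCU channels (the clock driver in this series mentions "8 channels max") and assumes that timers simply take the lowest free channels; `first_free_channel` is a hypothetical helper, and the real driver's selection logic may differ.

```c
#include <stdint.h>

#define TCU_NUM_CHANNELS 8	/* assumption: eight TCU channels */

/*
 * Returns the lowest channel >= start whose bit is clear in pwm_mask,
 * i.e. the first channel not reserved for PWM, or -1 if there is none.
 */
static int first_free_channel(uint32_t pwm_mask, unsigned int start)
{
	for (unsigned int ch = start; ch < TCU_NUM_CHANNELS; ch++)
		if (!(pwm_mask & (UINT32_C(1) << ch)))
			return (int)ch;
	return -1;
}
```

Under these assumptions, a mask of 0xfa leaves channel 0 for the system timer and channel 2 for the clocksource, which matches the TCU_CLK_TIMER0 and TCU_CLK_TIMER2 entries in assigned-clocks.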
[PATCH v15 10/13] MIPS: qi_lb60: Reduce system timer and clocksource to 750 kHz
The default clock (12 MHz) is too fast for the system timer, which fails to
report time accurately.

Signed-off-by: Paul Cercueil
Tested-by: Mathieu Malaterre
Tested-by: Artur Rojek
---

Notes:
    v5: New patch

    v6: Remove ingenic,clocksource-channel property

    v7-v15: No change

 arch/mips/boot/dts/ingenic/qi_lb60.dts | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/mips/boot/dts/ingenic/qi_lb60.dts b/arch/mips/boot/dts/ingenic/qi_lb60.dts
index cc26650562c2..933d98ca8d93 100644
--- a/arch/mips/boot/dts/ingenic/qi_lb60.dts
+++ b/arch/mips/boot/dts/ingenic/qi_lb60.dts
@@ -2,6 +2,7 @@
 /dts-v1/;

 #include "jz4740.dtsi"
+#include
 #include

 / {
@@ -64,3 +65,9 @@
 	pinctrl-names = "default";
 	pinctrl-0 = <&pins_mmc>;
 };
+
+&tcu {
+	/* 750 kHz for the system timer and clocksource */
+	assigned-clocks = <&tcu TCU_CLK_TIMER0>, <&tcu TCU_CLK_TIMER1>;
+	assigned-clock-rates = <750000>, <750000>;
+};
--
2.21.0.593.g511ec345e18
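One plausible reading of "too fast" is a counter-width problem, and a little arithmetic makes that concrete. The sketch below assumes a 16-bit counter (the platform code removed later in this series uses CLOCKSOURCE_MASK(16)) and a tick rate of 100 Hz; both are assumptions for illustration, and `wrap_period_us` and `ticks_per_jiffy` are hypothetical helper names.

```c
#include <stdint.h>

/* Time in microseconds for a counter of `bits` width to wrap at `rate_hz`. */
static uint64_t wrap_period_us(unsigned int bits, uint64_t rate_hz)
{
	uint64_t ticks = UINT64_C(1) << bits;

	return ticks * UINT64_C(1000000) / rate_hz;
}

/* Counter ticks per periodic interrupt at `hz` interrupts per second, rounded to nearest. */
static uint64_t ticks_per_jiffy(uint64_t rate_hz, unsigned int hz)
{
	return (rate_hz + hz / 2) / hz;
}
```

At 12 MHz a 16-bit counter wraps after about 5.4 ms and a 10 ms tick would need 120000 counts, which does not fit in 16 bits. At 750 kHz the wrap period is about 87 ms and a 10 ms tick needs 7500 counts.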
[PATCH v15 13/13] MIPS: jz4740: Drop obsolete code
The old clocksource/timer platform code is now obsoleted by the newly introduced TCU drivers. Signed-off-by: Paul Cercueil Tested-by: Mathieu Malaterre Tested-by: Artur Rojek --- Notes: v5: New patch v6-v11: No change v12: Only remove clocksource code. The rest will eventually be removed in a future patchset when the PWM/watchdog drivers are updated. v13-v15: No change arch/mips/jz4740/time.c | 151 +--- 1 file changed, 2 insertions(+), 149 deletions(-) diff --git a/arch/mips/jz4740/time.c b/arch/mips/jz4740/time.c index cb768e560d8b..5476899f0882 100644 --- a/arch/mips/jz4740/time.c +++ b/arch/mips/jz4740/time.c @@ -4,161 +4,14 @@ * JZ4740 platform time support */ -#include #include -#include -#include -#include +#include -#include -#include - -#include #include -#include - -#define TIMER_CLOCKEVENT 0 -#define TIMER_CLOCKSOURCE 1 - -static uint16_t jz4740_jiffies_per_tick; - -static u64 jz4740_clocksource_read(struct clocksource *cs) -{ - return jz4740_timer_get_count(TIMER_CLOCKSOURCE); -} - -static struct clocksource jz4740_clocksource = { - .name = "jz4740-timer", - .rating = 200, - .read = jz4740_clocksource_read, - .mask = CLOCKSOURCE_MASK(16), - .flags = CLOCK_SOURCE_IS_CONTINUOUS, -}; - -static u64 notrace jz4740_read_sched_clock(void) -{ - return jz4740_timer_get_count(TIMER_CLOCKSOURCE); -} - -static irqreturn_t jz4740_clockevent_irq(int irq, void *devid) -{ - struct clock_event_device *cd = devid; - - jz4740_timer_ack_full(TIMER_CLOCKEVENT); - - if (!clockevent_state_periodic(cd)) - jz4740_timer_disable(TIMER_CLOCKEVENT); - - cd->event_handler(cd); - - return IRQ_HANDLED; -} - -static int jz4740_clockevent_set_periodic(struct clock_event_device *evt) -{ - jz4740_timer_set_count(TIMER_CLOCKEVENT, 0); - jz4740_timer_set_period(TIMER_CLOCKEVENT, jz4740_jiffies_per_tick); - jz4740_timer_irq_full_enable(TIMER_CLOCKEVENT); - jz4740_timer_enable(TIMER_CLOCKEVENT); - - return 0; -} - -static int jz4740_clockevent_resume(struct clock_event_device *evt) -{ - 
jz4740_timer_irq_full_enable(TIMER_CLOCKEVENT); - jz4740_timer_enable(TIMER_CLOCKEVENT); - - return 0; -} - -static int jz4740_clockevent_shutdown(struct clock_event_device *evt) -{ - jz4740_timer_disable(TIMER_CLOCKEVENT); - - return 0; -} - -static int jz4740_clockevent_set_next(unsigned long evt, - struct clock_event_device *cd) -{ - jz4740_timer_set_count(TIMER_CLOCKEVENT, 0); - jz4740_timer_set_period(TIMER_CLOCKEVENT, evt); - jz4740_timer_enable(TIMER_CLOCKEVENT); - - return 0; -} - -static struct clock_event_device jz4740_clockevent = { - .name = "jz4740-timer", - .features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT, - .set_next_event = jz4740_clockevent_set_next, - .set_state_shutdown = jz4740_clockevent_shutdown, - .set_state_periodic = jz4740_clockevent_set_periodic, - .set_state_oneshot = jz4740_clockevent_shutdown, - .tick_resume = jz4740_clockevent_resume, - .rating = 200, -#ifdef CONFIG_MACH_JZ4740 - .irq = JZ4740_IRQ_TCU0, -#endif -#if defined(CONFIG_MACH_JZ4770) || defined(CONFIG_MACH_JZ4780) - .irq = JZ4780_IRQ_TCU2, -#endif -}; - -static struct irqaction timer_irqaction = { - .handler= jz4740_clockevent_irq, - .flags = IRQF_PERCPU | IRQF_TIMER, - .name = "jz4740-timerirq", - .dev_id = &jz4740_clockevent, -}; void __init plat_time_init(void) { - int ret; - uint32_t clk_rate; - uint16_t ctrl; - struct clk *ext_clk; - of_clk_init(NULL); jz4740_timer_init(); - - ext_clk = clk_get(NULL, "ext"); - if (IS_ERR(ext_clk)) - panic("unable to get ext clock"); - clk_rate = clk_get_rate(ext_clk) >> 4; - clk_put(ext_clk); - - jz4740_jiffies_per_tick = DIV_ROUND_CLOSEST(clk_rate, HZ); - - clockevent_set_clock(&jz4740_clockevent, clk_rate); - jz4740_clockevent.min_delta_ns = clockevent_delta2ns(100, &jz4740_clockevent); - jz4740_clockevent.min_delta_ticks = 100; - jz4740_clockevent.max_delta_ns = clockevent_delta2ns(0x, &jz4740_clockevent); - jz4740_clockevent.max_delta_ticks = 0x; - jz4740_clockevent.cpumask = cpumask_of(0); - - 
clockevents_register_device(&jz4740_clockevent); - - ret = clocksource_register_hz(&jz4740_clocksource, clk_rate); - - if (ret) - printk(KERN_ERR "Failed to register clocksource: %d\n", ret); - - sched_clock_register(jz4740_read_sched_clock, 16, clk_rate); - - setup_irq(jz4740_clockevent.irq, &timer_irqaction); - - ctrl = JZ_TIMER_CTRL_PRESCALE_16 | JZ_TIMER_CTRL_SRC_EXT; - - jz4740_timer_set_ctrl(TIMER_CLOCKEVENT, ctrl); -
[PATCH v15 11/13] MIPS: CI20: Reduce system timer and clocksource to 3 MHz
The default clock (48 MHz) is too fast for the system timer.

Signed-off-by: Paul Cercueil
Tested-by: Mathieu Malaterre
Tested-by: Artur Rojek
---

Notes:
    v5: New patch

    v6: Set also the rate for the clocksource channel's clock

    v7: No change

    v8: No change

    v9: Don't configure clock timer1, as the OS Timer is used as
        clocksource on this SoC

    v10: Revert back to v8 behaviour. Let the user choose what
         clocksource should be used.

    v11-v15: No change

 arch/mips/boot/dts/ingenic/ci20.dts | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/mips/boot/dts/ingenic/ci20.dts b/arch/mips/boot/dts/ingenic/ci20.dts
index 4f7b1fa31cf5..2e9952311ecd 100644
--- a/arch/mips/boot/dts/ingenic/ci20.dts
+++ b/arch/mips/boot/dts/ingenic/ci20.dts
@@ -2,6 +2,7 @@
 /dts-v1/;

 #include "jz4780.dtsi"
+#include
 #include

 / {
@@ -238,3 +239,9 @@
 		bias-disable;
 	};
 };
+
+&tcu {
+	/* 3 MHz for the system timer and clocksource */
+	assigned-clocks = <&tcu TCU_CLK_TIMER0>, <&tcu TCU_CLK_TIMER1>;
+	assigned-clock-rates = <3000000>, <3000000>;
+};
--
2.21.0.593.g511ec345e18
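Both the 3 MHz rate here and the 750 kHz rate in the other board patches are the parent clock divided by 16. A rate-rounding sketch shows how a requested rate could map to a prescaler. It assumes the TCU prescalers are 1, 4, 16, 64, 256 and 1024, which is not stated in this thread; `round_rate` is a hypothetical helper and not necessarily how the clock driver rounds.

```c
#include <stdint.h>

/* Assumption: prescalers are 4^n for n = 0..5, so the largest shift is 10. */
#define MAX_PRESCALE_SHIFT 10

/*
 * Returns the highest achievable rate that does not exceed req_hz.
 * If even the largest prescaler is too fast, the lowest rate is returned.
 */
static uint32_t round_rate(uint32_t parent_hz, uint32_t req_hz)
{
	for (unsigned int shift = 0; shift < MAX_PRESCALE_SHIFT; shift += 2)
		if ((parent_hz >> shift) <= req_hz)
			return parent_hz >> shift;
	return parent_hz >> MAX_PRESCALE_SHIFT;
}
```

Under that assumption a request for 3 MHz from a 48 MHz parent and a request for 750 kHz from a 12 MHz parent both resolve exactly with a prescaler of 16, while a request such as 1 MHz from 12 MHz would be rounded down to 750 kHz.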
[PATCH v15 09/13] MIPS: jz4740: Add DTS nodes for the TCU drivers
Add DTS nodes for the JZ4780, JZ4770 and JZ4740 devicetree files. Signed-off-by: Paul Cercueil Tested-by: Mathieu Malaterre Tested-by: Artur Rojek --- Notes: v5: New patch v6: Fix register lengths in watchdog/pwm nodes v7: No change v8: - Fix wrong start address for PWM node - Add system timer and clocksource sub-nodes v9: Drop timer and clocksource sub-nodes v10-v11: No change v12: Drop PWM/watchdog/OST sub-nodes, for now. v13-v14: No change v15: Add "simple-mfd" compatible string arch/mips/boot/dts/ingenic/jz4740.dtsi | 22 ++ arch/mips/boot/dts/ingenic/jz4770.dtsi | 21 + arch/mips/boot/dts/ingenic/jz4780.dtsi | 23 +++ 3 files changed, 66 insertions(+) diff --git a/arch/mips/boot/dts/ingenic/jz4740.dtsi b/arch/mips/boot/dts/ingenic/jz4740.dtsi index 3ffaf63f22dd..058800bfc875 100644 --- a/arch/mips/boot/dts/ingenic/jz4740.dtsi +++ b/arch/mips/boot/dts/ingenic/jz4740.dtsi @@ -53,6 +53,28 @@ clock-names = "rtc"; }; + tcu: timer@10002000 { + compatible = "ingenic,jz4740-tcu", "simple-mfd"; + reg = <0x10002000 0x1000>; + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0x10002000 0x1000>; + + #clock-cells = <1>; + + clocks = <&cgu JZ4740_CLK_RTC + &cgu JZ4740_CLK_EXT + &cgu JZ4740_CLK_PCLK + &cgu JZ4740_CLK_TCU>; + clock-names = "rtc", "ext", "pclk", "tcu"; + + interrupt-controller; + #interrupt-cells = <1>; + + interrupt-parent = <&intc>; + interrupts = <23 22 21>; + }; + rtc_dev: rtc@10003000 { compatible = "ingenic,jz4740-rtc"; reg = <0x10003000 0x40>; diff --git a/arch/mips/boot/dts/ingenic/jz4770.dtsi b/arch/mips/boot/dts/ingenic/jz4770.dtsi index 49ede6c14ff3..0bfb9edff3d0 100644 --- a/arch/mips/boot/dts/ingenic/jz4770.dtsi +++ b/arch/mips/boot/dts/ingenic/jz4770.dtsi @@ -46,6 +46,27 @@ #clock-cells = <1>; }; + tcu: timer@10002000 { + compatible = "ingenic,jz4770-tcu", "simple-mfd"; + reg = <0x10002000 0x1000>; + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0x10002000 0x1000>; + + #clock-cells = <1>; + + clocks = <&cgu JZ4770_CLK_RTC + 
&cgu JZ4770_CLK_EXT + &cgu JZ4770_CLK_PCLK>; + clock-names = "rtc", "ext", "pclk"; + + interrupt-controller; + #interrupt-cells = <1>; + + interrupt-parent = <&intc>; + interrupts = <27 26 25>; + }; + pinctrl: pin-controller@1001 { compatible = "ingenic,jz4770-pinctrl"; reg = <0x1001 0x600>; diff --git a/arch/mips/boot/dts/ingenic/jz4780.dtsi b/arch/mips/boot/dts/ingenic/jz4780.dtsi index b03cdec56de9..c54bd7cfec55 100644 --- a/arch/mips/boot/dts/ingenic/jz4780.dtsi +++ b/arch/mips/boot/dts/ingenic/jz4780.dtsi @@ -46,6 +46,29 @@ #clock-cells = <1>; }; + tcu: timer@10002000 { + compatible = "ingenic,jz4780-tcu", +"ingenic,jz4770-tcu", +"simple-mfd"; + reg = <0x10002000 0x1000>; + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0x10002000 0x1000>; + + #clock-cells = <1>; + + clocks = <&cgu JZ4780_CLK_RTCLK + &cgu JZ4780_CLK_EXCLK + &cgu JZ4780_CLK_PCLK>; + clock-names = "rtc", "ext", "pclk"; + + interrupt-controller; + #interrupt-cells = <1>; + + interrupt-parent = <&intc>; + interrupts = <27 26 25>; + }; + rtc_dev: rtc@10003000 { compatible = "ingenic,jz4780-rtc"; reg = <0x10003000 0x4c>; -- 2.21.0.593.g511ec345e18
Re: [PATCH] Documentation: move Documentation/virtual to Documentation/virt
On Wed, 24 Jul 2019 10:51:36 +0200 Paolo Bonzini wrote:

> On 24/07/19 09:24, Christoph Hellwig wrote:
> > Renaming docs seems to be en vogue at the moment, so fix on of the
> > grossly misnamed directories. We usually never use "virtual" as
> > a shortcut for virtualization in the kernel, but always virt,
> > as seen in the virt/ top-level directory. Fix up the documentation
> > to match that.
> >
> > Fixes: ed16648eb5b8 ("Move kvm, uml, and lguest subdirectories under a
> > common "virtual" directory, I.E:")
> > Signed-off-by: Christoph Hellwig
>
> Queued, thanks. I can't count how many times I said "I really should
> rename that directory".

...and it's up to Linus before I even got a chance to look at it - one has
to be fast around here...:)

There's nothing wrong with this move, but it does miss the point of much of
the reorganization that has been going on in the docs tree. It's not just
a matter of getting more pleasing names; the real idea is to create a
better, more reader-focused organization on kernel documentation as a
whole. Documentation/virt still has the sort of confusion of audiences
that we're trying to fix:

 - kvm/api.txt pretty clearly belongs in the userspace-api book, rather
   than tossed in with:

 - kvm/review-checklist.txt, which belongs in the subsystem guide, if only
   we'd gotten around to creating it yet, or

 - kvm/mmu.txt, which is information for kernel developers, or

 - uml/UserModeLinux-HOWTO.txt, which belongs in the admin guide.

I suspect that organization is going to be one of the main issues to talk
about in Lisbon. Meanwhile, I hope that this rename won't preclude
organizational work in the future.

Thanks,

jon
RE: [PATCH v5] Documentation/checkpatch: Prefer strscpy/strscpy_pad over strcpy/strlcpy/strncpy
Hi,

> -----Original Message-----
> From: Gote, Nitin R [mailto:nitin.r.g...@intel.com]
> Sent: Tuesday, July 23, 2019 2:56 PM
> To: Joe Perches ; Kees Cook
> Cc: cor...@lwn.net; a...@linux-foundation.org; a...@canonical.com;
> linux-doc@vger.kernel.org; kernel-harden...@lists.openwall.com
> Subject: RE: [PATCH v5] Documentation/checkpatch: Prefer
> strscpy/strscpy_pad over strcpy/strlcpy/strncpy
>
> > -----Original Message-----
> > From: Joe Perches [mailto:j...@perches.com]
> > Sent: Monday, July 22, 2019 11:11 PM
> > To: Kees Cook ; Gote, Nitin R
> > Cc: cor...@lwn.net; a...@linux-foundation.org; a...@canonical.com;
> > linux-doc@vger.kernel.org; kernel-harden...@lists.openwall.com
> > Subject: Re: [PATCH v5] Documentation/checkpatch: Prefer
> > strscpy/strscpy_pad over strcpy/strlcpy/strncpy
> >
> > On Mon, 2019-07-22 at 10:30 -0700, Kees Cook wrote:
> > > On Wed, Jul 17, 2019 at 10:00:05AM +0530, NitinGote wrote:
> > > > From: Nitin Gote
> > > >
> > > > Added check in checkpatch.pl to
> > > > 1. Deprecate strcpy() in favor of strscpy().
> > > > 2. Deprecate strlcpy() in favor of strscpy().
> > > > 3. Deprecate strncpy() in favor of strscpy() or strscpy_pad().
> > > >
> > > > Updated strncpy() section in Documentation/process/deprecated.rst
> > > > to cover strscpy_pad() case.
> > > >
> > > > Signed-off-by: Nitin Gote
> > >
> > > Reviewed-by: Kees Cook
> > >
> > > Joe, does this address your checkpatch concerns?
> >
> > Well, kinda.
> >
> > strscpy_pad isn't used anywhere in the kernel.
> >
> > And
> >
> > +"strncpy" => "strscpy, strscpy_pad or for non-NUL-terminated strings,
> > strncpy() can still be used, but destinations should be marked with
> > __nonstring",
> >
> > is a bit verbose. This could be simply:
> >
> > +"strncpy" => "strscpy - for non-NUL-terminated uses,
> > + strncpy() dst should be __nonstring",

Could you please give your opinion on below comment.

> But, if the destination buffer needs extra NUL-padding for remaining size of
> destination, then safe replacement is strscpy_pad(). Right? If yes, then what
> is your opinion on below change :
>
> "strncpy" => "strscpy, strcpy_pad - for non-NUL-terminated uses,
> strncpy() dst should be __nonstring",
>
> If you agree on this, then I will include this change in next patch version.
> -Nitin
Re: [PATCH v5] Documentation/checkpatch: Prefer strscpy/strscpy_pad over strcpy/strlcpy/strncpy
On Wed, 2019-07-24 at 18:17 +0000, Gote, Nitin R wrote:
> Hi,

Hi again.

[]
> > > > > 3. Deprecate strncpy() in favor of strscpy() or strscpy_pad().

Please remember there does not exist a single actual use
of strscpy_pad in the kernel sources and no apparent real
need for it. I don't find one anyway.

> Could you please give your opinion on below comment.
>
> > But, if the destination buffer needs extra NUL-padding for remaining size of
> > destination, then safe replacement is strscpy_pad(). Right? If yes, then what
> > is your opinion on below change :
> >
> > "strncpy" => "strscpy, strcpy_pad - for non-NUL-terminated uses,
> > strncpy() dst should be __nonstring",
> >
> > If you agree on this, then I will include this change in next patch version.

Two things:

The kernel-doc documentation uses dest not dst.
I think stracpy should be preferred over strscpy.
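For readers following the thread, the behavioural difference being discussed can be sketched in user space. The functions below are models written for this illustration, not the kernel implementations: `model_strscpy` and `model_strscpy_pad` are hypothetical names, and `MODEL_E2BIG` is a stand-in for the kernel's E2BIG. The semantics assumed are that strscpy() always NUL-terminates a non-empty destination and returns a negative error on truncation, and that strscpy_pad() additionally zero-fills the unused tail.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_E2BIG 7	/* stand-in for the kernel's E2BIG value */

/*
 * Model of strscpy(): copy at most size - 1 bytes, always NUL-terminate
 * when size > 0, return the copied length or -MODEL_E2BIG on truncation.
 */
static long model_strscpy(char *dest, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -MODEL_E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dest[i] = src[i];
	dest[i] = '\0';

	/* If the source did not end here, the copy was truncated. */
	return src[i] == '\0' ? (long)i : -MODEL_E2BIG;
}

/* Model of strscpy_pad(): as above, then zero-fill the rest of dest. */
static long model_strscpy_pad(char *dest, const char *src, size_t size)
{
	long ret = model_strscpy(dest, src, size);

	if (ret >= 0 && (size_t)ret + 1 < size)
		memset(dest + ret + 1, 0, size - (size_t)ret - 1);
	return ret;
}
```

By contrast, strncpy() leaves the destination without a terminator when the source is at least as long as the buffer, which is why the proposed checkpatch text asks for such destinations to be marked __nonstring.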
[PATCH] Correct documentation for /proc/schedstat
Commit 425e0968a25fa3f111f9919964cac079738140b5 ("sched: move code into
kernel/sched_stats.h") appears to have inadvertently changed the unit of
time from jiffies to nanoseconds as part of the implementation of CFS.

Signed-off-by: Phil Frost
---
 Documentation/scheduler/sched-stats.txt | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/Documentation/scheduler/sched-stats.txt b/Documentation/scheduler/sched-stats.txt
index 8259b34a66ae..b6c1807a01b3 100644
--- a/Documentation/scheduler/sched-stats.txt
+++ b/Documentation/scheduler/sched-stats.txt
@@ -19,6 +19,11 @@ are no architectures which need more than three domain levels. The first
 field in the domain stats is a bit map indicating which cpus are affected
 by that domain.

+2.6.23 introduced the CFS scheduler, and also an inadvertent
+backwards-incompatible change to the statistics. Although the schedstat version
+is 14 in either case, in 2.6.23 and later, counters accumulate time in
+nanoseconds. Prior to that, jiffies.
+
 These fields are counters, and only increment. Programs which make use
 of these will need to start with a baseline observation and then calculate
 the change in the counters at each subsequent observation. A perl script
@@ -48,9 +53,10 @@ Next two are try_to_wake_up() statistics:
      6) # of times try_to_wake_up() was called to wake up the local cpu

 Next three are statistics describing scheduling latency:
-     7) sum of all time spent running by tasks on this processor (in jiffies)
+     7) sum of all time spent running by tasks on this processor (in
+        nanoseconds, or jiffies prior to 2.6.23)
      8) sum of all time spent waiting to run by tasks on this processor (in
-        jiffies)
+        nanoseconds, or jiffies prior to 2.6.23)
      9) # of timeslices run on this cpu

--
2.20.1 (Apple Git-117)
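The documentation text in this patch says that consumers need a baseline observation and must then work with differences, and that the unit depends on the kernel version. A minimal sketch of both steps follows. `struct sched_sample`, `sched_delta` and `sched_time_to_ns` are hypothetical names; parsing /proc/schedstat is left out, and the `hz` argument is an assumption the caller has to supply, since the jiffy length is a kernel configuration value.

```c
#include <stdint.h>

/* One observation of fields 7-9 of a cpu line in /proc/schedstat. */
struct sched_sample {
	uint64_t run_time;	/* field 7: time spent running */
	uint64_t wait_time;	/* field 8: time spent waiting to run */
	uint64_t timeslices;	/* field 9: number of timeslices run */
};

/* The counters only increment, so activity between two reads is a difference. */
static struct sched_sample sched_delta(const struct sched_sample *base,
				       const struct sched_sample *now)
{
	struct sched_sample d = {
		.run_time = now->run_time - base->run_time,
		.wait_time = now->wait_time - base->wait_time,
		.timeslices = now->timeslices - base->timeslices,
	};

	return d;
}

/*
 * Normalises a time counter to nanoseconds. Kernels before 2.6.23 count
 * jiffies, later kernels already count nanoseconds. The conversion is
 * approximate when 1e9 is not a multiple of hz.
 */
static uint64_t sched_time_to_ns(uint64_t value, int counts_jiffies,
				 unsigned int hz)
{
	if (counts_jiffies)
		return value * (UINT64_C(1000000000) / hz);
	return value;
}
```

Since the schedstat version is 14 in both cases, a tool cannot tell the unit from the file alone and would have to decide `counts_jiffies` from the running kernel version.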
[PATCH v10 0/2] overlayfs override_creds=off
Patch series:

overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh
Add optional __get xattr method paired to __vfs_getxattr
overlayfs: add __get xattr method
overlayfs: internal getxattr operations without sepolicy checking
overlayfs: override_creds=off option bypass creator_cred

The first four patches address fundamental security issues that should
be solved regardless of the override_creds=off feature. The fifth,
which adds the feature, depends on these other fixes.

By default, all access to the upper, lower and work directories is the
recorded mounter's MAC and DAC credentials. The incoming accesses are
checked against the caller's credentials.

If the principles of least privilege are applied for sepolicy, the
mounter's credentials might not overlap the credentials of the caller's
when accessing the overlayfs filesystem. For example, a file that a
lower DAC privileged caller can execute, is MAC denied to the
generally higher DAC privileged mounter, to prevent an attack vector.

We add the option to turn off override_creds in the mount options; all
subsequent operations after mount on the filesystem will be only the
caller's credentials. The module boolean parameter and mount option
override_creds is also added as a presence check for this "feature",
existence of /sys/module/overlay/parameters/override_creds

Signed-off-by: Mark Salyzyn
Cc: Miklos Szeredi
Cc: Jonathan Corbet
Cc: Vivek Goyal
Cc: Eric W. Biederman
Cc: Amir Goldstein
Cc: Randy Dunlap
Cc: Stephen Smalley
Cc: linux-unio...@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
v10:
- Rebase
- Return NULL on CAP_DAC_READ_SEARCH
- Add __get xattr method to solve sepolicy logging issue
- Drop unnecessary sys_admin sepolicy checking for administrative
  driver internal xattr functions.

v6:
- Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS.
- Do better with the documentation, drop rationalizations.
- pr_warn message adjusted to report consequences.

v5:
- beefed up the caveats in the Documentation
- Is dependent on
  "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
  "overlayfs: check CAP_MKNOD before issuing vfs_whiteout"
- Added pr_warn when override_creds=off

v4:
- spelling and grammar errors in text

v3:
- Change name from caller_credentials / creator_credentials to the
  boolean override_creds.
- Changed from creator to mounter credentials.
- Updated and fortified the documentation.
- Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS

v2:
- Forward port changed attr to stat, resulting in a build error.
- altered commit message.
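To make the presence check and the mount option concrete, here is a hedged user-space sketch. It assumes a kernel carrying this series; the helper names ovl_has_override_creds() and ovl_build_opts() are hypothetical, and only the sysfs path and the override_creds=off option string come from the cover letter.

```c
#include <stdio.h>
#include <unistd.h>

#define OVL_OVERRIDE_CREDS_PARAM \
	"/sys/module/overlay/parameters/override_creds"

/*
 * Presence check as described in the cover letter: the module parameter
 * file exists only on kernels that support the option.
 */
int ovl_has_override_creds(void)
{
	return access(OVL_OVERRIDE_CREDS_PARAM, F_OK) == 0;
}

/*
 * Build the option string for an overlay mount. override_creds == 0
 * appends "override_creds=off"; any other value leaves the default.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int ovl_build_opts(char *buf, size_t len, const char *lower,
		   const char *upper, const char *work, int override_creds)
{
	int n = snprintf(buf, len, "lowerdir=%s,upperdir=%s,workdir=%s%s",
			 lower, upper, work,
			 override_creds ? "" : ",override_creds=off");

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would be passed as the data argument of mount(2), e.g. mount("overlay", target, "overlay", 0, opts), after checking ovl_has_override_creds() so that older kernels do not reject the unknown option.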
[PATCH v10 2/5] Add optional __get xattr method paired to __vfs_getxattr
Add an optional __get xattr method that would be called, if set, only
in __vfs_getxattr instead of the regular get xattr method.

Signed-off-by: Mark Salyzyn
Cc: Miklos Szeredi
Cc: Jonathan Corbet
Cc: Vivek Goyal
Cc: Eric W. Biederman
Cc: Amir Goldstein
Cc: Randy Dunlap
Cc: Stephen Smalley
Cc: linux-unio...@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: kernel-t...@android.com
---
v10 - added to patch series
---
 fs/xattr.c            | 11 ++-
 include/linux/xattr.h |  7 +--
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 90dd78f0eb27..b8f4734e222f 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -306,6 +306,9 @@ __vfs_getxattr(struct dentry *dentry, struct inode *inode, const char *name,
 	handler = xattr_resolve_name(inode, &name);
 	if (IS_ERR(handler))
 		return PTR_ERR(handler);
+	if (unlikely(handler->__get))
+		return handler->__get(handler, dentry, inode, name, value,
+				      size);
 	if (!handler->get)
 		return -EOPNOTSUPP;
 	return handler->get(handler, dentry, inode, name, value, size);
@@ -317,6 +320,7 @@ vfs_getxattr(struct dentry *dentry, const char *name, void *value, size_t size)
 {
 	struct inode *inode = dentry->d_inode;
 	int error;
+	const struct xattr_handler *handler;

 	error = xattr_permission(inode, name, MAY_READ);
 	if (error)
@@ -339,7 +343,12 @@ vfs_getxattr(struct dentry *dentry, const char *name, void *value, size_t size)
 		return ret;
 	}
 nolsm:
-	return __vfs_getxattr(dentry, inode, name, value, size);
+	handler = xattr_resolve_name(inode, &name);
+	if (IS_ERR(handler))
+		return PTR_ERR(handler);
+	if (!handler->get)
+		return -EOPNOTSUPP;
+	return handler->get(handler, dentry, inode, name, value, size);
 }
 EXPORT_SYMBOL_GPL(vfs_getxattr);

diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index 6dad031be3c2..30f25e1ac571 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -30,10 +30,13 @@ struct xattr_handler {
 	const char *prefix;
 	int flags;	/* fs private flags */
 	bool (*list)(struct dentry *dentry);
-	int (*get)(const struct xattr_handler *, struct dentry *dentry,
+	int (*get)(const struct xattr_handler *handler, struct dentry *dentry,
 		   struct inode *inode, const char *name, void *buffer,
 		   size_t size);
-	int (*set)(const struct xattr_handler *, struct dentry *dentry,
+	int (*__get)(const struct xattr_handler *handler, struct dentry *dentry,
+		     struct inode *inode, const char *name, void *buffer,
+		     size_t size);
+	int (*set)(const struct xattr_handler *handler, struct dentry *dentry,
 		   struct inode *inode, const char *name, const void *buffer,
 		   size_t size, int flags);
 };
-- 
2.22.0.657.g960e92d24f-goog
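The dispatch rule this patch introduces can be modelled in user space as below. This is only a toy: struct mock_handler, internal_get (standing in for __get) and both entry points are invented names, and the real handlers take dentry and inode arguments that are dropped here.

```c
#include <errno.h>
#include <stddef.h>

/* Toy handler: internal_get plays the role of the optional __get. */
struct mock_handler {
	int (*get)(const struct mock_handler *h, const char *name,
		   void *buf, size_t size);
	int (*internal_get)(const struct mock_handler *h, const char *name,
			    void *buf, size_t size);
};

/* Models vfs_getxattr() after the patch: only the regular method. */
int mock_vfs_getxattr(const struct mock_handler *h, const char *name,
		      void *buf, size_t size)
{
	if (!h->get)
		return -EOPNOTSUPP;
	return h->get(h, name, buf, size);
}

/* Models __vfs_getxattr(): prefer the optional method when it is set. */
int mock_internal_getxattr(const struct mock_handler *h, const char *name,
			   void *buf, size_t size)
{
	if (h->internal_get)
		return h->internal_get(h, name, buf, size);
	if (!h->get)
		return -EOPNOTSUPP;
	return h->get(h, name, buf, size);
}

/* Sample callbacks that just report which path was taken. */
int sample_get(const struct mock_handler *h, const char *name,
	       void *buf, size_t size)
{
	(void)h; (void)name; (void)buf; (void)size;
	return 1;
}

int sample_internal_get(const struct mock_handler *h, const char *name,
			void *buf, size_t size)
{
	(void)h; (void)name; (void)buf; (void)size;
	return 2;
}
```

A handler that leaves internal_get unset behaves exactly as before on both paths, which is why the new method can be optional.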
[PATCH v10 1/5] overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh
Assumption never checked, should fail if the mounter creds are not
sufficient.

Signed-off-by: Mark Salyzyn
Cc: Miklos Szeredi
Cc: Jonathan Corbet
Cc: Vivek Goyal
Cc: Eric W. Biederman
Cc: Amir Goldstein
Cc: Randy Dunlap
Cc: Stephen Smalley
Cc: linux-unio...@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: kernel-t...@android.com
---
v10:
- return NULL rather than ERR_PTR(-EPERM)
- did _not_ add it ovl_can_decode_fh() because of changes since last
  review, suspect needs to be added to ovl_lower_uuid_ok()?

v8 + v9:
- rebase

v7:
- This time for realz

v6:
- rebase

v5:
- dependency of "overlayfs: override_creds=off option bypass creator_cred"
---
 fs/overlayfs/namei.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index e9717c2f7d45..9702f0d5309d 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -161,6 +161,9 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
 	if (!uuid_equal(&fh->uuid, &mnt->mnt_sb->s_uuid))
 		return NULL;

+	if (!capable(CAP_DAC_READ_SEARCH))
+		return NULL;
+
 	bytes = (fh->len - offsetof(struct ovl_fh, fid));
 	real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
 				  bytes >> 2, (int)fh->type,
-- 
2.22.0.657.g960e92d24f-goog
[PATCH v10 4/5] overlayfs: internal getxattr operations without sepolicy checking
Check impure, opaque, origin & meta xattr with no sepolicy audit
(using __vfs_getxattr) since these operations are internal to
overlayfs operations and do not disclose any data. This became
an issue for credential override off since sys_admin would have
been required by the caller, whereas it would have been inherently
present for the creator since it performed the mount.

This is a change in operations since we do not check in the new
ovl_vfs_getxattr function if the credential override is off or
not. Reasoning is that the sepolicy check is unnecessary overhead,
especially since the check can be expensive.

Signed-off-by: Mark Salyzyn
Cc: Miklos Szeredi
Cc: Jonathan Corbet
Cc: Vivek Goyal
Cc: Eric W. Biederman
Cc: Amir Goldstein
Cc: Randy Dunlap
Cc: Stephen Smalley
Cc: linux-unio...@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: kernel-t...@android.com
---
v10 - added to patch series
---
 fs/overlayfs/namei.c     | 12 +++-
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/util.c      | 24 +++-
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 9702f0d5309d..fb6c0cd7b65f 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -106,10 +106,11 @@ int ovl_check_fh_len(struct ovl_fh *fh, int fh_len)

 static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
 {
-	int res, err;
+	ssize_t res;
+	int err;
 	struct ovl_fh *fh = NULL;

-	res = vfs_getxattr(dentry, name, NULL, 0);
+	res = ovl_vfs_getxattr(dentry, name, NULL, 0);
 	if (res < 0) {
 		if (res == -ENODATA || res == -EOPNOTSUPP)
 			return NULL;
@@ -123,7 +124,7 @@ static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
 	if (!fh)
 		return ERR_PTR(-ENOMEM);

-	res = vfs_getxattr(dentry, name, fh, res);
+	res = ovl_vfs_getxattr(dentry, name, fh, res);
 	if (res < 0)
 		goto fail;

@@ -141,10 +142,11 @@ static struct ovl_fh *ovl_get_fh(struct dentry *dentry, const char *name)
 	return NULL;

 fail:
-	pr_warn_ratelimited("overlayfs: failed to get origin (%i)\n", res);
+	pr_warn_ratelimited("overlayfs: failed to get origin (%zi)\n", res);
 	goto out;
 invalid:
-	pr_warn_ratelimited("overlayfs: invalid origin (%*phN)\n", res, fh);
+	pr_warn_ratelimited("overlayfs: invalid origin (%*phN)\n",
+			    (int)res, fh);
 	goto out;
 }

diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 73a02a263fbc..82574684a9b6 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -205,6 +205,8 @@ int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
 const struct cred *ovl_override_creds(struct super_block *sb);
+ssize_t ovl_vfs_getxattr(struct dentry *dentry, const char *name, void *buf,
+			 size_t size);
 struct super_block *ovl_same_sb(struct super_block *sb);
 int ovl_can_decode_fh(struct super_block *sb);
 struct dentry *ovl_indexdir(struct super_block *sb);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index f5678a3f8350..672459c3cff7 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -40,6 +40,12 @@ const struct cred *ovl_override_creds(struct super_block *sb)
 	return override_creds(ofs->creator_cred);
 }

+ssize_t ovl_vfs_getxattr(struct dentry *dentry, const char *name, void *buf,
+			 size_t size)
+{
+	return __vfs_getxattr(dentry, d_inode(dentry), name, buf, size);
+}
+
 struct super_block *ovl_same_sb(struct super_block *sb)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
@@ -537,9 +543,9 @@ void ovl_copy_up_end(struct dentry *dentry)

 bool ovl_check_origin_xattr(struct dentry *dentry)
 {
-	int res;
+	ssize_t res;

-	res = vfs_getxattr(dentry, OVL_XATTR_ORIGIN, NULL, 0);
+	res = ovl_vfs_getxattr(dentry, OVL_XATTR_ORIGIN, NULL, 0);

 	/* Zero size value means "copied up but origin unknown" */
 	if (res >= 0)
@@ -550,13 +556,13 @@ bool ovl_check_origin_xattr(struct dentry *dentry)

 bool ovl_check_dir_xattr(struct dentry *dentry, const char *name)
 {
-	int res;
+	ssize_t res;
 	char val;

 	if (!d_is_dir(dentry))
 		return false;

-	res = vfs_getxattr(dentry, name, &val, 1);
+	res = ovl_vfs_getxattr(dentry, name, &val, 1);
 	if (res == 1 && val == 'y')
 		return true;

@@ -837,13 +843,13 @@ int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir)
 /* err < 0, 0 if no metacopy xattr, 1 if metacopy xattr found */
 int ovl_check_metacopy_xattr(struct dentry *dentry)
 {
-	int res;
+	ssize_t res;

 	/* Only regular files can have metacopy xattr */
 	if (!S_ISREG
[PATCH v10 3/5] overlayfs: add __get xattr method
Because of the overlayfs getxattr recursion, the incoming inode fails
to update the selinux sid resulting in avc denials being reported
against a target context of u:object_r:unlabeled:s0.

Solution is to add a __get xattr method that calls the __vfs_getxattr
handler so that the context can be read in, rather than being denied
with an -EACCES when vfs_getxattr handler is called.

Signed-off-by: Mark Salyzyn
Cc: Miklos Szeredi
Cc: Jonathan Corbet
Cc: Vivek Goyal
Cc: Eric W. Biederman
Cc: Amir Goldstein
Cc: Randy Dunlap
Cc: Stephen Smalley
Cc: linux-unio...@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: kernel-t...@android.com
---
v10 - added to patch series
---
 fs/overlayfs/inode.c     | 15 +++
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/super.c     | 18 ++
 3 files changed, 35 insertions(+)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 7663aeb85fa3..d3b53849615c 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -362,6 +362,21 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 	return err;
 }

+int __ovl_xattr_get(struct dentry *dentry, struct inode *inode,
+		    const char *name, void *value, size_t size)
+{
+	ssize_t res;
+	const struct cred *old_cred;
+	struct dentry *realdentry =
+		ovl_i_dentry_upper(inode) ?: ovl_dentry_lower(dentry);
+
+	old_cred = ovl_override_creds(dentry->d_sb);
+	res = __vfs_getxattr(realdentry, d_inode(realdentry), name, value,
+			     size);
+	ovl_revert_creds(old_cred);
+	return res;
+}
+
 int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
 		  void *value, size_t size)
 {
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6934bcf030f0..73a02a263fbc 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -357,6 +357,8 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		  const void *value, size_t size, int flags);
 int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
 		  void *value, size_t size);
+int __ovl_xattr_get(struct dentry *dentry, struct inode *inode,
+		    const char *name, void *value, size_t size);
 ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size);
 struct posix_acl *ovl_get_acl(struct inode *inode, int type);
 int ovl_update_time(struct inode *inode, struct timespec64 *ts, int flags);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index b368e2e102fa..82e1130de206 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -859,6 +859,14 @@ ovl_posix_acl_xattr_get(const struct xattr_handler *handler,
 	return ovl_xattr_get(dentry, inode, handler->name, buffer, size);
 }

+static int __maybe_unused
+__ovl_posix_acl_xattr_get(const struct xattr_handler *handler,
+			  struct dentry *dentry, struct inode *inode,
+			  const char *name, void *buffer, size_t size)
+{
+	return __ovl_xattr_get(dentry, inode, handler->name, buffer, size);
+}
+
 static int __maybe_unused
 ovl_posix_acl_xattr_set(const struct xattr_handler *handler,
 			struct dentry *dentry, struct inode *inode,
@@ -939,6 +947,13 @@ static int ovl_other_xattr_get(const struct xattr_handler *handler,
 	return ovl_xattr_get(dentry, inode, name, buffer, size);
 }

+static int __ovl_other_xattr_get(const struct xattr_handler *handler,
+				 struct dentry *dentry, struct inode *inode,
+				 const char *name, void *buffer, size_t size)
+{
+	return __ovl_xattr_get(dentry, inode, name, buffer, size);
+}
+
 static int ovl_other_xattr_set(const struct xattr_handler *handler,
 			       struct dentry *dentry, struct inode *inode,
 			       const char *name, const void *value,
@@ -952,6 +967,7 @@ ovl_posix_acl_access_xattr_handler = {
 	.name = XATTR_NAME_POSIX_ACL_ACCESS,
 	.flags = ACL_TYPE_ACCESS,
 	.get = ovl_posix_acl_xattr_get,
+	.__get = __ovl_posix_acl_xattr_get,
 	.set = ovl_posix_acl_xattr_set,
 };

@@ -960,6 +976,7 @@ ovl_posix_acl_default_xattr_handler = {
 	.name = XATTR_NAME_POSIX_ACL_DEFAULT,
 	.flags = ACL_TYPE_DEFAULT,
 	.get = ovl_posix_acl_xattr_get,
+	.__get = __ovl_posix_acl_xattr_get,
 	.set = ovl_posix_acl_xattr_set,
 };

@@ -972,6 +989,7 @@ static const struct xattr_handler ovl_own_xattr_handler = {
 static const struct xattr_handler ovl_other_xattr_handler = {
 	.prefix = "", /* catch all */
 	.get = ovl_other_xattr_get,
+	.__get = __ovl_other_xattr_get,
 	.set = ovl_other_xattr_set,
 };

-- 
2.22.0.657.g960e92d24f-goog
[PATCH v10 5/5] overlayfs: override_creds=off option bypass creator_cred
By default, all access to the upper, lower and work directories is the
recorded mounter's MAC and DAC credentials. The incoming accesses are
checked against the caller's credentials.

If the principles of least privilege are applied, the mounter's
credentials might not overlap the credentials of the caller's when
accessing the overlayfs filesystem. For example, a file that a lower
DAC privileged caller can execute, is MAC denied to the generally
higher DAC privileged mounter, to prevent an attack vector.

We add the option to turn off override_creds in the mount options; all
subsequent operations after mount on the filesystem will be only the
caller's credentials. The module boolean parameter and mount option
override_creds is also added as a presence check for this "feature",
existence of /sys/module/overlay/parameters/override_creds.

It was not always this way. Circa 4.6 there was no recorded mounter's
credentials, instead privileged access to upper or work directories
were temporarily increased to perform the operations. The MAC
(selinux) policies were caller's in all cases. override_creds=off
partially returns us to this older access model minus the insecure
temporary credential increases. This is to permit use in a system
with non-overlapping security models for each executable including
the agent that mounts the overlayfs filesystem. In Android
this is the case since init, which performs the mount operations,
has a minimal MAC set of privileges to reduce any attack surface,
and services that use the content have a different set of MAC
privileges (e.g. read, for vendor labelled configuration, execute for
vendor libraries and modules). The caveats are not a problem in
the Android usage model, however they should be fixed for
completeness and for general use in time.

Signed-off-by: Mark Salyzyn
Cc: Miklos Szeredi
Cc: Jonathan Corbet
Cc: Vivek Goyal
Cc: Eric W. Biederman
Cc: Amir Goldstein
Cc: Randy Dunlap
Cc: Stephen Smalley
Cc: linux-unio...@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: kernel-t...@android.com
---
v10:
- Rebase (and expand because of increased revert_cred usage)

v9:
- Add to the caveats

v8:
- drop pr_warn message after straw poll to remove it.
- added a use case in the commit message

v7:
- change name of internal parameter to ovl_override_creds_def
- report override_creds only if different than default

v6:
- Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS.
- Do better with the documentation.
- pr_warn message adjusted to report consequences.

v5:
- beefed up the caveats in the Documentation
- Is dependent on
  "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
  "overlayfs: check CAP_MKNOD before issuing vfs_whiteout"
- Added pr_warn when override_creds=off

v4:
- spelling and grammar errors in text

v3:
- Change name from caller_credentials / creator_credentials to the
  boolean override_creds.
- Changed from creator to mounter credentials.
- Updated and fortified the documentation.
- Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS

v2:
- Forward port changed attr to stat, resulting in a build error.
- altered commit message.
---
 Documentation/filesystems/overlayfs.txt | 23 +++
 fs/overlayfs/copy_up.c                  |  2 +-
 fs/overlayfs/dir.c                      | 11 ++-
 fs/overlayfs/file.c                     | 20 ++--
 fs/overlayfs/inode.c                    | 18 +-
 fs/overlayfs/namei.c                    |  6 +++---
 fs/overlayfs/overlayfs.h                |  1 +
 fs/overlayfs/ovl_entry.h                |  1 +
 fs/overlayfs/readdir.c                  |  4 ++--
 fs/overlayfs/super.c                    | 22 +-
 fs/overlayfs/util.c                     | 12 ++--
 11 files changed, 87 insertions(+), 33 deletions(-)

diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index 1da2f1668f08..d48125076602 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -102,6 +102,29 @@ Only the lists of names from directories are merged. Other content
 such as metadata and extended attributes are reported for the upper
 directory only. These attributes of the lower directory are hidden.

+credentials
+---
+
+By default, all access to the upper, lower and work directories is the
+recorded mounter's MAC and DAC credentials. The incoming accesses are
+checked against the caller's credentials.
+
+In the case where caller MAC or DAC credentials do not overlap, a
+use case available in older versions of the driver, the
+override_creds mount flag can be turned off and help when the use
+pattern has caller with legitimate credentials where the mounter
+does not. Several unintended side effects will occur though. The
+caller without certain key capabilities or lower privilege will not
+always be able to delete files or directories, create nodes, or
+
Re: [PATCH v1 1/2] mm/page_idle: Add support for per-pid page_idle using virtual indexing
On Mon, Jul 22, 2019 at 03:06:39PM -0700, Andrew Morton wrote:
[snip]
> > +	*end = *start + count * BITS_PER_BYTE;
> > +	if (*end > max_frame)
> > +		*end = max_frame;
> > +	return 0;
> > +}
> > +
> >
> > ...
> >
> > +static void add_page_idle_list(struct page *page,
> > +	unsigned long addr, struct mm_walk *walk)
> > +{
> > +	struct page *page_get;
> > +	struct page_node *pn;
> > +	int bit;
> > +	unsigned long frames;
> > +	struct page_idle_proc_priv *priv = walk->private;
> > +	u64 *chunk = (u64 *)priv->buffer;
> > +
> > +	if (priv->write) {
> > +		/* Find whether this page was asked to be marked */
> > +		frames = (addr - priv->start_addr) >> PAGE_SHIFT;
> > +		bit = frames % BITMAP_CHUNK_BITS;
> > +		chunk = &chunk[frames / BITMAP_CHUNK_BITS];
> > +		if (((*chunk >> bit) & 1) == 0)
> > +			return;
> > +	}
> > +
> > +	page_get = page_idle_get_page(page);
> > +	if (!page_get)
> > +		return;
> > +
> > +	pn = kmalloc(sizeof(*pn), GFP_ATOMIC);
>
> I'm not liking this GFP_ATOMIC. If I'm reading the code correctly,
> userspace can ask for an arbitrarily large number of GFP_ATOMIC
> allocations by doing a large read. This can potentially exhaust page
> reserves which things like networking Rx interrupts need and can make
> this whole feature less reliable.

For the revision, I will pre-allocate the page nodes in advance so it does not
need to do this. Diff on top of this patch is below. Let me know any comments,
thanks.

Btw, I also dropped the idle_page_list_lock by putting the idle_page_list
list_head on the stack instead of heap.

---8<---

From: "Joel Fernandes (Google)"
Subject: [PATCH] mm/page_idle: Avoid need for GFP_ATOMIC

GFP_ATOMIC can harm allocations done by other callers that are in need of
reserves and the like. Pre-allocate the nodes list so that the spinlocked
region can just use it.

Suggested-by: Andrew Morton
Signed-off-by: Joel Fernandes (Google)
---
 mm/page_idle.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/mm/page_idle.c b/mm/page_idle.c
index 874a60c41fef..b9c790721f16 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -266,6 +266,10 @@ struct page_idle_proc_priv {
 	unsigned long start_addr;
 	char *buffer;
 	int write;
+
+	/* Pre-allocate and provide nodes to add_page_idle_list() */
+	struct page_node *page_nodes;
+	int cur_page_node;
 };

 static void add_page_idle_list(struct page *page,
@@ -291,10 +295,7 @@ static void add_page_idle_list(struct page *page,
 	if (!page_get)
 		return;

-	pn = kmalloc(sizeof(*pn), GFP_ATOMIC);
-	if (!pn)
-		return;
-
+	pn = &(priv->page_nodes[priv->cur_page_node++]);
 	pn->page = page_get;
 	pn->addr = addr;
 	list_add(&pn->list, &idle_page_list);
@@ -379,6 +380,15 @@ ssize_t page_idle_proc_generic(struct file *file, char __user *ubuff,
 	priv.buffer = buffer;
 	priv.start_addr = start_addr;
 	priv.write = write;
+
+	priv.cur_page_node = 0;
+	priv.page_nodes = kzalloc(sizeof(struct page_node) * (end_frame - start_frame),
+				  GFP_KERNEL);
+	if (!priv.page_nodes) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	walk.private = &priv;
 	walk.mm = mm;

@@ -425,6 +435,7 @@ ssize_t page_idle_proc_generic(struct file *file, char __user *ubuff,
 		ret = copy_to_user(ubuff, buffer, count);

 	up_read(&mm->mmap_sem);
+	kfree(priv.page_nodes);
 out:
 	kfree(buffer);
 out_mmput:
-- 
2.22.0.657.g960e92d24f-goog
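The pre-allocation idea in the diff above, reduced to a user-space sketch: size the node array once where sleeping is allowed, then hand out entries by index inside the critical section. struct node_pool and its helpers are hypothetical names; the bounds check in pool_take() is an addition made for this sketch and is not in the posted diff, and calloc() is used so the count-times-size multiplication is overflow-checked.

```c
#include <stdlib.h>

struct node {
	unsigned long addr;
	struct node *next;
};

/* All nodes are allocated up front; 'used' is the next free index. */
struct node_pool {
	struct node *nodes;
	size_t cap;
	size_t used;
};

/* Allocate room for cap nodes. Returns 0 on success, -1 on failure. */
int pool_init(struct node_pool *p, size_t cap)
{
	p->nodes = calloc(cap, sizeof(*p->nodes));
	p->cap = cap;
	p->used = 0;
	return (p->nodes || cap == 0) ? 0 : -1;
}

/*
 * Hand out the next node without allocating, so this is safe to call
 * where allocation is not. Returns NULL once the pool is exhausted.
 */
struct node *pool_take(struct node_pool *p)
{
	if (p->used >= p->cap)
		return NULL;
	return &p->nodes[p->used++];
}

/* Release the whole pool in one go after the walk has finished. */
void pool_free(struct node_pool *p)
{
	free(p->nodes);
	p->nodes = NULL;
	p->cap = 0;
	p->used = 0;
}
```

The trade-off is one allocation proportional to the requested range instead of many small atomic ones, so the upper bound on the range still matters.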
Re: [PATCH v10 3/5] overlayfs: add __get xattr method
On Wed, Jul 24, 2019 at 10:57 PM Mark Salyzyn wrote: > > Because of the overlayfs getxattr recursion, the incoming inode fails > to update the selinux sid resulting in avc denials being reported > against a target context of u:object_r:unlabeled:s0. This description is too brief for me to understand the root problem. What's wrong with the overlayfs getxattr recursion w.r.t. the selinux security model? Please give an example of your unprivileged mounter use case to explain. CC Vivek because I could really never understand all this. > > Solution is to add a _get xattr method that calls the __vfs_getxattr > handler so that the context can be read in, rather than being denied > with an -EACCES when vfs_getxattr handler is called. > > Signed-off-by: Mark Salyzyn > Cc: Miklos Szeredi > Cc: Jonathan Corbet > Cc: Vivek Goyal > Cc: Eric W. Biederman > Cc: Amir Goldstein > Cc: Randy Dunlap > Cc: Stephen Smalley > Cc: linux-unio...@vger.kernel.org > Cc: linux-doc@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Cc: kernel-t...@android.com > --- > v10 - added to patch series > --- > fs/overlayfs/inode.c | 15 +++ > fs/overlayfs/overlayfs.h | 2 ++ > fs/overlayfs/super.c | 18 ++ > 3 files changed, 35 insertions(+) > > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c > index 7663aeb85fa3..d3b53849615c 100644 > --- a/fs/overlayfs/inode.c > +++ b/fs/overlayfs/inode.c > @@ -362,6 +362,21 @@ int ovl_xattr_set(struct dentry *dentry, struct inode > *inode, const char *name, > return err; > } > > +int __ovl_xattr_get(struct dentry *dentry, struct inode *inode, > + const char *name, void *value, size_t size) > +{ > + ssize_t res; > + const struct cred *old_cred; > + struct dentry *realdentry = > + ovl_i_dentry_upper(inode) ?: ovl_dentry_lower(dentry); > + > + old_cred = ovl_override_creds(dentry->d_sb); > + res = __vfs_getxattr(realdentry, d_inode(realdentry), name, value, > +size); > + ovl_revert_creds(old_cred); > + return res; > +} > + > int ovl_xattr_get(struct dentry 
*dentry, struct inode *inode, const char > *name, > void *value, size_t size) > { > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h > index 6934bcf030f0..73a02a263fbc 100644 > --- a/fs/overlayfs/overlayfs.h > +++ b/fs/overlayfs/overlayfs.h > @@ -357,6 +357,8 @@ int ovl_xattr_set(struct dentry *dentry, struct inode > *inode, const char *name, > const void *value, size_t size, int flags); > int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char > *name, > void *value, size_t size); > +int __ovl_xattr_get(struct dentry *dentry, struct inode *inode, > + const char *name, void *value, size_t size); > ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size); > struct posix_acl *ovl_get_acl(struct inode *inode, int type); > int ovl_update_time(struct inode *inode, struct timespec64 *ts, int flags); > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > index b368e2e102fa..82e1130de206 100644 > --- a/fs/overlayfs/super.c > +++ b/fs/overlayfs/super.c > @@ -859,6 +859,14 @@ ovl_posix_acl_xattr_get(const struct xattr_handler > *handler, > return ovl_xattr_get(dentry, inode, handler->name, buffer, size); > } > > +static int __maybe_unused > +__ovl_posix_acl_xattr_get(const struct xattr_handler *handler, > + struct dentry *dentry, struct inode *inode, > + const char *name, void *buffer, size_t size) > +{ > + return __ovl_xattr_get(dentry, inode, handler->name, buffer, size); > +} > + > static int __maybe_unused > ovl_posix_acl_xattr_set(const struct xattr_handler *handler, > struct dentry *dentry, struct inode *inode, > @@ -939,6 +947,13 @@ static int ovl_other_xattr_get(const struct > xattr_handler *handler, > return ovl_xattr_get(dentry, inode, name, buffer, size); > } > > +static int __ovl_other_xattr_get(const struct xattr_handler *handler, > +struct dentry *dentry, struct inode *inode, > +const char *name, void *buffer, size_t size) > +{ > + return __ovl_xattr_get(dentry, inode, name, buffer, size); > +} > + > static 
int ovl_other_xattr_set(const struct xattr_handler *handler, >struct dentry *dentry, struct inode *inode, >const char *name, const void *value, > @@ -952,6 +967,7 @@ ovl_posix_acl_access_xattr_handler = { > .name = XATTR_NAME_POSIX_ACL_ACCESS, > .flags = ACL_TYPE_ACCESS, > .get = ovl_posix_acl_xattr_get, > + .__get = __ovl_posix_acl_xattr_get, > .set = ovl_posix_acl_xattr_set, > }; > > @@ -960,6 +976,7 @@ ovl_posix_acl_default_xattr_handler = { > .name = XATTR_NAME_POSIX_ACL
Re: [PATCH v10 5/5] overlayfs: override_creds=off option bypass creator_cred
On Wed, Jul 24, 2019 at 10:57 PM Mark Salyzyn wrote: > > By default, all access to the upper, lower and work directories is the > recorded mounter's MAC and DAC credentials. The incoming accesses are > checked against the caller's credentials. > > If the principles of least privilege are applied, the mounter's > credentials might not overlap the credentials of the caller's when > accessing the overlayfs filesystem. For example, a file that a lower > DAC privileged caller can execute, is MAC denied to the generally > higher DAC privileged mounter, to prevent an attack vector. > > We add the option to turn off override_creds in the mount options; all > subsequent operations after mount on the filesystem will be only the > caller's credentials. The module boolean parameter and mount option > override_creds is also added as a presence check for this "feature", > existence of /sys/module/overlay/parameters/override_creds. > > It was not always this way. Circa 4.6 there was no recorded mounter's > credentials, instead privileged access to upper or work directories > were temporarily increased to perform the operations. The MAC > (selinux) policies were caller's in all cases. override_creds=off > partially returns us to this older access model minus the insecure > temporary credential increases. This is to permit use in a system > with non-overlapping security models for each executable including > the agent that mounts the overlayfs filesystem. In Android > this is the case since init, which performs the mount operations, > has a minimal MAC set of privileges to reduce any attack surface, > and services that use the content have a different set of MAC > privileges (eg: read, for vendor labelled configuration, execute for > vendor libraries and modules). The caveats are not a problem in > the Android usage model, however they should be fixed for > completeness and for general use in time. 
> > Signed-off-by: Mark Salyzyn > Cc: Miklos Szeredi > Cc: Jonathan Corbet > Cc: Vivek Goyal > Cc: Eric W. Biederman > Cc: Amir Goldstein > Cc: Randy Dunlap > Cc: Stephen Smalley > Cc: linux-unio...@vger.kernel.org > Cc: linux-doc@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Cc: kernel-t...@android.com > --- > v10: > - Rebase (and expand because of increased revert_cred usage) > > v9: > - Add to the caveats > > v8: > - drop pr_warn message after straw poll to remove it. > - added a use case in the commit message > > v7: > - change name of internal parameter to ovl_override_creds_def > - report override_creds only if different than default > > v6: > - Drop CONFIG_OVERLAY_FS_OVERRIDE_CREDS. > - Do better with the documentation. > - pr_warn message adjusted to report consequences. > > v5: > - beefed up the caveats in the Documentation > - Is dependent on > "overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh" > "overlayfs: check CAP_MKNOD before issuing vfs_whiteout" > - Added prwarn when override_creds=off > > v4: > - spelling and grammar errors in text > > v3: > - Change name from caller_credentials / creator_credentials to the > boolean override_creds. > - Changed from creator to mounter credentials. > - Updated and fortified the documentation. > - Added CONFIG_OVERLAY_FS_OVERRIDE_CREDS > > v2: > - Forward port changed attr to stat, resulting in a build error. > - altered commit message. 
> > a > --- > Documentation/filesystems/overlayfs.txt | 23 +++ > fs/overlayfs/copy_up.c | 2 +- > fs/overlayfs/dir.c | 11 ++- > fs/overlayfs/file.c | 20 ++-- > fs/overlayfs/inode.c| 18 +- > fs/overlayfs/namei.c| 6 +++--- > fs/overlayfs/overlayfs.h| 1 + > fs/overlayfs/ovl_entry.h| 1 + > fs/overlayfs/readdir.c | 4 ++-- > fs/overlayfs/super.c| 22 +- > fs/overlayfs/util.c | 12 ++-- > 11 files changed, 87 insertions(+), 33 deletions(-) > > diff --git a/Documentation/filesystems/overlayfs.txt > b/Documentation/filesystems/overlayfs.txt > index 1da2f1668f08..d48125076602 100644 > --- a/Documentation/filesystems/overlayfs.txt > +++ b/Documentation/filesystems/overlayfs.txt > @@ -102,6 +102,29 @@ Only the lists of names from directories are merged. > Other content > such as metadata and extended attributes are reported for the upper > directory only. These attributes of the lower directory are hidden. > > +credentials > +--- > + > +By default, all access to the upper, lower and work directories is the > +recorded mounter's MAC and DAC credentials. The incoming accesses are > +checked against the caller's credentials. > + > +In the case where caller MAC or DAC credentials do not overlap, a > +use case available in older versions of the driver, the > +override_creds mount flag can be turned off and help wh