Re: [PATCH v1 4/7] ARM: dts: imx6dl-prtvt7: add TSC2046 touchscreen node
On 2021-01-20 15:22, Oleksij Rempel wrote: Add touchscreen support to the Protonic VT7 board. Co-Developed-by: Robin van der Gracht Signed-off-by: Robin van der Gracht Signed-off-by: Oleksij Rempel --- arch/arm/boot/dts/imx6dl-prtvt7.dts | 20 1 file changed, 20 insertions(+) diff --git a/arch/arm/boot/dts/imx6dl-prtvt7.dts b/arch/arm/boot/dts/imx6dl-prtvt7.dts index d9cb1e41cc10..63ae2065834c 100644 --- a/arch/arm/boot/dts/imx6dl-prtvt7.dts +++ b/arch/arm/boot/dts/imx6dl-prtvt7.dts @@ -266,6 +266,26 @@ &ecspi2 { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_ecspi2>; status = "okay"; + + touchscreen@0 { + compatible = "ti,tsc2046"; + reg = <0>; + pinctrl-0 = <&pinctrl_tsc>; + pinctrl-names ="default"; + spi-max-frequency = <10>; + interrupts-extended = <&gpio3 20 IRQ_TYPE_EDGE_FALLING>; + pendown-gpio = <&gpio3 20 GPIO_ACTIVE_LOW>; + + touchscreen-inverted-x; + touchscreen-inverted-y; Please remove both inverted properties since it's not inverted. + touchscreen-max-pressure = <4095>; + + ti,vref-delay-usecs = /bits/ 16 <100>; + ti,x-plate-ohms = /bits/ 16 <800>; + ti,y-plate-ohms = /bits/ 16 <300>; + + wakeup-source; + }; }; &i2c1 { Robin ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v3 12/12] auxdisplay: constify fb ops
Hello Jani, On 2019-12-09 15:03, Jani Nikula wrote: On Tue, 03 Dec 2019, Jani Nikula wrote: Now that the fbops member of struct fb_info is const, we can start making the ops const as well. Cc: Miguel Ojeda Sandonis Cc: Robin van der Gracht Reviewed-by: Daniel Vetter Reviewed-by: Miguel Ojeda Acked-by: Robin van der Gracht Signed-off-by: Jani Nikula Miguel, Robin, just to err on the safe side, were you both okay with me merging this through drm-misc? Not very likely to conflict badly. ht16k33 driver hasn't seen much change lately, should be fine. Robin ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 14/14] auxdisplay: constify fb ops
On 2019-11-29 11:29, Jani Nikula wrote: Now that the fbops member of struct fb_info is const, we can start making the ops const as well. Cc: Miguel Ojeda Sandonis Cc: Robin van der Gracht Signed-off-by: Jani Nikula --- drivers/auxdisplay/cfag12864bfb.c | 2 +- drivers/auxdisplay/ht16k33.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/auxdisplay/cfag12864bfb.c b/drivers/auxdisplay/cfag12864bfb.c index 4074886b7bc8..2002291ab338 100644 --- a/drivers/auxdisplay/cfag12864bfb.c +++ b/drivers/auxdisplay/cfag12864bfb.c @@ -57,7 +57,7 @@ static int cfag12864bfb_mmap(struct fb_info *info, struct vm_area_struct *vma) return vm_map_pages_zero(vma, &pages, 1); } -static struct fb_ops cfag12864bfb_ops = { +static const struct fb_ops cfag12864bfb_ops = { .owner = THIS_MODULE, .fb_read = fb_sys_read, .fb_write = fb_sys_write, diff --git a/drivers/auxdisplay/ht16k33.c b/drivers/auxdisplay/ht16k33.c index a2fcde582e2a..d951d54b26f5 100644 --- a/drivers/auxdisplay/ht16k33.c +++ b/drivers/auxdisplay/ht16k33.c @@ -228,7 +228,7 @@ static int ht16k33_mmap(struct fb_info *info, struct vm_area_struct *vma) return vm_map_pages_zero(vma, &pages, 1); } -static struct fb_ops ht16k33_fb_ops = { +static const struct fb_ops ht16k33_fb_ops = { .owner = THIS_MODULE, .fb_read = fb_sys_read, .fb_write = fb_sys_write, Acked-by: Robin van der Gracht ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 14/14] auxdisplay: constify fb ops
On 2019-11-29 21:59, Miguel Ojeda wrote: On Fri, Nov 29, 2019 at 9:30 PM Daniel Vetter wrote: Well we do have very small lcd display drivers in drm, and before that in fbdev. And you have a fbdev framebuffer driver in there, which looks a bit misplaced ... Afaiui you also have some even tinier lcd drivers where you don't address pixels, but just directly upload text, and those obviously don't fit into drm/fbdev world. But anything where you can address pixels very much does. -Daniel The first driver in the category used fb.h. At the time (~13 years ago) it was decided that the drivers should go into a different category/folder instead and then the other were added. In any case, I am removing the original ones since I cannot test them anymore and there are likely no user. The only other fb user could be relocated if Robin agrees. The ht16k33 driver registers a framebuffer, backlight and input device. Not sure if it's OK to let a driver under fbdev register all of those, but relocating it is fine by me. Robin ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA
On 2021-04-09 14:39, David Hildenbrand wrote: On 09.04.21 15:35, Arnd Bergmann wrote: On Fri, Apr 9, 2021 at 1:21 PM David Hildenbrand wrote: Random drivers should not override a user configuration of core knobs (e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA, which depends on CMA, if possible; however, these drivers also have to tolerate if DMA_CMA is not available/functioning, for example, if no CMA area for DMA_CMA use has been setup via "cma=X". In the worst case, the driver cannot do it's job properly in some configurations. For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if available") documents While this is no build dependency, etnaviv will only work correctly on most systems if CMA and DMA_CMA are enabled. Select both options if available to avoid users ending up with a non-working GPU due to a lacking kernel config. So etnaviv really wants to have DMA_CMA, however, can deal with some cases where it is not available. Let's introduce WANT_DMA_CMA and use it in most cases where drivers select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because of recursive dependency issues). We'll assume that any driver that selects DRM_GEM_CMA_HELPER or DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible. With this change, distributions can disable CONFIG_CMA or CONFIG_DMA_CMA, without it silently getting enabled again by random drivers. Also, we'll now automatically try to enabled both, CONFIG_CMA and CONFIG_DMA_CMA if they are unspecified and any driver is around that selects WANT_DMA_CMA -- also implicitly via DRM_GEM_CMA_HELPER or DRM_KMS_CMA_HELPER. For example, if any driver selects WANT_DMA_CMA and we do a "make olddefconfig": 1. With "# CONFIG_CMA is not set" and no specification of "CONFIG_DMA_CMA" -> CONFIG_DMA_CMA won't be part of .config 2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA Contiguous Memory Allocator (CMA) [Y/n/?] (NEW) DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] (NEW) 3. With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set" -> CONFIG_DMA_CMA will be removed from .config Note: drivers/remoteproc seems to be special; commit c51e882cd711 ("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that there is a real dependency to DMA_CMA for it to work; leave that dependency in place and don't convert it to a soft dependency. I don't think this dependency is fundamentally different from the others, though davinci machines tend to have less memory than a lot of the other machines, so it's more likely to fail without CMA. I was also unsure - and Lucas had similar thoughts. If you want, I can send a v4 also taking care of this. TBH I think it should all just be removed. DMA_CMA is effectively an internal feature of the DMA API, and drivers which simply use the DMA API shouldn't really be trying to assume *how* things might be allocated at runtime - CMA is hardly the only way. Platform-level assumptions about the presence or not of IOMMUs, memory carveouts, etc., and whether it even matters - e.g. a device with a tiny LCD may only need display buffers which still fit in a regular MAX_ORDER allocation - could go in platform-specific configs, but I really don't think they belong at the generic subsystem level. We already have various examples like I2S drivers that won't even probe without a dmaengine provider being present, or host controller drivers which are useless without their corresponding phy driver (and I'm guessing you can probably also do higher-level things like include the block layer but omit all filesystem drivers). I don't believe it's Kconfig's job to try to guess whether a given configuration is *useful*, only to enforce that's it's valid to build. Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
RE: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
On 2021/04/22 Lucas Stach wrote: > > But the timer of runtime_auto_suspend decide when enter runtime > > suspend rather than hardware, while transfer data size and transfer > > rate on IP bus decide when the dma interrupt happen. > > > But it isn't the hardware that decides to drop the rpm refcount to 0 and > starting the autosuspend timer, it's the driver. Yes, driver should keep rpm refcount never to 0 in such case. But I think that case Is a common case in dma cyclic with runtime_auto_suspend, so some dma driver also add pm_runtime_get_sync/ pm_runtime_put_autosuspend in interrupt handler like qcom/bam_dma.c for safe rather than only pm_runtime_mark_last_busy(). > > > Generally, we can call pm_runtime_get_sync(fsl_chan->dev)/ > > pm_runtime_mark_last_busy in interrupt handler to hope the > > runtime_auto_suspend timer expiry later than interrupt coming, but if > > the transfer data size is larger in cyclic and transfer rate is very > > slow like 115200 or lower on uart, the fix autosuspend timer > > 100ms/200ms maybe not enough, hence, runtime suspend may execute > > meanwhile the dma interrupt maybe triggered and caught by GIC(but > > interrupt handler prevent by spin_lock_irqsave in pm_suspend_timer_fn() ), > and then interrupt handler start to run after runtime suspend. > > If your driver code drops the rpm refcount to 0 and starts the autosuspend > timer while a cyclic transfer is still in flight this is clearly a bug. > Autosuspend is > not there to paper over driver bugs, but to amortize cost of actually > suspending and resuming the hardware. Your driver code must still work even > if the timeout is 0, i.e. the hardware is immediately suspended after you drop > the rpm refcount to 0. > > If you still have transfers queued/in-flight the driver code must keep a rpm > reference. Yes, that's what I said for fix before as below. 'I have a simple workaround that disable runtime suspend in issue_pending worker by calling pm_runtime_forbid() and then enable runtime auto suspend in dmaengine_terminate_all so that we could easily regard that edma channel is always in runtime resume between issue_pending and channel terminated and ignore the above interrupt handler/scu-pd limitation' ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v5 08/16] swiotlb: Update is_swiotlb_active to add a struct device argument
On 2021-04-22 09:15, Claire Chang wrote: Update is_swiotlb_active to add a struct device argument. This will be useful later to allow for restricted DMA pool. Signed-off-by: Claire Chang --- drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +- drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +- drivers/pci/xen-pcifront.c | 2 +- include/linux/swiotlb.h | 4 ++-- kernel/dma/direct.c | 2 +- kernel/dma/swiotlb.c | 4 ++-- 6 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c index ce6b664b10aa..7d48c433446b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c @@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj) max_order = MAX_ORDER; #ifdef CONFIG_SWIOTLB - if (is_swiotlb_active()) { + if (is_swiotlb_active(NULL)) { unsigned int max_segment; max_segment = swiotlb_max_segment(); diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c index e8b506a6685b..2a2ae6d6cf6d 100644 --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c @@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm) } #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86) - need_swiotlb = is_swiotlb_active(); + need_swiotlb = is_swiotlb_active(NULL); #endif ret = ttm_device_init(&drm->ttm.bdev, &nouveau_bo_driver, drm->dev->dev, diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c index b7a8f3a1921f..6d548ce53ce7 100644 --- a/drivers/pci/xen-pcifront.c +++ b/drivers/pci/xen-pcifront.c @@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev) spin_unlock(&pcifront_dev_lock); - if (!err && !is_swiotlb_active()) { + if (!err && !is_swiotlb_active(NULL)) { err = pci_xen_swiotlb_init_late(); if (err) dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n"); diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 2a6cca07540b..c530c976d18b 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -123,7 +123,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) void __init swiotlb_exit(void); unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); -bool is_swiotlb_active(void); +bool is_swiotlb_active(struct device *dev); void __init swiotlb_adjust_size(unsigned long size); #else #define swiotlb_force SWIOTLB_NO_FORCE @@ -143,7 +143,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev) return SIZE_MAX; } -static inline bool is_swiotlb_active(void) +static inline bool is_swiotlb_active(struct device *dev) { return false; } diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 84c9feb5474a..7a88c34d0867 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask) size_t dma_direct_max_mapping_size(struct device *dev) { /* If SWIOTLB is active, use its maximum mapping size */ - if (is_swiotlb_active() && + if (is_swiotlb_active(dev) && (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE)) I wonder if it's worth trying to fold these other conditions into is_swiotlb_active() itself? I'm not entirely sure what matters for Xen, but for the other cases it seems like they probably only care about whether bouncing may occur for their particular device or not (possibly they want to be using dma_max_mapping_size() now anyway - TBH I'm struggling to make sense of what the swiotlb_max_segment business is supposed to mean). Otherwise, patch #9 will need to touch here as well to make sure that per-device forced bouncing is reflected correctly. Robin. return swiotlb_max_mapping_size(dev); return SIZE_MAX; diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index ffbb8724e06c..1d221343f1c8 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -659,9 +659,9 @@ size_t swiotlb_max_mapping_size(struct device *dev) return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE; } -bool is_swiotlb_active(void) +bool is_swiotlb_active(struct device *dev) { - return io_tlb_default_mem != NULL; + return get_io_tlb_mem(dev) != NULL; } EXPORT_SYMBOL_GPL(is_swiotlb_active); ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v5 16/16] of: Add plumbing for restricted DMA pool
On 2021-04-22 09:15, Claire Chang wrote: If a device is not behind an IOMMU, we look up the device node and set up the restricted DMA when the restricted-dma-pool is presented. Signed-off-by: Claire Chang --- drivers/of/address.c| 25 + drivers/of/device.c | 3 +++ drivers/of/of_private.h | 5 + 3 files changed, 33 insertions(+) diff --git a/drivers/of/address.c b/drivers/of/address.c index 54f221dde267..fff3adfe4986 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -1109,6 +1110,30 @@ bool of_dma_is_coherent(struct device_node *np) } EXPORT_SYMBOL_GPL(of_dma_is_coherent); +int of_dma_set_restricted_buffer(struct device *dev) +{ + struct device_node *node; + int count, i; + + if (!dev->of_node) + return 0; + + count = of_property_count_elems_of_size(dev->of_node, "memory-region", + sizeof(phandle)); + for (i = 0; i < count; i++) { + node = of_parse_phandle(dev->of_node, "memory-region", i); + /* There might be multiple memory regions, but only one +* restriced-dma-pool region is allowed. +*/ What's the use-case for having multiple regions if the restricted pool is by definition the only one accessible? Robin. + if (of_device_is_compatible(node, "restricted-dma-pool") && + of_device_is_available(node)) + return of_reserved_mem_device_init_by_idx( + dev, dev->of_node, i); + } + + return 0; +} + /** * of_mmio_is_nonposted - Check if device uses non-posted MMIO * @np: device node diff --git a/drivers/of/device.c b/drivers/of/device.c index c5a9473a5fb1..d8d865223e51 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, arch_setup_dma_ops(dev, dma_start, size, iommu, coherent); + if (!iommu) + return of_dma_set_restricted_buffer(dev); + return 0; } EXPORT_SYMBOL_GPL(of_dma_configure_id); diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h index d717efbd637d..e9237f5eff48 100644 --- a/drivers/of/of_private.h +++ b/drivers/of/of_private.h @@ -163,12 +163,17 @@ struct bus_dma_region; #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA) int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map); +int of_dma_set_restricted_buffer(struct device *dev); #else static inline int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map) { return -ENODEV; } +static inline int of_dma_get_restricted_buffer(struct device *dev) +{ + return -ENODEV; +} #endif #endif /* _LINUX_OF_PRIVATE_H */ ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v5 14/16] dma-direct: Allocate memory from restricted DMA pool if available
On 2021-04-22 09:15, Claire Chang wrote: The restricted DMA pool is preferred if available. The restricted DMA pools provide a basic level of protection against the DMA overwriting buffer contents at unexpected times. However, to protect against general data leakage and system memory corruption, the system needs to provide a way to lock down the memory access, e.g., MPU. Signed-off-by: Claire Chang --- kernel/dma/direct.c | 35 ++- 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 7a27f0510fcc..29523d2a9845 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -78,6 +78,10 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) static void __dma_direct_free_pages(struct device *dev, struct page *page, size_t size) { +#ifdef CONFIG_DMA_RESTRICTED_POOL + if (swiotlb_free(dev, page, size)) + return; +#endif dma_free_contiguous(dev, page, size); } @@ -92,7 +96,17 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, &phys_limit); - page = dma_alloc_contiguous(dev, size, gfp); + +#ifdef CONFIG_DMA_RESTRICTED_POOL + page = swiotlb_alloc(dev, size); + if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { + __dma_direct_free_pages(dev, page, size); + page = NULL; + } +#endif + + if (!page) + page = dma_alloc_contiguous(dev, size, gfp); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { dma_free_contiguous(dev, page, size); page = NULL; @@ -148,7 +162,7 @@ void *dma_direct_alloc(struct device *dev, size_t size, gfp |= __GFP_NOWARN; if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && - !force_dma_unencrypted(dev)) { + !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) { page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO); if (!page) return NULL; @@ -161,8 +175,8 @@ void *dma_direct_alloc(struct device *dev, size_t size, } if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) && - !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - !dev_is_dma_coherent(dev)) + !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) && + !is_dev_swiotlb_force(dev)) return arch_dma_alloc(dev, size, dma_handle, gfp, attrs); /* @@ -172,7 +186,9 @@ void *dma_direct_alloc(struct device *dev, size_t size, if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) && !gfpflags_allow_blocking(gfp) && (force_dma_unencrypted(dev) || -(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev +(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && + !dev_is_dma_coherent(dev))) && + !is_dev_swiotlb_force(dev)) return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp); /* we always manually zero the memory once we are done */ @@ -253,15 +269,15 @@ void dma_direct_free(struct device *dev, size_t size, unsigned int page_order = get_order(size); if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && - !force_dma_unencrypted(dev)) { + !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) { /* cpu_addr is a struct page cookie, not a kernel address */ dma_free_contiguous(dev, cpu_addr, size); return; } if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) && - !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - !dev_is_dma_coherent(dev)) { + !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) && + !is_dev_swiotlb_force(dev)) { arch_dma_free(dev, size, cpu_addr, dma_addr, attrs); return; } @@ -289,7 +305,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size, void *ret; if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) && - force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) + force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) && + !is_dev_swiotlb_force(dev)) return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp); Wait, this seems broken for non-coherent devices - in that case we need to return a non-cacheable address, but we can't simply fall through into the remapping path below in GFP_ATOMIC context. That's why we need the atomic pool conc
Re: [PATCH] drm/i915: switch from 'pci_' to 'dma_' API
Hi, FWIW this patch itself looks fine, but it does highlight some things which could be further cleaned up if anyone's interested... On 2021-08-22 22:06, Christophe JAILLET wrote: [...] diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index a74b72f50cc9..afb35d2e5c73 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -32,7 +32,7 @@ static int init_fake_lmem_bar(struct intel_memory_region *mem) mem->remap_addr = dma_map_resource(i915->drm.dev, mem->region.start, mem->fake_mappable.size, - PCI_DMA_BIDIRECTIONAL, + DMA_BIDIRECTIONAL, DMA_ATTR_FORCE_CONTIGUOUS); DMA_ATTR_FORCE_CONTIGUOUS is nonsensical here (and below) as it is only meaningful for coherent buffers allocated by dma_alloc_attrs(). if (dma_mapping_error(i915->drm.dev, mem->remap_addr)) { drm_mm_remove_node(&mem->fake_mappable); @@ -62,7 +62,7 @@ static void release_fake_lmem_bar(struct intel_memory_region *mem) dma_unmap_resource(mem->i915->drm.dev, mem->remap_addr, mem->fake_mappable.size, - PCI_DMA_BIDIRECTIONAL, + DMA_BIDIRECTIONAL, DMA_ATTR_FORCE_CONTIGUOUS); } [...] diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 36489be4896b..cd5f2348a187 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -30,7 +30,7 @@ int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj, do { if (dma_map_sg_attrs(obj->base.dev->dev, pages->sgl, pages->nents, -PCI_DMA_BIDIRECTIONAL, +DMA_BIDIRECTIONAL, DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN)) Similarly DMA_ATTR_NO_KERNEL_MAPPING and DMA_ATTR_NO_WARN are also for coherent allocations rather than streaming mappings. I'll see if I can whip up a patch to make the API documentation clearer... Thanks, Robin. @@ -64,7 +64,7 @@ void i915_gem_gtt_finish_pages(struct drm_i915_gem_object *obj, usleep_range(100, 250); dma_unmap_sg(i915->drm.dev, pages->sgl, pages->nents, -PCI_DMA_BIDIRECTIONAL); +DMA_BIDIRECTIONAL); } /**
Re: [PATCH] parisc/parport_gsc: switch from 'pci_' to 'dma_' API
On 2021-08-23 22:30, Christophe JAILLET wrote: The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below. @@ expression e1, e2, e3, e4; @@ -pci_free_consistent(e1, e2, e3, e4) +dma_free_coherent(&e1->dev, e2, e3, e4) Signed-off-by: Christophe JAILLET --- If needed, see post from Christoph Hellwig on the kernel-janitors ML: https://marc.info/?l=kernel-janitors&m=158745678307186&w=4 This has *NOT* been compile tested because I don't have the needed configuration. ssdfs --- drivers/parport/parport_gsc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/parport/parport_gsc.c b/drivers/parport/parport_gsc.c index 1e43b3f399a8..db912fa6b6df 100644 --- a/drivers/parport/parport_gsc.c +++ b/drivers/parport/parport_gsc.c @@ -390,9 +390,8 @@ static int __exit parport_remove_chip(struct parisc_device *dev) if (p->irq != PARPORT_IRQ_NONE) free_irq(p->irq, p); if (priv->dma_buf) - pci_free_consistent(priv->dev, PAGE_SIZE, - priv->dma_buf, - priv->dma_handle); + dma_free_coherent(&priv->dev->dev, PAGE_SIZE, + priv->dma_buf, priv->dma_handle); Hmm, seeing a free on its own made me wonder where the corresponding alloc was, but on closer inspection it seems there isn't one. AFAICS priv->dma_buf is only ever assigned with NULL (and priv->dev doesn't seem to be assigned at all), so this could likely just be removed. In fact it looks like all the references to DMA in this driver are just copy-paste from parport_pc and unused. Robin. kfree (p->private_data); parport_put_port(p); kfree (ops); /* hope no-one cached it */
[PATCH] drm/cma-helper: Set VM_DONTEXPAND for mmap
drm_gem_cma_mmap() cannot assume every implementation of dma_mmap_wc() will end up calling remap_pfn_range() (which happens to set the relevant vma flag, among others), so in order to make sure expectations around VM_DONTEXPAND are met, let it explicitly set the flag like most other GEM mmap implementations do. This avoids repeated warnings on a small minority of systems where the display is behind an IOMMU, and has a simple driver which does not override drm_gem_cma_default_funcs. Signed-off-by: Robin Murphy --- drivers/gpu/drm/drm_gem_cma_helper.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c index d53388199f34..63e48d98263d 100644 --- a/drivers/gpu/drm/drm_gem_cma_helper.c +++ b/drivers/gpu/drm/drm_gem_cma_helper.c @@ -510,6 +510,7 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma) */ vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node); vma->vm_flags &= ~VM_PFNMAP; + vma->vm_flags |= VM_DONTEXPAND; cma_obj = to_drm_gem_cma_obj(obj); -- 2.25.1
Re: [PATCH 2/3] drm/etnaviv: fix dma configuration of the virtual device
On 2021-08-26 13:10, Michael Walle wrote: The DMA configuration of the virtual device is inherited from the first actual etnaviv device. Unfortunately, this doesn't work with an IOMMU: [5.191008] Failed to set up IOMMU for device (null); retaining platform DMA ops This is because there is no associated iommu_group with the device. The group is set in iommu_group_add_device() which is eventually called by device_add() via the platform bus: device_add() blocking_notifier_call_chain() iommu_bus_notifier() iommu_probe_device() __iommu_probe_device() iommu_group_get_for_dev() iommu_group_add_device() Move of_dma_configure() into the probe function, which is called after device_add(). Normally, the platform code will already call it itself if .of_node is set. Unfortunately, this isn't the case here. Also move the dma mask assignemnts to probe() to keep all DMA related settings together. I assume the driver must already keep track of the real GPU platform device in order to map registers, request interrupts, etc. correctly - can't it also correctly use that device for DMA API calls and avoid the need for these shenanigans altogether? FYI, IOMMU configuration is really supposed to *only* run at add_device() time as above - the fact that it's currently hooked in to be retriggered by of_dma_configure() on DT platforms actually turns out to lead to various issues within the IOMMU API, and the plan to change that is slowly climbing up my to-do list. Robin. Signed-off-by: Michael Walle --- drivers/gpu/drm/etnaviv/etnaviv_drv.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c index 2509b3e85709..ff6425f6ebad 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c @@ -589,6 +589,7 @@ static int compare_str(struct device *dev, void *data) static int etnaviv_pdev_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; + struct device_node *first_node = NULL; struct component_match *match = NULL; if (!dev->platform_data) { @@ -598,6 +599,9 @@ static int etnaviv_pdev_probe(struct platform_device *pdev) if (!of_device_is_available(core_node)) continue; + if (!first_node) + first_node = core_node; + drm_of_component_match_add(&pdev->dev, &match, compare_of, core_node); } @@ -609,6 +613,17 @@ static int etnaviv_pdev_probe(struct platform_device *pdev) component_match_add(dev, &match, compare_str, names[i]); } + pdev->dev.coherent_dma_mask = DMA_BIT_MASK(40); + pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask; + + /* +* Apply the same DMA configuration to the virtual etnaviv +* device as the GPU we found. This assumes that all Vivante +* GPUs in the system share the same DMA constraints. +*/ + if (first_node) + of_dma_configure(&pdev->dev, first_node, true); + return component_master_add_with_match(dev, &etnaviv_master_ops, match); } @@ -659,15 +674,6 @@ static int __init etnaviv_init(void) of_node_put(np); goto unregister_platform_driver; } - pdev->dev.coherent_dma_mask = DMA_BIT_MASK(40); - pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask; - - /* -* Apply the same DMA configuration to the virtual etnaviv -* device as the GPU we found. This assumes that all Vivante -* GPUs in the system share the same DMA constraints. -*/ - of_dma_configure(&pdev->dev, np, true); ret = platform_device_add(pdev); if (ret) {
Re: [PATCH 2/3] drm/etnaviv: fix dma configuration of the virtual device
On 2021-08-26 16:17, Lucas Stach wrote: Am Donnerstag, dem 26.08.2021 um 16:00 +0100 schrieb Robin Murphy: On 2021-08-26 13:10, Michael Walle wrote: The DMA configuration of the virtual device is inherited from the first actual etnaviv device. Unfortunately, this doesn't work with an IOMMU: [5.191008] Failed to set up IOMMU for device (null); retaining platform DMA ops This is because there is no associated iommu_group with the device. The group is set in iommu_group_add_device() which is eventually called by device_add() via the platform bus: device_add() blocking_notifier_call_chain() iommu_bus_notifier() iommu_probe_device() __iommu_probe_device() iommu_group_get_for_dev() iommu_group_add_device() Move of_dma_configure() into the probe function, which is called after device_add(). Normally, the platform code will already call it itself if .of_node is set. Unfortunately, this isn't the case here. Also move the dma mask assignemnts to probe() to keep all DMA related settings together. I assume the driver must already keep track of the real GPU platform device in order to map registers, request interrupts, etc. correctly - can't it also correctly use that device for DMA API calls and avoid the need for these shenanigans altogether? Not without a bigger rework. There's still quite a bit of midlayer issues in DRM, where dma-buf imports are dma-mapped and cached via the virtual DRM device instead of the real GPU device. Also etnaviv is able to coalesce multiple Vivante GPUs in a single system under one virtual DRM device, which is used on i.MX6 where the 2D and 3D GPUs are separate peripherals, but have the same DMA constraints. Sure, I wouldn't expect it to be trivial to fix properly, but I wanted to point out that this is essentially a hack, relying on an implicit side-effect of of_dma_configure() which is already slated for removal. As such, I for one am not going to be too sympathetic if it stops working in future. Furthermore, even today it doesn't work in general - it might be OK for LS1028A with a single GPU block behind an SMMU, but as soon as you have multiple GPU blocks with distinct SMMU StreamIDs, or behind different IOMMU instances, then you're stuffed again. Although in fact I think it's also broken even for LS1028A, since AFAICS there's no guarantee that the relevant SMMU instance will actually be probed, or the SMMU driver even loaded, when etnaviv_pdev_probe() runs. Effectively we would need to handle N devices for the dma-mapping in a lot of places instead of only dealing with the one virtual DRM device. It would probably be the right thing to anyways, but it's not something that can be changed short-term. I'm also not yet sure about the performance implications, as we might run into some cache maintenance bottlenecks if we dma synchronize buffers to multiple real device instead of doing it a single time with the virtual DRM device. I know, I know, this has a lot of assumptions baked in that could fall apart if someone builds a SoC with multiple Vivante GPUs that have differing DMA constraints, but up until now hardware designers have not been *that* crazy, fortunately. I'm not too familiar with the component stuff, but would it be viable to just have etnaviv_gpu_platform_probe() set up the first GPU which comes along as the master component and fundamental DRM device, then treat any subsequent ones as subcomponents as before? That would at least stand to be more robust in terms of obviating the of_dma_configure() hack (only actual bus code should ever be calling that), even if it won't do anything for the multiple IOMMU mapping or differing DMA constraints problems. Thanks, Robin.
Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-11 08:26, Christoph Hellwig wrote: On Wed, Mar 10, 2021 at 06:39:57PM +, Robin Murphy wrote: Actually... Just mirroring the iommu_dma_strict value into struct iommu_domain should solve all of that with very little boilerplate code. Yes, my initial thought was to directly replace the attribute with a common flag at iommu_domain level, but since in all cases the behaviour is effectively global rather than actually per-domain, it seemed reasonable to take it a step further. This passes compile-testing for arm64 and x86, what do you think? It seems to miss a few bits, and also generally seems to be not actually apply to recent mainline or something like it due to different empty lines in a few places. Yeah, that was sketched out on top of some other development patches, and in being so focused on not breaking any of the x86 behaviours I did indeed overlook fully converting the SMMU drivers... oops! (my thought was to do the conversion for its own sake, then clean up the redundant attribute separately, but I guess it's fine either way) Let me know what you think of the version here: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup I'll happily switch the patch to you as the author if you're fine with that as well. I still have reservations about removing the attribute API entirely and pretending that io_pgtable_cfg is anything other than a SoC-specific private interface, but the reworked patch on its own looks reasonable to me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers either...) Just iommu_get_dma_strict() needs an export since the SMMU drivers can be modular - I consciously didn't add that myself since I was mistakenly thinking only iommu-dma would call it. Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-15 08:33, Christoph Hellwig wrote: On Fri, Mar 12, 2021 at 04:18:24PM +, Robin Murphy wrote: Let me know what you think of the version here: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup I'll happily switch the patch to you as the author if you're fine with that as well. I still have reservations about removing the attribute API entirely and pretending that io_pgtable_cfg is anything other than a SoC-specific private interface, I think a private inteface would make more sense. For now I've just condensed it down to a generic set of quirk bits and dropped the attrs structure, which seems like an ok middle ground for now. That being said I wonder why that quirk isn't simply set in the device tree? Because it's a software policy decision rather than any inherent property of the platform, and the DT certainly doesn't know *when* any particular device might prefer its IOMMU to use cacheable pagetables to minimise TLB miss latency vs. saving the cache capacity for larger data buffers. It really is most logical to decide this at the driver level. In truth the overall concept *is* relatively generic (a trend towards larger system caches and cleverer usage is about both raw performance and saving power on off-SoC DRAM traffic), it's just the particular implementation of using io-pgtable to set an outer-cacheable walk attribute in an SMMU TCR that's pretty much specific to Qualcomm SoCs. Hence why having a common abstraction at the iommu_domain level, but where the exact details are free to vary across different IOMMUs and their respective client drivers, is in many ways an ideal fit. but the reworked patch on its own looks reasonable to me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers either...) Just iommu_get_dma_strict() needs an export since the SMMU drivers can be modular - I consciously didn't add that myself since I was mistakenly thinking only iommu-dma would call it. Fixed. Can I get your signoff for the patch? Then I'll switch it to over to being attributed to you. Sure - I would have thought that the one I originally posted still stands, but for the avoidance of doubt, for the parts of commit 8b6d45c495bd in your tree that remain from what I wrote: Signed-off-by: Robin Murphy Cheers, Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 24/30] Kconfig: Change Synopsys to Synopsis
On 2021-03-29 00:53, Bhaskar Chowdhury wrote: s/Synopsys/Synopsis/ .two different places. Erm, that is definitely not a typo... :/ ..and for some unknown reason it introduce a empty line deleted and added back. Presumably your editor is configured to trim trailing whitespace on save. Furthermore, there are several instances in the other patches where your "corrections" are grammatically incorrect, I'm not sure what the deal is with patch #14, and you've also used the wrong subsystem name (it should be "dmaengine"). It's great to want to clean things up, but please pay a bit of care and attention to what you're actually doing. Robin. Signed-off-by: Bhaskar Chowdhury --- drivers/dma/Kconfig | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 0c2827fd8c19..30e8cc26f43b 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -170,15 +170,15 @@ config DMA_SUN6I Support for the DMA engine first found in Allwinner A31 SoCs. config DW_AXI_DMAC - tristate "Synopsys DesignWare AXI DMA support" + tristate "Synopsis DesignWare AXI DMA support" depends on OF || COMPILE_TEST depends on HAS_IOMEM select DMA_ENGINE select DMA_VIRTUAL_CHANNELS help - Enable support for Synopsys DesignWare AXI DMA controller. + Enable support for Synopsis DesignWare AXI DMA controller. NOTE: This driver wasn't tested on 64 bit platform because - of lack 64 bit platform with Synopsys DW AXI DMAC. + of lack 64 bit platform with Synopsis DW AXI DMAC. config EP93XX_DMA bool "Cirrus Logic EP93xx DMA support" @@ -394,7 +394,7 @@ config MOXART_DMA select DMA_VIRTUAL_CHANNELS help Enable support for the MOXA ART SoC DMA controller. - + Say Y here if you enabled MMP ADMA, otherwise say N. config MPC512X_DMA -- 2.26.3 ___ iommu mailing list io...@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. For example, see the recent patch from Lu Baolu: https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com Erm, this patch is based on that one, it's right there in the context :/ Thanks, Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. For example, see the recent patch from Lu Baolu: https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com Erm, this patch is based on that one, it's right there in the context :/ Ah, sorry, I didn't spot that! I was just trying to illustrate that this is per-device. Sure, I understand - and I'm just trying to bang home that despite appearances it's never actually been treated as such for SMMU, so anything that's wrong after this change was already wrong before. Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-31 12:49, Will Deacon wrote: On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote: On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. For DMA domains sure, but I don't think that's the case for unmanaged domains such as those used by VFIO. Eh? This is only relevant to DMA domains anyway. Flush queues are part of the IOVA allocator that VFIO doesn't even use. It's always been the case that unmanaged domains only use strict invalidation. Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. I think a helper that takes a domain would be a good starting point. You mean device, right? The one condition we currently have is at the device level, and there's really nothing inherent to the domain itself that matters (since the type is implicitly IOMMU_DOMAIN_DMA to even care about this). Another idea that's just come to mind is now that IOMMU_DOMAIN_DMA has a standard meaning, maybe we could split out a separate IOMMU_DOMAIN_DMA_STRICT type such that it can all propagate from iommu_get_def_domain_type()? That feels like it might be quite promising, but I'd still do it as an improvement on top of this patch, since it's beyond just cleaning up the abuse of domain attributes to pass a command-line option around. Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-31 16:32, Will Deacon wrote: On Wed, Mar 31, 2021 at 02:09:37PM +0100, Robin Murphy wrote: On 2021-03-31 12:49, Will Deacon wrote: On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote: On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. For DMA domains sure, but I don't think that's the case for unmanaged domains such as those used by VFIO. Eh? This is only relevant to DMA domains anyway. Flush queues are part of the IOVA allocator that VFIO doesn't even use. It's always been the case that unmanaged domains only use strict invalidation. Maybe I'm going mad. With this patch, the SMMU driver unconditionally sets IO_PGTABLE_QUIRK_NON_STRICT for page-tables if iommu_get_dma_strict() is true, no? In which case, that will get set for page-tables corresponding to unmanaged domains as well as DMA domains when it is enabled. That didn't happen before because you couldn't set the attribute for unmanaged domains. What am I missing? Oh cock... sorry, all this time I've been saying what I *expect* it to do, while overlooking the fact that the IO_PGTABLE_QUIRK_NON_STRICT hunks were the bits I forgot to write and Christoph had to fix up. Indeed, those should be checking the domain type too to preserve the existing behaviour. Apologies for the confusion. Robin. Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. I think a helper that takes a domain would be a good starting point. You mean device, right? The one condition we currently have is at the device level, and there's really nothing inherent to the domain itself that matters (since the type is implicitly IOMMU_DOMAIN_DMA to even care about this). Device would probably work too; you'd pass the first device to attach to the domain when querying t
Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-16 15:38, Christoph Hellwig wrote: [...] diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index f1e38526d5bd40..996dfdf9d375dd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2017,7 +2017,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain, .iommu_dev = smmu->dev, }; - if (smmu_domain->non_strict) + if (!iommu_get_dma_strict()) As Will raised, this also needs to be checking "domain->type == IOMMU_DOMAIN_DMA" to maintain equivalent behaviour to the attribute code below. pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); @@ -2449,52 +2449,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; } -static int arm_smmu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) -{ - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - - switch (domain->type) { - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = smmu_domain->non_strict; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } -} [...] diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f985817c967a25..edb1de479dd1a7 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -668,7 +668,6 @@ struct arm_smmu_domain { struct mutexinit_mutex; /* Protects smmu pointer */ struct io_pgtable_ops *pgtbl_ops; - boolnon_strict; atomic_tnr_ats_masters; enum arm_smmu_domain_stage stage; diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 0aa6d667274970..3dde22b1f8ffb0 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -761,6 +761,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, .iommu_dev = smmu->dev, }; + if (!iommu_get_dma_strict()) Ditto here. Sorry for not spotting that sooner :( Robin. + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; + if (smmu->impl && smmu->impl->init_context) { ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg, dev); if (ret) ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH RFC v1 2/6] swiotlb: convert variables to arrays
On 2021-02-04 07:29, Christoph Hellwig wrote: On Wed, Feb 03, 2021 at 03:37:05PM -0800, Dongli Zhang wrote: This patch converts several swiotlb related variables to arrays, in order to maintain stat/status for different swiotlb buffers. Here are variables involved: - io_tlb_start and io_tlb_end - io_tlb_nslabs and io_tlb_used - io_tlb_list - io_tlb_index - max_segment - io_tlb_orig_addr - no_iotlb_memory There is no functional change and this is to prepare to enable 64-bit swiotlb. Claire Chang (on Cc) already posted a patch like this a month ago, which looks much better because it actually uses a struct instead of all the random variables. Indeed, I skimmed the cover letter and immediately thought that this whole thing is just the restricted DMA pool concept[1] again, only from a slightly different angle. Robin. [1] https://lore.kernel.org/linux-iommu/20210106034124.30560-1-tien...@chromium.org/ ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/lima: Use delayed timer as default in devfreq profile
On 2021-02-03 02:01, Qiang Yu wrote: On Tue, Feb 2, 2021 at 10:02 PM Lukasz Luba wrote: On 2/2/21 1:01 AM, Qiang Yu wrote: Hi Lukasz, Thanks for the explanation. So the deferred timer option makes a mistake that when GPU goes from idle to busy for only one poll periodic, in this case 50ms, right? Not exactly. Driver sets the polling interval to 50ms (in this case) because it needs ~3-frame average load (in 60fps). I have discovered the issue quite recently that on systems with 2 CPUs or more, the devfreq core is not monitoring the devices even for seconds. Therefore, we might end up with quite big amount of work that GPU is doing, but we don't know about it. Devfreq core didn't check <- timer didn't fired. Then suddenly that CPU, which had the deferred timer registered last time, is waking up and timer triggers to check our device. We get the stats, but they might be showing load from 1sec not 50ms. We feed them into governor. Governor sees the new load, but was tested and configured for 50ms, so it might try to rise the frequency to max. The GPU work might be already lower and there is no need for such freq. Then the CPU goes idle again, so no devfreq core check for next e.g. 1sec, but the frequency stays at max OPP and we burn power. So, it's completely unreliable. We might stuck at min frequency and suffer the frame drops, or sometimes stuck to max freq and burn more power when there is no such need. Similar for thermal governor, which is confused by this old stats and long period stats, longer than 50ms. Stats from last e.g. ~1sec tells you nothing about real recent GPU workload. Oh, right, I missed this case. But delayed timer will wakeup CPU every 50ms even when system is idle, will this cause more power consumption for the case like phone suspend? No, in case of phone suspend it won't increase the power consumption. The device won't be woken up, it will stay in suspend. I mean the CPU is waked up frequently by timer when phone suspend, not the whole device (like the display). Seems it's better to have deferred timer when device is suspended for power saving, and delayed timer when device in working state. User knows this and can use sysfs to change it. Doesn't devfreq_suspend_device() already cancel any timer work either way in that case? Robin. Set the delayed timer as default is reasonable, so patch is: Reviewed-by: Qiang Yu Regards, Qiang Regards, Lukasz Regards, Qiang On Mon, Feb 1, 2021 at 5:53 PM Lukasz Luba wrote: Hi Qiang, On 1/30/21 1:51 PM, Qiang Yu wrote: Thanks for the patch. But I can't observe any difference on glmark2 with or without this patch. Maybe you can provide other test which can benefit from it. This is a design problem and has impact on the whole system. There is a few issues. When the device is not checked and there are long delays between last check and current, the history is broken. It confuses the devfreq governor and thermal governor (Intelligent Power Allocation (IPA)). Thermal governor works on stale stats data and makes stupid decisions, because there is no new stats (device not checked). Similar applies to devfreq simple_ondemand governor, where it 'tires' to work on a lng period even 3sec and make prediction for the next frequency based on it (which is broken). How it should be done: constant reliable check is needed, then: - period is guaranteed and has fixed size, e.g 50ms or 100ms. - device status is quite recent so thermal devfreq cooling provides 'fresh' data into thermal governor This would prevent odd behavior and solve the broken cases. Considering it will wake up CPU more frequently, and user may choose to change this by sysfs, I'd like to not apply it. The deferred timer for GPU is wrong option, for UFS or eMMC makes more sense. It's also not recommended for NoC busses. I've discovered that some time ago and proposed to have option to switch into delayed timer. Trust me, it wasn't obvious to find out that this missing check has those impacts. So the other engineers or users might not know that some problems they faces (especially when the device load is changing) is due to this delayed vs deffered timer and they will change it in the sysfs. Regards, Lukasz Regards, Qiang ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC PATCH 1/8] of/device: Allow specifying a custom iommu_spec to of_dma_configure
Hi Mikko, On 2021-02-08 16:38, Mikko Perttunen wrote: To allow for more customized device tree bindings that point to IOMMUs, allow manual specification of iommu_spec to of_dma_configure. The initial use case for this is with Host1x, where the driver manages a set of device tree-defined IOMMU contexts that are dynamically allocated to various users. These contexts don't correspond to platform devices and are instead attached to dummy devices on a custom software bus. I'd suggest taking a closer look at the patches that made this of_dma_configure_id() in the first place, and the corresponding bus code in fsl-mc. At this level, Host1x sounds effectively identical to DPAA2 in terms of being a bus of logical devices composed from bits of implicit behind-the-scenes hardware. I mean, compare your series title to the fact that their identifiers are literally named "Isolation Context ID" ;) Please just use the existing mechanisms to describe a mapping between Host1x context IDs and SMMU Stream IDs, rather than what looks like a giant hacky mess here. (This also reminds me I wanted to rip out all the PCI special-cases and convert pci_dma_configure() over to passing its own IDs too, so thanks for the memory-jog...) Robin. Signed-off-by: Mikko Perttunen --- drivers/iommu/of_iommu.c | 12 drivers/of/device.c | 9 + include/linux/of_device.h | 34 +++--- include/linux/of_iommu.h | 6 -- 4 files changed, 44 insertions(+), 17 deletions(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index e505b9130a1c..3fefa6c63863 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -87,8 +87,7 @@ int of_get_dma_window(struct device_node *dn, const char *prefix, int index, } EXPORT_SYMBOL_GPL(of_get_dma_window); -static int of_iommu_xlate(struct device *dev, - struct of_phandle_args *iommu_spec) +int of_iommu_xlate(struct device *dev, struct of_phandle_args *iommu_spec) { const struct iommu_ops *ops; struct fwnode_handle *fwnode = &iommu_spec->np->fwnode; @@ -117,6 +116,7 @@ static int of_iommu_xlate(struct device *dev, module_put(ops->owner); return ret; } +EXPORT_SYMBOL_GPL(of_iommu_xlate); static int of_iommu_configure_dev_id(struct device_node *master_np, struct device *dev, @@ -177,7 +177,8 @@ static int of_iommu_configure_device(struct device_node *master_np, const struct iommu_ops *of_iommu_configure(struct device *dev, struct device_node *master_np, - const u32 *id) + const u32 *id, + struct of_phandle_args *iommu_spec) { const struct iommu_ops *ops = NULL; struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); @@ -209,7 +210,10 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, err = pci_for_each_dma_alias(to_pci_dev(dev), of_pci_iommu_init, &info); } else { - err = of_iommu_configure_device(master_np, dev, id); + if (iommu_spec) + err = of_iommu_xlate(dev, iommu_spec); + else + err = of_iommu_configure_device(master_np, dev, id); fwspec = dev_iommu_fwspec_get(dev); if (!err && fwspec) diff --git a/drivers/of/device.c b/drivers/of/device.c index aedfaaafd3e7..84ada2110c5b 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -88,8 +88,9 @@ int of_device_add(struct platform_device *ofdev) * can use a platform bus notifier and handle BUS_NOTIFY_ADD_DEVICE events * to fix up DMA configuration. */ -int of_dma_configure_id(struct device *dev, struct device_node *np, - bool force_dma, const u32 *id) +int __of_dma_configure(struct device *dev, struct device_node *np, + bool force_dma, const u32 *id, + struct of_phandle_args *iommu_spec) { const struct iommu_ops *iommu; const struct bus_dma_region *map = NULL; @@ -170,7 +171,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, dev_dbg(dev, "device is%sdma coherent\n", coherent ? " " : " not "); - iommu = of_iommu_configure(dev, np, id); + iommu = of_iommu_configure(dev, np, id, iommu_spec); if (PTR_ERR(iommu) == -EPROBE_DEFER) { kfree(map); return -EPROBE_DEFER; @@ -184,7 +185,7 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, dev->dma_range_map = map; return 0; } -EXPORT_SYMBOL_GPL(of_dma_configure_id); +EXPORT_SYMBOL_GPL(__of_dma_configure);
Re: [PATCH v3 4/6] drm/sprd: add Unisoc's drm display controller driver
_INT_ERRBIT(2) +#define BIT_DPU_INT_UPDATE_DONEBIT(4) +#define BIT_DPU_INT_VSYNC BIT(5) +#define BIT_DPU_INT_MMU_VAOR_RDBIT(16) +#define BIT_DPU_INT_MMU_VAOR_WRBIT(17) +#define BIT_DPU_INT_MMU_INV_RD BIT(18) +#define BIT_DPU_INT_MMU_INV_WR BIT(19) + +/* DPI control bits */ +#define BIT_DPU_EDPI_TE_EN BIT(8) +#define BIT_DPU_EDPI_FROM_EXTERNAL_PAD BIT(10) +#define BIT_DPU_DPI_HALT_ENBIT(16) + +static const u32 primary_fmts[] = { + DRM_FORMAT_XRGB, + DRM_FORMAT_XBGR, + DRM_FORMAT_ARGB, + DRM_FORMAT_ABGR, + DRM_FORMAT_RGBA, + DRM_FORMAT_BGRA, + DRM_FORMAT_RGBX, + DRM_FORMAT_RGB565, + DRM_FORMAT_BGR565, + DRM_FORMAT_NV12, + DRM_FORMAT_NV21, + DRM_FORMAT_NV16, + DRM_FORMAT_NV61, + DRM_FORMAT_YUV420, + DRM_FORMAT_YVU420, +}; + +struct sprd_plane { + struct drm_plane plane; + u32 index; +}; + +static inline struct sprd_plane *to_sprd_plane(struct drm_plane *plane) +{ + return container_of(plane, struct sprd_plane, plane); +} + +static u32 check_mmu_isr(struct sprd_dpu *dpu, u32 reg_val) +{ + struct dpu_context *ctx = &dpu->ctx; + u32 mmu_mask = BIT_DPU_INT_MMU_VAOR_RD | + BIT_DPU_INT_MMU_VAOR_WR | + BIT_DPU_INT_MMU_INV_RD | + BIT_DPU_INT_MMU_INV_WR; + u32 val = reg_val & mmu_mask; + int i; + + if (val) { + drm_err(dpu->drm, "--- iommu interrupt err: 0x%04x ---\n", val); + + if (val & BIT_DPU_INT_MMU_INV_RD) + drm_err(dpu->drm, "iommu invalid read error, addr: 0x%08x\n", + readl(ctx->base + REG_MMU_INV_ADDR_RD)); + if (val & BIT_DPU_INT_MMU_INV_WR) + drm_err(dpu->drm, "iommu invalid write error, addr: 0x%08x\n", + readl(ctx->base + REG_MMU_INV_ADDR_WR)); + if (val & BIT_DPU_INT_MMU_VAOR_RD) + drm_err(dpu->drm, "iommu va out of range read error, addr: 0x%08x\n", + readl(ctx->base + REG_MMU_VAOR_ADDR_RD)); + if (val & BIT_DPU_INT_MMU_VAOR_WR) + drm_err(dpu->drm, "iommu va out of range write error, addr: 0x%08x\n", + readl(ctx->base + REG_MMU_VAOR_ADDR_WR)); + + for (i = 0; i < 8; i++) { + reg_val = layer_reg_rd(ctx, REG_LAY_CTRL, i); + if (reg_val & 0x1) + drm_info(dpu->drm, "layer%d: 0x%08x 0x%08x 0x%08x ctrl: 0x%08x\n", i, + layer_reg_rd(ctx, REG_LAY_BASE_ADDR0, i), + layer_reg_rd(ctx, REG_LAY_BASE_ADDR1, i), + layer_reg_rd(ctx, REG_LAY_BASE_ADDR2, i), + layer_reg_rd(ctx, REG_LAY_CTRL, i)); + } + } + + return val; +} Please don't do this - if the IOMMU is managed by a separate driver, that driver should handle its own interrupts. If you want to report client-specific information associated with a fault you can use iommu_set_fault_handler() to hook in your reporting function (the IOMMU driver should correspondingly call report_iommu_fault() from its own fault handler). Even regardless of the mess this seems to be responsible for in terms of DT bindings and resource claiming, duplicating the basic IOMMU fault processing across all the drivers for your media blocks is clearly not a good path to take. Robin. + +static int dpu_wait_stop_done(struct sprd_dpu *dpu) +{ + struct dpu_context *ctx = &dpu->ctx; + int rc; + + if (ctx->stopped) + return 0; + + rc = wait_event_interruptible_timeout(ctx->wait_queue, ctx->evt_stop, + msecs_to_jiffies(500)); + ctx->evt_stop = false; + + ctx->stopped = true; + + if (!rc) { + drm_err(dpu->drm, "dpu wait for stop done time out!\n"); + return -ETIMEDOUT; + } + + return 0; +} + +static int dpu_wait_update_done(struct sprd_dpu *dpu) +{ + struct dpu_context *ctx = &dpu->ctx; + int rc; + + ctx->evt_update = false; + + rc = wait_event_interruptible_timeout(ctx->wait_queue, ctx->evt_update, + msecs_to_jiffies(500)); + + if (!rc) { + drm_err(dpu->drm, "dpu wait for reg update done time out!\n"); + return -ETIMEDOUT; + } + + return 0; +} + +s
Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-01 08:42, Christoph Hellwig wrote: Use explicit methods for setting and querying the information instead. Now that everyone's using iommu-dma, is there any point in bouncing this through the drivers at all? Seems like it would make more sense for the x86 drivers to reflect their private options back to iommu_dma_strict (and allow Intel's caching mode to override it as well), then have iommu_dma_init_domain just test !iommu_dma_strict && domain->ops->flush_iotlb_all. Robin. Also remove the now unused iommu_domain_get_attr functionality. Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 ++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 56 + drivers/iommu/dma-iommu.c | 8 ++- drivers/iommu/intel/iommu.c | 27 ++ drivers/iommu/iommu.c | 19 +++ include/linux/iommu.h | 17 ++- 7 files changed, 51 insertions(+), 146 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a69a8b573e40d0..37a8e51db17656 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1771,24 +1771,11 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev) return acpihid_device_group(dev); } -static int amd_iommu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain) { - switch (domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - return -ENODEV; - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = !amd_iommu_unmap_flush; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } + if (domain->type != IOMMU_DOMAIN_DMA) + return false; + return !amd_iommu_unmap_flush; } /* @@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = { .release_device = amd_iommu_release_device, .probe_finalize = amd_iommu_probe_finalize, .device_group = amd_iommu_device_group, - .domain_get_attr = amd_iommu_domain_get_attr, + .dma_use_flush_queue = amd_iommu_dma_use_flush_queue, .get_resv_regions = amd_iommu_get_resv_regions, .put_resv_regions = generic_iommu_put_resv_regions, .is_attach_deferred = amd_iommu_is_attach_deferred, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 8594b4a8304375..bf96172e8c1f71 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; } -static int arm_smmu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain) { struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - switch (domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - switch (attr) { - case DOMAIN_ATTR_NESTING: - *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); - return 0; - default: - return -ENODEV; - } - break; - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = smmu_domain->non_strict; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } + if (domain->type != IOMMU_DOMAIN_DMA) + return false; + return smmu_domain->non_strict; +} + + +static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain) +{ + if (domain->type != IOMMU_DOMAIN_DMA) + return; + to_smmu_domain(domain)->non_strict = true; } static int arm_smmu_domain_set_attr(struct iommu_domain *domain, @@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, } break; case IOMMU_DOMAIN_DMA: - switch(attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - smmu_domain->non_strict = *(int *)data; - break
Re: [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG
On 2021-03-01 08:42, Christoph Hellwig wrote: Signed-off-by: Christoph Hellwig Moreso than the previous patch, where the feature is at least relatively generic (note that there's a bunch of in-flight development around DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to bloat the generic iommu_ops structure with private driver-specific interfaces. The attribute interface is a great compromise for these kinds of things, and you can easily add type-checked wrappers around it for external callers (maybe even make the actual attributes internal between the IOMMU core and drivers) if that's your concern. Robin. --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +- drivers/iommu/arm/arm-smmu/arm-smmu.c | 40 +++-- drivers/iommu/iommu.c | 9 ++ include/linux/iommu.h | 9 +- 4 files changed, 29 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 0f184c3dd9d9ec..78d98ab2ee3a68 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu) struct io_pgtable_domain_attr pgtbl_cfg; pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA; - iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg); + iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg); } struct msm_gem_address_space * diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 2e17d990d04481..2858999c86dfd1 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain) return ret; } -static int arm_smmu_domain_set_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain, + struct io_pgtable_domain_attr *pgtbl_cfg) { - int ret = 0; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + int ret = -EPERM; - mutex_lock(&smmu_domain->init_mutex); - - switch(domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - switch (attr) { - case DOMAIN_ATTR_IO_PGTABLE_CFG: { - struct io_pgtable_domain_attr *pgtbl_cfg = data; - - if (smmu_domain->smmu) { - ret = -EPERM; - goto out_unlock; - } + if (domain->type != IOMMU_DOMAIN_UNMANAGED) + return -EINVAL; - smmu_domain->pgtbl_cfg = *pgtbl_cfg; - break; - } - default: - ret = -ENODEV; - } - break; - case IOMMU_DOMAIN_DMA: - ret = -ENODEV; - break; - default: - ret = -EINVAL; + mutex_lock(&smmu_domain->init_mutex); + if (!smmu_domain->smmu) { + smmu_domain->pgtbl_cfg = *pgtbl_cfg; + ret = 0; } -out_unlock: mutex_unlock(&smmu_domain->init_mutex); + return ret; } @@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = { .device_group = arm_smmu_device_group, .dma_use_flush_queue= arm_smmu_dma_use_flush_queue, .dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue, - .domain_set_attr= arm_smmu_domain_set_attr, + .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr, .domain_enable_nesting = arm_smmu_domain_enable_nesting, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 2e9e058501a953..8490aefd4b41f8 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting); +int iommu_domain_set_pgtable_attr(struct iommu_domain *domain, + struct io_pgtable_domain_attr *pgtbl_cfg) +{ + if (!domain->ops->domain_set_pgtable_attr) + return -EINVAL; + return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg); +} +EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr); + void iommu_get_resv_regions(struct device *dev, struct list_head *list) { const struct iommu_ops *ops = dev->bus->iommu_ops; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index aed88aa3bd3edf..39d3ed4d2700ac 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -40,6 +40,7 @@ struct
Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-10 09:25, Christoph Hellwig wrote: On Wed, Mar 10, 2021 at 10:15:01AM +0100, Christoph Hellwig wrote: On Thu, Mar 04, 2021 at 03:25:27PM +, Robin Murphy wrote: On 2021-03-01 08:42, Christoph Hellwig wrote: Use explicit methods for setting and querying the information instead. Now that everyone's using iommu-dma, is there any point in bouncing this through the drivers at all? Seems like it would make more sense for the x86 drivers to reflect their private options back to iommu_dma_strict (and allow Intel's caching mode to override it as well), then have iommu_dma_init_domain just test !iommu_dma_strict && domain->ops->flush_iotlb_all. Hmm. I looked at this, and kill off ->dma_enable_flush_queue for the ARM drivers and just looking at iommu_dma_strict seems like a very clear win. OTOH x86 is a little more complicated. AMD and intel defaul to lazy mode, so we'd have to change the global iommu_dma_strict if they are initialized. Also Intel has not only a "static" option to disable lazy mode, but also a "dynamic" one where it iterates structure. So I think on the get side we're stuck with the method, but it still simplifies the whole thing. Actually... Just mirroring the iommu_dma_strict value into struct iommu_domain should solve all of that with very little boilerplate code. Yes, my initial thought was to directly replace the attribute with a common flag at iommu_domain level, but since in all cases the behaviour is effectively global rather than actually per-domain, it seemed reasonable to take it a step further. This passes compile-testing for arm64 and x86, what do you think? Robin. ->8- Subject: [PATCH] iommu: Consolidate strict invalidation handling Now that everyone is using iommu-dma, the global invalidation policy really doesn't need to be woven through several parts of the core API and individual drivers, we can just look it up directly at the one point that we now make the flush queue decision. If the x86 drivers reflect their internal options and overrides back to iommu_dma_strict, that can become the canonical source. Signed-off-by: Robin Murphy --- drivers/iommu/amd/iommu.c | 2 ++ drivers/iommu/dma-iommu.c | 8 +--- drivers/iommu/intel/iommu.c | 12 drivers/iommu/iommu.c | 35 +++ include/linux/iommu.h | 2 ++ 5 files changed, 44 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a69a8b573e40..1db29e59d468 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1856,6 +1856,8 @@ int __init amd_iommu_init_dma_ops(void) else pr_info("Lazy IO/TLB flushing enabled\n"); + iommu_set_dma_strict(amd_iommu_unmap_flush); + return 0; } diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index af765c813cc8..789a950cc125 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -304,10 +304,6 @@ static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad) cookie = container_of(iovad, struct iommu_dma_cookie, iovad); domain = cookie->fq_domain; - /* -* The IOMMU driver supporting DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE -* implies that ops->flush_iotlb_all must be non-NULL. -*/ domain->ops->flush_iotlb_all(domain); } @@ -334,7 +330,6 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, struct iommu_dma_cookie *cookie = domain->iova_cookie; unsigned long order, base_pfn; struct iova_domain *iovad; - int attr; if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE) return -EINVAL; @@ -371,8 +366,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, init_iova_domain(iovad, 1UL << order, base_pfn); if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && - !iommu_domain_get_attr(domain, DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, &attr) && - attr) { + domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) { if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, iommu_dma_entry_dtor)) pr_warn("iova flush queue initialization failed\n"); diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index b5c746f0f63b..f5b452cd1266 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4377,6 +4377,17 @@ int __init intel_iommu_init(void) down_read(&dmar_global_lock); for_each_active_iommu(iommu, drhd) { + if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) { + /* +* The flush queue implementation does
Re: [PATCH v5 4/6] drm/sprd: add Unisoc's drm display controller driver
On 2021-05-17 10:27, Joerg Roedel wrote: On Fri, Apr 30, 2021 at 08:20:10PM +0800, Kevin Tang wrote: Cc Robin & Joerg This is just some GPU internal MMU being used here, it seems. It doesn't use the IOMMU core code, so no Ack needed from the IOMMU side. Except the actual MMU being used is drivers/iommu/sprd_iommu.c - this is just the display driver poking directly at the interrupt registers of its associated IOMMU instance. I still think this is wrong, and that it should be treated as a shared interrupt, with the IOMMU driver handling its own registers and reporting to the client through the standard report_iommu_fault() API, especially since there are apparently more blocks using these IOMMU instances than just the display. Robin.
[PATCH] iommu/io-pgtable: Remove tlb_flush_leaf
The only user of tlb_flush_leaf is a particularly hairy corner of the Arm short-descriptor code, which wants a synchronous invalidation to minimise the races inherent in trying to split a large page mapping. This is already far enough into "here be dragons" territory that no sensible caller should ever hit it, and thus it really doesn't need optimising. Although using tlb_flush_walk there may technically be more heavyweight than needed, it does the job and saves everyone else having to carry around useless baggage. Signed-off-by: Robin Murphy --- Reviewing the Mediatek TLB optimisation patches just left me thinking "why do we even have this?"... Panfrost folks, this has zero functional impact to you, merely wants an ack for straying outside drivers/iommu. Robin. drivers/gpu/drm/msm/msm_iommu.c | 1 - drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 -- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 -- drivers/iommu/arm/arm-smmu/arm-smmu.c | 25 +++-- drivers/iommu/arm/arm-smmu/qcom_iommu.c | 8 --- drivers/iommu/io-pgtable-arm-v7s.c | 3 +-- drivers/iommu/io-pgtable-arm.c | 1 - drivers/iommu/ipmmu-vmsa.c | 1 - drivers/iommu/msm_iommu.c | 7 -- drivers/iommu/mtk_iommu.c | 1 - include/linux/io-pgtable.h | 11 - 11 files changed, 4 insertions(+), 68 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 22ac7c692a81..50d881794758 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -139,7 +139,6 @@ static void msm_iommu_tlb_add_page(struct iommu_iotlb_gather *gather, static const struct iommu_flush_ops null_tlb_ops = { .tlb_flush_all = msm_iommu_tlb_flush_all, .tlb_flush_walk = msm_iommu_tlb_flush_walk, - .tlb_flush_leaf = msm_iommu_tlb_flush_walk, .tlb_add_page = msm_iommu_tlb_add_page, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 776448c527ea..c186914cc4f9 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -347,16 +347,9 @@ static void mmu_tlb_flush_walk(unsigned long iova, size_t size, size_t granule, mmu_tlb_sync_context(cookie); } -static void mmu_tlb_flush_leaf(unsigned long iova, size_t size, size_t granule, - void *cookie) -{ - mmu_tlb_sync_context(cookie); -} - static const struct iommu_flush_ops mmu_tlb_ops = { .tlb_flush_all = mmu_tlb_inv_context_s1, .tlb_flush_walk = mmu_tlb_flush_walk, - .tlb_flush_leaf = mmu_tlb_flush_leaf, }; int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index e634bbe60573..fb684a393118 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1741,16 +1741,9 @@ static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size, arm_smmu_tlb_inv_range(iova, size, granule, false, cookie); } -static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size, - size_t granule, void *cookie) -{ - arm_smmu_tlb_inv_range(iova, size, granule, true, cookie); -} - static const struct iommu_flush_ops arm_smmu_flush_ops = { .tlb_flush_all = arm_smmu_tlb_inv_context, .tlb_flush_walk = arm_smmu_tlb_inv_walk, - .tlb_flush_leaf = arm_smmu_tlb_inv_leaf, .tlb_add_page = arm_smmu_tlb_inv_page_nosync, }; diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index dad7fa86fbd4..0b8c59922a2b 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -333,14 +333,6 @@ static void arm_smmu_tlb_inv_walk_s1(unsigned long iova, size_t size, arm_smmu_tlb_sync_context(cookie); } -static void arm_smmu_tlb_inv_leaf_s1(unsigned long iova, size_t size, -size_t granule, void *cookie) -{ - arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie, - ARM_SMMU_CB_S1_TLBIVAL); - arm_smmu_tlb_sync_context(cookie); -} - static void arm_smmu_tlb_add_page_s1(struct iommu_iotlb_gather *gather, unsigned long iova, size_t granule, void *cookie) @@ -357,14 +349,6 @@ static void arm_smmu_tlb_inv_walk_s2(unsigned long iova, size_t size, arm_smmu_tlb_sync_context(cookie); } -static void arm_smmu_tlb_inv_leaf_s2(unsigned long iova, size_t size, -size_t granule, void *cookie) -{ - arm_smmu_tlb_inv_range_s2(iova, size, granule, cookie, -
Re: [PATCH] iommu/io-pgtable: Remove tlb_flush_leaf
On 2020-11-25 17:29, Robin Murphy wrote: The only user of tlb_flush_leaf is a particularly hairy corner of the Arm short-descriptor code, which wants a synchronous invalidation to minimise the races inherent in trying to split a large page mapping. This is already far enough into "here be dragons" territory that no sensible caller should ever hit it, and thus it really doesn't need optimising. Although using tlb_flush_walk there may technically be more heavyweight than needed, it does the job and saves everyone else having to carry around useless baggage. Signed-off-by: Robin Murphy --- Reviewing the Mediatek TLB optimisation patches just left me thinking "why do we even have this?"... Panfrost folks, this has zero functional impact to you, merely wants an ack for straying outside drivers/iommu. Oops, same goes for MSM folks too - I honestly failed to even notice that some of the "msm_iommu" code actually lives in DRM. Maybe the fancy editor makes life *too* easy and I should go back to grep and vim and having to pay attention to paths... :) Robin. drivers/gpu/drm/msm/msm_iommu.c | 1 - drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 -- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 -- drivers/iommu/arm/arm-smmu/arm-smmu.c | 25 +++-- drivers/iommu/arm/arm-smmu/qcom_iommu.c | 8 --- drivers/iommu/io-pgtable-arm-v7s.c | 3 +-- drivers/iommu/io-pgtable-arm.c | 1 - drivers/iommu/ipmmu-vmsa.c | 1 - drivers/iommu/msm_iommu.c | 7 -- drivers/iommu/mtk_iommu.c | 1 - include/linux/io-pgtable.h | 11 - 11 files changed, 4 insertions(+), 68 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index 22ac7c692a81..50d881794758 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -139,7 +139,6 @@ static void msm_iommu_tlb_add_page(struct iommu_iotlb_gather *gather, static const struct iommu_flush_ops null_tlb_ops = { .tlb_flush_all = msm_iommu_tlb_flush_all, .tlb_flush_walk = msm_iommu_tlb_flush_walk, - .tlb_flush_leaf = msm_iommu_tlb_flush_walk, .tlb_add_page = msm_iommu_tlb_add_page, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 776448c527ea..c186914cc4f9 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -347,16 +347,9 @@ static void mmu_tlb_flush_walk(unsigned long iova, size_t size, size_t granule, mmu_tlb_sync_context(cookie); } -static void mmu_tlb_flush_leaf(unsigned long iova, size_t size, size_t granule, - void *cookie) -{ - mmu_tlb_sync_context(cookie); -} - static const struct iommu_flush_ops mmu_tlb_ops = { .tlb_flush_all = mmu_tlb_inv_context_s1, .tlb_flush_walk = mmu_tlb_flush_walk, - .tlb_flush_leaf = mmu_tlb_flush_leaf, }; int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index e634bbe60573..fb684a393118 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1741,16 +1741,9 @@ static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size, arm_smmu_tlb_inv_range(iova, size, granule, false, cookie); } -static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size, - size_t granule, void *cookie) -{ - arm_smmu_tlb_inv_range(iova, size, granule, true, cookie); -} - static const struct iommu_flush_ops arm_smmu_flush_ops = { .tlb_flush_all = arm_smmu_tlb_inv_context, .tlb_flush_walk = arm_smmu_tlb_inv_walk, - .tlb_flush_leaf = arm_smmu_tlb_inv_leaf, .tlb_add_page = arm_smmu_tlb_inv_page_nosync, }; diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index dad7fa86fbd4..0b8c59922a2b 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -333,14 +333,6 @@ static void arm_smmu_tlb_inv_walk_s1(unsigned long iova, size_t size, arm_smmu_tlb_sync_context(cookie); } -static void arm_smmu_tlb_inv_leaf_s1(unsigned long iova, size_t size, -size_t granule, void *cookie) -{ - arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie, - ARM_SMMU_CB_S1_TLBIVAL); - arm_smmu_tlb_sync_context(cookie); -} - static void arm_smmu_tlb_add_page_s1(struct iommu_iotlb_gather *gather, unsigned long iova, size_t granule, void *cookie) @@ -357,14 +349,6 @@ static void arm_smmu_tlb_inv_walk_
Re: [PATCH v15 12/12] of: Add plumbing for restricted DMA pool
On 2021-07-02 04:08, Guenter Roeck wrote: Hi, On Thu, Jun 24, 2021 at 11:55:26PM +0800, Claire Chang wrote: If a device is not behind an IOMMU, we look up the device node and set up the restricted DMA when the restricted-dma-pool is presented. Signed-off-by: Claire Chang Tested-by: Stefano Stabellini Tested-by: Will Deacon With this patch in place, all sparc and sparc64 qemu emulations fail to boot. Symptom is that the root file system is not found. Reverting this patch fixes the problem. Bisect log is attached. Ah, OF_ADDRESS depends on !SPARC, so of_dma_configure_id() is presumably returning an unexpected -ENODEV from the of_dma_set_restricted_buffer() stub. That should probably be returning 0 instead, since either way it's not an error condition for it to simply do nothing. Robin. Guenter --- # bad: [fb0ca446157a86b75502c1636b0d81e642fe6bf1] Add linux-next specific files for 20210701 # good: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13 git bisect start 'HEAD' 'v5.13' # bad: [f63c4fda987a19b1194cc45cb72fd5bf968d9d90] Merge remote-tracking branch 'rdma/for-next' git bisect bad f63c4fda987a19b1194cc45cb72fd5bf968d9d90 # good: [46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4] Merge remote-tracking branch 'ipsec/master' git bisect good 46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4 # good: [43ba6969cfb8185353a7a6fc79070f13b9e3d6d3] Merge remote-tracking branch 'clk/clk-next' git bisect good 43ba6969cfb8185353a7a6fc79070f13b9e3d6d3 # good: [1ca5eddcf8dca1d6345471c6404e7364af0d7019] Merge remote-tracking branch 'fuse/for-next' git bisect good 1ca5eddcf8dca1d6345471c6404e7364af0d7019 # good: [8f6d7b3248705920187263a4e7147b0752ec7dcf] Merge remote-tracking branch 'pci/next' git bisect good 8f6d7b3248705920187263a4e7147b0752ec7dcf # good: [df1885a755784da3ef285f36d9230c1d090ef186] RDMA/rtrs_clt: Alloc less memory with write path fast memory registration git bisect good df1885a755784da3ef285f36d9230c1d090ef186 # good: [93d31efb58c8ad4a66bbedbc2d082df458c04e45] Merge remote-tracking branch 'cpufreq-arm/cpufreq/arm/linux-next' git bisect good 93d31efb58c8ad4a66bbedbc2d082df458c04e45 # good: [46308965ae6fdc7c25deb2e8c048510ae51bbe66] RDMA/irdma: Check contents of user-space irdma_mem_reg_req object git bisect good 46308965ae6fdc7c25deb2e8c048510ae51bbe66 # good: [6de7a1d006ea9db235492b288312838d6878385f] thermal/drivers/int340x/processor_thermal: Split enumeration and processing part git bisect good 6de7a1d006ea9db235492b288312838d6878385f # good: [081bec2577cda3d04f6559c60b6f4e2242853520] dt-bindings: of: Add restricted DMA pool git bisect good 081bec2577cda3d04f6559c60b6f4e2242853520 # good: [bf95ac0bcd69979af146852f6a617a60285ebbc1] Merge remote-tracking branch 'thermal/thermal/linux-next' git bisect good bf95ac0bcd69979af146852f6a617a60285ebbc1 # good: [3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc] RDMA/core: Always release restrack object git bisect good 3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc # bad: [cff1f23fad6e0bd7d671acce0d15285c709f259c] Merge remote-tracking branch 'swiotlb/linux-next' git bisect bad cff1f23fad6e0bd7d671acce0d15285c709f259c # bad: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing for restricted DMA pool git bisect bad b655006619b7bccd0dc1e055bd72de5d613e7b5c # first bad commit: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing for restricted DMA pool
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On 2021-07-02 14:58, Will Deacon wrote: Hi Nathan, On Thu, Jul 01, 2021 at 12:52:20AM -0700, Nathan Chancellor wrote: On 7/1/2021 12:40 AM, Will Deacon wrote: On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote: On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote: On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote: `BUG: unable to handle page fault for address: 003a8290` and the fact it crashed at `_raw_spin_lock_irqsave` look like the memory (maybe dev->dma_io_tlb_mem) was corrupted? The dev->dma_io_tlb_mem should be set here (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528) through device_initialize. I'm less sure about this. 'dma_io_tlb_mem' should be pointing at 'io_tlb_default_mem', which is a page-aligned allocation from memblock. The spinlock is at offset 0x24 in that structure, and looking at the register dump from the crash: Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006 Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX: RCX: 8900572ad580 Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 000c RDI: 1d17 Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 000c R09: Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 89005653f000 R12: 0212 Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 0002 R15: 0020 Jun 29 18:28:42 hp-4300G kernel: FS: 7f1f8898ea40() GS:89005728() knlGS: Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: ES: CR0: 80050033 Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 0001020d CR4: 00350ee0 Jun 29 18:28:42 hp-4300G kernel: Call Trace: Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50 Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0 Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and RDX pointing at the spinlock. Yet RAX is holding junk :/ I agree that enabling KASAN would be a good idea, but I also think we probably need to get some more information out of swiotlb_tbl_map_single() to see see what exactly is going wrong in there. I can certainly enable KASAN and if there is any debug print I can add or dump anything, let me know! I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built x86 defconfig and ran it on my laptop. However, it seems to work fine! Please can you share your .config? Sure thing, it is attached. It is just Arch Linux's config run through olddefconfig. The original is below in case you need to diff it. https://raw.githubusercontent.com/archlinux/svntogit-packages/9045405dc835527164f3034b3ceb9a67c7a53cd4/trunk/config If there is anything more that I can provide, please let me know. I eventually got this booting (for some reason it was causing LD to SEGV trying to link it for a while...) and sadly it works fine on my laptop. Hmm. Did you manage to try again with KASAN? It might also be worth taking the IOMMU out of the equation, since that interfaces differently with SWIOTLB and I couldn't figure out the code path from the log you provided. What happens if you boot with "amd_iommu=off swiotlb=force"? Oh, now there's a thing... the chat from the IOMMU API in the boot log implies that the IOMMU *should* be in the picture - we see that default domains are IOMMU_DOMAIN_DMA default and the GPU :0c:00.0 was added to a group. That means dev->dma_ops should be set and DMA API calls should be going through iommu-dma, yet the callstack in the crash says we've gone straight from dma_map_page_attrs() to swiotlb_map(), implying the inline dma_direct_map_page() path. If dev->dma_ops didn't look right in the first place, it's perhaps less surprising that dev->dma_io_tlb_mem might be wild as well. It doesn't seem plausible that we should have a race between initialising the device and probing its driver, so maybe the whole dev pointer is getting trampled earlier in the callchain (or is fundamentally wrong to begin with, but from a quick skim of the amdgpu code it did look like adev->dev and adev->pdev are appropriately set early on by amdgpu_pci_probe()). (although word of warning here: i915 dies horribly on my laptop if I pass swiotlb=force, even with the distro 5.10 kernel) FWIW I'd imagine you probably need to massively increase the SWIOTLB buffer size to have hope of that working. Robin.
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On 2021-07-06 14:24, Will Deacon wrote: On Tue, Jul 06, 2021 at 06:48:48AM +0200, Christoph Hellwig wrote: On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote: So at this point, the AMD IOMMU driver does: swiotlb= (iommu_default_passthrough() || sme_me_mask) ? 1 : 0; where 'swiotlb' is a global variable indicating whether or not swiotlb is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which will call swiotlb_exit() if 'swiotlb' is false. Now, that used to work fine, because swiotlb_exit() clears 'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I think that all the devices which have successfully probed beforehand will have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem' field. Yeah. I don't think we can do that anymore, and I also think it is a bad idea to start with. I've had a crack at reworking things along the following lines: - io_tlb_default_mem now lives in the BSS, the flexible array member is now a pointer and that part is allocated dynamically (downside of this is an extra indirection to get at the slots). - io_tlb_default_mem.nslabs tells you whether the thing is valid - swiotlb_exit() frees the slots array and clears the rest of the structure to 0. I also extended it to free the actual slabs, but I'm not sure why it wasn't doing that before. So a non-NULL dev->dma_io_tlb_mem should always be valid to follow. FWIW I was pondering the question of whether to do something along those lines or just scrap the default assignment entirely, so since I hadn't got round to saying that I've gone ahead and hacked up the alternative (similarly untested) for comparison :) TBH I'm still not sure which one I prefer... Robin. ->8- diff --git a/drivers/base/core.c b/drivers/base/core.c index ea5b85354526..394abf184c1a 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2847,9 +2847,6 @@ void device_initialize(struct device *dev) defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) dev->dma_coherent = dma_default_coherent; #endif -#ifdef CONFIG_SWIOTLB - dev->dma_io_tlb_mem = io_tlb_default_mem; -#endif } EXPORT_SYMBOL_GPL(device_initialize); diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 39284ff2a6cd..620f16d89a98 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -107,16 +107,21 @@ struct io_tlb_mem { }; extern struct io_tlb_mem *io_tlb_default_mem; +static inline struct io_tlb_mem *dev_iotlb_mem(struct device *dev) +{ + return dev->dma_io_tlb_mem ?: io_tlb_default_mem; +} + static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev_iotlb_mem(dev); return mem && paddr >= mem->start && paddr < mem->end; } static inline bool is_swiotlb_force_bounce(struct device *dev) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev_iotlb_mem(dev); return mem && mem->force_bounce; } @@ -167,7 +172,7 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size); static inline bool is_swiotlb_for_alloc(struct device *dev) { - return dev->dma_io_tlb_mem->for_alloc; + return dev_iotlb_mem(dev)->for_alloc; } #else static inline struct page *swiotlb_alloc(struct device *dev, size_t size) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index b7f76bca89bf..f4942149f87d 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -359,7 +359,7 @@ static unsigned int swiotlb_align_offset(struct device *dev, u64 addr) static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size, enum dma_data_direction dir) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev_iotlb_mem(dev); int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT; phys_addr_t orig_addr = mem->slots[index].orig_addr; size_t alloc_size = mem->slots[index].alloc_size; @@ -440,7 +440,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index) static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, size_t alloc_size) { - struct io_tlb_mem *mem = dev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev_iotlb_mem(dev); unsigned long boundary_mask = dma_get_seg_boundary(dev); dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, mem->start) & boundary_mask; @@ -522,7 +522,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr, size_t mapping_size, size_t alloc_size, enum dm
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On 2021-07-06 15:05, Christoph Hellwig wrote: On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote: FWIW I was pondering the question of whether to do something along those lines or just scrap the default assignment entirely, so since I hadn't got round to saying that I've gone ahead and hacked up the alternative (similarly untested) for comparison :) TBH I'm still not sure which one I prefer... Claire did implement something like your suggestion originally, but I don't really like it as it doesn't scale for adding multiple global pools, e.g. for the 64-bit addressable one for the various encrypted secure guest schemes. Ah yes, that had slipped my mind, and it's a fair point indeed. Since we're not concerned with a minimal fix for backports anyway I'm more than happy to focus on Will's approach. Another thing is that that looks to take us a quiet step closer to the possibility of dynamically resizing a SWIOTLB pool, which is something that some of the hypervisor protection schemes looking to build on top of this series may want to explore at some point. Robin.
Re: [PATCH v2 1/2] dt-bindings: display: rockchip: Add compatible for rk3568 HDMI
On 2021-07-07 13:03, Benjamin Gaignard wrote: Define a new compatible for rk3568 HDMI. This version of HDMI hardware block needs two new clocks hclk_vio and hclk to provide phy reference clocks. Do you know what hclk_vio is and whether it's actually necessary? I don't see any mention of it downstream, and based on previous experience I'm suspicious that it might be just the parent of hclk, and thus should not need to be explicitly claimed by the device or baked into it's binding. Robin. Signed-off-by: Benjamin Gaignard --- version 2: - Add the clocks needed for the phy. .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml index 75cd9c686e985..cb8643b3a8b84 100644 --- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml +++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml @@ -23,6 +23,7 @@ properties: - rockchip,rk3288-dw-hdmi - rockchip,rk3328-dw-hdmi - rockchip,rk3399-dw-hdmi + - rockchip,rk3568-dw-hdmi reg-io-width: const: 4 @@ -51,8 +52,11 @@ properties: - vpll - enum: - grf + - hclk_vio + - vpll + - enum: + - hclk - vpll - - const: vpll ddc-i2c-bus: $ref: /schemas/types.yaml#/definitions/phandle
Re: [PATCH v2 1/2] dt-bindings: display: rockchip: Add compatible for rk3568 HDMI
On 2021-07-14 10:19, Michael Riesch wrote: Hello Heiko, On 7/13/21 10:49 AM, Heiko Stübner wrote: Hi Michael, Am Dienstag, 13. Juli 2021, 10:44:00 CEST schrieb Michael Riesch: The HDMI TX block in the RK3568 requires two power supplies, which have to be enabled in some cases (at least on the RK3568 EVB1 the voltages VDDA0V9_IMAGE and VCCA1V8_IMAGE are disabled by default). It would be great if this was considered by the driver and the device tree binding. I am not sure, though, whether this is a RK3568 specific or rockchip_dw_hdmi specific thing. Maybe it can even enter the Synopsis DW HDMI driver. I do remember that this discussion happened many years back already. And yes the supplies are needed for all but back then there was opposition as these are supposedly phy-related supplies, not for the dw-hdmi itself. [There are variants with an external phy, like on the rk3328] See discussion on [0] [0] https://dri-devel.freedesktop.narkive.com/pen2zWo1/patch-v3-1-2-drm-bridge-dw-hdmi-support-optional-supply-regulators Thanks for the pointer. My summary of this discussion would be the following: - There was no consensus on how to handle the issue. The voltages still have to be enabled from the outside of the driver. - Open question: rockchip-specific or general solution? (one may detect a tendency towards a rockchip-specific solution) - Open question: separation of the phy from the dw_hdmi IP core? First of all, IMHO the driver should enable those voltages, otherwise we will have the same discussion again in 5-6 years :-) Then, the rockchip,dw-hdmi binding features a property "phys", presumably to handle external phys (e.g., for the RK3328). This fact and the referenced discussion suggest a rockchip-specific solution. FWIW I've long thought that cleaning up the phy situation in dw-hdmi would be a good idea. It's always seemed a bit sketchy that on RK3328 we still validate modes against the tables for the Synopsys phy which isn't relevant, and if that does allow a clock rate through that the actual phy rejects then things just go horribly wrong and the display breaks. In the Rockchip documentation (at least for RK3328, RK3399 and RK3568), there are two extra voltages denoted as "HDMI PHY analog power". It would be tempting to add the internal phy to the device tree and glue it to the dw-hdmi using the "phys" property. However, as pointed out in the referenced discussion, the configuration registers of the phy are somewhat interleaved with the dw-hdmi registers and a clear separation may be tricky. Conceptually I don't think there's any issue with the HDMI node being its own phy provider where appropriate. At the DT level it should simply be a case of having both sets of properties, e.g.: &hdmi { #phy-cells = <0>; phys = <&hdmi>; }; And at the driver level AFAICS it's pretty much just a case of dw-hdmi additionally registering itself as a phy provider if the internal phy is present - the only difference then should be that it can end up calling back into itself via the common phy API rather than directly via internal special-cases. Robin. As a more pragmatic alternative, we could add optional supplies to the rockchip,dw-hdmi binding and evaluate the "phys" property. If the latter is not specified, the internal phy is used and the supplies must be enabled. Would such an approach be acceptable? Best regards, Michael On 7/7/21 2:03 PM, Benjamin Gaignard wrote: Define a new compatible for rk3568 HDMI. This version of HDMI hardware block needs two new clocks hclk_vio and hclk to provide phy reference clocks. Signed-off-by: Benjamin Gaignard --- version 2: - Add the clocks needed for the phy. .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml index 75cd9c686e985..cb8643b3a8b84 100644 --- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml +++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml @@ -23,6 +23,7 @@ properties: - rockchip,rk3288-dw-hdmi - rockchip,rk3328-dw-hdmi - rockchip,rk3399-dw-hdmi + - rockchip,rk3568-dw-hdmi reg-io-width: const: 4 @@ -51,8 +52,11 @@ properties: - vpll - enum: - grf + - hclk_vio + - vpll + - enum: + - hclk - vpll - - const: vpll The description and documentation of the clocks are somewhat misleading IMHO. This is not caused by your patches, of course. But maybe this is a chance to clean them up a bit. It seems that the CEC clock is an optional clock of the dw-hdmi driver. Shouldn't it be d
Re: use of dma_direct_set_offset in (allwinner) drivers
On 2020-11-04 08:14, Maxime Ripard wrote: Hi Christoph, On Tue, Nov 03, 2020 at 10:55:38AM +0100, Christoph Hellwig wrote: Linux 5.10-rc1 switched from having a single dma offset in struct device to a set of DMA ranges, and introduced a new helper to set them, dma_direct_set_offset. This in fact surfaced that a bunch of drivers that violate our layering and set the offset from drivers, which meant we had to reluctantly export the symbol to set up the DMA range. The drivers are: drivers/gpu/drm/sun4i/sun4i_backend.c This just use dma_direct_set_offset as a fallback. Is there any good reason to not just kill off the fallback? drivers/media/platform/sunxi/sun4i-csi/sun4i_csi.c Same as above. So, the history of this is: - We initially introduced the support for those two controllers assuming that there was a direct mapping between the physical and DMA addresses. It turns out it didn't and the DMA accesses were going through a secondary, dedicated, bus that didn't have the same mapping of the RAM than the CPU. 4690803b09c6 ("drm/sun4i: backend: Offset layer buffer address by DRAM starting address") - This dedicated bus is undocumented and barely used in the vendor kernel so this was overlooked, and it's fairly hard to get infos on it for all the SoCs we support. We added the DT support for it though on some SoCs we had enough infos to do so: c43a4469402f ("dt-bindings: interconnect: Add a dma interconnect name") 22f88e311399 ("ARM: dts: sun5i: Add the MBUS controller") This explains the check on the interconnect property - However, due to the stable DT rule, we still need to operate without regressions on older DTs that wouldn't have that property (and for SoCs we haven't figured out). Hence the fallback. How about having something in the platform code that keys off the top-level SoC compatible and uses a bus notifier to create offsets for the relevant devices if an MBUS description is missing? At least that way the workaround could be confined to a single dedicated place and look somewhat similar to other special cases like sta2x11, rather than being duplicated all over the place. Robin. drivers/media/platform/sunxi/sun6i-csi/sun6i_csi.c This driver unconditionally sets the offset. Why can't we do this in the device tree? drivers/staging/media/sunxi/cedrus/cedrus_hw.c Same as above. We should make those two match the previous ones, but we'll have the same issue here eventually. Most likely they were never ran on an SoC for which we have the MBUS figured out. Maxime ___ linux-arm-kernel mailing list linux-arm-ker...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH RFC libdrm 1/2] intel: Improve checks for big-endian
On 2022-03-07 20:53, Geert Uytterhoeven wrote: - sparc64-linux-gnu-gcc does not define __BIG_ENDIAN__ or SPARC, but does define __sparc__, hence replace the check for SPARC by a check for __sparc__, - powerpc{,64,64}-linux-gnu-gcc does not define __ppc__ or __ppc64__, but does define __BIG_ENDIAN__. powerpc64le-linux-gnu-gcc does not define __ppc__ or __ppc64__, but does define __LITTLE_ENDIAN__. Hence remove the checks for __ppc__ and __ppc64__. - mips{,64}-linux-gnu{,abi64}-gcc does not define __BIG_ENDIAN__, but does define __MIPSEB__, so add a check for the latter, - m68k-linux-gnu-gcc does not define __BIG_ENDIAN__, but does define __mc68000__, so add a check for the latter, - hppa{,64}-linux-gnu-gcc defines __BIG_ENDIAN__, and thus should work out-of-the-box. FWIW there's also __ARM_BIG_ENDIAN for arm-* and aarch64-* targets in BE mode, if you want to flesh out the list even more. In principle there's also "__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__" which should supposedly be consistent across platforms, but I don't know if that's even more of a specific GCC-ism. Robin. Signed-off-by: Geert Uytterhoeven --- Untested due to lack of hardware. --- intel/uthash.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/intel/uthash.h b/intel/uthash.h index 45d1f9fc12a1d6f9..0f5480be6e8ca033 100644 --- a/intel/uthash.h +++ b/intel/uthash.h @@ -648,7 +648,7 @@ do { #define MUR_PLUS2_ALIGNED(p) (((unsigned long)p & 3UL) == 2UL) #define MUR_PLUS3_ALIGNED(p) (((unsigned long)p & 3UL) == 3UL) #define WP(p) ((uint32_t*)((unsigned long)(p) & ~3UL)) -#if (defined(__BIG_ENDIAN__) || defined(SPARC) || defined(__ppc__) || defined(__ppc64__)) +#if (defined(__BIG_ENDIAN__) || defined(__sparc__) || defined(__mc68000__) || defined(__MIPSEB__)) #define MUR_THREE_ONE(p) *WP(p))&0x00ff) << 8) | (((*(WP(p)+1))&0xff00) >> 24)) #define MUR_TWO_TWO(p) *WP(p))&0x) <<16) | (((*(WP(p)+1))&0x) >> 16)) #define MUR_ONE_THREE(p) *WP(p))&0x00ff) <<24) | (((*(WP(p)+1))&0xff00) >> 8))
Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk
On 2022-03-09 08:37, elaine.zhang wrote: hi, 在 2022/3/9 下午4:18, Sascha Hauer 写道: Hi Elaine, On Wed, Mar 09, 2022 at 09:41:39AM +0800, zhangq...@rock-chips.com wrote: hi,all: Let me explain the clock dependency: From the clock tree, pclk_vo0 and hclk_vo0 are completely independent clocks with different parent clocks and different clock frequencies。 But the niu path is : pclk_vo is dependent on hclk_vo, and the pclk_vo niu goes through hclk_vo niu. Thanks, this is the information we are looking for. What is "NIU" btw? I think this is even documented in the Reference Manual. With the right pointer I just found: The NIU (native interface unit) You can see 2.8.6(NIU Clock gating reliance) in TRM. A part of niu clocks have a dependence on another niu clock in order to sharing the internal bus. When these clocks are in use, another niu clock must be opened, and cannot be gated. These clocks and the special clock on which they are relied are as following: Clocks which have dependency The clock which can not be gated - ... pclk_vo_niu, hclk_vo_s_niu hclk_vo_niu ... Yeah, the dependency is this. It may be not very detailed, I don't have permission to publish detailed NIU designs. The clock tree and NIU bus paths are designed independently So there are three solutions to this problem: 1. DTS adds a reference to Hclk while referencing Pclk. 2, The dependent clock is always on, such as HCLK_VO0, but this is not friendly for the system power. 3. Create a non-clock-tree reference. Clk-link, for example, we have an implementation in our internal branch, but Upstream is not sure how to push it. I thought about something similar. That would help us here and on i.MX we have a similar situation: We have one bit that switches multiple clocks. That as well cannot be designed properly in the clock framework currently, but could be modelled with a concept of linked clocks. Doing this sounds like quite a bit of work and discussion though, I don't really like having this as a dependency to mainline the VOP2 driver. I vote for 1. in that case, we could still ignore the hclk in dts later when we have linked clocks. OK so just to clarify, the issue is not just that the upstream clock driver doesn't currently model the NIU clocks, but that even if it did, it would still need to implement some new multi-parent clock type to manage the internal dependency? That's fair enough - I do think that improving the clock driver would be the best long-term goal, but for the scope of this series, having an interim workaround does seem more reasonable now that we understand *why* we need it. Thanks, Robin.
Re: [PATCH v7 11/24] dt-bindings: display: rockchip: dw-hdmi: Add additional clock
On 2022-02-25 07:51, Sascha Hauer wrote: The rk3568 HDMI has an additional clock that needs to be enabled for the HDMI controller to work. The purpose of that clock is not clear. It is named "hclk" in the downstream driver, so use the same name. Further to the clarification of the underlying purpose on the other thread, I suggest we call the new clock "niu" and describe it as something like "Additional clock required to ungate the bus interface on certain SoCs". Cheers, Robin. Signed-off-by: Sascha Hauer Acked-by: Rob Herring --- Notes: Changes since v4: - Add Robs Ack .../bindings/display/rockchip/rockchip,dw-hdmi.yaml| 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml index 38ebb69830287..67a76f51637a7 100644 --- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml +++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml @@ -44,12 +44,13 @@ properties: items: - {} - {} - # The next three clocks are all optional, but shall be specified in this + # The next four clocks are all optional, but shall be specified in this # order when present. - description: The HDMI CEC controller main clock - description: Power for GRF IO - description: External clock for some HDMI PHY (old clock name, deprecated) - description: External clock for some HDMI PHY (new name) + - description: hclk clock-names: minItems: 2 @@ -61,13 +62,17 @@ properties: - grf - vpll - ref + - hclk - enum: - grf - vpll - ref + - hclk - enum: - vpll - ref + - hclk + - const: hclk ddc-i2c-bus: $ref: /schemas/types.yaml#/definitions/phandle
Re: [PATCH] drm/bridge: synopsys/dw-hdmi: set cec clock rate
On 2022-03-13 12:56, Peter Geis wrote: On Sun, Mar 13, 2022 at 6:13 AM Piotr Oniszczuk wrote: Wiadomość napisana przez Peter Geis w dniu 26.01.2022, o godz. 21:24: The hdmi-cec clock must be 32khz in order for cec to work correctly. Ensure after enabling the clock we set it in order for the hardware to work as expected. Warn on failure, in case this is a static clock that is slighty off. Fixes hdmi-cec support on Rockchip devices. Fixes: ebe32c3e282a ("drm/bridge: synopsys/dw-hdmi: Enable cec clock") Signed-off-by: Peter Geis --- drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c index 54d8fdad395f..1a96da60e357 100644 --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c @@ -48,6 +48,9 @@ #define HDMI14_MAX_TMDSCLK34000 +/* HDMI CEC needs a clock rate of 32khz */ +#define HDMI_CEC_CLK_RATE32768 + enum hdmi_datamap { RGB444_8B = 0x01, RGB444_10B = 0x03, @@ -3347,6 +3350,10 @@ struct dw_hdmi *dw_hdmi_probe(struct platform_device *pdev, ret); goto err_iahb; } + + ret = clk_set_rate(hdmi->cec_clk, HDMI_CEC_CLK_RATE); + if (ret) + dev_warn(hdmi->dev, "Cannot set HDMI cec clock rate: %d\n", ret); } /* Product and revision IDs */ -- 2.25.1 ___ Linux-rockchip mailing list linux-rockc...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-rockchip Peter, On my 5.17-rc7 with applied rk356x VOP2 v8 series - this patch makes CEC working on rk3566. Unfortunately it breaks working ok CEC on rk3399 rockpi-4b. Reverting this patch brings back CEC on rk3399 - but rk3366 becomes with non working CEC I'm not sure how to move forward with this I was worried about that, thanks for testing it. Can you send me the cec_clk rate before and after this patch? Hmm, looks like there might be a bug in the RK3399 clock driver - although it's treated as having mux_pll_p as parents, according to the public TRM the CEC clock appears unique in being backwards compared to every other mux of those two inputs. I'm now tempted to test this on my board tonight and see if I can see 24MHz on the CEC pin... :) Robin.
Re: [PATCH] drm/bridge: synopsys/dw-hdmi: set cec clock rate
On 2022-03-14 11:31, Robin Murphy wrote: On 2022-03-13 12:56, Peter Geis wrote: On Sun, Mar 13, 2022 at 6:13 AM Piotr Oniszczuk wrote: Wiadomość napisana przez Peter Geis w dniu 26.01.2022, o godz. 21:24: The hdmi-cec clock must be 32khz in order for cec to work correctly. Ensure after enabling the clock we set it in order for the hardware to work as expected. Warn on failure, in case this is a static clock that is slighty off. Fixes hdmi-cec support on Rockchip devices. Fixes: ebe32c3e282a ("drm/bridge: synopsys/dw-hdmi: Enable cec clock") Signed-off-by: Peter Geis --- drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c index 54d8fdad395f..1a96da60e357 100644 --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c @@ -48,6 +48,9 @@ #define HDMI14_MAX_TMDSCLK 34000 +/* HDMI CEC needs a clock rate of 32khz */ +#define HDMI_CEC_CLK_RATE 32768 + enum hdmi_datamap { RGB444_8B = 0x01, RGB444_10B = 0x03, @@ -3347,6 +3350,10 @@ struct dw_hdmi *dw_hdmi_probe(struct platform_device *pdev, ret); goto err_iahb; } + + ret = clk_set_rate(hdmi->cec_clk, HDMI_CEC_CLK_RATE); + if (ret) + dev_warn(hdmi->dev, "Cannot set HDMI cec clock rate: %d\n", ret); } /* Product and revision IDs */ -- 2.25.1 ___ Linux-rockchip mailing list linux-rockc...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-rockchip Peter, On my 5.17-rc7 with applied rk356x VOP2 v8 series - this patch makes CEC working on rk3566. Unfortunately it breaks working ok CEC on rk3399 rockpi-4b. Reverting this patch brings back CEC on rk3399 - but rk3366 becomes with non working CEC I'm not sure how to move forward with this I was worried about that, thanks for testing it. Can you send me the cec_clk rate before and after this patch? Hmm, looks like there might be a bug in the RK3399 clock driver - although it's treated as having mux_pll_p as parents, according to the public TRM the CEC clock appears unique in being backwards compared to every other mux of those two inputs. I'm now tempted to test this on my board tonight and see if I can see 24MHz on the CEC pin... :) Nope, turns out that's an error in the TRM and the mux is fine. The bug is between the clock driver blindly assuming xin32k is usable, and nearly all RK3399 boards forgetting to claim the pinctrl to actually enable the input :/ Robin.
Re: [PATCH v2 4/8] drm/virtio: Improve DMA API usage for shmem BOs
On 2022-03-14 22:42, Dmitry Osipenko wrote: DRM API requires the DRM's driver to be backed with the device that can be used for generic DMA operations. The VirtIO-GPU device can't perform DMA operations if it uses PCI transport because PCI device driver creates a virtual VirtIO-GPU device that isn't associated with the PCI. Use PCI's GPU device for the DRM's device instead of the VirtIO-GPU device and drop DMA-related hacks from the VirtIO-GPU driver. Signed-off-by: Dmitry Osipenko --- drivers/gpu/drm/virtio/virtgpu_drv.c| 22 +++--- drivers/gpu/drm/virtio/virtgpu_drv.h| 5 +-- drivers/gpu/drm/virtio/virtgpu_kms.c| 7 ++-- drivers/gpu/drm/virtio/virtgpu_object.c | 56 + drivers/gpu/drm/virtio/virtgpu_vq.c | 13 +++--- 5 files changed, 37 insertions(+), 66 deletions(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c index 5f25a8d15464..8449dad3e65c 100644 --- a/drivers/gpu/drm/virtio/virtgpu_drv.c +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c @@ -46,9 +46,9 @@ static int virtio_gpu_modeset = -1; MODULE_PARM_DESC(modeset, "Disable/Enable modesetting"); module_param_named(modeset, virtio_gpu_modeset, int, 0400); -static int virtio_gpu_pci_quirk(struct drm_device *dev, struct virtio_device *vdev) +static int virtio_gpu_pci_quirk(struct drm_device *dev) { - struct pci_dev *pdev = to_pci_dev(vdev->dev.parent); + struct pci_dev *pdev = to_pci_dev(dev->dev); const char *pname = dev_name(&pdev->dev); bool vga = (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; char unique[20]; @@ -101,6 +101,7 @@ static int virtio_gpu_pci_quirk(struct drm_device *dev, struct virtio_device *vd static int virtio_gpu_probe(struct virtio_device *vdev) { struct drm_device *dev; + struct device *dma_dev; int ret; if (drm_firmware_drivers_only() && virtio_gpu_modeset == -1) @@ -109,18 +110,29 @@ static int virtio_gpu_probe(struct virtio_device *vdev) if (virtio_gpu_modeset == 0) return -EINVAL; - dev = drm_dev_alloc(&driver, &vdev->dev); + /* +* If GPU's parent is a PCI device, then we will use this PCI device +* for the DRM's driver device because GPU won't have PCI's IOMMU DMA +* ops in this case since GPU device is sitting on a separate (from PCI) +* virtio-bus. +*/ + if (!strcmp(vdev->dev.parent->bus->name, "pci")) Nit: dev_is_pci() ? However, what about other VirtIO transports? Wouldn't virtio-mmio with F_ACCESS_PLATFORM be in a similar situation? Robin. + dma_dev = vdev->dev.parent; + else + dma_dev = &vdev->dev; + + dev = drm_dev_alloc(&driver, dma_dev); if (IS_ERR(dev)) return PTR_ERR(dev); vdev->priv = dev; if (!strcmp(vdev->dev.parent->bus->name, "pci")) { - ret = virtio_gpu_pci_quirk(dev, vdev); + ret = virtio_gpu_pci_quirk(dev); if (ret) goto err_free; } - ret = virtio_gpu_init(dev); + ret = virtio_gpu_init(vdev, dev); if (ret) goto err_free; diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h index 0a194aaad419..b2d93cb12ebf 100644 --- a/drivers/gpu/drm/virtio/virtgpu_drv.h +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h @@ -100,8 +100,6 @@ struct virtio_gpu_object { struct virtio_gpu_object_shmem { struct virtio_gpu_object base; - struct sg_table *pages; - uint32_t mapped; }; struct virtio_gpu_object_vram { @@ -214,7 +212,6 @@ struct virtio_gpu_drv_cap_cache { }; struct virtio_gpu_device { - struct device *dev; struct drm_device *ddev; struct virtio_device *vdev; @@ -282,7 +279,7 @@ extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS]; void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file); /* virtgpu_kms.c */ -int virtio_gpu_init(struct drm_device *dev); +int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev); void virtio_gpu_deinit(struct drm_device *dev); void virtio_gpu_release(struct drm_device *dev); int virtio_gpu_driver_open(struct drm_device *dev, struct drm_file *file); diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c b/drivers/gpu/drm/virtio/virtgpu_kms.c index 3313b92db531..0d1e3eb61bee 100644 --- a/drivers/gpu/drm/virtio/virtgpu_kms.c +++ b/drivers/gpu/drm/virtio/virtgpu_kms.c @@ -110,7 +110,7 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device *vgdev, vgdev->num_capsets = num_capsets; } -int virtio_gpu_init(struct drm_device *dev) +int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev) {
Re: [PATCH v8 09/24] drm/rockchip: dw_hdmi: Add support for niu clk
eally caring about power efficiency, then arguably the simplest and cleanest workaround would be the other option that Elaine mentioned, of just marking hclk_vo as critical for now. If it's likely to turn into a "nothing's as permanent as a temporary fix" situation, though, then the DT binding has less functional impact, even if it does leave us developers with baggage down the line. Robin. I have last questions.. 1. Previously you said that the PD driver takes care of enabling all the clocks it can find in the device by itself on RPM-resume, then why HDMI driver needs to enable the clock explicitly? 2. You added clk_prepare_enable(), but there is no corresponding clk_disable_unprepare(), AFAICS. Why?
Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64
On 2022-03-23 07:15, Christian König wrote: Am 22.03.22 um 10:34 schrieb Cong Liu: qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access. Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards. So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct. No, this one's not a bug. The Device memory type used for iomem mappings is *architecturally* defined to enforce properties like aligned accesses, no speculation, no reordering, etc. If something wants to be treated more like RAM than actual MMIO registers, then ioremap_wc() or ioremap_cache() is the appropriate thing to do in general (with the former being a bit more portable according to Documentation/driver-api/device-io.rst). Of course *then* you might find that on some systems the interconnect/PCIe implementation/endpoint doesn't actually like unaligned accesses once the CPU MMU starts allowing them to be sent out, but hey, one step at a time ;) Robin. Christian. 6.620515] pc : setup_hw_slot+0x24/0x60 [qxl] [ 6.620961] lr : setup_slot+0x34/0xf0 [qxl] [ 6.621376] sp : 800012b73760 [ 6.621701] x29: 800012b73760 x28: 0001 x27: 1000 [ 6.622400] x26: 0001 x25: 0400 x24: cf376848c000 [ 6.623099] x23: c4087400 x22: cf3718e17140 x21: [ 6.623823] x20: c4086000 x19: c40870b0 x18: 0014 [ 6.624519] x17: 4d3605ab x16: bb3b6129 x15: 6e771809 [ 6.625214] x14: 0001 x13: 007473696c5f7974 x12: 696e69615f65 [ 6.625909] x11: d543656a x10: x9 : cf3718e085a4 [ 6.626616] x8 : 006c7871 x7 : 000a x6 : 0017 [ 6.627343] x5 : 1400 x4 : 800011f63400 x3 : 1400 [ 6.628047] x2 : x1 : c40870b0 x0 : c4086000 [ 6.628751] Call trace: [ 6.628994] setup_hw_slot+0x24/0x60 [qxl] [ 6.629404] setup_slot+0x34/0xf0 [qxl] [ 6.629790] qxl_device_init+0x6f0/0x7f0 [qxl] [ 6.630235] qxl_pci_probe+0xdc/0x1d0 [qxl] [ 6.630654] local_pci_probe+0x48/0xb8 [ 6.631027] pci_device_probe+0x194/0x208 [ 6.631464] really_probe+0xd0/0x458 [ 6.631818] __driver_probe_device+0x124/0x1c0 [ 6.632256] driver_probe_device+0x48/0x130 [ 6.632669] __driver_attach+0xc4/0x1a8 [ 6.633049] bus_for_each_dev+0x78/0xd0 [ 6.633437] driver_attach+0x2c/0x38 [ 6.633789] bus_add_driver+0x154/0x248 [ 6.634168] driver_register+0x6c/0x128 [ 6.635205] __pci_register_driver+0x4c/0x58 [ 6.635628] qxl_init+0x48/0x1000 [qxl] [ 6.636013] do_one_initcall+0x50/0x240 [ 6.636390] do_init_module+0x60/0x238 [ 6.636768] load_module+0x2458/0x2900 [ 6.637136] __do_sys_finit_module+0xbc/0x128 [ 6.637561] __arm64_sys_finit_module+0x28/0x38 [ 6.638004] invoke_syscall+0x74/0xf0 [ 6.638366] el0_svc_common.constprop.0+0x58/0x1a8 [ 6.638836] do_el0_svc+0x2c/0x90 [ 6.639216] el0_svc+0x40/0x190 [ 6.639526] el0t_64_sync_handler+0xb0/0xb8 [ 6.639934] el0t_64_sync+0x1a4/0x1a8 [ 6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083) [ 6.640889] ---[ end trace 95615d89b7c87f95 ]--- Signed-off-by: Cong Liu --- drivers/gpu/drm/qxl/qxl_kms.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c index 4dc5ad13f12c..0e61ac04d8ad 100644 --- a/drivers/gpu/drm/qxl/qxl_kms.c +++ b/drivers/gpu/drm/qxl/qxl_kms.c @@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev, (int)qdev->surfaceram_size / 1024, (sb == 4) ? "64bit" : "32bit"); +#ifdef CONFIG_ARM64 + qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size); +#else qdev->rom = ioremap(qdev->rom_base, qdev->rom_size); +#endif if (!qdev->rom) { pr_err("Unable to ioremap ROM\n"); r = -ENOMEM; @@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev, goto rom_unmap; } +#ifdef CONFIG_ARM64 + qdev->ram_header = ioremap_cache(qdev->vram_base + + qdev->rom->ram_header_offset, + sizeof(*qdev->ram_header)); +#else qdev->ram_header = ioremap(qdev->vram_base + qdev->rom->ram_header_offset, sizeof(*qdev->ram_header)); +#endif if (!qdev->ram_header) { DRM_ERROR("Unable to ioremap RAM header\n"); r = -ENOMEM;
Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64
On 2022-03-23 10:11, Gerd Hoffmann wrote: On Wed, Mar 23, 2022 at 09:45:13AM +, Robin Murphy wrote: On 2022-03-23 07:15, Christian K�nig wrote: Am 22.03.22 um 10:34 schrieb Cong Liu: qxl use ioremap to map ram_header and rom, in the arm64 implementation, the device is mapped as DEVICE_nGnRE, it can not support unaligned access. Well that some ARM boards doesn't allow unaligned access to MMIO space is a well known bug of those ARM boards. So as far as I know this is a hardware bug you are trying to workaround here and I'm not 100% sure that this is correct. No, this one's not a bug. The Device memory type used for iomem mappings is *architecturally* defined to enforce properties like aligned accesses, no speculation, no reordering, etc. If something wants to be treated more like RAM than actual MMIO registers, then ioremap_wc() or ioremap_cache() is the appropriate thing to do in general (with the former being a bit more portable according to Documentation/driver-api/device-io.rst). Well, qxl is a virtual device, so it *is* ram. I'm wondering whenever qxl actually works on arm? As far I know all virtual display devices with (virtual) pci memory bars for vram do not work on arm due to the guest mapping vram as io memory and the host mapping vram as normal ram and the mapping attribute mismatch causes caching troubles (only noticeable on real arm hardware, not in emulation). Did something change here recently? Indeed, Armv8.4 introduced the S2FWB feature to cope with situations like this - essentially it allows the hypervisor to share RAM-backed pages with the guest without losing coherency regardless of how the guest maps them. Robin.
Re: [PATCH v3 1/9] dt-bindings: host1x: Add memory-contexts property
On 2022-02-18 11:39, Mikko Perttunen via iommu wrote: Add schema information for the memory-contexts property used to specify context stream IDs. This uses the standard iommu-map property inside a child node. Couldn't you simply make "iommu-map" an allowed property on the host1x node itself? From a DT perspective I'm not sure the intermediate node really fits meaningfully, and I can't see that it serves much purpose in practice either, other than perhaps defeating fw_devlink. Robin. Signed-off-by: Mikko Perttunen --- v3: * New patch --- .../bindings/display/tegra/nvidia,tegra20-host1x.yaml | 10 ++ 1 file changed, 10 insertions(+) diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml index 4fd513efb0f7..3ac0fde54a16 100644 --- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml +++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml @@ -144,6 +144,16 @@ allOf: reset-names: maxItems: 1 +memory-contexts: + type: object + properties: +iommu-map: + description: Specification of stream IDs available for memory context device +use. Should be a mapping of IDs 0..n to IOMMU entries corresponding to +usable stream IDs. + required: +- iommu-map + required: - reg-names
Re: [PATCH v3 1/9] dt-bindings: host1x: Add memory-contexts property
On 2022-02-21 15:28, Mikko Perttunen wrote: On 2/21/22 17:23, Robin Murphy wrote: On 2022-02-18 11:39, Mikko Perttunen via iommu wrote: Add schema information for the memory-contexts property used to specify context stream IDs. This uses the standard iommu-map property inside a child node. Couldn't you simply make "iommu-map" an allowed property on the host1x node itself? From a DT perspective I'm not sure the intermediate node really fits meaningfully, and I can't see that it serves much purpose in practice either, other than perhaps defeating fw_devlink. Robin. The stream IDs described here are not used by the host1x device itself, so I don't think I can. Host1x's memory transactions still go through the stream ID specified in its 'iommus' property, these stream IDs are used by engines (typically in addition to the stream ID specified in their own nodes). Host1x 'iommus' -- Channel commands Engine 'iommus' -- Engine firmware (and data if context isolation is not enabled) memory-contexts 'iommu-map' -- Data used by engines. Right, that still appears to match my understanding, that as far as software sees, the host1x is effectively acting as a bridge to the engines in itself. Even if it's not physically routing traffic in and/or out, the host1x device is the place where the context IDs *logically* exist, and thus owns the mapping between context IDs and the StreamIDs emitted by any engine working in a given context. Consider a PCIe root complex with integrated endpoints - chances are the RCiEPs have their own physical interfaces to issue DMA directly into the SoC interconnect, but that doesn't change how we describe the PCI Requester ID to StreamID mapping at the root complex, since the RC still logically owns the RID space. You can think of a RID as being "consumed" at the RC by indexing into config space to ultimately gain control of the corresponding endpoint, just like context IDs are "consumed" at the host1x by generating commands to ultimately cause some engine to operate in the correct address space. You don't have to pretend the host1x uses a context for its own command-fetching (or whatever) traffic either - it's always been intended that the "iommus" and "iommu-map" properties should happily be able to coexist on the same node, since they serve distinctly different purposes. If it doesn't work in practice then we've got a bug to fix somewhere. If the context-switching mechanism was some distinct self-contained thing bolted on beside the other host1x functionality then describing it as a separate level of DT hierarchy might be more justifiable, but that's not the impression I'm getting from skimming the rest of the series. Just reading of the names of things in patch #6, my intuitive reaction is that clearly each host1x owns 9 StreamIDs, one for general stuff and 8 for contexts. Adding the knowledge that technically the context StreamIDs end up delegated to other host1x-controlled engines still doesn't shift the paradigm. I don't believe we need a level of DT structure purely to help document what the iommu-map means for host1x - the binding can do that just fine. Thanks, Robin. (Perhaps I should add this information to various places in more abundance and clarity.) Mikko Signed-off-by: Mikko Perttunen --- v3: * New patch --- .../bindings/display/tegra/nvidia,tegra20-host1x.yaml | 10 ++ 1 file changed, 10 insertions(+) diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml index 4fd513efb0f7..3ac0fde54a16 100644 --- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml +++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml @@ -144,6 +144,16 @@ allOf: reset-names: maxItems: 1 + memory-contexts: + type: object + properties: + iommu-map: + description: Specification of stream IDs available for memory context device + use. Should be a mapping of IDs 0..n to IOMMU entries corresponding to + usable stream IDs. + required: + - iommu-map + required: - reg-names
Re: [PATCH v3 8/9] drm/tegra: vic: Implement get_streamid_offset
On 2022-02-18 11:39, Mikko Perttunen via iommu wrote: Implement the get_streamid_offset required for supporting context isolation. Since old firmware cannot support context isolation without hacks that we don't want to implement, check the firmware binary to see if context isolation should be enabled. Signed-off-by: Mikko Perttunen --- drivers/gpu/drm/tegra/vic.c | 38 + 1 file changed, 38 insertions(+) diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c index 1e342fa3d27b..2863ee5e0e67 100644 --- a/drivers/gpu/drm/tegra/vic.c +++ b/drivers/gpu/drm/tegra/vic.c @@ -38,6 +38,8 @@ struct vic { struct clk *clk; struct reset_control *rst; + bool can_use_context; + /* Platform configuration */ const struct vic_config *config; }; @@ -229,6 +231,7 @@ static int vic_load_firmware(struct vic *vic) { struct host1x_client *client = &vic->client.base; struct tegra_drm *tegra = vic->client.drm; + u32 fce_bin_data_offset; dma_addr_t iova; size_t size; void *virt; @@ -277,6 +280,25 @@ static int vic_load_firmware(struct vic *vic) vic->falcon.firmware.phys = phys; } + /* +* Check if firmware is new enough to not require mapping firmware +* to data buffer domains. +*/ + fce_bin_data_offset = *(u32 *)(virt + VIC_UCODE_FCE_DATA_OFFSET); + + if (!vic->config->supports_sid) { + vic->can_use_context = false; + } else if (fce_bin_data_offset != 0x0 && fce_bin_data_offset != 0xa5a5a5a5) { + /* +* Firmware will access FCE through STREAMID0, so context +* isolation cannot be used. +*/ + vic->can_use_context = false; + dev_warn_once(vic->dev, "context isolation disabled due to old firmware\n"); + } else { + vic->can_use_context = true; + } + return 0; cleanup: @@ -358,10 +380,26 @@ static void vic_close_channel(struct tegra_drm_context *context) host1x_channel_put(context->channel); } +static int vic_get_streamid_offset(struct tegra_drm_client *client) +{ + struct vic *vic = to_vic(client); + int err; + + err = vic_load_firmware(vic); + if (err < 0) + return err; + + if (vic->can_use_context) + return 0x30; + else + return -ENOTSUPP; +} + static const struct tegra_drm_client_ops vic_ops = { .open_channel = vic_open_channel, .close_channel = vic_close_channel, .submit = tegra_drm_submit, + .get_streamid_offset = vic_get_streamid_offset, The patch order seems off here, since the .get_streamid_offset member isn't defined yet. Robin. }; #define NVIDIA_TEGRA_124_VIC_FIRMWARE "nvidia/tegra124/vic03_ucode.bin"
Re: [PATCH v3 2/9] gpu: host1x: Add context bus
On 2022-02-22 16:21, Christoph Hellwig wrote: On Fri, Feb 18, 2022 at 01:39:45PM +0200, Mikko Perttunen wrote: The context bus is a "dummy" bus that contains struct devices that correspond to IOMMU contexts assigned through Host1x to processes. Even when host1x itself is built as a module, the bus is registered in built-in code so that the built-in ARM SMMU driver is able to reference it. Signed-off-by: Mikko Perttunen --- drivers/gpu/Makefile | 3 +-- drivers/gpu/host1x/Kconfig | 5 + drivers/gpu/host1x/Makefile| 1 + drivers/gpu/host1x/context_bus.c | 31 ++ include/linux/host1x_context_bus.h | 15 +++ 5 files changed, 53 insertions(+), 2 deletions(-) create mode 100644 drivers/gpu/host1x/context_bus.c create mode 100644 include/linux/host1x_context_bus.h diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile index 835c88318cec..8997f0096545 100644 --- a/drivers/gpu/Makefile +++ b/drivers/gpu/Makefile @@ -2,7 +2,6 @@ # drm/tegra depends on host1x, so if both drivers are built-in care must be # taken to initialize them in the correct order. Link order is the only way # to ensure this currently. -obj-$(CONFIG_TEGRA_HOST1X) += host1x/ -obj-y += drm/ vga/ +obj-y += host1x/ drm/ vga/ obj-$(CONFIG_IMX_IPUV3_CORE) += ipu-v3/ obj-$(CONFIG_TRACE_GPU_MEM) += trace/ diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig index 6815b4db17c1..1861a8180d3f 100644 --- a/drivers/gpu/host1x/Kconfig +++ b/drivers/gpu/host1x/Kconfig @@ -1,8 +1,13 @@ # SPDX-License-Identifier: GPL-2.0-only + +config TEGRA_HOST1X_CONTEXT_BUS + bool + config TEGRA_HOST1X tristate "NVIDIA Tegra host1x driver" depends on ARCH_TEGRA || (ARM && COMPILE_TEST) select DMA_SHARED_BUFFER + select TEGRA_HOST1X_CONTEXT_BUS select IOMMU_IOVA help Driver for the NVIDIA Tegra host1x hardware. diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile index d2b6f7de0498..c891a3e33844 100644 --- a/drivers/gpu/host1x/Makefile +++ b/drivers/gpu/host1x/Makefile @@ -18,3 +18,4 @@ host1x-y = \ hw/host1x07.o obj-$(CONFIG_TEGRA_HOST1X) += host1x.o +obj-$(CONFIG_TEGRA_HOST1X_CONTEXT_BUS) += context_bus.o diff --git a/drivers/gpu/host1x/context_bus.c b/drivers/gpu/host1x/context_bus.c new file mode 100644 index ..2625914f3c7d --- /dev/null +++ b/drivers/gpu/host1x/context_bus.c @@ -0,0 +1,31 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2021, NVIDIA Corporation. + */ + +#include +#include + +struct bus_type host1x_context_device_bus_type = { + .name = "host1x-context", +}; +EXPORT_SYMBOL(host1x_context_device_bus_type); EXPORT_SYMBOL_GPL, please. But the pattern that this copies in arm_smmu_bus_init is really ugly. I think we need to figure out a way todo that without having to export all the low-level bus types. Yup, as it happens that was the first step on my mission :) https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus Still a way to go with the main meat of that work, though, so I was figuring this could probably land as-is and I'll sweep it up in due course. Robin.
Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk
On 2022-02-25 11:10, Dmitry Osipenko wrote: 25.02.2022 13:49, Sascha Hauer пишет: On Fri, Feb 25, 2022 at 01:26:14PM +0300, Dmitry Osipenko wrote: 25.02.2022 10:51, Sascha Hauer пишет: The rk3568 HDMI has an additional clock that needs to be enabled for the HDMI controller to work. The purpose of that clock is not clear. It is named "hclk" in the downstream driver, so use the same name. Signed-off-by: Sascha Hauer --- Notes: Changes since v5: - Use devm_clk_get_optional rather than devm_clk_get drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c index fe4f9556239ac..c6c00e8779ab5 100644 --- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c @@ -76,6 +76,7 @@ struct rockchip_hdmi { const struct rockchip_hdmi_chip_data *chip_data; struct clk *ref_clk; struct clk *grf_clk; + struct clk *hclk_clk; struct dw_hdmi *hdmi; struct regulator *avdd_0v9; struct regulator *avdd_1v8; @@ -229,6 +230,14 @@ static int rockchip_hdmi_parse_dt(struct rockchip_hdmi *hdmi) return PTR_ERR(hdmi->grf_clk); } + hdmi->hclk_clk = devm_clk_get_optional(hdmi->dev, "hclk"); + if (PTR_ERR(hdmi->hclk_clk) == -EPROBE_DEFER) { Have you tried to investigate the hclk? I'm still thinking that's not only HDMI that needs this clock and then the hardware description doesn't look correct. I am still not sure what you mean. Yes, it's not only the HDMI that needs this clock. The VOP2 needs it as well and the driver handles that. I'm curious whether DSI/DP also need that clock to be enabled. If they do, then you aren't modeling h/w properly AFAICS. Assuming nobody at Rockchip decided to make things needlessly inconsistent with previous SoCs, HCLK_VOP should be the clock for the VOP's AHB slave interface. Usually, if that affected anything other than accessing VOP registers, indeed it would smell of something being wrong in the clock tree, but in this case I'd also be suspicious of whether it might have ended up clocking related GRF registers as well (either directly, or indirectly via some gate that the clock driver hasn't modelled yet). If the symptom of not claiming HCLK_VOP is hanging on some register access in the HDMI driver while the VOP is idle, then it should be relatively straightforward to narrow down with some logging, and see if it looks like this is really just another "grf" clock. If not, then we're back to suspecting something more insidiously wrong elsewhere. Robin.
Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk
On 2022-02-25 13:11, Sascha Hauer wrote: On Fri, Feb 25, 2022 at 12:41:23PM +, Robin Murphy wrote: On 2022-02-25 11:10, Dmitry Osipenko wrote: 25.02.2022 13:49, Sascha Hauer пишет: On Fri, Feb 25, 2022 at 01:26:14PM +0300, Dmitry Osipenko wrote: 25.02.2022 10:51, Sascha Hauer пишет: The rk3568 HDMI has an additional clock that needs to be enabled for the HDMI controller to work. The purpose of that clock is not clear. It is named "hclk" in the downstream driver, so use the same name. Signed-off-by: Sascha Hauer --- Notes: Changes since v5: - Use devm_clk_get_optional rather than devm_clk_get drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c index fe4f9556239ac..c6c00e8779ab5 100644 --- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c @@ -76,6 +76,7 @@ struct rockchip_hdmi { const struct rockchip_hdmi_chip_data *chip_data; struct clk *ref_clk; struct clk *grf_clk; + struct clk *hclk_clk; struct dw_hdmi *hdmi; struct regulator *avdd_0v9; struct regulator *avdd_1v8; @@ -229,6 +230,14 @@ static int rockchip_hdmi_parse_dt(struct rockchip_hdmi *hdmi) return PTR_ERR(hdmi->grf_clk); } + hdmi->hclk_clk = devm_clk_get_optional(hdmi->dev, "hclk"); + if (PTR_ERR(hdmi->hclk_clk) == -EPROBE_DEFER) { Have you tried to investigate the hclk? I'm still thinking that's not only HDMI that needs this clock and then the hardware description doesn't look correct. I am still not sure what you mean. Yes, it's not only the HDMI that needs this clock. The VOP2 needs it as well and the driver handles that. I'm curious whether DSI/DP also need that clock to be enabled. If they do, then you aren't modeling h/w properly AFAICS. Assuming nobody at Rockchip decided to make things needlessly inconsistent with previous SoCs, HCLK_VOP should be the clock for the VOP's AHB slave interface. Usually, if that affected anything other than accessing VOP registers, indeed it would smell of something being wrong in the clock tree, but in this case I'd also be suspicious of whether it might have ended up clocking related GRF registers as well (either directly, or indirectly via some gate that the clock driver hasn't modelled yet). Ok, I am beginning to understand. I verified that hdmi, mipi and dp are hanging when HCLK_VOP is disabled by disabling that clock via sysfs using CLOCK_ALLOW_WRITE_DEBUGFS. When it's disabled then the registers of that units can't be accessed. However, when I disable HCLK_VOP by directly writing to the gate bit RK3568_CLKGATE_CON(20) then only accessing VOP registers hangs, the other units stay functional. So it seems it must be the parent clock which must be enabled. The parent clock is hclk_vo. This clock should be handled as part of the RK3568_PD_VO power domain: power-domain@RK3568_PD_VO { reg = ; clocks = <&cru HCLK_VO>, <&cru PCLK_VO>, <&cru ACLK_VOP_PRE>; pm_qos = <&qos_hdcp>, <&qos_vop_m0>, <&qos_vop_m1>; #power-domain-cells = <0>; }; The HDMI controller is part of that domain, so I think this should work, but it doesn't. That's where I am now, I'll have a closer look. Ah, interesting. Looking at the clock driver, I'd also be suspicious whether pclk_vo is somehow messed up such that we're currently relying on hclk_vo to keep the common grandparent enabled. Seems like the DSI and eDP (and HDCP if anyone ever used it) registers would be similarly affected if so, and sure enough they both have a similarly suspect extra "hclk" in the downstream DT too. Robin.
Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
[ +arm64 maintainers for their awareness, which would have been a good thing to do from the start ] On 2022-02-25 03:24, Michael Cheng wrote: Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou performs a flush by first performing a clean, follow by an invalidation operation. v2 (Michael Cheng): Use correct macro for cleaning and invalidation the dcache. Thanks Tvrtko for the suggestion. v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc as a symbol that could be use by other modules, thus use caches_clean_inval_pou instead. Also this version removes include for cacheflush, since its already included base on architecture type. Signed-off-by: Michael Cheng Reviewed-by: Matt Roper --- drivers/gpu/drm/drm_cache.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index c3e6e615bf09..81c28714f930 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned long length) if (wbinvd_on_all_cpus()) pr_err("Timed out waiting for cache flush\n"); + +#elif defined(CONFIG_ARM64) + void *end = addr + length; + caches_clean_inval_pou((unsigned long)addr, (unsigned long)end); Why does i915 need to ensure the CPU's instruction cache is coherent with its data cache? Is it a self-modifying driver? Robin. (Note that the above is somewhat of a loaded question, and I do actually have half an idea of what you're trying to do here and why it won't fly, but I'd like to at least assume you've read the documentation of the function you decided was OK to use) + #else WARN_ONCE(1, "Architecture has no drm_cache.c support\n"); #endif
Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk
On 2022-02-28 14:19, Sascha Hauer wrote: On Fri, Feb 25, 2022 at 02:11:54PM +0100, Sascha Hauer wrote: On Fri, Feb 25, 2022 at 12:41:23PM +, Robin Murphy wrote: On 2022-02-25 11:10, Dmitry Osipenko wrote: 25.02.2022 13:49, Sascha Hauer пишет: On Fri, Feb 25, 2022 at 01:26:14PM +0300, Dmitry Osipenko wrote: 25.02.2022 10:51, Sascha Hauer пишет: The rk3568 HDMI has an additional clock that needs to be enabled for the HDMI controller to work. The purpose of that clock is not clear. It is named "hclk" in the downstream driver, so use the same name. Signed-off-by: Sascha Hauer --- Notes: Changes since v5: - Use devm_clk_get_optional rather than devm_clk_get drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c index fe4f9556239ac..c6c00e8779ab5 100644 --- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c @@ -76,6 +76,7 @@ struct rockchip_hdmi { const struct rockchip_hdmi_chip_data *chip_data; struct clk *ref_clk; struct clk *grf_clk; + struct clk *hclk_clk; struct dw_hdmi *hdmi; struct regulator *avdd_0v9; struct regulator *avdd_1v8; @@ -229,6 +230,14 @@ static int rockchip_hdmi_parse_dt(struct rockchip_hdmi *hdmi) return PTR_ERR(hdmi->grf_clk); } + hdmi->hclk_clk = devm_clk_get_optional(hdmi->dev, "hclk"); + if (PTR_ERR(hdmi->hclk_clk) == -EPROBE_DEFER) { Have you tried to investigate the hclk? I'm still thinking that's not only HDMI that needs this clock and then the hardware description doesn't look correct. I am still not sure what you mean. Yes, it's not only the HDMI that needs this clock. The VOP2 needs it as well and the driver handles that. I'm curious whether DSI/DP also need that clock to be enabled. If they do, then you aren't modeling h/w properly AFAICS. Assuming nobody at Rockchip decided to make things needlessly inconsistent with previous SoCs, HCLK_VOP should be the clock for the VOP's AHB slave interface. Usually, if that affected anything other than accessing VOP registers, indeed it would smell of something being wrong in the clock tree, but in this case I'd also be suspicious of whether it might have ended up clocking related GRF registers as well (either directly, or indirectly via some gate that the clock driver hasn't modelled yet). Ok, I am beginning to understand. I verified that hdmi, mipi and dp are hanging when HCLK_VOP is disabled by disabling that clock via sysfs using CLOCK_ALLOW_WRITE_DEBUGFS. When it's disabled then the registers of that units can't be accessed. However, when I disable HCLK_VOP by directly writing to the gate bit RK3568_CLKGATE_CON(20) then only accessing VOP registers hangs, the other units stay functional. So it seems it must be the parent clock which must be enabled. The parent clock is hclk_vo. This clock should be handled as part of the RK3568_PD_VO power domain: power-domain@RK3568_PD_VO { reg = ; clocks = <&cru HCLK_VO>, <&cru PCLK_VO>, <&cru ACLK_VOP_PRE>; pm_qos = <&qos_hdcp>, <&qos_vop_m0>, <&qos_vop_m1>; #power-domain-cells = <0>; }; Forget this. The clocks in this node are only enabled during enabling or disabling the power domain, they are disabled again immediately afterwards. OK, I need HCLK_VO to access the HDMI registers. I verified that by disabling HCLK_VO at register level (CRU_GATE_CON(20) BIT(1)). The HDMI registers become inaccessible then. This means I'll replace HCLK_VOP in the HDMI node with HCLK_VO. Does this sound sane? Well, it's still a mystery hack overall, and in some ways it seems even more suspect to be claiming a whole branch of the clock tree rather than a leaf gate with a specific purpose. I'm really starting to think that the underlying issue here is a bug in the clock driver, or a hardware mishap that should logically be worked around by the clock driver, rather than individual the consumers. Does it work if you hack the clock driver to think that PCLK_VO is a child of HCLK_VO? Even if that's not technically true, it would seem to effectively match the observed behaviour (i.e. all 3 things whose register access apparently *should* be enabled by a gate off PCLK_VO, seem to also require HCLK_VO). Thanks, Robin.
Re: [PATCH v4 1/9] dt-bindings: host1x: Add iommu-map property
On 2022-03-01 16:14, cyn...@kapsi.fi wrote: From: Mikko Perttunen Add schema information for specifying context stream IDs. This uses the standard iommu-map property. Signed-off-by: Mikko Perttunen --- v3: * New patch v4: * Remove memory-contexts subnode. --- .../bindings/display/tegra/nvidia,tegra20-host1x.yaml| 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml index 4fd513efb0f7..0adeb03b9e3a 100644 --- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml +++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml @@ -144,6 +144,11 @@ allOf: reset-names: maxItems: 1 +iommu-map: + description: Specification of stream IDs available for memory context device +use. Should be a mapping of IDs 0..n to IOMMU entries corresponding to Nit: maybe "context IDs 0..n" for maximum possible clarity? Either way, though, I'm happy that if the simplest and most straightforward approach works, then it's the best choice. Reviewed-by: Robin Murphy Cheers, Robin. +usable stream IDs. + required: - reg-names
Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
On 2022-02-25 19:27, Michael Cheng wrote: Hi Robin, [ +arm64 maintainers for their awareness, which would have been a good thing to do from the start ] * Thanks for adding the arm64 maintainer and sorry I didn't rope them in sooner. Why does i915 need to ensure the CPU's instruction cache is coherent with its data cache? Is it a self-modifying driver? * Also thanks for pointing this out. Initially I was using dcache_clean_inval_poc, which seem to be the equivalently to what x86 is doing for dcache flushing, but it was giving me build errors since its not on the global list of kernel symbols. And after revisiting the documentation for caches_clean_inval_pou, it won't fly for what we are trying to do. Moving forward, what would you (or someone in the ARM community) suggest we do? Could it be possible to export dcache_clean_inval_poc as a global symbol? Unlikely, unless something with a legitimate need for CPU-centric cache maintenance like kexec or CPU hotplug ever becomes modular. In the case of a device driver, it's not even the basic issues of assuming to find direct equivalents to x86 semantics in other CPU architectures, or effectively reinventing parts of the DMA API, it's even bigger than that. Once you move from being integrated in a single vendor's system architecture to being on a discrete card, you fundamentally *no longer have any control over cache coherency*. Whether the host CPU architecture happens to be AArch64, RISC-V, or whatever doesn't really matter, you're at the mercy of 3rd-party PCIe and interconnect IP vendors, and SoC integrators. You'll find yourself in systems where PCIe simply cannot snoop any caches, where you'd better have the correct DMA API calls in place to have any hope of even the most basic functionality working properly; you'll find yourself in systems where even if the PCIe root complex claims to support No Snoop, your uncached traffic will still end up snooping stale data that got prefetched back into caches you thought you'd invalidated; you'll find yourself in systems where your memory attributes may or may not get forcibly rewritten by an IOMMU depending on the kernel config and/or command line. It's not about simply finding a substitute for clflush, it's that the reasons you have for using clflush in the first place can no longer be assumed to be valid. Robin. On 2022-02-25 10:24 a.m., Robin Murphy wrote: [ +arm64 maintainers for their awareness, which would have been a good thing to do from the start ] On 2022-02-25 03:24, Michael Cheng wrote: Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou performs a flush by first performing a clean, follow by an invalidation operation. v2 (Michael Cheng): Use correct macro for cleaning and invalidation the dcache. Thanks Tvrtko for the suggestion. v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc as a symbol that could be use by other modules, thus use caches_clean_inval_pou instead. Also this version removes include for cacheflush, since its already included base on architecture type. Signed-off-by: Michael Cheng Reviewed-by: Matt Roper --- drivers/gpu/drm/drm_cache.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index c3e6e615bf09..81c28714f930 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned long length) if (wbinvd_on_all_cpus()) pr_err("Timed out waiting for cache flush\n"); + +#elif defined(CONFIG_ARM64) + void *end = addr + length; + caches_clean_inval_pou((unsigned long)addr, (unsigned long)end); Why does i915 need to ensure the CPU's instruction cache is coherent with its data cache? Is it a self-modifying driver? Robin. (Note that the above is somewhat of a loaded question, and I do actually have half an idea of what you're trying to do here and why it won't fly, but I'd like to at least assume you've read the documentation of the function you decided was OK to use) + #else WARN_ONCE(1, "Architecture has no drm_cache.c support\n"); #endif
Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
On 2022-03-02 15:55, Michael Cheng wrote: Thanks for the feedback Robin! Sorry my choices of word weren't that great, but what I meant is to understand how ARM flushes a range of dcache for device drivers, and not an equal to x86 clflush. I believe the concern is if the CPU writes an update, that update might only be sitting in the CPU cache and never make it to device memory where the device can see it; there are specific places that we are supposed to flush the CPU caches to make sure our updates are visible to the hardware. Ah, OK, if it's more about ordering, and it's actually write buffers rather than caches that you care about flushing, then we might be a lot safer, phew! For a very simple overview, in a case where the device itself needs to observe memory writes in the correct order, e.g.: data_descriptor.valid = 1; clflush(&data_descriptor); command_descriptor.data = &data_descriptor writel(/* control register to read command to then read data */) then dma_wmb() between the first two writes should be the right tool to ensure that the command does not observe the command update while the data update is still sat somewhere in a CPU write buffer. If you want a slightly stronger notion that, at a given point, all prior writes have actually been issued and should now be visible (rather than just that they won't become visible in the wrong order whenever they do), then wmb() should suffice on arm64. Note that wioth arm64 memory types, a Non-Cacheable mapping of DRAM for a non-coherent DMA mapping, or of VRAM in a prefetchable BAR, can still be write-buffered, so barriers still matter even when actual cache maintenance ops don't (and as before if you're trying to perform cache maintenance outside the DMA API then you've already lost anyway). MMIO registers should be mapped as Device memory via ioremap(), which is not bufferable, hence the barrier implicit in writel() effectively pushes out any prior buffered writes ahead of a register write, which is why we don't need to worry about this most of the time. This is only a very rough overview, though, and I'm not familiar enough with x86 semantics, your hardware, or the exact use-case to be able to say whether barriers alone are anywhere near the right answer or not. Robin. +Matt Roper Matt, Lucas, any feed back here? On 2022-03-02 4:49 a.m., Robin Murphy wrote: On 2022-02-25 19:27, Michael Cheng wrote: Hi Robin, [ +arm64 maintainers for their awareness, which would have been a good thing to do from the start ] * Thanks for adding the arm64 maintainer and sorry I didn't rope them in sooner. Why does i915 need to ensure the CPU's instruction cache is coherent with its data cache? Is it a self-modifying driver? * Also thanks for pointing this out. Initially I was using dcache_clean_inval_poc, which seem to be the equivalently to what x86 is doing for dcache flushing, but it was giving me build errors since its not on the global list of kernel symbols. And after revisiting the documentation for caches_clean_inval_pou, it won't fly for what we are trying to do. Moving forward, what would you (or someone in the ARM community) suggest we do? Could it be possible to export dcache_clean_inval_poc as a global symbol? Unlikely, unless something with a legitimate need for CPU-centric cache maintenance like kexec or CPU hotplug ever becomes modular. In the case of a device driver, it's not even the basic issues of assuming to find direct equivalents to x86 semantics in other CPU architectures, or effectively reinventing parts of the DMA API, it's even bigger than that. Once you move from being integrated in a single vendor's system architecture to being on a discrete card, you fundamentally *no longer have any control over cache coherency*. Whether the host CPU architecture happens to be AArch64, RISC-V, or whatever doesn't really matter, you're at the mercy of 3rd-party PCIe and interconnect IP vendors, and SoC integrators. You'll find yourself in systems where PCIe simply cannot snoop any caches, where you'd better have the correct DMA API calls in place to have any hope of even the most basic functionality working properly; you'll find yourself in systems where even if the PCIe root complex claims to support No Snoop, your uncached traffic will still end up snooping stale data that got prefetched back into caches you thought you'd invalidated; you'll find yourself in systems where your memory attributes may or may not get forcibly rewritten by an IOMMU depending on the kernel config and/or command line. It's not about simply finding a substitute for clflush, it's that the reasons you have for using clflush in the first place can no longer be assumed to be valid. Robin. On 2022-02
Re: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when array bounds (v2)
On 2021-09-10 11:09, Guchun Chen wrote: Vendor will define their own memory types on top of TTM_PL_PRIV, but call ttm_set_driver_manager directly without checking mem_type value when setting up memory manager. So add such check to aware the case when array bounds. v2: lower check level to WARN_ON Signed-off-by: Leslie Shi Signed-off-by: Guchun Chen --- include/drm/ttm/ttm_device.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h index 07d722950d5b..aa79953c807c 100644 --- a/include/drm/ttm/ttm_device.h +++ b/include/drm/ttm/ttm_device.h @@ -291,6 +291,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type) static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type, struct ttm_resource_manager *manager) { + WARN_ON(type >= TTM_NUM_MEM_TYPES); Nit: I know nothing about this code, but from the context alone it would seem sensible to do if (WARN_ON(type >= TTM_NUM_MEM_TYPES)) return; to avoid making the subsequent assignment when we *know* it's invalid and likely to corrupt memory. Robin. bdev->man_drv[type] = manager; }
Re: [PATCH v2] drm/cma-helper: Set VM_DONTEXPAND for mmap
Hi Daniel, On 2021-10-13 18:08, Daniel Vetter wrote: On Wed, Oct 13, 2021 at 10:36:54AM -0400, Alyssa Rosenzweig wrote: From: Robin Murphy drm_gem_cma_mmap() cannot assume every implementation of dma_mmap_wc() will end up calling remap_pfn_range() (which happens to set the relevant vma flag, among others), so in order to make sure expectations around VM_DONTEXPAND are met, let it explicitly set the flag like most other GEM mmap implementations do. This avoids repeated warnings on a small minority of systems where the display is behind an IOMMU, and has a simple driver which does not override drm_gem_cma_default_funcs. Arm hdlcd is an in-tree affected driver. Out-of-tree, the Apple DCP driver is affected; this fix is required for DCP to be mainlined. How/where does this warn? In drm_gem_mmap_obj(). Also there should be a lot more drivers than just these two which have an iommu for the display block, so this not working is definitely a more wide-spread issue. As the commit message implies, all those other drivers appear to end up using different mmap() implementations one way or another. Once I'd eventually figured it out, it didn't surprise me that the combination of a trivially-dumb display with an IOMMU is an oddball corner-case. -Daniel Signed-off-by: Robin Murphy Reviewed-and-tested-by: Alyssa Rosenzweig Alyssa - thanks for reviving this BTW, I'd forgotten all about it! - for future reference the clunky olde-worlde version of reassigning an MR to yourself is to add your sign-off to the end of the block (in addition to any review tags you may have previously given or choose to add) to note that you've chosen to take on responsibility for the patch[1]. FWIW I'm also partial to the practice of adding a little note in between if you've made any tweaks, e.g. "[alyssa: clarify affected drivers]", but that's often more of a personal choice. Cheers, Robin. [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin --- drivers/gpu/drm/drm_gem_cma_helper.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c index d53388199f34..63e48d98263d 100644 --- a/drivers/gpu/drm/drm_gem_cma_helper.c +++ b/drivers/gpu/drm/drm_gem_cma_helper.c @@ -510,6 +510,7 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma) */ vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node); vma->vm_flags &= ~VM_PFNMAP; + vma->vm_flags |= VM_DONTEXPAND; cma_obj = to_drm_gem_cma_obj(obj); -- 2.30.2
Re: Phyr Starter
On 2022-01-20 15:27, Keith Busch wrote: On Thu, Jan 20, 2022 at 02:56:02PM +0100, Christoph Hellwig wrote: - on the input side to dma mapping the bio_vecs (or phyrs) are chained as bios or whatever the containing structure is. These already exist and have infrastructure at least in the block layer - on the output side I plan for two options: 1) we have a sane IOMMU and everyting will be coalesced into a single dma_range. This requires setting the block layer merge boundary to match the IOMMU page size, but that is a very good thing to do anyway. It doesn't look like IOMMU page sizes are exported, or even necessarily consistently sized on at least one arch (power). FWIW POWER does its own thing separate from the IOMMU API. For all the regular IOMMU API players, page sizes are published in the iommu_domain that the common iommu-dma layer operates on. In fact it already uses them to pick chunk sizes for composing large buffer allocations. Robin.
Re: [PATCH] [RFC] fbcon: Add option to enable legacy hardware acceleration
On 2022-01-25 19:12, Helge Deller wrote: Add a config option CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION to enable bitblt and fillrect hardware acceleration in the framebuffer console. If disabled, such acceleration will not be used, even if it is supported by the graphics hardware driver. If you plan to use DRM as your main graphics output system, you should disable this option since it will prevent compiling in code which isn't used later on when DRM takes over. For all other configurations, e.g. if none of your graphic cards support DRM (yet), DRM isn't available for your architecture, or you can't be sure that the graphic card in the target system will support DRM, you most likely want to enable this option. This patch is the first RFC. Independed of this patch I did some timing experiments with a qemu virtual machine running a PA-RISC Debian Linux installation with a screen resolution of 2048x1024 with 8bpp. In that case qemu emulated the graphics hardware bitblt and fillrect acceleration by using the native (x86) CPU. It was a simple testcase which was to run "time dmesg", where the syslog had 284 lines. The results showed a huge speedup: a) time dmesg (without acceleration): -> 19.0 seconds b) time dmesg (with acceleration): -> 2.6 seconds Signed-off-by: Helge Deller diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig index 840d9813b0bc..da84d1c21c21 100644 --- a/drivers/video/console/Kconfig +++ b/drivers/video/console/Kconfig @@ -78,6 +78,17 @@ config FRAMEBUFFER_CONSOLE help Low-level framebuffer-based console driver. +config FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION + bool "Framebuffer Console hardware acceleration support" + depends on FRAMEBUFFER_CONSOLE + default y if !DRM + default y if !(X86 || ARM) You probably mean ARM64 there, if you're trying to encapsulate "modern systems where this really isn't relevant". Some supported ARM platforms do date back to the days of weird and wonderful fbdev hardware. Conversely I recently cobbled an ancient PCI VGA card into an arm64 machine and was slightly disappointed that there didn't seem to be any driver that was usable straight off :) (Yes, I might give v86d + uvesafb a go at some point...) Robin. + help + If you use a system on which DRM is fully supported you usually want to say N, + otherwise you probably want to enable this option. + If enabled the framebuffer console will utilize the hardware acceleration + of your graphics card by using hardware bitblt and fillrect features. + config FRAMEBUFFER_CONSOLE_DETECT_PRIMARY bool "Map the console to the primary display device" depends on FRAMEBUFFER_CONSOLE diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c index b813985f1403..f7b7d35953e8 100644 --- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -1136,11 +1136,13 @@ static void fbcon_init(struct vc_data *vc, int init) ops->graphics = 0; +#ifdef CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION if ((cap & FBINFO_HWACCEL_COPYAREA) && !(cap & FBINFO_HWACCEL_DISABLED)) p->scrollmode = SCROLL_MOVE; else /* default to something safe */ p->scrollmode = SCROLL_REDRAW; +#endif /* * ++guenther: console.c:vc_allocate() relies on initializing @@ -1705,7 +1707,7 @@ static bool fbcon_scroll(struct vc_data *vc, unsigned int t, unsigned int b, count = vc->vc_rows; if (logo_shown >= 0) goto redraw_up; - switch (p->scrollmode) { + switch (fb_scrollmode(p)) { case SCROLL_MOVE: fbcon_redraw_blit(vc, info, p, t, b - t - count, count); @@ -1795,7 +1797,7 @@ static bool fbcon_scroll(struct vc_data *vc, unsigned int t, unsigned int b, count = vc->vc_rows; if (logo_shown >= 0) goto redraw_down; - switch (p->scrollmode) { + switch (fb_scrollmode(p)) { case SCROLL_MOVE: fbcon_redraw_blit(vc, info, p, b - 1, b - t - count, -count); @@ -1946,12 +1948,12 @@ static void fbcon_bmove_rec(struct vc_data *vc, struct fbcon_display *p, int sy, height, width); } -static void updatescrollmode(struct fbcon_display *p, +static void updatescrollmode_accel(struct fbcon_display *p, struct fb_info *info, struct vc_data *vc) { +#ifdef CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION struct fbcon_ops *ops = info->fbcon_par; -
Re: [PATCH 21/27] arm64: dts: rockchip: rk356x: Add HDMI nodes
On 2022-01-26 16:04, Peter Geis wrote: On Wed, Jan 26, 2022 at 9:58 AM Sascha Hauer wrote: Add support for the HDMI port found on RK3568. Signed-off-by: Sascha Hauer --- arch/arm64/boot/dts/rockchip/rk356x.dtsi | 37 +++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/rockchip/rk356x.dtsi b/arch/arm64/boot/dts/rockchip/rk356x.dtsi index 4008bd666d01..e38fb223e9b8 100644 --- a/arch/arm64/boot/dts/rockchip/rk356x.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk356x.dtsi @@ -10,7 +10,6 @@ #include #include #include -#include #include / { @@ -502,6 +501,42 @@ vop_mmu: iommu@fe043e00 { status = "disabled"; }; + hdmi: hdmi@fe0a { + compatible = "rockchip,rk3568-dw-hdmi"; + reg = <0x0 0xfe0a 0x0 0x2>; + interrupts = ; + clocks = <&cru PCLK_HDMI_HOST>, +<&cru CLK_HDMI_SFR>, +<&cru CLK_HDMI_CEC>, +<&pmucru CLK_HDMI_REF>, +<&cru HCLK_VOP>; + clock-names = "iahb", "isfr", "cec", "ref", "hclk"; + pinctrl-names = "default"; + pinctrl-0 = <&hdmitx_scl &hdmitx_sda &hdmitxm0_cec>; I looked into CEC support here, and it seems that it does work with one change. Please add the two following lines to your patch: assigned-clocks = <&cru CLK_HDMI_CEC>; assigned-clock-rates = <32768>; The issue is the clk_rtc32k_frac clock that feeds clk_rtc_32k which feeds clk_hdmi_cec is 24mhz at boot, which is too high for CEC to function. Wouldn't it make far more sense to just stick a suitable clk_set_rate() call in the driver? AFAICS it's already explicitly aware of the CEC clock. Robin. + power-domains = <&power RK3568_PD_VO>; + reg-io-width = <4>; + rockchip,grf = <&grf>; + #sound-dai-cells = <0>; + status = "disabled"; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + hdmi_in: port@0 { + reg = <0>; + #address-cells = <1>; + #size-cells = <0>; + }; + + hdmi_out: port@1 { + reg = <1>; + #address-cells = <1>; + #size-cells = <0>; + }; + }; + }; + qos_gpu: qos@fe128000 { compatible = "rockchip,rk3568-qos", "syscon"; reg = <0x0 0xfe128000 0x0 0x20>; -- 2.30.2 ___ Linux-rockchip mailing list linux-rockc...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-rockchip
Re: [PATCH 21/27] arm64: dts: rockchip: rk356x: Add HDMI nodes
On 2022-01-26 18:44, Peter Geis wrote: On Wed, Jan 26, 2022 at 12:56 PM Robin Murphy wrote: On 2022-01-26 16:04, Peter Geis wrote: On Wed, Jan 26, 2022 at 9:58 AM Sascha Hauer wrote: Add support for the HDMI port found on RK3568. Signed-off-by: Sascha Hauer --- arch/arm64/boot/dts/rockchip/rk356x.dtsi | 37 +++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/rockchip/rk356x.dtsi b/arch/arm64/boot/dts/rockchip/rk356x.dtsi index 4008bd666d01..e38fb223e9b8 100644 --- a/arch/arm64/boot/dts/rockchip/rk356x.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk356x.dtsi @@ -10,7 +10,6 @@ #include #include #include -#include #include / { @@ -502,6 +501,42 @@ vop_mmu: iommu@fe043e00 { status = "disabled"; }; + hdmi: hdmi@fe0a { + compatible = "rockchip,rk3568-dw-hdmi"; + reg = <0x0 0xfe0a 0x0 0x2>; + interrupts = ; + clocks = <&cru PCLK_HDMI_HOST>, +<&cru CLK_HDMI_SFR>, +<&cru CLK_HDMI_CEC>, +<&pmucru CLK_HDMI_REF>, +<&cru HCLK_VOP>; + clock-names = "iahb", "isfr", "cec", "ref", "hclk"; + pinctrl-names = "default"; + pinctrl-0 = <&hdmitx_scl &hdmitx_sda &hdmitxm0_cec>; I looked into CEC support here, and it seems that it does work with one change. Please add the two following lines to your patch: assigned-clocks = <&cru CLK_HDMI_CEC>; assigned-clock-rates = <32768>; The issue is the clk_rtc32k_frac clock that feeds clk_rtc_32k which feeds clk_hdmi_cec is 24mhz at boot, which is too high for CEC to function. Wouldn't it make far more sense to just stick a suitable clk_set_rate() call in the driver? AFAICS it's already explicitly aware of the CEC clock. This is handled purely in the drivers/gpu/drm/bridge/synopsys/dw-hdmi.c driver, so I'm hesitant to touch it there as it would affect all users, not just Rockchip. I'd have a strong hunch that it's a standard thing for the DesignWare IP and not affected by platform integration. I don't have the magical Synopsys databook, but between the trusty old i.MX6 manual and most of the other in-tree DTs getting their dw-hdmi "cec" clock from suspiciously-obviously-named sources, I'd be somewhat surprised if it was ever anything other than 32KHz. Robin. Could someone familiar with the dw-hdmi IP weigh in on the minimum and maximum clock rate the CEC block can handle? Robin. + power-domains = <&power RK3568_PD_VO>; + reg-io-width = <4>; + rockchip,grf = <&grf>; + #sound-dai-cells = <0>; + status = "disabled"; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + hdmi_in: port@0 { + reg = <0>; + #address-cells = <1>; + #size-cells = <0>; + }; + + hdmi_out: port@1 { + reg = <1>; + #address-cells = <1>; + #size-cells = <0>; + }; + }; + }; + qos_gpu: qos@fe128000 { compatible = "rockchip,rk3568-qos", "syscon"; reg = <0x0 0xfe128000 0x0 0x20>; -- 2.30.2 ___ Linux-rockchip mailing list linux-rockc...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-rockchip
Re: [RFC PATCH] component: Add common helpers for compare/release functions
On 2022-01-28 08:11, Yong Wu wrote: [...] diff --git a/include/linux/component.h b/include/linux/component.h index 16de18f473d7..5a7468ea827c 100644 --- a/include/linux/component.h +++ b/include/linux/component.h @@ -2,6 +2,8 @@ #ifndef COMPONENT_H #define COMPONENT_H +#include +#include #include @@ -82,6 +84,22 @@ struct component_master_ops { void (*unbind)(struct device *master); }; +/* A set common helpers for compare/release functions */ +static inline int compare_of(struct device *dev, void *data) +{ + return dev->of_node == data; +} Note that this is effectively just device_match_of_node(), although I guess there is an argument that having a nice consistent set of component_match API helpers might be worth more than a tiny code saving by borrowing one from a different API. Either way, however, I don't think there's any good argument for instantiating separate copies of these functions in every driver that uses them. If they're used as callbacks then they can't actually be inlined anyway, so they may as well be exported from component.c as normal so that the code really is shared (plus then there's nice symmetry with the aforementioned device_match API helpers too). Thanks, Robin. +static inline void release_of(struct device *dev, void *data) +{ + of_node_put(data); +} + +static inline int compare_dev(struct device *dev, void *data) +{ + return dev == data; +} + void component_master_del(struct device *, const struct component_master_ops *); diff --git a/sound/soc/codecs/wcd938x.c b/sound/soc/codecs/wcd938x.c index eff200a07d9f..992132cbfb9f 100644 --- a/sound/soc/codecs/wcd938x.c +++ b/sound/soc/codecs/wcd938x.c @@ -4417,16 +4417,6 @@ static const struct component_master_ops wcd938x_comp_ops = { .unbind = wcd938x_unbind, }; -static int wcd938x_compare_of(struct device *dev, void *data) -{ - return dev->of_node == data; -} - -static void wcd938x_release_of(struct device *dev, void *data) -{ - of_node_put(data); -} - static int wcd938x_add_slave_components(struct wcd938x_priv *wcd938x, struct device *dev, struct component_match **matchptr) @@ -4442,8 +4432,7 @@ static int wcd938x_add_slave_components(struct wcd938x_priv *wcd938x, } of_node_get(wcd938x->rxnode); - component_match_add_release(dev, matchptr, wcd938x_release_of, - wcd938x_compare_of, wcd938x->rxnode); + component_match_add_release(dev, matchptr, release_of, compare_of, wcd938x->rxnode); wcd938x->txnode = of_parse_phandle(np, "qcom,tx-device", 0); if (!wcd938x->txnode) { @@ -4451,8 +4440,7 @@ static int wcd938x_add_slave_components(struct wcd938x_priv *wcd938x, return -ENODEV; } of_node_get(wcd938x->txnode); - component_match_add_release(dev, matchptr, wcd938x_release_of, - wcd938x_compare_of, wcd938x->txnode); + component_match_add_release(dev, matchptr, release_of, compare_of, wcd938x->txnode); return 0; }
Re: [Intel-gfx] [PATCH] drm/i915: Use per device iommu check
On 2021-11-10 09:35, Tvrtko Ursulin wrote: On 10/11/2021 07:25, Lu Baolu wrote: On 2021/11/10 1:35, Tvrtko Ursulin wrote: On 09/11/2021 17:19, Lucas De Marchi wrote: On Tue, Nov 09, 2021 at 12:17:59PM +, Tvrtko Ursulin wrote: From: Tvrtko Ursulin On igfx + dgfx setups, it appears that intel_iommu=igfx_off option only disables the igfx iommu. Stop relying on global intel_iommu_gfx_mapped and probe presence of iommu domain per device to accurately reflect its status. nice, I was just starting to look into thus but for another reason: we are adding support for other archs, like aarch64, and the global from here was a problem Yes I realized the other iommu angle as well. To do this properly we need to sort the intel_vtd_active call sites into at least two buckets - which are truly about VT-d and which are just IOMMU. For instance the THP decision in i915_gemfs.co would be "are we behind any iommu". Some other call sites are possibly only about the bugs in the igfx iommu. Not sure if there is a third bucket for any potential differences between igfx iommu and other Intel iommu in case of dgfx. I'd like to hear from Baolu as well to confirm if intel_iommu driver is handling igfx + dgfx correctly in respect to the two global variables I mention in the commit message. I strongly agree that the drivers should call the IOMMU interface directly for portability. For Intel graphic driver, we have two issues: #1) driver asks vt-d driver for identity map with intel_iommu=igfx_off. #2) driver query the status with a global intel_iommu_gfx_mapped. We need to solve these two problems step by step. This patch is definitely a good start point. (I should have really consolidated the thread, but never mind now.) You mean good starting point for the discussion or between your first and second email you started thinking it may even work? Because as I wrote in the other email, it appears to work. But I fully accept it may be by accident and you may suggest a proper API to be added to the IOMMU core, which I would then be happy to use. The "proper" API at the moment is device_iommu_mapped(), but indeed that only answers the "is this device connected to an IOMMU at all?" question, it doesn't tell you whether that IOMMU is translating or just bypassing. If translation only really matters in terms of DMA API usage - i.e. you're not interested in using the IOMMU API directly - then I reckon it would be fairly reasonable to use dma_get_ops() to look at whether you've got dma-direct or iommu_dma_ops. At the moment that's sufficient to tell you whether your DMA is translated or not. If a more formal interface is wanted in future, I'm inclined to think that it would still belong at the DMA API level, since the public IOMMU API is really all about explicit translation, whereas what we care about here is reflected more in its internal interaction with the DMA APIs. If maybe not immediately, perhaps we could start with this patch and going forward add something more detailed. Like for instance allowing us to query the name/id of the iommu driver in case i915 needs to apply different quirks across them? Not sure how feasible that would be, but at the moment the need does sound plausible to me. FWIW I'd be wary of trying to get that clever - with iGFX it's easy to make fine-grained decisions because you've got a known and fixed integration with a particular IOMMU, but once you get out into the wider world you'll run into not only multiple different IOMMU implementations behind the same driver, but even the exact same IOMMU IP having different characteristics in different SoCs. Even something as seemingly-innocuous as an "is it worth using 2MB large pages?" quirk list could effectively become the cross product of various kernel config options and all PCIe-capable SoCs in existence. Cheers, Robin.
Re: [PATCH] drm/i915: Use per device iommu check
On 2021-11-10 14:11, Tvrtko Ursulin wrote: On 10/11/2021 12:35, Lu Baolu wrote: On 2021/11/10 20:08, Tvrtko Ursulin wrote: On 10/11/2021 12:04, Lu Baolu wrote: On 2021/11/10 17:30, Tvrtko Ursulin wrote: On 10/11/2021 07:12, Lu Baolu wrote: Hi Tvrtko, On 2021/11/9 20:17, Tvrtko Ursulin wrote: From: Tvrtko Ursulin On igfx + dgfx setups, it appears that intel_iommu=igfx_off option only disables the igfx iommu. Stop relying on global intel_iommu_gfx_mapped and probe presence of iommu domain per device to accurately reflect its status. Signed-off-by: Tvrtko Ursulin Cc: Lu Baolu --- Baolu, is my understanding here correct? Maybe I am confused by both intel_iommu_gfx_mapped and dmar_map_gfx being globals in the intel_iommu driver. But it certainly appears the setup can assign some iommu ops (and assign the discrete i915 to iommu group) when those two are set to off. diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e967cd08f23e..9fb38a54f1fe 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1763,26 +1763,27 @@ static inline bool run_as_guest(void) #define HAS_D12_PLANE_MINIMIZATION(dev_priv) (IS_ROCKETLAKE(dev_priv) || \ IS_ALDERLAKE_S(dev_priv)) -static inline bool intel_vtd_active(void) +static inline bool intel_vtd_active(struct drm_i915_private *i915) { -#ifdef CONFIG_INTEL_IOMMU - if (intel_iommu_gfx_mapped) + if (iommu_get_domain_for_dev(i915->drm.dev)) return true; -#endif /* Running as a guest, we assume the host is enforcing VT'd */ return run_as_guest(); } Have you verified this change? I am afraid that iommu_get_domain_for_dev() always gets a valid iommu domain even intel_iommu_gfx_mapped == 0. Yes it seems to work as is: default: # grep -i iommu /sys/kernel/debug/dri/*/i915_capabilities /sys/kernel/debug/dri/0/i915_capabilities:iommu: enabled /sys/kernel/debug/dri/1/i915_capabilities:iommu: enabled intel_iommu=igfx_off: # grep -i iommu /sys/kernel/debug/dri/*/i915_capabilities /sys/kernel/debug/dri/0/i915_capabilities:iommu: disabled /sys/kernel/debug/dri/1/i915_capabilities:iommu: enabled On my system dri device 0 is integrated graphics and 1 is discrete. The drm device 0 has a dedicated iommu. When the user request igfx not mapped, the VT-d implementation will turn it off to save power. But for shared iommu, you definitely will get it enabled. Sorry I am not following, what exactly do you mean? Is there a platform with integrated graphics without a dedicated iommu, in which case intel_iommu=igfx_off results in intel_iommu_gfx_mapped == 0 and iommu_get_domain_for_dev returning non-NULL? Your code always work for an igfx with a dedicated iommu. This might be always true on today's platforms. But from driver's point of view, we should not make such assumption. For example, if the iommu implementation decides not to turn off the graphic iommu (perhaps due to some hw quirk or for graphic virtualization), your code will be broken. If I got it right, this would go back to your earlier recommendation to have the check look like this: static bool intel_vtd_active(struct drm_i915_private *i915) { struct iommu_domain *domain; domain = iommu_get_domain_for_dev(i915->drm.dev); if (domain && (domain->type & __IOMMU_DOMAIN_PAGING)) return true; ... This would be okay as a first step? Elsewhere in the thread Robin suggested looking at the dec->dma_ops and comparing against iommu_dma_ops. These two solution would be effectively the same? Effectively, yes. See iommu_setup_dma_ops() - the only way to end up with iommu_dma_ops is if a managed translation domain is present; if the IOMMU is present but the default domain type has been set to passthrough (either globally or forced for the given device) it will do nothing and leave you with dma-direct, while if the IOMMU has been ignored entirely then it should never even be called. Thus it neatly encapsulates what you're after here. Cheers, Robin.
Re: [PATCH v1 00/12] drm/rockchip: RK356x VOP2 support
On 2021-11-22 17:47, Alex Bee wrote: Am 22.11.21 um 09:10 schrieb Sascha Hauer: Hi Alex, On Mon, Nov 22, 2021 at 12:18:47AM +0100, Alex Bee wrote: Hi Sascha, Am 17.11.21 um 15:33 schrieb Sascha Hauer: This series adds initial graphics support for the Rockchip RK356[68] SoCs. Graphics support is based around the VOP2 controller which replaces the VOP controller found on earlier Rockchip SoCs. The driver has been tested with HDMI support included in this series and MIPI-DSI which is not included because it needs some more work. The driver is taken from the downstream Rockchip kernel and heavily polished, most non standard features have been removed for now. I tested the driver with the libdrm modetest utility and also with weston with both pixman and panfrost driver support. Michael Riesch reported the driver to work on the RK3566 as well, but device tree support for this SoC is not yet included in this series. The HDMI changes are based on patches from Benjamin Gaignard, but modified a bit as I found out that the HDMI port on the RK3568 only needs one additional clock, not two. Also I added regulator support which is needed to get the HDMI up on the rk3568-EVB board. All review and testing feedback welcome thanks for working on that - it's very (very,very) much appreciated. It took me some time to figure it out: It seems rk3568-iommu driver s broken - I did only get "white noise" when using it alongside vop (similar like it was reported here before). However: removing the iommu-property from vop makes it working for me with HDMI output on quartz64 as well. Could you check if you have the iommu driver in kernel enabled if it works for you, if the property is present in DT? (I used 5.16-rc1 + this series + [0]). I have the iommu driver enabled and it works for me. I get this during boot: [0.263287] rockchip-vop2 fe04.vop: Adding to iommu group 0 So I expect it is indeed used. Also vop mmu seems to have the power-domain missing in your series (same as downstream) - however adding that doesn't help much currently. Probably the power domain gets enabled anyway when the VOP is activated, so adding it to the iommu won't help anything. Nevertheless it seems correct to add the property, I'll do so in the next round. As a sidenote: I verfied this with using Ezequiel's vpu addtion for RK356x: It did only work when removing the iommu there as well (getting tons of page faults otherwise) - so iommu driver really seems to broken, at least for RK3566. (Or I'm a missing a option in kernel config, which wasn't required for the older iommu version?) I don't think so. I started from defconfig and disabled other architectures and unneeded drivers, but I did not enable anything specific to iommu. I've found out now that I can make it work with iommu, by limiting the available memory to something below 4G (I have a 8G board). So there is something wrong in the driver or somewhere in memory mapping, iommu api (since it works when using CMA), ... however: it does clearly not relate to your patch. FWIW it doesn't surprise me that there might still be bugs lurking in the IOMMU driver's relatively recent changes for packing 40-bit physical addresses into 32-bit pagetable entries and registers - that sort of thing is always tricky to get right. You're correct that that's something that wants debugging in its own right, though. Robin.
[PATCH 1/9] gpu: host1x: Add missing DMA API include
Host1x seems to be relying on picking up dma-mapping.h transitively from iova.h, which has no reason to include it in the first place. Fix the former issue before we totally break things by fixing the latter one. CC: Thierry Reding CC: Mikko Perttunen CC: dri-devel@lists.freedesktop.org CC: linux-te...@vger.kernel.org Signed-off-by: Robin Murphy --- Feel free to pick this into drm-misc-next or drm-misc-fixes straight away if that suits - it's only to avoid a build breakage once the rest of the series gets queued. Robin. drivers/gpu/host1x/bus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c index 218e3718fd68..881fad5c3307 100644 --- a/drivers/gpu/host1x/bus.c +++ b/drivers/gpu/host1x/bus.c @@ -5,6 +5,7 @@ */ #include +#include #include #include #include -- 2.28.0.dirty
Re: [PATCH 1/9] gpu: host1x: Add missing DMA API include
On 2021-11-23 14:10, Robin Murphy wrote: Host1x seems to be relying on picking up dma-mapping.h transitively from iova.h, which has no reason to include it in the first place. Fix the former issue before we totally break things by fixing the latter one. CC: Thierry Reding CC: Mikko Perttunen CC: dri-devel@lists.freedesktop.org CC: linux-te...@vger.kernel.org Signed-off-by: Robin Murphy --- Feel free to pick this into drm-misc-next or drm-misc-fixes straight away if that suits - it's only to avoid a build breakage once the rest of the series gets queued. Bah, seems like tegra-vic needs the same treatment too, but wasn't in my local config. Should I squash that into a respin of this patch on the grounds of being vaguely related, or would you prefer it separate? (Either way I'll wait a little while to see if the buildbots uncover any more...) Cheers, Robin. drivers/gpu/host1x/bus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c index 218e3718fd68..881fad5c3307 100644 --- a/drivers/gpu/host1x/bus.c +++ b/drivers/gpu/host1x/bus.c @@ -5,6 +5,7 @@ */ #include +#include #include #include #include
[PATCH] drm/tegra: vic: Fix DMA API misuse
Upon failure, dma_alloc_coherent() returns NULL. If that does happen, passing some uninitialised stack contents to dma_mapping_error() - which belongs to a different API in the first place - has precious little chance of detecting it. Also include the correct header, because the fragile transitive inclusion currently providing it is going to break soon. Fixes: 20e7dce255e9 ("drm/tegra: Remove memory allocation from Falcon library") Signed-off-by: Robin Murphy --- It also doesn't appear to handle failure of the tegra_drm_alloc() path either, but that's a loose thread I have no desire to pull on... ;) --- drivers/gpu/drm/tegra/vic.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c index c02010ff2b7f..da4af5371991 100644 --- a/drivers/gpu/drm/tegra/vic.c +++ b/drivers/gpu/drm/tegra/vic.c @@ -5,6 +5,7 @@ #include #include +#include #include #include #include @@ -232,10 +233,8 @@ static int vic_load_firmware(struct vic *vic) if (!client->group) { virt = dma_alloc_coherent(vic->dev, size, &iova, GFP_KERNEL); - - err = dma_mapping_error(vic->dev, iova); - if (err < 0) - return err; + if (!virt) + return -ENOMEM; } else { virt = tegra_drm_alloc(tegra, size, &iova); } -- 2.28.0.dirty
Re: [PATCH] drm/i915: Use per device iommu check
On 2021-11-25 10:42, Tvrtko Ursulin wrote: From: Tvrtko Ursulin With both integrated and discrete Intel GPUs in a system, the current global check of intel_iommu_gfx_mapped, as done from intel_vtd_active() may not be completely accurate. In this patch we add i915 parameter to intel_vtd_active() in order to prepare it for multiple GPUs and we also change the check away from Intel specific intel_iommu_gfx_mapped (global exported by the Intel IOMMU driver) to probing the presence of IOMMU domain on a specific device using iommu_get_domain_for_dev(). FWIW the way you have it now is functionally equivalent to using device_iommu_mapped(), which I think might be slightly clearer for the current intent, but I don't have a significantly strong preference (after all, this *was* the de-facto way of checking before device_iommu_mapped() was introduced, and there are still other examples of it around). So from the IOMMU perspective, Acked-by: Robin Murphy Perhaps the AGP driver could also be tweaked and intel_iommu_gfx_mapped cleaned away entirely, but I'll leave that for Baolu to think about :) Cheers, Robin. It was suggested to additionally check for __IOMMU_DOMAIN_PAGING bit present in the returned iommu domain, however I opted not to do that at this point. Checking for this flag would detect whether IOMMU is in address translation mode, with the assumption that is the only relevant question. Downside to that is that in identity mapping (pass-through) mode IOMMU hardware is still active, sitting on the communication path, just not doing address translation. My rationale was, that for the many intel_vtd_active() checks in our code base, while some clearly are about performance impact of address translation, some may be about working around functional issues when the IOMMU hardware is simply being active. There also may be some performance impact in pass-through mode, but I have not specifically attempted to measure it. Therefore the safest option feels to be to keep intel_vtd_active() answering the question of "is the IOMMU hardware active" for this device. If in the future we want to expand the set of questions to "is IOMMU active and doing address translation" we can easily do that by adding a new helper to be called from appropriate sites. v2: * Check for dmar translation specifically, not just iommu domain. (Baolu) v3: * Go back to plain "any domain" check for now, rewrite commit message. Signed-off-by: Tvrtko Ursulin Cc: Lu Baolu Cc: Lucas De Marchi Cc: Robin Murphy --- drivers/gpu/drm/i915/display/intel_bw.c | 2 +- drivers/gpu/drm/i915/display/intel_display.c | 2 +- drivers/gpu/drm/i915/display/intel_fbc.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 2 +- drivers/gpu/drm/i915/gem/i915_gemfs.c| 2 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 4 ++-- drivers/gpu/drm/i915/i915_debugfs.c | 1 + drivers/gpu/drm/i915/i915_driver.c | 7 +++ drivers/gpu/drm/i915/i915_drv.h | 13 +++-- drivers/gpu/drm/i915/i915_gpu_error.c| 5 + drivers/gpu/drm/i915/intel_device_info.c | 14 +- drivers/gpu/drm/i915/intel_pm.c | 2 +- 12 files changed, 25 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c index abec394f6869..2da4aacc956b 100644 --- a/drivers/gpu/drm/i915/display/intel_bw.c +++ b/drivers/gpu/drm/i915/display/intel_bw.c @@ -634,7 +634,7 @@ static unsigned int intel_bw_data_rate(struct drm_i915_private *dev_priv, for_each_pipe(dev_priv, pipe) data_rate += bw_state->data_rate[pipe]; - if (DISPLAY_VER(dev_priv) >= 13 && intel_vtd_active()) + if (DISPLAY_VER(dev_priv) >= 13 && intel_vtd_active(dev_priv)) data_rate = data_rate * 105 / 100; return data_rate; diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index b2d51cd79d6c..1ef77ba7f645 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -1293,7 +1293,7 @@ static bool needs_async_flip_vtd_wa(const struct intel_crtc_state *crtc_state) { struct drm_i915_private *i915 = to_i915(crtc_state->uapi.crtc->dev); - return crtc_state->uapi.async_flip && intel_vtd_active() && + return crtc_state->uapi.async_flip && intel_vtd_active(i915) && (DISPLAY_VER(i915) == 9 || IS_BROADWELL(i915) || IS_HASWELL(i915)); } diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c index d0c34bc3af6c..614e8697c068 100644 --- a/drivers/gpu/drm/i915/display/intel_fbc.c +++ b/drivers/gpu/drm/i915/display/intel_fbc.c @@ -1677,7 +1677,7 @@ static int intel_sa
Re: [PATCH] dma_heap: use sg_table.orig_nents in sg_table release flow
On 2021-11-25 12:46, guangming@mediatek.com wrote: From: Guangming Use (sg_table.orig_nents) rather than (sg_table.nents) to traverse sg_table to free sg_table. Use (sg_table.nents) maybe will casuse some pages can't be freed. ...and this sort of bug is precisely why we have the for_each_sgtable_sg() helper ;) Robin. Signed-off-by: Guangming --- drivers/dma-buf/heaps/system_heap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c index 23a7e74ef966..ce10d4eb674c 100644 --- a/drivers/dma-buf/heaps/system_heap.c +++ b/drivers/dma-buf/heaps/system_heap.c @@ -289,7 +289,7 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf) int i; table = &buffer->sg_table; - for_each_sg(table->sgl, sg, table->nents, i) { + for_each_sg(table->sgl, sg, table->orig_nents, i) { struct page *page = sg_page(sg); __free_pages(page, compound_order(page));
Re: [PATCH v2] dma_heap: use for_each_sgtable_sg in sg_table release flow
On 2021-11-25 13:49, guangming@mediatek.com wrote: From: Guangming Use (for_each_sgtable_sg) rather than (for_each_sg) to traverse sg_table to free sg_table. Use (for_each_sg) maybe will casuse some pages can't be freed when send wrong nents number. It's still worth spelling out that this is fixing a bug where the current code should have been using table->orig_nents - it's just that switching to the sgtable helper is the best way to make the fix, since it almost entirely removes the possibility of making that (previously rather common) mistake. If it helps, for the change itself: Reviewed-by: Robin Murphy Thanks, Robin. Signed-off-by: Guangming --- drivers/dma-buf/heaps/system_heap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c index 23a7e74ef966..8660508f3684 100644 --- a/drivers/dma-buf/heaps/system_heap.c +++ b/drivers/dma-buf/heaps/system_heap.c @@ -289,7 +289,7 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf) int i; table = &buffer->sg_table; - for_each_sg(table->sgl, sg, table->nents, i) { + for_each_sgtable_sg(table, sg, i) { struct page *page = sg_page(sg); __free_pages(page, compound_order(page));
Re: [PATCH v2 3/3] drm/etnaviv: use a 32 bit mask as coherent DMA mask
Sorry I missed this earlier... On 2021-09-07 17:49, Michael Walle wrote: The STLB and the first command buffer (which is used to set up the TLBs) has a 32 bit size restriction in hardware. There seems to be no way to specify addresses larger than 32 bit. Keep it simple and restict the addresses to the lower 4 GiB range for all coherent DMA memory allocations. Please note, that platform_device_alloc() will initialize dev->dma_mask to point to pdev->platform_dma_mask, thus dma_set_mask() will work as expected. While at it, move the dma_mask setup code to the of_dma_configure() to keep all the DMA setup code next to each other. Suggested-by: Lucas Stach Signed-off-by: Michael Walle --- drivers/gpu/drm/etnaviv/etnaviv_drv.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c index 54eb653ca295..0b756ecb1bc2 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c @@ -613,6 +613,24 @@ static int etnaviv_pdev_probe(struct platform_device *pdev) component_match_add(dev, &match, compare_str, names[i]); } + /* +* PTA and MTLB can have 40 bit base addresses, but +* unfortunately, an entry in the MTLB can only point to a +* 32 bit base address of a STLB. Moreover, to initialize the +* MMU we need a command buffer with a 32 bit address because +* without an MMU there is only an indentity mapping between +* the internal 32 bit addresses and the bus addresses. +* +* To make things easy, we set the dma_coherent_mask to 32 +* bit to make sure we are allocating the command buffers and +* TLBs in the lower 4 GiB address space. +*/ + if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(40)) || + dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32))) { Since AFAICS you're not changing the default dma_mask pointer to point to some storage other than the coherent mask, the dma_set_mask() call effectively does nothing and both masks will end up reading back as 32 bits. Robin. + dev_dbg(&pdev->dev, "No suitable DMA available\n"); + return -ENODEV; + } + /* * Apply the same DMA configuration to the virtual etnaviv * device as the GPU we found. This assumes that all Vivante @@ -671,8 +689,6 @@ static int __init etnaviv_init(void) of_node_put(np); goto unregister_platform_driver; } - pdev->dev.coherent_dma_mask = DMA_BIT_MASK(40); - pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask; ret = platform_device_add(pdev); if (ret) {
Re: [PATCH v2 3/3] drm/etnaviv: use a 32 bit mask as coherent DMA mask
On 2021-12-01 13:41, Lucas Stach wrote: Hi Robin, Am Mittwoch, dem 01.12.2021 um 12:50 + schrieb Robin Murphy: Sorry I missed this earlier... On 2021-09-07 17:49, Michael Walle wrote: The STLB and the first command buffer (which is used to set up the TLBs) has a 32 bit size restriction in hardware. There seems to be no way to specify addresses larger than 32 bit. Keep it simple and restict the addresses to the lower 4 GiB range for all coherent DMA memory allocations. Please note, that platform_device_alloc() will initialize dev->dma_mask to point to pdev->platform_dma_mask, thus dma_set_mask() will work as expected. While at it, move the dma_mask setup code to the of_dma_configure() to keep all the DMA setup code next to each other. Suggested-by: Lucas Stach Signed-off-by: Michael Walle --- drivers/gpu/drm/etnaviv/etnaviv_drv.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c index 54eb653ca295..0b756ecb1bc2 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c @@ -613,6 +613,24 @@ static int etnaviv_pdev_probe(struct platform_device *pdev) component_match_add(dev, &match, compare_str, names[i]); } + /* +* PTA and MTLB can have 40 bit base addresses, but +* unfortunately, an entry in the MTLB can only point to a +* 32 bit base address of a STLB. Moreover, to initialize the +* MMU we need a command buffer with a 32 bit address because +* without an MMU there is only an indentity mapping between +* the internal 32 bit addresses and the bus addresses. +* +* To make things easy, we set the dma_coherent_mask to 32 +* bit to make sure we are allocating the command buffers and +* TLBs in the lower 4 GiB address space. +*/ + if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(40)) || + dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32))) { Since AFAICS you're not changing the default dma_mask pointer to point to some storage other than the coherent mask, the dma_set_mask() call effectively does nothing and both masks will end up reading back as 32 bits. From what I can see the dma_mask has allocated storage in the platform device and does not point to the coherent dma mask, see setup_pdev_dma_masks(). Urgh, apologies for the confusion - seems I had one of those mental short-circuits and was utterly convinced that that's what the platform device setup did, but of course it's only the fallback case in of_dma_configure(). Sorry for the noise! Cheers, Robin.
Re: [PATCH v2 1/3] dt-bindings: gpu: mali-bifrost: Document RZ/G2L support
On 2021-12-06 15:00, Biju Das wrote: The Renesas RZ/G2{L, LC} SoC (a.k.a R9A07G044) has a Bifrost Mali-G31 GPU, add a compatible string for it. Signed-off-by: Biju Das Reviewed-by: Lad Prabhakar --- v1->v2: * Updated minItems for resets as 2 * Documented optional property reset-names * Documented reset-names as required property for RZ/G2L SoC. --- .../bindings/gpu/arm,mali-bifrost.yaml| 39 ++- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml index 6f98dd55fb4c..c3b2f4ddd520 100644 --- a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml +++ b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml @@ -19,6 +19,7 @@ properties: - amlogic,meson-g12a-mali - mediatek,mt8183-mali - realtek,rtd1619-mali + - renesas,r9a07g044-mali - rockchip,px30-mali - rockchip,rk3568-mali - const: arm,mali-bifrost # Mali Bifrost GPU model/revision is fully discoverable @@ -27,19 +28,30 @@ properties: maxItems: 1 interrupts: +minItems: 3 items: - description: Job interrupt - description: MMU interrupt - description: GPU interrupt + - description: EVENT interrupt I believe we haven't bothered with the "Event" interrupt so far since there's no real meaningful use for it - it seems the downstream binding for Arm's kbase driver doesn't mention it either. interrupt-names: +minItems: 3 items: - const: job - const: mmu - const: gpu + - const: event clocks: -maxItems: 1 +minItems: 1 +maxItems: 3 + + clock-names: +items: + - const: gpu + - const: bus + - const: bus_ace Note that the Bifrost GPUs themselves all only have a single external clock and reset (unexcitingly named "CLK" and "RESETn" respectively, FWIW). I can't help feeling wary that defining additional names for vendor integration details in the core binding may quickly grow into a mess of mutually-incompatible sets of values, for no great benefit. At the very least, it would seem more sensible to put them in the SoC-specific conditional schemas. Robin. mali-supply: true @@ -52,7 +64,14 @@ properties: maxItems: 3 resets: -maxItems: 2 +minItems: 2 +maxItems: 3 + + reset-names: +items: + - const: rst + - const: axi_rst + - const: ace_rst "#cooling-cells": const: 2 @@ -113,6 +132,22 @@ allOf: - sram-supply - power-domains - power-domain-names + - if: + properties: +compatible: + contains: +const: renesas,r9a07g044-mali +then: + properties: +interrupt-names: + minItems: 4 +clock-names: + minItems: 3 + required: +- clock-names +- power-domains +- resets +- reset-names else: properties: power-domains:
Re: [PATCH 04/12] drm/rockchip: dw_hdmi: add regulator support
On 2021-11-17 14:33, Sascha Hauer wrote: The RK3568 has HDMI_TX_AVDD0V9 and HDMI_TX_AVDD_1V8 supply inputs needed for the HDMI port. add support for these to the driver for boards which have them supplied by switchable regulators. Signed-off-by: Sascha Hauer --- .../display/rockchip/rockchip,dw-hdmi.yaml| 6 ++ drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 58 ++- 2 files changed, 61 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml index 53fa42479d5b7..293b2cfbf739f 100644 --- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml +++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml @@ -28,6 +28,12 @@ properties: reg-io-width: const: 4 + avdd-0v9-supply: +description: A 0.9V supply that powers up the SoC internal circuitry. + + avdd-1v8-supply: +description: A 0.9V supply that powers up the SoC internal circuitry. + clocks: minItems: 2 items: diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c index 29608c25e2d0e..b8fe56c89cdc9 100644 --- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include @@ -83,6 +84,8 @@ struct rockchip_hdmi { struct clk *vpll_clk; struct clk *grf_clk; struct dw_hdmi *hdmi; + struct regulator *avdd_0v9; + struct regulator *avdd_1v8; struct phy *phy; }; @@ -222,6 +225,22 @@ static int rockchip_hdmi_parse_dt(struct rockchip_hdmi *hdmi) hdmi->vpll_clk = hdmi->clks[RK_HDMI_CLK_VPLL].clk; hdmi->grf_clk = hdmi->clks[RK_HDMI_CLK_GRF].clk; + hdmi->avdd_0v9 = devm_regulator_get_optional(hdmi->dev, "avdd-0v9"); These are clearly *not* optional, unless the HDMI block is magic and can still work without physical power. Use devm_regulator_get(), and if the real supply is missing from the DT for whatever reason you should get a dummy regulator back, which you can then successfully disable and enable without all the conditional mess. Robin. + if (IS_ERR(hdmi->avdd_0v9)) { + if (PTR_ERR(hdmi->avdd_0v9) != -ENODEV) + return PTR_ERR(hdmi->avdd_0v9); + + hdmi->avdd_0v9 = NULL; + } + + hdmi->avdd_1v8 = devm_regulator_get_optional(hdmi->dev, "avdd-1v8"); + if (IS_ERR(hdmi->avdd_1v8)) { + if (PTR_ERR(hdmi->avdd_1v8) != -ENODEV) + return PTR_ERR(hdmi->avdd_1v8); + + hdmi->avdd_1v8 = NULL; + } + return 0; } @@ -559,11 +578,27 @@ static int dw_hdmi_rockchip_bind(struct device *dev, struct device *master, return ret; } + if (hdmi->avdd_0v9) { + ret = regulator_enable(hdmi->avdd_0v9); + if (ret) { + DRM_DEV_ERROR(hdmi->dev, "failed to enable avdd0v9: %d\n", ret); + goto err_avdd_0v9; + } + } + + if (hdmi->avdd_1v8) { + ret = regulator_enable(hdmi->avdd_1v8); + if (ret) { + DRM_DEV_ERROR(hdmi->dev, "failed to enable avdd1v8: %d\n", ret); + goto err_avdd_1v8; + } + } + ret = clk_bulk_prepare_enable(RK_HDMI_NCLOCKS_HDMI, hdmi->clks); if (ret) { DRM_DEV_ERROR(hdmi->dev, "Failed to enable HDMI vpll: %d\n", ret); - return ret; + goto err_clk; } if (hdmi->chip_data == &rk3568_chip_data) { @@ -587,10 +622,21 @@ static int dw_hdmi_rockchip_bind(struct device *dev, struct device *master, */ if (IS_ERR(hdmi->hdmi)) { ret = PTR_ERR(hdmi->hdmi); - drm_encoder_cleanup(encoder); - clk_bulk_disable_unprepare(RK_HDMI_NCLOCKS_HDMI, hdmi->clks); + goto err_bind; } + return 0; + +err_bind: + clk_bulk_disable_unprepare(RK_HDMI_NCLOCKS_HDMI, hdmi->clks); + drm_encoder_cleanup(encoder); +err_clk: + if (hdmi->avdd_1v8) + regulator_disable(hdmi->avdd_1v8); +err_avdd_1v8: + if (hdmi->avdd_0v9) + regulator_disable(hdmi->avdd_0v9); +err_avdd_0v9: return ret; } @@ -601,6 +647,12 @@ static void dw_hdmi_rockchip_unbind(struct device *dev, struct device *master, dw_hdmi_unbind(hdmi->hdmi); clk_bulk_disable_unprepare(RK_HDMI_NCLOCKS_HDMI, hdmi->clks); + + if (hdmi->avdd_1v8) + regulator_disable(hdmi->avdd_1v8); + + if (hdmi->avdd_0v9) + regulator_disable(hdmi->avdd_0v9); } static const struct component_ops dw_hdmi_rockchip_ops = {
Re: [PATCH 08/18] dt-bindings: display: rockchip: dw-hdmi: Add regulator support
On 2021-12-08 15:12, Sascha Hauer wrote: Signed-off-by: Sascha Hauer --- .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml index 2ab6578033da2..b9dca49aa6e05 100644 --- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml +++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml @@ -28,6 +28,12 @@ properties: reg-io-width: const: 4 + avdd-0v9-supply: +description: A 0.9V supply that powers up the SoC internal circuitry. Might be worth calling out the actual pin name so it's abundantly clear for DT authors cross-referencing schematics. Annoyingly, some SoCs have HDMI_AVDD_1V0 instead of HDMI_AVDD_0V9 - I'm not sure it's worth splitting hairs that far in terms of the property name itself, but I'll leave that for others to decide. + avdd-1v8-supply: +description: A 1.8V supply that powers up the SoC internal circuitry. At least HDMI_AVDD_1V8 seems more consistent. Thanks, Robin. + clocks: minItems: 2 items:
[PATCH v2 02/11] gpu: host1x: Add missing DMA API include
Host1x seems to be relying on picking up dma-mapping.h transitively from iova.h, which has no reason to include it in the first place. Fix the former issue before we totally break things by fixing the latter one. CC: Thierry Reding CC: Mikko Perttunen CC: dri-devel@lists.freedesktop.org Signed-off-by: Robin Murphy --- v2: No change drivers/gpu/host1x/bus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c index 218e3718fd68..881fad5c3307 100644 --- a/drivers/gpu/host1x/bus.c +++ b/drivers/gpu/host1x/bus.c @@ -5,6 +5,7 @@ */ #include +#include #include #include #include -- 2.28.0.dirty
[PATCH v2 03/11] drm/tegra: vic: Fix DMA API misuse
Upon failure, dma_alloc_coherent() returns NULL. If that does happen, passing some uninitialised stack contents to dma_mapping_error() - which belongs to a different API in the first place - has precious little chance of detecting it. Also include the correct header, because the fragile transitive inclusion currently providing it is going to break soon. Fixes: 20e7dce255e9 ("drm/tegra: Remove memory allocation from Falcon library") CC: Thierry Reding CC: Mikko Perttunen CC: dri-devel@lists.freedesktop.org Signed-off-by: Robin Murphy --- It also doesn't appear to handle failure of the tegra_drm_alloc() path either, but that's a loose thread I have no desire to pull on... ;) v2: Resend as part of the series, originally posted separately here: https://lore.kernel.org/dri-devel/2703882439344010e33bf21ecd63cf9e5e6dc00d.1637781007.git.robin.mur...@arm.com/ drivers/gpu/drm/tegra/vic.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c index c02010ff2b7f..da4af5371991 100644 --- a/drivers/gpu/drm/tegra/vic.c +++ b/drivers/gpu/drm/tegra/vic.c @@ -5,6 +5,7 @@ #include #include +#include #include #include #include @@ -232,10 +233,8 @@ static int vic_load_firmware(struct vic *vic) if (!client->group) { virt = dma_alloc_coherent(vic->dev, size, &iova, GFP_KERNEL); - - err = dma_mapping_error(vic->dev, iova); - if (err < 0) - return err; + if (!virt) + return -ENOMEM; } else { virt = tegra_drm_alloc(tegra, size, &iova); } -- 2.28.0.dirty
Re: [PATCH v2 0/8] Host1x context isolation support
On 2021-11-08 10:36, Mikko Perttunen wrote: On 9/16/21 5:32 PM, Mikko Perttunen wrote: Hi all, *** New in v2: Added support for Tegra194 Use standard iommu-map property instead of custom mechanism *** this series adds support for Host1x 'context isolation'. Since when programming engines through Host1x, userspace can program in any addresses it wants, we need some way to isolate the engines' memory spaces. Traditionally this has either been done imperfectly with a single shared IOMMU domain, or by copying and verifying the programming command stream at submit time (Host1x firewall). Since Tegra186 there is a privileged (only usable by kernel) Host1x opcode that allows setting the stream ID sent by the engine to the SMMU. So, by allocating a number of context banks and stream IDs for this purpose, and using this opcode at the beginning of each job, we can implement isolation. Due to the limited number of context banks only each process gets its own context, and not each channel. This feature also allows sharing engines among multiple VMs when used with Host1x's hardware virtualization support - up to 8 VMs can be configured with a subset of allowed stream IDs, enforced at hardware level. To implement this, this series adds a new host1x context bus, which will contain the 'struct device's corresponding to each context bank / stream ID, changes to device tree and SMMU code to allow registering the devices and using the bus, as well as the Host1x stream ID programming code and support in TegraDRM. Device tree bindings are not updated yet pending consensus that the proposed changes make sense. Thanks, Mikko Mikko Perttunen (8): gpu: host1x: Add context bus gpu: host1x: Add context device management code gpu: host1x: Program context stream ID on submission iommu/arm-smmu: Attach to host1x context device bus arm64: tegra: Add Host1x context stream IDs on Tegra186+ drm/tegra: falcon: Set DMACTX field on DMA transactions drm/tegra: vic: Implement get_streamid_offset drm/tegra: Support context isolation arch/arm64/boot/dts/nvidia/tegra186.dtsi | 12 ++ arch/arm64/boot/dts/nvidia/tegra194.dtsi | 12 ++ drivers/gpu/Makefile | 3 +- drivers/gpu/drm/tegra/drm.h | 2 + drivers/gpu/drm/tegra/falcon.c | 8 + drivers/gpu/drm/tegra/falcon.h | 1 + drivers/gpu/drm/tegra/submit.c | 13 ++ drivers/gpu/drm/tegra/uapi.c | 34 - drivers/gpu/drm/tegra/vic.c | 38 + drivers/gpu/host1x/Kconfig | 5 + drivers/gpu/host1x/Makefile | 2 + drivers/gpu/host1x/context.c | 174 ++ drivers/gpu/host1x/context.h | 27 drivers/gpu/host1x/context_bus.c | 31 drivers/gpu/host1x/dev.c | 12 +- drivers/gpu/host1x/dev.h | 2 + drivers/gpu/host1x/hw/channel_hw.c | 52 ++- drivers/gpu/host1x/hw/host1x06_hardware.h | 10 ++ drivers/gpu/host1x/hw/host1x07_hardware.h | 10 ++ drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 ++ include/linux/host1x.h | 21 +++ include/linux/host1x_context_bus.h | 15 ++ 22 files changed, 488 insertions(+), 9 deletions(-) create mode 100644 drivers/gpu/host1x/context.c create mode 100644 drivers/gpu/host1x/context.h create mode 100644 drivers/gpu/host1x/context_bus.c create mode 100644 include/linux/host1x_context_bus.h IOMMU/DT folks, any thoughts about this approach? The patches that are of interest outside of Host1x/TegraDRM specifics are patches 1, 2, 4, and 5. FWIW it looks fairly innocuous to me. I don't understand host1x - neither hardware nor driver abstractions - well enough to meaningfully review it all (e.g. maybe it's deliberate that the bus .dma_configure method isn't used?), but the SMMU patch seems fine given the Kconfig solution to avoid module linkage problems. Cheers, Robin.
Re: [PATCH v3] drm: rockchip: hdmi: enable higher resolutions than FHD
On 2020-12-14 11:03, Vicente Bergas wrote: On Tue, Dec 1, 2020 at 5:06 PM Vicente Bergas wrote: This patch enables a QHD HDMI monitor to work at native resolution. Please, anybody? Has anyone been able to validate this on other SoCs? I guess that's still the main concern - empirically I've found that clock rates that work perfectly on RK3399 can be glitchy on RK3288 to the point of being barely usable, and can result in no display at all on chips not using the Synopsys phy, like RK3328, where the HDMI driver considers the mode valid but later the phy driver is unable to match it. I don't have access to a QHD display to test this myself; I've only played around with weirder 4:3, 5:4 and 16:10 modes on PC monitors. Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem
On 2020-12-17 10:25, Christian König wrote: Am 17.12.20 um 02:07 schrieb Chen Li: On Wed, 16 Dec 2020 22:19:11 +0800, Christian König wrote: Am 16.12.20 um 14:48 schrieb Chen Li: On Wed, 16 Dec 2020 15:59:37 +0800, Christian König wrote: [SNIP] Hi, Christian. I'm not sure why this change is a hack here. I cannot see the problem and wll be grateful if you give more explainations. __memset is supposed to work on those addresses, otherwise you can't use the e8860 on your arm64 system. If __memset is supposed to work on those adresses, why this commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5f&data=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274023350%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=HhWxUaLo3WpzoV6hjV%2BG1HICaIOXwsoNpzv5tNMNg8A%3D&reserved=0) is needed? (I also notice drm/radeon didn't take this change though) just out of curiosity. We generally accept those patches as cleanup in the kernel with the hope that we can find a way to work around the userspace restrictions. But when you also have this issue in userspace then there isn't much we can do for you. Replacing the the direct write in the kernel with calls to writel() or memset_io() will fix that temporary, but you have a more general problem here. I cannot see what's the more general problem here :( u mean performance? No, not performance. See standards like OpenGL, Vulkan as well as VA-API and VDPAU require that you can mmap() device memory and execute memset/memcpy on the memory from userspace. If your ARM base board can't do that for some then you can't use the hardware with that board. If the VRAM lives in a prefetchable PCI bar then on most sane Arm-based systems I believe it should be able to mmap() to userspace with the Normal memory type, where unaligned accesses and such are allowed, as opposed to the Device memory type intended for MMIO mappings, which has more restrictions but stricter ordering guarantees. Regardless of what happens elsewhere though, if something is mapped *into the kernel* with ioremap(), then it is fundamentally wrong per the kernel memory model to reference that mapping directly without using I/O accessors. That is not specific to any individual architecture, and Sparse should be screaming about it already. I guess in this case the UVD code needs to pay more attention to whether radeon_bo_kmap() ends up going via ttm_bo_ioremap() or not. (I'm assuming the initial fault was memset() with 0 trying to perform "DC ZVA" on a Device-type mapping from ioremap() - FYI a stacktrace on its own without the rest of the error dump showing what actually triggered it isn't overly useful) Robin. For amdgpu I suggest that we allocate the UVD message in GTT instead of VRAM since we don't have the hardware restriction for that on the new generations. Thanks, I will try to dig into deeper. But what's the "hardware restriction" meaning here? I'm not familiar with video driver stack and amd gpu, sorry. On older hardware (AGP days) the buffer had to be in VRAM (MMIO) memory, but on modern system GTT (system memory) works as well. IIUC, e8860 can use amdgpu(I use radeon now) beause its device id 6822 is in amdgpu's table. But I cannot tell whether e8860 has iommu, and I cannot find iommu from lspci, so graphics translation table may not work here? That is not related to IOMMU. IOMMU is a feature of the CPU/motherboard. This is implemented using GTT, e.g. the VM page tables inside the GPU. And yes it should work I will prepare a patch for it. BTW: How does userspace work on arm64 then? The driver stack usually only works if mmio can be mapped directly. I also post two usespace issue on mesa, and you may be interested with them: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3954&data=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274023350%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=ZR7pDS%2BCLUuMjCeKcMAXfHtbczt8WdUwSeLZCuHfCHw%3D&reserved=0 https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F3951&data=04%7C01%7Cchristian.koenig%40amd.com%7C4ed3c075888746b7f41408d8a22811c5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437640274033344%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=jAJo3aG2I1oIDTZXWhNgcKoKbd6tTdiAtc7vE4hJJPY%3D&reserved=0 I paste some virtual memory
Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem
On 2020-12-17 14:02, Christian König wrote: Am 17.12.20 um 14:45 schrieb Robin Murphy: On 2020-12-17 10:25, Christian König wrote: Am 17.12.20 um 02:07 schrieb Chen Li: On Wed, 16 Dec 2020 22:19:11 +0800, Christian König wrote: Am 16.12.20 um 14:48 schrieb Chen Li: On Wed, 16 Dec 2020 15:59:37 +0800, Christian König wrote: [SNIP] Hi, Christian. I'm not sure why this change is a hack here. I cannot see the problem and wll be grateful if you give more explainations. __memset is supposed to work on those addresses, otherwise you can't use the e8860 on your arm64 system. If __memset is supposed to work on those adresses, why this commit(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fcommit%2Fba0b2275a6781b2f3919d931d63329b5548f6d5f&data=04%7C01%7Cchristian.koenig%40amd.com%7C3551ae4972b044bb831608d8a291f81c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637438095114292394%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=xns81uCGfN1tjsVn5LBU8QhmUinZRJQlXz8w%2FJ7%2FGTM%3D&reserved=0) is needed? (I also notice drm/radeon didn't take this change though) just out of curiosity. We generally accept those patches as cleanup in the kernel with the hope that we can find a way to work around the userspace restrictions. But when you also have this issue in userspace then there isn't much we can do for you. Replacing the the direct write in the kernel with calls to writel() or memset_io() will fix that temporary, but you have a more general problem here. I cannot see what's the more general problem here :( u mean performance? No, not performance. See standards like OpenGL, Vulkan as well as VA-API and VDPAU require that you can mmap() device memory and execute memset/memcpy on the memory from userspace. If your ARM base board can't do that for some then you can't use the hardware with that board. If the VRAM lives in a prefetchable PCI bar then on most sane Arm-based systems I believe it should be able to mmap() to userspace with the Normal memory type, where unaligned accesses and such are allowed, as opposed to the Device memory type intended for MMIO mappings, which has more restrictions but stricter ordering guarantees. Do you have some background why some ARM boards fail with that? We had a couple of reports that memset/memcpy fail in userspace (usually system just spontaneously reboots or becomes unresponsive), but so far nobody could tell us why that happens? Part of it is that Arm doesn't really have an ideal memory type for mapping RAM behind PCI (much like we also struggle with the vague expectations of what write-combine might mean beyond x86). Device memory can be relaxed to allow gathering, reordering and write-buffering, but is still a bit too restrictive in other ways - aligned, non-speculative, etc. - for something that's really just RAM and expected to be usable as such. Thus to map PCI memory as "write-combine" we use Normal non-cacheable, which means the CPU MMU is going to allow software to do all the things it might expect of RAM, but we're now at the mercy of the menagerie of interconnects and PCI implementations out there. Atomic operations, for example, *might* be resolved by the CPU coherency mechanism or in the interconnect, such that the PCI host bridge only sees regular loads and stores, but more often than not they'll just result in an atomic transaction going all the way to the host bridge. A super-duper-clever host bridge implementation might even support that, but the vast majority are likely to just reject it as invalid. Similarly, unaligned accesses, cache line fills/evictions, and such will often work, since they're essentially just larger read/write bursts, but some host bridges can be picky and might reject access sizes they don't like (there's at least one where even 64-bit accesses don't work. On a 64-bit system...) If an invalid transaction does reach the host bridge, it's going to come back to the CPU as an external abort. If we're really lucky that could be taken synchronously, attributable to a specific instruction, and just oops/SIGBUS the relevant kernel/userspace thread. Often though, (particularly with big out-of-order CPUs) it's likely to be asynchronous and no longer attributable, and thus taken as an SError event, which in general roughly translates to "part of the SoC has fallen off". The only reasonable response we have to that is to panic the system. Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem
On 2020-12-18 06:14, Chen Li wrote: [...] No, not performance. See standards like OpenGL, Vulkan as well as VA-API and VDPAU require that you can mmap() device memory and execute memset/memcpy on the memory from userspace. If your ARM base board can't do that for some then you can't use the hardware with that board. If the VRAM lives in a prefetchable PCI bar then on most sane Arm-based systems I believe it should be able to mmap() to userspace with the Normal memory type, where unaligned accesses and such are allowed, as opposed to the Device memory type intended for MMIO mappings, which has more restrictions but stricter ordering guarantees. Hi, Robin. I cannot understand it allow unaligned accesses. prefetchable PCI bar should also be mmio, and accesses will end with device memory, so why does this allow unaligned access? Because even Device-GRE is a bit too restrictive to expose to userspace that's likely to expect it to behave as regular memory, so, for better or worse, we use MT_NORMAL_MC for pgrprot_writecombine(). Regardless of what happens elsewhere though, if something is mapped *into the kernel* with ioremap(), then it is fundamentally wrong per the kernel memory model to reference that mapping directly without using I/O accessors. That is not specific to any individual architecture, and Sparse should be screaming about it already. I guess in this case the UVD code needs to pay more attention to whether radeon_bo_kmap() ends up going via ttm_bo_ioremap() or not. (I'm assuming the initial fault was memset() with 0 trying to perform "DC ZVA" on a Device-type mapping from ioremap() - FYI a stacktrace on its own without the rest of the error dump showing what actually triggered it isn't overly useful) Robin. why it may be 'DC ZVA'? I'm not sure the pc in initial kernel fault memset, but I capture the userspace crash pc: stp(128bit) or str with neon(also 128bit) to render node(/dev/dri/renderD128). As I said it was an assumption. I guessed at it being more likely to be an MMU fault than an external abort, and given the size and the fact that it's a variable initialisation guessed at it being slightly more likely to hit the ZVA special-case rather than being unaligned. Looking again, I guess starting at an odd-numbered 32-bit element might lead to an unaligned store of XZR, but either way it doesn't really matter - what it showed is it clearly *could* be an MMU fault because TTM seems to be a bit careless with iomem pointers. That said, if you're also getting external aborts from your host bridge not liking 128-bit transactions, then as Christian says you're probably going to have a bad time on this platform either way. Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem
On 2020-12-18 14:33, Christian König wrote: Am 18.12.20 um 15:17 schrieb Robin Murphy: On 2020-12-17 14:02, Christian König wrote: [SNIP] Do you have some background why some ARM boards fail with that? We had a couple of reports that memset/memcpy fail in userspace (usually system just spontaneously reboots or becomes unresponsive), but so far nobody could tell us why that happens? Part of it is that Arm doesn't really have an ideal memory type for mapping RAM behind PCI (much like we also struggle with the vague expectations of what write-combine might mean beyond x86). Device memory can be relaxed to allow gathering, reordering and write-buffering, but is still a bit too restrictive in other ways - aligned, non-speculative, etc. - for something that's really just RAM and expected to be usable as such. Thus to map PCI memory as "write-combine" we use Normal non-cacheable, which means the CPU MMU is going to allow software to do all the things it might expect of RAM, but we're now at the mercy of the menagerie of interconnects and PCI implementations out there. I see. As far as I know we already correctly map the RAM from the GPU as "write-combine". Atomic operations, for example, *might* be resolved by the CPU coherency mechanism or in the interconnect, such that the PCI host bridge only sees regular loads and stores, but more often than not they'll just result in an atomic transaction going all the way to the host bridge. A super-duper-clever host bridge implementation might even support that, but the vast majority are likely to just reject it as invalid. Support for atomics is actually specified by an PCIe extension. As far as I know that extension is even necessary for full KFD support on AMD and full Cuda support for NVidia GPUs. Similarly, unaligned accesses, cache line fills/evictions, and such will often work, since they're essentially just larger read/write bursts, but some host bridges can be picky and might reject access sizes they don't like (there's at least one where even 64-bit accesses don't work. On a 64-bit system...) This is breaking our neck here. We need 64bit writes on 64bit systems to end up as one 64bit write at the hardware and not two 32bit writes or otherwise the doorbells won't work correctly. Just to clarify, that particular case *is* considered catastrophically broken ;) In general you can assume that on AArch64, any aligned 64-bit load or store is atomic (64-bit accesses on 32-bit Arm are less well-defined, but hopefully nobody cares by now). Larger writes are pretty much unproblematic, for P2P our bus interface even supports really large multi byte transfers. If an invalid transaction does reach the host bridge, it's going to come back to the CPU as an external abort. If we're really lucky that could be taken synchronously, attributable to a specific instruction, and just oops/SIGBUS the relevant kernel/userspace thread. Often though, (particularly with big out-of-order CPUs) it's likely to be asynchronous and no longer attributable, and thus taken as an SError event, which in general roughly translates to "part of the SoC has fallen off". The only reasonable response we have to that is to panic the system. Yeah, that sounds exactly like what we see on some of the ARM boards out there. At least we have an explanation for that behavior now. Going to talk about this with our hardware engineers. We might be able to work around some of that stuff, but that is rather tricky to get working under those conditions. Yeah, unfortunately there's no easy way to judge the quality of any given SoC's PCI implementation until you throw your required traffic at it and things either break or don't... Cheers, Robin. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules
On 2020-12-22 00:44, Isaac J. Manjarres wrote: The SMMU driver depends on the availability of the ARM LPAE and ARM V7S io-pgtable format code to work properly. In preparation Nit: we don't really depend on v7s - we *can* use it if it's available, address constraints are suitable, and the SMMU implementation actually supports it (many don't), but we can still quite happily not use it even so. LPAE is mandatory in the architecture so that's our only hard requirement, embodied in the kconfig select. This does mean there may technically still be a corner case involving ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime failure rather than a build error, so unless and until anyone demonstrates that it actually matters I don't feel particularly inclined to give it much thought. Robin. for having the io-pgtable formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to ensure that the io-pgtable format modules are loaded before loading the ARM SMMU driver module. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfd..a72649f 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); +MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s"); ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration
} -subsys_initcall(arm_lpae_do_selftests); +#else +static int __init arm_lpae_do_selftests(void) +{ + return 0; +} #endif + +static int __init arm_lpae_init(void) +{ + int ret, i; + + for (i = 0; i < ARRAY_SIZE(io_pgtable_arm_lpae_init_fns); i++) { + ret = io_pgtable_ops_register(&io_pgtable_arm_lpae_init_fns[i]); + if (ret < 0) { + pr_err("Failed to register ARM LPAE fmt: %d\n"); + goto err_io_pgtable_register; + } + } + + ret = arm_lpae_do_selftests(); + if (ret < 0) + goto err_io_pgtable_register; + + return 0; + +err_io_pgtable_register: + for (i = i - 1; i >= 0; i--) Personally I find "while (i--)" a bit clearer for this kind of unwinding, but maybe post-decrement isn't to everyone's taste. + io_pgtable_ops_unregister(&io_pgtable_arm_lpae_init_fns[i]); + return ret; +} +core_initcall(arm_lpae_init); + +static void __exit arm_lpae_exit(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(io_pgtable_arm_lpae_init_fns); i++) + io_pgtable_ops_unregister(&io_pgtable_arm_lpae_init_fns[i]); +} +module_exit(arm_lpae_exit); diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c index 94394c8..2c6eb2e 100644 --- a/drivers/iommu/io-pgtable.c +++ b/drivers/iommu/io-pgtable.c @@ -10,33 +10,45 @@ #include #include #include +#include +#include #include -static const struct io_pgtable_init_fns * -io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = { -#ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE - [ARM_32_LPAE_S1] = &io_pgtable_arm_32_lpae_s1_init_fns, - [ARM_32_LPAE_S2] = &io_pgtable_arm_32_lpae_s2_init_fns, - [ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns, - [ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns, - [ARM_MALI_LPAE] = &io_pgtable_arm_mali_lpae_init_fns, -#endif -#ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S - [ARM_V7S] = &io_pgtable_arm_v7s_init_fns, -#endif +struct io_pgtable_init_fns_node { + struct io_pgtable_init_fns *fns; + struct list_head list; }; +static LIST_HEAD(io_pgtable_init_fns_list); +static DEFINE_RWLOCK(io_pgtable_init_fns_list_lock); + +static struct io_pgtable_init_fns *io_pgtable_get_init_fns(enum io_pgtable_fmt fmt) +{ + struct io_pgtable_init_fns_node *iter; + struct io_pgtable_init_fns *fns = NULL; + + read_lock(&io_pgtable_init_fns_list_lock); + list_for_each_entry(iter, &io_pgtable_init_fns_list, list) + if (iter->fns->fmt == fmt) { + fns = iter->fns; + break; + } + read_unlock(&io_pgtable_init_fns_list_lock); + + return fns; +} I think it would be a lot easier to stick with a simple array indexed by enum - that way you can just set/clear/test entries without needing to worry about locking. Basically just remove the const and the initialisers from the existing one ;) (and if you think you're concerned about memory, consider that just the list head plus lock is already half the size of the table) Other than that, I think this all looks pretty promising - I'd suggest sending a non-RFC after rc1 so that it gets everyone's proper attention. Thanks, Robin. + struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt, struct io_pgtable_cfg *cfg, void *cookie) { struct io_pgtable *iop; - const struct io_pgtable_init_fns *fns; + struct io_pgtable_init_fns *fns; if (fmt >= IO_PGTABLE_NUM_FMTS) return NULL; - fns = io_pgtable_init_table[fmt]; + fns = io_pgtable_get_init_fns(fmt); if (!fns) return NULL; @@ -59,12 +71,64 @@ EXPORT_SYMBOL_GPL(alloc_io_pgtable_ops); void free_io_pgtable_ops(struct io_pgtable_ops *ops) { struct io_pgtable *iop; + struct io_pgtable_init_fns *fns; if (!ops) return; iop = io_pgtable_ops_to_pgtable(ops); io_pgtable_tlb_flush_all(iop); - io_pgtable_init_table[iop->fmt]->free(iop); + fns = io_pgtable_get_init_fns(iop->fmt); + if (fns) + fns->free(iop); } EXPORT_SYMBOL_GPL(free_io_pgtable_ops); + +int io_pgtable_ops_register(struct io_pgtable_init_fns *init_fns) +{ + struct io_pgtable_init_fns_node *iter, *fns_node; + int ret = 0; + + if (!init_fns || init_fns->fmt >= IO_PGTABLE_NUM_FMTS || + !init_fns->alloc || !init_fns->free) + return -EINVAL; + + fns_node = kzalloc(sizeof(*fns_node), GFP_KERNEL); + if (!fns_node) + return -ENOMEM; + + write_lock(&io_pgtable_init_fns_list_lock); + list_fo
Re: [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules
On 2020-12-22 19:49, isa...@codeaurora.org wrote: On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The SMMU driver depends on the availability of the ARM LPAE and ARM V7S io-pgtable format code to work properly. In preparation Nit: we don't really depend on v7s - we *can* use it if it's available, address constraints are suitable, and the SMMU implementation actually supports it (many don't), but we can still quite happily not use it even so. LPAE is mandatory in the architecture so that's our only hard requirement, embodied in the kconfig select. This does mean there may technically still be a corner case involving ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime failure rather than a build error, so unless and until anyone demonstrates that it actually matters I don't feel particularly inclined to give it much thought. Robin. Okay, I'll fix up the commit message, as well as the code, so that it only depends on io-pgtable-arm. Well, IIUC it would make sense to keep the softdep for when the v7s module *is* present; I just wanted to clarify that it's more of a nice-to-have rather than a necessity. Robin. Thanks, Isaac for having the io-pgtable formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to ensure that the io-pgtable format modules are loaded before loading the ARM SMMU driver module. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfd..a72649f 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); +MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s"); ___ linux-arm-kernel mailing list linux-arm-ker...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration
On 2020-12-22 19:54, isa...@codeaurora.org wrote: On 2020-12-22 11:27, Robin Murphy wrote: On 2020-12-22 00:44, Isaac J. Manjarres wrote: The io-pgtable code constructs an array of init functions for each page table format at compile time. This is not ideal, as this increases the footprint of the io-pgtable code, as well as prevents io-pgtable formats from being built as kernel modules. In preparation for modularizing the io-pgtable formats, switch to a dynamic registration scheme, where each io-pgtable format can register their init functions with the io-pgtable code at boot or module insertion time. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/io-pgtable-arm-v7s.c | 34 +- drivers/iommu/io-pgtable-arm.c | 90 ++-- drivers/iommu/io-pgtable.c | 94 -- include/linux/io-pgtable.h | 51 + 4 files changed, 209 insertions(+), 60 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1d92ac9..89aad2f 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -835,7 +836,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { +static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = { + .fmt = ARM_V7S, .alloc = arm_v7s_alloc_pgtable, .free = arm_v7s_free_pgtable, }; @@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void) pr_info("self test ok\n"); return 0; } -subsys_initcall(arm_v7s_do_selftests); +#else +static int arm_v7s_do_selftests(void) +{ + return 0; +} #endif + +static int __init arm_v7s_init(void) +{ + int ret; + + ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns); + if (ret < 0) { + pr_err("Failed to register ARM V7S format\n"); Super-nit: I think "v7s" should probably be lowercase there. Also general consistency WRT to showing the error code and whether or not to abbreviate "format" would be nice. Ok, I can fix this accordingly. + return ret; + } + + ret = arm_v7s_do_selftests(); + if (ret < 0) + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); + + return ret; +} +core_initcall(arm_v7s_init); + +static void __exit arm_v7s_exit(void) +{ + io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns); +} +module_exit(arm_v7s_exit); diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58..ff0ea2f 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) return NULL; } -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = { - .alloc = arm_64_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s1, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { - .alloc = arm_32_lpae_alloc_pgtable_s2, - .free = arm_lpae_free_pgtable, -}; - -struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { - .alloc = arm_mali_lpae_alloc_pgtable, - .free = arm_lpae_free_pgtable, +static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = { + { + .fmt = ARM_32_LPAE_S1, + .alloc = arm_32_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_32_LPAE_S2, + .alloc = arm_32_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_64_LPAE_S1, + .alloc = arm_64_lpae_alloc_pgtable_s1, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_64_LPAE_S2, + .alloc = arm_64_lpae_alloc_pgtable_s2, + .free = arm_lpae_free_pgtable, + }, + { + .fmt = ARM_MALI_LPAE, + .alloc = arm_mali_lpae_alloc_pgtable, + .free = arm_lpae_free_pgtable, + }, }; #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST @@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void) pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail); return fail ? -EFAULT : 0; } -subsys_initcall(arm_lpae_do_selftests); +#else +static int __init arm_lpae_d
[PATCH] drm/tegra: Add missing DMA API includes
Include dma-mapping.h directly in the remaining places that touch the DMA API, to avoid imminent breakage from refactoring other headers. Signed-off-by: Robin Murphy --- This makes arm64 allmodconfig build cleanly for me with the IOVA series, so hopefully is the last of it... drivers/gpu/drm/tegra/dc.c| 1 + drivers/gpu/drm/tegra/hub.c | 1 + drivers/gpu/drm/tegra/plane.c | 1 + 3 files changed, 3 insertions(+) diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index a29d64f87563..c33420e1fb07 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c index b910155f80c4..7d52a403334b 100644 --- a/drivers/gpu/drm/tegra/hub.c +++ b/drivers/gpu/drm/tegra/hub.c @@ -5,6 +5,7 @@ #include #include +#include #include #include #include diff --git a/drivers/gpu/drm/tegra/plane.c b/drivers/gpu/drm/tegra/plane.c index 16a1cdc28657..75db069dd26f 100644 --- a/drivers/gpu/drm/tegra/plane.c +++ b/drivers/gpu/drm/tegra/plane.c @@ -3,6 +3,7 @@ * Copyright (C) 2017 NVIDIA CORPORATION. All rights reserved. */ +#include #include #include -- 2.28.0.dirty
Re: [PATCH 1/1] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain
On 2021-12-18 07:45, Lu Baolu wrote: The suported page sizes of an iommu_domain are saved in the pgsize_bitmap field. Retrieve the value from the right place. Fixes: 58fd9375c2c534 ("drm/nouveau/platform: probe IOMMU if present") ...except domain->pgsize_bitmap was introduced more than a year after that commit ;) As an improvement rather than a fix, though, Reviewed-by: Robin Murphy Signed-off-by: Lu Baolu --- drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c index d0d52c1d4aee..992cc285f2fe 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c @@ -133,7 +133,7 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev) * or equal to the system's PAGE_SIZE, with a preference if * both are equal. */ - pgsize_bitmap = tdev->iommu.domain->ops->pgsize_bitmap; + pgsize_bitmap = tdev->iommu.domain->pgsize_bitmap; if (pgsize_bitmap & PAGE_SIZE) { tdev->iommu.pgshift = PAGE_SHIFT; } else {
Re: [PATCH 1/2] drm/panfrost: mmu: improved memory attributes
On 2021-12-23 11:06, asheplya...@basealt.ru wrote: From: Alexey Sheplyakov T62x/T60x GPUs are known to not work with panfrost as of now. One of the reasons is wrong/incomplete memory attributes which the panfrost driver sets in the page tables: - MEMATTR_IMP_DEF should be 0x48ULL, not 0x88ULL. 0x88ULL is MEMATTR_OUTER_IMP_DEF I guess the macro could be renamed if anyone's particularly bothered, but using the outer-cacheable attribute is deliberate because it is necessary for I/O-coherent GPUs to work properly (and should be irrelevant for non-coherent integrations). I'm away from my Juno board for the next couple of weeks but I'm fairly confident that this change as-is will break cache snooping. - MEMATTR_FORCE_CACHE_ALL and MEMATTR_OUTER_WA are missing. They're "missing" because they're not used, and there's currently no mechanism by which they *could* be used. Also note that the indices in MEMATTR have to line up with the euqivalent LPAE indices for the closest match to what the IOMMU API's IOMMU_{CACHE,MMIO} flags represent, so moving those around is yet more subtle breakage. T72x and newer GPUs work just fine with such incomplete/wrong memory attributes. However T62x are quite picky and quickly lock up. To avoid the problem set the same memory attributes (in LPAE page tables) as mali_kbase. The patch has been tested (for regressions) with T860 GPU (rk3399 SoC). At the first glance (using GNOME desktop, running glmark) it does not cause any changes for this GPU. Note: this patch is necessary, but *not* enough to get panfrost working with T62x I'd note that panfrost has been working OK - to the extent that Mesa supports its older ISA - on the T624 (single core group) in Arm's Juno SoC for over a year now since commit 268af50f38b1. If you have to force outer non-cacheable to avoid getting translation faults and other errors that look like the GPU is inexplicably seeing the wrong data, I'd check whether you have the same thing where your integration is actually I/O-coherent and you're missing the "dma-coherent" property in your DT. Thanks, Robin. Signed-off-by: Alexey Sheplyakov Signed-off-by: Vadim V. Vlasov Cc: Rob Herring Cc: Tomeu Vizoso Cc: Steven Price Cc: Alyssa Rosenzweig Cc: Vadim V. Vlasov --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 --- drivers/iommu/io-pgtable-arm.c | 16 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 39562f2d11a4..2f4f8a17bc82 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -133,9 +133,6 @@ static void panfrost_mmu_enable(struct panfrost_device *pfdev, struct panfrost_m mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), lower_32_bits(transtab)); mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), upper_32_bits(transtab)); - /* Need to revisit mem attrs. -* NC is the default, Mali driver is inner WT. -*/ mmu_write(pfdev, AS_MEMATTR_LO(as_nr), lower_32_bits(memattr)); mmu_write(pfdev, AS_MEMATTR_HI(as_nr), upper_32_bits(memattr)); diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index dd9e47189d0d..15b39c337e20 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -122,13 +122,17 @@ #define ARM_LPAE_MAIR_ATTR_IDX_CACHE 1 #define ARM_LPAE_MAIR_ATTR_IDX_DEV2 #define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE 3 +#define ARM_LPAE_MAIR_ATTR_IDX_OUTER_WA 4 #define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0) #define ARM_MALI_LPAE_TTBR_READ_INNER BIT(2) #define ARM_MALI_LPAE_TTBR_SHARE_OUTERBIT(4) -#define ARM_MALI_LPAE_MEMATTR_IMP_DEF 0x88ULL -#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL +#define ARM_MALI_LPAE_MEMATTR_IMP_DEF 0x48ULL +#define ARM_MALI_LPAE_MEMATTR_FORCE_CACHE_ALL 0x4FULL +#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x4DULL +#define ARM_MALI_LPAE_MEMATTR_OUTER_IMP_DEF 0x88ULL +#define ARM_MALI_LPAE_MEMATTR_OUTER_WA 0x8DULL #define APPLE_DART_PTE_PROT_NO_WRITE (1<<7) #define APPLE_DART_PTE_PROT_NO_READ (1<<8) @@ -1080,10 +1084,14 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) cfg->arm_mali_lpae_cfg.memattr = (ARM_MALI_LPAE_MEMATTR_IMP_DEF << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_NC)) | + (ARM_MALI_LPAE_MEMATTR_FORCE_CACHE_ALL +<< ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) | (ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) | - (ARM_MALI_LPAE_MEMATTR_IMP_DEF -<< ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)); +
[PATCH] Revert "drm/nouveau/device/pci: set as non-CPU-coherent on ARM64"
[2.898845] [] ret_from_fork+0x10/0x40 [2.898853] Code: a88120c7 a8c12027 a88120c7 a8c12027 (a88120c7) [2.898871] ---[ end trace d5713dcad023ee04 ]--- [2.89] Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b In a toss-up between the GPU seeing stale data artefacts on some systems vs. catastrophic kernel crashes on other systems, the latter would seem to take precedence, so revert this change until the real underlying problem can be fixed. Signed-off-by: Robin Murphy --- Alex, Ben, Dave, I know Alex was looking into this, but since we're nearly at -rc6 already it looks like the only thing to do for 4.6 is pick the lesser of two evils... Thanks, Robin. drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c index 18fab3973ce5..62ad0300cfa5 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c @@ -1614,7 +1614,7 @@ nvkm_device_pci_func = { .fini = nvkm_device_pci_fini, .resource_addr = nvkm_device_pci_resource_addr, .resource_size = nvkm_device_pci_resource_size, - .cpu_coherent = !IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_ARM64), + .cpu_coherent = !IS_ENABLED(CONFIG_ARM), }; int -- 2.8.1.dirty
[PATCH 1/4] drm/panfrost: Set DMA masks earlier
The DMA masks need to be set correctly before any DMA API activity kicks off, and the current point in panfrost_probe() is way too late in that regard. since panfrost_mmu_init() has already set up a live address space and DMA-mapped MMU pagetables. We can't set masks until we've queried the appropriate value from MMU_FEATURES, but as soon as reasonably possible after that should suffice. Signed-off-by: Robin Murphy --- drivers/gpu/drm/panfrost/panfrost_drv.c | 5 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 5 + 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index c06af78ab833..af0058ffc1e4 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -3,8 +3,6 @@ /* Copyright 2019 Linaro, Ltd., Rob Herring */ /* Copyright 2019 Collabora ltd. */ -#include -#include #include #include #include @@ -388,9 +386,6 @@ static int panfrost_probe(struct platform_device *pdev) goto err_out0; } - dma_set_mask_and_coherent(pfdev->dev, - DMA_BIT_MASK(FIELD_GET(0xff00, pfdev->features.mmu_features))); - err = panfrost_devfreq_init(pfdev); if (err) { dev_err(&pdev->dev, "Fatal error during devfreq init\n"); diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c index aceaf6e44a09..42511fc1fea0 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c @@ -2,8 +2,10 @@ /* Copyright 2018 Marty E. Plummer */ /* Copyright 2019 Linaro, Ltd., Rob Herring */ /* Copyright 2019 Collabora ltd. */ +#include #include #include +#include #include #include #include @@ -332,6 +334,9 @@ int panfrost_gpu_init(struct panfrost_device *pfdev) panfrost_gpu_init_features(pfdev); + dma_set_mask_and_coherent(pfdev->dev, + DMA_BIT_MASK(FIELD_GET(0xff00, pfdev->features.mmu_features))); + irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "gpu"); if (irq <= 0) return -ENODEV; -- 2.21.0.dirty ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 0/4] drm/panfrost: Misc. fixes and cleanups
Hi, These are a few trivial fixes and cleanups from playing with the panfrost kernel driver on an Arm Juno board. Not that anyone has ever cared much about the built-in GPU on Juno, but it's at least a somewhat interesting platform from the kernel driver perspective for having I/O coherency, RAM above 4GB, and DVFS abstracted behind a firmware interface. Robin. Robin Murphy (4): drm/panfrost: Set DMA masks earlier drm/panfrost: Disable PM on probe failure drm/panfrost: Don't scream about deferred probe drm/panfrost: Show stored feature registers drivers/gpu/drm/panfrost/panfrost_drv.c | 12 +--- drivers/gpu/drm/panfrost/panfrost_gpu.c | 19 --- 2 files changed, 17 insertions(+), 14 deletions(-) -- 2.21.0.dirty ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 2/4] drm/panfrost: Disable PM on probe failure
Make sure to disable runtime PM again if probe fails after we've enabled it. Otherwise, any subsequent attempt to re-probe starts triggering "Unbalanced pm_runtime_enable!" assertions from the driver core. Signed-off-by: Robin Murphy --- drivers/gpu/drm/panfrost/panfrost_drv.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index af0058ffc1e4..a881e2346b55 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -405,6 +405,7 @@ static int panfrost_probe(struct platform_device *pdev) err_out1: panfrost_device_fini(pfdev); err_out0: + pm_runtime_disable(pfdev->dev); drm_dev_put(ddev); return err; } -- 2.21.0.dirty ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[RFC 5/4] arm64: dts: juno: add GPU subsystem
Since we now have bindings for Mali Midgard GPUs, let's use them to describe Juno's GPU subsystem, if only because we can. Juno sports a Mali-T624 integrated behind an MMU-400 (as a gesture towards virtualisation), in their own dedicated power domain with DVFS controlled by the SCP. Signed-off-by: Robin Murphy --- Just in case anyone else is interested. Note that I've not been using this exact patch, since my Juno is running the new SCMI-based firmware which needs not-yet-upstream MHU changes, but this should in theory be the equivalent change for the upstream SCPI-based DT. .../bindings/gpu/arm,mali-midgard.txt | 1 + arch/arm64/boot/dts/arm/juno-base.dtsi| 25 +++ 2 files changed, 26 insertions(+) diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt index 18a2cde2e5f3..c17f8e96d1e6 100644 --- a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt +++ b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt @@ -16,6 +16,7 @@ Required properties: + "arm,mali-t880" * which must be preceded by one of the following vendor specifics: + "amlogic,meson-gxm-mali" ++ "arm,juno-mali" + "rockchip,rk3288-mali" + "rockchip,rk3399-mali" diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi index 995a7107cdd3..6edaf03620f9 100644 --- a/arch/arm64/boot/dts/arm/juno-base.dtsi +++ b/arch/arm64/boot/dts/arm/juno-base.dtsi @@ -35,6 +35,18 @@ clock-names = "apb_pclk"; }; + smmu_gpu: iommu@2b40 { + compatible = "arm,mmu-400", "arm,smmu-v1"; + reg = <0x0 0x2b40 0x0 0x1>; + interrupts = , +; + #iommu-cells = <1>; + #global-interrupts = <1>; + power-domains = <&scpi_devpd 3>; + dma-coherent; + status = "disabled"; + }; + smmu_pcie: iommu@2b50 { compatible = "arm,mmu-401", "arm,smmu-v1"; reg = <0x0 0x2b50 0x0 0x1>; @@ -487,6 +499,19 @@ }; }; + gpu: gpu@2d00 { + compatible = "arm,juno-mali", "arm,mali-t624"; + reg = <0 0x2d00 0 0x1>; + interrupts = , +, +; + interrupt-names = "gpu", "job", "mmu"; + clocks = <&scpi_dvfs 2>; + power-domains = <&scpi_devpd 3>; + dma-coherent; + status = "disabled"; + }; + sram: sram@2e00 { compatible = "arm,juno-sram-ns", "mmio-sram"; reg = <0x0 0x2e00 0x0 0x8000>; -- 2.21.0.dirty ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel