Re: [PATCH v1 4/7] ARM: dts: imx6dl-prtvt7: add TSC2046 touchscreen node

2021-01-21 Thread robin

On 2021-01-20 15:22, Oleksij Rempel wrote:

Add touchscreen support to the Protonic VT7 board.

Co-Developed-by: Robin van der Gracht 
Signed-off-by: Robin van der Gracht 
Signed-off-by: Oleksij Rempel 
---
 arch/arm/boot/dts/imx6dl-prtvt7.dts | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm/boot/dts/imx6dl-prtvt7.dts
b/arch/arm/boot/dts/imx6dl-prtvt7.dts
index d9cb1e41cc10..63ae2065834c 100644
--- a/arch/arm/boot/dts/imx6dl-prtvt7.dts
+++ b/arch/arm/boot/dts/imx6dl-prtvt7.dts
@@ -266,6 +266,26 @@ &ecspi2 {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_ecspi2>;
status = "okay";
+
+   touchscreen@0 {
+   compatible = "ti,tsc2046";
+   reg = <0>;
+   pinctrl-0 = <&pinctrl_tsc>;
+   pinctrl-names = "default";
+   spi-max-frequency = <10>;
+   interrupts-extended = <&gpio3 20 IRQ_TYPE_EDGE_FALLING>;
+   pendown-gpio = <&gpio3 20 GPIO_ACTIVE_LOW>;
+
+   touchscreen-inverted-x;
+   touchscreen-inverted-y;


Please remove both inverted properties since it's not inverted.


+   touchscreen-max-pressure = <4095>;
+
+   ti,vref-delay-usecs = /bits/ 16 <100>;
+   ti,x-plate-ohms = /bits/ 16 <800>;
+   ti,y-plate-ohms = /bits/ 16 <300>;
+
+   wakeup-source;
+   };
 };

 &i2c1 {


Robin
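
For reference, with the two inversion properties dropped as requested, the node would reduce to something like this (properties copied from the patch above, minus spi-max-frequency; an untested sketch, not the final revision):

```
touchscreen@0 {
	compatible = "ti,tsc2046";
	reg = <0>;
	pinctrl-0 = <&pinctrl_tsc>;
	pinctrl-names = "default";
	interrupts-extended = <&gpio3 20 IRQ_TYPE_EDGE_FALLING>;
	pendown-gpio = <&gpio3 20 GPIO_ACTIVE_LOW>;

	touchscreen-max-pressure = <4095>;

	ti,vref-delay-usecs = /bits/ 16 <100>;
	ti,x-plate-ohms = /bits/ 16 <800>;
	ti,y-plate-ohms = /bits/ 16 <300>;

	wakeup-source;
};
```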
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v3 12/12] auxdisplay: constify fb ops

2019-12-11 Thread robin

Hello Jani,

On 2019-12-09 15:03, Jani Nikula wrote:

On Tue, 03 Dec 2019, Jani Nikula  wrote:

Now that the fbops member of struct fb_info is const, we can start
making the ops const as well.

Cc: Miguel Ojeda Sandonis 
Cc: Robin van der Gracht 
Reviewed-by: Daniel Vetter 
Reviewed-by: Miguel Ojeda 
Acked-by: Robin van der Gracht 
Signed-off-by: Jani Nikula 


Miguel, Robin, just to err on the safe side, were you both okay with me
merging this through drm-misc? Not very likely to conflict badly.


ht16k33 driver hasn't seen much change lately, should be fine.

Robin


Re: [PATCH v2 14/14] auxdisplay: constify fb ops

2019-12-02 Thread robin

On 2019-11-29 11:29, Jani Nikula wrote:

Now that the fbops member of struct fb_info is const, we can start
making the ops const as well.

Cc: Miguel Ojeda Sandonis 
Cc: Robin van der Gracht 
Signed-off-by: Jani Nikula 
---
 drivers/auxdisplay/cfag12864bfb.c | 2 +-
 drivers/auxdisplay/ht16k33.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/auxdisplay/cfag12864bfb.c
b/drivers/auxdisplay/cfag12864bfb.c
index 4074886b7bc8..2002291ab338 100644
--- a/drivers/auxdisplay/cfag12864bfb.c
+++ b/drivers/auxdisplay/cfag12864bfb.c
@@ -57,7 +57,7 @@ static int cfag12864bfb_mmap(struct fb_info *info,
struct vm_area_struct *vma)
return vm_map_pages_zero(vma, &pages, 1);
 }

-static struct fb_ops cfag12864bfb_ops = {
+static const struct fb_ops cfag12864bfb_ops = {
.owner = THIS_MODULE,
.fb_read = fb_sys_read,
.fb_write = fb_sys_write,
diff --git a/drivers/auxdisplay/ht16k33.c 
b/drivers/auxdisplay/ht16k33.c

index a2fcde582e2a..d951d54b26f5 100644
--- a/drivers/auxdisplay/ht16k33.c
+++ b/drivers/auxdisplay/ht16k33.c
@@ -228,7 +228,7 @@ static int ht16k33_mmap(struct fb_info *info,
struct vm_area_struct *vma)
return vm_map_pages_zero(vma, &pages, 1);
 }

-static struct fb_ops ht16k33_fb_ops = {
+static const struct fb_ops ht16k33_fb_ops = {
.owner = THIS_MODULE,
.fb_read = fb_sys_read,
.fb_write = fb_sys_write,


Acked-by: Robin van der Gracht 

Re: [PATCH v2 14/14] auxdisplay: constify fb ops

2019-12-02 Thread robin

On 2019-11-29 21:59, Miguel Ojeda wrote:

On Fri, Nov 29, 2019 at 9:30 PM Daniel Vetter  wrote:


Well we do have very small lcd display drivers in drm, and before that in
fbdev. And you have a fbdev framebuffer driver in there, which looks a bit
misplaced ...

Afaiui you also have some even tinier lcd drivers where you don't address
pixels, but just directly upload text, and those obviously don't fit into
drm/fbdev world. But anything where you can address pixels very much does.

-Daniel


The first driver in the category used fb.h. At the time (~13 years
ago) it was decided that the drivers should go into a different
category/folder instead, and then the others were added.

In any case, I am removing the original ones since I cannot test them
anymore and there are likely no users. The only other fb user could be
relocated if Robin agrees.


The ht16k33 driver registers a framebuffer, backlight and input device.
Not sure if it's OK to let a driver under fbdev register all of those,
but relocating it is fine by me.

Robin


Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA

2021-04-12 Thread Robin Murphy

On 2021-04-09 14:39, David Hildenbrand wrote:

On 09.04.21 15:35, Arnd Bergmann wrote:
On Fri, Apr 9, 2021 at 1:21 PM David Hildenbrand  
wrote:


Random drivers should not override a user configuration of core knobs
(e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA,
which depends on CMA, if possible; however, these drivers also have to
tolerate if DMA_CMA is not available/functioning, for example, if no CMA
area for DMA_CMA use has been set up via "cma=X". In the worst case, the
driver cannot do its job properly in some configurations.

For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if
available") documents:
 While this is no build dependency, etnaviv will only work correctly
 on most systems if CMA and DMA_CMA are enabled. Select both options
 if available to avoid users ending up with a non-working GPU due to
 a lacking kernel config.
So etnaviv really wants to have DMA_CMA, however, can deal with some cases
where it is not available.

Let's introduce WANT_DMA_CMA and use it in most cases where drivers
select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because
of recursive dependency issues).

We'll assume that any driver that selects DRM_GEM_CMA_HELPER or
DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible.

With this change, distributions can disable CONFIG_CMA or
CONFIG_DMA_CMA without it silently getting enabled again by random
drivers. Also, we'll now automatically try to enable both CONFIG_CMA
and CONFIG_DMA_CMA if they are unspecified and any driver is around that
selects WANT_DMA_CMA -- also implicitly via DRM_GEM_CMA_HELPER or
DRM_KMS_CMA_HELPER.

For example, if any driver selects WANT_DMA_CMA and we do a
"make olddefconfig":

1. With "# CONFIG_CMA is not set" and no specification of
    "CONFIG_DMA_CMA"

-> CONFIG_DMA_CMA won't be part of .config

2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA

Contiguous Memory Allocator (CMA) [Y/n/?] (NEW)
DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] (NEW)

3. With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set"

-> CONFIG_DMA_CMA will be removed from .config
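
The pattern being proposed can be sketched as follows, under the assumption that WANT_DMA_CMA is a hidden symbol that only nudges the default of DMA_CMA (the exact dependencies and help text in the real patch may differ):

```
config WANT_DMA_CMA
	bool
	help
	  Selected by drivers that would like to use DMA_CMA if possible,
	  without forcing it into the user's configuration.

config DMA_CMA
	bool "DMA Contiguous Memory Allocator"
	depends on HAVE_DMA_CONTIGUOUS && CMA
	default WANT_DMA_CMA
```

Because `default` (unlike `select`) can be overridden, a user who explicitly sets CONFIG_DMA_CMA=n keeps that choice, which is the behavior the commit message describes.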

Note: drivers/remoteproc seems to be special; commit c51e882cd711
("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that
there is a real dependency to DMA_CMA for it to work; leave that
dependency in place and don't convert it to a soft dependency.


I don't think this dependency is fundamentally different from the others,
though davinci machines tend to have less memory than a lot of the
other machines, so it's more likely to fail without CMA.



I was also unsure - and Lucas had similar thoughts. If you want, I can 
send a v4 also taking care of this.


TBH I think it should all just be removed. DMA_CMA is effectively an 
internal feature of the DMA API, and drivers which simply use the DMA 
API shouldn't really be trying to assume *how* things might be allocated 
at runtime - CMA is hardly the only way. Platform-level assumptions 
about the presence or not of IOMMUs, memory carveouts, etc., and whether 
it even matters - e.g. a device with a tiny LCD may only need display 
buffers which still fit in a regular MAX_ORDER allocation - could go in 
platform-specific configs, but I really don't think they belong at the 
generic subsystem level.


We already have various examples like I2S drivers that won't even probe 
without a dmaengine provider being present, or host controller drivers 
which are useless without their corresponding phy driver (and I'm 
guessing you can probably also do higher-level things like include the 
block layer but omit all filesystem drivers). I don't believe it's 
Kconfig's job to try to guess whether a given configuration is *useful*, 
only to enforce that it's valid to build.


Robin.


RE: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe

2021-04-21 Thread Robin Gong
On 2021/04/22 Lucas Stach  wrote:
> > But the timer of runtime_auto_suspend decides when to enter runtime
> > suspend, rather than the hardware, while the transfer data size and
> > transfer rate on the IP bus decide when the dma interrupt happens.
> >
> But it isn't the hardware that decides to drop the rpm refcount to 0 and
> start the autosuspend timer, it's the driver.
Yes, the driver should keep the rpm refcount from ever dropping to 0 in such a
case. But I think that case is a common one for cyclic DMA with
runtime_auto_suspend, so some DMA drivers also add pm_runtime_get_sync()/
pm_runtime_put_autosuspend() in the interrupt handler, like qcom/bam_dma.c,
to be safe, rather than relying on pm_runtime_mark_last_busy() alone.

> 
> >  Generally, we can call pm_runtime_get_sync(fsl_chan->dev)/
> > pm_runtime_mark_last_busy() in the interrupt handler and hope the
> > runtime_auto_suspend timer expires later than the interrupt arrives, but if
> > the transfer data size is larger in cyclic mode and the transfer rate is
> > very slow, like 115200 or lower on a UART, the fixed autosuspend timeout of
> > 100ms/200ms may not be enough; hence, runtime suspend may execute
> > while the dma interrupt is triggered and caught by the GIC (but the
> > interrupt handler is prevented by spin_lock_irqsave in pm_suspend_timer_fn()),
> and then the interrupt handler starts to run after runtime suspend.
> 
> If your driver code drops the rpm refcount to 0 and starts the autosuspend
> timer while a cyclic transfer is still in flight, this is clearly a bug.
> Autosuspend is not there to paper over driver bugs, but to amortize cost of actually
> suspending and resuming the hardware. Your driver code must still work even
> if the timeout is 0, i.e. the hardware is immediately suspended after you drop
> the rpm refcount to 0.
> 
> If you still have transfers queued/in-flight the driver code must keep a rpm
> reference.
Yes, that's the fix I described before, as below:
'I have a simple workaround that disables runtime suspend in the issue_pending
worker by calling pm_runtime_forbid() and then enables runtime auto suspend in
dmaengine_terminate_all, so that we can easily regard the edma channel as
always runtime-resumed between issue_pending and channel termination, and
ignore the above interrupt handler/scu-pd limitation.'






Re: [PATCH v5 08/16] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-04-23 Thread Robin Murphy

On 2021-04-22 09:15, Claire Chang wrote:

Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for restricted DMA pool.

Signed-off-by: Claire Chang 
---
  drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
  drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
  drivers/pci/xen-pcifront.c   | 2 +-
  include/linux/swiotlb.h  | 4 ++--
  kernel/dma/direct.c  | 2 +-
  kernel/dma/swiotlb.c | 4 ++--
  6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index ce6b664b10aa..7d48c433446b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
  
  	max_order = MAX_ORDER;

  #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(NULL)) {
unsigned int max_segment;
  
  		max_segment = swiotlb_max_segment();

diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index e8b506a6685b..2a2ae6d6cf6d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
  
  #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)

-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(NULL);
  #endif
  
  	ret = ttm_device_init(&drm->ttm.bdev, &nouveau_bo_driver, drm->dev->dev,

diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..6d548ce53ce7 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct 
pcifront_device *pdev)
  
  	spin_unlock(&pcifront_dev_lock);
  
-	if (!err && !is_swiotlb_active()) {

+   if (!err && !is_swiotlb_active(NULL)) {
err = pci_xen_swiotlb_init_late();
if (err)
dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 2a6cca07540b..c530c976d18b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -123,7 +123,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
  void __init swiotlb_exit(void);
  unsigned int swiotlb_max_segment(void);
  size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
  void __init swiotlb_adjust_size(unsigned long size);
  #else
  #define swiotlb_force SWIOTLB_NO_FORCE
@@ -143,7 +143,7 @@ static inline size_t swiotlb_max_mapping_size(struct device 
*dev)
return SIZE_MAX;
  }
  
-static inline bool is_swiotlb_active(void)

+static inline bool is_swiotlb_active(struct device *dev)
  {
return false;
  }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
  size_t dma_direct_max_mapping_size(struct device *dev)
  {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))


I wonder if it's worth trying to fold these other conditions into 
is_swiotlb_active() itself? I'm not entirely sure what matters for Xen, 
but for the other cases it seems like they probably only care about 
whether bouncing may occur for their particular device or not (possibly 
they want to be using dma_max_mapping_size() now anyway - TBH I'm 
struggling to make sense of what the swiotlb_max_segment business is 
supposed to mean).


Otherwise, patch #9 will need to touch here as well to make sure that 
per-device forced bouncing is reflected correctly.


Robin.


return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ffbb8724e06c..1d221343f1c8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -659,9 +659,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
  }
  
-bool is_swiotlb_active(void)

+bool is_swiotlb_active(struct device *dev)
  {
-   return io_tlb_default_mem != NULL;
+   return get_io_tlb_mem(dev) != NULL;
  }
  EXPORT_SYMBOL_GPL(is_swiotlb_active);
  




Re: [PATCH v5 16/16] of: Add plumbing for restricted DMA pool

2021-04-23 Thread Robin Murphy

On 2021-04-22 09:15, Claire Chang wrote:

If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when the restricted-dma-pool is presented.

Signed-off-by: Claire Chang 
---
  drivers/of/address.c| 25 +
  drivers/of/device.c |  3 +++
  drivers/of/of_private.h |  5 +
  3 files changed, 33 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 54f221dde267..fff3adfe4986 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1109,6 +1110,30 @@ bool of_dma_is_coherent(struct device_node *np)
  }
  EXPORT_SYMBOL_GPL(of_dma_is_coherent);
  
+int of_dma_set_restricted_buffer(struct device *dev)

+{
+   struct device_node *node;
+   int count, i;
+
+   if (!dev->of_node)
+   return 0;
+
+   count = of_property_count_elems_of_size(dev->of_node, "memory-region",
+   sizeof(phandle));
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(dev->of_node, "memory-region", i);
+   /* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/


What's the use-case for having multiple regions if the restricted pool 
is by definition the only one accessible?


Robin.


+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(
+   dev, dev->of_node, i);
+   }
+
+   return 0;
+}
+
  /**
   * of_mmio_is_nonposted - Check if device uses non-posted MMIO
   * @np:   device node
diff --git a/drivers/of/device.c b/drivers/of/device.c
index c5a9473a5fb1..d8d865223e51 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
  
  	arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
  
+	if (!iommu)

+   return of_dma_set_restricted_buffer(dev);
+
return 0;
  }
  EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d717efbd637d..e9237f5eff48 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -163,12 +163,17 @@ struct bus_dma_region;
  #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
  int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev);
  #else
  static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
  {
return -ENODEV;
  }
+static inline int of_dma_set_restricted_buffer(struct device *dev)
+{
+   return -ENODEV;
+}
  #endif
  
  #endif /* _LINUX_OF_PRIVATE_H */





Re: [PATCH v5 14/16] dma-direct: Allocate memory from restricted DMA pool if available

2021-04-23 Thread Robin Murphy

On 2021-04-22 09:15, Claire Chang wrote:

The restricted DMA pool is preferred if available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Signed-off-by: Claire Chang 
---
  kernel/dma/direct.c | 35 ++-
  1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a27f0510fcc..29523d2a9845 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -78,6 +78,10 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
  static void __dma_direct_free_pages(struct device *dev, struct page *page,
size_t size)
  {
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (swiotlb_free(dev, page, size))
+   return;
+#endif
dma_free_contiguous(dev, page, size);
  }
  
@@ -92,7 +96,17 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
  
  	gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,

   &phys_limit);
-   page = dma_alloc_contiguous(dev, size, gfp);
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   page = NULL;
+   }
+#endif
+
+   if (!page)
+   page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
page = NULL;
@@ -148,7 +162,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
  
  	if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&

-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -161,8 +175,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
  
  	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&

-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
  
  	/*

@@ -172,7 +186,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
-(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && 
!dev_is_dma_coherent(dev
+(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+ !dev_is_dma_coherent(dev))) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
  
  	/* we always manually zero the memory once we are done */

@@ -253,15 +269,15 @@ void dma_direct_free(struct device *dev, size_t size,
unsigned int page_order = get_order(size);
  
  	if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&

-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
dma_free_contiguous(dev, cpu_addr, size);
return;
}
  
  	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&

-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev)) {
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev)) {
arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
return;
}
@@ -289,7 +305,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
size_t size,
void *ret;
  
  	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&

-   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp))
+   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);


Wait, this seems broken for non-coherent devices - in that case we need 
to return a non-cacheable address, but we can't simply fall through into 
the remapping path below in GFP_ATOMIC context. That's why we need the 
atomic pool concept.

Re: [PATCH] drm/i915: switch from 'pci_' to 'dma_' API

2021-08-24 Thread Robin Murphy

Hi,

FWIW this patch itself looks fine, but it does highlight some things 
which could be further cleaned up if anyone's interested...


On 2021-08-22 22:06, Christophe JAILLET wrote:
[...]

diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c 
b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index a74b72f50cc9..afb35d2e5c73 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -32,7 +32,7 @@ static int init_fake_lmem_bar(struct intel_memory_region *mem)
mem->remap_addr = dma_map_resource(i915->drm.dev,
   mem->region.start,
   mem->fake_mappable.size,
-  PCI_DMA_BIDIRECTIONAL,
+  DMA_BIDIRECTIONAL,
   DMA_ATTR_FORCE_CONTIGUOUS);


DMA_ATTR_FORCE_CONTIGUOUS is nonsensical here (and below) as it is only 
meaningful for coherent buffers allocated by dma_alloc_attrs().



if (dma_mapping_error(i915->drm.dev, mem->remap_addr)) {
drm_mm_remove_node(&mem->fake_mappable);
@@ -62,7 +62,7 @@ static void release_fake_lmem_bar(struct intel_memory_region 
*mem)
dma_unmap_resource(mem->i915->drm.dev,
   mem->remap_addr,
   mem->fake_mappable.size,
-  PCI_DMA_BIDIRECTIONAL,
+  DMA_BIDIRECTIONAL,
   DMA_ATTR_FORCE_CONTIGUOUS);
  }
  

[...]

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 36489be4896b..cd5f2348a187 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -30,7 +30,7 @@ int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object 
*obj,
do {
if (dma_map_sg_attrs(obj->base.dev->dev,
 pages->sgl, pages->nents,
-PCI_DMA_BIDIRECTIONAL,
+DMA_BIDIRECTIONAL,
 DMA_ATTR_SKIP_CPU_SYNC |
 DMA_ATTR_NO_KERNEL_MAPPING |
 DMA_ATTR_NO_WARN))


Similarly DMA_ATTR_NO_KERNEL_MAPPING and DMA_ATTR_NO_WARN are also for 
coherent allocations rather than streaming mappings.


I'll see if I can whip up a patch to make the API documentation clearer...

Thanks,
Robin.


@@ -64,7 +64,7 @@ void i915_gem_gtt_finish_pages(struct drm_i915_gem_object 
*obj,
usleep_range(100, 250);
  
  	dma_unmap_sg(i915->drm.dev, pages->sgl, pages->nents,

-PCI_DMA_BIDIRECTIONAL);
+DMA_BIDIRECTIONAL);
  }
  
  /**




Re: [PATCH] parisc/parport_gsc: switch from 'pci_' to 'dma_' API

2021-08-24 Thread Robin Murphy

On 2021-08-23 22:30, Christophe JAILLET wrote:

The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below.

@@
expression e1, e2, e3, e4;
@@
-pci_free_consistent(e1, e2, e3, e4)
+dma_free_coherent(&e1->dev, e2, e3, e4)

Signed-off-by: Christophe JAILLET 
---
If needed, see post from Christoph Hellwig on the kernel-janitors ML:
https://marc.info/?l=kernel-janitors&m=158745678307186&w=4

This has *NOT* been compile tested because I don't have the needed
configuration.
---
  drivers/parport/parport_gsc.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/parport/parport_gsc.c b/drivers/parport/parport_gsc.c
index 1e43b3f399a8..db912fa6b6df 100644
--- a/drivers/parport/parport_gsc.c
+++ b/drivers/parport/parport_gsc.c
@@ -390,9 +390,8 @@ static int __exit parport_remove_chip(struct parisc_device 
*dev)
if (p->irq != PARPORT_IRQ_NONE)
free_irq(p->irq, p);
if (priv->dma_buf)
-   pci_free_consistent(priv->dev, PAGE_SIZE,
-   priv->dma_buf,
-   priv->dma_handle);
+   dma_free_coherent(&priv->dev->dev, PAGE_SIZE,
+ priv->dma_buf, priv->dma_handle);


Hmm, seeing a free on its own made me wonder where the corresponding 
alloc was, but on closer inspection it seems there isn't one. AFAICS 
priv->dma_buf is only ever assigned with NULL (and priv->dev doesn't 
seem to be assigned at all), so this could likely just be removed. In 
fact it looks like all the references to DMA in this driver are just 
copy-paste from parport_pc and unused.


Robin.


kfree (p->private_data);
parport_put_port(p);
kfree (ops); /* hope no-one cached it */



[PATCH] drm/cma-helper: Set VM_DONTEXPAND for mmap

2021-08-25 Thread Robin Murphy
drm_gem_cma_mmap() cannot assume every implementation of dma_mmap_wc()
will end up calling remap_pfn_range() (which happens to set the relevant
vma flag, among others), so in order to make sure expectations around
VM_DONTEXPAND are met, let it explicitly set the flag like most other
GEM mmap implementations do.

This avoids repeated warnings on a small minority of systems where the
display is behind an IOMMU, and has a simple driver which does not
override drm_gem_cma_default_funcs.

Signed-off-by: Robin Murphy 
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
b/drivers/gpu/drm/drm_gem_cma_helper.c
index d53388199f34..63e48d98263d 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -510,6 +510,7 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct 
vm_area_struct *vma)
 */
vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
vma->vm_flags &= ~VM_PFNMAP;
+   vma->vm_flags |= VM_DONTEXPAND;
 
cma_obj = to_drm_gem_cma_obj(obj);
 
-- 
2.25.1



Re: [PATCH 2/3] drm/etnaviv: fix dma configuration of the virtual device

2021-08-26 Thread Robin Murphy

On 2021-08-26 13:10, Michael Walle wrote:

The DMA configuration of the virtual device is inherited from the first
actual etnaviv device. Unfortunately, this doesn't work with an IOMMU:

[5.191008] Failed to set up IOMMU for device (null); retaining platform DMA 
ops

This is because there is no associated iommu_group with the device. The
group is set in iommu_group_add_device() which is eventually called by
device_add() via the platform bus:
   device_add()
 blocking_notifier_call_chain()
   iommu_bus_notifier()
 iommu_probe_device()
   __iommu_probe_device()
 iommu_group_get_for_dev()
   iommu_group_add_device()

Move of_dma_configure() into the probe function, which is called after
device_add(). Normally, the platform code will already call it itself
if .of_node is set. Unfortunately, this isn't the case here.

Also move the dma mask assignments to probe() to keep all DMA related
settings together.


I assume the driver must already keep track of the real GPU platform 
device in order to map registers, request interrupts, etc. correctly - 
can't it also correctly use that device for DMA API calls and avoid the 
need for these shenanigans altogether?


FYI, IOMMU configuration is really supposed to *only* run at 
add_device() time as above - the fact that it's currently hooked in to 
be retriggered by of_dma_configure() on DT platforms actually turns out 
to lead to various issues within the IOMMU API, and the plan to change 
that is slowly climbing up my to-do list.


Robin.


Signed-off-by: Michael Walle 
---
  drivers/gpu/drm/etnaviv/etnaviv_drv.c | 24 +++-
  1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c 
b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
index 2509b3e85709..ff6425f6ebad 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
@@ -589,6 +589,7 @@ static int compare_str(struct device *dev, void *data)
  static int etnaviv_pdev_probe(struct platform_device *pdev)
  {
struct device *dev = &pdev->dev;
+   struct device_node *first_node = NULL;
struct component_match *match = NULL;
  
  	if (!dev->platform_data) {

@@ -598,6 +599,9 @@ static int etnaviv_pdev_probe(struct platform_device *pdev)
if (!of_device_is_available(core_node))
continue;
  
+			if (!first_node)

+   first_node = core_node;
+
drm_of_component_match_add(&pdev->dev, &match,
   compare_of, core_node);
}
@@ -609,6 +613,17 @@ static int etnaviv_pdev_probe(struct platform_device *pdev)
component_match_add(dev, &match, compare_str, names[i]);
}
  
+	pdev->dev.coherent_dma_mask = DMA_BIT_MASK(40);

+   pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
+
+   /*
+* Apply the same DMA configuration to the virtual etnaviv
+* device as the GPU we found. This assumes that all Vivante
+* GPUs in the system share the same DMA constraints.
+*/
+   if (first_node)
+   of_dma_configure(&pdev->dev, first_node, true);
+
return component_master_add_with_match(dev, &etnaviv_master_ops, match);
  }
  
@@ -659,15 +674,6 @@ static int __init etnaviv_init(void)

of_node_put(np);
goto unregister_platform_driver;
}
-   pdev->dev.coherent_dma_mask = DMA_BIT_MASK(40);
-   pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
-
-   /*
-* Apply the same DMA configuration to the virtual etnaviv
-* device as the GPU we found. This assumes that all Vivante
-* GPUs in the system share the same DMA constraints.
-*/
-   of_dma_configure(&pdev->dev, np, true);
  
  		ret = platform_device_add(pdev);

if (ret) {



Re: [PATCH 2/3] drm/etnaviv: fix dma configuration of the virtual device

2021-08-26 Thread Robin Murphy

On 2021-08-26 16:17, Lucas Stach wrote:

Am Donnerstag, dem 26.08.2021 um 16:00 +0100 schrieb Robin Murphy:

On 2021-08-26 13:10, Michael Walle wrote:

The DMA configuration of the virtual device is inherited from the first
actual etnaviv device. Unfortunately, this doesn't work with an IOMMU:

[5.191008] Failed to set up IOMMU for device (null); retaining platform DMA 
ops

This is because there is no associated iommu_group with the device. The
group is set in iommu_group_add_device() which is eventually called by
device_add() via the platform bus:
device_add()
  blocking_notifier_call_chain()
iommu_bus_notifier()
  iommu_probe_device()
__iommu_probe_device()
  iommu_group_get_for_dev()
iommu_group_add_device()

Move of_dma_configure() into the probe function, which is called after
device_add(). Normally, the platform code will already call it itself
if .of_node is set. Unfortunately, this isn't the case here.

Also move the dma mask assignments to probe() to keep all DMA related
settings together.


I assume the driver must already keep track of the real GPU platform
device in order to map registers, request interrupts, etc. correctly -
can't it also correctly use that device for DMA API calls and avoid the
need for these shenanigans altogether?


Not without a bigger rework. There are still quite a few midlayer
issues in DRM, where dma-buf imports are dma-mapped and cached via the
virtual DRM device instead of the real GPU device. Also etnaviv is able
to coalesce multiple Vivante GPUs in a single system under one virtual
DRM device, which is used on i.MX6 where the 2D and 3D GPUs are
separate peripherals, but have the same DMA constraints.


Sure, I wouldn't expect it to be trivial to fix properly, but I wanted 
to point out that this is essentially a hack, relying on an implicit 
side-effect of of_dma_configure() which is already slated for removal. 
As such, I for one am not going to be too sympathetic if it stops 
working in future.


Furthermore, even today it doesn't work in general - it might be OK for 
LS1028A with a single GPU block behind an SMMU, but as soon as you have 
multiple GPU blocks with distinct SMMU StreamIDs, or behind different 
IOMMU instances, then you're stuffed again.


Although in fact I think it's also broken even for LS1028A, since AFAICS 
there's no guarantee that the relevant SMMU instance will actually be 
probed, or the SMMU driver even loaded, when etnaviv_pdev_probe() runs.



Effectively we would need to handle N devices for the dma-mapping in a
lot of places instead of only dealing with the one virtual DRM device.
It would probably be the right thing to do anyway, but it's not something
that can be changed short-term. I'm also not yet sure about the
performance implications, as we might run into some cache maintenance
bottlenecks if we dma-synchronize buffers to multiple real devices
instead of doing it a single time with the virtual DRM device. I know,
I know, this has a lot of assumptions baked in that could fall apart if
someone builds a SoC with multiple Vivante GPUs that have differing DMA
constraints, but up until now hardware designers have not been *that*
crazy, fortunately.


I'm not too familiar with the component stuff, but would it be viable to 
just have etnaviv_gpu_platform_probe() set up the first GPU which comes 
along as the master component and fundamental DRM device, then treat any 
subsequent ones as subcomponents as before? That would at least stand to 
be more robust in terms of obviating the of_dma_configure() hack (only 
actual bus code should ever be calling that), even if it won't do 
anything for the multiple IOMMU mapping or differing DMA constraints 
problems.


Thanks,
Robin.


Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-12 Thread Robin Murphy

On 2021-03-11 08:26, Christoph Hellwig wrote:

On Wed, Mar 10, 2021 at 06:39:57PM +, Robin Murphy wrote:

Actually... Just mirroring the iommu_dma_strict value into
struct iommu_domain should solve all of that with very little
boilerplate code.


Yes, my initial thought was to directly replace the attribute with a
common flag at iommu_domain level, but since in all cases the behaviour
is effectively global rather than actually per-domain, it seemed
reasonable to take it a step further. This passes compile-testing for
arm64 and x86, what do you think?


It seems to miss a few bits, and also generally seems to not actually
apply to recent mainline or something like it, due to different empty
lines in a few places.


Yeah, that was sketched out on top of some other development patches, 
and in being so focused on not breaking any of the x86 behaviours I did 
indeed overlook fully converting the SMMU drivers... oops!


(my thought was to do the conversion for its own sake, then clean up the 
redundant attribute separately, but I guess it's fine either way)



Let me know what you think of the version here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup

I'll happily switch the patch to you as the author if you're fine with
that as well.


I still have reservations about removing the attribute API entirely and 
pretending that io_pgtable_cfg is anything other than a SoC-specific 
private interface, but the reworked patch on its own looks reasonable to 
me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers 
either...) Just iommu_get_dma_strict() needs an export since the SMMU 
drivers can be modular - I consciously didn't add that myself since I 
was mistakenly thinking only iommu-dma would call it.


Robin.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-16 Thread Robin Murphy

On 2021-03-15 08:33, Christoph Hellwig wrote:

On Fri, Mar 12, 2021 at 04:18:24PM +, Robin Murphy wrote:

Let me know what you think of the version here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup

I'll happily switch the patch to you as the author if you're fine with
that as well.


I still have reservations about removing the attribute API entirely and
pretending that io_pgtable_cfg is anything other than a SoC-specific
private interface,


I think a private interface would make more sense.  For now I've just
condensed it down to a generic set of quirk bits and dropped the
attrs structure, which seems like an ok middle ground for now.  That
being said I wonder why that quirk isn't simply set in the device
tree?


Because it's a software policy decision rather than any inherent 
property of the platform, and the DT certainly doesn't know *when* any 
particular device might prefer its IOMMU to use cacheable pagetables to 
minimise TLB miss latency vs. saving the cache capacity for larger data 
buffers. It really is most logical to decide this at the driver level.


In truth the overall concept *is* relatively generic (a trend towards 
larger system caches and cleverer usage is about both raw performance 
and saving power on off-SoC DRAM traffic), it's just the particular 
implementation of using io-pgtable to set an outer-cacheable walk 
attribute in an SMMU TCR that's pretty much specific to Qualcomm SoCs. 
Hence why having a common abstraction at the iommu_domain level, but 
where the exact details are free to vary across different IOMMUs and 
their respective client drivers, is in many ways an ideal fit.



but the reworked patch on its own looks reasonable to
me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers
either...) Just iommu_get_dma_strict() needs an export since the SMMU
drivers can be modular - I consciously didn't add that myself since I was
mistakenly thinking only iommu-dma would call it.


Fixed.  Can I get your signoff for the patch?  Then I'll switch it to
over to being attributed to you.


Sure - I would have thought that the one I originally posted still 
stands, but for the avoidance of doubt, for the parts of commit 
8b6d45c495bd in your tree that remain from what I wrote:


Signed-off-by: Robin Murphy 

Cheers,
Robin.


Re: [PATCH 24/30] Kconfig: Change Synopsys to Synopsis

2021-03-30 Thread Robin Murphy

On 2021-03-29 00:53, Bhaskar Chowdhury wrote:

s/Synopsys/Synopsis/  .two different places.


Erm, that is definitely not a typo... :/


..and for some unknown reason it introduce a empty line deleted and added
back.


Presumably your editor is configured to trim trailing whitespace on save.

Furthermore, there are several instances in the other patches where your 
"corrections" are grammatically incorrect, I'm not sure what the deal is 
with patch #14, and you've also used the wrong subsystem name (it should 
be "dmaengine"). It's great to want to clean things up, but please pay a 
bit of care and attention to what you're actually doing.


Robin.


Signed-off-by: Bhaskar Chowdhury 
---
  drivers/dma/Kconfig | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 0c2827fd8c19..30e8cc26f43b 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -170,15 +170,15 @@ config DMA_SUN6I
  Support for the DMA engine first found in Allwinner A31 SoCs.

  config DW_AXI_DMAC
-   tristate "Synopsys DesignWare AXI DMA support"
+   tristate "Synopsis DesignWare AXI DMA support"
depends on OF || COMPILE_TEST
depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
- Enable support for Synopsys DesignWare AXI DMA controller.
+ Enable support for Synopsis DesignWare AXI DMA controller.
  NOTE: This driver wasn't tested on 64 bit platform because
- of lack 64 bit platform with Synopsys DW AXI DMAC.
+ of lack 64 bit platform with Synopsis DW AXI DMAC.

  config EP93XX_DMA
bool "Cirrus Logic EP93xx DMA support"
@@ -394,7 +394,7 @@ config MOXART_DMA
select DMA_VIRTUAL_CHANNELS
help
  Enable support for the MOXA ART SoC DMA controller.
-
+
  Say Y here if you enabled MMP ADMA, otherwise say N.

  config MPC512X_DMA
--
2.26.3

___
iommu mailing list
io...@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu




Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-30 Thread Robin Murphy

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
  missing bits]
Signed-off-by: Christoph Hellwig 
---
  drivers/iommu/amd/iommu.c   | 23 +---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
  drivers/iommu/dma-iommu.c   |  9 +--
  drivers/iommu/intel/iommu.c | 64 -
  drivers/iommu/iommu.c   | 27 ++---
  include/linux/iommu.h   |  4 +-
  8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of 
stuffing a global property into a domain attribute then pulling it out 
again in the illusion that it was in any way per-domain. We're still 
checking dev_is_untrusted() before making an actual decision, and it's 
not like we can't add more factors at that point if we want to.



For example, see the recent patch from Lu Baolu:

https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com


Erm, this patch is based on that one, it's right there in the context :/

Thanks,
Robin.


Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-30 Thread Robin Murphy

On 2021-03-30 14:58, Will Deacon wrote:

On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote:

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
   missing bits]
Signed-off-by: Christoph Hellwig 
---
   drivers/iommu/amd/iommu.c   | 23 +---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
   drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
   drivers/iommu/dma-iommu.c   |  9 +--
   drivers/iommu/intel/iommu.c | 64 -
   drivers/iommu/iommu.c   | 27 ++---
   include/linux/iommu.h   |  4 +-
   8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of stuffing a
global property into a domain attribute then pulling it out again in the
illusion that it was in any way per-domain. We're still checking
dev_is_untrusted() before making an actual decision, and it's not like we
can't add more factors at that point if we want to.


Like I say, the cleanup is great. I'm just wondering whether there's a
better way to express the complicated logic to decide whether or not to use
the flush queue than what we end up with:

if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict())

which is mixing up globals, device properties and domain properties. The
result is that the driver code ends up just using the global to determine
whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code,
which is a departure from the current way of doing things.


But previously, SMMU only ever saw the global policy piped through the 
domain attribute by iommu_group_alloc_default_domain(), so there's no 
functional change there.


Obviously some of the above checks could be factored out into some kind 
of iommu_use_flush_queue() helper that IOMMU drivers can also call if 
they need to keep in sync. Or maybe we just allow iommu-dma to set 
IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if 
we're treating that as a generic thing now.



For example, see the recent patch from Lu Baolu:

https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com


Erm, this patch is based on that one, it's right there in the context :/


Ah, sorry, I didn't spot that! I was just trying to illustrate that this
is per-device.


Sure, I understand - and I'm just trying to bang home that despite 
appearances it's never actually been treated as such for SMMU, so 
anything that's wrong after this change was already wrong before.


Robin.


Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-31 Thread Robin Murphy

On 2021-03-31 12:49, Will Deacon wrote:

On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote:

On 2021-03-30 14:58, Will Deacon wrote:

On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote:

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
missing bits]
Signed-off-by: Christoph Hellwig 
---
drivers/iommu/amd/iommu.c   | 23 +---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
drivers/iommu/dma-iommu.c   |  9 +--
drivers/iommu/intel/iommu.c | 64 -
drivers/iommu/iommu.c   | 27 ++---
include/linux/iommu.h   |  4 +-
8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of stuffing a
global property into a domain attribute then pulling it out again in the
illusion that it was in any way per-domain. We're still checking
dev_is_untrusted() before making an actual decision, and it's not like we
can't add more factors at that point if we want to.


Like I say, the cleanup is great. I'm just wondering whether there's a
better way to express the complicated logic to decide whether or not to use
the flush queue than what we end up with:

if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict())

which is mixing up globals, device properties and domain properties. The
result is that the driver code ends up just using the global to determine
whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code,
which is a departure from the current way of doing things.


But previously, SMMU only ever saw the global policy piped through the
domain attribute by iommu_group_alloc_default_domain(), so there's no
functional change there.


For DMA domains sure, but I don't think that's the case for unmanaged
domains such as those used by VFIO.


Eh? This is only relevant to DMA domains anyway. Flush queues are part 
of the IOVA allocator that VFIO doesn't even use. It's always been the 
case that unmanaged domains only use strict invalidation.



Obviously some of the above checks could be factored out into some kind of
iommu_use_flush_queue() helper that IOMMU drivers can also call if they need
to keep in sync. Or maybe we just allow iommu-dma to set
IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're
treating that as a generic thing now.


I think a helper that takes a domain would be a good starting point.


You mean device, right? The one condition we currently have is at the 
device level, and there's really nothing inherent to the domain itself 
that matters (since the type is implicitly IOMMU_DOMAIN_DMA to even care 
about this).


Another idea that's just come to mind is now that IOMMU_DOMAIN_DMA has a 
standard meaning, maybe we could split out a separate 
IOMMU_DOMAIN_DMA_STRICT type such that it can all propagate from 
iommu_get_def_domain_type()? That feels like it might be quite 
promising, but I'd still do it as an improvement on top of this patch, 
since it's beyond just cleaning up the abuse of domain attributes to 
pass a command-line option around.


Robin.


Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-31 Thread Robin Murphy

On 2021-03-31 16:32, Will Deacon wrote:

On Wed, Mar 31, 2021 at 02:09:37PM +0100, Robin Murphy wrote:

On 2021-03-31 12:49, Will Deacon wrote:

On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote:

On 2021-03-30 14:58, Will Deacon wrote:

On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote:

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
 missing bits]
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/amd/iommu.c   | 23 +---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
 drivers/iommu/dma-iommu.c   |  9 +--
 drivers/iommu/intel/iommu.c | 64 -
 drivers/iommu/iommu.c   | 27 ++---
 include/linux/iommu.h   |  4 +-
 8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of stuffing a
global property into a domain attribute then pulling it out again in the
illusion that it was in any way per-domain. We're still checking
dev_is_untrusted() before making an actual decision, and it's not like we
can't add more factors at that point if we want to.


Like I say, the cleanup is great. I'm just wondering whether there's a
better way to express the complicated logic to decide whether or not to use
the flush queue than what we end up with:

if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict())

which is mixing up globals, device properties and domain properties. The
result is that the driver code ends up just using the global to determine
whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code,
which is a departure from the current way of doing things.


But previously, SMMU only ever saw the global policy piped through the
domain attribute by iommu_group_alloc_default_domain(), so there's no
functional change there.


For DMA domains sure, but I don't think that's the case for unmanaged
domains such as those used by VFIO.


Eh? This is only relevant to DMA domains anyway. Flush queues are part of
the IOVA allocator that VFIO doesn't even use. It's always been the case
that unmanaged domains only use strict invalidation.


Maybe I'm going mad. With this patch, the SMMU driver unconditionally sets
IO_PGTABLE_QUIRK_NON_STRICT for page-tables if iommu_get_dma_strict() is
true, no? In which case, that will get set for page-tables corresponding
to unmanaged domains as well as DMA domains when it is enabled. That didn't
happen before because you couldn't set the attribute for unmanaged domains.

What am I missing?


Oh cock... sorry, all this time I've been saying what I *expect* it to 
do, while overlooking the fact that the IO_PGTABLE_QUIRK_NON_STRICT 
hunks were the bits I forgot to write and Christoph had to fix up. 
Indeed, those should be checking the domain type too to preserve the 
existing behaviour. Apologies for the confusion.


Robin.


Obviously some of the above checks could be factored out into some kind of
iommu_use_flush_queue() helper that IOMMU drivers can also call if they need
to keep in sync. Or maybe we just allow iommu-dma to set
IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're
treating that as a generic thing now.


I think a helper that takes a domain would be a good starting point.


You mean device, right? The one condition we currently have is at the device
level, and there's really nothing inherent to the domain itself that matters
(since the type is implicitly IOMMU_DOMAIN_DMA to even care about this).


Device would probably work too; you'd pass the first device to attach to the
domain when querying t

Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-31 Thread Robin Murphy

On 2021-03-16 15:38, Christoph Hellwig wrote:
[...]

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f1e38526d5bd40..996dfdf9d375dd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2017,7 +2017,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
.iommu_dev  = smmu->dev,
};
  
-	if (smmu_domain->non_strict)

+   if (!iommu_get_dma_strict())


As Will raised, this also needs to be checking "domain->type == 
IOMMU_DOMAIN_DMA" to maintain equivalent behaviour to the attribute code 
below.



pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
  
  	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);

@@ -2449,52 +2449,6 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
  }
  
-static int arm_smmu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
-{
-   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-
-   switch (domain->type) {
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
-}

[...]

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index f985817c967a25..edb1de479dd1a7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -668,7 +668,6 @@ struct arm_smmu_domain {
struct mutexinit_mutex; /* Protects smmu pointer */
  
  	struct io_pgtable_ops		*pgtbl_ops;

-   boolnon_strict;
atomic_tnr_ats_masters;
  
  	enum arm_smmu_domain_stage	stage;

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 0aa6d667274970..3dde22b1f8ffb0 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -761,6 +761,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
.iommu_dev  = smmu->dev,
};
  
+	if (!iommu_get_dma_strict())


Ditto here.

Sorry for not spotting that sooner :(

Robin.


+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
if (smmu->impl && smmu->impl->init_context) {
ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg, dev);
if (ret)



Re: [PATCH RFC v1 2/6] swiotlb: convert variables to arrays

2021-02-04 Thread Robin Murphy

On 2021-02-04 07:29, Christoph Hellwig wrote:

On Wed, Feb 03, 2021 at 03:37:05PM -0800, Dongli Zhang wrote:

This patch converts several swiotlb related variables to arrays, in
order to maintain stat/status for different swiotlb buffers. Here are
variables involved:

- io_tlb_start and io_tlb_end
- io_tlb_nslabs and io_tlb_used
- io_tlb_list
- io_tlb_index
- max_segment
- io_tlb_orig_addr
- no_iotlb_memory

There is no functional change and this is to prepare to enable 64-bit
swiotlb.


Claire Chang (on Cc) already posted a patch like this a month ago,
which looks much better because it actually uses a struct instead
of all the random variables.


Indeed, I skimmed the cover letter and immediately thought that this 
whole thing is just the restricted DMA pool concept[1] again, only from 
a slightly different angle.


Robin.

[1] 
https://lore.kernel.org/linux-iommu/20210106034124.30560-1-tien...@chromium.org/



Re: [PATCH] drm/lima: Use delayed timer as default in devfreq profile

2021-02-04 Thread Robin Murphy

On 2021-02-03 02:01, Qiang Yu wrote:

On Tue, Feb 2, 2021 at 10:02 PM Lukasz Luba  wrote:




On 2/2/21 1:01 AM, Qiang Yu wrote:

Hi Lukasz,

Thanks for the explanation. So the deferred timer option makes a mistake
when the GPU goes from idle to busy for only one poll period, in this
case 50ms, right?


Not exactly. Driver sets the polling interval to 50ms (in this case)
because it needs a ~3-frame average load (at 60fps). I discovered quite
recently that on systems with 2 or more CPUs, the devfreq core does not
monitor the devices, even for seconds. Therefore, we might
end up with quite a big amount of work that the GPU is doing, but we don't
know about it. The devfreq core didn't check <- the timer didn't fire. Then
suddenly the CPU which had the deferred timer registered last time
wakes up and the timer triggers a check of our device. We get the stats,
but they might be showing load from 1sec, not 50ms. We feed them into the
governor. The governor sees the new load, but it was tested and configured
for 50ms, so it might try to raise the frequency to max. The GPU work might
already be lower and there is no need for such a freq. Then the CPU goes
idle again, so there is no devfreq core check for the next e.g. 1sec, but
the frequency stays at max OPP and we burn power.

So, it's completely unreliable. We might get stuck at the min frequency and
suffer frame drops, or sometimes get stuck at the max freq and burn more
power when there is no need.

Similar for the thermal governor, which is confused by these old stats and
long-period stats, longer than 50ms.

Stats from the last e.g. ~1sec tell you nothing about the real recent GPU
workload.

Oh, right, I missed this case.




But a delayed timer will wake up the CPU every 50ms even when the system is
idle; will this cause more power consumption in a case like phone suspend?


No, in case of phone suspend it won't increase the power consumption.
The device won't be woken up, it will stay in suspend.

I mean the CPU is woken up frequently by the timer when the phone is
suspended, not the whole device (like the display).

It seems better to have the deferred timer when the device is suspended, for
power saving, and the delayed timer when the device is in a working state.
The user knows this and can use sysfs to change it.


Doesn't devfreq_suspend_device() already cancel any timer work either 
way in that case?


Robin.


Set the delayed timer as default is reasonable, so patch is:
Reviewed-by: Qiang Yu 

Regards,
Qiang



Regards,
Lukasz




Regards,
Qiang


On Mon, Feb 1, 2021 at 5:53 PM Lukasz Luba  wrote:


Hi Qiang,

On 1/30/21 1:51 PM, Qiang Yu wrote:

Thanks for the patch. But I can't observe any difference on glmark2
with or without this patch.
Maybe you can provide other test which can benefit from it.


This is a design problem and has an impact on the whole system.
There are a few issues. When the device is not checked and there are
long delays between the last check and the current one, the history is
broken. It confuses the devfreq governor and the thermal governor
(Intelligent Power Allocation (IPA)). The thermal governor works on
stale stats and makes bad decisions, because there are no new stats (the
device is not checked). Similar applies to the devfreq simple_ondemand
governor, which 'tries' to work on a long period, even 3s, and makes a
prediction for the next frequency based on it (which is broken).

How it should be done: a constant, reliable check is needed, so that:
- the period is guaranteed and has a fixed size, e.g. 50ms or 100ms;
- the device status is quite recent, so thermal devfreq cooling provides
 'fresh' data to the thermal governor.

This would prevent the odd behavior and solve the broken cases.
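As a hedged sketch (not taken from the patch itself), a driver can ask
for such a fixed-period check through its devfreq profile, assuming the
kernel's DEVFREQ_TIMER_* enum; the callback names here are invented:

```c
/* Sketch only: opting a device into the delayed (always-firing) timer. */
static struct devfreq_dev_profile gpu_devfreq_profile = {
	.polling_ms	= 50,			 /* fixed 50ms sampling period */
	.timer		= DEVFREQ_TIMER_DELAYED, /* fires even when the CPU is idle */
	.get_dev_status	= gpu_get_dev_status,	 /* hypothetical status callback */
	.target		= gpu_target,		 /* hypothetical OPP setter */
};
```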



Considering that it will wake up the CPU more frequently, and that the
user may choose to change this via sysfs,
I'd prefer not to apply it.


The deferred timer is the wrong option for a GPU; for UFS or eMMC it
makes more sense. It's also not recommended for NoC buses. I discovered
that some time ago and proposed an option to switch to the delayed
timer. Trust me, it wasn't obvious to find out that this missing check
has these impacts. Other engineers or users might not know that some
problems they face (especially when the device load is changing) are due
to this deferred vs. delayed timer, nor that they could change it in
sysfs.
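For reference, the switch mentioned here is the per-device devfreq
"timer" attribute in sysfs (accepted values are "deferrable" and
"delayed"); the device path below is made up:

```
# pick the delayed timer for the GPU's devfreq device (path is an example)
echo delayed > /sys/class/devfreq/1c40000.gpu/timer
# or go back to the power-friendly deferrable timer
echo deferrable > /sys/class/devfreq/1c40000.gpu/timer
```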

Regards,
Lukasz



Regards,
Qiang


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: [RFC PATCH 1/8] of/device: Allow specifying a custom iommu_spec to of_dma_configure

2021-02-16 Thread Robin Murphy

Hi Mikko,

On 2021-02-08 16:38, Mikko Perttunen wrote:

To allow for more customized device tree bindings that point to IOMMUs,
allow manual specification of iommu_spec to of_dma_configure.

The initial use case for this is with Host1x, where the driver manages
a set of device tree-defined IOMMU contexts that are dynamically
allocated to various users. These contexts don't correspond to
platform devices and are instead attached to dummy devices on a custom
software bus.


I'd suggest taking a closer look at the patches that made this 
of_dma_configure_id() in the first place, and the corresponding bus code 
in fsl-mc. At this level, Host1x sounds effectively identical to DPAA2 
in terms of being a bus of logical devices composed from bits of 
implicit behind-the-scenes hardware. I mean, compare your series title 
to the fact that their identifiers are literally named "Isolation 
Context ID" ;)


Please just use the existing mechanisms to describe a mapping between 
Host1x context IDs and SMMU Stream IDs, rather than what looks like a 
giant hacky mess here.
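For illustration only, the generic mechanism alluded to here is the
`iommu-map` property that PCI and fsl-mc already use to translate
bus-local IDs into IOMMU specifiers; a hypothetical fragment for a
host1x context bus might look like this (all labels and numbers
invented):

```
host1x: host1x@50000000 {
	/* Translate context IDs 0..7 into SMMU stream IDs 0x800..0x807
	 * (purely illustrative values). */
	iommu-map = <0 &smmu 0x800 8>;
	iommu-map-mask = <0x7>;
};
```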


(This also reminds me I wanted to rip out all the PCI special-cases and 
convert pci_dma_configure() over to passing its own IDs too, so thanks 
for the memory-jog...)


Robin.


Signed-off-by: Mikko Perttunen 
---
  drivers/iommu/of_iommu.c  | 12 
  drivers/of/device.c   |  9 +
  include/linux/of_device.h | 34 +++---
  include/linux/of_iommu.h  |  6 --
  4 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e505b9130a1c..3fefa6c63863 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -87,8 +87,7 @@ int of_get_dma_window(struct device_node *dn, const char 
*prefix, int index,
  }
  EXPORT_SYMBOL_GPL(of_get_dma_window);
  
-static int of_iommu_xlate(struct device *dev,

- struct of_phandle_args *iommu_spec)
+int of_iommu_xlate(struct device *dev, struct of_phandle_args *iommu_spec)
  {
const struct iommu_ops *ops;
struct fwnode_handle *fwnode = &iommu_spec->np->fwnode;
@@ -117,6 +116,7 @@ static int of_iommu_xlate(struct device *dev,
module_put(ops->owner);
return ret;
  }
+EXPORT_SYMBOL_GPL(of_iommu_xlate);
  
  static int of_iommu_configure_dev_id(struct device_node *master_np,

 struct device *dev,
@@ -177,7 +177,8 @@ static int of_iommu_configure_device(struct device_node 
*master_np,
  
  const struct iommu_ops *of_iommu_configure(struct device *dev,

   struct device_node *master_np,
-  const u32 *id)
+  const u32 *id,
+  struct of_phandle_args *iommu_spec)
  {
const struct iommu_ops *ops = NULL;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -209,7 +210,10 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
err = pci_for_each_dma_alias(to_pci_dev(dev),
 of_pci_iommu_init, &info);
} else {
-   err = of_iommu_configure_device(master_np, dev, id);
+   if (iommu_spec)
+   err = of_iommu_xlate(dev, iommu_spec);
+   else
+   err = of_iommu_configure_device(master_np, dev, id);
  
  		fwspec = dev_iommu_fwspec_get(dev);

if (!err && fwspec)
diff --git a/drivers/of/device.c b/drivers/of/device.c
index aedfaaafd3e7..84ada2110c5b 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -88,8 +88,9 @@ int of_device_add(struct platform_device *ofdev)
   * can use a platform bus notifier and handle BUS_NOTIFY_ADD_DEVICE events
   * to fix up DMA configuration.
   */
-int of_dma_configure_id(struct device *dev, struct device_node *np,
-   bool force_dma, const u32 *id)
+int __of_dma_configure(struct device *dev, struct device_node *np,
+   bool force_dma, const u32 *id,
+   struct of_phandle_args *iommu_spec)
  {
const struct iommu_ops *iommu;
const struct bus_dma_region *map = NULL;
@@ -170,7 +171,7 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
dev_dbg(dev, "device is%sdma coherent\n",
coherent ? " " : " not ");
  
-	iommu = of_iommu_configure(dev, np, id);

+   iommu = of_iommu_configure(dev, np, id, iommu_spec);
if (PTR_ERR(iommu) == -EPROBE_DEFER) {
kfree(map);
return -EPROBE_DEFER;
@@ -184,7 +185,7 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
dev->dma_range_map = map;
return 0;
  }
-EXPORT_SYMBOL_GPL(of_dma_configure_id);
+EXPORT_SYMBOL_GPL(__of_dma_configure);
  
  

Re: [PATCH v3 4/6] drm/sprd: add Unisoc's drm display controller driver

2021-02-16 Thread Robin Murphy
_INT_ERRBIT(2)
+#define BIT_DPU_INT_UPDATE_DONEBIT(4)
+#define BIT_DPU_INT_VSYNC  BIT(5)
+#define BIT_DPU_INT_MMU_VAOR_RDBIT(16)
+#define BIT_DPU_INT_MMU_VAOR_WRBIT(17)
+#define BIT_DPU_INT_MMU_INV_RD BIT(18)
+#define BIT_DPU_INT_MMU_INV_WR BIT(19)
+
+/* DPI control bits */
+#define BIT_DPU_EDPI_TE_EN BIT(8)
+#define BIT_DPU_EDPI_FROM_EXTERNAL_PAD BIT(10)
+#define BIT_DPU_DPI_HALT_ENBIT(16)
+
+static const u32 primary_fmts[] = {
+   DRM_FORMAT_XRGB,
+   DRM_FORMAT_XBGR,
+   DRM_FORMAT_ARGB,
+   DRM_FORMAT_ABGR,
+   DRM_FORMAT_RGBA,
+   DRM_FORMAT_BGRA,
+   DRM_FORMAT_RGBX,
+   DRM_FORMAT_RGB565,
+   DRM_FORMAT_BGR565,
+   DRM_FORMAT_NV12,
+   DRM_FORMAT_NV21,
+   DRM_FORMAT_NV16,
+   DRM_FORMAT_NV61,
+   DRM_FORMAT_YUV420,
+   DRM_FORMAT_YVU420,
+};
+
+struct sprd_plane {
+   struct drm_plane plane;
+   u32 index;
+};
+
+static inline struct sprd_plane *to_sprd_plane(struct drm_plane *plane)
+{
+   return container_of(plane, struct sprd_plane, plane);
+}
+
+static u32 check_mmu_isr(struct sprd_dpu *dpu, u32 reg_val)
+{
+   struct dpu_context *ctx = &dpu->ctx;
+   u32 mmu_mask = BIT_DPU_INT_MMU_VAOR_RD |
+   BIT_DPU_INT_MMU_VAOR_WR |
+   BIT_DPU_INT_MMU_INV_RD |
+   BIT_DPU_INT_MMU_INV_WR;
+   u32 val = reg_val & mmu_mask;
+   int i;
+
+   if (val) {
+   drm_err(dpu->drm, "--- iommu interrupt err: 0x%04x ---\n", val);
+
+   if (val & BIT_DPU_INT_MMU_INV_RD)
+   drm_err(dpu->drm, "iommu invalid read error, addr: 
0x%08x\n",
+   readl(ctx->base + REG_MMU_INV_ADDR_RD));
+   if (val & BIT_DPU_INT_MMU_INV_WR)
+   drm_err(dpu->drm, "iommu invalid write error, addr: 
0x%08x\n",
+   readl(ctx->base + REG_MMU_INV_ADDR_WR));
+   if (val & BIT_DPU_INT_MMU_VAOR_RD)
+   drm_err(dpu->drm, "iommu va out of range read error, addr: 
0x%08x\n",
+   readl(ctx->base + REG_MMU_VAOR_ADDR_RD));
+   if (val & BIT_DPU_INT_MMU_VAOR_WR)
+   drm_err(dpu->drm, "iommu va out of range write error, addr: 
0x%08x\n",
+   readl(ctx->base + REG_MMU_VAOR_ADDR_WR));
+
+   for (i = 0; i < 8; i++) {
+   reg_val = layer_reg_rd(ctx, REG_LAY_CTRL, i);
+   if (reg_val & 0x1)
+   drm_info(dpu->drm, "layer%d: 0x%08x 0x%08x 0x%08x 
ctrl: 0x%08x\n", i,
+   layer_reg_rd(ctx, REG_LAY_BASE_ADDR0, 
i),
+   layer_reg_rd(ctx, REG_LAY_BASE_ADDR1, 
i),
+   layer_reg_rd(ctx, REG_LAY_BASE_ADDR2, 
i),
+   layer_reg_rd(ctx, REG_LAY_CTRL, i));
+   }
+   }
+
+   return val;
+}


Please don't do this - if the IOMMU is managed by a separate driver, 
that driver should handle its own interrupts. If you want to report 
client-specific information associated with a fault you can use 
iommu_set_fault_handler() to hook in your reporting function (the IOMMU 
driver should correspondingly call report_iommu_fault() from its own 
fault handler).


Even regardless of the mess this seems to be responsible for in terms of 
DT bindings and resource claiming, duplicating the basic IOMMU fault 
processing across all the drivers for your media blocks is clearly not a 
good path to take.
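A hedged sketch of the suggested split, using the generic fault-report
API (iommu_set_fault_handler()/report_iommu_fault()); the client-side
function and variable names here are invented, not from the patch:

```c
/* Display (client) driver: register a reporting callback on its domain. */
static int sprd_dpu_iommu_fault(struct iommu_domain *domain,
				struct device *dev, unsigned long iova,
				int flags, void *arg)
{
	struct sprd_dpu *dpu = arg;

	drm_err(dpu->drm, "iommu %s fault at iova %#lx\n",
		(flags & IOMMU_FAULT_WRITE) ? "write" : "read", iova);
	return -ENOSYS;	/* not handled: let the IOMMU driver do its default */
}

/* ...at bind time... */
iommu_set_fault_handler(domain, sprd_dpu_iommu_fault, dpu);

/* IOMMU driver: in its own ISR, after decoding its own registers. */
report_iommu_fault(domain, dev, fault_iova,
		   fault_is_write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
```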


Robin.


+
+static int dpu_wait_stop_done(struct sprd_dpu *dpu)
+{
+   struct dpu_context *ctx = &dpu->ctx;
+   int rc;
+
+   if (ctx->stopped)
+   return 0;
+
+   rc = wait_event_interruptible_timeout(ctx->wait_queue, ctx->evt_stop,
+  msecs_to_jiffies(500));
+   ctx->evt_stop = false;
+
+   ctx->stopped = true;
+
+   if (!rc) {
+   drm_err(dpu->drm, "dpu wait for stop done time out!\n");
+   return -ETIMEDOUT;
+   }
+
+   return 0;
+}
+
+static int dpu_wait_update_done(struct sprd_dpu *dpu)
+{
+   struct dpu_context *ctx = &dpu->ctx;
+   int rc;
+
+   ctx->evt_update = false;
+
+   rc = wait_event_interruptible_timeout(ctx->wait_queue, ctx->evt_update,
+  msecs_to_jiffies(500));
+
+   if (!rc) {
+   drm_err(dpu->drm, "dpu wait for reg update done time out!\n");
+   return -ETIMEDOUT;
+   }
+
+   return 0;
+}
+
+s

Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-04 Thread Robin Murphy

On 2021-03-01 08:42, Christoph Hellwig wrote:

Use explicit methods for setting and querying the information instead.


Now that everyone's using iommu-dma, is there any point in bouncing this 
through the drivers at all? Seems like it would make more sense for the 
x86 drivers to reflect their private options back to iommu_dma_strict 
(and allow Intel's caching mode to override it as well), then have 
iommu_dma_init_domain just test !iommu_dma_strict && 
domain->ops->flush_iotlb_all.


Robin.


Also remove the now unused iommu_domain_get_attr functionality.

Signed-off-by: Christoph Hellwig 
---
  drivers/iommu/amd/iommu.c   | 23 ++---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++---
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 56 +
  drivers/iommu/dma-iommu.c   |  8 ++-
  drivers/iommu/intel/iommu.c | 27 ++
  drivers/iommu/iommu.c   | 19 +++
  include/linux/iommu.h   | 17 ++-
  7 files changed, 51 insertions(+), 146 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40d0..37a8e51db17656 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1771,24 +1771,11 @@ static struct iommu_group 
*amd_iommu_device_group(struct device *dev)
return acpihid_device_group(dev);
  }
  
-static int amd_iommu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain)
  {
-   switch (domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   return -ENODEV;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = !amd_iommu_unmap_flush;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return !amd_iommu_unmap_flush;
  }
  
  /*

@@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = {
.release_device = amd_iommu_release_device,
.probe_finalize = amd_iommu_probe_finalize,
.device_group = amd_iommu_device_group,
-   .domain_get_attr = amd_iommu_domain_get_attr,
+   .dma_use_flush_queue = amd_iommu_dma_use_flush_queue,
.get_resv_regions = amd_iommu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
.is_attach_deferred = amd_iommu_is_attach_deferred,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a8304375..bf96172e8c1f71 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
  }
  
-static int arm_smmu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain)
  {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
  
-	switch (domain->type) {

-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return smmu_domain->non_strict;
+}
+
+
+static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain)
+{
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return;
+   to_smmu_domain(domain)->non_strict = true;
  }
  
  static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

@@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
}
break;
case IOMMU_DOMAIN_DMA:
-   switch(attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
-   break

Re: [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG

2021-03-04 Thread Robin Murphy

On 2021-03-01 08:42, Christoph Hellwig wrote:

Signed-off-by: Christoph Hellwig 


Moreso than the previous patch, where the feature is at least relatively 
generic (note that there's a bunch of in-flight development around 
DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to 
bloat the generic iommu_ops structure with private driver-specific 
interfaces. The attribute interface is a great compromise for these 
kinds of things, and you can easily add type-checked wrappers around it 
for external callers (maybe even make the actual attributes internal 
between the IOMMU core and drivers) if that's your concern.


Robin.


---
  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 40 +++--
  drivers/iommu/iommu.c   |  9 ++
  include/linux/iommu.h   |  9 +-
  4 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 0f184c3dd9d9ec..78d98ab2ee3a68 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu)
struct io_pgtable_domain_attr pgtbl_cfg;
  
  	pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;

-   iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
+   iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg);
  }
  
  struct msm_gem_address_space *

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2e17d990d04481..2858999c86dfd1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct 
iommu_domain *domain)
return ret;
  }
  
-static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain,
+   struct io_pgtable_domain_attr *pgtbl_cfg)
  {
-   int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   int ret = -EPERM;
  
-	mutex_lock(&smmu_domain->init_mutex);

-
-   switch(domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
-   struct io_pgtable_domain_attr *pgtbl_cfg = data;
-
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
-   }
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
  
-			smmu_domain->pgtbl_cfg = *pgtbl_cfg;

-   break;
-   }
-   default:
-   ret = -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   ret = -ENODEV;
-   break;
-   default:
-   ret = -EINVAL;
+   mutex_lock(&smmu_domain->init_mutex);
+   if (!smmu_domain->smmu) {
+   smmu_domain->pgtbl_cfg = *pgtbl_cfg;
+   ret = 0;
}
-out_unlock:
mutex_unlock(&smmu_domain->init_mutex);
+
return ret;
  }
  
@@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = {

.device_group   = arm_smmu_device_group,
.dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
.dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
-   .domain_set_attr= arm_smmu_domain_set_attr,
+   .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr,
.domain_enable_nesting  = arm_smmu_domain_enable_nesting,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2e9e058501a953..8490aefd4b41f8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain 
*domain)
  }
  EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting);
  
+int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,

+   struct io_pgtable_domain_attr *pgtbl_cfg)
+{
+   if (!domain->ops->domain_set_pgtable_attr)
+   return -EINVAL;
+   return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr);
+
  void iommu_get_resv_regions(struct device *dev, struct list_head *list)
  {
const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index aed88aa3bd3edf..39d3ed4d2700ac 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,6 +40,7 @@ struct 

Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-10 Thread Robin Murphy

On 2021-03-10 09:25, Christoph Hellwig wrote:

On Wed, Mar 10, 2021 at 10:15:01AM +0100, Christoph Hellwig wrote:

On Thu, Mar 04, 2021 at 03:25:27PM +, Robin Murphy wrote:

On 2021-03-01 08:42, Christoph Hellwig wrote:

Use explicit methods for setting and querying the information instead.


Now that everyone's using iommu-dma, is there any point in bouncing this
through the drivers at all? Seems like it would make more sense for the x86
drivers to reflect their private options back to iommu_dma_strict (and
allow Intel's caching mode to override it as well), then have
iommu_dma_init_domain just test !iommu_dma_strict &&
domain->ops->flush_iotlb_all.


Hmm.  I looked at this, and killing off ->dma_enable_flush_queue for
the ARM drivers and just looking at iommu_dma_strict seems like a
very clear win.

OTOH x86 is a little more complicated.  AMD and Intel default to lazy
mode, so we'd have to change the global iommu_dma_strict if they are
initialized.  Also, Intel has not only a "static" option to disable
lazy mode, but also a "dynamic" one where it iterates over the
structure.  So I think on the get side we're stuck with the method, but
it still simplifies the whole thing.


Actually... Just mirroring the iommu_dma_strict value into
struct iommu_domain should solve all of that with very little
boilerplate code.


Yes, my initial thought was to directly replace the attribute with a
common flag at iommu_domain level, but since in all cases the behaviour
is effectively global rather than actually per-domain, it seemed
reasonable to take it a step further. This passes compile-testing for
arm64 and x86, what do you think?

Robin.

->8-
Subject: [PATCH] iommu: Consolidate strict invalidation handling

Now that everyone is using iommu-dma, the global invalidation policy
really doesn't need to be woven through several parts of the core API
and individual drivers, we can just look it up directly at the one point
that we now make the flush queue decision. If the x86 drivers reflect
their internal options and overrides back to iommu_dma_strict, that can
become the canonical source.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/amd/iommu.c   |  2 ++
 drivers/iommu/dma-iommu.c   |  8 +---
 drivers/iommu/intel/iommu.c | 12 
 drivers/iommu/iommu.c   | 35 +++
 include/linux/iommu.h   |  2 ++
 5 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40..1db29e59d468 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1856,6 +1856,8 @@ int __init amd_iommu_init_dma_ops(void)
else
pr_info("Lazy IO/TLB flushing enabled\n");
 
+	iommu_set_dma_strict(amd_iommu_unmap_flush);

+
return 0;
 
 }

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c813cc8..789a950cc125 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -304,10 +304,6 @@ static void iommu_dma_flush_iotlb_all(struct iova_domain 
*iovad)
 
 	cookie = container_of(iovad, struct iommu_dma_cookie, iovad);

domain = cookie->fq_domain;
-   /*
-* The IOMMU driver supporting DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
-* implies that ops->flush_iotlb_all must be non-NULL.
-*/
domain->ops->flush_iotlb_all(domain);
 }
 
@@ -334,7 +330,6 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,

struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
struct iova_domain *iovad;
-   int attr;
 
 	if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)

return -EINVAL;
@@ -371,8 +366,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
init_iova_domain(iovad, 1UL << order, base_pfn);
 
 	if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&

-   !iommu_domain_get_attr(domain, DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, &attr) 
&&
-   attr) {
+   domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) {
if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
  iommu_dma_entry_dtor))
pr_warn("iova flush queue initialization failed\n");
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b5c746f0f63b..f5b452cd1266 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4377,6 +4377,17 @@ int __init intel_iommu_init(void)
 
 	down_read(&dmar_global_lock);

for_each_active_iommu(iommu, drhd) {
+   if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
+   /*
+* The flush queue implementation does 

Re: [PATCH v5 4/6] drm/sprd: add Unisoc's drm display controller driver

2021-05-17 Thread Robin Murphy

On 2021-05-17 10:27, Joerg Roedel wrote:

On Fri, Apr 30, 2021 at 08:20:10PM +0800, Kevin Tang wrote:

Cc  Robin & Joerg


This is just some GPU internal MMU being used here, it seems. It doesn't
use the IOMMU core code, so no Ack needed from the IOMMU side.


Except the actual MMU being used is drivers/iommu/sprd_iommu.c - this is 
just the display driver poking directly at the interrupt registers of 
its associated IOMMU instance. I still think this is wrong, and that it 
should be treated as a shared interrupt, with the IOMMU driver handling 
its own registers and reporting to the client through the standard 
report_iommu_fault() API, especially since there are apparently more 
blocks using these IOMMU instances than just the display.


Robin.


[PATCH] iommu/io-pgtable: Remove tlb_flush_leaf

2020-11-25 Thread Robin Murphy
The only user of tlb_flush_leaf is a particularly hairy corner of the
Arm short-descriptor code, which wants a synchronous invalidation to
minimise the races inherent in trying to split a large page mapping.
This is already far enough into "here be dragons" territory that no
sensible caller should ever hit it, and thus it really doesn't need
optimising. Although using tlb_flush_walk there may technically be
more heavyweight than needed, it does the job and saves everyone else
having to carry around useless baggage.

Signed-off-by: Robin Murphy 
---

Reviewing the Mediatek TLB optimisation patches just left me thinking
"why do we even have this?"... Panfrost folks, this has zero functional
impact to you, merely wants an ack for straying outside drivers/iommu.

Robin.

 drivers/gpu/drm/msm/msm_iommu.c |  1 -
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  7 --
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  7 --
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 25 +++--
 drivers/iommu/arm/arm-smmu/qcom_iommu.c |  8 ---
 drivers/iommu/io-pgtable-arm-v7s.c  |  3 +--
 drivers/iommu/io-pgtable-arm.c  |  1 -
 drivers/iommu/ipmmu-vmsa.c  |  1 -
 drivers/iommu/msm_iommu.c   |  7 --
 drivers/iommu/mtk_iommu.c   |  1 -
 include/linux/io-pgtable.h  | 11 -
 11 files changed, 4 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 22ac7c692a81..50d881794758 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -139,7 +139,6 @@ static void msm_iommu_tlb_add_page(struct 
iommu_iotlb_gather *gather,
 static const struct iommu_flush_ops null_tlb_ops = {
.tlb_flush_all = msm_iommu_tlb_flush_all,
.tlb_flush_walk = msm_iommu_tlb_flush_walk,
-   .tlb_flush_leaf = msm_iommu_tlb_flush_walk,
.tlb_add_page = msm_iommu_tlb_add_page,
 };

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 776448c527ea..c186914cc4f9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -347,16 +347,9 @@ static void mmu_tlb_flush_walk(unsigned long iova, size_t 
size, size_t granule,
mmu_tlb_sync_context(cookie);
 }

-static void mmu_tlb_flush_leaf(unsigned long iova, size_t size, size_t granule,
-  void *cookie)
-{
-   mmu_tlb_sync_context(cookie);
-}
-
 static const struct iommu_flush_ops mmu_tlb_ops = {
.tlb_flush_all  = mmu_tlb_inv_context_s1,
.tlb_flush_walk = mmu_tlb_flush_walk,
-   .tlb_flush_leaf = mmu_tlb_flush_leaf,
 };

 int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e634bbe60573..fb684a393118 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1741,16 +1741,9 @@ static void arm_smmu_tlb_inv_walk(unsigned long iova, 
size_t size,
arm_smmu_tlb_inv_range(iova, size, granule, false, cookie);
 }

-static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
- size_t granule, void *cookie)
-{
-   arm_smmu_tlb_inv_range(iova, size, granule, true, cookie);
-}
-
 static const struct iommu_flush_ops arm_smmu_flush_ops = {
.tlb_flush_all  = arm_smmu_tlb_inv_context,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
-   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
.tlb_add_page   = arm_smmu_tlb_inv_page_nosync,
 };

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dad7fa86fbd4..0b8c59922a2b 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -333,14 +333,6 @@ static void arm_smmu_tlb_inv_walk_s1(unsigned long iova, 
size_t size,
arm_smmu_tlb_sync_context(cookie);
 }

-static void arm_smmu_tlb_inv_leaf_s1(unsigned long iova, size_t size,
-size_t granule, void *cookie)
-{
-   arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie,
- ARM_SMMU_CB_S1_TLBIVAL);
-   arm_smmu_tlb_sync_context(cookie);
-}
-
 static void arm_smmu_tlb_add_page_s1(struct iommu_iotlb_gather *gather,
 unsigned long iova, size_t granule,
 void *cookie)
@@ -357,14 +349,6 @@ static void arm_smmu_tlb_inv_walk_s2(unsigned long iova, 
size_t size,
arm_smmu_tlb_sync_context(cookie);
 }

-static void arm_smmu_tlb_inv_leaf_s2(unsigned long iova, size_t size,
-size_t granule, void *cookie)
-{
-   arm_smmu_tlb_inv_range_s2(iova, size, granule, cookie,
- 

Re: [PATCH] iommu/io-pgtable: Remove tlb_flush_leaf

2020-11-25 Thread Robin Murphy

On 2020-11-25 17:29, Robin Murphy wrote:

The only user of tlb_flush_leaf is a particularly hairy corner of the
Arm short-descriptor code, which wants a synchronous invalidation to
minimise the races inherent in trying to split a large page mapping.
This is already far enough into "here be dragons" territory that no
sensible caller should ever hit it, and thus it really doesn't need
optimising. Although using tlb_flush_walk there may technically be
more heavyweight than needed, it does the job and saves everyone else
having to carry around useless baggage.

Signed-off-by: Robin Murphy 
---

Reviewing the Mediatek TLB optimisation patches just left me thinking
"why do we even have this?"... Panfrost folks, this has zero functional
impact to you, merely wants an ack for straying outside drivers/iommu.


Oops, same goes for MSM folks too - I honestly failed to even notice 
that some of the "msm_iommu" code actually lives in DRM. Maybe the fancy 
editor makes life *too* easy and I should go back to grep and vim and 
having to pay attention to paths... :)


Robin.


  drivers/gpu/drm/msm/msm_iommu.c |  1 -
  drivers/gpu/drm/panfrost/panfrost_mmu.c |  7 --
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  7 --
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 25 +++--
  drivers/iommu/arm/arm-smmu/qcom_iommu.c |  8 ---
  drivers/iommu/io-pgtable-arm-v7s.c  |  3 +--
  drivers/iommu/io-pgtable-arm.c  |  1 -
  drivers/iommu/ipmmu-vmsa.c  |  1 -
  drivers/iommu/msm_iommu.c   |  7 --
  drivers/iommu/mtk_iommu.c   |  1 -
  include/linux/io-pgtable.h  | 11 -
  11 files changed, 4 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 22ac7c692a81..50d881794758 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -139,7 +139,6 @@ static void msm_iommu_tlb_add_page(struct 
iommu_iotlb_gather *gather,
  static const struct iommu_flush_ops null_tlb_ops = {
.tlb_flush_all = msm_iommu_tlb_flush_all,
.tlb_flush_walk = msm_iommu_tlb_flush_walk,
-   .tlb_flush_leaf = msm_iommu_tlb_flush_walk,
.tlb_add_page = msm_iommu_tlb_add_page,
  };

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 776448c527ea..c186914cc4f9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -347,16 +347,9 @@ static void mmu_tlb_flush_walk(unsigned long iova, size_t 
size, size_t granule,
mmu_tlb_sync_context(cookie);
  }

-static void mmu_tlb_flush_leaf(unsigned long iova, size_t size, size_t granule,
-  void *cookie)
-{
-   mmu_tlb_sync_context(cookie);
-}
-
  static const struct iommu_flush_ops mmu_tlb_ops = {
.tlb_flush_all  = mmu_tlb_inv_context_s1,
.tlb_flush_walk = mmu_tlb_flush_walk,
-   .tlb_flush_leaf = mmu_tlb_flush_leaf,
  };

  int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e634bbe60573..fb684a393118 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1741,16 +1741,9 @@ static void arm_smmu_tlb_inv_walk(unsigned long iova, 
size_t size,
arm_smmu_tlb_inv_range(iova, size, granule, false, cookie);
  }

-static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size,
- size_t granule, void *cookie)
-{
-   arm_smmu_tlb_inv_range(iova, size, granule, true, cookie);
-}
-
  static const struct iommu_flush_ops arm_smmu_flush_ops = {
.tlb_flush_all  = arm_smmu_tlb_inv_context,
.tlb_flush_walk = arm_smmu_tlb_inv_walk,
-   .tlb_flush_leaf = arm_smmu_tlb_inv_leaf,
.tlb_add_page   = arm_smmu_tlb_inv_page_nosync,
  };

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dad7fa86fbd4..0b8c59922a2b 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -333,14 +333,6 @@ static void arm_smmu_tlb_inv_walk_s1(unsigned long iova, 
size_t size,
arm_smmu_tlb_sync_context(cookie);
  }

-static void arm_smmu_tlb_inv_leaf_s1(unsigned long iova, size_t size,
-size_t granule, void *cookie)
-{
-   arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie,
- ARM_SMMU_CB_S1_TLBIVAL);
-   arm_smmu_tlb_sync_context(cookie);
-}
-
  static void arm_smmu_tlb_add_page_s1(struct iommu_iotlb_gather *gather,
 unsigned long iova, size_t granule,
 void *cookie)
@@ -357,14 +349,6 @@ static void arm_smmu_tlb_inv_walk_

Re: [PATCH v15 12/12] of: Add plumbing for restricted DMA pool

2021-07-02 Thread Robin Murphy

On 2021-07-02 04:08, Guenter Roeck wrote:

Hi,

On Thu, Jun 24, 2021 at 11:55:26PM +0800, Claire Chang wrote:

If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when the restricted-dma-pool is presented.

Signed-off-by: Claire Chang 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 


With this patch in place, all sparc and sparc64 qemu emulations
fail to boot. Symptom is that the root file system is not found.
Reverting this patch fixes the problem. Bisect log is attached.


Ah, OF_ADDRESS depends on !SPARC, so of_dma_configure_id() is presumably 
returning an unexpected -ENODEV from the of_dma_set_restricted_buffer() 
stub. That should probably be returning 0 instead, since either way it's 
not an error condition for it to simply do nothing.
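A minimal user-space model of the suggested fix (the real code lives in drivers/of/device.c; the struct layouts here are stand-ins for illustration only):

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel types involved; the real code lives
 * in drivers/of/device.c. This only illustrates the error-code change. */
struct device { int dummy; };
struct device_node { int dummy; };

/*
 * When CONFIG_OF_ADDRESS is disabled (e.g. on sparc), the stub used to
 * return -ENODEV, which of_dma_configure_id() then propagated as a hard
 * failure even though "no restricted DMA support" is not an error.
 * Returning 0 lets probing continue as before.
 */
static int of_dma_set_restricted_buffer(struct device *dev,
					struct device_node *np)
{
	(void)dev;
	(void)np;
	return 0;	/* was: return -ENODEV; */
}

static int of_dma_configure_id(struct device *dev, struct device_node *np)
{
	/* ...bus and coherency setup elided... */
	return of_dma_set_restricted_buffer(dev, np);
}
```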


Robin.



Guenter

---
# bad: [fb0ca446157a86b75502c1636b0d81e642fe6bf1] Add linux-next specific files 
for 20210701
# good: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
git bisect start 'HEAD' 'v5.13'
# bad: [f63c4fda987a19b1194cc45cb72fd5bf968d9d90] Merge remote-tracking branch 
'rdma/for-next'
git bisect bad f63c4fda987a19b1194cc45cb72fd5bf968d9d90
# good: [46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4] Merge remote-tracking branch 
'ipsec/master'
git bisect good 46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4
# good: [43ba6969cfb8185353a7a6fc79070f13b9e3d6d3] Merge remote-tracking branch 
'clk/clk-next'
git bisect good 43ba6969cfb8185353a7a6fc79070f13b9e3d6d3
# good: [1ca5eddcf8dca1d6345471c6404e7364af0d7019] Merge remote-tracking branch 
'fuse/for-next'
git bisect good 1ca5eddcf8dca1d6345471c6404e7364af0d7019
# good: [8f6d7b3248705920187263a4e7147b0752ec7dcf] Merge remote-tracking branch 
'pci/next'
git bisect good 8f6d7b3248705920187263a4e7147b0752ec7dcf
# good: [df1885a755784da3ef285f36d9230c1d090ef186] RDMA/rtrs_clt: Alloc less 
memory with write path fast memory registration
git bisect good df1885a755784da3ef285f36d9230c1d090ef186
# good: [93d31efb58c8ad4a66bbedbc2d082df458c04e45] Merge remote-tracking branch 
'cpufreq-arm/cpufreq/arm/linux-next'
git bisect good 93d31efb58c8ad4a66bbedbc2d082df458c04e45
# good: [46308965ae6fdc7c25deb2e8c048510ae51bbe66] RDMA/irdma: Check contents 
of user-space irdma_mem_reg_req object
git bisect good 46308965ae6fdc7c25deb2e8c048510ae51bbe66
# good: [6de7a1d006ea9db235492b288312838d6878385f] 
thermal/drivers/int340x/processor_thermal: Split enumeration and processing part
git bisect good 6de7a1d006ea9db235492b288312838d6878385f
# good: [081bec2577cda3d04f6559c60b6f4e2242853520] dt-bindings: of: Add 
restricted DMA pool
git bisect good 081bec2577cda3d04f6559c60b6f4e2242853520
# good: [bf95ac0bcd69979af146852f6a617a60285ebbc1] Merge remote-tracking branch 
'thermal/thermal/linux-next'
git bisect good bf95ac0bcd69979af146852f6a617a60285ebbc1
# good: [3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc] RDMA/core: Always release 
restrack object
git bisect good 3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc
# bad: [cff1f23fad6e0bd7d671acce0d15285c709f259c] Merge remote-tracking branch 
'swiotlb/linux-next'
git bisect bad cff1f23fad6e0bd7d671acce0d15285c709f259c
# bad: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing for 
restricted DMA pool
git bisect bad b655006619b7bccd0dc1e055bd72de5d613e7b5c
# first bad commit: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing 
for restricted DMA pool



Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-02 Thread Robin Murphy

On 2021-07-02 14:58, Will Deacon wrote:

Hi Nathan,

On Thu, Jul 01, 2021 at 12:52:20AM -0700, Nathan Chancellor wrote:

On 7/1/2021 12:40 AM, Will Deacon wrote:

On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:

On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:

On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:

`BUG: unable to handle page fault for address: 003a8290` and
the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
(maybe dev->dma_io_tlb_mem) was corrupted?
The dev->dma_io_tlb_mem should be set here
(https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
through device_initialize.


I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
'io_tlb_default_mem', which is a page-aligned allocation from memblock.
The spinlock is at offset 0x24 in that structure, and looking at the
register dump from the crash:

Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006
Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX:  
RCX: 8900572ad580
Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 000c 
RDI: 1d17
Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 000c 
R09: 
Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 89005653f000 
R12: 0212
Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 0002 
R15: 0020
Jun 29 18:28:42 hp-4300G kernel: FS:  7f1f8898ea40() 
GS:89005728() knlGS:
Jun 29 18:28:42 hp-4300G kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 0001020d 
CR4: 00350ee0
Jun 29 18:28:42 hp-4300G kernel: Call Trace:
Jun 29 18:28:42 hp-4300G kernel:  _raw_spin_lock_irqsave+0x39/0x50
Jun 29 18:28:42 hp-4300G kernel:  swiotlb_tbl_map_single+0x12b/0x4c0

Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
RDX pointing at the spinlock. Yet RAX is holding junk :/

I agree that enabling KASAN would be a good idea, but I also think we
probably need to get some more information out of swiotlb_tbl_map_single()
to see see what exactly is going wrong in there.


I can certainly enable KASAN and if there is any debug print I can add
or dump anything, let me know!


I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
x86 defconfig and ran it on my laptop. However, it seems to work fine!

Please can you share your .config?


Sure thing, it is attached. It is just Arch Linux's config run through
olddefconfig. The original is below in case you need to diff it.

https://raw.githubusercontent.com/archlinux/svntogit-packages/9045405dc835527164f3034b3ceb9a67c7a53cd4/trunk/config

If there is anything more that I can provide, please let me know.


I eventually got this booting (for some reason it was causing LD to SEGV
trying to link it for a while...) and sadly it works fine on my laptop. Hmm.

Did you manage to try again with KASAN?

It might also be worth taking the IOMMU out of the equation, since that
interfaces differently with SWIOTLB and I couldn't figure out the code path
from the log you provided. What happens if you boot with "amd_iommu=off
swiotlb=force"?


Oh, now there's a thing... the chat from the IOMMU API in the boot log 
implies that the IOMMU *should* be in the picture - we see that default 
domains are IOMMU_DOMAIN_DMA default and the GPU :0c:00.0 was added 
to a group. That means dev->dma_ops should be set and DMA API calls 
should be going through iommu-dma, yet the callstack in the crash says 
we've gone straight from dma_map_page_attrs() to swiotlb_map(), implying 
the inline dma_direct_map_page() path.


If dev->dma_ops didn't look right in the first place, it's perhaps less 
surprising that dev->dma_io_tlb_mem might be wild as well. It doesn't 
seem plausible that we should have a race between initialising the 
device and probing its driver, so maybe the whole dev pointer is getting 
trampled earlier in the callchain (or is fundamentally wrong to begin 
with, but from a quick skim of the amdgpu code it did look like 
adev->dev and adev->pdev are appropriately set early on by 
amdgpu_pci_probe()).



(although word of warning here: i915 dies horribly on my laptop if I pass
swiotlb=force, even with the distro 5.10 kernel)


FWIW I'd imagine you probably need to massively increase the SWIOTLB 
buffer size to have hope of that working.


Robin.


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Robin Murphy

On 2021-07-06 14:24, Will Deacon wrote:

On Tue, Jul 06, 2021 at 06:48:48AM +0200, Christoph Hellwig wrote:

On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote:

So at this point, the AMD IOMMU driver does:

swiotlb= (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;

where 'swiotlb' is a global variable indicating whether or not swiotlb
is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which
will call swiotlb_exit() if 'swiotlb' is false.

Now, that used to work fine, because swiotlb_exit() clears
'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I
think that all the devices which have successfully probed beforehand will
have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem'
field.


Yeah.  I don't think we can do that anymore, and I also think it is
a bad idea to start with.


I've had a crack at reworking things along the following lines:

   - io_tlb_default_mem now lives in the BSS, the flexible array member
 is now a pointer and that part is allocated dynamically (downside of
 this is an extra indirection to get at the slots).

   - io_tlb_default_mem.nslabs tells you whether the thing is valid

   - swiotlb_exit() frees the slots array and clears the rest of the
 structure to 0. I also extended it to free the actual slabs, but I'm
 not sure why it wasn't doing that before.

So a non-NULL dev->dma_io_tlb_mem should always be valid to follow.
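A user-space model of the reworked layout described above (field names follow the thread; the layout and sizes are illustrative, not the real kernel structure):

```c
#include <stdlib.h>
#include <string.h>

struct io_tlb_slot { unsigned long orig_addr; };

struct io_tlb_mem {
	unsigned long nslabs;      /* 0 => not initialised / torn down */
	struct io_tlb_slot *slots; /* flexible array member becomes a pointer */
};

/* Lives in the BSS, so a pointer taken to it can never dangle. */
static struct io_tlb_mem io_tlb_default_mem;

static int swiotlb_late_init(unsigned long nslabs)
{
	io_tlb_default_mem.slots =
		calloc(nslabs, sizeof(*io_tlb_default_mem.slots));
	if (!io_tlb_default_mem.slots)
		return -1;
	io_tlb_default_mem.nslabs = nslabs;
	return 0;
}

static void swiotlb_exit(void)
{
	free(io_tlb_default_mem.slots);
	/* Clear the whole structure: nslabs == 0 marks it invalid. */
	memset(&io_tlb_default_mem, 0, sizeof(io_tlb_default_mem));
}

static int swiotlb_active(void)
{
	return io_tlb_default_mem.nslabs != 0;
}
```

The point is that devices which cached `&io_tlb_default_mem` at probe time keep a valid pointer across swiotlb_exit(); the `nslabs` check replaces the old NULL check.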


FWIW I was pondering the question of whether to do something along those 
lines or just scrap the default assignment entirely, so since I hadn't 
got round to saying that I've gone ahead and hacked up the alternative 
(similarly untested) for comparison :)


TBH I'm still not sure which one I prefer...

Robin.

->8-
diff --git a/drivers/base/core.c b/drivers/base/core.c
index ea5b85354526..394abf184c1a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2847,9 +2847,6 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
-#ifdef CONFIG_SWIOTLB
-   dev->dma_io_tlb_mem = io_tlb_default_mem;
-#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 39284ff2a6cd..620f16d89a98 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -107,16 +107,21 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;

+static inline struct io_tlb_mem *dev_iotlb_mem(struct device *dev)
+{
+   return dev->dma_io_tlb_mem ?: io_tlb_default_mem;
+}
+
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t 
paddr)

 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev_iotlb_mem(dev);

return mem && paddr >= mem->start && paddr < mem->end;
 }

 static inline bool is_swiotlb_force_bounce(struct device *dev)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev_iotlb_mem(dev);

return mem && mem->force_bounce;
 }
@@ -167,7 +172,7 @@ bool swiotlb_free(struct device *dev, struct page 
*page, size_t size);


 static inline bool is_swiotlb_for_alloc(struct device *dev)
 {
-   return dev->dma_io_tlb_mem->for_alloc;
+   return dev_iotlb_mem(dev)->for_alloc;
 }
 #else
 static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b7f76bca89bf..f4942149f87d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -359,7 +359,7 @@ static unsigned int swiotlb_align_offset(struct 
device *dev, u64 addr)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, 
size_t size,

   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev_iotlb_mem(dev);
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
@@ -440,7 +440,7 @@ static unsigned int wrap_index(struct io_tlb_mem 
*mem, unsigned int index)

 static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
  size_t alloc_size)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev_iotlb_mem(dev);
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -522,7 +522,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device 
*dev, phys_addr_t orig_addr,

size_t mapping_size, size_t alloc_size,
enum dm

Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Robin Murphy

On 2021-07-06 15:05, Christoph Hellwig wrote:

On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:

FWIW I was pondering the question of whether to do something along those
lines or just scrap the default assignment entirely, so since I hadn't got
round to saying that I've gone ahead and hacked up the alternative
(similarly untested) for comparison :)

TBH I'm still not sure which one I prefer...


Claire did implement something like your suggestion originally, but
I don't really like it as it doesn't scale for adding multiple global
pools, e.g. for the 64-bit addressable one for the various encrypted
secure guest schemes.


Ah yes, that had slipped my mind, and it's a fair point indeed. Since 
we're not concerned with a minimal fix for backports anyway I'm more 
than happy to focus on Will's approach. Another thing is that that looks 
to take us a quiet step closer to the possibility of dynamically 
resizing a SWIOTLB pool, which is something that some of the hypervisor 
protection schemes looking to build on top of this series may want to 
explore at some point.


Robin.


Re: [PATCH v2 1/2] dt-bindings: display: rockchip: Add compatible for rk3568 HDMI

2021-07-13 Thread Robin Murphy

On 2021-07-07 13:03, Benjamin Gaignard wrote:

Define a new compatible for rk3568 HDMI.
This version of HDMI hardware block needs two new clocks hclk_vio and hclk
to provide phy reference clocks.


Do you know what hclk_vio is and whether it's actually necessary? I 
don't see any mention of it downstream, and based on previous experience 
I'm suspicious that it might be just the parent of hclk, and thus should 
not need to be explicitly claimed by the device or baked into its binding.


Robin.


Signed-off-by: Benjamin Gaignard 
---
version 2:
- Add the clocks needed for the phy.

  .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git 
a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml 
b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
index 75cd9c686e985..cb8643b3a8b84 100644
--- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
+++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
@@ -23,6 +23,7 @@ properties:
- rockchip,rk3288-dw-hdmi
- rockchip,rk3328-dw-hdmi
- rockchip,rk3399-dw-hdmi
+  - rockchip,rk3568-dw-hdmi
  
reg-io-width:

  const: 4
@@ -51,8 +52,11 @@ properties:
- vpll
- enum:
- grf
+  - hclk_vio
+  - vpll
+  - enum:
+  - hclk
- vpll
-  - const: vpll
  
ddc-i2c-bus:

  $ref: /schemas/types.yaml#/definitions/phandle



Re: [PATCH v2 1/2] dt-bindings: display: rockchip: Add compatible for rk3568 HDMI

2021-07-14 Thread Robin Murphy

On 2021-07-14 10:19, Michael Riesch wrote:

Hello Heiko,

On 7/13/21 10:49 AM, Heiko Stübner wrote:

Hi Michael,

Am Dienstag, 13. Juli 2021, 10:44:00 CEST schrieb Michael Riesch:

The HDMI TX block in the RK3568 requires two power supplies, which have
to be enabled in some cases (at least on the RK3568 EVB1 the voltages
VDDA0V9_IMAGE and VCCA1V8_IMAGE are disabled by default). It would be
great if this was considered by the driver and the device tree binding.
I am not sure, though, whether this is a RK3568 specific or
rockchip_dw_hdmi specific thing. Maybe it can even enter the Synopsis DW
HDMI driver.


I do remember that this discussion happened many years back already.
And yes the supplies are needed for all but back then there was opposition
as these are supposedly phy-related supplies, not for the dw-hdmi itself.
[There are variants with an external phy, like on the rk3328]

See discussion on [0]

[0] 
https://dri-devel.freedesktop.narkive.com/pen2zWo1/patch-v3-1-2-drm-bridge-dw-hdmi-support-optional-supply-regulators


Thanks for the pointer. My summary of this discussion would be the
following:

  - There was no consensus on how to handle the issue. The voltages still
have to be enabled from outside the driver.
  - Open question: rockchip-specific or general solution? (one may detect
a tendency towards a rockchip-specific solution)
  - Open question: separation of the phy from the dw_hdmi IP core?

First of all, IMHO the driver should enable those voltages, otherwise we
will have the same discussion again in 5-6 years :-)

Then, the rockchip,dw-hdmi binding features a property "phys",
presumably to handle external phys (e.g., for the RK3328). This fact and
the referenced discussion suggest a rockchip-specific solution.


FWIW I've long thought that cleaning up the phy situation in dw-hdmi 
would be a good idea. It's always seemed a bit sketchy that on RK3328 we 
still validate modes against the tables for the Synopsys phy which isn't 
relevant, and if that does allow a clock rate through that the actual 
phy rejects then things just go horribly wrong and the display breaks.



In the Rockchip documentation (at least for RK3328, RK3399 and RK3568),
there are two extra voltages denoted as "HDMI PHY analog power". It
would be tempting to add the internal phy to the device tree and glue it
to the dw-hdmi using the "phys" property. However, as pointed out in the
referenced discussion, the configuration registers of the phy are
somewhat interleaved with the dw-hdmi registers and a clear separation
may be tricky.


Conceptually I don't think there's any issue with the HDMI node being 
its own phy provider where appropriate. At the DT level it should simply 
be a case of having both sets of properties, e.g.:


&hdmi {
#phy-cells = <0>;
phys = <&hdmi>;
};

And at the driver level AFAICS it's pretty much just a case of dw-hdmi 
additionally registering itself as a phy provider if the internal phy is 
present - the only difference then should be that it can end up calling 
back into itself via the common phy API rather than directly via 
internal special-cases.
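A toy model of that idea, where the HDMI device acts as its own phy provider so mode setup goes through one generic ops table whether the phy is internal or external (all names below are illustrative, not the real dw-hdmi code):

```c
#include <stddef.h>

struct phy_ops { int (*power_on)(void *priv); };

struct dw_hdmi {
	const struct phy_ops *phy;  /* resolved via the "phys" DT property */
	void *phy_priv;
	int phy_powered;
};

/* The internal Synopsys phy path, now exposed through the common ops. */
static int dw_hdmi_internal_phy_power_on(void *priv)
{
	struct dw_hdmi *hdmi = priv;
	hdmi->phy_powered = 1;
	return 0;
}

static const struct phy_ops dw_hdmi_internal_phy_ops = {
	.power_on = dw_hdmi_internal_phy_power_on,
};

/* With phys = <&hdmi>, the provider lookup resolves back to ourselves. */
static void dw_hdmi_bind(struct dw_hdmi *hdmi)
{
	hdmi->phy = &dw_hdmi_internal_phy_ops;
	hdmi->phy_priv = hdmi;
}

static int dw_hdmi_setup(struct dw_hdmi *hdmi)
{
	/* Same call site regardless of which phy was wired up. */
	return hdmi->phy->power_on(hdmi->phy_priv);
}
```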


Robin.


As a more pragmatic alternative, we could add optional supplies to the
rockchip,dw-hdmi binding and evaluate the "phys" property. If the latter
is not specified, the internal phy is used and the supplies must be
enabled. Would such an approach be acceptable?

Best regards,
Michael


On 7/7/21 2:03 PM, Benjamin Gaignard wrote:

Define a new compatible for rk3568 HDMI.
This version of HDMI hardware block needs two new clocks hclk_vio and hclk
to provide phy reference clocks.

Signed-off-by: Benjamin Gaignard 
---
version 2:
- Add the clocks needed for the phy.

  .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git 
a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml 
b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
index 75cd9c686e985..cb8643b3a8b84 100644
--- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
+++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
@@ -23,6 +23,7 @@ properties:
- rockchip,rk3288-dw-hdmi
- rockchip,rk3328-dw-hdmi
- rockchip,rk3399-dw-hdmi
+  - rockchip,rk3568-dw-hdmi
  
reg-io-width:

  const: 4
@@ -51,8 +52,11 @@ properties:
- vpll
- enum:
- grf
+  - hclk_vio
+  - vpll
+  - enum:
+  - hclk
- vpll
-  - const: vpll


The description and documentation of the clocks are somewhat misleading
IMHO. This is not caused by your patches, of course. But maybe this is a
chance to clean them up a bit.

It seems that the CEC clock is an optional clock of the dw-hdmi driver.
Shouldn't it be d

Re: use of dma_direct_set_offset in (allwinner) drivers

2020-11-04 Thread Robin Murphy

On 2020-11-04 08:14, Maxime Ripard wrote:

Hi Christoph,

On Tue, Nov 03, 2020 at 10:55:38AM +0100, Christoph Hellwig wrote:

Linux 5.10-rc1 switched from having a single dma offset in struct device
to a set of DMA ranges, and introduced a new helper to set them,
dma_direct_set_offset.

This in fact surfaced that a bunch of drivers that violate our layering
and set the offset from drivers, which meant we had to reluctantly
export the symbol to set up the DMA range.

The drivers are:

drivers/gpu/drm/sun4i/sun4i_backend.c

   This just uses dma_direct_set_offset as a fallback.  Is there any good
   reason to not just kill off the fallback?

drivers/media/platform/sunxi/sun4i-csi/sun4i_csi.c

   Same as above.


So, the history of this is:

   - We initially introduced the support for those two controllers
 assuming that there was a direct mapping between the physical and
 DMA addresses. It turns out it didn't and the DMA accesses were
 going through a secondary, dedicated, bus that didn't have the same
 mapping of the RAM than the CPU.

 4690803b09c6 ("drm/sun4i: backend: Offset layer buffer address by DRAM starting 
address")

   - This dedicated bus is undocumented and barely used in the vendor
 kernel so this was overlooked, and it's fairly hard to get infos on
 it for all the SoCs we support. We added the DT support for it
 though on some SoCs we had enough infos to do so:

 c43a4469402f ("dt-bindings: interconnect: Add a dma interconnect name")
 22f88e311399 ("ARM: dts: sun5i: Add the MBUS controller")

 This explains the check on the interconnect property

   - However, due to the stable DT rule, we still need to operate without
 regressions on older DTs that wouldn't have that property (and for
 SoCs we haven't figured out). Hence the fallback.


How about having something in the platform code that keys off the 
top-level SoC compatible and uses a bus notifier to create offsets for 
the relevant devices if an MBUS description is missing? At least that 
way the workaround could be confined to a single dedicated place and 
look somewhat similar to other special cases like sta2x11, rather than 
being duplicated all over the place.
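A user-space sketch of that suggestion: platform code keyed off the SoC compatible registers a bus notifier and applies the MBUS DRAM offset to known DMA masters whose DT node lacks the interconnect description. Every name and constant here is illustrative, not the real sunxi code:

```c
#include <string.h>

#define BUS_NOTIFY_ADD_DEVICE	1
#define SUNXI_DRAM_OFFSET	0x40000000UL	/* assumed DRAM base */

struct device {
	const char *compatible;
	int has_mbus_desc;	/* "interconnects" property present in DT? */
	unsigned long dma_offset;
};

static int sunxi_mbus_notifier(unsigned long action, struct device *dev)
{
	if (action != BUS_NOTIFY_ADD_DEVICE)
		return 0;
	/*
	 * Devices with a proper MBUS description are handled by the
	 * generic OF code; only patch up the legacy-DT fallback case.
	 * The assignment stands in for a dma_direct_set_offset() call.
	 */
	if (!dev->has_mbus_desc && strstr(dev->compatible, "allwinner,"))
		dev->dma_offset = SUNXI_DRAM_OFFSET;
	return 0;
}
```

This keeps the workaround in one place: drivers never touch the offset, and DTs that do describe the MBUS interconnect take the normal path.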


Robin.


drivers/media/platform/sunxi/sun6i-csi/sun6i_csi.c

   This driver unconditionally sets the offset.  Why can't we do this
   in the device tree?

drivers/staging/media/sunxi/cedrus/cedrus_hw.c

   Same as above.



We should make those two match the previous ones, but we'll have the
same issue here eventually. Most likely they were never ran on an SoC
for which we have the MBUS figured out.

Maxime


___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH RFC libdrm 1/2] intel: Improve checks for big-endian

2022-03-08 Thread Robin Murphy

On 2022-03-07 20:53, Geert Uytterhoeven wrote:

   - sparc64-linux-gnu-gcc does not define __BIG_ENDIAN__ or SPARC, but
 does define __sparc__, hence replace the check for SPARC by a check
 for __sparc__,
   - powerpc{,64,64}-linux-gnu-gcc does not define __ppc__ or __ppc64__,
 but does define __BIG_ENDIAN__.
 powerpc64le-linux-gnu-gcc does not define __ppc__ or __ppc64__, but
 does define __LITTLE_ENDIAN__.
 Hence remove the checks for __ppc__ and __ppc64__.
   - mips{,64}-linux-gnu{,abi64}-gcc does not define __BIG_ENDIAN__, but
 does define __MIPSEB__, so add a check for the latter,
   - m68k-linux-gnu-gcc does not define __BIG_ENDIAN__, but does define
 __mc68000__, so add a check for the latter,
   - hppa{,64}-linux-gnu-gcc defines __BIG_ENDIAN__, and thus should work
 out-of-the-box.


FWIW there's also __ARM_BIG_ENDIAN for arm-* and aarch64-* targets in BE 
mode, if you want to flesh out the list even more. In principle there's 
also "__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__" which should supposedly be 
consistent across platforms, but I don't know if that's even more of a 
specific GCC-ism.
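A consolidated check along the lines above might look like this: prefer the compiler's `__BYTE_ORDER__` (a GCC/Clang extension), keeping the per-target macros as a fallback for compilers that lack it. The macro name is illustrative:

```c
#if (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) || \
    defined(__BIG_ENDIAN__) || defined(__ARM_BIG_ENDIAN) || \
    defined(__sparc__) || defined(__mc68000__) || defined(__MIPSEB__)
#define HASH_BIG_ENDIAN 1
#else
#define HASH_BIG_ENDIAN 0
#endif

/* Runtime cross-check: inspect the first byte of a known integer. */
static int runtime_big_endian(void)
{
	unsigned int x = 1;
	return *(unsigned char *)&x == 0;
}
```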


Robin.


Signed-off-by: Geert Uytterhoeven 
---
Untested due to lack of hardware.
---
  intel/uthash.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/intel/uthash.h b/intel/uthash.h
index 45d1f9fc12a1d6f9..0f5480be6e8ca033 100644
--- a/intel/uthash.h
+++ b/intel/uthash.h
@@ -648,7 +648,7 @@ do {
  #define MUR_PLUS2_ALIGNED(p) (((unsigned long)p & 3UL) == 2UL)
  #define MUR_PLUS3_ALIGNED(p) (((unsigned long)p & 3UL) == 3UL)
  #define WP(p) ((uint32_t*)((unsigned long)(p) & ~3UL))
-#if (defined(__BIG_ENDIAN__) || defined(SPARC) || defined(__ppc__) || 
defined(__ppc64__))
+#if (defined(__BIG_ENDIAN__) || defined(__sparc__) || defined(__mc68000__) || 
defined(__MIPSEB__))
  #define MUR_THREE_ONE(p) ((((*WP(p))&0x00ffffff) << 8) | (((*(WP(p)+1))&0xff000000) >> 24))
  #define MUR_TWO_TWO(p)   ((((*WP(p))&0x0000ffff) <<16) | (((*(WP(p)+1))&0xffff0000) >> 16))
  #define MUR_ONE_THREE(p) ((((*WP(p))&0x000000ff) <<24) | (((*(WP(p)+1))&0xffffff00) >>  8))


Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk

2022-03-09 Thread Robin Murphy

On 2022-03-09 08:37, elaine.zhang wrote:

hi,


在 2022/3/9 下午4:18, Sascha Hauer 写道:

Hi Elaine,

On Wed, Mar 09, 2022 at 09:41:39AM +0800, zhangq...@rock-chips.com wrote:

    hi, all:
    Let me explain the clock dependency:
    From the clock tree, pclk_vo0 and hclk_vo0 are completely independent
    clocks with different parent clocks and different clock frequencies.

    But the niu path is:
    pclk_vo is dependent on hclk_vo, and the pclk_vo niu goes through the
    hclk_vo niu.

Thanks, this is the information we are looking for. What is "NIU" btw?
I think this is even documented in the Reference Manual. With the right
pointer I just found:


The NIU (native interface unit)

You can see 2.8.6(NIU Clock gating reliance) in TRM.




A part of niu clocks have a dependence on another niu clock in order to
sharing the internal bus. When these clocks are in use, another niu
clock must be opened, and cannot be gated.  These clocks and the special
clock on which they are relied are as following:

Clocks which have dependency   The clock which cannot be gated
--------------------------------------------------------------
...
pclk_vo_niu, hclk_vo_s_niu   hclk_vo_niu
...


Yeah, the dependency is this.

It may not be very detailed; I don't have permission to publish the detailed
NIU designs.






    The clock tree and NIU bus paths are designed independently,
    so there are three solutions to this problem:
    1. DTS adds a reference to hclk while referencing pclk.
    2. The dependent clock is always on, such as HCLK_VO0, but this is
    not friendly for system power.
    3. Create a non-clock-tree reference. Clk-link, for example: we have
    an implementation in our internal branch, but upstream is not sure
    how to push it.

I thought about something similar. That would help us here and on i.MX
we have a similar situation: We have one bit that switches multiple
clocks. That as well cannot be designed properly in the clock framework
currently, but could be modelled with a concept of linked clocks.

Doing this sounds like quite a bit of work and discussion though, I
don't really like having this as a dependency to mainline the VOP2
driver. I vote for 1. in that case, we could still ignore the hclk in
dts later when we have linked clocks.


OK so just to clarify, the issue is not just that the upstream clock 
driver doesn't currently model the NIU clocks, but that even if it did, 
it would still need to implement some new multi-parent clock type to 
manage the internal dependency? That's fair enough - I do think that 
improving the clock driver would be the best long-term goal, but for the 
scope of this series, having an interim workaround does seem more 
reasonable now that we understand *why* we need it.


Thanks,
Robin.


Re: [PATCH v7 11/24] dt-bindings: display: rockchip: dw-hdmi: Add additional clock

2022-03-09 Thread Robin Murphy

On 2022-02-25 07:51, Sascha Hauer wrote:

The rk3568 HDMI has an additional clock that needs to be enabled for the
HDMI controller to work. The purpose of that clock is not clear. It is
named "hclk" in the downstream driver, so use the same name.


Further to the clarification of the underlying purpose on the other 
thread, I suggest we call the new clock "niu" and describe it as 
something like "Additional clock required to ungate the bus interface on 
certain SoCs".


Cheers,
Robin.


Signed-off-by: Sascha Hauer 
Acked-by: Rob Herring 
---

Notes:
 Changes since v4:
 - Add Robs Ack

  .../bindings/display/rockchip/rockchip,dw-hdmi.yaml| 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git 
a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml 
b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
index 38ebb69830287..67a76f51637a7 100644
--- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
+++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
@@ -44,12 +44,13 @@ properties:
  items:
- {}
- {}
-  # The next three clocks are all optional, but shall be specified in this
+  # The next four clocks are all optional, but shall be specified in this
# order when present.
- description: The HDMI CEC controller main clock
- description: Power for GRF IO
- description: External clock for some HDMI PHY (old clock name, 
deprecated)
- description: External clock for some HDMI PHY (new name)
+  - description: hclk
  
clock-names:

  minItems: 2
@@ -61,13 +62,17 @@ properties:
- grf
- vpll
- ref
+  - hclk
- enum:
- grf
- vpll
- ref
+  - hclk
- enum:
- vpll
- ref
+  - hclk
+  - const: hclk
  
ddc-i2c-bus:

  $ref: /schemas/types.yaml#/definitions/phandle


Re: [PATCH] drm/bridge: synopsys/dw-hdmi: set cec clock rate

2022-03-14 Thread Robin Murphy

On 2022-03-13 12:56, Peter Geis wrote:

On Sun, Mar 13, 2022 at 6:13 AM Piotr Oniszczuk
 wrote:





Message from Peter Geis on 26.01.2022 at 21:24:

The hdmi-cec clock must be 32khz in order for cec to work correctly.
Ensure after enabling the clock we set it in order for the hardware to
work as expected.
Warn on failure, in case this is a static clock that is slightly off.
Fixes hdmi-cec support on Rockchip devices.

Fixes: ebe32c3e282a ("drm/bridge: synopsys/dw-hdmi: Enable cec clock")

Signed-off-by: Peter Geis 
---
drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 7 +++
1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c 
b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index 54d8fdad395f..1a96da60e357 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -48,6 +48,9 @@

#define HDMI14_MAX_TMDSCLK	340000000

+/* HDMI CEC needs a clock rate of 32khz */
+#define HDMI_CEC_CLK_RATE	32768
+
enum hdmi_datamap {
   RGB444_8B = 0x01,
   RGB444_10B = 0x03,
@@ -3347,6 +3350,10 @@ struct dw_hdmi *dw_hdmi_probe(struct platform_device 
*pdev,
   ret);
   goto err_iahb;
   }
+
+		ret = clk_set_rate(hdmi->cec_clk, HDMI_CEC_CLK_RATE);
+		if (ret)
+			dev_warn(hdmi->dev, "Cannot set HDMI cec clock rate: %d\n", ret);
   }

   /* Product and revision IDs */
--
2.25.1


___
Linux-rockchip mailing list
linux-rockc...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip


Peter,

On my 5.17-rc7 with the rk356x VOP2 v8 series applied, this patch makes CEC 
work on rk3566.
Unfortunately it breaks previously working CEC on the rk3399 rockpi-4b.

Reverting this patch brings back CEC on rk3399 - but then the rk3566 is left 
with non-working CEC

I'm not sure how to move forward with this


I was worried about that, thanks for testing it.
Can you send me the cec_clk rate before and after this patch?


Hmm, looks like there might be a bug in the RK3399 clock driver - 
although it's treated as having mux_pll_p as parents, according to the 
public TRM the CEC clock appears unique in being backwards compared to 
every other mux of those two inputs. I'm now tempted to test this on my 
board tonight and see if I can see 24MHz on the CEC pin... :)


Robin.
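The "slightly off" concern above can be made concrete with a small userspace model of the check: a fixed 32 kHz source a few percent off target is acceptable for CEC, while a grossly wrong rate (such as a misrouted 24 MHz mux input) is not. The 5% tolerance is an assumption for illustration, not a value taken from the dw-hdmi driver.

```c
#include <assert.h>
#include <stdlib.h>

#define CEC_TARGET_HZ 32768L

/* Nonzero if rate_hz is within 5% of the 32.768 kHz CEC target. */
int cec_rate_usable(long rate_hz)
{
	long diff = labs(rate_hz - CEC_TARGET_HZ);

	return diff * 100 <= CEC_TARGET_HZ * 5;
}
```

A driver-side version of this would compare clk_get_rate() against the target after clk_set_rate() and warn, rather than fail, for small deviations.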


Re: [PATCH] drm/bridge: synopsys/dw-hdmi: set cec clock rate

2022-03-14 Thread Robin Murphy

On 2022-03-14 11:31, Robin Murphy wrote:

On 2022-03-13 12:56, Peter Geis wrote:

On Sun, Mar 13, 2022 at 6:13 AM Piotr Oniszczuk
 wrote:




On 26.01.2022 at 21:24, Peter Geis wrote:


The hdmi-cec clock must be 32 kHz in order for CEC to work correctly.
Ensure that after enabling the clock we set its rate so the hardware
works as expected.
Warn on failure, in case this is a static clock that is slightly off.
Fixes hdmi-cec support on Rockchip devices.

Fixes: ebe32c3e282a ("drm/bridge: synopsys/dw-hdmi: Enable cec clock")

Signed-off-by: Peter Geis 
---
drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 7 +++
1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c

index 54d8fdad395f..1a96da60e357 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -48,6 +48,9 @@

#define HDMI14_MAX_TMDSCLK	340000000

+/* HDMI CEC needs a clock rate of 32khz */
+#define HDMI_CEC_CLK_RATE    32768
+
enum hdmi_datamap {
   RGB444_8B = 0x01,
   RGB444_10B = 0x03,
@@ -3347,6 +3350,10 @@ struct dw_hdmi *dw_hdmi_probe(struct 
platform_device *pdev,

   ret);
   goto err_iahb;
   }
+
+		ret = clk_set_rate(hdmi->cec_clk, HDMI_CEC_CLK_RATE);
+		if (ret)
+			dev_warn(hdmi->dev, "Cannot set HDMI cec clock rate: %d\n", ret);

   }

   /* Product and revision IDs */
--
2.25.1




Peter,

On my 5.17-rc7 with the rk356x VOP2 v8 series applied, this patch makes 
CEC work on rk3566.

Unfortunately it breaks previously working CEC on the rk3399 rockpi-4b.

Reverting this patch brings back CEC on rk3399 - but then the rk3566 is 
left with non-working CEC


I'm not sure how to move forward with this


I was worried about that, thanks for testing it.
Can you send me the cec_clk rate before and after this patch?


Hmm, looks like there might be a bug in the RK3399 clock driver - 
although it's treated as having mux_pll_p as parents, according to the 
public TRM the CEC clock appears unique in being backwards compared to 
every other mux of those two inputs. I'm now tempted to test this on my 
board tonight and see if I can see 24MHz on the CEC pin... :)


Nope, turns out that's an error in the TRM and the mux is fine. The bug 
is between the clock driver blindly assuming xin32k is usable, and 
nearly all RK3399 boards forgetting to claim the pinctrl to actually 
enable the input :/


Robin.


Re: [PATCH v2 4/8] drm/virtio: Improve DMA API usage for shmem BOs

2022-03-16 Thread Robin Murphy

On 2022-03-14 22:42, Dmitry Osipenko wrote:

DRM API requires the DRM's driver to be backed with the device that can
be used for generic DMA operations. The VirtIO-GPU device can't perform
DMA operations if it uses PCI transport because the PCI device driver creates
a virtual VirtIO-GPU device that isn't associated with the PCI device. Use PCI's
GPU device for the DRM's device instead of the VirtIO-GPU device and drop
DMA-related hacks from the VirtIO-GPU driver.

Signed-off-by: Dmitry Osipenko 
---
  drivers/gpu/drm/virtio/virtgpu_drv.c| 22 +++---
  drivers/gpu/drm/virtio/virtgpu_drv.h|  5 +--
  drivers/gpu/drm/virtio/virtgpu_kms.c|  7 ++--
  drivers/gpu/drm/virtio/virtgpu_object.c | 56 +
  drivers/gpu/drm/virtio/virtgpu_vq.c | 13 +++---
  5 files changed, 37 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 5f25a8d15464..8449dad3e65c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -46,9 +46,9 @@ static int virtio_gpu_modeset = -1;
  MODULE_PARM_DESC(modeset, "Disable/Enable modesetting");
  module_param_named(modeset, virtio_gpu_modeset, int, 0400);
  
-static int virtio_gpu_pci_quirk(struct drm_device *dev, struct virtio_device *vdev)

+static int virtio_gpu_pci_quirk(struct drm_device *dev)
  {
-   struct pci_dev *pdev = to_pci_dev(vdev->dev.parent);
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
const char *pname = dev_name(&pdev->dev);
bool vga = (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
char unique[20];
@@ -101,6 +101,7 @@ static int virtio_gpu_pci_quirk(struct drm_device *dev, struct virtio_device *vd
  static int virtio_gpu_probe(struct virtio_device *vdev)
  {
struct drm_device *dev;
+   struct device *dma_dev;
int ret;
  
  	if (drm_firmware_drivers_only() && virtio_gpu_modeset == -1)

@@ -109,18 +110,29 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
if (virtio_gpu_modeset == 0)
return -EINVAL;
  
-	dev = drm_dev_alloc(&driver, &vdev->dev);

+   /*
+* If GPU's parent is a PCI device, then we will use this PCI device
+* for the DRM's driver device because GPU won't have PCI's IOMMU DMA
+* ops in this case since GPU device is sitting on a separate (from PCI)
+* virtio-bus.
+*/
+   if (!strcmp(vdev->dev.parent->bus->name, "pci"))


Nit: dev_is_pci() ?

However, what about other VirtIO transports? Wouldn't virtio-mmio with 
F_ACCESS_PLATFORM be in a similar situation?


Robin.


+   dma_dev = vdev->dev.parent;
+   else
+   dma_dev = &vdev->dev;
+
+   dev = drm_dev_alloc(&driver, dma_dev);
if (IS_ERR(dev))
return PTR_ERR(dev);
vdev->priv = dev;
  
  	if (!strcmp(vdev->dev.parent->bus->name, "pci")) {

-   ret = virtio_gpu_pci_quirk(dev, vdev);
+   ret = virtio_gpu_pci_quirk(dev);
if (ret)
goto err_free;
}
  
-	ret = virtio_gpu_init(dev);

+   ret = virtio_gpu_init(vdev, dev);
if (ret)
goto err_free;
  
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h

index 0a194aaad419..b2d93cb12ebf 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -100,8 +100,6 @@ struct virtio_gpu_object {
  
  struct virtio_gpu_object_shmem {

struct virtio_gpu_object base;
-   struct sg_table *pages;
-   uint32_t mapped;
  };
  
  struct virtio_gpu_object_vram {

@@ -214,7 +212,6 @@ struct virtio_gpu_drv_cap_cache {
  };
  
  struct virtio_gpu_device {

-   struct device *dev;
struct drm_device *ddev;
  
  	struct virtio_device *vdev;

@@ -282,7 +279,7 @@ extern struct drm_ioctl_desc 
virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
  void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file);
  
  /* virtgpu_kms.c */

-int virtio_gpu_init(struct drm_device *dev);
+int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev);
  void virtio_gpu_deinit(struct drm_device *dev);
  void virtio_gpu_release(struct drm_device *dev);
  int virtio_gpu_driver_open(struct drm_device *dev, struct drm_file *file);
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c b/drivers/gpu/drm/virtio/virtgpu_kms.c
index 3313b92db531..0d1e3eb61bee 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -110,7 +110,7 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device *vgdev,
vgdev->num_capsets = num_capsets;
  }
  
-int virtio_gpu_init(struct drm_device *dev)

+int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev)
  {
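The dev_is_pci() suggestion above boils down to a bus-type pointer comparison instead of a bus-name string comparison. Below is a minimal userspace sketch with simplified stand-in structures, not the kernel's real struct device:

```c
#include <assert.h>
#include <stddef.h>

struct bus_type { const char *name; };
struct device { const struct bus_type *bus; };

struct bus_type pci_bus_type = { "pci" };

/* The kernel's dev_is_pci() is essentially this pointer compare,
 * which is cheaper and less fragile than strcmp on the bus name. */
int dev_is_pci(const struct device *dev)
{
	return dev->bus == &pci_bus_type;
}
```

In the patch above, `!strcmp(vdev->dev.parent->bus->name, "pci")` would become `dev_is_pci(vdev->dev.parent)`.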

Re: [PATCH v8 09/24] drm/rockchip: dw_hdmi: Add support for niu clk

2022-03-16 Thread Robin Murphy
eally 
caring about power efficiency, then arguably the simplest and cleanest 
workaround would be the other option that Elaine mentioned, of just 
marking hclk_vo as critical for now. If it's likely to turn into a 
"nothing's as permanent as a temporary fix" situation, though, then the 
DT binding has less functional impact, even if it does leave us 
developers with baggage down the line.


Robin.



I have last questions..

1. Previously you said that the PD driver takes care of enabling all the
clocks it can find in the device by itself on RPM-resume, then why HDMI
driver needs to enable the clock explicitly?

2. You added clk_prepare_enable(), but there is no corresponding
clk_disable_unprepare(), AFAICS. Why?


Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

2022-03-23 Thread Robin Murphy

On 2022-03-23 07:15, Christian König wrote:

On 22.03.22 at 10:34, Cong Liu wrote:

qxl uses ioremap to map ram_header and rom; in the arm64 implementation,
the device is mapped as DEVICE_nGnRE, which cannot support unaligned
access.


Well, that some ARM boards don't allow unaligned access to MMIO space 
is a well-known bug of those ARM boards.


So as far as I know this is a hardware bug you are trying to workaround 
here and I'm not 100% sure that this is correct.


No, this one's not a bug. The Device memory type used for iomem mappings 
is *architecturally* defined to enforce properties like aligned 
accesses, no speculation, no reordering, etc. If something wants to be 
treated more like RAM than actual MMIO registers, then ioremap_wc() or 
ioremap_cache() is the appropriate thing to do in general (with the 
former being a bit more portable according to 
Documentation/driver-api/device-io.rst).


Of course *then* you might find that on some systems the 
interconnect/PCIe implementation/endpoint doesn't actually like 
unaligned accesses once the CPU MMU starts allowing them to be sent out, 
but hey, one step at a time ;)


Robin.



Christian.



[    6.620515] pc : setup_hw_slot+0x24/0x60 [qxl]
[    6.620961] lr : setup_slot+0x34/0xf0 [qxl]
[    6.621376] sp : 800012b73760
[    6.621701] x29: 800012b73760 x28: 0001 x27: 
1000
[    6.622400] x26: 0001 x25: 0400 x24: 
cf376848c000
[    6.623099] x23: c4087400 x22: cf3718e17140 x21: 

[    6.623823] x20: c4086000 x19: c40870b0 x18: 
0014
[    6.624519] x17: 4d3605ab x16: bb3b6129 x15: 
6e771809
[    6.625214] x14: 0001 x13: 007473696c5f7974 x12: 
696e69615f65
[    6.625909] x11: d543656a x10:  x9 : 
cf3718e085a4
[    6.626616] x8 : 006c7871 x7 : 000a x6 : 
0017
[    6.627343] x5 : 1400 x4 : 800011f63400 x3 : 
1400
[    6.628047] x2 :  x1 : c40870b0 x0 : 
c4086000

[    6.628751] Call trace:
[    6.628994]  setup_hw_slot+0x24/0x60 [qxl]
[    6.629404]  setup_slot+0x34/0xf0 [qxl]
[    6.629790]  qxl_device_init+0x6f0/0x7f0 [qxl]
[    6.630235]  qxl_pci_probe+0xdc/0x1d0 [qxl]
[    6.630654]  local_pci_probe+0x48/0xb8
[    6.631027]  pci_device_probe+0x194/0x208
[    6.631464]  really_probe+0xd0/0x458
[    6.631818]  __driver_probe_device+0x124/0x1c0
[    6.632256]  driver_probe_device+0x48/0x130
[    6.632669]  __driver_attach+0xc4/0x1a8
[    6.633049]  bus_for_each_dev+0x78/0xd0
[    6.633437]  driver_attach+0x2c/0x38
[    6.633789]  bus_add_driver+0x154/0x248
[    6.634168]  driver_register+0x6c/0x128
[    6.635205]  __pci_register_driver+0x4c/0x58
[    6.635628]  qxl_init+0x48/0x1000 [qxl]
[    6.636013]  do_one_initcall+0x50/0x240
[    6.636390]  do_init_module+0x60/0x238
[    6.636768]  load_module+0x2458/0x2900
[    6.637136]  __do_sys_finit_module+0xbc/0x128
[    6.637561]  __arm64_sys_finit_module+0x28/0x38
[    6.638004]  invoke_syscall+0x74/0xf0
[    6.638366]  el0_svc_common.constprop.0+0x58/0x1a8
[    6.638836]  do_el0_svc+0x2c/0x90
[    6.639216]  el0_svc+0x40/0x190
[    6.639526]  el0t_64_sync_handler+0xb0/0xb8
[    6.639934]  el0t_64_sync+0x1a4/0x1a8
[    6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083)
[    6.640889] ---[ end trace 95615d89b7c87f95 ]---

Signed-off-by: Cong Liu 
---
  drivers/gpu/drm/qxl/qxl_kms.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c

index 4dc5ad13f12c..0e61ac04d8ad 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev,
   (int)qdev->surfaceram_size / 1024,
   (sb == 4) ? "64bit" : "32bit");
+#ifdef CONFIG_ARM64
+    qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size);
+#else
  qdev->rom = ioremap(qdev->rom_base, qdev->rom_size);
+#endif
  if (!qdev->rom) {
  pr_err("Unable to ioremap ROM\n");
  r = -ENOMEM;
@@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev,
  goto rom_unmap;
  }
+#ifdef CONFIG_ARM64
+    qdev->ram_header = ioremap_cache(qdev->vram_base +
+   qdev->rom->ram_header_offset,
+   sizeof(*qdev->ram_header));
+#else
  qdev->ram_header = ioremap(qdev->vram_base +
 qdev->rom->ram_header_offset,
 sizeof(*qdev->ram_header));
+#endif
  if (!qdev->ram_header) {
  DRM_ERROR("Unable to ioremap RAM header\n");
  r = -ENOMEM;




Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

2022-03-23 Thread Robin Murphy

On 2022-03-23 10:11, Gerd Hoffmann wrote:

On Wed, Mar 23, 2022 at 09:45:13AM +, Robin Murphy wrote:

On 2022-03-23 07:15, Christian König wrote:

On 22.03.22 at 10:34, Cong Liu wrote:

qxl uses ioremap to map ram_header and rom; in the arm64 implementation,
the device is mapped as DEVICE_nGnRE, which cannot support unaligned
access.


Well, that some ARM boards don't allow unaligned access to MMIO space
is a well-known bug of those ARM boards.

So as far as I know this is a hardware bug you are trying to workaround
here and I'm not 100% sure that this is correct.


No, this one's not a bug. The Device memory type used for iomem mappings is
*architecturally* defined to enforce properties like aligned accesses, no
speculation, no reordering, etc. If something wants to be treated more like
RAM than actual MMIO registers, then ioremap_wc() or ioremap_cache() is the
appropriate thing to do in general (with the former being a bit more
portable according to Documentation/driver-api/device-io.rst).


Well, qxl is a virtual device, so it *is* ram.

I'm wondering whether qxl actually works on arm at all?  As far as I know all
virtual display devices with (virtual) pci memory bars for vram do not
work on arm due to the guest mapping vram as io memory and the host
mapping vram as normal ram and the mapping attribute mismatch causes
caching troubles (only noticeable on real arm hardware, not in
emulation).  Did something change here recently?


Indeed, Armv8.4 introduced the S2FWB feature to cope with situations 
like this - essentially it allows the hypervisor to share RAM-backed 
pages with the guest without losing coherency regardless of how the 
guest maps them.


Robin.


Re: [PATCH v3 1/9] dt-bindings: host1x: Add memory-contexts property

2022-02-21 Thread Robin Murphy

On 2022-02-18 11:39, Mikko Perttunen via iommu wrote:

Add schema information for the memory-contexts property used to
specify context stream IDs. This uses the standard iommu-map property
inside a child node.


Couldn't you simply make "iommu-map" an allowed property on the host1x 
node itself? From a DT perspective I'm not sure the intermediate node 
really fits meaningfully, and I can't see that it serves much purpose in 
practice either, other than perhaps defeating fw_devlink.


Robin.


Signed-off-by: Mikko Perttunen 
---
v3:
* New patch
---
  .../bindings/display/tegra/nvidia,tegra20-host1x.yaml  | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
index 4fd513efb0f7..3ac0fde54a16 100644
--- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
+++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
@@ -144,6 +144,16 @@ allOf:
  reset-names:
maxItems: 1
  
+memory-contexts:

+  type: object
+  properties:
+iommu-map:
+  description: Specification of stream IDs available for memory context device
+use. Should be a mapping of IDs 0..n to IOMMU entries corresponding to
+usable stream IDs.
+  required:
+- iommu-map
+
required:
  - reg-names
  


Re: [PATCH v3 1/9] dt-bindings: host1x: Add memory-contexts property

2022-02-21 Thread Robin Murphy

On 2022-02-21 15:28, Mikko Perttunen wrote:

On 2/21/22 17:23, Robin Murphy wrote:

On 2022-02-18 11:39, Mikko Perttunen via iommu wrote:

Add schema information for the memory-contexts property used to
specify context stream IDs. This uses the standard iommu-map property
inside a child node.


Couldn't you simply make "iommu-map" an allowed property on the host1x 
node itself? From a DT perspective I'm not sure the intermediate node 
really fits meaningfully, and I can't see that it serves much purpose 
in practice either, other than perhaps defeating fw_devlink.


Robin.


The stream IDs described here are not used by the host1x device itself, 
so I don't think I can. Host1x's memory transactions still go through 
the stream ID specified in its 'iommus' property, these stream IDs are 
used by engines (typically in addition to the stream ID specified in 
their own nodes).


Host1x 'iommus' -- Channel commands
Engine 'iommus' -- Engine firmware (and data if context isolation is not 
enabled)

memory-contexts 'iommu-map' -- Data used by engines.


Right, that still appears to match my understanding, that as far as 
software sees, the host1x is effectively acting as a bridge to the 
engines in itself. Even if it's not physically routing traffic in and/or 
out, the host1x device is the place where the context IDs *logically* 
exist, and thus owns the mapping between context IDs and the StreamIDs 
emitted by any engine working in a given context.


Consider a PCIe root complex with integrated endpoints - chances are the 
RCiEPs have their own physical interfaces to issue DMA directly into the 
SoC interconnect, but that doesn't change how we describe the PCI 
Requester ID to StreamID mapping at the root complex, since the RC still 
logically owns the RID space. You can think of a RID as being "consumed" 
at the RC by indexing into config space to ultimately gain control of 
the corresponding endpoint, just like context IDs are "consumed" at the 
 host1x by generating commands to ultimately cause some engine to 
operate in the correct address space.


You don't have to pretend the host1x uses a context for its own 
command-fetching (or whatever) traffic either - it's always been 
intended that the "iommus" and "iommu-map" properties should happily be 
able to coexist on the same node, since they serve distinctly different 
purposes. If it doesn't work in practice then we've got a bug to fix 
somewhere.


If the context-switching mechanism was some distinct self-contained 
thing bolted on beside the other host1x functionality then describing it 
as a separate level of DT hierarchy might be more justifiable, but 
that's not the impression I'm getting from skimming the rest of the 
series. Just reading of the names of things in patch #6, my intuitive 
reaction is that clearly each host1x owns 9 StreamIDs, one for general 
stuff and 8 for contexts. Adding the knowledge that technically the 
context StreamIDs end up delegated to other host1x-controlled engines 
still doesn't shift the paradigm. I don't believe we need a level of DT 
structure purely to help document what the iommu-map means for host1x - 
the binding can do that just fine.


Thanks,
Robin.
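The iommu-map translation being discussed can be sketched as a range-based lookup: each map entry takes a window of local IDs (here, context IDs or Requester IDs) to a window of output StreamIDs. This is an illustrative userspace model, not the kernel's of_map_id() implementation:

```c
#include <assert.h>
#include <stddef.h>

struct id_map_entry {
	unsigned int id_base;	/* first local ID covered */
	unsigned int out_base;	/* first output StreamID */
	unsigned int length;	/* number of IDs in the window */
};

/* Translate a local ID to a StreamID; -1 if no entry covers it. */
long map_id(const struct id_map_entry *map, size_t n, unsigned int id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (id >= map[i].id_base &&
		    id - map[i].id_base < map[i].length)
			return (long)map[i].out_base + (id - map[i].id_base);
	return -1;
}
```

With an iommu-map of `<0 &smmu 0x100 8>`, context ID 3 would translate to StreamID 0x103 under this model.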

(Perhaps I should add this information to various places in more 
abundance and clarity.)


Mikko




Signed-off-by: Mikko Perttunen 
---
v3:
* New patch
---
  .../bindings/display/tegra/nvidia,tegra20-host1x.yaml  | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
index 4fd513efb0f7..3ac0fde54a16 100644
--- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
+++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
@@ -144,6 +144,16 @@ allOf:
  reset-names:
    maxItems: 1
+    memory-contexts:
+  type: object
+  properties:
+    iommu-map:
+  description: Specification of stream IDs available for 
memory context device
+    use. Should be a mapping of IDs 0..n to IOMMU 
entries corresponding to

+    usable stream IDs.
+  required:
+    - iommu-map
+
    required:
  - reg-names




Re: [PATCH v3 8/9] drm/tegra: vic: Implement get_streamid_offset

2022-02-21 Thread Robin Murphy

On 2022-02-18 11:39, Mikko Perttunen via iommu wrote:

Implement the get_streamid_offset required for supporting context
isolation. Since old firmware cannot support context isolation
without hacks that we don't want to implement, check the firmware
binary to see if context isolation should be enabled.

Signed-off-by: Mikko Perttunen 
---
  drivers/gpu/drm/tegra/vic.c | 38 +
  1 file changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index 1e342fa3d27b..2863ee5e0e67 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -38,6 +38,8 @@ struct vic {
struct clk *clk;
struct reset_control *rst;
  
+	bool can_use_context;

+
/* Platform configuration */
const struct vic_config *config;
  };
@@ -229,6 +231,7 @@ static int vic_load_firmware(struct vic *vic)
  {
struct host1x_client *client = &vic->client.base;
struct tegra_drm *tegra = vic->client.drm;
+   u32 fce_bin_data_offset;
dma_addr_t iova;
size_t size;
void *virt;
@@ -277,6 +280,25 @@ static int vic_load_firmware(struct vic *vic)
vic->falcon.firmware.phys = phys;
}
  
+	/*

+* Check if firmware is new enough to not require mapping firmware
+* to data buffer domains.
+*/
+   fce_bin_data_offset = *(u32 *)(virt + VIC_UCODE_FCE_DATA_OFFSET);
+
+   if (!vic->config->supports_sid) {
+   vic->can_use_context = false;
+	} else if (fce_bin_data_offset != 0x0 && fce_bin_data_offset != 0xa5a5a5a5) {
+   /*
+* Firmware will access FCE through STREAMID0, so context
+* isolation cannot be used.
+*/
+   vic->can_use_context = false;
+		dev_warn_once(vic->dev, "context isolation disabled due to old firmware\n");
+   } else {
+   vic->can_use_context = true;
+   }
+
return 0;
  
  cleanup:

@@ -358,10 +380,26 @@ static void vic_close_channel(struct tegra_drm_context *context)
host1x_channel_put(context->channel);
  }
  
+static int vic_get_streamid_offset(struct tegra_drm_client *client)

+{
+   struct vic *vic = to_vic(client);
+   int err;
+
+   err = vic_load_firmware(vic);
+   if (err < 0)
+   return err;
+
+   if (vic->can_use_context)
+   return 0x30;
+   else
+   return -ENOTSUPP;
+}
+
  static const struct tegra_drm_client_ops vic_ops = {
.open_channel = vic_open_channel,
.close_channel = vic_close_channel,
.submit = tegra_drm_submit,
+   .get_streamid_offset = vic_get_streamid_offset,


The patch order seems off here, since the .get_streamid_offset member 
isn't defined yet.


Robin.


  };
  
  #define NVIDIA_TEGRA_124_VIC_FIRMWARE "nvidia/tegra124/vic03_ucode.bin"


Re: [PATCH v3 2/9] gpu: host1x: Add context bus

2022-02-22 Thread Robin Murphy

On 2022-02-22 16:21, Christoph Hellwig wrote:

On Fri, Feb 18, 2022 at 01:39:45PM +0200, Mikko Perttunen wrote:

The context bus is a "dummy" bus that contains struct devices that
correspond to IOMMU contexts assigned through Host1x to processes.

Even when host1x itself is built as a module, the bus is registered
in built-in code so that the built-in ARM SMMU driver is able to
reference it.

Signed-off-by: Mikko Perttunen 
---
  drivers/gpu/Makefile   |  3 +--
  drivers/gpu/host1x/Kconfig |  5 +
  drivers/gpu/host1x/Makefile|  1 +
  drivers/gpu/host1x/context_bus.c   | 31 ++
  include/linux/host1x_context_bus.h | 15 +++
  5 files changed, 53 insertions(+), 2 deletions(-)
  create mode 100644 drivers/gpu/host1x/context_bus.c
  create mode 100644 include/linux/host1x_context_bus.h

diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
index 835c88318cec..8997f0096545 100644
--- a/drivers/gpu/Makefile
+++ b/drivers/gpu/Makefile
@@ -2,7 +2,6 @@
  # drm/tegra depends on host1x, so if both drivers are built-in care must be
  # taken to initialize them in the correct order. Link order is the only way
  # to ensure this currently.
-obj-$(CONFIG_TEGRA_HOST1X) += host1x/
-obj-y  += drm/ vga/
+obj-y  += host1x/ drm/ vga/
  obj-$(CONFIG_IMX_IPUV3_CORE)  += ipu-v3/
  obj-$(CONFIG_TRACE_GPU_MEM)   += trace/
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index 6815b4db17c1..1861a8180d3f 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -1,8 +1,13 @@
  # SPDX-License-Identifier: GPL-2.0-only
+
+config TEGRA_HOST1X_CONTEXT_BUS
+   bool
+
  config TEGRA_HOST1X
tristate "NVIDIA Tegra host1x driver"
depends on ARCH_TEGRA || (ARM && COMPILE_TEST)
select DMA_SHARED_BUFFER
+   select TEGRA_HOST1X_CONTEXT_BUS
select IOMMU_IOVA
help
  Driver for the NVIDIA Tegra host1x hardware.
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index d2b6f7de0498..c891a3e33844 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -18,3 +18,4 @@ host1x-y = \
hw/host1x07.o
  
  obj-$(CONFIG_TEGRA_HOST1X) += host1x.o

+obj-$(CONFIG_TEGRA_HOST1X_CONTEXT_BUS) += context_bus.o
diff --git a/drivers/gpu/host1x/context_bus.c b/drivers/gpu/host1x/context_bus.c
new file mode 100644
index ..2625914f3c7d
--- /dev/null
+++ b/drivers/gpu/host1x/context_bus.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021, NVIDIA Corporation.
+ */
+
+#include 
+#include 
+
+struct bus_type host1x_context_device_bus_type = {
+   .name = "host1x-context",
+};
+EXPORT_SYMBOL(host1x_context_device_bus_type);


EXPORT_SYMBOL_GPL, please.

But the pattern that this copies in arm_smmu_bus_init is really
ugly.  I think we need to figure out a way todo that without having
to export all the low-level bus types.


Yup, as it happens that was the first step on my mission :)

https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus

Still a way to go with the main meat of that work, though, so I was 
figuring this could probably land as-is and I'll sweep it up in due course.


Robin.


Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk

2022-02-25 Thread Robin Murphy

On 2022-02-25 11:10, Dmitry Osipenko wrote:

On 25.02.2022 13:49, Sascha Hauer wrote:

On Fri, Feb 25, 2022 at 01:26:14PM +0300, Dmitry Osipenko wrote:

On 25.02.2022 10:51, Sascha Hauer wrote:

The rk3568 HDMI has an additional clock that needs to be enabled for the
HDMI controller to work. The purpose of that clock is not clear. It is
named "hclk" in the downstream driver, so use the same name.

Signed-off-by: Sascha Hauer 
---

Notes:
 Changes since v5:
 - Use devm_clk_get_optional rather than devm_clk_get

  drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 16 
  1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
index fe4f9556239ac..c6c00e8779ab5 100644
--- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
@@ -76,6 +76,7 @@ struct rockchip_hdmi {
const struct rockchip_hdmi_chip_data *chip_data;
struct clk *ref_clk;
struct clk *grf_clk;
+   struct clk *hclk_clk;
struct dw_hdmi *hdmi;
struct regulator *avdd_0v9;
struct regulator *avdd_1v8;
@@ -229,6 +230,14 @@ static int rockchip_hdmi_parse_dt(struct rockchip_hdmi *hdmi)
return PTR_ERR(hdmi->grf_clk);
}
  
+	hdmi->hclk_clk = devm_clk_get_optional(hdmi->dev, "hclk");

+   if (PTR_ERR(hdmi->hclk_clk) == -EPROBE_DEFER) {


Have you tried to investigate the hclk? I'm still thinking that's not
only HDMI that needs this clock and then the hardware description
doesn't look correct.


I am still not sure what you mean. Yes, it's not only the HDMI that
needs this clock. The VOP2 needs it as well and the driver handles that.


I'm curious whether DSI/DP also need that clock to be enabled. If they
do, then you aren't modeling h/w properly AFAICS.


Assuming nobody at Rockchip decided to make things needlessly 
inconsistent with previous SoCs, HCLK_VOP should be the clock for the 
VOP's AHB slave interface. Usually, if that affected anything other than 
accessing VOP registers, indeed it would smell of something being wrong 
in the clock tree, but in this case I'd also be suspicious of whether it 
might have ended up clocking related GRF registers as well (either 
directly, or indirectly via some gate that the clock driver hasn't 
modelled yet).


If the symptom of not claiming HCLK_VOP is hanging on some register 
access in the HDMI driver while the VOP is idle, then it should be 
relatively straightforward to narrow down with some logging, and see if 
it looks like this is really just another "grf" clock. If not, then 
we're back to suspecting something more insidiously wrong elsewhere.


Robin.


Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk

2022-02-25 Thread Robin Murphy

On 2022-02-25 13:11, Sascha Hauer wrote:

On Fri, Feb 25, 2022 at 12:41:23PM +, Robin Murphy wrote:

On 2022-02-25 11:10, Dmitry Osipenko wrote:

On 25.02.2022 13:49, Sascha Hauer wrote:

On Fri, Feb 25, 2022 at 01:26:14PM +0300, Dmitry Osipenko wrote:

On 25.02.2022 10:51, Sascha Hauer wrote:

The rk3568 HDMI has an additional clock that needs to be enabled for the
HDMI controller to work. The purpose of that clock is not clear. It is
named "hclk" in the downstream driver, so use the same name.

Signed-off-by: Sascha Hauer 
---

Notes:
  Changes since v5:
  - Use devm_clk_get_optional rather than devm_clk_get

   drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 16 
   1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
index fe4f9556239ac..c6c00e8779ab5 100644
--- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
@@ -76,6 +76,7 @@ struct rockchip_hdmi {
const struct rockchip_hdmi_chip_data *chip_data;
struct clk *ref_clk;
struct clk *grf_clk;
+   struct clk *hclk_clk;
struct dw_hdmi *hdmi;
struct regulator *avdd_0v9;
struct regulator *avdd_1v8;
@@ -229,6 +230,14 @@ static int rockchip_hdmi_parse_dt(struct rockchip_hdmi *hdmi)
return PTR_ERR(hdmi->grf_clk);
}
+   hdmi->hclk_clk = devm_clk_get_optional(hdmi->dev, "hclk");
+   if (PTR_ERR(hdmi->hclk_clk) == -EPROBE_DEFER) {


Have you tried to investigate the hclk? I'm still thinking that's not
only HDMI that needs this clock and then the hardware description
doesn't look correct.


I am still not sure what you mean. Yes, it's not only the HDMI that
needs this clock. The VOP2 needs it as well and the driver handles that.


I'm curious whether DSI/DP also need that clock to be enabled. If they
do, then you aren't modeling h/w properly AFAICS.


Assuming nobody at Rockchip decided to make things needlessly inconsistent
with previous SoCs, HCLK_VOP should be the clock for the VOP's AHB slave
interface. Usually, if that affected anything other than accessing VOP
registers, indeed it would smell of something being wrong in the clock tree,
but in this case I'd also be suspicious of whether it might have ended up
clocking related GRF registers as well (either directly, or indirectly via
some gate that the clock driver hasn't modelled yet).


Ok, I am beginning to understand. I verified that hdmi, mipi and dp are
hanging when HCLK_VOP is disabled by disabling that clock via sysfs
using CLOCK_ALLOW_WRITE_DEBUGFS. When it's disabled then the registers
of that units can't be accessed. However, when I disable HCLK_VOP by
directly writing to the gate bit RK3568_CLKGATE_CON(20) then only
accessing VOP registers hangs, the other units stay functional.
So it seems it must be the parent clock which must be enabled. The
parent clock is hclk_vo. This clock should be handled as part of the
RK3568_PD_VO power domain:

power-domain@RK3568_PD_VO {
 reg = ;
 clocks = <&cru HCLK_VO>,
  <&cru PCLK_VO>,
  <&cru ACLK_VOP_PRE>;
  pm_qos = <&qos_hdcp>,
   <&qos_vop_m0>,
   <&qos_vop_m1>;
  #power-domain-cells = <0>;
 };

The HDMI controller is part of that domain, so I think this should work,
but it doesn't. That's where I am now, I'll have a closer look.


Ah, interesting. Looking at the clock driver, I'd also be suspicious 
whether pclk_vo is somehow messed up such that we're currently relying 
on hclk_vo to keep the common grandparent enabled. Seems like the DSI 
and eDP (and HDCP if anyone ever used it) registers would be similarly 
affected if so, and sure enough they both have a similarly suspect extra 
"hclk" in the downstream DT too.


Robin.


Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range

2022-02-25 Thread Robin Murphy
[ +arm64 maintainers for their awareness, which would have been a good 
thing to do from the start ]


On 2022-02-25 03:24, Michael Cheng wrote:

Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou
performs a flush by first performing a clean, followed by an invalidation
operation.

v2 (Michael Cheng): Use correct macro for cleaning and invalidation the
dcache. Thanks Tvrtko for the suggestion.

v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h

v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc as a
symbol that could be use by other modules, thus use
caches_clean_inval_pou instead. Also this version
removes include for cacheflush, since it's already
included based on architecture type.

Signed-off-by: Michael Cheng 
Reviewed-by: Matt Roper 
---
  drivers/gpu/drm/drm_cache.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index c3e6e615bf09..81c28714f930 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned long length)
  
  	if (wbinvd_on_all_cpus())

pr_err("Timed out waiting for cache flush\n");
+
+#elif defined(CONFIG_ARM64)
+   void *end = addr + length;
+   caches_clean_inval_pou((unsigned long)addr, (unsigned long)end);


Why does i915 need to ensure the CPU's instruction cache is coherent 
with its data cache? Is it a self-modifying driver?


Robin.

(Note that the above is somewhat of a loaded question, and I do actually 
have half an idea of what you're trying to do here and why it won't fly, 
but I'd like to at least assume you've read the documentation of the 
function you decided was OK to use)



+
  #else
WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
  #endif


Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk

2022-03-01 Thread Robin Murphy

On 2022-02-28 14:19, Sascha Hauer wrote:

On Fri, Feb 25, 2022 at 02:11:54PM +0100, Sascha Hauer wrote:

On Fri, Feb 25, 2022 at 12:41:23PM +, Robin Murphy wrote:

On 2022-02-25 11:10, Dmitry Osipenko wrote:

25.02.2022 13:49, Sascha Hauer wrote:

On Fri, Feb 25, 2022 at 01:26:14PM +0300, Dmitry Osipenko wrote:

25.02.2022 10:51, Sascha Hauer wrote:

The rk3568 HDMI has an additional clock that needs to be enabled for the
HDMI controller to work. The purpose of that clock is not clear. It is
named "hclk" in the downstream driver, so use the same name.

Signed-off-by: Sascha Hauer 
---

Notes:
  Changes since v5:
  - Use devm_clk_get_optional rather than devm_clk_get

   drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 16 
   1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
index fe4f9556239ac..c6c00e8779ab5 100644
--- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
@@ -76,6 +76,7 @@ struct rockchip_hdmi {
const struct rockchip_hdmi_chip_data *chip_data;
struct clk *ref_clk;
struct clk *grf_clk;
+   struct clk *hclk_clk;
struct dw_hdmi *hdmi;
struct regulator *avdd_0v9;
struct regulator *avdd_1v8;
@@ -229,6 +230,14 @@ static int rockchip_hdmi_parse_dt(struct rockchip_hdmi 
*hdmi)
return PTR_ERR(hdmi->grf_clk);
}
+   hdmi->hclk_clk = devm_clk_get_optional(hdmi->dev, "hclk");
+   if (PTR_ERR(hdmi->hclk_clk) == -EPROBE_DEFER) {


Have you tried to investigate the hclk? I'm still thinking that's not
only HDMI that needs this clock and then the hardware description
doesn't look correct.


I am still not sure what you mean. Yes, it's not only the HDMI that
needs this clock. The VOP2 needs it as well and the driver handles that.


I'm curious whether DSI/DP also need that clock to be enabled. If they
do, then you aren't modeling h/w properly AFAICS.


Assuming nobody at Rockchip decided to make things needlessly inconsistent
with previous SoCs, HCLK_VOP should be the clock for the VOP's AHB slave
interface. Usually, if that affected anything other than accessing VOP
registers, indeed it would smell of something being wrong in the clock tree,
but in this case I'd also be suspicious of whether it might have ended up
clocking related GRF registers as well (either directly, or indirectly via
some gate that the clock driver hasn't modelled yet).


Ok, I am beginning to understand. I verified that hdmi, mipi and dp are
hanging when HCLK_VOP is disabled by disabling that clock via sysfs
using CLOCK_ALLOW_WRITE_DEBUGFS. When it's disabled then the registers
of that units can't be accessed. However, when I disable HCLK_VOP by
directly writing to the gate bit RK3568_CLKGATE_CON(20) then only
accessing VOP registers hangs, the other units stay functional.
So it seems it must be the parent clock which must be enabled. The
parent clock is hclk_vo. This clock should be handled as part of the
RK3568_PD_VO power domain:

power-domain@RK3568_PD_VO {
 reg = ;
 clocks = <&cru HCLK_VO>,
  <&cru PCLK_VO>,
  <&cru ACLK_VOP_PRE>;
  pm_qos = <&qos_hdcp>,
   <&qos_vop_m0>,
   <&qos_vop_m1>;
  #power-domain-cells = <0>;
 };


Forget this. The clocks in this node are only enabled during enabling or
disabling the power domain, they are disabled again immediately afterwards.

OK, I need HCLK_VO to access the HDMI registers. I verified that by
disabling HCLK_VO at register level (CRU_GATE_CON(20) BIT(1)). The
HDMI registers become inaccessible then. This means I'll replace
HCLK_VOP in the HDMI node with HCLK_VO. Does this sound sane?


Well, it's still a mystery hack overall, and in some ways it seems even 
more suspect to be claiming a whole branch of the clock tree rather than 
a leaf gate with a specific purpose. I'm really starting to think that 
the underlying issue here is a bug in the clock driver, or a hardware 
mishap that should logically be worked around by the clock driver, 
rather than the individual consumers.


Does it work if you hack the clock driver to think that PCLK_VO is a 
child of HCLK_VO? Even if that's not technically true, it would seem to 
effectively match the observed behaviour (i.e. all 3 things whose 
register access apparently *should* be enabled by a gate off PCLK_VO, 
seem to also require HCLK_VO).


Thanks,
Robin.


Re: [PATCH v4 1/9] dt-bindings: host1x: Add iommu-map property

2022-03-01 Thread Robin Murphy

On 2022-03-01 16:14, cyn...@kapsi.fi wrote:

From: Mikko Perttunen 

Add schema information for specifying context stream IDs. This uses
the standard iommu-map property.

Signed-off-by: Mikko Perttunen 
---
v3:
* New patch
v4:
* Remove memory-contexts subnode.
---
  .../bindings/display/tegra/nvidia,tegra20-host1x.yaml| 5 +
  1 file changed, 5 insertions(+)

diff --git 
a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml 
b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
index 4fd513efb0f7..0adeb03b9e3a 100644
--- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
+++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
@@ -144,6 +144,11 @@ allOf:
  reset-names:
maxItems: 1
  
+iommu-map:

+  description: Specification of stream IDs available for memory 
context device
+use. Should be a mapping of IDs 0..n to IOMMU entries 
corresponding to


Nit: maybe "context IDs 0..n" for maximum possible clarity?

Either way, though, I'm happy that if the simplest and most 
straightforward approach works, then it's the best choice.


Reviewed-by: Robin Murphy 

Cheers,
Robin.


+usable stream IDs.
+
required:
  - reg-names
  


Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range

2022-03-02 Thread Robin Murphy

On 2022-02-25 19:27, Michael Cheng wrote:

Hi Robin,

[ +arm64 maintainers for their awareness, which would have been a good 
thing to do from the start ]


  * Thanks for adding the arm64 maintainer and sorry I didn't rope them
    in sooner.

Why does i915 need to ensure the CPU's instruction cache is coherent 
with its data cache? Is it a self-modifying driver?


  * Also thanks for pointing this out. Initially I was using
    dcache_clean_inval_poc, which seems to be equivalent to what
    x86 is doing for dcache flushing, but it was giving me build errors
    since it's not on the global list of kernel symbols. And after
    revisiting the documentation for caches_clean_inval_pou, it won't
    fly for what we are trying to do. Moving forward, what would you (or
    someone in the ARM community) suggest we do? Could it be possible to
    export dcache_clean_inval_poc as a global symbol?


Unlikely, unless something with a legitimate need for CPU-centric cache 
maintenance like kexec or CPU hotplug ever becomes modular.


In the case of a device driver, it's not even the basic issues of 
assuming to find direct equivalents to x86 semantics in other CPU 
architectures, or effectively reinventing parts of the DMA API, it's 
even bigger than that. Once you move from being integrated in a single 
vendor's system architecture to being on a discrete card, you 
fundamentally *no longer have any control over cache coherency*. Whether 
the host CPU architecture happens to be AArch64, RISC-V, or whatever 
doesn't really matter, you're at the mercy of 3rd-party PCIe and 
interconnect IP vendors, and SoC integrators. You'll find yourself in 
systems where PCIe simply cannot snoop any caches, where you'd better 
have the correct DMA API calls in place to have any hope of even the 
most basic functionality working properly; you'll find yourself in 
systems where even if the PCIe root complex claims to support No Snoop, 
your uncached traffic will still end up snooping stale data that got 
prefetched back into caches you thought you'd invalidated; you'll find 
yourself in systems where your memory attributes may or may not get 
forcibly rewritten by an IOMMU depending on the kernel config and/or 
command line.


It's not about simply finding a substitute for clflush, it's that the 
reasons you have for using clflush in the first place can no longer be 
assumed to be valid.


Robin.


On 2022-02-25 10:24 a.m., Robin Murphy wrote:
[ +arm64 maintainers for their awareness, which would have been a good 
thing to do from the start ]


On 2022-02-25 03:24, Michael Cheng wrote:

Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou
performs a flush by first performing a clean, followed by an invalidation
operation.

v2 (Michael Cheng): Use correct macro for cleaning and invalidation the
    dcache. Thanks Tvrtko for the suggestion.

v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h

v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc as a
    symbol that could be use by other modules, thus use
    caches_clean_inval_pou instead. Also this version
    removes include for cacheflush, since it's already
    included based on architecture type.

Signed-off-by: Michael Cheng 
Reviewed-by: Matt Roper 
---
  drivers/gpu/drm/drm_cache.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index c3e6e615bf09..81c28714f930 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned long 
length)

    if (wbinvd_on_all_cpus())
  pr_err("Timed out waiting for cache flush\n");
+
+#elif defined(CONFIG_ARM64)
+    void *end = addr + length;
+    caches_clean_inval_pou((unsigned long)addr, (unsigned long)end);


Why does i915 need to ensure the CPU's instruction cache is coherent 
with its data cache? Is it a self-modifying driver?


Robin.

(Note that the above is somewhat of a loaded question, and I do 
actually have half an idea of what you're trying to do here and why it 
won't fly, but I'd like to at least assume you've read the 
documentation of the function you decided was OK to use)



+
  #else
  WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
  #endif


Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range

2022-03-02 Thread Robin Murphy

On 2022-03-02 15:55, Michael Cheng wrote:

Thanks for the feedback Robin!

Sorry, my choice of words wasn't great, but what I meant is to 
understand how ARM flushes a range of dcache for device drivers, and not 
an exact equivalent of x86 clflush.


I believe the concern is if the CPU writes an update, that update might 
only be sitting in the CPU cache and never make it to device memory 
where the device can see it; there are specific places that we are 
supposed to flush the CPU caches to make sure our updates are visible to 
the hardware.


Ah, OK, if it's more about ordering, and it's actually write buffers 
rather than caches that you care about flushing, then we might be a lot 
safer, phew!


For a very simple overview, in a case where the device itself needs to 
observe memory writes in the correct order, e.g.:


data_descriptor.valid = 1;

clflush(&data_descriptor);

command_descriptor.data = &data_descriptor;

writel(/* control register to read command to then read data */)

then dma_wmb() between the first two writes should be the right tool to 
ensure that the device does not observe the command update while the 
data update is still sat somewhere in a CPU write buffer.


If you want a slightly stronger notion that, at a given point, all prior 
writes have actually been issued and should now be visible (rather than 
just that they won't become visible in the wrong order whenever they 
do), then wmb() should suffice on arm64.


Note that with arm64 memory types, a Non-Cacheable mapping of DRAM for 
a non-coherent DMA mapping, or of VRAM in a prefetchable BAR, can still 
be write-buffered, so barriers still matter even when actual cache 
maintenance ops don't (and as before if you're trying to perform cache 
maintenance outside the DMA API then you've already lost anyway). MMIO 
registers should be mapped as Device memory via ioremap(), which is not 
bufferable, hence the barrier implicit in writel() effectively pushes 
out any prior buffered writes ahead of a register write, which is why we 
don't need to worry about this most of the time.


This is only a very rough overview, though, and I'm not familiar enough 
with x86 semantics, your hardware, or the exact use-case to be able to 
say whether barriers alone are anywhere near the right answer or not.


Robin.



+Matt Roper

Matt, Lucas, any feed back here?

On 2022-03-02 4:49 a.m., Robin Murphy wrote:

On 2022-02-25 19:27, Michael Cheng wrote:

Hi Robin,

[ +arm64 maintainers for their awareness, which would have been a 
good thing to do from the start ]


  * Thanks for adding the arm64 maintainer and sorry I didn't rope them
    in sooner.

Why does i915 need to ensure the CPU's instruction cache is coherent 
with its data cache? Is it a self-modifying driver?


  * Also thanks for pointing this out. Initially I was using
    dcache_clean_inval_poc, which seems to be equivalent to what
    x86 is doing for dcache flushing, but it was giving me build errors
    since it's not on the global list of kernel symbols. And after
    revisiting the documentation for caches_clean_inval_pou, it won't
    fly for what we are trying to do. Moving forward, what would you (or
    someone in the ARM community) suggest we do? Could it be possible to
    export dcache_clean_inval_poc as a global symbol?


Unlikely, unless something with a legitimate need for CPU-centric 
cache maintenance like kexec or CPU hotplug ever becomes modular.


In the case of a device driver, it's not even the basic issues of 
assuming to find direct equivalents to x86 semantics in other CPU 
architectures, or effectively reinventing parts of the DMA API, it's 
even bigger than that. Once you move from being integrated in a single 
vendor's system architecture to being on a discrete card, you 
fundamentally *no longer have any control over cache coherency*. 
Whether the host CPU architecture happens to be AArch64, RISC-V, or 
whatever doesn't really matter, you're at the mercy of 3rd-party PCIe 
and interconnect IP vendors, and SoC integrators. You'll find yourself 
in systems where PCIe simply cannot snoop any caches, where you'd 
better have the correct DMA API calls in place to have any hope of 
even the most basic functionality working properly; you'll find 
yourself in systems where even if the PCIe root complex claims to 
support No Snoop, your uncached traffic will still end up snooping 
stale data that got prefetched back into caches you thought you'd 
invalidated; you'll find yourself in systems where your memory 
attributes may or may not get forcibly rewritten by an IOMMU depending 
on the kernel config and/or command line.


It's not about simply finding a substitute for clflush, it's that the 
reasons you have for using clflush in the first place can no longer be 
assumed to be valid.


Robin.


On 2022-02

Re: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when array bounds (v2)

2021-09-10 Thread Robin Murphy

On 2021-09-10 11:09, Guchun Chen wrote:

Vendors may define their own memory types on top of TTM_PL_PRIV,
but call ttm_set_driver_manager directly without checking the mem_type
value when setting up the memory manager. So add a check to catch
the case when the index runs past the array bounds.

v2: lower check level to WARN_ON

Signed-off-by: Leslie Shi 
Signed-off-by: Guchun Chen 
---
  include/drm/ttm/ttm_device.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 07d722950d5b..aa79953c807c 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -291,6 +291,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type)
  static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
  struct ttm_resource_manager *manager)
  {
+   WARN_ON(type >= TTM_NUM_MEM_TYPES);


Nit: I know nothing about this code, but from the context alone it would 
seem sensible to do


if (WARN_ON(type >= TTM_NUM_MEM_TYPES))
return;

to avoid making the subsequent assignment when we *know* it's invalid 
and likely to corrupt memory.


Robin.


bdev->man_drv[type] = manager;
  }
  



Re: [PATCH v2] drm/cma-helper: Set VM_DONTEXPAND for mmap

2021-10-13 Thread Robin Murphy

Hi Daniel,

On 2021-10-13 18:08, Daniel Vetter wrote:

On Wed, Oct 13, 2021 at 10:36:54AM -0400, Alyssa Rosenzweig wrote:

From: Robin Murphy 

drm_gem_cma_mmap() cannot assume every implementation of dma_mmap_wc()
will end up calling remap_pfn_range() (which happens to set the relevant
vma flag, among others), so in order to make sure expectations around
VM_DONTEXPAND are met, let it explicitly set the flag like most other
GEM mmap implementations do.

This avoids repeated warnings on a small minority of systems where the
display is behind an IOMMU, and has a simple driver which does not
override drm_gem_cma_default_funcs. Arm hdlcd is an in-tree affected
driver. Out-of-tree, the Apple DCP driver is affected; this fix is
required for DCP to be mainlined.


How/where does this warn?


In drm_gem_mmap_obj().


Also there should be a lot more drivers than
just these two which have an iommu for the display block, so this not
working is definitely a more wide-spread issue.


As the commit message implies, all those other drivers appear to end up 
using different mmap() implementations one way or another. Once I'd 
eventually figured it out, it didn't surprise me that the combination of 
a trivially-dumb display with an IOMMU is an oddball corner-case.



-Daniel


Signed-off-by: Robin Murphy 
Reviewed-and-tested-by: Alyssa Rosenzweig 


Alyssa - thanks for reviving this BTW, I'd forgotten all about it! - for 
future reference the clunky olde-worlde version of reassigning an MR to 
yourself is to add your sign-off to the end of the block (in addition to 
any review tags you may have previously given or choose to add) to note 
that you've chosen to take on responsibility for the patch[1]. FWIW I'm 
also partial to the practice of adding a little note in between if 
you've made any tweaks, e.g. "[alyssa: clarify affected drivers]", but 
that's often more of a personal choice.


Cheers,
Robin.

[1] 
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin



---
  drivers/gpu/drm/drm_gem_cma_helper.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
b/drivers/gpu/drm/drm_gem_cma_helper.c
index d53388199f34..63e48d98263d 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -510,6 +510,7 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct 
vm_area_struct *vma)
 */
vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
vma->vm_flags &= ~VM_PFNMAP;
+   vma->vm_flags |= VM_DONTEXPAND;
  
  	cma_obj = to_drm_gem_cma_obj(obj);
  
--

2.30.2





Re: Phyr Starter

2022-01-20 Thread Robin Murphy

On 2022-01-20 15:27, Keith Busch wrote:

On Thu, Jan 20, 2022 at 02:56:02PM +0100, Christoph Hellwig wrote:

  - on the input side to dma mapping the bio_vecs (or phyrs) are chained
as bios or whatever the containing structure is.  These already exist
and have infrastructure at least in the block layer
  - on the output side I plan for two options:

1) we have a sane IOMMU and everyting will be coalesced into a
   single dma_range.  This requires setting the block layer
   merge boundary to match the IOMMU page size, but that is
   a very good thing to do anyway.


It doesn't look like IOMMU page sizes are exported, or even necessarily
consistently sized on at least one arch (power).


FWIW POWER does its own thing separate from the IOMMU API. For all the 
regular IOMMU API players, page sizes are published in the iommu_domain 
that the common iommu-dma layer operates on. In fact it already uses 
them to pick chunk sizes for composing large buffer allocations.


Robin.


Re: [PATCH] [RFC] fbcon: Add option to enable legacy hardware acceleration

2022-01-25 Thread Robin Murphy

On 2022-01-25 19:12, Helge Deller wrote:

Add a config option CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION to
enable bitblt and fillrect hardware acceleration in the framebuffer
console. If disabled, such acceleration will not be used, even if it is
supported by the graphics hardware driver.

If you plan to use DRM as your main graphics output system, you should
disable this option since it will prevent compiling in code which isn't
used later on when DRM takes over.

For all other configurations, e.g. if none of your graphic cards support
DRM (yet), DRM isn't available for your architecture, or you can't be
sure that the graphic card in the target system will support DRM, you
most likely want to enable this option.


This patch is the first RFC.


Independed of this patch I did some timing experiments with a qemu
virtual machine running a PA-RISC Debian Linux installation with a
screen resolution of 2048x1024 with 8bpp. In that case qemu emulated the
graphics hardware bitblt and fillrect acceleration by using the native
(x86) CPU.

It was a simple testcase which was to run "time dmesg", where the syslog
had 284 lines. The results showed a huge speedup:
a) time dmesg (without acceleration):
-> 19.0 seconds
b) time dmesg (with acceleration):
->  2.6 seconds

Signed-off-by: Helge Deller 

diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
index 840d9813b0bc..da84d1c21c21 100644
--- a/drivers/video/console/Kconfig
+++ b/drivers/video/console/Kconfig
@@ -78,6 +78,17 @@ config FRAMEBUFFER_CONSOLE
help
  Low-level framebuffer-based console driver.

+config FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION
+   bool "Framebuffer Console hardware acceleration support"
+   depends on FRAMEBUFFER_CONSOLE
+   default y if !DRM
+   default y if !(X86 || ARM)


You probably mean ARM64 there, if you're trying to encapsulate "modern 
systems where this really isn't relevant". Some supported ARM platforms 
do date back to the days of weird and wonderful fbdev hardware.


Conversely I recently cobbled an ancient PCI VGA card into an arm64 
machine and was slightly disappointed that there didn't seem to be any 
driver that was usable straight off :)


(Yes, I might give v86d + uvesafb a go at some point...)

Robin.


+   help
+ If you use a system on which DRM is fully supported you usually want 
to say N,
+ otherwise you probably want to enable this option.
+ If enabled the framebuffer console will utilize the hardware 
acceleration
+ of your graphics card by using hardware bitblt and fillrect features.
+
  config FRAMEBUFFER_CONSOLE_DETECT_PRIMARY
 bool "Map the console to the primary display device"
 depends on FRAMEBUFFER_CONSOLE
diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c
index b813985f1403..f7b7d35953e8 100644
--- a/drivers/video/fbdev/core/fbcon.c
+++ b/drivers/video/fbdev/core/fbcon.c
@@ -1136,11 +1136,13 @@ static void fbcon_init(struct vc_data *vc, int init)

ops->graphics = 0;

+#ifdef CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION
if ((cap & FBINFO_HWACCEL_COPYAREA) &&
!(cap & FBINFO_HWACCEL_DISABLED))
p->scrollmode = SCROLL_MOVE;
else /* default to something safe */
p->scrollmode = SCROLL_REDRAW;
+#endif

/*
 *  ++guenther: console.c:vc_allocate() relies on initializing
@@ -1705,7 +1707,7 @@ static bool fbcon_scroll(struct vc_data *vc, unsigned int 
t, unsigned int b,
count = vc->vc_rows;
if (logo_shown >= 0)
goto redraw_up;
-   switch (p->scrollmode) {
+   switch (fb_scrollmode(p)) {
case SCROLL_MOVE:
fbcon_redraw_blit(vc, info, p, t, b - t - count,
 count);
@@ -1795,7 +1797,7 @@ static bool fbcon_scroll(struct vc_data *vc, unsigned int 
t, unsigned int b,
count = vc->vc_rows;
if (logo_shown >= 0)
goto redraw_down;
-   switch (p->scrollmode) {
+   switch (fb_scrollmode(p)) {
case SCROLL_MOVE:
fbcon_redraw_blit(vc, info, p, b - 1, b - t - count,
 -count);
@@ -1946,12 +1948,12 @@ static void fbcon_bmove_rec(struct vc_data *vc, struct 
fbcon_display *p, int sy,
   height, width);
  }

-static void updatescrollmode(struct fbcon_display *p,
+static void updatescrollmode_accel(struct fbcon_display *p,
struct fb_info *info,
struct vc_data *vc)
  {
+#ifdef CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION
struct fbcon_ops *ops = info->fbcon_par;
- 

Re: [PATCH 21/27] arm64: dts: rockchip: rk356x: Add HDMI nodes

2022-01-26 Thread Robin Murphy

On 2022-01-26 16:04, Peter Geis wrote:

On Wed, Jan 26, 2022 at 9:58 AM Sascha Hauer  wrote:


Add support for the HDMI port found on RK3568.

Signed-off-by: Sascha Hauer 
---
  arch/arm64/boot/dts/rockchip/rk356x.dtsi | 37 +++-
  1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk356x.dtsi 
b/arch/arm64/boot/dts/rockchip/rk356x.dtsi
index 4008bd666d01..e38fb223e9b8 100644
--- a/arch/arm64/boot/dts/rockchip/rk356x.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk356x.dtsi
@@ -10,7 +10,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 

  / {
@@ -502,6 +501,42 @@ vop_mmu: iommu@fe043e00 {
 status = "disabled";
 };

+   hdmi: hdmi@fe0a {
+   compatible = "rockchip,rk3568-dw-hdmi";
+   reg = <0x0 0xfe0a 0x0 0x2>;
+   interrupts = ;
+   clocks = <&cru PCLK_HDMI_HOST>,
+<&cru CLK_HDMI_SFR>,
+<&cru CLK_HDMI_CEC>,
+<&pmucru CLK_HDMI_REF>,
+<&cru HCLK_VOP>;
+   clock-names = "iahb", "isfr", "cec", "ref", "hclk";
+   pinctrl-names = "default";
+   pinctrl-0 = <&hdmitx_scl &hdmitx_sda &hdmitxm0_cec>;


I looked into CEC support here, and it seems that it does work with one change.
Please add the two following lines to your patch:
assigned-clocks = <&cru CLK_HDMI_CEC>;
assigned-clock-rates = <32768>;

The issue is the clk_rtc32k_frac clock that feeds clk_rtc_32k which
feeds clk_hdmi_cec is 24MHz at boot, which is too high for CEC to
function.


Wouldn't it make far more sense to just stick a suitable clk_set_rate() 
call in the driver? AFAICS it's already explicitly aware of the CEC clock.


Robin.


+   power-domains = <&power RK3568_PD_VO>;
+   reg-io-width = <4>;
+   rockchip,grf = <&grf>;
+   #sound-dai-cells = <0>;
+   status = "disabled";
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   hdmi_in: port@0 {
+   reg = <0>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   };
+
+   hdmi_out: port@1 {
+   reg = <1>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   };
+   };
+   };
+
 qos_gpu: qos@fe128000 {
 compatible = "rockchip,rk3568-qos", "syscon";
 reg = <0x0 0xfe128000 0x0 0x20>;
--
2.30.2



___
Linux-rockchip mailing list
linux-rockc...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip


Re: [PATCH 21/27] arm64: dts: rockchip: rk356x: Add HDMI nodes

2022-01-26 Thread Robin Murphy

On 2022-01-26 18:44, Peter Geis wrote:

On Wed, Jan 26, 2022 at 12:56 PM Robin Murphy  wrote:


On 2022-01-26 16:04, Peter Geis wrote:

On Wed, Jan 26, 2022 at 9:58 AM Sascha Hauer  wrote:


Add support for the HDMI port found on RK3568.

Signed-off-by: Sascha Hauer 
---
   arch/arm64/boot/dts/rockchip/rk356x.dtsi | 37 +++-
   1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk356x.dtsi 
b/arch/arm64/boot/dts/rockchip/rk356x.dtsi
index 4008bd666d01..e38fb223e9b8 100644
--- a/arch/arm64/boot/dts/rockchip/rk356x.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk356x.dtsi
@@ -10,7 +10,6 @@
   #include 
   #include 
   #include 
-#include 
   #include 

   / {
@@ -502,6 +501,42 @@ vop_mmu: iommu@fe043e00 {
  status = "disabled";
  };

+   hdmi: hdmi@fe0a {
+   compatible = "rockchip,rk3568-dw-hdmi";
+   reg = <0x0 0xfe0a 0x0 0x2>;
+   interrupts = ;
+   clocks = <&cru PCLK_HDMI_HOST>,
+<&cru CLK_HDMI_SFR>,
+<&cru CLK_HDMI_CEC>,
+<&pmucru CLK_HDMI_REF>,
+<&cru HCLK_VOP>;
+   clock-names = "iahb", "isfr", "cec", "ref", "hclk";
+   pinctrl-names = "default";
+   pinctrl-0 = <&hdmitx_scl &hdmitx_sda &hdmitxm0_cec>;


I looked into CEC support here, and it seems that it does work with one change.
Please add the two following lines to your patch:
assigned-clocks = <&cru CLK_HDMI_CEC>;
assigned-clock-rates = <32768>;

The issue is the clk_rtc32k_frac clock that feeds clk_rtc_32k which
feeds clk_hdmi_cec is 24MHz at boot, which is too high for CEC to
function.


Wouldn't it make far more sense to just stick a suitable clk_set_rate()
call in the driver? AFAICS it's already explicitly aware of the CEC clock.


This is handled purely in the
drivers/gpu/drm/bridge/synopsys/dw-hdmi.c driver, so I'm hesitant to
touch it there as it would affect all users, not just Rockchip.


I'd have a strong hunch that it's a standard thing for the DesignWare IP 
and not affected by platform integration. I don't have the magical 
Synopsys databook, but between the trusty old i.MX6 manual and most of 
the other in-tree DTs getting their dw-hdmi "cec" clock from 
suspiciously-obviously-named sources, I'd be somewhat surprised if it 
was ever anything other than 32KHz.


Robin.


Could someone familiar with the dw-hdmi IP weigh in on the minimum and
maximum clock rate the CEC block can handle?



Robin.


+   power-domains = <&power RK3568_PD_VO>;
+   reg-io-width = <4>;
+   rockchip,grf = <&grf>;
+   #sound-dai-cells = <0>;
+   status = "disabled";
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   hdmi_in: port@0 {
+   reg = <0>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   };
+
+   hdmi_out: port@1 {
+   reg = <1>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   };
+   };
+   };
+
  qos_gpu: qos@fe128000 {
  compatible = "rockchip,rk3568-qos", "syscon";
  reg = <0x0 0xfe128000 0x0 0x20>;
--
2.30.2





Re: [RFC PATCH] component: Add common helpers for compare/release functions

2022-01-28 Thread Robin Murphy

On 2022-01-28 08:11, Yong Wu wrote:
[...]

diff --git a/include/linux/component.h b/include/linux/component.h
index 16de18f473d7..5a7468ea827c 100644
--- a/include/linux/component.h
+++ b/include/linux/component.h
@@ -2,6 +2,8 @@
  #ifndef COMPONENT_H
  #define COMPONENT_H
  
+#include 

+#include 
  #include 
  
  
@@ -82,6 +84,22 @@ struct component_master_ops {

void (*unbind)(struct device *master);
  };
  
+/* A set of common helpers for compare/release functions */

+static inline int compare_of(struct device *dev, void *data)
+{
+   return dev->of_node == data;
+}


Note that this is effectively just device_match_of_node(), although I 
guess there is an argument that having a nice consistent set of 
component_match API helpers might be worth more than a tiny code saving 
by borrowing one from a different API.


Either way, however, I don't think there's any good argument for 
instantiating separate copies of these functions in every driver that 
uses them. If they're used as callbacks then they can't actually be 
inlined anyway, so they may as well be exported from component.c as 
normal so that the code really is shared (plus then there's nice 
symmetry with the aforementioned device_match API helpers too).


Thanks,
Robin.


+static inline void release_of(struct device *dev, void *data)
+{
+   of_node_put(data);
+}
+
+static inline int compare_dev(struct device *dev, void *data)
+{
+   return dev == data;
+}
+
  void component_master_del(struct device *,
const struct component_master_ops *);
  
diff --git a/sound/soc/codecs/wcd938x.c b/sound/soc/codecs/wcd938x.c

index eff200a07d9f..992132cbfb9f 100644
--- a/sound/soc/codecs/wcd938x.c
+++ b/sound/soc/codecs/wcd938x.c
@@ -4417,16 +4417,6 @@ static const struct component_master_ops 
wcd938x_comp_ops = {
.unbind = wcd938x_unbind,
  };
  
-static int wcd938x_compare_of(struct device *dev, void *data)

-{
-   return dev->of_node == data;
-}
-
-static void wcd938x_release_of(struct device *dev, void *data)
-{
-   of_node_put(data);
-}
-
  static int wcd938x_add_slave_components(struct wcd938x_priv *wcd938x,
struct device *dev,
struct component_match **matchptr)
@@ -4442,8 +4432,7 @@ static int wcd938x_add_slave_components(struct 
wcd938x_priv *wcd938x,
}
  
  	of_node_get(wcd938x->rxnode);

-   component_match_add_release(dev, matchptr, wcd938x_release_of,
-   wcd938x_compare_of, wcd938x->rxnode);
+   component_match_add_release(dev, matchptr, release_of, compare_of, 
wcd938x->rxnode);
  
  	wcd938x->txnode = of_parse_phandle(np, "qcom,tx-device", 0);

if (!wcd938x->txnode) {
@@ -4451,8 +4440,7 @@ static int wcd938x_add_slave_components(struct 
wcd938x_priv *wcd938x,
return -ENODEV;
}
of_node_get(wcd938x->txnode);
-   component_match_add_release(dev, matchptr, wcd938x_release_of,
-   wcd938x_compare_of, wcd938x->txnode);
+   component_match_add_release(dev, matchptr, release_of, compare_of, 
wcd938x->txnode);
return 0;
  }
  


Re: [Intel-gfx] [PATCH] drm/i915: Use per device iommu check

2021-11-10 Thread Robin Murphy

On 2021-11-10 09:35, Tvrtko Ursulin wrote:


On 10/11/2021 07:25, Lu Baolu wrote:

On 2021/11/10 1:35, Tvrtko Ursulin wrote:


On 09/11/2021 17:19, Lucas De Marchi wrote:

On Tue, Nov 09, 2021 at 12:17:59PM +, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

On igfx + dgfx setups, it appears that intel_iommu=igfx_off option 
only

disables the igfx iommu. Stop relying on global intel_iommu_gfx_mapped
and probe presence of iommu domain per device to accurately reflect 
its

status.


nice, I was just starting to look into this but for another reason: we
are adding support for other archs, like aarch64, and the global 
from here

was a problem


Yes I realized the other iommu angle as well. To do this properly we 
need to sort the intel_vtd_active call sites into at least two 
buckets - which are truly about VT-d and which are just IOMMU.


For instance the THP decision in i915_gemfs.c would be "are we 
behind any iommu". Some other call sites are possibly only about the 
bugs in the igfx iommu. Not sure if there is a third bucket for any 
potential differences between igfx iommu and other Intel iommu in 
case of dgfx.


I'd like to hear from Baolu as well to confirm if intel_iommu driver 
is handling igfx + dgfx correctly in respect to the two global 
variables I mention in the commit message.


I strongly agree that the drivers should call the IOMMU interface
directly for portability. For Intel graphic driver, we have two issues:

#1) driver asks vt-d driver for identity map with intel_iommu=igfx_off.
#2) driver query the status with a global intel_iommu_gfx_mapped.

We need to solve these two problems step by step. This patch is
definitely a good start point.


(I should have really consolidated the thread, but never mind now.)

Do you mean a good starting point for the discussion, or did you start 
thinking between your first and second email that it may even work?


Because as I wrote in the other email, it appears to work. But I fully 
accept it may be by accident and you may suggest a proper API to be 
added to the IOMMU core, which I would then be happy to use.


The "proper" API at the moment is device_iommu_mapped(), but indeed that 
only answers the "is this device connected to an IOMMU at all?" 
question, it doesn't tell you whether that IOMMU is translating or just 
bypassing.


If translation only really matters in terms of DMA API usage - i.e. 
you're not interested in using the IOMMU API directly - then I reckon it 
would be fairly reasonable to use dma_get_ops() to look at whether 
you've got dma-direct or iommu_dma_ops. At the moment that's sufficient 
to tell you whether your DMA is translated or not. If a more formal 
interface is wanted in future, I'm inclined to think that it would still 
belong at the DMA API level, since the public IOMMU API is really all 
about explicit translation, whereas what we care about here is reflected 
more in its internal interaction with the DMA APIs.


If maybe not immediately, perhaps we could start with this patch and 
going forward add something more detailed. Like for instance allowing us 
to query the name/id of the iommu driver in case i915 needs to apply 
different quirks across them? Not sure how feasible that would be, but 
at the moment the need does sound plausible to me.


FWIW I'd be wary of trying to get that clever - with iGFX it's easy to 
make fine-grained decisions because you've got a known and fixed 
integration with a particular IOMMU, but once you get out into the wider 
world you'll run into not only multiple different IOMMU implementations 
behind the same driver, but even the exact same IOMMU IP having 
different characteristics in different SoCs. Even something as 
seemingly-innocuous as an "is it worth using 2MB large pages?" quirk 
list could effectively become the cross product of various kernel config 
options and all PCIe-capable SoCs in existence.


Cheers,
Robin.


Re: [PATCH] drm/i915: Use per device iommu check

2021-11-10 Thread Robin Murphy

On 2021-11-10 14:11, Tvrtko Ursulin wrote:


On 10/11/2021 12:35, Lu Baolu wrote:

On 2021/11/10 20:08, Tvrtko Ursulin wrote:


On 10/11/2021 12:04, Lu Baolu wrote:

On 2021/11/10 17:30, Tvrtko Ursulin wrote:


On 10/11/2021 07:12, Lu Baolu wrote:

Hi Tvrtko,

On 2021/11/9 20:17, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin

On igfx + dgfx setups, it appears that intel_iommu=igfx_off 
option only
disables the igfx iommu. Stop relying on global 
intel_iommu_gfx_mapped
and probe presence of iommu domain per device to accurately 
reflect its

status.

Signed-off-by: Tvrtko Ursulin
Cc: Lu Baolu
---
Baolu, is my understanding here correct? Maybe I am confused by both
intel_iommu_gfx_mapped and dmar_map_gfx being globals in the 
intel_iommu
driver. But it certainly appears the setup can assign some iommu 
ops (and
assign the discrete i915 to iommu group) when those two are set 
to off.


diff --git a/drivers/gpu/drm/i915/i915_drv.h 
b/drivers/gpu/drm/i915/i915_drv.h

index e967cd08f23e..9fb38a54f1fe 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1763,26 +1763,27 @@ static inline bool run_as_guest(void)
  #define HAS_D12_PLANE_MINIMIZATION(dev_priv) 
(IS_ROCKETLAKE(dev_priv) || \

    IS_ALDERLAKE_S(dev_priv))

-static inline bool intel_vtd_active(void)
+static inline bool intel_vtd_active(struct drm_i915_private *i915)
  {
-#ifdef CONFIG_INTEL_IOMMU
-    if (intel_iommu_gfx_mapped)
+    if (iommu_get_domain_for_dev(i915->drm.dev))
  return true;
-#endif

  /* Running as a guest, we assume the host is enforcing VT'd */
  return run_as_guest();
  }

Have you verified this change? I am afraid that
iommu_get_domain_for_dev() always gets a valid iommu domain even when
intel_iommu_gfx_mapped == 0.


Yes it seems to work as is:

default:

# grep -i iommu /sys/kernel/debug/dri/*/i915_capabilities
/sys/kernel/debug/dri/0/i915_capabilities:iommu: enabled
/sys/kernel/debug/dri/1/i915_capabilities:iommu: enabled

intel_iommu=igfx_off:

# grep -i iommu /sys/kernel/debug/dri/*/i915_capabilities
/sys/kernel/debug/dri/0/i915_capabilities:iommu: disabled
/sys/kernel/debug/dri/1/i915_capabilities:iommu: enabled

On my system dri device 0 is integrated graphics and 1 is discrete.


The drm device 0 has a dedicated iommu. When the user requests that igfx not be
mapped, the VT-d implementation will turn it off to save power. But for a
shared iommu, you will definitely get it enabled.


Sorry I am not following, what exactly do you mean? Is there a 
platform with integrated graphics without a dedicated iommu, in which 
case intel_iommu=igfx_off results in intel_iommu_gfx_mapped == 0 and 
iommu_get_domain_for_dev returning non-NULL?


Your code always works for an igfx with a dedicated iommu. This might be
always true on today's platforms. But from driver's point of view, we
should not make such assumption.

For example, if the iommu implementation decides not to turn off the
graphic iommu (perhaps due to some hw quirk or for graphic
virtualization), your code will be broken.


If I got it right, this would go back to your earlier recommendation to 
have the check look like this:


static bool intel_vtd_active(struct drm_i915_private *i915)
{
     struct iommu_domain *domain;

     domain = iommu_get_domain_for_dev(i915->drm.dev);
     if (domain && (domain->type & __IOMMU_DOMAIN_PAGING))
     return true;
 ...

This would be okay as a first step?

Elsewhere in the thread Robin suggested looking at the dev->dma_ops and 
comparing against iommu_dma_ops. Would these two solutions be effectively 
the same?


Effectively, yes. See iommu_setup_dma_ops() - the only way to end up 
with iommu_dma_ops is if a managed translation domain is present; if the 
IOMMU is present but the default domain type has been set to passthrough 
(either globally or forced for the given device) it will do nothing and 
leave you with dma-direct, while if the IOMMU has been ignored entirely 
then it should never even be called. Thus it neatly encapsulates what 
you're after here.


Cheers,
Robin.


Re: [PATCH v1 00/12] drm/rockchip: RK356x VOP2 support

2021-11-22 Thread Robin Murphy

On 2021-11-22 17:47, Alex Bee wrote:

Am 22.11.21 um 09:10 schrieb Sascha Hauer:

Hi Alex,

On Mon, Nov 22, 2021 at 12:18:47AM +0100, Alex Bee wrote:

Hi Sascha,

Am 17.11.21 um 15:33 schrieb Sascha Hauer:

This series adds initial graphics support for the Rockchip RK356[68]
SoCs.  Graphics support is based around the VOP2 controller which
replaces the VOP controller found on earlier Rockchip SoCs. The driver
has been tested with HDMI support included in this series and MIPI-DSI
which is not included because it needs some more work. The driver is
taken from the downstream Rockchip kernel and heavily polished, most non
standard features have been removed for now. I tested the driver with
the libdrm modetest utility and also with weston with both pixman and
panfrost driver support. Michael Riesch reported the driver to work on
the RK3566 as well, but device tree support for this SoC is not yet
included in this series.

The HDMI changes are based on patches from Benjamin Gaignard, but
modified a bit as I found out that the HDMI port on the RK3568 only
needs one additional clock, not two. Also I added regulator support
which is needed to get the HDMI up on the rk3568-EVB board.

All review and testing feedback welcome


thanks for working on that - it's very (very,very) much appreciated.

It took me some time to figure it out: it seems the rk3568 iommu driver is
broken - I only got "white noise" when using it alongside the vop
(similar to what was reported here before). However, removing the
iommu property from the vop makes it work for me with HDMI output on
the quartz64 as well. Could you check whether it works for you with the
iommu driver enabled in the kernel and the property present in DT? (I used
5.16-rc1 + this series + [0].)

I have the iommu driver enabled and it works for me. I get this during
boot:

[0.263287] rockchip-vop2 fe04.vop: Adding to iommu group 0

So I expect it is indeed used.


Also vop mmu seems to have the
power-domain missing in your series (same as downstream) - however
adding that doesn't help much currently.

Probably the power domain gets enabled anyway when the VOP is activated,
so adding it to the iommu won't help anything. Nevertheless it seems
correct to add the property, I'll do so in the next round.


As a side note: I verified this using Ezequiel's vpu addition for
RK356x: it also only worked when removing the iommu there (getting
tons of page faults otherwise) - so the iommu driver really seems to be broken,
at least for RK3566. (Or am I missing an option in the kernel config that
wasn't required for the older iommu version?)

I don't think so. I started from defconfig and disabled other
architectures and unneeded drivers, but I did not enable anything
specific to iommu.


I've now found out that I can make it work with the iommu by limiting the
available memory to something below 4G (I have an 8G board). So there is
something wrong in the driver or somewhere in the memory mapping, iommu api
(since it works when using CMA), ... however, it clearly does not relate
to your patch.


FWIW it doesn't surprise me that there might still be bugs lurking in 
the IOMMU driver's relatively recent changes for packing 40-bit physical 
addresses into 32-bit pagetable entries and registers - that sort of 
thing is always tricky to get right. You're correct that that's 
something that wants debugging in its own right, though.


Robin.


[PATCH 1/9] gpu: host1x: Add missing DMA API include

2021-11-23 Thread Robin Murphy
Host1x seems to be relying on picking up dma-mapping.h transitively from
iova.h, which has no reason to include it in the first place. Fix the
former issue before we totally break things by fixing the latter one.

CC: Thierry Reding 
CC: Mikko Perttunen 
CC: dri-devel@lists.freedesktop.org
CC: linux-te...@vger.kernel.org
Signed-off-by: Robin Murphy 
---

Feel free to pick this into drm-misc-next or drm-misc-fixes straight
away if that suits - it's only to avoid a build breakage once the rest
of the series gets queued.

Robin.

 drivers/gpu/host1x/bus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index 218e3718fd68..881fad5c3307 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -5,6 +5,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.28.0.dirty



Re: [PATCH 1/9] gpu: host1x: Add missing DMA API include

2021-11-24 Thread Robin Murphy

On 2021-11-23 14:10, Robin Murphy wrote:

Host1x seems to be relying on picking up dma-mapping.h transitively from
iova.h, which has no reason to include it in the first place. Fix the
former issue before we totally break things by fixing the latter one.

CC: Thierry Reding 
CC: Mikko Perttunen 
CC: dri-devel@lists.freedesktop.org
CC: linux-te...@vger.kernel.org
Signed-off-by: Robin Murphy 
---

Feel free to pick this into drm-misc-next or drm-misc-fixes straight
away if that suits - it's only to avoid a build breakage once the rest
of the series gets queued.


Bah, seems like tegra-vic needs the same treatment too, but wasn't in my 
local config. Should I squash that into a respin of this patch on the 
grounds of being vaguely related, or would you prefer it separate?


(Either way I'll wait a little while to see if the buildbots uncover any 
more...)


Cheers,
Robin.


  drivers/gpu/host1x/bus.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index 218e3718fd68..881fad5c3307 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -5,6 +5,7 @@
   */
  
  #include 

+#include 
  #include 
  #include 
  #include 



[PATCH] drm/tegra: vic: Fix DMA API misuse

2021-11-24 Thread Robin Murphy
Upon failure, dma_alloc_coherent() returns NULL. If that does happen,
passing some uninitialised stack contents to dma_mapping_error() - which
belongs to a different API in the first place - has precious little
chance of detecting it.

Also include the correct header, because the fragile transitive
inclusion currently providing it is going to break soon.

Fixes: 20e7dce255e9 ("drm/tegra: Remove memory allocation from Falcon library")
Signed-off-by: Robin Murphy 

---

It also doesn't appear to handle failure of the tegra_drm_alloc() path
either, but that's a loose thread I have no desire to pull on... ;)
---
 drivers/gpu/drm/tegra/vic.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index c02010ff2b7f..da4af5371991 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -232,10 +233,8 @@ static int vic_load_firmware(struct vic *vic)
 
if (!client->group) {
virt = dma_alloc_coherent(vic->dev, size, &iova, GFP_KERNEL);
-
-   err = dma_mapping_error(vic->dev, iova);
-   if (err < 0)
-   return err;
+   if (!virt)
+   return -ENOMEM;
} else {
virt = tegra_drm_alloc(tegra, size, &iova);
}
-- 
2.28.0.dirty



Re: [PATCH] drm/i915: Use per device iommu check

2021-11-25 Thread Robin Murphy

On 2021-11-25 10:42, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

With both integrated and discrete Intel GPUs in a system, the current
global check of intel_iommu_gfx_mapped, as done from intel_vtd_active()
may not be completely accurate.

In this patch we add i915 parameter to intel_vtd_active() in order to
prepare it for multiple GPUs and we also change the check away from Intel
specific intel_iommu_gfx_mapped (global exported by the Intel IOMMU
driver) to probing the presence of IOMMU domain on a specific device
using iommu_get_domain_for_dev().


FWIW the way you have it now is functionally equivalent to using 
device_iommu_mapped(), which I think might be slightly clearer for the 
current intent, but I don't have a significantly strong preference 
(after all, this *was* the de-facto way of checking before 
device_iommu_mapped() was introduced, and there are still other examples 
of it around). So from the IOMMU perspective,


Acked-by: Robin Murphy 

Perhaps the AGP driver could also be tweaked and intel_iommu_gfx_mapped 
cleaned away entirely, but I'll leave that for Baolu to think about :)


Cheers,
Robin.


It was suggested to additionally check for __IOMMU_DOMAIN_PAGING bit
present in the returned iommu domain, however I opted not to do that at
this point.

Checking for this flag would detect whether IOMMU is in address
translation mode, with the assumption that is the only relevant question.
Downside to that is that in identity mapping (pass-through) mode IOMMU
hardware is still active, sitting on the communication path, just not
doing address translation.

My rationale was, that for the many intel_vtd_active() checks in our code
base, while some clearly are about performance impact of address
translation, some may be about working around functional issues when the
IOMMU hardware is simply being active. There also may be some performance
impact in pass-through mode, but I have not specifically attempted to
measure it.

Therefore the safest option feels to be to keep intel_vtd_active()
answering the question of "is the IOMMU hardware active" for this device.
If in the future we want to expand the set of questions to "is IOMMU
active and doing address translation" we can easily do that by adding a
new helper to be called from appropriate sites.

v2:
   * Check for dmar translation specifically, not just iommu domain. (Baolu)

v3:
  * Go back to plain "any domain" check for now, rewrite commit message.

Signed-off-by: Tvrtko Ursulin 
Cc: Lu Baolu 
Cc: Lucas De Marchi 
Cc: Robin Murphy 
---
  drivers/gpu/drm/i915/display/intel_bw.c  |  2 +-
  drivers/gpu/drm/i915/display/intel_display.c |  2 +-
  drivers/gpu/drm/i915/display/intel_fbc.c |  2 +-
  drivers/gpu/drm/i915/gem/i915_gem_stolen.c   |  2 +-
  drivers/gpu/drm/i915/gem/i915_gemfs.c|  2 +-
  drivers/gpu/drm/i915/gt/intel_ggtt.c |  4 ++--
  drivers/gpu/drm/i915/i915_debugfs.c  |  1 +
  drivers/gpu/drm/i915/i915_driver.c   |  7 +++
  drivers/gpu/drm/i915/i915_drv.h  | 13 +++--
  drivers/gpu/drm/i915/i915_gpu_error.c|  5 +
  drivers/gpu/drm/i915/intel_device_info.c | 14 +-
  drivers/gpu/drm/i915/intel_pm.c  |  2 +-
  12 files changed, 25 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_bw.c 
b/drivers/gpu/drm/i915/display/intel_bw.c
index abec394f6869..2da4aacc956b 100644
--- a/drivers/gpu/drm/i915/display/intel_bw.c
+++ b/drivers/gpu/drm/i915/display/intel_bw.c
@@ -634,7 +634,7 @@ static unsigned int intel_bw_data_rate(struct 
drm_i915_private *dev_priv,
for_each_pipe(dev_priv, pipe)
data_rate += bw_state->data_rate[pipe];
  
-	if (DISPLAY_VER(dev_priv) >= 13 && intel_vtd_active())

+   if (DISPLAY_VER(dev_priv) >= 13 && intel_vtd_active(dev_priv))
data_rate = data_rate * 105 / 100;
  
  	return data_rate;

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index b2d51cd79d6c..1ef77ba7f645 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -1293,7 +1293,7 @@ static bool needs_async_flip_vtd_wa(const struct 
intel_crtc_state *crtc_state)
  {
struct drm_i915_private *i915 = to_i915(crtc_state->uapi.crtc->dev);
  
-	return crtc_state->uapi.async_flip && intel_vtd_active() &&

+   return crtc_state->uapi.async_flip && intel_vtd_active(i915) &&
(DISPLAY_VER(i915) == 9 || IS_BROADWELL(i915) || 
IS_HASWELL(i915));
  }
  
diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c

index d0c34bc3af6c..614e8697c068 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -1677,7 +1677,7 @@ static int intel_sa

Re: [PATCH] dma_heap: use sg_table.orig_nents in sg_table release flow

2021-11-25 Thread Robin Murphy

On 2021-11-25 12:46, guangming@mediatek.com wrote:

From: Guangming 

Use sg_table.orig_nents rather than sg_table.nents to traverse the
sg_table when freeing it.
Using sg_table.nents may cause some pages to not be freed.


...and this sort of bug is precisely why we have the 
for_each_sgtable_sg() helper ;)


Robin.


Signed-off-by: Guangming 
---
  drivers/dma-buf/heaps/system_heap.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c 
b/drivers/dma-buf/heaps/system_heap.c
index 23a7e74ef966..ce10d4eb674c 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -289,7 +289,7 @@ static void system_heap_dma_buf_release(struct dma_buf 
*dmabuf)
int i;
  
  	table = &buffer->sg_table;

-   for_each_sg(table->sgl, sg, table->nents, i) {
+   for_each_sg(table->sgl, sg, table->orig_nents, i) {
struct page *page = sg_page(sg);
  
  		__free_pages(page, compound_order(page));




Re: [PATCH v2] dma_heap: use for_each_sgtable_sg in sg_table release flow

2021-11-25 Thread Robin Murphy

On 2021-11-25 13:49, guangming@mediatek.com wrote:

From: Guangming 

Use for_each_sgtable_sg rather than for_each_sg to traverse the
sg_table when freeing it.
Using for_each_sg may cause some pages to not be freed
when the wrong nents number is passed.


It's still worth spelling out that this is fixing a bug where the 
current code should have been using table->orig_nents - it's just that 
switching to the sgtable helper is the best way to make the fix, since 
it almost entirely removes the possibility of making that (previously 
rather common) mistake.


If it helps, for the change itself:

Reviewed-by: Robin Murphy 

Thanks,
Robin.


Signed-off-by: Guangming 
---
  drivers/dma-buf/heaps/system_heap.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c 
b/drivers/dma-buf/heaps/system_heap.c
index 23a7e74ef966..8660508f3684 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -289,7 +289,7 @@ static void system_heap_dma_buf_release(struct dma_buf 
*dmabuf)
int i;
  
  	table = &buffer->sg_table;

-   for_each_sg(table->sgl, sg, table->nents, i) {
+   for_each_sgtable_sg(table, sg, i) {
struct page *page = sg_page(sg);
  
  		__free_pages(page, compound_order(page));




Re: [PATCH v2 3/3] drm/etnaviv: use a 32 bit mask as coherent DMA mask

2021-12-01 Thread Robin Murphy

Sorry I missed this earlier...

On 2021-09-07 17:49, Michael Walle wrote:

The STLB and the first command buffer (which is used to set up the TLBs)
have a 32 bit size restriction in hardware. There seems to be no way to
specify addresses larger than 32 bit. Keep it simple and restrict the
addresses to the lower 4 GiB range for all coherent DMA memory
allocations.

Please note that platform_device_alloc() will initialize dev->dma_mask
to point to pdev->platform_dma_mask, thus dma_set_mask() will work as
expected.

While at it, move the dma_mask setup code to the of_dma_configure() to
keep all the DMA setup code next to each other.

Suggested-by: Lucas Stach 
Signed-off-by: Michael Walle 
---
  drivers/gpu/drm/etnaviv/etnaviv_drv.c | 20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c 
b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
index 54eb653ca295..0b756ecb1bc2 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
@@ -613,6 +613,24 @@ static int etnaviv_pdev_probe(struct platform_device *pdev)
component_match_add(dev, &match, compare_str, names[i]);
}
  
+	/*

+* PTA and MTLB can have 40 bit base addresses, but
+* unfortunately, an entry in the MTLB can only point to a
+* 32 bit base address of a STLB. Moreover, to initialize the
+* MMU we need a command buffer with a 32 bit address because
+* without an MMU there is only an identity mapping between
+* the internal 32 bit addresses and the bus addresses.
+*
+* To make things easy, we set the dma_coherent_mask to 32
+* bit to make sure we are allocating the command buffers and
+* TLBs in the lower 4 GiB address space.
+*/
+   if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(40)) ||
+   dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32))) {


Since AFAICS you're not changing the default dma_mask pointer to point 
to some storage other than the coherent mask, the dma_set_mask() call 
effectively does nothing and both masks will end up reading back as 32 bits.


Robin.


+   dev_dbg(&pdev->dev, "No suitable DMA available\n");
+   return -ENODEV;
+   }
+
/*
 * Apply the same DMA configuration to the virtual etnaviv
 * device as the GPU we found. This assumes that all Vivante
@@ -671,8 +689,6 @@ static int __init etnaviv_init(void)
of_node_put(np);
goto unregister_platform_driver;
}
-   pdev->dev.coherent_dma_mask = DMA_BIT_MASK(40);
-   pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
  
  		ret = platform_device_add(pdev);

if (ret) {



Re: [PATCH v2 3/3] drm/etnaviv: use a 32 bit mask as coherent DMA mask

2021-12-01 Thread Robin Murphy

On 2021-12-01 13:41, Lucas Stach wrote:

Hi Robin,

Am Mittwoch, dem 01.12.2021 um 12:50 + schrieb Robin Murphy:

Sorry I missed this earlier...

On 2021-09-07 17:49, Michael Walle wrote:

The STLB and the first command buffer (which is used to set up the TLBs)
have a 32 bit size restriction in hardware. There seems to be no way to
specify addresses larger than 32 bit. Keep it simple and restrict the
addresses to the lower 4 GiB range for all coherent DMA memory
allocations.

Please note that platform_device_alloc() will initialize dev->dma_mask
to point to pdev->platform_dma_mask, thus dma_set_mask() will work as
expected.

While at it, move the dma_mask setup code to the of_dma_configure() to
keep all the DMA setup code next to each other.

Suggested-by: Lucas Stach 
Signed-off-by: Michael Walle 
---
   drivers/gpu/drm/etnaviv/etnaviv_drv.c | 20 ++--
   1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c 
b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
index 54eb653ca295..0b756ecb1bc2 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
@@ -613,6 +613,24 @@ static int etnaviv_pdev_probe(struct platform_device *pdev)
component_match_add(dev, &match, compare_str, names[i]);
}
   
+	/*

+* PTA and MTLB can have 40 bit base addresses, but
+* unfortunately, an entry in the MTLB can only point to a
+* 32 bit base address of a STLB. Moreover, to initialize the
+* MMU we need a command buffer with a 32 bit address because
+* without an MMU there is only an identity mapping between
+* the internal 32 bit addresses and the bus addresses.
+*
+* To make things easy, we set the dma_coherent_mask to 32
+* bit to make sure we are allocating the command buffers and
+* TLBs in the lower 4 GiB address space.
+*/
+   if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(40)) ||
+   dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32))) {


Since AFAICS you're not changing the default dma_mask pointer to point
to some storage other than the coherent mask, the dma_set_mask() call
effectively does nothing and both masks will end up reading back as 32 bits.


 From what I can see the dma_mask has allocated storage in the platform
device and does not point to the coherent dma mask, see
setup_pdev_dma_masks().


Urgh, apologies for the confusion - seems I had one of those mental 
short-circuits and was utterly convinced that that's what the platform 
device setup did, but of course it's only the fallback case in 
of_dma_configure(). Sorry for the noise!


Cheers,
Robin.


Re: [PATCH v2 1/3] dt-bindings: gpu: mali-bifrost: Document RZ/G2L support

2021-12-06 Thread Robin Murphy

On 2021-12-06 15:00, Biju Das wrote:

The Renesas RZ/G2{L, LC} SoC (a.k.a. R9A07G044) has a Bifrost Mali-G31 GPU;
add a compatible string for it.

Signed-off-by: Biju Das 
Reviewed-by: Lad Prabhakar 
---
v1->v2:
  * Updated minItems for resets to 2
  * Documented optional property reset-names
  * Documented reset-names as required property for RZ/G2L SoC.
---
  .../bindings/gpu/arm,mali-bifrost.yaml| 39 ++-
  1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
index 6f98dd55fb4c..c3b2f4ddd520 100644
--- a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
+++ b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
@@ -19,6 +19,7 @@ properties:
- amlogic,meson-g12a-mali
- mediatek,mt8183-mali
- realtek,rtd1619-mali
+  - renesas,r9a07g044-mali
- rockchip,px30-mali
- rockchip,rk3568-mali
- const: arm,mali-bifrost # Mali Bifrost GPU model/revision is fully 
discoverable
@@ -27,19 +28,30 @@ properties:
  maxItems: 1
  
interrupts:

+minItems: 3
  items:
- description: Job interrupt
- description: MMU interrupt
- description: GPU interrupt
+  - description: EVENT interrupt


I believe we haven't bothered with the "Event" interrupt so far since 
there's no real meaningful use for it - it seems the downstream binding 
for Arm's kbase driver doesn't mention it either.



interrupt-names:
+minItems: 3
  items:
- const: job
- const: mmu
- const: gpu
+  - const: event
  
clocks:

-maxItems: 1
+minItems: 1
+maxItems: 3
+
+  clock-names:
+items:
+  - const: gpu
+  - const: bus
+  - const: bus_ace


Note that the Bifrost GPUs themselves all only have a single external 
clock and reset (unexcitingly named "CLK" and "RESETn" respectively, 
FWIW). I can't help feeling wary that defining additional names for 
vendor integration details in the core binding may quickly grow into a 
mess of mutually-incompatible sets of values, for no great benefit. At 
the very least, it would seem more sensible to put them in the 
SoC-specific conditional schemas.
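
Concretely, the placement being suggested could look roughly like this — an
illustrative sketch only, reusing the clock/reset names from the patch, not a
final binding:

```yaml
# Leave the generic binding at one clock and one reset; describe the
# extra vendor integration clocks/resets only under the SoC-specific
# conditional schema.
allOf:
  - if:
      properties:
        compatible:
          contains:
            const: renesas,r9a07g044-mali
    then:
      properties:
        clocks:
          minItems: 3
        clock-names:
          items:
            - const: gpu
            - const: bus
            - const: bus_ace
        resets:
          minItems: 3
        reset-names:
          items:
            - const: rst
            - const: axi_rst
            - const: ace_rst
      required:
        - clock-names
        - reset-names
```

That way another SoC with a different set of integration clocks can define its
own names in its own conditional block without colliding with these.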


Robin.

  
mali-supply: true
  
@@ -52,7 +64,14 @@ properties:

  maxItems: 3
  
resets:

-maxItems: 2
+minItems: 2
+maxItems: 3
+
+  reset-names:
+items:
+  - const: rst
+  - const: axi_rst
+  - const: ace_rst
  
"#cooling-cells":

  const: 2
@@ -113,6 +132,22 @@ allOf:
  - sram-supply
  - power-domains
  - power-domain-names
+  - if:
+  properties:
+compatible:
+  contains:
+const: renesas,r9a07g044-mali
+then:
+  properties:
+interrupt-names:
+  minItems: 4
+clock-names:
+  minItems: 3
+  required:
+- clock-names
+- power-domains
+- resets
+- reset-names
  else:
properties:
  power-domains:



Re: [PATCH 04/12] drm/rockchip: dw_hdmi: add regulator support

2021-12-07 Thread Robin Murphy

On 2021-11-17 14:33, Sascha Hauer wrote:

The RK3568 has HDMI_TX_AVDD0V9 and HDMI_TX_AVDD_1V8 supply inputs needed
for the HDMI port. add support for these to the driver for boards which
have them supplied by switchable regulators.

Signed-off-by: Sascha Hauer 
---
  .../display/rockchip/rockchip,dw-hdmi.yaml|  6 ++
  drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c   | 58 ++-
  2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
index 53fa42479d5b7..293b2cfbf739f 100644
--- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
+++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
@@ -28,6 +28,12 @@ properties:
reg-io-width:
  const: 4
  
+  avdd-0v9-supply:

+description: A 0.9V supply that powers up the SoC internal circuitry.
+
+  avdd-1v8-supply:
+description: A 1.8V supply that powers up the SoC internal circuitry.
+
clocks:
  minItems: 2
  items:
diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
index 29608c25e2d0e..b8fe56c89cdc9 100644
--- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
@@ -9,6 +9,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -83,6 +84,8 @@ struct rockchip_hdmi {
struct clk *vpll_clk;
struct clk *grf_clk;
struct dw_hdmi *hdmi;
+   struct regulator *avdd_0v9;
+   struct regulator *avdd_1v8;
struct phy *phy;
  };
  
@@ -222,6 +225,22 @@ static int rockchip_hdmi_parse_dt(struct rockchip_hdmi *hdmi)

hdmi->vpll_clk = hdmi->clks[RK_HDMI_CLK_VPLL].clk;
hdmi->grf_clk = hdmi->clks[RK_HDMI_CLK_GRF].clk;
  
+	hdmi->avdd_0v9 = devm_regulator_get_optional(hdmi->dev, "avdd-0v9");


These are clearly *not* optional, unless the HDMI block is magic and can 
still work without physical power. Use devm_regulator_get(), and if the 
real supply is missing from the DT for whatever reason you should get a 
dummy regulator back, which you can then successfully disable and enable 
without all the conditional mess.


Robin.


+   if (IS_ERR(hdmi->avdd_0v9)) {
+   if (PTR_ERR(hdmi->avdd_0v9) != -ENODEV)
+   return PTR_ERR(hdmi->avdd_0v9);
+
+   hdmi->avdd_0v9 = NULL;
+   }
+
+   hdmi->avdd_1v8 = devm_regulator_get_optional(hdmi->dev, "avdd-1v8");
+   if (IS_ERR(hdmi->avdd_1v8)) {
+   if (PTR_ERR(hdmi->avdd_1v8) != -ENODEV)
+   return PTR_ERR(hdmi->avdd_1v8);
+
+   hdmi->avdd_1v8 = NULL;
+   }
+
return 0;
  }
  
@@ -559,11 +578,27 @@ static int dw_hdmi_rockchip_bind(struct device *dev, struct device *master,

return ret;
}
  
+	if (hdmi->avdd_0v9) {

+   ret = regulator_enable(hdmi->avdd_0v9);
+   if (ret) {
+   DRM_DEV_ERROR(hdmi->dev, "failed to enable avdd0v9: %d\n", ret);
+   goto err_avdd_0v9;
+   }
+   }
+
+   if (hdmi->avdd_1v8) {
+   ret = regulator_enable(hdmi->avdd_1v8);
+   if (ret) {
+   DRM_DEV_ERROR(hdmi->dev, "failed to enable avdd1v8: %d\n", ret);
+   goto err_avdd_1v8;
+   }
+   }
+
ret = clk_bulk_prepare_enable(RK_HDMI_NCLOCKS_HDMI, hdmi->clks);
if (ret) {
DRM_DEV_ERROR(hdmi->dev, "Failed to enable HDMI vpll: %d\n",
  ret);
-   return ret;
+   goto err_clk;
}
  
  	if (hdmi->chip_data == &rk3568_chip_data) {

@@ -587,10 +622,21 @@ static int dw_hdmi_rockchip_bind(struct device *dev, struct device *master,
 */
if (IS_ERR(hdmi->hdmi)) {
ret = PTR_ERR(hdmi->hdmi);
-   drm_encoder_cleanup(encoder);
-   clk_bulk_disable_unprepare(RK_HDMI_NCLOCKS_HDMI, hdmi->clks);
+   goto err_bind;
}
  
+	return 0;

+
+err_bind:
+   clk_bulk_disable_unprepare(RK_HDMI_NCLOCKS_HDMI, hdmi->clks);
+   drm_encoder_cleanup(encoder);
+err_clk:
+   if (hdmi->avdd_1v8)
+   regulator_disable(hdmi->avdd_1v8);
+err_avdd_1v8:
+   if (hdmi->avdd_0v9)
+   regulator_disable(hdmi->avdd_0v9);
+err_avdd_0v9:
return ret;
  }
  
@@ -601,6 +647,12 @@ static void dw_hdmi_rockchip_unbind(struct device *dev, struct device *master,
  
  	dw_hdmi_unbind(hdmi->hdmi);

clk_bulk_disable_unprepare(RK_HDMI_NCLOCKS_HDMI, hdmi->clks);
+
+   if (hdmi->avdd_1v8)
+   regulator_disable(hdmi->avdd_1v8);
+
+   if (hdmi->avdd_0v9)
+   regulator_disable(hdmi->avdd_0v9);
  }
  
  static const struct component_ops dw_hdmi_rockchip_ops = {




Re: [PATCH 08/18] dt-bindings: display: rockchip: dw-hdmi: Add regulator support

2021-12-08 Thread Robin Murphy

On 2021-12-08 15:12, Sascha Hauer wrote:

Signed-off-by: Sascha Hauer 
---
  .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
index 2ab6578033da2..b9dca49aa6e05 100644
--- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
+++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml
@@ -28,6 +28,12 @@ properties:
reg-io-width:
  const: 4
  
+  avdd-0v9-supply:

+description: A 0.9V supply that powers up the SoC internal circuitry.


Might be worth calling out the actual pin name so it's abundantly clear 
for DT authors cross-referencing schematics. Annoyingly, some SoCs have 
HDMI_AVDD_1V0 instead of HDMI_AVDD_0V9 - I'm not sure it's worth 
splitting hairs that far in terms of the property name itself, but I'll 
leave that for others to decide.



+  avdd-1v8-supply:
+description: A 1.8V supply that powers up the SoC internal circuitry.


At least HDMI_AVDD_1V8 seems more consistent.

Thanks,
Robin.


+
clocks:
  minItems: 2
  items:



[PATCH v2 02/11] gpu: host1x: Add missing DMA API include

2021-12-10 Thread Robin Murphy
Host1x seems to be relying on picking up dma-mapping.h transitively from
iova.h, which has no reason to include it in the first place. Fix the
former issue before we totally break things by fixing the latter one.

CC: Thierry Reding 
CC: Mikko Perttunen 
CC: dri-devel@lists.freedesktop.org
Signed-off-by: Robin Murphy 
---

v2: No change

 drivers/gpu/host1x/bus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index 218e3718fd68..881fad5c3307 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -5,6 +5,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.28.0.dirty



[PATCH v2 03/11] drm/tegra: vic: Fix DMA API misuse

2021-12-10 Thread Robin Murphy
Upon failure, dma_alloc_coherent() returns NULL. If that does happen,
passing some uninitialised stack contents to dma_mapping_error() - which
belongs to a different API in the first place - has precious little
chance of detecting it.

Also include the correct header, because the fragile transitive
inclusion currently providing it is going to break soon.

Fixes: 20e7dce255e9 ("drm/tegra: Remove memory allocation from Falcon library")
CC: Thierry Reding 
CC: Mikko Perttunen 
CC: dri-devel@lists.freedesktop.org
Signed-off-by: Robin Murphy 

---

It also doesn't appear to handle failure of the tegra_drm_alloc() path
either, but that's a loose thread I have no desire to pull on... ;)

v2: Resend as part of the series, originally posted separately here:

https://lore.kernel.org/dri-devel/2703882439344010e33bf21ecd63cf9e5e6dc00d.1637781007.git.robin.mur...@arm.com/

 drivers/gpu/drm/tegra/vic.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index c02010ff2b7f..da4af5371991 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -232,10 +233,8 @@ static int vic_load_firmware(struct vic *vic)
 
if (!client->group) {
virt = dma_alloc_coherent(vic->dev, size, &iova, GFP_KERNEL);
-
-   err = dma_mapping_error(vic->dev, iova);
-   if (err < 0)
-   return err;
+   if (!virt)
+   return -ENOMEM;
} else {
virt = tegra_drm_alloc(tegra, size, &iova);
}
-- 
2.28.0.dirty



Re: [PATCH v2 0/8] Host1x context isolation support

2021-12-14 Thread Robin Murphy

On 2021-11-08 10:36, Mikko Perttunen wrote:

On 9/16/21 5:32 PM, Mikko Perttunen wrote:

Hi all,

***
New in v2:

Added support for Tegra194
Use standard iommu-map property instead of custom mechanism
***

this series adds support for Host1x 'context isolation'. Since
when programming engines through Host1x, userspace can program in
any addresses it wants, we need some way to isolate the engines'
memory spaces. Traditionally this has either been done imperfectly
with a single shared IOMMU domain, or by copying and verifying the
programming command stream at submit time (Host1x firewall).

Since Tegra186 there is a privileged (only usable by kernel)
Host1x opcode that allows setting the stream ID sent by the engine
to the SMMU. So, by allocating a number of context banks and stream
IDs for this purpose, and using this opcode at the beginning of
each job, we can implement isolation. Due to the limited number of
context banks only each process gets its own context, and not
each channel.

This feature also allows sharing engines among multiple VMs when
used with Host1x's hardware virtualization support - up to 8 VMs
can be configured with a subset of allowed stream IDs, enforced
at hardware level.

To implement this, this series adds a new host1x context bus, which
will contain the 'struct device's corresponding to each context
bank / stream ID, changes to device tree and SMMU code to allow
registering the devices and using the bus, as well as the Host1x
stream ID programming code and support in TegraDRM.

Device tree bindings are not updated yet pending consensus that the
proposed changes make sense.

Thanks,
Mikko

Mikko Perttunen (8):
   gpu: host1x: Add context bus
   gpu: host1x: Add context device management code
   gpu: host1x: Program context stream ID on submission
   iommu/arm-smmu: Attach to host1x context device bus
   arm64: tegra: Add Host1x context stream IDs on Tegra186+
   drm/tegra: falcon: Set DMACTX field on DMA transactions
   drm/tegra: vic: Implement get_streamid_offset
   drm/tegra: Support context isolation

  arch/arm64/boot/dts/nvidia/tegra186.dtsi  |  12 ++
  arch/arm64/boot/dts/nvidia/tegra194.dtsi  |  12 ++
  drivers/gpu/Makefile  |   3 +-
  drivers/gpu/drm/tegra/drm.h   |   2 +
  drivers/gpu/drm/tegra/falcon.c    |   8 +
  drivers/gpu/drm/tegra/falcon.h    |   1 +
  drivers/gpu/drm/tegra/submit.c    |  13 ++
  drivers/gpu/drm/tegra/uapi.c  |  34 -
  drivers/gpu/drm/tegra/vic.c   |  38 +
  drivers/gpu/host1x/Kconfig    |   5 +
  drivers/gpu/host1x/Makefile   |   2 +
  drivers/gpu/host1x/context.c  | 174 ++
  drivers/gpu/host1x/context.h  |  27 
  drivers/gpu/host1x/context_bus.c  |  31 
  drivers/gpu/host1x/dev.c  |  12 +-
  drivers/gpu/host1x/dev.h  |   2 +
  drivers/gpu/host1x/hw/channel_hw.c    |  52 ++-
  drivers/gpu/host1x/hw/host1x06_hardware.h |  10 ++
  drivers/gpu/host1x/hw/host1x07_hardware.h |  10 ++
  drivers/iommu/arm/arm-smmu/arm-smmu.c |  13 ++
  include/linux/host1x.h    |  21 +++
  include/linux/host1x_context_bus.h    |  15 ++
  22 files changed, 488 insertions(+), 9 deletions(-)
  create mode 100644 drivers/gpu/host1x/context.c
  create mode 100644 drivers/gpu/host1x/context.h
  create mode 100644 drivers/gpu/host1x/context_bus.c
  create mode 100644 include/linux/host1x_context_bus.h



IOMMU/DT folks, any thoughts about this approach? The patches that are 
of interest outside of Host1x/TegraDRM specifics are patches 1, 2, 4, 
and 5.


FWIW it looks fairly innocuous to me. I don't understand host1x - 
neither hardware nor driver abstractions - well enough to meaningfully 
review it all (e.g. maybe it's deliberate that the bus .dma_configure 
method isn't used?), but the SMMU patch seems fine given the Kconfig 
solution to avoid module linkage problems.


Cheers,
Robin.


Re: [PATCH v3] drm: rockchip: hdmi: enable higher resolutions than FHD

2020-12-14 Thread Robin Murphy

On 2020-12-14 11:03, Vicente Bergas wrote:

On Tue, Dec 1, 2020 at 5:06 PM Vicente Bergas  wrote:

This patch enables a QHD HDMI monitor to work at native resolution.


Please, anybody?


Has anyone been able to validate this on other SoCs? I guess that's 
still the main concern - empirically I've found that clock rates that 
work perfectly on RK3399 can be glitchy on RK3288 to the point of being 
barely usable, and can result in no display at all on chips not using 
the Synopsys phy, like RK3328, where the HDMI driver considers the mode 
valid but later the phy driver is unable to match it.


I don't have access to a QHD display to test this myself; I've only 
played around with weirder 4:3, 5:4 and 16:10 modes on PC monitors.


Robin.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-17 Thread Robin Murphy

On 2020-12-17 10:25, Christian König wrote:

On 17.12.20 at 02:07, Chen Li wrote:

On Wed, 16 Dec 2020 22:19:11 +0800,
Christian König wrote:

On 16.12.20 at 14:48, Chen Li wrote:

On Wed, 16 Dec 2020 15:59:37 +0800,
Christian König wrote:

[SNIP]
Hi, Christian. I'm not sure why this change is a hack here. I cannot 
see the problem and will be grateful if you give more explanations.
__memset is supposed to work on those addresses, otherwise you can't 
use the

e8860 on your arm64 system.
If __memset is supposed to work on those addresses, why is this 
commit (https://github.com/torvalds/linux/commit/ba0b2275a6781b2f3919d931d63329b5548f6d5f) 
needed? (I also notice drm/radeon didn't take this change though.) 
Just out of curiosity.


We generally accept those patches as cleanup in the kernel with the hope 
that we can find a way to work around the userspace restrictions.


But when you also have this issue in userspace then there isn't much we 
can do for you.



Replacing the direct write in the kernel with calls to writel() or
memset_io() will fix that temporarily, but you have a more general 
problem here.

I cannot see what's the more general problem here :( you mean performance?


No, not performance. See standards like OpenGL, Vulkan as well as VA-API 
and VDPAU require that you can mmap() device memory and execute 
memset/memcpy on the memory from userspace.


If your ARM base board can't do that for some reason then you can't use 
the hardware with that board.


If the VRAM lives in a prefetchable PCI bar then on most sane Arm-based 
systems I believe it should be able to mmap() to userspace with the 
Normal memory type, where unaligned accesses and such are allowed, as 
opposed to the Device memory type intended for MMIO mappings, which has 
more restrictions but stricter ordering guarantees.


Regardless of what happens elsewhere though, if something is mapped 
*into the kernel* with ioremap(), then it is fundamentally wrong per the 
kernel memory model to reference that mapping directly without using I/O 
accessors. That is not specific to any individual architecture, and 
Sparse should be screaming about it already. I guess in this case the 
UVD code needs to pay more attention to whether radeon_bo_kmap() ends up 
going via ttm_bo_ioremap() or not.


(I'm assuming the initial fault was memset() with 0 trying to perform 
"DC ZVA" on a Device-type mapping from ioremap() - FYI a stacktrace on 
its own without the rest of the error dump showing what actually 
triggered it isn't overly useful)


Robin.

For amdgpu I suggest that we allocate the UVD message in GTT 
instead of VRAM
since we don't have the hardware restriction for that on the new 
generations.


Thanks, I will try to dig deeper. But what does "hardware 
restriction" mean here? I'm not familiar with the video driver stack 
and amd gpu, sorry.
On older hardware (AGP days) the buffer had to be in VRAM (MMIO) 
memory, but on

modern systems GTT (system memory) works as well.
IIUC, e8860 can use amdgpu (I use radeon now) because its device id 6822 
is in amdgpu's table. But I cannot tell whether e8860 has an iommu, and I 
cannot find an iommu from lspci, so the graphics translation table may not 
work here?


That is not related to IOMMU. IOMMU is a feature of the CPU/motherboard. 
This is implemented using GTT, i.e. the VM page tables inside the GPU.


And yes it should work I will prepare a patch for it.

BTW: How does userspace work on arm64 then? The driver stack 
usually only works

if mmio can be mapped directly.
I also posted two userspace issues on mesa, and you may be interested 
in them:

   https://gitlab.freedesktop.org/mesa/mesa/-/issues/3954

   https://gitlab.freedesktop.org/mesa/mesa/-/issues/3951

I paste some virtual memory

Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Robin Murphy

On 2020-12-17 14:02, Christian König wrote:

On 17.12.20 at 14:45, Robin Murphy wrote:

On 2020-12-17 10:25, Christian König wrote:

On 17.12.20 at 02:07, Chen Li wrote:

On Wed, 16 Dec 2020 22:19:11 +0800,
Christian König wrote:

On 16.12.20 at 14:48, Chen Li wrote:

On Wed, 16 Dec 2020 15:59:37 +0800,
Christian König wrote:

[SNIP]
Hi, Christian. I'm not sure why this change is a hack here. I 
cannot see the problem and will be grateful if you give more 
explanations.
__memset is supposed to work on those addresses, otherwise you 
can't use the

e8860 on your arm64 system.
If __memset is supposed to work on those addresses, why is this 
commit (https://github.com/torvalds/linux/commit/ba0b2275a6781b2f3919d931d63329b5548f6d5f) 
needed? (I also notice drm/radeon didn't take this change though.) 
Just out of curiosity.


We generally accept those patches as cleanup in the kernel with the 
hope that we can find a way to work around the userspace restrictions.


But when you also have this issue in userspace then there isn't much 
we can do for you.



Replacing the direct write in the kernel with calls to writel() or
memset_io() will fix that temporarily, but you have a more general 
problem here.
I cannot see what's the more general problem here :( you mean 
performance?


No, not performance. See standards like OpenGL, Vulkan as well as 
VA-API and VDPAU require that you can mmap() device memory and 
execute memset/memcpy on the memory from userspace.


If your ARM base board can't do that for some reason then you can't use 
the hardware with that board.


If the VRAM lives in a prefetchable PCI bar then on most sane 
Arm-based systems I believe it should be able to mmap() to userspace 
with the Normal memory type, where unaligned accesses and such are 
allowed, as opposed to the Device memory type intended for MMIO 
mappings, which has more restrictions but stricter ordering guarantees.


Do you have some background on why some ARM boards fail with that?

We had a couple of reports that memset/memcpy fail in userspace (usually 
the system just spontaneously reboots or becomes unresponsive), but so far 
nobody could tell us why that happens.


Part of it is that Arm doesn't really have an ideal memory type for 
mapping RAM behind PCI (much like we also struggle with the vague 
expectations of what write-combine might mean beyond x86). Device memory 
can be relaxed to allow gathering, reordering and write-buffering, but 
is still a bit too restrictive in other ways - aligned, non-speculative, 
etc. - for something that's really just RAM and expected to be usable as 
such. Thus to map PCI memory as "write-combine" we use Normal 
non-cacheable, which means the CPU MMU is going to allow software to do 
all the things it might expect of RAM, but we're now at the mercy of the 
menagerie of interconnects and PCI implementations out there.


Atomic operations, for example, *might* be resolved by the CPU coherency 
mechanism or in the interconnect, such that the PCI host bridge only 
sees regular loads and stores, but more often than not they'll just 
result in an atomic transaction going all the way to the host bridge. A 
super-duper-clever host bridge implementation might even support that, 
but the vast majority are likely to just reject it as invalid.


Similarly, unaligned accesses, cache line fills/evictions, and such will 
often work, since they're essentially just larger read/write bursts, but 
some host bridges can be picky and might reject access sizes they don't 
like (there's at least one where even 64-bit accesses don't work. On a 
64-bit system...)


If an invalid transaction does reach the host bridge, it's going to come 
back to the CPU as an external abort. If we're really lucky that could 
be taken synchronously, attributable to a specific instruction, and just 
oops/SIGBUS the relevant kernel/userspace thread. Often though, 
(particularly with big out-of-order CPUs) it's likely to be asynchronous 
and no longer attributable, and thus taken as an SError event, which in 
general roughly translates to "part of the SoC has fallen off". The only 
reasonable response we have to that is to panic the system.


Robin.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Robin Murphy

On 2020-12-18 06:14, Chen Li wrote:
[...]

No, not performance. See standards like OpenGL, Vulkan as well as VA-API and
VDPAU require that you can mmap() device memory and execute memset/memcpy on
the memory from userspace.

If your ARM base board can't do that for some reason then you can't use the hardware
with that board.


If the VRAM lives in a prefetchable PCI bar then on most sane Arm-based systems
I believe it should be able to mmap() to userspace with the Normal memory type,
where unaligned accesses and such are allowed, as opposed to the Device memory
type intended for MMIO mappings, which has more restrictions but stricter
ordering guarantees.
  
Hi, Robin. I cannot understand why it allows unaligned accesses. A prefetchable PCI BAR should also be MMIO, and accesses will end up as device memory, so why does this allow unaligned access?


Because even Device-GRE is a bit too restrictive to expose to userspace 
that's likely to expect it to behave as regular memory, so, for better 
or worse, we use MT_NORMAL_MC for pgprot_writecombine().



Regardless of what happens elsewhere though, if something is mapped *into the
kernel* with ioremap(), then it is fundamentally wrong per the kernel memory
model to reference that mapping directly without using I/O accessors. That is
not specific to any individual architecture, and Sparse should be screaming
about it already. I guess in this case the UVD code needs to pay more attention
to whether radeon_bo_kmap() ends up going via ttm_bo_ioremap() or not.

(I'm assuming the initial fault was memset() with 0 trying to perform "DC ZVA"
on a Device-type mapping from ioremap() - FYI a stacktrace on its own without
the rest of the error dump showing what actually triggered it isn't overly
useful)

Robin.

Why might it be 'DC ZVA'? I'm not sure of the pc in the initial kernel fault memset, but 
I captured the userspace crash pc: stp (128-bit) or str with neon (also 128-bit) to the 
render node (/dev/dri/renderD128).


As I said it was an assumption. I guessed at it being more likely to be 
an MMU fault than an external abort, and given the size and the fact 
that it's a variable initialisation guessed at it being slightly more 
likely to hit the ZVA special-case rather than being unaligned. Looking 
again, I guess starting at an odd-numbered 32-bit element might lead to 
an unaligned store of XZR, but either way it doesn't really matter - 
what it showed is it clearly *could* be an MMU fault because TTM seems 
to be a bit careless with iomem pointers.


That said, if you're also getting external aborts from your host bridge 
not liking 128-bit transactions, then as Christian says you're probably 
going to have a bad time on this platform either way.


Robin.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/[amdgpu|radeon]: fix memset on io mem

2020-12-18 Thread Robin Murphy

On 2020-12-18 14:33, Christian König wrote:

On 18.12.20 at 15:17, Robin Murphy wrote:

On 2020-12-17 14:02, Christian König wrote:

[SNIP]
Do you have some background on why some ARM boards fail with that?

We had a couple of reports that memset/memcpy fail in userspace 
(usually the system just spontaneously reboots or becomes unresponsive), 
but so far nobody could tell us why that happens.


Part of it is that Arm doesn't really have an ideal memory type for 
mapping RAM behind PCI (much like we also struggle with the vague 
expectations of what write-combine might mean beyond x86). Device 
memory can be relaxed to allow gathering, reordering and 
write-buffering, but is still a bit too restrictive in other ways - 
aligned, non-speculative, etc. - for something that's really just RAM 
and expected to be usable as such. Thus to map PCI memory as 
"write-combine" we use Normal non-cacheable, which means the CPU MMU 
is going to allow software to do all the things it might expect of 
RAM, but we're now at the mercy of the menagerie of interconnects and 
PCI implementations out there.


I see. As far as I know we already correctly map the RAM from the GPU as 
"write-combine".


Atomic operations, for example, *might* be resolved by the CPU 
coherency mechanism or in the interconnect, such that the PCI host 
bridge only sees regular loads and stores, but more often than not 
they'll just result in an atomic transaction going all the way to the 
host bridge. A super-duper-clever host bridge implementation might 
even support that, but the vast majority are likely to just reject it 
as invalid.


Support for atomics is actually specified by a PCIe extension. As far 
as I know that extension is even necessary for full KFD support on AMD 
and full Cuda support for NVidia GPUs.




Similarly, unaligned accesses, cache line fills/evictions, and such 
will often work, since they're essentially just larger read/write 
bursts, but some host bridges can be picky and might reject access 
sizes they don't like (there's at least one where even 64-bit accesses 
don't work. On a 64-bit system...)


This is breaking our neck here. We need 64bit writes on 64bit systems to 
end up as one 64bit write at the hardware and not two 32bit writes, 
otherwise the doorbells won't work correctly.


Just to clarify, that particular case *is* considered catastrophically 
broken ;)


In general you can assume that on AArch64, any aligned 64-bit load or 
store is atomic (64-bit accesses on 32-bit Arm are less well-defined, 
but hopefully nobody cares by now).


Larger writes are pretty much unproblematic, for P2P our bus interface 
even supports really large multi byte transfers.


If an invalid transaction does reach the host bridge, it's going to 
come back to the CPU as an external abort. If we're really lucky that 
could be taken synchronously, attributable to a specific instruction, 
and just oops/SIGBUS the relevant kernel/userspace thread. Often 
though, (particularly with big out-of-order CPUs) it's likely to be 
asynchronous and no longer attributable, and thus taken as an SError 
event, which in general roughly translates to "part of the SoC has 
fallen off". The only reasonable response we have to that is to panic 
the system.


Yeah, that sounds exactly like what we see on some of the ARM boards out 
there. At least we have an explanation for that behavior now.


Going to talk about this with our hardware engineers. We might be able 
to work around some of that stuff, but that is rather tricky to get 
working under those conditions.


Yeah, unfortunately there's no easy way to judge the quality of any 
given SoC's PCI implementation until you throw your required traffic at 
it and things either break or don't...


Cheers,
Robin.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules

2020-12-22 Thread Robin Murphy

On 2020-12-22 00:44, Isaac J. Manjarres wrote:

The SMMU driver depends on the availability of the ARM LPAE and
ARM V7S io-pgtable format code to work properly. In preparation


Nit: we don't really depend on v7s - we *can* use it if it's available, 
address constraints are suitable, and the SMMU implementation actually 
supports it (many don't), but we can still quite happily not use it even 
so. LPAE is mandatory in the architecture so that's our only hard 
requirement, embodied in the kconfig select.


This does mean there may technically still be a corner case involving 
ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime 
failure rather than a build error, so unless and until anyone 
demonstrates that it actually matters I don't feel particularly inclined 
to give it much thought.


Robin.


for having the io-pgtable formats as modules, add a "pre"
dependency with MODULE_SOFTDEP() to ensure that the io-pgtable
format modules are loaded before loading the ARM SMMU driver module.

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d8c6bfd..a72649f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU 
implementations");
  MODULE_AUTHOR("Will Deacon ");
  MODULE_ALIAS("platform:arm-smmu");
  MODULE_LICENSE("GPL v2");
+MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s");




Re: [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration

2020-12-22 Thread Robin Murphy
}
-subsys_initcall(arm_lpae_do_selftests);
+#else
+static int __init arm_lpae_do_selftests(void)
+{
+   return 0;
+}
  #endif
+
+static int __init arm_lpae_init(void)
+{
+   int ret, i;
+
+   for (i = 0; i < ARRAY_SIZE(io_pgtable_arm_lpae_init_fns); i++) {
+   ret = io_pgtable_ops_register(&io_pgtable_arm_lpae_init_fns[i]);
+   if (ret < 0) {
+   pr_err("Failed to register ARM LPAE fmt: %d\n");
+   goto err_io_pgtable_register;
+   }
+   }
+
+   ret = arm_lpae_do_selftests();
+   if (ret < 0)
+   goto err_io_pgtable_register;
+
+   return 0;
+
+err_io_pgtable_register:
+   for (i = i - 1; i >= 0; i--)


Personally I find "while (i--)" a bit clearer for this kind of 
unwinding, but maybe post-decrement isn't to everyone's taste.
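The two unwinding styles are equivalent; a minimal standalone comparison (illustrative helpers, not the kernel code under review) shows they visit exactly the same indices i-1 down to 0. Each helper encodes the visit order as decimal digits so the result is easy to check:

```c
#include <assert.h>

/* Unwind style used in the patch: for (i = i - 1; i >= 0; i--) */
static int unwind_for(int i)
{
	int acc = 0;

	for (i = i - 1; i >= 0; i--)
		acc = acc * 10 + i;	/* record visited index */
	return acc;
}

/* Robin's suggested post-decrement style: while (i--) */
static int unwind_while(int i)
{
	int acc = 0;

	while (i--)
		acc = acc * 10 + i;	/* record visited index */
	return acc;
}
```

Both visit 3, 2, 1, 0 when called with 4, so the choice is purely one of taste; the while form just avoids restating the decrement.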



+   io_pgtable_ops_unregister(&io_pgtable_arm_lpae_init_fns[i]);
+   return ret;
+}
+core_initcall(arm_lpae_init);
+
+static void __exit arm_lpae_exit(void)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(io_pgtable_arm_lpae_init_fns); i++)
+   io_pgtable_ops_unregister(&io_pgtable_arm_lpae_init_fns[i]);
+}
+module_exit(arm_lpae_exit);
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index 94394c8..2c6eb2e 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -10,33 +10,45 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  
-static const struct io_pgtable_init_fns *

-io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
-#ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE
-   [ARM_32_LPAE_S1] = &io_pgtable_arm_32_lpae_s1_init_fns,
-   [ARM_32_LPAE_S2] = &io_pgtable_arm_32_lpae_s2_init_fns,
-   [ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
-   [ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
-   [ARM_MALI_LPAE] = &io_pgtable_arm_mali_lpae_init_fns,
-#endif
-#ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S
-   [ARM_V7S] = &io_pgtable_arm_v7s_init_fns,
-#endif
+struct io_pgtable_init_fns_node {
+   struct io_pgtable_init_fns *fns;
+   struct list_head list;
  };
  
+static LIST_HEAD(io_pgtable_init_fns_list);

+static DEFINE_RWLOCK(io_pgtable_init_fns_list_lock);
+
+static struct io_pgtable_init_fns *io_pgtable_get_init_fns(enum io_pgtable_fmt 
fmt)
+{
+   struct io_pgtable_init_fns_node *iter;
+   struct io_pgtable_init_fns *fns = NULL;
+
+   read_lock(&io_pgtable_init_fns_list_lock);
+   list_for_each_entry(iter, &io_pgtable_init_fns_list, list)
+   if (iter->fns->fmt == fmt) {
+   fns = iter->fns;
+   break;
+   }
+   read_unlock(&io_pgtable_init_fns_list_lock);
+
+   return fns;
+}


I think it would be a lot easier to stick with a simple array indexed by 
enum - that way you can just set/clear/test entries without needing to 
worry about locking. Basically just remove the const and the 
initialisers from the existing one ;)


(and if you think you're concerned about memory, consider that just the 
list head plus lock is already half the size of the table)


Other than that, I think this all looks pretty promising - I'd suggest 
sending a non-RFC after rc1 so that it gets everyone's proper attention.


Thanks,
Robin.


+
  struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
struct io_pgtable_cfg *cfg,
void *cookie)
  {
struct io_pgtable *iop;
-   const struct io_pgtable_init_fns *fns;
+   struct io_pgtable_init_fns *fns;
  
  	if (fmt >= IO_PGTABLE_NUM_FMTS)

return NULL;
  
-	fns = io_pgtable_init_table[fmt];

+   fns = io_pgtable_get_init_fns(fmt);
if (!fns)
return NULL;
  
@@ -59,12 +71,64 @@ EXPORT_SYMBOL_GPL(alloc_io_pgtable_ops);

  void free_io_pgtable_ops(struct io_pgtable_ops *ops)
  {
struct io_pgtable *iop;
+   struct io_pgtable_init_fns *fns;
  
  	if (!ops)

return;
  
  	iop = io_pgtable_ops_to_pgtable(ops);

io_pgtable_tlb_flush_all(iop);
-   io_pgtable_init_table[iop->fmt]->free(iop);
+   fns = io_pgtable_get_init_fns(iop->fmt);
+   if (fns)
+   fns->free(iop);
  }
  EXPORT_SYMBOL_GPL(free_io_pgtable_ops);
+
+int io_pgtable_ops_register(struct io_pgtable_init_fns *init_fns)
+{
+   struct io_pgtable_init_fns_node *iter, *fns_node;
+   int ret = 0;
+
+   if (!init_fns || init_fns->fmt >= IO_PGTABLE_NUM_FMTS ||
+   !init_fns->alloc || !init_fns->free)
+   return -EINVAL;
+
+   fns_node = kzalloc(sizeof(*fns_node), GFP_KERNEL);
+   if (!fns_node)
+   return -ENOMEM;
+
+   write_lock(&io_pgtable_init_fns_list_lock);
+   list_fo

Re: [PATCH v2 3/7] iommu/arm-smmu: Add dependency on io-pgtable format modules

2020-12-23 Thread Robin Murphy

On 2020-12-22 19:49, isa...@codeaurora.org wrote:

On 2020-12-22 11:27, Robin Murphy wrote:

On 2020-12-22 00:44, Isaac J. Manjarres wrote:

The SMMU driver depends on the availability of the ARM LPAE and
ARM V7S io-pgtable format code to work properly. In preparation


Nit: we don't really depend on v7s - we *can* use it if it's
available, address constraints are suitable, and the SMMU
implementation actually supports it (many don't), but we can still
quite happily not use it even so. LPAE is mandatory in the
architecture so that's our only hard requirement, embodied in the
kconfig select.

This does mean there may technically still be a corner case involving
ARM_SMMU=y and IO_PGTABLE_ARM_V7S=m, but at worst it's now a runtime
failure rather than a build error, so unless and until anyone
demonstrates that it actually matters I don't feel particularly
inclined to give it much thought.

Robin.


Okay, I'll fix up the commit message, as well as the code, so that it
only depends on io-pgtable-arm.


Well, IIUC it would make sense to keep the softdep for when the v7s 
module *is* present; I just wanted to clarify that it's more of a 
nice-to-have rather than a necessity.


Robin.


Thanks,
Isaac

for having the io-pgtable formats as modules, add a "pre"
dependency with MODULE_SOFTDEP() to ensure that the io-pgtable
format modules are loaded before loading the ARM SMMU driver module.

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c

index d8c6bfd..a72649f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2351,3 +2351,4 @@ MODULE_DESCRIPTION("IOMMU API for ARM 
architected SMMU implementations");

  MODULE_AUTHOR("Will Deacon ");
  MODULE_ALIAS("platform:arm-smmu");
  MODULE_LICENSE("GPL v2");
+MODULE_SOFTDEP("pre: io-pgtable-arm io-pgtable-arm-v7s");



___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



Re: [PATCH v2 1/7] iommu/io-pgtable: Introduce dynamic io-pgtable fmt registration

2020-12-23 Thread Robin Murphy

On 2020-12-22 19:54, isa...@codeaurora.org wrote:

On 2020-12-22 11:27, Robin Murphy wrote:

On 2020-12-22 00:44, Isaac J. Manjarres wrote:

The io-pgtable code constructs an array of init functions for each
page table format at compile time. This is not ideal, as this
increases the footprint of the io-pgtable code, as well as prevents
io-pgtable formats from being built as kernel modules.

In preparation for modularizing the io-pgtable formats, switch to a
dynamic registration scheme, where each io-pgtable format can register
their init functions with the io-pgtable code at boot or module
insertion time.

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/io-pgtable-arm-v7s.c | 34 +-
  drivers/iommu/io-pgtable-arm.c | 90 
++--
  drivers/iommu/io-pgtable.c | 94 
--

  include/linux/io-pgtable.h | 51 +
  4 files changed, 209 insertions(+), 60 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c

index 1d92ac9..89aad2f 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -835,7 +836,8 @@ static struct io_pgtable 
*arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,

  return NULL;
  }
  -struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
+static struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
+    .fmt    = ARM_V7S,
  .alloc    = arm_v7s_alloc_pgtable,
  .free    = arm_v7s_free_pgtable,
  };
@@ -982,5 +984,33 @@ static int __init arm_v7s_do_selftests(void)
  pr_info("self test ok\n");
  return 0;
  }
-subsys_initcall(arm_v7s_do_selftests);
+#else
+static int arm_v7s_do_selftests(void)
+{
+    return 0;
+}
  #endif
+
+static int __init arm_v7s_init(void)
+{
+    int ret;
+
+    ret = io_pgtable_ops_register(&io_pgtable_arm_v7s_init_fns);
+    if (ret < 0) {
+    pr_err("Failed to register ARM V7S format\n");


Super-nit: I think "v7s" should probably be lowercase there. Also
general consistency WRT showing the error code and whether or not
to abbreviate "format" would be nice.


Ok, I can fix this accordingly.


+    return ret;
+    }
+
+    ret = arm_v7s_do_selftests();
+    if (ret < 0)
+    io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns);
+
+    return ret;
+}
+core_initcall(arm_v7s_init);
+
+static void __exit arm_v7s_exit(void)
+{
+    io_pgtable_ops_unregister(&io_pgtable_arm_v7s_init_fns);
+}
+module_exit(arm_v7s_exit);
diff --git a/drivers/iommu/io-pgtable-arm.c 
b/drivers/iommu/io-pgtable-arm.c

index 87def58..ff0ea2f 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1043,29 +1044,32 @@ arm_mali_lpae_alloc_pgtable(struct 
io_pgtable_cfg *cfg, void *cookie)

  return NULL;
  }
  -struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
-    .alloc    = arm_64_lpae_alloc_pgtable_s1,
-    .free    = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = {
-    .alloc    = arm_64_lpae_alloc_pgtable_s2,
-    .free    = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = {
-    .alloc    = arm_32_lpae_alloc_pgtable_s1,
-    .free    = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = {
-    .alloc    = arm_32_lpae_alloc_pgtable_s2,
-    .free    = arm_lpae_free_pgtable,
-};
-
-struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = {
-    .alloc    = arm_mali_lpae_alloc_pgtable,
-    .free    = arm_lpae_free_pgtable,
+static struct io_pgtable_init_fns io_pgtable_arm_lpae_init_fns[] = {
+    {
+    .fmt    = ARM_32_LPAE_S1,
+    .alloc    = arm_32_lpae_alloc_pgtable_s1,
+    .free    = arm_lpae_free_pgtable,
+    },
+    {
+    .fmt    = ARM_32_LPAE_S2,
+    .alloc    = arm_32_lpae_alloc_pgtable_s2,
+    .free    = arm_lpae_free_pgtable,
+    },
+    {
+    .fmt    = ARM_64_LPAE_S1,
+    .alloc    = arm_64_lpae_alloc_pgtable_s1,
+    .free    = arm_lpae_free_pgtable,
+    },
+    {
+    .fmt    = ARM_64_LPAE_S2,
+    .alloc    = arm_64_lpae_alloc_pgtable_s2,
+    .free    = arm_lpae_free_pgtable,
+    },
+    {
+    .fmt    = ARM_MALI_LPAE,
+    .alloc    = arm_mali_lpae_alloc_pgtable,
+    .free    = arm_lpae_free_pgtable,
+    },
  };
    #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
@@ -1250,5 +1254,43 @@ static int __init arm_lpae_do_selftests(void)
  pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail);
  return fail ? -EFAULT : 0;
  }
-subsys_initcall(arm_lpae_do_selftests);
+#else
+static int __init arm_lpae_d

[PATCH] drm/tegra: Add missing DMA API includes

2021-12-16 Thread Robin Murphy
Include dma-mapping.h directly in the remaining places that touch the
DMA API, to avoid imminent breakage from refactoring other headers.

Signed-off-by: Robin Murphy 
---

This makes arm64 allmodconfig build cleanly for me with the IOVA series,
so hopefully is the last of it...

 drivers/gpu/drm/tegra/dc.c| 1 +
 drivers/gpu/drm/tegra/hub.c   | 1 +
 drivers/gpu/drm/tegra/plane.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index a29d64f87563..c33420e1fb07 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c
index b910155f80c4..7d52a403334b 100644
--- a/drivers/gpu/drm/tegra/hub.c
+++ b/drivers/gpu/drm/tegra/hub.c
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/gpu/drm/tegra/plane.c b/drivers/gpu/drm/tegra/plane.c
index 16a1cdc28657..75db069dd26f 100644
--- a/drivers/gpu/drm/tegra/plane.c
+++ b/drivers/gpu/drm/tegra/plane.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2017 NVIDIA CORPORATION.  All rights reserved.
  */
 
+#include 
 #include 
 #include 
 
-- 
2.28.0.dirty



Re: [PATCH 1/1] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain

2021-12-20 Thread Robin Murphy

On 2021-12-18 07:45, Lu Baolu wrote:

The supported page sizes of an iommu_domain are saved in the pgsize_bitmap
field. Retrieve the value from the right place.

Fixes: 58fd9375c2c534 ("drm/nouveau/platform: probe IOMMU if present")


...except domain->pgsize_bitmap was introduced more than a year after 
that commit ;)


As an improvement rather than a fix, though,

Reviewed-by: Robin Murphy 


Signed-off-by: Lu Baolu 
---
  drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index d0d52c1d4aee..992cc285f2fe 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -133,7 +133,7 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra 
*tdev)
 * or equal to the system's PAGE_SIZE, with a preference if
 * both are equal.
 */
-   pgsize_bitmap = tdev->iommu.domain->ops->pgsize_bitmap;
+   pgsize_bitmap = tdev->iommu.domain->pgsize_bitmap;
if (pgsize_bitmap & PAGE_SIZE) {
tdev->iommu.pgshift = PAGE_SHIFT;
} else {


Re: [PATCH 1/2] drm/panfrost: mmu: improved memory attributes

2021-12-24 Thread Robin Murphy

On 2021-12-23 11:06, asheplya...@basealt.ru wrote:

From: Alexey Sheplyakov 

T62x/T60x GPUs are known to not work with panfrost as of now.
One of the reasons is wrong/incomplete memory attributes which
the panfrost driver sets in the page tables:

- MEMATTR_IMP_DEF should be 0x48ULL, not 0x88ULL.
   0x88ULL is MEMATTR_OUTER_IMP_DEF


I guess the macro could be renamed if anyone's particularly bothered, 
but using the outer-cacheable attribute is deliberate because it is 
necessary for I/O-coherent GPUs to work properly (and should be 
irrelevant for non-coherent integrations). I'm away from my Juno board 
for the next couple of weeks but I'm fairly confident that this change 
as-is will break cache snooping.



- MEMATTR_FORCE_CACHE_ALL and MEMATTR_OUTER_WA are missing.


They're "missing" because they're not used, and there's currently no 
mechanism by which they *could* be used. Also note that the indices in 
MEMATTR have to line up with the euqivalent LPAE indices for the closest 
match to what the IOMMU API's IOMMU_{CACHE,MMIO} flags represent, so 
moving those around is yet more subtle breakage.



T72x and newer GPUs work just fine with such incomplete/wrong memory
attributes. However T62x are quite picky and quickly lock up.

To avoid the problem set the same memory attributes (in LPAE page
tables) as mali_kbase.

The patch has been tested (for regressions) with T860 GPU (rk3399 SoC).
At the first glance (using GNOME desktop, running glmark) it does
not cause any changes for this GPU.

Note: this patch is necessary, but *not* enough to get panfrost
working with T62x


I'd note that panfrost has been working OK - to the extent that Mesa 
supports its older ISA - on the T624 (single core group) in Arm's Juno 
SoC for over a year now since commit 268af50f38b1.


If you have to force outer non-cacheable to avoid getting translation 
faults and other errors that look like the GPU is inexplicably seeing 
the wrong data, I'd check whether you have the same thing where your 
integration is actually I/O-coherent and you're missing the 
"dma-coherent" property in your DT.


Thanks,
Robin.


Signed-off-by: Alexey Sheplyakov 
Signed-off-by: Vadim V. Vlasov 

Cc: Rob Herring 
Cc: Tomeu Vizoso 
Cc: Steven Price 
Cc: Alyssa Rosenzweig 
Cc: Vadim V. Vlasov 
---
  drivers/gpu/drm/panfrost/panfrost_mmu.c |  3 ---
  drivers/iommu/io-pgtable-arm.c  | 16 
  2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 39562f2d11a4..2f4f8a17bc82 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -133,9 +133,6 @@ static void panfrost_mmu_enable(struct panfrost_device 
*pfdev, struct panfrost_m
mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), lower_32_bits(transtab));
mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), upper_32_bits(transtab));
  
-	/* Need to revisit mem attrs.

-* NC is the default, Mali driver is inner WT.
-*/
mmu_write(pfdev, AS_MEMATTR_LO(as_nr), lower_32_bits(memattr));
mmu_write(pfdev, AS_MEMATTR_HI(as_nr), upper_32_bits(memattr));
  
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c

index dd9e47189d0d..15b39c337e20 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -122,13 +122,17 @@
  #define ARM_LPAE_MAIR_ATTR_IDX_CACHE  1
  #define ARM_LPAE_MAIR_ATTR_IDX_DEV2
  #define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE 3
+#define ARM_LPAE_MAIR_ATTR_IDX_OUTER_WA 4
  
  #define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0)

  #define ARM_MALI_LPAE_TTBR_READ_INNER BIT(2)
  #define ARM_MALI_LPAE_TTBR_SHARE_OUTERBIT(4)
  
-#define ARM_MALI_LPAE_MEMATTR_IMP_DEF	0x88ULL

-#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
+#define ARM_MALI_LPAE_MEMATTR_IMP_DEF  0x48ULL
+#define ARM_MALI_LPAE_MEMATTR_FORCE_CACHE_ALL 0x4FULL
+#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x4DULL
+#define ARM_MALI_LPAE_MEMATTR_OUTER_IMP_DEF 0x88ULL
+#define ARM_MALI_LPAE_MEMATTR_OUTER_WA 0x8DULL
  
  #define APPLE_DART_PTE_PROT_NO_WRITE (1<<7)

  #define APPLE_DART_PTE_PROT_NO_READ (1<<8)
@@ -1080,10 +1084,14 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, 
void *cookie)
cfg->arm_mali_lpae_cfg.memattr =
(ARM_MALI_LPAE_MEMATTR_IMP_DEF
 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_NC)) |
+   (ARM_MALI_LPAE_MEMATTR_FORCE_CACHE_ALL
+<< ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
(ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC
 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
-   (ARM_MALI_LPAE_MEMATTR_IMP_DEF
-<< ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
+   

[PATCH] Revert "drm/nouveau/device/pci: set as non-CPU-coherent on ARM64"

2016-04-29 Thread Robin Murphy
[2.898845] [] ret_from_fork+0x10/0x40
[2.898853] Code: a88120c7 a8c12027 a88120c7 a8c12027 (a88120c7)
[2.898871] ---[ end trace d5713dcad023ee04 ]---
[2.89] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x000b

In a toss-up between the GPU seeing stale data artefacts on some systems
vs. catastrophic kernel crashes on other systems, the latter would seem
to take precedence, so revert this change until the real underlying
problem can be fixed.

Signed-off-by: Robin Murphy 
---

Alex, Ben, Dave,

I know Alex was looking into this, but since we're nearly at -rc6 already
it looks like the only thing to do for 4.6 is pick the lesser of two evils...

Thanks,
Robin.

 drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c
index 18fab3973ce5..62ad0300cfa5 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c
@@ -1614,7 +1614,7 @@ nvkm_device_pci_func = {
.fini = nvkm_device_pci_fini,
.resource_addr = nvkm_device_pci_resource_addr,
.resource_size = nvkm_device_pci_resource_size,
-   .cpu_coherent = !IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_ARM64),
+   .cpu_coherent = !IS_ENABLED(CONFIG_ARM),
 };

 int
-- 
2.8.1.dirty



[PATCH 1/4] drm/panfrost: Set DMA masks earlier

2019-05-03 Thread Robin Murphy
The DMA masks need to be set correctly before any DMA API activity kicks
off, and the current point in panfrost_probe() is way too late in that
regard, since panfrost_mmu_init() has already set up a live address
space and DMA-mapped MMU pagetables. We can't set masks until we've
queried the appropriate value from MMU_FEATURES, but as soon as
reasonably possible after that should suffice.

Signed-off-by: Robin Murphy 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 5 -
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index c06af78ab833..af0058ffc1e4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -3,8 +3,6 @@
 /* Copyright 2019 Linaro, Ltd., Rob Herring  */
 /* Copyright 2019 Collabora ltd. */
 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -388,9 +386,6 @@ static int panfrost_probe(struct platform_device *pdev)
goto err_out0;
}
 
-   dma_set_mask_and_coherent(pfdev->dev,
-   DMA_BIT_MASK(FIELD_GET(0xff00, pfdev->features.mmu_features)));
-
err = panfrost_devfreq_init(pfdev);
if (err) {
dev_err(&pdev->dev, "Fatal error during devfreq init\n");
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c 
b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index aceaf6e44a09..42511fc1fea0 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -2,8 +2,10 @@
 /* Copyright 2018 Marty E. Plummer  */
 /* Copyright 2019 Linaro, Ltd., Rob Herring  */
 /* Copyright 2019 Collabora ltd. */
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -332,6 +334,9 @@ int panfrost_gpu_init(struct panfrost_device *pfdev)
 
panfrost_gpu_init_features(pfdev);
 
+   dma_set_mask_and_coherent(pfdev->dev,
+   DMA_BIT_MASK(FIELD_GET(0xff00, pfdev->features.mmu_features)));
+
irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "gpu");
if (irq <= 0)
return -ENODEV;
-- 
2.21.0.dirty


[PATCH 0/4] drm/panfrost: Misc. fixes and cleanups

2019-05-03 Thread Robin Murphy
Hi,

These are a few trivial fixes and cleanups from playing with the
panfrost kernel driver on an Arm Juno board. Not that anyone has ever
cared much about the built-in GPU on Juno, but it's at least a somewhat
interesting platform from the kernel driver perspective for having
I/O coherency, RAM above 4GB, and DVFS abstracted behind a firmware
interface.

Robin.


Robin Murphy (4):
  drm/panfrost: Set DMA masks earlier
  drm/panfrost: Disable PM on probe failure
  drm/panfrost: Don't scream about deferred probe
  drm/panfrost: Show stored feature registers

 drivers/gpu/drm/panfrost/panfrost_drv.c | 12 +---
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 19 ---
 2 files changed, 17 insertions(+), 14 deletions(-)

-- 
2.21.0.dirty


[PATCH 2/4] drm/panfrost: Disable PM on probe failure

2019-05-03 Thread Robin Murphy
Make sure to disable runtime PM again if probe fails after we've enabled
it. Otherwise, any subsequent attempt to re-probe starts triggering
"Unbalanced pm_runtime_enable!" assertions from the driver core.

Signed-off-by: Robin Murphy 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index af0058ffc1e4..a881e2346b55 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -405,6 +405,7 @@ static int panfrost_probe(struct platform_device *pdev)
 err_out1:
panfrost_device_fini(pfdev);
 err_out0:
+   pm_runtime_disable(pfdev->dev);
drm_dev_put(ddev);
return err;
 }
-- 
2.21.0.dirty


[RFC 5/4] arm64: dts: juno: add GPU subsystem

2019-05-03 Thread Robin Murphy
Since we now have bindings for Mali Midgard GPUs, let's use them to
describe Juno's GPU subsystem, if only because we can. Juno sports a
Mali-T624 integrated behind an MMU-400 (as a gesture towards
virtualisation), in their own dedicated power domain with DVFS
controlled by the SCP.

Signed-off-by: Robin Murphy 
---

Just in case anyone else is interested. Note that I've not been using
this exact patch, since my Juno is running the new SCMI-based firmware
which needs not-yet-upstream MHU changes, but this should in theory be
the equivalent change for the upstream SCPI-based DT.

 .../bindings/gpu/arm,mali-midgard.txt |  1 +
 arch/arm64/boot/dts/arm/juno-base.dtsi| 25 +++
 2 files changed, 26 insertions(+)

diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt 
b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt
index 18a2cde2e5f3..c17f8e96d1e6 100644
--- a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt
+++ b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt
@@ -16,6 +16,7 @@ Required properties:
 + "arm,mali-t880"
   * which must be preceded by one of the following vendor specifics:
 + "amlogic,meson-gxm-mali"
++ "arm,juno-mali"
 + "rockchip,rk3288-mali"
 + "rockchip,rk3399-mali"
 
diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi 
b/arch/arm64/boot/dts/arm/juno-base.dtsi
index 995a7107cdd3..6edaf03620f9 100644
--- a/arch/arm64/boot/dts/arm/juno-base.dtsi
+++ b/arch/arm64/boot/dts/arm/juno-base.dtsi
@@ -35,6 +35,18 @@
clock-names = "apb_pclk";
};
 
+   smmu_gpu: iommu@2b40 {
+   compatible = "arm,mmu-400", "arm,smmu-v1";
+   reg = <0x0 0x2b40 0x0 0x1>;
+   interrupts = ,
+;
+   #iommu-cells = <1>;
+   #global-interrupts = <1>;
+   power-domains = <&scpi_devpd 3>;
+   dma-coherent;
+   status = "disabled";
+   };
+
smmu_pcie: iommu@2b50 {
compatible = "arm,mmu-401", "arm,smmu-v1";
reg = <0x0 0x2b50 0x0 0x1>;
@@ -487,6 +499,19 @@
};
};
 
+   gpu: gpu@2d00 {
+   compatible = "arm,juno-mali", "arm,mali-t624";
+   reg = <0 0x2d00 0 0x1>;
+   interrupts = ,
+,
+;
+   interrupt-names = "gpu", "job", "mmu";
+   clocks = <&scpi_dvfs 2>;
+   power-domains = <&scpi_devpd 3>;
+   dma-coherent;
+   status = "disabled";
+   };
+
sram: sram@2e00 {
compatible = "arm,juno-sram-ns", "mmio-sram";
reg = <0x0 0x2e00 0x0 0x8000>;
-- 
2.21.0.dirty

  1   2   3   4   5   6   >