Re: [PATCH] video: fbdev: sm501fb: Fix deallocation of buffers order

2021-04-20 Thread Greg KH
On Tue, Apr 06, 2021 at 06:35:17PM -0500, Aditya Pakki wrote:
> The resource release in sm501fb_remove() is not in the inverse order of
> sm501fb_probe(), for the buffers. Release the info object after
> deallocating the buffers.
> 
> Signed-off-by: Aditya Pakki 
> ---
>  drivers/video/fbdev/sm501fb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/sm501fb.c b/drivers/video/fbdev/sm501fb.c
> index 6a52eba64559..4c32c9e88850 100644
> --- a/drivers/video/fbdev/sm501fb.c
> +++ b/drivers/video/fbdev/sm501fb.c
> @@ -2060,11 +2060,11 @@ static int sm501fb_remove(struct platform_device 
> *pdev)
>   unregister_framebuffer(fbinfo_pnl);
>  
>   sm501fb_stop(info);
> - kfree(info);
>  
>   framebuffer_release(fbinfo_pnl);
>   framebuffer_release(fbinfo_crt);
>  
> + kfree(info);
>   return 0;
>  }
>  
> -- 
> 2.25.1
> 

There is no functional change here at all, please stop it with pointless
patches.

greg k-h
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Mike Rapoport
On Tue, Apr 20, 2021 at 09:04:51AM +0200, Michal Hocko wrote:
> On Mon 19-04-21 18:37:13, Christian König wrote:
> > On 19.04.21 at 18:11, Michal Hocko wrote:
> [...]
> > > The question is not whether it is NUMA aware but whether it is useful to
> > > know per-numa data for the purpose the counter is supposed to serve.
> > 
> > No, not at all. The pages of a single DMA-buf could even be from different
> > NUMA nodes if the exporting driver decides that this is somehow useful.
> 
> As the use of the counter hasn't been explained yet I can only
> speculate. One thing that I can imagine to be useful is to fill gaps in
> our accounting. It is quite often that the memory accounted in
> /proc/meminfo (or oom report) doesn't add up to the overall memory
> usage. In some workloads the workload can be huge! In many cases there
> are other means to find out additional memory by subsystem-specific
> interfaces (e.g. networking buffers). I do assume that dma-buf is just
> one of those and the counter can fill the said gap at least partially
> for some workloads. That is definitely useful.

A bit off-topic.

Michal, I think it would have been nice to have an explanation like above
in Documentation/proc/meminfo, what do you say?
 
> What I am trying to bring up with NUMA side is that the same problem can
> happen on per-node basis. Let's say that some user consumes unexpectedly
> large amount of dma-buf on a certain node. This can lead to observable
> performance impact on anybody on allocating from that node and even
> worse cause an OOM for node bound consumers. How do I find out that it
> was dma-buf that has caused the problem?
> 
> See where I am heading?

-- 
Sincerely yours,
Mike.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Christian König

On 20.04.21 at 09:04, Michal Hocko wrote:

On Mon 19-04-21 18:37:13, Christian König wrote:

On 19.04.21 at 18:11, Michal Hocko wrote:

[...]

The question is not whether it is NUMA aware but whether it is useful to
know per-numa data for the purpose the counter is supposed to serve.

No, not at all. The pages of a single DMA-buf could even be from different
NUMA nodes if the exporting driver decides that this is somehow useful.

As the use of the counter hasn't been explained yet I can only
speculate. One thing that I can imagine to be useful is to fill gaps in
our accounting. It is quite often that the memory accounted in
/proc/meminfo (or oom report) doesn't add up to the overall memory
usage. In some workloads the workload can be huge! In many cases there
are other means to find out additional memory by subsystem-specific
interfaces (e.g. networking buffers). I do assume that dma-buf is just
one of those and the counter can fill the said gap at least partially
for some workloads. That is definitely useful.


Yes, completely agree. I'm just not 100% sure if the DMA-buf framework 
should account for that or the individual drivers exporting DMA-bufs.


See below for a further explanation.


What I am trying to bring up with NUMA side is that the same problem can
happen on per-node basis. Let's say that some user consumes unexpectedly
large amount of dma-buf on a certain node. This can lead to observable
performance impact on anybody on allocating from that node and even
worse cause an OOM for node bound consumers. How do I find out that it
was dma-buf that has caused the problem?


Yes, that is the direction my thinking goes as well, but also even further.

See DMA-buf is also used to share device local memory between processes 
as well. In other words VRAM on graphics hardware.


On my test system here I have 32GB of system memory and 16GB of VRAM. I 
can use DMA-buf to allocate that 16GB of VRAM quite easily which then 
shows up under /proc/meminfo as used memory.


But that isn't really system memory at all, it's just allocated device 
memory.



See where I am heading?


Yeah, totally. Thanks for pointing this out.

Suggestions how to handle that?

Regards,
Christian.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v3 5/7] drm/vmwgfx: Inline ttm_bo_mmap() into vmwgfx driver

2021-04-20 Thread Thomas Zimmermann

Hi

On 16.04.21 at 15:51, Christian König wrote:

On 16.04.21 at 15:46, Christian König wrote:

On 16.04.21 at 15:31, Thomas Zimmermann wrote:

The vmwgfx driver is the only remaining user of ttm_bo_mmap(). Inline
the code. The internal helper ttm_bo_vm_lookup() is now also part of
vmwgfx as vmw_bo_vm_lookup().

v2:
* replace pr_err() with drm_err() (Zack)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Zack Rusin 
---
  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 56 ++--
  1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
index cb9975889e2f..c8b6543b4e39 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
@@ -27,6 +27,32 @@
    #include "vmwgfx_drv.h"
  +static struct ttm_buffer_object *vmw_bo_vm_lookup(struct ttm_device *bdev,
+  unsigned long offset,
+  unsigned long pages)
+{
+    struct vmw_private *dev_priv = container_of(bdev, struct vmw_private, bdev);
+    struct drm_device *drm = &dev_priv->drm;
+    struct drm_vma_offset_node *node;
+    struct ttm_buffer_object *bo = NULL;
+
+    drm_vma_offset_lock_lookup(bdev->vma_manager);
+
+    node = drm_vma_offset_lookup_locked(bdev->vma_manager, offset, pages);
+    if (likely(node)) {
+    bo = container_of(node, struct ttm_buffer_object,
+  base.vma_node);
+    bo = ttm_bo_get_unless_zero(bo);
+    }
+
+    drm_vma_offset_unlock_lookup(bdev->vma_manager);
+
+    if (!bo)
+    drm_err(drm, "Could not find buffer object to map\n");
+
+    return bo;
+}
+
  int vmw_mmap(struct file *filp, struct vm_area_struct *vma)
  {
  static const struct vm_operations_struct vmw_vm_ops = {
@@ -41,10 +67,28 @@ int vmw_mmap(struct file *filp, struct vm_area_struct *vma)
  };
  struct drm_file *file_priv = filp->private_data;
  struct vmw_private *dev_priv = vmw_priv(file_priv->minor->dev);
-    int ret = ttm_bo_mmap(filp, vma, &dev_priv->bdev);
+    struct ttm_device *bdev = &dev_priv->bdev;
+    struct ttm_buffer_object *bo;
+    int ret;
+
+    if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
+    return -EINVAL;
+
+    bo = vmw_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
+    if (unlikely(!bo))
+    return -EINVAL;
  -    if (ret)
-    return ret;
+    if (unlikely(!bo->bdev->funcs->verify_access)) {
+    ret = -EPERM;
+    goto out_unref;
+    }
+    ret = bo->bdev->funcs->verify_access(bo, filp);


Is there any reason we can't call vmw_verify_access() directly here?

Would allow us to completely nuke the verify_access callback as well 
as far as I can see.


Forget what I said, couldn't see the next patch in my mailbox at time of 
writing.


Whole series is Reviewed-by: Christian König 


Thanks a lot. If I'm not mistaken, the patches at [1] need to go in 
first. So it could take a bit until this lands.


Otherwise, this series could go through the same tree as [1] if nouveau 
and vmwgfx devs don't mind.


Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/88822/



Thanks for the nice cleanup,
Christian.



Regards,
Christian.


+    if (unlikely(ret != 0))
+    goto out_unref;
+
+    ret = ttm_bo_mmap_obj(vma, bo);
+    if (unlikely(ret != 0))
+    goto out_unref;
    vma->vm_ops = &vmw_vm_ops;
  @@ -52,7 +96,13 @@ int vmw_mmap(struct file *filp, struct vm_area_struct *vma)
  if (!is_cow_mapping(vma->vm_flags))
  vma->vm_flags = (vma->vm_flags & ~VM_MIXEDMAP) | VM_PFNMAP;
  +    ttm_bo_put(bo); /* release extra ref taken by ttm_bo_mmap_obj() */
+
  return 0;
+
+out_unref:
+    ttm_bo_put(bo);
+    return ret;
  }
    /* struct vmw_validation_mem callback */




___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] efifb: Fix runtime pm calls for non PCI efifb device

2021-04-20 Thread Sudeep Holla
Gentle Ping! There is a boot failure because of this issue with linux-next
on a few arm platforms with non-PCIe efifb. Please review and get the fix
merged ASAP so the testing on these platforms can continue with linux-next.

On Thu, Apr 15, 2021 at 11:22:24AM +0100, Sudeep Holla wrote:
> Commit a6c0fd3d5a8b ("efifb: Ensure graphics device for efifb stays at PCI 
> D0")
> added runtime pm calls to probe and remove routines to ensure the PCI
> device for efifb stays in D0 state. However not every efifb is based on a
> PCI device and efifb_pci_dev can be NULL if that is the case.
>
> In such cases, we will get a boot splat like below due to NULL dereference:
> -->8
>  Console: switching to colour frame buffer device 240x67
>  fb0: EFI VGA frame buffer device
>  Unable to handle kernel NULL pointer dereference at virtual address 
> 0270
>  Mem abort info:
>ESR = 0x9604
>EC = 0x25: DABT (current EL), IL = 32 bits
>SET = 0, FnV = 0
>EA = 0, S1PTW = 0
>  Data abort info:
>ISV = 0, ISS = 0x0004
>CM = 0, WnR = 0
>  [0270] user address but active_mm is swapper
>  Internal error: Oops: 9604 [#1] PREEMPT SMP
>  Modules linked in:
>  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc7-next-20210413 #1
>  Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development 
> Platform
>  pstate: 6005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
>  pc : pm_runtime_drop_link+0x12c/0x338
>  lr : efifb_probe+0x7bc/0x7f0
>  Call trace:
>   pm_runtime_drop_link+0x12c/0x338
>   efifb_probe+0x7bc/0x7f0
>   platform_probe+0x68/0xd8
>   really_probe+0xe4/0x3a8
>   driver_probe_device+0x64/0xc8
>   device_driver_attach+0x74/0x80
>   __driver_attach+0x64/0xf0
>   bus_for_each_dev+0x70/0xc0
>   driver_attach+0x24/0x30
>   bus_add_driver+0x150/0x1f8
>   driver_register+0x64/0x120
>   __platform_driver_register+0x28/0x38
>   efifb_driver_init+0x1c/0x28
>   do_one_initcall+0x48/0x2b0
>   kernel_init_freeable+0x1e8/0x258
>   kernel_init+0x14/0x118
>   ret_from_fork+0x10/0x30
>  Code: 88027c01 35a2 17fff706 f9800051 (885f7c40)
>  ---[ end trace 17d8da630bf8ff77 ]---
>  Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
> -->8
>
> Fix the issue by checking for non-NULL efifb_pci_dev before dereferencing
> for runtime pm calls in probe and remove routines.
>
> Fixes: a6c0fd3d5a8b ("efifb: Ensure graphics device for efifb stays at PCI 
> D0")
> Cc: Kai-Heng Feng 
> Cc: Alex Deucher 
> Cc: Thomas Zimmermann 
> Cc: Peter Jones 
> Signed-off-by: Sudeep Holla 
> ---
>  drivers/video/fbdev/efifb.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
> index f58a545b3bf3..8ea8f079cde2 100644
> --- a/drivers/video/fbdev/efifb.c
> +++ b/drivers/video/fbdev/efifb.c
> @@ -575,7 +575,8 @@ static int efifb_probe(struct platform_device *dev)
>   goto err_fb_dealoc;
>   }
>   fb_info(info, "%s frame buffer device\n", info->fix.id);
> - pm_runtime_get_sync(&efifb_pci_dev->dev);
> + if (efifb_pci_dev)
> + pm_runtime_get_sync(&efifb_pci_dev->dev);
>   return 0;
>
>  err_fb_dealoc:
> @@ -602,7 +603,8 @@ static int efifb_remove(struct platform_device *pdev)
>   unregister_framebuffer(info);
>   sysfs_remove_groups(&pdev->dev.kobj, efifb_groups);
>   framebuffer_release(info);
> - pm_runtime_put(&efifb_pci_dev->dev);
> + if (efifb_pci_dev)
> + pm_runtime_put(&efifb_pci_dev->dev);
>
>   return 0;
>  }
> --
> 2.25.1
>

--
Regards,
Sudeep
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Christian König

On 20.04.21 at 09:46, Michal Hocko wrote:

On Tue 20-04-21 09:32:14, Christian König wrote:

On 20.04.21 at 09:04, Michal Hocko wrote:

On Mon 19-04-21 18:37:13, Christian König wrote:

On 19.04.21 at 18:11, Michal Hocko wrote:

[...]

What I am trying to bring up with NUMA side is that the same problem can
happen on per-node basis. Let's say that some user consumes unexpectedly
large amount of dma-buf on a certain node. This can lead to observable
performance impact on anybody on allocating from that node and even
worse cause an OOM for node bound consumers. How do I find out that it
was dma-buf that has caused the problem?

Yes, that is the direction my thinking goes as well, but also even further.

See DMA-buf is also used to share device local memory between processes as
well. In other words VRAM on graphics hardware.

On my test system here I have 32GB of system memory and 16GB of VRAM. I can
use DMA-buf to allocate that 16GB of VRAM quite easily which then shows up
under /proc/meminfo as used memory.

This is something that would be really interesting in the changelog. I
mean the expected and extreme memory consumption of this memory. Ideally
with some hints on what to do when the number is really high (e.g. mount
debugfs and have a look here and there to check whether this is just too
many users or an unexpected pattern to be reported).


But that isn't really system memory at all, it's just allocated device
memory.

OK, that was not really clear to me. So this is not really accounted to
MemTotal?


It depends. In a lot of embedded systems you only have system memory and 
in this case that value here is indeed really useful.



If that is really the case then reporting it into the oom
report is completely pointless and I am not even sure /proc/meminfo is
the right interface either. It would just add more confusion I am
afraid.


I kind of agree. As I said a DMA-buf could be backed by system memory or 
device memory.


In the case when it is backed by system memory it does make sense to 
report this in an OOM dump.


But only the exporting driver knows what the DMA-buf handle represents, 
the framework just provides the common ground for inter driver 
communication.



See where I am heading?

Yeah, totally. Thanks for pointing this out.

Suggestions how to handle that?

As I've pointed out in previous reply we do have an API to account per
node memory but now that you have brought up that this is not something
we account as a regular memory then this doesn't really fit into that
model. But maybe I am just confused.


Well, does that API also have a counter for memory used by device drivers?

If yes then the device driver who exported the DMA-buf should probably 
use that API. If no we might want to create one.


I mean the author of this patch seems to have a use case where this is
needed and I also see that we have some holes in how we account memory.


Christian.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] drm/panel: lvds: Drop unnecessary NULL pointer checks for lvds->enable_gpio

2021-04-20 Thread Liu Ying
gpiod_set_value_cansleep() does a NULL pointer check on the passed-in
gpio descriptor, so it's unnecessary to do that check before calling
the function. This patch drops those checks from this panel driver.

Cc: Laurent Pinchart 
Cc: Thierry Reding 
Cc: Sam Ravnborg 
Cc: David Airlie 
Cc: Daniel Vetter 
Signed-off-by: Liu Ying 
---
 drivers/gpu/drm/panel/panel-lvds.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panel/panel-lvds.c 
b/drivers/gpu/drm/panel/panel-lvds.c
index 59a8d99..19f11fa 100644
--- a/drivers/gpu/drm/panel/panel-lvds.c
+++ b/drivers/gpu/drm/panel/panel-lvds.c
@@ -50,8 +50,7 @@ static int panel_lvds_unprepare(struct drm_panel *panel)
 {
struct panel_lvds *lvds = to_panel_lvds(panel);
 
-   if (lvds->enable_gpio)
-   gpiod_set_value_cansleep(lvds->enable_gpio, 0);
+   gpiod_set_value_cansleep(lvds->enable_gpio, 0);
 
if (lvds->supply)
regulator_disable(lvds->supply);
@@ -74,8 +73,7 @@ static int panel_lvds_prepare(struct drm_panel *panel)
}
}
 
-   if (lvds->enable_gpio)
-   gpiod_set_value_cansleep(lvds->enable_gpio, 1);
+   gpiod_set_value_cansleep(lvds->enable_gpio, 1);
 
return 0;
 }
-- 
2.7.4
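
For reference, a rough sketch of the gpiolib behaviour the commit message
relies on (simplified, not the verbatim kernel code): a NULL descriptor is
treated as an absent optional GPIO and the call simply returns.

	/* Sketch of gpiod_set_value_cansleep(), heavily simplified. */
	void gpiod_set_value_cansleep(struct gpio_desc *desc, int value)
	{
		might_sleep();
		if (!desc)	/* optional GPIO not present: nothing to do */
			return;
		/* ... drive the line; omitted in this sketch ... */
	}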

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] efifb: Fix runtime pm calls for non PCI efifb device

2021-04-20 Thread Kai-Heng Feng
Hi Sudeep,

On Tue, Apr 20, 2021 at 3:53 PM Sudeep Holla  wrote:
>
> Gentle Ping! There is a boot failure because of this issue with linux-next
> on a few arm platforms with non-PCIe efifb. Please review and get the fix
> merged ASAP so the testing on these platforms can continue with linux-next.

It was merged in drm-tip as d510c88cfbb2 ("efifb: Check efifb_pci_dev
before using it").

Kai-Heng

>
> On Thu, Apr 15, 2021 at 11:22:24AM +0100, Sudeep Holla wrote:
> > Commit a6c0fd3d5a8b ("efifb: Ensure graphics device for efifb stays at PCI 
> > D0")
> > added runtime pm calls to probe and remove routines to ensure the PCI
> > device for efifb stays in D0 state. However not every efifb is based on a
> > PCI device and efifb_pci_dev can be NULL if that is the case.
> >
> > In such cases, we will get a boot splat like below due to NULL dereference:
> > -->8
> >  Console: switching to colour frame buffer device 240x67
> >  fb0: EFI VGA frame buffer device
> >  Unable to handle kernel NULL pointer dereference at virtual address 
> > 0270
> >  Mem abort info:
> >ESR = 0x9604
> >EC = 0x25: DABT (current EL), IL = 32 bits
> >SET = 0, FnV = 0
> >EA = 0, S1PTW = 0
> >  Data abort info:
> >ISV = 0, ISS = 0x0004
> >CM = 0, WnR = 0
> >  [0270] user address but active_mm is swapper
> >  Internal error: Oops: 9604 [#1] PREEMPT SMP
> >  Modules linked in:
> >  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc7-next-20210413 #1
> >  Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development 
> > Platform
> >  pstate: 6005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> >  pc : pm_runtime_drop_link+0x12c/0x338
> >  lr : efifb_probe+0x7bc/0x7f0
> >  Call trace:
> >   pm_runtime_drop_link+0x12c/0x338
> >   efifb_probe+0x7bc/0x7f0
> >   platform_probe+0x68/0xd8
> >   really_probe+0xe4/0x3a8
> >   driver_probe_device+0x64/0xc8
> >   device_driver_attach+0x74/0x80
> >   __driver_attach+0x64/0xf0
> >   bus_for_each_dev+0x70/0xc0
> >   driver_attach+0x24/0x30
> >   bus_add_driver+0x150/0x1f8
> >   driver_register+0x64/0x120
> >   __platform_driver_register+0x28/0x38
> >   efifb_driver_init+0x1c/0x28
> >   do_one_initcall+0x48/0x2b0
> >   kernel_init_freeable+0x1e8/0x258
> >   kernel_init+0x14/0x118
> >   ret_from_fork+0x10/0x30
> >  Code: 88027c01 35a2 17fff706 f9800051 (885f7c40)
> >  ---[ end trace 17d8da630bf8ff77 ]---
> >  Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
> > -->8
> >
> > Fix the issue by checking for non-NULL efifb_pci_dev before dereferencing
> > for runtime pm calls in probe and remove routines.
> >
> > Fixes: a6c0fd3d5a8b ("efifb: Ensure graphics device for efifb stays at PCI 
> > D0")
> > Cc: Kai-Heng Feng 
> > Cc: Alex Deucher 
> > Cc: Thomas Zimmermann 
> > Cc: Peter Jones 
> > Signed-off-by: Sudeep Holla 
> > ---
> >  drivers/video/fbdev/efifb.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
> > index f58a545b3bf3..8ea8f079cde2 100644
> > --- a/drivers/video/fbdev/efifb.c
> > +++ b/drivers/video/fbdev/efifb.c
> > @@ -575,7 +575,8 @@ static int efifb_probe(struct platform_device *dev)
> >   goto err_fb_dealoc;
> >   }
> >   fb_info(info, "%s frame buffer device\n", info->fix.id);
> > - pm_runtime_get_sync(&efifb_pci_dev->dev);
> > + if (efifb_pci_dev)
> > + pm_runtime_get_sync(&efifb_pci_dev->dev);
> >   return 0;
> >
> >  err_fb_dealoc:
> > @@ -602,7 +603,8 @@ static int efifb_remove(struct platform_device *pdev)
> >   unregister_framebuffer(info);
> >   sysfs_remove_groups(&pdev->dev.kobj, efifb_groups);
> >   framebuffer_release(info);
> > - pm_runtime_put(&efifb_pci_dev->dev);
> > + if (efifb_pci_dev)
> > + pm_runtime_put(&efifb_pci_dev->dev);
> >
> >   return 0;
> >  }
> > --
> > 2.25.1
> >
>
> --
> Regards,
> Sudeep
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [v1 0/3] drm: Add support for backlight control of eDP panel on ti-sn65dsi86 bridge

2021-04-20 Thread Jani Nikula


Cc: Lyude and drm-misc maintainers

On Wed, 14 Apr 2021, Rajeev Nandan  wrote:
> The backlight level of an eDP panel can be controlled through the AUX
> channel using DPCD registers of the panel.
>
> The capability for the Source device to adjust backlight characteristics
> within the panel, using the Sink device DPCD registers is indicated by
> the TCON_BACKLIGHT_ADJUSTMENT_CAPABLE bit in the EDP_GENERAL_CAPABILITY_1
> register (DPCD Address 701h, bit0). In this configuration, the eDP TCON
> receives the backlight level information from the host, through the AUX
> channel.

i915 has had this capability for some years now, and work is in progress
to extract the DP AUX backlight code to drm core as helpers [1]. There's
much more to it than what's proposed here. Adding incompatible DP AUX
code at this point would be a pretty bad outcome.

For example, we can't tie backlight device register to DP AUX backlight,
because there are modes where *both* the eDP PWM pin based backlight
control and DP AUX backlight control are used *simultaneously*. The
backlight device register needs to be in code that is aware of both.

Granted, it was a mistake way back when to add this in i915 only, and it
should've been lifted to drm much earlier. It would've been done by
Lyude by now, but people were not happy about not using drm device based
logging. And that has unfortunately led to a pretty massive prep series
[2].

Please look into the code added to drm helpers in [1], and see how that
would work for you.


BR,
Jani.


[1] http://lore.kernel.org/r/20210205234515.1216538-1-ly...@redhat.com
[2] http://lore.kernel.org/r/20210419225523.184856-1-ly...@redhat.com


>
> The changes in this patch series do the following:
> - Add drm_dp_aux_backlight_ APIs to support backlight control using DPCD
>   registers on the DisplayPort AUX channel.
>   The current version only supports backlight brightness control by the
>   EDP_BACKLIGHT_BRIGHTNESS_MSB/LSB registers (DPCD Addresses 722h-723h).
> - Add support for backlight control of the eDP panel connected to the
>   ti-sn65dsi86 bridge.
>
> Rajeev Nandan (3):
>   drm/dp: Add DisplayPort aux backlight control support
>   dt-bindings: drm/bridge: ti-sn65dsi86: Document use-aux-backlight
>   drm/bridge: ti-sn65dsi86: Add DisplayPort aux backlight support
>
>  .../bindings/display/bridge/ti,sn65dsi86.yaml  |   8 +
>  drivers/gpu/drm/Kconfig|   8 +
>  drivers/gpu/drm/Makefile   |   1 +
>  drivers/gpu/drm/bridge/Kconfig |   1 +
>  drivers/gpu/drm/bridge/ti-sn65dsi86.c  |  26 +++
>  drivers/gpu/drm/drm_dp_aux_backlight.c | 191 
> +
>  include/drm/drm_dp_aux_backlight.h |  29 
>  7 files changed, 264 insertions(+)
>  create mode 100644 drivers/gpu/drm/drm_dp_aux_backlight.c
>  create mode 100644 include/drm/drm_dp_aux_backlight.h

-- 
Jani Nikula, Intel Open Source Graphics Center
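
For illustration only, a minimal sketch of the DPCD accesses described in the
quoted cover letter, using the existing drm_dp_dpcd_readb()/drm_dp_dpcd_writeb()
helpers and the standard DP_EDP_* register defines. The function below is
hypothetical and not part of either series.

	/* Hypothetical helper; assumes an initialised struct drm_dp_aux. */
	static int example_dp_aux_backlight_set(struct drm_dp_aux *aux, u16 level)
	{
		u8 cap;
		int ret;

		ret = drm_dp_dpcd_readb(aux, DP_EDP_GENERAL_CAP_1, &cap);
		if (ret < 0)
			return ret;
		if (!(cap & DP_EDP_TCON_BACKLIGHT_ADJUSTMENT_CAPABLE))
			return -ENODEV;	/* panel cannot adjust backlight via AUX */

		ret = drm_dp_dpcd_writeb(aux, DP_EDP_BACKLIGHT_BRIGHTNESS_MSB, level >> 8);
		if (ret < 0)
			return ret;
		ret = drm_dp_dpcd_writeb(aux, DP_EDP_BACKLIGHT_BRIGHTNESS_LSB, level & 0xff);
		return ret < 0 ? ret : 0;
	}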
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 0/2 V6]Add dma-buf counter

2021-04-20 Thread Peter Enderborg
The dma-buf counter is a metric for mapped memory used by its clients.
It is a shared buffer that is typically used for interprocess communication
or process-to-hardware communication. In Android we used to have ION, but
it is now replaced with dma-buf. ION had some overview metrics that were similar.



V1
initial version. Add dma-buf counter

V2
Fix build dependency error suggested by Matthew Wilcox
Extend commit message as suggested by König

V3
Change variable and function names.

V4
Fix function name in code doc
Reported-by: kernel test robot 

V5
Removed EXPORT_SYMBOL_GPL suggested by Muchun Song

V6
Made it a patch set, adding an additional patch for
printing the dma-buf counter in show_mem.
Suggested by Michal Hocko.






___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 1/2 V6] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Peter Enderborg
This adds a counter for total used dma-buf memory. Details
can be found in debugfs, however that is not for everyone
and not always available. dma-bufs are indirectly allocated by
userspace, so with this value we can monitor and detect
userspace applications that have problems. Typical usage
is to see that the system does not do too many pre-allocations,
and to find memory leaks in userspace, such as clients not
closing down their reference to the buffer.

Signed-off-by: Peter Enderborg 
---
 Documentation/filesystems/proc.rst |  5 +
 drivers/dma-buf/dma-buf.c  | 12 
 fs/proc/meminfo.c  |  5 -
 include/linux/dma-buf.h|  1 +
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.rst 
b/Documentation/filesystems/proc.rst
index 48fbfc336ebf..a85df9490810 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -973,6 +973,7 @@ varies by architecture and compile options.  The following 
is from a
 AnonHugePages:   49152 kB
 ShmemHugePages:  0 kB
 ShmemPmdMapped:  0 kB
+DmaBufTotal  0 kB
 
 MemTotal
   Total usable RAM (i.e. physical RAM minus a few reserved
@@ -1102,6 +1103,10 @@ VmallocChunk
 Percpu
   Memory allocated to the percpu allocator used to back percpu
   allocations. This stat excludes the cost of metadata.
+DmaBufTotal
+  Memory allocated by the dma-buf driver. What memory is used
+  is arbitrary. (It might be kernel, local or even hardware vram).
+  Details on buffers are found in debugfs if enabled.
 
 vmallocinfo
 ~~~
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f264b70c383e..4dc37cd4293b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,7 @@ struct dma_buf_list {
 };
 
 static struct dma_buf_list db_list;
+static atomic_long_t dma_buf_global_allocated;
 
 static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
 {
@@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
dma_resv_fini(dmabuf->resv);
 
+   atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);
module_put(dmabuf->owner);
kfree(dmabuf->name);
kfree(dmabuf);
@@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
mutex_lock(&db_list.lock);
list_add(&dmabuf->list_node, &db_list.head);
mutex_unlock(&db_list.lock);
+   atomic_long_add(dmabuf->size, &dma_buf_global_allocated);
 
return dmabuf;
 
@@ -1346,6 +1349,15 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
dma_buf_map *map)
 }
 EXPORT_SYMBOL_GPL(dma_buf_vunmap);
 
+/**
+ * dma_buf_allocated_pages - Return the used nr of pages
+ * allocated for dma-buf
+ */
+long dma_buf_allocated_pages(void)
+{
+   return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..ccc7c40c8db7 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -16,6 +16,7 @@
 #ifdef CONFIG_CMA
 #include 
 #endif
+#include 
 #include 
 #include "internal.h"
 
@@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
show_val_kb(m, "CmaFree:",
global_zone_page_state(NR_FREE_CMA_PAGES));
 #endif
-
+#ifdef CONFIG_DMA_SHARED_BUFFER
+   show_val_kb(m, "DmaBufTotal:", dma_buf_allocated_pages());
+#endif
hugetlb_report_meminfo(m);
 
arch_report_meminfo(m);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index efdc56b9d95f..5b05816bd2cd 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -507,4 +507,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 unsigned long);
 int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+long dma_buf_allocated_pages(void);
 #endif /* __DMA_BUF_H__ */
-- 
2.17.1
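
Not part of the patch: a small userspace sketch, assuming the patch above is
applied, of how a monitoring tool could poll the new counter from /proc/meminfo.

#include <stdio.h>

/* Returns the DmaBufTotal value in kB, or -1 if the line is not present. */
static long dmabuf_total_kb(void)
{
	char line[128];
	long kb = -1;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "DmaBufTotal: %ld kB", &kb) == 1)
			break;
	}
	fclose(f);
	return kb;
}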

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 2/2 V6] lib/show_mem.c: Add dma-buf counter to show_mem dump.

2021-04-20 Thread Peter Enderborg
On systems where dma-buf is used there can be many clients that add up
to a lot of memory. This can be relevant for OOM handling when
running out of memory, or for how the system handles this memory. It may
be possible to free it with a kill.

Suggested-by: Michal Hocko 
Signed-off-by: Peter Enderborg 
---
 lib/show_mem.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/show_mem.c b/lib/show_mem.c
index 1c26c14ffbb9..ec4748c64353 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 
 void show_mem(unsigned int filter, nodemask_t *nodemask)
 {
@@ -41,4 +42,8 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
 #ifdef CONFIG_MEMORY_FAILURE
printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
 #endif
+#ifdef CONFIG_DMA_SHARED_BUFFER
+   printk("%lu pages dma-buf\n", dma_buf_allocated_pages());
+#endif
+
 }
-- 
2.17.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] efifb: Fix runtime pm calls for non PCI efifb device

2021-04-20 Thread Sudeep Holla
On Tue, Apr 20, 2021 at 04:12:26PM +0800, Kai-Heng Feng wrote:
> Hi Sudeep,
>
> On Tue, Apr 20, 2021 at 3:53 PM Sudeep Holla  wrote:
> >
> > Gentle Ping! There is a boot failure because of this issue with linux-next
> > on a few arm platforms with non-PCIe efifb. Please review and get the fix
> > merged ASAP so the testing on these platforms can continue with linux-next.
>
> It was merged in drm-tip as d510c88cfbb2 ("efifb: Check efifb_pci_dev
> before using it").
>

Ah OK, thanks! But I don't think it has appeared on -next yet.

--
Regards,
Sudeep
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Daniel Vetter
On Sat, Apr 17, 2021 at 01:54:08PM +0200, Christian König wrote:
> On 17.04.21 at 13:20, peter.enderb...@sony.com wrote:
> > On 4/17/21 12:59 PM, Christian König wrote:
> > > On 17.04.21 at 12:40, Peter Enderborg wrote:
> > > > This adds a total used dma-buf memory. Details
> > > > can be found in debugfs, however it is not for everyone
> > > > and not always available. dma-buf are indirect allocated by
> > > > userspace. So with this value we can monitor and detect
> > > > userspace applications that have problems.
> > > > 
> > > > Signed-off-by: Peter Enderborg 
> > > Reviewed-by: Christian König 
> > > 
> > > How do you want to upstream this?
> > I don't understand that question. The patch applies on Torvalds 5.12-rc7,
> > but I guess 5.13 is what we work on right now.
> 
> Yeah, but how do you want to get it into Linus tree?
> 
> I can push it together with other DMA-buf patches through drm-misc-next and
> then Dave will send it to Linus for inclusion in 5.13.

Small correction, we've already frozen for the merge window so this will
land in 5.14.
-Daniel

> 
> But could be that you are pushing multiple changes towards Linus through
> some other branch. In this case I'm fine if you pick that way instead if you
> want to keep your patches together for some reason.
> 
> Christian.
> 
> > 
> > > > ---
> > > >    drivers/dma-buf/dma-buf.c | 13 +
> > > >    fs/proc/meminfo.c |  5 -
> > > >    include/linux/dma-buf.h   |  1 +
> > > >    3 files changed, 18 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > > index f264b70c383e..197e5c45dd26 100644
> > > > --- a/drivers/dma-buf/dma-buf.c
> > > > +++ b/drivers/dma-buf/dma-buf.c
> > > > @@ -37,6 +37,7 @@ struct dma_buf_list {
> > > >    };
> > > >      static struct dma_buf_list db_list;
> > > > +static atomic_long_t dma_buf_global_allocated;
> > > >      static char *dmabuffs_dname(struct dentry *dentry, char *buffer, 
> > > > int buflen)
> > > >    {
> > > > @@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
> > > >    if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
> > > >    dma_resv_fini(dmabuf->resv);
> > > >    +    atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);
> > > >    module_put(dmabuf->owner);
> > > >    kfree(dmabuf->name);
> > > >    kfree(dmabuf);
> > > > @@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
> > > > dma_buf_export_info *exp_info)
> > > >    mutex_lock(&db_list.lock);
> > > >    list_add(&dmabuf->list_node, &db_list.head);
> > > >    mutex_unlock(&db_list.lock);
> > > > +    atomic_long_add(dmabuf->size, &dma_buf_global_allocated);
> > > >      return dmabuf;
> > > >    @@ -1346,6 +1349,16 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, 
> > > > struct dma_buf_map *map)
> > > >    }
> > > >    EXPORT_SYMBOL_GPL(dma_buf_vunmap);
> > > >    +/**
> > > > + * dma_buf_allocated_pages - Return the used nr of pages
> > > > + * allocated for dma-buf
> > > > + */
> > > > +long dma_buf_allocated_pages(void)
> > > > +{
> > > > +    return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(dma_buf_allocated_pages);
> > > > +
> > > >    #ifdef CONFIG_DEBUG_FS
> > > >    static int dma_buf_debug_show(struct seq_file *s, void *unused)
> > > >    {
> > > > diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> > > > index 6fa761c9cc78..ccc7c40c8db7 100644
> > > > --- a/fs/proc/meminfo.c
> > > > +++ b/fs/proc/meminfo.c
> > > > @@ -16,6 +16,7 @@
> > > >    #ifdef CONFIG_CMA
> > > >    #include 
> > > >    #endif
> > > > +#include 
> > > >    #include 
> > > >    #include "internal.h"
> > > >    @@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, 
> > > > void *v)
> > > >    show_val_kb(m, "CmaFree:    ",
> > > >    global_zone_page_state(NR_FREE_CMA_PAGES));
> > > >    #endif
> > > > -
> > > > +#ifdef CONFIG_DMA_SHARED_BUFFER
> > > > +    show_val_kb(m, "DmaBufTotal:    ", dma_buf_allocated_pages());
> > > > +#endif
> > > >    hugetlb_report_meminfo(m);
> > > >      arch_report_meminfo(m);
> > > > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > > > index efdc56b9d95f..5b05816bd2cd 100644
> > > > --- a/include/linux/dma-buf.h
> > > > +++ b/include/linux/dma-buf.h
> > > > @@ -507,4 +507,5 @@ int dma_buf_mmap(struct dma_buf *, struct 
> > > > vm_area_struct *,
> > > >     unsigned long);
> > > >    int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > > >    void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > > > +long dma_buf_allocated_pages(void);
> > > >    #endif /* __DMA_BUF_H__ */
> 
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH 03/12] drm/exynos: Don't set allow_fb_modifiers explicitly

2021-04-20 Thread Daniel Vetter
On Tue, Apr 20, 2021 at 03:31:27PM +0900, Inki Dae wrote:
> 
> 
> On 21. 4. 13. at 6:48 PM, Daniel Vetter wrote:
> > Since
> > 
> > commit 890880ddfdbe256083170866e49c87618b706ac7
> > Author: Paul Kocialkowski 
> > Date:   Fri Jan 4 09:56:10 2019 +0100
> > 
> > drm: Auto-set allow_fb_modifiers when given modifiers at plane init
> > 
> > this is done automatically as part of plane init, if drivers set the
> > modifier list correctly. Which is the case here.
> > 
> > Signed-off-by: Daniel Vetter 
> 
> Acked-by: Inki Dae 

Thanks for taking a look, pushed to drm-misc-next.
-Daniel

> 
> Thanks,
> Inki Dae
> 
> > Cc: Inki Dae 
> > Cc: Joonyoung Shim 
> > Cc: Seung-Woo Kim 
> > Cc: Kyungmin Park 
> > Cc: Krzysztof Kozlowski 
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > ---
> >  drivers/gpu/drm/exynos/exynos_drm_fb.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/exynos/exynos_drm_fb.c 
> > b/drivers/gpu/drm/exynos/exynos_drm_fb.c
> > index 64370b634cca..79fa3649185c 100644
> > --- a/drivers/gpu/drm/exynos/exynos_drm_fb.c
> > +++ b/drivers/gpu/drm/exynos/exynos_drm_fb.c
> > @@ -177,7 +177,5 @@ void exynos_drm_mode_config_init(struct drm_device *dev)
> > dev->mode_config.funcs = &exynos_drm_mode_config_funcs;
> > dev->mode_config.helper_private = &exynos_drm_mode_config_helpers;
> >  
> > -   dev->mode_config.allow_fb_modifiers = true;
> > -
> > dev->mode_config.normalize_zpos = true;
> >  }
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
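
For context, a rough illustration (not the exynos code; names here are made up)
of the auto-set behaviour the patch relies on: passing an explicit
format-modifier list to drm_universal_plane_init() lets the core flip
allow_fb_modifiers on its own since commit 890880ddfdbe.

	static const u64 example_modifiers[] = {
		DRM_FORMAT_MOD_LINEAR,
		DRM_FORMAT_MOD_INVALID,		/* list terminator */
	};

	/* Hypothetical plane init; funcs/formats are placeholders. */
	ret = drm_universal_plane_init(dev, plane, possible_crtcs,
				       &example_plane_funcs,
				       example_formats, ARRAY_SIZE(example_formats),
				       example_modifiers,
				       DRM_PLANE_TYPE_PRIMARY, NULL);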
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 0/9] drm: Support simple-framebuffer devices and firmware fbs

2021-04-20 Thread Daniel Vetter
On Mon, Apr 19, 2021 at 10:00:56AM +0200, Geert Uytterhoeven wrote:
> Hi Thomas,
> 
> On Fri, Apr 16, 2021 at 11:00 AM Thomas Zimmermann  
> wrote:
> > This patchset adds support for simple-framebuffer platform devices and
> > a handover mechanism for native drivers to take-over control of the
> > hardware.
> >
> > The new driver, called simpledrm, binds to a simple-framebuffer platform
> > device. The kernel's boot code creates such devices for firmware-provided
> > framebuffers, such as EFI-GOP or VESA. Typically the BIOS, UEFI or boot
> > loader sets up the framebuffers. Description via device tree is also an
> > option.
> 
> I guess this can be used as a replacement for offb, too...
> 
> > Patches 4 to 8 add the simpledrm driver. It's build on simple DRM helpers
> > and SHMEM. It supports 16-bit, 24-bit and 32-bit RGB framebuffers. During
> 
>  if support for 8-bit frame buffers would be added?

Is that 8-bit greyscale or 8-bit indexed with 256 entry palette? Former
shouldn't be a big thing, but the latter is only really supported by the
overall drm ecosystem in theory. Most userspace assumes that xrgb
works, and we keep that illusion up by emulating it in kernel for hw which
just doesn't support it. But reformatting xrgb to c8 is tricky at
best. The uapis are all there for setting the palette, and C8 is a defined
format even with atomic kms interface, but really there's not much
userspace for it. In other words, it would work as well as current offb
would, but that's at least that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
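
For illustration, a minimal userspace sketch of the palette uapi referred to
above, using libdrm's legacy gamma call; fd and crtc_id setup are assumed and
this is not code from the thread.

#include <stdint.h>
#include <xf86drmMode.h>

/* Upload a 256-entry palette for a C8 framebuffer (greyscale ramp here). */
static int upload_c8_palette(int fd, uint32_t crtc_id)
{
	uint16_t r[256], g[256], b[256];
	int i;

	for (i = 0; i < 256; i++)
		r[i] = g[i] = b[i] = i << 8;

	return drmModeCrtcSetGamma(fd, crtc_id, 256, r, g, b);
}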
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/connector: demote connector force-probes for non-master clients

2021-04-20 Thread Simon Ser
Ping Daniel Vetter
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v3 5/7] drm/vmwgfx: Inline ttm_bo_mmap() into vmwgfx driver

2021-04-20 Thread Daniel Vetter
On Tue, Apr 20, 2021 at 09:51:27AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> On 16.04.21 at 15:51, Christian König wrote:
> > On 16.04.21 at 15:46, Christian König wrote:
> > > On 16.04.21 at 15:31, Thomas Zimmermann wrote:
> > > > The vmwgfx driver is the only remaining user of ttm_bo_mmap(). Inline
> > > > the code. The internal helper ttm_bo_vm_lookup() is now also part of
> > > > vmwgfx as vmw_bo_vm_lookup().
> > > > 
> > > > v2:
> > > > * replace pr_err() with drm_err() (Zack)
> > > > 
> > > > Signed-off-by: Thomas Zimmermann 
> > > > Reviewed-by: Zack Rusin 
> > > > ---
> > > >   drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 56 ++--
> > > >   1 file changed, 53 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
> > > > b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
> > > > index cb9975889e2f..c8b6543b4e39 100644
> > > > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
> > > > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
> > > > @@ -27,6 +27,32 @@
> > > >     #include "vmwgfx_drv.h"
> > > >   +static struct ttm_buffer_object *vmw_bo_vm_lookup(struct
> > > > ttm_device *bdev,
> > > > +  unsigned long offset,
> > > > +  unsigned long pages)
> > > > +{
> > > > +    struct vmw_private *dev_priv = container_of(bdev, struct
> > > > vmw_private, bdev);
> > > > +    struct drm_device *drm = &dev_priv->drm;
> > > > +    struct drm_vma_offset_node *node;
> > > > +    struct ttm_buffer_object *bo = NULL;
> > > > +
> > > > +    drm_vma_offset_lock_lookup(bdev->vma_manager);
> > > > +
> > > > +    node = drm_vma_offset_lookup_locked(bdev->vma_manager,
> > > > offset, pages);
> > > > +    if (likely(node)) {
> > > > +    bo = container_of(node, struct ttm_buffer_object,
> > > > +  base.vma_node);
> > > > +    bo = ttm_bo_get_unless_zero(bo);
> > > > +    }
> > > > +
> > > > +    drm_vma_offset_unlock_lookup(bdev->vma_manager);
> > > > +
> > > > +    if (!bo)
> > > > +    drm_err(drm, "Could not find buffer object to map\n");
> > > > +
> > > > +    return bo;
> > > > +}
> > > > +
> > > >   int vmw_mmap(struct file *filp, struct vm_area_struct *vma)
> > > >   {
> > > >   static const struct vm_operations_struct vmw_vm_ops = {
> > > > @@ -41,10 +67,28 @@ int vmw_mmap(struct file *filp, struct
> > > > vm_area_struct *vma)
> > > >   };
> > > >   struct drm_file *file_priv = filp->private_data;
> > > >   struct vmw_private *dev_priv = vmw_priv(file_priv->minor->dev);
> > > > -    int ret = ttm_bo_mmap(filp, vma, &dev_priv->bdev);
> > > > +    struct ttm_device *bdev = &dev_priv->bdev;
> > > > +    struct ttm_buffer_object *bo;
> > > > +    int ret;
> > > > +
> > > > +    if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
> > > > +    return -EINVAL;
> > > > +
> > > > +    bo = vmw_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
> > > > +    if (unlikely(!bo))
> > > > +    return -EINVAL;
> > > >   -    if (ret)
> > > > -    return ret;
> > > > +    if (unlikely(!bo->bdev->funcs->verify_access)) {
> > > > +    ret = -EPERM;
> > > > +    goto out_unref;
> > > > +    }
> > > > +    ret = bo->bdev->funcs->verify_access(bo, filp);
> > > 
> > > Is there any reason we can't call vmw_verify_access() directly here?
> > > 
> > > Would allow us to completely nuke the verify_access callback as well
> > > as far as I can see.
> > 
> > Forget what I said, couldn't see the next patch in my mailbox at time of
> > writing.
> > 
> > Whole series is Reviewed-by: Christian König 
> 
> Thanks a lot. If I'm not mistaken, the patches at [1] need to go in first.
> So it could take a bit until this lands.
> 
> Otherwise, this series could go through the same tree as [1] if nouveau and
> vmwgfx devs don't mind.

I would land it all through drm-misc-next. Maybe check with Alex on irc
for an ack for merging that way, but I don't think this will cause issues
against the amdgpu tree. Lots of ttm cleanup has landed this way already in
the past few months. Otherwise you could create a small topic branch with
these patches here and send that to Alex, and he can sort out the
interaction with Felix' series.
-Daniel


> 
> Best regards
> Thomas
> 
> [1] https://patchwork.freedesktop.org/series/88822/
> 
> > 
> > Thanks for the nice cleanup,
> > Christian.
> > 
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > +    if (unlikely(ret != 0))
> > > > +    goto out_unref;
> > > > +
> > > > +    ret = ttm_bo_mmap_obj(vma, bo);
> > > > +    if (unlikely(ret != 0))
> > > > +    goto out_unref;
> > > >     vma->vm_ops = &vmw_vm_ops;
> > > >   @@ -52,7 +96,13 @@ int vmw_mmap(struct file *filp, struct
> > > > vm_area_struct *vma)
> > > >   if (!is_cow_mapping(vma->vm_flags))
> > > >   vma->vm_flags = (vma->vm_flags & ~VM_MIXEDMAP) | VM_PFNMAP;
> > > >   +    ttm_bo_put(bo); /* release extra ref taken by ttm_bo_mmap_obj() */
> > > > +
> > > >    

Re: [PATCH 1/2] drm/doc: document drm_mode_get_plane

2021-04-20 Thread Simon Ser
Hi Leandro,

Any chance you could re-spin at least the first patch?

Thanks,

Simon
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 27/40] drm/ttm/ttm_device: Demote kernel-doc abuses

2021-04-20 Thread Daniel Vetter
On Fri, Apr 16, 2021 at 05:32:52PM +0200, Christian König wrote:
> On 16.04.21 at 16:37, Lee Jones wrote:
> > Fixes the following W=1 kernel build warning(s):
> > 
> >   drivers/gpu/drm/ttm/ttm_device.c:42: warning: Function parameter or 
> > member 'ttm_global_mutex' not described in 'DEFINE_MUTEX'
> >   drivers/gpu/drm/ttm/ttm_device.c:42: warning: expecting prototype for 
> > ttm_global_mutex(). Prototype was for DEFINE_MUTEX() instead
> >   drivers/gpu/drm/ttm/ttm_device.c:112: warning: Function parameter or 
> > member 'ctx' not described in 'ttm_global_swapout'
> >   drivers/gpu/drm/ttm/ttm_device.c:112: warning: Function parameter or 
> > member 'gfp_flags' not described in 'ttm_global_swapout'
> >   drivers/gpu/drm/ttm/ttm_device.c:112: warning: expecting prototype for A 
> > buffer object shrink method that tries to swap out the first(). Prototype 
> > was for ttm_global_swapout() instead
> > 
> > Cc: Christian Koenig 
> > Cc: Huang Rui 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: dri-devel@lists.freedesktop.org
> > Signed-off-by: Lee Jones 
> 
> Reviewed-by: Christian König 

Can you pls also land all the patches you reviewed from Lee into
drm-misc-next? Just so they won't get lost. Will all head in for 5.14.
-Daniel
> 
> > ---
> >   drivers/gpu/drm/ttm/ttm_device.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_device.c 
> > b/drivers/gpu/drm/ttm/ttm_device.c
> > index 9b787b3caeb50..a8bec8358350d 100644
> > --- a/drivers/gpu/drm/ttm/ttm_device.c
> > +++ b/drivers/gpu/drm/ttm/ttm_device.c
> > @@ -36,7 +36,7 @@
> >   #include "ttm_module.h"
> > -/**
> > +/*
> >* ttm_global_mutex - protecting the global state
> >*/
> >   DEFINE_MUTEX(ttm_global_mutex);
> > @@ -104,7 +104,7 @@ static int ttm_global_init(void)
> > return ret;
> >   }
> > -/**
> > +/*
> >* A buffer object shrink method that tries to swap out the first
> >* buffer object on the global::swap_lru list.
> >*/
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2] drm/drm_bufs.c: In switch, add break in default case

2021-04-20 Thread Daniel Vetter
On Sat, Apr 17, 2021 at 06:15:52PM +0200, Fabio M. De Francesco wrote:
> Added a "break" in the default case of a switch select statement.
> GCC complains, although this "break" is not strictly necessary 
> for the code to work as expected.

Luckily we already have a comment stating that this is all just to make
gcc happy.
> 
> Signed-off-by: Fabio M. De Francesco 

Applied to drm-misc-next, thanks for the patch.
-Daniel

> ---
> 
> Changes from v1: Added the reason why of this change in the log.
> 
>  drivers/gpu/drm/drm_bufs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
> index e3d77dfefb0a..fc40aa0adf73 100644
> --- a/drivers/gpu/drm/drm_bufs.c
> +++ b/drivers/gpu/drm/drm_bufs.c
> @@ -79,7 +79,7 @@ static struct drm_map_list *drm_find_matching_map(struct 
> drm_device *dev,
>   return entry;
>   break;
>   default: /* Make gcc happy */
> - ;
> + break;
>   }
>   if (entry->map->offset == map->offset)
>   return entry;
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v5] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Daniel Vetter
On Sat, Apr 17, 2021 at 06:38:35PM +0200, Peter Enderborg wrote:
> This adds a total used dma-buf memory. Details
> can be found in debugfs, however it is not for everyone
> and not always available. dma-buf are indirect allocated by
> userspace. So with this value we can monitor and detect
> userspace applications that have problems.
> 
> Signed-off-by: Peter Enderborg 

So there have been tons of discussions around how to track dma-buf and
why, and I really need to understand the use-case here first I think. proc
uapi is as much forever as anything else, and depending on what you're doing
this doesn't make any sense at all:

- on most linux systems dma-buf are only instantiated for shared buffers.
  So there, this gives you a fairly meaningless number and not anything
  reflecting gpu memory usage at all.

- on Android all buffers are allocated through dma-buf afaik. But there
  we've recently had some discussions about how exactly we should track
  all this, and the conclusion was that most of this should be solved by
  cgroups long term. So if this is for Android, then I don't think adding
  random quick stop-gaps to upstream is a good idea (because it's a pretty
  long list of patches that have come up on this).

So what is this for?
-Daniel

> ---
>  drivers/dma-buf/dma-buf.c | 12 
>  fs/proc/meminfo.c |  5 -
>  include/linux/dma-buf.h   |  1 +
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index f264b70c383e..4dc37cd4293b 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -37,6 +37,7 @@ struct dma_buf_list {
>  };
>  
>  static struct dma_buf_list db_list;
> +static atomic_long_t dma_buf_global_allocated;
>  
>  static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
>  {
> @@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
>   if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
>   dma_resv_fini(dmabuf->resv);
>  
> + atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);
>   module_put(dmabuf->owner);
>   kfree(dmabuf->name);
>   kfree(dmabuf);
> @@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
> dma_buf_export_info *exp_info)
>   mutex_lock(&db_list.lock);
>   list_add(&dmabuf->list_node, &db_list.head);
>   mutex_unlock(&db_list.lock);
> + atomic_long_add(dmabuf->size, &dma_buf_global_allocated);
>  
>   return dmabuf;
>  
> @@ -1346,6 +1349,15 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
> dma_buf_map *map)
>  }
>  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
>  
> +/**
> + * dma_buf_allocated_pages - Return the used nr of pages
> + * allocated for dma-buf
> + */
> +long dma_buf_allocated_pages(void)
> +{
> + return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
> +}
> +
>  #ifdef CONFIG_DEBUG_FS
>  static int dma_buf_debug_show(struct seq_file *s, void *unused)
>  {
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 6fa761c9cc78..ccc7c40c8db7 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -16,6 +16,7 @@
>  #ifdef CONFIG_CMA
>  #include 
>  #endif
> +#include 
>  #include 
>  #include "internal.h"
>  
> @@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>   show_val_kb(m, "CmaFree:",
>   global_zone_page_state(NR_FREE_CMA_PAGES));
>  #endif
> -
> +#ifdef CONFIG_DMA_SHARED_BUFFER
> + show_val_kb(m, "DmaBufTotal:", dma_buf_allocated_pages());
> +#endif
>   hugetlb_report_meminfo(m);
>  
>   arch_report_meminfo(m);
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index efdc56b9d95f..5b05816bd2cd 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -507,4 +507,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct 
> *,
>unsigned long);
>  int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
>  void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
> +long dma_buf_allocated_pages(void);
>  #endif /* __DMA_BUF_H__ */
> -- 
> 2.17.1
> 
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/2] drm/gma500: remove trailing whitespaces

2021-04-20 Thread Daniel Vetter
On Mon, Apr 19, 2021 at 10:18:07AM +0200, Krzysztof Kozlowski wrote:
> Remove trailing whitespace. No functional change.
> 
> Signed-off-by: Krzysztof Kozlowski 

Both patches applied to drm-misc-next, thanks.
-Daniel

> ---
>  drivers/gpu/drm/gma500/backlight.c|  4 +--
>  drivers/gpu/drm/gma500/cdv_intel_dp.c | 50 +--
>  2 files changed, 26 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/gma500/backlight.c 
> b/drivers/gpu/drm/gma500/backlight.c
> index 35600d070cb5..9e90258541a4 100644
> --- a/drivers/gpu/drm/gma500/backlight.c
> +++ b/drivers/gpu/drm/gma500/backlight.c
> @@ -42,7 +42,7 @@ void gma_backlight_disable(struct drm_device *dev)
>   dev_priv->backlight_device->props.brightness = 0;
>   do_gma_backlight_set(dev);
>   }
> -#endif   
> +#endif
>  }
>  
>  void gma_backlight_set(struct drm_device *dev, int v)
> @@ -54,7 +54,7 @@ void gma_backlight_set(struct drm_device *dev, int v)
>   dev_priv->backlight_device->props.brightness = v;
>   do_gma_backlight_set(dev);
>   }
> -#endif   
> +#endif
>  }
>  
>  int gma_backlight_init(struct drm_device *dev)
> diff --git a/drivers/gpu/drm/gma500/cdv_intel_dp.c 
> b/drivers/gpu/drm/gma500/cdv_intel_dp.c
> index 6d3ada39ff86..595b765ecc71 100644
> --- a/drivers/gpu/drm/gma500/cdv_intel_dp.c
> +++ b/drivers/gpu/drm/gma500/cdv_intel_dp.c
> @@ -245,7 +245,7 @@ i2c_dp_aux_add_bus(struct i2c_adapter *adapter)
>  if (W && !in_dbg_master()) msleep(W);   \
>  }   \
>  ret__;  \
> -})  
> +})
>  
>  #define wait_for(COND, MS) _wait_for(COND, MS, 1)
>  
> @@ -386,7 +386,7 @@ static void cdv_intel_edp_panel_vdd_on(struct gma_encoder 
> *intel_encoder)
>   if (intel_dp->panel_on) {
>   DRM_DEBUG_KMS("Skip VDD on because of panel on\n");
>   return;
> - }   
> + }
>   DRM_DEBUG_KMS("\n");
>  
>   pp = REG_READ(PP_CONTROL);
> @@ -433,7 +433,7 @@ static bool cdv_intel_edp_panel_on(struct gma_encoder 
> *intel_encoder)
>   DRM_DEBUG_KMS("Error in Powering up eDP panel, status %x\n", 
> REG_READ(PP_STATUS));
>   intel_dp->panel_on = false;
>   } else
> - intel_dp->panel_on = true;  
> + intel_dp->panel_on = true;
>   msleep(intel_dp->panel_power_up_delay);
>  
>   return false;
> @@ -449,7 +449,7 @@ static void cdv_intel_edp_panel_off (struct gma_encoder 
> *intel_encoder)
>  
>   pp = REG_READ(PP_CONTROL);
>  
> - if ((pp & POWER_TARGET_ON) == 0) 
> + if ((pp & POWER_TARGET_ON) == 0)
>   return;
>  
>   intel_dp->panel_on = false;
> @@ -464,7 +464,7 @@ static void cdv_intel_edp_panel_off (struct gma_encoder 
> *intel_encoder)
>   DRM_DEBUG_KMS("PP_STATUS %x\n", REG_READ(PP_STATUS));
>  
>   if (wait_for((REG_READ(PP_STATUS) & idle_off_mask) == 0, 1000)) {
> - DRM_DEBUG_KMS("Error in turning off Panel\n");  
> + DRM_DEBUG_KMS("Error in turning off Panel\n");
>   }
>  
>   msleep(intel_dp->panel_power_cycle_delay);
> @@ -535,7 +535,7 @@ cdv_intel_dp_mode_valid(struct drm_connector *connector,
>   if (cdv_intel_dp_link_required(mode->clock, 24)
>   > cdv_intel_dp_max_data_rate(max_link_clock, max_lanes))
>   return MODE_CLOCK_HIGH;
> - 
> +
>   }
>   if (mode->clock < 1)
>   return MODE_CLOCK_LOW;
> @@ -606,7 +606,7 @@ cdv_intel_dp_aux_ch(struct gma_encoder *encoder,
>   for (i = 0; i < send_bytes; i += 4)
>   REG_WRITE(ch_data + i,
>  pack_aux(send + i, send_bytes - i));
> - 
> +
>   /* Send the command and wait for it to complete */
>   REG_WRITE(ch_ctl,
>  DP_AUX_CH_CTL_SEND_BUSY |
> @@ -623,7 +623,7 @@ cdv_intel_dp_aux_ch(struct gma_encoder *encoder,
>   break;
>   udelay(100);
>   }
> - 
> +
>   /* Clear done status and any errors */
>   REG_WRITE(ch_ctl,
>  status |
> @@ -659,7 +659,7 @@ cdv_intel_dp_aux_ch(struct gma_encoder *encoder,
> DP_AUX_CH_CTL_MESSAGE_SIZE_SHIFT);
>   if (recv_bytes > recv_size)
>   recv_bytes = recv_size;
> - 
> +
>   for (i = 0; i < recv_bytes; i += 4)
>   unpack_aux(REG_READ(ch_data + i),
>  recv + i, recv_bytes - i);
> @@ -870,7 +870,7 @@ cdv_intel_dp_i2c_init(struct gma_connector *connector,
>   ret = i2c_dp_aux_add_bus(&intel_dp->adapter);
>   if (is_edp(encoder))
>   cdv_intel_edp_panel_vdd_off(encoder);
> - 
> +
>   return ret;
>  }
>  
> @@ -1291,13 +1291,13 @@ cdv_intel_get_adjust_train(s

Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Peter.Enderborg


>> But that isn't really system memory at all, it's just allocated device
>> memory.
> OK, that was not really clear to me. So this is not really accounted to
> MemTotal? If that is really the case then reporting it into the oom
> report is completely pointless and I am not even sure /proc/meminfo is
> the right interface either. It would just add more confusion I am
> afraid.
>  

Why is it confusing? Documentation is quite clear:

"Provides information about distribution and utilization of memory. This
varies by architecture and compile options."

A topology with VRAM fits very well with that. The point is to have an
overview.


>>> See where I am heading?
>> Yeah, totally. Thanks for pointing this out.
>>
>> Suggestions how to handle that?
> As I've pointed out in previous reply we do have an API to account per
> node memory but now that you have brought up that this is not something
> we account as a regular memory then this doesn't really fit into that
> model. But maybe I am just confused.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 1/2] drm/mediatek: set panel orientation before drm_dev_register().

2021-04-20 Thread Hsin-Yi Wang
On Fri, Apr 9, 2021 at 12:53 PM Hsin-Yi Wang  wrote:
>
> drm_dev_register() sets connector->registration_state to
> DRM_CONNECTOR_REGISTERED and dev->registered to true. If
> drm_connector_set_panel_orientation() is first called after
> drm_dev_register(), it will fail several checks and results in following
> warning. So set panel orientation in dsi before drm_dev_register() is
> called.
>
> [4.480976] [ cut here ]
> [4.485603] WARNING: CPU: 5 PID: 369 at 
> drivers/gpu/drm/drm_mode_object.c:45 __drm_mode_object_add+0xb4/0xbc
> 
> [4.609772] Call trace:
> [4.612208]  __drm_mode_object_add+0xb4/0xbc
> [4.616466]  drm_mode_object_add+0x20/0x2c
> [4.620552]  drm_property_create+0xdc/0x174
> [4.624723]  drm_property_create_enum+0x34/0x98
> [4.629241]  drm_connector_set_panel_orientation+0x64/0xa0
> [4.634716]  boe_panel_get_modes+0x88/0xd8
> [4.638802]  drm_panel_get_modes+0x2c/0x48
> [4.642887]  panel_bridge_get_modes+0x1c/0x28
> [4.647233]  drm_bridge_connector_get_modes+0xa0/0xd4
> [4.652273]  drm_helper_probe_single_connector_modes+0x218/0x700
> [4.658266]  drm_mode_getconnector+0x1b4/0x45c
> [4.662699]  drm_ioctl_kernel+0xac/0x128
> [4.11]  drm_ioctl+0x268/0x410
> [4.670002]  drm_compat_ioctl+0xdc/0xf0
> [4.673829]  __arm64_compat_sys_ioctl+0xc8/0x100
> [4.678436]  el0_svc_common+0xf4/0x1c0
> [4.682174]  do_el0_svc_compat+0x28/0x3c
> [4.686088]  el0_svc_compat+0x10/0x1c
> [4.689738]  el0_sync_compat_handler+0xa8/0xcc
> [4.694171]  el0_sync_compat+0x178/0x180
> [4.698082] ---[ end trace b4f2db9d9c88610b ]---
> [4.702721] [ cut here ]
> [4.707329] WARNING: CPU: 5 PID: 369 at 
> drivers/gpu/drm/drm_mode_object.c:243 drm_object_attach_property+0x48/0xb8
> 
> [4.833830] Call trace:
> [4.836266]  drm_object_attach_property+0x48/0xb8
> [4.840958]  drm_connector_set_panel_orientation+0x84/0xa0
> [4.846432]  boe_panel_get_modes+0x88/0xd8
> [4.850516]  drm_panel_get_modes+0x2c/0x48
> [4.854600]  panel_bridge_get_modes+0x1c/0x28
> [4.858946]  drm_bridge_connector_get_modes+0xa0/0xd4
> [4.863984]  drm_helper_probe_single_connector_modes+0x218/0x700
> [4.869978]  drm_mode_getconnector+0x1b4/0x45c
> [4.874410]  drm_ioctl_kernel+0xac/0x128
> [4.878320]  drm_ioctl+0x268/0x410
> [4.881711]  drm_compat_ioctl+0xdc/0xf0
> [4.885536]  __arm64_compat_sys_ioctl+0xc8/0x100
> [4.890142]  el0_svc_common+0xf4/0x1c0
> [4.893879]  do_el0_svc_compat+0x28/0x3c
> [4.897791]  el0_svc_compat+0x10/0x1c
> [4.901441]  el0_sync_compat_handler+0xa8/0xcc
> [4.905873]  el0_sync_compat+0x178/0x180
> [4.909783] ---[ end trace b4f2db9d9c88610c ]---
>
> Signed-off-by: Hsin-Yi Wang 

ping on the thread, thanks.

> ---
>  drivers/gpu/drm/mediatek/mtk_dsi.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c 
> b/drivers/gpu/drm/mediatek/mtk_dsi.c
> index ae403c67cbd9..45a702ee09f3 100644
> --- a/drivers/gpu/drm/mediatek/mtk_dsi.c
> +++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
> @@ -205,6 +205,7 @@ struct mtk_dsi {
> u32 irq_data;
> wait_queue_head_t irq_wait_queue;
> const struct mtk_dsi_driver_data *driver_data;
> +   enum drm_panel_orientation orientation;
>  };
>
>  static inline struct mtk_dsi *bridge_to_dsi(struct drm_bridge *b)
> @@ -966,6 +967,8 @@ static int mtk_dsi_encoder_init(struct drm_device *drm, 
> struct mtk_dsi *dsi)
> }
> drm_connector_attach_encoder(dsi->connector, &dsi->encoder);
>
> +   drm_connector_set_panel_orientation(dsi->connector, dsi->orientation);
> +
> return 0;
>
>  err_cleanup_encoder:
> @@ -1029,6 +1032,12 @@ static int mtk_dsi_probe(struct platform_device *pdev)
> ret = PTR_ERR(dsi->next_bridge);
> goto err_unregister_host;
> }
> +
> +   ret = of_drm_get_panel_orientation(panel->dev->of_node, 
> &dsi->orientation);
> +   if (ret) {
> +   dev_err(dev, "failed to get panel orientation %d\n", 
> ret);
> +   return ret;
> +   }
> }
>
> dsi->driver_data = of_device_get_match_data(dev);
> --
> 2.31.1.295.g9ea45b61b8-goog
>
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC v1 PATCH 3/3] driver: update all the code that use soc_device_match

2021-04-20 Thread Arnd Bergmann
On Tue, Apr 20, 2021 at 1:44 AM Dominique MARTINET
 wrote:
> Arnd Bergmann wrote on Mon, Apr 19, 2021 at 02:16:36PM +0200:
> > For built-in drivers, load order depends on the initcall level and
> > link order (how things are lined listed in the Makefile hierarchy).
> >
> > For loadable modules, this is up to user space in the end.
> >
> > Which of the drivers in this scenario are loadable modules?
>
> All the drivers involved in my case are built-in (nvmem, soc and final
> soc_device_match consumer e.g. caam_jr that crashes the kernel if soc is
> not identified properly).

Ok, in that case you may have a chance to just adapt the initcall
levels. This is somewhat fragile if someone else already relies
on a particular order, but it's an easy one-line change to change
a driver e.g. from module_init() or device_initcall() to arch_initcall().
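
For built-in code this really is a one-liner. A minimal illustrative sketch
(hypothetical "foo-nvmem" driver, not taken from this thread):

#include <linux/module.h>
#include <linux/platform_device.h>

static int foo_nvmem_probe(struct platform_device *pdev)
{
        return 0;       /* parse the fuses, register the nvmem provider, ... */
}

static struct platform_driver foo_nvmem_driver = {
        .probe  = foo_nvmem_probe,
        .driver = { .name = "foo-nvmem" },
};

static int __init foo_nvmem_init(void)
{
        return platform_driver_register(&foo_nvmem_driver);
}
/* was module_init(foo_nvmem_init); for built-in drivers arch_initcall()
 * runs before the device_initcall() level that module_init() maps to */
arch_initcall(foo_nvmem_init);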

> I frankly don't like the idea of moving nvmem/ above soc/ in
> drivers/Makefile as a "solution" to this (especially as there is one
> that seems to care about what soc they run on...), so I'll have a look
> at links first, hopefully that will work out.

Right, that would be way more fragile.

I think the main problem in this case is the caam driver that really
should not look into the particular SoC type or even machine
compatible string. This is something we can do as a last resort
for compatibility with busted devicetree files, but it appears that
this driver does it as the primary method for identifying different
hardware revisions. I would suggest fixing the binding so that
each SoC that includes one of these devices has a soc specific
compatible string associated with the device that the driver can
use as the primary way of identifying the device.
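
As a rough sketch of that direction (the compatible strings and data below
are made up for illustration, not a binding proposal), the driver could then
key off per-SoC compatibles instead of calling soc_device_match() in probe(),
and pick the data up via of_device_get_match_data():

#include <linux/mod_devicetable.h>

struct foo_caam_data {
        unsigned int era;       /* whatever per-revision data is needed */
};

static const struct foo_caam_data foo_imx8mm_data = { .era = 9 };
static const struct foo_caam_data foo_generic_data = { .era = 4 };

static const struct of_device_id foo_caam_jr_match[] = {
        { .compatible = "vendor,imx8mm-sec-jr", .data = &foo_imx8mm_data },
        { .compatible = "vendor,sec-v4.0-jr", .data = &foo_generic_data },
        { /* sentinel */ }
};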

We probably need to keep the old logic around for old dtb files,
but there can at least be a comment next to that table that
discourages people from adding more entries there.

  Arnd
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/connector: demote connector force-probes for non-master clients

2021-04-20 Thread Daniel Vetter
On Fri, Apr 02, 2021 at 01:22:12PM +0200, Simon Ser wrote:
> Force-probing a connector can be slow and cause flickering. As this
> affects the global KMS state, let's make it so only the DRM master
> can force-probe a connector.
> 
> Non-master DRM clients won't be able to force-probe a connector
> anymore. Instead, KMS will perform a regular read-only connector
> query.
> 
> Signed-off-by: Simon Ser 
> Cc: Daniel Vetter 
> Cc: Pekka Paalanen 
> ---
>  drivers/gpu/drm/drm_connector.c | 11 ---
>  include/uapi/drm/drm_mode.h |  7 ---
>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
> index 7631f76e7f34..2f70a52a892b 100644
> --- a/drivers/gpu/drm/drm_connector.c
> +++ b/drivers/gpu/drm/drm_connector.c
> @@ -20,6 +20,7 @@
>   * OF THIS SOFTWARE.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2374,9 +2375,13 @@ int drm_mode_getconnector(struct drm_device *dev, void 
> *data,
>  
>   mutex_lock(&dev->mode_config.mutex);
>   if (out_resp->count_modes == 0) {
> - connector->funcs->fill_modes(connector,
> -  dev->mode_config.max_width,
> -  dev->mode_config.max_height);
> + if (drm_is_current_master(file_priv))
> + connector->funcs->fill_modes(connector,
> +  dev->mode_config.max_width,
> +  
> dev->mode_config.max_height);
> + else
> + drm_dbg_kms(dev, "User-space requested a forced probe 
> on [CONNECTOR:%d:%s] but is not the DRM master, demoting to read-only probe",
> + connector->base.id, connector->name);
>   }
>  
>   out_resp->mm_width = connector->display_info.width_mm;
> diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
> index a5e76aa06ad5..3efa2e38d89b 100644
> --- a/include/uapi/drm/drm_mode.h
> +++ b/include/uapi/drm/drm_mode.h
> @@ -413,9 +413,10 @@ enum drm_mode_subconnector {
>   *
>   * **Force-probing a connector**
>   *
> - * If the @count_modes field is set to zero, the kernel will perform a forced
> - * probe on the connector to refresh the connector status, modes and EDID.
> - * A forced-probe can be slow, might cause flickering and the ioctl will 
> block.
> + * If the @count_modes field is set to zero and the DRM client is the DRM

*current* DRM master

The drm master/client relationship survives a DROPMASTER ioctl, but also
it's only really relevant for the old authmagic dance. But just to be
consistent here.

> + * master, the kernel will perform a forced probe on the connector to refresh
> + * the connector status, modes and EDID. A forced-probe can be slow, might
> + * cause flickering and the ioctl will block.

Do we have an igt for this? Timing test should do the job I think,
assuming we have at least one output which requires an edid read (so maybe
skip the test if a forced probe takes less than 10ms or so).
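
Something along these lines could be a starting point (untested sketch using
plain libdrm rather than the igt helpers; device path and thresholds are
placeholders). With the patch, the forced probe from a non-master fd should
take roughly as long as the read-only query:

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static double now_ms(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(void)
{
        /* second, non-master client; a compositor is assumed to hold master */
        int fd = open("/dev/dri/card0", O_RDWR);
        drmModeRes *res = drmModeGetResources(fd);

        for (int i = 0; res && i < res->count_connectors; i++) {
                double t0 = now_ms();
                /* asks the kernel for a (possibly forced) probe */
                drmModeConnector *conn = drmModeGetConnector(fd, res->connectors[i]);
                double forced = now_ms() - t0;

                drmModeFreeConnector(conn);

                t0 = now_ms();
                /* read-only query, never probes */
                conn = drmModeGetConnectorCurrent(fd, res->connectors[i]);
                printf("connector %u: forced %.1f ms, current %.1f ms\n",
                       res->connectors[i], forced, now_ms() - t0);
                drmModeFreeConnector(conn);
        }
        return 0;
}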

Patch lgtm, but igt would be really nice here.

Reviewed-by: Daniel Vetter 

>   *
>   * User-space needs to force-probe connectors to ensure their metadata is
>   * up-to-date at startup and after receiving a hot-plug event. User-space
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 0/9] drm: Support simple-framebuffer devices and firmware fbs

2021-04-20 Thread Geert Uytterhoeven
Hi Daniel,

On Tue, Apr 20, 2021 at 10:46 AM Daniel Vetter  wrote:
> On Mon, Apr 19, 2021 at 10:00:56AM +0200, Geert Uytterhoeven wrote:
> > On Fri, Apr 16, 2021 at 11:00 AM Thomas Zimmermann  
> > wrote:
> > > This patchset adds support for simple-framebuffer platform devices and
> > > a handover mechanism for native drivers to take-over control of the
> > > hardware.
> > >
> > > The new driver, called simpledrm, binds to a simple-frambuffer platform
> > > device. The kernel's boot code creates such devices for firmware-provided
> > > framebuffers, such as EFI-GOP or VESA. Typically the BIOS, UEFI or boot
> > > loader sets up the framebuffers. Description via device tree is also an
> > > option.
> >
> > I guess this can be used as a replacement for offb, too...
> >
> > > Patches 4 to 8 add the simpledrm driver. It's build on simple DRM helpers
> > > and SHMEM. It supports 16-bit, 24-bit and 32-bit RGB framebuffers. During
> >
> >  if support for 8-bit frame buffers would be added?
>
> Is that 8-bit greyscale or 8-bit indexed with 256 entry palette? Former

8-bit indexed with 256 entry palette.

> shouldn't be a big thing, but the latter is only really supported by the
> overall drm ecosystem in theory. Most userspace assumes that xrgb8888
> works, and we keep that illusion up by emulating it in kernel for hw which
> just doesn't support it. But reformatting xrgb8888 to c8 is tricky at
> best. The uapis are all there for setting the palette, and C8 is a defined
> format even with atomic kms interface, but really there's not much
> userspace for it. In other words, it would work as well as current offb
> would, but that's at least that.

Oh, that's good to know!
Does this mean fbdev is not deprecated for anything <= C8? ;-)

A while ago, I was looking into converting an fbdev driver to drm, and
one of the things I ran into is lack of C4, C2, C1, Y8, Y4, Y2, and
monochrome support.  On top of that, lots of internal code seems to
assume pixels are never smaller than a byte (thus ignoring
char_per_block[]/block_w).  The lack of support for planar modes isn't
that bad, combined with the need for copying, as c2p conversion can be
done while copying, thus even making life easier for userspace
applications that can just always work on chunky data.
Then real work kicked in, before I got anything working...

Thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 0/9] drm: Support simple-framebuffer devices and firmware fbs

2021-04-20 Thread Gerd Hoffmann
  Hi,

> > > Patches 4 to 8 add the simpledrm driver. It's build on simple DRM helpers
> > > and SHMEM. It supports 16-bit, 24-bit and 32-bit RGB framebuffers. During
> > 
> >  if support for 8-bit frame buffers would be added?
> 
> Is that 8-bit greyscale or 8-bit indexed with 256 entry palette? Former
> shouldn't be a big thing, but the latter is only really supported by the
> overall drm ecosystem in theory. Most userspace assumes that xrgb8888
> works, and we keep that illusion up by emulating it in kernel for hw which
> just doesn't support it. But reformatting xrgb8888 to c8 is tricky at
> best.

Well.  cirrus converts xrgb8888 on the fly to rgb888 or rgb565
(depending on display resolution).  We could pull off the same trick
here and convert to rgb332 (assuming we can program the palette with the
color cube needed for that).  Wouldn't look pretty, but would probably
work better than expecting userspace to know what color palettes are in
2021 ...
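
The per-pixel part of that conversion is trivial, for reference (sketch only;
the 8x8x4 color cube still has to be loaded into the palette):

#include <stdint.h>

/* take the top 3/3/2 bits of an XRGB8888 pixel to form RGB332 */
static inline uint8_t xrgb8888_to_rgb332(uint32_t px)
{
        uint8_t r = (px >> 16) & 0xff;
        uint8_t g = (px >> 8) & 0xff;
        uint8_t b = px & 0xff;

        return (r & 0xe0) | ((g & 0xe0) >> 3) | (b >> 6);
}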

take care,
  Gerd

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Peter.Enderborg
On 4/20/21 11:12 AM, Michal Hocko wrote:
> On Tue 20-04-21 09:02:57, peter.enderb...@sony.com wrote:
 But that isn't really system memory at all, it's just allocated device
 memory.
>>> OK, that was not really clear to me. So this is not really accounted to
>>> MemTotal? If that is really the case then reporting it into the oom
>>> report is completely pointless and I am not even sure /proc/meminfo is
>>> the right interface either. It would just add more confusion I am
>>> afraid.
>>>  
>> Why is it confusing? Documentation is quite clear:
> Because a single counter without a wider context cannot be put into any
> reasonable context. There is no notion of the total amount of device
> memory usable for dma-buf. As Christian explained some of it can be RAM
> based. So a single number is rather pointless on its own in many cases.
>
> Or let me just ask. What can you tell from dma-bud: $FOO kB in its
> current form?

Is it better to be blind? The value can still be used as a relative metric.
You collect the data and see how it changes. And when you see
an unexpected change you start to dig in. It for sure won't tell what line
in your application has a bug.  But it might be an indicator that
a new game triggers a leak. And it is very well specified: it is exactly the
size of mapped dma-buf. For what you use dma-buf for, you need to know
other parts of the system.
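
To make that intended use concrete, a monitoring tool would do little more
than sample the proposed counter (sketch; assumes the patch is applied so the
DmaBufTotal line exists in /proc/meminfo):

#include <stdio.h>

static long read_dmabuf_total_kb(void)
{
        char line[256];
        long kb = -1;
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "DmaBufTotal: %ld kB", &kb) == 1)
                        break;
        }
        fclose(f);
        return kb;
}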
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v5] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Peter.Enderborg
On 4/20/21 10:58 AM, Daniel Vetter wrote:
> On Sat, Apr 17, 2021 at 06:38:35PM +0200, Peter Enderborg wrote:
>> This adds a total used dma-buf memory. Details
>> can be found in debugfs, however it is not for everyone
>> and not always available. dma-buf are indirect allocated by
>> userspace. So with this value we can monitor and detect
>> userspace applications that have problems.
>>
>> Signed-off-by: Peter Enderborg 
> So there have been tons of discussions around how to track dma-buf and
> why, and I really need to understand the use-cass here first I think. proc
> uapi is as much forever as anything else, and depending what you're doing
> this doesn't make any sense at all:
>
> - on most linux systems dma-buf are only instantiated for shared buffer.
>   So there this gives you a fairly meaningless number and not anything
>   reflecting gpu memory usage at all.
>
> - on Android all buffers are allocated through dma-buf afaik. But there
>   we've recently had some discussions about how exactly we should track
>   all this, and the conclusion was that most of this should be solved by
>   cgroups long term. So if this is for Android, then I don't think adding
>   random quick stop-gaps to upstream is a good idea (because it's a pretty
>   long list of patches that have come up on this).
>
> So what is this for?

For the overview. dma-buf today only has debugfs for info, and debugfs
is not allowed by Google for use in Android. So this aggregates the information
so we can see what is going on in the system.

And the standard LKML response to that is "SHOW ME THE CODE".

When the top memcg has aggregated information on dma-buf it may be
a better source for meminfo. But that also implies that dma-buf requires memcg.

And I don't see any problem with replacing this with something better when it is
ready.

> -Daniel
>
>> ---
>>  drivers/dma-buf/dma-buf.c | 12 
>>  fs/proc/meminfo.c |  5 -
>>  include/linux/dma-buf.h   |  1 +
>>  3 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>> index f264b70c383e..4dc37cd4293b 100644
>> --- a/drivers/dma-buf/dma-buf.c
>> +++ b/drivers/dma-buf/dma-buf.c
>> @@ -37,6 +37,7 @@ struct dma_buf_list {
>>  };
>>  
>>  static struct dma_buf_list db_list;
>> +static atomic_long_t dma_buf_global_allocated;
>>  
>>  static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
>>  {
>> @@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
>>  if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
>>  dma_resv_fini(dmabuf->resv);
>>  
>> +atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);
>>  module_put(dmabuf->owner);
>>  kfree(dmabuf->name);
>>  kfree(dmabuf);
>> @@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
>> dma_buf_export_info *exp_info)
>>  mutex_lock(&db_list.lock);
>>  list_add(&dmabuf->list_node, &db_list.head);
>>  mutex_unlock(&db_list.lock);
>> +atomic_long_add(dmabuf->size, &dma_buf_global_allocated);
>>  
>>  return dmabuf;
>>  
>> @@ -1346,6 +1349,15 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
>> dma_buf_map *map)
>>  }
>>  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
>>  
>> +/**
>> + * dma_buf_allocated_pages - Return the used nr of pages
>> + * allocated for dma-buf
>> + */
>> +long dma_buf_allocated_pages(void)
>> +{
>> +return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
>> +}
>> +
>>  #ifdef CONFIG_DEBUG_FS
>>  static int dma_buf_debug_show(struct seq_file *s, void *unused)
>>  {
>> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
>> index 6fa761c9cc78..ccc7c40c8db7 100644
>> --- a/fs/proc/meminfo.c
>> +++ b/fs/proc/meminfo.c
>> @@ -16,6 +16,7 @@
>>  #ifdef CONFIG_CMA
>>  #include 
>>  #endif
>> +#include 
>>  #include 
>>  #include "internal.h"
>>  
>> @@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>>  show_val_kb(m, "CmaFree:",
>>  global_zone_page_state(NR_FREE_CMA_PAGES));
>>  #endif
>> -
>> +#ifdef CONFIG_DMA_SHARED_BUFFER
>> +show_val_kb(m, "DmaBufTotal:", dma_buf_allocated_pages());
>> +#endif
>>  hugetlb_report_meminfo(m);
>>  
>>  arch_report_meminfo(m);
>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>> index efdc56b9d95f..5b05816bd2cd 100644
>> --- a/include/linux/dma-buf.h
>> +++ b/include/linux/dma-buf.h
>> @@ -507,4 +507,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct 
>> *,
>>   unsigned long);
>>  int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
>>  void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
>> +long dma_buf_allocated_pages(void);
>>  #endif /* __DMA_BUF_H__ */
>> -- 
>> 2.17.1
>>
>> ___
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://urldefense.com/v3/__https://lists.freedesktop.org/mailman/listinfo

Re: [PATCH v4 0/9] drm: Support simple-framebuffer devices and firmware fbs

2021-04-20 Thread Geert Uytterhoeven
Hi Gerd,

On Tue, Apr 20, 2021 at 11:22 AM Gerd Hoffmann  wrote:
> > > > Patches 4 to 8 add the simpledrm driver. It's build on simple DRM 
> > > > helpers
> > > > and SHMEM. It supports 16-bit, 24-bit and 32-bit RGB framebuffers. 
> > > > During
> > >
> > >  if support for 8-bit frame buffers would be added?
> >
> > Is that 8-bit greyscale or 8-bit indexed with 256 entry palette? Former
> > shouldn't be a big thing, but the latter is only really supported by the
> > overall drm ecosystem in theory. Most userspace assumes that xrgb8888
> > works, and we keep that illusion up by emulating it in kernel for hw which
> > just doesn't support it. But reformatting xrgb8888 to c8 is tricky at
> > best.
>
> Well.  cirrus converts xrgb8888 on the fly to rgb888 or rgb565
> (depending on display resolution).  We could pull off the same trick
> here and convert to rgb332 (assuming we can program the palette with the
> color cube needed for that).  Wouldn't look pretty, but would probably
> work better than expecting userspace know what color palettes are in
> 2021 ...

Yeah, I already had a similar idea for Amiga HAM ;-)

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v5] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Mike Rapoport
Hello Peter,

On Tue, Apr 20, 2021 at 09:26:00AM +, peter.enderb...@sony.com wrote:
> On 4/20/21 10:58 AM, Daniel Vetter wrote:
> > On Sat, Apr 17, 2021 at 06:38:35PM +0200, Peter Enderborg wrote:
> >> This adds a total used dma-buf memory. Details
> >> can be found in debugfs, however it is not for everyone
> >> and not always available. dma-buf are indirect allocated by
> >> userspace. So with this value we can monitor and detect
> >> userspace applications that have problems.
> >>
> >> Signed-off-by: Peter Enderborg 
> > So there have been tons of discussions around how to track dma-buf and
> > why, and I really need to understand the use-cass here first I think. proc
> > uapi is as much forever as anything else, and depending what you're doing
> > this doesn't make any sense at all:
> >
> > - on most linux systems dma-buf are only instantiated for shared buffer.
> >   So there this gives you a fairly meaningless number and not anything
> >   reflecting gpu memory usage at all.
> >
> > - on Android all buffers are allocated through dma-buf afaik. But there
> >   we've recently had some discussions about how exactly we should track
> >   all this, and the conclusion was that most of this should be solved by
> >   cgroups long term. So if this is for Android, then I don't think adding
> >   random quick stop-gaps to upstream is a good idea (because it's a pretty
> >   long list of patches that have come up on this).
> >
> > So what is this for?
> 
> For the overview. dma-buf today only have debugfs for info. Debugfs
> is not allowed by google to use in andoid. So this aggregate the information
> so we can get information on what going on on the system. 
 
Can you send an example debugfs output to see what data we are talking
about?

> And the LKML standard respond to that is "SHOW ME THE CODE".
> 
> When the top memgc has a aggregated information on dma-buf it is maybe
> a better source to meminfo. But then it also imply that dma-buf requires 
> memcg.
> 
> And I dont see any problem to replace this with something better with it is 
> ready.

Well, the problem with replacing the counter in /proc/meminfo is that it
requires all users of /proc/meminfo to adapt to the changes.

That's why it's way less painful to go an extra mile and define (hopefully)
stable user ABI up front.

Why can't you use fdinfo to show how much memory is consumed by a dma-buf?

> > -Daniel
> >
> >> ---
> >>  drivers/dma-buf/dma-buf.c | 12 
> >>  fs/proc/meminfo.c |  5 -
> >>  include/linux/dma-buf.h   |  1 +
> >>  3 files changed, 17 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >> index f264b70c383e..4dc37cd4293b 100644
> >> --- a/drivers/dma-buf/dma-buf.c
> >> +++ b/drivers/dma-buf/dma-buf.c
> >> @@ -37,6 +37,7 @@ struct dma_buf_list {
> >>  };
> >>  
> >>  static struct dma_buf_list db_list;
> >> +static atomic_long_t dma_buf_global_allocated;
> >>  
> >>  static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int 
> >> buflen)
> >>  {
> >> @@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
> >>if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
> >>dma_resv_fini(dmabuf->resv);
> >>  
> >> +  atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);
> >>module_put(dmabuf->owner);
> >>kfree(dmabuf->name);
> >>kfree(dmabuf);
> >> @@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
> >> dma_buf_export_info *exp_info)
> >>mutex_lock(&db_list.lock);
> >>list_add(&dmabuf->list_node, &db_list.head);
> >>mutex_unlock(&db_list.lock);
> >> +  atomic_long_add(dmabuf->size, &dma_buf_global_allocated);
> >>  
> >>return dmabuf;
> >>  
> >> @@ -1346,6 +1349,15 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
> >> dma_buf_map *map)
> >>  }
> >>  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
> >>  
> >> +/**
> >> + * dma_buf_allocated_pages - Return the used nr of pages
> >> + * allocated for dma-buf
> >> + */
> >> +long dma_buf_allocated_pages(void)
> >> +{
> >> +  return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
> >> +}
> >> +
> >>  #ifdef CONFIG_DEBUG_FS
> >>  static int dma_buf_debug_show(struct seq_file *s, void *unused)
> >>  {
> >> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> >> index 6fa761c9cc78..ccc7c40c8db7 100644
> >> --- a/fs/proc/meminfo.c
> >> +++ b/fs/proc/meminfo.c
> >> @@ -16,6 +16,7 @@
> >>  #ifdef CONFIG_CMA
> >>  #include 
> >>  #endif
> >> +#include 
> >>  #include 
> >>  #include "internal.h"
> >>  
> >> @@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, void 
> >> *v)
> >>show_val_kb(m, "CmaFree:",
> >>global_zone_page_state(NR_FREE_CMA_PAGES));
> >>  #endif
> >> -
> >> +#ifdef CONFIG_DMA_SHARED_BUFFER
> >> +  show_val_kb(m, "DmaBufTotal:", dma_buf_allocated_pages());
> >> +#endif
> >>hugetlb_report_meminfo(m);
> >>  
> >>arch_report_meminfo(m);
> >> diff --git a/include/linux/

Re: [PULL] topic/intel-gen-to-ver -> drm-intel-next and drm-intel-gt-next

2021-04-20 Thread Joonas Lahtinen
Quoting Jani Nikula (2021-04-19 12:53:11)
> 
> Hi Joonas and Rodrigo -
> 
> Here's the gen to ver conversion topic branch to be merged to both
> drm-intel-next and drm-intel-gt-next.

Pulled.

Regards, Joonas

> Lots of Cc's for heads up.
> 
> 
> BR,
> Jani.
> 
> 
> topic/intel-gen-to-ver-2021-04-19:
> Gen to ver conversions across the driver
> 
> The main change is Lucas' series [1], with Ville's GLK fixes [2] and a
> cherry-pick of Matt's commit [3] from drm-intel-next as a base to avoid
> conflicts.
> 
> [1] https://patchwork.freedesktop.org/series/88825/
> [2] https://patchwork.freedesktop.org/series/88938/
> [3] 70bfb30743d5 ("drm/i915/display: Eliminate IS_GEN9_{BC,LP}")
> 
> BR,
> Jani.
> 
> The following changes since commit 9c0fed84d5750e1eea6c664e073ffa2534a17743:
> 
>   Merge tag 'drm-intel-next-2021-04-01' of 
> git://anongit.freedesktop.org/drm/drm-intel into drm-next (2021-04-08 
> 14:02:21 +1000)
> 
> are available in the Git repository at:
> 
>   git://anongit.freedesktop.org/drm/drm-intel 
> tags/topic/intel-gen-to-ver-2021-04-19
> 
> for you to fetch changes up to 425390c5dce6da76578389629d19517fcd79c959:
> 
>   drm/i915: split dgfx features from gen 12 (2021-04-14 13:05:06 +0300)
> 
> 
> Gen to ver conversions across the driver
> 
> The main change is Lucas' series [1], with Ville's GLK fixes [2] and a
> cherry-pick of Matt's commit [3] from drm-intel-next as a base to avoid
> conflicts.
> 
> [1] https://patchwork.freedesktop.org/series/88825/
> [2] https://patchwork.freedesktop.org/series/88938/
> [3] 70bfb30743d5 ("drm/i915/display: Eliminate IS_GEN9_{BC,LP}")
> 
> 
> Lucas De Marchi (12):
>   drm/i915/display: use DISPLAY_VER() on remaining users
>   drm/i915: rename display.version to display.ver
>   drm/i915/display: rename display version macros
>   drm/i915: add macros for graphics and media versions
>   drm/i915/gt: replace gen use in intel_engine_cs
>   drm/i915/selftests: replace unused mask with simple version
>   drm/i915/selftests: eliminate use of gen_mask
>   drm/i915: finish removal of gen_mask
>   drm/i915: eliminate remaining uses of intel_device_info->gen
>   drm/i915: finish removal of gen from intel_device_info
>   drm/i915: add media and display versions to device_info print
>   drm/i915: split dgfx features from gen 12
> 
> Matt Roper (1):
>   drm/i915/display: Eliminate IS_GEN9_{BC,LP}
> 
> Ville Syrjälä (5):
>   drm/i915: Restore lost glk FBC 16bpp w/a
>   drm/i915: Restore lost glk ccs w/a
>   drm/i915: Disable LTTPR detection on GLK once again
>   drm/i915: Don't use {skl, cnl}_hpd_pin() for bxt/glk
>   drm/i915: Remove a few redundant glk checks
> 
>  drivers/gpu/drm/i915/display/i9xx_plane.c  |  2 +-
>  drivers/gpu/drm/i915/display/icl_dsi.c |  4 +-
>  drivers/gpu/drm/i915/display/intel_atomic.c|  2 +-
>  drivers/gpu/drm/i915/display/intel_audio.c |  4 +-
>  drivers/gpu/drm/i915/display/intel_bios.c  |  9 +--
>  drivers/gpu/drm/i915/display/intel_bw.c|  8 +--
>  drivers/gpu/drm/i915/display/intel_cdclk.c | 42 +++---
>  drivers/gpu/drm/i915/display/intel_color.c |  6 +-
>  drivers/gpu/drm/i915/display/intel_crt.c   |  6 +-
>  drivers/gpu/drm/i915/display/intel_crtc.c  |  4 +-
>  drivers/gpu/drm/i915/display/intel_csr.c   |  4 +-
>  drivers/gpu/drm/i915/display/intel_ddi.c   | 53 +-
>  drivers/gpu/drm/i915/display/intel_ddi_buf_trans.c | 10 ++--
>  drivers/gpu/drm/i915/display/intel_display.c   | 64 
> +++---
>  .../gpu/drm/i915/display/intel_display_debugfs.c   |  2 +-
>  drivers/gpu/drm/i915/display/intel_display_power.c | 57 ++-
>  drivers/gpu/drm/i915/display/intel_dp.c| 10 ++--
>  .../gpu/drm/i915/display/intel_dp_link_training.c  |  2 +-
>  drivers/gpu/drm/i915/display/intel_dp_mst.c|  2 +-
>  drivers/gpu/drm/i915/display/intel_dpll.c  |  8 +--
>  drivers/gpu/drm/i915/display/intel_dpll_mgr.c  |  6 +-
>  drivers/gpu/drm/i915/display/intel_fb.c|  2 +-
>  drivers/gpu/drm/i915/display/intel_fbc.c   | 21 +++
>  drivers/gpu/drm/i915/display/intel_fifo_underrun.c |  4 +-
>  drivers/gpu/drm/i915/display/intel_gmbus.c | 12 ++--
>  drivers/gpu/drm/i915/display/intel_hdcp.c  |  9 +--
>  drivers/gpu/drm/i915/display/intel_hdmi.c  |  9 +--
>  drivers/gpu/drm/i915/display/intel_lvds.c  |  2 +-
>  drivers/gpu/drm/i915/display/intel_overlay.c   | 10 ++--
>  drivers/gpu/drm/i915/display/intel_panel.c | 10 ++--
>  drivers/gpu/drm/i915/display/intel_pipe_crc.c  |  4 +-
>  drivers/gpu/drm/i915/display/intel_pps.c   | 14 ++---
>  drivers/gpu/drm/i915/display/intel_psr.c   |  4 +-
>  drivers/gpu/drm/i

Re: [RFC v1 PATCH 3/3] driver: update all the code that use soc_device_match

2021-04-20 Thread Péter Ujfalusi
Hi Alice,

On 4/19/21 7:27 AM, Alice Guo (OSS) wrote:
> From: Alice Guo 
> 
> Update all the code that use soc_device_match because add support for
> soc_device_match returning -EPROBE_DEFER.
> 
> Signed-off-by: Alice Guo 
> ---
>  drivers/bus/ti-sysc.c |  2 +-
>  drivers/clk/renesas/r8a7795-cpg-mssr.c|  4 +++-
>  drivers/clk/renesas/rcar-gen2-cpg.c   |  2 +-
>  drivers/clk/renesas/rcar-gen3-cpg.c   |  2 +-
>  drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c   |  7 ++-
>  drivers/dma/ti/k3-psil.c  |  3 +++
>  drivers/dma/ti/k3-udma.c  |  2 +-
>  drivers/gpu/drm/bridge/nwl-dsi.c  |  2 +-
>  drivers/gpu/drm/meson/meson_drv.c |  4 +++-
>  drivers/gpu/drm/omapdrm/dss/dispc.c   |  2 +-
>  drivers/gpu/drm/omapdrm/dss/dpi.c |  4 +++-
>  drivers/gpu/drm/omapdrm/dss/dsi.c |  3 +++
>  drivers/gpu/drm/omapdrm/dss/dss.c |  3 +++
>  drivers/gpu/drm/omapdrm/dss/hdmi4_core.c  |  3 +++
>  drivers/gpu/drm/omapdrm/dss/venc.c|  4 +++-
>  drivers/gpu/drm/omapdrm/omap_drv.c|  3 +++
>  drivers/gpu/drm/rcar-du/rcar_du_crtc.c|  4 +++-
>  drivers/gpu/drm/rcar-du/rcar_lvds.c   |  2 +-
>  drivers/gpu/drm/tidss/tidss_dispc.c   |  4 +++-
>  drivers/iommu/ipmmu-vmsa.c|  7 +--
>  drivers/media/platform/rcar-vin/rcar-core.c   |  2 +-
>  drivers/media/platform/rcar-vin/rcar-csi2.c   |  2 +-
>  drivers/media/platform/vsp1/vsp1_uif.c|  4 +++-
>  drivers/mmc/host/renesas_sdhi_core.c  |  2 +-
>  drivers/mmc/host/renesas_sdhi_internal_dmac.c |  2 +-
>  drivers/mmc/host/sdhci-of-esdhc.c | 21 ++-
>  drivers/mmc/host/sdhci-omap.c |  2 +-
>  drivers/mmc/host/sdhci_am654.c|  2 +-
>  drivers/net/ethernet/renesas/ravb_main.c  |  4 +++-
>  drivers/net/ethernet/ti/am65-cpsw-nuss.c  |  2 +-
>  drivers/net/ethernet/ti/cpsw.c|  2 +-
>  drivers/net/ethernet/ti/cpsw_new.c|  2 +-
>  drivers/phy/ti/phy-omap-usb2.c|  4 +++-
>  drivers/pinctrl/renesas/core.c|  2 +-
>  drivers/pinctrl/renesas/pfc-r8a7790.c |  5 -
>  drivers/pinctrl/renesas/pfc-r8a7794.c |  5 -
>  drivers/soc/fsl/dpio/dpio-driver.c| 13 
>  drivers/soc/renesas/r8a774c0-sysc.c   |  5 -
>  drivers/soc/renesas/r8a7795-sysc.c|  2 +-
>  drivers/soc/renesas/r8a77990-sysc.c   |  5 -
>  drivers/soc/ti/k3-ringacc.c   |  2 +-
>  drivers/staging/mt7621-pci/pci-mt7621.c   |  2 +-
>  drivers/thermal/rcar_gen3_thermal.c   |  4 +++-
>  drivers/thermal/ti-soc-thermal/ti-bandgap.c   | 10 +++--
>  drivers/usb/gadget/udc/renesas_usb3.c |  2 +-
>  drivers/usb/host/ehci-platform.c  |  4 +++-
>  drivers/usb/host/xhci-rcar.c  |  2 +-
>  drivers/watchdog/renesas_wdt.c|  2 +-
>  48 files changed, 131 insertions(+), 52 deletions(-)
> 

...

> diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
> index 96ad21869ba7..50a4c8f0993d 100644
> --- a/drivers/dma/ti/k3-udma.c
> +++ b/drivers/dma/ti/k3-udma.c
> @@ -5188,7 +5188,7 @@ static int udma_probe(struct platform_device *pdev)
>   ud->match_data = match->data;
>  
>   soc = soc_device_match(k3_soc_devices);
> - if (!soc) {
> + if (!IS_ERR(soc) && !soc) {

this does not sound right...

if (!soc || IS_ERR(soc))
or
if (IS_ERR_OR_NULL(soc))
is even better.

>   dev_err(dev, "No compatible SoC found\n");
>   return -ENODEV;

There might be a clever macro for it, but:

return soc ? PTR_ERR(soc) : -ENODEV;

>   }
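
Putting the two suggestions together, the hunk would end up roughly as
(untested):

        soc = soc_device_match(k3_soc_devices);
        if (IS_ERR_OR_NULL(soc)) {
                dev_err(dev, "No compatible SoC found\n");
                return soc ? PTR_ERR(soc) : -ENODEV;
        }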

-- 
Péter
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Christian König

Am 19.04.21 um 17:48 schrieb Jason Ekstrand:

Not going to comment on everything on the first pass...

On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:

Hi,

This is our initial proposal for explicit fences everywhere and new memory 
management that doesn't use BO fences. It's a redesign of how Linux graphics 
drivers work, and it can coexist with what we have now.


1. Introduction
(skip this if you are already sold on explicit fences)

The current Linux graphics architecture was initially designed for GPUs with 
only one graphics queue where everything was executed in the submission order 
and per-BO fences were used for memory management and CPU-GPU synchronization, 
not GPU-GPU synchronization. Later, multiple queues were added on top, which 
required the introduction of implicit GPU-GPU synchronization between queues of 
different processes using per-BO fences. Recently, even parallel execution 
within one queue was enabled where a command buffer starts draws and compute 
shaders, but doesn't wait for them, enabling parallelism between back-to-back 
command buffers. Modesetting also uses per-BO fences for scheduling flips. Our 
GPU scheduler was created to enable all those use cases, and it's the only 
reason why the scheduler exists.

The GPU scheduler, implicit synchronization, BO-fence-based memory management, 
and the tracking of per-BO fences increase CPU overhead and latency, and reduce 
parallelism. There is a desire to replace all of them with something much 
simpler. Below is how we could do it.


2. Explicit synchronization for window systems and modesetting

The producer is an application and the consumer is a compositor or a 
modesetting driver.

2.1. The Present request

As part of the Present request, the producer will pass 2 fences (sync objects) 
to the consumer alongside the presented DMABUF BO:
- The submit fence: Initially unsignalled, it will be signalled when the 
producer has finished drawing into the presented buffer.
- The return fence: Initially unsignalled, it will be signalled when the 
consumer has finished using the presented buffer.

I'm not sure syncobj is what we want.  In the Intel world we're trying
to go even further to something we're calling "userspace fences" which
are a timeline implemented as a single 64-bit value in some
CPU-mappable BO.  The client writes a higher value into the BO to
signal the timeline.
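
(For illustration only, not any driver's actual interface: the concept is a
64-bit timeline value in CPU-visible shared memory, signalled by storing a
larger value and waited on by polling or some smarter wait.)

#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

struct ufence {
        _Atomic uint64_t value;         /* lives in a CPU-mappable BO */
};

static void ufence_signal(struct ufence *f, uint64_t point)
{
        atomic_store_explicit(&f->value, point, memory_order_release);
}

static void ufence_wait(struct ufence *f, uint64_t point)
{
        /* real implementations sleep (futex or hw wait) instead of spinning;
         * note nothing forces another client to ever reach "point", which is
         * exactly the robustness concern raised below */
        while (atomic_load_explicit(&f->value, memory_order_acquire) < point)
                sched_yield();
}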


Well that is exactly what our Windows guys have suggested as well, but 
it strongly looks like this isn't sufficient.


First of all you run into security problems when any application can 
just write any value to that memory location. Just imagine an 
application sets the counter to zero and X waits forever for some 
rendering to finish.


In addition, in such a model you can't determine which queue is guilty
in case of a hang, and you can't reset the synchronization primitives
in case of an error.


Apart from that this is rather inefficient, e.g. we don't have any way 
to prevent priority inversion when used as a synchronization mechanism 
between different GPU queues.


Christian.


   The kernel then provides some helpers for
waiting on them reliably and without spinning.  I don't expect
everyone to support these right away but, If we're going to re-plumb
userspace for explicit synchronization, I'd like to make sure we take
this into account so we only have to do it once.



Deadlock mitigation to recover from segfaults:
- The kernel knows which process is obliged to signal which fence. This 
information is part of the Present request and supplied by userspace.

This isn't clear to me.  Yes, if we're using anything dma-fence based
like syncobj, this is true.  But it doesn't seem totally true as a
general statement.



- If the producer crashes, the kernel signals the submit fence, so that the 
consumer can make forward progress.
- If the consumer crashes, the kernel signals the return fence, so that the 
producer can reclaim the buffer.
- A GPU hang signals all fences. Other deadlocks will be handled like GPU hangs.

What do you mean by "all"?  All fences that were supposed to be
signaled by the hung context?



Other window system requests can follow the same idea.

Merged fences where one fence object contains multiple fences will be 
supported. A merged fence is signalled only when its fences are signalled. The 
consumer will have the option to redefine the unsignalled return fence to a 
merged fence.

2.2. Modesetting

Since a modesetting driver can also be the consumer, the present ioctl will 
contain a submit fence and a return fence too. One small problem with this is 
that userspace can hang the modesetting driver, but in theory, any later 
present ioctl can override the previous one, so the unsignalled presentation is 
never used.


3. New memory management

The per-BO fences will be removed and the kernel will not know which buffers 
are busy. This will reduce CPU overhead and latency. The kernel will not need 
per-BO fences wit

Re: Split render/display SoCs, Mesa's renderonly, and Wayland dmabuf hints

2021-04-20 Thread Daniel Stone
Hi,

On Mon, 19 Apr 2021 at 13:06, Simon Ser  wrote:

> I'm working on a Wayland extension [1] that, among other things, allows
> compositors to advertise the preferred device to be used by Wayland
> clients.
>
> In general, compositors will send a render node. However, in the case
> of split render/display SoCs, things get a little bit complicated.
>
> [...]
>

Thanks for the write-up Simon!


> There are a few solutions:
>
> 1. Require compositors to discover the render device by trying to import
>a buffer. For each available render device, the compositor would
>allocate a buffer, export it as a DMA-BUF, import it to the
>display-only device, then try to drmModeAddFB.
>

I don't think this is actually tractable? Assuming that 'allocate a buffer'
means 'obtain a gbm_device for the render node directly and allocate a
gbm_bo from it', even with compatible formats and modifiers this will fail
for more restrictive display hardware. imx-drm and pl111 (combined with vc4
on some Raspberry Pis) will fail this, since they'll take different
allocation paths when they're bound through kmsro vs. directly, accounting
for things like contiguous allocation. So we'd get false negatives on at
least some platforms.
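
For concreteness, the probe being discussed is roughly the following (sketch
only; device paths are caller-provided, error handling and cleanup omitted),
and the kmsro path difference above is exactly why a failure here is not a
reliable answer:

#include <fcntl.h>
#include <stdint.h>
#include <gbm.h>
#include <drm_fourcc.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static int can_scan_out(const char *render_path, const char *kms_path)
{
        int render_fd = open(render_path, O_RDWR);
        int kms_fd = open(kms_path, O_RDWR);
        struct gbm_device *gbm = gbm_create_device(render_fd);
        struct gbm_bo *bo = gbm_bo_create(gbm, 64, 64, GBM_FORMAT_XRGB8888,
                                          GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
        uint32_t handles[4] = { 0 }, pitches[4] = { 0 }, offsets[4] = { 0 };
        uint32_t fb_id;
        int dmabuf_fd = gbm_bo_get_fd(bo);
        int ret = drmPrimeFDToHandle(kms_fd, dmabuf_fd, &handles[0]);

        pitches[0] = gbm_bo_get_stride(bo);
        if (ret == 0)
                ret = drmModeAddFB2(kms_fd, 64, 64, DRM_FORMAT_XRGB8888,
                                    handles, pitches, offsets, &fb_id, 0);
        /* cleanup omitted */
        return ret == 0;
}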


> 2. Allow compositors to query the render device magically opened by
>kmsro. This could be done either via EGL_EXT_device_drm, or via a
>new EGL extension.
>

This would be my strong preference, and I don't entirely understand
anholt's pushback here. The way I see it, GBM is about allocation for
scanout, and EGL is about rendering. If, on a split GPU/display system, we
create a gbm_device from a KMS display-only device node, then creating an
EGLDisplay from that magically binds us to a completely different DRM GPU
node, and anything using that EGLDisplay will use that GPU device to render.

Being able to discover the GPU device node through the device query is
really useful, because it tells us exactly what implicit magic EGL did
under the hood, and about the device that EGL will use. Being able to
discover the display node is much less useful; it does tell us how GBM will
allocate buffers, but the user already knows which device is in use because
they supplied it to GBM. I see the display node as a property of GBM, and
the GPU node as a property of EGL, even if EGL does do (*waves hands*)
stuff under the hood to ensure the two are compatible.
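
(For reference, that query is only a couple of calls — sketch below with the
extension-presence checks omitted:)

#include <stdio.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

static void print_render_device(EGLDisplay dpy)
{
        PFNEGLQUERYDISPLAYATTRIBEXTPROC query_display_attrib =
                (void *)eglGetProcAddress("eglQueryDisplayAttribEXT");
        PFNEGLQUERYDEVICESTRINGEXTPROC query_device_string =
                (void *)eglGetProcAddress("eglQueryDeviceStringEXT");
        EGLAttrib attr;

        if (query_display_attrib(dpy, EGL_DEVICE_EXT, &attr)) {
                EGLDeviceEXT dev = (EGLDeviceEXT)attr;

                /* EGL_EXT_device_drm: the DRM node EGL is actually using;
                 * what this should report on split SoCs is the question
                 * being discussed here */
                printf("%s\n", query_device_string(dev, EGL_DRM_DEVICE_FILE_EXT));
        }
}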

If we had EGL_EXT_explicit_device, things get even more weird, I think;
would the device query on an EGLDisplay created with a combination of a
gbm_device native display handle and an explicit EGLDevice handle return
the scanout device from GBM or the GPU device from EGL? On my reading, I'd
expect it to be the latter; if the queries returned very different things
based on whether GPU device selection was implicit (returning the KMS node)
or explicit (GPU node), that would definitely violate the principle of
least surprise.


> 3. Allow compositors to query the kernel drivers to know which devices
>are compatible with each other. Some uAPI to query a compatible
>display device from a render-only device, or vice-versa, has been
>suggested in the past.
>

What does 'compatible' mean? Would an Intel iGPU and an AMD dGPU be
compatible with each other? Would a Mali GPU bound to system memory through
AMBA be as compatible with the display controller as it would with an AMD
GPU on PCIE? I think a query which only exposed whether or not devices
could share dmabufs with each other is far too generic to be helpful for
the actual usecase we have, as well as not being useful enough for other
usecases ('well you _can_ use dmabufs from your AMD GPU on your Mali GPU,
but only if they were allocated in the right domain').


> (1) has a number of limitations and gotchas. It requires allocating
> real buffers, this has a rather big cost for something done at
> compositor initialization time. It requires to select a buffer format
> and modifier compatible with both devices, so it can't be isolated in
> a simple function (and e.g. shared between all compositors in libdrm).
>

We're already going to have to do throwaway allocations to make Intel's
tiled modes work; I'd rather not extend this out to doing throwaway
allocations across device combinations as well as modifier lists.


> Some drivers will allow to drmModeAddFB buffers that can't be scanned
> out, and will only reject the buffer at atomic commit time.
>

This is 100% a KMS driver bug and should be fixed there. It's not
catastrophic, since commits can fail for any reason or none at all and
compositors are expected to handle this, but they should absolutely be
rejecting buffers which can never be scanned out at all at AddFB time.


> (2) wouldn't work with non-EGL APIs such as Vulkan. Eric Anholt seemed
> pretty opposed to this idea, but I didn't fully understood why.
>

Well, Vulkan doesn't have GBM in the same way, right? In the Vulkan case,
we already know exactly what the GPU is, because it's th

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Vetter
On Tue, Apr 20, 2021 at 12:15 PM Christian König
 wrote:
>
> Am 19.04.21 um 17:48 schrieb Jason Ekstrand:
> > Not going to comment on everything on the first pass...
> >
> > On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:
> >> Hi,
> >>
> >> This is our initial proposal for explicit fences everywhere and new memory 
> >> management that doesn't use BO fences. It's a redesign of how Linux 
> >> graphics drivers work, and it can coexist with what we have now.
> >>
> >>
> >> 1. Introduction
> >> (skip this if you are already sold on explicit fences)
> >>
> >> The current Linux graphics architecture was initially designed for GPUs 
> >> with only one graphics queue where everything was executed in the 
> >> submission order and per-BO fences were used for memory management and 
> >> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple 
> >> queues were added on top, which required the introduction of implicit 
> >> GPU-GPU synchronization between queues of different processes using per-BO 
> >> fences. Recently, even parallel execution within one queue was enabled 
> >> where a command buffer starts draws and compute shaders, but doesn't wait 
> >> for them, enabling parallelism between back-to-back command buffers. 
> >> Modesetting also uses per-BO fences for scheduling flips. Our GPU 
> >> scheduler was created to enable all those use cases, and it's the only 
> >> reason why the scheduler exists.
> >>
> >> The GPU scheduler, implicit synchronization, BO-fence-based memory 
> >> management, and the tracking of per-BO fences increase CPU overhead and 
> >> latency, and reduce parallelism. There is a desire to replace all of them 
> >> with something much simpler. Below is how we could do it.
> >>
> >>
> >> 2. Explicit synchronization for window systems and modesetting
> >>
> >> The producer is an application and the consumer is a compositor or a 
> >> modesetting driver.
> >>
> >> 2.1. The Present request
> >>
> >> As part of the Present request, the producer will pass 2 fences (sync 
> >> objects) to the consumer alongside the presented DMABUF BO:
> >> - The submit fence: Initially unsignalled, it will be signalled when the 
> >> producer has finished drawing into the presented buffer.
> >> - The return fence: Initially unsignalled, it will be signalled when the 
> >> consumer has finished using the presented buffer.
> > I'm not sure syncobj is what we want.  In the Intel world we're trying
> > to go even further to something we're calling "userspace fences" which
> > are a timeline implemented as a single 64-bit value in some
> > CPU-mappable BO.  The client writes a higher value into the BO to
> > signal the timeline.
>
> Well that is exactly what our Windows guys have suggested as well, but
> it strongly looks like that this isn't sufficient.
>
> First of all you run into security problems when any application can
> just write any value to that memory location. Just imagine an
> application sets the counter to zero and X waits forever for some
> rendering to finish.

The thing is, with userspace fences the security boundary issue moves
into userspace entirely. And it really doesn't matter whether
the event you're waiting on doesn't complete because the other app
crashed or was stupid or intentionally gave you a wrong fence point:
You have to somehow handle that, e.g. perhaps with conditional
rendering and just using the old frame in compositing if the new one
doesn't show up in time. Or something like that. So trying to get the
kernel involved but also not so much involved sounds like a bad design
to me.

> Additional to that in such a model you can't determine who is the guilty
> queue in case of a hang and can't reset the synchronization primitives
> in case of an error.
>
> Apart from that this is rather inefficient, e.g. we don't have any way
> to prevent priority inversion when used as a synchronization mechanism
> between different GPU queues.

Yeah but you can't have it both ways. Either all the scheduling in the
kernel and fence handling is a problem, or you actually want to
schedule in the kernel. hw seems to definitely move towards the more
stupid spinlock-in-hw model (and direct submit from userspace and all
that), priority inversions be damned. I'm really not sure we should
fight that - if it's really that inefficient then maybe hw will add
support for waiting sync constructs in hardware, or at least be
smarter about scheduling other stuff. E.g. on intel hw both the kernel
scheduler and fw scheduler knows when you're spinning on a hw fence
(whether userspace or kernel doesn't matter) and plugs in something
else. Add in a bit of hw support to watch cachelines, and you have
something which can handle both directions efficiently.

Imo given where hw is going, we shouldn't try to be too clever here.
The only thing we do need to provision is being able to do cpu side
waits without spinning. And that should probably be done in a fairly
gpu specific way still.
-Daniel

> Chri

[PATCH 1/2] drm/i915: Remove asynchronous vma binding

2021-04-20 Thread Maarten Lankhorst
Commit e3793468b466 ("drm/i915: Use the async worker to avoid reclaim
tainting the ggtt->mutex") moves the vma binding to dma_fence_work,
but dma_fence_work has signalling fence semantics.

On braswell, we can call stop_machine inside fence_work, causing a splat
because memory allocation inside stop_machine is allowed.

This patch does not fix that lockdep splat yet, but will allow the next
patch to remove it.

Signed-off-by: Maarten Lankhorst 
Acked-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c |   3 -
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c   |   1 -
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c   |   1 -
 drivers/gpu/drm/i915/gt/intel_ggtt.c   |   4 -
 drivers/gpu/drm/i915/gt/intel_gtt.h|   2 -
 drivers/gpu/drm/i915/i915_gem.c|   6 -
 drivers/gpu/drm/i915/i915_vma.c| 174 +++--
 drivers/gpu/drm/i915/i915_vma.h|   5 +-
 drivers/gpu/drm/i915/i915_vma_types.h  |   3 -
 9 files changed, 21 insertions(+), 178 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index b0597de206de..ec04df0a3b89 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -505,9 +505,6 @@ static void dbg_poison(struct i915_ggtt *ggtt,
if (!drm_mm_node_allocated(&ggtt->error_capture))
return;
 
-   if (ggtt->vm.bind_async_flags & I915_VMA_GLOBAL_BIND)
-   return; /* beware stop_machine() inversion */
-
GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
 
mutex_lock(&ggtt->error_mutex);
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index e08dff376339..0c5a9a767849 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -436,7 +436,6 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
ppgtt->base.vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen6_pte_t));
ppgtt->base.vm.top = 1;
 
-   ppgtt->base.vm.bind_async_flags = I915_VMA_LOCAL_BIND;
ppgtt->base.vm.allocate_va_range = gen6_alloc_va_range;
ppgtt->base.vm.clear_range = gen6_ppgtt_clear_range;
ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 176c19633412..92f8a23e66cc 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -736,7 +736,6 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt)
goto err_free_pd;
}
 
-   ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
ppgtt->vm.insert_entries = gen8_ppgtt_insert;
ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
ppgtt->vm.clear_range = gen8_ppgtt_clear;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 670c1271e7d5..e1ec6edae1fb 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -127,7 +127,6 @@ void i915_ggtt_suspend(struct i915_ggtt *ggtt)
 
list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link) {
GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
-   i915_vma_wait_for_bind(vma);
 
if (i915_vma_is_pinned(vma))
continue;
@@ -671,7 +670,6 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
ppgtt->vm.allocate_va_range(&ppgtt->vm, &stash, 0, ggtt->vm.total);
 
ggtt->alias = ppgtt;
-   ggtt->vm.bind_async_flags |= ppgtt->vm.bind_async_flags;
 
GEM_BUG_ON(ggtt->vm.vma_ops.bind_vma != ggtt_bind_vma);
ggtt->vm.vma_ops.bind_vma = aliasing_gtt_bind_vma;
@@ -911,8 +909,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
IS_CHERRYVIEW(i915) /* fails with concurrent use/update */) {
ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL;
ggtt->vm.insert_page= bxt_vtd_ggtt_insert_page__BKL;
-   ggtt->vm.bind_async_flags =
-   I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND;
}
 
ggtt->invalidate = gen8_ggtt_invalidate;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
b/drivers/gpu/drm/i915/gt/intel_gtt.h
index e67e34e17913..d9d2ca8b4b61 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -230,8 +230,6 @@ struct i915_address_space {
u64 total;  /* size addr space maps (ex. 2GB for ggtt) */
u64 reserved;   /* size addr space reserved */
 
-   unsigned int bind_async_flags;
-
/*
 * Each active user context has its own address space (in full-ppgtt).
 * Since the vm may be shared between multiple contexts, we count how
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b23f58e94cfb..4639c47c038b 100644
--- a/drivers/gpu/d

[PATCH 2/2] drm/i915: Use trylock in shrinker for ggtt on bsw vt-d and bxt, v2.

2021-04-20 Thread Maarten Lankhorst
The stop_machine() lock may allocate memory, but is called inside
vm->mutex, which is taken in the shrinker. This will cause a lockdep
splat, as can be seen below:

<4>[  462.585762] ==
<4>[  462.585768] WARNING: possible circular locking dependency detected
<4>[  462.585773] 5.12.0-rc5-CI-Trybot_7644+ #1 Tainted: G U
<4>[  462.585779] --
<4>[  462.585783] i915_selftest/5540 is trying to acquire lock:
<4>[  462.585788] 826440b0 (cpu_hotplug_lock){}-{0:0}, at: 
stop_machine+0x12/0x30
<4>[  462.585814]
  but task is already holding lock:
<4>[  462.585818] 888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: 
i915_vma_pin_ww+0x38e/0xb40 [i915]
<4>[  462.586301]
  which lock already depends on the new lock.

<4>[  462.586305]
  the existing dependency chain (in reverse order) is:
<4>[  462.586309]
  -> #2 (&vm->mutex/1){+.+.}-{3:3}:
<4>[  462.586323]i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915]
<4>[  462.586719]i915_address_space_init+0x12d/0x130 [i915]
<4>[  462.587092]ppgtt_init+0x4e/0x80 [i915]
<4>[  462.587467]gen8_ppgtt_create+0x3e/0x5c0 [i915]
<4>[  462.587828]i915_ppgtt_create+0x28/0xf0 [i915]
<4>[  462.588203]intel_gt_init+0x123/0x370 [i915]
<4>[  462.588572]i915_gem_init+0x129/0x1f0 [i915]
<4>[  462.588971]i915_driver_probe+0x753/0xd80 [i915]
<4>[  462.589320]i915_pci_probe+0x43/0x1d0 [i915]
<4>[  462.589671]pci_device_probe+0x9e/0x110
<4>[  462.589680]really_probe+0xea/0x410
<4>[  462.589690]driver_probe_device+0xd9/0x140
<4>[  462.589697]device_driver_attach+0x4a/0x50
<4>[  462.589704]__driver_attach+0x83/0x140
<4>[  462.589711]bus_for_each_dev+0x75/0xc0
<4>[  462.589718]bus_add_driver+0x14b/0x1f0
<4>[  462.589724]driver_register+0x66/0xb0
<4>[  462.589731]i915_init+0x70/0x87 [i915]
<4>[  462.590053]do_one_initcall+0x56/0x2e0
<4>[  462.590061]do_init_module+0x55/0x200
<4>[  462.590068]load_module+0x2703/0x2990
<4>[  462.590074]__do_sys_finit_module+0xad/0x110
<4>[  462.590080]do_syscall_64+0x33/0x80
<4>[  462.590089]entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[  462.590096]
  -> #1 (fs_reclaim){+.+.}-{0:0}:
<4>[  462.590109]fs_reclaim_acquire+0x9f/0xd0
<4>[  462.590118]kmem_cache_alloc_trace+0x3d/0x430
<4>[  462.590126]intel_cpuc_prepare+0x3b/0x1b0
<4>[  462.590133]cpuhp_invoke_callback+0x9e/0x890
<4>[  462.590141]_cpu_up+0xa4/0x130
<4>[  462.590147]cpu_up+0x82/0x90
<4>[  462.590153]bringup_nonboot_cpus+0x4a/0x60
<4>[  462.590159]smp_init+0x21/0x5c
<4>[  462.590167]kernel_init_freeable+0x8a/0x1b7
<4>[  462.590175]kernel_init+0x5/0xff
<4>[  462.590181]ret_from_fork+0x22/0x30
<4>[  462.590187]
  -> #0 (cpu_hotplug_lock){}-{0:0}:
<4>[  462.590199]__lock_acquire+0x1520/0x2590
<4>[  462.590207]lock_acquire+0xd1/0x3d0
<4>[  462.590213]cpus_read_lock+0x39/0xc0
<4>[  462.590219]stop_machine+0x12/0x30
<4>[  462.590226]bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915]
<4>[  462.590601]ggtt_bind_vma+0x5d/0x80 [i915]
<4>[  462.590970]i915_vma_bind+0xdc/0x1c0 [i915]
<4>[  462.591374]i915_vma_pin_ww+0x435/0xb40 [i915]
<4>[  462.591779]make_obj_busy+0xcb/0x330 [i915]
<4>[  462.592170]igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915]
<4>[  462.592562]__i915_subtests.cold.7+0x42/0x92 [i915]
<4>[  462.592995]__run_selftests.part.3+0x10d/0x172 [i915]
<4>[  462.593428]i915_live_selftests.cold.5+0x1f/0x47 [i915]
<4>[  462.593860]i915_pci_probe+0x93/0x1d0 [i915]
<4>[  462.594210]pci_device_probe+0x9e/0x110
<4>[  462.594217]really_probe+0xea/0x410
<4>[  462.594226]driver_probe_device+0xd9/0x140
<4>[  462.594233]device_driver_attach+0x4a/0x50
<4>[  462.594240]__driver_attach+0x83/0x140
<4>[  462.594247]bus_for_each_dev+0x75/0xc0
<4>[  462.594254]bus_add_driver+0x14b/0x1f0
<4>[  462.594260]driver_register+0x66/0xb0
<4>[  462.594267]i915_init+0x70/0x87 [i915]
<4>[  462.594586]do_one_initcall+0x56/0x2e0
<4>[  462.594592]do_init_module+0x55/0x200
<4>[  462.594599]load_module+0x2703/0x2990
<4>[  462.594605]__do_sys_finit_module+0xad/0x110
<4>[  462.594612]do_syscall_64+0x33/0x80
<4>[  462.594618]entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[  462.594625]
  other info that might help us debug this:

<4>[  462.594629] Chain exists of:
cpu_hotplug_lock --> fs_reclaim --> &vm->mutex/1

<4>[  462.594645]  Possible unsafe locking scenario:

<4>[  462.594648]CPU0  

[PATCH 1/1] drm/amdgpu: make sure we unpin the UVD BO

2021-04-20 Thread Nirmoy Das
Releasing pinned BOs is illegal now.
UVD 6 was missing from:
commit 2f40801dc553 ("drm/amdgpu: make sure we unpin the UVD BO")

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index 760859880c1e..4eebf973a065 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -357,6 +357,7 @@ static int uvd_v6_0_enc_ring_test_ib(struct amdgpu_ring 
*ring, long timeout)
 
 error:
dma_fence_put(fence);
+   amdgpu_bo_unpin(bo);
amdgpu_bo_unreserve(bo);
amdgpu_bo_unref(&bo);
return r;
-- 
2.30.2

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Marek Olšák
Daniel, are you suggesting that we should skip any deadlock prevention in
the kernel, and just let userspace wait for and signal any fence it has
access to?

Do you have any concern with the deprecation/removal of BO fences in the
kernel assuming userspace is only using explicit fences? Any concern with
the submit and return fences for modesetting and other producer<->consumer
scenarios?

Thanks,
Marek

On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter  wrote:

> On Tue, Apr 20, 2021 at 12:15 PM Christian König
>  wrote:
> >
> > Am 19.04.21 um 17:48 schrieb Jason Ekstrand:
> > > Not going to comment on everything on the first pass...
> > >
> > > On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:
> > >> Hi,
> > >>
> > >> This is our initial proposal for explicit fences everywhere and new
> memory management that doesn't use BO fences. It's a redesign of how Linux
> graphics drivers work, and it can coexist with what we have now.
> > >>
> > >>
> > >> 1. Introduction
> > >> (skip this if you are already sold on explicit fences)
> > >>
> > >> The current Linux graphics architecture was initially designed for
> GPUs with only one graphics queue where everything was executed in the
> submission order and per-BO fences were used for memory management and
> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
> queues were added on top, which required the introduction of implicit
> GPU-GPU synchronization between queues of different processes using per-BO
> fences. Recently, even parallel execution within one queue was enabled
> where a command buffer starts draws and compute shaders, but doesn't wait
> for them, enabling parallelism between back-to-back command buffers.
> Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler
> was created to enable all those use cases, and it's the only reason why the
> scheduler exists.
> > >>
> > >> The GPU scheduler, implicit synchronization, BO-fence-based memory
> management, and the tracking of per-BO fences increase CPU overhead and
> latency, and reduce parallelism. There is a desire to replace all of them
> with something much simpler. Below is how we could do it.
> > >>
> > >>
> > >> 2. Explicit synchronization for window systems and modesetting
> > >>
> > >> The producer is an application and the consumer is a compositor or a
> modesetting driver.
> > >>
> > >> 2.1. The Present request
> > >>
> > >> As part of the Present request, the producer will pass 2 fences (sync
> objects) to the consumer alongside the presented DMABUF BO:
> > >> - The submit fence: Initially unsignalled, it will be signalled when
> the producer has finished drawing into the presented buffer.
> > >> - The return fence: Initially unsignalled, it will be signalled when
> the consumer has finished using the presented buffer.
> > > I'm not sure syncobj is what we want.  In the Intel world we're trying
> > > to go even further to something we're calling "userspace fences" which
> > > are a timeline implemented as a single 64-bit value in some
> > > CPU-mappable BO.  The client writes a higher value into the BO to
> > > signal the timeline.
> >
> > Well that is exactly what our Windows guys have suggested as well, but
> > it strongly looks like that this isn't sufficient.
> >
> > First of all you run into security problems when any application can
> > just write any value to that memory location. Just imagine an
> > application sets the counter to zero and X waits forever for some
> > rendering to finish.
>
> The thing is, with userspace fences the security boundary issue moves
> into userspace entirely. And it really doesn't matter whether
> the event you're waiting on doesn't complete because the other app
> crashed or was stupid or intentionally gave you a wrong fence point:
> You have to somehow handle that, e.g. perhaps with conditional
> rendering and just using the old frame in compositing if the new one
> doesn't show up in time. Or something like that. So trying to get the
> kernel involved but also not so much involved sounds like a bad design
> to me.
>
> > Additional to that in such a model you can't determine who is the guilty
> > queue in case of a hang and can't reset the synchronization primitives
> > in case of an error.
> >
> > Apart from that this is rather inefficient, e.g. we don't have any way
> > to prevent priority inversion when used as a synchronization mechanism
> > between different GPU queues.
>
> Yeah but you can't have it both ways. Either all the scheduling in the
> kernel and fence handling is a problem, or you actually want to
> schedule in the kernel. hw seems to definitely move towards the more
> stupid spinlock-in-hw model (and direct submit from userspace and all
> that), priority inversions be damned. I'm really not sure we should
> fight that - if it's really that inefficient then maybe hw will add
> support for waiting sync constructs in hardware, or at least be
> smarter about scheduling other stuff. E.g. on intel hw both

Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Michal Hocko
On Mon 19-04-21 18:37:13, Christian König wrote:
> Am 19.04.21 um 18:11 schrieb Michal Hocko:
[...]
> > The question is not whether it is NUMA aware but whether it is useful to
> > know per-numa data for the purpose the counter is supposed to serve.
> 
> No, not at all. The pages of a single DMA-buf could even be from different
> NUMA nodes if the exporting driver decides that this is somehow useful.

As the use of the counter hasn't been explained yet I can only
speculate. One thing that I can imagine to be useful is to fill gaps in
our accounting. It is quite often that the memory accounted in
/proc/meminfo (or oom report) doesn't add up to the overall memory
usage. In some workloads the gap can be huge! In many cases there
are other means to find out additional memory via subsystem specific
interfaces (e.g. networking buffers). I do assume that dma-buf is just
one of those and the counter can fill the said gap at least partially
for some workloads. That is definitely useful.

What I am trying to bring up with NUMA side is that the same problem can
happen on per-node basis. Let's say that some user consumes unexpectedly
large amount of dma-buf on a certain node. This can lead to observable
performance impact on anybody on allocating from that node and even
worse cause an OOM for node bound consumers. How do I find out that it
was dma-buf that has caused the problem?

See where I am heading?
-- 
Michal Hocko
SUSE Labs
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 0/9] drm: Support simple-framebuffer devices and firmware fbs

2021-04-20 Thread Daniel Vetter
On Tue, Apr 20, 2021 at 11:16:09AM +0200, Geert Uytterhoeven wrote:
> Hi Daniel,
> 
> On Tue, Apr 20, 2021 at 10:46 AM Daniel Vetter  wrote:
> > On Mon, Apr 19, 2021 at 10:00:56AM +0200, Geert Uytterhoeven wrote:
> > > On Fri, Apr 16, 2021 at 11:00 AM Thomas Zimmermann  
> > > wrote:
> > > > This patchset adds support for simple-framebuffer platform devices and
> > > > a handover mechanism for native drivers to take-over control of the
> > > > hardware.
> > > >
> > > > The new driver, called simpledrm, binds to a simple-frambuffer platform
> > > > device. The kernel's boot code creates such devices for 
> > > > firmware-provided
> > > > framebuffers, such as EFI-GOP or VESA. Typically the BIOS, UEFI or boot
> > > > loader sets up the framebuffers. Description via device tree is also an
> > > > option.
> > >
> > > I guess this can be used as a replacement for offb, too...
> > >
> > > > Patches 4 to 8 add the simpledrm driver. It's build on simple DRM 
> > > > helpers
> > > > and SHMEM. It supports 16-bit, 24-bit and 32-bit RGB framebuffers. 
> > > > During
> > >
> > >  if support for 8-bit frame buffers would be added?
> >
> > Is that 8-bit greyscale or 8-bit indexed with 256 entry palette? Former
> 
> 8-bit indexed with 256 entry palette.
> 
> > shouldn't be a big thing, but the latter is only really supported by the
> > overall drm ecosystem in theory. Most userspace assumes that xrgb8888
> > works, and we keep that illusion up by emulating it in kernel for hw which
> > just doesn't support it. But reformatting xrgb8888 to c8 is tricky at
> > best. The uapis are all there for setting the palette, and C8 is a defined
> > format even with atomic kms interface, but really there's not much
> > userspace for it. In other words, it would work as well as current offb
> > would, but that's at least that.
> 
> Oh, that's good to know!
> Does this mean fbdev is not deprecated for anything <= C8? ;-)

Nope. It just means you won't be able to use drm-only userspace with it
most likely, without also investing a ton of effort into porting those
over.

> A while ago, I was looking into converting an fbdev driver to drm, and
> one of the things I ran into is lack of C4, C2, C1, Y8, Y4, Y2, and
> monochrome support.  On top of that, lots of internal code seems to
> assume pixels are never smaller than a byte (thus ignoring
> char_per_block[]/block_w).  The lack of support for planar modes isn't
> that bad, combined with the need for copying, as c2p conversion can be
> done while copying, thus even making life easier for userspace
> applications that can just always work on chunky data.
> Then real work kicked in, before I got anything working...

We support drm_fourcc, so adding more pixel formats is not a problem at all.
Anything indexed/paletted will simply not work great with unchanged drm
userspace, because you can't really convert it over from the common
denominator of xrgb8888. But if it's just about adding support, adding more
fourcc codes isn't a big deal. The fbdev layer hasn't been taught about
fourcc codes yet, but that's also just for lack of need by anyone.

Also we support arbitrary uneven pixel packing too, with some generic
support for anything that's at least somewhat regular. That's been the
case for a while now. It was added for fancy tiling and compression
formats, but works equally well for anything else that's aligned different
than what can be described with simplistic bytes-per-pixel only.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
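
As a small illustration of the "adding more fourcc codes isn't a big deal"
point above: a KMS driver can simply list C8 next to XRGB8888 when
registering a plane. The function and array names here are invented for the
example, and the hard part (palette updates and userspace actually producing
C8) is exactly what this does not solve:

#include <drm/drm_fourcc.h>
#include <drm/drm_plane.h>

static const uint32_t example_primary_formats[] = {
        DRM_FORMAT_XRGB8888,    /* what generic userspace expects */
        DRM_FORMAT_C8,          /* 8-bit indexed, needs a palette */
};

static int example_init_primary_plane(struct drm_device *dev,
                                      struct drm_plane *plane,
                                      const struct drm_plane_funcs *funcs)
{
        return drm_universal_plane_init(dev, plane, 0x1 /* possible_crtcs */,
                                        funcs, example_primary_formats,
                                        ARRAY_SIZE(example_primary_formats),
                                        NULL, DRM_PLANE_TYPE_PRIMARY, NULL);
}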


Re: [PATCH v5] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Daniel Vetter
On Tue, Apr 20, 2021 at 09:26:00AM +, peter.enderb...@sony.com wrote:
> On 4/20/21 10:58 AM, Daniel Vetter wrote:
> > On Sat, Apr 17, 2021 at 06:38:35PM +0200, Peter Enderborg wrote:
> >> This adds a total used dma-buf memory. Details
> >> can be found in debugfs, however it is not for everyone
> >> and not always available. dma-buf are indirect allocated by
> >> userspace. So with this value we can monitor and detect
> >> userspace applications that have problems.
> >>
> >> Signed-off-by: Peter Enderborg 
> > So there have been tons of discussions around how to track dma-buf and
> > why, and I really need to understand the use-case here first I think. proc
> > uapi is as much forever as anything else, and depending what you're doing
> > this doesn't make any sense at all:
> >
> > - on most linux systems dma-buf are only instantiated for shared buffer.
> >   So there this gives you a fairly meaningless number and not anything
> >   reflecting gpu memory usage at all.
> >
> > - on Android all buffers are allocated through dma-buf afaik. But there
> >   we've recently had some discussions about how exactly we should track
> >   all this, and the conclusion was that most of this should be solved by
> >   cgroups long term. So if this is for Android, then I don't think adding
> >   random quick stop-gaps to upstream is a good idea (because it's a pretty
> >   long list of patches that have come up on this).
> >
> > So what is this for?
> 
> For the overview. dma-buf today only has debugfs for info. Debugfs
> is not allowed by Google for use in Android. So this aggregates the
> information so we can get information on what is going on on the system.
> 
> And the LKML standard response to that is "SHOW ME THE CODE".

Yes. Except this extends to how exactly this is supposed to be used in
userspace and acted upon.

> When the top memcg has aggregated information on dma-buf it is maybe
> a better source for meminfo. But then it also implies that dma-buf requires
> memcg.
> 
> And I don't see any problem with replacing this with something better when
> it is ready.

The thing is, this is uapi. Once it's merged we cannot, ever, replace it.
It must be kept around forever, or a very close approximation thereof. So
merging this with the justification that we can fix it later on or replace
isn't going to happen.
-Daniel

> 
> > -Daniel
> >
> >> ---
> >>  drivers/dma-buf/dma-buf.c | 12 
> >>  fs/proc/meminfo.c |  5 -
> >>  include/linux/dma-buf.h   |  1 +
> >>  3 files changed, 17 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >> index f264b70c383e..4dc37cd4293b 100644
> >> --- a/drivers/dma-buf/dma-buf.c
> >> +++ b/drivers/dma-buf/dma-buf.c
> >> @@ -37,6 +37,7 @@ struct dma_buf_list {
> >>  };
> >>  
> >>  static struct dma_buf_list db_list;
> >> +static atomic_long_t dma_buf_global_allocated;
> >>  
> >>  static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int 
> >> buflen)
> >>  {
> >> @@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
> >>if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
> >>dma_resv_fini(dmabuf->resv);
> >>  
> >> +  atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);
> >>module_put(dmabuf->owner);
> >>kfree(dmabuf->name);
> >>kfree(dmabuf);
> >> @@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
> >> dma_buf_export_info *exp_info)
> >>mutex_lock(&db_list.lock);
> >>list_add(&dmabuf->list_node, &db_list.head);
> >>mutex_unlock(&db_list.lock);
> >> +  atomic_long_add(dmabuf->size, &dma_buf_global_allocated);
> >>  
> >>return dmabuf;
> >>  
> >> @@ -1346,6 +1349,15 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
> >> dma_buf_map *map)
> >>  }
> >>  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
> >>  
> >> +/**
> >> + * dma_buf_allocated_pages - Return the used nr of pages
> >> + * allocated for dma-buf
> >> + */
> >> +long dma_buf_allocated_pages(void)
> >> +{
> >> +  return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
> >> +}
> >> +
> >>  #ifdef CONFIG_DEBUG_FS
> >>  static int dma_buf_debug_show(struct seq_file *s, void *unused)
> >>  {
> >> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> >> index 6fa761c9cc78..ccc7c40c8db7 100644
> >> --- a/fs/proc/meminfo.c
> >> +++ b/fs/proc/meminfo.c
> >> @@ -16,6 +16,7 @@
> >>  #ifdef CONFIG_CMA
> >>  #include 
> >>  #endif
> >> +#include 
> >>  #include 
> >>  #include "internal.h"
> >>  
> >> @@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, void 
> >> *v)
> >>show_val_kb(m, "CmaFree:",
> >>global_zone_page_state(NR_FREE_CMA_PAGES));
> >>  #endif
> >> -
> >> +#ifdef CONFIG_DMA_SHARED_BUFFER
> >> +  show_val_kb(m, "DmaBufTotal:", dma_buf_allocated_pages());
> >> +#endif
> >>hugetlb_report_meminfo(m);
> >>  
> >>arch_report_meminfo(m);
> >> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> >> index ef

Re: Split render/display SoCs, Mesa's renderonly, and Wayland dmabuf hints

2021-04-20 Thread Daniel Vetter
Just 2 comments on the kernel aspects here.

On Tue, Apr 20, 2021 at 12:18 PM Daniel Stone  wrote:
>
> Hi,
>
> On Mon, 19 Apr 2021 at 13:06, Simon Ser  wrote:
>>
>> I'm working on a Wayland extension [1] that, among other things, allows
>> compositors to advertise the preferred device to be used by Wayland
>> clients.
>>
>> In general, compositors will send a render node. However, in the case
>> of split render/display SoCs, things get a little bit complicated.
>>
>> [...]
>
>
> Thanks for the write-up Simon!
>
>>
>> There are a few solutions:
>>
>> 1. Require compositors to discover the render device by trying to import
>>a buffer. For each available render device, the compositor would
>>allocate a buffer, export it as a DMA-BUF, import it to the
>>display-only device, then try to drmModeAddFB.
>
>
> I don't think this is actually tractable? Assuming that 'allocate a buffer' 
> means 'obtain a gbm_device for the render node directly and allocate a gbm_bo 
> from it', even with compatible formats and modifiers this will fail for more 
> restrictive display hardware. imx-drm and pl111 (combined with vc4 on some 
> Raspberry Pis) will fail this, since they'll take different allocation paths 
> when they're bound through kmsro vs. directly, accounting for things like 
> contiguous allocation. So we'd get false negatives on at least some platforms.
>
>>
>> 2. Allow compositors to query the render device magically opened by
>>kmsro. This could be done either via EGL_EXT_device_drm, or via a
>>new EGL extension.
>
>
> This would be my strong preference, and I don't entirely understand anholt's 
> pushback here. The way I see it, GBM is about allocation for scanout, and EGL 
> is about rendering. If, on a split GPU/display system, we create a gbm_device 
> from a KMS display-only device node, then creating an EGLDisplay from that 
> magically binds us to a completely different DRM GPU node, and anything using 
> that EGLDisplay will use that GPU device to render.
>
> Being able to discover the GPU device node through the device query is really 
> useful, because it tells us exactly what implicit magic EGL did under the 
> hood, and about the device that EGL will use. Being able to discover the 
> display node is much less useful; it does tell us how GBM will allocate 
> buffers, but the user already knows which device is in use because they 
> supplied it to GBM. I see the display node as a property of GBM, and the GPU 
> node as a property of EGL, even if EGL does do (*waves hands*) stuff under 
> the hood to ensure the two are compatible.
>
> If we had EGL_EXT_explicit_device, things get even more weird, I think; would 
> the device query on an EGLDisplay created with a combination of a gbm_device 
> native display handle and an explicit EGLDevice handle return the scanout 
> device from GBM or the GPU device from EGL? On my reading, I'd expect it to 
> be the latter; if the queries returned very different things based on whether 
> GPU device selection was implicit (returning the KMS node) or explicit (GPU 
> node), that would definitely violate the principle of least surprise.
>
>>
>> 3. Allow compositors to query the kernel drivers to know which devices
>>are compatible with each other. Some uAPI to query a compatible
>>display device from a render-only device, or vice-versa, has been
>>suggested in the past.
>
>
> What does 'compatible' mean? Would an Intel iGPU and and AMD dGPU be 
> compatible with each other? Would a Mali GPU bound to system memory through 
> AMBA be as compatible with the display controller as it would with an AMD GPU 
> on PCIE? I think a query which only exposed whether or not devices could 
> share dmabufs with each other is far too generic to be helpful for the actual 
> usecase we have, as well as not being useful enough for other usecases ('well 
> you _can_ use dmabufs from your AMD GPU on your Mali GPU, but only if they 
> were allocated in the right domain').
>
>>
>> (1) has a number of limitations and gotchas. It requires allocating
>> real buffers, this has a rather big cost for something done at
>> compositor initialization time. It requires to select a buffer format
>> and modifier compatible with both devices, so it can't be isolated in
>> a simple function (and e.g. shared between all compositors in libdrm).
>
>
> We're already going to have to do throwaway allocations to make Intel's tiled 
> modes work; I'd rather not extend this out to doing throwaway allocations 
> across device combinations as well as modifier lists.
>
>>
>> Some drivers will allow to drmModeAddFB buffers that can't be scanned
>> out, and will only reject the buffer at atomic commit time.
>
>
> This is 100% a KMS driver bug and should be fixed there. It's not 
> catastrophic, since commits can fail for any reason or none at all and 
> compositors are expected to handle this, but they should absolutely be 
> rejecting buffers which can never be scanned out at all at Add
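
To make option (1) discussed above concrete, the compositor-side probe would
look roughly like the sketch below: allocate a small scanout-capable buffer
on the render node with GBM, export it as a dmabuf, import it into the
display-only device and try drmModeAddFB2. It leaks the imported GEM handle
for brevity, and as noted it can still give false negatives for kmsro-style
drivers with stricter (e.g. contiguous) allocation requirements:

#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <gbm.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static bool probe_scanout_from_render(int render_fd, int kms_fd)
{
        uint32_t handles[4] = { 0 }, pitches[4] = { 0 }, offsets[4] = { 0 };
        struct gbm_device *gbm;
        struct gbm_bo *bo;
        uint32_t fb_id;
        int dmabuf_fd;
        int ret = -1;

        gbm = gbm_create_device(render_fd);
        if (!gbm)
                return false;

        bo = gbm_bo_create(gbm, 64, 64, GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
        if (!bo)
                goto out_device;

        dmabuf_fd = gbm_bo_get_fd(bo);
        if (dmabuf_fd < 0)
                goto out_bo;

        if (drmPrimeFDToHandle(kms_fd, dmabuf_fd, &handles[0]))
                goto out_fd;

        pitches[0] = gbm_bo_get_stride(bo);
        ret = drmModeAddFB2(kms_fd, 64, 64, GBM_FORMAT_XRGB8888,
                            handles, pitches, offsets, &fb_id, 0);
        if (ret == 0)
                drmModeRmFB(kms_fd, fb_id);
out_fd:
        close(dmabuf_fd);
out_bo:
        gbm_bo_destroy(bo);
out_device:
        gbm_device_destroy(gbm);
        return ret == 0;
}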

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Vetter
On Tue, Apr 20, 2021 at 07:03:19AM -0400, Marek Olšák wrote:
> Daniel, are you suggesting that we should skip any deadlock prevention in
> the kernel, and just let userspace wait for and signal any fence it has
> access to?

Yeah. If we go with userspace fences, then userspace can hang itself. Not
the kernel's problem. The only criteria is that the kernel itself must
never rely on these userspace fences, except for stuff like implementing
optimized cpu waits. And in those we must always guarantee that the
userspace process remains interruptible.

It's a completely different world from dma_fence based kernel fences,
whether those are implicit or explicit.

> Do you have any concern with the deprecation/removal of BO fences in the
> kernel assuming userspace is only using explicit fences? Any concern with
> the submit and return fences for modesetting and other producer<->consumer
> scenarios?

Let me work on the full reply for your rfc first, because there's a lot
of details here and nuance.
-Daniel

> 
> Thanks,
> Marek
> 
> On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter  wrote:
> 
> > On Tue, Apr 20, 2021 at 12:15 PM Christian König
> >  wrote:
> > >
> > > Am 19.04.21 um 17:48 schrieb Jason Ekstrand:
> > > > Not going to comment on everything on the first pass...
> > > >
> > > > On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:
> > > >> Hi,
> > > >>
> > > >> This is our initial proposal for explicit fences everywhere and new
> > memory management that doesn't use BO fences. It's a redesign of how Linux
> > graphics drivers work, and it can coexist with what we have now.
> > > >>
> > > >>
> > > >> 1. Introduction
> > > >> (skip this if you are already sold on explicit fences)
> > > >>
> > > >> The current Linux graphics architecture was initially designed for
> > GPUs with only one graphics queue where everything was executed in the
> > submission order and per-BO fences were used for memory management and
> > CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
> > queues were added on top, which required the introduction of implicit
> > GPU-GPU synchronization between queues of different processes using per-BO
> > fences. Recently, even parallel execution within one queue was enabled
> > where a command buffer starts draws and compute shaders, but doesn't wait
> > for them, enabling parallelism between back-to-back command buffers.
> > Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler
> > was created to enable all those use cases, and it's the only reason why the
> > scheduler exists.
> > > >>
> > > >> The GPU scheduler, implicit synchronization, BO-fence-based memory
> > management, and the tracking of per-BO fences increase CPU overhead and
> > latency, and reduce parallelism. There is a desire to replace all of them
> > with something much simpler. Below is how we could do it.
> > > >>
> > > >>
> > > >> 2. Explicit synchronization for window systems and modesetting
> > > >>
> > > >> The producer is an application and the consumer is a compositor or a
> > modesetting driver.
> > > >>
> > > >> 2.1. The Present request
> > > >>
> > > >> As part of the Present request, the producer will pass 2 fences (sync
> > objects) to the consumer alongside the presented DMABUF BO:
> > > >> - The submit fence: Initially unsignalled, it will be signalled when
> > the producer has finished drawing into the presented buffer.
> > > >> - The return fence: Initially unsignalled, it will be signalled when
> > the consumer has finished using the presented buffer.
> > > > I'm not sure syncobj is what we want.  In the Intel world we're trying
> > > > to go even further to something we're calling "userspace fences" which
> > > > are a timeline implemented as a single 64-bit value in some
> > > > CPU-mappable BO.  The client writes a higher value into the BO to
> > > > signal the timeline.
> > >
> > > Well that is exactly what our Windows guys have suggested as well, but
> > > it strongly looks like that this isn't sufficient.
> > >
> > > First of all you run into security problems when any application can
> > > just write any value to that memory location. Just imagine an
> > > application sets the counter to zero and X waits forever for some
> > > rendering to finish.
> >
> > The thing is, with userspace fences the security boundary issue moves
> > into userspace entirely. And it really doesn't matter whether
> > the event you're waiting on doesn't complete because the other app
> > crashed or was stupid or intentionally gave you a wrong fence point:
> > You have to somehow handle that, e.g. perhaps with conditional
> > rendering and just using the old frame in compositing if the new one
> > doesn't show up in time. Or something like that. So trying to get the
> > kernel involved but also not so much involved sounds like a bad design
> > to me.
> >
> > > Additional to that in such a model you can't determine who is the guilty
> > > queue in case of a hang and can't reset the sync

Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Peter.Enderborg
On 4/20/21 1:04 PM, Michal Hocko wrote:
> On Tue 20-04-21 09:25:51, peter.enderb...@sony.com wrote:
>> On 4/20/21 11:12 AM, Michal Hocko wrote:
>>> On Tue 20-04-21 09:02:57, peter.enderb...@sony.com wrote:
>> But that isn't really system memory at all, it's just allocated device
>> memory.
> OK, that was not really clear to me. So this is not really accounted to
> MemTotal? If that is really the case then reporting it into the oom
> report is completely pointless and I am not even sure /proc/meminfo is
> the right interface either. It would just add more confusion I am
> afraid.
>  
 Why is it confusing? Documentation is quite clear:
>>> Because a single counter without a wider context cannot be put into any
>>> reasonable context. There is no notion of the total amount of device
>>> memory usable for dma-buf. As Christian explained some of it can be RAM
>>> based. So a single number is rather pointless on its own in many cases.
>>>
>>> Or let me just ask. What can you tell from dma-buf: $FOO kB in its
>>> current form?
>> Is it better to be blind?
> No it is better to have a sensible counter that can be reasoned about.
> So far you are only claiming that having something is better than
> nothing and I would agree with you if that was a debugging one off
> interface. But /proc/meminfo and other proc files have to be maintained
> with future portability in mind. This is not a dumping ground for _some_
> counters that might be interesting at the _current_ moment. E.g. what
> happens if somebody wants to have a per device resp. memory based
> dma-buf data? Are you going to change the semantic or add another
> 2 counters?

This is the DmaBufTotal. It is the upper limit. If it is not there, it is
something else.

And when we have a better resolution on measuring it, it would make sense
to add a DmaBufVram, DmaBufMemcg or whatever we can pick up.

This is what we can measure today.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
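
For what it's worth, the monitoring described in the patch would boil down
to something like this on the userspace side, assuming the field lands in
/proc/meminfo exactly as proposed (value in kB like the other fields); the
function name is made up:

#include <stdio.h>

static long read_dma_buf_total_kb(void)
{
        char line[256];
        long kb = -1;
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "DmaBufTotal: %ld kB", &kb) == 1)
                        break;
        }
        fclose(f);
        return kb;      /* compare against a threshold, log, etc. */
}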


Re: [PATCH v5] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Peter.Enderborg
On 4/20/21 1:14 PM, Daniel Vetter wrote:
> On Tue, Apr 20, 2021 at 09:26:00AM +, peter.enderb...@sony.com wrote:
>> On 4/20/21 10:58 AM, Daniel Vetter wrote:
>>> On Sat, Apr 17, 2021 at 06:38:35PM +0200, Peter Enderborg wrote:
 This adds a total used dma-buf memory. Details
 can be found in debugfs, however it is not for everyone
 and not always available. dma-buf are indirect allocated by
 userspace. So with this value we can monitor and detect
 userspace applications that have problems.

 Signed-off-by: Peter Enderborg 
>>> So there have been tons of discussions around how to track dma-buf and
>>> why, and I really need to understand the use-case here first I think. proc
>>> uapi is as much forever as anything else, and depending what you're doing
>>> this doesn't make any sense at all:
>>>
>>> - on most linux systems dma-buf are only instantiated for shared buffer.
>>>   So there this gives you a fairly meaningless number and not anything
>>>   reflecting gpu memory usage at all.
>>>
>>> - on Android all buffers are allocated through dma-buf afaik. But there
>>>   we've recently had some discussions about how exactly we should track
>>>   all this, and the conclusion was that most of this should be solved by
>>>   cgroups long term. So if this is for Android, then I don't think adding
>>>   random quick stop-gaps to upstream is a good idea (because it's a pretty
>>>   long list of patches that have come up on this).
>>>
>>> So what is this for?
>> For the overview. dma-buf today only has debugfs for info. Debugfs
>> is not allowed by Google for use in Android. So this aggregates the
>> information so we can get information on what is going on on the system.
>>
>> And the LKML standard response to that is "SHOW ME THE CODE".
> Yes. Except this extends to how exactly this is supposed to be used in
> userspace and acted upon.
>
>> When the top memcg has aggregated information on dma-buf it is maybe
>> a better source for meminfo. But then it also implies that dma-buf requires
>> memcg.
>>
>> And I don't see any problem with replacing this with something better when
>> it is ready.
> The thing is, this is uapi. Once it's merged we cannot, ever, replace it.
> It must be kept around forever, or a very close approximation thereof. So
> merging this with the justification that we can fix it later on or replace
> isn't going to happen.

It is intended to be relevant as long as there is a dma-buf. This is a proper
metric. If a newer implementation does not get the same result, it is not
doing it right and is not better. If a memcg counter or a global_zone
counter does the same thing, it can replace the suggested method.

But I don't think they will. A dma-buf does not have to be mapped to a
process, and in the case of vram, it is not covered by the current
global_zone counters. All of them would be very nice to have in some form.
But it won't change what the correct value of "Total" is.


> -Daniel
>
>>> -Daniel
>>>
 ---
  drivers/dma-buf/dma-buf.c | 12 
  fs/proc/meminfo.c |  5 -
  include/linux/dma-buf.h   |  1 +
  3 files changed, 17 insertions(+), 1 deletion(-)

 diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
 index f264b70c383e..4dc37cd4293b 100644
 --- a/drivers/dma-buf/dma-buf.c
 +++ b/drivers/dma-buf/dma-buf.c
 @@ -37,6 +37,7 @@ struct dma_buf_list {
  };
  
  static struct dma_buf_list db_list;
 +static atomic_long_t dma_buf_global_allocated;
  
  static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int 
 buflen)
  {
 @@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
dma_resv_fini(dmabuf->resv);
  
 +  atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);
module_put(dmabuf->owner);
kfree(dmabuf->name);
kfree(dmabuf);
 @@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
 dma_buf_export_info *exp_info)
mutex_lock(&db_list.lock);
list_add(&dmabuf->list_node, &db_list.head);
mutex_unlock(&db_list.lock);
 +  atomic_long_add(dmabuf->size, &dma_buf_global_allocated);
  
return dmabuf;
  
 @@ -1346,6 +1349,15 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
 dma_buf_map *map)
  }
  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
  
 +/**
 + * dma_buf_allocated_pages - Return the used nr of pages
 + * allocated for dma-buf
 + */
 +long dma_buf_allocated_pages(void)
 +{
 +  return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
 +}
 +
  #ifdef CONFIG_DEBUG_FS
  static int dma_buf_debug_show(struct seq_file *s, void *unused)
  {
 diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
 index 6fa761c9cc78..ccc7c40c8db7 100644
 --- a/fs/proc/meminfo.c
 +++ b/fs/proc/meminfo.c
 @@ -16,6 +16,7 @@
 

Re: [PATCH v2 10/16] drm/exynos: implement a drm bridge

2021-04-20 Thread Frieder Schrempf

On 23.02.21 13:07, Daniel Vetter wrote:

On Thu, Feb 18, 2021 at 5:02 PM Andrzej Hajda  wrote:


Hi Michael,

W dniu 18.02.2021 o 09:04, Michael Tretter pisze:

On Wed, 10 Feb 2021 10:10:37 +0100, Frieder Schrempf wrote:

On 04.02.21 18:46, Daniel Vetter wrote:

On Thu, Feb 4, 2021 at 6:26 PM Laurent Pinchart 
 wrote:

On Thu, Feb 04, 2021 at 06:19:22PM +0100, Daniel Vetter wrote:

On Thu, Feb 4, 2021 at 5:28 PM Andrzej Hajda wrote:

W dniu 04.02.2021 o 17:05, Daniel Vetter pisze:

On Thu, Feb 04, 2021 at 11:56:32AM +0100, Michael Tretter wrote:

On Thu, 04 Feb 2021 11:17:49 +0100, Daniel Vetter wrote:

On Wed, Feb 3, 2021 at 9:32 PM Michael Tretter wrote:

On Mon, 01 Feb 2021 17:33:14 +0100, Michael Tretter wrote:

On Tue, 15 Sep 2020 21:40:40 +0200, Andrzej Hajda wrote:

W dniu 14.09.2020 o 23:19, Andrzej Hajda pisze:

On 14.09.2020 22:01, Michael Tretter wrote:

On Mon, 14 Sep 2020 14:31:19 +0200, Marek Szyprowski wrote:

On 14.09.2020 10:29, Marek Szyprowski wrote:

On 11.09.2020 15:54, Michael Tretter wrote:

Make the exynos_dsi driver a full drm bridge that can be found and
used
from other drivers.

Other drivers can only attach to the bridge, if a mipi dsi device
already attached to the bridge. This allows to defer the probe of the
display pipe until the downstream bridges are available, too.

Signed-off-by: Michael Tretter 

This one (and the whole series applied) still fails on Exynos boards:

[drm] Exynos DRM: using 11c0.fimd device for DMA mapping
operations
exynos-drm exynos-drm: bound 11c0.fimd (ops fimd_component_ops)
OF: graph: no port node found in /soc/dsi@11c8
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address
0084
pgd = (ptrval)
[0084] *pgd=
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted
5.9.0-rc4-next-20200911-00010-g417dc70d70ec #1608
Hardware name: Samsung Exynos (Flattened Device Tree)
PC is at drm_bridge_attach+0x18/0x164
LR is at exynos_dsi_bind+0x88/0xa8
pc : []lr : []psr: 2013
sp : ef0dfca8  ip : 0002  fp : c13190e0
r10:   r9 : ee46d580  r8 : c13190e0
r7 : ee438800  r6 : 0018  r5 : ef253810  r4 : ef39e840
r3 :   r2 : 0018  r1 : ef39e888  r0 : ef39e840
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 4000404a  DAC: 0051
Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
Stack: (0xef0dfca8 to 0xef0e)
...
[] (drm_bridge_attach) from []
(exynos_dsi_bind+0x88/0xa8)
[] (exynos_dsi_bind) from []
(component_bind_all+0xfc/0x290)
[] (component_bind_all) from []
(exynos_drm_bind+0xe4/0x19c)
[] (exynos_drm_bind) from []
(try_to_bring_up_master+0x1e4/0x2c4)
[] (try_to_bring_up_master) from []
(component_master_add_with_match+0xd4/0x108)
[] (component_master_add_with_match) from []
(exynos_drm_platform_probe+0xe4/0x110)
[] (exynos_drm_platform_probe) from []
(platform_drv_probe+0x6c/0xa4)
[] (platform_drv_probe) from []
(really_probe+0x200/0x4fc)
[] (really_probe) from []
(driver_probe_device+0x78/0x1fc)
[] (driver_probe_device) from []
(device_driver_attach+0x58/0x60)
[] (device_driver_attach) from []
(__driver_attach+0xdc/0x174)
[] (__driver_attach) from []
(bus_for_each_dev+0x68/0xb4)
[] (bus_for_each_dev) from []
(bus_add_driver+0x158/0x214)
[] (bus_add_driver) from []
(driver_register+0x78/0x110)
[] (driver_register) from []
(exynos_drm_init+0xe4/0x118)
[] (exynos_drm_init) from []
(do_one_initcall+0x8c/0x42c)
[] (do_one_initcall) from []
(kernel_init_freeable+0x190/0x1dc)
[] (kernel_init_freeable) from []
(kernel_init+0x8/0x118)
[] (kernel_init) from [] (ret_from_fork+0x14/0x20)
Exception stack(0xef0dffb0 to 0xef0dfff8)
...
---[ end trace ee27f313f9ed9da1 ]---

# arm-linux-gnueabi-addr2line -e vmlinux c0628c08
drivers/gpu/drm/drm_bridge.c:184 (discriminator 1)

I will try to debug it a bit more today.

The above crash has been caused by lack of in_bridge initialization to
NULL in exynos_dsi_bind() in this patch. However, fixing it reveals
another issue:

[drm] Exynos DRM: using 11c0.fimd device for DMA mapping operations
exynos-drm exynos-drm: bound 11c0.fimd (ops fimd_component_ops)
OF: graph: no port node found in /soc/dsi@11c8
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address
0280
pgd = (ptrval)
[0280] *pgd=
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.9.0-rc4-next-20200911-00010-g417dc70d70ec-dirty #1613
Hardware name: Samsung Exynos (Flattened Device Tree)
PC is at __mutex_lock+0x54/0xb18
LR is at lock_is_held_type+0x80/0x138
pc : []lr : []psr: 6013
sp : ef0dfd30  ip : 33937b74  fp : c13193c8
r10: c1208eec  r9 :   r8 : ee45f808
r7 : c19561a4  r6 :   r5 :   r4 : 024c
r3 :   r2 : 00204140  r1 : c124f13c  r0 : 
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387

Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Michal Hocko
On Tue 20-04-21 09:32:14, Christian König wrote:
> Am 20.04.21 um 09:04 schrieb Michal Hocko:
> > On Mon 19-04-21 18:37:13, Christian König wrote:
> > > Am 19.04.21 um 18:11 schrieb Michal Hocko:
[...]
> > What I am trying to bring up with NUMA side is that the same problem can
> > happen on per-node basis. Let's say that some user consumes unexpectedly
> > large amount of dma-buf on a certain node. This can lead to observable
> > performance impact on anybody on allocating from that node and even
> > worse cause an OOM for node bound consumers. How do I find out that it
> > was dma-buf that has caused the problem?
> 
> Yes, that is the direction my thinking goes as well, but also even further.
> 
> See DMA-buf is also used to share device local memory between processes as
> well. In other words VRAM on graphics hardware.
> 
> On my test system here I have 32GB of system memory and 16GB of VRAM. I can
> use DMA-buf to allocate that 16GB of VRAM quite easily which then shows up
> under /proc/meminfo as used memory.

This is something that would be really interesting in the changelog. I
mean the expected and extreme memory consumption of this memory. Ideally
with some hints on what to do when the number is really high (e.g. mount
debugfs and have a look here and there to check whether this is just too
many users or an unexpected pattern to be reported).

> But that isn't really system memory at all, it's just allocated device
> memory.

OK, that was not really clear to me. So this is not really accounted to
MemTotal? If that is really the case then reporting it into the oom
report is completely pointless and I am not even sure /proc/meminfo is
the right interface either. It would just add more confusion I am
afraid.
 
> > See where I am heading?
> 
> Yeah, totally. Thanks for pointing this out.
> 
> Suggestions how to handle that?

As I've pointed out in a previous reply, we do have an API to account
per-node memory, but now that you have brought up that this is not something
we account as regular memory, this doesn't really fit into that model. But
maybe I am just confused.
-- 
Michal Hocko
SUSE Labs
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Michal Hocko
On Tue 20-04-21 10:20:43, Mike Rapoport wrote:
> On Tue, Apr 20, 2021 at 09:04:51AM +0200, Michal Hocko wrote:
> > On Mon 19-04-21 18:37:13, Christian König wrote:
> > > Am 19.04.21 um 18:11 schrieb Michal Hocko:
> > [...]
> > > > The question is not whether it is NUMA aware but whether it is useful to
> > > > know per-numa data for the purpose the counter is supposed to serve.
> > > 
> > > No, not at all. The pages of a single DMA-buf could even be from different
> > > NUMA nodes if the exporting driver decides that this is somehow useful.
> > 
> > As the use of the counter hasn't been explained yet I can only
> > speculate. One thing that I can imagine to be useful is to fill gaps in
> > our accounting. It is quite often that the memory accounted in
> > /proc/meminfo (or oom report) doesn't add up to the overall memory
> > usage. In some workloads the gap can be huge! In many cases there
> > are other means to find out additional memory via subsystem specific
> > interfaces (e.g. networking buffers). I do assume that dma-buf is just
> > one of those and the counter can fill the said gap at least partially
> > for some workloads. That is definitely useful.
> 
> A bit off-topic.
> 
> Michal, I think it would have been nice to have an explanation like above
> in Documentation/proc/meminfo, what do you say?

Not sure which specific parts (likely the unaccounted memory?) but sure
why not. Our /proc/meminfo is rather underdocumented. More information
cannot hurt.
-- 
Michal Hocko
SUSE Labs
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v5] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Mike Rapoport
On Tue, Apr 20, 2021 at 10:45:21AM +, peter.enderb...@sony.com wrote:
> On 4/20/21 11:41 AM, Mike Rapoport wrote:
> > Hello Peter,
> >
> > On Tue, Apr 20, 2021 at 09:26:00AM +, peter.enderb...@sony.com wrote:
> >> On 4/20/21 10:58 AM, Daniel Vetter wrote:
> >>> On Sat, Apr 17, 2021 at 06:38:35PM +0200, Peter Enderborg wrote:
>  This adds a total used dma-buf memory. Details
>  can be found in debugfs, however it is not for everyone
>  and not always available. dma-buf are indirect allocated by
>  userspace. So with this value we can monitor and detect
>  userspace applications that have problems.
> 
>  Signed-off-by: Peter Enderborg 
> >>> So there have been tons of discussions around how to track dma-buf and
> >>> why, and I really need to understand the use-case here first I think. proc
> >>> uapi is as much forever as anything else, and depending what you're doing
> >>> this doesn't make any sense at all:
> >>>
> >>> - on most linux systems dma-buf are only instantiated for shared buffer.
> >>>   So there this gives you a fairly meaningless number and not anything
> >>>   reflecting gpu memory usage at all.
> >>>
> >>> - on Android all buffers are allocated through dma-buf afaik. But there
> >>>   we've recently had some discussions about how exactly we should track
> >>>   all this, and the conclusion was that most of this should be solved by
> >>>   cgroups long term. So if this is for Android, then I don't think adding
> >>>   random quick stop-gaps to upstream is a good idea (because it's a pretty
> >>>   long list of patches that have come up on this).
> >>>
> >>> So what is this for?
> >> For the overview. dma-buf today only has debugfs for info. Debugfs
> >> is not allowed by Google for use in Android. So this aggregates the
> >> information
> >> so we can get information on what is going on on the system.
> >  
> > Can you send an example debugfs output to see what data are we talking
> > about?
> 
> Sure. This is on an idle system. I'm not sure why you need it. The problem
> is partly that debugfs is
> not accessible on a commercial device.

I wanted to see what kind of information is there, but I didn't think it's
that long :)
 
> Dma-buf Objects:
> size        flags       mode        count       exp_name        buf name    
> ino
> 00032768    0002    00080007    0002    
> ion-system-1006-allocator-servi    dmabuf17728    07400825    dmabuf17728
>     Attached Devices:
> Total 0 devices attached
> 
> 11083776    0002    00080007    0003    
> ion-system-1006-allocator-servi    dmabuf17727    07400824    dmabuf17727
>     Attached Devices:
>     ae0.qcom,mdss_mdp:qcom,smmu_sde_unsec_cb
> Total 1 devices attached
> 
> 00032768    0002    00080007    0002    
> ion-system-1006-allocator-servi    dmabuf17726    07400823    dmabuf17726
>     Attached Devices:
> Total 0 devices attached
> 
> 11083776    0002    00080007    0002    
> ion-system-1006-allocator-servi    dmabuf17725    07400822    dmabuf17725
>     Attached Devices:
>     ae0.qcom,mdss_mdp:qcom,smmu_sde_unsec_cb
> Total 1 devices attached

...

> Total 654 objects, 744144896 bytes
 
Isn't the size from the first column also available in fdinfo?

Is there anything that prevents monitoring those?

-- 
Sincerely yours,
Mike.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
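
The fdinfo route suggested above can be walked without debugfs. A rough
per-process sketch, assuming the dma-buf fdinfo output carries "size:" and
"exp_name:" lines as on recent kernels; error handling is minimal and
buffers shared between processes are counted once per holder:

#include <dirent.h>
#include <stdio.h>
#include <string.h>

static unsigned long long pid_dma_buf_bytes(int pid)
{
        char path[256], line[256];
        unsigned long long total = 0;
        struct dirent *de;
        DIR *dir;

        snprintf(path, sizeof(path), "/proc/%d/fdinfo", pid);
        dir = opendir(path);
        if (!dir)
                return 0;

        while ((de = readdir(dir)) != NULL) {
                unsigned long long size = 0;
                int is_dmabuf = 0;
                FILE *f;

                snprintf(path, sizeof(path), "/proc/%d/fdinfo/%s",
                         pid, de->d_name);
                f = fopen(path, "r");
                if (!f)
                        continue;
                while (fgets(line, sizeof(line), f)) {
                        if (!strncmp(line, "exp_name:", 9))
                                is_dmabuf = 1;  /* only dma-bufs have this */
                        else if (!strncmp(line, "size:", 5))
                                sscanf(line + 5, "%llu", &size);
                }
                fclose(f);
                if (is_dmabuf)
                        total += size;
        }
        closedir(dir);
        return total;
}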


Re: [RFC v1 PATCH 1/3] drivers: soc: add support for soc_device_match returning -EPROBE_DEFER

2021-04-20 Thread Dan Carpenter
On Mon, Apr 19, 2021 at 10:20:13AM +0200, Geert Uytterhoeven wrote:
> Hi Alice,
> 
> CC Arnd (soc_device_match() author)
> 
> On Mon, Apr 19, 2021 at 6:28 AM Alice Guo (OSS)  wrote:
> > From: Alice Guo 
> >
> > In i.MX8M boards, the registration of the SoC device happens later than
> > the caam driver, which needs it. The caam driver needs soc_device_match()
> > to provide -EPROBE_DEFER when no SoC device is registered and there is no
> > early_soc_dev_attr.
> 
> I'm wondering if this is really a good idea: soc_device_match() is a
> last-resort low-level check, and IMHO should be made available early on,
> so there is no need for -EPROBE_DEFER.
> 
> >
> > Signed-off-by: Alice Guo 
> 
> Thanks for your patch!
> 
> > --- a/drivers/base/soc.c
> > +++ b/drivers/base/soc.c
> > @@ -110,6 +110,7 @@ static void soc_release(struct device *dev)
> >  }
> >
> >  static struct soc_device_attribute *early_soc_dev_attr;
> > +static bool soc_dev_attr_init_done = false;
> 
> Do you need this variable?
> 
> >
> >  struct soc_device *soc_device_register(struct soc_device_attribute 
> > *soc_dev_attr)
> >  {
> > @@ -157,6 +158,7 @@ struct soc_device *soc_device_register(struct 
> > soc_device_attribute *soc_dev_attr
> > return ERR_PTR(ret);
> > }
> >
> > +   soc_dev_attr_init_done = true;
> > return soc_dev;
> >
> >  out3:
> > @@ -246,6 +248,9 @@ const struct soc_device_attribute *soc_device_match(
> > if (!matches)
> > return NULL;
> >
> > +   if (!soc_dev_attr_init_done && !early_soc_dev_attr)
> 
> if (!soc_bus_type.p && !early_soc_dev_attr)

There is one place checking this already.  We could wrap it in a helper
function:

static bool device_init_done(void)
{
return soc_bus_type.p ? true : false;
}

regards,
dan carpenter
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
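
For context, the consumer side of the proposed behaviour would look roughly
like this. It is only a sketch: the match table and function names are made
up, and it assumes soc_device_match() really does start returning
ERR_PTR(-EPROBE_DEFER) as in this RFC rather than today's NULL-or-match:

#include <linux/err.h>
#include <linux/platform_device.h>
#include <linux/sys_soc.h>

static const struct soc_device_attribute example_imx8m_match[] = {
        { .soc_id = "i.MX8M*" },
        { /* sentinel */ }
};

static int example_probe(struct platform_device *pdev)
{
        const struct soc_device_attribute *attr;

        attr = soc_device_match(example_imx8m_match);
        if (IS_ERR(attr))
                return PTR_ERR(attr);   /* SoC device not registered yet */

        if (attr) {
                /* apply the i.MX8M specific setup here */
        }

        return 0;
}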


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Christian König

Yeah. If we go with userspace fences, then userspace can hang itself. Not
the kernel's problem.


Well, the path of inner peace begins with four words. “Not my fucking 
problem.”


But I'm not that much concerned about the kernel, but rather about 
important userspace processes like X, Wayland, SurfaceFlinger etc...


I mean attaching a page to a sync object and allowing to wait/signal 
from both CPU as well as GPU side is not so much of a problem.



You have to somehow handle that, e.g. perhaps with conditional
rendering and just using the old frame in compositing if the new one
doesn't show up in time.


Nice idea, but how would you handle that on the OpenGL/Glamor/Vulkan level?

Regards,
Christian.

Am 20.04.21 um 13:16 schrieb Daniel Vetter:

On Tue, Apr 20, 2021 at 07:03:19AM -0400, Marek Olšák wrote:

Daniel, are you suggesting that we should skip any deadlock prevention in
the kernel, and just let userspace wait for and signal any fence it has
access to?

Yeah. If we go with userspace fences, then userspace can hang itself. Not
the kernel's problem. The only criteria is that the kernel itself must
never rely on these userspace fences, except for stuff like implementing
optimized cpu waits. And in those we must always guarantee that the
userspace process remains interruptible.

It's a completely different world from dma_fence based kernel fences,
whether those are implicit or explicit.


Do you have any concern with the deprecation/removal of BO fences in the
kernel assuming userspace is only using explicit fences? Any concern with
the submit and return fences for modesetting and other producer<->consumer
scenarios?

Let me work on the full reply for your rfc first, because there's a lot
of details here and nuance.
-Daniel


Thanks,
Marek

On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter  wrote:


On Tue, Apr 20, 2021 at 12:15 PM Christian König
 wrote:

On 19.04.21 at 17:48, Jason Ekstrand wrote:

Not going to comment on everything on the first pass...

On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:

Hi,

This is our initial proposal for explicit fences everywhere and new

memory management that doesn't use BO fences. It's a redesign of how Linux
graphics drivers work, and it can coexist with what we have now.


1. Introduction
(skip this if you are already sold on explicit fences)

The current Linux graphics architecture was initially designed for
GPUs with only one graphics queue where everything was executed in the
submission order and per-BO fences were used for memory management and
CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
queues were added on top, which required the introduction of implicit
GPU-GPU synchronization between queues of different processes using per-BO
fences. Recently, even parallel execution within one queue was enabled
where a command buffer starts draws and compute shaders, but doesn't wait
for them, enabling parallelism between back-to-back command buffers.
Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler
was created to enable all those use cases, and it's the only reason why the
scheduler exists.

The GPU scheduler, implicit synchronization, BO-fence-based memory
management, and the tracking of per-BO fences increase CPU overhead and
latency, and reduce parallelism. There is a desire to replace all of them
with something much simpler. Below is how we could do it.


2. Explicit synchronization for window systems and modesetting

The producer is an application and the consumer is a compositor or a
modesetting driver.

2.1. The Present request

As part of the Present request, the producer will pass 2 fences (sync
objects) to the consumer alongside the presented DMABUF BO:

- The submit fence: Initially unsignalled, it will be signalled when
the producer has finished drawing into the presented buffer.

- The return fence: Initially unsignalled, it will be signalled when
the consumer has finished using the presented buffer.

I'm not sure syncobj is what we want.  In the Intel world we're trying
to go even further to something we're calling "userspace fences" which
are a timeline implemented as a single 64-bit value in some
CPU-mappable BO.  The client writes a higher value into the BO to
signal the timeline.
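
For concreteness, a minimal CPU-side sketch of what such a timeline boils
down to (all names here are invented for illustration):

#include <stdatomic.h>
#include <stdint.h>

struct uf_timeline {
	_Atomic uint64_t value;	/* lives in a CPU-mappable BO shared with the GPU */
};

/* Signalling point N just means storing a value >= N. */
static void uf_signal(struct uf_timeline *t, uint64_t point)
{
	atomic_store_explicit(&t->value, point, memory_order_release);
}

/* CPU-side wait: poll until the timeline reaches the point or give up.
 * A real implementation would sleep (futex/poll) instead of spinning. */
static int uf_wait(struct uf_timeline *t, uint64_t point, long spins)
{
	while (spins-- > 0) {
		if (atomic_load_explicit(&t->value, memory_order_acquire) >= point)
			return 0;
	}
	return -1;	/* timed out: the signaller may be gone, stuck or hostile */
}

Note that nothing in this scheme stops any process that can map the page from
storing whatever value it likes, which is exactly the concern raised below.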

Well that is exactly what our Windows guys have suggested as well, but
it strongly looks like that this isn't sufficient.

First of all you run into security problems when any application can
just write any value to that memory location. Just imagine an
application sets the counter to zero and X waits forever for some
rendering to finish.

The thing is, with userspace fences security boundary issue prevent
moves into userspace entirely. And it really doesn't matter whether
the event you're waiting on doesn't complete because the other app
crashed or was stupid or intentionally gave you a wrong fence point:
You have to somehow handle that, e.g. perhaps with conditional
rendering and just using the old frame in

Re: [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Vetter
On Mon, Apr 19, 2021 at 06:47:48AM -0400, Marek Olšák wrote:
> Hi,
> 
> This is our initial proposal for explicit fences everywhere and new memory
> management that doesn't use BO fences. It's a redesign of how Linux
> graphics drivers work, and it can coexist with what we have now.
> 
> 
> *1. Introduction*
> (skip this if you are already sold on explicit fences)
> 
> The current Linux graphics architecture was initially designed for GPUs
> with only one graphics queue where everything was executed in the
> submission order and per-BO fences were used for memory management and
> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
> queues were added on top, which required the introduction of implicit
> GPU-GPU synchronization between queues of different processes using per-BO
> fences. Recently, even parallel execution within one queue was enabled
> where a command buffer starts draws and compute shaders, but doesn't wait
> for them, enabling parallelism between back-to-back command buffers.
> Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler
> was created to enable all those use cases, and it's the only reason why the
> scheduler exists.
> 
> The GPU scheduler, implicit synchronization, BO-fence-based memory
> management, and the tracking of per-BO fences increase CPU overhead and
> latency, and reduce parallelism. There is a desire to replace all of them
> with something much simpler. Below is how we could do it.

I get the feeling you're mixing up a lot of things here that have more
nuance, so first some lingo.

- There's kernel based synchronization, based on dma_fence. These come in
  two major variants: Implicit synchronization, where the kernel attaches
  the dma_fences to a dma-buf, and explicit synchronization, where the
  dma_fence gets passed around as a stand-alone object, either a sync_file
  or a drm_syncobj

- Then there's userspace fence synchronization, where userspace issues any
  fences directly and the kernel doesn't even know what's going on. This
  is the only model that allows you to ditch the kernel overhead, and it's
  also the model that vk uses.

  I concur with Jason that this one is the future, it's the model hw
  wants, compute wants and vk wants. Building an explicit fence world
  which doesn't aim at this is imo wasted effort.
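
As a side note, the two kernel-fence carriers named in the first bullet are
just different userspace handles around the same dma_fence, and libdrm can
already convert between them; a rough sketch, error handling omitted:

#include <xf86drm.h>

/* Wrap the fence currently held by a binary drm_syncobj into a sync_file fd
 * that can be passed across a protocol or poll()ed on, and vice versa. */
static int syncobj_to_sync_file(int drm_fd, uint32_t handle)
{
	int sync_file_fd = -1;

	if (drmSyncobjExportSyncFile(drm_fd, handle, &sync_file_fd))
		return -1;
	return sync_file_fd;
}

static int sync_file_to_syncobj(int drm_fd, uint32_t handle, int sync_file_fd)
{
	return drmSyncobjImportSyncFile(drm_fd, handle, sync_file_fd);
}

The implicit/explicit distinction is therefore about where the fence gets
attached and tracked, not about incompatible fence objects.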

Now you smash them into one thing by also changing the memory model, but I
think that doesn't work:

- Relying on gpu page faults across the board wont happen. I think right
  now only amd's GFX10 or so has enough pagefault support to allow this,
  and not even there I'm really sure. Nothing else will anytime soon, at
  least not as far as I know. So we need to support slightly more hw in
  upstream than just that.  Any plan that's realistic needs to cope with
  dma_fence for a really long time.

- Pown^WPin All The Things! is probably not a general enough memory
  management approach. We've kinda tried for years to move away from it.
  Sure we can support it as an optimization in specific workloads, and it
  will make stuff faster, but it's not going to be the default I think.

- We live in a post xf86-video-$vendor world, and all these other
  compositors rely on implicit sync. You're not going to be able to get
  rid of them anytime soon. What's worse, all the various EGL/vk buffer
  sharing things also rely on implicit sync, so you get to fix up tons of
  applications on top. Any plan that's realistic needs to cope with
  implicit/explicit at the same time together won't work.

- Absolute infuriating, but you can't use page-faulting together with any
  dma_fence synchronization primitives, whether implicit or explicit. This
  means until the entire ecosystem moved forward (good luck with that) we
  have to support dma_fence. The only sync model that works together with
  page faults is userspace fence based sync.

Then there's the somewhat aside topic of how amdgpu/radeonsi does implicit
sync, at least last I checked. Currently this oversynchronizes badly
because it's left to the kernel to guess what should be synchronized, and
that gets things wrong. What you need there is explicit implicit
synchronization:

- on the cs side, userspace must set explicit for which buffers the kernel
  should engage in implicit synchronization. That's how it works on all
  other drivers that support more explicit userspace like vk or gl drivers
  that are internally all explicit. So essentially you only set the
  implicit fence slot when you really want to, and only userspace knows
  this. Implementing this without breaking the current logic probably
  needs some flags.

- the other side isn't there yet upstream, but Jason has patches.
  Essentially you also need to sample your implicit sync points at the
  right spot, to avoid oversync on later rendering by the producer.
  Jason's patch solves this by adding an ioctl to dma-buf to get the
  current set.

- without any of this things for pure explicit fencing userspace t
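
To make the dma-buf ioctl from Jason's patches (mentioned above) a bit more
concrete: it lets userspace sample the implicit fences currently attached to
a buffer as a sync_file, roughly like this (based on the proposed uapi, so
names and details may differ from what finally lands):

#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Snapshot the implicit fences of a dma-buf as a sync_file fd, so an
 * explicit-sync consumer waits on exactly what the producer attached at
 * this point and nothing that gets added later. */
static int dmabuf_export_sync_file(int dmabuf_fd)
{
	struct dma_buf_export_sync_file arg = {
		.flags = DMA_BUF_SYNC_RW,
		.fd = -1,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &arg) < 0)
		return -1;
	return arg.fd;
}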

Re: [PATCH v5] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Peter.Enderborg
On 4/20/21 1:52 PM, Mike Rapoport wrote:
> On Tue, Apr 20, 2021 at 10:45:21AM +, peter.enderb...@sony.com wrote:
>> On 4/20/21 11:41 AM, Mike Rapoport wrote:
>>> Hello Peter,
>>>
>>> On Tue, Apr 20, 2021 at 09:26:00AM +, peter.enderb...@sony.com wrote:
 On 4/20/21 10:58 AM, Daniel Vetter wrote:
> On Sat, Apr 17, 2021 at 06:38:35PM +0200, Peter Enderborg wrote:
>> This adds a total used dma-buf memory. Details
>> can be found in debugfs, however it is not for everyone
>> and not always available. dma-buf are indirectly allocated by
>> userspace. So with this value we can monitor and detect
>> userspace applications that have problems.
>>
>> Signed-off-by: Peter Enderborg 
> So there have been tons of discussions around how to track dma-buf and
> why, and I really need to understand the use-cass here first I think. proc
> uapi is as much forever as anything else, and depending what you're doing
> this doesn't make any sense at all:
>
> - on most linux systems dma-buf are only instantiated for shared buffer.
>   So there this gives you a fairly meaningless number and not anything
>   reflecting gpu memory usage at all.
>
> - on Android all buffers are allocated through dma-buf afaik. But there
>   we've recently had some discussions about how exactly we should track
>   all this, and the conclusion was that most of this should be solved by
>   cgroups long term. So if this is for Android, then I don't think adding
>   random quick stop-gaps to upstream is a good idea (because it's a pretty
>   long list of patches that have come up on this).
>
> So what is this for?
 For the overview. dma-buf today only has debugfs for info. Debugfs
 is not allowed by google to be used in android. So this aggregates the
 information
 so we can get information on what is going on on the system.
>>>  
>>> Can you send an example debugfs output to see what data are we talking
>>> about?
>> Sure. This is on an idle system. I'm not sure why you need it. The problem is
>> partly that debugfs is
>> not accessible on a commercial device.
> I wanted to see what kind of information is there, but I didn't think it's
> that long :)
Sorry, but it was making a point.
>  
>> Dma-buf Objects:
>> size        flags       mode        count       exp_name        buf name    
>> ino
>> 00032768    0002    00080007    0002    
>> ion-system-1006-allocator-servi    dmabuf17728    07400825    dmabuf17728
>>     Attached Devices:
>> Total 0 devices attached
>>
>> 11083776    0002    00080007    0003    
>> ion-system-1006-allocator-servi    dmabuf17727    07400824    dmabuf17727
>>     Attached Devices:
>>     ae0.qcom,mdss_mdp:qcom,smmu_sde_unsec_cb
>> Total 1 devices attached
>>
>> 00032768    0002    00080007    0002    
>> ion-system-1006-allocator-servi    dmabuf17726    07400823    dmabuf17726
>>     Attached Devices:
>> Total 0 devices attached
>>
>> 11083776    0002    00080007    0002    
>> ion-system-1006-allocator-servi    dmabuf17725    07400822    dmabuf17725
>>     Attached Devices:
>>     ae0.qcom,mdss_mdp:qcom,smmu_sde_unsec_cb
>> Total 1 devices attached
> ...
>
>> Total 654 objects, 744144896 bytes
>  
> Isn't the size from the first column also available in fdinfo?
>
> Is there anything that prevents monitoring those?
>
Yes, selinux.
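
For reference, the fdinfo route mentioned above already exposes per-fd data:
a dma-buf fd shows "size:", "count:" and "exp_name:" lines in
/proc/<pid>/fdinfo/<fd>, so a monitor that is allowed to read other
processes' fdinfo (which is what the SELinux policy above forbids on a
locked-down device) could sum per-process usage roughly like this:

#include <stdio.h>

/* Return the size of the dma-buf behind /proc/<pid>/fdinfo/<fd>, or -1 if
 * the fd is not a dma-buf (no exp_name line) or cannot be read. */
static long dmabuf_fd_size(int pid, int fd)
{
	char path[64], line[256], exp_name[64] = "";
	long size = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/fdinfo/%d", pid, fd);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		sscanf(line, "size: %ld", &size);
		sscanf(line, "exp_name: %63s", exp_name);
	}
	fclose(f);

	return exp_name[0] ? size : -1;
}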


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Christian König

Hi Daniel,

On 20.04.21 at 14:01, Daniel Vetter wrote:

On Mon, Apr 19, 2021 at 06:47:48AM -0400, Marek Olšák wrote:

Hi,

This is our initial proposal for explicit fences everywhere and new memory
management that doesn't use BO fences. It's a redesign of how Linux
graphics drivers work, and it can coexist with what we have now.


*1. Introduction*
(skip this if you are already sold on explicit fences)

The current Linux graphics architecture was initially designed for GPUs
with only one graphics queue where everything was executed in the
submission order and per-BO fences were used for memory management and
CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
queues were added on top, which required the introduction of implicit
GPU-GPU synchronization between queues of different processes using per-BO
fences. Recently, even parallel execution within one queue was enabled
where a command buffer starts draws and compute shaders, but doesn't wait
for them, enabling parallelism between back-to-back command buffers.
Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler
was created to enable all those use cases, and it's the only reason why the
scheduler exists.

The GPU scheduler, implicit synchronization, BO-fence-based memory
management, and the tracking of per-BO fences increase CPU overhead and
latency, and reduce parallelism. There is a desire to replace all of them
with something much simpler. Below is how we could do it.

I get the feeling you're mixing up a lot of things here that have more
nuance, so first some lingo.

- There's kernel based synchronization, based on dma_fence. These come in
   two major variants: Implicit synchronization, where the kernel attaches
   the dma_fences to a dma-buf, and explicit synchronization, where the
   dma_fence gets passed around as a stand-alone object, either a sync_file
   or a drm_syncobj

- Then there's userspace fence synchronization, where userspace issues any
   fences directly and the kernel doesn't even know what's going on. This
   is the only model that allows you to ditch the kernel overhead, and it's
   also the model that vk uses.

   I concur with Jason that this one is the future, it's the model hw
   wants, compute wants and vk wants. Building an explicit fence world
   which doesn't aim at this is imo wasted effort.

Now you smash them into one thing by also changing the memory model, but I
think that doesn't work:

- Relying on gpu page faults across the board wont happen. I think right
   now only amd's GFX10 or so has enough pagefault support to allow this,


It's even worse. GFX9 has enough support so that it can in theory work.

Because of this Felix and his team are working on HMM support based on 
this generation.


On GFX10 some aspects of it are improved while others are totally broken 
again.



   and not even there I'm really sure. Nothing else will anytime soon, at
   least not as far as I know. So we need to support slightly more hw in
   upstream than just that.  Any plan that's realistic needs to cope with
   dma_fence for a really long time.

- Pown^WPin All The Things! is probably not a general enough memory
   management approach. We've kinda tried for years to move away from it.
   Sure we can support it as an optimization in specific workloads, and it
   will make stuff faster, but it's not going to be the default I think.

- We live in a post xf86-video-$vendor world, and all these other
   compositors rely on implicit sync. You're not going to be able to get
   rid of them anytime soon. What's worse, all the various EGL/vk buffer
   sharing things also rely on implicit sync, so you get to fix up tons of
   applications on top. Any plan that's realistic needs to cope with
   implicit/explicit at the same time together won't work.

- Absolute infuriating, but you can't use page-faulting together with any
   dma_fence synchronization primitives, whether implicit or explicit. This
   means until the entire ecosystem moved forward (good luck with that) we
   have to support dma_fence. The only sync model that works together with
   page faults is userspace fence based sync.

Then there's the somewhat aside topic of how amdgpu/radeonsi does implicit
sync, at least last I checked. Currently this oversynchronizes badly
because it's left to the kernel to guess what should be synchronized, and
that gets things wrong. What you need there is explicit implicit
synchronization:

- on the cs side, userspace must set explicit for which buffers the kernel
   should engage in implicit synchronization. That's how it works on all
   other drivers that support more explicit userspace like vk or gl drivers
   that are internally all explicit. So essentially you only set the
   implicit fence slot when you really want to, and only userspace knows
   this. Implementing this without breaking the current logic probably
   needs some flags.

- the other side isn't there yet upstream, but Jason has patches.
   Es

Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Michal Hocko
On Tue 20-04-21 10:00:07, Christian König wrote:
> On 20.04.21 at 09:46, Michal Hocko wrote:
> > On Tue 20-04-21 09:32:14, Christian König wrote:
> > > On 20.04.21 at 09:04, Michal Hocko wrote:
> > > > On Mon 19-04-21 18:37:13, Christian König wrote:
> > > > > On 19.04.21 at 18:11, Michal Hocko wrote:
> > [...]
> > > > What I am trying to bring up with NUMA side is that the same problem can
> > > > happen on per-node basis. Let's say that some user consumes unexpectedly
> > > > large amount of dma-buf on a certain node. This can lead to observable
> > > > performance impact on anybody on allocating from that node and even
> > > > worse cause an OOM for node bound consumers. How do I find out that it
> > > > was dma-buf that has caused the problem?
> > > Yes, that is the direction my thinking goes as well, but also even 
> > > further.
> > > 
> > > See DMA-buf is also used to share device local memory between processes as
> > > well. In other words VRAM on graphics hardware.
> > > 
> > > On my test system here I have 32GB of system memory and 16GB of VRAM. I 
> > > can
> > > use DMA-buf to allocate that 16GB of VRAM quite easily which then shows up
> > > under /proc/meminfo as used memory.
> > This is something that would be really interesting in the changelog. I
> > mean the expected and extreme memory consumption of this memory. Ideally
> > with some hints on what to do when the number is really high (e.g. mount
> > debugfs and have a look here and there to check whether this is just too
> > many users or an unexpected pattern to be reported).
> > 
> > > But that isn't really system memory at all, it's just allocated device
> > > memory.
> > OK, that was not really clear to me. So this is not really accounted to
> > MemTotal?
> 
> It depends. In a lot of embedded systems you only have system memory and in
> this case that value here is indeed really useful.
> 
> > If that is really the case then reporting it into the oom
> > report is completely pointless and I am not even sure /proc/meminfo is
> > the right interface either. It would just add more confusion I am
> > afraid.
> 
> I kind of agree. As I said a DMA-buf could be backed by system memory or
> device memory.
> 
> In the case when it is backed by system memory it does make sense to report
> this in an OOM dump.
> 
> But only the exporting driver knows what the DMA-buf handle represents, the
> framework just provides the common ground for inter driver communication.

Then those drivers need to account for meminfo/oom report purposes.

> > > > See where I am heading?
> > > Yeah, totally. Thanks for pointing this out.
> > > 
> > > Suggestions how to handle that?
> > As I've pointed out in previous reply we do have an API to account per
> > node memory but now that you have brought up that this is not something
> > we account as a regular memory then this doesn't really fit into that
> > model. But maybe I am just confused.
> 
> Well does that API also has a counter for memory used by device drivers?

I think that "memory used by device drivers" is immaterial. The only
important thing is to account that memory where it makes sense. So for
RAM based allocations to report them via meminfo and find other way to
report device memory allocations.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 0/2 V6]Add dma-buf counter

2021-04-20 Thread Michal Hocko
On Tue 20-04-21 10:22:18, Peter Enderborg wrote:
> The dma-buf counter is a metric for mapped memory used by its clients.
> It is a shared buffer that is typically used for interprocess communication
> or process to hardware communication. In android we used to have ION, but
> it is now replaced with dma-buf. ION had some overview metrics that were
> similar.

The discussion around the previous version is still not over and as it
seems your proposed approach is not really viable. So please do not send
new versions until that is sorted out.

Thanks!
-- 
Michal Hocko
SUSE Labs


Re: [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Stone
Hi Marek,

On Mon, 19 Apr 2021 at 11:48, Marek Olšák  wrote:

> *2. Explicit synchronization for window systems and modesetting*
>
> The producer is an application and the consumer is a compositor or a
> modesetting driver.
>
> *2.1. The Present request*
>

So the 'present request' is an ioctl, right? Not a userspace construct like
it is today? If so, how do we correlate the two?

The terminology is pretty X11-centric so I'll assume that's what you've
designed against, but Wayland and even X11 carry much more auxiliary
information attached to a present request than just 'this buffer, this
swapchain'. Wayland latches a lot of data on presentation, including
non-graphics data such as surface geometry (so we can have resizes which
don't suck), window state (e.g. fullscreen or not, also so we can have
resizes which don't suck), and these requests can also cascade through a
tree of subsurfaces (so we can have embeds which don't suck). X11 mostly
just carries timestamps, which is more tractable.

Given we don't want to move the entirety of Wayland into kernel-visible
objects, how do we synchronise the two streams so they aren't incoherent?
Taking a rough stab at it whilst assuming we do have
DRM_IOCTL_NONMODE_PRESENT, this would create a present object somewhere in
kernel space, which the producer would create and ?? export a FD from, that
the compositor would ?? import.

As part of the Present request, the producer will pass 2 fences (sync
> objects) to the consumer alongside the presented DMABUF BO:
> - The submit fence: Initially unsignalled, it will be signalled when the
> producer has finished drawing into the presented buffer.
>

We have already have this in Wayland through dma_fence. I'm relaxed about
this becoming drm_syncobj or drm_newmappedysncobjthing, it's just a matter
of typing. X11 has patches to DRI3 to support dma_fence, but they never got
merged because it was far too invasive to a server which is no longer
maintained.


> - The return fence: Initially unsignalled, it will be signalled when the
> consumer has finished using the presented buffer.
>

Currently in Wayland the return fence (again a dma_fence) is generated by
the compositor and sent as an event when it's done, because we can't have
speculative/empty/future fences. drm_syncobj would make this possible, but
so far I've been hesitant because I don't see the benefit to it (more
below).


> Deadlock mitigation to recover from segfaults:
> - The kernel knows which process is obliged to signal which fence. This
> information is part of the Present request and supplied by userspace.
>

Same as today with dma_fence. Less true with drm_syncobj if we're using
timelines.


> - If the producer crashes, the kernel signals the submit fence, so that
> the consumer can make forward progress.
>

This is only a change if the producer is now allowed to submit a fence
before it's flushed the work which would eventually fulfill that fence.
Using dma_fence has so far isolated us from this.


> - If the consumer crashes, the kernel signals the return fence, so that
> the producer can reclaim the buffer.
>

'The consumer' is problematic, per below. I think the wording you want is
'if no references are held to the submitted present object'.


> - A GPU hang signals all fences. Other deadlocks will be handled like GPU
> hangs.
>
> Other window system requests can follow the same idea.
>

Which other window system requests did you have in mind? Again, moving the
entirety of Wayland's signaling into the kernel is a total non-starter.
Partly because it means our entire protocol would be subject to the
kernel's ABI rules, partly because the rules and interdependencies between
the requests are extremely complex, but mostly because the kernel is just a
useless proxy: it would be forced to do significant work to reason about
what those requests do and when they should happen, but wouldn't be able to
make those decisions itself so would have to just punt everything to
userspace. Unless we have eBPF compositors.


> Merged fences where one fence object contains multiple fences will be
> supported. A merged fence is signalled only when its fences are signalled.
> The consumer will have the option to redefine the unsignalled return fence
> to a merged fence.
>

An elaboration of how this differed from drm_syncobj would be really
helpful here. I can make some guesses based on the rest of the mail, but
I'm not sure how accurate they are.
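
For comparison, merging is something the existing explicit primitives already
provide. With sync_file it is a single ioctl, and the merged fd only signals
once all of its constituent fences have signalled; a drm_syncobj wait gets a
similar effect by waiting on several handles with WAIT_ALL. Sketch:

#include <sys/ioctl.h>
#include <linux/sync_file.h>

/* Merge two sync_file fds into a new one that signals only when both have
 * signalled. */
static int sync_file_merge(int fd1, int fd2)
{
	struct sync_merge_data data = {
		.name = "merged",
		.fd2 = fd2,
	};

	if (ioctl(fd1, SYNC_IOC_MERGE, &data) < 0)
		return -1;
	return data.fence;
}

So the interesting part of the proposal is what a merged fence adds beyond
this, not the merging itself.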


> *2.2. Modesetting*
>
> Since a modesetting driver can also be the consumer, the present ioctl
> will contain a submit fence and a return fence too. One small problem with
> this is that userspace can hang the modesetting driver, but in theory, any
> later present ioctl can override the previous one, so the unsignalled
> presentation is never used.
>

This is also problematic. It's not just KMS, but media codecs too - V4L
doesn't yet have explicit fencing, but given the programming model of
codecs and how deeply they interoperate, but 

Re: [PATCH v11 1/4] dt-bindings: gpu: mali-bifrost: Add Mediatek MT8183

2021-04-20 Thread Rob Herring
On Fri, Feb 5, 2021 at 9:02 PM Nicolas Boichat  wrote:
>
> On Sat, Feb 6, 2021 at 1:55 AM Rob Herring  wrote:
> >
> > On Tue, 26 Jan 2021 09:17:56 +0800, Nicolas Boichat wrote:
> > > Define a compatible string for the Mali Bifrost GPU found in
> > > Mediatek's MT8183 SoCs.
> > >
> > > Signed-off-by: Nicolas Boichat 
> > > ---
> > >
> > > Changes in v11:
> > >  - binding: power-domain-names not power-domainS-names
> > >
> > > Changes in v10:
> > >  - Fix the binding to make sure sram-supply property can be provided.
> > >
> > > Changes in v9: None
> > > Changes in v8: None
> > > Changes in v7: None
> > > Changes in v6:
> > >  - Rebased, actually tested with recent mesa driver.
> > >
> > > Changes in v5:
> > >  - Rename "2d" power domain to "core2"
> > >
> > > Changes in v4:
> > >  - Add power-domain-names description
> > >(kept Alyssa's reviewed-by as the change is minor)
> > >
> > > Changes in v3: None
> > > Changes in v2: None
> > >
> > >  .../bindings/gpu/arm,mali-bifrost.yaml| 28 +++
> > >  1 file changed, 28 insertions(+)
> > >
> >
> >
> > Please add Acked-by/Reviewed-by tags when posting new versions. However,
> > there's no need to repost patches *only* to add the tags. The upstream
> > maintainer will do that for acks received on the version they apply.
> >
> > If a tag was not added on purpose, please state why and what changed.
>
> There were changes in v11, I thought you'd want to review again?

Looked like a minor change from the changelog, so it would have been
appropriate to keep. However, I see another issue.

Rob


Re: [PATCH v11 1/4] dt-bindings: gpu: mali-bifrost: Add Mediatek MT8183

2021-04-20 Thread Rob Herring
On Mon, Jan 25, 2021 at 7:18 PM Nicolas Boichat  wrote:
>
> Define a compatible string for the Mali Bifrost GPU found in
> Mediatek's MT8183 SoCs.
>
> Signed-off-by: Nicolas Boichat 
> ---
>
> Changes in v11:
>  - binding: power-domain-names not power-domainS-names
>
> Changes in v10:
>  - Fix the binding to make sure sram-supply property can be provided.
>
> Changes in v9: None
> Changes in v8: None
> Changes in v7: None
> Changes in v6:
>  - Rebased, actually tested with recent mesa driver.
>
> Changes in v5:
>  - Rename "2d" power domain to "core2"
>
> Changes in v4:
>  - Add power-domain-names description
>(kept Alyssa's reviewed-by as the change is minor)
>
> Changes in v3: None
> Changes in v2: None
>
>  .../bindings/gpu/arm,mali-bifrost.yaml| 28 +++
>  1 file changed, 28 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml 
> b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> index 184492162e7e..3e758f88e2cd 100644
> --- a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> +++ b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml
> @@ -17,6 +17,7 @@ properties:
>  items:
>- enum:
>- amlogic,meson-g12a-mali
> +  - mediatek,mt8183-mali
>- realtek,rtd1619-mali
>- rockchip,px30-mali
>- const: arm,mali-bifrost # Mali Bifrost GPU model/revision is fully 
> discoverable
> @@ -41,6 +42,8 @@ properties:
>
>mali-supply: true
>
> +  sram-supply: true
> +
>operating-points-v2: true
>
>power-domains:
> @@ -87,6 +90,31 @@ allOf:
>  then:
>required:
>  - resets
> +  - if:
> +  properties:
> +compatible:
> +  contains:
> +const: mediatek,mt8183-mali
> +then:
> +  properties:
> +power-domains:
> +  description:
> +List of phandle and PM domain specifier as documented in
> +Documentation/devicetree/bindings/power/power_domain.txt
> +  minItems: 3
> +  maxItems: 3

This won't work because the top level schema restricts this to 1. The
top level needs to say:

power-domains:
  minItems: 1
  maxItems: 3

And you need just 'minItems: 3' here and 'maxItems: 1' in the else clause.

And drop the description. That's every 'power-domains' property.

> +power-domain-names:
> +  items:
> +- const: core0
> +- const: core1
> +- const: core2

Blank line

> +  required:
> +- sram-supply
> +- power-domains
> +- power-domain-names
> +else:
> +  properties:
> +sram-supply: false
>
>  examples:
>- |
> --
> 2.30.0.280.ga3ce27912f-goog
>


Re: [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Stone
Hi,

On Tue, 20 Apr 2021 at 13:01, Daniel Vetter  wrote:

> - We live in a post xf86-video-$vendor world, and all these other
>   compositors rely on implicit sync. You're not going to be able to get
>   rid of them anytime soon. What's worse, all the various EGL/vk buffer
>   sharing things also rely on implicit sync, so you get to fix up tons of
>   applications on top. Any plan that's realistic needs to cope with
>   implicit/explicit at the same time together won't work.
>
> - Absolute infuriating, but you can't use page-faulting together with any
>   dma_fence synchronization primitives, whether implicit or explicit. This
>   means until the entire ecosystem moved forward (good luck with that) we
>   have to support dma_fence. The only sync model that works together with
>   page faults is userspace fence based sync.
>
> This should get rid of the oversync issues, and since implicit sync is
> backed in everywhere right now, you'll have to deal with implicit sync for
> a very long time.
>

Depends what you mean by 'implicit sync'. ;)

Getting userspace (Vulkan WSI, EGL, Wayland compositors, browsers, media
clients) over to explicit sync is easy, _provided_ that the explicit sync
gives us the same guarantees as implicit sync, i.e. completes in bounded
time, GPU/display work can be flushed to the kernel predicated on fence
completion with the kernel handling synchronisation and scheduling. It's
just a matter of typing, and until now we haven't had a great reason to do
that typing. Now we do have that reason, so we are implementing it. Whether
it's dma_fence or drm_syncobj is mostly immaterial; we can encode in
protocol requirements that you can't try to use wait-before-signal with
drm_syncobj and you'll get killed if you try.
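
As a concrete example of the "bounded time" requirement, the compositor-side
wait can simply be given a deadline and a fallback; a rough sketch using
today's drm_syncobj wait (names here are illustrative, the timeout is an
absolute CLOCK_MONOTONIC value in nanoseconds):

#include <stdint.h>
#include <xf86drm.h>

/* Wait for the client's fence, but never past the given deadline. If this
 * returns non-zero the compositor repaints with the previous frame instead
 * of missing its own deadline. WAIT_FOR_SUBMIT tolerates a fence point that
 * has a handle but no submitted work behind it yet. */
static int wait_for_client_fence(int drm_fd, uint32_t syncobj,
				 int64_t deadline_ns)
{
	uint32_t first;

	return drmSyncobjWait(drm_fd, &syncobj, 1, deadline_ns,
			      DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, &first);
}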

Getting that userspace over to fully userspace-based sync
(wait-before-signal or wait-never-signal, no kernel assistance but you just
have to roll your own polling or signal handling on either CPU or GPU side)
is not easy. It might never happen, because it's an extraordinary amount of
work, introduces a huge amount of fragility into a super-critical path, and
and so far it's not clear that it's a global performance improvement for
the whole system, just shifting performance problems from kernel to
userspace, and probably (AFAICT) making them worse in addition to the other
problems it brings.

What am I missing?

Cheers,
Daniel


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Michal Hocko
On Tue 20-04-21 09:02:57, peter.enderb...@sony.com wrote:
> 
> >> But that isn't really system memory at all, it's just allocated device
> >> memory.
> > OK, that was not really clear to me. So this is not really accounted to
> > MemTotal? If that is really the case then reporting it into the oom
> > report is completely pointless and I am not even sure /proc/meminfo is
> > the right interface either. It would just add more confusion I am
> > afraid.
> >  
> 
> Why is it confusing? Documentation is quite clear:

Because a single counter without a wider context cannot be put into any
reasonable context. There is no notion of the total amount of device
memory usable for dma-buf. As Christian explained some of it can be RAM
based. So a single number is rather pointless on its own in many cases.

Or let me just ask. What can you tell from dma-bud: $FOO kB in its
current form?
-- 
Michal Hocko
SUSE Labs


[PATCH 1/5] drm/i915: Create stolen memory region from local memory

2021-04-20 Thread Matthew Auld
From: CQ Tang 

Add "REGION_STOLEN" device info to dg1, create stolen memory
region from upper portion of local device memory, starting
from DSMBASE.

v2:
- s/drm_info/drm_dbg; userspace likely doesn't care about stolen.
- mem->type is only setup after the region probe, so setting the name
  as stolen-local or stolen-system based on this value won't work. Split
  system vs local stolen setup to fix this.
- kill all the region->devmem/is_devmem stuff. We already differentiate
  the different types of stolen so such things shouldn't be needed
  anymore.
v3:
- split stolen lmem vs smem ops(Tvrtko)
- add shortcut for stolen region in i915(Tvrtko)
- sanity check dsm base vs bar size(Xinyun)

Signed-off-by: CQ Tang 
Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
Cc: Xinyun Liu 
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 137 ++---
 drivers/gpu/drm/i915/i915_drv.h|   7 ++
 drivers/gpu/drm/i915/i915_pci.c|   2 +-
 drivers/gpu/drm/i915/i915_reg.h|   1 +
 drivers/gpu/drm/i915/intel_memory_region.c |   8 ++
 drivers/gpu/drm/i915/intel_memory_region.h |   5 +-
 6 files changed, 140 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index b0597de206de..2ed1ca9aec75 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 
+#include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_region.h"
 #include "i915_drv.h"
 #include "i915_gem_stolen.h"
@@ -121,6 +122,14 @@ static int i915_adjust_stolen(struct drm_i915_private 
*i915,
}
}
 
+   /*
+* With device local memory, we don't need to check the address range,
+* this is device memory physical address, could overlap with system
+* memory.
+*/
+   if (HAS_LMEM(i915))
+   return 0;
+
/*
 * Verify that nothing else uses this physical address. Stolen
 * memory should be reserved by the BIOS and hidden from the
@@ -374,8 +383,9 @@ static void icl_get_stolen_reserved(struct drm_i915_private 
*i915,
}
 }
 
-static int i915_gem_init_stolen(struct drm_i915_private *i915)
+static int i915_gem_init_stolen(struct intel_memory_region *mem)
 {
+   struct drm_i915_private *i915 = mem->i915;
struct intel_uncore *uncore = &i915->uncore;
resource_size_t reserved_base, stolen_top;
resource_size_t reserved_total, reserved_size;
@@ -396,10 +406,10 @@ static int i915_gem_init_stolen(struct drm_i915_private 
*i915)
return 0;
}
 
-   if (resource_size(&intel_graphics_stolen_res) == 0)
+   if (resource_size(&mem->region) == 0)
return 0;
 
-   i915->dsm = intel_graphics_stolen_res;
+   i915->dsm = mem->region;
 
if (i915_adjust_stolen(i915, &i915->dsm))
return 0;
@@ -688,39 +698,130 @@ struct drm_i915_gem_object *
 i915_gem_object_create_stolen(struct drm_i915_private *i915,
  resource_size_t size)
 {
-   return 
i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_STOLEN_SMEM],
+   return i915_gem_object_create_region(i915->mm.stolen_region,
 size, I915_BO_ALLOC_CONTIGUOUS);
 }
 
-static int init_stolen(struct intel_memory_region *mem)
+static int init_stolen_smem(struct intel_memory_region *mem)
 {
-   intel_memory_region_set_name(mem, "stolen");
-
/*
 * Initialise stolen early so that we may reserve preallocated
 * objects for the BIOS to KMS transition.
 */
-   return i915_gem_init_stolen(mem->i915);
+   return i915_gem_init_stolen(mem);
+}
+
+static void release_stolen_smem(struct intel_memory_region *mem)
+{
+   i915_gem_cleanup_stolen(mem->i915);
+}
+
+static const struct intel_memory_region_ops i915_region_stolen_smem_ops = {
+   .init = init_stolen_smem,
+   .release = release_stolen_smem,
+   .init_object = _i915_gem_object_stolen_init,
+};
+
+static int init_stolen_lmem(struct intel_memory_region *mem)
+{
+   int err;
+
+   if (GEM_WARN_ON(resource_size(&mem->region) == 0))
+   return -ENODEV;
+
+   if (!io_mapping_init_wc(&mem->iomap,
+   mem->io_start,
+   resource_size(&mem->region)))
+   return -EIO;
+
+   /*
+* For stolen lmem we mostly just care about populating the dsm related
+* bits and setting up the drm_mm allocator for the range.
+*/
+   err = i915_gem_init_stolen(mem);
+   if (err)
+   goto err_fini;
+
+   return 0;
+
+err_fini:
+   io_mapping_fini(&mem->iomap);
+   return err;
 }
 
-static void release_stolen(struct intel_memory_region *mem)
+static void release_stolen_lmem(struct intel_memory_region *mem)

[PATCH 3/5] drm/i915/stolen: enforce the min_page_size contract

2021-04-20 Thread Matthew Auld
From: CQ Tang 

Since stolen can now be device local-memory underneath, we should try to
enforce any min_page_size restrictions when allocating pages.

Signed-off-by: CQ Tang 
Signed-off-by: Matthew Auld 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 2ed1ca9aec75..4f9fe5aca37e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -677,7 +677,8 @@ static int _i915_gem_object_stolen_init(struct 
intel_memory_region *mem,
if (!stolen)
return -ENOMEM;
 
-   ret = i915_gem_stolen_insert_node(i915, stolen, size, 4096);
+   ret = i915_gem_stolen_insert_node(i915, stolen, size,
+ mem->min_page_size);
if (ret)
goto err_free;
 
@@ -843,8 +844,8 @@ i915_gem_object_create_stolen_for_preallocated(struct 
drm_i915_private *i915,
 
/* KISS and expect everything to be page-aligned */
if (GEM_WARN_ON(size == 0) ||
-   GEM_WARN_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE)) ||
-   GEM_WARN_ON(!IS_ALIGNED(stolen_offset, I915_GTT_MIN_ALIGNMENT)))
+   GEM_WARN_ON(!IS_ALIGNED(size, mem->min_page_size)) ||
+   GEM_WARN_ON(!IS_ALIGNED(stolen_offset, mem->min_page_size)))
return ERR_PTR(-EINVAL);
 
stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
-- 
2.26.3



[PATCH 2/5] drm/i915/stolen: treat stolen local as normal local memory

2021-04-20 Thread Matthew Auld
Underneath it's the same stuff, so things like the PTE_LM bits for the
GTT should just keep working as-is.

Signed-off-by: Matthew Auld 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index ce1c83c13d05..017db8f71130 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -19,7 +19,10 @@ const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = 
{
 
 bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj)
 {
-   return obj->ops == &i915_gem_lmem_obj_ops;
+   struct intel_memory_region *mr = obj->mm.region;
+
+   return mr && (mr->type == INTEL_MEMORY_LOCAL ||
+ mr->type == INTEL_MEMORY_STOLEN_LOCAL);
 }
 
 struct drm_i915_gem_object *
-- 
2.26.3



[PATCH 4/5] drm/i915/stolen: pass the allocation flags

2021-04-20 Thread Matthew Auld
From: CQ Tang 

Stolen memory is always allocated as physically contiguous pages, mark
the object flags as such.

v2: move setting I915_BO_ALLOC_CONTIGUOUS into create_stolen

Signed-off-by: CQ Tang 
Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 4f9fe5aca37e..46f79b240df7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -633,14 +633,21 @@ static const struct drm_i915_gem_object_ops 
i915_gem_object_stolen_ops = {
 
 static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,
   struct drm_i915_gem_object *obj,
-  struct drm_mm_node *stolen)
+  struct drm_mm_node *stolen,
+  unsigned int flags)
 {
static struct lock_class_key lock_class;
unsigned int cache_level;
int err;
 
+   /*
+* Stolen objects are always physically contiguous since we just
+* allocate one big block underneath using the drm_mm range allocator.
+*/
+   flags |= I915_BO_ALLOC_CONTIGUOUS;
+
drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size);
-   i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, 0);
+   i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, 
flags);
 
obj->stolen = stolen;
obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
@@ -682,7 +689,7 @@ static int _i915_gem_object_stolen_init(struct 
intel_memory_region *mem,
if (ret)
goto err_free;
 
-   ret = __i915_gem_object_create_stolen(mem, obj, stolen);
+   ret = __i915_gem_object_create_stolen(mem, obj, stolen, flags);
if (ret)
goto err_remove;
 
@@ -700,7 +707,7 @@ i915_gem_object_create_stolen(struct drm_i915_private *i915,
  resource_size_t size)
 {
return i915_gem_object_create_region(i915->mm.stolen_region,
-size, I915_BO_ALLOC_CONTIGUOUS);
+size, 0);
 }
 
 static int init_stolen_smem(struct intel_memory_region *mem)
@@ -866,7 +873,7 @@ i915_gem_object_create_stolen_for_preallocated(struct 
drm_i915_private *i915,
goto err_stolen;
}
 
-   ret = __i915_gem_object_create_stolen(mem, obj, stolen);
+   ret = __i915_gem_object_create_stolen(mem, obj, stolen, 0);
if (ret)
goto err_object_free;
 
-- 
2.26.3



[PATCH 5/5] drm/i915/lmem: Fail driver init if LMEM training failed

2021-04-20 Thread Matthew Auld
From: Matt Roper 

Boot firmware performs memory training and health assessment during
startup.  If the memory training fails, the firmware will consider the
GPU unusable and will instruct the punit to keep the GT powered down.
If this happens, our driver will be unable to communicate with the GT
(all GT registers will read back as 0, forcewake requests will timeout,
etc.) so we should abort driver initialization if this happens.  We can
confirm that LMEM was initialized successfully via sgunit register
GU_CNTL.

Bspec: 53111
Signed-off-by: Matt Roper 
Cc: Caz Yokoyama 
Reviewed-by: Matthew Auld 
Signed-off-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_reg.h |  3 +++
 drivers/gpu/drm/i915/intel_uncore.c | 12 
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index ea20058bc13f..e7d78b10c226 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -487,6 +487,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define GAB_CTL_MMIO(0x24000)
 #define   GAB_CTL_CONT_AFTER_PAGEFAULT (1 << 8)
 
+#define GU_CNTL_MMIO(0x101010)
+#define   LMEM_INITREG_BIT(7)
+
 #define GEN6_STOLEN_RESERVED   _MMIO(0x1082C0)
 #define GEN6_STOLEN_RESERVED_ADDR_MASK (0xFFF << 20)
 #define GEN7_STOLEN_RESERVED_ADDR_MASK (0x3FFF << 18)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
b/drivers/gpu/drm/i915/intel_uncore.c
index ed5abe7be498..b4aaf8b7109f 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1917,6 +1917,18 @@ int intel_uncore_init_mmio(struct intel_uncore *uncore)
if (ret)
return ret;
 
+   /*
+* The boot firmware initializes local memory and assesses its health.
+* If memory training fails, the punit will have been instructed to
+* keep the GT powered down; we won't be able to communicate with it
+* and we should not continue with driver initialization.
+*/
+   if (IS_DGFX(i915) &&
+   !(__raw_uncore_read32(uncore, GU_CNTL) & LMEM_INIT)) {
+   drm_err(&i915->drm, "LMEM not initialized by firmware\n");
+   return -ENODEV;
+   }
+
if (INTEL_GEN(i915) > 5 && !intel_vgpu_active(i915))
uncore->flags |= UNCORE_HAS_FORCEWAKE;
 
-- 
2.26.3



[PATCH 2/4] drm/panel: boe-tv101wum-n16 separate the panel power control

2021-04-20 Thread Jitao Shi
Separate the panel power control from prepare/unprepare.

Signed-off-by: Jitao Shi 
---
 .../gpu/drm/panel/panel-boe-tv101wum-nl6.c| 72 +--
 1 file changed, 50 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c 
b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
index db9d0b86d542..dc49079a74d1 100644
--- a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
+++ b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
@@ -50,6 +50,7 @@ struct boe_panel {
struct regulator *avdd;
struct gpio_desc *enable_gpio;
 
+   bool prepared_power;
bool prepared;
 };
 
@@ -488,22 +489,13 @@ static int boe_panel_enter_sleep_mode(struct boe_panel 
*boe)
return 0;
 }
 
-static int boe_panel_unprepare(struct drm_panel *panel)
+static int boe_panel_unprepare_power(struct drm_panel *panel)
 {
struct boe_panel *boe = to_boe_panel(panel);
-   int ret;
 
-   if (!boe->prepared)
+   if (!boe->prepared_power)
return 0;
 
-   ret = boe_panel_enter_sleep_mode(boe);
-   if (ret < 0) {
-   dev_err(panel->dev, "failed to set panel off: %d\n", ret);
-   return ret;
-   }
-
-   msleep(150);
-
if (boe->desc->discharge_on_disable) {
regulator_disable(boe->avee);
regulator_disable(boe->avdd);
@@ -512,6 +504,7 @@ static int boe_panel_unprepare(struct drm_panel *panel)
usleep_range(5000, 7000);
regulator_disable(boe->pp1800);
} else {
+   msleep(150);
gpiod_set_value(boe->enable_gpio, 0);
usleep_range(500, 1000);
regulator_disable(boe->avee);
@@ -520,17 +513,39 @@ static int boe_panel_unprepare(struct drm_panel *panel)
regulator_disable(boe->pp1800);
}
 
+   boe->prepared_power = false;
+
+   return 0;
+}
+
+static int boe_panel_unprepare(struct drm_panel *panel)
+{
+   struct boe_panel *boe = to_boe_panel(panel);
+   int ret;
+
+   if (!boe->prepared)
+   return 0;
+
+   if (!boe->desc->discharge_on_disable) {
+   ret = boe_panel_enter_sleep_mode(boe);
+   if (ret < 0) {
+   dev_err(panel->dev, "failed to set panel off: %d\n",
+   ret);
+   return ret;
+   }
+   }
+
boe->prepared = false;
 
return 0;
 }
 
-static int boe_panel_prepare(struct drm_panel *panel)
+static int boe_panel_prepare_power(struct drm_panel *panel)
 {
struct boe_panel *boe = to_boe_panel(panel);
int ret;
 
-   if (boe->prepared)
+   if (boe->prepared_power)
return 0;
 
gpiod_set_value(boe->enable_gpio, 0);
@@ -558,18 +573,10 @@ static int boe_panel_prepare(struct drm_panel *panel)
gpiod_set_value(boe->enable_gpio, 1);
usleep_range(6000, 1);
 
-   ret = boe_panel_init_dcs_cmd(boe);
-   if (ret < 0) {
-   dev_err(panel->dev, "failed to init panel: %d\n", ret);
-   goto poweroff;
-   }
-
-   boe->prepared = true;
+   boe->prepared_power = true;
 
return 0;
 
-poweroff:
-   regulator_disable(boe->avee);
 poweroffavdd:
regulator_disable(boe->avdd);
 poweroff1v8:
@@ -580,6 +587,25 @@ static int boe_panel_prepare(struct drm_panel *panel)
return ret;
 }
 
+static int boe_panel_prepare(struct drm_panel *panel)
+{
+   struct boe_panel *boe = to_boe_panel(panel);
+   int ret;
+
+   if (boe->prepared)
+   return 0;
+
+   ret = boe_panel_init_dcs_cmd(boe);
+   if (ret < 0) {
+   dev_err(panel->dev, "failed to init panel: %d\n", ret);
+   return ret;
+   }
+
+   boe->prepared = true;
+
+   return 0;
+}
+
 static int boe_panel_enable(struct drm_panel *panel)
 {
msleep(130);
@@ -749,7 +775,9 @@ static int boe_panel_get_modes(struct drm_panel *panel,
 
 static const struct drm_panel_funcs boe_panel_funcs = {
.unprepare = boe_panel_unprepare,
+   .unprepare_power = boe_panel_unprepare_power,
.prepare = boe_panel_prepare,
+   .prepare_power = boe_panel_prepare_power,
.enable = boe_panel_enable,
.get_modes = boe_panel_get_modes,
 };
-- 
2.25.1


[PATCH 4/4] drm/mediatek: add dsi module reset driver

2021-04-20 Thread Jitao Shi
Reset the dsi HW to its defaults at power-on, to prevent the settings
from differing between bootloader and kernel.

Signed-off-by: Jitao Shi 
---
 drivers/gpu/drm/mediatek/mtk_dsi.c | 36 +-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c 
b/drivers/gpu/drm/mediatek/mtk_dsi.c
index 455fe582c6b5..113438ddd4cc 100644
--- a/drivers/gpu/drm/mediatek/mtk_dsi.c
+++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
@@ -7,10 +7,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -143,6 +145,8 @@
 #define DATA_0 (0xff << 16)
 #define DATA_1 (0xff << 24)
 
+#define MMSYS_SW_RST_DSI_B BIT(25)
+
 #define NS_TO_CYCLE(n, c)((n) / (c) + (((n) % (c)) ? 1 : 0))
 
 #define MTK_DSI_HOST_IS_READ(type) \
@@ -186,7 +190,8 @@ struct mtk_dsi {
struct drm_bridge *next_bridge;
struct drm_connector *connector;
struct phy *phy;
-
+   struct regmap *mmsys_sw_rst_b;
+   u32 sw_rst_b;
void __iomem *regs;
 
struct clk *engine_clk;
@@ -272,6 +277,16 @@ static void mtk_dsi_disable(struct mtk_dsi *dsi)
mtk_dsi_mask(dsi, DSI_CON_CTRL, DSI_EN, 0);
 }
 
+static void mtk_dsi_reset_all(struct mtk_dsi *dsi)
+{
+   regmap_update_bits(dsi->mmsys_sw_rst_b, dsi->sw_rst_b,
+  MMSYS_SW_RST_DSI_B, 0);
+   usleep_range(1000, 1100);
+
+   regmap_update_bits(dsi->mmsys_sw_rst_b, dsi->sw_rst_b,
+  MMSYS_SW_RST_DSI_B, MMSYS_SW_RST_DSI_B);
+}
+
 static void mtk_dsi_reset_engine(struct mtk_dsi *dsi)
 {
mtk_dsi_mask(dsi, DSI_CON_CTRL, DSI_RESET, DSI_RESET);
@@ -985,6 +1000,8 @@ static int mtk_dsi_bind(struct device *dev, struct device 
*master, void *data)
 
ret = mtk_dsi_encoder_init(drm, dsi);
 
+   mtk_dsi_reset_all(dsi);
+
return ret;
 }
 
@@ -1007,6 +1024,7 @@ static int mtk_dsi_probe(struct platform_device *pdev)
struct device *dev = &pdev->dev;
struct drm_panel *panel;
struct resource *regs;
+   struct regmap *regmap;
int irq_num;
int ret;
 
@@ -1022,6 +1040,22 @@ static int mtk_dsi_probe(struct platform_device *pdev)
return ret;
}
 
+   regmap = syscon_regmap_lookup_by_phandle(dev->of_node,
+"mediatek,syscon-dsi");
+   ret = of_property_read_u32_index(dev->of_node, "mediatek,syscon-dsi", 1,
+&dsi->sw_rst_b);
+
+   if (IS_ERR(regmap))
+   ret = PTR_ERR(regmap);
+
+   if (ret) {
+   ret = PTR_ERR(regmap);
+   dev_err(dev, "Failed to get mmsys registers: %d\n", ret);
+   return ret;
+   }
+
+   dsi->mmsys_sw_rst_b = regmap;
+
ret = drm_of_find_panel_or_bridge(dev->of_node, 0, 0,
  &panel, &dsi->next_bridge);
if (ret)
-- 
2.25.1


[PATCH 1/4] drm/panel: separate panel power control from panel prepare/unprepare

2021-04-20 Thread Jitao Shi
Some dsi panels require the dsi lanes to stay low until the panel is
powered on, so separate the panel power control from the communication
with the panel.

Put the power control in drm_panel_prepare_power and
drm_panel_unprepare_power, and the communication with the panel in
drm_panel_prepare and drm_panel_unprepare.

Signed-off-by: Jitao Shi 
---
 drivers/gpu/drm/bridge/panel.c | 17 +++
 drivers/gpu/drm/drm_panel.c| 38 ++
 include/drm/drm_bridge.h   |  2 ++
 include/drm/drm_panel.h| 17 +++
 4 files changed, 74 insertions(+)

diff --git a/drivers/gpu/drm/bridge/panel.c b/drivers/gpu/drm/bridge/panel.c
index 0ddc37551194..a19c96e710fc 100644
--- a/drivers/gpu/drm/bridge/panel.c
+++ b/drivers/gpu/drm/bridge/panel.c
@@ -125,6 +125,23 @@ static int panel_bridge_get_modes(struct drm_bridge 
*bridge,
return drm_panel_get_modes(panel_bridge->panel, connector);
 }
 
+int panel_bridge_prepare_power(struct drm_bridge *bridge)
+{
+   struct panel_bridge *panel_bridge = drm_bridge_to_panel_bridge(bridge);
+
+   return drm_panel_prepare_power(panel_bridge->panel);
+}
+EXPORT_SYMBOL(panel_bridge_prepare_power);
+
+int panel_bridge_unprepare_power(struct drm_bridge *bridge)
+{
+struct panel_bridge *panel_bridge = drm_bridge_to_panel_bridge(bridge);
+
+return drm_panel_unprepare_power(panel_bridge->panel);
+}
+EXPORT_SYMBOL(panel_bridge_unprepare_power);
+
+
 static const struct drm_bridge_funcs panel_bridge_bridge_funcs = {
.attach = panel_bridge_attach,
.detach = panel_bridge_detach,
diff --git a/drivers/gpu/drm/drm_panel.c b/drivers/gpu/drm/drm_panel.c
index f634371c717a..7bb5185db17d 100644
--- a/drivers/gpu/drm/drm_panel.c
+++ b/drivers/gpu/drm/drm_panel.c
@@ -115,6 +115,24 @@ int drm_panel_prepare(struct drm_panel *panel)
 }
 EXPORT_SYMBOL(drm_panel_prepare);
 
+/**
+ * drm_panel_prepare_power - power on a panel's power
+ * @panel: DRM panel
+ *
+ * Calling this function will enable power and deassert any reset signals to
+ * the panel.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int drm_panel_prepare_power(struct drm_panel *panel)
+{
+   if (panel && panel->funcs && panel->funcs->prepare_power)
+   return panel->funcs->prepare_power(panel);
+
+   return panel ? -ENOSYS : -EINVAL;
+}
+EXPORT_SYMBOL(drm_panel_prepare_power);
+
 /**
  * drm_panel_unprepare - power off a panel
  * @panel: DRM panel
@@ -138,6 +156,26 @@ int drm_panel_unprepare(struct drm_panel *panel)
 }
 EXPORT_SYMBOL(drm_panel_unprepare);
 
+/**
+ * drm_panel_unprepare_power - power off a panel
+ * @panel: DRM panel
+ *
+ * Calling this function will completely power off a panel (assert the panel's
+ * reset, turn off power supplies, ...). After this function has completed, it
+ * is usually no longer possible to communicate with the panel until another
+ * call to drm_panel_prepare_power and drm_panel_prepare().
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int drm_panel_unprepare_power(struct drm_panel *panel)
+{
+   if (panel && panel->funcs && panel->funcs->unprepare_power)
+   return panel->funcs->unprepare_power(panel);
+
+   return panel ? -ENOSYS : -EINVAL;
+}
+EXPORT_SYMBOL(drm_panel_unprepare_power);
+
 /**
  * drm_panel_enable - enable a panel
  * @panel: DRM panel
diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
index 2195daa289d2..cc94c9da47d8 100644
--- a/include/drm/drm_bridge.h
+++ b/include/drm/drm_bridge.h
@@ -892,6 +892,8 @@ struct drm_bridge *devm_drm_panel_bridge_add_typed(struct 
device *dev,
   struct drm_panel *panel,
   u32 connector_type);
 struct drm_connector *drm_panel_bridge_connector(struct drm_bridge *bridge);
+int panel_bridge_prepare_power(struct drm_bridge *bridge);
+int panel_bridge_unprepare_power(struct drm_bridge *bridge);
 #endif
 
 #endif
diff --git a/include/drm/drm_panel.h b/include/drm/drm_panel.h
index 33605c3f0eba..48e83712ad44 100644
--- a/include/drm/drm_panel.h
+++ b/include/drm/drm_panel.h
@@ -68,6 +68,13 @@ enum drm_panel_orientation;
  * functionality to enable/disable backlight.
  */
 struct drm_panel_funcs {
+   /**
+* @prepare_power:
+*
+* Turn on panel power.
+*/
+   int (*prepare_power)(struct drm_panel *panel);
+
/**
 * @prepare:
 *
@@ -115,6 +122,13 @@ struct drm_panel_funcs {
int (*get_modes)(struct drm_panel *panel,
 struct drm_connector *connector);
 
+   /**
+* @unprepare_power:
+*
+* Turn off panel_power.
+*/
+   int (*unprepare_power)(struct drm_panel *panel);
+
/**
 * @get_timings:
 *
@@ -180,6 +194,9 @@ void drm_panel_init(struct drm_panel *panel, struct device 
*dev,
 void drm_panel_add(struct drm_panel *panel);
 void d
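
(As a rough illustration of how a panel driver could split its hooks along
these lines, the sketch below fills in only the two new power callbacks.
The mypanel structure, its regulator/GPIO fields and the delay are
hypothetical and not taken from the patch above; only the
.prepare_power/.unprepare_power callback names are.)

#include <linux/delay.h>
#include <linux/gpio/consumer.h>
#include <linux/regulator/consumer.h>
#include <drm/drm_panel.h>

struct mypanel {
        struct drm_panel panel;
        struct regulator *supply;
        struct gpio_desc *reset_gpio;
};

static int mypanel_prepare_power(struct drm_panel *panel)
{
        struct mypanel *ctx = container_of(panel, struct mypanel, panel);
        int ret;

        /* Power up while the DSI host still holds the lanes low. */
        ret = regulator_enable(ctx->supply);
        if (ret)
                return ret;

        gpiod_set_value_cansleep(ctx->reset_gpio, 0);
        msleep(10);
        return 0;
}

static int mypanel_unprepare_power(struct drm_panel *panel)
{
        struct mypanel *ctx = container_of(panel, struct mypanel, panel);

        gpiod_set_value_cansleep(ctx->reset_gpio, 1);
        regulator_disable(ctx->supply);
        return 0;
}

static const struct drm_panel_funcs mypanel_funcs = {
        .prepare_power   = mypanel_prepare_power,
        /* .prepare would send the DCS init sequence once the host link is up. */
        .unprepare_power = mypanel_unprepare_power,
};

(With such a split, a DSI host driver can call drm_panel_prepare_power()
while its lanes are still low and only call drm_panel_prepare() once the
link is up.)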

[PATCH 3/4] drm/mediatek: fine tune the dsi panel's power sequence

2021-04-20 Thread Jitao Shi
Add the drm_panel_prepare_power and drm_panel_unprepare_power calls.
Turn on the panel power and control signals (drm_panel_prepare_power)
before enabling DSI. Then enable DSI, send the DCS commands in
drm_panel_prepare, and finally turn on the backlight.

Signed-off-by: Jitao Shi 
---
 drivers/gpu/drm/mediatek/mtk_dsi.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c 
b/drivers/gpu/drm/mediatek/mtk_dsi.c
index a1ff152ef468..455fe582c6b5 100644
--- a/drivers/gpu/drm/mediatek/mtk_dsi.c
+++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
@@ -615,10 +615,13 @@ static int mtk_dsi_poweron(struct mtk_dsi *dsi)
dsi->data_rate = DIV_ROUND_UP_ULL(dsi->vm.pixelclock * bit_per_pixel,
  dsi->lanes);
 
+   if (panel_bridge_prepare_power(dsi->next_bridge))
+   DRM_INFO("can't prepare power the panel\n");
+
ret = clk_set_rate(dsi->hs_clk, dsi->data_rate);
if (ret < 0) {
dev_err(dev, "Failed to set data rate: %d\n", ret);
-   goto err_refcount;
+   goto err_prepare_power;
}
 
phy_power_on(dsi->phy);
@@ -661,7 +664,9 @@ static int mtk_dsi_poweron(struct mtk_dsi *dsi)
clk_disable_unprepare(dsi->engine_clk);
 err_phy_power_off:
phy_power_off(dsi->phy);
-err_refcount:
+err_prepare_power:
+   if (panel_bridge_unprepare_power(dsi->next_bridge))
+   DRM_INFO("Can't unprepare power the panel\n");
dsi->refcount--;
return ret;
 }
@@ -694,6 +699,9 @@ static void mtk_dsi_poweroff(struct mtk_dsi *dsi)
clk_disable_unprepare(dsi->digital_clk);
 
phy_power_off(dsi->phy);
+
+   if (panel_bridge_unprepare_power(dsi->next_bridge))
+   DRM_INFO("Can't unprepare power the panel\n");
 }
 
 static void mtk_output_dsi_enable(struct mtk_dsi *dsi)
-- 
2.25.1
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Daniel Stone
Hi Peter,

On Fri, 16 Apr 2021 at 13:34, Peter Enderborg 
wrote:

> This adds a total used dma-buf memory. Details
> can be found in debugfs, however it is not for everyone
> and not always available. dma-buf are indirect allocated by
> userspace. So with this value we can monitor and detect
> userspace applications that have problems.
>

FWIW, this won't work super well for Android where gralloc is implemented
as a system service, so all graphics usage will instantly be accounted to
it.

Cheers,
Daniel
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [PULL] topic/intel-gen-to-ver -> drm-intel-next and drm-intel-gt-next

2021-04-20 Thread Rodrigo Vivi
On Tue, Apr 20, 2021 at 12:47:36PM +0300, Joonas Lahtinen wrote:
> Quoting Jani Nikula (2021-04-19 12:53:11)
> > 
> > Hi Joonas and Rodrigo -
> > 
> > Here's the gen to ver conversion topic branch to be merged to both
> > drm-intel-next and drm-intel-gt-next.
> 
> Pulled.

Pulled, thanks.
(Sorry for the delay on the notification here)

> 
> Regards, Joonas
> 
> > Lots of Cc's for heads up.
> > 
> > 
> > BR,
> > Jani.
> > 
> > 
> > topic/intel-gen-to-ver-2021-04-19:
> > Gen to ver conversions across the driver
> > 
> > The main change is Lucas' series [1], with Ville's GLK fixes [2] and a
> > cherry-pick of Matt's commit [3] from drm-intel-next as a base to avoid
> > conflicts.
> > 
> > [1] https://patchwork.freedesktop.org/series/88825/
> > [2] https://patchwork.freedesktop.org/series/88938/
> > [3] 70bfb30743d5 ("drm/i915/display: Eliminate IS_GEN9_{BC,LP}")
> > 
> > BR,
> > Jani.
> > 
> > The following changes since commit 9c0fed84d5750e1eea6c664e073ffa2534a17743:
> > 
> >   Merge tag 'drm-intel-next-2021-04-01' of 
> > git://anongit.freedesktop.org/drm/drm-intel into drm-next (2021-04-08 
> > 14:02:21 +1000)
> > 
> > are available in the Git repository at:
> > 
> >   git://anongit.freedesktop.org/drm/drm-intel 
> > tags/topic/intel-gen-to-ver-2021-04-19
> > 
> > for you to fetch changes up to 425390c5dce6da76578389629d19517fcd79c959:
> > 
> >   drm/i915: split dgfx features from gen 12 (2021-04-14 13:05:06 +0300)
> > 
> > 
> > Gen to ver conversions across the driver
> > 
> > The main change is Lucas' series [1], with Ville's GLK fixes [2] and a
> > cherry-pick of Matt's commit [3] from drm-intel-next as a base to avoid
> > conflicts.
> > 
> > [1] https://patchwork.freedesktop.org/series/88825/
> > [2] https://patchwork.freedesktop.org/series/88938/
> > [3] 70bfb30743d5 ("drm/i915/display: Eliminate IS_GEN9_{BC,LP}")
> > 
> > 
> > Lucas De Marchi (12):
> >   drm/i915/display: use DISPLAY_VER() on remaining users
> >   drm/i915: rename display.version to display.ver
> >   drm/i915/display: rename display version macros
> >   drm/i915: add macros for graphics and media versions
> >   drm/i915/gt: replace gen use in intel_engine_cs
> >   drm/i915/selftests: replace unused mask with simple version
> >   drm/i915/selftests: eliminate use of gen_mask
> >   drm/i915: finish removal of gen_mask
> >   drm/i915: eliminate remaining uses of intel_device_info->gen
> >   drm/i915: finish removal of gen from intel_device_info
> >   drm/i915: add media and display versions to device_info print
> >   drm/i915: split dgfx features from gen 12
> > 
> > Matt Roper (1):
> >   drm/i915/display: Eliminate IS_GEN9_{BC,LP}
> > 
> > Ville Syrjälä (5):
> >   drm/i915: Restore lost glk FBC 16bpp w/a
> >   drm/i915: Restore lost glk ccs w/a
> >   drm/i915: Disable LTTPR detection on GLK once again
> >   drm/i915: Don't use {skl, cnl}_hpd_pin() for bxt/glk
> >   drm/i915: Remove a few redundant glk checks
> > 
> >  drivers/gpu/drm/i915/display/i9xx_plane.c  |  2 +-
> >  drivers/gpu/drm/i915/display/icl_dsi.c |  4 +-
> >  drivers/gpu/drm/i915/display/intel_atomic.c|  2 +-
> >  drivers/gpu/drm/i915/display/intel_audio.c |  4 +-
> >  drivers/gpu/drm/i915/display/intel_bios.c  |  9 +--
> >  drivers/gpu/drm/i915/display/intel_bw.c|  8 +--
> >  drivers/gpu/drm/i915/display/intel_cdclk.c | 42 +++---
> >  drivers/gpu/drm/i915/display/intel_color.c |  6 +-
> >  drivers/gpu/drm/i915/display/intel_crt.c   |  6 +-
> >  drivers/gpu/drm/i915/display/intel_crtc.c  |  4 +-
> >  drivers/gpu/drm/i915/display/intel_csr.c   |  4 +-
> >  drivers/gpu/drm/i915/display/intel_ddi.c   | 53 +-
> >  drivers/gpu/drm/i915/display/intel_ddi_buf_trans.c | 10 ++--
> >  drivers/gpu/drm/i915/display/intel_display.c   | 64 
> > +++---
> >  .../gpu/drm/i915/display/intel_display_debugfs.c   |  2 +-
> >  drivers/gpu/drm/i915/display/intel_display_power.c | 57 ++-
> >  drivers/gpu/drm/i915/display/intel_dp.c| 10 ++--
> >  .../gpu/drm/i915/display/intel_dp_link_training.c  |  2 +-
> >  drivers/gpu/drm/i915/display/intel_dp_mst.c|  2 +-
> >  drivers/gpu/drm/i915/display/intel_dpll.c  |  8 +--
> >  drivers/gpu/drm/i915/display/intel_dpll_mgr.c  |  6 +-
> >  drivers/gpu/drm/i915/display/intel_fb.c|  2 +-
> >  drivers/gpu/drm/i915/display/intel_fbc.c   | 21 +++
> >  drivers/gpu/drm/i915/display/intel_fifo_underrun.c |  4 +-
> >  drivers/gpu/drm/i915/display/intel_gmbus.c | 12 ++--
> >  drivers/gpu/drm/i915/display/intel_hdcp.c  |  9 +--
> >  drivers/gpu/drm/i915/display/intel_hdmi.c  |  9 +--
> >  drivers/gpu/drm/i915/display/intel_lvds.c 

Re: [PATCH v2] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Peter.Enderborg
On 4/20/21 3:34 PM, Daniel Stone wrote:
> Hi Peter,
>
> On Fri, 16 Apr 2021 at 13:34, Peter Enderborg  > wrote:
>
> This adds a total used dma-buf memory. Details
> can be found in debugfs, however it is not for everyone
> and not always available. dma-buf are indirect allocated by
> userspace. So with this value we can monitor and detect
> userspace applications that have problems.
>
>
> FWIW, this won't work super well for Android where gralloc is implemented as 
> a system service, so all graphics usage will instantly be accounted to it.
>
> Cheers,
> Daniel 

This resource allocation is a big part of why we need it. Why should it not 
work? 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Vetter
On Tue, Apr 20, 2021 at 3:04 PM Daniel Stone  wrote:
>
> Hi,
>
> On Tue, 20 Apr 2021 at 13:01, Daniel Vetter  wrote:
>>
>> - We live in a post xf86-video-$vendor world, and all these other
>>   compositors rely on implicit sync. You're not going to be able to get
>>   rid of them anytime soon. What's worse, all the various EGL/vk buffer
>>   sharing things also rely on implicit sync, so you get to fix up tons of
>>   applications on top. Any plan that's realistic needs to cope with
>>   implicit/explicit at the same time together won't work.
>>
>> - Absolute infuriating, but you can't use page-faulting together with any
>>   dma_fence synchronization primitives, whether implicit or explicit. This
>>   means until the entire ecosystem moved forward (good luck with that) we
>>   have to support dma_fence. The only sync model that works together with
>>   page faults is userspace fence based sync.
>>
>> This should get rid of the oversync issues, and since implicit sync is
>> backed in everywhere right now, you'll have to deal with implicit sync for
>> a very long time.
>
>
> Depends what you mean by 'implicit sync'. ;)
>
> Getting userspace (Vulkan WSI, EGL, Wayland compositors, browsers, media 
> clients) over to explicit sync is easy, _provided_ that the explicit sync 
> gives us the same guarantees as implicit sync, i.e. completes in bounded 
> time, GPU/display work can be flushed to the kernel predicated on fence 
> completion with the kernel handling synchronisation and scheduling. It's just 
> a matter of typing, and until now we haven't had a great reason to do that 
> typing. Now we do have that reason, so we are implementing it. Whether it's 
> dma_fence or drm_syncobj is mostly immaterial; we can encode in protocol 
> requirements that you can't try to use wait-before-signal with drm_syncobj 
> and you'll get killed if you try.
>
> Getting that userspace over to fully userspace-based sync (wait-before-signal 
> or wait-never-signal, no kernel assistance but you just have to roll your own 
> polling or signal handling on either CPU or GPU side) is not easy. It might 
> never happen, because it's an extraordinary amount of work, introduces a huge 
> amount of fragility into a super-critical path, and and so far it's not clear 
> that it's a global performance improvement for the whole system, just 
> shifting performance problems from kernel to userspace, and probably (AFAICT) 
> making them worse in addition to the other problems it brings.
>
> What am I missing?

Nothing I think.

Which is why I'm arguing that kernel based sync with all the current
dma_fence guarantees is probably going to stick around for something
close to forever, and we need to assume so.

Only in specific cases does full userspace sync make sense imo:
- anything compute, excluding using compute/shaders to create
displayable buffers, but compute as in your final target is writing
some stuff to files and never interacting with any winsys. Those
really care because "run a compute kernel for a few hours" isn't
supported without userspace sync, and I don't think ever will.
- maybe vulkan direct display, once/if we have the extensions for
atomic kms wired up
- maybe someone wants to write a vulkan based compositor and deal with
all this themselves. That model I think would also imply that they
deal with all the timeouts and fallbacks, irrespective of whether
underneath we actually run on dma_fence timeline syncobjs or userspace
fence timeline syncobjs.

From about 2 years of screaming at this stuff it feels like this will
be a pretty exhaustive list for the next 10 years. Definitely doesn't
include your random linux desktop wayland compositor stack. But there
are definitely some specific areas where people care enough for all
the pain. For everyone else it's all the other pieces I laid out.

This also means that I don't think we now have that impetus to start
typing all the explicit sync protocol/compositor bits, since:
- the main driver is compute stuff, that needs mesa work (well vk/ocl
plus all the various repainted copies of cuda)
- with the tricks to make implicit sync work more like explicit sync
the oversyncing can be largely solved without protocol work
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Vetter
On Tue, Apr 20, 2021 at 1:59 PM Christian König
 wrote:
>
> > Yeah. If we go with userspace fences, then userspace can hang itself. Not
> > the kernel's problem.
>
> Well, the path of inner peace begins with four words. “Not my fucking
> problem.”
>
> But I'm not that much concerned about the kernel, but rather about
> important userspace processes like X, Wayland, SurfaceFlinger etc...
>
> I mean attaching a page to a sync object and allowing to wait/signal
> from both CPU as well as GPU side is not so much of a problem.
>
> > You have to somehow handle that, e.g. perhaps with conditional
> > rendering and just using the old frame in compositing if the new one
> > doesn't show up in time.
>
> Nice idea, but how would you handle that on the OpenGL/Glamor/Vulkan level.

For opengl we provide all the same guarantees, so if you get one of these
you just block until the fence is signalled. Doing that properly means a
submit thread to support drm_syncobj like for vulkan.

For vulkan we probably want to represent these as proper vk timeline
objects, and the vulkan way is to just let the application (well
compositor) here deal with it. If they import timelines from untrusted
other parties, they need to handle the potential fallback of being
lied at. How is "not vulkan's fucking problem", because that entire
"with great power (well performance) comes great responsibility" is
the entire vk design paradigm.

Glamor will just rely on GL providing a nice package of the harsh
reality of gpus, like usual.

So I guess step 1 here for GL would be to provide some kind of
import/export of timeline syncobj, including properly handling this
"future/indefinite fences" aspect of them with submit thread and
everything.
-Daniel

>
> Regards,
> Christian.
>
> Am 20.04.21 um 13:16 schrieb Daniel Vetter:
> > On Tue, Apr 20, 2021 at 07:03:19AM -0400, Marek Olšák wrote:
> >> Daniel, are you suggesting that we should skip any deadlock prevention in
> >> the kernel, and just let userspace wait for and signal any fence it has
> >> access to?
> > Yeah. If we go with userspace fences, then userspace can hang itself. Not
> > the kernel's problem. The only criteria is that the kernel itself must
> > never rely on these userspace fences, except for stuff like implementing
> > optimized cpu waits. And in those we must always guarantee that the
> > userspace process remains interruptible.
> >
> > It's a completely different world from dma_fence based kernel fences,
> > whether those are implicit or explicit.
> >
> >> Do you have any concern with the deprecation/removal of BO fences in the
> >> kernel assuming userspace is only using explicit fences? Any concern with
> >> the submit and return fences for modesetting and other producer<->consumer
> >> scenarios?
> > Let me work on the full replay for your rfc first, because there's a lot
> > of details here and nuance.
> > -Daniel
> >
> >> Thanks,
> >> Marek
> >>
> >> On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter  wrote:
> >>
> >>> On Tue, Apr 20, 2021 at 12:15 PM Christian König
> >>>  wrote:
>  Am 19.04.21 um 17:48 schrieb Jason Ekstrand:
> > Not going to comment on everything on the first pass...
> >
> > On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:
> >> Hi,
> >>
> >> This is our initial proposal for explicit fences everywhere and new
> >>> memory management that doesn't use BO fences. It's a redesign of how Linux
> >>> graphics drivers work, and it can coexist with what we have now.
> >>
> >> 1. Introduction
> >> (skip this if you are already sold on explicit fences)
> >>
> >> The current Linux graphics architecture was initially designed for
> >>> GPUs with only one graphics queue where everything was executed in the
> >>> submission order and per-BO fences were used for memory management and
> >>> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
> >>> queues were added on top, which required the introduction of implicit
> >>> GPU-GPU synchronization between queues of different processes using per-BO
> >>> fences. Recently, even parallel execution within one queue was enabled
> >>> where a command buffer starts draws and compute shaders, but doesn't wait
> >>> for them, enabling parallelism between back-to-back command buffers.
> >>> Modesetting also uses per-BO fences for scheduling flips. Our GPU 
> >>> scheduler
> >>> was created to enable all those use cases, and it's the only reason why 
> >>> the
> >>> scheduler exists.
> >> The GPU scheduler, implicit synchronization, BO-fence-based memory
> >>> management, and the tracking of per-BO fences increase CPU overhead and
> >>> latency, and reduce parallelism. There is a desire to replace all of them
> >>> with something much simpler. Below is how we could do it.
> >>
> >> 2. Explicit synchronization for window systems and modesetting
> >>
> >> The producer is an application and the consumer is a compositor or a
> >>> modesetting driver.
> >> 2.1. The
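
(For background on the import/export step Daniel mentions above: at the DRM
level a timeline syncobj can already be shared between processes through an
fd, roughly as in the sketch below using existing libdrm calls. The GL/EGL
extension work he refers to would sit on top of this; the helpers here are
only illustrative and error handling is trimmed.)

#include <stdint.h>
#include <xf86drm.h>

/* Producer: create a timeline syncobj and export it as an fd for the wire. */
static int export_timeline(int drm_fd, uint32_t *handle_out)
{
        int syncobj_fd = -1;

        if (drmSyncobjCreate(drm_fd, 0, handle_out))
                return -1;
        if (drmSyncobjHandleToFD(drm_fd, *handle_out, &syncobj_fd))
                return -1;
        return syncobj_fd;
}

/* Consumer: turn the received fd back into a local syncobj handle. */
static int import_timeline(int drm_fd, int syncobj_fd, uint32_t *handle_out)
{
        return drmSyncobjFDToHandle(drm_fd, syncobj_fd, handle_out);
}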

Re: [PATCH] drm/i915: fix an error code in intel_overlay_do_put_image()

2021-04-20 Thread Rodrigo Vivi
On Wed, Apr 14, 2021 at 09:02:24AM +0300, Dan Carpenter wrote:
> This code should propagate the error from intel_overlay_pin_fb()
> but currently it returns success.
> 
> Fixes: 1b321026e213 ("drm/i915: Pass ww ctx to intel_pin_to_display_plane")
> Signed-off-by: Dan Carpenter 

Reviewed-by: Rodrigo Vivi 

and pushed. thanks for the patch.

> ---
>  drivers/gpu/drm/i915/display/intel_overlay.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c 
> b/drivers/gpu/drm/i915/display/intel_overlay.c
> index e477b6114a60..e5dadde422f7 100644
> --- a/drivers/gpu/drm/i915/display/intel_overlay.c
> +++ b/drivers/gpu/drm/i915/display/intel_overlay.c
> @@ -803,8 +803,10 @@ static int intel_overlay_do_put_image(struct 
> intel_overlay *overlay,
>   atomic_inc(&dev_priv->gpu_error.pending_fb_pin);
>  
>   vma = intel_overlay_pin_fb(new_bo);
> - if (IS_ERR(vma))
> + if (IS_ERR(vma)) {
> + ret = PTR_ERR(vma);
>   goto out_pin_section;
> + }
>  
>   i915_gem_object_flush_frontbuffer(new_bo, ORIGIN_DIRTYFB);
>  
> -- 
> 2.30.2
> 
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2 10/16] drm/exynos: implement a drm bridge

2021-04-20 Thread Laurent Pinchart
Hi Frieder,

On Tue, Apr 20, 2021 at 01:42:05PM +0200, Frieder Schrempf wrote:
> On 23.02.21 13:07, Daniel Vetter wrote:
> > On Thu, Feb 18, 2021 at 5:02 PM Andrzej Hajda  wrote:
> >> W dniu 18.02.2021 o 09:04, Michael Tretter pisze:
> >>> On Wed, 10 Feb 2021 10:10:37 +0100, Frieder Schrempf wrote:
>  On 04.02.21 18:46, Daniel Vetter wrote:
> > On Thu, Feb 4, 2021 at 6:26 PM Laurent Pinchart 
> >  wrote:
> >> On Thu, Feb 04, 2021 at 06:19:22PM +0100, Daniel Vetter wrote:
> >>> On Thu, Feb 4, 2021 at 5:28 PM Andrzej Hajda wrote:
>  W dniu 04.02.2021 o 17:05, Daniel Vetter pisze:
> > On Thu, Feb 04, 2021 at 11:56:32AM +0100, Michael Tretter wrote:
> >> On Thu, 04 Feb 2021 11:17:49 +0100, Daniel Vetter wrote:
> >>> On Wed, Feb 3, 2021 at 9:32 PM Michael Tretter wrote:
>  On Mon, 01 Feb 2021 17:33:14 +0100, Michael Tretter wrote:
> > On Tue, 15 Sep 2020 21:40:40 +0200, Andrzej Hajda wrote:
> >> W dniu 14.09.2020 o 23:19, Andrzej Hajda pisze:
> >>> On 14.09.2020 22:01, Michael Tretter wrote:
>  On Mon, 14 Sep 2020 14:31:19 +0200, Marek Szyprowski wrote:
> > On 14.09.2020 10:29, Marek Szyprowski wrote:
> >> On 11.09.2020 15:54, Michael Tretter wrote:
> >>> Make the exynos_dsi driver a full drm bridge that can be 
> >>> found and
> >>> used
> >>> from other drivers.
> >>>
> >>> Other drivers can only attach to the bridge, if a mipi 
> >>> dsi device
> >>> already attached to the bridge. This allows to defer the 
> >>> probe of the
> >>> display pipe until the downstream bridges are available, 
> >>> too.
> >>>
> >>> Signed-off-by: Michael Tretter 
> >> This one (and the whole series applied) still fails on 
> >> Exynos boards:
> >>
> >> [drm] Exynos DRM: using 11c0.fimd device for DMA 
> >> mapping
> >> operations
> >> exynos-drm exynos-drm: bound 11c0.fimd (ops 
> >> fimd_component_ops)
> >> OF: graph: no port node found in /soc/dsi@11c8
> >> 8<--- cut here ---
> >> Unable to handle kernel NULL pointer dereference at 
> >> virtual address
> >> 0084
> >> pgd = (ptrval)
> >> [0084] *pgd=
> >> Internal error: Oops: 5 [#1] PREEMPT SMP ARM
> >> Modules linked in:
> >> CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> >> 5.9.0-rc4-next-20200911-00010-g417dc70d70ec #1608
> >> Hardware name: Samsung Exynos (Flattened Device Tree)
> >> PC is at drm_bridge_attach+0x18/0x164
> >> LR is at exynos_dsi_bind+0x88/0xa8
> >> pc : []lr : []psr: 2013
> >> sp : ef0dfca8  ip : 0002  fp : c13190e0
> >> r10:   r9 : ee46d580  r8 : c13190e0
> >> r7 : ee438800  r6 : 0018  r5 : ef253810  r4 : ef39e840
> >> r3 :   r2 : 0018  r1 : ef39e888  r0 : ef39e840
> >> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  
> >> Segment none
> >> Control: 10c5387d  Table: 4000404a  DAC: 0051
> >> Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
> >> Stack: (0xef0dfca8 to 0xef0e)
> >> ...
> >> [] (drm_bridge_attach) from []
> >> (exynos_dsi_bind+0x88/0xa8)
> >> [] (exynos_dsi_bind) from []
> >> (component_bind_all+0xfc/0x290)
> >> [] (component_bind_all) from []
> >> (exynos_drm_bind+0xe4/0x19c)
> >> [] (exynos_drm_bind) from []
> >> (try_to_bring_up_master+0x1e4/0x2c4)
> >> [] (try_to_bring_up_master) from []
> >> (component_master_add_with_match+0xd4/0x108)
> >> [] (component_master_add_with_match) from 
> >> []
> >> (exynos_drm_platform_probe+0xe4/0x110)
> >> [] (exynos_drm_platform_probe) from []
> >> (platform_drv_probe+0x6c/0xa4)
> >> [] (platform_drv_probe) from []
> >> (really_probe+0x200/0x4fc)
> >> [] (really_probe) from []
> >> (driver_probe_device+0x78/0x1fc)
> >> [] (driver_probe_device) from []
> >> (device_driver_attach+0x58/0x60)
> >> [] (device_driver_attach) from []
> >> (__driver_attach+0xdc/0x174)
> >> [] (__driver_at

[PATCH] drm/ttm: fix error handling if no BO can be swapped out

2021-04-20 Thread Shiwu Zhang
In case all pre-allocated BOs are busy, just continue to populate BOs,
since likely half of system memory in total is still free.

Signed-off-by: Shiwu Zhang 
---
 drivers/gpu/drm/ttm/ttm_device.c | 4 ++--
 drivers/gpu/drm/ttm/ttm_tt.c | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 1f2024164d72..0200709db9be 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -133,7 +133,7 @@ int ttm_device_swapout(struct ttm_device *bdev, struct 
ttm_operation_ctx *ctx,
struct ttm_resource_manager *man;
struct ttm_buffer_object *bo;
unsigned i, j;
-   int ret;
+   int ret=-EBUSY;
 
spin_lock(&bdev->lru_lock);
for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) {
@@ -161,7 +161,7 @@ int ttm_device_swapout(struct ttm_device *bdev, struct 
ttm_operation_ctx *ctx,
}
}
spin_unlock(&bdev->lru_lock);
-   return 0;
+   return ret;
 }
 EXPORT_SYMBOL(ttm_device_swapout);
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 48c407cff112..4e1e06a04428 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -329,6 +329,8 @@ int ttm_tt_populate(struct ttm_device *bdev,
   ttm_dma32_pages_limit) {
 
ret = ttm_global_swapout(ctx, GFP_KERNEL);
+   if (ret == -EBUSY)
+   break;
if (ret < 0)
goto error;
}
-- 
2.17.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Daniel Stone
On Tue, 20 Apr 2021 at 14:46,  wrote:

> On 4/20/21 3:34 PM, Daniel Stone wrote:
> > On Fri, 16 Apr 2021 at 13:34, Peter Enderborg  > wrote:
> > This adds a total used dma-buf memory. Details
> > can be found in debugfs, however it is not for everyone
> > and not always available. dma-buf are indirect allocated by
> > userspace. So with this value we can monitor and detect
> > userspace applications that have problems.
> >
> >
> > FWIW, this won't work super well for Android where gralloc is
> implemented as a system service, so all graphics usage will instantly be
> accounted to it.
>
> This resource allocation is a big part of why we need it. Why should it
> not work?
>

Sorry, I'd somehow completely misread that as being locally rather than
globally accounted. Given that, it's more correct, just also not super
useful.

Some drivers export allocation tracepoints which you could use if you have
a decent userspace tracing infrastructure. Short of that, many drivers
export this kind of thing through debugfs already. I think a better
long-term direction is probably getting accounting from dma-heaps rather
than extending core dmabuf itself.

Cheers,
Daniel
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Stone
Hi,

On Mon, 19 Apr 2021 at 11:48, Marek Olšák  wrote:

> Deadlock mitigation to recover from segfaults:
> - The kernel knows which process is obliged to signal which fence. This
> information is part of the Present request and supplied by userspace.
> - If the producer crashes, the kernel signals the submit fence, so that
> the consumer can make forward progress.
> - If the consumer crashes, the kernel signals the return fence, so that
> the producer can reclaim the buffer.
> - A GPU hang signals all fences. Other deadlocks will be handled like GPU
> hangs.
>

Another thought: with completely arbitrary userspace fencing, none of this
is helpful either. If the compositor can't guarantee that a hostile client
has submitted a fence which will never be signaled, then it won't be
waiting on it, so it already needs infrastructure to handle something like
this. That already handles the crashed-client case, because if the client
crashes, then its connection will be dropped, which will trigger the
compositor to destroy all its resources anyway, including any pending waits.

GPU hangs also look pretty similar; it's an infinite wait, until the client
resubmits a new buffer which would replace (& discard) the old.

So signal-fence-on-process-exit isn't helpful and doesn't provide any extra
reliability; it in fact probably just complicates things.

Cheers,
Daniel
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Peter.Enderborg
On 4/20/21 4:48 PM, Daniel Stone wrote:
> On Tue, 20 Apr 2021 at 14:46,  > wrote:
>
> On 4/20/21 3:34 PM, Daniel Stone wrote:
> > On Fri, 16 Apr 2021 at 13:34, Peter Enderborg    >> wrote:
> >     This adds a total used dma-buf memory. Details
> >     can be found in debugfs, however it is not for everyone
> >     and not always available. dma-buf are indirect allocated by
> >     userspace. So with this value we can monitor and detect
> >     userspace applications that have problems.
> >
> >
> > FWIW, this won't work super well for Android where gralloc is 
> implemented as a system service, so all graphics usage will instantly be 
> accounted to it.
>
> This resource allocation is a big part of why we need it. Why should it 
> not work?
>
>
> Sorry, I'd somehow completely misread that as being locally rather than 
> globally accounted. Given that, it's more correct, just also not super useful.
>
> Some drivers export allocation tracepoints which you could use if you have a 
> decent userspace tracing infrastructure. Short of that, many drivers export 
> this kind of thing through debugfs already. I think a better long-term 
> direction is probably getting accounting from dma-heaps rather than extending 
> core dmabuf itself.
>
> Cheers,
> Daniel 

Debugfs and traces are useful when you pin down your problem. Debugfs does not
exist on commercial devices, so we need some hints on what is going on, and
trace points need active debugging set up before the problem occurs. A metric
on dma-buf can be sent with a bugreport.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
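
(For what it's worth, the bugreport/monitoring flow described above reduces to
a few lines of userspace code. This is only a sketch: the "DmaBufTotal" field
name is the one proposed in this series and is not an existing mainline
/proc/meminfo entry.)

#include <stdio.h>

/* Returns the proposed DmaBufTotal value in kB, or -1 if the field is absent. */
static long read_dmabuf_total_kb(void)
{
        FILE *f = fopen("/proc/meminfo", "r");
        char line[256];
        long kb = -1;

        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "DmaBufTotal: %ld kB", &kb) == 1)
                        break;
        }
        fclose(f);
        return kb;
}

(A bugreport daemon could sample this periodically and attach the last few
values when a low-memory event is reported.)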


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Michal Hocko
On Tue 20-04-21 09:25:51, peter.enderb...@sony.com wrote:
> On 4/20/21 11:12 AM, Michal Hocko wrote:
> > On Tue 20-04-21 09:02:57, peter.enderb...@sony.com wrote:
>  But that isn't really system memory at all, it's just allocated device
>  memory.
> >>> OK, that was not really clear to me. So this is not really accounted to
> >>> MemTotal? If that is really the case then reporting it into the oom
> >>> report is completely pointless and I am not even sure /proc/meminfo is
> >>> the right interface either. It would just add more confusion I am
> >>> afraid.
> >>>  
> >> Why is it confusing? Documentation is quite clear:
> > Because a single counter without a wider context cannot be put into any
> > reasonable context. There is no notion of the total amount of device
> > memory usable for dma-buf. As Christian explained some of it can be RAM
> > based. So a single number is rather pointless on its own in many cases.
> >
> > Or let me just ask. What can you tell from dma-bud: $FOO kB in its
> > current form?
> 
> It is better to be blind?

No it is better to have a sensible counter that can be reasoned about.
So far you are only claiming that having something is better than
nothing and I would agree with you if that was a debugging one off
interface. But /proc/meminfo and other proc files have to be maintained
with future portability in mind. This is not a dumping ground for _some_
counters that might be interesting at the _current_ moment. E.g. what
happens if somebody wants to have a per device resp. memory based
dma-buf data? Are you going to change the semantic or add another
2 counters?
-- 
Michal Hocko
SUSE Labs
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Stone
On Tue, 20 Apr 2021 at 15:58, Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 20.04.21 um 16:53 schrieb Daniel Stone:
>
> On Mon, 19 Apr 2021 at 11:48, Marek Olšák  wrote:
>
>> Deadlock mitigation to recover from segfaults:
>> - The kernel knows which process is obliged to signal which fence. This
>> information is part of the Present request and supplied by userspace.
>> - If the producer crashes, the kernel signals the submit fence, so that
>> the consumer can make forward progress.
>> - If the consumer crashes, the kernel signals the return fence, so that
>> the producer can reclaim the buffer.
>> - A GPU hang signals all fences. Other deadlocks will be handled like GPU
>> hangs.
>>
>
> Another thought: with completely arbitrary userspace fencing, none of this
> is helpful either. If the compositor can't guarantee that a hostile client
> has submitted a fence which will never be signaled, then it won't be
> waiting on it, so it already needs infrastructure to handle something like
> this.
>
>
> That already handles the crashed-client case, because if the client
> crashes, then its connection will be dropped, which will trigger the
> compositor to destroy all its resources anyway, including any pending waits.
>
>
> Exactly that's the problem. A compositor isn't immediately informed that
> the client crashed, instead it is still referencing the buffer and trying
> to use it for compositing.
>

If the compositor no longer has a guarantee that the buffer will be ready
for composition in a reasonable amount of time (which dma_fence gives us,
and this proposal does not appear to give us), then the compositor isn't
trying to use the buffer for compositing, it's waiting asynchronously on a
notification that the fence has signaled before it attempts to use the
buffer.

Marek's initial suggestion is that the kernel signal the fence, which would
unblock composition (and presumably show garbage on screen, or at best jump
back to old content).

My position is that the compositor will know the process has crashed anyway
- because its socket has been closed - at which point we destroy all the
client's resources including its windows and buffers regardless. Signaling
the fence doesn't give us any value here, _unless_ the compositor is just
blindly waiting for the fence to signal ... which it can't do because
there's no guarantee the fence will ever signal.

Cheers,
Daniel
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Christian König

Am 20.04.21 um 16:53 schrieb Daniel Stone:
> Hi,
>
> On Mon, 19 Apr 2021 at 11:48, Marek Olšák  wrote:
>
>> Deadlock mitigation to recover from segfaults:
>> - The kernel knows which process is obliged to signal which fence.
>> This information is part of the Present request and supplied by
>> userspace.
>> - If the producer crashes, the kernel signals the submit fence, so
>> that the consumer can make forward progress.
>> - If the consumer crashes, the kernel signals the return fence, so
>> that the producer can reclaim the buffer.
>> - A GPU hang signals all fences. Other deadlocks will be handled
>> like GPU hangs.
>
> Another thought: with completely arbitrary userspace fencing, none of
> this is helpful either. If the compositor can't guarantee that a
> hostile client has submitted a fence which will never be signaled,
> then it won't be waiting on it, so it already needs infrastructure to
> handle something like this.
>
> That already handles the crashed-client case, because if the client
> crashes, then its connection will be dropped, which will trigger the
> compositor to destroy all its resources anyway, including any pending
> waits.

Exactly that's the problem. A compositor isn't immediately informed that
the client crashed, instead it is still referencing the buffer and
trying to use it for compositing.

> GPU hangs also look pretty similar; it's an infinite wait, until the
> client resubmits a new buffer which would replace (& discard) the old.

Correct. You just need to assume that all queues get destroyed and
re-initialized when a GPU reset happens.

> So signal-fence-on-process-exit isn't helpful and doesn't provide any
> extra reliability; it in fact probably just complicates things.

Well it is when you go for partial GPU resets.

Regards,
Christian.

> Cheers,
> Daniel

___
mesa-dev mailing list
mesa-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 0/2] drm/bridge: dw-hdmi: disable loading of DW-HDMI CEC sub-driver

2021-04-20 Thread Hans Verkuil
On 16/04/2021 13:38, Neil Armstrong wrote:
> On 16/04/2021 11:58, Laurent Pinchart wrote:
>> Hi Neil,
>>
>> On Fri, Apr 16, 2021 at 11:27:35AM +0200, Neil Armstrong wrote:
>>> This adds DW-HDMI driver a glue option to disable loading of the CEC 
>>> sub-driver.
>>>
>>> On some SoCs, the CEC functionality is enabled in the IP config bits, but 
>>> the
>>> CEC bus is non-functional like on Amlogic SoCs, where the CEC config bit is 
>>> set
>>> but the DW-HDMI CEC signal is not connected to a physical pin, leading to 
>>> some
>>> confusion when the DW-HDMI CEC controller can't communicate on the bus.
>>
>> If we can't trust the CEC config bit, would it be better to not use it
>> at all, and instead let each platform glue logic tell whether to enable
>> CEC or not ?
> 
> Actually, the CEC config bit is right, the HW exists and should be 
> functional, but
> this bit doesn't tell if the CEC signal is connected to something.
> 
> This lies in the IP integration, like other bits under the 
> "amlogic,meson-*-dw-hdmi"
> umbrella.
> 
> The first attempt was by Hans using DT, but adding a property in DT for a 
> vendor
> specific compatible doesn't make sense. Another idea would be to describe the
> CEC signal endpoint like we do for video signal, but I think this is out of 
> scope and
> this solution is much simpler and straightforward, and it's more an exception 
> than
> a general use case to solve.

While a DT property might not make sense in this particular case, I still
believe that it is a perfectly valid approach in general: whether or not
the CEC pin is connected is at the hardware level decision, it is not
something that software can detect. If the designer of the board didn't
connect it, then the only place you can define that is in the device tree.

Anyway, for meson I am fine with this solution. At least it prevents creating
a non-functioning cec device. So for this series:

Acked-by: Hans Verkuil 

Regards,

Hans

> 
> Neil
> 
>>
>>> Jernej Skrabec (1):
>>>   drm/bridge/synopsys: dw-hdmi: Add an option to suppress loading CEC
>>> driver
>>>
>>> Neil Armstrong (1):
>>>   drm/meson: dw-hdmi: disable DW-HDMI CEC sub-driver
>>>
>>>  drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 2 +-
>>>  drivers/gpu/drm/meson/meson_dw_hdmi.c | 1 +
>>>  include/drm/bridge/dw_hdmi.h  | 2 ++
>>>  3 files changed, 4 insertions(+), 1 deletion(-)
>>>
>>
> 
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Christian König

Am 20.04.21 um 17:07 schrieb Daniel Stone:
> On Tue, 20 Apr 2021 at 15:58, Christian König
>  wrote:
>
>> Am 20.04.21 um 16:53 schrieb Daniel Stone:
>>
>>> On Mon, 19 Apr 2021 at 11:48, Marek Olšák  wrote:
>>>
>>>> Deadlock mitigation to recover from segfaults:
>>>> - The kernel knows which process is obliged to signal which
>>>> fence. This information is part of the Present request and
>>>> supplied by userspace.
>>>> - If the producer crashes, the kernel signals the submit
>>>> fence, so that the consumer can make forward progress.
>>>> - If the consumer crashes, the kernel signals the return
>>>> fence, so that the producer can reclaim the buffer.
>>>> - A GPU hang signals all fences. Other deadlocks will be
>>>> handled like GPU hangs.
>>>
>>> Another thought: with completely arbitrary userspace fencing,
>>> none of this is helpful either. If the compositor can't guarantee
>>> that a hostile client has submitted a fence which will never be
>>> signaled, then it won't be waiting on it, so it already needs
>>> infrastructure to handle something like this.
>>>
>>> That already handles the crashed-client case, because if the
>>> client crashes, then its connection will be dropped, which will
>>> trigger the compositor to destroy all its resources anyway,
>>> including any pending waits.
>>
>> Exactly that's the problem. A compositor isn't immediately
>> informed that the client crashed, instead it is still referencing
>> the buffer and trying to use it for compositing.
>
> If the compositor no longer has a guarantee that the buffer will be
> ready for composition in a reasonable amount of time (which dma_fence
> gives us, and this proposal does not appear to give us), then the
> compositor isn't trying to use the buffer for compositing, it's
> waiting asynchronously on a notification that the fence has signaled
> before it attempts to use the buffer.
>
> Marek's initial suggestion is that the kernel signal the fence, which
> would unblock composition (and presumably show garbage on screen, or
> at best jump back to old content).
>
> My position is that the compositor will know the process has crashed
> anyway - because its socket has been closed - at which point we
> destroy all the client's resources including its windows and buffers
> regardless. Signaling the fence doesn't give us any value here,
> _unless_ the compositor is just blindly waiting for the fence to
> signal ... which it can't do because there's no guarantee the fence
> will ever signal.

Yeah, but that assumes that the compositor has change to not blindly
wait for the client to finish rendering and as Daniel explained that is
rather unrealistic.

What we need is a fallback mechanism which signals the fence after a
timeout and gives a penalty to the one causing the timeout.

That gives us the same functionality we have today with the in software
scheduler inside the kernel.

Regards,
Christian.

> Cheers,
> Daniel


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
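
(A side note on the wait-with-timeout discussion: a compositor that does wait
on a drm_syncobj timeline point can already bound that wait from userspace
with existing libdrm calls, roughly as sketched below. The deadline budget and
any penalty policy are purely illustrative, and this is not the kernel-side
fallback being described above.)

#include <stdint.h>
#include <time.h>
#include <xf86drm.h>

/*
 * Wait at most budget_ms for the client's timeline point. Returns 0 when the
 * point signalled in time, or a negative errno (-ETIME) on timeout; on
 * timeout the compositor keeps the previous frame and can penalise the
 * offending client.
 */
static int wait_client_point(int drm_fd, uint32_t syncobj, uint64_t point,
                             unsigned int budget_ms)
{
        struct timespec ts;
        int64_t deadline;

        /* drmSyncobjTimelineWait() takes an absolute CLOCK_MONOTONIC deadline. */
        clock_gettime(CLOCK_MONOTONIC, &ts);
        deadline = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec +
                   (int64_t)budget_ms * 1000000LL;

        return drmSyncobjTimelineWait(drm_fd, &syncobj, &point, 1, deadline,
                                      DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
                                      NULL);
}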


Re: [PATCH 0/2] drm/bridge: dw-hdmi: disable loading of DW-HDMI CEC sub-driver

2021-04-20 Thread Neil Armstrong
On 20/04/2021 17:13, Hans Verkuil wrote:
> On 16/04/2021 13:38, Neil Armstrong wrote:
>> On 16/04/2021 11:58, Laurent Pinchart wrote:
>>> Hi Neil,
>>>
>>> On Fri, Apr 16, 2021 at 11:27:35AM +0200, Neil Armstrong wrote:
 This adds DW-HDMI driver a glue option to disable loading of the CEC 
 sub-driver.

 On some SoCs, the CEC functionality is enabled in the IP config bits, but 
 the
 CEC bus is non-functional like on Amlogic SoCs, where the CEC config bit 
 is set
 but the DW-HDMI CEC signal is not connected to a physical pin, leading to 
 some
 confusion when the DW-HDMI CEC controller can't communicate on the bus.
>>>
>>> If we can't trust the CEC config bit, would it be better to not use it
>>> at all, and instead let each platform glue logic tell whether to enable
>>> CEC or not ?
>>
>> Actually, the CEC config bit is right, the HW exists and should be 
>> functional, but
>> this bit doesn't tell if the CEC signal is connected to something.
>>
>> This lies in the IP integration, like other bits under the 
>> "amlogic,meson-*-dw-hdmi"
>> umbrella.
>>
>> The first attempt was by Hans using DT, but adding a property in DT for a 
>> vendor
>> specific compatible doesn't make sense. Another idea would be to describe the
>> CEC signal endpoint like we do for video signal, but I think this is out of 
>> scope and
>> this solution is much simpler and straightforward, and it's more an 
>> exception than
>> a general use case to solve.
> 
> While a DT property might not make sense in this particular case, I still
> believe that it is a perfectly valid approach in general: whether or not
> the CEC pin is connected is at the hardware level decision, it is not
> something that software can detect. If the designer of the board didn't
> connect it, then the only place you can define that is in the device tree.

Agreed, we need to define a smart way to declare CEC bus relationship in DT, 
the side
effect would be to handle this particular case.

> 
> Anyway, for meson I am fine with this solution. At least it prevents creating
> a non-functioning cec device. So for this series:
> 
> Acked-by: Hans Verkuil 

Thanks,

Applying this serie to drm-misc-next

Neil

> 
> Regards,
> 
>   Hans
> 
>>
>> Neil
>>
>>>
 Jernej Skrabec (1):
   drm/bridge/synopsys: dw-hdmi: Add an option to suppress loading CEC
 driver

 Neil Armstrong (1):
   drm/meson: dw-hdmi: disable DW-HDMI CEC sub-driver

  drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 2 +-
  drivers/gpu/drm/meson/meson_dw_hdmi.c | 1 +
  include/drm/bridge/dw_hdmi.h  | 2 ++
  3 files changed, 4 insertions(+), 1 deletion(-)

>>>
>>
>> ___
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
> 

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Jason Ekstrand
It's still early in the morning here and I'm not awake yet so sorry if
this comes out in bits and pieces...

On Tue, Apr 20, 2021 at 7:43 AM Daniel Stone  wrote:
>
> Hi Marek,
>
> On Mon, 19 Apr 2021 at 11:48, Marek Olšák  wrote:
>>
>> 2. Explicit synchronization for window systems and modesetting
>>
>> The producer is an application and the consumer is a compositor or a 
>> modesetting driver.
>>
>> 2.1. The Present request
>
>
> So the 'present request' is an ioctl, right? Not a userspace construct like 
> it is today? If so, how do we correlate the two?
>
> The terminology is pretty X11-centric so I'll assume that's what you've 
> designed against, but Wayland and even X11 carry much more auxiliary 
> information attached to a present request than just 'this buffer, this 
> swapchain'. Wayland latches a lot of data on presentation, including 
> non-graphics data such as surface geometry (so we can have resizes which 
> don't suck), window state (e.g. fullscreen or not, also so we can have 
> resizes which don't suck), and these requests can also cascade through a tree 
> of subsurfaces (so we can have embeds which don't suck). X11 mostly just 
> carries timestamps, which is more tractable.
>
> Given we don't want to move the entirety of Wayland into kernel-visible 
> objects, how do we synchronise the two streams so they aren't incoherent? 
> Taking a rough stab at it whilst assuming we do have 
> DRM_IOCTL_NONMODE_PRESENT, this would create a present object somewhere in 
> kernel space, which the producer would create and ?? export a FD from, that 
> the compositor would ?? import.
>
>> As part of the Present request, the producer will pass 2 fences (sync 
>> objects) to the consumer alongside the presented DMABUF BO:
>> - The submit fence: Initially unsignalled, it will be signalled when the 
>> producer has finished drawing into the presented buffer.
>
>
> We have already have this in Wayland through dma_fence. I'm relaxed about 
> this becoming drm_syncobj or drm_newmappedysncobjthing, it's just a matter of 
> typing. X11 has patches to DRI3 to support dma_fence, but they never got 
> merged because it was far too invasive to a server which is no longer 
> maintained.
>
>>
>> - The return fence: Initially unsignalled, it will be signalled when the 
>> consumer has finished using the presented buffer.
>
>
> Currently in Wayland the return fence (again a dma_fence) is generated by the 
> compositor and sent as an event when it's done, because we can't have 
> speculative/empty/future fences. drm_syncobj would make this possible, but so 
> far I've been hesitant because I don't see the benefit to it (more below).
>
>>
>> Deadlock mitigation to recover from segfaults:
>> - The kernel knows which process is obliged to signal which fence. This 
>> information is part of the Present request and supplied by userspace.
>
>
> Same as today with dma_fence. Less true with drm_syncobj if we're using 
> timelines.
>
>>
>> - If the producer crashes, the kernel signals the submit fence, so that the 
>> consumer can make forward progress.
>
>
> This is only a change if the producer is now allowed to submit a fence before 
> it's flushed the work which would eventually fulfill that fence. Using 
> dma_fence has so far isolated us from this.
>
>>
>> - If the consumer crashes, the kernel signals the return fence, so that the 
>> producer can reclaim the buffer.
>
>
> 'The consumer' is problematic, per below. I think the wording you want is 'if 
> no references are held to the submitted present object'.
>
>>
>> - A GPU hang signals all fences. Other deadlocks will be handled like GPU 
>> hangs.
>>
>> Other window system requests can follow the same idea.
>
>
> Which other window system requests did you have in mind? Again, moving the 
> entirety of Wayland's signaling into the kernel is a total non-starter. 
> Partly because it means our entire protocol would be subject to the kernel's 
> ABI rules, partly because the rules and interdependencies between the 
> requests are extremely complex, but mostly because the kernel is just a 
> useless proxy: it would be forced to do significant work to reason about what 
> those requests do and when they should happen, but wouldn't be able to make 
> those decisions itself so would have to just punt everything to userspace. 
> Unless we have eBPF compositors.
>
>>
>> Merged fences where one fence object contains multiple fences will be 
>> supported. A merged fence is signalled only when its fences are signalled. 
>> The consumer will have the option to redefine the unsignalled return fence 
>> to a merged fence.
>
>
> An elaboration of how this differed from drm_syncobj would be really helpful 
> here. I can make some guesses based on the rest of the mail, but I'm not sure 
> how accurate they are.
>
>>
>> 2.2. Modesetting
>>
>> Since a modesetting driver can also be the consumer, the present ioctl will 
>> contain a submit fence and a return fence too. One small problem with th

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Daniel Stone
Hi,

On Tue, 20 Apr 2021 at 16:16, Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 20.04.21 um 17:07 schrieb Daniel Stone:
>
> If the compositor no longer has a guarantee that the buffer will be ready
> for composition in a reasonable amount of time (which dma_fence gives us,
> and this proposal does not appear to give us), then the compositor isn't
> trying to use the buffer for compositing, it's waiting asynchronously on a
> notification that the fence has signaled before it attempts to use the
> buffer.
>
> Marek's initial suggestion is that the kernel signal the fence, which
> would unblock composition (and presumably show garbage on screen, or at
> best jump back to old content).
>
> My position is that the compositor will know the process has crashed
> anyway - because its socket has been closed - at which point we destroy all
> the client's resources including its windows and buffers regardless.
> Signaling the fence doesn't give us any value here, _unless_ the compositor
> is just blindly waiting for the fence to signal ... which it can't do
> because there's no guarantee the fence will ever signal.
>
>
> Yeah, but that assumes that the compositor has change to not blindly wait
> for the client to finish rendering and as Daniel explained that is rather
> unrealistic.
>
> What we need is a fallback mechanism which signals the fence after a
> timeout and gives a penalty to the one causing the timeout.
>
> That gives us the same functionality we have today with the in software
> scheduler inside the kernel.
>

OK, if that's the case then I think I'm really missing something which
isn't explained in this thread, because I don't understand what the
additional complexity and API change gains us (see my first reply in this
thread).

By way of example - say I have a blind-but-explicit compositor that takes a
drm_syncobj along with a dmabuf with each client presentation request, but
doesn't check syncobj completion, it just imports that into a VkSemaphore +
VkImage and schedules work for the next frame.

Currently, that generates an execbuf ioctl for the composition (ignore KMS
for now) with a sync point to wait on, and the kernel+GPU scheduling
guarantees that the composition work will not begin until the client
rendering work has retired. We have a further guarantee that this work will
complete in reasonable time, for some value of 'reasonable'.

My understanding of this current proposal is that:
* userspace creates a 'present fence' with this new ioctl
* the fence becomes signaled when a value is written to a location in
memory, which is visible through both CPU and GPU mappings of that page
* this 'present fence' is imported as a VkSemaphore (?) and the userspace
Vulkan driver will somehow wait on this value  either before submitting
work or as a possibly-hardware-assisted GPU-side wait (?)
* the kernel's scheduler is thus eliminated from the equation, and every
execbuf is submitted directly to hardware, because either userspace knows
that the fence has already been signaled, or it will issue a GPU-side wait
(?)
* but the kernel is still required to monitor completion of every fence
itself, so it can forcibly complete, or penalise the client (?)

Lastly, let's say we stop ignoring KMS: what happens for the
render-with-GPU-display-on-KMS case? Do we need to do the equivalent of
glFinish() in userspace and only submit the KMS atomic request when the GPU
work has fully retired?

Clarifying those points would be really helpful so this is less of a
strawman. I have some further opinions, but I'm going to wait until I
understand what I'm actually arguing against before I go too far. :) The
last point is very salient though.

Cheers,
Daniel
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
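
(To make the "value written to a location in memory" model summarized above
slightly more concrete: on the CPU side such a fence is essentially a seqno
compare against a shared page, roughly as below. The structure and helper
names are illustrative only and are not part of the proposal.)

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct user_fence {
        _Atomic uint64_t *seqno;   /* CPU mapping of the page also visible to the GPU */
        uint64_t wait_value;       /* point this fence corresponds to */
};

static bool user_fence_signaled(const struct user_fence *f)
{
        /* Acquire ordering so reads of the buffer contents happen after the check. */
        return atomic_load_explicit(f->seqno, memory_order_acquire) >= f->wait_value;
}

(Everything else in the list, such as who gets to write the value, how the GPU
waits on it, and what the kernel does when it never arrives, is exactly the
part under debate in this thread.)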


Re: [PATCH 1/5] drm/i915: Create stolen memory region from local memory

2021-04-20 Thread Tvrtko Ursulin



On 20/04/2021 14:18, Matthew Auld wrote:

From: CQ Tang 

Add "REGION_STOLEN" device info to dg1, create stolen memory
region from upper portion of local device memory, starting
from DSMBASE.

v2:
 - s/drm_info/drm_dbg; userspace likely doesn't care about stolen.
 - mem->type is only setup after the region probe, so setting the name
   as stolen-local or stolen-system based on this value won't work. Split
   system vs local stolen setup to fix this.
 - kill all the region->devmem/is_devmem stuff. We already differentiate
   the different types of stolen so such things shouldn't be needed
   anymore.
v3:
 - split stolen lmem vs smem ops(Tvrtko)
 - add shortcut for stolen region in i915(Tvrtko)
 - sanity check dsm base vs bar size(Xinyun)

Signed-off-by: CQ Tang 
Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
Cc: Xinyun Liu 
---
  drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 137 ++---
  drivers/gpu/drm/i915/i915_drv.h|   7 ++
  drivers/gpu/drm/i915/i915_pci.c|   2 +-
  drivers/gpu/drm/i915/i915_reg.h|   1 +
  drivers/gpu/drm/i915/intel_memory_region.c |   8 ++
  drivers/gpu/drm/i915/intel_memory_region.h |   5 +-
  6 files changed, 140 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index b0597de206de..2ed1ca9aec75 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -10,6 +10,7 @@
  #include 
  #include 
  
+#include "gem/i915_gem_lmem.h"

  #include "gem/i915_gem_region.h"
  #include "i915_drv.h"
  #include "i915_gem_stolen.h"
@@ -121,6 +122,14 @@ static int i915_adjust_stolen(struct drm_i915_private 
*i915,
}
}
  
+	/*

+* With device local memory, we don't need to check the address range,
+* this is device memory physical address, could overlap with system
+* memory.
+*/
+   if (HAS_LMEM(i915))
+   return 0;


The grammar in the comment is a bit hard to parse for me, but more 
importantly, this is now not on the device stolen path, right?


[Comes back later... hm, no, it is still called. Okay, at least there is a 
comment now explaining which are the relevant bits.]



+
/*
 * Verify that nothing else uses this physical address. Stolen
 * memory should be reserved by the BIOS and hidden from the
@@ -374,8 +383,9 @@ static void icl_get_stolen_reserved(struct drm_i915_private *i915,
}
  }
  
-static int i915_gem_init_stolen(struct drm_i915_private *i915)

+static int i915_gem_init_stolen(struct intel_memory_region *mem)
  {
+   struct drm_i915_private *i915 = mem->i915;
struct intel_uncore *uncore = &i915->uncore;
resource_size_t reserved_base, stolen_top;
resource_size_t reserved_total, reserved_size;
@@ -396,10 +406,10 @@ static int i915_gem_init_stolen(struct drm_i915_private *i915)
return 0;
}
  
-	if (resource_size(&intel_graphics_stolen_res) == 0)

+   if (resource_size(&mem->region) == 0)
return 0;
  
-	i915->dsm = intel_graphics_stolen_res;

+   i915->dsm = mem->region;
  
  	if (i915_adjust_stolen(i915, &i915->dsm))

return 0;
@@ -688,39 +698,130 @@ struct drm_i915_gem_object *
  i915_gem_object_create_stolen(struct drm_i915_private *i915,
  resource_size_t size)
  {
-   return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_STOLEN_SMEM],
+   return i915_gem_object_create_region(i915->mm.stolen_region,
 size, I915_BO_ALLOC_CONTIGUOUS);
  }
  
-static int init_stolen(struct intel_memory_region *mem)

+static int init_stolen_smem(struct intel_memory_region *mem)
  {
-   intel_memory_region_set_name(mem, "stolen");
-
/*
 * Initialise stolen early so that we may reserve preallocated
 * objects for the BIOS to KMS transition.
 */
-   return i915_gem_init_stolen(mem->i915);
+   return i915_gem_init_stolen(mem);
+}
+
+static void release_stolen_smem(struct intel_memory_region *mem)
+{
+   i915_gem_cleanup_stolen(mem->i915);
+}
+
+static const struct intel_memory_region_ops i915_region_stolen_smem_ops = {
+   .init = init_stolen_smem,
+   .release = release_stolen_smem,
+   .init_object = _i915_gem_object_stolen_init,
+};
+
+static int init_stolen_lmem(struct intel_memory_region *mem)
+{
+   int err;
+
+   if (GEM_WARN_ON(resource_size(&mem->region) == 0))
+   return -ENODEV;
+
+   if (!io_mapping_init_wc(&mem->iomap,
+   mem->io_start,
+   resource_size(&mem->region)))
+   return -EIO;
+
+   /*
+* For stolen lmem we mostly just care about populating the dsm related
+* bits and setting up the drm_mm allocator for the range.
+  

Re: [PATCH 4/5] drm/i915/stolen: pass the allocation flags

2021-04-20 Thread Tvrtko Ursulin



On 20/04/2021 14:18, Matthew Auld wrote:

From: CQ Tang 

Stolen memory is always allocated as physically contiguous pages, mark
the object flags as such.

v2: move setting I915_BO_ALLOC_CONTIGUOUS into create_stolen

Signed-off-by: CQ Tang 
Signed-off-by: Matthew Auld 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 4f9fe5aca37e..46f79b240df7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -633,14 +633,21 @@ static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
  
  static int __i915_gem_object_create_stolen(struct intel_memory_region *mem,

   struct drm_i915_gem_object *obj,
-  struct drm_mm_node *stolen)
+  struct drm_mm_node *stolen,
+  unsigned int flags)
  {
static struct lock_class_key lock_class;
unsigned int cache_level;
int err;
  
+	/*

+* Stolen objects are always physically contiguous since we just
+* allocate one big block underneath using the drm_mm range allocator.
+*/
+   flags |= I915_BO_ALLOC_CONTIGUOUS;
+
drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size);
-   i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, 0);
+   i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, flags);
  
  	obj->stolen = stolen;

obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
@@ -682,7 +689,7 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem,
if (ret)
goto err_free;
  
-	ret = __i915_gem_object_create_stolen(mem, obj, stolen);

+   ret = __i915_gem_object_create_stolen(mem, obj, stolen, flags);


Hm, odd that the flags were previously ignored in here. I guess no 
callers were passing any when creating stolen objects. If none are 
supported, should we add a GEM_BUG_ON to check for that?
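
Purely as an illustration of what I mean (the accepted-flags mask is a guess 
on my part, not a claim about what stolen should actually support), something 
like:

	/* Illustrative only: trap callers passing flags that stolen objects
	 * cannot honour, before OR-ing in the mandatory contiguous bit. */
	GEM_BUG_ON(flags & ~I915_BO_ALLOC_CONTIGUOUS);
	flags |= I915_BO_ALLOC_CONTIGUOUS;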


Regards,

Tvrtko


if (ret)
goto err_remove;
  
@@ -700,7 +707,7 @@ i915_gem_object_create_stolen(struct drm_i915_private *i915,

  resource_size_t size)
  {
return i915_gem_object_create_region(i915->mm.stolen_region,
-size, I915_BO_ALLOC_CONTIGUOUS);
+size, 0);
  }
  
  static int init_stolen_smem(struct intel_memory_region *mem)

@@ -866,7 +873,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915,
goto err_stolen;
}
  
-	ret = __i915_gem_object_create_stolen(mem, obj, stolen);

+   ret = __i915_gem_object_create_stolen(mem, obj, stolen, 0);
if (ret)
goto err_object_free;
  




Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Jason Ekstrand
Sorry for the mega-reply but timezones...

On Tue, Apr 20, 2021 at 6:59 AM Christian König
 wrote:
>
> > Yeah. If we go with userspace fences, then userspace can hang itself. Not
> > the kernel's problem.
>
> Well, the path of inner peace begins with four words. “Not my fucking
> problem.”

🧘

> But I'm not that much concerned about the kernel, but rather about
> important userspace processes like X, Wayland, SurfaceFlinger etc...
>
> I mean attaching a page to a sync object and allowing to wait/signal
> from both CPU as well as GPU side is not so much of a problem.

Yup... Sorting out these issues is what makes this a hard problem.


> > You have to somehow handle that, e.g. perhaps with conditional
> > rendering and just using the old frame in compositing if the new one
> > doesn't show up in time.
>
> Nice idea, but how would you handle that on the OpenGL/Glamor/Vulkan level.

"Just handle it with conditional rendering" is a pretty trite answer.
If we have memory fences, we could expose a Vulkan extension to allow
them to be read by conditional rendering or by a shader.  However, as
Daniel has pointed out multiple times, composition pipelines are long
and complex and cheap tricks like that aren't something we can rely on
for solving the problem.  If we're going to solve the problem, we need
to make driver-internal stuff nice while still providing something
that looks very much like a sync_file with finite time semantics to
the composition pipeline.  How?  That's the question.
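
For what it's worth, the "read it by conditional rendering" part could at
least be prototyped on top of the existing VK_EXT_conditional_rendering entry
points, assuming the fence payload (or a 32-bit non-zero predicate derived
from it by a copy or a tiny dispatch) lands in a VkBuffer first; this is only
a sketch of the shape, not something any extension defines today:

	/* Sketch: predicate_buf/predicate_offset are assumed to already hold
	 * a 32-bit value that is non-zero iff the memory fence signalled. */
	VkConditionalRenderingBeginInfoEXT cond = {
		.sType  = VK_STRUCTURE_TYPE_CONDITIONAL_RENDERING_BEGIN_INFO_EXT,
		.buffer = predicate_buf,
		.offset = predicate_offset,
	};

	vkCmdBeginConditionalRenderingEXT(cmd, &cond);
	record_composite_with_new_frame(cmd);   /* hypothetical helper */
	vkCmdEndConditionalRenderingEXT(cmd);
	/* A second pass with VK_CONDITIONAL_RENDERING_INVERTED_BIT_EXT would
	 * then re-composite the old frame if the fence did not signal. */

But that only papers over a single stage of a long pipeline, which is exactly
why it cannot be the whole answer.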


> Regards,
> Christian.
>
> Am 20.04.21 um 13:16 schrieb Daniel Vetter:
> > On Tue, Apr 20, 2021 at 07:03:19AM -0400, Marek Olšák wrote:
> >> Daniel, are you suggesting that we should skip any deadlock prevention in
> >> the kernel, and just let userspace wait for and signal any fence it has
> >> access to?
> > Yeah. If we go with userspace fences, then userspace can hang itself. Not
> > the kernel's problem. The only criteria is that the kernel itself must
> > never rely on these userspace fences, except for stuff like implementing
> > optimized cpu waits. And in those we must always guarantee that the
> > userspace process remains interruptible.
> >
> > It's a completely different world from dma_fence based kernel fences,
> > whether those are implicit or explicit.
> >
> >> Do you have any concern with the deprecation/removal of BO fences in the
> >> kernel assuming userspace is only using explicit fences? Any concern with
> >> the submit and return fences for modesetting and other producer<->consumer
> >> scenarios?
> > Let me work on the full replay for your rfc first, because there's a lot
> > of details here and nuance.
> > -Daniel
> >
> >> Thanks,
> >> Marek
> >>
> >> On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter  wrote:
> >>
> >>> On Tue, Apr 20, 2021 at 12:15 PM Christian König
> >>>  wrote:
>  Am 19.04.21 um 17:48 schrieb Jason Ekstrand:
> > Not going to comment on everything on the first pass...
> >
> > On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:
> >> Hi,
> >>
> >> This is our initial proposal for explicit fences everywhere and new
> >>> memory management that doesn't use BO fences. It's a redesign of how Linux
> >>> graphics drivers work, and it can coexist with what we have now.
> >>
> >> 1. Introduction
> >> (skip this if you are already sold on explicit fences)
> >>
> >> The current Linux graphics architecture was initially designed for
> >>> GPUs with only one graphics queue where everything was executed in the
> >>> submission order and per-BO fences were used for memory management and
> >>> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
> >>> queues were added on top, which required the introduction of implicit
> >>> GPU-GPU synchronization between queues of different processes using per-BO
> >>> fences. Recently, even parallel execution within one queue was enabled
> >>> where a command buffer starts draws and compute shaders, but doesn't wait
> >>> for them, enabling parallelism between back-to-back command buffers.
> >>> Modesetting also uses per-BO fences for scheduling flips. Our GPU 
> >>> scheduler
> >>> was created to enable all those use cases, and it's the only reason why 
> >>> the
> >>> scheduler exists.
> >> The GPU scheduler, implicit synchronization, BO-fence-based memory
> >>> management, and the tracking of per-BO fences increase CPU overhead and
> >>> latency, and reduce parallelism. There is a desire to replace all of them
> >>> with something much simpler. Below is how we could do it.
> >>
> >> 2. Explicit synchronization for window systems and modesetting
> >>
> >> The producer is an application and the consumer is a compositor or a
> >>> modesetting driver.
> >> 2.1. The Present request
> >>
> >> As part of the Present request, the producer will pass 2 fences (sync
> >>> objects) to the consumer alongside the presented DMABUF BO:
> >> - The submit fence: Initially unsignalled, 

Re: [PATCH v3 1/5] dt-bindings: display: mediatek, hdmi: Convert to use graph schema

2021-04-20 Thread Rob Herring
On Mon, 19 Apr 2021 09:32:40 +0200, Neil Armstrong wrote:
> Update the mediatek,dpi binding to use the graph schema.
> 
> Signed-off-by: Neil Armstrong 
> ---
>  .../display/mediatek/mediatek,cec.yaml|  51 +++
>  .../display/mediatek/mediatek,hdmi-ddc.yaml   |  57 
>  .../display/mediatek/mediatek,hdmi.txt| 136 --
>  .../display/mediatek/mediatek,hdmi.yaml   | 132 +
>  4 files changed, 240 insertions(+), 136 deletions(-)
>  create mode 100644 
> Documentation/devicetree/bindings/display/mediatek/mediatek,cec.yaml
>  create mode 100644 
> Documentation/devicetree/bindings/display/mediatek/mediatek,hdmi-ddc.yaml
>  delete mode 100644 
> Documentation/devicetree/bindings/display/mediatek/mediatek,hdmi.txt
>  create mode 100644 
> Documentation/devicetree/bindings/display/mediatek/mediatek,hdmi.yaml
> 

Reviewed-by: Rob Herring 


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Jason Ekstrand
On Tue, Apr 20, 2021 at 9:10 AM Daniel Vetter  wrote:
>
> On Tue, Apr 20, 2021 at 1:59 PM Christian König
>  wrote:
> >
> > > Yeah. If we go with userspace fences, then userspace can hang itself. Not
> > > the kernel's problem.
> >
> > Well, the path of inner peace begins with four words. “Not my fucking
> > problem.”
> >
> > But I'm not that much concerned about the kernel, but rather about
> > important userspace processes like X, Wayland, SurfaceFlinger etc...
> >
> > I mean attaching a page to a sync object and allowing to wait/signal
> > from both CPU as well as GPU side is not so much of a problem.
> >
> > > You have to somehow handle that, e.g. perhaps with conditional
> > > rendering and just using the old frame in compositing if the new one
> > > doesn't show up in time.
> >
> > Nice idea, but how would you handle that on the OpenGL/Glamor/Vulkan level.
>
> For opengl we do all the same guarantees, so if you get one of these
> you just block until the fence is signalled. Doing that properly means
> submit thread to support drm_syncobj like for vulkan.
>
> For vulkan we probably want to represent these as proper vk timeline
> objects, and the vulkan way is to just let the application (well
> compositor) here deal with it. If they import timelines from untrusted
> other parties, they need to handle the potential fallback of being
> lied at. How is "not vulkan's fucking problem", because that entire
> "with great power (well performance) comes great responsibility" is
> the entire vk design paradigm.

The security aspects are currently an unsolved problem in Vulkan.  The
assumption is that everyone trusts everyone else to be careful with
the scissors.  It's a great model!

I think we can do something in Vulkan to allow apps to protect
themselves a bit but it's tricky and non-obvious.
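
The most obvious piece of self-defence (just a sketch, and it assumes the
untrusted fence is surfaced to the compositor as an imported timeline
VkSemaphore) is to never wait on it unbounded and to fall back to the
previous frame on timeout:

	VkSemaphoreWaitInfo wait = {
		.sType          = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO,
		.semaphoreCount = 1,
		.pSemaphores    = &imported_timeline,  /* from the untrusted client */
		.pValues        = &expected_value,
	};

	/* 16ms budget; the policy and the helper names are made up. */
	switch (vkWaitSemaphores(device, &wait, 16ull * 1000 * 1000)) {
	case VK_SUCCESS:
		composite_new_frame();
		break;
	case VK_TIMEOUT:
		composite_previous_frame();
		break;
	default:
		handle_device_loss();
		break;
	}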

--Jason


> Glamour will just rely on GL providing nice package of the harsh
> reality of gpus, like usual.
>
> So I guess step 1 here for GL would be to provide some kind of
> import/export of timeline syncobj, including properly handling this
> "future/indefinite fences" aspect of them with submit thread and
> everything.
>
> -Daniel
>
> >
> > Regards,
> > Christian.
> >
> > Am 20.04.21 um 13:16 schrieb Daniel Vetter:
> > > On Tue, Apr 20, 2021 at 07:03:19AM -0400, Marek Olšák wrote:
> > >> Daniel, are you suggesting that we should skip any deadlock prevention in
> > >> the kernel, and just let userspace wait for and signal any fence it has
> > >> access to?
> > > Yeah. If we go with userspace fences, then userspace can hang itself. Not
> > > the kernel's problem. The only criteria is that the kernel itself must
> > > never rely on these userspace fences, except for stuff like implementing
> > > optimized cpu waits. And in those we must always guarantee that the
> > > userspace process remains interruptible.
> > >
> > > It's a completely different world from dma_fence based kernel fences,
> > > whether those are implicit or explicit.
> > >
> > >> Do you have any concern with the deprecation/removal of BO fences in the
> > >> kernel assuming userspace is only using explicit fences? Any concern with
> > >> the submit and return fences for modesetting and other 
> > >> producer<->consumer
> > >> scenarios?
> > > Let me work on the full replay for your rfc first, because there's a lot
> > > of details here and nuance.
> > > -Daniel
> > >
> > >> Thanks,
> > >> Marek
> > >>
> > >> On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter  wrote:
> > >>
> > >>> On Tue, Apr 20, 2021 at 12:15 PM Christian König
> > >>>  wrote:
> >  Am 19.04.21 um 17:48 schrieb Jason Ekstrand:
> > > Not going to comment on everything on the first pass...
> > >
> > > On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:
> > >> Hi,
> > >>
> > >> This is our initial proposal for explicit fences everywhere and new
> > >>> memory management that doesn't use BO fences. It's a redesign of how 
> > >>> Linux
> > >>> graphics drivers work, and it can coexist with what we have now.
> > >>
> > >> 1. Introduction
> > >> (skip this if you are already sold on explicit fences)
> > >>
> > >> The current Linux graphics architecture was initially designed for
> > >>> GPUs with only one graphics queue where everything was executed in the
> > >>> submission order and per-BO fences were used for memory management and
> > >>> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
> > >>> queues were added on top, which required the introduction of implicit
> > >>> GPU-GPU synchronization between queues of different processes using 
> > >>> per-BO
> > >>> fences. Recently, even parallel execution within one queue was enabled
> > >>> where a command buffer starts draws and compute shaders, but doesn't 
> > >>> wait
> > >>> for them, enabling parallelism between back-to-back command buffers.
> > >>> Modesetting also uses per-BO fences for scheduling flips. Our GPU 
> > >>> scheduler
> > >>> was created to enable all thos

Re: [PATCH v3 2/5] dt-bindings: mediatek: add mt8167 to hdmi, hdmi-ddc and cec bindings

2021-04-20 Thread Rob Herring
On Mon, 19 Apr 2021 09:32:41 +0200, Neil Armstrong wrote:
> Add mt8167 SoC compatible to Mediatek hdmi, hdmi-ddc and cec schema bindings.
> 
> Signed-off-by: Neil Armstrong 
> ---
>  .../devicetree/bindings/display/mediatek/mediatek,cec.yaml   | 1 +
>  .../devicetree/bindings/display/mediatek/mediatek,hdmi-ddc.yaml  | 1 +
>  .../devicetree/bindings/display/mediatek/mediatek,hdmi.yaml  | 1 +
>  3 files changed, 3 insertions(+)
> 

Acked-by: Rob Herring 

