RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Liu, Monk
[AMD Official Use Only]

@Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
 
Do you have any concern on the kthread_park() approach ?

Theoretically speaking, sched_main should run mutually exclusively with job_timeout 
since they both touch jobs, and stopping the scheduler during job_timeout won't 
impact performance, since in that scenario there was already something wrong/stuck 
on that ring/scheduler.

Thanks 

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: Liu, Monk 
Sent: Thursday, August 19, 2021 6:26 PM
To: Daniel Vetter ; Grodzovsky, Andrey 

Cc: Alex Deucher ; Chen, JingWen 
; Maling list - DRI developers 
; amd-gfx list 
; Koenig, Christian 
Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

[AMD Official Use Only]

Hi Daniel

>> Why can't we stop the scheduler thread first, so that there's guaranteed no 
>> race? I've recently had a lot of discussions with panfrost folks about their 
>> reset that spans across engines, and without stopping the scheduler thread 
>> first before you touch anything it's just plain impossible.

Yeah, we had this thought in our minds as well.

Our second approach is to call kthread_park() in the job_timedout() routine so that 
the "bad" job is guaranteed to be usable without the scheduler touching or 
freeing it. Please check this sample patch as well:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+   kthread_park(sched->thread);
spin_lock(&sched->job_list_lock);
job = list_first_entry_or_null(&sched->pending_list,
   struct drm_sched_job, list);
 
if (job) {
-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
spin_unlock(&sched->job_list_lock);
 
status = job->sched->ops->timedout_job(job);
@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
} else {
spin_unlock(&sched->job_list_lock);
}
+   kthread_unpark(sched->thread);
 
if (status != DRM_GPU_SCHED_STAT_ENODEV) {
spin_lock(&sched->job_list_lock);
@@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
kthread_park(sched->thread);
 
/*
-* Reinsert back the bad job here - now it's safe as
-* drm_sched_get_cleanup_job cannot race against us and release the
-* bad job at this point - we parked (waited for) any in progress
-* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-* now until the scheduler thread is unparked.
-*/
-   if (bad && bad->sched == sched)
-   /*
-* Add at the head of the queue to reflect it was the earliest
-* job extracted.
-*/
-   list_add(&bad->list, &sched->pending_list);
-
-   /*
 * Iterate the job list from later to  earlier one and either deactive
 * their HW callbacks or remove them from pending list if they already
 * signaled.


Thanks 

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: Daniel Vetter 
Sent: Thursday, August 19, 2021 5:31 PM
To: Grodzovsky, Andrey 
Cc: Daniel Vetter ; Alex Deucher ; 
Chen, JingWen ; Maling list - DRI developers 
; amd-gfx list 
; Liu, Monk ; Koenig, 
Christian 
Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:
> 
> On 2021-08-18 10:42 a.m., Daniel Vetter wrote:
> > On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote:
> > > On 2021-08-18 10:32 a.m., Daniel Vetter wrote:
> > > > On Wed, Aug 18, 2021 at 10:26:25AM -0400, Andrey Grodzovsky wrote:
> > > > > On 2021-08-18 10:02 a.m., Alex Deucher wrote:
> > > > > 
> > > > > > + dri-devel
> > > > > > 
> > > > > > Since scheduler is a shared component, please add dri-devel 
> > > > > > on all scheduler patches.
> > > > > > 
> > > > > > On Wed, Aug 18, 2021 at 7:21 AM Jingwen Chen 
> > > > > >  wrote:
> > > > > > > [Why]
> > > > > > > for bailing job, this commit will delete it from pending 
> > > > > > > list thus the bailing job w

Re: [PATCH] drm/ttm: allow debugfs_create_file() to fail in ttm_global_init()

2021-08-20 Thread Duncan
On Mon, 16 Aug 2021 16:30:46 +0200
lnx7...@gregdf.com wrote:

> From: Greg Depoire--Ferrer 
> 
> Commit 69de4421bb4c ("drm/ttm: Initialize debugfs from
> ttm_global_init()") unintentionally made ttm_global_init() return
> early with an error when debugfs_create_file() fails. When
> CONFIG_DEBUG_FS is disabled, debugfs_create_file() returns a ENODEV
> error so the TTM device would fail to initialize.
> 
> Instead of returning early with the error, print it and continue.
> ENODEV can be ignored because it just means that CONFIG_DEBUG_FS is
> disabled.
> 
> Fixes: 69de4421bb4c ("drm/ttm: Initialize debugfs from ttm_global_init()")
> Reported-by: Mikael Pettersson 
> Reported-by: Duncan 
> Signed-off-by: Greg Depoire--Ferrer 
> ---
> Hi, I had this bug as well with the nouveau driver after updating.
> This patch fixes it for me.
> 
>  drivers/gpu/drm/ttm/ttm_device.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

This fixes the problem here, too.  Running it now.

Tested-by: Duncan 

-- 
Duncan - HTML messages treated as spam
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety."
Benjamin Franklin


Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-20 Thread Jerome Glisse
Note that you do not want GUP to succeed on a device page; I do not see
where that is handled in the new code.

On Sun, Aug 15, 2021 at 1:40 PM John Hubbard  wrote:
>
> On 8/15/21 8:37 AM, Christoph Hellwig wrote:
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index 8ae31622deef..d48a1f0889d1 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -1218,7 +1218,7 @@ __maybe_unused struct page 
> >> *try_grab_compound_head(struct page *page, int refs,
> >>   static inline __must_check bool try_get_page(struct page *page)
> >>   {
> >>  page = compound_head(page);
> >> -if (WARN_ON_ONCE(page_ref_count(page) <= 0))
> >> +if (WARN_ON_ONCE(page_ref_count(page) < 
> >> (int)!is_zone_device_page(page)))
> >
> > Please avoid the overly long line.  In fact I'd be tempted to just not
> > bother here and keep the old, more lose check.  Especially given that
> > John has a patch ready that removes try_get_page entirely.
> >
>
> Yes. Andrew has accepted it into mmotm.
>
> Ralph's patch here was written well before my cleanup that removed
> try_grab_page() [1]. But now that we're here, if you drop this hunk then
> it will make merging easier, I think.
>
>
> [1] https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com
>
> thanks,
> --
> John Hubbard
> NVIDIA
>


[syzbot] WARNING in drm_gem_shmem_vm_open

2021-08-20 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:    614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1462cb6130
kernel config:  https://syzkaller.appspot.com/x/.config?x=96f0602203250753
dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619
compiler:   gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for 
Debian) 2.35.1
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com

[ cut here ]
WARNING: CPU: 0 PID: 11697 at drivers/gpu/drm/drm_gem_shmem_helper.c:562 
drm_gem_shmem_vm_open+0x96/0xb0 drivers/gpu/drm/drm_gem_shmem_helper.c:562
Modules linked in:
CPU: 0 PID: 11697 Comm: syz-executor.3 Not tainted 5.14.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:drm_gem_shmem_vm_open+0x96/0xb0 
drivers/gpu/drm/drm_gem_shmem_helper.c:562
Code: 89 c6 e8 7d ec 23 fd 85 db 75 1a e8 34 e5 23 fd 48 89 ef 5b 5d 41 5c e9 
e8 61 f5 ff e8 23 e5 23 fd 0f 0b eb ca e8 1a e5 23 fd <0f> 0b eb dd e8 b1 1f 6a 
fd eb 89 e8 aa 1f 6a fd eb a8 0f 1f 84 00
RSP: 0018:c9000b3cfb90 EFLAGS: 00010293
RAX:  RBX: fffc RCX: 
RDX: 8880364eb880 RSI: 8451c3e6 RDI: 0003
RBP: 888033c70948 R08:  R09: 
R10: 8451c3c3 R11: 0001 R12: 888146490800
R13: 888033c70a50 R14: 20166000 R15: 888033c709d8
FS:  7fbe43056700() GS:8880b9c0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 005422b8 CR3: 36274000 CR4: 00350ef0
Call Trace:
 __split_vma+0x23c/0x550 mm/mmap.c:2764
 __do_munmap+0x32a/0x11c0 mm/mmap.c:2868
 do_munmap mm/mmap.c:2922 [inline]
 munmap_vma_range mm/mmap.c:604 [inline]
 mmap_region+0x85a/0x1760 mm/mmap.c:1753
 do_mmap+0x86e/0x1180 mm/mmap.c:1584
 vm_mmap_pgoff+0x1b7/0x290 mm/util.c:519
 ksys_mmap_pgoff+0x4a8/0x620 mm/mmap.c:1635
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665e9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:7fbe43056188 EFLAGS: 0246 ORIG_RAX: 0009
RAX: ffda RBX: 0056bf80 RCX: 004665e9
RDX:  RSI: 2000 RDI: 20166000
RBP: 004bfcc4 R08: 0004 R09: 
R10: 0013 R11: 0246 R12: 0056bf80
R13: 7fffb615701f R14: 7fbe43056300 R15: 00022000

Code disassembly (best guess):
   0:   89 c6   mov%eax,%esi
   2:   e8 7d ec 23 fd  callq  0xfd23ec84
   7:   85 db   test   %ebx,%ebx
   9:   75 1a   jne0x25
   b:   e8 34 e5 23 fd  callq  0xfd23e544
  10:   48 89 efmov%rbp,%rdi
  13:   5b  pop%rbx
  14:   5d  pop%rbp
  15:   41 5c   pop%r12
  17:   e9 e8 61 f5 ff  jmpq   0xfff56204
  1c:   e8 23 e5 23 fd  callq  0xfd23e544
  21:   0f 0b   ud2
  23:   eb ca   jmp0xffef
  25:   e8 1a e5 23 fd  callq  0xfd23e544
  2a:   0f 0b   ud2 <-- trapping instruction
  2c:   eb dd   jmp0xb
  2e:   e8 b1 1f 6a fd  callq  0xfd6a1fe4
  33:   eb 89   jmp0xffbe
  35:   e8 aa 1f 6a fd  callq  0xfd6a1fe4
  3a:   eb a8   jmp0xffe4
  3c:   0f  .byte 0xf
  3d:   1f  (bad)  
  3e:   84 00   test   %al,(%rax)


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


[PATCH linux-next]: add put_device() after of_find_device_by_node()

2021-08-20 Thread jing yangyang
This was found by coccicheck:
./drivers/gpu/drm/kmb/kmb_drv.c:503:2-8:
ERROR  missing put_device; call of_find_device_by_node on line 490,
but without a corresponding object release within this function.

Reported-by: Zeal Robot 
Signed-off-by: jing yangyang 
---
 drivers/gpu/drm/kmb/kmb_drv.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/kmb/kmb_drv.c b/drivers/gpu/drm/kmb/kmb_drv.c
index f54392e..58495a9 100644
--- a/drivers/gpu/drm/kmb/kmb_drv.c
+++ b/drivers/gpu/drm/kmb/kmb_drv.c
@@ -500,8 +500,10 @@ static int kmb_probe(struct platform_device *pdev)
ret = kmb_dsi_host_bridge_init(get_device(&dsi_pdev->dev));
 
if (ret == -EPROBE_DEFER) {
+   put_device(&dsi_pdev->dev);
return -EPROBE_DEFER;
} else if (ret) {
+   put_device(&dsi_pdev->dev);
DRM_ERROR("probe failed to initialize DSI host bridge\n");
return ret;
}
@@ -509,9 +511,10 @@ static int kmb_probe(struct platform_device *pdev)
/* Create DRM device */
kmb = devm_drm_dev_alloc(dev, &kmb_driver,
 struct kmb_drm_private, drm);
-   if (IS_ERR(kmb))
+   if (IS_ERR(kmb)) {
+   put_device(&dsi_pdev->dev);
return PTR_ERR(kmb);
-
+   }
dev_set_drvdata(dev, &kmb->drm);
 
/* Initialize MIPI DSI */
-- 
1.8.3.1




Re: [PATCH v2 45/63] ath11k: Use memset_startat() for clearing queue descriptors

2021-08-20 Thread Kalle Valo
Kees Cook  writes:

> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memset(), avoid intentionally writing across
> neighboring fields.
>
> Use memset_startat() so memset() doesn't get confused about writing
> beyond the destination member that is intended to be the starting point
> of zeroing through the end of the struct. Additionally split up a later
> field-spanning memset() so that memset() can reason about the size.
>
> Cc: Kalle Valo 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Cc: ath...@lists.infradead.org
> Cc: linux-wirel...@vger.kernel.org
> Cc: net...@vger.kernel.org
> Signed-off-by: Kees Cook 

To avoid conflicts I prefer taking this via my ath tree.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Christian König

No, that perfectly works for me.

The problem we used to have with this approach was that we potentially 
have multiple timeouts at the same time.


But when we serialize the timeout handling by using a single workqueue, 
as Daniel now suggests as well, then that isn't an issue any more.


Regards,
Christian.

Am 20.08.21 um 09:12 schrieb Liu, Monk:

[AMD Official Use Only]

@Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
  
Do you have any concern on the kthread_park() approach ?


Theoretically speaking, sched_main should run mutually exclusively with job_timeout 
since they both touch jobs, and stopping the scheduler during job_timeout won't 
impact performance, since in that scenario there was already something wrong/stuck 
on that ring/scheduler.

Thanks

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: Liu, Monk
Sent: Thursday, August 19, 2021 6:26 PM
To: Daniel Vetter ; Grodzovsky, Andrey 

Cc: Alex Deucher ; Chen, JingWen ; Maling list - 
DRI developers ; amd-gfx list ; 
Koenig, Christian 
Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

[AMD Official Use Only]

Hi Daniel


Why can't we stop the scheduler thread first, so that there's guaranteed no 
race? I've recently had a lot of discussions with panfrost folks about their 
reset that spans across engines, and without stopping the scheduler thread 
first before you touch anything it's just plain impossible.

Yeah, we had this thought in our minds as well.

Our second approach is to call kthread_park() in the job_timedout() routine so that the
"bad" job is guaranteed to be usable without the scheduler touching or freeing it.
Please check this sample patch as well:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
 sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  
 /* Protects against concurrent deletion in drm_sched_get_cleanup_job */

+   kthread_park(sched->thread);
 spin_lock(&sched->job_list_lock);
 job = list_first_entry_or_null(&sched->pending_list,
struct drm_sched_job, list);
  
 if (job) {

-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
 spin_unlock(&sched->job_list_lock);
  
 status = job->sched->ops->timedout_job(job);

@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 } else {
 spin_unlock(&sched->job_list_lock);
 }
+   kthread_unpark(sched->thread);
  
 if (status != DRM_GPU_SCHED_STAT_ENODEV) {

spin_lock(&sched->job_list_lock);
@@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 kthread_park(sched->thread);
  
 /*

-* Reinsert back the bad job here - now it's safe as
-* drm_sched_get_cleanup_job cannot race against us and release the
-* bad job at this point - we parked (waited for) any in progress
-* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-* now until the scheduler thread is unparked.
-*/
-   if (bad && bad->sched == sched)
-   /*
-* Add at the head of the queue to reflect it was the earliest
-* job extracted.
-*/
-   list_add(&bad->list, &sched->pending_list);
-
-   /*
  * Iterate the job list from later to  earlier one and either deactive
  * their HW callbacks or remove them from pending list if they already
  * signaled.


Thanks

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: Daniel Vetter 
Sent: Thursday, August 19, 2021 5:31 PM
To: Grodzovsky, Andrey 
Cc: Daniel Vetter ; Alex Deucher ; Chen, JingWen 
; Maling list - DRI developers ; amd-gfx list 
; Liu, Monk ; Koenig, Christian 

Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:

On 2021-08-18 10:42 a.m., Daniel Vetter wrote:

On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote:

On 2021-08-18 10:32 a.m., Daniel Vetter wrote:

On Wed, Aug 18, 2021 at 10:26:25AM -0400, Andrey Grodzovsky wrote:

On 2021-08-18 10:02 a.m., Alex Deucher wrote:


+ dri-devel

Since scheduler is a shared

Re: [RFC] Make use of non-dynamic dmabuf in RDMA

2021-08-20 Thread Daniel Vetter
On Fri, Aug 20, 2021 at 1:06 AM Jason Gunthorpe  wrote:
> On Wed, Aug 18, 2021 at 11:34:51AM +0200, Daniel Vetter wrote:
> > On Wed, Aug 18, 2021 at 9:45 AM Gal Pressman  wrote:
> > >
> > > Hey all,
> > >
> > > Currently, the RDMA subsystem can only work with dynamic dmabuf
> > > attachments, which requires the RDMA device to support on-demand-paging
> > > (ODP) which is not common on most devices (only supported by mlx5).
> > >
> > > While the dynamic requirement makes sense for certain GPUs, some devices
> > > (such as habanalabs) have device memory that is always "pinned" and do
> > > not need/use the move_notify operation.
> > >
> > > The motivation of this RFC is to use habanalabs as the dmabuf exporter,
> > > and EFA as the importer to allow for peer2peer access through libibverbs.
> > >
> > > This draft patch changes the dmabuf driver to differentiate between
> > > static/dynamic attachments by looking at the move_notify op instead of
> > > the importer_ops struct, and allowing the peer2peer flag to be enabled
> > > in case of a static exporter.
> > >
> > > Thanks
> > >
> > > Signed-off-by: Gal Pressman 
> >
> > Given that habanalabs dma-buf support is very firmly in limbo (at
> > least it's not yet in linux-next or anywhere else) I think you want to
> > solve that problem first before we tackle the additional issue of
> > making p2p work without dynamic dma-buf. Without that it just doesn't
> > make a lot of sense really to talk about solutions here.
>
> I have been thinking about adding a dmabuf exporter to VFIO, for
> basically the same reason habana labs wants to do it.
>
> In that situation we'd want to see an approach similar to this as well
> to have a broad usability.
>
> The GPU drivers also want this for certain sophisticated scenarios
> with RDMA, the intree drivers just haven't quite got there yet.
>
> So, I think it is worthwhile to start thinking about this regardless
> of habana labs.

Oh sure, I've been having these for a while. I think there are two options:
- some kind of soft-pin, where the contract is that we only revoke
when absolutely necessary, and it's expected to be catastrophic on the
importer's side. The use-case would be a single user that fully controls
all accelerator local memory, with only the kernel driver evicting stuff. I
haven't implemented it, but the idea is that essentially in eviction we
check whom we're evicting for (by checking owners of buffers maybe;
atm those are not tracked in generic code but not that hard to add),
and if it's the same userspace owner we don't ever pick these buffers
as victims for eviction, preferring -ENOMEM/-ENOSPC. If a new user
comes around then we'd still throw these out to avoid abuse, and it
would be up to sysadmins to make sure this doesn't happen untimely,
maybe with the next thing.

- cgroups for (pinned) buffers. Mostly because cgroups for local
memory are somewhere on the plans anyway, but that one could also take
forever since there are questions about overlap with memcg and things
like that, plus thus far everyone who cares has made an incompatible
proposal about how it should be done :-/

A variant of the first one would be device-level revoke, which is a
concept we already have in drm for the modesetting side and also for
like 20 year old gpu drivers. We could brush that off and close some
of the gaps (a student is fixing the locking right now, the thing left
to do is mmap revoke), and I think that model of exclusive device
ownership with the option to revoke fits pretty well for at least some
of the accelerators floating around. In that case importers would
never get a move_notify (maybe we should call this revoke_notify to
make it clear it's a bit different) callback, except when the entire
thing has been yanked. I think that would fit pretty well for VFIO,
and I think we should be able to make it work for rdma too as some
kind of auto-deregister. The locking might be fun with both of these
since I expect some inversions compared to the register path, we'll
have to figure these out.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-20 Thread Jerome Glisse
On Thu, Aug 19, 2021 at 11:00 AM Sierra Guiza, Alejandro (Alex)
 wrote:
>
>
> On 8/18/2021 2:28 PM, Ralph Campbell wrote:
> > On 8/17/21 5:35 PM, Felix Kuehling wrote:
> >> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:
> >>> On 8/12/21 11:31 PM, Alex Sierra wrote:
>  From: Ralph Campbell 
> 
>  ZONE_DEVICE struct pages have an extra reference count that
>  complicates the
>  code for put_page() and several places in the kernel that need to
>  check the
>  reference count to see that a page is not being used (gup, compaction,
>  migration, etc.). Clean up the code so the reference count doesn't
>  need to
>  be treated specially for ZONE_DEVICE.
> 
>  v2:
>  AS: merged this patch in linux 5.11 version
> 
>  v5:
>  AS: add condition at try_grab_page to check for the zone device type,
>  while
>  page ref counter is checked less/equal to zero. In case of device
>  zone, pages
>  ref counter are initialized to zero.
> 
>  Signed-off-by: Ralph Campbell 
>  Signed-off-by: Alex Sierra 
>  ---
> arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
> drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
> fs/dax.c   |  4 +-
> include/linux/dax.h|  2 +-
> include/linux/memremap.h   |  7 +--
> include/linux/mm.h | 13 +
> lib/test_hmm.c |  2 +-
> mm/internal.h  |  8 +++
> mm/memremap.c  | 68
>  +++---
> mm/migrate.c   |  5 --
> mm/page_alloc.c|  3 ++
> mm/swap.c  | 45 ++---
> 12 files changed, 46 insertions(+), 115 deletions(-)
> 
> >>> I haven't seen a response to the issues I raised back at v3 of this
> >>> series.
> >>> https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc29bc@nvidia.com/
> >>>
> >>>
> >>>
> >>> Did I miss something?
> >> I think part of the response was that we did more testing. Alex added
> >> support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests
> >> recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE
> >> about a zero page refcount in try_get_page. The fix is in the latest
> >> version of patch 2. But it's already obsolete because John Hubbard is
> >> about to remove that function altogether.
> >>
> >> I think the issues you raised were more uncertainty than known bugs. It
> >> seems the fact that you can have DAX pages with 0 refcount is a feature
> >> more than a bug.
> >>
> >> Regards,
> >>Felix
> >
> > Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined?
> > In that case, mmap() of a DAX device will call insert_page() which calls
> > get_page() which would trigger VM_BUG_ON_PAGE().
> >
> > I can believe it is OK for PTE_SPECIAL page table entries to have no
> > struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with
> > a zero reference count using insert_pfn().
> Hi Ralph,
> We have tried the DAX tests with and without CONFIG_ARCH_HAS_PTE_SPECIAL
> defined.
> Apparently none of the tests touches that condition for a DAX device. Of
> course,
> that doesn't mean it could happen.
>
> Regards,
> Alex S.
>
> >
> >
> > I find it hard to believe that other MM developers don't see an issue
> > with a struct page with refcount == 0 and mapcount == 1.
> >
> > I don't see where init_page_count() is being called for the
> > MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD
> > driver allocates and passes to migrate_vma_setup().
> > Looks like svm_migrate_get_vram_page() needs to call init_page_count()
> > instead of get_page(). (I'm looking at branch
> > origin/alexsierrag/device_generic
> > https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver.git)
> Yes, you're right. My bad. Thanks for catching this. I didn't realize
> CONFIG_DEBUG_VM was missing from my build, so this BUG was never caught.
> It worked after I replaced get_page with init_page_count at
> svm_migrate_get_vram_page. However, I don't think this is the

Re: [PATCH v6 08/13] mm: call pgmap->ops->page_free for DEVICE_GENERIC pages

2021-08-20 Thread Jerome Glisse
On Thu, Aug 19, 2021 at 10:05 PM Christoph Hellwig  wrote:
>
> On Tue, Aug 17, 2021 at 11:44:54AM -0400, Felix Kuehling wrote:
> > >> That's a good catch. Existing drivers shouldn't need a page_free
> > >> callback if they didn't have one before. That means we need to add a
> > >> NULL-pointer check in free_device_page.
> > > Also the other state clearing (__ClearPageWaiters/mem_cgroup_uncharge/
> > > ->mapping = NULL).
> > >
> > > In many ways this seems like you want to bring back the DEVICE_PUBLIC
> > > pgmap type that was removed a while ago due to the lack of users
> > > instead of overloading the generic type.
> >
> > I think so. I'm not clear about how DEVICE_PUBLIC differed from what
> > DEVICE_GENERIC is today. As I understand it, DEVICE_PUBLIC was removed
> > because it was unused and also known to be broken in some ways.
> > DEVICE_GENERIC seemed close enough to what we need, other than not being
> > supported in the migration helpers.
> >
> > Would you see benefit in re-introducing DEVICE_PUBLIC as a distinct
> > memory type from DEVICE_GENERIC? What would be the benefits of making
> > that distinction?
>
> The old DEVICE_PUBLIC mostly different in that it allowed the page
> to be returned from vm_normal_page, which I think was horribly buggy.

Why was that buggy? If I were to do it now, I would return
DEVICE_PUBLIC pages from vm_normal_page, but I would ban pinning, as
pinning is exceptionally wrong for GPUs. If you migrate some random
anonymous/file-backed page to your GPU memory and it gets pinned there,
then there is no way for the GPU to migrate the page out. Quickly you
will run out of physically contiguous memory, and things like big
graphics buffer allocations (anything that needs physically contiguous
memory) will fail. It is less of an issue on hardware that relies less
and less on physically contiguous memory, but I do not think that
reliance is completely gone from all hw.

> But the point is not to bring back these old semantics.  The idea
> is to be able to differeniate between your new coherent on-device
> memory and the existing DEVICE_GENERIC.  That is call the
> code in free_devmap_managed_page that is currently only used
> for device private pages also for your new public device pages without
> affecting the devdax and xen use cases.

Yes, I would rather bring back DEVICE_PUBLIC than try to use
DEVICE_GENERIC; the GENERIC change was done for users that closely
matched DAX semantics, and that is not the case here, at least not from
my point of view.

Jerome


Re: [PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs

2021-08-20 Thread Michael Ellerman
Kees Cook  writes:
> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memset(), avoid intentionally writing across
> neighboring fields.
>
> Add a struct_group() for the spe registers so that memset() can correctly 
> reason
> about the size:
>
>In function 'fortify_memset_chk',
>inlined from 'restore_user_regs.part.0' at 
> arch/powerpc/kernel/signal_32.c:539:3:
>>> include/linux/fortify-string.h:195:4: error: call to 
>>> '__write_overflow_field' declared with attribute warning: detected write 
>>> beyond size of field (1st parameter); maybe use struct_group()? 
>>> [-Werror=attribute-warning]
>  195 |__write_overflow_field();
>  |^~~~
>
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Christophe Leroy 
> Cc: Sudeep Holla 
> Cc: linuxppc-...@lists.ozlabs.org
> Reported-by: kernel test robot 
> Signed-off-by: Kees Cook 
> ---
>  arch/powerpc/include/asm/processor.h | 6 --
>  arch/powerpc/kernel/signal_32.c  | 6 +++---
>  2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/processor.h 
> b/arch/powerpc/include/asm/processor.h
> index f348e564f7dd..05dc567cb9a8 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -191,8 +191,10 @@ struct thread_struct {
>   int used_vsr;   /* set if process has used VSX */
>  #endif /* CONFIG_VSX */
>  #ifdef CONFIG_SPE
> - unsigned long   evr[32];/* upper 32-bits of SPE regs */
> - u64 acc;/* Accumulator */
> + struct_group(spe,
> + unsigned long   evr[32];/* upper 32-bits of SPE regs */
> + u64 acc;/* Accumulator */
> + );
>   unsigned long   spefscr;/* SPE & eFP status */
>   unsigned long   spefscr_last;   /* SPEFSCR value on last prctl
>  call or trap return */
> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
> index 0608581967f0..77b86caf5c51 100644
> --- a/arch/powerpc/kernel/signal_32.c
> +++ b/arch/powerpc/kernel/signal_32.c
> @@ -532,11 +532,11 @@ static long restore_user_regs(struct pt_regs *regs,
>   regs_set_return_msr(regs, regs->msr & ~MSR_SPE);
>   if (msr & MSR_SPE) {
>   /* restore spe registers from the stack */
> - unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
> -   ELF_NEVRREG * sizeof(u32), failed);
> + unsafe_copy_from_user(&current->thread.spe, &sr->mc_vregs,
> +   sizeof(current->thread.spe), failed);

This makes me nervous, because the ABI is that we copy ELF_NEVRREG *
sizeof(u32) bytes, not whatever sizeof(current->thread.spe) happens to
be.

ie. if we use sizeof an inadvertent change to the fields in
thread_struct could change how many bytes we copy out to userspace,
which would be an ABI break.

And that's not that hard to do, because it's not at all obvious that the
size and layout of fields in thread_struct affects the user ABI.

At the same time we don't want to copy the right number of bytes but
the wrong content, so from that point of view using sizeof is good :)

The way we handle it in ptrace is to have BUILD_BUG_ON()s to verify that
things match up, so maybe we should do that here too.

ie. add:

BUILD_BUG_ON(sizeof(current->thread.spe) == ELF_NEVRREG * sizeof(u32));


Not sure if you are happy doing that as part of this patch. I can always
do it later if not.

cheers


Re: [PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs

2021-08-20 Thread Christophe Leroy




Le 20/08/2021 à 09:49, Michael Ellerman a écrit :

Kees Cook  writes:

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add a struct_group() for the spe registers so that memset() can correctly reason
about the size:

In function 'fortify_memset_chk',
inlined from 'restore_user_regs.part.0' at 
arch/powerpc/kernel/signal_32.c:539:3:

include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' 
declared with attribute warning: detected write beyond size of field (1st 
parameter); maybe use struct_group()? [-Werror=attribute-warning]

  195 |__write_overflow_field();
  |^~~~

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Christophe Leroy 
Cc: Sudeep Holla 
Cc: linuxppc-...@lists.ozlabs.org
Reported-by: kernel test robot 
Signed-off-by: Kees Cook 
---
  arch/powerpc/include/asm/processor.h | 6 --
  arch/powerpc/kernel/signal_32.c  | 6 +++---
  2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index f348e564f7dd..05dc567cb9a8 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -191,8 +191,10 @@ struct thread_struct {
int used_vsr;   /* set if process has used VSX */
  #endif /* CONFIG_VSX */
  #ifdef CONFIG_SPE
-   unsigned long   evr[32];/* upper 32-bits of SPE regs */
-   u64 acc;/* Accumulator */
+   struct_group(spe,
+   unsigned long   evr[32];/* upper 32-bits of SPE regs */
+   u64 acc;/* Accumulator */
+   );
unsigned long   spefscr;/* SPE & eFP status */
unsigned long   spefscr_last;   /* SPEFSCR value on last prctl
   call or trap return */
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 0608581967f0..77b86caf5c51 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -532,11 +532,11 @@ static long restore_user_regs(struct pt_regs *regs,
regs_set_return_msr(regs, regs->msr & ~MSR_SPE);
if (msr & MSR_SPE) {
/* restore spe registers from the stack */
-   unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
- ELF_NEVRREG * sizeof(u32), failed);
+   unsafe_copy_from_user(&current->thread.spe, &sr->mc_vregs,
+ sizeof(current->thread.spe), failed);


This makes me nervous, because the ABI is that we copy ELF_NEVRREG *
sizeof(u32) bytes, not whatever sizeof(current->thread.spe) happens to
be.

ie. if we use sizeof an inadvertent change to the fields in
thread_struct could change how many bytes we copy out to userspace,
which would be an ABI break.

And that's not that hard to do, because it's not at all obvious that the
size and layout of fields in thread_struct affects the user ABI.

At the same time we don't want to copy the right number of bytes but
the wrong content, so from that point of view using sizeof is good :)

The way we handle it in ptrace is to have BUILD_BUG_ON()s to verify that
things match up, so maybe we should do that here too.

ie. add:

BUILD_BUG_ON(sizeof(current->thread.spe) == ELF_NEVRREG * sizeof(u32));


You mean != I guess ?





Not sure if you are happy doing that as part of this patch. I can always
do it later if not.

cheers



[PATCH v3] drm/i915/dp: Use max params for panels < eDP 1.4

2021-08-20 Thread Kai-Heng Feng
Users reported that after commit 2bbd6dba84d4 ("drm/i915: Try to use
fast+narrow link on eDP again and fall back to the old max strategy on
failure"), the screen starts to show a wobbly effect.

Commit a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for
everything") doesn't help either, that means the affected eDP 1.2 panels
only work with max params.

So use max params for panels < eDP 1.4 as Windows does to solve the
issue.

v3:
 - Do the eDP rev check in intel_edp_init_dpcd()

v2:
 - Check eDP 1.4 instead of DPCD 1.1 to apply max params

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3714
Fixes: 2bbd6dba84d4 ("drm/i915: Try to use fast+narrow link on eDP again and 
fall back to the old max strategy on failure")
Fixes: a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for 
everything")
Suggested-by: Ville Syrjälä 
Signed-off-by: Kai-Heng Feng 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 75d4ebc669411..e0dbd35ae7bc0 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -2445,11 +2445,14 @@ intel_edp_init_dpcd(struct intel_dp *intel_dp)
 */
if (drm_dp_dpcd_read(&intel_dp->aux, DP_EDP_DPCD_REV,
 intel_dp->edp_dpcd, sizeof(intel_dp->edp_dpcd)) ==
-sizeof(intel_dp->edp_dpcd))
+sizeof(intel_dp->edp_dpcd)) {
drm_dbg_kms(&dev_priv->drm, "eDP DPCD: %*ph\n",
(int)sizeof(intel_dp->edp_dpcd),
intel_dp->edp_dpcd);
 
+   intel_dp->use_max_params = intel_dp->edp_dpcd[0] < DP_EDP_14;
+   }
+
/*
 * This has to be called after intel_dp->edp_dpcd is filled, PSR checks
 * for SET_POWER_CAPABLE bit in intel_dp->edp_dpcd[1]
-- 
2.32.0



[PATCH v3] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Jingwen Chen
[Why]
For a bailing job, this commit deletes it from the pending list, so the
bailing job never gets a chance to be resubmitted, even in advanced tdr
mode.

[How]
after embedding the hw_fence into amdgpu_job is done, the race condition
that this commit tried to work around is completely solved. So revert
this commit.
This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
v2:
add dma_fence_get/put() around timedout_job to avoid concurrent deletion
while processing timedout_job
v3:
park sched->thread instead during timedout_job.

Signed-off-by: Jingwen Chen 
---
 drivers/gpu/drm/scheduler/sched_main.c | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index a2a953693b45..c187fd3a6bb6 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+   kthread_park(sched->thread);
spin_lock(&sched->job_list_lock);
job = list_first_entry_or_null(&sched->pending_list,
   struct drm_sched_job, list);
 
if (job) {
-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
spin_unlock(&sched->job_list_lock);
 
status = job->sched->ops->timedout_job(job);
@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
} else {
spin_unlock(&sched->job_list_lock);
}
+   kthread_unpark(sched->thread);
 
if (status != DRM_GPU_SCHED_STAT_ENODEV) {
spin_lock(&sched->job_list_lock);
@@ -392,20 +388,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
struct drm_sched_job *bad)
 
kthread_park(sched->thread);
 
-   /*
-* Reinsert back the bad job here - now it's safe as
-* drm_sched_get_cleanup_job cannot race against us and release the
-* bad job at this point - we parked (waited for) any in progress
-* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-* now until the scheduler thread is unparked.
-*/
-   if (bad && bad->sched == sched)
-   /*
-* Add at the head of the queue to reflect it was the earliest
-* job extracted.
-*/
-   list_add(&bad->list, &sched->pending_list);
-
/*
 * Iterate the job list from later to  earlier one and either deactive
 * their HW callbacks or remove them from pending list if they already
-- 
2.25.1



Re: [PATCH v5, 00/15] Using component framework to support multi hardware decode

2021-08-20 Thread yunfei.d...@mediatek.com
Hi Ezequiel,

Thanks for your detailed feedback.

On Thu, 2021-08-19 at 11:10 -0300, Ezequiel Garcia wrote:
> On Thu, 19 Aug 2021 at 04:13, yunfei.d...@mediatek.com
>  wrote:
> >
> > Hi Ezequiel,
> >
> > Thanks for your suggestion.
> >
> > On Wed, 2021-08-18 at 11:11 -0300, Ezequiel Garcia wrote:
> > > +danvet
> > >
> > > Hi,
> > >
> > > On Tue, 10 Aug 2021 at 23:58, Yunfei Dong 
> > > wrote:
> > > >
> > > > This series adds support for multi hardware decode into mtk-vcodec,
> > > > by first
> > > > adding component framework to manage each hardware information:
> > > > interrupt,
> > > > clock, register bases and power. Secondly add core thread to deal
> > > > with core
> > > > hardware message, at the same time, add msg queue for different
> > > > hardware
> > > > share messages. Lastly, the architecture of different specs are not
> > > > the same,
> > > > using specs type to separate them.
> > > >
> > >
> > > I don't think it's a good idea to introduce the component API in the
> > > media subsystem. It doesn't seem to be maintained, IRC there's not
> > > even
> > > a maintainer for it, and it has some issues that were never
> > > addressed.
> > >
> > > It would be really important to avoid it. Is it really needed in the
> > > first place?
> > >
> > > Thanks,
> > > Ezequiel
> >
> > There are many hardware blocks to drive: mt8192 has three and mt8195
> > has five. More may be needed in the future.
> >
> > Each hardware block has an independent clk/power/iommu port/irq.
> > We use the component interface in probe to get each component's
> > information, and just enable a hardware block when it is needed,
> > which is very convenient and simple.
> >
> > I found that many modules use the component framework to manage
> > hardware information, such as iommu and drm.
> >
> 
> Many drivers support multiple hardware variants, where each variant
> has a different number of clocks or interrupts, see for instance
> struct hantro_variant which allows to expose different codec cores,
> some having both decoder/encoder, and some having just a decoder.
> 
> The component API is mostly used by DRM to aggregate independent
> subdevices (called components) into an aggregated driver.
> 
> For instance, a DRM driver needs to glue together the HDMI, MIPI,
> and plane controller, or any other hardware arrangement where
> devices can be described independently.
> 
The usage scenario is very similar to drm and iommu, so we decided to use
the component framework.
Decode has three/five or more hardware blocks, and these blocks are
independent. mt8183 only needs the core hardware to decode, but mt8192
has lat, soc and core hardware. When lat is needed, we just enable the
lat hardware; core works the same way. And mt8195 will have two cores,
each of which can work independently.

Each component device is just used to open its power/clk/iommu port/irq
when the master needs to enable it. The main logic is in the master
device.

> The component API may look simple but has some issues, it's not easy
> to debug, and can cause troubles if not used as expected [1].
> It's worth making sure you actually need a framework
> to glue different devices together.
> 
Each hardware block has its index, and the master can get hardware
information according to that index, which does not look complex. What do
you mean by not easy to debug?

> > Do you have any other suggestion for this architecture?
> >
> 
> Looking at the different patchsets that are posted, it's not clear
> to me what exactly are the different architectures that you intend
> to support, can you some documentation which clarifies that?
> 
There are five hardware blocks: lat, soc, core0, core1 and main. The lat
thread can use lat, soc and main; the core thread can use soc, lat,
core0 and core1. The core thread may or may not be used depending on the
project. We also need to use these blocks dynamically at the same time.
So I use the component framework: we just need to know the hardware
indexes used by each project, and need no complex logic to manage the
blocks.


> Thanks,
> Ezequiel
> 
> [1] 
> https://patchwork.kernel.org/project/linux-rockchip/cover/20200120170602.3832-1-ezequ...@collabora.com/


Thanks,
Yunfei Dong






Re: [PATCH v8 12/13] drm/mediatek: add MERGE support for mediatek-drm

2021-08-20 Thread CK Hu
Hi, Jason:

On Thu, 2021-08-19 at 10:23 +0800, jason-jh.lin wrote:
> Add MERGE engine file:
> MERGE module is used to merge two slice-per-line inputs
> into one side-by-side output.
> 
> Signed-off-by: jason-jh.lin 
> ---

[snip]

> +
> +int mtk_merge_clk_enable(struct device *dev)
> +{
> + int ret = 0;
> + struct mtk_disp_merge *priv = dev_get_drvdata(dev);
> +
> + ret = clk_prepare_enable(priv->clk);
> + if (ret)
> + pr_err("merge clk prepare enable failed\n");
> +
> + if (priv->async_clk) {

This checking is redundant.

> + ret = clk_prepare_enable(priv->async_clk);
> + if (ret)
> + pr_err("async clk prepare enable failed\n");
> + }
> +
> + return ret;
> +}
> +
> +void mtk_merge_clk_disable(struct device *dev)
> +{
> + struct mtk_disp_merge *priv = dev_get_drvdata(dev);
> +
> + if (priv->async_clk)

Ditto.

Regards,
CK

> + clk_disable_unprepare(priv->async_clk);
> +
> + clk_disable_unprepare(priv->clk);
> +}
> +




[PATCH v4 0/5] drm: update locking for modesetting

2021-08-20 Thread Desmond Cheong Zhi Xi
Hi,

Thanks for all the helpful feedback on the previous version.

Taking all the suggestions together, this series now converts
drm_device.master_mutex into master_rwsem, and also attempts to remove
drm_file.master_lookup_lock. There might still be lock inversions
lurking, so the output from Intel-gfx CI should be interesting to see.

Overall, this series makes the following changes:

- Patch 1: Fix a potential null ptr dereference in drm_master_release

- Patch 2: Convert master_mutex into rwsem (avoids creating a new lock)

- Patch 3: Update global mutex locking in the ioctl handler (avoids
deadlock when grabbing read lock on master_rwsem in drm_ioctl_kernel)

- Patch 4: Plug races with drm modesetting rights

- Patch 5: Replace master_lookup_lock with master_rwsem by untangling
remaining lock hierarchy inversions

v3 -> v4 (suggested by Daniel Vetter):
- Drop a patch that added an unnecessary master_lookup_lock in
drm_master_release (previously patch 2)
- Drop a patch that addressed a non-existent race in
drm_is_current_master_locked (previously patch 3)
- Remove fixes for non-existent null ptr dereferences (previous patch 4)
- Protect drm_master.magic_map,unique{_len} with master_rwsem instead of
master_lookup_lock (dropped previous patch 5)
- Drop the patch that moved master_lookup_lock into struct drm_device
(previously patch 1)
- Drop a patch to export task_work_add (previously patch 8)
- Revert the check for the global mutex in the ioctl handler to use
drm_core_check_feature instead of drm_dev_needs_global_mutex
- Push down master_rwsem locking for selected ioctls to avoid lock
hierarchy inversions, and to allow us to hold write locks on
master_rwsem instead of flushing readers
- Remove master_lookup_lock by replacing it with master_rwsem

v2 -> v3:
- Unexport drm_master_flush, as suggested by Daniel Vetter.
- Merge master_mutex and master_rwsem, as suggested by Daniel Vetter.
- Export task_work_add, reported by kernel test robot.
- Make master_flush static, reported by kernel test robot.
- Move master_lookup_lock into struct drm_device.
- Add a missing lock on master_lookup_lock in drm_master_release.
- Fix a potential race in drm_is_current_master_locked.
- Fix potential null ptr dereferences in drm_{auth, ioctl}.
- Protect magic_map,unique{_len} with  master_lookup_lock.
- Convert master_mutex into a rwsem.
- Update global mutex locking in the ioctl handler.

v1 -> v2 (suggested by Daniel Vetter):
- Address an additional race when drm_open runs.
- Switch from SRCU to rwsem to synchronise readers and writers.
- Implement drm_master_flush with task_work so that flushes can be
queued to run before returning to userspace without creating a new
DRM_MASTER_FLUSH ioctl flag.

Best wishes,
Desmond

Desmond Cheong Zhi Xi (5):
  drm: fix null ptr dereference in drm_master_release
  drm: convert drm_device.master_mutex into a rwsem
  drm: lock drm_global_mutex earlier in the ioctl handler
  drm: avoid races with modesetting rights
  drm: remove drm_file.master_lookup_lock

 drivers/gpu/drm/drm_auth.c| 54 
 drivers/gpu/drm/drm_debugfs.c |  4 +-
 drivers/gpu/drm/drm_drv.c |  3 +-
 drivers/gpu/drm/drm_file.c|  7 ++--
 drivers/gpu/drm/drm_internal.h|  1 +
 drivers/gpu/drm/drm_ioctl.c   | 48 -
 drivers/gpu/drm/drm_lease.c   | 69 ++-
 drivers/gpu/drm/drm_mode_object.c | 14 +--
 include/drm/drm_auth.h|  6 +--
 include/drm/drm_device.h  | 15 +--
 include/drm/drm_file.h| 17 +++-
 include/drm/drm_lease.h   |  2 +-
 12 files changed, 125 insertions(+), 115 deletions(-)

-- 
2.25.1



[PATCH v4 1/5] drm: fix null ptr dereference in drm_master_release

2021-08-20 Thread Desmond Cheong Zhi Xi
drm_master_release can be called on a drm_file without a master, which
results in a null ptr dereference of file_priv->master->magic_map. The
three cases are:

1. Error path in drm_open_helper
  drm_open():
drm_open_helper():
  drm_master_open():
drm_new_set_master(); <--- returns -ENOMEM,
   drm_file.master not set
  drm_file_free():
drm_master_release(); <--- NULL ptr dereference
   (file_priv->master->magic_map)

2. Error path in mock_drm_getfile
  mock_drm_getfile():
anon_inode_getfile(); <--- returns error, drm_file.master not set
drm_file_free():
  drm_master_release(); <--- NULL ptr dereference
 (file_priv->master->magic_map)

3. In drm_client_close, as drm_client_open doesn't set up a master

drm_file.master is set up in drm_open_helper through the call to
drm_master_open, so we mirror it with a call to drm_master_release in
drm_close_helper, and remove drm_master_release from drm_file_free to
avoid the null ptr dereference.

Signed-off-by: Desmond Cheong Zhi Xi 
---
 drivers/gpu/drm/drm_file.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index ed25168619fc..90b62f360da1 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -282,9 +282,6 @@ void drm_file_free(struct drm_file *file)
 
drm_legacy_ctxbitmap_flush(dev, file);
 
-   if (drm_is_primary_client(file))
-   drm_master_release(file);
-
if (dev->driver->postclose)
dev->driver->postclose(dev, file);
 
@@ -305,6 +302,9 @@ static void drm_close_helper(struct file *filp)
list_del(&file_priv->lhead);
mutex_unlock(&dev->filelist_mutex);
 
+   if (drm_is_primary_client(file_priv))
+   drm_master_release(file_priv);
+
drm_file_free(file_priv);
 }
 
-- 
2.25.1



[PATCH v4 2/5] drm: convert drm_device.master_mutex into a rwsem

2021-08-20 Thread Desmond Cheong Zhi Xi
drm_device.master_mutex currently protects the following:
- drm_device.master
- drm_file.master
- drm_file.was_master
- drm_file.is_master
- drm_master.unique
- drm_master.unique_len
- drm_master.magic_map

There is a clear separation between functions that read or change
these attributes. Hence, convert master_mutex into a rwsem to enable
concurrent readers.

Signed-off-by: Desmond Cheong Zhi Xi 
---
 drivers/gpu/drm/drm_auth.c| 35 ++-
 drivers/gpu/drm/drm_debugfs.c |  4 ++--
 drivers/gpu/drm/drm_drv.c |  3 +--
 drivers/gpu/drm/drm_ioctl.c   | 10 +-
 include/drm/drm_auth.h|  6 +++---
 include/drm/drm_device.h  | 10 ++
 include/drm/drm_file.h| 12 ++--
 7 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 60a6b21474b1..73ade0513ccb 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -64,7 +64,7 @@
 static bool drm_is_current_master_locked(struct drm_file *fpriv)
 {
lockdep_assert_once(lockdep_is_held(&fpriv->master_lookup_lock) ||
-   lockdep_is_held(&fpriv->minor->dev->master_mutex));
+   lockdep_is_held(&fpriv->minor->dev->master_rwsem));
 
return fpriv->is_master && drm_lease_owner(fpriv->master) == 
fpriv->minor->dev->master;
 }
@@ -96,7 +96,7 @@ int drm_getmagic(struct drm_device *dev, void *data, struct 
drm_file *file_priv)
struct drm_auth *auth = data;
int ret = 0;
 
-   mutex_lock(&dev->master_mutex);
+   down_write(&dev->master_rwsem);
if (!file_priv->magic) {
ret = idr_alloc(&file_priv->master->magic_map, file_priv,
1, 0, GFP_KERNEL);
@@ -104,7 +104,7 @@ int drm_getmagic(struct drm_device *dev, void *data, struct 
drm_file *file_priv)
file_priv->magic = ret;
}
auth->magic = file_priv->magic;
-   mutex_unlock(&dev->master_mutex);
+   up_write(&dev->master_rwsem);
 
DRM_DEBUG("%u\n", auth->magic);
 
@@ -119,13 +119,13 @@ int drm_authmagic(struct drm_device *dev, void *data,
 
DRM_DEBUG("%u\n", auth->magic);
 
-   mutex_lock(&dev->master_mutex);
+   down_write(&dev->master_rwsem);
file = idr_find(&file_priv->master->magic_map, auth->magic);
if (file) {
file->authenticated = 1;
idr_replace(&file_priv->master->magic_map, NULL, auth->magic);
}
-   mutex_unlock(&dev->master_mutex);
+   up_write(&dev->master_rwsem);
 
return file ? 0 : -EINVAL;
 }
@@ -167,7 +167,7 @@ static int drm_new_set_master(struct drm_device *dev, 
struct drm_file *fpriv)
struct drm_master *old_master;
struct drm_master *new_master;
 
-   lockdep_assert_held_once(&dev->master_mutex);
+   lockdep_assert_held_once(&dev->master_rwsem);
 
WARN_ON(fpriv->is_master);
old_master = fpriv->master;
@@ -249,7 +249,7 @@ int drm_setmaster_ioctl(struct drm_device *dev, void *data,
 {
int ret;
 
-   mutex_lock(&dev->master_mutex);
+   down_write(&dev->master_rwsem);
 
ret = drm_master_check_perm(dev, file_priv);
if (ret)
@@ -281,7 +281,7 @@ int drm_setmaster_ioctl(struct drm_device *dev, void *data,
 
drm_set_master(dev, file_priv, false);
 out_unlock:
-   mutex_unlock(&dev->master_mutex);
+   up_write(&dev->master_rwsem);
return ret;
 }
 
@@ -298,7 +298,7 @@ int drm_dropmaster_ioctl(struct drm_device *dev, void *data,
 {
int ret;
 
-   mutex_lock(&dev->master_mutex);
+   down_write(&dev->master_rwsem);
 
ret = drm_master_check_perm(dev, file_priv);
if (ret)
@@ -321,8 +321,9 @@ int drm_dropmaster_ioctl(struct drm_device *dev, void *data,
}
 
drm_drop_master(dev, file_priv);
+
 out_unlock:
-   mutex_unlock(&dev->master_mutex);
+   up_write(&dev->master_rwsem);
return ret;
 }
 
@@ -334,7 +335,7 @@ int drm_master_open(struct drm_file *file_priv)
/* if there is no current master make this fd it, but do not create
 * any master object for render clients
 */
-   mutex_lock(&dev->master_mutex);
+   down_write(&dev->master_rwsem);
if (!dev->master) {
ret = drm_new_set_master(dev, file_priv);
} else {
@@ -342,7 +343,7 @@ int drm_master_open(struct drm_file *file_priv)
file_priv->master = drm_master_get(dev->master);
spin_unlock(&file_priv->master_lookup_lock);
}
-   mutex_unlock(&dev->master_mutex);
+   up_write(&dev->master_rwsem);
 
return ret;
 }
@@ -352,7 +353,7 @@ void drm_master_release(struct drm_file *file_priv)
struct drm_device *dev = file_priv->minor->dev;
struct drm_master *master;
 
-   mutex_lock(&dev->master_mutex);
+   down_write(&dev->master_rwsem);
master = file_priv->ma

[PATCH v4 3/5] drm: lock drm_global_mutex earlier in the ioctl handler

2021-08-20 Thread Desmond Cheong Zhi Xi
In a future patch, a read lock on drm_device.master_rwsem is
held in the ioctl handler before the check for ioctl
permissions. However, this inverts the lock hierarchy of
drm_global_mutex --> master_rwsem.

To avoid this, we do some prep work to grab the drm_global_mutex
before checking for ioctl permissions.

Signed-off-by: Desmond Cheong Zhi Xi 
---
 drivers/gpu/drm/drm_ioctl.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index d25713b09b80..158629d88319 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -772,19 +772,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t 
*func, void *kdata,
if (drm_dev_is_unplugged(dev))
return -ENODEV;
 
+   /* Enforce sane locking for modern driver ioctls. */
+   if (unlikely(drm_core_check_feature(dev, DRIVER_LEGACY)) && !(flags & 
DRM_UNLOCKED))
+   mutex_lock(&drm_global_mutex);
+
retcode = drm_ioctl_permit(flags, file_priv);
if (unlikely(retcode))
-   return retcode;
+   goto out;
 
-   /* Enforce sane locking for modern driver ioctls. */
-   if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) ||
-   (flags & DRM_UNLOCKED))
-   retcode = func(dev, kdata, file_priv);
-   else {
-   mutex_lock(&drm_global_mutex);
-   retcode = func(dev, kdata, file_priv);
+   retcode = func(dev, kdata, file_priv);
+
+out:
+   if (unlikely(drm_core_check_feature(dev, DRIVER_LEGACY)) && !(flags & 
DRM_UNLOCKED))
mutex_unlock(&drm_global_mutex);
-   }
return retcode;
 }
 EXPORT_SYMBOL(drm_ioctl_kernel);
-- 
2.25.1



[PATCH v4 4/5] drm: avoid races with modesetting rights

2021-08-20 Thread Desmond Cheong Zhi Xi
In drm_client_modeset.c and drm_fb_helper.c,
drm_master_internal_{acquire,release} are used to avoid races with DRM
userspace. These functions hold onto drm_device.master_rwsem while
committing, and bail if there's already a master.

However, there are other places where modesetting rights can race. A
time-of-check-to-time-of-use error can occur if an ioctl that changes
the modeset has its rights revoked after it validates its permissions,
but before it completes.

There are four places where modesetting permissions can change:

- DROP_MASTER ioctl removes rights for a master and its leases

- REVOKE_LEASE ioctl revokes rights for a specific lease

- SET_MASTER ioctl sets the device master if the master role hasn't
been acquired yet

- drm_open which can create a new master for a device if one does not
currently exist

These races can be avoided using drm_device.master_rwsem: users that
perform modesetting should hold a read lock on the new
drm_device.master_rwsem, and users that change these permissions
should hold a write lock.

To avoid deadlocks with master_rwsem, for ioctls that need to check
for modesetting permissions, but also need to hold a write lock on
master_rwsem to protect some other attribute (or recurses to some
function that holds a write lock, like drm_mode_create_lease_ioctl
which eventually calls drm_master_open), we remove the DRM_MASTER flag
and push the master_rwsem lock and permissions check into the ioctl.

Reported-by: Daniel Vetter 
Signed-off-by: Desmond Cheong Zhi Xi 
---
 drivers/gpu/drm/drm_auth.c  |  4 
 drivers/gpu/drm/drm_ioctl.c | 20 +++-
 drivers/gpu/drm/drm_lease.c | 35 ---
 include/drm/drm_device.h|  5 +
 4 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 73ade0513ccb..65065f7e1499 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -120,6 +120,10 @@ int drm_authmagic(struct drm_device *dev, void *data,
DRM_DEBUG("%u\n", auth->magic);
 
down_write(&dev->master_rwsem);
+   if (unlikely(!drm_is_current_master(file_priv))) {
+   up_write(&dev->master_rwsem);
+   return -EACCES;
+   }
file = idr_find(&file_priv->master->magic_map, auth->magic);
if (file) {
file->authenticated = 1;
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 158629d88319..8bea39ffc5c0 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -386,6 +386,10 @@ static int drm_setversion(struct drm_device *dev, void 
*data, struct drm_file *f
int if_version, retcode = 0;
 
down_write(&dev->master_rwsem);
+   if (unlikely(!drm_is_current_master(file_priv))) {
+   retcode = -EACCES;
+   goto unlock;
+   }
if (sv->drm_di_major != -1) {
if (sv->drm_di_major != DRM_IF_MAJOR ||
sv->drm_di_minor < 0 || sv->drm_di_minor > DRM_IF_MINOR) {
@@ -420,8 +424,9 @@ static int drm_setversion(struct drm_device *dev, void 
*data, struct drm_file *f
sv->drm_di_minor = DRM_IF_MINOR;
sv->drm_dd_major = dev->driver->major;
sv->drm_dd_minor = dev->driver->minor;
-   up_write(&dev->master_rwsem);
 
+unlock:
+   up_write(&dev->master_rwsem);
return retcode;
 }
 
@@ -574,12 +579,12 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SET_CLIENT_CAP, drm_setclientcap, 0),
-   DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER),
+   DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, 0),
 
DRM_IOCTL_DEF(DRM_IOCTL_SET_UNIQUE, drm_invalid_op, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
DRM_IOCTL_DEF(DRM_IOCTL_BLOCK, drm_noop, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
DRM_IOCTL_DEF(DRM_IOCTL_UNBLOCK, drm_noop, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
-   DRM_IOCTL_DEF(DRM_IOCTL_AUTH_MAGIC, drm_authmagic, DRM_MASTER),
+   DRM_IOCTL_DEF(DRM_IOCTL_AUTH_MAGIC, drm_authmagic, 0),
 
DRM_LEGACY_IOCTL_DEF(DRM_IOCTL_ADD_MAP, drm_legacy_addmap_ioctl, 
DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
DRM_LEGACY_IOCTL_DEF(DRM_IOCTL_RM_MAP, drm_legacy_rmmap_ioctl, 
DRM_AUTH),
@@ -706,10 +711,10 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 
0),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, 0),
-   DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 
DRM_MASTER),
+   DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 
0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_LIST_LESSEES, drm_mode_list_lessees_ioctl, 
DRM_MASTER),
  

[PATCH v4 5/5] drm: remove drm_file.master_lookup_lock

2021-08-20 Thread Desmond Cheong Zhi Xi
Previously, master_lookup_lock was introduced in
commit 0b0860a3cf5e ("drm: serialize drm_file.master with a new
spinlock") to serialize accesses to drm_file.master. This then allowed
us to write drm_file_get_master in commit 56f0729a510f ("drm: protect
drm_master pointers in drm_lease.c").

The rationale behind introducing a new spinlock at the time was that
the other lock that could have been used (drm_device.master_mutex) was
the outermost lock, so embedding calls to drm_file_get_master in
various functions easily caused us to invert the lock hierarchy. For
example, we would invert the master_mutex --> mode_config.idr_mutex
hierarchy.

Following the conversion of master_mutex into a rwsem, and its use to
plug races with modesetting rights, we've untangled some lock
hierarchies and removed the need for using drm_file_get_master and the
unlocked version of drm_is_current_master in multiple places.

Hence, we can take this opportunity to clean up the locking design. To
do so, we rewrite drm_file_get_master and drm_is_current_master to use
master_rwsem, clean up their uses, fix remaining inversions (such as
with _drm_lease_held master grabbing master_rwsem while holding onto
mode_config.idr_mutex), then remove master_lookup_lock.

Signed-off-by: Desmond Cheong Zhi Xi 
---
 drivers/gpu/drm/drm_auth.c| 19 ++--
 drivers/gpu/drm/drm_file.c|  1 -
 drivers/gpu/drm/drm_internal.h|  1 +
 drivers/gpu/drm/drm_ioctl.c   |  4 ++--
 drivers/gpu/drm/drm_lease.c   | 38 ---
 drivers/gpu/drm/drm_mode_object.c | 14 +---
 include/drm/drm_file.h|  9 +---
 include/drm/drm_lease.h   |  2 +-
 8 files changed, 32 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
index 65065f7e1499..ff720f67eacb 100644
--- a/drivers/gpu/drm/drm_auth.c
+++ b/drivers/gpu/drm/drm_auth.c
@@ -61,10 +61,9 @@
  * trusted clients.
  */
 
-static bool drm_is_current_master_locked(struct drm_file *fpriv)
+bool drm_is_current_master_locked(struct drm_file *fpriv)
 {
-   lockdep_assert_once(lockdep_is_held(&fpriv->master_lookup_lock) ||
-   lockdep_is_held(&fpriv->minor->dev->master_rwsem));
+   lockdep_assert_held_once(&fpriv->minor->dev->master_rwsem);
 
return fpriv->is_master && drm_lease_owner(fpriv->master) == 
fpriv->minor->dev->master;
 }
@@ -83,9 +82,9 @@ bool drm_is_current_master(struct drm_file *fpriv)
 {
bool ret;
 
-   spin_lock(&fpriv->master_lookup_lock);
+   down_read(&fpriv->minor->dev->master_rwsem);
ret = drm_is_current_master_locked(fpriv);
-   spin_unlock(&fpriv->master_lookup_lock);
+   up_read(&fpriv->minor->dev->master_rwsem);
 
return ret;
 }
@@ -120,7 +119,7 @@ int drm_authmagic(struct drm_device *dev, void *data,
DRM_DEBUG("%u\n", auth->magic);
 
down_write(&dev->master_rwsem);
-   if (unlikely(!drm_is_current_master(file_priv))) {
+   if (unlikely(!drm_is_current_master_locked(file_priv))) {
up_write(&dev->master_rwsem);
return -EACCES;
}
@@ -178,9 +177,7 @@ static int drm_new_set_master(struct drm_device *dev, 
struct drm_file *fpriv)
new_master = drm_master_create(dev);
if (!new_master)
return -ENOMEM;
-   spin_lock(&fpriv->master_lookup_lock);
fpriv->master = new_master;
-   spin_unlock(&fpriv->master_lookup_lock);
 
fpriv->is_master = 1;
fpriv->authenticated = 1;
@@ -343,9 +340,7 @@ int drm_master_open(struct drm_file *file_priv)
if (!dev->master) {
ret = drm_new_set_master(dev, file_priv);
} else {
-   spin_lock(&file_priv->master_lookup_lock);
file_priv->master = drm_master_get(dev->master);
-   spin_unlock(&file_priv->master_lookup_lock);
}
up_write(&dev->master_rwsem);
 
@@ -410,13 +405,13 @@ struct drm_master *drm_file_get_master(struct drm_file 
*file_priv)
 {
struct drm_master *master = NULL;
 
-   spin_lock(&file_priv->master_lookup_lock);
+   down_read(&file_priv->minor->dev->master_rwsem);
if (!file_priv->master)
goto unlock;
master = drm_master_get(file_priv->master);
 
 unlock:
-   spin_unlock(&file_priv->master_lookup_lock);
+   up_read(&file_priv->minor->dev->master_rwsem);
return master;
 }
 EXPORT_SYMBOL(drm_file_get_master);
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 90b62f360da1..8c846e0179d7 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -176,7 +176,6 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
init_waitqueue_head(&file->event_wait);
file->event_space = 4096; /* set aside 4k for event buffer */
 
-   spin_lock_init(&file->master_lookup_lock);
mutex_init(&file->event_read_lock);
 
if (drm_core_chec

[PATCH] drm/rockchip: dsi: Fix duplicate included linux/phy/phy.h

2021-08-20 Thread Jiapeng Chong
Clean up the following includecheck warning:

./drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c: linux/phy/phy.h is
included more than once.

Reported-by: Abaci Robot 
Fixes: 71f68fe7f121 ("drm/rockchip: dsi: add ability to work as a phy instead 
of full dsi")
Signed-off-by: Jiapeng Chong 
---
 drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index a2262be..4a6c100 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include 
-- 
1.8.3.1



Re: [PATCH v3 12/15] drm/mediatek: add display MDP RDMA support for MT8195

2021-08-20 Thread CK Hu

Hi, Nancy:

On Wed, 2021-08-18 at 17:18 +0800, Nancy.Lin wrote:
> Add MDP_RDMA driver for MT8195. MDP_RDMA is the DMA engine of
> the ovl_adaptor component.
> 
> Signed-off-by: Nancy.Lin 
> ---

[snip]

> +
> +#define MDP_RDMA_EN0x000
> +#define FLD_ROT_ENABLEREG_FLD(1, 0)

I would like the bitwise definitions to have one more indent than the byte
definitions.

> +
> +#define MDP_RDMA_RESET 0x008
> +
> +#define MDP_RDMA_CON   0x020
> +#define FLD_OUTPUT_10BREG_FLD(1, 5)
> +#define FLD_SIMPLE_MODE   REG_FLD(1, 4)
> +
> +#define MDP_RDMA_GMCIF_CON 0x028
> +#define FLD_EXT_ULTRA_EN  REG_FLD(1, 18)
> +#define FLD_PRE_ULTRA_EN  REG_FLD(2, 16)
> +#define FLD_ULTRA_EN  REG_FLD(2, 12)
> +#define FLD_RD_REQ_TYPE   REG_FLD(4, 4)
> +#define VAL_RD_REQ_TYPE_BURST_8_ACCESS   7
> +#define FLD_EXT_PREULTRA_EN   REG_FLD(1, 3)
> +#define FLD_COMMAND_DIV   REG_FLD(1, 0)
> +
> +#define MDP_RDMA_SRC_CON   0x030
> +#define FLD_OUTPUT_ARGB   REG_FLD(1, 25)
> +#define FLD_BIT_NUMBERREG_FLD(2, 18)
> +#define FLD_UNIFORM_CONFIGREG_FLD(1, 17)
> +#define FLD_SWAP  REG_FLD(1, 14)
> +#define FLD_SRC_FORMATREG_FLD(4, 0)
> +
> +#define MDP_RDMA_COMP_CON  0x038
> +#define FLD_AFBC_EN   REG_FLD(1, 22)
> +#define FLD_AFBC_YUV_TRANSFORMREG_FLD(1, 21)
> +#define FLD_UFBDC_EN  REG_FLD(1, 12)
> +
> +#define MDP_RDMA_MF_BKGD_SIZE_IN_BYTE  0x060
> +#define FLD_MF_BKGD_WBREG_FLD(23, 0)
> +
> +#define MDP_RDMA_MF_BKGD_SIZE_IN_PIXEL 0x068
> +#define FLD_MF_BKGD_WPREG_FLD(23, 0)
> +
> +#define MDP_RDMA_MF_SRC_SIZE   0x070
> +#define FLD_MF_SRC_H  REG_FLD(15, 16)
> +#define FLD_MF_SRC_W  REG_FLD(15, 0)
> +
> +#define MDP_RDMA_MF_CLIP_SIZE  0x078
> +#define FLD_MF_CLIP_H REG_FLD(15, 16)
> +#define FLD_MF_CLIP_W REG_FLD(15, 0)
> +
> +#define MDP_RDMA_TARGET_LINE   0x0a0
> +#define FLD_LINE_THRESHOLDREG_FLD(15, 17)
> +#define FLD_TARGET_LINE_ENREG_FLD(1, 16)
> +
> +#define MDP_RDMA_SRC_OFFSET_0  0x118
> +#define FLD_SRC_OFFSET_0  REG_FLD(32, 0)
> +
> +#define MDP_RDMA_TRANSFORM_0   0x200
> +#define FLD_INT_MATRIX_SELREG_FLD(5, 23)
> +#define FLD_TRANS_EN  REG_FLD(1, 16)
> +
> +#define MDP_RDMA_UTRA_H_CON_0  0x248
> +#define FLD_PREUTRA_H_OFS_0   REG_FLD(10, 10)
> +
> +#define MDP_RDMA_UTRA_L_CON_0  0x250
> +#define FLD_PREUTRA_L_OFS_0   REG_FLD(10, 10)
> +
> +#define MDP_RDMA_SRC_BASE_00xf00
> +#define FLD_SRC_BASE_0REG_FLD(32, 0)
> +
> +#define RDMA_INPUT_SWAP  BIT(14)
> +#define RDMA_INPUT_10BIT BIT(18)
> +

[snip]

> +
> +static void mtk_mdp_rdma_fifo_config(void __iomem *base, struct cmdq_pkt 
> *cmdq_pkt,
> +  struct cmdq_client_reg *cmdq_base)
> +{
> + unsigned int pre_ultra_h = 156;
> + unsigned int pre_ultra_l = 104;

Please give the reason for these values. You could refer to [1].

[1]
https://patchwork.kernel.org/project/linux-mediatek/patch/20210819022327.13040-13-jason-jh@mediatek.com/

> + unsigned int reg_mask;
> + unsigned int reg_val;
> + unsigned int reg;
> +
> + reg = MDP_RDMA_GMCIF_CON;
> + reg_val = REG_FLD_VAL(FLD_RD_REQ_TYPE, VAL_RD_REQ_TYPE_BURST_8_ACCESS) |
> +   REG_FLD_VAL(FLD_COMMAND_DIV, 1) |
> +   REG_FLD_VAL(FLD_EXT_PREULTRA_EN, 1) |
> +   REG_FLD_VAL(FLD_ULTRA_EN, 0) |
> +   REG_FLD_VAL(FLD_PRE_ULTRA_EN, 1) |
> +   REG_FLD_VAL(FLD_EXT_ULTRA_EN, 1);
> + reg_mask = REG_FLD_MASK(FLD_RD_REQ_TYPE) |
> +REG_FLD_MASK(FLD_COMMAND_DIV) |
> +REG_FLD_MASK(FLD_EXT_PREULTRA_EN) |
> +REG_FLD_MASK(FLD_ULTRA_EN) |
> +REG_FLD_MASK(FLD_PRE_ULTRA_EN) |
> +REG_FLD_MASK(FLD_EXT_ULTRA_EN);
> + mtk_ddp_write_mask(cm

[v1 1/2] drm/panel: boe-tv101wum-nl6: Support enabling a 3.3V rail

2021-08-20 Thread yangcong
The auo,b101uan08.3 panel (already supported by this driver) has
a 3.3V rail that needs to be turned on. For previous users of
this panel, this voltage was output directly by the PMIC. On a
new user (the not-yet-upstream sc7180-trogdor-mrbland board) the
driver needs to turn the 3.3V rail on itself. Add support for
this.

Signed-off-by: yangcong 
---
 drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c 
b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
index db9d0b86d542..9a644433629e 100644
--- a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
+++ b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
@@ -45,6 +45,7 @@ struct boe_panel {
const struct panel_desc *desc;
 
enum drm_panel_orientation orientation;
+   struct regulator *pp3300;
struct regulator *pp1800;
struct regulator *avee;
struct regulator *avdd;
@@ -511,6 +512,7 @@ static int boe_panel_unprepare(struct drm_panel *panel)
gpiod_set_value(boe->enable_gpio, 0);
usleep_range(5000, 7000);
regulator_disable(boe->pp1800);
+   regulator_disable(boe->pp3300);
} else {
gpiod_set_value(boe->enable_gpio, 0);
usleep_range(500, 1000);
@@ -518,6 +520,7 @@ static int boe_panel_unprepare(struct drm_panel *panel)
regulator_disable(boe->avdd);
usleep_range(5000, 7000);
regulator_disable(boe->pp1800);
+   regulator_disable(boe->pp3300);
}
 
boe->prepared = false;
@@ -536,6 +539,10 @@ static int boe_panel_prepare(struct drm_panel *panel)
gpiod_set_value(boe->enable_gpio, 0);
usleep_range(1000, 1500);
 
+   ret = regulator_enable(boe->pp3300);
+   if (ret < 0)
+   return ret;
+
ret = regulator_enable(boe->pp1800);
if (ret < 0)
return ret;
@@ -767,6 +774,10 @@ static int boe_panel_add(struct boe_panel *boe)
if (IS_ERR(boe->avee))
return PTR_ERR(boe->avee);
 
+   boe->pp3300 = devm_regulator_get(dev, "pp3300");
+   if (IS_ERR(boe->pp3300))
+   return PTR_ERR(boe->pp3300);
+
boe->pp1800 = devm_regulator_get(dev, "pp1800");
if (IS_ERR(boe->pp1800))
return PTR_ERR(boe->pp1800);
-- 
2.25.1



[PATCH] This is only used for testing upstream progress

2021-08-20 Thread xuxinxiong
We want to test the upstream progress.

Signed-off-by: xuxinxiong 
---
 drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c 
b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
index dc49079a74d1..ce7437d1b682 100644
--- a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
+++ b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
@@ -4,6 +4,8 @@
  * Author: Jitao Shi 
  */
 
+/* This is used for test upstream progress */
+
 #include 
 #include 
 #include 
-- 
2.25.1



RE: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device

2021-08-20 Thread Lin, Wayne
[Public]

> -Original Message-
> From: Lyude Paul 
> Sent: Thursday, August 19, 2021 2:59 AM
> To: Lin, Wayne ; dri-devel@lists.freedesktop.org
> Cc: Kazlauskas, Nicholas ; Wentland, Harry 
> ; Zuo, Jerry
> ; Wu, Hersen ; Juston Li 
> ; Imre Deak ;
> Ville Syrjälä ; Daniel Vetter 
> ; Sean Paul ; Maarten Lankhorst
> ; Maxime Ripard ; 
> Thomas Zimmermann ;
> David Airlie ; Daniel Vetter ; Deucher, 
> Alexander ; Siqueira,
> Rodrigo ; Pillai, Aurabindo 
> ; Eryk Brol ; Bas
> Nieuwenhuizen ; Cornij, Nikola 
> ; Jani Nikula ; Manasi
> Navare ; Ankit Nautiyal 
> ; José Roberto de Souza ;
> Sean Paul ; Ben Skeggs ; 
> sta...@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end 
> device
>
> On Wed, 2021-08-11 at 09:49 +, Lin, Wayne wrote:
> > [Public]
> >
> > > -Original Message-
> > > From: Lyude Paul 
> > > Sent: Wednesday, August 11, 2021 4:45 AM
> > > To: Lin, Wayne ; dri-devel@lists.freedesktop.org
> > > Cc: Kazlauskas, Nicholas ; Wentland,
> > > Harry < harry.wentl...@amd.com>; Zuo, Jerry ; Wu,
> > > Hersen ; Juston Li < juston...@intel.com>; Imre
> > > Deak ; Ville Syrjälä
> > > ; Daniel Vetter <
> > > daniel.vet...@ffwll.ch>; Sean Paul ; Maarten
> > > Lankhorst ; Maxime Ripard
> > > ; Thomas Zimmermann ; David
> > > Airlie ; Daniel Vetter ; Deucher,
> > > Alexander ; Siqueira, Rodrigo
> > > ; Pillai, Aurabindo <
> > > aurabindo.pil...@amd.com>; Eryk Brol ; Bas
> > > Nieuwenhuizen ; Cornij, Nikola <
> > > nikola.cor...@amd.com>; Jani Nikula ; Manasi
> > > Navare ; Ankit Nautiyal <
> > > ankit.k.nauti...@intel.com>; José Roberto de Souza
> > > ; Sean Paul ; Ben
> > > Skeggs ; sta...@vger.kernel.org
> > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > connected end device
> > >
> > > On Wed, 2021-08-04 at 07:13 +, Lin, Wayne wrote:
> > > > [Public]
> > > >
> > > > > -Original Message-
> > > > > From: Lyude Paul 
> > > > > Sent: Wednesday, August 4, 2021 8:09 AM
> > > > > To: Lin, Wayne ;
> > > > > dri-devel@lists.freedesktop.org
> > > > > Cc: Kazlauskas, Nicholas ;
> > > > > Wentland, Harry < harry.wentl...@amd.com>; Zuo, Jerry
> > > > > ; Wu, Hersen ; Juston Li
> > > > > < juston...@intel.com>; Imre Deak ; Ville
> > > > > Syrjälä ; Wentland, Harry <
> > > > > harry.wentl...@amd.com>; Daniel Vetter ;
> > > > > Sean Paul ; Maarten Lankhorst <
> > > > > maarten.lankho...@linux.intel.com>; Maxime Ripard
> > > > > ; Thomas Zimmermann ;
> > > > > David Airlie ; Daniel Vetter
> > > > > ; Deucher, Alexander
> > > > > ; Siqueira, Rodrigo <
> > > > > rodrigo.sique...@amd.com>; Pillai, Aurabindo
> > > > > ; Eryk Brol ; Bas
> > > > > Nieuwenhuizen ; Cornij, Nikola
> > > > > ; Jani Nikula ;
> > > > > Manasi Navare ; Ankit Nautiyal
> > > > > ; José Roberto de Souza
> > > > > ; Sean Paul ; Ben
> > > > > Skeggs ; sta...@vger.kernel.org
> > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > connected end device
> > > > >
> > > > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > > > > [Why]
> > > > > > > Currently, we will create connectors for all output ports whether
> > > > > > > they are connected or not. However, in MST, we can only
> > > > > > > determine whether an output port really stands for a "connector"
> > > > > > > until it is connected and its peer device type is checked to be
> > > > > > > an end device.
> > > > > >
> > > > > > What is this commit trying to solve exactly? e.g. is AMD
> > > > > > currently running into issues with there being too many DRM
> > > > > > connectors or something like that?
> > > > > > Ideally this is behavior I'd very much like us to keep as-is
> > > > > > unless there's good reason to change it.
> > > > Hi Lyude,
> > > > Really appreciate for your time to elaborate in such detail. Thanks!
> > > >
> > > > I came up with this commit because I observed something confusing
> > > > when I was analyzing MST connectors' life cycle. Take the topology
> > > > instance you mentioned below
> > > >
> > > > Root MSTB -> Output_Port 1 -> MSTB 1.1 ->Output_Port 1(Connected
> > > > w/
> > > > display)
> > > > |
> > > > -
> > > > > Output_Port 2 (Disconnected)
> > > > -> Output_Port 2 -> MSTB 2.1 ->Output_Port 1
> > > > (Disconnected)
> > > >
> > > > -> Output_Port 2 (Disconnected) Which is exactly the topology of
> > > > Startech DP 1-to-4 hub. There are 3 1-to-2 branch chips within
> > > > this hub. With our MST implementation today, we'll create drm
> > > > connectors for all output ports. Hence, we totally create 6 drm 
> > > > connectors here.
> > > > However, Output ports of Root MSTB are not connected to a stream sink.
> > > > They are connected with branch devices.
> > > > Thus, creating a drm connector for such a port looks a bit strange to
> > > > me and increases the complexity of tracking drm connectors.  My
> > > > thought is we only need to create drm connector fo

Re: [PATCH v8 20/34] mmc: sdhci-tegra: Add runtime PM and OPP support

2021-08-20 Thread Thierry Reding
On Fri, Aug 20, 2021 at 01:37:13AM +0300, Dmitry Osipenko wrote:
> 19.08.2021 20:03, Thierry Reding пишет:
> > On Tue, Aug 17, 2021 at 04:27:40AM +0300, Dmitry Osipenko wrote:
> >> The SDHCI on Tegra belongs to the core power domain and we're going to
> >> enable GENPD support for the core domain. Now SDHCI must be resumed using
> >> runtime PM API in order to initialize the SDHCI power state. The SDHCI
> >> clock rate must be changed using OPP API that will reconfigure the power
> >> domain performance state in accordance to the rate. Add runtime PM and OPP
> >> support to the SDHCI driver.
> >>
> >> Signed-off-by: Dmitry Osipenko 
> >> ---
> >>  drivers/mmc/host/sdhci-tegra.c | 146 -
> >>  1 file changed, 105 insertions(+), 41 deletions(-)
> >>
> >> diff --git a/drivers/mmc/host/sdhci-tegra.c 
> >> b/drivers/mmc/host/sdhci-tegra.c
> >> index 387ce9cdbd7c..a3583359c972 100644
> >> --- a/drivers/mmc/host/sdhci-tegra.c
> >> +++ b/drivers/mmc/host/sdhci-tegra.c
> >> @@ -15,6 +15,8 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >> +#include 
> >>  #include 
> >>  #include 
> >>  #include 
> >> @@ -24,6 +26,8 @@
> >>  #include 
> >>  #include 
> >>  
> >> +#include 
> >> +
> >>  #include "sdhci-pltfm.h"
> >>  #include "cqhci.h"
> >>  
> >> @@ -123,6 +127,12 @@
> >> SDHCI_TRNS_BLK_CNT_EN | \
> >> SDHCI_TRNS_DMA)
> >>  
> >> +enum {
> >> +  TEGRA_CLK_BULK_SDHCI,
> >> +  TEGRA_CLK_BULK_TMCLK,
> >> +  TEGRA_CLK_BULK_NUM,
> >> +};
> >> +
> >>  struct sdhci_tegra_soc_data {
> >>const struct sdhci_pltfm_data *pdata;
> >>u64 dma_mask;
> >> @@ -171,6 +181,8 @@ struct sdhci_tegra {
> >>bool enable_hwcq;
> >>unsigned long curr_clk_rate;
> >>u8 tuned_tap_delay;
> >> +
> >> +  struct clk_bulk_data clocks[TEGRA_CLK_BULK_NUM];
> > 
> > This doesn't seem worth it to me. There's a lot of churn in this driver
> > that's only needed to convert this to the clk_bulk API and it makes the
> > code a lot more difficult to read, in my opinion.
> > 
> > It looks like the only benefit that this gives us is that runtime
> > suspend and resume become a few lines shorter.
> 
> The driver probe code looks cleaner with that. You should be looking at
> the final result and not at the patch to see it.

I did look at the final result and didn't find it cleaner at all. =)

Thierry




Re: [PATCH v8 07/34] clk: tegra: Support runtime PM and power domain

2021-08-20 Thread Thierry Reding
On Fri, Aug 20, 2021 at 01:09:46AM +0300, Dmitry Osipenko wrote:
> 19.08.2021 19:54, Thierry Reding пишет:
> > On Wed, Aug 18, 2021 at 08:11:03PM +0300, Dmitry Osipenko wrote:
> >> 18.08.2021 19:42, Thierry Reding пишет:
> >>> On Wed, Aug 18, 2021 at 06:05:21PM +0300, Dmitry Osipenko wrote:
>  18.08.2021 17:07, Thierry Reding пишет:
> > On Tue, Aug 17, 2021 at 04:27:27AM +0300, Dmitry Osipenko wrote:
> > [...]
> >> +struct clk *tegra_clk_register(struct clk_hw *hw)
> >> +{
> >> +  struct platform_device *pdev;
> >> +  struct device *dev = NULL;
> >> +  struct device_node *np;
> >> +  const char *dev_name;
> >> +
> >> +  np = tegra_clk_get_of_node(hw);
> >> +
> >> +  if (!of_device_is_available(np))
> >> +  goto put_node;
> >> +
> >> +  dev_name = kasprintf(GFP_KERNEL, "tegra_clk_%s", 
> >> hw->init->name);
> >> +  if (!dev_name)
> >> +  goto put_node;
> >> +
> >> +  pdev = of_platform_device_create(np, dev_name, NULL);
> >> +  if (!pdev) {
> >> +  pr_err("%s: failed to create device for %pOF\n", 
> >> __func__, np);
> >> +  kfree(dev_name);
> >> +  goto put_node;
> >> +  }
> >> +
> >> +  dev = &pdev->dev;
> >> +  pm_runtime_enable(dev);
> >> +put_node:
> >> +  of_node_put(np);
> >> +
> >> +  return clk_register(dev, hw);
> >> +}
> >
> > This looks wrong. Why do we need struct platform_device objects for each
> > of these clocks? That's going to be a massive amount of platform devices
> > and they will completely mess up sysfs.
> 
>  RPM works with a device. It's not a massive amount of devices, it's one
>  device for T20 and four devices for T30.
> >>>
> >>> I'm still not sure I understand why we need to call RPM functions on a
> >>> clock. And even if they are few, it seems wrong to make these platform
> >>> devices.
> >>
> >> Before clock is enabled, we need to raise core voltage. After clock is
> >> disabled, the voltage should be dropped. CCF+RPM takes care of handling
> >> this for us.
> > 
> > That's the part that I do understand. What I don't understand is why a
> > clock needs to be runtime suspend/resumed. Typically we suspend/resume
> > devices, and doing so typically involves disabling/enabling clocks. So
> > I don't understand why the clocks themselves now need to be runtime
> > suspended/resumed.
> 
> CCF provides RPM management for the device that backs a clock. When the
> clock is enabled, it resumes the backing device.
> 
> RPM, GENPD and OPP frameworks work with a device. We use all these
> frameworks here. Since we don't have a dedicated device for a PLL
> clock, we need to create it in order to leverage the existing generic
> kernel APIs.
> 
> In this case clocks are not runtime suspended/resumed, the device
> which backs clock is suspended/resumed.
> 
> >>> Perhaps they can be simple struct device:s instead? Ideally they would
> >>> also be parented to the CAR so that they appear in the right place in
> >>> the sysfs hierarchy.
> >>
> >> Could you please clarify what you mean by 'simple struct device:s'?
> >> These clock devices should be OF devices with a of_node and etc,
> >> otherwise we can't use OPP framework.
> > 
> > Perhaps I misunderstand the goal of the OPP framework. My understanding
> > was that this was to attach a table of operating points with a device so
> > that appropriate operating points could be selected and switched to when
> > the workload changes.
> > 
> > Typically these operating points would be roughly a clock rate and a
> > corresponding voltage for a regulator, so that when a certain clock rate
> > is requested, the regulator can be set to the matching voltage.
> > 
> > Hm... so is it that each of these clocks that you want to create a
> > platform device for has its own regulator? Because the patch series only
> > mentions the CORE domain, so I assumed that we would accumulate all the
> > clock rates for the clocks that are part of that CORE domain and then
> > derive a voltage to be supplied to that CORE domain.
> > 
> > But perhaps I just don't understand correctly how this is tied together.
> 
> We don't use regulators, we use power domain that controls regulator.
> GENPD takes care of accumulating performance requests on a per-device
> basis.
> 
> I'm creating platform device for the clocks that require DVFS. These
> clocks don't use regulator, they are attached to the CORE domain.
> GENPD framework manages the performance state, aggregating perf votes
> from each device, i.e. from each clock individually.
> 
> You want to reinvent another layer of aggregation on top of GENPD.
> This isn't worth the effort: we won't get anything from it, and it
> would be a lot of extra complexity for nothing. We will also lose
> from it because pm_genpd_summary won't show you a per-device inf

Adding sync_shrinkers function for TTM pool optimization

2021-08-20 Thread Christian König
Hi Andrew,

Daniel suggested that I ping you once more about this.

Basically we want to add a barrier function to make sure that our TTM pool 
shrinker is not freeing up pages from a device while the device is being 
unplugged.

Currently we have a global mutex to serialize all of this, but it causes 
contention when unmapping the freed pages in the IOMMU.

We just need your Acked-by and I hope my explanation is now more understandable 
than the last time.

Cheers,
Christian.




[PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2

2021-08-20 Thread Christian König
Switch back to using a spinlock again by moving the IOMMU unmap outside
of the locked region.

This avoids contention especially while freeing pages.

v2: Add a comment explaining why we need sync_shrinkers().

Signed-off-by: Christian König 
Acked-by: Huang Rui 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/ttm/ttm_pool.c | 40 +++---
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cb38b1a17b09..7d4f76d4141d 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -70,7 +70,7 @@ static struct ttm_pool_type global_uncached[MAX_ORDER];
 static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
 static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
 
-static struct mutex shrinker_lock;
+static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
@@ -263,9 +263,9 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, 
struct ttm_pool *pool,
spin_lock_init(&pt->lock);
INIT_LIST_HEAD(&pt->pages);
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
list_add_tail(&pt->shrinker_list, &shrinker_list);
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 }
 
 /* Remove a pool_type from the global shrinker list and free all pages */
@@ -273,9 +273,9 @@ static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 {
struct page *p;
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
list_del(&pt->shrinker_list);
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 
while ((p = ttm_pool_type_take(pt)))
ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
@@ -313,24 +313,23 @@ static struct ttm_pool_type *ttm_pool_select_type(struct 
ttm_pool *pool,
 static unsigned int ttm_pool_shrink(void)
 {
struct ttm_pool_type *pt;
-   unsigned int num_freed;
+   unsigned int num_pages;
struct page *p;
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
+   list_move_tail(&pt->shrinker_list, &shrinker_list);
+   spin_unlock(&shrinker_lock);
 
p = ttm_pool_type_take(pt);
if (p) {
ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
-   num_freed = 1 << pt->order;
+   num_pages = 1 << pt->order;
} else {
-   num_freed = 0;
+   num_pages = 0;
}
 
-   list_move_tail(&pt->shrinker_list, &shrinker_list);
-   mutex_unlock(&shrinker_lock);
-
-   return num_freed;
+   return num_pages;
 }
 
 /* Return the allocation order based for a page */
@@ -530,6 +529,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
for (j = 0; j < MAX_ORDER; ++j)
ttm_pool_type_fini(&pool->caching[i].orders[j]);
}
+
+   /* We removed the pool types from the LRU, but we need to also make sure
+* that no shrinker is concurrently freeing pages from the pool.
+*/
+   sync_shrinkers();
 }
 
 /* As long as pages are available make sure to release at least one */
@@ -604,7 +608,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file 
*m, void *data)
 {
ttm_pool_debugfs_header(m);
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
seq_puts(m, "wc\t:");
ttm_pool_debugfs_orders(global_write_combined, m);
seq_puts(m, "uc\t:");
@@ -613,7 +617,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file 
*m, void *data)
ttm_pool_debugfs_orders(global_dma32_write_combined, m);
seq_puts(m, "uc 32\t:");
ttm_pool_debugfs_orders(global_dma32_uncached, m);
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 
ttm_pool_debugfs_footer(m);
 
@@ -640,7 +644,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file 
*m)
 
ttm_pool_debugfs_header(m);
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
seq_puts(m, "DMA ");
switch (i) {
@@ -656,7 +660,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file 
*m)
}
ttm_pool_debugfs_orders(pool->caching[i].orders, m);
}
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 
ttm_pool_debugfs_footer(m);
return 0;
@@ -693,7 +697,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
if (!page_pool_size)
page_pool_size = num_pages;
 
-   mutex_init(&shrinker_lock);
+   spin_lock_init(&shrinker_lock);
INIT_LIST_HEAD(&shrinker_list);
 
for (i = 0; i < MAX_ORDER; ++i) {
-- 
2.25.1



[PATCH 1/2] mm/vmscan: add sync_shrinkers function v2

2021-08-20 Thread Christian König
From: Christian König 

While unplugging a device, the TTM shrinker implementation
needs a barrier to make sure that all concurrent shrink
operations are done and no other CPU is referring to a
device-specific pool any more.

Taking and releasing the shrinker semaphore on the write
side after unmapping and freeing all pages from the device
pool should make sure that no shrinker is running in
parallel.

This allows us to avoid the contented mutex in the TTM pool
implementation for every alloc/free operation.

v2: rework the commit message to make clear why we need this

Signed-off-by: Christian König 
Acked-by: Huang Rui 
Reviewed-by: Daniel Vetter 
---
 include/linux/shrinker.h |  1 +
 mm/vmscan.c  | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 9814fff58a69..1de17f53cdbc 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -93,4 +93,5 @@ extern void register_shrinker_prepared(struct shrinker 
*shrinker);
 extern int register_shrinker(struct shrinker *shrinker);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
+extern void sync_shrinkers(void);
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4620df62f0ff..fde1aabcfa7f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -638,6 +638,16 @@ void unregister_shrinker(struct shrinker *shrinker)
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
+/**
+ * sync_shrinkers - Wait for all running shrinkers to complete.
+ */
+void sync_shrinkers(void)
+{
+   down_write(&shrinker_rwsem);
+   up_write(&shrinker_rwsem);
+}
+EXPORT_SYMBOL(sync_shrinkers);
+
 #define SHRINK_BATCH 128
 
 static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
-- 
2.25.1



Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Jingwen Chen
Hi all,

I just submitted a v3 patch following your suggestion to use
kthread_park instead.

Thanks,
Jingwen
On Fri Aug 20, 2021 at 09:20:42AM +0200, Christian König wrote:
> No, that perfectly works for me.
> 
> The problem we used to have with this approach was that we potentially have
> multiple timeouts at the same time.
> 
> But when we serialize the timeout handling by using a single workqueue as
> suggested by Daniel now as well then that isn't an issue any more.
> 
> Regards,
> Christian.
> 
> Am 20.08.21 um 09:12 schrieb Liu, Monk:
> > [AMD Official Use Only]
> > 
> > @Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
> > Do you have any concern on the kthread_park() approach ?
> > 
> > Theoretically speaking sched_main shall run there exclusively with 
> > job_timeout since they both touches jobs, and stop scheduler during 
> > job_timeout won't impact performance since in that scenario
> > There was already something wrong/stuck on that ring/scheduler
> > 
> > Thanks
> > 
> > --
> > Monk Liu | Cloud-GPU Core team
> > --
> > 
> > -Original Message-
> > From: Liu, Monk
> > Sent: Thursday, August 19, 2021 6:26 PM
> > To: Daniel Vetter ; Grodzovsky, Andrey 
> > 
> > Cc: Alex Deucher ; Chen, JingWen 
> > ; Maling list - DRI developers 
> > ; amd-gfx list 
> > ; Koenig, Christian 
> > 
> > Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad 
> > job."
> > 
> > [AMD Official Use Only]
> > 
> > Hi Daniel
> > 
> > > > Why can't we stop the scheduler thread first, so that there's 
> > > > guaranteed no race? I've recently had a lot of discussions with 
> > > > panfrost folks about their reset that spawns across engines, and 
> > > > without stopping the scheduler thread first before you touch anything 
> > > > it's just plain impossible.
> > Yeah, we had this thought in our minds as well.
> > 
> > Our second approach is to call kthread_stop() in the job_timedout() routine
> > so that the "bad" job is guaranteed to be usable without the scheduler
> > touching or freeing it. Please check this sample patch as well:
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index a2a9536..50a49cb 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
> > *work)
> >  sched = container_of(work, struct drm_gpu_scheduler, 
> > work_tdr.work);
> >  /* Protects against concurrent deletion in 
> > drm_sched_get_cleanup_job */
> > +   kthread_park(sched->thread);
> >  spin_lock(&sched->job_list_lock);
> >  job = list_first_entry_or_null(&sched->pending_list,
> > struct drm_sched_job, list);
> >  if (job) {
> > -   /*
> > -* Remove the bad job so it cannot be freed by concurrent
> > -* drm_sched_cleanup_jobs. It will be reinserted back after 
> > sched->thread
> > -* is parked at which point it's safe.
> > -*/
> > -   list_del_init(&job->list);
> >  spin_unlock(&sched->job_list_lock);
> >  status = job->sched->ops->timedout_job(job);
> > @@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct 
> > *work)
> >  } else {
> >  spin_unlock(&sched->job_list_lock);
> >  }
> > +   kthread_unpark(sched->thread);
> >  if (status != DRM_GPU_SCHED_STAT_ENODEV) {
> >  spin_lock(&sched->job_list_lock);
> > 
> > @@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> >  kthread_park(sched->thread);
> >  /*
> > -* Reinsert back the bad job here - now it's safe as
> > -* drm_sched_get_cleanup_job cannot race against us and release the
> > -* bad job at this point - we parked (waited for) any in progress
> > -* (earlier) cleanups and drm_sched_get_cleanup_job will not be 
> > called
> > -* now until the scheduler thread is unparked.
> > -*/
> > -   if (bad && bad->sched == sched)
> > -   /*
> > -* Add at the head of the queue to reflect it was the 
> > earliest
> > -* job extracted.
> > -*/
> > -   list_add(&bad->list, &sched->pending_list);
> > -
> > -   /*
> >   * Iterate the job list from later to  earlier one and either 
> > deactive
> >   * their HW callbacks or remove them from pending list if they 
> > already
> >   * signaled.
> > 
> > 
> > Thanks
> > 
> > --
> > Monk Liu | Cloud-GPU Core team
> > --
> > 
> > -Original Message-
> > From: Daniel Vetter 
> > Sent: Thursday, Au

Re: [PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs

2021-08-20 Thread Michael Ellerman
Christophe Leroy  writes:
> On 20/08/2021 at 09:49, Michael Ellerman wrote:
>> Kees Cook  writes:
>>> In preparation for FORTIFY_SOURCE performing compile-time and run-time
>>> field bounds checking for memset(), avoid intentionally writing across
>>> neighboring fields.
>>>
>>> Add a struct_group() for the spe registers so that memset() can correctly 
>>> reason
>>> about the size:
>>>
>>> In function 'fortify_memset_chk',
>>> inlined from 'restore_user_regs.part.0' at 
>>> arch/powerpc/kernel/signal_32.c:539:3:
> include/linux/fortify-string.h:195:4: error: call to 
> '__write_overflow_field' declared with attribute warning: detected write 
> beyond size of field (1st parameter); maybe use struct_group()? 
> [-Werror=attribute-warning]
>>>   195 |__write_overflow_field();
>>>   |^~~~
>>>
>>> Cc: Michael Ellerman 
>>> Cc: Benjamin Herrenschmidt 
>>> Cc: Paul Mackerras 
>>> Cc: Christophe Leroy 
>>> Cc: Sudeep Holla 
>>> Cc: linuxppc-...@lists.ozlabs.org
>>> Reported-by: kernel test robot 
>>> Signed-off-by: Kees Cook 
>>> ---
>>>   arch/powerpc/include/asm/processor.h | 6 --
>>>   arch/powerpc/kernel/signal_32.c  | 6 +++---
>>>   2 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/processor.h 
>>> b/arch/powerpc/include/asm/processor.h
>>> index f348e564f7dd..05dc567cb9a8 100644
>>> --- a/arch/powerpc/include/asm/processor.h
>>> +++ b/arch/powerpc/include/asm/processor.h
>>> @@ -191,8 +191,10 @@ struct thread_struct {
>>> int used_vsr;   /* set if process has used VSX */
>>>   #endif /* CONFIG_VSX */
>>>   #ifdef CONFIG_SPE
>>> -   unsigned long   evr[32];/* upper 32-bits of SPE regs */
>>> -   u64 acc;/* Accumulator */
>>> +   struct_group(spe,
>>> +   unsigned long   evr[32];/* upper 32-bits of SPE regs */
>>> +   u64 acc;/* Accumulator */
>>> +   );
>>> unsigned long   spefscr;/* SPE & eFP status */
>>> unsigned long   spefscr_last;   /* SPEFSCR value on last prctl
>>>call or trap return */
>>> diff --git a/arch/powerpc/kernel/signal_32.c 
>>> b/arch/powerpc/kernel/signal_32.c
>>> index 0608581967f0..77b86caf5c51 100644
>>> --- a/arch/powerpc/kernel/signal_32.c
>>> +++ b/arch/powerpc/kernel/signal_32.c
>>> @@ -532,11 +532,11 @@ static long restore_user_regs(struct pt_regs *regs,
>>> regs_set_return_msr(regs, regs->msr & ~MSR_SPE);
>>> if (msr & MSR_SPE) {
>>> /* restore spe registers from the stack */
>>> -   unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
>>> - ELF_NEVRREG * sizeof(u32), failed);
> >>> +   unsafe_copy_from_user(&current->thread.spe, &sr->mc_vregs,
>>> + sizeof(current->thread.spe), failed);
>> 
>> This makes me nervous, because the ABI is that we copy ELF_NEVRREG *
>> sizeof(u32) bytes, not whatever sizeof(current->thread.spe) happens to
>> be.
>> 
>> ie. if we use sizeof an inadvertent change to the fields in
>> thread_struct could change how many bytes we copy out to userspace,
>> which would be an ABI break.
>> 
>> And that's not that hard to do, because it's not at all obvious that the
>> size and layout of fields in thread_struct affects the user ABI.
>> 
>> At the same time we don't want to copy the right number of bytes but
>> the wrong content, so from that point of view using sizeof is good :)
>> 
>> The way we handle it in ptrace is to have BUILD_BUG_ON()s to verify that
>> things match up, so maybe we should do that here too.
>> 
>> ie. add:
>> 
>>  BUILD_BUG_ON(sizeof(current->thread.spe) == ELF_NEVRREG * sizeof(u32));
>
> You mean != I guess ?

Gah. Yes I do :)

cheers


Re: [RFC] Make use of non-dynamic dmabuf in RDMA

2021-08-20 Thread Jason Gunthorpe
On Fri, Aug 20, 2021 at 09:25:30AM +0200, Daniel Vetter wrote:
> On Fri, Aug 20, 2021 at 1:06 AM Jason Gunthorpe  wrote:
> > On Wed, Aug 18, 2021 at 11:34:51AM +0200, Daniel Vetter wrote:
> > > On Wed, Aug 18, 2021 at 9:45 AM Gal Pressman  wrote:
> > > >
> > > > Hey all,
> > > >
> > > > Currently, the RDMA subsystem can only work with dynamic dmabuf
> > > > attachments, which requires the RDMA device to support on-demand-paging
> > > > (ODP) which is not common on most devices (only supported by mlx5).
> > > >
> > > > While the dynamic requirement makes sense for certain GPUs, some devices
> > > > (such as habanalabs) have device memory that is always "pinned" and do
> > > > not need/use the move_notify operation.
> > > >
> > > > The motivation of this RFC is to use habanalabs as the dmabuf exporter,
> > > > and EFA as the importer to allow for peer2peer access through 
> > > > libibverbs.
> > > >
> > > > This draft patch changes the dmabuf driver to differentiate between
> > > > static/dynamic attachments by looking at the move_notify op instead of
> > > > the importer_ops struct, and allowing the peer2peer flag to be enabled
> > > > in case of a static exporter.
> > > >
> > > > Thanks
> > > >
> > > > Signed-off-by: Gal Pressman 
> > >
> > > Given that habanalabs dma-buf support is very firmly in limbo (at
> > > least it's not yet in linux-next or anywhere else) I think you want to
> > > solve that problem first before we tackle the additional issue of
> > > making p2p work without dynamic dma-buf. Without that it just doesn't
> > > make a lot of sense really to talk about solutions here.
> >
> > I have been thinking about adding a dmabuf exporter to VFIO, for
> > basically the same reason habana labs wants to do it.
> >
> > In that situation we'd want to see an approach similar to this as well
> > to have a broad usability.
> >
> > The GPU drivers also want this for certain sophisticated scenarios
> > with RDMA, the intree drivers just haven't quite got there yet.
> >
> > So, I think it is worthwhile to start thinking about this regardless
> > of habana labs.
> 
> Oh sure, I've been having these for a while. I think there's two options:
> - some kind of soft-pin, where the contract is that we only revoke
> when absolutely necessary, and it's expected to be catastrophic on the
> importer's side. 

Honestly, I'm not very keen on this. We don't really have HW support
in several RDMA scenarios for even catastrophic unpin.

Gal, can EFA even do this for an MR? You basically have to resize the
rkey/lkey to zero length (or invalidate it like an FMR) under the
catastrophic revoke. The rkey/lkey cannot just be destroyed as that
opens a security problem with rkey/lkey re-use.

I think I saw that EFA's current out-of-tree implementation had this bug.

> to do is mmap revoke), and I think that model of exclusive device
> ownership with the option to revoke fits pretty well for at least some
> of the accelerators floating around. In that case importers would
> never get a move_notify (maybe we should call this revoke_notify to
> make it clear it's a bit different) callback, except when the entire
> thing has been yanked. I think that would fit pretty well for VFIO,
> and I think we should be able to make it work for rdma too as some
> kind of auto-deregister. The locking might be fun with both of these
> since I expect some inversions compared to the register path, we'll
> have to figure these out.

It fits semantically nicely, VFIO also has a revoke semantic for BAR
mappings.

The challenge is the RDMA side which doesn't have a 'dma disabled
error state' for objects as part of the spec.

Some HW, like mlx5, can implement this for MR objects (see revoke_mr),
but I don't know if anything else can, and even mlx5 currently can't
do a revoke for any other object type.

I don't know how useful it would be, need to check on some of the use
cases.

The locking is tricky as we have to issue a device command, but that
device command cannot run concurrently with destruction or the tail
part of creation.

Jason


Re: [PATCH v2 56/63] RDMA/mlx5: Use struct_group() to zero struct mlx5_ib_mr

2021-08-20 Thread Jason Gunthorpe
On Thu, Aug 19, 2021 at 11:14:37AM -0700, Kees Cook wrote:

> Which do you mean? When doing the conversions I tended to opt for
> struct_group() since it provides more robust "intentionality". Strictly
> speaking, the new memset helpers are doing field-spanning writes, but the
> "clear to the end" pattern was so common it made sense to add the helpers,
> as they're a bit less disruptive. It's totally up to you! :)

Well, of the patches you cc'd to me only this one used the struct
group..

Jason 


Re: [PATCH v8 01/34] opp: Add dev_pm_opp_sync() helper

2021-08-20 Thread Ulf Hansson
On Thu, 19 Aug 2021 at 21:35, Dmitry Osipenko  wrote:
>
> 19.08.2021 16:07, Ulf Hansson пишет:
> > On Wed, 18 Aug 2021 at 17:43, Dmitry Osipenko  wrote:
> >>
> >> On 18.08.2021 13:08, Ulf Hansson wrote:
> >>> On Wed, 18 Aug 2021 at 11:50, Viresh Kumar  
> >>> wrote:
> 
>  On 18-08-21, 11:41, Ulf Hansson wrote:
> > On Wed, 18 Aug 2021 at 11:14, Viresh Kumar  
> > wrote:
> >> What we need here is just configure. So something like this then:
> >>
> >> - genpd->get_performance_state()
> >>   -> dev_pm_opp_get_current_opp() //New API
> >>   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
> >>
> >> This can be done just once from probe() then.
> >
> > How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?
> 
>  The opp core already has a way of finding current OPP, that's what
>  Dmitry is trying to use here. It finds it using clk_get_rate(), if
>  that is zero, it picks the lowest freq possible.
> 
> > I am sure I understand the problem. When a device is getting probed,
> > it needs to consume power, how else can the corresponding driver
> > successfully probe it?
> 
>  Dmitry can answer that better, but a device doesn't necessarily need
>  to consume energy in probe. It can consume bus clock, like APB we
>  have, but the more energy consuming stuff can be left disabled until
>  the time a user comes up. Probe will just end up registering the
>  driver and initializing it.
> >>>
> >>> That's perfectly fine, as then it's likely that it won't vote for an
> >>> OPP, but can postpone that as well.
> >>>
> >>> Perhaps the problem is rather that the HW may already carry a non-zero
> >>> vote made from a bootloader. If the consumer driver tries to clear
> >>> that vote (calling dev_pm_opp_set_rate(dev, 0), for example), it would
> >>> still not lead to any updates of the performance state in genpd,
> >>> because genpd internally has initialized the performance-state to
> >>> zero.
> >>
> >> We don't need to discover internal SoC devices because we use
> >> device-tree on ARM. For most devices power isn't required at a probe
> >> time because probe function doesn't touch h/w at all, thus devices are
> >> left in suspended state after probe.
> >>
> >> We have three components comprising PM on Tegra:
> >>
> >> 1. Power gate
> >> 2. Clock state
> >> 3. Voltage state
> >>
> >> GENPD on/off represents the 'power gate'.
> >>
> >> Clock and reset are controlled by device drivers using clk and rst APIs.
> >>
> >> Voltage state is represented by GENPD's performance level.
> >>
> >> GENPD core assumes that at a first rpm-resume of a consumer device, its
> >> genpd_performance=0. Not true for Tegra because h/w of the device is
> >> preconfigured to a non-zero perf level initially, h/w may not support
> >> zero level at all.
> >
> > I think you may be misunderstanding genpd's behaviour around this, but
> > let me elaborate.
> >
> > In genpd_runtime_resume(), we try to restore the performance state for
> > the device that genpd_runtime_suspend() *may* have dropped earlier.
> > That means, if genpd_runtime_resume() is called prior
> > genpd_runtime_suspend() for the first time, it means that
> > genpd_runtime_resume() will *not* restore a performance state, but
> > instead just leave the performance state as is for the device (see
> > genpd_restore_performance_state()).
> >
> > In other words, a consumer driver may use the following sequence to
> > set an initial performance state for the device during ->probe():
> >
> > ...
> > rate = clk_get_rate()
> > dev_pm_opp_set_rate(rate)
> >
> > pm_runtime_enable()
> > pm_runtime_resume_and_get()
> > ...
> >
> > Note that, it's the consumer driver's responsibility to manage device
> > specific resources, in its ->runtime_suspend|resume() callbacks.
> > Typically that means dealing with clock gating/ungating, for example.
> >
> > In the other scenario where a consumer driver prefers to *not* call
> > pm_runtime_resume_and_get() in its ->probe(), because it doesn't need
> > to power on the device to complete probing, then we don't want to vote
> > for an OPP at all - and we also want the performance state for the
> > device in genpd to be set to zero. Correct?
>
> Yes
>
> > Is this the main problem you are trying to solve, because I think this
> > doesn't work out of the box as of today?
>
> The main problem is that the restored performance state is zero for the
> first genpd_runtime_resume(), while it's not zero from the h/w perspective.

This should not be a problem, but can be handled by the consumer driver.

genpd_runtime_resume() calls genpd_restore_performance_state() to
restore a performance state for the device. However, in the scenario
you describe, "gpd_data->rpm_pstate" is zero, which makes
genpd_restore_performance_state() just leave the device's
performance state as is - it will *not* restore the performance state
to zero.

To make the consu

Re: [PATCH v8 01/34] opp: Add dev_pm_opp_sync() helper

2021-08-20 Thread Ulf Hansson
On Fri, 20 Aug 2021 at 07:18, Viresh Kumar  wrote:
>
> On 19-08-21, 16:55, Ulf Hansson wrote:
> > Right, that sounds reasonable.
> >
> > We already have pm_genpd_opp_to_performance_state() which translates
> > an OPP to a performance state. This function invokes the
> > ->opp_to_performance_state() for a genpd. Maybe we need to allow a
> > genpd to not have ->opp_to_performance_state() callback assigned
> > though, but continue up in the hierarchy to see if the parent has the
> > callback assigned, to make this work for Tegra?
> >
> > Perhaps we should add an API dev_pm_genpd_opp_to_performance_state(),
> > allowing us to pass the device instead of the genpd. But that's a
> > minor thing.
>
> I am not concerned a lot about how it gets implemented, and am not
> sure as well, as I haven't looked into these details since sometime.
> Any reasonable thing will be accepted, as simple as that.
>
> > Finally, the precondition to use the above, is to first get a handle
> > to an OPP table. This is where I am struggling to find a generic
> > solution, because I guess that would be platform or even consumer
> > driver specific for how to do this. And at what point should we do
> > this?
>
> Hmm, I am not very clear with the whole picture at this point of time.
>
> Dmitry, can you try to frame a sequence of events/calls/etc that will
> define what kind of devices we are looking at here, and how this can
> be made to work ?
>
> > > > Viresh, please take a look at what I did in [1]. Maybe it could be done
> > > > in another way.
> > >
> > > I looked into this and looked like too much trouble. The
> > > implementation needs to be simple. I am not sure I understand all the
> > > problems you faced while doing that, would be better to start with a
> > > simpler implementation of get_performance_state() kind of API for
> > > genpd, after the domain is attached and its OPP table is initialized.
> > >
> > > Note, that the OPP table isn't required to be fully initialized for
> > > the device at this point, we can parse the DT as well if needed be.
> >
> > Sure, but as I indicated above, you need some kind of input data to
> > figure out what OPP table to pick, before you can translate that into
> > a performance state. Is that always the clock rate, for example?
>
> Eventually it can be clock, bandwidth, or pstate of anther genpd, not
> sure what all we are looking for now. It should be just clock right
> now as far as I can imagine :)
>
> > Perhaps, we should start with adding a dev_pm_opp_get_from_rate() or
> > what do you think? Do you have other suggestions?
>
> We already have similar APIs, so that won't be a problem. We also have
> a mechanism inside the OPP core, frequency based, which is used to
> guess the current OPP. Maybe we can enhance and use that directly
> here.

After reading the last reply from Dmitry, I am starting to think that
the problem he is facing can be described and solved in a much easier
way.

If I am correct, it looks like we don't need to add APIs to get OPPs
for a clock rate or set initial performance state values according to
the HW in genpd.

See my other response to Dmitry, let's see where that leads us.

Kind regards
Uffe


Re: [PATCH v2 22/63] HID: cp2112: Use struct_group() for memcpy() region

2021-08-20 Thread Jiri Kosina
On Tue, 17 Aug 2021, Kees Cook wrote:

> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memcpy(), memmove(), and memset(), avoid
> intentionally writing across neighboring fields.
> 
> Use struct_group() in struct cp2112_string_report around members report,
> length, type, and string, so they can be referenced together. This will
> allow memcpy() and sizeof() to more easily reason about sizes, improve
> readability, and avoid future warnings about writing beyond the end of
> report.
> 
> "pahole" shows no size nor member offset changes to struct
> cp2112_string_report.  "objdump -d" shows no meaningful object
> code changes (i.e. only source line number induced differences.)
> 
> Cc: Jiri Kosina 
> Cc: Benjamin Tissoires 
> Cc: linux-in...@vger.kernel.org
> Signed-off-by: Kees Cook 

Applied, thanks.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Jiri Kosina
On Tue, 17 Aug 2021, Kees Cook wrote:

> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memset(), avoid intentionally writing across
> neighboring fields.
> 
> Add struct_group() to mark region of struct kone_mouse_event that should
> be initialized to zero.
> 
> Cc: Stefan Achatz 
> Cc: Jiri Kosina 
> Cc: Benjamin Tissoires 
> Cc: linux-in...@vger.kernel.org
> Signed-off-by: Kees Cook 

Applied, thank you Kees.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v8 07/34] clk: tegra: Support runtime PM and power domain

2021-08-20 Thread Ulf Hansson
[...]

> >
> > I'm creating platform device for the clocks that require DVFS. These
> > clocks don't use regulator, they are attached to the CORE domain.
> > GENPD framework manages the performance state, aggregating perf votes
> > from each device, i.e. from each clock individually.
> >
> > You want to reinvent another layer of aggregation on top of GENPD.
> > This doesn't worth the effort, we won't get anything from it, it
> > should be a lot of extra complexity for nothing. We will also lose
> > from it because pm_genpd_summary won't show you a per-device info.
> >
> > domain                           status   children                        performance
> > /device                                   runtime status
> > ----------------------------------------------------------------------------------
> > heg                              on                                       100
> >     /devices/soc0/5000.host1x             active                          100
> >     /devices/soc0/5000.host1x/5414.gr2d   suspended                       0
> > mpe                              off-0                                    0
> > vdec                             off-0                                    0
> >     /devices/soc0/6001a000.vde            suspended                       0
> > venc                             off-0                                    0
> > 3d1                              off-0                                    0
> >     /devices/genpd:1:5418.gr3d            suspended                       0
> > 3d0                              off-0                                    0
> >     /devices/genpd:0:5418.gr3d            suspended                       0
> > core-domain                      on       3d0, 3d1, venc, vdec, mpe, heg  100
> >     /devices/soc0/7d00.usb                active                          100
> >     /devices/soc0/78000400.mmc            active                          95
> >     /devices/soc0/7000f400.memory-controller  unsupported                 100
> >     /devices/soc0/7000a000.pwm            active                          100
> >     /devices/soc0/60006000.clock/tegra_clk_pll_c  active                  100
> >     /devices/soc0/60006000.clock/tegra_clk_pll_e  suspended               0
> >     /devices/soc0/60006000.clock/tegra_clk_pll_m  active                  100
> >     /devices/soc0/60006000.clock/tegra_clk_sclk   active                  100
> >
>
> I suppose if there's really no good way of doing this other than
> providing a struct device, then so be it. I think the cleaned up sysfs
> shown in the summary above looks much better than what the original
> would've looked like.
>
> Perhaps an additional tweak to that would be to not create platform
> devices. Instead, just create struct device. Those really have
> everything you need (.of_node, and can be used with RPM and GENPD). As I
> mentioned earlier, platform device implies a CPU-memory-mapped bus,
> which this clearly isn't. It's kind of a separate "bus" if you want, so
> just using struct device directly seems more appropriate.

Just a heads up. If you don't use a platform device or have a driver
associated with it for probing, you need to manage the attachment to
genpd yourself. That means calling one of the dev_pm_domain_attach*()
APIs, but that's perfectly fine, of course.

>
> We did something similar for XUSB pads, see drivers/phy/tegra/xusb.[ch]
> for an example of how that was done. I think you can do something
> similar here.
>
> Thierry

Kind regards
Uffe


Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Andrey Grodzovsky

I believe we have some minor confusion here

On 2021-08-20 4:09 a.m., Jingwen Chen wrote:

Hi all,

I just submitted a v3 patch according to your opinion on using
kthread_park instead.

Thanks,
Jingwen
On Fri Aug 20, 2021 at 09:20:42AM +0200, Christian König wrote:

No, that perfectly works for me.

The problem we used to have with this approach was that we potentially have
multiple timeouts at the same time.

But when we serialize the timeout handling by using a single workqueue as
suggested by Daniel now as well then that isn't an issue any more.



While we do use a single work queue by default (system_wq) for this, we
use different work items, one per scheduler, which means they still run
in parallel. I didn't see the original mail by Daniel, but from what
Christian mentioned I assume he suggested serializing all TO handlers
from all possible engines, either by using a single work item for the TO
handler or by using a single-threaded queue for all TO handlers.
So I believe it's premature to send a v3 patch without also switching
all TDR handling to actual single-threaded handling per entire ASIC -
and in the case of amdgpu we actually need to consider XGMI hives, so it
goes beyond a single

device.

Andrey




Regards,
Christian.

Am 20.08.21 um 09:12 schrieb Liu, Monk:

[AMD Official Use Only]

@Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
Do you have any concern on the kthread_park() approach ?

Theoretically speaking, sched_main shall run there exclusively with job_timeout 
since they both touch jobs, and stopping the scheduler during job_timeout won't 
impact performance since in that scenario there was already something 
wrong/stuck on that ring/scheduler.

Thanks

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: Liu, Monk
Sent: Thursday, August 19, 2021 6:26 PM
To: Daniel Vetter ; Grodzovsky, Andrey 

Cc: Alex Deucher ; Chen, JingWen ; Mailing list - 
DRI developers ; amd-gfx list ; 
Koenig, Christian 
Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

[AMD Official Use Only]

Hi Daniel


Why can't we stop the scheduler thread first, so that there's guaranteed no 
race? I've recently had a lot of discussions with panfrost folks about their 
reset that spans across engines, and without stopping the scheduler thread 
first before you touch anything it's just plain impossible.

Yeah, we had this thought in our minds as well.

Our second approach is to call kthread_park() in the job_timedout() routine so that the 
"bad" job is guaranteed to be used without the scheduler touching or freeing it. 
Please check this sample patch as well:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
  sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  /* Protects against concurrent deletion in drm_sched_get_cleanup_job 
*/
+   kthread_park(sched->thread);
  spin_lock(&sched->job_list_lock);
  job = list_first_entry_or_null(&sched->pending_list,
 struct drm_sched_job, list);
  if (job) {
-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
  spin_unlock(&sched->job_list_lock);
  status = job->sched->ops->timedout_job(job);
@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
  } else {
  spin_unlock(&sched->job_list_lock);
  }
+   kthread_unpark(sched->thread);
  if (status != DRM_GPU_SCHED_STAT_ENODEV) {
  spin_lock(&sched->job_list_lock);

@@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
  kthread_park(sched->thread);
  /*
-* Reinsert back the bad job here - now it's safe as
-* drm_sched_get_cleanup_job cannot race against us and release the
-* bad job at this point - we parked (waited for) any in progress
-* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-* now until the scheduler thread is unparked.
-*/
-   if (bad && bad->sched == sched)
-   /*
-* Add at the head of the queue to reflect it was the earliest
-* job extracted.
-*/
-   list_add(&bad->list, &sched->pending_list);
-
-   /*
   * Iterate the job list from later to  earlier one and either deactive
   * their HW ca

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Andrey Grodzovsky



On 2021-08-20 3:12 a.m., Liu, Monk wrote:

[AMD Official Use Only]

@Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
  
Do you have any concern on the kthread_park() approach ?


Theoretically speaking, sched_main shall run there exclusively with job_timeout 
since they both touch jobs, and stopping the scheduler during job_timeout won't 
impact performance since in that scenario there was already something 
wrong/stuck on that ring/scheduler.



Regarding the last paragraph, and specifically the claim that there was 
already something wrong if the TO handler starts execution - I'm not sure 
about this, and I wonder if we have a potential bug here. When we start the 
timeout timer in drm_sched_job_begin, we do it for each new incoming job. 
In a constant rapid stream of jobs, each new job coming in will try to start 
the timer, but most of the time this operation just bails out because there 
is already a pending timer from one of the previous jobs, which cancels out 
any new ones [1]. So when the TO handler does eventually execute, it's not 
because something is wrong but simply because the TO has expired. If in this 
case the pending list is not empty, a false TDR will be triggered. I think 
long ago we used a TO handler per job and not per scheduler; that would 
solve this problem but hurt the serialization issue we are trying to solve. 
So not sure what to do.


[1] - 
https://elixir.bootlin.com/linux/v5.14-rc1/source/kernel/workqueue.c#L1665


Andrey



Thanks

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: Liu, Monk
Sent: Thursday, August 19, 2021 6:26 PM
To: Daniel Vetter ; Grodzovsky, Andrey 

Cc: Alex Deucher ; Chen, JingWen ; Mailing list - 
DRI developers ; amd-gfx list ; 
Koenig, Christian 
Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

[AMD Official Use Only]

Hi Daniel


Why can't we stop the scheduler thread first, so that there's guaranteed no 
race? I've recently had a lot of discussions with panfrost folks about their 
reset that spans across engines, and without stopping the scheduler thread 
first before you touch anything it's just plain impossible.

Yeah, we had this thought in our minds as well.

Our second approach is to call kthread_park() in the job_timedout() routine so that the 
"bad" job is guaranteed to be used without the scheduler touching or freeing it. 
Please check this sample patch as well:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
 sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  
 /* Protects against concurrent deletion in drm_sched_get_cleanup_job */

+   kthread_park(sched->thread);
 spin_lock(&sched->job_list_lock);
 job = list_first_entry_or_null(&sched->pending_list,
struct drm_sched_job, list);
  
 if (job) {

-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
 spin_unlock(&sched->job_list_lock);
  
 status = job->sched->ops->timedout_job(job);

@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 } else {
 spin_unlock(&sched->job_list_lock);
 }
+   kthread_unpark(sched->thread);
  
 if (status != DRM_GPU_SCHED_STAT_ENODEV) {

 spin_lock(&sched->job_list_lock);

@@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 kthread_park(sched->thread);
  
 /*

-* Reinsert back the bad job here - now it's safe as
-* drm_sched_get_cleanup_job cannot race against us and release the
-* bad job at this point - we parked (waited for) any in progress
-* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-* now until the scheduler thread is unparked.
-*/
-   if (bad && bad->sched == sched)
-   /*
-* Add at the head of the queue to reflect it was the earliest
-* job extracted.
-*/
-   list_add(&bad->list, &sched->pending_list);
-
-   /*
  * Iterate the job list from later to  earlier one and either deactive
  * their HW callbacks or remove them from pending list if they already
  * signaled.


Thanks

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-

Re: [RFC] Make use of non-dynamic dmabuf in RDMA

2021-08-20 Thread Jason Gunthorpe
On Fri, Aug 20, 2021 at 03:58:33PM +0300, Gal Pressman wrote:

> Though it would've been nicer if we could agree on a solution that could work
> for more than 1-2 RDMA devices, using the existing tools the RDMA subsystem 
> has.

I don't think it can really be done: revoke is necessary, and it isn't a
primitive we have today.

Revoke is sort of like rereg MR, but with a guaranteed no-change to
the lkey/rkey.

Then there is the locking complexity of linking the mr creation and
destruction to the lifecycle of the pages, which is messy and maybe
not general. For instance mlx5 would call its revoke_mr, disconnect
the dmabuf then destroy the mkey - but this is only safe because mlx5
HW can handle concurrent revokes.

> That's why I tried to approach this by denying such attachments for non-ODP
> importers instead of exposing a "limited" dynamic importer.

That is fine if there is no revoke - once revoke exists we must have
driver and HW support.

Jason


Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021, 6:02 AM Jiri Kosina  wrote:

> On Tue, 17 Aug 2021, Kees Cook wrote:
>
> > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > field bounds checking for memset(), avoid intentionally writing across
> > neighboring fields.
> >
> > Add struct_group() to mark region of struct kone_mouse_event that should
> > be initialized to zero.
> >
> > Cc: Stefan Achatz 
> > Cc: Jiri Kosina 
> > Cc: Benjamin Tissoires 
> > Cc: linux-in...@vger.kernel.org
> > Signed-off-by: Kees Cook 
>
> Applied, thank you Kees.
>

Eek! No, this will break the build: struct_group() is not yet in the tree.
I can carry this with an Ack, etc.

-Kees


Re: [PATCH v2 22/63] HID: cp2112: Use struct_group() for memcpy() region

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021, 6:01 AM Jiri Kosina  wrote:

> On Tue, 17 Aug 2021, Kees Cook wrote:
>
> > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > field bounds checking for memcpy(), memmove(), and memset(), avoid
> > intentionally writing across neighboring fields.
> >
> > Use struct_group() in struct cp2112_string_report around members report,
> > length, type, and string, so they can be referenced together. This will
> > allow memcpy() and sizeof() to more easily reason about sizes, improve
> > readability, and avoid future warnings about writing beyond the end of
> > report.
> >
> > "pahole" shows no size nor member offset changes to struct
> > cp2112_string_report.  "objdump -d" shows no meaningful object
> > code changes (i.e. only source line number induced differences.)
> >
> > Cc: Jiri Kosina 
> > Cc: Benjamin Tissoires 
> > Cc: linux-in...@vger.kernel.org
> > Signed-off-by: Kees Cook 
>
> Applied, thanks.
>

Same for this one: it's part of the larger series.

-Kees

>


Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Jiri Kosina
On Fri, 20 Aug 2021, Kees Cook wrote:

> > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > > field bounds checking for memset(), avoid intentionally writing across
> > > neighboring fields.
> > >
> > > Add struct_group() to mark region of struct kone_mouse_event that should
> > > be initialized to zero.
> > >
> > > Cc: Stefan Achatz 
> > > Cc: Jiri Kosina 
> > > Cc: Benjamin Tissoires 
> > > Cc: linux-in...@vger.kernel.org
> > > Signed-off-by: Kees Cook 
> >
> > Applied, thank you Kees.
> >
> 
> Eek! No, this will break the build: struct_group() is not yet in the tree.
> I can carry this with an Ack, etc.

I was pretty sure I saw struct_group() already in linux-next, but that was 
apparently a vacation-induced brainfart, sorry. Dropping.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH] drm/amd/pm: Add destination bounds checking to struct copy

2021-08-20 Thread Alex Deucher
On Thu, Aug 19, 2021 at 4:14 PM Kees Cook  wrote:
>
> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memcpy(), memmove(), and memset(), avoid
> intentionally writing across neighboring fields.
>
> The "Board Parameters" members of the structs:
> struct atom_smc_dpm_info_v4_5
> struct atom_smc_dpm_info_v4_6
> struct atom_smc_dpm_info_v4_7
> struct atom_smc_dpm_info_v4_10
> are written to the corresponding members of the corresponding PPTable_t
> variables, but they lack destination size bounds checking, which means
> the compiler cannot verify at compile time that this is an intended and
> safe memcpy().
>
> Since the header files are effectively immutable[1] and a struct_group()
> cannot be used, nor a common struct referenced by both sides of the
> memcpy() arguments, add a new helper, memcpy_trailing(), to perform the
> bounds checking at compile time. Replace the open-coded memcpy()s with
> memcpy_trailing() which includes enough context for the bounds checking.
>
> "objdump -d" shows no object code changes.
>
> [1] https://lore.kernel.org/lkml/e56aad3c-a06f-da07-f491-a894a570d...@amd.com
>
> Cc: Lijo Lazar 
> Cc: "Christian König" 
> Cc: "Pan, Xinhui" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Hawking Zhang 
> Cc: Feifei Xu 
> Cc: Likun Gao 
> Cc: Jiawei Gu 
> Cc: Evan Quan 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Kees Cook 
> Link: https://lore.kernel.org/lkml/cadnq5_npb8uyvd+r4uhgf-w8-cqj3joodjvijr_y9w9wqj7...@mail.gmail.com
> ---
> Alex, I dropped your prior Acked-by, since the implementation is very
> different. If you're still happy with it, I can add it back. :)

This looks reasonable to me:
Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 25 +++
>  .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c |  6 ++---
>  .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   |  8 +++---
>  .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  5 ++--
>  4 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 96e895d6be35..4605934a4fb7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1446,4 +1446,29 @@ static inline int amdgpu_in_reset(struct amdgpu_device *adev)
>  {
> return atomic_read(&adev->in_gpu_reset);
>  }
> +
> +/**
> + * memcpy_trailing - Copy the end of one structure into the middle of another
> + *
> + * @dst: Pointer to destination struct
> + * @first_dst_member: The member name in @dst where the overwrite begins
> + * @last_dst_member: The member name in @dst where the overwrite ends after
> + * @src: Pointer to the source struct
> + * @first_src_member: The member name in @src where the copy begins
> + *
> + */
> +#define memcpy_trailing(dst, first_dst_member, last_dst_member,   \
> +   src, first_src_member) \
> +({\
> +   size_t __src_offset = offsetof(typeof(*(src)), first_src_member);  \
> +   size_t __src_size = sizeof(*(src)) - __src_offset; \
> +   size_t __dst_offset = offsetof(typeof(*(dst)), first_dst_member);  \
> +   size_t __dst_size = offsetofend(typeof(*(dst)), last_dst_member) - \
> +   __dst_offset;  \
> +   BUILD_BUG_ON(__src_size != __dst_size);\
> +   __builtin_memcpy((u8 *)(dst) + __dst_offset,   \
> +(u8 *)(src) + __src_offset,   \
> +__dst_size);  \
> +})
> +
>  #endif
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> index 8ab58781ae13..1918e6232319 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> @@ -465,10 +465,8 @@ static int arcturus_append_powerplay_table(struct smu_context *smu)
>
> if ((smc_dpm_table->table_header.format_revision == 4) &&
> (smc_dpm_table->table_header.content_revision == 6))
> -   memcpy(&smc_pptable->MaxVoltageStepGfx,
> -  &smc_dpm_table->maxvoltagestepgfx,
> -  sizeof(*smc_dpm_table) - offsetof(struct atom_smc_dpm_info_v4_6, maxvoltagestepgfx));
> -
> +   memcpy_trailing(smc_pptable, MaxVoltageStepGfx, BoardReserved,
> +   smc_dpm_table, maxvoltagestepgfx);
> return 0;
>  }
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> index 2e5d3669652b..b738042e064d 100644
> --- a/drivers/gpu/drm/amd/pm/sws

Re: [PATCH v8 12/13] drm/mediatek: add MERGE support for mediatek-drm

2021-08-20 Thread Chun-Kuang Hu
Hi, Jason:

On Thu, Aug 19, 2021 at 10:23 AM, jason-jh.lin wrote:
>
> Add MERGE engine file:
> MERGE module is used to merge two slice-per-line inputs
> into one side-by-side output.
>
> Signed-off-by: jason-jh.lin 
> ---
>  drivers/gpu/drm/mediatek/Makefile   |   1 +
>  drivers/gpu/drm/mediatek/mtk_disp_drv.h |   8 +
>  drivers/gpu/drm/mediatek/mtk_disp_merge.c   | 268 
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c |  16 ++
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h |   1 +
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c  |   2 +
>  drivers/gpu/drm/mediatek/mtk_drm_drv.h  |   1 +
>  7 files changed, 297 insertions(+)
>  create mode 100644 drivers/gpu/drm/mediatek/mtk_disp_merge.c
>
> diff --git a/drivers/gpu/drm/mediatek/Makefile b/drivers/gpu/drm/mediatek/Makefile
> index dc54a7a69005..538e0087a44c 100644
> --- a/drivers/gpu/drm/mediatek/Makefile
> +++ b/drivers/gpu/drm/mediatek/Makefile
> @@ -3,6 +3,7 @@
>  mediatek-drm-y := mtk_disp_ccorr.o \
>   mtk_disp_color.o \
>   mtk_disp_gamma.o \
> + mtk_disp_merge.o \
>   mtk_disp_ovl.o \
>   mtk_disp_rdma.o \
>   mtk_drm_crtc.o \
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_drv.h b/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> index cafd9df2d63b..f407cd9d873e 100644
> --- a/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> @@ -46,6 +46,14 @@ void mtk_gamma_set_common(void __iomem *regs, struct drm_crtc_state *state);
>  void mtk_gamma_start(struct device *dev);
>  void mtk_gamma_stop(struct device *dev);
>
> +int mtk_merge_clk_enable(struct device *dev);
> +void mtk_merge_clk_disable(struct device *dev);
> +void mtk_merge_config(struct device *dev, unsigned int width,
> + unsigned int height, unsigned int vrefresh,
> + unsigned int bpc, struct cmdq_pkt *cmdq_pkt);
> +void mtk_merge_start(struct device *dev);
> +void mtk_merge_stop(struct device *dev);
> +
>  void mtk_ovl_bgclr_in_on(struct device *dev);
>  void mtk_ovl_bgclr_in_off(struct device *dev);
>  void mtk_ovl_bypass_shadow(struct device *dev);
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_merge.c b/drivers/gpu/drm/mediatek/mtk_disp_merge.c
> new file mode 100644
> index ..ebcb646bde9c
> --- /dev/null
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_merge.c
> @@ -0,0 +1,268 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 MediaTek Inc.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "mtk_drm_ddp_comp.h"
> +#include "mtk_drm_drv.h"
> +#include "mtk_disp_drv.h"
> +
> +#define DISP_REG_MERGE_CTRL0x000
> +#define MERGE_EN   1
> +#define DISP_REG_MERGE_CFG_0   0x010
> +#define DISP_REG_MERGE_CFG_4   0x020
> +#define DISP_REG_MERGE_CFG_10  0x038
> +/* no swap */
> +#define SWAP_MODE  0
> +#define FLD_SWAP_MODE  GENMASK(4, 0)
> +#define DISP_REG_MERGE_CFG_12  0x040
> +#define CFG_10_10_1PI_2PO_BUF_MODE 6
> +#define CFG_10_10_2PI_2PO_BUF_MODE 8
> +#define FLD_CFG_MERGE_MODE GENMASK(4, 0)
> +#define DISP_REG_MERGE_CFG_24  0x070
> +#define DISP_REG_MERGE_CFG_25  0x074
> +#define DISP_REG_MERGE_CFG_36  0x0a0
> +#define ULTRA_EN   1

You could use FLD_ULTRA_EN for this.

> +#define PREULTRA_EN1
> +#define HALT_FOR_DVFS_EN   0

You could just not set this.

> +#define FLD_ULTRA_EN   GENMASK(0, 0)

#define FLD_ULTRA_EN BIT(0)

Regards,
Chun-Kuang.

> +#define FLD_PREULTRA_ENGENMASK(4, 4)
> +#define FLD_HALT_FOR_DVFS_EN   GENMASK(8, 8)
> +#define DISP_REG_MERGE_CFG_37  0x0a4
> +/* 0: Off, 1: SRAM0, 2: SRAM1, 3: SRAM0 + SRAM1 */
> +#define BUFFER_MODE3
> +#define FLD_BUFFER_MODEGENMASK(1, 0)
> +#define DISP_REG_MERGE_CFG_38  0x0a8
> +#define FLD_VDE_BLOCK_ULTRAGENMASK(0, 0)
> +#define FLD_VALID_TH_BLOCK_ULTRA   GENMASK(4, 4)
> +#define FLD_ULTRA_FIFO_VALID_THGENMASK(31, 16)
> +#define DISP_REG_MERGE_CFG_39  0x0ac
> +#define FLD_NVDE_FORCE_PREULTRAGENMASK(8, 8)
> +#define FLD_NVALID_TH_FORCE_PREULTRA   GENMASK(12, 12)
> +#define FLD_PREULTRA_FIFO_VALID_TH GENMASK(31, 16)


[PATCH V3 5/9] PCI/VGA: Move vga_arb_integrated_gpu() earlier in file

2021-08-20 Thread Huacai Chen
Move vga_arb_integrated_gpu() earlier in the file to prepare for a future patch.
No functional change intended.

[bhelgaas: split to separate patch]
Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index c984c76b3fd7..1f8fb37be5fa 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -563,6 +563,20 @@ void vga_put(struct pci_dev *pdev, unsigned int rsrc)
 }
 EXPORT_SYMBOL(vga_put);
 
+#if defined(CONFIG_ACPI)
+static bool vga_arb_integrated_gpu(struct device *dev)
+{
+   struct acpi_device *adev = ACPI_COMPANION(dev);
+
+   return adev && !strcmp(acpi_device_hid(adev), ACPI_VIDEO_HID);
+}
+#else
+static bool vga_arb_integrated_gpu(struct device *dev)
+{
+   return false;
+}
+#endif
+
 /*
  * Rules for using a bridge to control a VGA descendant decoding: if a bridge
  * has only one VGA descendant then it can be used to control the VGA routing
@@ -1416,20 +1430,6 @@ static struct miscdevice vga_arb_device = {
MISC_DYNAMIC_MINOR, "vga_arbiter", &vga_arb_device_fops
 };
 
-#if defined(CONFIG_ACPI)
-static bool vga_arb_integrated_gpu(struct device *dev)
-{
-   struct acpi_device *adev = ACPI_COMPANION(dev);
-
-   return adev && !strcmp(acpi_device_hid(adev), ACPI_VIDEO_HID);
-}
-#else
-static bool vga_arb_integrated_gpu(struct device *dev)
-{
-   return false;
-}
-#endif
-
 static void __init vga_arb_select_default_device(void)
 {
struct pci_dev *pdev, *found = NULL;
-- 
2.27.0



[PATCH V3 9/9] PCI/VGA: Rework default VGA device selection

2021-08-20 Thread Huacai Chen
Current default VGA device selection fails in some cases:

  - On BMC system, the AST2500 bridge [1a03:1150] does not implement
PCI_BRIDGE_CTL_VGA [1].  This is perfectly legal but means the
legacy VGA resources won't reach downstream devices unless they're
included in the usual bridge windows.

  - vga_arb_select_default_device() will set a device below such a
bridge as the default VGA device as long as it has PCI_COMMAND_IO
and PCI_COMMAND_MEMORY enabled.

  - vga_arbiter_add_pci_device() is called for every VGA device,
either at boot-time or at hot-add time, and it will also set the
device as the default VGA device, but ONLY if all bridges leading
to it implement PCI_BRIDGE_CTL_VGA.

  - This difference between vga_arb_select_default_device() and
vga_arbiter_add_pci_device() means that a device below an AST2500
or similar bridge can only be set as the default if it is
enumerated before vga_arb_device_init().

  - On ACPI-based systems, PCI devices are enumerated by acpi_init(),
which runs before vga_arb_device_init().

  - On non-ACPI systems, like on MIPS system, they are enumerated by
pcibios_init(), which typically runs *after*
vga_arb_device_init().

So I made vga_arb_update_default_device() to replace the current
vga_arb_select_default_device(). It is called from
vga_arbiter_add_pci_device() and sets the default device even if it
does not own the VGA resources because an upstream bridge doesn't
implement PCI_BRIDGE_CTL_VGA. The default VGA device is then updated
whenever a better one is found: a device with legacy resources enabled
is better, and a device that owns the firmware framebuffer is better
still.
---
 drivers/pci/vgaarb.c | 158 ++-
 1 file changed, 66 insertions(+), 92 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index dd07b1c3205f..0b059a2fc749 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -580,16 +580,79 @@ static bool vga_arb_integrated_gpu(struct device *dev)
 static void vga_arb_update_default_device(struct vga_device *vgadev)
 {
struct pci_dev *pdev = vgadev->pdev;
+   struct device *dev = &pdev->dev;
+   struct vga_device *vgadev_default;
+#if defined(CONFIG_X86) || defined(CONFIG_IA64)
+   int i;
+   unsigned long flags;
+   u64 base = screen_info.lfb_base;
+   u64 size = screen_info.lfb_size;
+   u64 limit;
+   resource_size_t start, end;
+#endif
 
/*
 * If we don't have a default VGA device yet, and this device owns
 * the legacy VGA resources, make it the default.
 */
-   if (!vga_default_device() &&
-   ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) {
-   vgaarb_info(&pdev->dev, "setting as boot VGA device\n");
+   if (!vga_default_device()) {
+   if ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)
+   vgaarb_info(dev, "setting as boot VGA device\n");
+   else
+   vgaarb_info(dev, "setting as boot device (VGA legacy resources not available)\n");
vga_set_default_device(pdev);
}
+
+   vgadev_default = vgadev_find(vga_default);
+
+   /* Overridden by a better device */
+   if (vgadev_default && ((vgadev_default->owns & VGA_RSRC_LEGACY_MASK) == 0)
+   && ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) {
+   vgaarb_info(dev, "overriding boot VGA device\n");
+   vga_set_default_device(pdev);
+   }
+
+   if (vga_arb_integrated_gpu(dev)) {
+   vgaarb_info(dev, "overriding boot VGA device\n");
+   vga_set_default_device(pdev);
+   }
+
+#if defined(CONFIG_X86) || defined(CONFIG_IA64)
+   if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
+   base |= (u64)screen_info.ext_lfb_base << 32;
+
+   limit = base + size;
+
+   /*
+* Override vga_arbiter_add_pci_device()'s I/O based detection
+* as it may take the wrong device (e.g. on Apple system under
+* EFI).
+*
+* Select the device owning the boot framebuffer if there is
+* one.
+*/
+
+   /* Does firmware framebuffer belong to us? */
+   for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+   flags = pci_resource_flags(vgadev->pdev, i);
+
+   if ((flags & IORESOURCE_MEM) == 0)
+   continue;
+
+   start = pci_resource_start(vgadev->pdev, i);
+   end  = pci_resource_end(vgadev->pdev, i);
+
+   if (!start || !end)
+   continue;
+
+   if (base < start || limit >= end)
+   continue;
+
+   if (vgadev->pdev != vga_default_device())
+   vgaarb_info(dev, "overriding boot device\n");
+   vga_set_default_device(vgadev->pdev);
+   }
+#endif
 }
 

[PATCH V3 2/9] PCI/VGA: Replace full MIT license text with SPDX identifier

2021-08-20 Thread Huacai Chen
From: Bjorn Helgaas 

Per Documentation/process/license-rules.rst, the SPDX MIT identifier is
equivalent to including the entire MIT license text from
LICENSES/preferred/MIT.

Replace the MIT license text with the equivalent SPDX identifier.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 23 +--
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index 949fde433ea2..61b57abcb014 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -1,32 +1,11 @@
+// SPDX-License-Identifier: MIT
 /*
  * vgaarb.c: Implements the VGA arbitration. For details refer to
  * Documentation/gpu/vgaarbiter.rst
  *
- *
  * (C) Copyright 2005 Benjamin Herrenschmidt 
  * (C) Copyright 2007 Paulo R. Zanoni 
  * (C) Copyright 2007, 2009 Tiago Vignatti 
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS
- * IN THE SOFTWARE.
- *
  */
 
 #define pr_fmt(fmt) "vgaarb: " fmt
-- 
2.27.0



Re: [syzbot] WARNING in drm_gem_shmem_vm_open

2021-08-20 Thread syzbot
syzbot has bisected this issue to:

commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2
Author: Daniel Vetter 
Date:   Fri Oct 9 23:21:56 2020 +

drm/vkms: fbdev emulation support

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11c31d5530
start commit:   614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern..
git tree:   upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d5530
console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d5530
kernel config:  https://syzkaller.appspot.com/x/.config?x=96f0602203250753
dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30

Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com
Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


[PATCH V3 8/9] PCI/VGA: Log bridge control messages when adding devices

2021-08-20 Thread Huacai Chen
Previously vga_arb_device_init() iterated through all VGA devices and
indicated whether legacy VGA routing to each could be controlled by an
upstream bridge.

But we determine that information in vga_arbiter_add_pci_device(), which we
call for every device, so we can log it there without iterating through the
VGA devices again.

Note that we call vga_arbiter_check_bridge_sharing() before adding the
device to vga_list, so we have to handle the very first device separately.

[bhelgaas: commit log, split another piece to separate patch, fix
list_empty() issue]
Link: https://lore.kernel.org/r/20210705100503.1120643-1-chenhua...@loongson.cn
Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index 4cecb599f5ed..dd07b1c3205f 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -609,8 +609,10 @@ static void vga_arbiter_check_bridge_sharing(struct vga_device *vgadev)
 
vgadev->bridge_has_one_vga = true;
 
-   if (list_empty(&vga_list))
+   if (list_empty(&vga_list)) {
+   vgaarb_info(&vgadev->pdev->dev, "bridge control possible\n");
return;
+   }
 
/* okay iterate the new devices bridge hierarachy */
new_bus = vgadev->pdev->bus;
@@ -649,6 +651,11 @@ static void vga_arbiter_check_bridge_sharing(struct vga_device *vgadev)
}
new_bus = new_bus->parent;
}
+
+   if (vgadev->bridge_has_one_vga)
+   vgaarb_info(&vgadev->pdev->dev, "bridge control possible\n");
+   else
+   vgaarb_info(&vgadev->pdev->dev, "no bridge control possible\n");
 }
 
 /*
@@ -1527,7 +1534,6 @@ static int __init vga_arb_device_init(void)
 {
int rc;
struct pci_dev *pdev;
-   struct vga_device *vgadev;
 
rc = misc_register(&vga_arb_device);
if (rc < 0)
@@ -1543,15 +1549,6 @@ static int __init vga_arb_device_init(void)
   PCI_ANY_ID, pdev)) != NULL)
vga_arbiter_add_pci_device(pdev);
 
-   list_for_each_entry(vgadev, &vga_list, list) {
-   struct device *dev = &vgadev->pdev->dev;
-
-   if (vgadev->bridge_has_one_vga)
-   vgaarb_info(dev, "bridge control possible\n");
-   else
-   vgaarb_info(dev, "no bridge control possible\n");
-   }
-
vga_arb_select_default_device();
 
pr_info("loaded\n");
-- 
2.27.0



[PATCH V3 7/9] PCI/VGA: Split out vga_arb_update_default_device()

2021-08-20 Thread Huacai Chen
If there's no default VGA device, and we find a VGA device that owns the
legacy VGA resources, we make that device the default.  Split this logic
out from vga_arbiter_add_pci_device() into a new function,
vga_arb_update_default_device().

[bhelgaas: split another piece to separate patch]
Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index a6a5864ff538..4cecb599f5ed 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -577,6 +577,21 @@ static bool vga_arb_integrated_gpu(struct device *dev)
 }
 #endif
 
+static void vga_arb_update_default_device(struct vga_device *vgadev)
+{
+   struct pci_dev *pdev = vgadev->pdev;
+
+   /*
+* If we don't have a default VGA device yet, and this device owns
+* the legacy VGA resources, make it the default.
+*/
+   if (!vga_default_device() &&
+   ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) {
+   vgaarb_info(&pdev->dev, "setting as boot VGA device\n");
+   vga_set_default_device(pdev);
+   }
+}
+
 /*
  * Rules for using a bridge to control a VGA descendant decoding: if a bridge
  * has only one VGA descendant then it can be used to control the VGA routing
@@ -704,15 +719,7 @@ static bool vga_arbiter_add_pci_device(struct pci_dev *pdev)
bus = bus->parent;
}
 
-   /* Deal with VGA default device. Use first enabled one
-* by default if arch doesn't have it's own hook
-*/
-   if (!vga_default_device() &&
-   ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) {
-   vgaarb_info(&pdev->dev, "setting as boot VGA device\n");
-   vga_set_default_device(pdev);
-   }
-
+   vga_arb_update_default_device(vgadev);
vga_arbiter_check_bridge_sharing(vgadev);
 
/* Add to the list */
-- 
2.27.0



[PATCH V3 0/9] PCI/VGA: Rework default VGA device selection

2021-08-20 Thread Huacai Chen
My original work is at [1].

Bjorn did some rework and extension in V2: it moves the VGA arbiter to
the PCI subsystem, fixes a few nits, and breaks out a few pieces to
make the main patch a little smaller.

V3 rewrites the commit log of the last patch (also summarized by
Bjorn).

All comments welcome!

[1] 
https://lore.kernel.org/dri-devel/20210705100503.1120643-1-chenhua...@loongson.cn/

Bjorn Helgaas (4):
  PCI/VGA: Move vgaarb to drivers/pci
  PCI/VGA: Replace full MIT license text with SPDX identifier
  PCI/VGA: Use unsigned format string to print lock counts
  PCI/VGA: Remove empty vga_arb_device_card_gone()

Huacai Chen (5):
  PCI/VGA: Move vga_arb_integrated_gpu() earlier in file
  PCI/VGA: Prefer vga_default_device()
  PCI/VGA: Split out vga_arb_update_default_device()
  PCI/VGA: Log bridge control messages when adding devices
  PCI/VGA: Rework default VGA device selection

Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas  
---
 drivers/gpu/vga/Kconfig   |  19 ---
 drivers/gpu/vga/Makefile  |   1 -
 drivers/pci/Kconfig   |  19 +++
 drivers/pci/Makefile  |   1 +
 drivers/{gpu/vga => pci}/vgaarb.c | 269 --
 5 files changed, 126 insertions(+), 183 deletions(-)
 rename drivers/{gpu/vga => pci}/vgaarb.c (90%)
--
2.27.0



[PATCH V3 1/9] PCI/VGA: Move vgaarb to drivers/pci

2021-08-20 Thread Huacai Chen
From: Bjorn Helgaas 

The VGA arbiter is really PCI-specific and doesn't depend on any GPU
things.  Move it to the PCI subsystem.

Signed-off-by: Bjorn Helgaas 
---
 drivers/gpu/vga/Kconfig   | 19 ---
 drivers/gpu/vga/Makefile  |  1 -
 drivers/pci/Kconfig   | 19 +++
 drivers/pci/Makefile  |  1 +
 drivers/{gpu/vga => pci}/vgaarb.c |  0
 5 files changed, 20 insertions(+), 20 deletions(-)
 rename drivers/{gpu/vga => pci}/vgaarb.c (100%)

diff --git a/drivers/gpu/vga/Kconfig b/drivers/gpu/vga/Kconfig
index 1ad4c4ef0b5e..eb8b14ab22c3 100644
--- a/drivers/gpu/vga/Kconfig
+++ b/drivers/gpu/vga/Kconfig
@@ -1,23 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-config VGA_ARB
-   bool "VGA Arbitration" if EXPERT
-   default y
-   depends on (PCI && !S390)
-   help
- Some "legacy" VGA devices implemented on PCI typically have the same
- hard-decoded addresses as they did on ISA. When multiple PCI devices
- are accessed at same time they need some kind of coordination. Please
- see Documentation/gpu/vgaarbiter.rst for more details. Select this to
- enable VGA arbiter.
-
-config VGA_ARB_MAX_GPUS
-   int "Maximum number of GPUs"
-   default 16
-   depends on VGA_ARB
-   help
- Reserves space in the kernel to maintain resource locking for
- multiple GPUS.  The overhead for each GPU is very small.
-
 config VGA_SWITCHEROO
bool "Laptop Hybrid Graphics - GPU switching support"
depends on X86
diff --git a/drivers/gpu/vga/Makefile b/drivers/gpu/vga/Makefile
index e92064442d60..9800620deda3 100644
--- a/drivers/gpu/vga/Makefile
+++ b/drivers/gpu/vga/Makefile
@@ -1,3 +1,2 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_VGA_ARB)  += vgaarb.o
 obj-$(CONFIG_VGA_SWITCHEROO) += vga_switcheroo.o
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 0c473d75e625..7c9e56d7b857 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -252,6 +252,25 @@ config PCIE_BUS_PEER2PEER
 
 endchoice
 
+config VGA_ARB
+   bool "VGA Arbitration" if EXPERT
+   default y
+   depends on (PCI && !S390)
+   help
+ Some "legacy" VGA devices implemented on PCI typically have the same
+ hard-decoded addresses as they did on ISA. When multiple PCI devices
+ are accessed at same time they need some kind of coordination. Please
+ see Documentation/gpu/vgaarbiter.rst for more details. Select this to
+ enable VGA arbiter.
+
+config VGA_ARB_MAX_GPUS
+   int "Maximum number of GPUs"
+   default 16
+   depends on VGA_ARB
+   help
+ Reserves space in the kernel to maintain resource locking for
+ multiple GPUS.  The overhead for each GPU is very small.
+
 source "drivers/pci/hotplug/Kconfig"
 source "drivers/pci/controller/Kconfig"
 source "drivers/pci/endpoint/Kconfig"
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index d62c4ac4ae1b..ebe720f69b15 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_PCI_PF_STUB) += pci-pf-stub.o
 obj-$(CONFIG_PCI_ECAM) += ecam.o
 obj-$(CONFIG_PCI_P2PDMA)   += p2pdma.o
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
+obj-$(CONFIG_VGA_ARB)  += vgaarb.o
 
 # Endpoint library must be initialized before its users
 obj-$(CONFIG_PCI_ENDPOINT) += endpoint/
diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/pci/vgaarb.c
similarity index 100%
rename from drivers/gpu/vga/vgaarb.c
rename to drivers/pci/vgaarb.c
-- 
2.27.0



[PATCH V3 6/9] PCI/VGA: Prefer vga_default_device()

2021-08-20 Thread Huacai Chen
Use the vga_default_device() interface consistently instead of directly
testing vga_default.  No functional change intended.

[bhelgaas: split to separate patch and extended]
Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index 1f8fb37be5fa..a6a5864ff538 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -173,7 +173,7 @@ int vga_remove_vgacon(struct pci_dev *pdev)
 {
int ret = 0;
 
-   if (pdev != vga_default)
+   if (pdev != vga_default_device())
return 0;
vgaarb_info(&pdev->dev, "deactivate vga console\n");
 
@@ -707,7 +707,7 @@ static bool vga_arbiter_add_pci_device(struct pci_dev *pdev)
/* Deal with VGA default device. Use first enabled one
 * by default if arch doesn't have it's own hook
 */
-   if (vga_default == NULL &&
+   if (!vga_default_device() &&
((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) {
vgaarb_info(&pdev->dev, "setting as boot VGA device\n");
vga_set_default_device(pdev);
@@ -744,7 +744,7 @@ static bool vga_arbiter_del_pci_device(struct pci_dev *pdev)
goto bail;
}
 
-   if (vga_default == pdev)
+   if (vga_default_device() == pdev)
vga_set_default_device(NULL);
 
if (vgadev->decodes & (VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM))
-- 
2.27.0



[PATCH V3 3/9] PCI/VGA: Use unsigned format string to print lock counts

2021-08-20 Thread Huacai Chen
From: Bjorn Helgaas 

In struct vga_device, io_lock_cnt and mem_lock_cnt are unsigned, but we
previously printed them with "%d", the signed decimal format.  Print them
with the unsigned format "%u" instead.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index 61b57abcb014..e4153ab70481 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -1022,7 +1022,7 @@ static ssize_t vga_arb_read(struct file *file, char __user *buf,
 
/* Fill the buffer with infos */
len = snprintf(lbuf, 1024,
-  "count:%d,PCI:%s,decodes=%s,owns=%s,locks=%s(%d:%d)\n",
+  "count:%d,PCI:%s,decodes=%s,owns=%s,locks=%s(%u:%u)\n",
   vga_decode_count, pci_name(pdev),
   vga_iostate_to_str(vgadev->decodes),
   vga_iostate_to_str(vgadev->owns),
-- 
2.27.0
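As an aside for readers, the mismatch this patch fixes is easy to demonstrate in plain userspace C: an unsigned counter printed with the signed "%d" conversion shows up as a negative number once the value no longer fits in an int. The function below is a hypothetical stand-in, not kernel code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the vgaarb lock counters: the value is
 * unsigned, so "%d" misrenders large counts while "%u" is correct. */
static void format_lock_count(unsigned int cnt, char *as_signed,
			      char *as_unsigned, size_t len)
{
	snprintf(as_signed, len, "%d", cnt);	/* old, signed format */
	snprintf(as_unsigned, len, "%u", cnt);	/* patched, unsigned format */
}
```

With cnt = 4294967295u, the signed conversion prints "-1" on the usual two's-complement ABIs, while "%u" prints the real count.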



[PATCH V3 4/9] PCI/VGA: Remove empty vga_arb_device_card_gone()

2021-08-20 Thread Huacai Chen
From: Bjorn Helgaas 

vga_arb_device_card_gone() has always been empty.  Remove it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index e4153ab70481..c984c76b3fd7 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -104,8 +104,6 @@ static int vga_str_to_iostate(char *buf, int str_size, int *io_state)
 /* this is only used a cookie - it should not be dereferenced */
 static struct pci_dev *vga_default;
 
-static void vga_arb_device_card_gone(struct pci_dev *pdev);
-
 /* Find somebody in our list */
 static struct vga_device *vgadev_find(struct pci_dev *pdev)
 {
@@ -741,10 +739,6 @@ static bool vga_arbiter_del_pci_device(struct pci_dev *pdev)
/* Remove entry from list */
list_del(&vgadev->list);
vga_count--;
-   /* Notify userland driver that the device is gone so it discards
-* it's copies of the pci_dev pointer
-*/
-   vga_arb_device_card_gone(pdev);
 
/* Wake up all possible waiters */
wake_up_all(&vga_wait_queue);
@@ -994,9 +988,7 @@ static ssize_t vga_arb_read(struct file *file, char __user *buf,
if (lbuf == NULL)
return -ENOMEM;
 
-   /* Shields against vga_arb_device_card_gone (pci_dev going
-* away), and allows access to vga list
-*/
+   /* Protects vga_list */
spin_lock_irqsave(&vga_lock, flags);
 
/* If we are targeting the default, use it */
@@ -1013,8 +1005,6 @@ static ssize_t vga_arb_read(struct file *file, char __user *buf,
/* Wow, it's not in the list, that shouldn't happen,
 * let's fix us up and return invalid card
 */
-   if (pdev == priv->target)
-   vga_arb_device_card_gone(pdev);
spin_unlock_irqrestore(&vga_lock, flags);
len = sprintf(lbuf, "invalid");
goto done;
@@ -1358,10 +1348,6 @@ static int vga_arb_release(struct inode *inode, struct file *file)
return 0;
 }
 
-static void vga_arb_device_card_gone(struct pci_dev *pdev)
-{
-}
-
 /*
  * callback any registered clients to let them know we have a
  * change in VGA cards
-- 
2.27.0



Re: [PATCH v2 22/63] HID: cp2112: Use struct_group() for memcpy() region

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 03:01:43PM +0200, Jiri Kosina wrote:
> On Tue, 17 Aug 2021, Kees Cook wrote:
> 
> > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > field bounds checking for memcpy(), memmove(), and memset(), avoid
> > intentionally writing across neighboring fields.
> > 
> > Use struct_group() in struct cp2112_string_report around members report,
> > length, type, and string, so they can be referenced together. This will
> > allow memcpy() and sizeof() to more easily reason about sizes, improve
> > readability, and avoid future warnings about writing beyond the end of
> > report.
> > 
> > "pahole" shows no size nor member offset changes to struct
> > cp2112_string_report.  "objdump -d" shows no meaningful object
> > code changes (i.e. only source line number induced differences.)
> > 
> > Cc: Jiri Kosina 
> > Cc: Benjamin Tissoires 
> > Cc: linux-in...@vger.kernel.org
> > Signed-off-by: Kees Cook 
> 
> Applied, thanks.

I'm not sure if my other HTML email got through, but please don't apply
these to separate trees -- struct_group() is introduced as part of this
series.

-- 
Kees Cook
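For anyone following along without the series applied: struct_group() wraps a run of members in a tagged union, so the same bytes are addressable either as individual fields or as one named aggregate, which is what lets memcpy() and sizeof() reason about the region. A simplified userspace approximation (the real macro in the series is more careful about attributes; the struct below is an illustrative layout in the spirit of cp2112_string_report, not the actual one):

```c
#include <assert.h>
#include <string.h>

/* Simplified approximation of the kernel's struct_group(): mirror the
 * members in an anonymous struct and in a named struct inside a union. */
#define struct_group(NAME, ...)			\
	union {					\
		struct { __VA_ARGS__ };		\
		struct { __VA_ARGS__ } NAME;	\
	}

/* Hypothetical layout in the spirit of struct cp2112_string_report. */
struct string_report_like {
	unsigned char dummy;		/* outside the grouped region */
	struct_group(contents,
		unsigned char report;
		unsigned char length;
		unsigned char type;
		char string[4];
	);
};
```

With this, memcpy(&rep.contents, src, sizeof(rep.contents)) touches exactly report through string, and FORTIFY_SOURCE no longer sees a write spilling across neighboring fields.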


[PATCH] drm/i915: Actually delete gpu reloc selftests

2021-08-20 Thread Daniel Vetter
In

commit 8e02cceb1f1f4f254625e5338dd997ff61ab40d7
Author: Daniel Vetter 
Date:   Tue Aug 3 14:48:33 2021 +0200

drm/i915: delete gpu reloc code

I deleted the gpu relocation code and the selftest include and
enabling, but accidentally forgot about the selftest source code.

Fix this oversight.

Signed-off-by: Daniel Vetter 
Cc: Jon Bloomfield 
Cc: Chris Wilson 
Cc: Maarten Lankhorst 
Cc: Daniel Vetter 
Cc: Joonas Lahtinen 
Cc: "Thomas Hellström" 
Cc: Matthew Auld 
Cc: Lionel Landwerlin 
Cc: Dave Airlie 
Cc: Jason Ekstrand 
---
 .../i915/gem/selftests/i915_gem_execbuffer.c  | 190 --
 1 file changed, 190 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
deleted file mode 100644
index 16162fc2782d..
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ /dev/null
@@ -1,190 +0,0 @@
-// SPDX-License-Identifier: MIT
-/*
- * Copyright © 2020 Intel Corporation
- */
-
-#include "i915_selftest.h"
-
-#include "gt/intel_engine_pm.h"
-#include "selftests/igt_flush_test.h"
-
-static u64 read_reloc(const u32 *map, int x, const u64 mask)
-{
-   u64 reloc;
-
-   memcpy(&reloc, &map[x], sizeof(reloc));
-   return reloc & mask;
-}
-
-static int __igt_gpu_reloc(struct i915_execbuffer *eb,
-  struct drm_i915_gem_object *obj)
-{
-   const unsigned int offsets[] = { 8, 3, 0 };
-   const u64 mask =
-   GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
-   const u32 *map = page_mask_bits(obj->mm.mapping);
-   struct i915_request *rq;
-   struct i915_vma *vma;
-   int err;
-   int i;
-
-   vma = i915_vma_instance(obj, eb->context->vm, NULL);
-   if (IS_ERR(vma))
-   return PTR_ERR(vma);
-
-   err = i915_gem_object_lock(obj, &eb->ww);
-   if (err)
-   return err;
-
-   err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH);
-   if (err)
-   return err;
-
-   /* 8-Byte aligned */
-   err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
-   if (err <= 0)
-   goto reloc_err;
-
-   /* !8-Byte aligned */
-   err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1);
-   if (err <= 0)
-   goto reloc_err;
-
-   /* Skip to the end of the cmd page */
-   i = PAGE_SIZE / sizeof(u32) - 1;
-   i -= eb->reloc_cache.rq_size;
-   memset32(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size,
-MI_NOOP, i);
-   eb->reloc_cache.rq_size += i;
-
-   /* Force next batch */
-   err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2);
-   if (err <= 0)
-   goto reloc_err;
-
-   GEM_BUG_ON(!eb->reloc_cache.rq);
-   rq = i915_request_get(eb->reloc_cache.rq);
-   reloc_gpu_flush(eb, &eb->reloc_cache);
-   GEM_BUG_ON(eb->reloc_cache.rq);
-
-   err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
-   if (err) {
-   intel_gt_set_wedged(eb->engine->gt);
-   goto put_rq;
-   }
-
-   if (!i915_request_completed(rq)) {
-   pr_err("%s: did not wait for relocations!\n", eb->engine->name);
-   err = -EINVAL;
-   goto put_rq;
-   }
-
-   for (i = 0; i < ARRAY_SIZE(offsets); i++) {
-   u64 reloc = read_reloc(map, offsets[i], mask);
-
-   if (reloc != i) {
-   pr_err("%s[%d]: map[%d] %llx != %x\n",
-  eb->engine->name, i, offsets[i], reloc, i);
-   err = -EINVAL;
-   }
-   }
-   if (err)
-   igt_hexdump(map, 4096);
-
-put_rq:
-   i915_request_put(rq);
-unpin_vma:
-   i915_vma_unpin(vma);
-   return err;
-
-reloc_err:
-   if (!err)
-   err = -EIO;
-   goto unpin_vma;
-}
-
-static int igt_gpu_reloc(void *arg)
-{
-   struct i915_execbuffer eb;
-   struct drm_i915_gem_object *scratch;
-   int err = 0;
-   u32 *map;
-
-   eb.i915 = arg;
-
-   scratch = i915_gem_object_create_internal(eb.i915, 4096);
-   if (IS_ERR(scratch))
-   return PTR_ERR(scratch);
-
-   map = i915_gem_object_pin_map_unlocked(scratch, I915_MAP_WC);
-   if (IS_ERR(map)) {
-   err = PTR_ERR(map);
-   goto err_scratch;
-   }
-
-   intel_gt_pm_get(&eb.i915->gt);
-
-   for_each_uabi_engine(eb.engine, eb.i915) {
-   if (intel_engine_requires_cmd_parser(eb.engine) ||
-   intel_engine_using_cmd_parser(eb.engine))
-   continue;
-
-   reloc_cache_init(&eb.reloc_cache, eb.i915);
-   memset(map, POISON_INUSE, 4096);
-
-   intel_engine_pm_get(eb.engine);
-   eb.context = intel_context_create(e

Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 05:27:35PM +0200, Jiri Kosina wrote:
> On Fri, 20 Aug 2021, Kees Cook wrote:
> 
> > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > > > field bounds checking for memset(), avoid intentionally writing across
> > > > neighboring fields.
> > > >
> > > > Add struct_group() to mark region of struct kone_mouse_event that should
> > > > be initialized to zero.
> > > >
> > > > Cc: Stefan Achatz 
> > > > Cc: Jiri Kosina 
> > > > Cc: Benjamin Tissoires 
> > > > Cc: linux-in...@vger.kernel.org
> > > > Signed-off-by: Kees Cook 
> > >
> > > Applied, thank you Kees.
> > >
> > 
> > Eek! No, this will break the build: struct_group() is not yet in the tree.
> > I can carry this with an Ack, etc.
> 
> I was pretty sure I saw struct_group() already in linux-next, but that was 
> apparently a vacation-induced brainfart, sorry. Dropping.

Cool, no worries. Sorry for the confusion!

-- 
Kees Cook


Re: [PATCH rdma-next v3 2/3] lib/scatterlist: Fix wrong update of orig_nents

2021-08-20 Thread Jason Gunthorpe
On Thu, Jul 29, 2021 at 12:39:12PM +0300, Leon Romanovsky wrote:

> +/**
> + * __sg_free_table - Free a previously mapped sg table
> + * @table:   The sg table header to use
> + * @max_ents:The maximum number of entries per single scatterlist
> + * @total_ents:  The total number of entries in the table
> + * @nents_first_chunk: Number of entries in the (preallocated) first
> + * scatterlist chunk, 0 means no such preallocated
> + * first chunk
> + * @free_fn: Free function
> + *
> + *  Description:
> + *Free an sg table previously allocated and setup with
> + *__sg_alloc_table().  The @max_ents value must be identical to
> + *that previously used with __sg_alloc_table().
> + *
> + **/
> +void __sg_free_table(struct sg_table *table, unsigned int max_ents,
> +  unsigned int nents_first_chunk, sg_free_fn *free_fn)
> +{
> + sg_free_table_entries(table, max_ents, nents_first_chunk, free_fn,
> +   table->orig_nents);
> +}
>  EXPORT_SYMBOL(__sg_free_table);

This is getting a bit indirect; there is only one caller of
__sg_free_table() in sg_pool.c, so we may as well just export
sg_free_table_entries() and have it use that directly.

Jason
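To make the indirection Jason is pointing at concrete, here is the shape of the code in reduced standalone form — minimal stand-in types, with the real chained-scatterlist walking elided:

```c
#include <assert.h>

/* Reduced stand-in for struct sg_table. */
struct sg_table_like {
	unsigned int orig_nents;	/* entries originally allocated */
	unsigned int nents;		/* entries currently mapped */
};

static unsigned int last_freed;		/* test hook: entries last freed */

/* Stand-in for sg_free_table_entries(): takes the entry count
 * explicitly, so a caller whose nents was shrunk after mapping can
 * still free the original allocation. */
static void free_table_entries(struct sg_table_like *table,
			       unsigned int total_ents)
{
	last_freed = total_ents;
	table->orig_nents = 0;
	table->nents = 0;
}

/* The wrapper under discussion: __sg_free_table() reduces to
 * forwarding table->orig_nents and nothing else. */
static void free_table(struct sg_table_like *table)
{
	free_table_entries(table, table->orig_nents);
}
```

The suggestion, then, is to export the _entries variant and let the single sg_pool.c caller use it directly rather than keep the one-line wrapper.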


Re: [PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 05:49:35PM +1000, Michael Ellerman wrote:
> Kees Cook  writes:
> > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > field bounds checking for memset(), avoid intentionally writing across
> > neighboring fields.
> >
> > Add a struct_group() for the spe registers so that memset() can correctly 
> > reason
> > about the size:
> >
> >In function 'fortify_memset_chk',
> >inlined from 'restore_user_regs.part.0' at 
> > arch/powerpc/kernel/signal_32.c:539:3:
> >>> include/linux/fortify-string.h:195:4: error: call to 
> >>> '__write_overflow_field' declared with attribute warning: detected write 
> >>> beyond size of field (1st parameter); maybe use struct_group()? 
> >>> [-Werror=attribute-warning]
> >  195 |__write_overflow_field();
> >  |^~~~
> >
> > Cc: Michael Ellerman 
> > Cc: Benjamin Herrenschmidt 
> > Cc: Paul Mackerras 
> > Cc: Christophe Leroy 
> > Cc: Sudeep Holla 
> > Cc: linuxppc-...@lists.ozlabs.org
> > Reported-by: kernel test robot 
> > Signed-off-by: Kees Cook 
> > ---
> >  arch/powerpc/include/asm/processor.h | 6 --
> >  arch/powerpc/kernel/signal_32.c  | 6 +++---
> >  2 files changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> > index f348e564f7dd..05dc567cb9a8 100644
> > --- a/arch/powerpc/include/asm/processor.h
> > +++ b/arch/powerpc/include/asm/processor.h
> > @@ -191,8 +191,10 @@ struct thread_struct {
> > int used_vsr;   /* set if process has used VSX */
> >  #endif /* CONFIG_VSX */
> >  #ifdef CONFIG_SPE
> > -   unsigned long   evr[32];/* upper 32-bits of SPE regs */
> > -   u64 acc;/* Accumulator */
> > +   struct_group(spe,
> > +   unsigned long   evr[32];/* upper 32-bits of SPE regs */
> > +   u64 acc;/* Accumulator */
> > +   );
> > unsigned long   spefscr;/* SPE & eFP status */
> > unsigned long   spefscr_last;   /* SPEFSCR value on last prctl
> >call or trap return */
> > diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
> > index 0608581967f0..77b86caf5c51 100644
> > --- a/arch/powerpc/kernel/signal_32.c
> > +++ b/arch/powerpc/kernel/signal_32.c
> > @@ -532,11 +532,11 @@ static long restore_user_regs(struct pt_regs *regs,
> > regs_set_return_msr(regs, regs->msr & ~MSR_SPE);
> > if (msr & MSR_SPE) {
> > /* restore spe registers from the stack */
> > -   unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
> > - ELF_NEVRREG * sizeof(u32), failed);
> > +   unsafe_copy_from_user(&current->thread.spe, &sr->mc_vregs,
> > + sizeof(current->thread.spe), failed);
> 
> This makes me nervous, because the ABI is that we copy ELF_NEVRREG *
> sizeof(u32) bytes, not whatever sizeof(current->thread.spe) happens to
> be.
> 
> ie. if we use sizeof an inadvertent change to the fields in
> thread_struct could change how many bytes we copy out to userspace,
> which would be an ABI break.
> 
> And that's not that hard to do, because it's not at all obvious that the
> size and layout of fields in thread_struct affects the user ABI.
> 
> At the same time we don't want to copy the right number of bytes but
> the wrong content, so from that point of view using sizeof is good :)
> 
> The way we handle it in ptrace is to have BUILD_BUG_ON()s to verify that
> things match up, so maybe we should do that here too.
> 
> ie. add:
> 
>   BUILD_BUG_ON(sizeof(current->thread.spe) == ELF_NEVRREG * sizeof(u32));
> 
> Not sure if you are happy doing that as part of this patch. I can always
> do it later if not.

Sounds good to me; I did that in a few other cases in the series where
the relationships between things seemed tenuous. :) I'll add this (as
!=) in v3.

Thanks!

-- 
Kees Cook
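The guard under discussion can be sketched in standalone C with _Static_assert, the userspace analogue of BUILD_BUG_ON(). The names and the count 34 below are illustrative stand-ins, not the actual thread_struct or powerpc headers:

```c
#include <assert.h>
#include <stdint.h>

#define ELF_NEVRREG_LIKE 34	/* stand-in: 32 evr words + 2 words of acc */

/* Stand-in for the grouped SPE region added to thread_struct. */
struct spe_group_like {
	uint32_t evr[32];	/* upper 32 bits of SPE regs */
	uint64_t acc;		/* accumulator */
};

/* Compile-time ABI guard, per the review: if a later edit changed the
 * grouped fields, sizeof() would silently change how many bytes the
 * signal frame copies -- so fail the build instead. */
_Static_assert(sizeof(struct spe_group_like) ==
	       ELF_NEVRREG_LIKE * sizeof(uint32_t),
	       "SPE group must match ELF_NEVRREG * sizeof(u32)");
```

The kernel spelling would be BUILD_BUG_ON(sizeof(current->thread.spe) != ELF_NEVRREG * sizeof(u32)) — asserting the failure condition, as Kees notes with the != correction.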


Re: [PATCH v2 56/63] RDMA/mlx5: Use struct_group() to zero struct mlx5_ib_mr

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 09:34:00AM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 19, 2021 at 11:14:37AM -0700, Kees Cook wrote:
> 
> > Which do you mean? When doing the conversions I tended to opt for
> > struct_group() since it provides more robust "intentionality". Strictly
> > speaking, the new memset helpers are doing field-spanning writes, but the
> > "clear to the end" pattern was so common it made sense to add the helpers,
> > as they're a bit less disruptive. It's totally up to you! :)
> 
> Well, of the patches you cc'd to me only this one used the struct
> group..

Understood. I've adjusted this for v3. Thanks!

-- 
Kees Cook


Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 05:27:35PM +0200, Jiri Kosina wrote:
> On Fri, 20 Aug 2021, Kees Cook wrote:
> 
> > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > > > field bounds checking for memset(), avoid intentionally writing across
> > > > neighboring fields.
> > > >
> > > > Add struct_group() to mark region of struct kone_mouse_event that should
> > > > be initialized to zero.
> > > >
> > > > Cc: Stefan Achatz 
> > > > Cc: Jiri Kosina 
> > > > Cc: Benjamin Tissoires 
> > > > Cc: linux-in...@vger.kernel.org
> > > > Signed-off-by: Kees Cook 
> > >
> > > Applied, thank you Kees.
> > >
> > 
> > Eek! No, this will break the build: struct_group() is not yet in the tree.
> > I can carry this with an Ack, etc.
> 
> I was pretty sure I saw struct_group() already in linux-next, but that was 
> apparently a vacation-induced brainfart, sorry. Dropping.

Oh, for these two patches, can I add your Acked-by while I carry them?

Thanks!

-- 
Kees Cook


Re: [PATCH rdma-next v3 2/3] lib/scatterlist: Fix wrong update of orig_nents

2021-08-20 Thread Jason Gunthorpe
On Fri, Aug 20, 2021 at 12:54:25PM -0300, Jason Gunthorpe wrote:
> On Thu, Jul 29, 2021 at 12:39:12PM +0300, Leon Romanovsky wrote:
> 
> > +/**
> > + * __sg_free_table - Free a previously mapped sg table
> > + * @table: The sg table header to use
> > + * @max_ents:  The maximum number of entries per single scatterlist
> > + * @total_ents:The total number of entries in the table
> > + * @nents_first_chunk: Number of entries in the (preallocated) first
> > + * scatterlist chunk, 0 means no such preallocated
> > + * first chunk
> > + * @free_fn:   Free function
> > + *
> > + *  Description:
> > + *Free an sg table previously allocated and setup with
> > + *__sg_alloc_table().  The @max_ents value must be identical to
> > + *that previously used with __sg_alloc_table().
> > + *
> > + **/
> > +void __sg_free_table(struct sg_table *table, unsigned int max_ents,
> > +unsigned int nents_first_chunk, sg_free_fn *free_fn)
> > +{
> > +   sg_free_table_entries(table, max_ents, nents_first_chunk, free_fn,
> > + table->orig_nents);
> > +}
> >  EXPORT_SYMBOL(__sg_free_table);
> 
> This is getting a bit indirect; there is only one caller of
> __sg_free_table() in sg_pool.c, so we may as well just export
> sg_free_table_entries() and have it use that directly.

And further since sg_free_table_entries() doesn't actually use table->
except for the SGL it should probably be called sg_free_table_sgl()

Jason


Re: [PATCH v2 02/14] drm/arm/hdlcd: Convert to Linux IRQ interfaces

2021-08-20 Thread Liviu Dudau
On Tue, Aug 03, 2021 at 11:06:52AM +0200, Thomas Zimmermann wrote:
> Drop the DRM IRQ midlayer in favor of Linux IRQ interfaces. DRM's
> IRQ helpers are mostly useful for UMS drivers. Modern KMS drivers
> don't benefit from using it.
> 
> DRM IRQ callbacks are now being called directly or inlined.
> 
> Calls to platform_get_irq() can fail with a negative errno code.
> Abort initialization in this case. The DRM IRQ midlayer does not
> handle this case correctly.
> 
> v2:
>   * name struct drm_device variables 'drm' (Sam)
> 
> Signed-off-by: Thomas Zimmermann 
> Acked-by: Sam Ravnborg 

Sorry for the delayed response due to holidays.

Acked-by: Liviu Dudau 

Best regards,
Liviu

> ---
>  drivers/gpu/drm/arm/hdlcd_drv.c | 174 ++--
>  drivers/gpu/drm/arm/hdlcd_drv.h |   1 +
>  2 files changed, 97 insertions(+), 78 deletions(-)
> 
> diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c
> index 81ae92390736..479c2422a2e0 100644
> --- a/drivers/gpu/drm/arm/hdlcd_drv.c
> +++ b/drivers/gpu/drm/arm/hdlcd_drv.c
> @@ -29,7 +29,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -38,6 +37,94 @@
>  #include "hdlcd_drv.h"
>  #include "hdlcd_regs.h"
>  
> +static irqreturn_t hdlcd_irq(int irq, void *arg)
> +{
> + struct drm_device *drm = arg;
> + struct hdlcd_drm_private *hdlcd = drm->dev_private;
> + unsigned long irq_status;
> +
> + irq_status = hdlcd_read(hdlcd, HDLCD_REG_INT_STATUS);
> +
> +#ifdef CONFIG_DEBUG_FS
> + if (irq_status & HDLCD_INTERRUPT_UNDERRUN)
> + atomic_inc(&hdlcd->buffer_underrun_count);
> +
> + if (irq_status & HDLCD_INTERRUPT_DMA_END)
> + atomic_inc(&hdlcd->dma_end_count);
> +
> + if (irq_status & HDLCD_INTERRUPT_BUS_ERROR)
> + atomic_inc(&hdlcd->bus_error_count);
> +
> + if (irq_status & HDLCD_INTERRUPT_VSYNC)
> + atomic_inc(&hdlcd->vsync_count);
> +
> +#endif
> + if (irq_status & HDLCD_INTERRUPT_VSYNC)
> + drm_crtc_handle_vblank(&hdlcd->crtc);
> +
> + /* acknowledge interrupt(s) */
> + hdlcd_write(hdlcd, HDLCD_REG_INT_CLEAR, irq_status);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static void hdlcd_irq_preinstall(struct drm_device *drm)
> +{
> + struct hdlcd_drm_private *hdlcd = drm->dev_private;
> + /* Ensure interrupts are disabled */
> + hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, 0);
> + hdlcd_write(hdlcd, HDLCD_REG_INT_CLEAR, ~0);
> +}
> +
> +static void hdlcd_irq_postinstall(struct drm_device *drm)
> +{
> +#ifdef CONFIG_DEBUG_FS
> + struct hdlcd_drm_private *hdlcd = drm->dev_private;
> + unsigned long irq_mask = hdlcd_read(hdlcd, HDLCD_REG_INT_MASK);
> +
> + /* enable debug interrupts */
> + irq_mask |= HDLCD_DEBUG_INT_MASK;
> +
> + hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, irq_mask);
> +#endif
> +}
> +
> +static int hdlcd_irq_install(struct drm_device *drm, int irq)
> +{
> + int ret;
> +
> + if (irq == IRQ_NOTCONNECTED)
> + return -ENOTCONN;
> +
> + hdlcd_irq_preinstall(drm);
> +
> + ret = request_irq(irq, hdlcd_irq, 0, drm->driver->name, drm);
> + if (ret)
> + return ret;
> +
> + hdlcd_irq_postinstall(drm);
> +
> + return 0;
> +}
> +
> +static void hdlcd_irq_uninstall(struct drm_device *drm)
> +{
> + struct hdlcd_drm_private *hdlcd = drm->dev_private;
> + /* disable all the interrupts that we might have enabled */
> + unsigned long irq_mask = hdlcd_read(hdlcd, HDLCD_REG_INT_MASK);
> +
> +#ifdef CONFIG_DEBUG_FS
> + /* disable debug interrupts */
> + irq_mask &= ~HDLCD_DEBUG_INT_MASK;
> +#endif
> +
> + /* disable vsync interrupts */
> + irq_mask &= ~HDLCD_INTERRUPT_VSYNC;
> + hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, irq_mask);
> +
> + free_irq(hdlcd->irq, drm);
> +}
> +
>  static int hdlcd_load(struct drm_device *drm, unsigned long flags)
>  {
>   struct hdlcd_drm_private *hdlcd = drm->dev_private;
> @@ -90,7 +177,12 @@ static int hdlcd_load(struct drm_device *drm, unsigned long flags)
>   goto setup_fail;
>   }
>  
> - ret = drm_irq_install(drm, platform_get_irq(pdev, 0));
> + ret = platform_get_irq(pdev, 0);
> + if (ret < 0)
> + goto irq_fail;
> + hdlcd->irq = ret;
> +
> + ret = hdlcd_irq_install(drm, hdlcd->irq);
>   if (ret < 0) {
>   DRM_ERROR("failed to install IRQ handler\n");
>   goto irq_fail;
> @@ -122,76 +214,6 @@ static void hdlcd_setup_mode_config(struct drm_device *drm)
>   drm->mode_config.funcs = &hdlcd_mode_config_funcs;
>  }
>  
> -static irqreturn_t hdlcd_irq(int irq, void *arg)
> -{
> - struct drm_device *drm = arg;
> - struct hdlcd_drm_private *hdlcd = drm->dev_private;
> - unsigned long irq_status;
> -
> - irq_status = hdlcd_read(hdlcd, HDLCD_REG_INT_STATUS);
> -
> -#ifdef CONFIG_DEBUG_FS
> - if (irq_status & HDLCD_INTERRUPT_UNDERRUN)
> - 
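One detail worth pulling out of the commit message above: platform_get_irq() can return a negative errno, which the old DRM IRQ midlayer did not handle, so the driver must check the return before storing it. A reduced userspace sketch of the caller-side pattern — get_irq_stub() and load_stub() are made-up stand-ins, not DRM APIs:

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for platform_get_irq(): IRQ number on success, negative
 * errno on failure. */
static int get_irq_stub(int present)
{
	return present ? 42 : -ENXIO;
}

/* Caller pattern from the patch: check the negative-errno case before
 * storing the result, and propagate it to abort initialization. */
static int load_stub(int present, int *irq_out)
{
	int ret = get_irq_stub(present);

	if (ret < 0)
		return ret;		/* abort init, propagate errno */
	*irq_out = ret;
	return 0;
}
```

This mirrors the hunk in hdlcd_load(): take the return into ret, bail out on ret < 0, and only then assign hdlcd->irq.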

Re: [BUG - BISECTED] display not detected anymore

2021-08-20 Thread Heiko Carstens
On Thu, Aug 19, 2021 at 09:07:26PM +0300, Ville Syrjälä wrote:
> > ef79d62b5ce5 ("drm/i915: Encapsulate dbuf state handling harder")
> > 
> > With that commit the display is not detected anymore, one commit
> > before that it still works. So this one seems to be broken.
> > 
> > Ville, Stanislav, any idea how to fix this?
> > 
> > commit ef79d62b5ce53851901d6c1d21b74cbb9e27219b
> > Author: Ville Syrjälä 
> > Date:   Fri Jan 22 22:56:32 2021 +0200
> > 
> > drm/i915: Encapsulate dbuf state handling harder
> 
> That has nothing to do with display detection, so very mysterious.
> 
> Please file a bug at https://gitlab.freedesktop.org/drm/intel/issues/new
> boot with drm.debug=0xe with both good and bad kernels and attach the
> dmesg from both to the bug.

Everything (hopefully) provided here:
https://gitlab.freedesktop.org/drm/intel/-/issues/4013

Please let me know if you need more, or if I can help otherwise to
resolve this.


Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Jiri Kosina
On Fri, 20 Aug 2021, Kees Cook wrote:

> > > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > > > > field bounds checking for memset(), avoid intentionally writing across
> > > > > neighboring fields.
> > > > >
> > > > > Add struct_group() to mark region of struct kone_mouse_event that 
> > > > > should
> > > > > be initialized to zero.
> > > > >
> > > > > Cc: Stefan Achatz 
> > > > > Cc: Jiri Kosina 
> > > > > Cc: Benjamin Tissoires 
> > > > > Cc: linux-in...@vger.kernel.org
> > > > > Signed-off-by: Kees Cook 
> > > >
> > > > Applied, thank you Kees.
> > > >
> > > 
> > > Eek! No, this will break the build: struct_group() is not yet in the tree.
> > > I can carry this with an Ack, etc.
> > 
> > I was pretty sure I saw struct_group() already in linux-next, but that was 
> > apparently a vacation-induced brainfart, sorry. Dropping.
> 
> Oh, for these two patches, can I add your Acked-by while I carry them?

Yes, thanks, and sorry for the noise.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH rdma-next v3 0/3] SG fix together with update to RDMA umem

2021-08-20 Thread Jason Gunthorpe
On Thu, Jul 29, 2021 at 12:39:10PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Changelog:
> v3:
>  * Rewrote to new API suggestion
>  * Split for more patches
> v2: https://lore.kernel.org/lkml/cover.1626605893.git.leo...@nvidia.com
>  * Changed implementation of first patch, based on our discussion with 
> Christoph.
>https://lore.kernel.org/lkml/ynwavtt0qmqdx...@infradead.org/
> v1: https://lore.kernel.org/lkml/cover.1624955710.git.leo...@nvidia.com/
>  * Fixed sg_page with a _dma_ API in the umem.c
> v0: https://lore.kernel.org/lkml/cover.1624361199.git.leo...@nvidia.com
> 
> 
> Maor Gottlieb (3):
>   lib/scatterlist: Provide a dedicated function to support table append
>   lib/scatterlist: Fix wrong update of orig_nents
>   RDMA: Use the sg_table directly and remove the opencoded version from
> umem

I'm going to send this into linux-next; last time that triggered some
bug reports.

But overall it looks okay, though some of the sg_append_table usage is a
bit odd. Certainly, using the sg_table throughout the RDMA code is a big
improvement.

Let's see a v4, reviews/etc, and I'll update it.

Jason


Re: [PATCH] drm/i915/gt: Potential error pointer dereference in pinned_context()

2021-08-20 Thread Rodrigo Vivi
On Fri, Aug 13, 2021 at 04:01:06PM +0200, Thomas Hellström wrote:
> 
> On 8/13/21 1:36 PM, Dan Carpenter wrote:
> > If the intel_engine_create_pinned_context() function returns an error
> > pointer, then dereferencing "ce" will Oops.  Use "vm" instead of
> > "ce->vm".
> > 
> > Fixes: cf586021642d ("drm/i915/gt: Pipelined page migration")
> > Signed-off-by: Dan Carpenter 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > index d0a7c934fd3b..1dac21aa7e5c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > @@ -177,7 +177,7 @@ static struct intel_context *pinned_context(struct intel_gt *gt)
> > ce = intel_engine_create_pinned_context(engine, vm, SZ_512K,
> > I915_GEM_HWS_MIGRATE,
> > &key, "migrate");
> > -   i915_vm_put(ce->vm);
> > +   i915_vm_put(vm);
> > return ce;
> >   }
> 
> Thanks,
> 
> Reviewed-by: Thomas Hellström 

And pushed to drm-intel-gt-next, thanks for the patch and review.

> 
> 


Re: [PATCH v2 0/8] drm/bridge: Make panel and bridge probe order consistent

2021-08-20 Thread Maxime Ripard
Hi Andrzej,

On Wed, Aug 04, 2021 at 04:09:38PM +0200, a.hajda wrote:
> Hi Maxime,
> 
> I have been busy with other tasks, and I did not follow the list last 
> time, so sorry for my late response.
> 
> On 28.07.2021 15:32, Maxime Ripard wrote:
> > Hi,
> > 
> > We've encountered an issue with the RaspberryPi DSI panel that prevented the
> > whole display driver from probing.
> > 
> > The issue is described in detail in the commit 7213246a803f ("drm/vc4: dsi:
> > Only register our component once a DSI device is attached"), but the basic 
> > idea
> > is that since the panel is probed through i2c, there's no synchronization
> > between its probe and the registration of the MIPI-DSI host it's attached 
> > to.
> > 
> > We initially moved the component framework registration to the MIPI-DSI Host
> > attach hook to make sure we register our component only when we have a DSI
> > device attached to our MIPI-DSI host, and then use lookup our DSI device in 
> > our
> > bind hook.
> > 
> > However, all the DSI bridges controlled through i2c are only registering 
> > their
> > associated DSI device in their bridge attach hook, meaning with our change
> 
> I guess this is incorrect. I have promoted several times the pattern
> that a device driver shouldn't expose its interfaces (for example
> component_add, drm_panel_add, drm_bridge_add) until it has gathered all
> required dependencies. In this particular case the bridges should defer
> probe until the DSI bus becomes available. I guess that way the patch
> you revert would work.
> 
> I have advised this pattern a few times in the case of DSI hosts;
> apparently I didn't notice that a similar issue can appear in the case
> of bridges. Or is there something I have missed?
> 
> Anyway, there are already eleven(?) bridge drivers using this pattern.
> I wonder if fixing them would be difficult, or whether it would expose
> other issues.
> The patches should be quite straightforward - move 
> of_find_mipi_dsi_host_by_node and mipi_dsi_device_register_full to probe 
> time.

I gave this a try today, went back to the current upstream code and
found that indeed it works. I converted two bridges that work now. I'll
send a new version some time next week and will convert all the others
if we agree on the approach.

Thanks for the suggestion!

> Finally, I think that if we do not fix these bridge drivers we will
> encounter another set of issues with new platforms connecting "DSI host
> drivers assuming this pattern" and "i2c/dsi device drivers assuming the
> pattern already present in the bridges".

Yeah, this is exactly the situation I'm in :)

Maxime


signature.asc
Description: PGP signature


Re: [PATCH] drm/i915: Actually delete gpu reloc selftests

2021-08-20 Thread Rodrigo Vivi
On Fri, Aug 20, 2021 at 05:49:32PM +0200, Daniel Vetter wrote:
> In
> 
> commit 8e02cceb1f1f4f254625e5338dd997ff61ab40d7
> Author: Daniel Vetter 
> Date:   Tue Aug 3 14:48:33 2021 +0200
> 
> drm/i915: delete gpu reloc code

it would be better with dim cite format...

do we need the Fixes: tag?

anyway:

Reviewed-by: Rodrigo Vivi 


> 
> I deleted the gpu relocation code and the selftest include and
> enabling, but accidentally forgot about the selftest source code.
> 
> Fix this oversight.
> 
> Signed-off-by: Daniel Vetter 
> Cc: Jon Bloomfield 
> Cc: Chris Wilson 
> Cc: Maarten Lankhorst 
> Cc: Daniel Vetter 
> Cc: Joonas Lahtinen 
> Cc: "Thomas Hellström" 
> Cc: Matthew Auld 
> Cc: Lionel Landwerlin 
> Cc: Dave Airlie 
> Cc: Jason Ekstrand 
> ---
>  .../i915/gem/selftests/i915_gem_execbuffer.c  | 190 --
>  1 file changed, 190 deletions(-)
>  delete mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> deleted file mode 100644
> index 16162fc2782d..
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> +++ /dev/null
> @@ -1,190 +0,0 @@
> -// SPDX-License-Identifier: MIT
> -/*
> - * Copyright © 2020 Intel Corporation
> - */
> -
> -#include "i915_selftest.h"
> -
> -#include "gt/intel_engine_pm.h"
> -#include "selftests/igt_flush_test.h"
> -
> -static u64 read_reloc(const u32 *map, int x, const u64 mask)
> -{
> - u64 reloc;
> -
> - memcpy(&reloc, &map[x], sizeof(reloc));
> - return reloc & mask;
> -}
> -
> -static int __igt_gpu_reloc(struct i915_execbuffer *eb,
> -struct drm_i915_gem_object *obj)
> -{
> - const unsigned int offsets[] = { 8, 3, 0 };
> - const u64 mask =
> - GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
> - const u32 *map = page_mask_bits(obj->mm.mapping);
> - struct i915_request *rq;
> - struct i915_vma *vma;
> - int err;
> - int i;
> -
> - vma = i915_vma_instance(obj, eb->context->vm, NULL);
> - if (IS_ERR(vma))
> - return PTR_ERR(vma);
> -
> - err = i915_gem_object_lock(obj, &eb->ww);
> - if (err)
> - return err;
> -
> - err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH);
> - if (err)
> - return err;
> -
> - /* 8-Byte aligned */
> - err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
> - if (err <= 0)
> - goto reloc_err;
> -
> - /* !8-Byte aligned */
> - err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1);
> - if (err <= 0)
> - goto reloc_err;
> -
> - /* Skip to the end of the cmd page */
> - i = PAGE_SIZE / sizeof(u32) - 1;
> - i -= eb->reloc_cache.rq_size;
> - memset32(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size,
> -  MI_NOOP, i);
> - eb->reloc_cache.rq_size += i;
> -
> - /* Force next batch */
> - err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2);
> - if (err <= 0)
> - goto reloc_err;
> -
> - GEM_BUG_ON(!eb->reloc_cache.rq);
> - rq = i915_request_get(eb->reloc_cache.rq);
> - reloc_gpu_flush(eb, &eb->reloc_cache);
> - GEM_BUG_ON(eb->reloc_cache.rq);
> -
> - err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
> - if (err) {
> - intel_gt_set_wedged(eb->engine->gt);
> - goto put_rq;
> - }
> -
> - if (!i915_request_completed(rq)) {
> - pr_err("%s: did not wait for relocations!\n", eb->engine->name);
> - err = -EINVAL;
> - goto put_rq;
> - }
> -
> - for (i = 0; i < ARRAY_SIZE(offsets); i++) {
> - u64 reloc = read_reloc(map, offsets[i], mask);
> -
> - if (reloc != i) {
> - pr_err("%s[%d]: map[%d] %llx != %x\n",
> -eb->engine->name, i, offsets[i], reloc, i);
> - err = -EINVAL;
> - }
> - }
> - if (err)
> - igt_hexdump(map, 4096);
> -
> -put_rq:
> - i915_request_put(rq);
> -unpin_vma:
> - i915_vma_unpin(vma);
> - return err;
> -
> -reloc_err:
> - if (!err)
> - err = -EIO;
> - goto unpin_vma;
> -}
> -
> -static int igt_gpu_reloc(void *arg)
> -{
> - struct i915_execbuffer eb;
> - struct drm_i915_gem_object *scratch;
> - int err = 0;
> - u32 *map;
> -
> - eb.i915 = arg;
> -
> - scratch = i915_gem_object_create_internal(eb.i915, 4096);
> - if (IS_ERR(scratch))
> - return PTR_ERR(scratch);
> -
> - map = i915_gem_object_pin_map_unlocked(scratch, I915_MAP_WC);
> - if (IS_ERR(map)) {
> - err = PTR_ERR(map);
> - goto err_scratch;
> - }
> -
> - intel_gt_pm_get(&eb.i915->gt);
> -
> - for_each_uabi_engine(eb.engine, eb.i915) {
> - if (intel_engine_requires_c

[pull] amdgpu, amdkfd, radeon drm-next-5.15

2021-08-20 Thread Alex Deucher
Hi Dave, Daniel,

Updates for 5.15.  Mainly bug fixes and cleanups.

The following changes since commit 554594567b1fa3da74f88ec7b2dc83d000c58e98:

  drm/display: fix possible null-pointer dereference in dcn10_set_clock() 
(2021-08-11 17:19:54 -0400)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-next-5.15-2021-08-20

for you to fetch changes up to 90a9266269eb9f71af1f323c33e1dca53527bd22:

  drm/amdgpu: Cancel delayed work when GFXOFF is disabled (2021-08-20 12:09:44 
-0400)


amd-drm-next-5.15-2021-08-20:

amdgpu:
- embed hw fence into job
- Misc SMU fixes
- PSP TA code cleanup
- RAS fixes
- PWM fan speed fixes
- DC workqueue cleanups
- SR-IOV fixes
- gfxoff delayed work fix
- Pin domain check fix

amdkfd:
- SVM fixes

radeon:
- Code cleanup


Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.79

Aric Cyr (1):
  drm/amd/display: 3.2.149

Candice Li (3):
  drm/amd/amdgpu: consolidate PSP TA context
  drm/amd/amdgpu: remove unnecessary RAS context field
  drm/amd: consolidate TA shared memory structures

Christian König (1):
  drm/amdgpu: use the preferred pin domain after the check

Colin Ian King (1):
  drm/amd/pm: Fix spelling mistake "firwmare" -> "firmware"

Evan Quan (9):
  drm/amd/pm: correct the fan speed RPM setting
  drm/amd/pm: record the RPM and PWM based fan speed settings
  drm/amd/pm: correct the fan speed PWM retrieving
  drm/amd/pm: correct the fan speed RPM retrieving
  drm/amd/pm: drop the unnecessary intermediate percent-based transition
  drm/amd/pm: drop unnecessary manual mode check
  drm/amd/pm: correct the address of Arcturus fan related registers
  drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarily
  drm/amd/pm: a quick fix for "divided by zero" error

Hawking Zhang (1):
  drm/amdgpu: increase max xgmi physical node for aldebaran

Jack Zhang (1):
  drm/amd/amdgpu embed hw_fence into amdgpu_job

Jake Wang (1):
  drm/amd/display: Ensure DCN save after VM setup

Jiange Zhao (1):
  drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF get FLR 
notification.

Jonathan Kim (1):
  drm/amdgpu: get extended xgmi topology data

Kenneth Feng (2):
  Revert "drm/amd/pm: fix workload mismatch on vega10"
  drm/amd/pm: change the workload type for some cards

Kevin Wang (5):
  drm/amd/pm: correct DPM_XGMI/VCN_DPM feature name
  drm/amd/pm: skip to load smu microcode on sriov for aldebaran
  drm/amd/pm: change return value in aldebaran_get_power_limit()
  drm/amd/pm: change smu msg's attribute to allow working under sriov
  drm/amd/pm: change pp_dpm_sclk/mclk/fclk attribute is RO for aldebaran

Lukas Bulwahn (1):
  drm: amdgpu: remove obsolete reference to config CHASH

Michel Dänzer (1):
  drm/amdgpu: Cancel delayed work when GFXOFF is disabled

Nathan Chancellor (1):
  drm/radeon: Add break to switch statement in 
radeonfb_create_pinned_object()

Nicholas Kazlauskas (3):
  drm/amd/display: Fix multi-display support for idle opt workqueue
  drm/amd/display: Use vblank control events for PSR enable/disable
  drm/amd/display: Guard vblank wq flush with DCN guards

Wayne Lin (1):
  drm/amd/display: Create dc_sink when EDID fail

Yifan Zhang (1):
  drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure

YuBiao Wang (1):
  drm/amd/amdgpu:flush ttm delayed work before cancel_sync

Zhan Liu (1):
  drm/amd/display: Use DCN30 watermark calc for DCN301

Zhigang Luo (1):
  drm/amdgpu: correct MMSCH 1.0 version

 drivers/gpu/drm/Kconfig|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  28 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  86 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|  37 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.c|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  39 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h|   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.c  |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c   |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 432 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h| 111 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_rap.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c|   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h

Re: [PATCH v3] drm/i915/dp: Use max params for panels < eDP 1.4

2021-08-20 Thread Ville Syrjälä
On Fri, Aug 20, 2021 at 03:52:59PM +0800, Kai-Heng Feng wrote:
> Users reported that after commit 2bbd6dba84d4 ("drm/i915: Try to use
> fast+narrow link on eDP again and fall back to the old max strategy on
> failure"), the screen starts to have wobbly effect.
> 
> Commit a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for
> everything") doesn't help either, that means the affected eDP 1.2 panels
> only work with max params.
> 
> So use max params for panels < eDP 1.4 as Windows does to solve the
> issue.
> 
> v3:
>  - Do the eDP rev check in intel_edp_init_dpcd()
> 
> v2:
>  - Check eDP 1.4 instead of DPCD 1.1 to apply max params
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3714
> Fixes: 2bbd6dba84d4 ("drm/i915: Try to use fast+narrow link on eDP again and 
> fall back to the old max strategy on failure")
> Fixes: a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for 
> everything")
> Suggested-by: Ville Syrjälä 
> Signed-off-by: Kai-Heng Feng 

Slapped a cc:stable on it and pushed to drm-intel-next. Thanks.

> ---
>  drivers/gpu/drm/i915/display/intel_dp.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
> b/drivers/gpu/drm/i915/display/intel_dp.c
> index 75d4ebc669411..e0dbd35ae7bc0 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -2445,11 +2445,14 @@ intel_edp_init_dpcd(struct intel_dp *intel_dp)
>*/
>   if (drm_dp_dpcd_read(&intel_dp->aux, DP_EDP_DPCD_REV,
>intel_dp->edp_dpcd, sizeof(intel_dp->edp_dpcd)) ==
> -  sizeof(intel_dp->edp_dpcd))
> +  sizeof(intel_dp->edp_dpcd)) {
>   drm_dbg_kms(&dev_priv->drm, "eDP DPCD: %*ph\n",
>   (int)sizeof(intel_dp->edp_dpcd),
>   intel_dp->edp_dpcd);
>  
> + intel_dp->use_max_params = intel_dp->edp_dpcd[0] < DP_EDP_14;
> + }
> +
>   /*
>* This has to be called after intel_dp->edp_dpcd is filled, PSR checks
>* for SET_POWER_CAPABLE bit in intel_dp->edp_dpcd[1]
> -- 
> 2.32.0

-- 
Ville Syrjälä
Intel
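The guard added by the patch above boils down to a single revision comparison. A stand-alone sketch of that policy, using the eDP DPCD revision codes as defined in drm_dp_helper.h (the constant values here are copied assumptions, not taken from this thread):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the policy the patch introduces: panels reporting an eDP
 * DPCD revision below 1.4 fall back to the max-params link strategy.
 * Revision codes follow drm_dp_helper.h (0x00 = eDP 1.1 ... 0x03 = eDP 1.4). */

#define DP_EDP_11 0x00
#define DP_EDP_12 0x01
#define DP_EDP_13 0x02
#define DP_EDP_14 0x03

/* edp_dpcd_rev models intel_dp->edp_dpcd[0] after the DPCD read */
static bool use_max_params(unsigned char edp_dpcd_rev)
{
	return edp_dpcd_rev < DP_EDP_14;
}
```

So the affected eDP 1.2 panels get max params, while eDP 1.4+ panels keep the fast+narrow attempt.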


[GIT PULL] etnaviv-next for 5.15

2021-08-20 Thread Lucas Stach
Hi Dave, Daniel,

things are still slow in etnaviv land. Just one hardware support
addition for the GPU found on the NXP Layerscape LS1028A SoC from
Michael and the GEM mmap cleanup from Thomas.

Regards,
Lucas

The following changes since commit 8a02ea42bc1d4c448caf1bab0e05899dad503f74:

  Merge tag 'drm-intel-next-fixes-2021-06-29' of 
git://anongit.freedesktop.org/drm/drm-intel into drm-next (2021-06-30 15:42:05 
+1000)

are available in the Git repository at:

  https://git.pengutronix.de/git/lst/linux 
81fd23e2b3ccf71c807e671444e8accaba98ca53

for you to fetch changes up to 81fd23e2b3ccf71c807e671444e8accaba98ca53:

  drm/etnaviv: Implement mmap as GEM object function (2021-07-06 18:32:23 +0200)


Michael Walle (2):
  drm/etnaviv: add HWDB entry for GC7000 r6202
  drm/etnaviv: add clock gating workaround for GC7000 r6202

Thomas Zimmermann (1):
  drm/etnaviv: Implement mmap as GEM object function

 drivers/gpu/drm/etnaviv/etnaviv_drv.c   | 14 ++
 drivers/gpu/drm/etnaviv/etnaviv_drv.h   |  3 ---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c   | 18 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 13 -
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c   |  6 ++
 drivers/gpu/drm/etnaviv/etnaviv_hwdb.c  | 31 
+++
 6 files changed, 44 insertions(+), 41 deletions(-)



[GIT PULL] drm-misc + drm-intel: Add support for out-of-band hotplug notification

2021-08-20 Thread Hans de Goede
Hello drm-misc and drm-intel maintainers,

My "Add support for out-of-band hotplug notification" patchset:
https://patchwork.freedesktop.org/series/93763/

is ready for merging now. As discussed on IRC, I based this series
on top of drm-tip, and when trying to apply the i915 parts on top
of drm-misc this fails due to a conflict.

So as Jani suggested here is a pull-req for a topic-branch with the
entire set, minus the troublesome i915 bits. Once this has been merged
into both drm-misc-next and drm-intel-next I can push the 2 i915
patches to drm-intel-next on top of the merge.

Note there are also 2 drivers/usb/typec patches in here; these
have Greg KH's Reviewed-by for merging through the drm tree,
since this USB code does not change all that much. I also checked
that the drm-misc-next-2021-08-12 base of this tree contains the
same last commit to the modified file as usb-next.

Daniel Vetter mentioned on IRC that it might be better for you to simply
pick up the series directly from patchwork; that is fine too, but in that
case don't forget to add:

Reviewed-by: Lyude Paul 

To the entire series (given in a reply to the cover-letter)

And:

Reviewed-by: Greg Kroah-Hartman 

To the usb/typec patches (patch 7/8); this was given in reply
to a previous posting of the series, and I forgot to add it
in the resend.

Regards,

Hans


The following changes since commit c7782443a88926a4f938f0193041616328cf2db2:

  drm/bridge: ti-sn65dsi86: Avoid creating multiple connectors (2021-08-12 
09:56:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux.git 
drm-misc-intel-oob-hotplug-v1

for you to fetch changes up to 7f811394878535ed9a6849717de8c2959ae38899:

  usb: typec: altmodes/displayport: Notify drm subsys of hotplug events 
(2021-08-20 12:35:59 +0200)


Topic branch for drm-misc / drm-intel for OOB hotplug support for Type-C 
connectors


Hans de Goede (6):
  drm/connector: Give connector sysfs devices there own device_type
  drm/connector: Add a fwnode pointer to drm_connector and register with 
ACPI (v2)
  drm/connector: Add drm_connector_find_by_fwnode() function (v3)
  drm/connector: Add support for out-of-band hotplug notification (v3)
  usb: typec: altmodes/displayport: Make dp_altmode_notify() more generic
  usb: typec: altmodes/displayport: Notify drm subsys of hotplug events

 drivers/gpu/drm/drm_connector.c  | 79 +
 drivers/gpu/drm/drm_crtc_internal.h  |  2 +
 drivers/gpu/drm/drm_sysfs.c  | 87 +++-
 drivers/usb/typec/altmodes/Kconfig   |  1 +
 drivers/usb/typec/altmodes/displayport.c | 58 +
 include/drm/drm_connector.h  | 25 +
 6 files changed, 217 insertions(+), 35 deletions(-)



Re: [PATCH] drm/i915: Actually delete gpu reloc selftests

2021-08-20 Thread Daniel Vetter
On Fri, Aug 20, 2021 at 7:00 PM Rodrigo Vivi  wrote:
>
> On Fri, Aug 20, 2021 at 05:49:32PM +0200, Daniel Vetter wrote:
> > In
> >
> > commit 8e02cceb1f1f4f254625e5338dd997ff61ab40d7
> > Author: Daniel Vetter 
> > Date:   Tue Aug 3 14:48:33 2021 +0200
> >
> > drm/i915: delete gpu reloc code
>
> it would be better with dim cite format...
>
> do we need the Fixes: tag?

I did delete the selftest, I just forgot to delete the code. So no
Fixes: imo. I'll bikeshed the commit citation.

> anyway:
>
> Reviewed-by: Rodrigo Vivi 

Thanks for the review, will merge when CI approves too, one never knows.
-Daniel

>
>
> >
> > I deleted the gpu relocation code and the selftest include and
> > enabling, but accidentally forgot about the selftest source code.
> >
> > Fix this oversight.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Jon Bloomfield 
> > Cc: Chris Wilson 
> > Cc: Maarten Lankhorst 
> > Cc: Daniel Vetter 
> > Cc: Joonas Lahtinen 
> > Cc: "Thomas Hellström" 
> > Cc: Matthew Auld 
> > Cc: Lionel Landwerlin 
> > Cc: Dave Airlie 
> > Cc: Jason Ekstrand 
> > ---
> >  .../i915/gem/selftests/i915_gem_execbuffer.c  | 190 --
> >  1 file changed, 190 deletions(-)
> >  delete mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> >
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c 
> > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> > deleted file mode 100644
> > index 16162fc2782d..
> > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> > +++ /dev/null
> > @@ -1,190 +0,0 @@
> > -// SPDX-License-Identifier: MIT
> > -/*
> > - * Copyright © 2020 Intel Corporation
> > - */
> > -
> > -#include "i915_selftest.h"
> > -
> > -#include "gt/intel_engine_pm.h"
> > -#include "selftests/igt_flush_test.h"
> > -
> > -static u64 read_reloc(const u32 *map, int x, const u64 mask)
> > -{
> > - u64 reloc;
> > -
> > - memcpy(&reloc, &map[x], sizeof(reloc));
> > - return reloc & mask;
> > -}
> > -
> > -static int __igt_gpu_reloc(struct i915_execbuffer *eb,
> > -struct drm_i915_gem_object *obj)
> > -{
> > - const unsigned int offsets[] = { 8, 3, 0 };
> > - const u64 mask =
> > - GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
> > - const u32 *map = page_mask_bits(obj->mm.mapping);
> > - struct i915_request *rq;
> > - struct i915_vma *vma;
> > - int err;
> > - int i;
> > -
> > - vma = i915_vma_instance(obj, eb->context->vm, NULL);
> > - if (IS_ERR(vma))
> > - return PTR_ERR(vma);
> > -
> > - err = i915_gem_object_lock(obj, &eb->ww);
> > - if (err)
> > - return err;
> > -
> > - err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH);
> > - if (err)
> > - return err;
> > -
> > - /* 8-Byte aligned */
> > - err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
> > - if (err <= 0)
> > - goto reloc_err;
> > -
> > - /* !8-Byte aligned */
> > - err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1);
> > - if (err <= 0)
> > - goto reloc_err;
> > -
> > - /* Skip to the end of the cmd page */
> > - i = PAGE_SIZE / sizeof(u32) - 1;
> > - i -= eb->reloc_cache.rq_size;
> > - memset32(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size,
> > -  MI_NOOP, i);
> > - eb->reloc_cache.rq_size += i;
> > -
> > - /* Force next batch */
> > - err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2);
> > - if (err <= 0)
> > - goto reloc_err;
> > -
> > - GEM_BUG_ON(!eb->reloc_cache.rq);
> > - rq = i915_request_get(eb->reloc_cache.rq);
> > - reloc_gpu_flush(eb, &eb->reloc_cache);
> > - GEM_BUG_ON(eb->reloc_cache.rq);
> > -
> > - err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
> > - if (err) {
> > - intel_gt_set_wedged(eb->engine->gt);
> > - goto put_rq;
> > - }
> > -
> > - if (!i915_request_completed(rq)) {
> > - pr_err("%s: did not wait for relocations!\n", 
> > eb->engine->name);
> > - err = -EINVAL;
> > - goto put_rq;
> > - }
> > -
> > - for (i = 0; i < ARRAY_SIZE(offsets); i++) {
> > - u64 reloc = read_reloc(map, offsets[i], mask);
> > -
> > - if (reloc != i) {
> > - pr_err("%s[%d]: map[%d] %llx != %x\n",
> > -eb->engine->name, i, offsets[i], reloc, i);
> > - err = -EINVAL;
> > - }
> > - }
> > - if (err)
> > - igt_hexdump(map, 4096);
> > -
> > -put_rq:
> > - i915_request_put(rq);
> > -unpin_vma:
> > - i915_vma_unpin(vma);
> > - return err;
> > -
> > -reloc_err:
> > - if (!err)
> > - err = -EIO;
> > - goto unpin_vma;
> > -}
> > -
> > -static int igt_gpu_reloc(void *arg)
> > -{
> > - struct i915_execbuffer eb;
> > - struct d

Re: [PATCH v3 1/3] dt-bindings: Add YAML bindings for NVDEC

2021-08-20 Thread Rob Herring
On Wed, Aug 18, 2021 at 8:18 AM Thierry Reding  wrote:
>
> On Wed, Aug 18, 2021 at 11:24:28AM +0300, Mikko Perttunen wrote:
> > On 8/18/21 12:20 AM, Rob Herring wrote:
> > > On Wed, Aug 11, 2021 at 01:50:28PM +0300, Mikko Perttunen wrote:
> > > > Add YAML device tree bindings for NVDEC, now in a more appropriate
> > > > place compared to the old textual Host1x bindings.
> > > >
> > > > Signed-off-by: Mikko Perttunen 
> > > > ---
> > > > v3:
> > > > * Drop host1x bindings
> > > > * Change read2 to read-1 in interconnect names
> > > > v2:
> > > > * Fix issues pointed out in v1
> > > > * Add T194 nvidia,instance property
> > > > ---
> > > >   .../gpu/host1x/nvidia,tegra210-nvdec.yaml | 109 ++
> > > >   MAINTAINERS   |   1 +
> > > >   2 files changed, 110 insertions(+)
> > > >   create mode 100644 
> > > > Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> > > >
> > > > diff --git 
> > > > a/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> > > >  
> > > > b/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> > > > new file mode 100644
> > > > index ..571849625da8
> > > > --- /dev/null
> > > > +++ 
> > > > b/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> > > > @@ -0,0 +1,109 @@
> > > > +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: 
> > > > "http://devicetree.org/schemas/gpu/host1x/nvidia,tegra210-nvdec.yaml#";
> > > > +$schema: "http://devicetree.org/meta-schemas/core.yaml#";
> > > > +
> > > > +title: Device tree binding for NVIDIA Tegra NVDEC
> > > > +
> > > > +description: |
> > > > +  NVDEC is the hardware video decoder present on NVIDIA Tegra210
> > > > +  and newer chips. It is located on the Host1x bus and typically
> > > > +  programmed through Host1x channels.
> > > > +
> > > > +maintainers:
> > > > +  - Thierry Reding 
> > > > +  - Mikko Perttunen 
> > > > +
> > > > +properties:
> > > > +  $nodename:
> > > > +pattern: "^nvdec@[0-9a-f]*$"
> > > > +
> > > > +  compatible:
> > > > +enum:
> > > > +  - nvidia,tegra210-nvdec
> > > > +  - nvidia,tegra186-nvdec
> > > > +  - nvidia,tegra194-nvdec
> > > > +
> > > > +  reg:
> > > > +maxItems: 1
> > > > +
> > > > +  clocks:
> > > > +maxItems: 1
> > > > +
> > > > +  clock-names:
> > > > +items:
> > > > +  - const: nvdec
> > > > +
> > > > +  resets:
> > > > +maxItems: 1
> > > > +
> > > > +  reset-names:
> > > > +items:
> > > > +  - const: nvdec
> > > > +
> > > > +  power-domains:
> > > > +maxItems: 1
> > > > +
> > > > +  iommus:
> > > > +maxItems: 1
> > > > +
> > > > +  interconnects:
> > > > +items:
> > > > +  - description: DMA read memory client
> > > > +  - description: DMA read 2 memory client
> > > > +  - description: DMA write memory client
> > > > +
> > > > +  interconnect-names:
> > > > +items:
> > > > +  - const: dma-mem
> > > > +  - const: read-1
> > > > +  - const: write
> > > > +
> > > > +required:
> > > > +  - compatible
> > > > +  - reg
> > > > +  - clocks
> > > > +  - clock-names
> > > > +  - resets
> > > > +  - reset-names
> > > > +  - power-domains
> > > > +
> > > > +if:
> > > > +  properties:
> > > > +compatible:
> > > > +  contains:
> > > > +const: nvidia,tegra194-host1x
> > >
> > > host1x? This will never be true as the schema is only applied to nodes
> > > with the nvdec compatible.
> >
> > Argh, it's a typo indeed. Should be nvidia,tegra194-nvdec.
> >
> > >
> > > > +then:
> > > > +  properties:
> > > > +nvidia,instance:
> > > > +  items:
> > > > +- description: 0 for NVDEC0, or 1 for NVDEC1
> > >
> > > What's this for? We generally don't do indices in DT.
> >
> > When programming the hardware through Host1x, we need to know the "class ID"
> > of the hardware, specific to each instance. So we need to know which
> > instance it is. Technically of course we could derive this from the MMIO
> > address but that seems more confusing.
> >
> > >
> > > > +
> > > > +additionalProperties: true
> > >
> > > This should be false or 'unevaluatedProperties: false'
> >
> > I tried that but it resulted in validation failures; please see the
> > discussion in v2.
>
> Rob mentioned that there is now support for unevaluatedProperties in
> dt-schema. I was able to test this, though with only limited success. I
> made the following changes on top of this patch:

Here's a branch that works with current jsonschema master:

https://github.com/robherring/dt-schema/tree/draft2020-12

> --- >8 ---
> diff --git 
> a/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml 
> b/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> index d2681c98db7e..0bdf05fc8fc7 100644
> --- a/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> +++ b/Documentation/devicetree/bindings/gpu/host1x/nvidia,teg

Re: [Intel-gfx] [PATCH 09/27] drm/i915/guc: Kick tasklet after queuing a request

2021-08-20 Thread Daniele Ceraolo Spurio




On 8/18/2021 11:16 PM, Matthew Brost wrote:

Kick tasklet after queuing a request so it is submitted in a timely manner.

Fixes: 3a4cdf1982f0 ("drm/i915/guc: Implement GuC context operations for new 
inteface")


Is this actually a bug or just a performance issue? In the latter case I 
don't think we need a fixes tag.



Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 8f7a11e65ef5..d61f906105ef 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1050,6 +1050,7 @@ static inline void queue_request(struct i915_sched_engine 
*sched_engine,
list_add_tail(&rq->sched.link,
  i915_sched_lookup_priolist(sched_engine, prio));
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+   tasklet_hi_schedule(&sched_engine->tasklet);


the caller of queue_request() already has a tasklet_hi_schedule in 
another branch of the if/else statement. Maybe we can have the caller 
own the kick to keep it in one place? Not a blocker.


Reviewed-by: Daniele Ceraolo Spurio 

Daniele


  }
  
  static int guc_bypass_tasklet_submit(struct intel_guc *guc,




Re: [Intel-gfx] [PATCH 09/27] drm/i915/guc: Kick tasklet after queuing a request

2021-08-20 Thread Matthew Brost
On Fri, Aug 20, 2021 at 11:31:56AM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 8/18/2021 11:16 PM, Matthew Brost wrote:
> > Kick tasklet after queuing a request so it is submitted in a timely manner.
> > 
> > Fixes: 3a4cdf1982f0 ("drm/i915/guc: Implement GuC context operations for 
> > new inteface")
> 
> Is this actually a bug or just a performance issue? in the latter case I
> don't think we need a fixes tag.
> 

Basically the tasklet won't get queued in certain situations until the
heartbeat ping. Didn't notice it as the tasklet is only used during flow
control or after a full GT reset, both of which are rather rare. We can
probably drop the fixes tag as GuC submission isn't on by default and
still works without this fix.

> > Signed-off-by: Matthew Brost 
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 8f7a11e65ef5..d61f906105ef 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1050,6 +1050,7 @@ static inline void queue_request(struct 
> > i915_sched_engine *sched_engine,
> > list_add_tail(&rq->sched.link,
> >   i915_sched_lookup_priolist(sched_engine, prio));
> > set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +   tasklet_hi_schedule(&sched_engine->tasklet);
> 
> the caller of queue_request() already has a tasklet_hi_schedule in another
> branch of the if/else statement. Maybe we can have the caller own the kick
> to keep it in one place? Not a blocker.
>

I guess it could be:

bool kick = need_tasklet()

if (kick)
queue_request()
else
kick = bypass()
if (kick)
kick_tasklet()

Idk, is that better? I'll think on this and decide before the next
post.

Matt
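The "caller owns the kick" restructure discussed above could be sketched in self-contained form. All names below are hypothetical stand-ins for the i915 code paths: need_tasklet() models the flow-control/reset check, queue_request() the priolist add, and bypass_tasklet_submit() the direct-submit path that returns true when the tasklet still needs to run.

```c
#include <assert.h>
#include <stdbool.h>

static int tasklet_kicks; /* counts kick_tasklet() calls for illustration */

static bool need_tasklet(void) { return true; }   /* e.g. flow control active */
static void queue_request(void) { /* add request to the priority queue */ }
static bool bypass_tasklet_submit(void) { return false; }

static void kick_tasklet(void)
{
	/* stand-in for tasklet_hi_schedule(&sched_engine->tasklet) */
	tasklet_kicks++;
}

static void submit_request(void)
{
	bool kick = need_tasklet();

	if (kick)
		queue_request();
	else
		kick = bypass_tasklet_submit();

	/* one place owns the kick, instead of each branch doing its own */
	if (kick)
		kick_tasklet();
}
```

The point of the shape is that both the queued path and a failed bypass funnel into the same single tasklet kick.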

> Reviewed-by: Daniele Ceraolo Spurio 
> 
> Daniele
> 
> >   }
> >   static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> 


Re: [Intel-gfx] [PATCH 10/27] drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered

2021-08-20 Thread Daniele Ceraolo Spurio




On 8/18/2021 11:16 PM, Matthew Brost wrote:

When unblocking a context, do not enable scheduling if the context is
banned, guc_id invalid, or not registered.

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Matthew Brost 
Cc: 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d61f906105ef..e53a4ef7d442 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1586,6 +1586,9 @@ static void guc_context_unblock(struct intel_context *ce)
spin_lock_irqsave(&ce->guc_state.lock, flags);
  
  	if (unlikely(submission_disabled(guc) ||

+intel_context_is_banned(ce) ||
+context_guc_id_invalid(ce) ||
+!lrc_desc_registered(guc, ce->guc_id) ||
 !intel_context_is_pinned(ce) ||
 context_pending_disable(ce) ||
 context_blocked(ce) > 1)) {


This is getting to a lot of conditions. Maybe we can simplify it a bit? 
E.g it should be possible to check context_blocked, context_banned and 
context_pending_disable as a single op:


#define SCHED_STATE_MULTI_BLOCKED_MASK \  /* 2 or more blocks */
    (SCHED_STATE_BLOCKED_MASK & ~SCHED_STATE_BLOCKED)
#define SCHED_STATE_NO_UNBLOCK \
    SCHED_STATE_MULTI_BLOCKED_MASK | \
    SCHED_STATE_PENDING_DISABLE | \
    SCHED_STATE_BANNED

Not a blocker.

Reviewed-by: Daniele Ceraolo Spurio 

Daniele




Re: [Intel-gfx] [PATCH 10/27] drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered

2021-08-20 Thread Matthew Brost
On Fri, Aug 20, 2021 at 11:42:38AM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 8/18/2021 11:16 PM, Matthew Brost wrote:
> > When unblocking a context, do not enable scheduling if the context is
> > banned, guc_id invalid, or not registered.
> > 
> > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > Signed-off-by: Matthew Brost 
> > Cc: 
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index d61f906105ef..e53a4ef7d442 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1586,6 +1586,9 @@ static void guc_context_unblock(struct intel_context 
> > *ce)
> > spin_lock_irqsave(&ce->guc_state.lock, flags);
> > if (unlikely(submission_disabled(guc) ||
> > +intel_context_is_banned(ce) ||
> > +context_guc_id_invalid(ce) ||
> > +!lrc_desc_registered(guc, ce->guc_id) ||
> >  !intel_context_is_pinned(ce) ||
> >  context_pending_disable(ce) ||
> >  context_blocked(ce) > 1)) {
> 
> This is getting to a lot of conditions. Maybe we can simplify it a bit? E.g

Yea, this is some defensive programming to cover all the bases if another
async operation (ban, reset, unpin) happens while this op is in flight.
Probably some of the conditions are not needed, but being extra safe
here.

> it should be possible to check context_blocked, context_banned and
> context_pending_disable as a single op:
> 
> #define SCHED_STATE_MULTI_BLOCKED_MASK \  /* 2 or more blocks */
>     (SCHED_STATE_BLOCKED_MASK & ~SCHED_STATE_BLOCKED)
> #define SCHED_STATE_NO_UNBLOCK \
>     SCHED_STATE_MULTI_BLOCKED_MASK | \
>     SCHED_STATE_PENDING_DISABLE | \
>     SCHED_STATE_BANNED

Good idea, let me move this to helper in the next spin.

Matt

> 
> Not a blocker.
> 
> Reviewed-by: Daniele Ceraolo Spurio 
> 
> Daniele
> 
> 
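The combined-mask check suggested in this thread could be sketched in self-contained form. Flag values, field layout, and the helper name below are hypothetical stand-ins, not the real i915 sched_state definitions: the assumption is that the blocked count lives in the upper bits, so "2 or more blocks" is any bit of that field above the lowest one.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_STATE_BANNED          (1u << 0)
#define SCHED_STATE_PENDING_DISABLE (1u << 1)
#define SCHED_STATE_BLOCKED_SHIFT   4
#define SCHED_STATE_BLOCKED         (1u << SCHED_STATE_BLOCKED_SHIFT)
#define SCHED_STATE_BLOCKED_MASK    (0xfffu << SCHED_STATE_BLOCKED_SHIFT)

/* 2 or more blocks: any counter bit above the lowest one */
#define SCHED_STATE_MULTI_BLOCKED_MASK \
	(SCHED_STATE_BLOCKED_MASK & ~SCHED_STATE_BLOCKED)

#define SCHED_STATE_NO_UNBLOCK \
	(SCHED_STATE_MULTI_BLOCKED_MASK | \
	 SCHED_STATE_PENDING_DISABLE | \
	 SCHED_STATE_BANNED)

struct context {
	unsigned int sched_state; /* stand-in for ce->guc_state.sched_state */
};

/* One mask test replaces the separate banned/pending/multi-blocked checks */
static bool context_cant_unblock(const struct context *ce)
{
	return (ce->sched_state & SCHED_STATE_NO_UNBLOCK) != 0;
}
```

A context blocked exactly once passes the check, while a banned, pending-disable, or multi-blocked context fails it in a single bitwise test.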


Re: [syzbot] WARNING in drm_gem_shmem_vm_open

2021-08-20 Thread Thomas Zimmermann

Hi

Am 20.08.21 um 17:45 schrieb syzbot:

syzbot has bisected this issue to:


Good bot!



commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2
Author: Daniel Vetter 
Date:   Fri Oct 9 23:21:56 2020 +

 drm/vkms: fbdev emulation support


Here's a guess.

GEM SHMEM + fbdev emulation requires a shadow buffer 
(drm_mode_config.prefer_shadow_fbdev = true). Otherwise, deferred I/O 
and SHMEM conflict over the use of page flags IIRC.


From a quick grep, vkms doesn't set prefer_shadow_fbdev, and an alarming 
number of SHMEM-based drivers don't either.


Best regards
Thomas



bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11c31d5530
start commit:   614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern..
git tree:   upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d5530
console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d5530
kernel config:  https://syzkaller.appspot.com/x/.config?x=96f0602203250753
dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30

Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com
Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer





Re: [syzbot] WARNING in drm_gem_shmem_vm_open

2021-08-20 Thread Daniel Vetter
On Fri, Aug 20, 2021 at 9:23 PM Thomas Zimmermann  wrote:
> Hi
>
> Am 20.08.21 um 17:45 schrieb syzbot:
> > syzbot has bisected this issue to:
>
> Good bot!
>
> >
> > commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2
> > Author: Daniel Vetter 
> > Date:   Fri Oct 9 23:21:56 2020 +
> >
> >  drm/vkms: fbdev emulation support
>
> Here's a guess.
>
> GEM SHMEM + fbdev emulation requires shadow buffering
> (drm_mode_config.prefer_shadow_fbdev = true). Otherwise, deferred I/O
> and SHMEM conflict over the use of page flags, IIRC.

But we should only set up defio if fb->dirty is set, which vkms
doesn't do. So there must be something else funny going on here, I
think ... no idea what it is yet.
-Daniel

>  From a quick grep, vkms doesn't set prefer_shadow_fbdev, and an alarming
> number of SHMEM-based drivers don't either.
>
> Best regards
> Thomas
>
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11c31d5530
> > start commit:   614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern..
> > git tree:   upstream
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d5530
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d5530
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=96f0602203250753
> > dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619
> > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30
> >
> > Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com
> > Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 07/27] Revert "drm/i915/gt: Propagate change in error status to children on unhold"

2021-08-20 Thread Jason Ekstrand
On Thu, Aug 19, 2021 at 1:22 AM Matthew Brost  wrote:
>
> Propagating errors to dependent fences is wrong, don't do it. A selftest
> in the following exposed the propagating of an error to a dependent
> fence after an engine reset.

I feel like we could still have a bit of a better message.  Maybe
something like this:

Propagating errors to dependent fences is broken and can lead to
errors from one client ending up in another.  In 3761baae908a (Revert
"drm/i915: Propagate errors on awaiting already signaled fences"), we
attempted to get rid of fence error propagation but missed the case
added in 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status
to children on unhold").  Revert that one too.  This error was found
by an up-and-coming selftest which .

Otherwise, looks good to me.

--Jason

>
> This reverts commit 8e9f84cf5cac248a1c6a5daa4942879c8b765058.
>
> v2:
>  (Daniel Vetter)
>   - Use revert
>
> References: 3761baae908a (Revert "drm/i915: Propagate errors on awaiting already signaled fences")
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de5f9c86b9a4..cafb0608ffb4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -2140,10 +2140,6 @@ static void __execlists_unhold(struct i915_request *rq)
> if (p->flags & I915_DEPENDENCY_WEAK)
> continue;
>
> -   /* Propagate any change in error status */
> -   if (rq->fence.error)
> -   i915_request_set_error_once(w, rq->fence.error);
> -
> if (w->engine != rq->engine)
> continue;
>
> --
> 2.32.0
>


Re: refactor the i915 GVT support

2021-08-20 Thread Luis Chamberlain
On Fri, Aug 20, 2021 at 04:17:24PM +0200, Christoph Hellwig wrote:
> On Thu, Aug 19, 2021 at 04:29:29PM +0800, Zhenyu Wang wrote:
> > I'm working on the below patch to resolve this. But I hit a weird issue
> > in the case where i915 is built as a module along with the kvmgt module:
> > it causes a busy wait on request_module("kvmgt") at boot. It doesn't
> > happen when building i915 into the kernel. I'm not sure what the reason
> > could be?
> 
> Luis, do you know if there is a problem with a request_module from
> a driver ->probe routine that is probably called by a module_init
> function itself?

Generally no, but you can easily shoot yourself in the foot by creating
cross dependencies and not dealing with them properly. I'd make sure
to keep module initialization as simple as possible and run whatever
takes more time asynchronously, then use a state machine that lets you
verify where you are in the initialization phase, query it, or wait
for a completion with a timeout.

It seems the code in question is getting some spring cleaning, and it's
unclear where the code I can inspect lives. If there's a tree somewhere I
can take a peek at, I'd be happy to review possible oddities that may stick
out.

My goto model for these sorts of problems is to abstract the issue
*outside* of the driver in question and implement new selftests to
try to reproduce it. This serves two purposes: 1) it helps with testing,
and 2) it may let you see the problem more clearly.

  Luis


Re: [git pull] drm fixes for 5.14-rc7

2021-08-20 Thread pr-tracker-bot
The pull request you sent on Fri, 20 Aug 2021 15:36:29 +1000:

> git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-08-20-3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8ba9fbe1e4b8a28050c283792344ee8b6bc3465c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


[PATCH 1/8] drm/etnaviv: return context from etnaviv_iommu_context_get

2021-08-20 Thread Lucas Stach
Being able to have the refcount manipulation in an assignment makes
it much easier to parse the code.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_buffer.c | 3 +--
 drivers/gpu/drm/etnaviv/etnaviv_gem.c| 3 +--
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 3 +--
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c| 6 ++
 drivers/gpu/drm/etnaviv/etnaviv_mmu.h| 4 +++-
 5 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_buffer.c b/drivers/gpu/drm/etnaviv/etnaviv_buffer.c
index 76d38561c910..cf741c5c82d2 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_buffer.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_buffer.c
@@ -397,8 +397,7 @@ void etnaviv_buffer_queue(struct etnaviv_gpu *gpu, u32 exec_state,
	if (switch_mmu_context) {
		struct etnaviv_iommu_context *old_context = gpu->mmu_context;
 
-		etnaviv_iommu_context_get(mmu_context);
-		gpu->mmu_context = mmu_context;
+		gpu->mmu_context = etnaviv_iommu_context_get(mmu_context);
		etnaviv_iommu_context_put(old_context);
	}
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index b8fa6ed3dd73..fb7a33b88fc0 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -303,8 +303,7 @@ struct etnaviv_vram_mapping *etnaviv_gem_mapping_get(
list_del(&mapping->obj_node);
}
 
-   etnaviv_iommu_context_get(mmu_context);
-   mapping->context = mmu_context;
+   mapping->context = etnaviv_iommu_context_get(mmu_context);
mapping->use = 1;
 
ret = etnaviv_iommu_map_gem(mmu_context, etnaviv_obj,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 4dd7d9d541c0..486259e154af 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -532,8 +532,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
goto err_submit_objects;
 
submit->ctx = file->driver_priv;
-   etnaviv_iommu_context_get(submit->ctx->mmu);
-   submit->mmu_context = submit->ctx->mmu;
+   submit->mmu_context = etnaviv_iommu_context_get(submit->ctx->mmu);
submit->exec_state = args->exec_state;
submit->flags = args->flags;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 4102bcea3341..c8b9b0cc4442 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1365,12 +1365,10 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit)
	}
 
	if (!gpu->mmu_context) {
-		etnaviv_iommu_context_get(submit->mmu_context);
-		gpu->mmu_context = submit->mmu_context;
+		gpu->mmu_context = etnaviv_iommu_context_get(submit->mmu_context);
		etnaviv_gpu_start_fe_idleloop(gpu);
	} else {
-		etnaviv_iommu_context_get(gpu->mmu_context);
-		submit->prev_mmu_context = gpu->mmu_context;
+		submit->prev_mmu_context = etnaviv_iommu_context_get(gpu->mmu_context);
	}
 
if (submit->nr_pmrs) {
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.h b/drivers/gpu/drm/etnaviv/etnaviv_mmu.h
index d1d6902fd13b..e4a0b7d09c2e 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.h
@@ -105,9 +105,11 @@ void etnaviv_iommu_dump(struct etnaviv_iommu_context *ctx, void *buf);
 struct etnaviv_iommu_context *
 etnaviv_iommu_context_init(struct etnaviv_iommu_global *global,
   struct etnaviv_cmdbuf_suballoc *suballoc);
-static inline void etnaviv_iommu_context_get(struct etnaviv_iommu_context *ctx)
+static inline struct etnaviv_iommu_context *
+etnaviv_iommu_context_get(struct etnaviv_iommu_context *ctx)
 {
kref_get(&ctx->refcount);
+   return ctx;
 }
 void etnaviv_iommu_context_put(struct etnaviv_iommu_context *ctx);
 void etnaviv_iommu_restore(struct etnaviv_gpu *gpu,
-- 
2.30.2



[PATCH 2/8] drm/etnaviv: put submit prev MMU context when it exists

2021-08-20 Thread Lucas Stach
The prev context is the MMU context at the time of the job
queueing in hardware. As a job might be queued multiple times
due to recovery after a GPU hang, we need to make sure to put
the stale prev MMU context from any prior queuing, to avoid
leaking the reference and thus the MMU context.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index c8b9b0cc4442..c1b9c5cbed11 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1368,6 +1368,8 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit)
		gpu->mmu_context = etnaviv_iommu_context_get(submit->mmu_context);
		etnaviv_gpu_start_fe_idleloop(gpu);
	} else {
+		if (submit->prev_mmu_context)
+			etnaviv_iommu_context_put(submit->prev_mmu_context);
		submit->prev_mmu_context = etnaviv_iommu_context_get(gpu->mmu_context);
	}
 
-- 
2.30.2



[PATCH 8/8] drm/etnaviv: add missing MMU context put when reaping MMU mapping

2021-08-20 Thread Lucas Stach
When we forcefully evict a mapping from the address space and thus the
MMU context, the MMU context is leaked, as the mapping no longer points to
it, so it doesn't get freed when the GEM object is destroyed. Add the
missing context put to fix the leak.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
index dab1b58006d8..9fb1a2aadbcb 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
@@ -199,6 +199,7 @@ static int etnaviv_iommu_find_iova(struct etnaviv_iommu_context *context,
 */
list_for_each_entry_safe(m, n, &list, scan_node) {
etnaviv_iommu_remove_mapping(context, m);
+   etnaviv_iommu_context_put(m->context);
m->context = NULL;
list_del_init(&m->mmu_node);
list_del_init(&m->scan_node);
-- 
2.30.2



[PATCH 7/8] drm/etnaviv: reference MMU context when setting up hardware state

2021-08-20 Thread Lucas Stach
Move the refcount manipulation of the MMU context to the point where the
hardware state is programmed. At that point it is also known if a previous
MMU state is still there, or the state needs to be reprogrammed with a
potentially different context.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c  | 24 +++---
 drivers/gpu/drm/etnaviv/etnaviv_iommu.c|  4 
 drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c |  8 
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index f420c4f14657..1fa98ce870f7 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -641,17 +641,19 @@ void etnaviv_gpu_start_fe(struct etnaviv_gpu *gpu, u32 address, u16 prefetch)
gpu->fe_running = true;
 }
 
-static void etnaviv_gpu_start_fe_idleloop(struct etnaviv_gpu *gpu)
+static void etnaviv_gpu_start_fe_idleloop(struct etnaviv_gpu *gpu,
+ struct etnaviv_iommu_context *context)
 {
-   u32 address = etnaviv_cmdbuf_get_va(&gpu->buffer,
-   &gpu->mmu_context->cmdbuf_mapping);
u16 prefetch;
+   u32 address;
 
/* setup the MMU */
-   etnaviv_iommu_restore(gpu, gpu->mmu_context);
+   etnaviv_iommu_restore(gpu, context);
 
/* Start command processor */
prefetch = etnaviv_buffer_init(gpu);
+   address = etnaviv_cmdbuf_get_va(&gpu->buffer,
+   &gpu->mmu_context->cmdbuf_mapping);
 
etnaviv_gpu_start_fe(gpu, address, prefetch);
 }
@@ -1369,14 +1371,12 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit)
goto out_unlock;
}
 
-   if (!gpu->fe_running) {
-   gpu->mmu_context = etnaviv_iommu_context_get(submit->mmu_context);
-   etnaviv_gpu_start_fe_idleloop(gpu);
-   } else {
-   if (submit->prev_mmu_context)
-   etnaviv_iommu_context_put(submit->prev_mmu_context);
-   submit->prev_mmu_context = etnaviv_iommu_context_get(gpu->mmu_context);
-   }
+   if (!gpu->fe_running)
+   etnaviv_gpu_start_fe_idleloop(gpu, submit->mmu_context);
+
+   if (submit->prev_mmu_context)
+   etnaviv_iommu_context_put(submit->prev_mmu_context);
+   submit->prev_mmu_context = etnaviv_iommu_context_get(gpu->mmu_context);
 
if (submit->nr_pmrs) {
gpu->event[event[1]].sync_point = &sync_point_perfmon_sample_pre;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu.c b/drivers/gpu/drm/etnaviv/etnaviv_iommu.c
index 1a7c89a67bea..afe5dd6a9925 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_iommu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu.c
@@ -92,6 +92,10 @@ static void etnaviv_iommuv1_restore(struct etnaviv_gpu *gpu,
struct etnaviv_iommuv1_context *v1_context = to_v1_context(context);
u32 pgtable;
 
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
+   gpu->mmu_context = etnaviv_iommu_context_get(context);
+
/* set base addresses */
gpu_write(gpu, VIVS_MC_MEMORY_BASE_ADDR_RA, context->global->memory_base);
gpu_write(gpu, VIVS_MC_MEMORY_BASE_ADDR_FE, context->global->memory_base);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
index f8bf488e9d71..d664ae29ae20 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
@@ -172,6 +172,10 @@ static void etnaviv_iommuv2_restore_nonsec(struct etnaviv_gpu *gpu,
if (gpu_read(gpu, VIVS_MMUv2_CONTROL) & VIVS_MMUv2_CONTROL_ENABLE)
return;
 
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
+   gpu->mmu_context = etnaviv_iommu_context_get(context);
+
prefetch = etnaviv_buffer_config_mmuv2(gpu,
(u32)v2_context->mtlb_dma,
(u32)context->global->bad_page_dma);
@@ -192,6 +196,10 @@ static void etnaviv_iommuv2_restore_sec(struct etnaviv_gpu *gpu,
if (gpu_read(gpu, VIVS_MMUv2_SEC_CONTROL) & VIVS_MMUv2_SEC_CONTROL_ENABLE)
return;
 
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
+   gpu->mmu_context = etnaviv_iommu_context_get(context);
+
gpu_write(gpu, VIVS_MMUv2_PTA_ADDRESS_LOW,
  lower_32_bits(context->global->v2.pta_dma));
gpu_write(gpu, VIVS_MMUv2_PTA_ADDRESS_HIGH,
-- 
2.30.2



[PATCH 6/8] drm/etnaviv: fix MMU context leak on GPU reset

2021-08-20 Thread Lucas Stach
After a reset the GPU is no longer using the MMU context and may be
restarted with a different context. While the mmu_context pointer was
cleared, the context wasn't unreferenced, leading to a memory leak.

Cc: sta...@vger.kernel.org # 5.4
Reported-by: Michael Walle 
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 9c710924df6b..f420c4f14657 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -571,6 +571,8 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu)
 
gpu->fe_running = false;
gpu->exec_state = -1;
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
gpu->mmu_context = NULL;
 
return 0;
-- 
2.30.2



[PATCH 5/8] drm/etnaviv: exec and MMU state is lost when resetting the GPU

2021-08-20 Thread Lucas Stach
When the GPU is reset both the current exec state, as well as all MMU
state is lost. Move the driver side state tracking into the reset function
to keep hardware and software state from diverging.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 973843c35fca..9c710924df6b 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -570,6 +570,8 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu)
etnaviv_gpu_update_clock(gpu);
 
gpu->fe_running = false;
+   gpu->exec_state = -1;
+   gpu->mmu_context = NULL;
 
return 0;
 }
@@ -830,7 +832,6 @@ int etnaviv_gpu_init(struct etnaviv_gpu *gpu)
/* Now program the hardware */
mutex_lock(&gpu->lock);
etnaviv_gpu_hw_init(gpu);
-   gpu->exec_state = -1;
mutex_unlock(&gpu->lock);
 
pm_runtime_mark_last_busy(gpu->dev);
@@ -1055,8 +1056,6 @@ void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
spin_unlock(&gpu->event_spinlock);
 
etnaviv_gpu_hw_init(gpu);
-   gpu->exec_state = -1;
-   gpu->mmu_context = NULL;
 
mutex_unlock(&gpu->lock);
pm_runtime_mark_last_busy(gpu->dev);
-- 
2.30.2



[PATCH 4/8] drm/etnaviv: keep MMU context across runtime suspend/resume

2021-08-20 Thread Lucas Stach
The MMU state may be kept across a runtime suspend/resume cycle, as we
avoid a full hardware reset to keep the latency of the runtime PM small.

Don't pretend that the MMU state is lost in driver state. The MMU
context is pushed out when new HW jobs with a different context are
coming in. The only exception to this is when the GPU is unbound, in
which case we need to make sure to also free the last active context.

Cc: sta...@vger.kernel.org # 5.4
Reported-by: Michael Walle 
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 325858cfc2c3..973843c35fca 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1590,9 +1590,6 @@ static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu)
 */
etnaviv_gpu_wait_idle(gpu, 100);
 
-   etnaviv_iommu_context_put(gpu->mmu_context);
-   gpu->mmu_context = NULL;
-
gpu->fe_running = false;
}
 
@@ -1741,6 +1738,9 @@ static void etnaviv_gpu_unbind(struct device *dev, struct device *master,
etnaviv_gpu_hw_suspend(gpu);
 #endif
 
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
+
if (gpu->initialized) {
etnaviv_cmdbuf_free(&gpu->buffer);
etnaviv_iommu_global_fini(gpu);
-- 
2.30.2



[PATCH 3/8] drm/etnaviv: stop abusing mmu_context as FE running marker

2021-08-20 Thread Lucas Stach
While the DMA frontend can only be active when the MMU context is set, the
reverse isn't necessarily true, as the frontend can be stopped while the
MMU state is kept. Stop treating a set mmu_context as an indication that
the frontend is running and instead add an explicit property.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 10 --
 drivers/gpu/drm/etnaviv/etnaviv_gpu.h |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index c1b9c5cbed11..325858cfc2c3 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -569,6 +569,8 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu)
/* We rely on the GPU running, so program the clock */
etnaviv_gpu_update_clock(gpu);
 
+   gpu->fe_running = false;
+
return 0;
 }
 
@@ -631,6 +633,8 @@ void etnaviv_gpu_start_fe(struct etnaviv_gpu *gpu, u32 address, u16 prefetch)
  VIVS_MMUv2_SEC_COMMAND_CONTROL_ENABLE |
  VIVS_MMUv2_SEC_COMMAND_CONTROL_PREFETCH(prefetch));
}
+
+   gpu->fe_running = true;
 }
 
 static void etnaviv_gpu_start_fe_idleloop(struct etnaviv_gpu *gpu)
@@ -1364,7 +1368,7 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit)
goto out_unlock;
}
 
-   if (!gpu->mmu_context) {
+   if (!gpu->fe_running) {
gpu->mmu_context = etnaviv_iommu_context_get(submit->mmu_context);
etnaviv_gpu_start_fe_idleloop(gpu);
} else {
@@ -1573,7 +1577,7 @@ int etnaviv_gpu_wait_idle(struct etnaviv_gpu *gpu, unsigned int timeout_ms)
 
 static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu)
 {
-   if (gpu->initialized && gpu->mmu_context) {
+   if (gpu->initialized && gpu->fe_running) {
/* Replace the last WAIT with END */
mutex_lock(&gpu->lock);
etnaviv_buffer_end(gpu);
@@ -1588,6 +1592,8 @@ static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu)
 
etnaviv_iommu_context_put(gpu->mmu_context);
gpu->mmu_context = NULL;
+
+   gpu->fe_running = false;
}
 
gpu->exec_state = -1;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
index 8ea48697d132..1c75c8ed5bce 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
@@ -101,6 +101,7 @@ struct etnaviv_gpu {
struct workqueue_struct *wq;
struct drm_gpu_scheduler sched;
bool initialized;
+   bool fe_running;
 
/* 'ring'-buffer: */
struct etnaviv_cmdbuf buffer;
-- 
2.30.2



Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device

2021-08-20 Thread Lyude Paul
On Fri, 2021-08-20 at 11:20 +, Lin, Wayne wrote:
> [Public]
> 
> > -Original Message-
> > From: Lyude Paul 
> > Sent: Thursday, August 19, 2021 2:59 AM
> > To: Lin, Wayne ; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas ; Wentland, Harry <
> > harry.wentl...@amd.com>; Zuo, Jerry
> > ; Wu, Hersen ; Juston Li <
> > juston...@intel.com>; Imre Deak ;
> > Ville Syrjälä ; Daniel Vetter <
> > daniel.vet...@ffwll.ch>; Sean Paul ; Maarten Lankhorst
> > ; Maxime Ripard ;
> > Thomas Zimmermann ;
> > David Airlie ; Daniel Vetter ; Deucher,
> > Alexander ; Siqueira,
> > Rodrigo ; Pillai, Aurabindo <
> > aurabindo.pil...@amd.com>; Eryk Brol ; Bas
> > Nieuwenhuizen ; Cornij, Nikola <
> > nikola.cor...@amd.com>; Jani Nikula ; Manasi
> > Navare ; Ankit Nautiyal <
> > ankit.k.nauti...@intel.com>; José Roberto de Souza ;
> > Sean Paul ; Ben Skeggs ; 
> > sta...@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
> > end device
> > 
> > On Wed, 2021-08-11 at 09:49 +, Lin, Wayne wrote:
> > > [Public]
> > > 
> > > > -Original Message-
> > > > From: Lyude Paul 
> > > > Sent: Wednesday, August 11, 2021 4:45 AM
> > > > To: Lin, Wayne ; dri-devel@lists.freedesktop.org
> > > > Cc: Kazlauskas, Nicholas ; Wentland,
> > > > Harry < harry.wentl...@amd.com>; Zuo, Jerry ; Wu,
> > > > Hersen ; Juston Li < juston...@intel.com>; Imre
> > > > Deak ; Ville Syrjälä
> > > > ; Daniel Vetter <
> > > > daniel.vet...@ffwll.ch>; Sean Paul ; Maarten
> > > > Lankhorst ; Maxime Ripard
> > > > ; Thomas Zimmermann ; David
> > > > Airlie ; Daniel Vetter ; Deucher,
> > > > Alexander ; Siqueira, Rodrigo
> > > > ; Pillai, Aurabindo <
> > > > aurabindo.pil...@amd.com>; Eryk Brol ; Bas
> > > > Nieuwenhuizen ; Cornij, Nikola <
> > > > nikola.cor...@amd.com>; Jani Nikula ; Manasi
> > > > Navare ; Ankit Nautiyal <
> > > > ankit.k.nauti...@intel.com>; José Roberto de Souza
> > > > ; Sean Paul ; Ben
> > > > Skeggs ; sta...@vger.kernel.org
> > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > connected end device
> > > > 
> > > > On Wed, 2021-08-04 at 07:13 +, Lin, Wayne wrote:
> > > > > [Public]
> > > > > 
> > > > > > -Original Message-
> > > > > > From: Lyude Paul 
> > > > > > Sent: Wednesday, August 4, 2021 8:09 AM
> > > > > > To: Lin, Wayne ;
> > > > > > dri-devel@lists.freedesktop.org
> > > > > > Cc: Kazlauskas, Nicholas ;
> > > > > > Wentland, Harry < harry.wentl...@amd.com>; Zuo, Jerry
> > > > > > ; Wu, Hersen ; Juston Li
> > > > > > < juston...@intel.com>; Imre Deak ; Ville
> > > > > > Syrjälä ; Wentland, Harry <
> > > > > > harry.wentl...@amd.com>; Daniel Vetter ;
> > > > > > Sean Paul ; Maarten Lankhorst <
> > > > > > maarten.lankho...@linux.intel.com>; Maxime Ripard
> > > > > > ; Thomas Zimmermann ;
> > > > > > David Airlie ; Daniel Vetter
> > > > > > ; Deucher, Alexander
> > > > > > ; Siqueira, Rodrigo <
> > > > > > rodrigo.sique...@amd.com>; Pillai, Aurabindo
> > > > > > ; Eryk Brol ; Bas
> > > > > > Nieuwenhuizen ; Cornij, Nikola
> > > > > > ; Jani Nikula ;
> > > > > > Manasi Navare ; Ankit Nautiyal
> > > > > > ; José Roberto de Souza
> > > > > > ; Sean Paul ; Ben
> > > > > > Skeggs ; sta...@vger.kernel.org
> > > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > > connected end device
> > > > > > 
> > > > > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > > > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > > > > > [Why]
> > > > > > > > Currently, we create connectors for all output ports no
> > > > > > > > matter whether they are connected or not. However, in MST,
> > > > > > > > we can only determine whether an output port really stands
> > > > > > > > for a "connector" once it is connected and its peer device
> > > > > > > > type is checked to be an end device.
> > > > > > > 
> > > > > > > What is this commit trying to solve exactly? e.g. is AMD
> > > > > > > currently running into issues with there being too many DRM
> > > > > > > connectors or something like that?
> > > > > > > Ideally this is behavior I'd very much like us to keep as-is
> > > > > > > unless there's good reason to change it.
> > > > > Hi Lyude,
> > > > > Really appreciate for your time to elaborate in such detail. Thanks!
> > > > > 
> > > > > I came up with this commit because I observed something confusing
> > > > > when I was analyzing the MST connectors' life cycle. Take the topology
> > > > > instance you mentioned below:
> > > > > 
> > > > > Root MSTB -> Output_Port 1 -> MSTB 1.1 -> Output_Port 1 (Connected w/ display)
> > > > >                                        -> Output_Port 2 (Disconnected)
> > > > >           -> Output_Port 2 -> MSTB 2.1 -> Output_Port 1 (Disconnected)
> > > > >                                        -> Output_Port 2 (Disconnected)
> > > > >
> > > > > Which is exactly the topology of the StarTech DP 1-to-4 hub. There are
> > > > > 3 1-to-2 branch chips within this hub. With our MST imple

[PATCH 0/3] drm/panfrost: Bug fixes for lock_region

2021-08-20 Thread Alyssa Rosenzweig
Chris Morgan reported UBSAN errors in panfrost and tracked them down to
the size computation in lock_region. This calculation is overcomplicated
(seemingly cargo culted from kbase) and can be simplified with kernel
helpers and some mathematical identities. The first patch in the series
rewrites the calculation in a form avoiding undefined behaviour; Chris
confirms it placates UBSAN.

While researching this function, I noticed a pair of other potential
bugs: Bifrost can lock more than 4GiB at a time, but must lock at least
32KiB at a time. The latter patches in the series handle these cases.

The size computation was unit-tested in userspace. Relevant code below,
just missing some copypaste definitions for fls64/clamp/etc:

#define MIN_LOCK (1ULL << 12)
#define MAX_LOCK (1ULL << 48)

struct {
uint64_t size;
uint8_t encoded;
} tests[] = {
/* Clamping */
{ 0, 11 },
{ 1, 11 },
{ 2, 11 },
{ 4095, 11 },
/* Power of two */
{ 4096, 11 },
/* Round up */
{ 4097, 12 },
{ 8192, 12 },
{ 16384, 13 },
{ 16385, 14 },
/* Maximum */
{ ~0ULL, 47 },
};

static uint8_t region_width(uint64_t size)
{
size = clamp(size, MIN_LOCK, MAX_LOCK);
return fls64(size - 1) - 1;
}

int main(int argc, char **argv)
{
for (unsigned i = 0; i < ARRAY_SIZE(tests); ++i) {
uint64_t test = tests[i].size;
uint8_t expected = tests[i].encoded;
uint8_t actual = region_width(test);

assert(expected == actual);
}
}

Alyssa Rosenzweig (3):
  drm/panfrost: Simplify lock_region calculation
  drm/panfrost: Use u64 for size in lock_region
  drm/panfrost: Clamp lock region to Bifrost minimum

 drivers/gpu/drm/panfrost/panfrost_mmu.c  | 31 +---
 drivers/gpu/drm/panfrost/panfrost_regs.h |  2 ++
 2 files changed, 13 insertions(+), 20 deletions(-)

-- 
2.30.2


