RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
[AMD Official Use Only]

@Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian

Do you have any concerns about the kthread_park() approach? Theoretically speaking, sched_main shall run there exclusively with job_timeout since they both touch jobs, and stopping the scheduler during job_timeout won't impact performance, since in that scenario there was already something wrong/stuck on that ring/scheduler.

Thanks

--
Monk Liu | Cloud-GPU Core team
--

-----Original Message-----
From: Liu, Monk
Sent: Thursday, August 19, 2021 6:26 PM
To: Daniel Vetter ; Grodzovsky, Andrey
Cc: Alex Deucher ; Chen, JingWen ; Maling list - DRI developers ; amd-gfx list ; Koenig, Christian
Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

[AMD Official Use Only]

Hi Daniel

>> Why can't we stop the scheduler thread first, so that there's guaranteed no
>> race? I've recently had a lot of discussions with panfrost folks about their
>> reset that spans across engines, and without stopping the scheduler thread
>> first before you touch anything it's just plain impossible.

Yeah, we had this thought in our mind as well. Our second approach is to call kthread_park() in the job_timedout() routine so that the "bad" job is guaranteed to be usable without the scheduler touching or freeing it. Check this sample patch as well please:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
 	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+	kthread_park(sched->thread);
 	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
 
 	if (job) {
-		/*
-		 * Remove the bad job so it cannot be freed by concurrent
-		 * drm_sched_cleanup_jobs. It will be reinserted back after
-		 * sched->thread is parked at which point it's safe.
-		 */
-		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);
 
 		status = job->sched->ops->timedout_job(job);
@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	} else {
 		spin_unlock(&sched->job_list_lock);
 	}
+	kthread_unpark(sched->thread);
 
 	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
 		spin_lock(&sched->job_list_lock);
@@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 	kthread_park(sched->thread);
 
 	/*
-	 * Reinsert back the bad job here - now it's safe as
-	 * drm_sched_get_cleanup_job cannot race against us and release the
-	 * bad job at this point - we parked (waited for) any in progress
-	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-	 * now until the scheduler thread is unparked.
-	 */
-	if (bad && bad->sched == sched)
-		/*
-		 * Add at the head of the queue to reflect it was the earliest
-		 * job extracted.
-		 */
-		list_add(&bad->list, &sched->pending_list);
-
-	/*
 	 * Iterate the job list from later to earlier one and either deactive
 	 * their HW callbacks or remove them from pending list if they already
 	 * signaled.

Thanks

--
Monk Liu | Cloud-GPU Core team
--

-----Original Message-----
From: Daniel Vetter
Sent: Thursday, August 19, 2021 5:31 PM
To: Grodzovsky, Andrey
Cc: Daniel Vetter ; Alex Deucher ; Chen, JingWen ; Maling list - DRI developers ; amd-gfx list ; Liu, Monk ; Koenig, Christian
Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote: > > On 2021-08-18 10:42 a.m., Daniel Vetter wrote: > > On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote: > > > On 2021-08-18 10:32 a.m., Daniel Vetter wrote: > > > > On Wed, Aug 18, 2021 at 10:26:25AM -0400, Andrey Grodzovsky wrote: > > > > > On 2021-08-18 10:02 a.m., Alex Deucher wrote: > > > > > > > > > > > + dri-devel > > > > > > > > > > > > Since scheduler is a shared component, please add dri-devel > > > > > > on all scheduler patches. > > > > > > > > > > > > On Wed, Aug 18, 2021 at 7:21 AM Jingwen Chen > > > > > > wrote: > > > > > > > [Why] > > > > > > > for bailing job, this commit will delete it from pending > > > > > > > list thus the bailing job w
Re: [PATCH] drm/ttm: allow debugfs_create_file() to fail in ttm_global_init()
On Mon, 16 Aug 2021 16:30:46 +0200 lnx7...@gregdf.com wrote:
> From: Greg Depoire--Ferrer
>
> Commit 69de4421bb4c ("drm/ttm: Initialize debugfs from
> ttm_global_init()") unintentionally made ttm_global_init() return
> early with an error when debugfs_create_file() fails. When
> CONFIG_DEBUG_FS is disabled, debugfs_create_file() returns an ENODEV
> error, so the TTM device would fail to initialize.
>
> Instead of returning early with the error, print it and continue.
> ENODEV can be ignored because it just means that CONFIG_DEBUG_FS is
> disabled.
>
> Fixes: 69de4421bb4c ("drm/ttm: Initialize debugfs from ttm_global_init()")
> Reported-by: Mikael Pettersson
> Reported-by: Duncan
> Signed-off-by: Greg Depoire--Ferrer
> ---
> Hi, I had this bug as well with the nouveau driver after updating.
> This patch fixes it for me.
>
>  drivers/gpu/drm/ttm/ttm_device.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

This fixes the problem here, too. Running it now.

Tested-by: Duncan

--
Duncan - HTML messages treated as spam

"They that can give up essential liberty to obtain a little temporary
safety, deserve neither liberty nor safety." Benjamin Franklin
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
Note that you do not want GUP to succeed on a device page; I do not see where that is handled in the new code.

On Sun, Aug 15, 2021 at 1:40 PM John Hubbard wrote:
>
> On 8/15/21 8:37 AM, Christoph Hellwig wrote:
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index 8ae31622deef..d48a1f0889d1 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -1218,7 +1218,7 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page, int refs,
> >>  static inline __must_check bool try_get_page(struct page *page)
> >>  {
> >>  	page = compound_head(page);
> >> -	if (WARN_ON_ONCE(page_ref_count(page) <= 0))
> >> +	if (WARN_ON_ONCE(page_ref_count(page) < (int)!is_zone_device_page(page)))
> >
> > Please avoid the overly long line. In fact I'd be tempted to just not
> > bother here and keep the old, looser check. Especially given that
> > John has a patch ready that removes try_get_page entirely.
>
> Yes. Andrew has accepted it into mmotm.
>
> Ralph's patch here was written well before my cleanup that removed
> try_grab_page() [1]. But now that we're here, if you drop this hunk then
> it will make merging easier, I think.
>
> [1] https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com
>
> thanks,
> --
> John Hubbard
> NVIDIA
[syzbot] WARNING in drm_gem_shmem_vm_open
Hello, syzbot found the following issue on: HEAD commit:614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1462cb6130 kernel config: https://syzkaller.appspot.com/x/.config?x=96f0602203250753 dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619 compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com [ cut here ] WARNING: CPU: 0 PID: 11697 at drivers/gpu/drm/drm_gem_shmem_helper.c:562 drm_gem_shmem_vm_open+0x96/0xb0 drivers/gpu/drm/drm_gem_shmem_helper.c:562 Modules linked in: CPU: 0 PID: 11697 Comm: syz-executor.3 Not tainted 5.14.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:drm_gem_shmem_vm_open+0x96/0xb0 drivers/gpu/drm/drm_gem_shmem_helper.c:562 Code: 89 c6 e8 7d ec 23 fd 85 db 75 1a e8 34 e5 23 fd 48 89 ef 5b 5d 41 5c e9 e8 61 f5 ff e8 23 e5 23 fd 0f 0b eb ca e8 1a e5 23 fd <0f> 0b eb dd e8 b1 1f 6a fd eb 89 e8 aa 1f 6a fd eb a8 0f 1f 84 00 RSP: 0018:c9000b3cfb90 EFLAGS: 00010293 RAX: RBX: fffc RCX: RDX: 8880364eb880 RSI: 8451c3e6 RDI: 0003 RBP: 888033c70948 R08: R09: R10: 8451c3c3 R11: 0001 R12: 888146490800 R13: 888033c70a50 R14: 20166000 R15: 888033c709d8 FS: 7fbe43056700() GS:8880b9c0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 005422b8 CR3: 36274000 CR4: 00350ef0 Call Trace: __split_vma+0x23c/0x550 mm/mmap.c:2764 __do_munmap+0x32a/0x11c0 mm/mmap.c:2868 do_munmap mm/mmap.c:2922 [inline] munmap_vma_range mm/mmap.c:604 [inline] mmap_region+0x85a/0x1760 mm/mmap.c:1753 do_mmap+0x86e/0x1180 mm/mmap.c:1584 vm_mmap_pgoff+0x1b7/0x290 mm/util.c:519 ksys_mmap_pgoff+0x4a8/0x620 mm/mmap.c:1635 do_syscall_x64 arch/x86/entry/common.c:50 [inline] 
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665e9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:7fbe43056188 EFLAGS: 0246 ORIG_RAX: 0009 RAX: ffda RBX: 0056bf80 RCX: 004665e9 RDX: RSI: 2000 RDI: 20166000 RBP: 004bfcc4 R08: 0004 R09: R10: 0013 R11: 0246 R12: 0056bf80 R13: 7fffb615701f R14: 7fbe43056300 R15: 00022000 Code disassembly (best guess): 0: 89 c6 mov%eax,%esi 2: e8 7d ec 23 fd callq 0xfd23ec84 7: 85 db test %ebx,%ebx 9: 75 1a jne0x25 b: e8 34 e5 23 fd callq 0xfd23e544 10: 48 89 efmov%rbp,%rdi 13: 5b pop%rbx 14: 5d pop%rbp 15: 41 5c pop%r12 17: e9 e8 61 f5 ff jmpq 0xfff56204 1c: e8 23 e5 23 fd callq 0xfd23e544 21: 0f 0b ud2 23: eb ca jmp0xffef 25: e8 1a e5 23 fd callq 0xfd23e544 2a: 0f 0b ud2 <-- trapping instruction 2c: eb dd jmp0xb 2e: e8 b1 1f 6a fd callq 0xfd6a1fe4 33: eb 89 jmp0xffbe 35: e8 aa 1f 6a fd callq 0xfd6a1fe4 3a: eb a8 jmp0xffe4 3c: 0f .byte 0xf 3d: 1f (bad) 3e: 84 00 test %al,(%rax) --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. syzbot can test patches for this issue, for details see: https://goo.gl/tpsmEJ#testing-patches
[PATCH linux-next] : add put_device() after of_find_device_by_node()
This was found by coccicheck:

./drivers/gpu/drm/kmb/kmb_drv.c:503:2-8: ERROR missing put_device;
call of_find_device_by_node on line 490, but without a corresponding
object release within this function.

Reported-by: Zeal Robot
Signed-off-by: jing yangyang
---
 drivers/gpu/drm/kmb/kmb_drv.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/kmb/kmb_drv.c b/drivers/gpu/drm/kmb/kmb_drv.c
index f54392e..58495a9 100644
--- a/drivers/gpu/drm/kmb/kmb_drv.c
+++ b/drivers/gpu/drm/kmb/kmb_drv.c
@@ -500,8 +500,10 @@ static int kmb_probe(struct platform_device *pdev)
 	ret = kmb_dsi_host_bridge_init(get_device(&dsi_pdev->dev));
 
 	if (ret == -EPROBE_DEFER) {
+		put_device(&dsi_pdev->dev);
 		return -EPROBE_DEFER;
 	} else if (ret) {
+		put_device(&dsi_pdev->dev);
 		DRM_ERROR("probe failed to initialize DSI host bridge\n");
 		return ret;
 	}
@@ -509,9 +511,10 @@ static int kmb_probe(struct platform_device *pdev)
 
 	/* Create DRM device */
 	kmb = devm_drm_dev_alloc(dev, &kmb_driver,
 				 struct kmb_drm_private, drm);
-	if (IS_ERR(kmb))
+	if (IS_ERR(kmb)) {
+		put_device(&dsi_pdev->dev);
 		return PTR_ERR(kmb);
-
+	}
 	dev_set_drvdata(dev, &kmb->drm);
 
 	/* Initialize MIPI DSI */
--
1.8.3.1
Re: [PATCH v2 45/63] ath11k: Use memset_startat() for clearing queue descriptors
Kees Cook writes: > In preparation for FORTIFY_SOURCE performing compile-time and run-time > field bounds checking for memset(), avoid intentionally writing across > neighboring fields. > > Use memset_startat() so memset() doesn't get confused about writing > beyond the destination member that is intended to be the starting point > of zeroing through the end of the struct. Additionally split up a later > field-spanning memset() so that memset() can reason about the size. > > Cc: Kalle Valo > Cc: "David S. Miller" > Cc: Jakub Kicinski > Cc: ath...@lists.infradead.org > Cc: linux-wirel...@vger.kernel.org > Cc: net...@vger.kernel.org > Signed-off-by: Kees Cook To avoid conflicts I prefer taking this via my ath tree. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
No, that works perfectly for me.

The problem we used to have with this approach was that we potentially have multiple timeouts at the same time. But when we serialize the timeout handling by using a single workqueue, as suggested by Daniel now as well, that isn't an issue any more.

Regards,
Christian.

Am 20.08.21 um 09:12 schrieb Liu, Monk:

[AMD Official Use Only]

@Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian

Do you have any concerns about the kthread_park() approach? Theoretically speaking, sched_main shall run there exclusively with job_timeout since they both touch jobs, and stopping the scheduler during job_timeout won't impact performance, since in that scenario there was already something wrong/stuck on that ring/scheduler.

Thanks

--
Monk Liu | Cloud-GPU Core team
--

-----Original Message-----
From: Liu, Monk
Sent: Thursday, August 19, 2021 6:26 PM
To: Daniel Vetter ; Grodzovsky, Andrey
Cc: Alex Deucher ; Chen, JingWen ; Maling list - DRI developers ; amd-gfx list ; Koenig, Christian
Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

[AMD Official Use Only]

Hi Daniel

Why can't we stop the scheduler thread first, so that there's guaranteed no race? I've recently had a lot of discussions with panfrost folks about their reset that spans across engines, and without stopping the scheduler thread first before you touch anything it's just plain impossible.

Yeah, we had this thought in our mind as well.
Our second approach is to call kthread_park() in the job_timedout() routine so that the "bad" job is guaranteed to be usable without the scheduler touching or freeing it. Check this sample patch as well please:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
 	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+	kthread_park(sched->thread);
 	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
 
 	if (job) {
-		/*
-		 * Remove the bad job so it cannot be freed by concurrent
-		 * drm_sched_cleanup_jobs. It will be reinserted back after
-		 * sched->thread is parked at which point it's safe.
-		 */
-		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);
 
 		status = job->sched->ops->timedout_job(job);
@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	} else {
 		spin_unlock(&sched->job_list_lock);
 	}
+	kthread_unpark(sched->thread);
 
 	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
 		spin_lock(&sched->job_list_lock);
@@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 	kthread_park(sched->thread);
 
 	/*
-	 * Reinsert back the bad job here - now it's safe as
-	 * drm_sched_get_cleanup_job cannot race against us and release the
-	 * bad job at this point - we parked (waited for) any in progress
-	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-	 * now until the scheduler thread is unparked.
-	 */
-	if (bad && bad->sched == sched)
-		/*
-		 * Add at the head of the queue to reflect it was the earliest
-		 * job extracted.
-		 */
-		list_add(&bad->list, &sched->pending_list);
-
-	/*
 	 * Iterate the job list from later to earlier one and either deactive
 	 * their HW callbacks or remove them from pending list if they already
 	 * signaled.

Thanks

--
Monk Liu | Cloud-GPU Core team
--

-----Original Message-----
From: Daniel Vetter
Sent: Thursday, August 19, 2021 5:31 PM
To: Grodzovsky, Andrey
Cc: Daniel Vetter ; Alex Deucher ; Chen, JingWen ; Maling list - DRI developers ; amd-gfx list ; Liu, Monk ; Koenig, Christian
Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:
On 2021-08-18 10:42 a.m., Daniel Vetter wrote:
On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote:
On 2021-08-18 10:32 a.m., Daniel Vetter wrote:
On Wed, Aug 18, 2021 at 10:26:25AM -0400, Andrey Grodzovsky wrote:
On 2021-08-18 10:02 a.m., Alex Deucher wrote:
+ dri-devel
Since scheduler is a shared
Re: [RFC] Make use of non-dynamic dmabuf in RDMA
On Fri, Aug 20, 2021 at 1:06 AM Jason Gunthorpe wrote: > On Wed, Aug 18, 2021 at 11:34:51AM +0200, Daniel Vetter wrote: > > On Wed, Aug 18, 2021 at 9:45 AM Gal Pressman wrote: > > > > > > Hey all, > > > > > > Currently, the RDMA subsystem can only work with dynamic dmabuf > > > attachments, which requires the RDMA device to support on-demand-paging > > > (ODP) which is not common on most devices (only supported by mlx5). > > > > > > While the dynamic requirement makes sense for certain GPUs, some devices > > > (such as habanalabs) have device memory that is always "pinned" and do > > > not need/use the move_notify operation. > > > > > > The motivation of this RFC is to use habanalabs as the dmabuf exporter, > > > and EFA as the importer to allow for peer2peer access through libibverbs. > > > > > > This draft patch changes the dmabuf driver to differentiate between > > > static/dynamic attachments by looking at the move_notify op instead of > > > the importer_ops struct, and allowing the peer2peer flag to be enabled > > > in case of a static exporter. > > > > > > Thanks > > > > > > Signed-off-by: Gal Pressman > > > > Given that habanalabs dma-buf support is very firmly in limbo (at > > least it's not yet in linux-next or anywhere else) I think you want to > > solve that problem first before we tackle the additional issue of > > making p2p work without dynamic dma-buf. Without that it just doesn't > > make a lot of sense really to talk about solutions here. > > I have been thinking about adding a dmabuf exporter to VFIO, for > basically the same reason habana labs wants to do it. > > In that situation we'd want to see an approach similar to this as well > to have a broad usability. > > The GPU drivers also want this for certain sophisticated scenarios > with RDMA, the intree drivers just haven't quite got there yet. > > So, I think it is worthwhile to start thinking about this regardless > of habana labs. Oh sure, I've been having these for a while. 
I think there's two options:

- Some kind of soft-pin, where the contract is that we only revoke when absolutely necessary, and it's expected to be catastrophic on the importer's side. The use-case would be a single user that fully controls all accelerator local memory, and so the kernel driver evicting stuff. I haven't implemented it, but the idea is that essentially on eviction we check whom we're evicting for (by checking owners of buffers maybe; atm those are not tracked in generic code, but not that hard to add), and if it's the same userspace owner we don't ever pick these buffers as victims for eviction, preferring -ENOMEM/-ENOSPC. If a new user comes around then we'd still throw these out to avoid abuse, and it would be up to sysadmins to make sure this doesn't happen untimely, maybe with the next thing.

- cgroups for (pinned) buffers. Mostly because cgroups for local memory is somewhere on the plans anyway, but that one could also take forever, since there's questions about overlap with memcg and things like that, plus thus far everyone who cares has made an incompatible proposal about how it should be done :-/

A variant of the first one would be device-level revoke, which is a concept we already have in drm for the modesetting side and also for like 20 year old gpu drivers. We could brush that off and close some of the gaps (a student is fixing the locking right now, the thing left to do is mmap revoke), and I think that model of exclusive device ownership with the option to revoke fits pretty well for at least some of the accelerators floating around. In that case importers would never get a move_notify (maybe we should call this revoke_notify to make it clear it's a bit different) callback, except when the entire thing has been yanked. I think that would fit pretty well for VFIO, and I think we should be able to make it work for rdma too as some kind of auto-deregister.
The locking might be fun with both of these, since I expect some inversions compared to the register path; we'll have to figure those out.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
On Thu, Aug 19, 2021 at 11:00 AM Sierra Guiza, Alejandro (Alex) wrote: > > > On 8/18/2021 2:28 PM, Ralph Campbell wrote: > > On 8/17/21 5:35 PM, Felix Kuehling wrote: > >> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: > >>> On 8/12/21 11:31 PM, Alex Sierra wrote: > From: Ralph Campbell > > ZONE_DEVICE struct pages have an extra reference count that > complicates the > code for put_page() and several places in the kernel that need to > check the > reference count to see that a page is not being used (gup, compaction, > migration, etc.). Clean up the code so the reference count doesn't > need to > be treated specially for ZONE_DEVICE. > > v2: > AS: merged this patch in linux 5.11 version > > v5: > AS: add condition at try_grab_page to check for the zone device type, > while > page ref counter is checked less/equal to zero. In case of device > zone, pages > ref counter are initialized to zero. > > Signed-off-by: Ralph Campbell > Signed-off-by: Alex Sierra > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > fs/dax.c | 4 +- > include/linux/dax.h| 2 +- > include/linux/memremap.h | 7 +-- > include/linux/mm.h | 13 + > lib/test_hmm.c | 2 +- > mm/internal.h | 8 +++ > mm/memremap.c | 68 > +++--- > mm/migrate.c | 5 -- > mm/page_alloc.c| 3 ++ > mm/swap.c | 45 ++--- > 12 files changed, 46 insertions(+), 115 deletions(-) > > >>> I haven't seen a response to the issues I raised back at v3 of this > >>> series. > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-mm%2F4f6dd918-d79b-1aa7-3a4c-caa67ddc29bc%40nvidia.com%2F&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=P7FxYm%2BkJaCkMFa3OHtuKrPOn7SvytFRmYQdIzq7rN4%3D&reserved=0 > >>> > >>> > >>> > >>> Did I miss something? 
> >> I think part of the response was that we did more testing. Alex added > >> support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests > >> recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE > >> about a zero page refcount in try_get_page. The fix is in the latest > >> version of patch 2. But it's already obsolete because John Hubbard is > >> about to remove that function altogether. > >> > >> I think the issues you raised were more uncertainty than known bugs. It > >> seems the fact that you can have DAX pages with 0 refcount is a feature > >> more than a bug. > >> > >> Regards, > >>Felix > > > > Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined? > > In that case, mmap() of a DAX device will call insert_page() which calls > > get_page() which would trigger VM_BUG_ON_PAGE(). > > > > I can believe it is OK for PTE_SPECIAL page table entries to have no > > struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with > > a zero reference count using insert_pfn(). > Hi Ralph, > We have tried the DAX tests with and without CONFIG_ARCH_HAS_PTE_SPECIAL > defined. > Apparently none of the tests touches that condition for a DAX device. Of > course, > that doesn't mean it could happen. > > Regards, > Alex S. > > > > > > > I find it hard to believe that other MM developers don't see an issue > > with a struct page with refcount == 0 and mapcount == 1. > > > > I don't see where init_page_count() is being called for the > > MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD > > driver allocates and passes to migrate_vma_setup(). > > Looks like svm_migrate_get_vram_page() needs to call init_page_count() > > instead of get_page(). 
(I'm looking at branch
> > origin/alexsierrag/device_generic
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FRadeonOpenCompute%2FROCK-Kernel-Driver.git&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=IXe8HP2s8x5OdJdERBkGOYJCQk3iqCu5AYkwpDL8zec%3D&reserved=0)

Yes, you're right. My bad. Thanks for catching this. I didn't realize I had missed defining CONFIG_DEBUG_VM in my build, so this BUG was never caught. It worked after I replaced get_page with init_page_count at svm_migrate_get_vram_page. However, I don't think this is the
Re: [PATCH v6 08/13] mm: call pgmap->ops->page_free for DEVICE_GENERIC pages
On Thu, Aug 19, 2021 at 10:05 PM Christoph Hellwig wrote:
>
> On Tue, Aug 17, 2021 at 11:44:54AM -0400, Felix Kuehling wrote:
> > >> That's a good catch. Existing drivers shouldn't need a page_free
> > >> callback if they didn't have one before. That means we need to add a
> > >> NULL-pointer check in free_device_page.
> > > Also the other state clearing (__ClearPageWaiters/mem_cgroup_uncharge/
> > > ->mapping = NULL).
> > >
> > > In many ways this seems like you want to bring back the DEVICE_PUBLIC
> > > pgmap type that was removed a while ago due to the lack of users
> > > instead of overloading the generic type.
> >
> > I think so. I'm not clear about how DEVICE_PUBLIC differed from what
> > DEVICE_GENERIC is today. As I understand it, DEVICE_PUBLIC was removed
> > because it was unused and also known to be broken in some ways.
> > DEVICE_GENERIC seemed close enough to what we need, other than not being
> > supported in the migration helpers.
> >
> > Would you see benefit in re-introducing DEVICE_PUBLIC as a distinct
> > memory type from DEVICE_GENERIC? What would be the benefits of making
> > that distinction?
>
> The old DEVICE_PUBLIC mostly differed in that it allowed the page
> to be returned from vm_normal_page, which I think was horribly buggy.

Why was that buggy? If I were to do it now, I would return the DEVICE_PUBLIC page from vm_normal_page, but I would ban pinning, as pinning is exceptionally wrong for a GPU. If you migrate some random anonymous/file page back to your GPU memory and it gets pinned there, then there is no way for the GPU to migrate the page out. Quickly you will run out of physically contiguous memory, and things like big graphics buffer allocations (anything that needs physically contiguous memory) will fail. It is less of an issue on hardware that relies less and less on physically contiguous memory, but I do not think it is completely gone from all hw.

> But the point is not to bring back these old semantics.
> The idea
> is to be able to differentiate between your new coherent on-device
> memory and the existing DEVICE_GENERIC. That is, call the
> code in free_devmap_managed_page that is currently only used
> for device private pages also for your new public device pages without
> affecting the devdax and xen use cases.

Yes, I would rather bring back DEVICE_PUBLIC than try to use DEVICE_GENERIC; the GENERIC change was done for users that closely matched DAX semantics, and that is not the case here, at least not from my point of view.

Jerome
Re: [PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs
Kees Cook writes: > In preparation for FORTIFY_SOURCE performing compile-time and run-time > field bounds checking for memset(), avoid intentionally writing across > neighboring fields. > > Add a struct_group() for the spe registers so that memset() can correctly > reason > about the size: > >In function 'fortify_memset_chk', >inlined from 'restore_user_regs.part.0' at > arch/powerpc/kernel/signal_32.c:539:3: >>> include/linux/fortify-string.h:195:4: error: call to >>> '__write_overflow_field' declared with attribute warning: detected write >>> beyond size of field (1st parameter); maybe use struct_group()? >>> [-Werror=attribute-warning] > 195 |__write_overflow_field(); > |^~~~ > > Cc: Michael Ellerman > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Christophe Leroy > Cc: Sudeep Holla > Cc: linuxppc-...@lists.ozlabs.org > Reported-by: kernel test robot > Signed-off-by: Kees Cook > --- > arch/powerpc/include/asm/processor.h | 6 -- > arch/powerpc/kernel/signal_32.c | 6 +++--- > 2 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/include/asm/processor.h > b/arch/powerpc/include/asm/processor.h > index f348e564f7dd..05dc567cb9a8 100644 > --- a/arch/powerpc/include/asm/processor.h > +++ b/arch/powerpc/include/asm/processor.h > @@ -191,8 +191,10 @@ struct thread_struct { > int used_vsr; /* set if process has used VSX */ > #endif /* CONFIG_VSX */ > #ifdef CONFIG_SPE > - unsigned long evr[32];/* upper 32-bits of SPE regs */ > - u64 acc;/* Accumulator */ > + struct_group(spe, > + unsigned long evr[32];/* upper 32-bits of SPE regs */ > + u64 acc;/* Accumulator */ > + ); > unsigned long spefscr;/* SPE & eFP status */ > unsigned long spefscr_last; /* SPEFSCR value on last prctl > call or trap return */ > diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c > index 0608581967f0..77b86caf5c51 100644 > --- a/arch/powerpc/kernel/signal_32.c > +++ b/arch/powerpc/kernel/signal_32.c > @@ -532,11 +532,11 @@ static long 
restore_user_regs(struct pt_regs *regs, > regs_set_return_msr(regs, regs->msr & ~MSR_SPE); > if (msr & MSR_SPE) { > /* restore spe registers from the stack */ > - unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs, > - ELF_NEVRREG * sizeof(u32), failed); > + unsafe_copy_from_user(¤t->thread.spe, &sr->mc_vregs, > + sizeof(current->thread.spe), failed); This makes me nervous, because the ABI is that we copy ELF_NEVRREG * sizeof(u32) bytes, not whatever sizeof(current->thread.spe) happens to be. ie. if we use sizeof an inadvertent change to the fields in thread_struct could change how many bytes we copy out to userspace, which would be an ABI break. And that's not that hard to do, because it's not at all obvious that the size and layout of fields in thread_struct affects the user ABI. At the same time we don't want to copy the right number of bytes but the wrong content, so from that point of view using sizeof is good :) The way we handle it in ptrace is to have BUILD_BUG_ON()s to verify that things match up, so maybe we should do that here too. ie. add: BUILD_BUG_ON(sizeof(current->thread.spe) == ELF_NEVRREG * sizeof(u32)); Not sure if you are happy doing that as part of this patch. I can always do it later if not. cheers
Re: [PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs
Le 20/08/2021 à 09:49, Michael Ellerman a écrit : Kees Cook writes: In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add a struct_group() for the spe registers so that memset() can correctly reason about the size: In function 'fortify_memset_chk', inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3: include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 195 |__write_overflow_field(); |^~~~ Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christophe Leroy Cc: Sudeep Holla Cc: linuxppc-...@lists.ozlabs.org Reported-by: kernel test robot Signed-off-by: Kees Cook --- arch/powerpc/include/asm/processor.h | 6 -- arch/powerpc/kernel/signal_32.c | 6 +++--- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index f348e564f7dd..05dc567cb9a8 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -191,8 +191,10 @@ struct thread_struct { int used_vsr; /* set if process has used VSX */ #endif /* CONFIG_VSX */ #ifdef CONFIG_SPE - unsigned long evr[32];/* upper 32-bits of SPE regs */ - u64 acc;/* Accumulator */ + struct_group(spe, + unsigned long evr[32];/* upper 32-bits of SPE regs */ + u64 acc;/* Accumulator */ + ); unsigned long spefscr;/* SPE & eFP status */ unsigned long spefscr_last; /* SPEFSCR value on last prctl call or trap return */ diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index 0608581967f0..77b86caf5c51 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -532,11 +532,11 @@ static long restore_user_regs(struct pt_regs *regs, regs_set_return_msr(regs, 
regs->msr & ~MSR_SPE); if (msr & MSR_SPE) { /* restore spe registers from the stack */ - unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs, - ELF_NEVRREG * sizeof(u32), failed); + unsafe_copy_from_user(&current->thread.spe, &sr->mc_vregs, + sizeof(current->thread.spe), failed); This makes me nervous, because the ABI is that we copy ELF_NEVRREG * sizeof(u32) bytes, not whatever sizeof(current->thread.spe) happens to be. ie. if we use sizeof an inadvertent change to the fields in thread_struct could change how many bytes we copy out to userspace, which would be an ABI break. And that's not that hard to do, because it's not at all obvious that the size and layout of fields in thread_struct affects the user ABI. At the same time we don't want to copy the right number of bytes but the wrong content, so from that point of view using sizeof is good :) The way we handle it in ptrace is to have BUILD_BUG_ON()s to verify that things match up, so maybe we should do that here too. ie. add: BUILD_BUG_ON(sizeof(current->thread.spe) == ELF_NEVRREG * sizeof(u32)); You mean != I guess ? Not sure if you are happy doing that as part of this patch. I can always do it later if not. cheers
[PATCH v3] drm/i915/dp: Use max params for panels < eDP 1.4
Users reported that after commit 2bbd6dba84d4 ("drm/i915: Try to use fast+narrow link on eDP again and fall back to the old max strategy on failure"), the screen starts to have wobbly effect. Commit a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for everything") doesn't help either, that means the affected eDP 1.2 panels only work with max params. So use max params for panels < eDP 1.4 as Windows does to solve the issue. v3: - Do the eDP rev check in intel_edp_init_dpcd() v2: - Check eDP 1.4 instead of DPCD 1.1 to apply max params Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3714 Fixes: 2bbd6dba84d4 ("drm/i915: Try to use fast+narrow link on eDP again and fall back to the old max strategy on failure") Fixes: a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for everything") Suggested-by: Ville Syrjälä Signed-off-by: Kai-Heng Feng --- drivers/gpu/drm/i915/display/intel_dp.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c index 75d4ebc669411..e0dbd35ae7bc0 100644 --- a/drivers/gpu/drm/i915/display/intel_dp.c +++ b/drivers/gpu/drm/i915/display/intel_dp.c @@ -2445,11 +2445,14 @@ intel_edp_init_dpcd(struct intel_dp *intel_dp) */ if (drm_dp_dpcd_read(&intel_dp->aux, DP_EDP_DPCD_REV, intel_dp->edp_dpcd, sizeof(intel_dp->edp_dpcd)) == -sizeof(intel_dp->edp_dpcd)) +sizeof(intel_dp->edp_dpcd)) { drm_dbg_kms(&dev_priv->drm, "eDP DPCD: %*ph\n", (int)sizeof(intel_dp->edp_dpcd), intel_dp->edp_dpcd); + intel_dp->use_max_params = intel_dp->edp_dpcd[0] < DP_EDP_14; + } + /* * This has to be called after intel_dp->edp_dpcd is filled, PSR checks * for SET_POWER_CAPABLE bit in intel_dp->edp_dpcd[1] -- 2.32.0
[PATCH v3] Revert "drm/scheduler: Avoid accessing freed bad job."
[Why] For a bailing job, this commit deletes it from the pending list, so the bailing job never gets a chance to be resubmitted even in advance TDR mode. [How] After embedding the hw_fence into amdgpu_job is done, the race condition that this commit tries to work around is completely solved. So revert this commit. This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90. v2: add dma_fence_get/put() around timedout_job to avoid concurrent delete during processing timedout_job v3: park sched->thread instead during timedout_job. Signed-off-by: Jingwen Chen --- drivers/gpu/drm/scheduler/sched_main.c | 22 ++ 1 file changed, 2 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index a2a953693b45..c187fd3a6bb6 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct *work) sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work); /* Protects against concurrent deletion in drm_sched_get_cleanup_job */ + kthread_park(sched->thread); spin_lock(&sched->job_list_lock); job = list_first_entry_or_null(&sched->pending_list, struct drm_sched_job, list); if (job) { - /* -* Remove the bad job so it cannot be freed by concurrent -* drm_sched_cleanup_jobs. It will be reinserted back after sched->thread -* is parked at which point it's safe.
-*/ - list_del_init(&job->list); spin_unlock(&sched->job_list_lock); status = job->sched->ops->timedout_job(job); @@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work) } else { spin_unlock(&sched->job_list_lock); } + kthread_unpark(sched->thread); if (status != DRM_GPU_SCHED_STAT_ENODEV) { spin_lock(&sched->job_list_lock); @@ -392,20 +388,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) kthread_park(sched->thread); - /* -* Reinsert back the bad job here - now it's safe as -* drm_sched_get_cleanup_job cannot race against us and release the -* bad job at this point - we parked (waited for) any in progress -* (earlier) cleanups and drm_sched_get_cleanup_job will not be called -* now until the scheduler thread is unparked. -*/ - if (bad && bad->sched == sched) - /* -* Add at the head of the queue to reflect it was the earliest -* job extracted. -*/ - list_add(&bad->list, &sched->pending_list); - /* * Iterate the job list from later to earlier one and either deactive * their HW callbacks or remove them from pending list if they already -- 2.25.1
Re: [PATCH v5, 00/15] Using component framework to support multi hardware decode
Hi Ezequiel, Thanks for your detailed feedback. On Thu, 2021-08-19 at 11:10 -0300, Ezequiel Garcia wrote: > On Thu, 19 Aug 2021 at 04:13, yunfei.d...@mediatek.com > wrote: > > > > Hi Ezequiel, > > > > Thanks for your suggestion. > > > > On Wed, 2021-08-18 at 11:11 -0300, Ezequiel Garcia wrote: > > > +danvet > > > > > > Hi, > > > > > > On Tue, 10 Aug 2021 at 23:58, Yunfei Dong > > > wrote: > > > > > > > > This series adds support for multi hardware decode into mtk-vcodec, > > > > by first > > > > adding component framework to manage each hardware information: > > > > interrupt, > > > > clock, register bases and power. Secondly add core thread to deal > > > > with core > > > > hardware message, at the same time, add msg queue for different > > > > hardware > > > > share messages. Lastly, the architecture of different specs are not > > > > the same, > > > > using specs type to separate them. > > > > > > > > > > I don't think it's a good idea to introduce the component API in the > > > media subsystem. It doesn't seem to be maintained, IIRC there's not > > > even > > > a maintainer for it, and it has some issues that were never > > > addressed. > > > > > > It would be really important to avoid it. Is it really needed in the > > > first place? > > > > > > Thanks, > > > Ezequiel > > > > There are many hardware blocks to support: mt8192 has three and mt8195 has five, and more may be needed in the future. > > > > Each hardware block has independent clk/power/iommu port/irq. > > We use the component interface in probe to get each component's information. > > Just enable the hardware when it is needed, which is very convenient and > > simple. > > > > I found that there are many modules that use the component framework to manage hardware > > information, such as iommu and drm etc.
> > > > Many drivers support multiple hardware variants, where each variant > has a different number of clocks or interrupts, see for instance > struct hantro_variant which allows exposing different codec cores, > some having both decoder/encoder, and some having just a decoder. > > The component API is mostly used by DRM to aggregate independent > subdevices (called components) into an aggregated driver. > > For instance, a DRM driver needs to glue together the HDMI, MIPI, > and plane controller, or any other hardware arrangement where > devices can be described independently. > The usage scenario is very similar to drm and iommu, so we decided to use the component framework. The decoder has three/five or more hardware blocks, and these blocks are independent. mt8183 only needs the core hardware to decode, but mt8192 has lat, soc and core hardware. When lat is needed, we just enable the lat hardware; core works the same way. And mt8195 will have two cores, each of which can work independently. Each component device is only used to turn on its power/clk/iommu port/irq when the master needs to enable it. The main logic is in the master device. > The component API may look simple but has some issues, it's not easy > to debug, and can cause troubles if not used as expected [1]. > It's worth making sure you actually need a framework > to glue different devices together. > Each hardware block has an index, and the master can get hardware information according to these indices, which doesn't look complex. What do you mean by not easy to debug? > > Do you have any other suggestion for this architecture? > > > > Looking at the different patchsets that are posted, it's not clear > to me what exactly are the different architectures that you intend > to support, can you add some documentation which clarifies that? > There are five hardware blocks: lat, soc, core0, core1 and main. The lat thread can use lat, soc and main; the core thread can use soc, lat, core0 and core1. The core thread may or may not be used depending on the project.
We also need to use these hardware blocks dynamically at the same time. So I use the component framework: we just need to know the hardware indices used by each project, and there is no need for complex logic to manage these blocks. > Thanks, > Ezequiel > > [1] > https://patchwork.kernel.org/project/linux-rockchip/cover/20200120170602.3832-1-ezequ...@collabora.com/ Thanks, Yunfei Dong
Re: [PATCH v8 12/13] drm/mediatek: add MERGE support for mediatek-drm
Hi, Jason: On Thu, 2021-08-19 at 10:23 +0800, jason-jh.lin wrote: > Add MERGE engine file: > MERGE module is used to merge two slice-per-line inputs > into one side-by-side output. > > Signed-off-by: jason-jh.lin > --- [snip] > + > +int mtk_merge_clk_enable(struct device *dev) > +{ > + int ret = 0; > + struct mtk_disp_merge *priv = dev_get_drvdata(dev); > + > + ret = clk_prepare_enable(priv->clk); > + if (ret) > + pr_err("merge clk prepare enable failed\n"); > + > + if (priv->async_clk) { This checking is redundant. > + ret = clk_prepare_enable(priv->async_clk); > + if (ret) > + pr_err("async clk prepare enable failed\n"); > + } > + > + return ret; > +} > + > +void mtk_merge_clk_disable(struct device *dev) > +{ > + struct mtk_disp_merge *priv = dev_get_drvdata(dev); > + > + if (priv->async_clk) Ditto. Regards, CK > + clk_disable_unprepare(priv->async_clk); > + > + clk_disable_unprepare(priv->clk); > +} > +
[PATCH v4 0/5] drm: update locking for modesetting
Hi, Thanks for all the helpful feedback on the previous version. Taking all the suggestions together, this series now converts drm_device.master_mutex into master_rwsem, and also attempts to remove drm_file.master_lookup_lock. There might still be lock inversions lurking, so the output from Intel-gfx CI should be interesting to see. Overall, this series makes the following changes: - Patch 1: Fix a potential null ptr dereference in drm_master_release - Patch 2: Convert master_mutex into rwsem (avoids creating a new lock) - Patch 3: Update global mutex locking in the ioctl handler (avoids deadlock when grabbing read lock on master_rwsem in drm_ioctl_kernel) - Patch 4: Plug races with drm modesetting rights - Patch 5: Replace master_lookup_lock with master_rwsem by untangling remaining lock hierarchy inversions v3 -> v4 (suggested by Daniel Vetter): - Drop a patch that added an unnecessary master_lookup_lock in drm_master_release (previously patch 2) - Drop a patch that addressed a non-existent race in drm_is_current_master_locked (previously patch 3) - Remove fixes for non-existent null ptr dereferences (previous patch 4) - Protect drm_master.magic_map,unique{_len} with master_rwsem instead of master_lookup_lock (dropped previous patch 5) - Drop the patch that moved master_lookup_lock into struct drm_device (previously patch 1) - Drop a patch to export task_work_add (previously patch 8) - Revert the check for the global mutex in the ioctl handler to use drm_core_check_feature instead of drm_dev_needs_global_mutex - Push down master_rwsem locking for selected ioctls to avoid lock hierarchy inversions, and to allow us to hold write locks on master_rwsem instead of flushing readers - Remove master_lookup_lock by replacing it with master_rwsem v2 -> v3: - Unexport drm_master_flush, as suggested by Daniel Vetter. - Merge master_mutex and master_rwsem, as suggested by Daniel Vetter. - Export task_work_add, reported by kernel test robot. 
- Make master_flush static, reported by kernel test robot. - Move master_lookup_lock into struct drm_device. - Add a missing lock on master_lookup_lock in drm_master_release. - Fix a potential race in drm_is_current_master_locked. - Fix potential null ptr dereferences in drm_{auth, ioctl}. - Protect magic_map,unique{_len} with master_lookup_lock. - Convert master_mutex into a rwsem. - Update global mutex locking in the ioctl handler. v1 -> v2 (suggested by Daniel Vetter): - Address an additional race when drm_open runs. - Switch from SRCU to rwsem to synchronise readers and writers. - Implement drm_master_flush with task_work so that flushes can be queued to run before returning to userspace without creating a new DRM_MASTER_FLUSH ioctl flag. Best wishes, Desmond Desmond Cheong Zhi Xi (5): drm: fix null ptr dereference in drm_master_release drm: convert drm_device.master_mutex into a rwsem drm: lock drm_global_mutex earlier in the ioctl handler drm: avoid races with modesetting rights drm: remove drm_file.master_lookup_lock drivers/gpu/drm/drm_auth.c| 54 drivers/gpu/drm/drm_debugfs.c | 4 +- drivers/gpu/drm/drm_drv.c | 3 +- drivers/gpu/drm/drm_file.c| 7 ++-- drivers/gpu/drm/drm_internal.h| 1 + drivers/gpu/drm/drm_ioctl.c | 48 - drivers/gpu/drm/drm_lease.c | 69 ++- drivers/gpu/drm/drm_mode_object.c | 14 +-- include/drm/drm_auth.h| 6 +-- include/drm/drm_device.h | 15 +-- include/drm/drm_file.h| 17 +++- include/drm/drm_lease.h | 2 +- 12 files changed, 125 insertions(+), 115 deletions(-) -- 2.25.1
[PATCH v4 1/5] drm: fix null ptr dereference in drm_master_release
drm_master_release can be called on a drm_file without a master, which results in a null ptr dereference of file_priv->master->magic_map. The three cases are: 1. Error path in drm_open_helper drm_open(): drm_open_helper(): drm_master_open(): drm_new_set_master(); <--- returns -ENOMEM, drm_file.master not set drm_file_free(): drm_master_release(); <--- NULL ptr dereference (file_priv->master->magic_map) 2. Error path in mock_drm_getfile mock_drm_getfile(): anon_inode_getfile(); <--- returns error, drm_file.master not set drm_file_free(): drm_master_release(); <--- NULL ptr dereference (file_priv->master->magic_map) 3. In drm_client_close, as drm_client_open doesn't set up a master drm_file.master is set up in drm_open_helper through the call to drm_master_open, so we mirror it with a call to drm_master_release in drm_close_helper, and remove drm_master_release from drm_file_free to avoid the null ptr dereference. Signed-off-by: Desmond Cheong Zhi Xi --- drivers/gpu/drm/drm_file.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index ed25168619fc..90b62f360da1 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -282,9 +282,6 @@ void drm_file_free(struct drm_file *file) drm_legacy_ctxbitmap_flush(dev, file); - if (drm_is_primary_client(file)) - drm_master_release(file); - if (dev->driver->postclose) dev->driver->postclose(dev, file); @@ -305,6 +302,9 @@ static void drm_close_helper(struct file *filp) list_del(&file_priv->lhead); mutex_unlock(&dev->filelist_mutex); + if (drm_is_primary_client(file_priv)) + drm_master_release(file_priv); + drm_file_free(file_priv); } -- 2.25.1
[PATCH v4 2/5] drm: convert drm_device.master_mutex into a rwsem
drm_device.master_mutex currently protects the following: - drm_device.master - drm_file.master - drm_file.was_master - drm_file.is_master - drm_master.unique - drm_master.unique_len - drm_master.magic_map There is a clear separation between functions that read or change these attributes. Hence, convert master_mutex into a rwsem to enable concurrent readers. Signed-off-by: Desmond Cheong Zhi Xi --- drivers/gpu/drm/drm_auth.c| 35 ++- drivers/gpu/drm/drm_debugfs.c | 4 ++-- drivers/gpu/drm/drm_drv.c | 3 +-- drivers/gpu/drm/drm_ioctl.c | 10 +- include/drm/drm_auth.h| 6 +++--- include/drm/drm_device.h | 10 ++ include/drm/drm_file.h| 12 ++-- 7 files changed, 41 insertions(+), 39 deletions(-) diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c index 60a6b21474b1..73ade0513ccb 100644 --- a/drivers/gpu/drm/drm_auth.c +++ b/drivers/gpu/drm/drm_auth.c @@ -64,7 +64,7 @@ static bool drm_is_current_master_locked(struct drm_file *fpriv) { lockdep_assert_once(lockdep_is_held(&fpriv->master_lookup_lock) || - lockdep_is_held(&fpriv->minor->dev->master_mutex)); + lockdep_is_held(&fpriv->minor->dev->master_rwsem)); return fpriv->is_master && drm_lease_owner(fpriv->master) == fpriv->minor->dev->master; } @@ -96,7 +96,7 @@ int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv) struct drm_auth *auth = data; int ret = 0; - mutex_lock(&dev->master_mutex); + down_write(&dev->master_rwsem); if (!file_priv->magic) { ret = idr_alloc(&file_priv->master->magic_map, file_priv, 1, 0, GFP_KERNEL); @@ -104,7 +104,7 @@ int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv) file_priv->magic = ret; } auth->magic = file_priv->magic; - mutex_unlock(&dev->master_mutex); + up_write(&dev->master_rwsem); DRM_DEBUG("%u\n", auth->magic); @@ -119,13 +119,13 @@ int drm_authmagic(struct drm_device *dev, void *data, DRM_DEBUG("%u\n", auth->magic); - mutex_lock(&dev->master_mutex); + down_write(&dev->master_rwsem); file = 
idr_find(&file_priv->master->magic_map, auth->magic); if (file) { file->authenticated = 1; idr_replace(&file_priv->master->magic_map, NULL, auth->magic); } - mutex_unlock(&dev->master_mutex); + up_write(&dev->master_rwsem); return file ? 0 : -EINVAL; } @@ -167,7 +167,7 @@ static int drm_new_set_master(struct drm_device *dev, struct drm_file *fpriv) struct drm_master *old_master; struct drm_master *new_master; - lockdep_assert_held_once(&dev->master_mutex); + lockdep_assert_held_once(&dev->master_rwsem); WARN_ON(fpriv->is_master); old_master = fpriv->master; @@ -249,7 +249,7 @@ int drm_setmaster_ioctl(struct drm_device *dev, void *data, { int ret; - mutex_lock(&dev->master_mutex); + down_write(&dev->master_rwsem); ret = drm_master_check_perm(dev, file_priv); if (ret) @@ -281,7 +281,7 @@ int drm_setmaster_ioctl(struct drm_device *dev, void *data, drm_set_master(dev, file_priv, false); out_unlock: - mutex_unlock(&dev->master_mutex); + up_write(&dev->master_rwsem); return ret; } @@ -298,7 +298,7 @@ int drm_dropmaster_ioctl(struct drm_device *dev, void *data, { int ret; - mutex_lock(&dev->master_mutex); + down_write(&dev->master_rwsem); ret = drm_master_check_perm(dev, file_priv); if (ret) @@ -321,8 +321,9 @@ int drm_dropmaster_ioctl(struct drm_device *dev, void *data, } drm_drop_master(dev, file_priv); + out_unlock: - mutex_unlock(&dev->master_mutex); + up_write(&dev->master_rwsem); return ret; } @@ -334,7 +335,7 @@ int drm_master_open(struct drm_file *file_priv) /* if there is no current master make this fd it, but do not create * any master object for render clients */ - mutex_lock(&dev->master_mutex); + down_write(&dev->master_rwsem); if (!dev->master) { ret = drm_new_set_master(dev, file_priv); } else { @@ -342,7 +343,7 @@ int drm_master_open(struct drm_file *file_priv) file_priv->master = drm_master_get(dev->master); spin_unlock(&file_priv->master_lookup_lock); } - mutex_unlock(&dev->master_mutex); + up_write(&dev->master_rwsem); return ret; } @@ -352,7 +353,7 @@ 
void drm_master_release(struct drm_file *file_priv) struct drm_device *dev = file_priv->minor->dev; struct drm_master *master; - mutex_lock(&dev->master_mutex); + down_write(&dev->master_rwsem); master = file_priv->ma
[PATCH v4 3/5] drm: lock drm_global_mutex earlier in the ioctl handler
In a future patch, a read lock on drm_device.master_rwsem is held in the ioctl handler before the check for ioctl permissions. However, this inverts the lock hierarchy of drm_global_mutex --> master_rwsem. To avoid this, we do some prep work to grab the drm_global_mutex before checking for ioctl permissions. Signed-off-by: Desmond Cheong Zhi Xi --- drivers/gpu/drm/drm_ioctl.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index d25713b09b80..158629d88319 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -772,19 +772,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata, if (drm_dev_is_unplugged(dev)) return -ENODEV; + /* Enforce sane locking for modern driver ioctls. */ + if (unlikely(drm_core_check_feature(dev, DRIVER_LEGACY)) && !(flags & DRM_UNLOCKED)) + mutex_lock(&drm_global_mutex); + retcode = drm_ioctl_permit(flags, file_priv); if (unlikely(retcode)) - return retcode; + goto out; - /* Enforce sane locking for modern driver ioctls. */ - if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) || - (flags & DRM_UNLOCKED)) - retcode = func(dev, kdata, file_priv); - else { - mutex_lock(&drm_global_mutex); - retcode = func(dev, kdata, file_priv); + retcode = func(dev, kdata, file_priv); + +out: + if (unlikely(drm_core_check_feature(dev, DRIVER_LEGACY)) && !(flags & DRM_UNLOCKED)) mutex_unlock(&drm_global_mutex); - } return retcode; } EXPORT_SYMBOL(drm_ioctl_kernel); -- 2.25.1
[PATCH v4 4/5] drm: avoid races with modesetting rights
In drm_client_modeset.c and drm_fb_helper.c, drm_master_internal_{acquire,release} are used to avoid races with DRM userspace. These functions hold onto drm_device.master_rwsem while committing, and bail if there's already a master. However, there are other places where modesetting rights can race. A time-of-check-to-time-of-use error can occur if an ioctl that changes the modeset has its rights revoked after it validates its permissions, but before it completes. There are four places where modesetting permissions can change: - DROP_MASTER ioctl removes rights for a master and its leases - REVOKE_LEASE ioctl revokes rights for a specific lease - SET_MASTER ioctl sets the device master if the master role hasn't been acquired yet - drm_open which can create a new master for a device if one does not currently exist These races can be avoided using drm_device.master_rwsem: users that perform modesetting should hold a read lock on the new drm_device.master_rwsem, and users that change these permissions should hold a write lock. To avoid deadlocks with master_rwsem, for ioctls that need to check for modesetting permissions, but also need to hold a write lock on master_rwsem to protect some other attribute (or recurses to some function that holds a write lock, like drm_mode_create_lease_ioctl which eventually calls drm_master_open), we remove the DRM_MASTER flag and push the master_rwsem lock and permissions check into the ioctl. 
Reported-by: Daniel Vetter Signed-off-by: Desmond Cheong Zhi Xi --- drivers/gpu/drm/drm_auth.c | 4 drivers/gpu/drm/drm_ioctl.c | 20 +++- drivers/gpu/drm/drm_lease.c | 35 --- include/drm/drm_device.h| 5 + 4 files changed, 48 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c index 73ade0513ccb..65065f7e1499 100644 --- a/drivers/gpu/drm/drm_auth.c +++ b/drivers/gpu/drm/drm_auth.c @@ -120,6 +120,10 @@ int drm_authmagic(struct drm_device *dev, void *data, DRM_DEBUG("%u\n", auth->magic); down_write(&dev->master_rwsem); + if (unlikely(!drm_is_current_master(file_priv))) { + up_write(&dev->master_rwsem); + return -EACCES; + } file = idr_find(&file_priv->master->magic_map, auth->magic); if (file) { file->authenticated = 1; diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index 158629d88319..8bea39ffc5c0 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -386,6 +386,10 @@ static int drm_setversion(struct drm_device *dev, void *data, struct drm_file *f int if_version, retcode = 0; down_write(&dev->master_rwsem); + if (unlikely(!drm_is_current_master(file_priv))) { + retcode = -EACCES; + goto unlock; + } if (sv->drm_di_major != -1) { if (sv->drm_di_major != DRM_IF_MAJOR || sv->drm_di_minor < 0 || sv->drm_di_minor > DRM_IF_MINOR) { @@ -420,8 +424,9 @@ static int drm_setversion(struct drm_device *dev, void *data, struct drm_file *f sv->drm_di_minor = DRM_IF_MINOR; sv->drm_dd_major = dev->driver->major; sv->drm_dd_minor = dev->driver->minor; - up_write(&dev->master_rwsem); +unlock: + up_write(&dev->master_rwsem); return retcode; } @@ -574,12 +579,12 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0), DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_RENDER_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_SET_CLIENT_CAP, drm_setclientcap, 0), - DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER), + 
DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, 0), DRM_IOCTL_DEF(DRM_IOCTL_SET_UNIQUE, drm_invalid_op, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), DRM_IOCTL_DEF(DRM_IOCTL_BLOCK, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), DRM_IOCTL_DEF(DRM_IOCTL_UNBLOCK, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), - DRM_IOCTL_DEF(DRM_IOCTL_AUTH_MAGIC, drm_authmagic, DRM_MASTER), + DRM_IOCTL_DEF(DRM_IOCTL_AUTH_MAGIC, drm_authmagic, 0), DRM_LEGACY_IOCTL_DEF(DRM_IOCTL_ADD_MAP, drm_legacy_addmap_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY), DRM_LEGACY_IOCTL_DEF(DRM_IOCTL_RM_MAP, drm_legacy_rmmap_ioctl, DRM_AUTH), @@ -706,10 +711,10 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_RENDER_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0), DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0), - DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER), + DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 0), DRM_IOCTL_DEF(DRM_IOCTL_MODE_LIST_LESSEES, drm_mode_list_lessees_ioctl, DRM_MASTER),
[PATCH v4 5/5] drm: remove drm_file.master_lookup_lock
Previously, master_lookup_lock was introduced in commit 0b0860a3cf5e ("drm: serialize drm_file.master with a new spinlock") to serialize accesses to drm_file.master. This then allowed us to write drm_file_get_master in commit 56f0729a510f ("drm: protect drm_master pointers in drm_lease.c"). The rationale behind introducing a new spinlock at the time was that the other lock that could have been used (drm_device.master_mutex) was the outermost lock, so embedding calls to drm_file_get_master in various functions easily caused us to invert the lock hierarchy. For example, we would invert the master_mutex --> mode_config.idr_mutex hierarchy. Following the conversion of master_mutex into a rwsem, and its use to plug races with modesetting rights, we've untangled some lock hierarchies and removed the need for using drm_file_get_master and the unlocked version of drm_is_current_master in multiple places. Hence, we can take this opportunity to clean up the locking design. To do so, we rewrite drm_file_get_master and drm_is_current_master to use master_rwsem, clean up their uses, fix remaining inversions (such as with _drm_lease_held master grabbing master_rwsem while holding onto mode_config.idr_mutex), then remove master_lookup_lock. Signed-off-by: Desmond Cheong Zhi Xi --- drivers/gpu/drm/drm_auth.c| 19 ++-- drivers/gpu/drm/drm_file.c| 1 - drivers/gpu/drm/drm_internal.h| 1 + drivers/gpu/drm/drm_ioctl.c | 4 ++-- drivers/gpu/drm/drm_lease.c | 38 --- drivers/gpu/drm/drm_mode_object.c | 14 +--- include/drm/drm_file.h| 9 +--- include/drm/drm_lease.h | 2 +- 8 files changed, 32 insertions(+), 56 deletions(-) diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c index 65065f7e1499..ff720f67eacb 100644 --- a/drivers/gpu/drm/drm_auth.c +++ b/drivers/gpu/drm/drm_auth.c @@ -61,10 +61,9 @@ * trusted clients. 
*/ -static bool drm_is_current_master_locked(struct drm_file *fpriv) +bool drm_is_current_master_locked(struct drm_file *fpriv) { - lockdep_assert_once(lockdep_is_held(&fpriv->master_lookup_lock) || - lockdep_is_held(&fpriv->minor->dev->master_rwsem)); + lockdep_assert_held_once(&fpriv->minor->dev->master_rwsem); return fpriv->is_master && drm_lease_owner(fpriv->master) == fpriv->minor->dev->master; } @@ -83,9 +82,9 @@ bool drm_is_current_master(struct drm_file *fpriv) { bool ret; - spin_lock(&fpriv->master_lookup_lock); + down_read(&fpriv->minor->dev->master_rwsem); ret = drm_is_current_master_locked(fpriv); - spin_unlock(&fpriv->master_lookup_lock); + up_read(&fpriv->minor->dev->master_rwsem); return ret; } @@ -120,7 +119,7 @@ int drm_authmagic(struct drm_device *dev, void *data, DRM_DEBUG("%u\n", auth->magic); down_write(&dev->master_rwsem); - if (unlikely(!drm_is_current_master(file_priv))) { + if (unlikely(!drm_is_current_master_locked(file_priv))) { up_write(&dev->master_rwsem); return -EACCES; } @@ -178,9 +177,7 @@ static int drm_new_set_master(struct drm_device *dev, struct drm_file *fpriv) new_master = drm_master_create(dev); if (!new_master) return -ENOMEM; - spin_lock(&fpriv->master_lookup_lock); fpriv->master = new_master; - spin_unlock(&fpriv->master_lookup_lock); fpriv->is_master = 1; fpriv->authenticated = 1; @@ -343,9 +340,7 @@ int drm_master_open(struct drm_file *file_priv) if (!dev->master) { ret = drm_new_set_master(dev, file_priv); } else { - spin_lock(&file_priv->master_lookup_lock); file_priv->master = drm_master_get(dev->master); - spin_unlock(&file_priv->master_lookup_lock); } up_write(&dev->master_rwsem); @@ -410,13 +405,13 @@ struct drm_master *drm_file_get_master(struct drm_file *file_priv) { struct drm_master *master = NULL; - spin_lock(&file_priv->master_lookup_lock); + down_read(&file_priv->minor->dev->master_rwsem); if (!file_priv->master) goto unlock; master = drm_master_get(file_priv->master); unlock: - 
spin_unlock(&file_priv->master_lookup_lock); + up_read(&file_priv->minor->dev->master_rwsem); return master; } EXPORT_SYMBOL(drm_file_get_master); diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 90b62f360da1..8c846e0179d7 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -176,7 +176,6 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) init_waitqueue_head(&file->event_wait); file->event_space = 4096; /* set aside 4k for event buffer */ - spin_lock_init(&file->master_lookup_lock); mutex_init(&file->event_read_lock); if (drm_core_chec
[PATCH] drm/rockchip: dsi: Fix duplicate included linux/phy/phy.h
Clean up the following includecheck warning: ./drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c: linux/phy/phy.h is included more than once. Reported-by: Abaci Robot Fixes: 71f68fe7f121 ("drm/rockchip: dsi: add ability to work as a phy instead of full dsi") Signed-off-by: Jiapeng Chong --- drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c index a2262be..4a6c100 100644 --- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c +++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c @@ -14,7 +14,6 @@ #include #include #include -#include #include #include -- 1.8.3.1
Re: [PATCH v3 12/15] drm/mediatek: add display MDP RDMA support for MT8195
Hi, Nancy: On Wed, 2021-08-18 at 17:18 +0800, Nancy.Lin wrote: > Add MDP_RDMA driver for MT8195. MDP_RDMA is the DMA engine of > the ovl_adaptor component. > > Signed-off-by: Nancy.Lin > --- [snip] > + > +#define MDP_RDMA_EN 0x000 > +#define FLD_ROT_ENABLE REG_FLD(1, 0) I would like the bitwise definitions to have one more indent than the byte definitions. > + > +#define MDP_RDMA_RESET 0x008 > + > +#define MDP_RDMA_CON 0x020 > +#define FLD_OUTPUT_10B REG_FLD(1, 5) > +#define FLD_SIMPLE_MODE REG_FLD(1, 4) > + > +#define MDP_RDMA_GMCIF_CON 0x028 > +#define FLD_EXT_ULTRA_EN REG_FLD(1, 18) > +#define FLD_PRE_ULTRA_EN REG_FLD(2, 16) > +#define FLD_ULTRA_EN REG_FLD(2, 12) > +#define FLD_RD_REQ_TYPE REG_FLD(4, 4) > +#define VAL_RD_REQ_TYPE_BURST_8_ACCESS 7 > +#define FLD_EXT_PREULTRA_EN REG_FLD(1, 3) > +#define FLD_COMMAND_DIV REG_FLD(1, 0) > + > +#define MDP_RDMA_SRC_CON 0x030 > +#define FLD_OUTPUT_ARGB REG_FLD(1, 25) > +#define FLD_BIT_NUMBER REG_FLD(2, 18) > +#define FLD_UNIFORM_CONFIG REG_FLD(1, 17) > +#define FLD_SWAP REG_FLD(1, 14) > +#define FLD_SRC_FORMAT REG_FLD(4, 0) > + > +#define MDP_RDMA_COMP_CON 0x038 > +#define FLD_AFBC_EN REG_FLD(1, 22) > +#define FLD_AFBC_YUV_TRANSFORM REG_FLD(1, 21) > +#define FLD_UFBDC_EN REG_FLD(1, 12) > + > +#define MDP_RDMA_MF_BKGD_SIZE_IN_BYTE 0x060 > +#define FLD_MF_BKGD_WB REG_FLD(23, 0) > + > +#define MDP_RDMA_MF_BKGD_SIZE_IN_PIXEL 0x068 > +#define FLD_MF_BKGD_WP REG_FLD(23, 0) > + > +#define MDP_RDMA_MF_SRC_SIZE 0x070 > +#define FLD_MF_SRC_H REG_FLD(15, 16) > +#define FLD_MF_SRC_W REG_FLD(15, 0) > + > +#define MDP_RDMA_MF_CLIP_SIZE 0x078 > +#define FLD_MF_CLIP_H REG_FLD(15, 16) > +#define FLD_MF_CLIP_W REG_FLD(15, 0) > + > +#define MDP_RDMA_TARGET_LINE 0x0a0 > +#define FLD_LINE_THRESHOLD REG_FLD(15, 17) > +#define FLD_TARGET_LINE_EN REG_FLD(1, 16) > + > +#define MDP_RDMA_SRC_OFFSET_0 0x118 > +#define FLD_SRC_OFFSET_0 REG_FLD(32, 0) > + > +#define MDP_RDMA_TRANSFORM_0 0x200 > +#define FLD_INT_MATRIX_SEL REG_FLD(5, 23) > +#define FLD_TRANS_EN
REG_FLD(1, 16) > + > +#define MDP_RDMA_UTRA_H_CON_0 0x248 > +#define FLD_PREUTRA_H_OFS_0 REG_FLD(10, 10) > + > +#define MDP_RDMA_UTRA_L_CON_0 0x250 > +#define FLD_PREUTRA_L_OFS_0 REG_FLD(10, 10) > + > +#define MDP_RDMA_SRC_BASE_00xf00 > +#define FLD_SRC_BASE_0REG_FLD(32, 0) > + > +#define RDMA_INPUT_SWAP BIT(14) > +#define RDMA_INPUT_10BIT BIT(18) > + [snip] > + > +static void mtk_mdp_rdma_fifo_config(void __iomem *base, struct cmdq_pkt > *cmdq_pkt, > + struct cmdq_client_reg *cmdq_base) > +{ > + unsigned int pre_ultra_h = 156; > + unsigned int pre_ultra_l = 104; Give the reason why this value. You could refer to merge [1]. [1] https://patchwork.kernel.org/project/linux-mediatek/patch/20210819022327.13040-13-jason-jh@mediatek.com/ > + unsigned int reg_mask; > + unsigned int reg_val; > + unsigned int reg; > + > + reg = MDP_RDMA_GMCIF_CON; > + reg_val = REG_FLD_VAL(FLD_RD_REQ_TYPE, VAL_RD_REQ_TYPE_BURST_8_ACCESS) | > + REG_FLD_VAL(FLD_COMMAND_DIV, 1) | > + REG_FLD_VAL(FLD_EXT_PREULTRA_EN, 1) | > + REG_FLD_VAL(FLD_ULTRA_EN, 0) | > + REG_FLD_VAL(FLD_PRE_ULTRA_EN, 1) | > + REG_FLD_VAL(FLD_EXT_ULTRA_EN, 1); > + reg_mask = REG_FLD_MASK(FLD_RD_REQ_TYPE) | > +REG_FLD_MASK(FLD_COMMAND_DIV) | > +REG_FLD_MASK(FLD_EXT_PREULTRA_EN) | > +REG_FLD_MASK(FLD_ULTRA_EN) | > +REG_FLD_MASK(FLD_PRE_ULTRA_EN) | > +REG_FLD_MASK(FLD_EXT_ULTRA_EN); > + mtk_ddp_write_mask(cm
[v1 1/2] drm/panel: boe-tv101wum-nl6: Support enabling a 3.3V rail
The auo,b101uan08.3 panel (already supported by this driver) has a 3.3V rail that needs to be turned on. For previous users of this panel this voltage was directly output by the PMIC. On a new user (the not-yet-upstream sc7180-trogdor-mrbland board) we need to turn the 3.3V rail on. Add support in the driver for this.

Signed-off-by: yangcong
---
 drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
index db9d0b86d542..9a644433629e 100644
--- a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
+++ b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
@@ -45,6 +45,7 @@ struct boe_panel {
 	const struct panel_desc *desc;
 	enum drm_panel_orientation orientation;
 
+	struct regulator *pp3300;
 	struct regulator *pp1800;
 	struct regulator *avee;
 	struct regulator *avdd;
@@ -511,6 +512,7 @@ static int boe_panel_unprepare(struct drm_panel *panel)
 		gpiod_set_value(boe->enable_gpio, 0);
 		usleep_range(5000, 7000);
 		regulator_disable(boe->pp1800);
+		regulator_disable(boe->pp3300);
 	} else {
 		gpiod_set_value(boe->enable_gpio, 0);
 		usleep_range(500, 1000);
@@ -518,6 +520,7 @@ static int boe_panel_unprepare(struct drm_panel *panel)
 		regulator_disable(boe->avdd);
 		usleep_range(5000, 7000);
 		regulator_disable(boe->pp1800);
+		regulator_disable(boe->pp3300);
 	}
 
 	boe->prepared = false;
@@ -536,6 +539,10 @@ static int boe_panel_prepare(struct drm_panel *panel)
 	gpiod_set_value(boe->enable_gpio, 0);
 	usleep_range(1000, 1500);
 
+	ret = regulator_enable(boe->pp3300);
+	if (ret < 0)
+		return ret;
+
 	ret = regulator_enable(boe->pp1800);
 	if (ret < 0)
 		return ret;
@@ -767,6 +774,10 @@ static int boe_panel_add(struct boe_panel *boe)
 	if (IS_ERR(boe->avee))
 		return PTR_ERR(boe->avee);
 
+	boe->pp3300 = devm_regulator_get(dev, "pp3300");
+	if (IS_ERR(boe->pp3300))
+		return PTR_ERR(boe->pp3300);
+
 	boe->pp1800 = devm_regulator_get(dev, "pp1800");
 	if (IS_ERR(boe->pp1800))
 		return PTR_ERR(boe->pp1800);
-- 
2.25.1
[PATCH] This is only used for testing upstream progress
We want to test the upstream progress. Signed-off-by: xuxinxiong --- drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c index dc49079a74d1..ce7437d1b682 100644 --- a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c +++ b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c @@ -4,6 +4,8 @@ * Author: Jitao Shi */ +/* This is used for test upstream progress */ + #include #include #include -- 2.25.1
RE: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
[Public] > -Original Message- > From: Lyude Paul > Sent: Thursday, August 19, 2021 2:59 AM > To: Lin, Wayne ; dri-devel@lists.freedesktop.org > Cc: Kazlauskas, Nicholas ; Wentland, Harry > ; Zuo, Jerry > ; Wu, Hersen ; Juston Li > ; Imre Deak ; > Ville Syrjälä ; Daniel Vetter > ; Sean Paul ; Maarten Lankhorst > ; Maxime Ripard ; > Thomas Zimmermann ; > David Airlie ; Daniel Vetter ; Deucher, > Alexander ; Siqueira, > Rodrigo ; Pillai, Aurabindo > ; Eryk Brol ; Bas > Nieuwenhuizen ; Cornij, Nikola > ; Jani Nikula ; Manasi > Navare ; Ankit Nautiyal > ; José Roberto de Souza ; > Sean Paul ; Ben Skeggs ; > sta...@vger.kernel.org > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end > device > > On Wed, 2021-08-11 at 09:49 +, Lin, Wayne wrote: > > [Public] > > > > > -Original Message- > > > From: Lyude Paul > > > Sent: Wednesday, August 11, 2021 4:45 AM > > > To: Lin, Wayne ; dri-devel@lists.freedesktop.org > > > Cc: Kazlauskas, Nicholas ; Wentland, > > > Harry < harry.wentl...@amd.com>; Zuo, Jerry ; Wu, > > > Hersen ; Juston Li < juston...@intel.com>; Imre > > > Deak ; Ville Syrjälä > > > ; Daniel Vetter < > > > daniel.vet...@ffwll.ch>; Sean Paul ; Maarten > > > Lankhorst ; Maxime Ripard > > > ; Thomas Zimmermann ; David > > > Airlie ; Daniel Vetter ; Deucher, > > > Alexander ; Siqueira, Rodrigo > > > ; Pillai, Aurabindo < > > > aurabindo.pil...@amd.com>; Eryk Brol ; Bas > > > Nieuwenhuizen ; Cornij, Nikola < > > > nikola.cor...@amd.com>; Jani Nikula ; Manasi > > > Navare ; Ankit Nautiyal < > > > ankit.k.nauti...@intel.com>; José Roberto de Souza > > > ; Sean Paul ; Ben > > > Skeggs ; sta...@vger.kernel.org > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for > > > connected end device > > > > > > On Wed, 2021-08-04 at 07:13 +, Lin, Wayne wrote: > > > > [Public] > > > > > > > > > -Original Message- > > > > > From: Lyude Paul > > > > > Sent: Wednesday, August 4, 2021 8:09 AM > > > > > To: Lin, Wayne ; > > > > > 
dri-devel@lists.freedesktop.org > > > > > Cc: Kazlauskas, Nicholas ; > > > > > Wentland, Harry < harry.wentl...@amd.com>; Zuo, Jerry > > > > > ; Wu, Hersen ; Juston Li > > > > > < juston...@intel.com>; Imre Deak ; Ville > > > > > Syrjälä ; Wentland, Harry < > > > > > harry.wentl...@amd.com>; Daniel Vetter ; > > > > > Sean Paul ; Maarten Lankhorst < > > > > > maarten.lankho...@linux.intel.com>; Maxime Ripard > > > > > ; Thomas Zimmermann ; > > > > > David Airlie ; Daniel Vetter > > > > > ; Deucher, Alexander > > > > > ; Siqueira, Rodrigo < > > > > > rodrigo.sique...@amd.com>; Pillai, Aurabindo > > > > > ; Eryk Brol ; Bas > > > > > Nieuwenhuizen ; Cornij, Nikola > > > > > ; Jani Nikula ; > > > > > Manasi Navare ; Ankit Nautiyal > > > > > ; José Roberto de Souza > > > > > ; Sean Paul ; Ben > > > > > Skeggs ; sta...@vger.kernel.org > > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for > > > > > connected end device > > > > > > > > > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote: > > > > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote: > > > > > > > [Why] > > > > > > > Currently, we will create connectors for all output ports no > > > > > > > matter it's connected or not. However, in MST, we can only > > > > > > > determine whether an output port really stands for a "connector" > > > > > > > till it is connected and check its peer device type as an > > > > > > > end device. > > > > > > > > > > > > What is this commit trying to solve exactly? e.g. is AMD > > > > > > currently running into issues with there being too many DRM > > > > > > connectors or something like that? > > > > > > Ideally this is behavior I'd very much like us to keep as-is > > > > > > unless there's good reason to change it. > > > > Hi Lyude, > > > > Really appreciate for your time to elaborate in such detail. Thanks! > > > > > > > > I come up with this commit because I observed something confusing > > > > when I was analyzing MST connectors' life cycle. 
Take the topology > > > > instance you mentioned below > > > > > > > > Root MSTB -> Output_Port 1 -> MSTB 1.1 ->Output_Port 1(Connected > > > > w/ > > > > display) > > > > | > > > > - > > > > > Output_Port 2 (Disconnected) > > > > -> Output_Port 2 -> MSTB 2.1 ->Output_Port 1 > > > > (Disconnected) > > > > > > > > -> Output_Port 2 (Disconnected) Which is exactly the topology of > > > > Startech DP 1-to-4 hub. There are 3 1-to-2 branch chips within > > > > this hub. With our MST implementation today, we'll create drm > > > > connectors for all output ports. Hence, we totally create 6 drm > > > > connectors here. > > > > However, Output ports of Root MSTB are not connected to a stream sink. > > > > They are connected with branch devices. > > > > Thus, creating drm connector for such port looks a bit strange to > > > > me and increases complexity to tracking drm connectors. My > > > > thought is we only need to create drm connector fo
Re: [PATCH v8 20/34] mmc: sdhci-tegra: Add runtime PM and OPP support
On Fri, Aug 20, 2021 at 01:37:13AM +0300, Dmitry Osipenko wrote: > 19.08.2021 20:03, Thierry Reding пишет: > > On Tue, Aug 17, 2021 at 04:27:40AM +0300, Dmitry Osipenko wrote: > >> The SDHCI on Tegra belongs to the core power domain and we're going to > >> enable GENPD support for the core domain. Now SDHCI must be resumed using > >> runtime PM API in order to initialize the SDHCI power state. The SDHCI > >> clock rate must be changed using OPP API that will reconfigure the power > >> domain performance state in accordance to the rate. Add runtime PM and OPP > >> support to the SDHCI driver. > >> > >> Signed-off-by: Dmitry Osipenko > >> --- > >> drivers/mmc/host/sdhci-tegra.c | 146 - > >> 1 file changed, 105 insertions(+), 41 deletions(-) > >> > >> diff --git a/drivers/mmc/host/sdhci-tegra.c > >> b/drivers/mmc/host/sdhci-tegra.c > >> index 387ce9cdbd7c..a3583359c972 100644 > >> --- a/drivers/mmc/host/sdhci-tegra.c > >> +++ b/drivers/mmc/host/sdhci-tegra.c > >> @@ -15,6 +15,8 @@ > >> #include > >> #include > >> #include > >> +#include > >> +#include > >> #include > >> #include > >> #include > >> @@ -24,6 +26,8 @@ > >> #include > >> #include > >> > >> +#include > >> + > >> #include "sdhci-pltfm.h" > >> #include "cqhci.h" > >> > >> @@ -123,6 +127,12 @@ > >> SDHCI_TRNS_BLK_CNT_EN | \ > >> SDHCI_TRNS_DMA) > >> > >> +enum { > >> + TEGRA_CLK_BULK_SDHCI, > >> + TEGRA_CLK_BULK_TMCLK, > >> + TEGRA_CLK_BULK_NUM, > >> +}; > >> + > >> struct sdhci_tegra_soc_data { > >>const struct sdhci_pltfm_data *pdata; > >>u64 dma_mask; > >> @@ -171,6 +181,8 @@ struct sdhci_tegra { > >>bool enable_hwcq; > >>unsigned long curr_clk_rate; > >>u8 tuned_tap_delay; > >> + > >> + struct clk_bulk_data clocks[TEGRA_CLK_BULK_NUM]; > > > > This doesn't seem worth it to me. There's a lot of churn in this driver > > that's only needed to convert this to the clk_bulk API and it makes the > > code a lot more difficult to read, in my opinion. 
> > It looks like the only benefit that this gives us is that runtime
> > suspend and resume become a few lines shorter.
>
> The driver probe code looks cleaner with that. You should be looking at
> the final result and not at the patch to see it.

I did look at the final result and didn't find it cleaner at all. =)

Thierry
Re: [PATCH v8 07/34] clk: tegra: Support runtime PM and power domain
On Fri, Aug 20, 2021 at 01:09:46AM +0300, Dmitry Osipenko wrote: > 19.08.2021 19:54, Thierry Reding пишет: > > On Wed, Aug 18, 2021 at 08:11:03PM +0300, Dmitry Osipenko wrote: > >> 18.08.2021 19:42, Thierry Reding пишет: > >>> On Wed, Aug 18, 2021 at 06:05:21PM +0300, Dmitry Osipenko wrote: > 18.08.2021 17:07, Thierry Reding пишет: > > On Tue, Aug 17, 2021 at 04:27:27AM +0300, Dmitry Osipenko wrote: > > [...] > >> +struct clk *tegra_clk_register(struct clk_hw *hw) > >> +{ > >> + struct platform_device *pdev; > >> + struct device *dev = NULL; > >> + struct device_node *np; > >> + const char *dev_name; > >> + > >> + np = tegra_clk_get_of_node(hw); > >> + > >> + if (!of_device_is_available(np)) > >> + goto put_node; > >> + > >> + dev_name = kasprintf(GFP_KERNEL, "tegra_clk_%s", > >> hw->init->name); > >> + if (!dev_name) > >> + goto put_node; > >> + > >> + pdev = of_platform_device_create(np, dev_name, NULL); > >> + if (!pdev) { > >> + pr_err("%s: failed to create device for %pOF\n", > >> __func__, np); > >> + kfree(dev_name); > >> + goto put_node; > >> + } > >> + > >> + dev = &pdev->dev; > >> + pm_runtime_enable(dev); > >> +put_node: > >> + of_node_put(np); > >> + > >> + return clk_register(dev, hw); > >> +} > > > > This looks wrong. Why do we need struct platform_device objects for each > > of these clocks? That's going to be a massive amount of platform devices > > and they will completely mess up sysfs. > > RPM works with a device. It's not a massive amount of devices, it's one > device for T20 and four devices for T30. > >>> > >>> I'm still not sure I understand why we need to call RPM functions on a > >>> clock. And even if they are few, it seems wrong to make these platform > >>> devices. > >> > >> Before clock is enabled, we need to raise core voltage. After clock is > >> disabled, the voltage should be dropped. CCF+RPM takes care of handling > >> this for us. > > > > That's the part that I do understand. 
What I don't understand is why a > > clock needs to be runtime suspend/resumed. Typically we suspend/resume > > devices, and doing so typically involves disabling/enabling clocks. So > > I don't understand why the clocks themselves now need to be runtime > > suspended/resumed. > > CCF provides RPM management for a device that backs clock. When clock > is enabled, it resumes the backing device. > > RPM, GENPD and OPP frameworks work with a device. We use all these > frameworks here. Since we don't have a dedicated device for a PLL > clock, we need to create it in order to leverage the existing generic > kernel APIs. > > In this case clocks are not runtime suspended/resumed, the device > which backs clock is suspended/resumed. > > >>> Perhaps they can be simple struct device:s instead? Ideally they would > >>> also be parented to the CAR so that they appear in the right place in > >>> the sysfs hierarchy. > >> > >> Could you please clarify what do you mean by 'simple struct device:s'? > >> These clock devices should be OF devices with a of_node and etc, > >> otherwise we can't use OPP framework. > > > > Perhaps I misunderstand the goal of the OPP framework. My understanding > > was that this was to attach a table of operating points with a device so > > that appropriate operating points could be selected and switched to when > > the workload changes. > > > > Typically these operating points would be roughly a clock rate and a > > corresponding voltage for a regulator, so that when a certain clock rate > > is requested, the regulator can be set to the matching voltage. > > > > Hm... so is it that each of these clocks that you want to create a > > platform device for has its own regulator? Because the patch series only > > mentions the CORE domain, so I assumed that we would accumulate all the > > clock rates for the clocks that are part of that CORE domain and then > > derive a voltage to be supplied to that CORE domain. 
> > > > But perhaps I just don't understand correctly how this is tied together. > > We don't use regulators, we use power domain that controls regulator. > GENPD takes care of accumulating performance requests on a per-device > basis. > > I'm creating platform device for the clocks that require DVFS. These > clocks don't use regulator, they are attached to the CORE domain. > GENPD framework manages the performance state, aggregating perf votes > from each device, i.e. from each clock individually. > > You want to reinvent another layer of aggregation on top of GENPD. > This doesn't worth the effort, we won't get anything from it, it > should be a lot of extra complexity for nothing. We will also lose > from it because pm_genpd_summary won't show you a per-device inf
Adding sync_shrinkers function for TTM pool optimization
Hi Andrew,

Daniel suggested that I ping you once more about this.

Basically we want to add a barrier function to make sure that our TTM pool shrinker is not freeing up pages from a device while the device is being unplugged.

Currently we have a global mutex to serialize all of this, but it causes contention when unmapping the freed pages in the IOMMU.

We just need your Acked-by, and I hope my explanation is more understandable than last time.

Cheers,
Christian.
[PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2
Switch back to using a spinlock again by moving the IOMMU unmap outside of the locked region. This avoids contention especially while freeing pages.

v2: Add a comment explaining why we need sync_shrinkers().

Signed-off-by: Christian König
Acked-by: Huang Rui
Reviewed-by: Daniel Vetter
---
 drivers/gpu/drm/ttm/ttm_pool.c | 40 +++---
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cb38b1a17b09..7d4f76d4141d 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -70,7 +70,7 @@ static struct ttm_pool_type global_uncached[MAX_ORDER];
 static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
 static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
 
-static struct mutex shrinker_lock;
+static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
@@ -263,9 +263,9 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
 	spin_lock_init(&pt->lock);
 	INIT_LIST_HEAD(&pt->pages);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	list_add_tail(&pt->shrinker_list, &shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 }
 
 /* Remove a pool_type from the global shrinker list and free all pages */
@@ -273,9 +273,9 @@ static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 {
 	struct page *p;
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	list_del(&pt->shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	while ((p = ttm_pool_type_take(pt)))
 		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
@@ -313,24 +313,23 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
 static unsigned int ttm_pool_shrink(void)
 {
 	struct ttm_pool_type *pt;
-	unsigned int num_freed;
+	unsigned int num_pages;
 	struct page *p;
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	pt = list_first_entry(&shrinker_list,
 			      typeof(*pt), shrinker_list);
+	list_move_tail(&pt->shrinker_list, &shrinker_list);
+	spin_unlock(&shrinker_lock);
 
 	p = ttm_pool_type_take(pt);
 	if (p) {
 		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
-		num_freed = 1 << pt->order;
+		num_pages = 1 << pt->order;
 	} else {
-		num_freed = 0;
+		num_pages = 0;
 	}
 
-	list_move_tail(&pt->shrinker_list, &shrinker_list);
-	mutex_unlock(&shrinker_lock);
-
-	return num_freed;
+	return num_pages;
 }
 
 /* Return the allocation order based for a page */
@@ -530,6 +529,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
 		for (j = 0; j < MAX_ORDER; ++j)
 			ttm_pool_type_fini(&pool->caching[i].orders[j]);
 	}
+
+	/* We removed the pool types from the LRU, but we need to also make sure
+	 * that no shrinker is concurrently freeing pages from the pool.
+	 */
+	sync_shrinkers();
 }
 
 /* As long as pages are available make sure to release at least one */
@@ -604,7 +608,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
 {
 	ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	seq_puts(m, "wc\t:");
 	ttm_pool_debugfs_orders(global_write_combined, m);
 	seq_puts(m, "uc\t:");
@@ -613,7 +617,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
 	ttm_pool_debugfs_orders(global_dma32_write_combined, m);
 	seq_puts(m, "uc 32\t:");
 	ttm_pool_debugfs_orders(global_dma32_uncached, m);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	ttm_pool_debugfs_footer(m);
 
@@ -640,7 +644,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 
 	ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
 		seq_puts(m, "DMA ");
 		switch (i) {
@@ -656,7 +660,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 		}
 		ttm_pool_debugfs_orders(pool->caching[i].orders, m);
 	}
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	ttm_pool_debugfs_footer(m);
 	return 0;
@@ -693,7 +697,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	if (!page_pool_size)
 		page_pool_size = num_pages;
 
-	mutex_init(&shrinker_lock);
+	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
 	for (i = 0; i < MAX_ORDER; ++i) {
-- 
2.25.1
[PATCH 1/2] mm/vmscan: add sync_shrinkers function v2
From: Christian König

While unplugging a device the TTM shrinker implementation needs a barrier to make sure that all concurrent shrink operations are done and no other CPU is referring to a device-specific pool any more.

Taking and releasing the shrinker semaphore on the write side after unmapping and freeing all pages from the device pool should make sure that no shrinker is running in parallel.

This allows us to avoid the contended mutex in the TTM pool implementation for every alloc/free operation.

v2: rework the commit message to make clear why we need this

Signed-off-by: Christian König
Acked-by: Huang Rui
Reviewed-by: Daniel Vetter
---
 include/linux/shrinker.h |  1 +
 mm/vmscan.c              | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 9814fff58a69..1de17f53cdbc 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -93,4 +93,5 @@ extern void register_shrinker_prepared(struct shrinker *shrinker);
 extern int register_shrinker(struct shrinker *shrinker);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
+extern void sync_shrinkers(void);
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4620df62f0ff..fde1aabcfa7f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -638,6 +638,16 @@ void unregister_shrinker(struct shrinker *shrinker)
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
+/**
+ * sync_shrinkers - Wait for all running shrinkers to complete.
+ */
+void sync_shrinkers(void)
+{
+	down_write(&shrinker_rwsem);
+	up_write(&shrinker_rwsem);
+}
+EXPORT_SYMBOL(sync_shrinkers);
+
 #define SHRINK_BATCH 128
 
 static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
-- 
2.25.1
Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
Hi all,

I just submitted a v3 patch according to your opinion on using kthread_park instead.

Thanks,
Jingwen

On Fri Aug 20, 2021 at 09:20:42AM +0200, Christian König wrote:
> No, that perfectly works for me.
>
> The problem we used to have with this approach was that we potentially have
> multiple timeouts at the same time.
>
> But when we serialize the timeout handling by using a single workqueue as
> suggested by Daniel now as well, then that isn't an issue any more.
>
> Regards,
> Christian.
>
> Am 20.08.21 um 09:12 schrieb Liu, Monk:
> > [AMD Official Use Only]
> >
> > @Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
> > Do you have any concerns about the kthread_park() approach?
> >
> > Theoretically speaking, sched_main shall run there exclusively with
> > job_timeout since they both touch jobs, and stopping the scheduler during
> > job_timeout won't impact performance, since in that scenario
> > there was already something wrong/stuck on that ring/scheduler.
> >
> > Thanks
> >
> > ------------------------------------------
> > Monk Liu | Cloud-GPU Core team
> > ------------------------------------------
> >
> > -----Original Message-----
> > From: Liu, Monk
> > Sent: Thursday, August 19, 2021 6:26 PM
> > To: Daniel Vetter ; Grodzovsky, Andrey
> > Cc: Alex Deucher ; Chen, JingWen ; Maling list - DRI developers ;
> > amd-gfx list ; Koenig, Christian
> > Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
> >
> > [AMD Official Use Only]
> >
> > Hi Daniel
> >
> > > > Why can't we stop the scheduler thread first, so that there's
> > > > guaranteed no race? I've recently had a lot of discussions with
> > > > panfrost folks about their reset that spawns across engines, and
> > > > without stopping the scheduler thread first before you touch anything
> > > > it's just plain impossible.
> >
> > Yeah, we had this thought as well in our mind.
> > Our second approach is to call kthread_stop() in the job_timedout() routine so
> > that the "bad" job is guaranteed to be usable without the scheduler touching
> > or freeing it. Check this sample patch as well, please:
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index a2a9536..50a49cb 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
> >  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> >
> >  	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
> > +	kthread_park(sched->thread);
> >  	spin_lock(&sched->job_list_lock);
> >  	job = list_first_entry_or_null(&sched->pending_list,
> >  				       struct drm_sched_job, list);
> >
> >  	if (job) {
> > -		/*
> > -		 * Remove the bad job so it cannot be freed by concurrent
> > -		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
> > -		 * is parked at which point it's safe.
> > -		 */
> > -		list_del_init(&job->list);
> >  		spin_unlock(&sched->job_list_lock);
> >
> >  		status = job->sched->ops->timedout_job(job);
> > @@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
> >  	} else {
> >  		spin_unlock(&sched->job_list_lock);
> >  	}
> > +	kthread_unpark(sched->thread);
> >
> >  	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
> >  		spin_lock(&sched->job_list_lock);
> > @@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> >  	kthread_park(sched->thread);
> >
> >  	/*
> > -	 * Reinsert back the bad job here - now it's safe as
> > -	 * drm_sched_get_cleanup_job cannot race against us and release the
> > -	 * bad job at this point - we parked (waited for) any in progress
> > -	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> > -	 * now until the scheduler thread is unparked.
> > -	 */
> > -	if (bad && bad->sched == sched)
> > -		/*
> > -		 * Add at the head of the queue to reflect it was the earliest
> > -		 * job extracted.
> > -		 */
> > -		list_add(&bad->list, &sched->pending_list);
> > -
> > -	/*
> >  	 * Iterate the job list from later to earlier one and either deactive
> >  	 * their HW callbacks or remove them from pending list if they already
> >  	 * signaled.
> >
> >
> > Thanks
> >
> > ------------------------------------------
> > Monk Liu | Cloud-GPU Core team
> > ------------------------------------------
> >
> > -----Original Message-----
> > From: Daniel Vetter
> > Sent: Thursday, Au
Re: [PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs
Christophe Leroy writes: > Le 20/08/2021 à 09:49, Michael Ellerman a écrit : >> Kees Cook writes: >>> In preparation for FORTIFY_SOURCE performing compile-time and run-time >>> field bounds checking for memset(), avoid intentionally writing across >>> neighboring fields. >>> >>> Add a struct_group() for the spe registers so that memset() can correctly >>> reason >>> about the size: >>> >>> In function 'fortify_memset_chk', >>> inlined from 'restore_user_regs.part.0' at >>> arch/powerpc/kernel/signal_32.c:539:3: > include/linux/fortify-string.h:195:4: error: call to > '__write_overflow_field' declared with attribute warning: detected write > beyond size of field (1st parameter); maybe use struct_group()? > [-Werror=attribute-warning] >>> 195 |__write_overflow_field(); >>> |^~~~ >>> >>> Cc: Michael Ellerman >>> Cc: Benjamin Herrenschmidt >>> Cc: Paul Mackerras >>> Cc: Christophe Leroy >>> Cc: Sudeep Holla >>> Cc: linuxppc-...@lists.ozlabs.org >>> Reported-by: kernel test robot >>> Signed-off-by: Kees Cook >>> --- >>> arch/powerpc/include/asm/processor.h | 6 -- >>> arch/powerpc/kernel/signal_32.c | 6 +++--- >>> 2 files changed, 7 insertions(+), 5 deletions(-) >>> >>> diff --git a/arch/powerpc/include/asm/processor.h >>> b/arch/powerpc/include/asm/processor.h >>> index f348e564f7dd..05dc567cb9a8 100644 >>> --- a/arch/powerpc/include/asm/processor.h >>> +++ b/arch/powerpc/include/asm/processor.h >>> @@ -191,8 +191,10 @@ struct thread_struct { >>> int used_vsr; /* set if process has used VSX */ >>> #endif /* CONFIG_VSX */ >>> #ifdef CONFIG_SPE >>> - unsigned long evr[32];/* upper 32-bits of SPE regs */ >>> - u64 acc;/* Accumulator */ >>> + struct_group(spe, >>> + unsigned long evr[32];/* upper 32-bits of SPE regs */ >>> + u64 acc;/* Accumulator */ >>> + ); >>> unsigned long spefscr;/* SPE & eFP status */ >>> unsigned long spefscr_last; /* SPEFSCR value on last prctl >>>call or trap return */ >>> diff --git a/arch/powerpc/kernel/signal_32.c >>> 
b/arch/powerpc/kernel/signal_32.c >>> index 0608581967f0..77b86caf5c51 100644 >>> --- a/arch/powerpc/kernel/signal_32.c >>> +++ b/arch/powerpc/kernel/signal_32.c >>> @@ -532,11 +532,11 @@ static long restore_user_regs(struct pt_regs *regs, >>> regs_set_return_msr(regs, regs->msr & ~MSR_SPE); >>> if (msr & MSR_SPE) { >>> /* restore spe registers from the stack */ >>> - unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs, >>> - ELF_NEVRREG * sizeof(u32), failed); >>> + unsafe_copy_from_user(¤t->thread.spe, &sr->mc_vregs, >>> + sizeof(current->thread.spe), failed); >> >> This makes me nervous, because the ABI is that we copy ELF_NEVRREG * >> sizeof(u32) bytes, not whatever sizeof(current->thread.spe) happens to >> be. >> >> ie. if we use sizeof an inadvertent change to the fields in >> thread_struct could change how many bytes we copy out to userspace, >> which would be an ABI break. >> >> And that's not that hard to do, because it's not at all obvious that the >> size and layout of fields in thread_struct affects the user ABI. >> >> At the same time we don't want to copy the right number of bytes but >> the wrong content, so from that point of view using sizeof is good :) >> >> The way we handle it in ptrace is to have BUILD_BUG_ON()s to verify that >> things match up, so maybe we should do that here too. >> >> ie. add: >> >> BUILD_BUG_ON(sizeof(current->thread.spe) == ELF_NEVRREG * sizeof(u32)); > > You mean != I guess ? Gah. Yes I do :) cheers
Re: [RFC] Make use of non-dynamic dmabuf in RDMA
On Fri, Aug 20, 2021 at 09:25:30AM +0200, Daniel Vetter wrote: > On Fri, Aug 20, 2021 at 1:06 AM Jason Gunthorpe wrote: > > On Wed, Aug 18, 2021 at 11:34:51AM +0200, Daniel Vetter wrote: > > > On Wed, Aug 18, 2021 at 9:45 AM Gal Pressman wrote: > > > > > > > > Hey all, > > > > > > > > Currently, the RDMA subsystem can only work with dynamic dmabuf > > > > attachments, which requires the RDMA device to support on-demand-paging > > > > (ODP) which is not common on most devices (only supported by mlx5). > > > > > > > > While the dynamic requirement makes sense for certain GPUs, some devices > > > > (such as habanalabs) have device memory that is always "pinned" and do > > > > not need/use the move_notify operation. > > > > > > > > The motivation of this RFC is to use habanalabs as the dmabuf exporter, > > > > and EFA as the importer to allow for peer2peer access through > > > > libibverbs. > > > > > > > > This draft patch changes the dmabuf driver to differentiate between > > > > static/dynamic attachments by looking at the move_notify op instead of > > > > the importer_ops struct, and allowing the peer2peer flag to be enabled > > > > in case of a static exporter. > > > > > > > > Thanks > > > > > > > > Signed-off-by: Gal Pressman > > > > > > Given that habanalabs dma-buf support is very firmly in limbo (at > > > least it's not yet in linux-next or anywhere else) I think you want to > > > solve that problem first before we tackle the additional issue of > > > making p2p work without dynamic dma-buf. Without that it just doesn't > > > make a lot of sense really to talk about solutions here. > > > > I have been thinking about adding a dmabuf exporter to VFIO, for > > basically the same reason habana labs wants to do it. > > > > In that situation we'd want to see an approach similar to this as well > > to have a broad usability. > > > > The GPU drivers also want this for certain sophisticated scenarios > > with RDMA, the intree drivers just haven't quite got there yet. 
> > > > So, I think it is worthwhile to start thinking about this regardless > > of habana labs. > > Oh sure, I've been having these for a while. I think there's two options: > - some kind of soft-pin, where the contract is that we only revoke > when absolutely necessary, and it's expected to be catastrophic on the > importer's side. Honestly, I'm not very keen on this. We don't really have HW support in several RDMA scenarios for even catastrophic unpin. Gal, can EFA even do this for a MR? You basically have to resize the rkey/lkey to zero length (or invalidate it like a FMR) under the catastrophic revoke. The rkey/lkey cannot just be destroyed as that opens a security problem with rkey/lkey re-use. I think I saw EFA's current out of tree implementations had this bug. > to do is mmap revoke), and I think that model of exclusive device > ownership with the option to revoke fits pretty well for at least some > of the accelerators floating around. In that case importers would > never get a move_notify (maybe we should call this revoke_notify to > make it clear it's a bit different) callback, except when the entire > thing has been yanked. I think that would fit pretty well for VFIO, > and I think we should be able to make it work for rdma too as some > kind of auto-deregister. The locking might be fun with both of these > since I expect some inversions compared to the register path, we'll > have to figure these out. It fits semantically nicely, VFIO also has a revoke semantic for BAR mappings. The challenge is the RDMA side which doesn't have a 'dma disabled error state' for objects as part of the spec. Some HW, like mlx5, can implement this for MR objects (see revoke_mr), but I don't know if anything else can, and even mlx5 currently can't do a revoke for any other object type. I don't know how useful it would be, need to check on some of the use cases.
The locking is tricky as we have to issue a device command, but that device command cannot run concurrently with destruction or the tail part of creation. Jason
Re: [PATCH v2 56/63] RDMA/mlx5: Use struct_group() to zero struct mlx5_ib_mr
On Thu, Aug 19, 2021 at 11:14:37AM -0700, Kees Cook wrote: > Which do you mean? When doing the conversions I tended to opt for > struct_group() since it provides more robust "intentionality". Strictly > speaking, the new memset helpers are doing field-spanning writes, but the > "clear to the end" pattern was so common it made sense to add the helpers, > as they're a bit less disruptive. It's totally up to you! :) Well, of the patches you cc'd to me only this one used the struct group.. Jason
Re: [PATCH v8 01/34] opp: Add dev_pm_opp_sync() helper
On Thu, 19 Aug 2021 at 21:35, Dmitry Osipenko wrote: > > 19.08.2021 16:07, Ulf Hansson wrote: > > On Wed, 18 Aug 2021 at 17:43, Dmitry Osipenko wrote: > >> > >> 18.08.2021 13:08, Ulf Hansson wrote: > >>> On Wed, 18 Aug 2021 at 11:50, Viresh Kumar > >>> wrote: > > On 18-08-21, 11:41, Ulf Hansson wrote: > > On Wed, 18 Aug 2021 at 11:14, Viresh Kumar > > wrote: > >> What we need here is just configure. So something like this then: > >> > >> - genpd->get_performance_state() > >> -> dev_pm_opp_get_current_opp() //New API > >> -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate); > >> > >> This can be done just once from probe() then. > > > > How would dev_pm_opp_get_current_opp() work? Do you have a suggestion? > > The opp core already has a way of finding current OPP, that's what > Dmitry is trying to use here. It finds it using clk_get_rate(), if > that is zero, it picks the lowest freq possible. > > > I am sure I understand the problem. When a device is getting probed, > > it needs to consume power, how else can the corresponding driver > > successfully probe it? > > Dmitry can answer that better, but a device doesn't necessarily need > to consume energy in probe. It can consume bus clock, like APB we > have, but the more energy consuming stuff can be left disabled until > the time a user comes up. Probe will just end up registering the > driver and initializing it. > >>> > >>> That's perfectly fine, as then it's likely that it won't vote for an > >>> OPP, but can postpone that as well. > >>> > >>> Perhaps the problem is rather that the HW may already carry a non-zero > >>> vote made from a bootloader. If the consumer driver tries to clear > >>> that vote (calling dev_pm_opp_set_rate(dev, 0), for example), it would > >>> still not lead to any updates of the performance state in genpd, > >>> because genpd internally has initialized the performance-state to > >>> zero.
> >> > >> We don't need to discover internal SoC devices because we use > >> device-tree on ARM. For most devices power isn't required at a probe > >> time because probe function doesn't touch h/w at all, thus devices are > >> left in suspended state after probe. > >> > >> We have three components comprising PM on Tegra: > >> > >> 1. Power gate > >> 2. Clock state > >> 3. Voltage state > >> > >> GENPD on/off represents the 'power gate'. > >> > >> Clock and reset are controlled by device drivers using clk and rst APIs. > >> > >> Voltage state is represented by GENPD's performance level. > >> > >> GENPD core assumes that at a first rpm-resume of a consumer device, its > >> genpd_performance=0. Not true for Tegra because h/w of the device is > >> preconfigured to a non-zero perf level initially, h/w may not support > >> zero level at all. > > > > I think you may be misunderstanding genpd's behaviour around this, but > > let me elaborate. > > > > In genpd_runtime_resume(), we try to restore the performance state for > > the device that genpd_runtime_suspend() *may* have dropped earlier. > > That means, if genpd_runtime_resume() is called prior > > genpd_runtime_suspend() for the first time, it means that > > genpd_runtime_resume() will *not* restore a performance state, but > > instead just leave the performance state as is for the device (see > > genpd_restore_performance_state()). > > > > In other words, a consumer driver may use the following sequence to > > set an initial performance state for the device during ->probe(): > > > > ... > > rate = clk_get_rate() > > dev_pm_opp_set_rate(rate) > > > > pm_runtime_enable() > > pm_runtime_resume_and_get() > > ... > > > > Note that, it's the consumer driver's responsibility to manage device > > specific resources, in its ->runtime_suspend|resume() callbacks. > > Typically that means dealing with clock gating/ungating, for example. 
> > > > In the other scenario where a consumer driver prefers to *not* call > > pm_runtime_resume_and_get() in its ->probe(), because it doesn't need > > to power on the device to complete probing, then we don't want to vote > > for an OPP at all - and we also want the performance state for the > > device in genpd to be set to zero. Correct? > > Yes > > > Is this the main problem you are trying to solve, because I think this > > doesn't work out of the box as of today? > > The main problem is that the restored performance state is zero for the > first genpd_runtime_resume(), while it's not zero from the h/w perspective. This should not be a problem, but can be handled by the consumer driver. genpd_runtime_resume() calls genpd_restore_performance_state() to restore a performance state for the device. However, in the scenario you describe, "gpd_data->rpm_pstate" is zero, which makes genpd_restore_performance_state() to just leave the device's performance state as is - it will *not* restore the performance state to zero. To make the consu
Re: [PATCH v8 01/34] opp: Add dev_pm_opp_sync() helper
On Fri, 20 Aug 2021 at 07:18, Viresh Kumar wrote: > > On 19-08-21, 16:55, Ulf Hansson wrote: > > Right, that sounds reasonable. > > > > We already have pm_genpd_opp_to_performance_state() which translates > > an OPP to a performance state. This function invokes the > > ->opp_to_performance_state() for a genpd. Maybe we need to allow a > > genpd to not have ->opp_to_performance_state() callback assigned > > though, but continue up in the hierarchy to see if the parent has the > > callback assigned, to make this work for Tegra? > > > > Perhaps we should add an API dev_pm_genpd_opp_to_performance_state(), > > allowing us to pass the device instead of the genpd. But that's a > > minor thing. > > I am not concerned a lot about how it gets implemented, and am not > sure as well, as I haven't looked into these details since sometime. > Any reasonable thing will be accepted, as simple as that. > > > Finally, the precondition to use the above, is to first get a handle > > to an OPP table. This is where I am struggling to find a generic > > solution, because I guess that would be platform or even consumer > > driver specific for how to do this. And at what point should we do > > this? > > Hmm, I am not very clear with the whole picture at this point of time. > > Dmitry, can you try to frame a sequence of events/calls/etc that will > define what kind of devices we are looking at here, and how this can > be made to work ? > > > > > Viresh, please take a look at what I did in [1]. Maybe it could be done > > > > in another way. > > > > > > I looked into this and looked like too much trouble. The > > > implementation needs to be simple. I am not sure I understand all the > > > problems you faced while doing that, would be better to start with a > > > simpler implementation of get_performance_state() kind of API for > > > genpd, after the domain is attached and its OPP table is initialized. 
> > > > > > Note, that the OPP table isn't required to be fully initialized for > > > the device at this point, we can parse the DT as well if need be. > > > > Sure, but as I indicated above, you need some kind of input data to > > figure out what OPP table to pick, before you can translate that into > > a performance state. Is that always the clock rate, for example? > > Eventually it can be clock, bandwidth, or pstate of another genpd, not > sure what all we are looking for now. It should be just clock right > now as far as I can imagine :) > > > Perhaps, we should start with adding a dev_pm_opp_get_from_rate() or > > what do you think? Do you have other suggestions? > > We already have similar APIs, so that won't be a problem. We also have > a mechanism inside the OPP core, frequency based, which is used to > guess the current OPP. Maybe we can enhance and use that directly > here. After reading the last reply from Dmitry, I am starting to think that the problem he is facing can be described and solved in a much easier way. If I am correct, it looks like we don't need to add APIs to get OPPs for a clock rate or set initial performance state values according to the HW in genpd. See my other response to Dmitry, let's see where that leads us. Kind regards Uffe
Re: [PATCH v2 22/63] HID: cp2112: Use struct_group() for memcpy() region
On Tue, 17 Aug 2021, Kees Cook wrote: > In preparation for FORTIFY_SOURCE performing compile-time and run-time > field bounds checking for memcpy(), memmove(), and memset(), avoid > intentionally writing across neighboring fields. > > Use struct_group() in struct cp2112_string_report around members report, > length, type, and string, so they can be referenced together. This will > allow memcpy() and sizeof() to more easily reason about sizes, improve > readability, and avoid future warnings about writing beyond the end of > report. > > "pahole" shows no size nor member offset changes to struct > cp2112_string_report. "objdump -d" shows no meaningful object > code changes (i.e. only source line number induced differences.) > > Cc: Jiri Kosina > Cc: Benjamin Tissoires > Cc: linux-in...@vger.kernel.org > Signed-off-by: Kees Cook Applied, thanks. -- Jiri Kosina SUSE Labs
Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event
On Tue, 17 Aug 2021, Kees Cook wrote: > In preparation for FORTIFY_SOURCE performing compile-time and run-time > field bounds checking for memset(), avoid intentionally writing across > neighboring fields. > > Add struct_group() to mark region of struct kone_mouse_event that should > be initialized to zero. > > Cc: Stefan Achatz > Cc: Jiri Kosina > Cc: Benjamin Tissoires > Cc: linux-in...@vger.kernel.org > Signed-off-by: Kees Cook Applied, thank you Kees. -- Jiri Kosina SUSE Labs
Re: [PATCH v8 07/34] clk: tegra: Support runtime PM and power domain
[...] > > > > I'm creating platform device for the clocks that require DVFS. These > > clocks don't use regulator, they are attached to the CORE domain. > > GENPD framework manages the performance state, aggregating perf votes > > from each device, i.e. from each clock individually. > > > > You want to reinvent another layer of aggregation on top of GENPD. > > This isn't worth the effort, we won't get anything from it, it > > would be a lot of extra complexity for nothing. We will also lose > > from it because pm_genpd_summary won't show you per-device info.
> >
> > domain                                          status       children   performance
> >    /device                                                   runtime status
> > ----------------------------------------------------------------------
> > heg                                             on                      100
> >    /devices/soc0/5000.host1x                                 active     100
> >    /devices/soc0/5000.host1x/5414.gr2d                       suspended  0
> > mpe                                             off-0                   0
> > vdec                                            off-0                   0
> >    /devices/soc0/6001a000.vde                                suspended  0
> > venc                                            off-0                   0
> > 3d1                                             off-0                   0
> >    /devices/genpd:1:5418.gr3d                                suspended  0
> > 3d0                                             off-0                   0
> >    /devices/genpd:0:5418.gr3d                                suspended  0
> > core-domain                                     on                      100
> >    3d0, 3d1, venc, vdec, mpe, heg
> >    /devices/soc0/7d00.usb                                    active     100
> >    /devices/soc0/78000400.mmc                                active     95
> >    /devices/soc0/7000f400.memory-controller                  unsupported 100
> >    /devices/soc0/7000a000.pwm                                active     100
> >    /devices/soc0/60006000.clock/tegra_clk_pll_c              active     100
> >    /devices/soc0/60006000.clock/tegra_clk_pll_e              suspended  0
> >    /devices/soc0/60006000.clock/tegra_clk_pll_m              active     100
> >    /devices/soc0/60006000.clock/tegra_clk_sclk               active     100
> >
> I suppose if there's really no good way of doing this other than > providing a struct device, then so be it. I think the cleaned up sysfs > shown in the summary above looks much better than what the original > would've looked like. > > Perhaps an additional tweak to that would be to not create platform > devices. Instead, just create struct device. Those really have > everything you need (.of_node, and can be used with RPM and GENPD).
> As I mentioned earlier, a platform device implies a CPU-memory-mapped bus, > which this clearly isn't. It's kind of a separate "bus" if you want, so > just using struct device directly seems more appropriate. Just a heads up. If you don't use a platform device or have a driver associated with it for probing, you need to manage the attachment to genpd yourself. That means calling one of the dev_pm_domain_attach*() APIs, but that's perfectly fine, of course. > > We did something similar for XUSB pads, see drivers/phy/tegra/xusb.[ch] > for an example of how that was done. I think you can do something > similar here. > > Thierry Kind regards Uffe
Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
I believe we have some minor confusion here. On 2021-08-20 4:09 a.m., Jingwen Chen wrote: Hi all, I just submitted a v3 patch according to your opinion on using kthread_park instead. Thanks, Jingwen On Fri, Aug 20, 2021 at 09:20:42AM +0200, Christian König wrote: No, that perfectly works for me. The problem we used to have with this approach was that we potentially have multiple timeouts at the same time. But when we serialize the timeout handling by using a single workqueue as suggested by Daniel now as well, then that isn't an issue any more. While we do use a single work queue by default (system_wq) for this, we use different work items, one per scheduler, which means they still run in parallel. I didn't see the original mail by Daniel, but from what Christian mentioned I assume he suggested serializing all TO handlers from all possible engines by either using a single work item for the TO handler or by using a single-threaded queue for all TO handlers. So I believe it's premature to send a v3 patch without also switching all TDR handling to actual single-threaded handling per entire ASIC, and in the case of amdgpu we actually need to consider XGMI hives, so it goes beyond a single device. Andrey Regards, Christian. On 20.08.21 at 09:12, Liu, Monk wrote: [AMD Official Use Only] @Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian Do you have any concern on the kthread_park() approach? Theoretically speaking, sched_main shall run there exclusively with job_timeout since they both touch jobs, and stopping the scheduler during job_timeout won't impact performance since in that scenario there was already something wrong/stuck on that ring/scheduler. Thanks -- Monk Liu | Cloud-GPU Core team -- -Original Message- From: Liu, Monk Sent: Thursday, August 19, 2021 6:26 PM To: Daniel Vetter ; Grodzovsky, Andrey Cc: Alex Deucher ; Chen, JingWen ; Maling list - DRI developers ; amd-gfx list ; Koenig, Christian Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
[AMD Official Use Only] Hi Daniel

>> Why can't we stop the scheduler thread first, so that there's guaranteed no race? I've recently had a lot of discussions with panfrost folks about their reset that spans across engines, and without stopping the scheduler thread first before you touch anything it's just plain impossible.

Yeah, we had this thought as well in our mind. Our second approach is to call kthread_stop() in the job_timedout() routine so that the "bad" job is guaranteed to be used without the scheduler touching or freeing it. Please check this sample patch as well:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
 	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+	kthread_park(sched->thread);
 	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
 
 	if (job) {
-		/*
-		 * Remove the bad job so it cannot be freed by concurrent
-		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
-		 * is parked at which point it's safe.
-		 */
-		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);
 
 		status = job->sched->ops->timedout_job(job);
@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	} else {
 		spin_unlock(&sched->job_list_lock);
 	}
+	kthread_unpark(sched->thread);
 
 	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
 		spin_lock(&sched->job_list_lock);
@@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 	kthread_park(sched->thread);
 
 	/*
-	 * Reinsert back the bad job here - now it's safe as
-	 * drm_sched_get_cleanup_job cannot race against us and release the
-	 * bad job at this point - we parked (waited for) any in progress
-	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-	 * now until the scheduler thread is unparked.
-	 */
-	if (bad && bad->sched == sched)
-		/*
-		 * Add at the head of the queue to reflect it was the earliest
-		 * job extracted.
-		 */
-		list_add(&bad->list, &sched->pending_list);
-
-	/*
 	 * Iterate the job list from later to earlier one and either deactive
 	 * their HW callbacks or remove them from pending list if they already
 	 * signaled.
Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."
On 2021-08-20 3:12 a.m., Liu, Monk wrote: [AMD Official Use Only] @Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian Do you have any concern on the kthread_park() approach? Theoretically speaking, sched_main shall run there exclusively with job_timeout since they both touch jobs, and stopping the scheduler during job_timeout won't impact performance since in that scenario there was already something wrong/stuck on that ring/scheduler. Regarding the last paragraph, and specifically the claim that there was already something wrong if the TO handler starts execution - I'm not sure about this, and I wonder if we have a potential bug here - when we start the timeout timer in drm_sched_job_begin we do it for each new incoming job. In a constant rapid stream of jobs, each new job coming in will try to start the timer, but most of the time this operation just bails out as there is already a pending timer from one of the previous jobs, which cancels out any new ones [1]. So when the TO handler does execute eventually, it's not because something is wrong but simply because the TO has expired. If in this case the pending list is not empty, a false TDR will be triggered. I think long ago we used a TO handler per job and not per scheduler; this would solve this problem but hurt the serialization issue we are trying to solve. So not sure what to do. [1] - https://elixir.bootlin.com/linux/v5.14-rc1/source/kernel/workqueue.c#L1665 Andrey Thanks -- Monk Liu | Cloud-GPU Core team -- -Original Message- From: Liu, Monk Sent: Thursday, August 19, 2021 6:26 PM To: Daniel Vetter ; Grodzovsky, Andrey Cc: Alex Deucher ; Chen, JingWen ; Maling list - DRI developers ; amd-gfx list ; Koenig, Christian Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job." [AMD Official Use Only] Hi Daniel >> Why can't we stop the scheduler thread first, so that there's guaranteed no race?
>> I've recently had a lot of discussions with panfrost folks about their reset that spans across engines, and without stopping the scheduler thread first before you touch anything it's just plain impossible.

Yeah, we had this thought as well in our mind. Our second approach is to call kthread_stop() in the job_timedout() routine so that the "bad" job is guaranteed to be used without the scheduler touching or freeing it. Please check this sample patch as well:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
 	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+	kthread_park(sched->thread);
 	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);
 
 	if (job) {
-		/*
-		 * Remove the bad job so it cannot be freed by concurrent
-		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
-		 * is parked at which point it's safe.
-		 */
-		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);
 
 		status = job->sched->ops->timedout_job(job);
@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	} else {
 		spin_unlock(&sched->job_list_lock);
 	}
+	kthread_unpark(sched->thread);
 
 	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
 		spin_lock(&sched->job_list_lock);
@@ -393,20 +389,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 	kthread_park(sched->thread);
 
 	/*
-	 * Reinsert back the bad job here - now it's safe as
-	 * drm_sched_get_cleanup_job cannot race against us and release the
-	 * bad job at this point - we parked (waited for) any in progress
-	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-	 * now until the scheduler thread is unparked.
-	 */
-	if (bad && bad->sched == sched)
-		/*
-		 * Add at the head of the queue to reflect it was the earliest
-		 * job extracted.
-		 */
-		list_add(&bad->list, &sched->pending_list);
-
-	/*
 	 * Iterate the job list from later to earlier one and either deactive
 	 * their HW callbacks or remove them from pending list if they already
 	 * signaled.

Thanks -- Monk Liu | Cloud-GPU Core team -- -Original Message-
Re: [RFC] Make use of non-dynamic dmabuf in RDMA
On Fri, Aug 20, 2021 at 03:58:33PM +0300, Gal Pressman wrote: > Though it would've been nicer if we could agree on a solution that could work > for more than 1-2 RDMA devices, using the existing tools the RDMA subsystem > has. I don't think it can really be done, revoke is necessary, and isn't a primitive we have today. Revoke is sort of like rereg MR, but with a guaranteed no-change to the lkey/rkey Then there is the locking complexity of linking the mr creation and destruction to the lifecycle of the pages, which is messy and maybe not general. For instance mlx5 would call its revoke_mr, disconnect the dmabuf then destroy the mkey - but this is only safe because mlx5 HW can handle concurrent revokes. > That's why I tried to approach this by denying such attachments for non-ODP > importers instead of exposing a "limited" dynamic importer. That is fine if there is no revoke - once revoke exists we must have driver and HW support. Jason
Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event
On Fri, Aug 20, 2021, 6:02 AM Jiri Kosina wrote: > On Tue, 17 Aug 2021, Kees Cook wrote: > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time > > field bounds checking for memset(), avoid intentionally writing across > > neighboring fields. > > > > Add struct_group() to mark region of struct kone_mouse_event that should > > be initialized to zero. > > > > Cc: Stefan Achatz > > Cc: Jiri Kosina > > Cc: Benjamin Tissoires > > Cc: linux-in...@vger.kernel.org > > Signed-off-by: Kees Cook > > Applied, thank you Kees. > Eek! No, this will break the build: struct_group() is not yet in the tree. I can carry this with an Ack, etc. -Kees
Re: [PATCH v2 22/63] HID: cp2112: Use struct_group() for memcpy() region
On Fri, Aug 20, 2021, 6:01 AM Jiri Kosina wrote: > On Tue, 17 Aug 2021, Kees Cook wrote: > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time > > field bounds checking for memcpy(), memmove(), and memset(), avoid > > intentionally writing across neighboring fields. > > > > Use struct_group() in struct cp2112_string_report around members report, > > length, type, and string, so they can be referenced together. This will > > allow memcpy() and sizeof() to more easily reason about sizes, improve > > readability, and avoid future warnings about writing beyond the end of > > report. > > > > "pahole" shows no size nor member offset changes to struct > > cp2112_string_report. "objdump -d" shows no meaningful object > > code changes (i.e. only source line number induced differences.) > > > > Cc: Jiri Kosina > > Cc: Benjamin Tissoires > > Cc: linux-in...@vger.kernel.org > > Signed-off-by: Kees Cook > > Applied, thanks. > Same for this one: it's part of the larger series. -Kees >
Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event
On Fri, 20 Aug 2021, Kees Cook wrote: > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time > > > field bounds checking for memset(), avoid intentionally writing across > > > neighboring fields. > > > > > > Add struct_group() to mark region of struct kone_mouse_event that should > > > be initialized to zero. > > > > > > Cc: Stefan Achatz > > > Cc: Jiri Kosina > > > Cc: Benjamin Tissoires > > > Cc: linux-in...@vger.kernel.org > > > Signed-off-by: Kees Cook > > > > Applied, thank you Kees. > > > > Eek! No, this will break the build: struct_group() is not yet in the tree. > I can carry this with an Ack, etc. I was pretty sure I saw struct_group() already in linux-next, but that was apparently a vacation-induced brainfart, sorry. Dropping. -- Jiri Kosina SUSE Labs
Re: [PATCH] drm/amd/pm: Add destination bounds checking to struct copy
On Thu, Aug 19, 2021 at 4:14 PM Kees Cook wrote: > > In preparation for FORTIFY_SOURCE performing compile-time and run-time > field bounds checking for memcpy(), memmove(), and memset(), avoid > intentionally writing across neighboring fields. > > The "Board Parameters" members of the structs: > struct atom_smc_dpm_info_v4_5 > struct atom_smc_dpm_info_v4_6 > struct atom_smc_dpm_info_v4_7 > struct atom_smc_dpm_info_v4_10 > are written to the corresponding members of the corresponding PPTable_t > variables, but they lack destination size bounds checking, which means > the compiler cannot verify at compile time that this is an intended and > safe memcpy(). > > Since the header files are effectively immutable[1] and a struct_group() > cannot be used, nor a common struct referenced by both sides of the > memcpy() arguments, add a new helper, memcpy_trailing(), to perform the > bounds checking at compile time. Replace the open-coded memcpy()s with > memcpy_trailing() which includes enough context for the bounds checking. > > "objdump -d" shows no object code changes. > > [1] https://lore.kernel.org/lkml/e56aad3c-a06f-da07-f491-a894a570d...@amd.com > > Cc: Lijo Lazar > Cc: "Christian König" > Cc: "Pan, Xinhui" > Cc: David Airlie > Cc: Daniel Vetter > Cc: Hawking Zhang > Cc: Feifei Xu > Cc: Likun Gao > Cc: Jiawei Gu > Cc: Evan Quan > Cc: amd-...@lists.freedesktop.org > Cc: dri-devel@lists.freedesktop.org > Signed-off-by: Kees Cook > Link: > https://lore.kernel.org/lkml/cadnq5_npb8uyvd+r4uhgf-w8-cqj3joodjvijr_y9w9wqj7...@mail.gmail.com > --- > Alex, I dropped your prior Acked-by, since the implementation is very > different. If you're still happy with it, I can add it back. 
:) This looks reasonable to me: Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 25 +++ > .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 6 ++--- > .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 8 +++--- > .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 5 ++-- > 4 files changed, 33 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > index 96e895d6be35..4605934a4fb7 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > @@ -1446,4 +1446,29 @@ static inline int amdgpu_in_reset(struct amdgpu_device > *adev) > { > return atomic_read(&adev->in_gpu_reset); > } > + > +/** > + * memcpy_trailing - Copy the end of one structure into the middle of another > + * > + * @dst: Pointer to destination struct > + * @first_dst_member: The member name in @dst where the overwrite begins > + * @last_dst_member: The member name in @dst where the overwrite ends after > + * @src: Pointer to the source struct > + * @first_src_member: The member name in @src where the copy begins > + * > + */ > +#define memcpy_trailing(dst, first_dst_member, last_dst_member, > \ > + src, first_src_member) \ > +({\ > + size_t __src_offset = offsetof(typeof(*(src)), first_src_member); \ > + size_t __src_size = sizeof(*(src)) - __src_offset; \ > + size_t __dst_offset = offsetof(typeof(*(dst)), first_dst_member); \ > + size_t __dst_size = offsetofend(typeof(*(dst)), last_dst_member) - \ > + __dst_offset; \ > + BUILD_BUG_ON(__src_size != __dst_size);\ > + __builtin_memcpy((u8 *)(dst) + __dst_offset, \ > +(u8 *)(src) + __src_offset, \ > +__dst_size); \ > +}) > + > #endif > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c > index 8ab58781ae13..1918e6232319 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c > @@ -465,10 +465,8 @@ static int 
arcturus_append_powerplay_table(struct > smu_context *smu) > > if ((smc_dpm_table->table_header.format_revision == 4) && > (smc_dpm_table->table_header.content_revision == 6)) > - memcpy(&smc_pptable->MaxVoltageStepGfx, > - &smc_dpm_table->maxvoltagestepgfx, > - sizeof(*smc_dpm_table) - offsetof(struct > atom_smc_dpm_info_v4_6, maxvoltagestepgfx)); > - > + memcpy_trailing(smc_pptable, MaxVoltageStepGfx, BoardReserved, > + smc_dpm_table, maxvoltagestepgfx); > return 0; > } > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c > index 2e5d3669652b..b738042e064d 100644 > --- a/drivers/gpu/drm/amd/pm/sws
Re: [PATCH v8 12/13] drm/mediatek: add MERGE support for mediatek-drm
Hi, Jason: On Thu, Aug 19, 2021 at 10:23 AM, jason-jh.lin wrote: > > Add MERGE engine file: > MERGE module is used to merge two slice-per-line inputs > into one side-by-side output. > > Signed-off-by: jason-jh.lin > --- > drivers/gpu/drm/mediatek/Makefile | 1 + > drivers/gpu/drm/mediatek/mtk_disp_drv.h | 8 + > drivers/gpu/drm/mediatek/mtk_disp_merge.c | 268 > drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 16 ++ > drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h | 1 + > drivers/gpu/drm/mediatek/mtk_drm_drv.c | 2 + > drivers/gpu/drm/mediatek/mtk_drm_drv.h | 1 + > 7 files changed, 297 insertions(+) > create mode 100644 drivers/gpu/drm/mediatek/mtk_disp_merge.c > > diff --git a/drivers/gpu/drm/mediatek/Makefile > b/drivers/gpu/drm/mediatek/Makefile > index dc54a7a69005..538e0087a44c 100644 > --- a/drivers/gpu/drm/mediatek/Makefile > +++ b/drivers/gpu/drm/mediatek/Makefile > @@ -3,6 +3,7 @@ > mediatek-drm-y := mtk_disp_ccorr.o \ > mtk_disp_color.o \ > mtk_disp_gamma.o \ > + mtk_disp_merge.o \ > mtk_disp_ovl.o \ > mtk_disp_rdma.o \ > mtk_drm_crtc.o \ > diff --git a/drivers/gpu/drm/mediatek/mtk_disp_drv.h > b/drivers/gpu/drm/mediatek/mtk_disp_drv.h > index cafd9df2d63b..f407cd9d873e 100644 > --- a/drivers/gpu/drm/mediatek/mtk_disp_drv.h > +++ b/drivers/gpu/drm/mediatek/mtk_disp_drv.h > @@ -46,6 +46,14 @@ void mtk_gamma_set_common(void __iomem *regs, struct > drm_crtc_state *state); > void mtk_gamma_start(struct device *dev); > void mtk_gamma_stop(struct device *dev); > > +int mtk_merge_clk_enable(struct device *dev); > +void mtk_merge_clk_disable(struct device *dev); > +void mtk_merge_config(struct device *dev, unsigned int width, > + unsigned int height, unsigned int vrefresh, > + unsigned int bpc, struct cmdq_pkt *cmdq_pkt); > +void mtk_merge_start(struct device *dev); > +void mtk_merge_stop(struct device *dev); > + > void mtk_ovl_bgclr_in_on(struct device *dev); > void mtk_ovl_bgclr_in_off(struct device *dev); > void mtk_ovl_bypass_shadow(struct device *dev); > diff --git
a/drivers/gpu/drm/mediatek/mtk_disp_merge.c > b/drivers/gpu/drm/mediatek/mtk_disp_merge.c > new file mode 100644 > index ..ebcb646bde9c > --- /dev/null > +++ b/drivers/gpu/drm/mediatek/mtk_disp_merge.c > @@ -0,0 +1,268 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (c) 2021 MediaTek Inc. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "mtk_drm_ddp_comp.h" > +#include "mtk_drm_drv.h" > +#include "mtk_disp_drv.h" > + > +#define DISP_REG_MERGE_CTRL 0x000 > +#define MERGE_EN 1 > +#define DISP_REG_MERGE_CFG_0 0x010 > +#define DISP_REG_MERGE_CFG_4 0x020 > +#define DISP_REG_MERGE_CFG_10 0x038 > +/* no swap */ > +#define SWAP_MODE 0 > +#define FLD_SWAP_MODE GENMASK(4, 0) > +#define DISP_REG_MERGE_CFG_12 0x040 > +#define CFG_10_10_1PI_2PO_BUF_MODE 6 > +#define CFG_10_10_2PI_2PO_BUF_MODE 8 > +#define FLD_CFG_MERGE_MODE GENMASK(4, 0) > +#define DISP_REG_MERGE_CFG_24 0x070 > +#define DISP_REG_MERGE_CFG_25 0x074 > +#define DISP_REG_MERGE_CFG_36 0x0a0 > +#define ULTRA_EN 1 You could use FLD_ULTRA_EN for this. > +#define PREULTRA_EN 1 > +#define HALT_FOR_DVFS_EN 0 You could just not set this. > +#define FLD_ULTRA_EN GENMASK(0, 0) #define FLD_ULTRA_EN BIT(0) Regards, Chun-Kuang. > +#define FLD_PREULTRA_EN GENMASK(4, 4) > +#define FLD_HALT_FOR_DVFS_EN GENMASK(8, 8) > +#define DISP_REG_MERGE_CFG_37 0x0a4 > +/* 0: Off, 1: SRAM0, 2: SRAM1, 3: SRAM0 + SRAM1 */ > +#define BUFFER_MODE 3 > +#define FLD_BUFFER_MODE GENMASK(1, 0) > +#define DISP_REG_MERGE_CFG_38 0x0a8 > +#define FLD_VDE_BLOCK_ULTRA GENMASK(0, 0) > +#define FLD_VALID_TH_BLOCK_ULTRA GENMASK(4, 4) > +#define FLD_ULTRA_FIFO_VALID_TH GENMASK(31, 16) > +#define DISP_REG_MERGE_CFG_39 0x0ac > +#define FLD_NVDE_FORCE_PREULTRA GENMASK(8, 8) > +#define FLD_NVALID_TH_FORCE_PREULTRA GENMASK(12, 12) > +#define FLD_PREULTRA_FIFO_VALID_TH GENMASK(31, 16)
[PATCH V3 5/9] PCI/VGA: Move vga_arb_integrated_gpu() earlier in file
Move vga_arb_integrated_gpu() earlier in file to prepare for future patch. No functional change intended. [bhelgaas: split to separate patch] Signed-off-by: Huacai Chen Signed-off-by: Bjorn Helgaas --- drivers/pci/vgaarb.c | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c index c984c76b3fd7..1f8fb37be5fa 100644 --- a/drivers/pci/vgaarb.c +++ b/drivers/pci/vgaarb.c @@ -563,6 +563,20 @@ void vga_put(struct pci_dev *pdev, unsigned int rsrc) } EXPORT_SYMBOL(vga_put); +#if defined(CONFIG_ACPI) +static bool vga_arb_integrated_gpu(struct device *dev) +{ + struct acpi_device *adev = ACPI_COMPANION(dev); + + return adev && !strcmp(acpi_device_hid(adev), ACPI_VIDEO_HID); +} +#else +static bool vga_arb_integrated_gpu(struct device *dev) +{ + return false; +} +#endif + /* * Rules for using a bridge to control a VGA descendant decoding: if a bridge * has only one VGA descendant then it can be used to control the VGA routing @@ -1416,20 +1430,6 @@ static struct miscdevice vga_arb_device = { MISC_DYNAMIC_MINOR, "vga_arbiter", &vga_arb_device_fops }; -#if defined(CONFIG_ACPI) -static bool vga_arb_integrated_gpu(struct device *dev) -{ - struct acpi_device *adev = ACPI_COMPANION(dev); - - return adev && !strcmp(acpi_device_hid(adev), ACPI_VIDEO_HID); -} -#else -static bool vga_arb_integrated_gpu(struct device *dev) -{ - return false; -} -#endif - static void __init vga_arb_select_default_device(void) { struct pci_dev *pdev, *found = NULL; -- 2.27.0
[PATCH V3 9/9] PCI/VGA: Rework default VGA device selection
Current default VGA device selection fails in some cases:

- On BMC systems, the AST2500 bridge [1a03:1150] does not implement PCI_BRIDGE_CTL_VGA [1]. This is perfectly legal but means the legacy VGA resources won't reach downstream devices unless they're included in the usual bridge windows.

- vga_arb_select_default_device() will set a device below such a bridge as the default VGA device as long as it has PCI_COMMAND_IO and PCI_COMMAND_MEMORY enabled.

- vga_arbiter_add_pci_device() is called for every VGA device, either at boot-time or at hot-add time, and it will also set the device as the default VGA device, but ONLY if all bridges leading to it implement PCI_BRIDGE_CTL_VGA.

- This difference between vga_arb_select_default_device() and vga_arbiter_add_pci_device() means that a device below an AST2500 or similar bridge can only be set as the default if it is enumerated before vga_arb_device_init().

- On ACPI-based systems, PCI devices are enumerated by acpi_init(), which runs before vga_arb_device_init().

- On non-ACPI systems, such as MIPS systems, they are enumerated by pcibios_init(), which typically runs *after* vga_arb_device_init().

So I made vga_arb_update_default_device() to replace the current vga_arb_select_default_device(). It is called from vga_arbiter_add_pci_device() and sets the default device even if it does not own the VGA resources because an upstream bridge doesn't implement PCI_BRIDGE_CTL_VGA. The default VGA device is then updated whenever a better one is found: a device with legacy resources enabled is better, and a device that owns the firmware framebuffer is better still. 
--- drivers/pci/vgaarb.c | 158 ++- 1 file changed, 66 insertions(+), 92 deletions(-) diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c index dd07b1c3205f..0b059a2fc749 100644 --- a/drivers/pci/vgaarb.c +++ b/drivers/pci/vgaarb.c @@ -580,16 +580,79 @@ static bool vga_arb_integrated_gpu(struct device *dev) static void vga_arb_update_default_device(struct vga_device *vgadev) { struct pci_dev *pdev = vgadev->pdev; + struct device *dev = &pdev->dev; + struct vga_device *vgadev_default; +#if defined(CONFIG_X86) || defined(CONFIG_IA64) + int i; + unsigned long flags; + u64 base = screen_info.lfb_base; + u64 size = screen_info.lfb_size; + u64 limit; + resource_size_t start, end; +#endif /* * If we don't have a default VGA device yet, and this device owns * the legacy VGA resources, make it the default. */ - if (!vga_default_device() && - ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) { - vgaarb_info(&pdev->dev, "setting as boot VGA device\n"); + if (!vga_default_device()) { + if ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK) + vgaarb_info(dev, "setting as boot VGA device\n"); + else + vgaarb_info(dev, "setting as boot device (VGA legacy resources not available)\n"); vga_set_default_device(pdev); } + + vgadev_default = vgadev_find(vga_default); + + /* Overridden by a better device */ + if (vgadev_default && ((vgadev_default->owns & VGA_RSRC_LEGACY_MASK) == 0) + && ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) { + vgaarb_info(dev, "overriding boot VGA device\n"); + vga_set_default_device(pdev); + } + + if (vga_arb_integrated_gpu(dev)) { + vgaarb_info(dev, "overriding boot VGA device\n"); + vga_set_default_device(pdev); + } + +#if defined(CONFIG_X86) || defined(CONFIG_IA64) + if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE) + base |= (u64)screen_info.ext_lfb_base << 32; + + limit = base + size; + + /* +* Override vga_arbiter_add_pci_device()'s I/O based detection +* as it may take the wrong device (e.g. 
on Apple system under +* EFI). +* +* Select the device owning the boot framebuffer if there is +* one. +*/ + + /* Does firmware framebuffer belong to us? */ + for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { + flags = pci_resource_flags(vgadev->pdev, i); + + if ((flags & IORESOURCE_MEM) == 0) + continue; + + start = pci_resource_start(vgadev->pdev, i); + end = pci_resource_end(vgadev->pdev, i); + + if (!start || !end) + continue; + + if (base < start || limit >= end) + continue; + + if (vgadev->pdev != vga_default_device()) + vgaarb_info(dev, "overriding boot device\n"); + vga_set_default_device(vgadev->pdev); + } +#endif }
[PATCH V3 2/9] PCI/VGA: Replace full MIT license text with SPDX identifier
From: Bjorn Helgaas Per Documentation/process/license-rules.rst, the SPDX MIT identifier is equivalent to including the entire MIT license text from LICENSES/preferred/MIT. Replace the MIT license text with the equivalent SPDX identifier. Signed-off-by: Bjorn Helgaas --- drivers/pci/vgaarb.c | 23 +-- 1 file changed, 1 insertion(+), 22 deletions(-) diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c index 949fde433ea2..61b57abcb014 100644 --- a/drivers/pci/vgaarb.c +++ b/drivers/pci/vgaarb.c @@ -1,32 +1,11 @@ +// SPDX-License-Identifier: MIT /* * vgaarb.c: Implements the VGA arbitration. For details refer to * Documentation/gpu/vgaarbiter.rst * - * * (C) Copyright 2005 Benjamin Herrenschmidt * (C) Copyright 2007 Paulo R. Zanoni * (C) Copyright 2007, 2009 Tiago Vignatti - * - * Permission is hereby granted, free of charge, to any person obtaining a - * copy of this software and associated documentation files (the "Software"), - * to deal in the Software without restriction, including without limitation - * the rights to use, copy, modify, merge, publish, distribute, sublicense, - * and/or sell copies of the Software, and to permit persons to whom the - * Software is furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice (including the next - * paragraph) shall be included in all copies or substantial portions of the - * Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER - * DEALINGS - * IN THE SOFTWARE. - * */ #define pr_fmt(fmt) "vgaarb: " fmt -- 2.27.0
Re: [syzbot] WARNING in drm_gem_shmem_vm_open
syzbot has bisected this issue to: commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2 Author: Daniel Vetter Date: Fri Oct 9 23:21:56 2020 + drm/vkms: fbdev emulation support bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11c31d5530 start commit: 614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern.. git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d5530 console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d5530 kernel config: https://syzkaller.appspot.com/x/.config?x=96f0602203250753 dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30 Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support") For information about bisection process see: https://goo.gl/tpsmEJ#bisection
[PATCH V3 8/9] PCI/VGA: Log bridge control messages when adding devices
Previously vga_arb_device_init() iterated through all VGA devices and indicated whether legacy VGA routing to each could be controlled by an upstream bridge. But we determine that information in vga_arbiter_add_pci_device(), which we call for every device, so we can log it there without iterating through the VGA devices again. Note that we call vga_arbiter_check_bridge_sharing() before adding the device to vga_list, so we have to handle the very first device separately. [bhelgaas: commit log, split another piece to separate patch, fix list_empty() issue] Link: https://lore.kernel.org/r/20210705100503.1120643-1-chenhua...@loongson.cn Signed-off-by: Huacai Chen Signed-off-by: Bjorn Helgaas --- drivers/pci/vgaarb.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c index 4cecb599f5ed..dd07b1c3205f 100644 --- a/drivers/pci/vgaarb.c +++ b/drivers/pci/vgaarb.c @@ -609,8 +609,10 @@ static void vga_arbiter_check_bridge_sharing(struct vga_device *vgadev) vgadev->bridge_has_one_vga = true; - if (list_empty(&vga_list)) + if (list_empty(&vga_list)) { + vgaarb_info(&vgadev->pdev->dev, "bridge control possible\n"); return; + } /* okay iterate the new devices bridge hierarachy */ new_bus = vgadev->pdev->bus; @@ -649,6 +651,11 @@ static void vga_arbiter_check_bridge_sharing(struct vga_device *vgadev) } new_bus = new_bus->parent; } + + if (vgadev->bridge_has_one_vga) + vgaarb_info(&vgadev->pdev->dev, "bridge control possible\n"); + else + vgaarb_info(&vgadev->pdev->dev, "no bridge control possible\n"); } /* @@ -1527,7 +1534,6 @@ static int __init vga_arb_device_init(void) { int rc; struct pci_dev *pdev; - struct vga_device *vgadev; rc = misc_register(&vga_arb_device); if (rc < 0) @@ -1543,15 +1549,6 @@ static int __init vga_arb_device_init(void) PCI_ANY_ID, pdev)) != NULL) vga_arbiter_add_pci_device(pdev); - list_for_each_entry(vgadev, &vga_list, list) { - struct device *dev = &vgadev->pdev->dev; - - if 
(vgadev->bridge_has_one_vga) - vgaarb_info(dev, "bridge control possible\n"); - else - vgaarb_info(dev, "no bridge control possible\n"); - } - vga_arb_select_default_device(); pr_info("loaded\n"); -- 2.27.0
[PATCH V3 7/9] PCI/VGA: Split out vga_arb_update_default_device()
If there's no default VGA device, and we find a VGA device that owns the legacy VGA resources, we make that device the default. Split this logic out from vga_arbiter_add_pci_device() into a new function, vga_arb_update_default_device(). [bhelgaas: split another piece to separate patch] Signed-off-by: Huacai Chen Signed-off-by: Bjorn Helgaas --- drivers/pci/vgaarb.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c index a6a5864ff538..4cecb599f5ed 100644 --- a/drivers/pci/vgaarb.c +++ b/drivers/pci/vgaarb.c @@ -577,6 +577,21 @@ static bool vga_arb_integrated_gpu(struct device *dev) } #endif +static void vga_arb_update_default_device(struct vga_device *vgadev) +{ + struct pci_dev *pdev = vgadev->pdev; + + /* +* If we don't have a default VGA device yet, and this device owns +* the legacy VGA resources, make it the default. +*/ + if (!vga_default_device() && + ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) { + vgaarb_info(&pdev->dev, "setting as boot VGA device\n"); + vga_set_default_device(pdev); + } +} + /* * Rules for using a bridge to control a VGA descendant decoding: if a bridge * has only one VGA descendant then it can be used to control the VGA routing @@ -704,15 +719,7 @@ static bool vga_arbiter_add_pci_device(struct pci_dev *pdev) bus = bus->parent; } - /* Deal with VGA default device. Use first enabled one -* by default if arch doesn't have it's own hook -*/ - if (!vga_default_device() && - ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) { - vgaarb_info(&pdev->dev, "setting as boot VGA device\n"); - vga_set_default_device(pdev); - } - + vga_arb_update_default_device(vgadev); vga_arbiter_check_bridge_sharing(vgadev); /* Add to the list */ -- 2.27.0
[PATCH V3 0/9] PCI/VGA: Rework default VGA device selection
My original work is at [1]. Bjorn did some rework and extension in V2: it moves the VGA arbiter to the PCI subsystem, fixes a few nits, and breaks out a few pieces to make the main patch a little smaller. V3 rewrites the commit log of the last patch (which was also summarized by Bjorn). All comments welcome! [1] https://lore.kernel.org/dri-devel/20210705100503.1120643-1-chenhua...@loongson.cn/ Bjorn Helgaas (4): PCI/VGA: Move vgaarb to drivers/pci PCI/VGA: Replace full MIT license text with SPDX identifier PCI/VGA: Use unsigned format string to print lock counts PCI/VGA: Remove empty vga_arb_device_card_gone() Huacai Chen (5): PCI/VGA: Move vga_arb_integrated_gpu() earlier in file PCI/VGA: Prefer vga_default_device() PCI/VGA: Split out vga_arb_update_default_device() PCI/VGA: Log bridge control messages when adding devices PCI/VGA: Rework default VGA device selection Signed-off-by: Huacai Chen Signed-off-by: Bjorn Helgaas --- drivers/gpu/vga/Kconfig | 19 --- drivers/gpu/vga/Makefile | 1 - drivers/pci/Kconfig | 19 +++ drivers/pci/Makefile | 1 + drivers/{gpu/vga => pci}/vgaarb.c | 269 -- 5 files changed, 126 insertions(+), 183 deletions(-) rename drivers/{gpu/vga => pci}/vgaarb.c (90%) -- 2.27.0
[PATCH V3 1/9] PCI/VGA: Move vgaarb to drivers/pci
From: Bjorn Helgaas The VGA arbiter is really PCI-specific and doesn't depend on any GPU things. Move it to the PCI subsystem. Signed-off-by: Bjorn Helgaas --- drivers/gpu/vga/Kconfig | 19 --- drivers/gpu/vga/Makefile | 1 - drivers/pci/Kconfig | 19 +++ drivers/pci/Makefile | 1 + drivers/{gpu/vga => pci}/vgaarb.c | 0 5 files changed, 20 insertions(+), 20 deletions(-) rename drivers/{gpu/vga => pci}/vgaarb.c (100%) diff --git a/drivers/gpu/vga/Kconfig b/drivers/gpu/vga/Kconfig index 1ad4c4ef0b5e..eb8b14ab22c3 100644 --- a/drivers/gpu/vga/Kconfig +++ b/drivers/gpu/vga/Kconfig @@ -1,23 +1,4 @@ # SPDX-License-Identifier: GPL-2.0-only -config VGA_ARB - bool "VGA Arbitration" if EXPERT - default y - depends on (PCI && !S390) - help - Some "legacy" VGA devices implemented on PCI typically have the same - hard-decoded addresses as they did on ISA. When multiple PCI devices - are accessed at same time they need some kind of coordination. Please - see Documentation/gpu/vgaarbiter.rst for more details. Select this to - enable VGA arbiter. - -config VGA_ARB_MAX_GPUS - int "Maximum number of GPUs" - default 16 - depends on VGA_ARB - help - Reserves space in the kernel to maintain resource locking for - multiple GPUS. The overhead for each GPU is very small. 
- config VGA_SWITCHEROO bool "Laptop Hybrid Graphics - GPU switching support" depends on X86 diff --git a/drivers/gpu/vga/Makefile b/drivers/gpu/vga/Makefile index e92064442d60..9800620deda3 100644 --- a/drivers/gpu/vga/Makefile +++ b/drivers/gpu/vga/Makefile @@ -1,3 +1,2 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-$(CONFIG_VGA_ARB) += vgaarb.o obj-$(CONFIG_VGA_SWITCHEROO) += vga_switcheroo.o diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig index 0c473d75e625..7c9e56d7b857 100644 --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -252,6 +252,25 @@ config PCIE_BUS_PEER2PEER endchoice +config VGA_ARB + bool "VGA Arbitration" if EXPERT + default y + depends on (PCI && !S390) + help + Some "legacy" VGA devices implemented on PCI typically have the same + hard-decoded addresses as they did on ISA. When multiple PCI devices + are accessed at same time they need some kind of coordination. Please + see Documentation/gpu/vgaarbiter.rst for more details. Select this to + enable VGA arbiter. + +config VGA_ARB_MAX_GPUS + int "Maximum number of GPUs" + default 16 + depends on VGA_ARB + help + Reserves space in the kernel to maintain resource locking for + multiple GPUS. The overhead for each GPU is very small. + source "drivers/pci/hotplug/Kconfig" source "drivers/pci/controller/Kconfig" source "drivers/pci/endpoint/Kconfig" diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile index d62c4ac4ae1b..ebe720f69b15 100644 --- a/drivers/pci/Makefile +++ b/drivers/pci/Makefile @@ -29,6 +29,7 @@ obj-$(CONFIG_PCI_PF_STUB) += pci-pf-stub.o obj-$(CONFIG_PCI_ECAM) += ecam.o obj-$(CONFIG_PCI_P2PDMA) += p2pdma.o obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o +obj-$(CONFIG_VGA_ARB) += vgaarb.o # Endpoint library must be initialized before its users obj-$(CONFIG_PCI_ENDPOINT) += endpoint/ diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/pci/vgaarb.c similarity index 100% rename from drivers/gpu/vga/vgaarb.c rename to drivers/pci/vgaarb.c -- 2.27.0
[PATCH V3 6/9] PCI/VGA: Prefer vga_default_device()
Use the vga_default_device() interface consistently instead of directly testing vga_default. No functional change intended. [bhelgaas: split to separate patch and extended] Signed-off-by: Huacai Chen Signed-off-by: Bjorn Helgaas --- drivers/pci/vgaarb.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c index 1f8fb37be5fa..a6a5864ff538 100644 --- a/drivers/pci/vgaarb.c +++ b/drivers/pci/vgaarb.c @@ -173,7 +173,7 @@ int vga_remove_vgacon(struct pci_dev *pdev) { int ret = 0; - if (pdev != vga_default) + if (pdev != vga_default_device()) return 0; vgaarb_info(&pdev->dev, "deactivate vga console\n"); @@ -707,7 +707,7 @@ static bool vga_arbiter_add_pci_device(struct pci_dev *pdev) /* Deal with VGA default device. Use first enabled one * by default if arch doesn't have it's own hook */ - if (vga_default == NULL && + if (!vga_default_device() && ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) { vgaarb_info(&pdev->dev, "setting as boot VGA device\n"); vga_set_default_device(pdev); @@ -744,7 +744,7 @@ static bool vga_arbiter_del_pci_device(struct pci_dev *pdev) goto bail; } - if (vga_default == pdev) + if (vga_default_device() == pdev) vga_set_default_device(NULL); if (vgadev->decodes & (VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM)) -- 2.27.0
[PATCH V3 3/9] PCI/VGA: Use unsigned format string to print lock counts
From: Bjorn Helgaas In struct vga_device, io_lock_cnt and mem_lock_cnt are unsigned, but we previously printed them with "%d", the signed decimal format. Print them with the unsigned format "%u" instead. Signed-off-by: Bjorn Helgaas --- drivers/pci/vgaarb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c index 61b57abcb014..e4153ab70481 100644 --- a/drivers/pci/vgaarb.c +++ b/drivers/pci/vgaarb.c @@ -1022,7 +1022,7 @@ static ssize_t vga_arb_read(struct file *file, char __user *buf, /* Fill the buffer with infos */ len = snprintf(lbuf, 1024, - "count:%d,PCI:%s,decodes=%s,owns=%s,locks=%s(%d:%d)\n", + "count:%d,PCI:%s,decodes=%s,owns=%s,locks=%s(%u:%u)\n", vga_decode_count, pci_name(pdev), vga_iostate_to_str(vgadev->decodes), vga_iostate_to_str(vgadev->owns), -- 2.27.0
[PATCH V3 4/9] PCI/VGA: Remove empty vga_arb_device_card_gone()
From: Bjorn Helgaas vga_arb_device_card_gone() has always been empty. Remove it. Signed-off-by: Bjorn Helgaas --- drivers/pci/vgaarb.c | 16 +--- 1 file changed, 1 insertion(+), 15 deletions(-) diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c index e4153ab70481..c984c76b3fd7 100644 --- a/drivers/pci/vgaarb.c +++ b/drivers/pci/vgaarb.c @@ -104,8 +104,6 @@ static int vga_str_to_iostate(char *buf, int str_size, int *io_state) /* this is only used a cookie - it should not be dereferenced */ static struct pci_dev *vga_default; -static void vga_arb_device_card_gone(struct pci_dev *pdev); - /* Find somebody in our list */ static struct vga_device *vgadev_find(struct pci_dev *pdev) { @@ -741,10 +739,6 @@ static bool vga_arbiter_del_pci_device(struct pci_dev *pdev) /* Remove entry from list */ list_del(&vgadev->list); vga_count--; - /* Notify userland driver that the device is gone so it discards -* it's copies of the pci_dev pointer -*/ - vga_arb_device_card_gone(pdev); /* Wake up all possible waiters */ wake_up_all(&vga_wait_queue); @@ -994,9 +988,7 @@ static ssize_t vga_arb_read(struct file *file, char __user *buf, if (lbuf == NULL) return -ENOMEM; - /* Shields against vga_arb_device_card_gone (pci_dev going -* away), and allows access to vga list -*/ + /* Protects vga_list */ spin_lock_irqsave(&vga_lock, flags); /* If we are targeting the default, use it */ @@ -1013,8 +1005,6 @@ static ssize_t vga_arb_read(struct file *file, char __user *buf, /* Wow, it's not in the list, that shouldn't happen, * let's fix us up and return invalid card */ - if (pdev == priv->target) - vga_arb_device_card_gone(pdev); spin_unlock_irqrestore(&vga_lock, flags); len = sprintf(lbuf, "invalid"); goto done; @@ -1358,10 +1348,6 @@ static int vga_arb_release(struct inode *inode, struct file *file) return 0; } -static void vga_arb_device_card_gone(struct pci_dev *pdev) -{ -} - /* * callback any registered clients to let them know we have a * change in VGA cards -- 2.27.0
Re: [PATCH v2 22/63] HID: cp2112: Use struct_group() for memcpy() region
On Fri, Aug 20, 2021 at 03:01:43PM +0200, Jiri Kosina wrote: > On Tue, 17 Aug 2021, Kees Cook wrote: > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time > > field bounds checking for memcpy(), memmove(), and memset(), avoid > > intentionally writing across neighboring fields. > > > > Use struct_group() in struct cp2112_string_report around members report, > > length, type, and string, so they can be referenced together. This will > > allow memcpy() and sizeof() to more easily reason about sizes, improve > > readability, and avoid future warnings about writing beyond the end of > > report. > > > > "pahole" shows no size nor member offset changes to struct > > cp2112_string_report. "objdump -d" shows no meaningful object > > code changes (i.e. only source line number induced differences.) > > > > Cc: Jiri Kosina > > Cc: Benjamin Tissoires > > Cc: linux-in...@vger.kernel.org > > Signed-off-by: Kees Cook > > Applied, thanks. I'm not sure if my other HTML email got through, but please don't apply these to separate trees -- struct_group() is introduced as part of this series. -- Kees Cook
[PATCH] drm/i915: Actually delete gpu reloc selftests
In commit 8e02cceb1f1f4f254625e5338dd997ff61ab40d7 Author: Daniel Vetter Date: Tue Aug 3 14:48:33 2021 +0200 drm/i915: delete gpu reloc code I deleted the gpu relocation code and the selftest include and enabling, but accidentally forgot about the selftest source code. Fix this oversight. Signed-off-by: Daniel Vetter Cc: Jon Bloomfield Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Daniel Vetter Cc: Joonas Lahtinen Cc: "Thomas Hellström" Cc: Matthew Auld Cc: Lionel Landwerlin Cc: Dave Airlie Cc: Jason Ekstrand --- .../i915/gem/selftests/i915_gem_execbuffer.c | 190 -- 1 file changed, 190 deletions(-) delete mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c deleted file mode 100644 index 16162fc2782d.. --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c +++ /dev/null @@ -1,190 +0,0 @@ -// SPDX-License-Identifier: MIT -/* - * Copyright © 2020 Intel Corporation - */ - -#include "i915_selftest.h" - -#include "gt/intel_engine_pm.h" -#include "selftests/igt_flush_test.h" - -static u64 read_reloc(const u32 *map, int x, const u64 mask) -{ - u64 reloc; - - memcpy(&reloc, &map[x], sizeof(reloc)); - return reloc & mask; -} - -static int __igt_gpu_reloc(struct i915_execbuffer *eb, - struct drm_i915_gem_object *obj) -{ - const unsigned int offsets[] = { 8, 3, 0 }; - const u64 mask = - GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 
63 : 31, 0); - const u32 *map = page_mask_bits(obj->mm.mapping); - struct i915_request *rq; - struct i915_vma *vma; - int err; - int i; - - vma = i915_vma_instance(obj, eb->context->vm, NULL); - if (IS_ERR(vma)) - return PTR_ERR(vma); - - err = i915_gem_object_lock(obj, &eb->ww); - if (err) - return err; - - err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH); - if (err) - return err; - - /* 8-Byte aligned */ - err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0); - if (err <= 0) - goto reloc_err; - - /* !8-Byte aligned */ - err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1); - if (err <= 0) - goto reloc_err; - - /* Skip to the end of the cmd page */ - i = PAGE_SIZE / sizeof(u32) - 1; - i -= eb->reloc_cache.rq_size; - memset32(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size, -MI_NOOP, i); - eb->reloc_cache.rq_size += i; - - /* Force next batch */ - err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2); - if (err <= 0) - goto reloc_err; - - GEM_BUG_ON(!eb->reloc_cache.rq); - rq = i915_request_get(eb->reloc_cache.rq); - reloc_gpu_flush(eb, &eb->reloc_cache); - GEM_BUG_ON(eb->reloc_cache.rq); - - err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2); - if (err) { - intel_gt_set_wedged(eb->engine->gt); - goto put_rq; - } - - if (!i915_request_completed(rq)) { - pr_err("%s: did not wait for relocations!\n", eb->engine->name); - err = -EINVAL; - goto put_rq; - } - - for (i = 0; i < ARRAY_SIZE(offsets); i++) { - u64 reloc = read_reloc(map, offsets[i], mask); - - if (reloc != i) { - pr_err("%s[%d]: map[%d] %llx != %x\n", - eb->engine->name, i, offsets[i], reloc, i); - err = -EINVAL; - } - } - if (err) - igt_hexdump(map, 4096); - -put_rq: - i915_request_put(rq); -unpin_vma: - i915_vma_unpin(vma); - return err; - -reloc_err: - if (!err) - err = -EIO; - goto unpin_vma; -} - -static int igt_gpu_reloc(void *arg) -{ - struct i915_execbuffer eb; - struct drm_i915_gem_object *scratch; - int err = 0; - u32 *map; - - eb.i915 = arg; 
- - scratch = i915_gem_object_create_internal(eb.i915, 4096); - if (IS_ERR(scratch)) - return PTR_ERR(scratch); - - map = i915_gem_object_pin_map_unlocked(scratch, I915_MAP_WC); - if (IS_ERR(map)) { - err = PTR_ERR(map); - goto err_scratch; - } - - intel_gt_pm_get(&eb.i915->gt); - - for_each_uabi_engine(eb.engine, eb.i915) { - if (intel_engine_requires_cmd_parser(eb.engine) || - intel_engine_using_cmd_parser(eb.engine)) - continue; - - reloc_cache_init(&eb.reloc_cache, eb.i915); - memset(map, POISON_INUSE, 4096); - - intel_engine_pm_get(eb.engine); - eb.context = intel_context_create(e
Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event
On Fri, Aug 20, 2021 at 05:27:35PM +0200, Jiri Kosina wrote: > On Fri, 20 Aug 2021, Kees Cook wrote: > > > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time > > > > field bounds checking for memset(), avoid intentionally writing across > > > > neighboring fields. > > > > > > > > Add struct_group() to mark region of struct kone_mouse_event that should > > > > be initialized to zero. > > > > > > > > Cc: Stefan Achatz > > > > Cc: Jiri Kosina > > > > Cc: Benjamin Tissoires > > > > Cc: linux-in...@vger.kernel.org > > > > Signed-off-by: Kees Cook > > > > > > Applied, thank you Kees. > > > > > > > Eek! No, this will break the build: struct_group() is not yet in the tree. > > I can carry this with an Ack, etc. > > I was pretty sure I saw struct_group() already in linux-next, but that was > apparently a vacation-induced brainfart, sorry. Dropping. Cool, no worries. Sorry for the confusion! -- Kees Cook
Re: [PATCH rdma-next v3 2/3] lib/scatterlist: Fix wrong update of orig_nents
On Thu, Jul 29, 2021 at 12:39:12PM +0300, Leon Romanovsky wrote: > +/** > + * __sg_free_table - Free a previously mapped sg table > + * @table: The sg table header to use > + * @max_ents:The maximum number of entries per single scatterlist > + * @total_ents: The total number of entries in the table > + * @nents_first_chunk: Number of entries int the (preallocated) first > + * scatterlist chunk, 0 means no such preallocated > + * first chunk > + * @free_fn: Free function > + * > + * Description: > + *Free an sg table previously allocated and setup with > + *__sg_alloc_table(). The @max_ents value must be identical to > + *that previously used with __sg_alloc_table(). > + * > + **/ > +void __sg_free_table(struct sg_table *table, unsigned int max_ents, > + unsigned int nents_first_chunk, sg_free_fn *free_fn) > +{ > + sg_free_table_entries(table, max_ents, nents_first_chunk, free_fn, > + table->orig_nents); > +} > EXPORT_SYMBOL(__sg_free_table); This is getting a bit indirect; there is only one caller of __sg_free_table() in sg_pool.c, so we may as well just export sg_free_table_entries and have it use that directly. Jason
Re: [PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs
On Fri, Aug 20, 2021 at 05:49:35PM +1000, Michael Ellerman wrote: > Kees Cook writes: > > In preparation for FORTIFY_SOURCE performing compile-time and run-time > > field bounds checking for memset(), avoid intentionally writing across > > neighboring fields. > > > > Add a struct_group() for the spe registers so that memset() can correctly > > reason > > about the size: > > > >In function 'fortify_memset_chk', > >inlined from 'restore_user_regs.part.0' at > > arch/powerpc/kernel/signal_32.c:539:3: > >>> include/linux/fortify-string.h:195:4: error: call to > >>> '__write_overflow_field' declared with attribute warning: detected write > >>> beyond size of field (1st parameter); maybe use struct_group()? > >>> [-Werror=attribute-warning] > > 195 |__write_overflow_field(); > > |^~~~ > > > > Cc: Michael Ellerman > > Cc: Benjamin Herrenschmidt > > Cc: Paul Mackerras > > Cc: Christophe Leroy > > Cc: Sudeep Holla > > Cc: linuxppc-...@lists.ozlabs.org > > Reported-by: kernel test robot > > Signed-off-by: Kees Cook > > --- > > arch/powerpc/include/asm/processor.h | 6 -- > > arch/powerpc/kernel/signal_32.c | 6 +++--- > > 2 files changed, 7 insertions(+), 5 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/processor.h > > b/arch/powerpc/include/asm/processor.h > > index f348e564f7dd..05dc567cb9a8 100644 > > --- a/arch/powerpc/include/asm/processor.h > > +++ b/arch/powerpc/include/asm/processor.h > > @@ -191,8 +191,10 @@ struct thread_struct { > > int used_vsr; /* set if process has used VSX */ > > #endif /* CONFIG_VSX */ > > #ifdef CONFIG_SPE > > - unsigned long evr[32];/* upper 32-bits of SPE regs */ > > - u64 acc;/* Accumulator */ > > + struct_group(spe, > > + unsigned long evr[32];/* upper 32-bits of SPE regs */ > > + u64 acc;/* Accumulator */ > > + ); > > unsigned long spefscr;/* SPE & eFP status */ > > unsigned long spefscr_last; /* SPEFSCR value on last prctl > >call or trap return */ > > diff --git a/arch/powerpc/kernel/signal_32.c > > 
b/arch/powerpc/kernel/signal_32.c > > index 0608581967f0..77b86caf5c51 100644 > > --- a/arch/powerpc/kernel/signal_32.c > > +++ b/arch/powerpc/kernel/signal_32.c > > @@ -532,11 +532,11 @@ static long restore_user_regs(struct pt_regs *regs, > > regs_set_return_msr(regs, regs->msr & ~MSR_SPE); > > if (msr & MSR_SPE) { > > /* restore spe registers from the stack */ > > - unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs, > > - ELF_NEVRREG * sizeof(u32), failed); > > + unsafe_copy_from_user(&current->thread.spe, &sr->mc_vregs, > > + sizeof(current->thread.spe), failed); > > This makes me nervous, because the ABI is that we copy ELF_NEVRREG * > sizeof(u32) bytes, not whatever sizeof(current->thread.spe) happens to > be. > > ie. if we use sizeof an inadvertent change to the fields in > thread_struct could change how many bytes we copy out to userspace, > which would be an ABI break. > > And that's not that hard to do, because it's not at all obvious that the > size and layout of fields in thread_struct affects the user ABI. > > At the same time we don't want to copy the right number of bytes but > the wrong content, so from that point of view using sizeof is good :) > > The way we handle it in ptrace is to have BUILD_BUG_ON()s to verify that > things match up, so maybe we should do that here too. > > ie. add: > > BUILD_BUG_ON(sizeof(current->thread.spe) == ELF_NEVRREG * sizeof(u32)); > > Not sure if you are happy doing that as part of this patch. I can always > do it later if not. Sounds good to me; I did that in a few other cases in the series where the relationships between things seemed tenuous. :) I'll add this (as !=) in v3. Thanks! -- Kees Cook
Re: [PATCH v2 56/63] RDMA/mlx5: Use struct_group() to zero struct mlx5_ib_mr
On Fri, Aug 20, 2021 at 09:34:00AM -0300, Jason Gunthorpe wrote: > On Thu, Aug 19, 2021 at 11:14:37AM -0700, Kees Cook wrote: > > > Which do you mean? When doing the conversions I tended to opt for > > struct_group() since it provides more robust "intentionality". Strictly > > speaking, the new memset helpers are doing field-spanning writes, but the > > "clear to the end" pattern was so common it made sense to add the helpers, > > as they're a bit less disruptive. It's totally up to you! :) > > Well, of the patches you cc'd to me only this one used the struct > group.. Understood. I've adjusted this for v3. Thanks! -- Kees Cook
Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event
On Fri, Aug 20, 2021 at 05:27:35PM +0200, Jiri Kosina wrote: > On Fri, 20 Aug 2021, Kees Cook wrote: > > > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time > > > > field bounds checking for memset(), avoid intentionally writing across > > > > neighboring fields. > > > > > > > > Add struct_group() to mark region of struct kone_mouse_event that should > > > > be initialized to zero. > > > > > > > > Cc: Stefan Achatz > > > > Cc: Jiri Kosina > > > > Cc: Benjamin Tissoires > > > > Cc: linux-in...@vger.kernel.org > > > > Signed-off-by: Kees Cook > > > > > > Applied, thank you Kees. > > > > > > > Eek! No, this will break the build: struct_group() is not yet in the tree. > > I can carry this with an Ack, etc. > > I was pretty sure I saw struct_group() already in linux-next, but that was > apparently a vacation-induced brainfart, sorry. Dropping. Oh, for these two patches, can I add your Acked-by while I carry them? Thanks! -- Kees Cook
Re: [PATCH rdma-next v3 2/3] lib/scatterlist: Fix wrong update of orig_nents
On Fri, Aug 20, 2021 at 12:54:25PM -0300, Jason Gunthorpe wrote: > On Thu, Jul 29, 2021 at 12:39:12PM +0300, Leon Romanovsky wrote: > > > +/** > > + * __sg_free_table - Free a previously mapped sg table > > + * @table: The sg table header to use > > + * @max_ents: The maximum number of entries per single scatterlist > > + * @total_ents: The total number of entries in the table > > + * @nents_first_chunk: Number of entries in the (preallocated) first > > + * scatterlist chunk, 0 means no such preallocated > > + * first chunk > > + * @free_fn: Free function > > + * > > + * Description: > > + *Free an sg table previously allocated and setup with > > + *__sg_alloc_table(). The @max_ents value must be identical to > > + *that previously used with __sg_alloc_table(). > > + * > > + **/ > > +void __sg_free_table(struct sg_table *table, unsigned int max_ents, > > +unsigned int nents_first_chunk, sg_free_fn *free_fn) > > +{ > > + sg_free_table_entries(table, max_ents, nents_first_chunk, free_fn, > > + table->orig_nents); > > +} > > EXPORT_SYMBOL(__sg_free_table); > > This is getting a bit indirect, there is only one caller of > __sg_free_table() in sg_pool.c, so may as well just export > sg_free_table_entries and have it use that directly. And further since sg_free_table_entries() doesn't actually use table-> except for the SGL it should probably be called sg_free_table_sgl() Jason
Re: [PATCH v2 02/14] drm/arm/hdlcd: Convert to Linux IRQ interfaces
On Tue, Aug 03, 2021 at 11:06:52AM +0200, Thomas Zimmermann wrote: > Drop the DRM IRQ midlayer in favor of Linux IRQ interfaces. DRM's > IRQ helpers are mostly useful for UMS drivers. Modern KMS drivers > don't benefit from using it. > > DRM IRQ callbacks are now being called directly or inlined. > > Calls to platform_get_irq() can fail with a negative errno code. > Abort initialization in this case. The DRM IRQ midlayer does not > handle this case correctly. > > v2: > * name struct drm_device variables 'drm' (Sam) > > Signed-off-by: Thomas Zimmermann > Acked-by: Sam Ravnborg Sorry for the delayed response due to holidays. Acked-by: Liviu Dudau Best regards, Liviu > --- > drivers/gpu/drm/arm/hdlcd_drv.c | 174 ++-- > drivers/gpu/drm/arm/hdlcd_drv.h | 1 + > 2 files changed, 97 insertions(+), 78 deletions(-) > > diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c > index 81ae92390736..479c2422a2e0 100644 > --- a/drivers/gpu/drm/arm/hdlcd_drv.c > +++ b/drivers/gpu/drm/arm/hdlcd_drv.c > @@ -29,7 +29,6 @@ > #include > #include > #include > -#include > #include > #include > #include > @@ -38,6 +37,94 @@ > #include "hdlcd_drv.h" > #include "hdlcd_regs.h" > > +static irqreturn_t hdlcd_irq(int irq, void *arg) > +{ > + struct drm_device *drm = arg; > + struct hdlcd_drm_private *hdlcd = drm->dev_private; > + unsigned long irq_status; > + > + irq_status = hdlcd_read(hdlcd, HDLCD_REG_INT_STATUS); > + > +#ifdef CONFIG_DEBUG_FS > + if (irq_status & HDLCD_INTERRUPT_UNDERRUN) > + atomic_inc(&hdlcd->buffer_underrun_count); > + > + if (irq_status & HDLCD_INTERRUPT_DMA_END) > + atomic_inc(&hdlcd->dma_end_count); > + > + if (irq_status & HDLCD_INTERRUPT_BUS_ERROR) > + atomic_inc(&hdlcd->bus_error_count); > + > + if (irq_status & HDLCD_INTERRUPT_VSYNC) > + atomic_inc(&hdlcd->vsync_count); > + > +#endif > + if (irq_status & HDLCD_INTERRUPT_VSYNC) > + drm_crtc_handle_vblank(&hdlcd->crtc); > + > + /* acknowledge interrupt(s) */ > + hdlcd_write(hdlcd, 
HDLCD_REG_INT_CLEAR, irq_status); > + > + return IRQ_HANDLED; > +} > + > +static void hdlcd_irq_preinstall(struct drm_device *drm) > +{ > + struct hdlcd_drm_private *hdlcd = drm->dev_private; > + /* Ensure interrupts are disabled */ > + hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, 0); > + hdlcd_write(hdlcd, HDLCD_REG_INT_CLEAR, ~0); > +} > + > +static void hdlcd_irq_postinstall(struct drm_device *drm) > +{ > +#ifdef CONFIG_DEBUG_FS > + struct hdlcd_drm_private *hdlcd = drm->dev_private; > + unsigned long irq_mask = hdlcd_read(hdlcd, HDLCD_REG_INT_MASK); > + > + /* enable debug interrupts */ > + irq_mask |= HDLCD_DEBUG_INT_MASK; > + > + hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, irq_mask); > +#endif > +} > + > +static int hdlcd_irq_install(struct drm_device *drm, int irq) > +{ > + int ret; > + > + if (irq == IRQ_NOTCONNECTED) > + return -ENOTCONN; > + > + hdlcd_irq_preinstall(drm); > + > + ret = request_irq(irq, hdlcd_irq, 0, drm->driver->name, drm); > + if (ret) > + return ret; > + > + hdlcd_irq_postinstall(drm); > + > + return 0; > +} > + > +static void hdlcd_irq_uninstall(struct drm_device *drm) > +{ > + struct hdlcd_drm_private *hdlcd = drm->dev_private; > + /* disable all the interrupts that we might have enabled */ > + unsigned long irq_mask = hdlcd_read(hdlcd, HDLCD_REG_INT_MASK); > + > +#ifdef CONFIG_DEBUG_FS > + /* disable debug interrupts */ > + irq_mask &= ~HDLCD_DEBUG_INT_MASK; > +#endif > + > + /* disable vsync interrupts */ > + irq_mask &= ~HDLCD_INTERRUPT_VSYNC; > + hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, irq_mask); > + > + free_irq(hdlcd->irq, drm); > +} > + > static int hdlcd_load(struct drm_device *drm, unsigned long flags) > { > struct hdlcd_drm_private *hdlcd = drm->dev_private; > @@ -90,7 +177,12 @@ static int hdlcd_load(struct drm_device *drm, unsigned > long flags) > goto setup_fail; > } > > - ret = drm_irq_install(drm, platform_get_irq(pdev, 0)); > + ret = platform_get_irq(pdev, 0); > + if (ret < 0) > + goto irq_fail; > + hdlcd->irq = ret; > + > + ret = 
hdlcd_irq_install(drm, hdlcd->irq); > if (ret < 0) { > DRM_ERROR("failed to install IRQ handler\n"); > goto irq_fail; > @@ -122,76 +214,6 @@ static void hdlcd_setup_mode_config(struct drm_device > *drm) > drm->mode_config.funcs = &hdlcd_mode_config_funcs; > } > > -static irqreturn_t hdlcd_irq(int irq, void *arg) > -{ > - struct drm_device *drm = arg; > - struct hdlcd_drm_private *hdlcd = drm->dev_private; > - unsigned long irq_status; > - > - irq_status = hdlcd_read(hdlcd, HDLCD_REG_INT_STATUS); > - > -#ifdef CONFIG_DEBUG_FS > - if (irq_status & HDLCD_INTERRUPT_UNDERRUN) > -
Re: [BUG - BISECTED] display not detected anymore
On Thu, Aug 19, 2021 at 09:07:26PM +0300, Ville Syrjälä wrote: > > ef79d62b5ce5 ("drm/i915: Encapsulate dbuf state handling harder") > > > > With that commit the display is not detected anymore, one commit > > before that it still works. So this one seems to be broken. > > > > Ville, Stanislav, any idea how to fix this? > > > > commit ef79d62b5ce53851901d6c1d21b74cbb9e27219b > > Author: Ville Syrjälä > > Date: Fri Jan 22 22:56:32 2021 +0200 > > > > drm/i915: Encapsulate dbuf state handling harder > > That has nothing to do with display detection, so very mysterious. > > Please file a bug at https://gitlab.freedesktop.org/drm/intel/issues/new > boot with drm.debug=0xe with both good and bad kernels and attach the > dmesg from both to the bug. Everything (hopefully) provided here: https://gitlab.freedesktop.org/drm/intel/-/issues/4013 Please let me know if you need more, or if I can help otherwise to resolve this.
Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event
On Fri, 20 Aug 2021, Kees Cook wrote: > > > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time > > > > > field bounds checking for memset(), avoid intentionally writing across > > > > > neighboring fields. > > > > > > > > > > Add struct_group() to mark region of struct kone_mouse_event that > > > > > should > > > > > be initialized to zero. > > > > > > > > > > Cc: Stefan Achatz > > > > > Cc: Jiri Kosina > > > > > Cc: Benjamin Tissoires > > > > > Cc: linux-in...@vger.kernel.org > > > > > Signed-off-by: Kees Cook > > > > > > > > Applied, thank you Kees. > > > > > > > > > > Eek! No, this will break the build: struct_group() is not yet in the tree. > > > I can carry this with an Ack, etc. > > > > I was pretty sure I saw struct_group() already in linux-next, but that was > > apparently a vacation-induced brainfart, sorry. Dropping. > > Oh, for these two patches, can I add your Acked-by while I carry them? Yes, thanks, and sorry for the noise. -- Jiri Kosina SUSE Labs
Re: [PATCH rdma-next v3 0/3] SG fix together with update to RDMA umem
On Thu, Jul 29, 2021 at 12:39:10PM +0300, Leon Romanovsky wrote: > From: Leon Romanovsky > > Changelog: > v3: > * Rewrote to new API suggestion > * Split for more patches > v2: https://lore.kernel.org/lkml/cover.1626605893.git.leo...@nvidia.com > * Changed implementation of first patch, based on our discussion with > Christoph. >https://lore.kernel.org/lkml/ynwavtt0qmqdx...@infradead.org/ > v1: https://lore.kernel.org/lkml/cover.1624955710.git.leo...@nvidia.com/ > * Fixed sg_page with a _dma_ API in the umem.c > v0: https://lore.kernel.org/lkml/cover.1624361199.git.leo...@nvidia.com > > > Maor Gottlieb (3): > lib/scatterlist: Provide a dedicated function to support table append > lib/scatterlist: Fix wrong update of orig_nents > RDMA: Use the sg_table directly and remove the opencoded version from > umem I'm going to send this into linux-next; last time that triggered some bug reports. But overall it looks okay, though some of the sg_append_table is a bit odd. Certainly using the sg_table throughout the RDMA code is a big improvement. Let's see a v4, reviews/etc and I'll update it. Jason
Re: [PATCH] drm/i915/gt: Potential error pointer dereference in pinned_context()
On Fri, Aug 13, 2021 at 04:01:06PM +0200, Thomas Hellström wrote: > > On 8/13/21 1:36 PM, Dan Carpenter wrote: > > If the intel_engine_create_pinned_context() function returns an error > > pointer, then dereferencing "ce" will Oops. Use "vm" instead of > > "ce->vm". > > > > Fixes: cf586021642d ("drm/i915/gt: Pipelined page migration") > > Signed-off-by: Dan Carpenter > > --- > > drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c > > b/drivers/gpu/drm/i915/gt/intel_migrate.c > > index d0a7c934fd3b..1dac21aa7e5c 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c > > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c > > @@ -177,7 +177,7 @@ static struct intel_context *pinned_context(struct > > intel_gt *gt) > > ce = intel_engine_create_pinned_context(engine, vm, SZ_512K, > > I915_GEM_HWS_MIGRATE, > > &key, "migrate"); > > - i915_vm_put(ce->vm); > > + i915_vm_put(vm); > > return ce; > > } > > Thanks, > > Reviewed-by: Thomas Hellström And pushed to drm-intel-gt-next, thanks for the patch and review. > >
Re: [PATCH v2 0/8] drm/bridge: Make panel and bridge probe order consistent
Hi Andrzej, On Wed, Aug 04, 2021 at 04:09:38PM +0200, a.hajda wrote: > Hi Maxime, > > I have been busy with other tasks, and I did not follow the list last > time, so sorry for my late response. > > On 28.07.2021 15:32, Maxime Ripard wrote: > > Hi, > > > > We've encountered an issue with the RaspberryPi DSI panel that prevented the > > whole display driver from probing. > > > > The issue is described in detail in the commit 7213246a803f ("drm/vc4: dsi: > > Only register our component once a DSI device is attached"), but the basic > > idea > > is that since the panel is probed through i2c, there's no synchronization > > between its probe and the registration of the MIPI-DSI host it's attached > > to. > > > > We initially moved the component framework registration to the MIPI-DSI Host > > attach hook to make sure we register our component only when we have a DSI > > device attached to our MIPI-DSI host, and then use lookup our DSI device in > > our > > bind hook. > > > > However, all the DSI bridges controlled through i2c are only registering > > their > > associated DSI device in their bridge attach hook, meaning with our change > > I guess this is incorrect. I have promoted several times the pattern > that device driver shouldn't expose its interfaces (for example > component_add, drm_panel_add, drm_bridge_add) until it gathers all > required dependencies. In this particular case bridges should defer > probe until DSI bus becomes available. I guess this way the patch you > reverts would work. > > I advised few times this pattern in case of DSI hosts, apparently I > didn't notice the similar issue can appear in case of bridges. Or there > is something I have missed??? > > Anyway there are already eleven(?) bridge drivers using this pattern. I > wonder if fixing it would be difficult, or if it expose other issues??? > The patches should be quite straightforward - move > of_find_mipi_dsi_host_by_node and mipi_dsi_device_register_full to probe > time. 
I gave this a try today, went back to the current upstream code and found that indeed it works. I converted two bridges that work now. I'll send a new version some time next week and will convert all the others if we agree on the approach. Thanks for the suggestion! > Finally I think that if we will not fix these bridge drivers we will > encounter another set of issues with new platforms connecting "DSI host > drivers assuming this pattern" and "i2c/dsi device drivers assuming > pattern already present in the bridges". Yeah, this is exactly the situation I'm in :) Maxime
Re: [PATCH] drm/i915: Actually delete gpu reloc selftests
On Fri, Aug 20, 2021 at 05:49:32PM +0200, Daniel Vetter wrote: > In > > commit 8e02cceb1f1f4f254625e5338dd997ff61ab40d7 > Author: Daniel Vetter > Date: Tue Aug 3 14:48:33 2021 +0200 > > drm/i915: delete gpu reloc code it would be better with dim cite format... do we need the Fixes: tag? anyway: Reviewed-by: Rodrigo Vivi > > I deleted the gpu relocation code and the selftest include and > enabling, but accidentally forgot about the selftest source code. > > Fix this oversight. > > Signed-off-by: Daniel Vetter > Cc: Jon Bloomfield > Cc: Chris Wilson > Cc: Maarten Lankhorst > Cc: Daniel Vetter > Cc: Joonas Lahtinen > Cc: "Thomas Hellström" > Cc: Matthew Auld > Cc: Lionel Landwerlin > Cc: Dave Airlie > Cc: Jason Ekstrand > --- > .../i915/gem/selftests/i915_gem_execbuffer.c | 190 -- > 1 file changed, 190 deletions(-) > delete mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c > > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c > deleted file mode 100644 > index 16162fc2782d.. > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c > +++ /dev/null > @@ -1,190 +0,0 @@ > -// SPDX-License-Identifier: MIT > -/* > - * Copyright © 2020 Intel Corporation > - */ > - > -#include "i915_selftest.h" > - > -#include "gt/intel_engine_pm.h" > -#include "selftests/igt_flush_test.h" > - > -static u64 read_reloc(const u32 *map, int x, const u64 mask) > -{ > - u64 reloc; > - > - memcpy(&reloc, &map[x], sizeof(reloc)); > - return reloc & mask; > -} > - > -static int __igt_gpu_reloc(struct i915_execbuffer *eb, > -struct drm_i915_gem_object *obj) > -{ > - const unsigned int offsets[] = { 8, 3, 0 }; > - const u64 mask = > - GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 
63 : 31, 0); > - const u32 *map = page_mask_bits(obj->mm.mapping); > - struct i915_request *rq; > - struct i915_vma *vma; > - int err; > - int i; > - > - vma = i915_vma_instance(obj, eb->context->vm, NULL); > - if (IS_ERR(vma)) > - return PTR_ERR(vma); > - > - err = i915_gem_object_lock(obj, &eb->ww); > - if (err) > - return err; > - > - err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH); > - if (err) > - return err; > - > - /* 8-Byte aligned */ > - err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0); > - if (err <= 0) > - goto reloc_err; > - > - /* !8-Byte aligned */ > - err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1); > - if (err <= 0) > - goto reloc_err; > - > - /* Skip to the end of the cmd page */ > - i = PAGE_SIZE / sizeof(u32) - 1; > - i -= eb->reloc_cache.rq_size; > - memset32(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size, > - MI_NOOP, i); > - eb->reloc_cache.rq_size += i; > - > - /* Force next batch */ > - err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2); > - if (err <= 0) > - goto reloc_err; > - > - GEM_BUG_ON(!eb->reloc_cache.rq); > - rq = i915_request_get(eb->reloc_cache.rq); > - reloc_gpu_flush(eb, &eb->reloc_cache); > - GEM_BUG_ON(eb->reloc_cache.rq); > - > - err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2); > - if (err) { > - intel_gt_set_wedged(eb->engine->gt); > - goto put_rq; > - } > - > - if (!i915_request_completed(rq)) { > - pr_err("%s: did not wait for relocations!\n", eb->engine->name); > - err = -EINVAL; > - goto put_rq; > - } > - > - for (i = 0; i < ARRAY_SIZE(offsets); i++) { > - u64 reloc = read_reloc(map, offsets[i], mask); > - > - if (reloc != i) { > - pr_err("%s[%d]: map[%d] %llx != %x\n", > -eb->engine->name, i, offsets[i], reloc, i); > - err = -EINVAL; > - } > - } > - if (err) > - igt_hexdump(map, 4096); > - > -put_rq: > - i915_request_put(rq); > -unpin_vma: > - i915_vma_unpin(vma); > - return err; > - > -reloc_err: > - if (!err) > - err = -EIO; > - goto unpin_vma; > 
-} > - > -static int igt_gpu_reloc(void *arg) > -{ > - struct i915_execbuffer eb; > - struct drm_i915_gem_object *scratch; > - int err = 0; > - u32 *map; > - > - eb.i915 = arg; > - > - scratch = i915_gem_object_create_internal(eb.i915, 4096); > - if (IS_ERR(scratch)) > - return PTR_ERR(scratch); > - > - map = i915_gem_object_pin_map_unlocked(scratch, I915_MAP_WC); > - if (IS_ERR(map)) { > - err = PTR_ERR(map); > - goto err_scratch; > - } > - > - intel_gt_pm_get(&eb.i915->gt); > - > - for_each_uabi_engine(eb.engine, eb.i915) { > - if (intel_engine_requires_c
[pull] amdgpu, amdkfd, radeon drm-next-5.15
Hi Dave, Daniel,

Updates for 5.15. Mainly bug fixes and cleanups.

The following changes since commit 554594567b1fa3da74f88ec7b2dc83d000c58e98:

  drm/display: fix possible null-pointer dereference in dcn10_set_clock() (2021-08-11 17:19:54 -0400)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-5.15-2021-08-20

for you to fetch changes up to 90a9266269eb9f71af1f323c33e1dca53527bd22:

  drm/amdgpu: Cancel delayed work when GFXOFF is disabled (2021-08-20 12:09:44 -0400)

amd-drm-next-5.15-2021-08-20:

amdgpu:
- embed hw fence into job
- Misc SMU fixes
- PSP TA code cleanup
- RAS fixes
- PWM fan speed fixes
- DC workqueue cleanups
- SR-IOV fixes
- gfxoff delayed work fix
- Pin domain check fix

amdkfd:
- SVM fixes

radeon:
- Code cleanup

Anthony Koo (1):
      drm/amd/display: [FW Promotion] Release 0.0.79

Aric Cyr (1):
      drm/amd/display: 3.2.149

Candice Li (3):
      drm/amd/amdgpu: consolidate PSP TA context
      drm/amd/amdgpu: remove unnecessary RAS context field
      drm/amd: consolidate TA shared memory structures

Christian König (1):
      drm/amdgpu: use the preferred pin domain after the check

Colin Ian King (1):
      drm/amd/pm: Fix spelling mistake "firwmare" -> "firmware"

Evan Quan (9):
      drm/amd/pm: correct the fan speed RPM setting
      drm/amd/pm: record the RPM and PWM based fan speed settings
      drm/amd/pm: correct the fan speed PWM retrieving
      drm/amd/pm: correct the fan speed RPM retrieving
      drm/amd/pm: drop the unnecessary intermediate percent-based transition
      drm/amd/pm: drop unnecessary manual mode check
      drm/amd/pm: correct the address of Arcturus fan related registers
      drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarily
      drm/amd/pm: a quick fix for "divided by zero" error

Hawking Zhang (1):
      drm/amdgpu: increase max xgmi physical node for aldebaran

Jack Zhang (1):
      drm/amd/amdgpu embed hw_fence into amdgpu_job

Jake Wang (1):
      drm/amd/display: Ensure DCN save after VM setup

Jiange Zhao (1):
      drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF get FLR notification.

Jonathan Kim (1):
      drm/amdgpu: get extended xgmi topology data

Kenneth Feng (2):
      Revert "drm/amd/pm: fix workload mismatch on vega10"
      drm/amd/pm: change the workload type for some cards

Kevin Wang (5):
      drm/amd/pm: correct DPM_XGMI/VCN_DPM feature name
      drm/amd/pm: skip to load smu microcode on sriov for aldebaran
      drm/amd/pm: change return value in aldebaran_get_power_limit()
      drm/amd/pm: change smu msg's attribute to allow working under sriov
      drm/amd/pm: change pp_dpm_sclk/mclk/fclk attribute is RO for aldebaran

Lukas Bulwahn (1):
      drm: amdgpu: remove obsolete reference to config CHASH

Michel Dänzer (1):
      drm/amdgpu: Cancel delayed work when GFXOFF is disabled

Nathan Chancellor (1):
      drm/radeon: Add break to switch statement in radeonfb_create_pinned_object()

Nicholas Kazlauskas (3):
      drm/amd/display: Fix multi-display support for idle opt workqueue
      drm/amd/display: Use vblank control events for PSR enable/disable
      drm/amd/display: Guard vblank wq flush with DCN guards

Wayne Lin (1):
      drm/amd/display: Create dc_sink when EDID fail

Yifan Zhang (1):
      drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure

YuBiao Wang (1):
      drm/amd/amdgpu:flush ttm delayed work before cancel_sync

Zhan Liu (1):
      drm/amd/display: Use DCN30 watermark calc for DCN301

Zhigang Luo (1):
      drm/amdgpu: correct MMSCH 1.0 version

 drivers/gpu/drm/Kconfig                     |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c  |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  28 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c   |  86 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c     |  37 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.c     |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c      |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  39 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c     |  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.c   |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c    |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c     | 432 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h     | 111 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_rap.c     |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c     |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
Re: [PATCH v3] drm/i915/dp: Use max params for panels < eDP 1.4
On Fri, Aug 20, 2021 at 03:52:59PM +0800, Kai-Heng Feng wrote: > Users reported that after commit 2bbd6dba84d4 ("drm/i915: Try to use > fast+narrow link on eDP again and fall back to the old max strategy on > failure"), the screen starts to have wobbly effect. > > Commit a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for > everything") doesn't help either, that means the affected eDP 1.2 panels > only work with max params. > > So use max params for panels < eDP 1.4 as Windows does to solve the > issue. > > v3: > - Do the eDP rev check in intel_edp_init_dpcd() > > v2: > - Check eDP 1.4 instead of DPCD 1.1 to apply max params > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3714 > Fixes: 2bbd6dba84d4 ("drm/i915: Try to use fast+narrow link on eDP again and > fall back to the old max strategy on failure") > Fixes: a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for > everything") > Suggested-by: Ville Syrjälä > Signed-off-by: Kai-Heng Feng Slapped a cc:stable on it and pushed to drm-intel-next. Thanks. 
> --- > drivers/gpu/drm/i915/display/intel_dp.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c > b/drivers/gpu/drm/i915/display/intel_dp.c > index 75d4ebc669411..e0dbd35ae7bc0 100644 > --- a/drivers/gpu/drm/i915/display/intel_dp.c > +++ b/drivers/gpu/drm/i915/display/intel_dp.c > @@ -2445,11 +2445,14 @@ intel_edp_init_dpcd(struct intel_dp *intel_dp) >*/ > if (drm_dp_dpcd_read(&intel_dp->aux, DP_EDP_DPCD_REV, >intel_dp->edp_dpcd, sizeof(intel_dp->edp_dpcd)) == > - sizeof(intel_dp->edp_dpcd)) > + sizeof(intel_dp->edp_dpcd)) { > drm_dbg_kms(&dev_priv->drm, "eDP DPCD: %*ph\n", > (int)sizeof(intel_dp->edp_dpcd), > intel_dp->edp_dpcd); > > + intel_dp->use_max_params = intel_dp->edp_dpcd[0] < DP_EDP_14; > + } > + > /* >* This has to be called after intel_dp->edp_dpcd is filled, PSR checks >* for SET_POWER_CAPABLE bit in intel_dp->edp_dpcd[1] > -- > 2.32.0 -- Ville Syrjälä Intel
[GIT PULL] etnaviv-next for 5.15
Hi Dave, Daniel,

things are still slow in etnaviv land. Just one hardware support addition for the GPU found on the NXP Layerscape LS1028A SoC from Michael and the GEM mmap cleanup from Thomas.

Regards,
Lucas

The following changes since commit 8a02ea42bc1d4c448caf1bab0e05899dad503f74:

  Merge tag 'drm-intel-next-fixes-2021-06-29' of git://anongit.freedesktop.org/drm/drm-intel into drm-next (2021-06-30 15:42:05 +1000)

are available in the Git repository at:

  https://git.pengutronix.de/git/lst/linux 81fd23e2b3ccf71c807e671444e8accaba98ca53

for you to fetch changes up to 81fd23e2b3ccf71c807e671444e8accaba98ca53:

  drm/etnaviv: Implement mmap as GEM object function (2021-07-06 18:32:23 +0200)

Michael Walle (2):
      drm/etnaviv: add HWDB entry for GC7000 r6202
      drm/etnaviv: add clock gating workaround for GC7000 r6202

Thomas Zimmermann (1):
      drm/etnaviv: Implement mmap as GEM object function

 drivers/gpu/drm/etnaviv/etnaviv_drv.c       | 14 ++
 drivers/gpu/drm/etnaviv/etnaviv_drv.h       |  3 ---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c       | 18 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 13 -
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c       |  6 ++
 drivers/gpu/drm/etnaviv/etnaviv_hwdb.c      | 31 +++
 6 files changed, 44 insertions(+), 41 deletions(-)
[GIT PULL] drm-misc + drm-intel: Add support for out-of-band hotplug notification
Hello drm-misc and drm-intel maintainers, My "Add support for out-of-band hotplug notification" patchset: https://patchwork.freedesktop.org/series/93763/ is ready for merging now. As discussed on IRC I based this series on top of drm-tip, and when trying to apply the i915 parts on top of drm-misc this fails due to a conflict. So as Jani suggested here is a pull-req for a topic-branch with the entire set, minus the troublesome i915 bits. Once this has been merged into both drm-misc-next and drm-intel-next I can push the 2 i915 patches to drm-intel-next on top of the merge. Note there are also 2 drivers/usb/typec patches in here; these have Greg KH's Reviewed-by for merging through the drm tree, since this USB code does not change all that much. I also checked and the drm-misc-next-2021-08-12 base of this tree contains the same last commit to the modified file as usb-next. Daniel Vetter mentioned on IRC that it might be better for you to simply pick up the series directly from patchwork; that is fine too, in that case don't forget to add: Reviewed-by: Lyude Paul to the entire series (given in a reply to the cover-letter) and: Reviewed-by: Greg Kroah-Hartman to the usb/typec patches (patch 7/8); this was given in reply to a previous posting of the series and I forgot to add this in the resend.
Regards,
Hans

The following changes since commit c7782443a88926a4f938f0193041616328cf2db2:

  drm/bridge: ti-sn65dsi86: Avoid creating multiple connectors (2021-08-12 09:56:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux.git drm-misc-intel-oob-hotplug-v1

for you to fetch changes up to 7f811394878535ed9a6849717de8c2959ae38899:

  usb: typec: altmodes/displayport: Notify drm subsys of hotplug events (2021-08-20 12:35:59 +0200)

Topic branch for drm-misc / drm-intel for OOB hotplug support for Type-C connectors

Hans de Goede (6):
      drm/connector: Give connector sysfs devices there own device_type
      drm/connector: Add a fwnode pointer to drm_connector and register with ACPI (v2)
      drm/connector: Add drm_connector_find_by_fwnode() function (v3)
      drm/connector: Add support for out-of-band hotplug notification (v3)
      usb: typec: altmodes/displayport: Make dp_altmode_notify() more generic
      usb: typec: altmodes/displayport: Notify drm subsys of hotplug events

 drivers/gpu/drm/drm_connector.c          | 79 +
 drivers/gpu/drm/drm_crtc_internal.h      |  2 +
 drivers/gpu/drm/drm_sysfs.c              | 87 +++-
 drivers/usb/typec/altmodes/Kconfig       |  1 +
 drivers/usb/typec/altmodes/displayport.c | 58 +
 include/drm/drm_connector.h              | 25 +
 6 files changed, 217 insertions(+), 35 deletions(-)
Re: [PATCH] drm/i915: Actually delete gpu reloc selftests
On Fri, Aug 20, 2021 at 7:00 PM Rodrigo Vivi wrote: > > On Fri, Aug 20, 2021 at 05:49:32PM +0200, Daniel Vetter wrote: > > In > > > > commit 8e02cceb1f1f4f254625e5338dd997ff61ab40d7 > > Author: Daniel Vetter > > Date: Tue Aug 3 14:48:33 2021 +0200 > > > > drm/i915: delete gpu reloc code > > it would be better with dim cite format... > > do we need the Fixes: tag? I did delete the selftest, I just forgot to delete the code. So no Fixes: imo. I'll bikeshed the commit citation. > anyway: > > Reviewed-by: Rodrigo Vivi Thanks for the review, will merge when CI approves too, one never knows. -Daniel > > > > > > I deleted the gpu relocation code and the selftest include and > > enabling, but accidentally forgot about the selftest source code. > > > > Fix this oversight. > > > > Signed-off-by: Daniel Vetter > > Cc: Jon Bloomfield > > Cc: Chris Wilson > > Cc: Maarten Lankhorst > > Cc: Daniel Vetter > > Cc: Joonas Lahtinen > > Cc: "Thomas Hellström" > > Cc: Matthew Auld > > Cc: Lionel Landwerlin > > Cc: Dave Airlie > > Cc: Jason Ekstrand > > --- > > .../i915/gem/selftests/i915_gem_execbuffer.c | 190 -- > > 1 file changed, 190 deletions(-) > > delete mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c > > > > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c > > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c > > deleted file mode 100644 > > index 16162fc2782d.. 
> > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c > > +++ /dev/null > > @@ -1,190 +0,0 @@ > > -// SPDX-License-Identifier: MIT > > -/* > > - * Copyright © 2020 Intel Corporation > > - */ > > - > > -#include "i915_selftest.h" > > - > > -#include "gt/intel_engine_pm.h" > > -#include "selftests/igt_flush_test.h" > > - > > -static u64 read_reloc(const u32 *map, int x, const u64 mask) > > -{ > > - u64 reloc; > > - > > - memcpy(&reloc, &map[x], sizeof(reloc)); > > - return reloc & mask; > > -} > > - > > -static int __igt_gpu_reloc(struct i915_execbuffer *eb, > > -struct drm_i915_gem_object *obj) > > -{ > > - const unsigned int offsets[] = { 8, 3, 0 }; > > - const u64 mask = > > - GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0); > > - const u32 *map = page_mask_bits(obj->mm.mapping); > > - struct i915_request *rq; > > - struct i915_vma *vma; > > - int err; > > - int i; > > - > > - vma = i915_vma_instance(obj, eb->context->vm, NULL); > > - if (IS_ERR(vma)) > > - return PTR_ERR(vma); > > - > > - err = i915_gem_object_lock(obj, &eb->ww); > > - if (err) > > - return err; > > - > > - err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH); > > - if (err) > > - return err; > > - > > - /* 8-Byte aligned */ > > - err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0); > > - if (err <= 0) > > - goto reloc_err; > > - > > - /* !8-Byte aligned */ > > - err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1); > > - if (err <= 0) > > - goto reloc_err; > > - > > - /* Skip to the end of the cmd page */ > > - i = PAGE_SIZE / sizeof(u32) - 1; > > - i -= eb->reloc_cache.rq_size; > > - memset32(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size, > > - MI_NOOP, i); > > - eb->reloc_cache.rq_size += i; > > - > > - /* Force next batch */ > > - err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2); > > - if (err <= 0) > > - goto reloc_err; > > - > > - GEM_BUG_ON(!eb->reloc_cache.rq); > > - rq = i915_request_get(eb->reloc_cache.rq); > > - 
reloc_gpu_flush(eb, &eb->reloc_cache); > > - GEM_BUG_ON(eb->reloc_cache.rq); > > - > > - err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2); > > - if (err) { > > - intel_gt_set_wedged(eb->engine->gt); > > - goto put_rq; > > - } > > - > > - if (!i915_request_completed(rq)) { > > - pr_err("%s: did not wait for relocations!\n", > > eb->engine->name); > > - err = -EINVAL; > > - goto put_rq; > > - } > > - > > - for (i = 0; i < ARRAY_SIZE(offsets); i++) { > > - u64 reloc = read_reloc(map, offsets[i], mask); > > - > > - if (reloc != i) { > > - pr_err("%s[%d]: map[%d] %llx != %x\n", > > -eb->engine->name, i, offsets[i], reloc, i); > > - err = -EINVAL; > > - } > > - } > > - if (err) > > - igt_hexdump(map, 4096); > > - > > -put_rq: > > - i915_request_put(rq); > > -unpin_vma: > > - i915_vma_unpin(vma); > > - return err; > > - > > -reloc_err: > > - if (!err) > > - err = -EIO; > > - goto unpin_vma; > > -} > > - > > -static int igt_gpu_reloc(void *arg) > > -{ > > - struct i915_execbuffer eb; > > - struct d
Re: [PATCH v3 1/3] dt-bindings: Add YAML bindings for NVDEC
On Wed, Aug 18, 2021 at 8:18 AM Thierry Reding wrote: > > On Wed, Aug 18, 2021 at 11:24:28AM +0300, Mikko Perttunen wrote: > > On 8/18/21 12:20 AM, Rob Herring wrote: > > > On Wed, Aug 11, 2021 at 01:50:28PM +0300, Mikko Perttunen wrote: > > > > Add YAML device tree bindings for NVDEC, now in a more appropriate > > > > place compared to the old textual Host1x bindings. > > > > > > > > Signed-off-by: Mikko Perttunen > > > > --- > > > > v3: > > > > * Drop host1x bindings > > > > * Change read2 to read-1 in interconnect names > > > > v2: > > > > * Fix issues pointed out in v1 > > > > * Add T194 nvidia,instance property > > > > --- > > > > .../gpu/host1x/nvidia,tegra210-nvdec.yaml | 109 ++ > > > > MAINTAINERS | 1 + > > > > 2 files changed, 110 insertions(+) > > > > create mode 100644 > > > > Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml > > > > > > > > diff --git > > > > a/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml > > > > > > > > b/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml > > > > new file mode 100644 > > > > index ..571849625da8 > > > > --- /dev/null > > > > +++ > > > > b/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml > > > > @@ -0,0 +1,109 @@ > > > > +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) > > > > +%YAML 1.2 > > > > +--- > > > > +$id: > > > > "http://devicetree.org/schemas/gpu/host1x/nvidia,tegra210-nvdec.yaml#"; > > > > +$schema: "http://devicetree.org/meta-schemas/core.yaml#"; > > > > + > > > > +title: Device tree binding for NVIDIA Tegra NVDEC > > > > + > > > > +description: | > > > > + NVDEC is the hardware video decoder present on NVIDIA Tegra210 > > > > + and newer chips. It is located on the Host1x bus and typically > > > > + programmed through Host1x channels. 
> > > > + > > > > +maintainers: > > > > + - Thierry Reding > > > > + - Mikko Perttunen > > > > + > > > > +properties: > > > > + $nodename: > > > > +pattern: "^nvdec@[0-9a-f]*$" > > > > + > > > > + compatible: > > > > +enum: > > > > + - nvidia,tegra210-nvdec > > > > + - nvidia,tegra186-nvdec > > > > + - nvidia,tegra194-nvdec > > > > + > > > > + reg: > > > > +maxItems: 1 > > > > + > > > > + clocks: > > > > +maxItems: 1 > > > > + > > > > + clock-names: > > > > +items: > > > > + - const: nvdec > > > > + > > > > + resets: > > > > +maxItems: 1 > > > > + > > > > + reset-names: > > > > +items: > > > > + - const: nvdec > > > > + > > > > + power-domains: > > > > +maxItems: 1 > > > > + > > > > + iommus: > > > > +maxItems: 1 > > > > + > > > > + interconnects: > > > > +items: > > > > + - description: DMA read memory client > > > > + - description: DMA read 2 memory client > > > > + - description: DMA write memory client > > > > + > > > > + interconnect-names: > > > > +items: > > > > + - const: dma-mem > > > > + - const: read-1 > > > > + - const: write > > > > + > > > > +required: > > > > + - compatible > > > > + - reg > > > > + - clocks > > > > + - clock-names > > > > + - resets > > > > + - reset-names > > > > + - power-domains > > > > + > > > > +if: > > > > + properties: > > > > +compatible: > > > > + contains: > > > > +const: nvidia,tegra194-host1x > > > > > > host1x? This will never be true as the schema is only applied to nodes > > > with the nvdec compatible. > > > > Argh, it's a typo indeed. Should be nvidia,tegra194-nvdec. > > > > > > > > > +then: > > > > + properties: > > > > +nvidia,instance: > > > > + items: > > > > +- description: 0 for NVDEC0, or 1 for NVDEC1 > > > > > > What's this for? We generally don't do indices in DT. > > > > When programming the hardware through Host1x, we need to know the "class ID" > > of the hardware, specific to each instance. So we need to know which > > instance it is. 
Technically of course we could derive this from the MMIO > > address but that seems more confusing. > > > > > > > > > + > > > > +additionalProperties: true > > > > > > This should be false or 'unevaluatedProperties: false' > > > > I tried that but it resulted in validation failures; please see the > > discussion in v2. > > Rob mentioned that there is now support for unevaluatedProperties in > dt-schema. I was able to test this, though with only limited success. I > made the following changes on top of this patch: Here's a branch that works with current jsonschema master: https://github.com/robherring/dt-schema/tree/draft2020-12 > --- >8 --- > diff --git > a/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml > b/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml > index d2681c98db7e..0bdf05fc8fc7 100644 > --- a/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml > +++ b/Documentation/devicetree/bindings/gpu/host1x/nvidia,teg
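For illustration, a node matching the schema above might look as follows. This is a sketch only: the unit address, the `bpmp`/`smmu` phandles and the `TEGRA186_*` specifier macros are assumptions for a Tegra186 integration, not values taken from this binding or a real device tree.

```dts
nvdec@15480000 {
	compatible = "nvidia,tegra186-nvdec";
	reg = <0x15480000 0x40000>;
	clocks = <&bpmp TEGRA186_CLK_NVDEC>;
	clock-names = "nvdec";
	resets = <&bpmp TEGRA186_RESET_NVDEC>;
	reset-names = "nvdec";
	power-domains = <&bpmp TEGRA186_POWER_DOMAIN_NVDEC>;
	iommus = <&smmu TEGRA186_SID_NVDEC>;
};
```

All of `compatible`, `reg`, `clocks`, `clock-names`, `resets`, `reset-names` and `power-domains` are required by the schema; `iommus` is optional.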
Re: [Intel-gfx] [PATCH 09/27] drm/i915/guc: Kick tasklet after queuing a request
On 8/18/2021 11:16 PM, Matthew Brost wrote: Kick tasklet after queuing a request so it submitted in a timely manner. Fixes: 3a4cdf1982f0 ("drm/i915/guc: Implement GuC context operations for new inteface") Is this actually a bug or just a performance issue? in the latter case I don't think we need a fixes tag. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 8f7a11e65ef5..d61f906105ef 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1050,6 +1050,7 @@ static inline void queue_request(struct i915_sched_engine *sched_engine, list_add_tail(&rq->sched.link, i915_sched_lookup_priolist(sched_engine, prio)); set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + tasklet_hi_schedule(&sched_engine->tasklet); the caller of queue_request() already has a tasklet_hi_schedule in another branch of the if/else statement. Maybe we can have the caller own the kick to keep it in one place? Not a blocker. Reviewed-by: Daniele Ceraolo Spurio Daniele } static int guc_bypass_tasklet_submit(struct intel_guc *guc,
Re: [Intel-gfx] [PATCH 09/27] drm/i915/guc: Kick tasklet after queuing a request
On Fri, Aug 20, 2021 at 11:31:56AM -0700, Daniele Ceraolo Spurio wrote:
> On 8/18/2021 11:16 PM, Matthew Brost wrote:
> > Kick tasklet after queuing a request so it submitted in a timely manner.
> >
> > Fixes: 3a4cdf1982f0 ("drm/i915/guc: Implement GuC context operations for new inteface")
>
> Is this actually a bug or just a performance issue? in the latter case I
> don't think we need a fixes tag.

Basically the tasklet won't get queued in certain situations until the
heartbeat ping. Didn't notice it as the tasklet is only used during flow
control or after a full GT reset, which both are rather rare. We can
probably drop the Fixes: tag as GuC submission isn't on by default and
still works without this fix.

> > Signed-off-by: Matthew Brost
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 8f7a11e65ef5..d61f906105ef 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1050,6 +1050,7 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
> >  	list_add_tail(&rq->sched.link,
> >  		      i915_sched_lookup_priolist(sched_engine, prio));
> >  	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +	tasklet_hi_schedule(&sched_engine->tasklet);
>
> the caller of queue_request() already has a tasklet_hi_schedule in another
> branch of the if/else statement. Maybe we can have the caller own the kick
> to keep it in one place? Not a blocker.

I guess it could be:

	bool kick = need_tasklet();

	if (kick)
		queue_request();
	else
		kick = bypass();

	if (kick)
		kick_tasklet();

Idk, if that is better? I'll think on this and decide before the next post.

Matt

> Reviewed-by: Daniele Ceraolo Spurio
>
> Daniele
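The caller-owned-kick shape discussed above can be modeled as a small standalone sketch. `need_tasklet()`, `queue_request()`, `guc_bypass_tasklet_submit()` and `kick_tasklet()` are stand-ins mirroring the names in the thread, not the real i915 functions, and the counter exists only to make the single kick site observable:

```c
/* Userspace model of the suggested refactor: queue_request() no longer
 * schedules the tasklet itself; the submit path owns the one kick site. */
#include <assert.h>
#include <stdbool.h>

static int tasklet_kicks;	/* counts kick_tasklet() calls (model only) */
static bool flow_controlled;	/* forces the pqueue path when true */

static void kick_tasklet(void)
{
	tasklet_kicks++;	/* tasklet_hi_schedule() in the driver */
}

static bool need_tasklet(void)
{
	/* in i915 this would check flow control / pending pqueue state */
	return flow_controlled;
}

static void queue_request(void)
{
	/* add to the priolist; note: no tasklet kick in here anymore */
}

static bool guc_bypass_tasklet_submit(void)
{
	/* direct submission; returns whether the tasklet still needs a kick */
	return true;
}

static void submit_request(void)
{
	bool kick = need_tasklet();

	if (kick)
		queue_request();
	else
		kick = guc_bypass_tasklet_submit();

	if (kick)
		kick_tasklet();	/* single kick site, owned by the caller */
}
```

Either branch funnels into the same `kick_tasklet()` call, which is the "keep it in one place" property Daniele asked about.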
Re: [Intel-gfx] [PATCH 10/27] drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered
On 8/18/2021 11:16 PM, Matthew Brost wrote:
> When unblocking a context, do not enable scheduling if the context is
> banned, guc_id invalid, or not registered.
>
> Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> Signed-off-by: Matthew Brost
> Cc:
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index d61f906105ef..e53a4ef7d442 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1586,6 +1586,9 @@ static void guc_context_unblock(struct intel_context *ce)
>  	spin_lock_irqsave(&ce->guc_state.lock, flags);
>
>  	if (unlikely(submission_disabled(guc) ||
> +		     intel_context_is_banned(ce) ||
> +		     context_guc_id_invalid(ce) ||
> +		     !lrc_desc_registered(guc, ce->guc_id) ||
>  		     !intel_context_is_pinned(ce) ||
>  		     context_pending_disable(ce) ||
>  		     context_blocked(ce) > 1)) {

This is getting to be a lot of conditions. Maybe we can simplify it a bit?
E.g. it should be possible to check context_blocked, context_banned and
context_pending_disable as a single op:

	/* 2 or more blocks */
	#define SCHED_STATE_MULTI_BLOCKED_MASK \
		(SCHED_STATE_BLOCKED_MASK & ~SCHED_STATE_BLOCKED)
	#define SCHED_STATE_NO_UNBLOCK \
		(SCHED_STATE_MULTI_BLOCKED_MASK | \
		 SCHED_STATE_PENDING_DISABLE | \
		 SCHED_STATE_BANNED)

Not a blocker.

Reviewed-by: Daniele Ceraolo Spurio

Daniele
Re: [Intel-gfx] [PATCH 10/27] drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered
On Fri, Aug 20, 2021 at 11:42:38AM -0700, Daniele Ceraolo Spurio wrote:
> On 8/18/2021 11:16 PM, Matthew Brost wrote:
> > When unblocking a context, do not enable scheduling if the context is
> > banned, guc_id invalid, or not registered.
> >
> > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > Signed-off-by: Matthew Brost
> > Cc:
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index d61f906105ef..e53a4ef7d442 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1586,6 +1586,9 @@ static void guc_context_unblock(struct intel_context *ce)
> >  	spin_lock_irqsave(&ce->guc_state.lock, flags);
> >
> >  	if (unlikely(submission_disabled(guc) ||
> > +		     intel_context_is_banned(ce) ||
> > +		     context_guc_id_invalid(ce) ||
> > +		     !lrc_desc_registered(guc, ce->guc_id) ||
> >  		     !intel_context_is_pinned(ce) ||
> >  		     context_pending_disable(ce) ||
> >  		     context_blocked(ce) > 1)) {
>
> This is getting to be a lot of conditions. Maybe we can simplify it a bit? E.g.

Yea, this is some defensive programming to cover all the bases if another
async operation (ban, reset, unpin) happens when this op is in flight.
Probably some of the conditions are not needed but being extra safe here.

> it should be possible to check context_blocked, context_banned and
> context_pending_disable as a single op:
>
> 	/* 2 or more blocks */
> 	#define SCHED_STATE_MULTI_BLOCKED_MASK \
> 		(SCHED_STATE_BLOCKED_MASK & ~SCHED_STATE_BLOCKED)
> 	#define SCHED_STATE_NO_UNBLOCK \
> 		SCHED_STATE_MULTI_BLOCKED_MASK | \
> 		SCHED_STATE_PENDING_DISABLE | \
> 		SCHED_STATE_BANNED

Good idea, let me move this to a helper in the next spin.

Matt

> Not a blocker.
>
> Reviewed-by: Daniele Ceraolo Spurio
>
> Daniele
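The combined-mask idea from the review can be modeled outside the driver. The bit layout below is invented purely for illustration and does not match the real i915 `sched_state` encoding; the point is that one mask test replaces the three separate banned / pending-disable / multi-blocked checks:

```c
/* Userspace model of folding three scheduler-state checks into one mask. */
#include <assert.h>
#include <stdbool.h>

#define SCHED_STATE_BANNED		(1 << 0)	/* illustrative bits */
#define SCHED_STATE_PENDING_DISABLE	(1 << 1)
#define SCHED_STATE_BLOCKED_SHIFT	4
#define SCHED_STATE_BLOCKED		(1 << SCHED_STATE_BLOCKED_SHIFT)
#define SCHED_STATE_BLOCKED_MASK	(0xfff << SCHED_STATE_BLOCKED_SHIFT)

/* 2 or more blocks: the block count lives in the BLOCKED field, so any
 * bit above the lowest one means context_blocked() > 1 */
#define SCHED_STATE_MULTI_BLOCKED_MASK \
	(SCHED_STATE_BLOCKED_MASK & ~SCHED_STATE_BLOCKED)

#define SCHED_STATE_NO_UNBLOCK \
	(SCHED_STATE_MULTI_BLOCKED_MASK | \
	 SCHED_STATE_PENDING_DISABLE | \
	 SCHED_STATE_BANNED)

/* true when none of the no-unblock conditions is set */
static bool can_enable_scheduling(unsigned int sched_state)
{
	return !(sched_state & SCHED_STATE_NO_UNBLOCK);
}
```

A single block (`SCHED_STATE_BLOCKED`) still allows unblocking, while two or more blocks, a ban, or a pending disable all trip the same mask.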
Re: [syzbot] WARNING in drm_gem_shmem_vm_open
Hi Am 20.08.21 um 17:45 schrieb syzbot: syzbot has bisected this issue to: Good bot! commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2 Author: Daniel Vetter Date: Fri Oct 9 23:21:56 2020 + drm/vkms: fbdev emulation support Here's a guess. GEM SHMEM + fbdev emulation requires that (drm_mode_config.prefer_shadow_fbdev = true). Otherwise, deferred I/O and SHMEM conflict over the use of page flags IIRC. From a quick grep, vkms doesn't set prefer_shadow_fbdev and an alarming amount of SHMEM-based drivers don't do either. Best regards Thomas bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11c31d5530 start commit: 614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern.. git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d5530 console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d5530 kernel config: https://syzkaller.appspot.com/x/.config?x=96f0602203250753 dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30 Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support") For information about bisection process see: https://goo.gl/tpsmEJ#bisection -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
Re: [syzbot] WARNING in drm_gem_shmem_vm_open
On Fri, Aug 20, 2021 at 9:23 PM Thomas Zimmermann wrote: > Hi > > Am 20.08.21 um 17:45 schrieb syzbot: > > syzbot has bisected this issue to: > > Good bot! > > > > > commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2 > > Author: Daniel Vetter > > Date: Fri Oct 9 23:21:56 2020 + > > > > drm/vkms: fbdev emulation support > > Here's a guess. > > GEM SHMEM + fbdev emulation requires that > (drm_mode_config.prefer_shadow_fbdev = true). Otherwise, deferred I/O > and SHMEM conflict over the use of page flags IIRC. But we should only set up defio if fb->dirty is set, which vkms doesn't do. So there's something else going on? So there must be something else funny going on here I think ... No idea what's going on really. -Daniel > From a quick grep, vkms doesn't set prefer_shadow_fbdev and an alarming > amount of SHMEM-based drivers don't do either. > > Best regards > Thomas > > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11c31d5530 > > start commit: 614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern.. > > git tree: upstream > > final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d5530 > > console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d5530 > > kernel config: https://syzkaller.appspot.com/x/.config?x=96f0602203250753 > > dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619 > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30 > > > > Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com > > Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support") > > > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > > > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Maxfeldstr. 5, 90409 Nürnberg, Germany > (HRB 36809, AG Nürnberg) > Geschäftsführer: Felix Imendörffer > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH 07/27] Revert "drm/i915/gt: Propagate change in error status to children on unhold"
On Thu, Aug 19, 2021 at 1:22 AM Matthew Brost wrote: > > Propagating errors to dependent fences is wrong, don't do it. A selftest > in the following exposed the propagating of an error to a dependent > fence after an engine reset. I feel like we could still have a bit of a better message. Maybe something like this: Propagating errors to dependent fences is broken and can lead to errors from one client ending up in another. In 3761baae908a (Revert "drm/i915: Propagate errors on awaiting already signaled fences"), we attempted to get rid of fence error propagation but missed the case added in 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status to children on unhold"). Revert that one too. This error was found by an up-and-coming selftest which . Otherwise, looks good to me. --Jason > > This reverts commit 8e9f84cf5cac248a1c6a5daa4942879c8b765058. > > v2: > (Daniel Vetter) > - Use revert > > References: 3761baae908a (Revert "drm/i915: Propagate errors on awaiting > already signaled fences") > Signed-off-by: Matthew Brost > --- > drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 > 1 file changed, 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > index de5f9c86b9a4..cafb0608ffb4 100644 > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > @@ -2140,10 +2140,6 @@ static void __execlists_unhold(struct i915_request *rq) > if (p->flags & I915_DEPENDENCY_WEAK) > continue; > > - /* Propagate any change in error status */ > - if (rq->fence.error) > - i915_request_set_error_once(w, > rq->fence.error); > - > if (w->engine != rq->engine) > continue; > > -- > 2.32.0 >
Re: refactor the i915 GVT support
On Fri, Aug 20, 2021 at 04:17:24PM +0200, Christoph Hellwig wrote:
> On Thu, Aug 19, 2021 at 04:29:29PM +0800, Zhenyu Wang wrote:
> > I'm working on below patch to resolve this. But I met a weird issue in
> > case when building i915 as module and also kvmgt module, it caused
> > busy wait on request_module("kvmgt") when boot, it doesn't happen if
> > building i915 into kernel.
>
> Luis, do you know if there is a problem with a request_module from
> a driver ->probe routine that is probably called by a module_init
> function itself?

Generally no, but you can easily shoot yourself in the foot by creating
cross dependencies and not dealing with them properly. I'd make sure to
keep module initialization as simple as possible, and run whatever takes
more time asynchronously, then use a state machine to allow you to verify
where you are in the initialization phase, or query it, or wait for a
completion with a timeout.

It seems the code in question is getting some spring cleaning, and it's
unclear where the code is I can inspect. If there's a tree somewhere I can
take a peek I'd be happy to review possible oddities that may stick out.

My goto model for these sorts of problems is to abstract the issue
*outside* of the driver in question and implement new selftests to try to
reproduce. This serves two purposes: 1) it helps with testing, and 2) it
may allow you to see the problem more clearly.

Luis
Re: [git pull] drm fixes for 5.14-rc7
The pull request you sent on Fri, 20 Aug 2021 15:36:29 +1000: > git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-08-20-3 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/8ba9fbe1e4b8a28050c283792344ee8b6bc3465c Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
[PATCH 1/8] drm/etnaviv: return context from etnaviv_iommu_context_get
Being able to have the refcount manipulation in an assignment makes it much easier to parse the code. Cc: sta...@vger.kernel.org # 5.4 Signed-off-by: Lucas Stach Tested-by: Michael Walle --- drivers/gpu/drm/etnaviv/etnaviv_buffer.c | 3 +-- drivers/gpu/drm/etnaviv/etnaviv_gem.c| 3 +-- drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 3 +-- drivers/gpu/drm/etnaviv/etnaviv_gpu.c| 6 ++ drivers/gpu/drm/etnaviv/etnaviv_mmu.h| 4 +++- 5 files changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_buffer.c b/drivers/gpu/drm/etnaviv/etnaviv_buffer.c index 76d38561c910..cf741c5c82d2 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_buffer.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_buffer.c @@ -397,8 +397,7 @@ void etnaviv_buffer_queue(struct etnaviv_gpu *gpu, u32 exec_state, if (switch_mmu_context) { struct etnaviv_iommu_context *old_context = gpu->mmu_context; - etnaviv_iommu_context_get(mmu_context); - gpu->mmu_context = mmu_context; + gpu->mmu_context = etnaviv_iommu_context_get(mmu_context); etnaviv_iommu_context_put(old_context); } diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c index b8fa6ed3dd73..fb7a33b88fc0 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c @@ -303,8 +303,7 @@ struct etnaviv_vram_mapping *etnaviv_gem_mapping_get( list_del(&mapping->obj_node); } - etnaviv_iommu_context_get(mmu_context); - mapping->context = mmu_context; + mapping->context = etnaviv_iommu_context_get(mmu_context); mapping->use = 1; ret = etnaviv_iommu_map_gem(mmu_context, etnaviv_obj, diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 4dd7d9d541c0..486259e154af 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -532,8 +532,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data, goto err_submit_objects; submit->ctx = file->driver_priv; - 
etnaviv_iommu_context_get(submit->ctx->mmu); - submit->mmu_context = submit->ctx->mmu; + submit->mmu_context = etnaviv_iommu_context_get(submit->ctx->mmu); submit->exec_state = args->exec_state; submit->flags = args->flags; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c index 4102bcea3341..c8b9b0cc4442 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c @@ -1365,12 +1365,10 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit) } if (!gpu->mmu_context) { - etnaviv_iommu_context_get(submit->mmu_context); - gpu->mmu_context = submit->mmu_context; + gpu->mmu_context = etnaviv_iommu_context_get(submit->mmu_context); etnaviv_gpu_start_fe_idleloop(gpu); } else { - etnaviv_iommu_context_get(gpu->mmu_context); - submit->prev_mmu_context = gpu->mmu_context; + submit->prev_mmu_context = etnaviv_iommu_context_get(gpu->mmu_context); } if (submit->nr_pmrs) { diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.h b/drivers/gpu/drm/etnaviv/etnaviv_mmu.h index d1d6902fd13b..e4a0b7d09c2e 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.h +++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.h @@ -105,9 +105,11 @@ void etnaviv_iommu_dump(struct etnaviv_iommu_context *ctx, void *buf); struct etnaviv_iommu_context * etnaviv_iommu_context_init(struct etnaviv_iommu_global *global, struct etnaviv_cmdbuf_suballoc *suballoc); -static inline void etnaviv_iommu_context_get(struct etnaviv_iommu_context *ctx) +static inline struct etnaviv_iommu_context * +etnaviv_iommu_context_get(struct etnaviv_iommu_context *ctx) { kref_get(&ctx->refcount); + return ctx; } void etnaviv_iommu_context_put(struct etnaviv_iommu_context *ctx); void etnaviv_iommu_restore(struct etnaviv_gpu *gpu, -- 2.30.2
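The refactor in this patch, a `_get()` helper that returns its argument so the refcount bump folds into the assignment, can be shown with a minimal stand-in for `kref` (plain integer refcount, illustrative names only):

```c
/* Minimal model of the get-returns-pointer pattern used above. */
#include <assert.h>
#include <stddef.h>

struct ctx {
	int refcount;	/* stands in for struct kref */
};

static struct ctx *ctx_get(struct ctx *c)
{
	c->refcount++;	/* kref_get(&ctx->refcount) in the driver */
	return c;	/* returning the pointer enables one-line assignment */
}
```

With this shape, `mapping->context = ctx_get(mmu_context);` reads as a single "take a counted reference" statement, instead of a bare `ctx_get()` call followed by an assignment whose pairing the reader has to verify.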
[PATCH 2/8] drm/etnaviv: put submit prev MMU context when it exists
The prev context is the MMU context at the time of the job queueing in hardware. As a job might be queued multiple times due to recovery after a GPU hang, we need to make sure to put the stale prev MMU context from a prior queuing, to avoid the reference and thus the MMU context leaking. Cc: sta...@vger.kernel.org # 5.4 Signed-off-by: Lucas Stach Tested-by: Michael Walle --- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c index c8b9b0cc4442..c1b9c5cbed11 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c @@ -1368,6 +1368,8 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit) gpu->mmu_context = etnaviv_iommu_context_get(submit->mmu_context); etnaviv_gpu_start_fe_idleloop(gpu); } else { + if (submit->prev_mmu_context) + etnaviv_iommu_context_put(submit->prev_mmu_context); submit->prev_mmu_context = etnaviv_iommu_context_get(gpu->mmu_context); } -- 2.30.2
[PATCH 8/8] drm/etnaviv: add missing MMU context put when reaping MMU mapping
When we forcefully evict a mapping from the address space and thus the
MMU context, the MMU context is leaked, as the mapping no longer points
to it, so it doesn't get freed when the GEM object is destroyed. Add the
missing context put to fix the leak.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach
Tested-by: Michael Walle
---
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
index dab1b58006d8..9fb1a2aadbcb 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
@@ -199,6 +199,7 @@ static int etnaviv_iommu_find_iova(struct etnaviv_iommu_context *context,
 	 */
 	list_for_each_entry_safe(m, n, &list, scan_node) {
 		etnaviv_iommu_remove_mapping(context, m);
+		etnaviv_iommu_context_put(m->context);
 		m->context = NULL;
 		list_del_init(&m->mmu_node);
 		list_del_init(&m->scan_node);
--
2.30.2
[PATCH 7/8] drm/etnaviv: reference MMU context when setting up hardware state
Move the refcount manipulation of the MMU context to the point where the
hardware state is programmed. At that point it is also known if a
previous MMU state is still there, or the state needs to be reprogrammed
with a potentially different context.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach
Tested-by: Michael Walle
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c      | 24 +++---
 drivers/gpu/drm/etnaviv/etnaviv_iommu.c    |  4 
 drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c |  8 
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index f420c4f14657..1fa98ce870f7 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -641,17 +641,19 @@ void etnaviv_gpu_start_fe(struct etnaviv_gpu *gpu, u32 address, u16 prefetch)
 	gpu->fe_running = true;
 }
 
-static void etnaviv_gpu_start_fe_idleloop(struct etnaviv_gpu *gpu)
+static void etnaviv_gpu_start_fe_idleloop(struct etnaviv_gpu *gpu,
+					  struct etnaviv_iommu_context *context)
 {
-	u32 address = etnaviv_cmdbuf_get_va(&gpu->buffer,
-					    &gpu->mmu_context->cmdbuf_mapping);
 	u16 prefetch;
+	u32 address;
 
 	/* setup the MMU */
-	etnaviv_iommu_restore(gpu, gpu->mmu_context);
+	etnaviv_iommu_restore(gpu, context);
 
 	/* Start command processor */
 	prefetch = etnaviv_buffer_init(gpu);
+	address = etnaviv_cmdbuf_get_va(&gpu->buffer,
+					&gpu->mmu_context->cmdbuf_mapping);
 
 	etnaviv_gpu_start_fe(gpu, address, prefetch);
 }
@@ -1369,14 +1371,12 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit)
 		goto out_unlock;
 	}
 
-	if (!gpu->fe_running) {
-		gpu->mmu_context = etnaviv_iommu_context_get(submit->mmu_context);
-		etnaviv_gpu_start_fe_idleloop(gpu);
-	} else {
-		if (submit->prev_mmu_context)
-			etnaviv_iommu_context_put(submit->prev_mmu_context);
-		submit->prev_mmu_context = etnaviv_iommu_context_get(gpu->mmu_context);
-	}
+	if (!gpu->fe_running)
+		etnaviv_gpu_start_fe_idleloop(gpu, submit->mmu_context);
+
+	if (submit->prev_mmu_context)
+		etnaviv_iommu_context_put(submit->prev_mmu_context);
+	submit->prev_mmu_context = etnaviv_iommu_context_get(gpu->mmu_context);
 
 	if (submit->nr_pmrs) {
 		gpu->event[event[1]].sync_point = &sync_point_perfmon_sample_pre;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu.c b/drivers/gpu/drm/etnaviv/etnaviv_iommu.c
index 1a7c89a67bea..afe5dd6a9925 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_iommu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu.c
@@ -92,6 +92,10 @@ static void etnaviv_iommuv1_restore(struct etnaviv_gpu *gpu,
 	struct etnaviv_iommuv1_context *v1_context = to_v1_context(context);
 	u32 pgtable;
 
+	if (gpu->mmu_context)
+		etnaviv_iommu_context_put(gpu->mmu_context);
+	gpu->mmu_context = etnaviv_iommu_context_get(context);
+
 	/* set base addresses */
 	gpu_write(gpu, VIVS_MC_MEMORY_BASE_ADDR_RA, context->global->memory_base);
 	gpu_write(gpu, VIVS_MC_MEMORY_BASE_ADDR_FE, context->global->memory_base);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
index f8bf488e9d71..d664ae29ae20 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
@@ -172,6 +172,10 @@ static void etnaviv_iommuv2_restore_nonsec(struct etnaviv_gpu *gpu,
 	if (gpu_read(gpu, VIVS_MMUv2_CONTROL) & VIVS_MMUv2_CONTROL_ENABLE)
 		return;
 
+	if (gpu->mmu_context)
+		etnaviv_iommu_context_put(gpu->mmu_context);
+	gpu->mmu_context = etnaviv_iommu_context_get(context);
+
 	prefetch = etnaviv_buffer_config_mmuv2(gpu,
 				(u32)v2_context->mtlb_dma,
 				(u32)context->global->bad_page_dma);
@@ -192,6 +196,10 @@ static void etnaviv_iommuv2_restore_sec(struct etnaviv_gpu *gpu,
 	if (gpu_read(gpu, VIVS_MMUv2_SEC_CONTROL) & VIVS_MMUv2_SEC_CONTROL_ENABLE)
 		return;
 
+	if (gpu->mmu_context)
+		etnaviv_iommu_context_put(gpu->mmu_context);
+	gpu->mmu_context = etnaviv_iommu_context_get(context);
+
 	gpu_write(gpu, VIVS_MMUv2_PTA_ADDRESS_LOW,
 		  lower_32_bits(context->global->v2.pta_dma));
 	gpu_write(gpu, VIVS_MMUv2_PTA_ADDRESS_HIGH,
-- 
2.30.2
[PATCH 6/8] drm/etnaviv: fix MMU context leak on GPU reset
After a reset the GPU is no longer using the MMU context and may be
restarted with a different context. While the mmu_state was properly
cleared, the context wasn't unreferenced, leading to a memory leak.

Cc: sta...@vger.kernel.org # 5.4
Reported-by: Michael Walle
Signed-off-by: Lucas Stach
Tested-by: Michael Walle
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 9c710924df6b..f420c4f14657 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -571,6 +571,8 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu)
 	gpu->fe_running = false;
 	gpu->exec_state = -1;
 
+	if (gpu->mmu_context)
+		etnaviv_iommu_context_put(gpu->mmu_context);
 	gpu->mmu_context = NULL;
 
 	return 0;
-- 
2.30.2
[PATCH 5/8] drm/etnaviv: exec and MMU state is lost when resetting the GPU
When the GPU is reset both the current exec state, as well as all MMU
state, is lost. Move the driver side state tracking into the reset
function to keep hardware and software state from diverging.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach
Tested-by: Michael Walle
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 973843c35fca..9c710924df6b 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -570,6 +570,8 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu)
 	etnaviv_gpu_update_clock(gpu);
 
 	gpu->fe_running = false;
+	gpu->exec_state = -1;
+	gpu->mmu_context = NULL;
 
 	return 0;
 }
@@ -830,7 +832,6 @@ int etnaviv_gpu_init(struct etnaviv_gpu *gpu)
 	/* Now program the hardware */
 	mutex_lock(&gpu->lock);
 	etnaviv_gpu_hw_init(gpu);
-	gpu->exec_state = -1;
 	mutex_unlock(&gpu->lock);
 
 	pm_runtime_mark_last_busy(gpu->dev);
@@ -1055,8 +1056,6 @@ void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
 	spin_unlock(&gpu->event_spinlock);
 
 	etnaviv_gpu_hw_init(gpu);
-	gpu->exec_state = -1;
-	gpu->mmu_context = NULL;
 
 	mutex_unlock(&gpu->lock);
 	pm_runtime_mark_last_busy(gpu->dev);
-- 
2.30.2
[PATCH 4/8] drm/etnaviv: keep MMU context across runtime suspend/resume
The MMU state may be kept across a runtime suspend/resume cycle, as we
avoid a full hardware reset to keep the latency of the runtime PM small.
Don't pretend that the MMU state is lost in driver state. The MMU
context is pushed out when new HW jobs with a different context are
coming in. The only exception to this is when the GPU is unbound, in
which case we need to make sure to also free the last active context.

Cc: sta...@vger.kernel.org # 5.4
Reported-by: Michael Walle
Signed-off-by: Lucas Stach
Tested-by: Michael Walle
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 325858cfc2c3..973843c35fca 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1590,9 +1590,6 @@ static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu)
 		 */
 		etnaviv_gpu_wait_idle(gpu, 100);
 
-		etnaviv_iommu_context_put(gpu->mmu_context);
-		gpu->mmu_context = NULL;
-
 		gpu->fe_running = false;
 	}
 
@@ -1741,6 +1738,9 @@ static void etnaviv_gpu_unbind(struct device *dev, struct device *master,
 	etnaviv_gpu_hw_suspend(gpu);
 #endif
 
+	if (gpu->mmu_context)
+		etnaviv_iommu_context_put(gpu->mmu_context);
+
 	if (gpu->initialized) {
 		etnaviv_cmdbuf_free(&gpu->buffer);
 		etnaviv_iommu_global_fini(gpu);
-- 
2.30.2
[PATCH 3/8] drm/etnaviv: stop abusing mmu_context as FE running marker
While the DMA frontend can only be active when the MMU context is set,
the reverse isn't necessarily true, as the frontend can be stopped while
the MMU state is kept. Stop treating a set mmu_context as an indication
that the frontend is running and instead add an explicit property.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach
Tested-by: Michael Walle
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 10 --
 drivers/gpu/drm/etnaviv/etnaviv_gpu.h |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index c1b9c5cbed11..325858cfc2c3 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -569,6 +569,8 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu)
 	/* We rely on the GPU running, so program the clock */
 	etnaviv_gpu_update_clock(gpu);
 
+	gpu->fe_running = false;
+
 	return 0;
 }
 
@@ -631,6 +633,8 @@ void etnaviv_gpu_start_fe(struct etnaviv_gpu *gpu, u32 address, u16 prefetch)
 			  VIVS_MMUv2_SEC_COMMAND_CONTROL_ENABLE |
 			  VIVS_MMUv2_SEC_COMMAND_CONTROL_PREFETCH(prefetch));
 	}
+
+	gpu->fe_running = true;
 }
 
 static void etnaviv_gpu_start_fe_idleloop(struct etnaviv_gpu *gpu)
@@ -1364,7 +1368,7 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit)
 		goto out_unlock;
 	}
 
-	if (!gpu->mmu_context) {
+	if (!gpu->fe_running) {
 		gpu->mmu_context = etnaviv_iommu_context_get(submit->mmu_context);
 		etnaviv_gpu_start_fe_idleloop(gpu);
 	} else {
@@ -1573,7 +1577,7 @@ int etnaviv_gpu_wait_idle(struct etnaviv_gpu *gpu, unsigned int timeout_ms)
 
 static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu)
 {
-	if (gpu->initialized && gpu->mmu_context) {
+	if (gpu->initialized && gpu->fe_running) {
 		/* Replace the last WAIT with END */
 		mutex_lock(&gpu->lock);
 		etnaviv_buffer_end(gpu);
@@ -1588,6 +1592,8 @@ static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu)
 
 		etnaviv_iommu_context_put(gpu->mmu_context);
 		gpu->mmu_context = NULL;
+
+		gpu->fe_running = false;
 	}
 
 	gpu->exec_state = -1;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
index 8ea48697d132..1c75c8ed5bce 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
@@ -101,6 +101,7 @@ struct etnaviv_gpu {
 	struct workqueue_struct *wq;
 	struct drm_gpu_scheduler sched;
 	bool initialized;
+	bool fe_running;
 
 	/* 'ring'-buffer: */
 	struct etnaviv_cmdbuf buffer;
-- 
2.30.2
Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
On Fri, 2021-08-20 at 11:20 +, Lin, Wayne wrote:
> [Public]
>
> > -----Original Message-----
> > From: Lyude Paul
> > Sent: Thursday, August 19, 2021 2:59 AM
> > To: Lin, Wayne; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas; Wentland, Harry <harry.wentl...@amd.com>;
> > Zuo, Jerry; Wu, Hersen; Juston Li <juston...@intel.com>; Imre Deak;
> > Ville Syrjälä; Daniel Vetter <daniel.vet...@ffwll.ch>; Sean Paul;
> > Maarten Lankhorst; Maxime Ripard; Thomas Zimmermann; David Airlie;
> > Daniel Vetter; Deucher, Alexander; Siqueira, Rodrigo; Pillai,
> > Aurabindo <aurabindo.pil...@amd.com>; Eryk Brol; Bas Nieuwenhuizen;
> > Cornij, Nikola <nikola.cor...@amd.com>; Jani Nikula; Manasi Navare;
> > Ankit Nautiyal <ankit.k.nauti...@intel.com>; José Roberto de Souza;
> > Sean Paul; Ben Skeggs; sta...@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > connected end device
> >
> > On Wed, 2021-08-11 at 09:49 +, Lin, Wayne wrote:
> > > [Public]
> > >
> > > > -----Original Message-----
> > > > From: Lyude Paul
> > > > Sent: Wednesday, August 11, 2021 4:45 AM
> > > > To: Lin, Wayne; dri-devel@lists.freedesktop.org
> > > > Cc: Kazlauskas, Nicholas; Wentland, Harry <harry.wentl...@amd.com>;
> > > > Zuo, Jerry; Wu, Hersen; Juston Li <juston...@intel.com>; Imre
> > > > Deak; Ville Syrjälä; Daniel Vetter <daniel.vet...@ffwll.ch>; Sean
> > > > Paul; Maarten Lankhorst; Maxime Ripard; Thomas Zimmermann; David
> > > > Airlie; Daniel Vetter; Deucher, Alexander; Siqueira, Rodrigo;
> > > > Pillai, Aurabindo <aurabindo.pil...@amd.com>; Eryk Brol; Bas
> > > > Nieuwenhuizen; Cornij, Nikola <nikola.cor...@amd.com>; Jani
> > > > Nikula; Manasi Navare; Ankit Nautiyal <ankit.k.nauti...@intel.com>;
> > > > José Roberto de Souza; Sean Paul; Ben Skeggs;
> > > > sta...@vger.kernel.org
> > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > connected end device
> > > >
> > > > On Wed, 2021-08-04 at 07:13 +, Lin, Wayne wrote:
> > > > > [Public]
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Lyude Paul
> > > > > > Sent: Wednesday, August 4, 2021 8:09 AM
> > > > > > To: Lin, Wayne; dri-devel@lists.freedesktop.org
> > > > > > Cc: Kazlauskas, Nicholas; Wentland, Harry <harry.wentl...@amd.com>;
> > > > > > Zuo, Jerry; Wu, Hersen; Juston Li <juston...@intel.com>; Imre
> > > > > > Deak; Ville Syrjälä; Wentland, Harry <harry.wentl...@amd.com>;
> > > > > > Daniel Vetter; Sean Paul; Maarten Lankhorst
> > > > > > <maarten.lankho...@linux.intel.com>; Maxime Ripard; Thomas
> > > > > > Zimmermann; David Airlie; Daniel Vetter; Deucher, Alexander;
> > > > > > Siqueira, Rodrigo <rodrigo.sique...@amd.com>; Pillai,
> > > > > > Aurabindo; Eryk Brol; Bas Nieuwenhuizen; Cornij, Nikola; Jani
> > > > > > Nikula; Manasi Navare; Ankit Nautiyal; José Roberto de Souza;
> > > > > > Sean Paul; Ben Skeggs; sta...@vger.kernel.org
> > > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > > connected end device
> > > > > >
> > > > > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > > > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > > > > > [Why]
> > > > > > > > Currently, we will create connectors for all output ports no
> > > > > > > > matter it's connected or not. However, in MST, we can only
> > > > > > > > determine whether an output port really stands for a
> > > > > > > > "connector" till it is connected and check its peer device
> > > > > > > > type as an end device.
> > > > > > >
> > > > > > > What is this commit trying to solve exactly? e.g. is AMD
> > > > > > > currently running into issues with there being too many DRM
> > > > > > > connectors or something like that?
> > > > > > > Ideally this is behavior I'd very much like us to keep as-is
> > > > > > > unless there's good reason to change it.
> > > > > Hi Lyude,
> > > > > Really appreciate for your time to elaborate in such detail. Thanks!
> > > > >
> > > > > I come up with this commit because I observed something confusing
> > > > > when I was analyzing MST connectors' life cycle. Take the topology
> > > > > instance you mentioned below
> > > > >
> > > > > Root MSTB -> Output_Port 1 -> MSTB 1.1 -> Output_Port 1 (Connected w/ display)
> > > > >                                        -> Output_Port 2 (Disconnected)
> > > > >           -> Output_Port 2 -> MSTB 2.1 -> Output_Port 1 (Disconnected)
> > > > >                                        -> Output_Port 2 (Disconnected)
> > > > >
> > > > > Which is exactly the topology of Startech DP 1-to-4 hub. There are
> > > > > 3 1-to-2 branch chips within this hub. With our MST imple
[PATCH 0/3] drm/panfrost: Bug fixes for lock_region
Chris Morgan reported UBSAN errors in panfrost and tracked them down to
the size computation in lock_region. This calculation is overcomplicated
(seemingly cargo culted from kbase) and can be simplified with kernel
helpers and some mathematical identities.

The first patch in the series rewrites the calculation in a form avoiding
undefined behaviour; Chris confirms it placates UBSAN. While researching
this function, I noticed a pair of other potential bugs: Bifrost can lock
more than 4GiB at a time, but must lock at least 32KiB at a time. The
latter patches in the series handle these cases.

The size computation was unit-tested in userspace. Relevant code below,
just missing some copypaste definitions for fls64/clamp/etc:

#define MIN_LOCK (1ULL << 12)
#define MAX_LOCK (1ULL << 48)

struct {
	uint64_t size;
	uint8_t encoded;
} tests[] = {
	/* Clamping */
	{ 0, 11 },
	{ 1, 11 },
	{ 2, 11 },
	{ 4095, 11 },
	/* Power of two */
	{ 4096, 11 },
	/* Round up */
	{ 4097, 12 },
	{ 8192, 12 },
	{ 16384, 13 },
	{ 16385, 14 },
	/* Maximum */
	{ ~0ULL, 47 },
};

static uint8_t region_width(uint64_t size)
{
	size = clamp(size, MIN_LOCK, MAX_LOCK);
	return fls64(size - 1) - 1;
}

int main(int argc, char **argv)
{
	for (unsigned i = 0; i < ARRAY_SIZE(tests); ++i) {
		uint64_t test = tests[i].size;
		uint8_t expected = tests[i].encoded;
		uint8_t actual = region_width(test);

		assert(expected == actual);
	}
}

Alyssa Rosenzweig (3):
  drm/panfrost: Simplify lock_region calculation
  drm/panfrost: Use u64 for size in lock_region
  drm/panfrost: Clamp lock region to Bifrost minimum

 drivers/gpu/drm/panfrost/panfrost_mmu.c  | 31 +---
 drivers/gpu/drm/panfrost/panfrost_regs.h |  2 ++
 2 files changed, 13 insertions(+), 20 deletions(-)

-- 
2.30.2