Re: regression with mainline kernel
On Thu, Nov 11, 2021 at 8:51 PM Sudip Mukherjee wrote: > > Hi Linus, > > On Thu, Nov 11, 2021 at 2:03 PM Sudip Mukherjee > wrote: > > > > Hi Linus, > > > > My testing has been failing for the last few days. Last good test was > > with 6f2b76a4a384 and I started seeing the failure with ce840177930f5 > > where boot timeout. Last night's test on 66f4beaa6c1d worked fine. So I guess this has now been fixed. Thanks. -- Regards Sudip
Re: [PATCH v4] drm/ttm: Clarify that the TTM_PL_SYSTEM is under TTMs control
Hi, Zack, On 11/11/21 17:44, Zack Rusin wrote: On Wed, 2021-11-10 at 09:50 -0500, Zack Rusin wrote: TTM takes full control over TTM_PL_SYSTEM placed buffers. This makes driver internal usage of TTM_PL_SYSTEM prone to errors because it requires the drivers to manually handle all interactions between TTM which can swap out those buffers whenever it thinks it's the right thing to do and driver. CPU buffers which need to be fenced and shared with accelerators should be placed in driver specific placements that can explicitly handle CPU/accelerator buffer fencing. Currently, apart, from things silently failing nothing is enforcing that requirement which means that it's easy for drivers and new developers to get this wrong. To avoid the confusion we can document this requirement and clarify the solution. This came up during a discussion on dri-devel: https://lore.kernel.org/dri-devel/232f45e9-8748-1243-09bf-56763e666...@amd.com I took a slightly deeper look into this. I think we need to formalize this a bit more to understand pros and cons and what the restrictions are really all about. Anybody looking at the prevous discussion will mostly see arguments similar to "this is stupid and difficult" and "it has always been this way" which are not really constructive. First disregarding all accounting stuff, I think this all boils down to TTM_PL_SYSTEM having three distinct states: 1) POPULATED 2) LIMBO (Or whatever we want to call it. No pages present) 3) SWAPPED. The ttm_bo_move_memcpy() helper understands these, and any standalone driver implementation of the move() callback _currently_ needs to understand these as well, unless using the ttm_bo_move_memcpy() helper. Now using a bounce domain to proxy SYSTEM means that the driver can forget about the SWAPPED state, it's automatically handled by the move setup code. However, another pitfall is LIMBO, in that if when you move from SYSTEM/LIMBO to your bounce domain, the BO will be populated. So any naive accelerated move() implementation creating a 1GB BO in fixed memory, like VRAM, will needlessly allocate and free 1GB of system memory in the process instead of just performing a clear operation. Looks like amdgpu suffers from this? I think what is really needed is either a) A TTM helper that helps move callback implementations resolve the issues populating system from LIMBO or SWAP, and then also formalize driver notification for swapping. At a minimum, I think the swap_notify() callback needs to be able to return a late error. b) Make LIMBO and SWAPPED distinct memory regions. (I think I'd vote for this without looking into it in detail). In both these cases, we should really make SYSTEM bindable by GPU, otherwise we'd just be trading one pitfall for another related without really resolving the root problem. As for fencing not being supported by SYSTEM, I'm not sure why we don't want this, because it would for example prohibit async ttm_move_memcpy(), and also, async unbinding of ttm_tt memory like MOB on vmgfx. (I think it's still sync). There might be an accounting issue related to this as well, but I guess Christian would need to chime in on this. If so, I think it needs to be well understood and documented (in TTM, not in AMD drivers). Thanks, /Thomas Polite and gentle ping on that one. Are we ok with the wording here? z
[Bug 215003] New: apple_gmux fails to initialize and iGPU unclaimed on MacBook Pro 16" 2019
https://bugzilla.kernel.org/show_bug.cgi?id=215003 Bug ID: 215003 Summary: apple_gmux fails to initialize and iGPU unclaimed on MacBook Pro 16" 2019 Product: Drivers Version: 2.5 Kernel Version: 5.14.y Hardware: Intel OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Video(DRI - non Intel) Assignee: drivers_video-...@kernel-bugs.osdl.org Reporter: dreifachst...@gmail.com Regression: No MBP2019 can be tricked into booting with the iGPU enabled (https://github.com/0xbb/apple_set_os.efi) but i915 does not claim the iGPU after booting. I have tracked down the direct cause but do not how to fix it. When booting with the iGPU enabled `apple_gmux_present` returns true because GMUX_ACPI_HID ("APP000B") is present in the ACPI tables. Because apple_gmux fails to initialize with "Failed to find gmux I/O resource" and never registers with vgaswitcheroo the i915 probe routine always bails out after `vga_switcheroo_client_probe_defer` lefting the device unclaimed. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215003] apple_gmux fails to initialize and iGPU unclaimed on MacBook Pro 16" 2019
https://bugzilla.kernel.org/show_bug.cgi?id=215003 --- Comment #1 from Xiaolei Yu (dreifachst...@gmail.com) --- Commenting out the `vga_switcheroo_client_probe_defer` lines make the iGPU usable but gpu switching is still not working. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215003] apple_gmux fails to initialize and iGPU unclaimed on MacBook Pro 16" 2019
https://bugzilla.kernel.org/show_bug.cgi?id=215003 Lukas Wunner (lu...@wunner.de) changed: What|Removed |Added CC||lu...@wunner.de --- Comment #2 from Lukas Wunner (lu...@wunner.de) --- vga_switcheroo_client_probe_defer() only returns true if invoked by the *inactive* GPU's driver. (That's the "pdev != vga_default_device()" condition.) In other words, you've enabled the iGPU but it's not the *active* one. You need to use the gpu-switch utility (https://github.com/0xbb/gpu-switch) to tell the EFI BIOS that it should switch to the iGPU on next boot. That should resolve the issue. Unfortunately we don't support GPU switching for retina MBPs in the kernel yet, hence have to rely on the EFI BIOS to do that for now. (We do support GPU switching for *pre-retina* MBPs in the kernel since early 2016.) However, I'd like to know why apple_gmux fails to probe with "Failed to find gmux I/O resource". Could you attach an ACPI dump to this bugzilla? Thanks! -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH v3 11/12] drm/virtio: implement context init: add virtio_gpu_fence_event
On Tue, Sep 21, 2021 at 04:20:23PM -0700, Gurchetan Singh wrote: > Similar to DRM_VMW_EVENT_FENCE_SIGNALED. Sends a pollable event > to the DRM file descriptor when a fence on a specific ring is > signaled. > > One difference is the event is not exposed via the UAPI -- this is > because host responses are on a shared memory buffer of type > BLOB_MEM_GUEST [this is the common way to receive responses with > virtgpu]. As such, there is no context specific read(..) > implementation either -- just a poll(..) implementation. > > Signed-off-by: Gurchetan Singh > Acked-by: Nicholas Verne > --- > drivers/gpu/drm/virtio/virtgpu_drv.c | 43 +- > drivers/gpu/drm/virtio/virtgpu_drv.h | 7 + > drivers/gpu/drm/virtio/virtgpu_fence.c | 10 ++ > drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 > 4 files changed, 93 insertions(+), 1 deletion(-) This commit seems to cause a crash in a virtual drm gpu driver for Android. I have reverted this, and the next commit in the series from Linus's tree and all is good again. Any ideas? thanks, greg k-h
Re: regression with mainline kernel
[ Hmm. This email was marked as spam for me. I see no obvious reason for it being marked as spam, but it happens.. ] On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee wrote: > > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0] > drm/virtio: implement context init: add virtio_gpu_fence_event Hmm. Judging from your automated screenshots, the login never appears. > And, indeed reverting cd7f5ca33585 on top of debe436e77c7 has fixed > the problem I was seeing on my qemu test of x86_64. The qemu image is > based on Ubuntu. Presumably either that commit is somehow buggy in itself - or it does exactly what it means to do, and the new poll() semantics just confuses the heck out of the X server (or wayland or whatever). And honestly, if I read that thing correctly, the patch is entirely broken. The new poll function (virtio_gpu_poll()) will unconditionally remove the first event from the event list, and then report "Yeah, I had events". This is completely bogus for a few reasons: - poll() really should be idempotent, because the poll function gets called multiple times - it doesn't even seem to check that the event that it removes is the new VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL kind of event, so it will unconditionally just remove random events. - it does seem to check the "vfpriv->ring_idx_mask" and do the old thing if that is zero, but I see absolutely no reason for that (and that check itself has caused problems, see below) Honestly, my reaction to this all is that that commit is fundamentally broken and probably should be reverted regardless as "this commit does bad things". HOWEVER - it has had a fix for a NULL pointer dereference in the meantime - can you check whether the current top of tree happens to work for you? Maybe your problem isn't due to "that commit does unnatural things", but simply due to the bug fixed in d89c0c8322ec ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll"). And if it's still broken with that commit, I'll happily revert it and people need to go back to the drawing board. In fact, I would really suggest that people look at that virtio_gpu_poll() function regardless. That odd "let's unconditionally just drop events in the poll function is really REALLY broken behavior. Linus
[PATCH] drm: change logs to print connectors in the form CONNECTOR:id:name
The prefered way to log connectors is [CONNECTOR:id:name]. Change it in drm core programs. Suggested-by: Ville Syrjälä Signed-off-by: Claudio Suarez --- drivers/gpu/drm/drm_client_modeset.c | 51 ++-- drivers/gpu/drm/drm_connector.c | 12 --- drivers/gpu/drm/drm_edid.c | 36 ++-- drivers/gpu/drm/drm_edid_load.c | 11 +++--- drivers/gpu/drm/drm_mode_config.c| 3 +- 5 files changed, 59 insertions(+), 54 deletions(-) diff --git a/drivers/gpu/drm/drm_client_modeset.c b/drivers/gpu/drm/drm_client_modeset.c index ced09c7c06f9..4f35dc375bdd 100644 --- a/drivers/gpu/drm/drm_client_modeset.c +++ b/drivers/gpu/drm/drm_client_modeset.c @@ -240,7 +240,7 @@ static void drm_client_connectors_enabled(struct drm_connector **connectors, for (i = 0; i < connector_count; i++) { connector = connectors[i]; enabled[i] = drm_connector_enabled(connector, true); - DRM_DEBUG_KMS("connector %d enabled? %s\n", connector->base.id, + DRM_DEBUG_KMS("[CONNECTOR:%d;%s] enabled? %s\n", connector->base.id, connector->name, connector->display_info.non_desktop ? "non desktop" : enabled[i] ? "yes" : "no"); any_enabled |= enabled[i]; @@ -350,8 +350,8 @@ static int drm_client_get_tile_offsets(struct drm_connector **connectors, continue; if (!modes[i] && (h_idx || v_idx)) { - DRM_DEBUG_KMS("no modes for connector tiled %d %d\n", i, - connector->base.id); + DRM_DEBUG_KMS("no modes for [CONNECTOR:%d:%s] tiled %d\n", + connector->base.id, connector->name, i); continue; } if (connector->tile_h_loc < h_idx) @@ -419,14 +419,15 @@ static bool drm_client_target_preferred(struct drm_connector **connectors, drm_client_get_tile_offsets(connectors, connector_count, modes, offsets, i, connector->tile_h_loc, connector->tile_v_loc); } - DRM_DEBUG_KMS("looking for cmdline mode on connector %d\n", - connector->base.id); + DRM_DEBUG_KMS("looking for cmdline mode on [CONNECTOR:%d:%s]\n", + connector->base.id, connector->name); /* got for command line mode first */ modes[i] = drm_connector_pick_cmdline_mode(connector); if (!modes[i]) { - DRM_DEBUG_KMS("looking for preferred mode on connector %d %d\n", - connector->base.id, connector->tile_group ? connector->tile_group->id : 0); + DRM_DEBUG_KMS("looking for preferred mode on [CONNECTOR:%d:%s] %d\n", + connector->base.id, connector->name, + connector->tile_group ? connector->tile_group->id : 0); modes[i] = drm_connector_has_preferred_mode(connector, width, height); } /* No preferred modes, pick one off the list */ @@ -448,8 +449,8 @@ static bool drm_client_target_preferred(struct drm_connector **connectors, (connector->tile_h_loc == 0 && connector->tile_v_loc == 0 && !drm_connector_get_tiled_mode(connector))) { - DRM_DEBUG_KMS("Falling back to non tiled mode on Connector %d\n", - connector->base.id); + DRM_DEBUG_KMS("Falling back to non tiled mode on [CONNECTOR:%d:%s]\n", + connector->base.id, connector->name); modes[i] = drm_connector_fallback_non_tiled_mode(connector); } else { modes[i] = drm_connector_get_tiled_mode(connector); @@ -617,15 +618,15 @@ static bool drm_client_firmware_config(struct drm_client_dev *client, num_connectors_detected++; if (!enabled[i]) { - DRM_DEBUG_KMS("connector %s not enabled, skipping\n", - connector->name); + DRM_DEBUG_KMS("[CONNECTOR:%d:%s] not enabled, skipping\n", + connector->base.id, connector->name); conn_configured |= BIT(i); continue; } if (connector->force == DRM_FORCE_OFF) { - DRM_DEBUG_KMS("connector %s is disabled by user, skipping\n", - connector->name); + DRM_DEBUG_KMS("[CONNECTOR:%d:%s] is disabl
Re: [PATCH] drm/nouveau/core: fix the uninitialized use in nvkm_ioctl_map()
something seems to have messed with the patch so it doesn't apply correctly. On Thu, Jun 17, 2021 at 9:39 AM Yizhuo Zhai wrote: > > In function nvkm_ioctl_map(), the variable "type" could be > uninitialized if "nvkm_object_map()" returns error code, > however, it does not check the return value and directly > use the "type" in the if statement, which is potentially > unsafe. > > Signed-off-by: Yizhuo > --- > drivers/gpu/drm/nouveau/nvkm/core/ioctl.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c > b/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c > index d777df5a64e6..7f2e8482f167 100644 > --- a/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c > +++ b/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c > @@ -266,6 +266,8 @@ nvkm_ioctl_map(struct nvkm_client *client, > ret = nvkm_object_map(object, data, size, &type, > &args->v0.handle, > &args->v0.length); > + if (ret) > + return ret; > if (type == NVKM_OBJECT_MAP_IO) > args->v0.type = NVIF_IOCTL_MAP_V0_IO; > else > -- > 2.17.1 >
Re: [PATCH] drm: change logs to print connectors in the form CONNECTOR:id:name
Hi Claudio, On Sat, Nov 13, 2021 at 08:27:30PM +0100, Claudio Suarez wrote: > The prefered way to log connectors is [CONNECTOR:id:name]. Change it in > drm core programs. > > Suggested-by: Ville Syrjälä > Signed-off-by: Claudio Suarez While touching all these logging calls, could you convernt them to the newer drm_dbg_kms variants? DRM_DEBUG_* are all deprecated. Sam
Re: regression with mainline kernel
Hi Linus, On Sat, Nov 13, 2021 at 5:07 PM Linus Torvalds wrote: > > [ Hmm. This email was marked as spam for me. I see no obvious reason > for it being marked as spam, but it happens.. ] > > On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee > wrote: > > > > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0] > > drm/virtio: implement context init: add virtio_gpu_fence_event > > Hmm. Judging from your automated screenshots, the login never appears. > > > HOWEVER - it has had a fix for a NULL pointer dereference in the > meantime - can you check whether the current top of tree happens to > work for you? Maybe your problem isn't due to "that commit does > unnatural things", but simply due to the bug fixed in d89c0c8322ec > ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll"). > > And if it's still broken with that commit, I'll happily revert it and > people need to go back to the drawing board. I sent another mail yesterday which is now at https://lore.kernel.org/lkml/CADVatmOOzCxAgLhCu1tTz=44sgrdxds5-omz3v0w4f5klcl...@mail.gmail.com/ I will just pase that here for you. Last night's test on 66f4beaa6c1d worked fine. So I guess this has now been fixed. I have not done a bisect to see what has fixed it, but looking at the log I think it will be that NULL pointer fix. -- Regards Sudip
[Bug 215003] apple_gmux fails to initialize and iGPU unclaimed on MacBook Pro 16" 2019
https://bugzilla.kernel.org/show_bug.cgi?id=215003 --- Comment #3 from Xiaolei Yu (dreifachst...@gmail.com) --- Created attachment 299561 --> https://bugzilla.kernel.org/attachment.cgi?id=299561&action=edit acpidump -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215003] apple_gmux fails to initialize and iGPU unclaimed on MacBook Pro 16" 2019
https://bugzilla.kernel.org/show_bug.cgi?id=215003 --- Comment #4 from Xiaolei Yu (dreifachst...@gmail.com) --- gpu-switch does not work because `EFI runtime services are disabled`. [0.371791] [ cut here ] [0.371822] [Firmware Bug]: Page fault caused by firmware at PA: 0xb6ee80068000 [0.371872] WARNING: CPU: 0 PID: 75 at arch/x86/platform/efi/quirks.c:734 efi_crash_gracefully_on_page_fault+0x49/0xc0 [0.371942] Modules linked in: [0.371963] CPU: 0 PID: 75 Comm: kworker/u24:4 Not tainted 5.14.17+ #9 [0.372006] Hardware name: Apple Inc. MacBookPro16,1/Mac-E1008331FDC96864, BIOS 1554.120.19.0.0 (iBridge: 18.16.14663.0.0,0) 05/07/2021 [0.372083] Workqueue: efi_rts_wq efi_call_rts [0.372114] RIP: 0010:efi_crash_gracefully_on_page_fault+0x49/0xc0 [0.372154] Code: 48 89 fd e8 09 8d 02 00 48 81 fd ff 0f 00 00 76 08 48 3d 50 a0 61 b1 74 02 5d c3 48 89 ee 48 c7 c7 40 bf 6c b0 e8 f5 22 7e 00 <0f> 0b 83 3d 7e 0e fa 01 0a 0f 84 17 1e 7e 00 e8 53 19 00 00 48 8b [0.372271] RSP: :b6ee8033ab30 EFLAGS: 00010086 [0.372305] RAX: RBX: RCX: b0cb4a08 [0.372350] RDX: RSI: efff RDI: b0c5ca00 [0.372395] RBP: b6ee80068000 R08: R09: b6ee8033a958 [0.372440] R10: b6ee8033a950 R11: b0ccca48 R12: b6ee8033abc8 [0.374667] R13: R14: R15: [0.376933] FS: () GS:9fc8aea0() knlGS: [0.379162] CS: 0010 DS: ES: CR0: 80050033 [0.381422] CR2: b6ee80068000 CR3: 0001001a8006 CR4: 003706f0 [0.383698] Call Trace: [0.385899] page_fault_oops+0x9c/0x240 [0.388141] exc_page_fault+0xcc/0x150 [0.390313] asm_exc_page_fault+0x1e/0x30 [0.392540] RIP: 0010:0xfffeefc440c5 [0.394703] Code: 31 c9 48 29 f9 48 83 e1 0f 74 0c 4c 39 c1 49 0f 47 c8 49 29 c8 f3 a4 4c 89 c1 49 83 e0 0f 48 c1 e9 04 74 2c f3 0f 7f 44 24 18 0f 6f 06 66 0f e7 07 48 83 c6 10 48 83 c7 10 e2 ee 0f ae f0 66 [0.397081] RSP: :b6ee8033ac78 EFLAGS: 00010286 [0.399456] RAX: fffeefc921e5 RBX: b0709ce2 RCX: 0035 [0.401753] RDX: b6ee80067d48 RSI: b6ee80067ff3 RDI: fffeefc92490 [0.404119] RBP: b6ee8033ad00 R08: 000c R09: b6ee8006834e [0.406426] R10: R11: 005ef3bd R12: [0.408788] R13: b6ee80067dd0 R14: b6ee80067d01 R15: 0607 [0.411101] ? __efi_call+0x28/0x30 [0.413426] ? switch_mm_irqs_off+0x19a/0x3b0 [0.415753] ? efi_call_rts+0x17c/0x6c0 [0.418001] ? process_one_work+0x1ec/0x390 [0.420289] ? worker_thread+0x53/0x3e0 [0.422515] ? process_one_work+0x390/0x390 [0.424780] ? kthread+0x127/0x150 [0.426991] ? set_kthread_struct+0x40/0x40 [0.429254] ? ret_from_fork+0x22/0x30 [0.431488] ---[ end trace d9718208699f023a ]--- -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215003] apple_gmux fails to initialize and iGPU unclaimed on MacBook Pro 16" 2019
https://bugzilla.kernel.org/show_bug.cgi?id=215003 --- Comment #5 from Lukas Wunner (lu...@wunner.de) --- Hm, why are runtime services disabled? Are you using "noefi" or "efi=noruntime" on the command line or is this perhaps an RT kernel? Could you attach full dmesg output? Looking at the ACPI dump I notice that GMUX only has an 8 byte Memory32Fixed region: Name (_CRS, ResourceTemplate () // _CRS: Current Resource Settings { Memory32Fixed (ReadWrite, 0xFE0B0200, // Address Base 0x0008, // Address Length ) }) On my pre-retina MacBookPro9,1 it's a 256 byte Decode16 region: Name (_CRS, ResourceTemplate () // _CRS: Current Resource Settings { IO (Decode16, 0x0700, // Range Minimum 0x07FF, // Range Maximum 0x01, // Alignment 0xFF, // Length ) }) Apple already changed the I/O interface once when they switched to retina displays in 2012: Before, registers where accessed directly. After, registers where accessed through a mailbox interface which required 3 32-bit registers for communication. It looks like they changed the interface again, however now there's only a memory region with 8 bytes, so only 2 32-bit registers. It will be necessary to reverse-engineer the new interface (again) to get support for GMUX working on these newer machines. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.