Re: [PATCH] vt_ioctl: make VT_RESIZEX behave like VT_RESIZE
On Sun, Apr 11, 2021 at 2:43 PM Maciej W. Rozycki wrote: > > So it does trigger with vgacon and my old server, which I have started > experimenting with and for a start I have switched to a new kernel for an > unrelated purpose (now that I have relieved it from all its usual tasks > except for the last remaining one for which I haven't got the required > user software ported to the new system yet): > > "struct vt_consize"->v_vlin is ignored. Please report if you need this. > "struct vt_consize"->v_clin is ignored. Please report if you need this. Note that it's entirely possible that things continue to work well despite this warning. It's unclear to me from your email if you actually see any difference (and apparently you're not able to see it right now due to not being close to the machine). The fact that v_vlin/v_clin are ignored doesn't necessarily mean that they are different from what they were before, so the warning may be a non-issue. > It continues using svgatextmode with its glass (CRT) VT to set my usual > 80x37 text mode (720x576 pixel resolution) by manipulating the VGA clock > chip and the CRT controller appropriately for a nice refresh rate of 85Hz: > > Chipset = `TVGA8900', Textmode clock = 44.90 MHz, 80x37 chars, CharCell = > 9x16. Refresh = 52.51kHz/84.7Hz. That doesn't seem necessarily wrong to me. > So what's the supposed impact of this change that prompted the inclusion > of the messages? There _may_ be no impact at all apart from the messages. The code _used_ to set the scan lines (v_vlin) and font height (v_clin) from those numbers if they were non-zero, and now it just ignores them and warns instead. The code now just sets the font height from the actual font size when the font is set. Which is honestly the only thing that ever made sense. Trying to set it with v_clin is ignored, but it's entirely possible - maybe even likely - that your user of VT_RESIZEX sets it to the same values it already has. Exactly because setting a font line number to anything else than the font size isn't exactly sensible. But if your screen looks different than it used to, that is obviously interesting and says something actually changed (outside of the message itself). Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
New warnings with gcc-11
I've updated to Fedora 34 on one of my machines, and it causes a lot of i915 warnings like drivers/gpu/drm/i915/intel_pm.c: In function ‘ilk_setup_wm_latency’: drivers/gpu/drm/i915/intel_pm.c:3059:9: note: referencing argument 3 of type ‘const u16 *’ {aka ‘const short unsigned int *’} drivers/gpu/drm/i915/intel_pm.c:2994:13: note: in a call to function ‘intel_print_wm_latency’ and the reason is that gcc now seems to look at the argument array size more, and notices that (a) intel_print_wm_latency() takes a "const u16 wm[8]" argument but (b) most of the arrays passed in tend to look like 'u16 pri_latency[5]' I think I will make the argument type to intel_print_wm_latency() be just "const u16 wm[]" for now, just to avoid seeing a ton of silly warnings. I'm not sure if there is a better solution (like making all of those latency arrays be 8 entries in size), so I'm just letting you know about my change in this area in case anybody has a better idea. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: New warnings with gcc-11
On Tue, Apr 27, 2021 at 4:43 PM Linus Torvalds wrote: > > I think I will make the argument type to intel_print_wm_latency() be > just "const u16 wm[]" for now, just to avoid seeing a ton of silly > warnings. After fixing the trivial ones, this one remains: drivers/gpu/drm/i915/display/intel_dp.c: In function ‘intel_dp_check_mst_status’: drivers/gpu/drm/i915/display/intel_dp.c:4554:22: warning: ‘drm_dp_channel_eq_ok’ reading 6 bytes from a region of size 4 [-Wstringop-overread] 4554 | !drm_dp_channel_eq_ok(&esi[10], intel_dp->lane_count)) { | ^~~~ drivers/gpu/drm/i915/display/intel_dp.c:4554:22: note: referencing argument 1 of type ‘const u8 *’ {aka ‘const unsigned char *’} In file included from drivers/gpu/drm/i915/display/intel_dp.c:38: ./include/drm/drm_dp_helper.h:1459:6: note: in a call to function ‘drm_dp_channel_eq_ok’ 1459 | bool drm_dp_channel_eq_ok(const u8 link_status[DP_LINK_STATUS_SIZE], | ^~~~ and I'm not fixing that one, because it actually looks like a valid warning, and doesn't have an obvious fix. That "esi[]" array is 14 bytes in size (DP_DPRX_ESI_LEN). So when it does that "&esi[10]" and passes it in as an argument, then only 4 bytes remain of the array. And drm_dp_channel_eq_ok() supposedly takes a "const u8 link_status[DP_LINK_STATUS_SIZE]", which is 6 bytes. There may be some reason this is ok, but it does look a bit fishy, and the compiler warning is appropriate. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm for 5.13-rc1
On Tue, Apr 27, 2021 at 8:32 PM Dave Airlie wrote: > > There aren't a massive amount of conflicts, only with vmwgfx when I > did a test merge into your master yesterday, I think you should be > able to handle them yourself, but let me know if you want me to push a > merged tree somewhere (or if I missed something). The conflict was easy enough to resolve, but was unusual in that my tree had vmwgfx fixes that weren't in the development tree (ie that commit 2ef4fb92363c: "drm/vmwgfx: Make sure bo's are unpinned before putting them back"). Usually when I merge stuff, I can see that the fixes that were pushed to my tree are also in the development branch. Not so this time. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm for 5.13-rc1
On Wed, Apr 28, 2021 at 11:14 AM Daniel Vetter wrote: > > Maybe we're overdoing it a bit, but we're trying to not backmerge all > the time for no reason at all. Oh, I'm not complaining. I think it's reasonable if some particular issue doesn't hold up further development. So my email was more a statement of surprise at a new pattern than anything else. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm for 5.13-rc1
On Tue, Apr 27, 2021 at 8:32 PM Dave Airlie wrote: > > This is the main drm pull request for 5.13. The usual lots of work all > over the place. [...] > > Mikita Lipski: > drm/amd/display: Add MST capability to trigger_hotplug interface Hmm. I've already merged this, but my clang build shows that this looks buggy: drivers/gpu/drm/amd/amdgpu/amdgpu_dm/amdgpu_dm_debugfs.c:3015:53: warning: address of 'aconnector->mst_port->mst_mgr' will always evaluate to 'true' [-Wpointer-bool-conversion] if (!(aconnector->port && &aconnector->mst_port->mst_mgr)) ~~ ~~^~~ and yeah, checking for the address of a member of a structure benign NULL doesn't really work. I'm assuming the '&' is just a left-over cut-and-paste error or something. Please fix after reviewing (I'm not going to blindly just remove the '&' just to silence the warning, since I don't know the code). Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm fixes for 5.14-rc4
This might possibly have been fixed already by the previous drm pull, but I wanted to report it anyway, just in case. It happened after an uptime of over a week, so it might not be trivial to reproduce. It's a NULL pointer dereference in dc_stream_retain() with the code being lock xadd %eax,0x390(%rdi) <-- trapping instruction and that's just the kref_get(&stream->refcount); with a NULL 'stream' argument. Call Trace: dc_resource_state_copy_construct+0x13f/0x190 [amdgpu] amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu] commit_tail+0x97/0x180 [drm_kms_helper] process_one_work+0x1df/0x3a0 the oops is followed by a stream of [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1] hw_done or flip_done timed out and the machine was not usable afterwards. lspci says this is a 49:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7) (prog-if 00 [VGA controller]) Full oops in the attachment, but I think the above is all the really salient details. Linus amd-gpu-ooops Description: Binary data
Re: [Intel-gfx] [BUG] on reboot: bisected to: drm/i915: Shut down displays gracefully on reboot
On Thu, Jan 14, 2021 at 2:01 PM Steven Rostedt wrote: > > Thanks, I take it, it will be going into mainline soon. Just got merged - it might be a good idea to verify that your problem is solved. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm for 5.12-rc1
On Thu, Feb 18, 2021 at 10:06 PM Dave Airlie wrote: > > Let me know if there are any issues, gcc was happy, and I obviously already pushed out my merge, but then when I did my clang build afterwards, it reports: drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:764:2: warning: variable 'structure_size' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] default: ^~~ drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:770:23: note: uninitialized use occurs here memset(header, 0xFF, structure_size); ^~ and clang is very very right. That "default" case is completely broken, and will generate a randomly sized memset. Not good. Presumably that default case never happens, but if so it shouldn't exist. Perhaps better yet, make the "default" case just do a "return" instead of a break. Breaking out of the switch statement to code that cannot possibly work is all kinds of mindless. Kevin/Alex? This was introduced by commit de4b7cd8cb87 ("drm/amd/pm/swsmu: unify the init soft gpu metrics function") Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PULL] fixes around VM_PFNMAP and follow_pfn for 5.12 merge window
On Mon, Feb 22, 2021 at 2:25 AM Daniel Vetter wrote: > > Cc all the mailing lists ... my usual script crashed and I had to > hand-roll the email and screwed it up ofc :-/ Oh, and my reply thus also became just a reply to you personally. So repeating it here, in case somebody has comments about that access_process_vm() issue. On Mon, Feb 22, 2021 at 2:23 AM Daniel Vetter wrote: > > I've stumbled over this for my own learning and then realized there's a > bunch of races around VM_PFNMAP mappings vs follow pfn. > > If you're happy with this [..] Happy? No. But it seems an improvement. I did react to some of this: commit 0fb1b1ed7dd9 ("/dev/mem: Only set filp->f_mapping") talks about _what_ it does, but not so much _why_ it does it. It doesn't seem to actually matter, and seems almost incidental (because you've looked at f_mapping and i_mapping just didn't matter but was adjacent. And generic_access_phys() remains horrific. Does anything actually use this outside of the odd magical access_remote_vm() code? I'm wondering if that code shouldn't just be removed entirely. It's quite old, I'm not sure it's really relevant. See commit 28b2ee20c7cb ("access_process_vm device memory infrastructure"). I guess you do debug the X server, but still.. Do you actually ever look at device memory through the debugger? I'd hope that you'd use an access function and make gdb call it in the context of the debuggee? Whatever. I've pulled it, and I'm not _unhappy_ with it, but I'd also not call myself overly giddy and over the moon happy about this code. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PULL] fixes around VM_PFNMAP and follow_pfn for 5.12 merge window
On Mon, Feb 22, 2021 at 2:25 AM Daniel Vetter wrote: > > Cc all the mailing lists ... my usual script crashed and I had to > hand-roll the email and screwed it up ofc :-/ Oh, and you also didn't get a pr-tracker-bot response for this for the same reason. Even your fixed email was ignored by the bot (I think because of the "Re:" in the subject line). So consider this your manual pr-tracker-bot replacement. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PULL] topic/iomem-mmap-vs-gup
[ You had a really odd Reply-to on this one ] On Mon, May 3, 2021 at 12:15 PM Daniel Vetter wrote: > > Anyway here's a small pull for you to ponder, now that the big ones are > all through. Well, _now_ I'm all caught up. Knock wood. Anyway, time to look at it: > Follow-up to my pull from last merge window: kvm and vfio lost their > very unsafe use of follow_pfn, this appropriately marks up the very > last user for some userptr-as-buffer use-cases in media. There was > some resistance to outright removing it, maybe we can do this in a few > releases. Hmm. So this looks mostly ok to me, although I think the change to the nommu case is pretty ridiculous. On nommu, unsafe_follow_pfn() should just be a wrapper around follow_pfn(). There's no races when you can't remap anything. No? Do the two media cases even work on nommu? Finally - did you intend fo this to be a real pull request? Because the email read to me like "think about this and tell me what you think" rather than "please pull".. And I have now fulfilled that "think about and tell me" part ;) Linus
Re: [PULL] topic/iomem-mmap-vs-gup
[ Daniel, please fix your broken email setup. You have this insane "Reply-to" list that just duplicates all the participants. Very broken, very annoying ] On Fri, May 7, 2021 at 8:53 AM Daniel Vetter wrote: > > So personally I think the entire thing should just be thrown out, it's all > levels of scary and we have zero-copy buffer sharing done properly with > dma-buf since years in v4l. So I've been looking at this more, and the more I look at it, the less I like this series. I think the proper fix is to just fix things. For example, I'm looking at the v4l users of follow_pfn(), and I find get_vaddr_frames(), which is just broken. Fine, we know users are broken, but look at what appears to be the main user of get_vaddr_frames(): vb2_dc_get_userptr(). What does that function do? Immediately after doing get_vaddr_frames(), it tries to turn those pfn's into page pointers, and then do sg_alloc_table_from_pages() on the end result. Yes, yes, it also has that "ok, that failed, let's try to see if it's some physically contiguous mapping" and do DMA directly to those physical pages, but the point there is that that only happens when they weren't normal pages to begin with. So thew *fix* for at least that path is to (a) just use the regular pin_user_pages() for normal pages (b) perhaps keep the follow_pfn() case, but then limit it to that "no page backing" and that physical pages case. And honestly, the "struct frame_vector" thing already *has* support for this, and the problem is simply that the v4l code has decided to have the callers ask for pfn's rather than have the callers just ask for a frame-vector that is either "pfn's with no paeg backing" _or_ "page list with proper page reference counting". So this series of yours that just disables follow_pfn() actually seems very wrong. I think follow_pfn() is ok for the actual "this is not a 'struct page' backed area", and disabling that case is wrong even going forward. End result, I think the proper model is: - keep follow_pfn(), but limit it to the "not vm_normal_page()" case, and return error for some real page mapping - make the get_vaddr_frames() first try "pin_user_pages()" (and create a page array) and fall back to "follow_pfn()" if that fails (or the other way around). Set the IOW, get_vaddr_frames() would just do vec->got_ref = is_pages; vec->is_pfns = !is_pages; and everything would just work out - the v4l code seems to already have all the support for "it's a ofn array" vs "it's properly refcounted pages". So the only case we should disallow is the mixed case, that the v4l code already seems to not be able to handle anyway (and honestly, it looks like "got_ref/is_pfns" should be just one flag - they always have to have the opposite values). So I think this "unsafe_follow_pfn()" halfway step is actively wrong. It doesn't move us forward. Quite the reverse. It just makes the proper fix harder. End result: not pulling it, unless somebody can explain to me in small words why I'm wrong and have the mental capacity of a damaged rodent. Linus
Re: New warnings with gcc-11
I have heard nothing about this, and it remains the only warning from my allmodconfig build (I have another one for drm compiled with clang, but there I at least heard back that a fix exists). Since I am going to release rc1 tomorrow, and I don't want to release it with an ugly compiler warning, I took it upon myself to just fix the code: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fec4d42724a1bf3dcba52307e55375fdb967b852 HOWEVER. That commit fixes the warning, and is at worst harmless. At best it fixes an access to random stack memory. But it does smell like somebody who actually knows how these arrays work should look at that code. IOW, maybe the code should actually have read 16 bytes from the Event Status Indicator? Maybe offset 10 was wrong? Maybe drm_dp_channel_eq_ok() should never have taken six bytes to begin with? It's a mystery, and I haven't heard anything otherwise, so there it is. Linus On Wed, Apr 28, 2021 at 12:27 AM Jani Nikula wrote: > > On Tue, 27 Apr 2021, Linus Torvalds wrote: > > I've updated to Fedora 34 on one of my machines, and it causes a lot > > of i915 warnings like > > > > drivers/gpu/drm/i915/intel_pm.c: In function ‘ilk_setup_wm_latency’: > > drivers/gpu/drm/i915/intel_pm.c:3059:9: note: referencing argument 3 > > of type ‘const u16 *’ {aka ‘const short unsigned int *’} > > drivers/gpu/drm/i915/intel_pm.c:2994:13: note: in a call to function > > ‘intel_print_wm_latency’ > > > > and the reason is that gcc now seems to look at the argument array > > size more, and notices that > > Arnd Bergmann reported some of these a while back. I think we have some > of them fixed in our -next already, but not all. Thanks for the > reminder.
Re: [git pull] drm fixes round two for 5.13-rc1
On Sun, May 9, 2021 at 11:16 AM Dave Airlie wrote: > > Bit later than usual, I queued them all up on Friday then promptly > forgot to write the pull request email. This is mainly amdgpu fixes, > with some radeon/msm/fbdev and one i915 gvt fix thrown in. Hmm. Gcc seems ok with this, but clang complains: drivers/video/fbdev/core/fbmem.c:736:21: warning: attribute declaration must precede definition [-Wignored-attributes] static const struct __maybe_unused seq_operations proc_fb_seq_ops = { ^ but I noticed it only after I had already pushed out the pull. I'm actually surprised that gcc accepted that horrid mess: putting "__maybe_unused" between the "struct" and the struct name is very very wrong. I fixed it up after the merge due to not noticing earlier.. Maybe the drm test robots should start testing with clang too? Linus
Re: [PATCH] video: fbdev: vga16fb: fix OOB write in vga16fb_imageblit()
On Fri, May 14, 2021 at 9:20 AM Tetsuo Handa wrote: > > Currently it is impossible to control upper limit of rows/columns values > based on amount of memory reserved for the graphical screen, for > resize_screen() calls vc->vc_sw->con_resize() only if vc->vc_mode is not > already KD_GRAPHICS Honestly, the saner approach would seem to be to simply error out if vc_mode is KD_GRAPHICS. Doing VT_RESIZE while in KD_GRAPHICS mode seems _very_ questionable, and is clearly currently very buggy. So why not just say "that clearly already doesn't work, so make it explicitly not permitted"? Linus
Re: [PATCH] video: fbdev: vga16fb: fix OOB write in vga16fb_imageblit()
On Fri, May 14, 2021 at 10:29 AM Linus Torvalds wrote: > > So why not just say "that clearly already doesn't work, so make it > explicitly not permitted"? IOW, something like this would seem fairly simple and straightforward: diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 01645e87b3d5..f24e627b7402 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -1171,8 +1171,13 @@ static inline int resize_screen(struct vc_data *vc, int width, int height, /* Resizes the resolution of the display adapater */ int err = 0; - if (vc->vc_mode != KD_GRAPHICS && vc->vc_sw->con_resize) + if (vc->vc_sw->con_resize) { + // If we have a resize function but are in KD_GRAPHICS mode, + // we can't actually do a resize and need to error out. + if (vc->vc_mode == KD_GRAPHICS) + return -EINVAL; err = vc->vc_sw->con_resize(vc, width, height, user); + } return err; } not tested, but it looks ObviouslyCorrect(tm), and since we know the old case didn't work right, it seems very safe to do. Linus
Re: [git pull] drm fixes for 5.13-rc2
On Thu, May 13, 2021 at 7:34 PM Dave Airlie wrote: > > Just realised I got the tag header wrong, these are the rc2 fixes. Heh. The tag message also says: > vc4: > - drop an used function which just makes me think you may have started drinking early ;) I fixed it up. Skål! Linus
Re: [PATCH] video: fbdev: vga16fb: fix OOB write in vga16fb_imageblit()
On Fri, May 14, 2021 at 10:37 AM Linus Torvalds wrote: > > IOW, something like this would seem fairly simple and straightforward: Proper patch in case syzbot can test this.. Linus From b33ca195cecea478768de353b3ae976c07a65615 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 14 May 2021 11:06:12 -0700 Subject: [PATCH] vt: don't allow text-mode resizing when in KD_GRAPHICS mode The VT layer itself just keeps track of the underlying text contents just fine, but if the underlying hardware driver has a con_resize() function, we can't just ignore it when in KD_GRAPHICS mode. So just refuse to do a text mode resize if we're not in text mode. Reported-by: Tetsuo Handa Reported-by: syzbot Signed-off-by: Linus Torvalds --- drivers/tty/vt/vt.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 01645e87b3d5..f24e627b7402 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -1171,8 +1171,13 @@ static inline int resize_screen(struct vc_data *vc, int width, int height, /* Resizes the resolution of the display adapater */ int err = 0; - if (vc->vc_mode != KD_GRAPHICS && vc->vc_sw->con_resize) + if (vc->vc_sw->con_resize) { + // If we have a resize function but are in KD_GRAPHICS mode, + // we can't actually do a resize and need to error out. + if (vc->vc_mode == KD_GRAPHICS) + return -EINVAL; err = vc->vc_sw->con_resize(vc, width, height, user); + } return err; } -- 2.31.1.365.ga2a05a39c5
Re: [PATCH] video: fbdev: vga16fb: fix OOB write in vga16fb_imageblit()
On Fri, May 14, 2021 at 1:25 PM Maciej W. Rozycki wrote: > > Overall I think it does make sense to resize the text console at any > time, even if the visible console (VT) chosen is in the graphics mode, It might make sense, but only if we call the function to update the low-level data. Not calling it, and then starting to randomly use the (wrong) geometry, and just limiting it so that it's all within the buffer - THAT does not make sense. So I think your patch is fundamentally wrong. It basically says "let's use random stale incorrect data, but just make sure that the end result is still within the allocated buffer". My patch is at least conceptually sane. An alternative would be to just remove the "vcmode != KD_GRAPHICS" check entirely, and always call con_resize() to update the low-level data, but honestly, that seems very likelty to break something very fundamentally, since it's not how any of fbcon has ever been tested, Another alternative would be to just delay the resize to when vcmode is put back to text mode again. That sounds somewhat reasonable to me, but it's a pretty big thing. But no, your patch to just "knowingly use entirely wrong values, then add a limit check because we know the values are possibly garbage and not consistent with reality" is simply not acceptable. Linus
Re: [PATCH] video: fbdev: vga16fb: fix OOB write in vga16fb_imageblit()
On Fri, May 14, 2021 at 1:32 PM Linus Torvalds wrote: > > Another alternative would be to just delay the resize to when vcmode > is put back to text mode again. That sounds somewhat reasonable to me, > but it's a pretty big thing. Actually thinking more about that option, it sounds horrible. It would mean that we'd continue to use the old geometry for the actual VC buffers for a random time, and then change it to the new geometry at some arbitrary point. So I think the only reasonable approach (apart from just my "don't do that then") might be to just always call ->con_resize(). There are only actually three cases of "->con_resize()", so it might not be too bad. Looking at it, both sisusbcon_resize() and vgacon_resize() seem to be trivially fine in KD_GRAPHICS mode. vgacon already seems to have that "!vga_is_gfx" test, and does vgacon_doresize() at vgacon_switch(). It might need to add a vgacon_doresize() to the vgacon_blank() case 0 code so that it actually does the right thing when going back to KD_TEXT mode. And fbcon_resize() looks like it might be mostly ok with it too. Again, there is a con_is_visible() test, and I suspect that might need to be changed to if (con_is_visible(vc) && vc->vc_mode == KD_TEXT) instead, but it doesn't look _too_ bad. So I think just removing the "vc->vc_mode != KD_GRAPHICS" test from resize_screen() might be the way to go. That way, the low-level data structures actually are in sync with the resize, and the "out of bounds" bug should never happen. Would you mind testing that? Linus
Re: [PATCH v2] tty: vt: always invoke vc->vc_sw->con_resize callback
On Sat, May 15, 2021 at 9:33 AM Maciej W. Rozycki wrote: > > NB I suggest that you request your change to be backported, i.e. post v3 > with: > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Cc: sta...@vger.kernel.org # v2.6.12+ I've applied it to my tree, but let's wait to see that it doesn't cause any issues before notifying the stable people. Linus
Re: [PATCH AUTOSEL 5.12 5/5] tty: vt: always invoke vc->vc_sw->con_resize callback
On Mon, May 17, 2021 at 6:09 PM Sasha Levin wrote: > > From: Tetsuo Handa > > [ Upstream commit ffb324e6f874121f7dce5bdae5e05d02baae7269 ] So I think the commit is fine, and yes, it should be applied to stable, but it's one of those "there were three different patches in as many days to fix the problem, and this is the right one, but maybe stable should hold off for a while to see that there aren't any problem reports". I don't think there will be any problems from this, but while the patch is tiny, it's conceptually quite a big change to something that people haven't really touched for a long time. So use your own judgement, but it might be a good idea to wait a week before backporting this to see if anything screams. Linus
Re: [git pull] drm for 5.14-rc1
On Wed, Jun 30, 2021 at 9:34 PM Dave Airlie wrote: > > Hi Linus, > > This is the main drm pull request for 5.14-rc1. > > I've done a test pull into your current tree, and hit two conflicts > (one in vc4, one in amdgpu), both seem pretty trivial, the amdgpu one > is recent and sfr sent out a resolution for it today. Well, the resolutions may be trivial, but the conflict made me look at the code, and it's buggy. Commit 04d8d73dbcbe ("drm/amdgpu: add common HMM get pages function") is broken. It made the code do mmap_read_lock(mm); vma = find_vma(mm, start); mmap_read_unlock(mm); and then it *uses* that "vma" after it has dropped the lock. That's a big no-no - once you've dropped the lock, the vma contents simply aren't reliable any more. That mapping could now be unmapped and removed at any time. Now, the conflict actually made one of the uses go away (switching to vma_lookup() means that the subsequent code no longer needs to look at "vm_start" to verify we're actually _inside_ the vma), but it still checks for vma->vm_file afterwards. So those locking changes in commit 04d8d73dbcbe are completely bogus. I tried to fix up that bug while handling the conflict, but who knows what else similar is going on elsewhere. So I would ask people to (a) verify that I didn't make things worse as I fixed things up (note how I had to change the last argument to amdgpu_hmm_range_get_pages() from false to true etc). (b) go and look at their vma lookup code: you can't just look up a vma under the lock, and then drop the lock, and then think things stay stable. In particular for that (b) case: it is *NOT* enough to look up vma->vm_file inside the lock and cache that. No - if the test is about "no backing file before looking up pages", then you have to *keep* holding the lock until after you've actually looked up the pages! Because otherwise any test for "vma->vm_file" is entirely pointless, for the same reason it's buggy to even look at it after dropping the lock: because once you've dropped the lock, the thing you just tested for might not be true any more. So no, it's not valid to do bool has_file = vma && vma->vm_file; and then drop the lock, because you don't use 'vma' any more as a pointer, and then use 'has_file' outside the lock. Because after you've dropped the lock, 'has_file' is now meaningless. So it's not just about "you can't look at vma->vm_file after dropping the lock". It's more fundamental than that. Any *decision* you make based on the vma is entirely pointless and moot after the lock is dropped! Did I fix it up correctly? Who knows. The code makes more sense to me now and seems valid. But I really *really* want to stress how locking is important. You also can't just unlock in the middle of an operation - even if you then take the lock *again* later (as amdgpu_hmm_range_get_pages() then did), the fact that you unlocked in the middle means that all the earlier tests you did are simply no longer valid when you re-take the lock. Linus
Re: Linux 5.14-rc1
On Mon, Jul 12, 2021 at 12:08 AM Jon Masters wrote: > > I happened to be installing a Fedora 34 (x86) VM for something and did a > test kernel compile that hung on boot. Setting up a serial console I get > the below backtrace from ttm but I have not had chance to look at it. It's a NULL pointer in qxl_bo_delete_mem_notify(), with the code disassembling to 16: 55push %rbp 17: 48 89 fd mov%rdi,%rbp 1a: e8 a2 02 00 00callq 0x2c1 1f: 84 c0test %al,%al 21: 74 0dje 0x30 23: 48 8b 85 68 01 00 00 mov0x168(%rbp),%rax 2a:* 83 78 10 03 cmpl $0x3,0x10(%rax) <-- trapping instruction 2e: 74 02je 0x32 30: 5dpop%rbp 31: c3retq and that "cmpl $3" looks exactly like that if (bo->resource->mem_type == TTM_PL_PRIV and the bug is almost certainly from commit d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer"), which did - if (bo->mem.mem_type == TTM_PL_PRIV ... + if (bo->resource->mem_type == TTM_PL_PRIV ... and claimed "No functional change". But clearly the "bo->resource" pointer is NULL. Added guilty parties and dri-devel mailing list. Christian? Full report at https://lore.kernel.org/lkml/a9473821-1d53-0037-7590-aeaf8e85e...@jonmasters.org/ but there's not a whole lot else there that is interesting except for the call trace: ttm_bo_cleanup_memtype_use+0x22/0x60 [ttm] ttm_bo_release+0x1a1/0x300 [ttm] ttm_bo_delayed_delete+0x1be/0x220 [ttm] ttm_device_delayed_workqueue+0x18/0x40 [ttm] process_one_work+0x1ec/0x390 worker_thread+0x53/0x3e0 so it's presumably the cleanup phase and perhaps "bo->resource" has been deallocated and cleared? Linus
Re: [patch V3 22/37] highmem: High implementation details and document API
On Tue, Nov 3, 2020 at 2:33 AM Thomas Gleixner wrote: > > +static inline void *kmap(struct page *page) > +{ > + void *addr; > + > + might_sleep(); > + if (!PageHighMem(page)) > + addr = page_address(page); > + else > + addr = kmap_high(page); > + kmap_flush_tlb((unsigned long)addr); > + return addr; > +} > + > +static inline void kunmap(struct page *page) > +{ > + might_sleep(); > + if (!PageHighMem(page)) > + return; > + kunmap_high(page); > +} I have no complaints about the patch, but it strikes me that if people want to actually have much better debug coverage, this is where it should be (I like the "every other address" thing too, don't get me wrong). In particular, instead of these PageHighMem(page) tests, I think something like this would be better: #ifdef CONFIG_DEBUG_HIGHMEM #define page_use_kmap(page) ((page),1) #else #define page_use_kmap(page) PageHighMem(page) #endif adn then replace those "if (!PageHighMem(page))" tests with "if (!page_use_kmap())" instead. IOW, in debug mode, it would _always_ remap the page, whether it's highmem or not. That would really stress the highmem code and find any fragilities. No? Anyway, this is all sepatrate from the series, which still looks fine to me. Just a reaction to seeing the patch, and Thomas' earlier mention that the highmem debugging doesn't actually do much. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm next pull for 5.10-rc1
On Tue, Nov 3, 2020 at 2:20 PM Kirill A. Shutemov wrote: > > On Thu, Oct 15, 2020 at 11:33:08AM +1000, Dave Airlie wrote: > > drm/nouveau/kms: Search for encoders' connectors properly > > This commit (09838c4efe9a) broke boot for me. These two hunks in > particular: Christ. It's been two weeks. I'm doing -rc4 today, and I still don't have the fix. The problem seems entirely obvious, as reported by Kirill: the nv50 code unconditionally calls the "atomic_{dis,en}able()" functions, even when not everybody was converted. The fix seems to be to either just do the conversion of the remaining cases (which looks like just adding an argument to the remaining functions, and using that for the "atomic" callback), or the trivial suggestion by Kirill from two weeks ago: > I hacked up patch to use help->disable/help->enable if atomic_ versions > are NULL. It worked. Kirill, since the nouveau people aren't fixing this, can you just send me your tested patch? Lyude/Ben - let me just say that I think this is all a huge disgrace. You had a problem report with a bisected commit, a suggested fix, and two weeks later there's absolutely _nothing_. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm nouveau urgent fixes for 5.10-rc4
On Sun, Nov 15, 2020 at 12:41 PM Dave Airlie wrote: > > As mentioned I did have a fixes pull from Ben from after I'd sent you > out stuff, it contains the fix for the regression reported in the rc1 > thread along with two others. Thanks, pulled, Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Linux 5.10-rc4
On Wed, Nov 18, 2020 at 4:12 AM David Laight wrote: > > I've got the 'splat' below during boot. > This is an 8-core C2758 Atom cpu using the on-board/cpu graphics. > User space is Ubuntu 20.04. > > Additionally the X display has all the colours and alignment slightly > messed up. > 5.9.0 was ok. > I'm just guessing the two issues are related. Sounds likely. But it would be lovely if you could bisect when exactly the problem(s) started to both verify that, and just to pinpoint the exact change.. I'm adding Thomas Zimmermann to the cc, because he did that "drm/ast: Program display mode in CRTC's atomic_enable" which looks relevant in that it's right in that call-chain. Did some initialization perhaps get overlooked? And Dave and Daniel and the drm list cc'd as well.. Full splat left quoted below for new people and list. Linus > [ 20.809891] WARNING: CPU: 0 PID: 973 at > drivers/gpu/drm/drm_gem_vram_helper.c:284 drm_gem_vram_offset+0x35/0x40 > [drm_vram_helper] > [ 20.821543] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac > scsi_dh_emc scsi_dh_alua ipmi_ssif intel_powerclamp coretemp kvm_intel kvm > joydev input_leds ipmi_si intel_cstate ipmi_devintf ipmi_msghandler mac_hid > sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs > blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy > async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath > linear ast drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt > fb_sys_fops cec drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul > ghash_clmulni_intel gpio_ich drm aesni_intel hid_generic glue_helper > crypto_simd igb usbhid cryptd ahci i2c_i801 hid libahci i2c_smbus lpc_ich dca > i2c_ismt i2c_algo_bit > [ 20.887477] CPU: 0 PID: 973 Comm: gnome-shell Not tainted 5.10.0-rc4+ #78 > [ 20.894274] Hardware name: Supermicro A1SAi/A1SRi, BIOS 1.1a 08/27/2015 > [ 20.900896] RIP: 0010:drm_gem_vram_offset+0x35/0x40 [drm_vram_helper] > [ 20.907342] Code: 00 48 89 e5 85 c0 74 17 48 83 bf 78 01 00 00 00 74 18 48 > 8b 87 80 01 00 00 5d 48 c1 e0 0c c3 0f 0b 48 c7 c0 ed ff ff ff 5d c3 <0f> 0b > 31 c0 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 87 18 06 > [ 20.926100] RSP: 0018:9f59811d3a68 EFLAGS: 00010246 > [ 20.931339] RAX: 0002 RBX: 8b46861e20c0 RCX: > c032d600 > [ 20.938479] RDX: 8b468f47a000 RSI: 8b46861e2000 RDI: > 8b468f9acc00 > [ 20.945622] RBP: 9f59811d3a68 R08: 0040 R09: > 8b46864ce288 > [ 20.952769] R10: R11: 0001 R12: > 8b468f47a000 > [ 20.959915] R13: R14: R15: > 8b468ad2bf00 > [ 20.967057] FS: 7f5b37ac5cc0() GS:8b49efc0() > knlGS: > [ 20.975149] CS: 0010 DS: ES: CR0: 80050033 > [ 20.980904] CR2: 7f5b3d093f00 CR3: 000103438000 CR4: > 001006f0 > [ 20.988047] Call Trace: > [ 20.990506] ast_cursor_page_flip+0x22/0x100 [ast] > [ 20.995313] ast_cursor_plane_helper_atomic_update+0x46/0x70 [ast] > [ 21.001524] drm_atomic_helper_commit_planes+0xbd/0x220 [drm_kms_helper] > [ 21.008243] drm_atomic_helper_commit_tail_rpm+0x3a/0x70 [drm_kms_helper] > [ 21.015062] commit_tail+0x99/0x130 [drm_kms_helper] > [ 21.020050] drm_atomic_helper_commit+0x123/0x150 [drm_kms_helper] > [ 21.026269] drm_atomic_commit+0x4a/0x50 [drm] > [ 21.030737] drm_atomic_helper_update_plane+0xe7/0x140 [drm_kms_helper] > [ 21.037384] __setplane_atomic+0xcc/0x110 [drm] > [ 21.041953] drm_mode_cursor_universal+0x13e/0x260 [drm] > [ 21.047299] drm_mode_cursor_common+0xef/0x220 [drm] > [ 21.052287] ? alloc_set_pte+0x10d/0x6d0 > [ 21.056244] ? drm_mode_cursor_ioctl+0x60/0x60 [drm] > [ 21.061242] drm_mode_cursor2_ioctl+0xe/0x10 [drm] > [ 21.066067] drm_ioctl_kernel+0xae/0xf0 [drm] > [ 21.070455] drm_ioctl+0x241/0x3f0 [drm] > [ 21.074415] ? drm_mode_cursor_ioctl+0x60/0x60 [drm] > [ 21.079401] __x64_sys_ioctl+0x91/0xc0 > [ 21.083167] do_syscall_64+0x38/0x90 > [ 21.086755] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 21.091813] RIP: 0033:0x7f5b3cf1350b > [ 21.095403] Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 > c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d > 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48 > [ 21.114154] RSP: 002b:7ffef1966588 EFLAGS: 0246 ORIG_RAX: > 0010 > [ 21.121730] RAX: ffda RBX: 7ffef19665c0 RCX: > 7f5b3cf1350b > [ 21.128870] RDX: 7ffef19665c0 RSI: c02464bb RDI: > 0009 > [ 21.136013] RBP: c02464bb R08: 0040 R09: > 0004 > [ 21.143157] R10: 0002 R11: 0246 R12: > 561ec9d10060 > [ 21.150295] R13: 0009 R14: 561eca2cc9a0 R15: > 0040
Re: [git pull] drm for 5.18-rc1
On Wed, Mar 23, 2022 at 7:30 PM Dave Airlie wrote: > > This is the main drm pull request for 5.18. > > The summary changelog is below, lots of work all over, > Intel improving DG2 support, amdkfd CRIU support, msm > new hw support, and faster fbdev support. Ok, so this was annoying. I've merged it, but will note three things that I really hope get fixed / checked: (1) My merge resolution looked mostly trivial, except for an annoying conflict between commits 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml") and 6d0990e6e844 ("media: dt-binding: mediatek: Get rid of mediatek,larb for multimedia HW") where one of them splits up a file that is modified by the other. I ended up just getting rid of all the "mediatek,larb" mentions in the split-up files, despite the fact that (a) those mentions can be found elsewhere and (b) the split-up did other changes too, so maybe it's wrong. (2) As Guenter reported, the fbdev performance improvement of cfb_imageblit() is broken. I was going to just revert it, but I see that there is a two-series patch to fix it at https://lore.kernel.org/all/20220313192952.12058-1-tzimmerm...@suse.de/ so I merged it in that broken form, in the hope that this set of fixes will be sent to me asap. (3) Very similarly to (2), but broken mediatek DT files. I hope my changes in (1) didn't make things worse, but there's a series of fixes as https://lore.kernel.org/all/20220309134702.9942-1-jason-jh@mediatek.com/ and again I hope I'll get those fixes from the proper places asap. I considered just delaying merging this all entirely, but it seems better to get this all in, with the known problems and known fixes, and see if we hit something _else_ too. Anyway, let's hope I didn't miss anything, and that those are the only major issues. Linus
Re: [git pull] drm fixes for 5.18-rc1
On Thu, Mar 24, 2022 at 7:13 PM Dave Airlie wrote: > > Some fixes were queued up in and in light of the fbdev regressions, > I've pulled those in as well, Thanks, pulled. It sounds (from the other thread) that the mediatek DT issue is also about to be fixed, even if it's not yet here. But that hopefully (probably?) affects fewer people and testing robots. Linus
amdgpu link problem (was Re: [git pull] drm for 5.18-rc1)
I didn't notice this until now, probably because everything still _works_, but I get a new big warning splat at bootup on my main workstation these days as of the merge window changes. The full warning is attached, but it's basically the ASSERT(0) at line 938 in drivers/gpu/drm/amd/display/dc/core/dc_link.c and it looks to have been introduced by commit c282d9512cdd ("drm/amd/display: factor out dp detection link training and mst top detection"). This is the same old setup I've reported before, with some random Radeon card with two monitors attached (PCI ID 1002:67df rev e7, subsystem ID 1da2:e353). I think the card went by the name "Sapphire Pulse RX 580 8GB" in case any of that matters, but it's been working fine. It still works fine, it just has a big ugly boot-time splat. As mentioned, two 4K monitors attached, both over HDMI. If there is any particular info you want, just let me know where/how to find it, and I can provide. Linus warn Description: Binary data
Re: [PATCH] Kbuild: remove -std=gnu89 from compiler arguments
On Sun, Feb 27, 2022 at 1:54 PM Arnd Bergmann wrote: > > Since the differences between gnu99, gnu11 and gnu17 are fairly minimal > and mainly impact warnings at the -Wpedantic level that the kernel > never enables, the easiest way is to just leave out the -std=gnu89 > argument entirely, and rely on the compiler default language setting, > which is gnu11 for gcc-5, and gnu1x/gnu17 for all other supported > versions of gcc or clang. Honestly, I'd rather keep the C version we support as some explicit thing, instead of "whatever the installed compiler is". Not only do I suspect that you can set it in gcc spec files (so the standard version might actually be site-specific, not compiler version specific), but particularly with clang, I'd like that "GNU extensions enabled" to be explicit. Yes, maybe it's the default, but let's make sure. The C version level has traditionally had a lot of odd semantic meaning details - you mention "inline", others have existed. So it's not just the actual new features that some C version implements, it's those kind of "same syntax, different meaning" issues. I really don't think that's something we want in the kernel any more. Been there, done that, and we did the explicit standards level for a reason. It may be true that c99/c11/c17 are all very similar, and don't have those issues. Or maybe they do. And I don't want somebody with a newer compiler version to not notice that he or she ended up using a c17 feature, just because _that_ compiler supported it, and then other people get build errors because their compilers use gnu11 instead by default. Put another way: I see absolutely no upside to allowing different users using higher/lower versions of the standard. There are only downsides. If gnu11 is supported by gcc-5.1 and up, and all the relevant clang versions, then let's just pick that. And if there are any possible future advantages to gnu17 (or eventual gnu2x versions), let's document those, so that we can say "once our compiler version requirements go up sufficiently, we'll move to gnuXX because we want to take advantage of YY". Please? Linus Linus
Re: [greybus-dev] [PATCH] Kbuild: remove -std=gnu89 from compiler arguments
On Sun, Feb 27, 2022 at 3:04 PM Alex Elder wrote: > > Glancing at the Greybus code, I don't believe there's any > reason it needs to shift a negative value. Such warnings > could be fixed by making certain variables unsigned, for > example. As mentioned in the original thread, making things unsigned actually is likely to introduce bugs and make things worse. The warning is simply bogus, and the fact that it was enabled by -Wextra in gcc for std=gnu99 and up was a mistake that looks likely to be fixed in gcc. So don't try to "fix" the code to make any possible warnings go away. You may just make things worse. (That is often true in general for the more esoteric warnings, but in this case it's just painfully more obvious). Linus
Re: [PATCH] [v2] Kbuild: move to -std=gnu11
On Mon, Feb 28, 2022 at 3:37 AM Arnd Bergmann wrote: > > I think the KBUILD_USERCFLAGS portion and the modpost.c fix for it > make sense regardless of the -std=gnu11 change I do think they make sense, but I want to note again that people doing cross builds obviously use different tools for user builds than for the kernel. In fact, even not cross-building, we've had situations where the "kbuild" compiler is different from the host compiler, because people have upgraded one but not the other (upgrading the kernel build environment is actually much easier than upgrading the host build environment, because you don't need all the random libraries etc, and you can literally _just_ build your own gcc and binutils) And we have *not* necessarily required that the host tools match the kernel tools. So I could well imagine that there are people who build their kernels, but their host build environment might be old enough that -std=gnu11 is problematic for that part. And note how any change to KBUILD_USERCFLAGS is reflected in KBUILD_HOSTCFLAGS. So I would suggest that the KBUILD_USERCFLAGS part of the patch (and the modpost.c change that goes with it) be done as a separate commit. Because we might end up reverting that part. Hmm? Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 4:19 AM Christian König wrote: > > I don't think that using the extra variable makes the code in any way > more reliable or easier to read. So I think the next step is to do the attached patch (which requires that "-std=gnu11" that was discussed in the original thread). That will guarantee that the 'pos' parameter of list_for_each_entry() is only updated INSIDE the for_each_list_entry() loop, and can never point to the (wrongly typed) head entry. And I would actually hope that it should actually cause compiler warnings about possibly uninitialized variables if people then use the 'pos' pointer outside the loop. Except (a) that code in sgx/encl.c currently initializes 'tmp' to NULL for inexplicable reasons - possibly because it already expected this behavior (b) when I remove that NULL initializer, I still don't get a warning, because we've disabled -Wno-maybe-uninitialized since it results in so many false positives. Oh well. Anyway, give this patch a look, and at least if it's expanded to do "(pos) = NULL" in the entry statement for the for-loop, it will avoid the HEAD type confusion that Jakob is working on. And I think in a cleaner way than the horrid games he plays. (But it won't avoid possible CPU speculation of such type confusion. That, in my opinion, is a completely different issue) I do wish we could actually poison the 'pos' value after the loop somehow - but clearly the "might be uninitialized" I was hoping for isn't the way to do it. Anybody have any ideas? Linus p Description: Binary data
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 11:56 AM Linus Torvalds wrote: > > I do wish we could actually poison the 'pos' value after the loop > somehow - but clearly the "might be uninitialized" I was hoping for > isn't the way to do it. Side note: we do need *some* way to do it. Because if we make that change, and only set it to another pointer when not the head, then we very much change the semantics of "list_for_each_head()". The "list was empty" case would now exit with 'pos' not set at all (or set to NULL if we add that). Or it would be set to the last entry. And regardless, that means that all the if (pos == head) kinds of checks after the loop would be fundamentally broken. Darn. I really hoped for (and naively expected) that we could actually have the compiler warn about the use-after-loop case. That whole "compiler will now complain about bad use" was integral to my clever plan to use the C99 feature of declaring the iterator inside the loop. But my "clever plan" was apparently some ACME-level Wile E. Coyote sh*t. Darn. Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 12:03 PM Linus Torvalds wrote: > > Side note: we do need *some* way to do it. Ooh. This patch is a work of art. And I mean that in the worst possible way. We can do typeof(pos) pos in the 'for ()' loop, and never use __iter at all. That means that inside the for-loop, we use a _different_ 'pos' than outside. And then the compiler will not see some "might be uninitialized", but the outer 'pos' *will* be uninitialized. Unless, of course, the outer 'pos' had that pointless explicit initializer. Here - can somebody poke holes in this "work of art" patch? Linus Makefile | 2 +- arch/x86/kernel/cpu/sgx/encl.c | 2 +- include/linux/list.h | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Makefile b/Makefile index daeb5c88b50b..cc4b0a266af0 100644 --- a/Makefile +++ b/Makefile @@ -515,7 +515,7 @@ KBUILD_CFLAGS := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \ -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE \ -Werror=implicit-function-declaration -Werror=implicit-int \ -Werror=return-type -Wno-format-security \ - -std=gnu89 + -std=gnu11 KBUILD_CPPFLAGS := -D__KERNEL__ KBUILD_AFLAGS_KERNEL := KBUILD_CFLAGS_KERNEL := diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 48afe96ae0f0..87db2f3936b0 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -450,7 +450,7 @@ static void sgx_mmu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm) { struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm, mmu_notifier); - struct sgx_encl_mm *tmp = NULL; + struct sgx_encl_mm *tmp; /* * The enclave itself can remove encl_mm. Note, objects can't be moved diff --git a/include/linux/list.h b/include/linux/list.h index dd6c2041d09c..708078b2f24d 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -634,9 +634,9 @@ static inline void list_splice_tail_init(struct list_head *list, * @head: the head for your list. * @member: the name of the list_head within the struct. */ -#define list_for_each_entry(pos, head, member)\ - for (pos = list_first_entry(head, typeof(*pos), member); \ - !list_entry_is_head(pos, head, member); \ +#define list_for_each_entry(pos, head, member) \ + for (typeof(pos) pos = list_first_entry(head, typeof(*pos), member); \ + !list_entry_is_head(pos, head, member); \ pos = list_next_entry(pos, member)) /**
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 12:10 PM Linus Torvalds wrote: > > We can do > > typeof(pos) pos > > in the 'for ()' loop, and never use __iter at all. > > That means that inside the for-loop, we use a _different_ 'pos' than outside. The thing that makes me throw up in my mouth a bit is that in that typeof(pos) pos the first 'pos' (that we use for just the typeof) is that outer-level 'pos', IOW it's a *different* 'pos' than the second 'pos' in that same declaration that declares the inner level shadowing new 'pos' variable. If I was a compiler person, I would say "Linus, that thing is too ugly to live", and I would hate it. I'm just hoping that even compiler people say "that's *so* ugly it's almost beautiful". Because it does seem to work. It's not pretty, but hey, it's not like our headers are really ever be winning any beauty contests... Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 12:16 PM Matthew Wilcox wrote: > > Then we can never use -Wshadow ;-( I'd love to be able to turn it on; > it catches real bugs. Oh, we already can never use -Wshadow regardless of things like this. That bridge hasn't just been burned, it never existed in the first place. The whole '-Wshadow' thing simply cannot work with local variables in macros - something that we've used since day 1. Try this (as a "p.c" file): #define min(a,b) ({ \ typeof(a) __a = (a);\ typeof(b) __b = (b);\ __a < __b ? __a : __b; }) int min3(int a, int b, int c) { return min(a,min(b,c)); } and now do "gcc -O2 -S t.c". Then try it with -Wshadow. In other words, -Wshadow is simply not acceptable. Never has been, never will be, and that has nothing to do with the typeof(pos) pos kind of thing. Your argument just isn't an argument. Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 12:29 PM Johannes Berg wrote: > > If we're willing to change the API for the macro, we could do > > list_for_each_entry(type, pos, head, member) > > and then actually take advantage of -Wshadow? See my reply to Willy. There is no way -Wshadow will ever happen. I considered that (type, pos, head, member) kind of thing, to the point of trying it for one file, but it ends up as horrendous syntax. It turns out that declaring the type separately really helps, and avoids crazy long lines among other things. It would be unacceptable for another reason too - the amount of churn would just be immense. Every single use of that macro (and related macros) would change, even the ones that really don't need it or want it (ie the good kinds that already only use the variable inside the loop). So "typeof(pos) pos" may be ugly - but it's a very localized ugly. Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 1:47 PM Jakob Koschel wrote: > > The goal of this is to get compiler warnings right? This would indeed be > great. Yes, so I don't mind having a one-time patch that has been gathered using some automated checker tool, but I don't think that works from a long-term maintenance perspective. So if we have the basic rule being "don't use the loop iterator after the loop has finished, because it can cause all kinds of subtle issues", then in _addition_ to fixing the existing code paths that have this issue, I really would want to (a) get a compiler warning for future cases and (b) make it not actually _work_ for future cases. Because otherwise it will just happen again. > Changing the list_for_each_entry() macro first will break all of those cases > (e.g. the ones using 'list_entry_is_head()). So I have no problems with breaking cases that we basically already have a patch for due to your automated tool. There were certainly more than a handful, but it didn't look _too_ bad to just make the rule be "don't use the iterator after the loop". Of course, that's just based on that patch of yours. Maybe there are a ton of other cases that your patch didn't change, because they didn't match your trigger case, so I may just be overly optimistic here. But basically to _me_, the important part is that the end result is maintainable longer-term. I'm more than happy to have a one-time patch to fix a lot of dubious cases if we can then have clean rules going forward. > I assumed it is better to fix those cases first and then have a simple > coccinelle script changing the macro + moving the iterator into the scope > of the macro. So that had been another plan of mine, until I actually looked at changing the macro. In the one case I looked at, it was ugly beyond belief. It turns out that just syntactically, it's really nice to give the type of the iterator from outside the way we do now. Yeah, it may be a bit odd, and maybe it's partly because I'm so used to the "list_for_each_list_entry()" syntax, but moving the type into the loop construct really made it nasty - either one very complex line, or having to split it over two lines which was even worse. Maybe the place I looked at just happened to have a long typename, but it's basically always going to be a struct, so it's never a _simple_ type. And it just looked very odd adn unnatural to have the type as one of the "arguments" to that list_for_each_entry() macro. So yes, initially my idea had been to just move the iterator entirely inside the macro. But specifying the type got so ugly that I think that typeof (pos) pos trick inside the macro really ends up giving us the best of all worlds: (a) let's us keep the existing syntax and code for all the nice cases that did everything inside the loop anyway (b) gives us a nice warning for any normal use-after-loop case (unless you explicitly initialized it like that sgx_mmu_notifier_release() function did for no good reason (c) also guarantees that even if you don't get a warning, non-converted (or newly written) bad code won't actually _work_ so you end up getting the new rules without any ambiguity or mistaken > With this you are no longer able to set the 'outer' pos within the list > iterator loop body or am I missing something? Correct. Any assignment inside the loop will be entirely just to the local loop case. So any "break;" out of the loop will have to set another variable - like your updated patch did. > I fail to see how this will make most of the changes in this > patch obsolete (if that was the intention). I hope my explanation above clarifies my thinking: I do not dislike your patch, and in fact your patch is indeed required to make the new semantics work. What I disliked was always the maintainability of your patch - making the rules be something that isn't actually visible in the source code, and letting the old semantics still work as well as they ever did, and having to basically run some verification pass to find bad users. (I also disliked your original patch that mixed up the "CPU speculation type safety" with the actual non-speculative problems, but that was another issue). Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 3:26 PM Matthew Wilcox wrote: > > #define ___PASTE(a, b) a##b > #define __PASTE(a, b) ___PASTE(a, b) > #define _min(a, b, u) ({ \ Yeah, except that's ugly beyond belief, plus it's literally not what we do in the kernel. Really. The "-Wshadow doesn't work on the kernel" is not some new issue, because you have to do completely insane things to the source code to enable it. Just compare your uglier-than-sin version to my straightforward one. One does the usual and obvious "use a private variable to avoid the classic multi-use of a macro argument". And the other one is an abomination. Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 4:45 PM Linus Torvalds wrote: > > Yeah, except that's ugly beyond belief, plus it's literally not what > we do in the kernel. (Of course, I probably shouldn't have used 'min()' as an example, because that is actually one of the few places where we do exactly that, using our __UNIQUE_ID() macros. Exactly because people _have_ tried to do -Wshadow when doing W=2). Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 4:38 PM Segher Boessenkool wrote: > > In C its scope is the rest of the declaration and the entire loop, not > anything after it. This was the same in C++98 already, btw (but in > pre-standard versions of C++ things were like you remember, yes, and it > was painful). Yeah, the original C++ model was just unadulterated garbage, with no excuse for it, and the scope was not the loop, but the block the loop existed in. That would never have been acceptable for the kernel - it's basically just an even uglier version of "put variable declarations in the middle of code" (and we use "-Wdeclaration-after-statement" to disallow that for kernel code, although apparently some of our user space tooling code doesn't enforce or follow that rule). The actual C99 version is the sane one which actually makes it easier and clearer to have loop iterators that are clearly just in loop scope. That's a good idea in general, and I have wanted to start using that in the kernel even aside from some of the loop construct macros. Because putting variables in natural minimal scope is a GoodThing(tm). Of course, we shouldn't go crazy with it. Even after we do that -std=gnu11 thing, we'll have backports to worry about. And it's not clear that we necessarily want to backport that gnu11 thing - since people who run old stable kernels also may be still using those really old compilers... Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Tue, Mar 1, 2022 at 10:14 AM Kees Cook wrote: > > The first big glitch with -Wshadow was with shadowed global variables. > GCC 4.8 fixed that, but it still yells about shadowed functions. What > _almost_ works is -Wshadow=local. Heh. Yeah, I just have long memories of "-Wshadow was a disaster". You looked into the details. > Another way to try to catch misused shadow variables is > -Wunused-but-set-varible, but it, too, has tons of false positives. That on the face of it should be an easy warning to get technically right for a compiler. So I assume the "false positives" are simply because we end up having various variables that really don't end up being used - and "intentionally" so). Or rather, they might only be used under some config option - perhaps the use is even syntactically there and parsed, but the compiler notices that it's turned off under some if (IS_ENABLED(..)) option? Because yeah, we have a lot of those. I think that's a common theme with a lot of compiler warnings: on the face of it they sound "obviously sane" and nobody should ever write code like that. A conditional that is always true? Sounds idiotic, and sounds like a reasonable thing for a compiler to warn about, since why would you have a conditional in the first place for that? But then you realize that maybe the conditional is a build config option, and "always true" suddenly makes sense. Or it's a test for something that is always true on _that_architecture_ but not in some general sense (ie testing "sizeof()"). Or it's a purely syntactic conditional, like "do { } while (0)". It's why I'm often so down on a lot of the odd warnings that are hiding under W=1 and friends. They all may make sense in the trivial case ("That is insane") but then in the end they happen for sane code. And yeah, -Wshadow has had tons of history with macro nesting, and just being badly done in the first place (eg "strlen" can be a perfectly fine local variable). That said, maybe people could ask the gcc and clan people for a way to _mark_ the places where we expect to validly see shadowing. For example, that "local variable in a macro expression statement" thing is absolutely horrendous to fix with preprocessor tricks to try to make for unique identifiers. But I think it would be much more syntactically reasonable to add (for example) a "shadow" attribute to such a variable exactly to tell the compiler "yeah, yeah, I know this identifier could shadow an outer one" and turn it off that way. Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, Feb 28, 2022 at 2:29 PM James Bottomley wrote: > > However, if the desire is really to poison the loop variable then we > can do > > #define list_for_each_entry(pos, head, member) \ > for (pos = list_first_entry(head, typeof(*pos), member);\ > !list_entry_is_head(pos, head, member) && ((pos = NULL) == NULL; > \ > pos = list_next_entry(pos, member)) > > Which would at least set pos to NULL when the loop completes. That would actually have been excellent if we had done that originally. It would not only avoid the stale and incorrectly typed head entry left-over turd, it would also have made it very easy to test for "did I find an entry in the loop". But I don't much like it in the situation we are now. Why? Mainly because it basically changes the semantics of the loop _without_ any warnings about it. And we don't actually get the advantage of the nicer semantics, because we can't actually make code do list_for_each_entry(entry, ) { .. } if (!entry) return -ESRCH; .. use the entry we found .. because that would be a disaster for back-porting, plus it would be a flag-day issue (ie we'd have to change the semantics of the loop at the same time we change every single user). So instead of that simple "if (!entry)", we'd effectively have to continue to use something that still works with the old world order (ie that "if (list_entry_is_head())" model). So we couldn't really take _advantage_ of the nicer semantics, and we'd not even get a warning if somebody does it wrong - the code would just silently do the wrong thing. IOW: I don't think you are wrong about that patch: it would solve the problem that Jakob wants to solve, and it would have absolutely been much better if we had done this from the beginning. But I think that in our current situation, it's actually a really fragile solution to the "don't do that then" problem we have. Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Tue, Mar 1, 2022 at 11:06 AM Linus Torvalds wrote: > > So instead of that simple "if (!entry)", we'd effectively have to > continue to use something that still works with the old world order > (ie that "if (list_entry_is_head())" model). Just to prove my point about how this is painful, that doesn't work at all. If the loop iterator at the end is NULL (good, in theory), we can't use "list_entry_is_head()" to check whether we ended. We'd have to use a new thing entirely, to handle the "list_for_each_entry() has the old/new semantics" cases. That's largely why I was pushing for the "let's make it impossible to use the loop iterator at all outside the loop". It avoids the confusing case, and the patches to move to that stricter semantic can be merged independently (and before) doing the actual semantic change. I'm not saying my suggested approach is wonderful either. Honestly, it's painful that we have so nasty semantics for the end-of-loop case for list_for_each_entry(). The minimal patch would clearly be to keep those broken semantics, and just force everybody to use the list_entry_is_head() case. That's the "we know we messed up, we are too lazy to fix it, we'll just work around it and people need to be careful" approach. And laziness is a virtue. But bad semantics are bad semantics. So it's a question of balancing those two issues. Linus
Re: [PATCH 6/6] treewide: remove check of list iterator against head past the loop body
So looking at this patch, I really reacted to the fact that quite often the "use outside the loop" case is all kinds of just plain unnecessary, but _used_ to be a convenience feature. I'll just quote the first chunk in it's entirely as an example - not because I think this chunk is particularly important, but because it's a good example: On Mon, Feb 28, 2022 at 3:09 AM Jakob Koschel wrote: > > diff --git a/arch/arm/mach-mmp/sram.c b/arch/arm/mach-mmp/sram.c > index 6794e2db1ad5..fc47c107059b 100644 > --- a/arch/arm/mach-mmp/sram.c > +++ b/arch/arm/mach-mmp/sram.c > @@ -39,19 +39,22 @@ static LIST_HEAD(sram_bank_list); > struct gen_pool *sram_get_gpool(char *pool_name) > { > struct sram_bank_info *info = NULL; > + struct sram_bank_info *tmp; > > if (!pool_name) > return NULL; > > mutex_lock(&sram_lock); > > - list_for_each_entry(info, &sram_bank_list, node) > - if (!strcmp(pool_name, info->pool_name)) > + list_for_each_entry(tmp, &sram_bank_list, node) > + if (!strcmp(pool_name, tmp->pool_name)) { > + info = tmp; > break; > + } > > mutex_unlock(&sram_lock); > > - if (&info->node == &sram_bank_list) > + if (!info) > return NULL; > > return info->gpool; I realize this was probably at least auto-generated with coccinelle, but maybe that script could be taught to do avoid the "use after loop" by simply moving the code _into_ the loop. IOW, this all would be cleaner and clear written as if (!pool_name) return NULL; mutex_lock(&sram_lock); list_for_each_entry(info, &sram_bank_list, node) { if (!strcmp(pool_name, info->pool_name)) { mutex_unlock(&sram_lock); return info; } } mutex_unlock(&sram_lock); return NULL; Ta-daa - no use outside the loop, no need for new variables, just a simple "just do it inside the loop". Yes, we end up having that lock thing twice, but it looks worth it from a "make the code obvious" standpoint. Would it be even cleaner if the locking was done in the caller, and the loop was some simple helper function? It probably would. But that would require a bit more smarts than probably a simple coccinelle script would do. Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Tue, Mar 1, 2022 at 2:58 PM David Laight wrote: > > Can it be resolved by making: > #define list_entry_is_head(pos, head, member) ((pos) == NULL) > and double-checking that it isn't used anywhere else (except in > the list macros themselves). Well, yes, except for the fact that then the name is entirely misleading... And somebody possibly uses it together with list_first_entry() etc, so it really is completely broken to mix that change with the list traversal change. Linus Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Tue, Mar 1, 2022 at 3:19 PM David Laight wrote: > > Having said that there are so few users of list_entry_is_head() > it is reasonable to generate two new names. Well, the problem is that the users of list_entry_is_head() may be few - but there are a number of _other_ ways to check "was that the HEAD pointer", and not all of them are necessarily correct. IOW, different places do different random tests for "did we walk the whole loop without breaking out". And many of them happen to work. In fact, in practice, pretty much *all* of them happen to work, and you have to have the right struct layout and really really bad luck to hit a case of "type confusion ended up causing the test to not work". And *THAT* is the problem here. It's not the "there are 25ish places that current use list_entry_is_head()". It's the "there are ~480 places that use the type-confused HEAD entry that has been cast to the wrong type". And THAT is why I think we'd be better off with that bigger change that simply means that you can't use the iterator variable at all outside the loop, and try to make it something where the compiler can help catch mis-uses. Now, making the list_for_each_entry() thing force the iterator to NULL at the end of the loop does fix the problem. The issue I have with it is really just that you end up getting no warning at all from the compiler if you mix old-style and new-style semantics. Now, you *will* get an oops (if using a new-style iterator with an old-style check), but many of these things will be in odd driver code and may happen only for error cases. And if you use a new-style check with an old-style iterator (ie some backport problem), you will probably end up getting random memory corruption, because you'll decide "it's not a HEAD entry", and then you'll actually *use* the HEAD that has the wrong type cast associated with it. See what my worry is? With the "don't use iterator outside the loop" approach, the exact same code works in both the old world order and the new world order, and you don't have the semantic confusion. And *if* you try to use the iterator outside the loop, you'll _mostly_ (*) get a compiler warning about it not being initialized. Linus (*) Unless somebody initializes the iterator pointer pointlessly. Which clearly does happen. Thus the "mostly". It's not perfect, and that's most definitely not nice - but it should at least hopefully make it that much harder to mess up.
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Wed, Mar 2, 2022 at 12:07 PM Kees Cook wrote: > > I've long wanted to change kfree() to explicitly set pointers to NULL on > free. https://github.com/KSPP/linux/issues/87 We've had this discussion with the gcc people in the past, and gcc actually has some support for it, but it's sadly tied to the actual function name (ie gcc has some special-casing for "free()") See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94527 for some of that discussion. Oh, and I see some patch actually got merged since I looked there last so that you can mark "deallocator" functions, but I think it's only for the context matching, not for actually killing accesses to the pointer afterwards. Linus
Re: [git pull] drm for 5.15-rc1
On Mon, Aug 30, 2021 at 10:53 PM Dave Airlie wrote: > > There are a bunch of conflicts with your tree, but none of them seem > too serious, but I might have missed something. No worries. I enjoyed seeing the AMD code-names in the conflicts, they are using positively kernel-level naming. That said, I wonder why AMD people can't use consistent formatting, mixing ALL-CAPS with underscores, spaces, whatever: /* Sienna_Cichlid */ /* Yellow Carp */ /* Navy_Flounder */ /* DIMGREY_CAVEFISH */ /* Aldebaran */ /* CYAN_SKILLFISH */ /* BEIGE_GOBY */ which shows a distinct lack of professionalism and caring in the silly naming. Linus
Re: [git pull] drm for 5.15-rc1
On Wed, Sep 1, 2021 at 10:57 AM Linus Torvalds wrote: > > No worries. I enjoyed seeing the AMD code-names in the conflicts, they > are using positively kernel-level naming. Oh, I spoke too soon. The conflict in amdgpu_ras_eeprom.c is trivial to fix up, but the *code* is garbage. It does this (from commit 14fb496a84f1: "drm/amdgpu: set RAS EEPROM address from VBIOS"): ... control->i2c_address = 0; if (amdgpu_atomfirmware_ras_rom_addr(adev, (uint8_t*)&control->i2c_address)) { if (control->i2c_address == 0xA0) control->i2c_address = 0; ... and honestly, that just hurts to look at. It's completely wrong, even if it happens to work on a little-endian machine. Yes, yes, BE is irrelevant, and doubly so for an AMD GPU driver, but still. It's assigning a 8-bit value to a 32-bit entity by doing a pointer cast on the address, and then mixing things up by using/assigning to that same field. That's just *wrong* and nasty. Oh, the resolution would be easy - just take that broken code as-is - but I can't actually make myself do that. So I fixed it up to not be that incredibly ugly garbage. Please holler if I did something wrong. Linus
Re: [PATCH] drm/ttm: fix the type mismatch error on sparc64
On Tue, Sep 14, 2021 at 12:48 PM Alex Deucher wrote: > > On Tue, Sep 7, 2021 at 6:25 AM Christian König > wrote: > > > > > > Reviewed-by: Christian König > > Is one of you going to push this to drm-misc? I was assuming it was there already. I guess I'll just apply it directly. Linus
Re: [git pull] drm for 5.14-rc1
On Sat, Sep 18, 2021 at 2:18 AM Michael Stapelberg wrote: > > torvalds at linux-foundation.org (Linus Torvalds) writes: > > Did I fix it up correctly? Who knows. The code makes more sense to me > > now and seems valid. But I really *really* want to stress how locking > > is important. > > As far as I can tell, this merge conflict resolution made my Raspberry > Pi 3 hang on boot. Ok, that's a different merge issue than the locking one (which is about the amd ttm code). But the VC4 driver did have changes close to each other in the hdmi detection and clock setting code. And it doesn't seem to be just RPi3, there was a report back a couple of weeks ago about RPi4 also having regressed (with an Ubuntu install). That one was an oops in vc4_hdmi_audio_prepare(). I don't know if that got resolved, I heard nothing about it after the report. So there's something seriously wrong in VC4 space. The main issue seems to be the runtime power management changes. As far as I can tell, the commits that didn't come in through that drm pull were these two 9984d6664ce9 ("drm/vc4: hdmi: Make sure the controller is powered in detect") 411efa18e4b0 ("drm/vc4: hdmi: Move the HSM clock enable to runtime_pm") Michael - do things work if you revert those two (sadly, they don't revert cleanly exactly _because_ of the other changes in the same area)? Maxime? Linus
Re: [git pull] drm for 5.14-rc1
On Sat, Sep 18, 2021 at 1:13 PM Michael Stapelberg wrote: > > > Michael - do things work if you revert those two (sadly, they don't > > revert cleanly exactly _because_ of the other changes in the same > > area)? > > Reverting only 9984d6664ce9 is not sufficient, but reverting both > 9984d6664ce9 and 411efa18e4b0 does indeed make my Raspberry Pi 3 boot > again! Since you did those reverts and fixed up the conflicts, would you mind sending out the resulting patch so that maybe Sudip can test it too? Maybe the RPi4 hdmi audio issues are related to the RPi4 hdmi problems despite the symptoms apparently being rather different.. Linus
Re: [git pull] drm for 5.14-rc1
On Sat, Sep 18, 2021 at 3:00 PM Sudip Mukherjee wrote: > > Its still there. I am seeing it every night. This was from last night > - https://lava.qa.codethink.co.uk/scheduler/job/164#L1356 Note that that web server is not available at least to me. Looks like some internal name or limited dns, I just get lava.qa.codethink.co.uk: Name or service not known so your reports aren't getting a lot of traction. Linus
Re: [git pull] drm for 5.14-rc1
On Sat, Sep 18, 2021 at 3:48 PM Sudip Mukherjee wrote: > > Also, I have now tested by reverting those two commits and I still get > the same trace on rpi4. Ok. I'm afraid we really need to have the VC4 people figure it out - I count do the two reverts that are reported to fix the RPi3 issue, but it looks like the RPi4 pulseaudio issue needs something else. Any chance you could bisect that? Actually, looking at that first report of yours, the RPi4 sound problem apparently happened in the range 9e9fb7655ed5..7c636d4d20f8 and while that range does have a drm merge from Dave Airlie, it _also_ has a sound merge from Takashi and various ARM SoC updates from Arnd. But most (all?) of the changes in that range to the vc4 source code came from the drm tree. And Maxime in particular. I think. Linus
Re: [git pull] drm for 5.14-rc1
On Sun, Sep 19, 2021 at 4:05 AM Sudip Mukherjee wrote: > > And indeed, reverting 27da370e0fb3 ("drm/vc4: hdmi: Remove > drm_encoder->crtc usage") on top of d4d016caa4b8 ("alpha: move > __udiv_qrnnd library function to arch/alpha/lib/") > has fixed the error. Ok, since your pulseaudio problem was reported already over two weeks ago with no apparent movement, I've done that revert in my tree. I reverted the two runtime PM changes that cause problems for Michael too. Linus
Re: [git pull] drm for 5.14-rc1
On Mon, Sep 20, 2021 at 5:17 AM Maxime Ripard wrote: > > I'm sorry, but I find that situation to be a bit ridiculous. Honestly, the original report about the pulseaudio problem came in over two weeks ago, and all you seemed to do was to ignore everything that Sudip said and reported. THAT is the ridiculous part. The rules for the kernel are very clear: regressions get fixed or they get reverted. And I saw absolutely _zero_ indication that you took that pulseaudio oops at all seriously. Linus
Re: [git pull] drm for 5.14-rc1
On Mon, Sep 20, 2021 at 10:33 AM Maxime Ripard wrote: > > What I was interested in was more about the context itself, and I'd > still like an answer on whether it's ok to wait for a review for 5 > months though, or if it's an expectation from now on that we are > supposed to fix bugs over the week-end. Oh, it's definitely not "over a weekend". These reverts happened on a Sunday just because that's when I do rc releases, and this was one of those pending issues that had been around long enough that I went "ok, I'm reverting now since it's been bisected and verified". So it happened on a weekend, but that's pretty incidental. You should not wait for 5 months to send bug-fixes. That's not the point of review, and review shouldn't hold up reported regressions of existing code. That's just basic _testing_ - either the fix should be applied, or - if the fix is too invasive or too ugly - the problematic source of the regression should be reverted. Review should be about new code, it shouldn't be holding up "there's a bug report, here's the obvious fix". And for something like a NULL pointer dereference, there really should generally be an "obvious fix". Of course, a corollary to that "fixes are different from new development", though, is that bug fixes need to be kept separate from new code - just so that they _can_ be handled separately and so that you could have sent Sudip (and Michael, although that was apparently a very different bug, and the report came in later) a "can you test this fix" kind of thing. I don't know what the review issue on the vc4 drm side is, but I suspect that the vc4 people are just perhaps not as integrated with a lot of the other core drm people. Or maybe review of new features are held off because there are bug reports on the old code. Linus
Re: Regression with mainline kernel on rpi4
On Wed, Sep 22, 2021 at 3:11 AM Sudip Mukherjee wrote: > > That test script is triggering the openqa job, but its running only > after lava is able to login. The trace is appearing before the login > prompt even, so test_mainline.sh should not matter here. Side note: the traces might be more legible if you have debug info in the kernel, and run the dmesg through the script in scripts/decode_stacktrace.sh which should give line numbers and inlining information. That often makes it much easier to see which access it is that hits a NULL pointer dereference. On x86-64, generally just decode the instruction stream, and look at the instruction patterns and try to figure out where an oops is coming from, but that's much less useful on arm64 (partly because I'm not as used to it, but because the arm64 oopses don't print out much of the instructions so there's often little to go by). Linus
Re: Regression with mainline kernel on rpi4
On Wed, Sep 22, 2021 at 10:02 AM Sudip Mukherjee wrote: > > > Attached is a complete dmesg and also the decoded trace. > This is done on 4357f03d6611 ("Merge tag 'pm-5.15-rc2' of > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm") drivers/gpu/drm/vc4/vc4_hdmi.c:1214 is tmp = (u64)(mode->clock * 1000) * n; in vc4_hdmi_set_n_cts(), which has apparently been inlined from vc4_hdmi_audio_prepare() in vc4_hdmi.c:1398. So it looks like 'mode' is some offset off a NULL pointer. Which looks not impossible: 1207 struct drm_connector *connector = &vc4_hdmi->connector; 1208 struct drm_crtc *crtc = connector->state->crtc; 1209 const struct drm_display_mode *mode = &crtc->state->adjusted_mode; looks like crtc->state perhaps might be NULL. Although it's entirely possible that it's 'crtc' itself that is NULL or one of the earlier indirection accesses. The exact line information from the debug info is very useful and mostly correct, but at the same time should always be taken with a small pinch of salt. Compiler optimizations means that code gets munged and moved around, and since this is the first access to 'mode', I would not be surprised if some of the calculations and accesses to get 'mode' might be moved around to it. Linus
Re: Regression with mainline kernel on rpi4
On Wed, Sep 22, 2021 at 1:19 PM Sudip Mukherjee wrote: > > I added some debugs to print the addresses, and I am getting: > [ 38.813809] sudip crtc > > This is from struct drm_crtc *crtc = connector->state->crtc; Yeah, that was my personal suspicion, because while the line number implied "crtc->state" being NULL, the drm data structure documentation and other drivers both imply that "crtc" was the more likely one. I suspect a simple if (!crtc) return; in vc4_hdmi_set_n_cts() is at least part of the fix for this all, but I didn't check if there is possibly something else that needs to be done too. Linus
Re: [BUG 5.15-rc3] kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
On Sat, Oct 2, 2021 at 5:17 AM Steven Rostedt wrote: > > On Sat, 2 Oct 2021 03:17:29 -0700 (PDT) > Hugh Dickins wrote: > > > Yes (though bisection doesn't work right on this one): the fix > > Interesting, as it appeared to be very reliable. But I didn't do the > "try before / after" on the patch. Well, even the before/after might well have worked, since the problem depended on how that sw_fence_dummy_notify() function ended up aligned. So random unrelated changes could re-align it just by mistake. Patch applied directly. I'd also like to point out how that BUG_ON() actually made things worse, and made this harder to debug. If it had been a WARN_ON_ONCE(), this would presumably not even have needed bisecting, it would have been obvious. BUG_ON() really is pretty much *always* the wrong thing to do. It onl;y results in problems being harder to see because you end up with a dead machine and the message is often hidden. Linus
Re: [GIT PULL] fbdev updates for v5.17-rc1
On Sun, Jan 16, 2022 at 9:32 PM Helge Deller wrote: > > This pull request contains only one single initial patch which adds > myself to the MAINTAINERS file for the FRAMBUFFER LAYER. I'll pull this (as my test builds for other things complete), but this is just a note to say that this pull request email was marked as spam for me, with gmail saying something along the lines of "lots of emails from gmx.de have been marked as spam" I see nothing odd in the email itself, and it has proper SPF and DKIM, but it's possible that you end up sharing a subnet (or an ISP) with spammers... Or maybe it was a random one-off. We'll see. I check spam filters enough that I _usually_ tend to catch these things. Linus
Re: [PATCH 2/2] Revert "fbcon: Disable accelerated scrolling"
On Wed, Jan 19, 2022 at 2:29 PM Helge Deller wrote: > > >> > >> Ah, no, that was just the soft scrollback code I was thinking of, which > > Right. > That was commit 973c096f6a85 and it was about vgacon, not fbcon. No, fbcon had some bug too, although I've paged out the details. See commit 50145474f6ef ("fbcon: remove soft scrollback code"). If I remember correctly (and it's entirely possible that I don't), the whole "softback_lines" logic had serious problems with resizing the console (or maybe changing the font size). There may have been some other bad interaction with foreground/background consoles too, I forget. Linus
Re: [git pull] drm fixes + one missed next for 5.16-rc1
On Thu, Nov 11, 2021 at 7:25 PM Dave Airlie wrote: > > I missed a drm-misc-next pull for the main pull last week. It wasn't > that major and isn't the bulk of this at all. This has a bunch of > fixes all over, a lot for amdgpu and i915. Ugh. The i915 conflict was trivial, but made me aware of that absolutely disgusting "wbinvd_on_all_cpus()" hack. And that thing is much too ugly to survive. I made my merge resolution remove that disgusting thing. That driver is x86-only anyway, so it all seemed completely bogus in the first place. And if there is some actual non-x86 work in progress for i915, then that wbinvd_on_all_cpus() needs to be replaced with something proper and architecture-neutral anyway, most definitely involving a name change, and almost certainly also involving a range for the cache writeback. Because that "create broken macro on other architectures" thing is *NOT* acceptable. And I sincerely hope to the gods that no cache-incoherent i915 mess ever makes it out of the x86 world. Incoherent IO was always a historical mistake and should never ever happen again, so we should not spread that horrific pattern around. Linus
Re: regression with mainline kernel
[ Hmm. This email was marked as spam for me. I see no obvious reason for it being marked as spam, but it happens.. ] On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee wrote: > > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0] > drm/virtio: implement context init: add virtio_gpu_fence_event Hmm. Judging from your automated screenshots, the login never appears. > And, indeed reverting cd7f5ca33585 on top of debe436e77c7 has fixed > the problem I was seeing on my qemu test of x86_64. The qemu image is > based on Ubuntu. Presumably either that commit is somehow buggy in itself - or it does exactly what it means to do, and the new poll() semantics just confuses the heck out of the X server (or wayland or whatever). And honestly, if I read that thing correctly, the patch is entirely broken. The new poll function (virtio_gpu_poll()) will unconditionally remove the first event from the event list, and then report "Yeah, I had events". This is completely bogus for a few reasons: - poll() really should be idempotent, because the poll function gets called multiple times - it doesn't even seem to check that the event that it removes is the new VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL kind of event, so it will unconditionally just remove random events. - it does seem to check the "vfpriv->ring_idx_mask" and do the old thing if that is zero, but I see absolutely no reason for that (and that check itself has caused problems, see below) Honestly, my reaction to this all is that that commit is fundamentally broken and probably should be reverted regardless as "this commit does bad things". HOWEVER - it has had a fix for a NULL pointer dereference in the meantime - can you check whether the current top of tree happens to work for you? Maybe your problem isn't due to "that commit does unnatural things", but simply due to the bug fixed in d89c0c8322ec ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll"). And if it's still broken with that commit, I'll happily revert it and people need to go back to the drawing board. In fact, I would really suggest that people look at that virtio_gpu_poll() function regardless. That odd "let's unconditionally just drop events in the poll function is really REALLY broken behavior. Linus
Re: [git pull] drm fixes + one missed next for 5.16-rc1
On Sun, Nov 14, 2021 at 1:00 PM Dave Airlie wrote: > > i915 will no longer be x86-64 only in theory, since Intel now produces > PCIe graphics cards using the same hw designs. Well, at least in my tree, it still has the "depends on X86", along with several other x86-only things (like "select INTEL_GTT", which is also x86-only) So by the time that non-x86 theory becomes reality, hopefully the i915 people will also have figured out how to do the cache flushing properly. And hopefully that "do it properly" ends up being simply that the particular configuration that ends up being portable simply doesn't need to do it at all and can statically just not build it, sidestepping the issue entirely. Fingers crossed. .. of course, I'm also sure some clueless hardware engineer is still convinced that non-coherent IO is the way to go for graphics, and that doing cross-CPU IPIs to write back all caches is somehow still a valid model. Because some people were still convinced about that not _that_ long ago. Hopefully reality (perhaps in the form of Apple) has caused people to finally reconsider. Linus
Re: [git pull] drm for 5.11-rc1
On Thu, Dec 10, 2020 at 7:52 PM Dave Airlie wrote: > > This is an early pull request for drm for 5.11 merge window. I'm going > to be out for most of the first week of the merge window so thought > I'd just preempt things and send this early. Ok, I've merged this, and Dave is likely gone already, but it _looks_ like there's a problem with the drm tree on my Radeon workstation. It works fine on my i915 laptop. The symptoms are that the boot just hangs after fb0: switching to amdgpudrmfb from EFI VGA when on a working kernel the next lines are amdgpu :49:00.0: vgaarb: deactivate vga console [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1DA2:0xE353 0xE7). amdgpu :49:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported [drm] register mmio base: 0xE1C0 [drm] register mmio size: 262144 I'm bisecting for more details, but it might take a while, and I thought I'd start the ball rolling with this initial report in case somebody goes "Aahh.., we know about that case already, the fix is XYZ"... Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm for 5.11-rc1
On Mon, Dec 14, 2020 at 2:29 PM Alex Deucher wrote: > > The relevant fixes are: Ok, I can confirm that applying those two patches gets my workstation working properly again. Would it be possible to get those submitted properly (or I can just take them as-is, but would like to get a "please just pick them directly")? I don't like to continue to do merges during the merge windows with a known broken base - it makes for a nightmare of bisection when you have multiple independent problems, and I assume this hits not just me but a lot of people with radeon/amdgpu graphics? Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm fixes for 5.11-rc1
On Wed, Dec 23, 2020 at 6:29 PM Dave Airlie wrote: > > Xmas eve pull request present. Just some fixes that trickled in this > past week. Mostly amdgpu fixes, with a dma-buf/mips build fix and some > misc komeda fixes. Well, I already pulled and pushed out my merge, but only noticed afterwards that clang complains about this, and I think it's a real bug: drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_mpc.c:475:6: warning: variable 'val' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] and it sure is true: the code literally does uint32_t val; if (opp_id < MAX_OPP && REG(MUX[opp_id])) REG_GET(MUX[opp_id], MPC_OUT_MUX, &val); return val; so clearly 'val' isn't initialized if that if-statement isn't true. I assume 'opp_id' is always presumed to be valid, but that code really is disgusting. Just make it return 0 (or whatever) for invalid, possibly together with a WARN_ON_ONCE(). Ok? Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm final fixes for 5.16
On Thu, Jan 6, 2022 at 7:23 PM Dave Airlie wrote: > > There is only the amdgpu runtime pm regression fix in here. Thanks, from a quick test it works for me - the backlight actually does eventually go away. It does so only on the second time the monitors say "no signal, going to power save", but that has been true before too. So I think there's still some confusion in this area, but it might be elsewhere - who knows what Wayland and friends do. At least it doesn't look like a regression to me any more. Thanks, Linus
Re: [git pull] drm fixes for 5.16-rc3
On Sun, Jan 9, 2022 at 11:38 PM Geert Uytterhoeven wrote: > > The commit that merged this branch made a seemingly innocent change to > the top Makefile: "Seemingly" innocent? Or something darker and more sinister, related to the unrelenting slaughter of flightless fowl? You be the judge. Linus
Re: [git pull] drm for 5.17-rc1 (pre-merge window pull)
On Thu, Jan 6, 2022 at 10:12 PM Dave Airlie wrote: > > nouveau_fence.c is the only conflict I've seen and I've taken the result from > our rerere cache in the merge above. It's non trivial, would be good to have > Christian confirm it as well. Thanks, that conflict really ended up being one that I would have done very differently without having had that pointer to your reference merge. And I would almost certainly have messed it up in the process. So what I did was to look at your merge resolution (or possibly Christian's? I don't know how you guys share your trees and the origin of that rerere), and tried to understand it, and basically recreate it. It's not exactly the same (different whitespace and variable lifetimes), but I think I got the gist of it. Thanks for the pointer, and hopefully I didn't mess it up _despite_ your merge showing me what I should aim for ;) Linus
Re: [git pull] drm for 5.17-rc1 (pre-merge window pull)
On Thu, Jan 6, 2022 at 10:12 PM Dave Airlie wrote: > > git://anongit.freedesktop.org/drm/drm tags/drm-next-2022-01-07 Gaah. I merged things and it built cleanly, and I pushed it out. But then I actually *booted* it, and that's not pretty. It *works", but it's almost unusable because of random scanline flickering. I'm not sure how to explain it, but it's as if there wasn't quite enough bandwidth on the scan-out, so you get these lines of noise and/or shifted output. They are temporary - so the framebuffer contents themselves is not damaged (although I don't know how the compositor works - maybe the problem happens before scanout). This is on the same Radeon device: 49:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) with dual 4k monitors. Any idea? Linus
Re: [git pull] drm for 5.17-rc1 (pre-merge window pull)
On Mon, Jan 10, 2022 at 2:13 PM Alex Deucher wrote: > > Sounds like something related to watermarks. That said, we haven't > really touched the display code for DCE11 cards in quite a while. Can > you provide your dmesg output? I'm not seeing anything that would look interesting, but here's the parts that look relevant for drm.. Linus dmesg-gpu Description: Binary data
Re: [git pull] drm for 5.17-rc1 (pre-merge window pull)
On Mon, Jan 10, 2022 at 5:11 PM Alex Deucher wrote: > > We are putting together a system to try and repro the issue. Does it > happen with a single monitor or only with two? Nope. With a single monitor everything seems to look fine. And when I plug in the second monitor, it immediately starts happening again. It also seems to depend a bit on the screen contents - or possibly on what else is going on. Hiding the browser window makes it happen less, I think. But I suspect that's about "less gpu activity" than anything else. I'll see if I can bisect it at least partially. Linus
Re: [git pull] drm for 5.17-rc1 (pre-merge window pull)
On Mon, Jan 10, 2022 at 5:21 PM Linus Torvalds wrote: > > It also seems to depend a bit on the screen contents - or possibly on > what else is going on. Hiding the browser window makes it happen less, > I think. But I suspect that's about "less gpu activity" than anything > else. Actually, sometimes "more activity" makes it go away too. Moving a window around wildly with the mouse makes it *stop* happen. But moving the mouse over different elements of the screen - or writing text in the web browser email window - seems to make it worse. Funky. It does "feel" to me like some bandwidth limitation, it has kind of the same behavior that I remember from the bad old times when you were pushing the video card past a resolution that it could really handle. But that can't be the case, this card has had no problems with this before. Linus
Re: [git pull] drm for 5.17-rc1 (pre-merge window pull)
On Mon, Jan 10, 2022 at 5:21 PM Linus Torvalds wrote: > > I'll see if I can bisect it at least partially. It seems to be very reliable. I can see the flickering even at early boot before gdb has started - the graphical screen where you type the encrypted disk password at boot already shows it as you type. Right now it is bad: 9602044d1cc12280e20c5f2cd640ae80f69e good: 3867e3704f136beadf5e004b61696ef7f990bee4 so it's going to be one of these: 9602044d1cc1 drm/amd/display: Fix for the no Audio bug with Tiled Displays a896f870f8a5 drm/amd/display: Fix for otg synchronization logic aba3c3fede54 drm/amd/display: Clear DPCD lane settings after repeater training 9311ed1e1241 drm/amd/display: add hdmi disable debug check 6421f7c750e9 drm/amd/display: Allow DSC on supported MST branch devices ebe5ffd8e271 drm/amd/display: Enable P010 for DCN3x ASICs c022375ae095 drm/amd/display: Add DP-HDMI FRL PCON Support in DC 50b1f44ec547 drm/amd/display: Add DP-HDMI FRL PCON SST Support in DM 81d104f4afbf drm/amdgpu: Don't halt RLC on GFX suspend fe9c5c9affc9 drm/amdgpu: Use MAX_HWIP instead of HW_ID_MAX 370016988665 drm/amdgpu: fix the missed handling for SDMA2 and SDMA3 6c18ecefaba7 drm/amdgpu: declare static function to fix compiler warning 94a80b5bc7a2 amdgpu/pm: Modify implmentations of get_power_profile_mode to use amdgpu_pp_profile_name and I guess I'll do the few more bisections to pick out the exact one. Linus
Re: [git pull] drm for 5.17-rc1 (pre-merge window pull)
On Mon, Jan 10, 2022 at 6:22 PM Linus Torvalds wrote: > > and I guess I'll do the few more bisections to pick out the exact one. a896f870f8a5f23ec961d16baffd3fda1f8be57c is the first bad commit. Attaching ther BISECT_LOG in case anybody cares. I'll double-check to see if a revert fixes it at the top of my tree. Linus BISECT_LOG Description: Binary data
Re: [git pull] drm for 5.17-rc1 (pre-merge window pull)
On Mon, Jan 10, 2022 at 6:44 PM Linus Torvalds wrote: > > I'll double-check to see if a revert fixes it at the top of my tree. Yup. It reverts cleanly, and the end result builds and works fine, and doesn't show the horrendous flickering. I have done that revert, and will continue the merge window work. Somebody else gets to figure out what the actual bug is, but that commit was horribly broken on my machine (Sapphire Pulse RX 580 8GB, fwiw). Linus
Re: [git pull] drm for 5.17-rc1 (pre-merge window pull)
On Tue, Jan 11, 2022 at 7:38 AM Harry Wentland wrote: > > Attached is a v2 of the buggy patch that should get this right. > If you have a chance to try it out let us know I can confirm that I do not see the horribly flickering behavior with this patch. I didn't look at what the actual differences were from the one I reverted, but at least on my machine this version works. Linus
[git pull] drm for v4.8
On Mon, Aug 1, 2016 at 9:32 PM, Dave Airlie wrote: > > This is the main drm pull request for 4.8, I'm down with a cold at the moment > so hopefully this isn't in too bad a state, I finished pulling stuff last > week mostly (nouveau fixes just went in today), so only this message should > be influenced by illness. Apologies to anyone who's major feature I missed :-) > > i915: > BXT support enabled by default > GVT-g infrastructure > GuC command submission and fixes > BXT workarounds > SKL/BKL workarounds > Demidlayering device registration > Thundering herd fixes > Missing pci ids > Atomic updates Hmm. I did the merge and pushed it out, but testing it on my laptop shows some very annoying flickering problem. The screen goes dark for a very short while (one frame? Who knows? Seems longer occasionally). I have no idea what triggers it, but it happens quite a lot when it happens. Like once every second or two. And it seems to happen most of the time, although right now it happens to be behaving nicely, so sometimes it goes for a while without the flickering. Things *work*, but the flickering is nasty enough to make the end result painful to use. The only thing I see in dmesg that looks bad is [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe A [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun but I've seen that before, and it happens a couple of times during boot. Not once per second. This is my old Vaio 11 Pro, now running Fedora 24 (up-to-date as of today). So it's bog-standard intel graphics (i5-4200U - Haswell ULT). Suggestions to try? Linus
[git pull] drm for v4.8
On Tue, Aug 2, 2016 at 4:10 AM, Ville Syrjälä wrote: > > I have a couple of pending PSR patches you may want to try as well, > if i915.enable_psr=0 helps. Yes. i915.enable_psr=0 seems to make the bad flickering go away. I'll try your git trees out later, but what exactly changed with regards to psr lately? It's set as a "dangerous" option, and even just clearing it caused Setting dangerous option enable_psr - tainting kernel which seems entirely bogus. Linus
[git pull] drm for v4.8
On Tue, Aug 2, 2016 at 4:10 AM, Ville Syrjälä wrote: > > So PSR seems more likely. The underruns might point at some watermark > fail though :( > > I have a couple of pending PSR patches you may want to try as well, > if i915.enable_psr=0 helps. > > First set is here: > git://github.com/vsyrjala/linux.git psr_setup_time_2 > This should be perfectly safe to go in actually, as it will only result > in disabling PSR with certain panels. This first git pull fixes it for me, as far as I can tell. I'm not sure that the problem is 100% reproducible, but I booted into each kernel twice, and the current git tree is broken, while with your psr_setup_time_2 branch pulled it works. So it does seem to be the fix. > The second set is here: > git://github.com/vsyrjala/linux.git psr_fixes_2 I didn't even test that one. Should I just pull that psr_setup_time2 branch for real? I'd like to get a real pull request with explanations etc, but other than that it looks good to go. Linus
Linux 4.8-rc1 Cirrus QEMU crashes on boot.
On Mon, Aug 8, 2016 at 1:24 PM, Eric W. Biederman wrote: > > Booting up v4.8-rc1 in qemu today I ran I ran into this beautiful oops. > > I am just about to head out the door on vacation until the end of the > month so hopefully I am tossing out enough information to the right > people. > > I have confirmed that disabling CONFIG_DRM_CIRRUS_QEMU avoids this > crash. > > [0.720167] BUG: unable to handle kernel NULL pointer dereference at > 0018 > [0.721348] IP: [] drm_pick_crtcs+0xfc/0x270 > [0.722249] PGD 0 > [0.722659] Oops: [#1] SMP > [0.723141] Modules linked in: > [0.723643] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1+ #116 > [0.724602] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > [0.725450] task: 8800bbf28000 task.stack: 8800bbf3 > [0.726327] RIP: 0010:[] [] > drm_pick_crtcs+0xfc/0x270 > [0.727648] RSP: :8800bbf33978 EFLAGS: 00010217 > [0.728444] RAX: 8d623a20 RBX: RCX: > 8800baf3f860 > [0.729498] RDX: RSI: 1000 RDI: > 8800baf3f800 > [0.730626] RBP: 8800bbf339f8 R08: R09: > 0001 > [0.731704] R10: 0001 R11: R12: > 8800bbaf3af0 > [0.732827] R13: 8800bbaf3af8 R14: R15: > 8800baf3fc00 > [0.733863] FS: () GS:8800bf80() > knlGS: > [0.735106] CS: 0010 DS: ES: CR0: 80050033 > [0.735964] CR2: 0018 CR3: 3b219000 CR4: > 06f0 > [0.737021] Stack: > [0.737342] 8800bbf33998 0246 8800bbaf3b18 > 8800bbaf3b00 > [0.738602] 1001 8800bbaf3b00 > 8800baf3f800 > [0.739807] 00031000 8800bbaf3b18 8800bbf339e8 > 8800baf3fc00 > [0.740993] Call Trace: > [0.741393] [] drm_setup_crtcs+0x3bd/0xb50 > [0.742252] [] ? mark_held_locks+0x71/0x90 > [0.743176] [] ? __mutex_unlock_slowpath+0xeb/0x1c0 > [0.744138] [] drm_fb_helper_initial_config+0x81/0x3c0 > [0.745166] [] ? drm_modeset_unlock_all+0x54/0x80 > [0.746136] [] cirrus_fbdev_init+0xa0/0xb0 > [0.747033] [] cirrus_modeset_init+0x18b/0x1e0 > .. snip snip .. > [0.771023] Code: e8 ba e1 ff ff 48 8b 55 b8 48 83 f8 01 83 5d c4 ff 48 8b > 7d b8 48 8b 82 98 02 00 00 49 8b 57 08 48 8b 40 10 48 8b 92 00 0a 00 00 <48> > 83 7a 18 00 74 09 48 85 c0 0f 84 53 01 00 00 ff d0 49 89 c3 This decodes to 0: e8 ba e1 ff ff callq ... 5: 48 8b 55 b8 mov-0x48(%rbp),%rdx 9: 48 83 f8 01 cmp$0x1,%rax d: 83 5d c4 ff sbbl $0x,-0x3c(%rbp) 11: 48 8b 7d b8 mov-0x48(%rbp),%rdi 15: 48 8b 82 98 02 00 00 mov0x298(%rdx),%rax 1c: 49 8b 57 08 mov0x8(%r15),%rdx 20: 48 8b 40 10 mov0x10(%rax),%rax 24: 48 8b 92 00 0a 00 00 mov0xa00(%rdx),%rdx 2b:* 48 83 7a 18 00 cmpq $0x0,0x18(%rdx) <-- trapping instruction 30: 74 09 je 0x3b 32: 48 85 c0 test %rax,%rax 35: 0f 84 53 01 00 00 je 0x18e 3b: ff d0 callq *%rax 3d: 49 89 c3 mov%rax,%r11 and trying to find matching code I think the oops is from this: if (fb_helper->dev->mode_config.funcs->atomic_commit && !connector_funcs->best_encoder) encoder = drm_atomic_helper_best_encoder(connector); else encoder = connector_funcs->best_encoder(connector); in particular, the trapping instruction is the load of "atomic_commit". (The "callq *%rax" seems to be the "connector_funcs->best_encoder()" call) So I *think* we have "fb_helper->dev->mode_config.funcs" being NULL. But I might have gotten the code matching wrong. I don't have the same compiler as Eric, so even when using his config my code generation doesn't really match. But there is only one indirect call in that function, which is that ->best_encoder() call, so it does seem to match up. Adding Boris Brezillon and Daniel Vetter to the cc, since assuming I'm right, the main suspect is commit c61b93fe51b1 ("drm/atomic: Fix remaining places where !funcs->best_encoder is valid"). Boris? Daniel? Linus
Re: [git pull] drm for 5.2-rc1
Dave, On Wed, May 8, 2019 at 8:28 PM Dave Airlie wrote: > > This is the main drm pull request for 5.2. Thanks. I've merged it, but I got a couple of conflicts with fixes (reverts) to mainline in the meantime. The one to the i915 driver you seem to have applied again (after the function was moved and renamed). The one to the virtgpu driver, I really don't know if is needed any more. I suspect I completely unnecessarily merged that virtgpu_gem_prime_import_sg_table() function that came in because I decided to do the merge of the revert. It's a trivial function that just returns an error, and your tree just leaves it as NULL, and I suspect my merge doesn't hurt, but it also probably doesn't matter. So you should check my merge. Thanks, Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC][PATCH] kernel.h: Add generic roundup_64() macro
On Thu, May 23, 2019 at 7:00 AM Steven Rostedt wrote: > > +# define roundup_64(x, y) (\ > +{ \ > + typeof(y) __y = y; \ > + typeof(x) __x = (x) + (__y - 1);\ > + do_div(__x, __y); \ > + __x * __y; \ > +} \ The thing about this is that it absolutely sucks for power-of-two arguments. The regular roundup() that uses division has the compiler at least optimize them to shifts - at least for constant cases. But do_div() is meant for "we already know it's not a power of two", and the compiler doesn't have any understanding of the internals. And it looks to me like the use case you want this for is very much probably a power of two. In which case division is all kinds of just stupid. And we already have a power-of-two round up function that works on u64. It's called "round_up()". I wish we had a better visual warning about the differences between "round_up()" (limited to powers-of-two, but efficient, and works with any size) and "roundup()" (generic, potentially horribly slow, and doesn't work for 64-bit on 32-bit). Side note: "round_up()" has the problem that it uses "x" twice. End result: somebody should look at this, but I really don't like the "force division" case that is likely horribly slow and nasty. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC][PATCH] kernel.h: Add generic roundup_64() macro
On Thu, May 23, 2019 at 8:27 AM Steven Rostedt wrote: > > I haven't yet tested this, but what about something like the following: So that at least handles the constant case that the normal "roundup()" case also handles. At the same time, in the case you are talking about, I really do suspect that we have a (non-constant) power of two, and that you should have just used "round_up()" which works fine regardless of size, and is always efficient. On a slight tangent.. Maybe we should have something like this: #define size_fn(x, prefix, ...) ({ \ typeof(x) __ret;\ switch (sizeof(x)) {\ case 1: __ret = prefix##8(__VA_ARGS__); break; \ case 2: __ret = prefix##16(__VA_ARGS__); break; \ case 4: __ret = prefix##32(__VA_ARGS__); break; \ case 8: __ret = prefix##64(__VA_ARGS__); break; \ default: __ret = prefix##bad(__VA_ARGS__); \ } __ret; }) #define type_fn(x, prefix, ...) ({ \ typeof(x) __ret;\ if ((typeof(x))-1 > 1) \ __ret = size_fn(x, prefix##_u, __VA_ARGS__);\ else\ __ret = size_fn(x, prefix##_s, __VA_ARGS__);\ __ret; }) which would allow typed integer functions like this. So you could do something like #define round_up(x, y) size_fn(x, round_up_size, x, y) and then you define functions for round_up_size8/16/32/64 (and you have toi declare - but not define - round_up_sizebad()). Of course, you probably want the usual "at least use 'int'" semantics, in which case the "type" should be "(x)+0": #define round_up(x, y) size_fn((x)+0, round_up_size, x, y) and the 8-bit and 16-bit cases will never be used. We have a lot of cases where we end up using "type overloading" by size. The most explicit case is perhaps "get_user()" and "put_user()", but this whole round_up thing is another example. Maybe we never really care about "char" and "short", and always want just the "int-vs-long-vs-longlong"? That would make the cases simpler (32 and 64). And maybe we never care about sign. But we could try to have some unified helper model like the above.. Linus
Re: [RFC][PATCH] kernel.h: Add generic roundup_64() macro
On Thu, May 23, 2019 at 10:36 AM Steven Rostedt wrote: > > > > > Of course, you probably want the usual "at least use 'int'" semantics, > > in which case the "type" should be "(x)+0": > > > > #define round_up(x, y) size_fn((x)+0, round_up_size, x, y) > > > > and the 8-bit and 16-bit cases will never be used. > > I'm curious to what the advantage of that is? Let's say that you have a structure with a 'unsigned char' member, because the value range is 0-255. What happens if you do x = round_up(p->member, 4); and the value is 255? Now, if you stay in 'unsigned char' the end result is 0. If you follow the usual C integer promotion rules ("all arithmetic promotes to at least 'int'"), you get 256. Most people probably expect 256, and that implies that even if you pass an 'unsigned char' to an arithmetic function like this, you expect any math to be done in 'int'. Doing the "(x)+0" forces that, because the "+0" changes the type of the expression from "unsigned char" to "int" due to C integer promotion. Yes. The C integer type rules are subtle and sometimes surprising. One of the things I've wanted is to have some way to limit silent promotion (and silent truncation!), and cause warnings. 'sparse' does some of that with some special-case types (ie __bitwise), but it's pretty limited. Linus
Re: [git pull] drm fixes for 5.0-rc3
On Fri, Jan 18, 2019 at 12:43 PM Dave Airlie wrote: > > Going to be at LCA next week in Christchurch, but should be fine for > normal pulls. .. hey, me too. > git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2019-01-18 Pulled, Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm fixes for v5.2-rc4 (v2)
On Fri, Jun 7, 2019 at 12:24 AM Dave Airlie wrote: > > I sent this out earlier, but I forgot the subject, and then Ben asked > about some nouveau firmware fixes. Well, the first one at least had the address to pull from, and the diffstat. The second one has the subject, and mentions nouveau, but doesn't actually have the tag name or the expected diffstat and shortlog. Third time is the charm? Linus
Re: [git pull] drm fixes for v5.2-rc4 (v2)
On Fri, Jun 7, 2019 at 10:20 AM Linus Torvalds wrote: > > The second one has the subject, and mentions nouveau, but doesn't > actually have the tag name or the expected diffstat and shortlog. Hmm. I'm guessing you meant for me to pull the 'tags/drm-fixes-2019-06-07-1' thing, which looks likely, but I'd like to have confirmation. Linus
Re: [git pull] drm fixes for 5.2-rc6
On Thu, Jun 20, 2019 at 9:21 PM Dave Airlie wrote: > > Just catching up on the week since back from holidays, everything > seems quite sane. .. well, except for anongit.freedesktop.org itself, which seems very sick indeed. Does it work for you? I'm getting a connection reset, no data. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm fixes for 5.2-rc6
On Fri, Jun 21, 2019 at 10:06 AM Daniel Stone wrote: > > > Does it work for you? I'm getting a connection reset, no data. > > It is quite sick indeed; it's fallen down an NFS hole and is being > restarted. Should be back in a few minutes. Thanks, everything does indeed look good again now, Linus
Re: [GIT PULL] Please pull hmm changes
On Tue, Jul 9, 2019 at 12:24 PM Jason Gunthorpe wrote: > > I'm sending it early as it is now a dependency for several patches in > mm's quilt. .. but I waited to merge it until I had time to review it more closely, because I expected the review to be painful. I'm happy to say that I was overly pessimistic, and that instead of finding things to hate, I found it all looking good. Particularly the whole "use reference counts properly, so that lifetimes make sense and all those nasty cases can't happen" parts. It's all merged, just waiting for the test-build to verify that I didn't miss anything (well, at least nothing obvious). Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm pull for v5.3-rc1
On Mon, Jul 15, 2019 at 12:08 AM Dave Airlie wrote: > > VMware had some mm helpers go in via my tree (looking back I'm not > sure Thomas really secured enough acks on these, but I'm going with it > for now until I get push back). Yeah, this is the kind of completely unacceptable stuff that I was _afraid_ I'd get from the hmm tree, but didn't. It's not just "mm helpers". It's core changes like changing how do_page_mkwrite() works. With not a single ack or review from any of the VM people. Maybe that commit is fine. But there's a whole slew of questionable core VM changes there, and absolutely none of them look obvious, and none of them hack acks from any of the VM people. The hmm tree looked like good cleanups that largely removed broken code. This looks like it *adds* broken code, or at least adds code that had absolutely no real review outside of vmware. I'm not pulling this. Why did you merge it into your tree, when apparently you were aware of how questionable it is judging by the drm pull request. Linus