/cc Bagas as well (see below).

On Tuesday, 19 September 2023 10:26:42 CEST Oleksandr Natalenko wrote:
> /cc Matthew Wilcox and Andrew Morton because of folios (please see below).
> 
> On Saturday, 2 September 2023 18:14:12 CEST Oleksandr Natalenko wrote:
> > Hello.
> > 
> > Since the v6.5 kernel, the following hardware:
> > 
> > * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
> > * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> > 
> > is affected by the following crash once KDE on either X11 or Wayland is 
> > started:
> > 
> > i915 0000:00:02.0: enabling device (0006 -> 0007)
> > i915 0000:00:02.0: vgaarb: deactivate vga console
> > i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
> > i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
> > [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
> > fbcon: i915drmfb (fb0) is primary device
> > i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
> > …
> > memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 'kwin_wayland'
> > BUG: unable to handle page fault for address: ffffb422c2800000
> > #PF: supervisor write access in kernel mode
> > #PF: error_code(0x0002) - not-present page
> > PGD 100000067 P4D 100000067 PUD 1001df067 PMD 10d1cf067 PTE 0
> > Oops: 0002 [#1] PREEMPT SMP PTI
> > CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
> > Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 11/15/2022
> > RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
> > …
> > Call Trace:
> >  <TASK>
> >  intel_ggtt_bind_vma+0x3e/0x60 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_vma_pin_ww+0x405/0xa80 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  __i915_ggtt_pin+0x5a/0x130 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  __intel_context_do_pin_ww+0x312/0x700 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_gem_do_execbuffer+0xfc6/0x2720 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 a83fdc6539431252dba13053979a8b680af86836]
> >  drm_ioctl_kernel+0xca/0x170
> >  drm_ioctl+0x30f/0x580
> >  __x64_sys_ioctl+0x94/0xd0
> >  do_syscall_64+0x5d/0x90
> >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > …
> > note: kwin_wayland[674] exited with irqs disabled
> > 
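As a reading aid for the "#PF: error_code(0x0002)" line above, here is a minimal sketch that decodes the low bits of an x86 page-fault error code. It assumes the architectural bit layout (bit 0 = protection violation vs. not-present page, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit set, bit 4 = instruction fetch). The function name decode_pf_error_code and the constants are local to this sketch and are not a kernel API.

```python
# Sketch only: decode the low bits of an x86 page-fault error code.
# The constant names are local to this example (hypothetical helper names).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access,      1: write access
PF_USER = 1 << 2   # 0: supervisor mode,  1: user mode
PF_RSVD = 1 << 3   # 1: a reserved bit was set in a paging structure
PF_INSTR = 1 << 4  # 1: the fault was an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Return a human-readable breakdown of a page-fault error code."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit": bool(code & PF_RSVD),
    }


if __name__ == "__main__":
    # 0x0002 is the value reported in the oops above.
    print(decode_pf_error_code(0x0002))
```

For 0x0002 this yields a supervisor-mode write to a not-present page, which agrees with the "#PF:" lines in the oops: the write to the GGTT PTE faulted because the target address had no mapping (PTE 0).
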
> > RIP seems to translate into this:
> > 
> > $ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o gen8_ggtt_insert_entries+0xc2
> > gen8_ggtt_insert_entries+0xc2/0x150:
> > writeq at /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
> > (inlined by) gen8_set_pte at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
> > (inlined by) gen8_ggtt_insert_entries at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300
> > 
> > Probably, recent PTE-related changes are relevant:
> > 
> > $ git log --oneline --no-merges v6.4..v6.5 -- drivers/gpu/drm/i915/gt/intel_ggtt.c
> > 3532e75dfadcf drm/i915/uc: perma-pin firmwares
> > 4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode callbacks
> > 9275277d53248 drm/i915: use pat_index instead of cache_level
> > 5e352e32aec23 drm/i915: preparation for using PAT index
> > 341ad0e8e2542 drm/i915/mtl: Add PTE encode function
> > 
> > Also note that the Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe
> > Graphics] (rev 01) is not affected by this issue.
> > 
> > Full dmesg with DRM debug enabled is available in the bug report I filed
> > earlier [1]. I'm sending this email to make the issue more visible.
> > 
> > Please help.
> > 
> > Thanks.
> > 
> > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> 
> Matthew,
> 
> Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 
> 1e0877d58b1e, and reverting those fixed the i915 crash for me. The 
> e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I 
> assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a 
> folio_batch") is the culprit here.
> 
> Could you please check this?
> 
> Our conversation with Andrzej is available on the drm-intel GitLab [1].
> 
> Thanks.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

Bagas,

would you mind adding this to the regression tracker please?

Thanks.

-- 
Oleksandr Natalenko (post-factum)
