Re: [Intel-gfx] LOOKS GOOD: Possible regression in drm/i915 driver: memleak

2022-12-21 Thread Mirsad Todorovac

On 12/20/22 16:52, Tvrtko Ursulin wrote:


On 20/12/2022 15:22, srinivas pandruvada wrote:

+Added DRM mailing list and maintainers

On Tue, 2022-12-20 at 15:33 +0100, Mirsad Todorovac wrote:

Hi all,

I have been unsuccessful to find any particular Intel i915 maintainer
emails, so my best bet is to post here, as you will must assuredly
already know them.


For future reference you can use ${kernel_dir}/scripts/get_maintainer.pl 
-f ...



The problem is a kernel memory leak that is repeatedly occurring
triggered during the execution of Chrome browser under the latest
6.1.0+
kernel of this morning and Almalinux 8.6 on a Lenovo desktop box
with Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz CPU.

The build is with KMEMLEAK, KASAN and MGLRU turned on during the
build,
on a vanilla mainline kernel from Mr. Torvalds' tree.

The leaks look like this one:

unreferenced object 0x888131754880 (size 64):
    comm "chrome", pid 13058, jiffies 4298568878 (age 3708.084s)
    hex dump (first 32 bytes):
  01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

  00 00 00 00 00 00 00 00 00 80 1e 3e 83 88 ff ff
...>
    backtrace:
  [] slab_post_alloc_hook+0xb2/0x340
  [] __kmem_cache_alloc_node+0x1bf/0x2c0
  [] kmalloc_trace+0x2a/0xb0
  [] drm_vma_node_allow+0x45/0x150 [drm]
  [] __assign_mmap_offset_handle+0x615/0x820
[i915]
  [] i915_gem_mmap_offset_ioctl+0x77/0x110
[i915]
  [] drm_ioctl_kernel+0x181/0x280 [drm]
  [] drm_ioctl+0x2dd/0x6a0 [drm]
  [] __x64_sys_ioctl+0xc4/0x100
  [] do_syscall_64+0x58/0x80
  [] entry_SYSCALL_64_after_hwframe+0x72/0xdc

The complete list of leaks in attachment, but they seem similar or
the same.

Please find attached lshw and kernel build config file.

I will probably check the same parms on my laptop at home, which is
also
Lenovo, but a different hw config and Ubuntu 22.10.


Could you try the below patch?

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c

index c3ea243d414d..0b07534c203a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -679,9 +679,10 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
  insert:
     mmo = insert_mmo(obj, mmo);
     GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
-out:
+
     if (file)
     drm_vma_node_allow(&mmo->vma_node, file);
+out:
     return mmo;

  err:

Maybe it is not the best fix but curious to know if it will make the 
leak go away.


Hi,

After 27 minutes uptime with the patched kernel it looks promising.
It is much longer than it took for the buggy kernel to leak slabs.

Here is the output:

[root@pc-mtodorov marvin]# echo scan > /sys/kernel/debug/kmemleak
[root@pc-mtodorov marvin]# cat !$
cat /sys/kernel/debug/kmemleak
unreferenced object 0x888105028d80 (size 16):
  comm "kworker/u12:5", pid 359, jiffies 4294902898 (age 1620.144s)
  hex dump (first 16 bytes):
6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00  memstick0...
  backtrace:
[] slab_post_alloc_hook+0xb2/0x340
[] __kmem_cache_alloc_node+0x1bf/0x2c0
[] __kmalloc_node_track_caller+0x55/0x160
[] kstrdup+0x36/0x60
[] kstrdup_const+0x28/0x30
[] kvasprintf_const+0x97/0xd0
[] kobject_set_name_vargs+0x34/0xc0
[] dev_set_name+0x9b/0xd0
[] memstick_check+0x181/0x639 [memstick]
[] process_one_work+0x4e6/0x7e0
[] worker_thread+0x76/0x770
[] kthread+0x168/0x1a0
[] ret_from_fork+0x29/0x50
[root@pc-mtodorov marvin]# w
 20:27:35 up 27 min,  2 users,  load average: 0.83, 1.15, 1.19
USER TTY  FROM LOGIN@   IDLE   JCPU   PCPU WHAT
marvin   tty2 tty2 20:01   27:10  10:12   2.09s 
/opt/google/chrome/chrome --type=utility --utility-sub-type=audio.m

marvin   pts/1-20:010.00s  2:00   0.38s sudo bash
[root@pc-mtodorov marvin]# uname -rms
Linux 6.1.0-b6bb9676f216-mglru-kmemlk-kasan+ x86_64
[root@pc-mtodorov marvin]#

2. On the Ubuntu 22.10 with Debian build I did not reproduce the error 
thus far.


This looks to me like fixed, but if it doesn't leak anything until 
Thursday morning when I will see this desktop box next time, then we'll 
know with more certainty.


Hope this helps. (My $0.02 .)

Kudos for the quick fix :)

Kind regards,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia


Re: [Intel-gfx] LOOKS GOOD: Possible regression in drm/i915 driver: memleak

2023-01-15 Thread Mirsad Todorovac




On 1/9/23 16:00, Tvrtko Ursulin wrote:


On 25/12/2022 22:48, Mirsad Goran Todorovac wrote:

On 22. 12. 2022. 09:04, Tvrtko Ursulin wrote:


In the meantime definitely thanks a lot for testing this quickly and reporting 
back!


Don't think much of it - anyone with CONFIG_KMEMLEAK enabled could have caught 
this bug.

I was surprised that you found the fix in less than an hour without me having 
to bisect :)


Fix sadly has a problem handling shared buffers so different version will 
hopefully appear soon.


Bummer.

Ready to test the new version in the same environment.

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia


Re: [Intel-gfx] [PATCH 2/2] drm/i915: Fix a memory leak with reused mmap_offset

2023-01-18 Thread Mirsad Todorovac

Hi,

On 1/18/23 10:19, Tvrtko Ursulin wrote:


Thanks for working on this, it looks good to me and it aligns with how i915 
uses the facility.

Copying Mirsad who reported the issue in case he is still happy to give it a quick test. Mirsad, I don't know if you are subscribed 
to one of the two mailing lists where series was posted. In case not, you can grab both patches from 
https://patchwork.freedesktop.org/series/112952/.


Nirmoy - we also have an IGT written by Chuansheng - https://patchwork.freedesktop.org/patch/515720/?series=101035&rev=4. A more 
generic one could be placed in gem_mmap_offset test but this one works too in my testing and is IMO better than nothing.


Finally, let me add some tags below:

On 17/01/2023 17:52, Nirmoy Das wrote:

drm_vma_node_allow() and drm_vma_node_revoke() should be called in
balanced pairs. We call drm_vma_node_allow() once per-file everytime a
user calls mmap_offset, but only call drm_vma_node_revoke once per-file
on each mmap_offset. As the mmap_offset is reused by the client, the
per-file vm_count may remain non-zero and the rbtree leaked.

Call drm_vma_node_allow_once() instead to prevent that memory leak.

Cc: Tvrtko Ursulin 
Cc: Andi Shyti 


Fixes: 786555987207 ("drm/i915/gem: Store mmap_offsets in an rbtree rather than a 
plain list")
Reported-by: Chuansheng Liu 
Reported-by: Mirsad Todorovac 
Cc:  # v5.7+
Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko



Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 4f69bff63068..2aac6bf78740 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -697,7 +697,7 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
  GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
  out:
  if (file)
-    drm_vma_node_allow(&mmo->vma_node, file);
+    drm_vma_node_allow_once(&mmo->vma_node, file);
  return mmo;
  err:


The drm/i915 patch seems OK and there are currently no memory leaks as of
reported by /sys/kernel/debug/kmemleak under the same Chrome load that triggered
the initial bug ...

Will post you if there are any changes.

Regards,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia


Re: [Intel-gfx] [PATCH 2/2] drm/i915: Fix a memory leak with reused mmap_offset

2023-01-18 Thread Mirsad Todorovac

Hi,

On 1/18/23 11:39, Das, Nirmoy wrote:


On 1/18/2023 11:26 AM, Mirsad Todorovac wrote:

Hi,

On 1/18/23 10:19, Tvrtko Ursulin wrote:


Thanks for working on this, it looks good to me and it aligns with how i915 
uses the facility.

Copying Mirsad who reported the issue in case he is still happy to give it a quick test. Mirsad, I don't know if you are 
subscribed to one of the two mailing lists where series was posted. In case not, you can grab both patches from 
https://patchwork.freedesktop.org/series/112952/.


Nirmoy - we also have an IGT written by Chuansheng - https://patchwork.freedesktop.org/patch/515720/?series=101035&rev=4. A more 
generic one could be placed in gem_mmap_offset test but this one works too in my testing and is IMO better than nothing.


Finally, let me add some tags below:

On 17/01/2023 17:52, Nirmoy Das wrote:

drm_vma_node_allow() and drm_vma_node_revoke() should be called in
balanced pairs. We call drm_vma_node_allow() once per-file everytime a
user calls mmap_offset, but only call drm_vma_node_revoke once per-file
on each mmap_offset. As the mmap_offset is reused by the client, the
per-file vm_count may remain non-zero and the rbtree leaked.

Call drm_vma_node_allow_once() instead to prevent that memory leak.

Cc: Tvrtko Ursulin 
Cc: Andi Shyti 


Fixes: 786555987207 ("drm/i915/gem: Store mmap_offsets in an rbtree rather than a 
plain list")
Reported-by: Chuansheng Liu 
Reported-by: Mirsad Todorovac 
Cc:  # v5.7+
Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko



Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 4f69bff63068..2aac6bf78740 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -697,7 +697,7 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
  GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
  out:
  if (file)
-    drm_vma_node_allow(&mmo->vma_node, file);
+    drm_vma_node_allow_once(&mmo->vma_node, file);
  return mmo;
  err:


The drm/i915 patch seems OK and there are currently no memory leaks as of
reported by /sys/kernel/debug/kmemleak under the same Chrome load that triggered
the initial bug ...



Thanks, Mirsad for quickly checking this!


There was no problem, Nirmoy, everything applied neatly :)

Regards,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia