Am 16.04.19 um 13:20 schrieb Karol Herbst:
On Tue, Apr 16, 2019 at 1:07 PM Koenig, Christian
<christian.koe...@amd.com> wrote:
Am 16.04.19 um 12:54 schrieb Karol Herbst:
On Tue, Apr 16, 2019 at 11:12 AM Koenig, Christian
<christian.koe...@amd.com> wrote:
Am 16.04.19 um 11:10 schrieb Karol Herbst:
On Tue, Apr 16, 2019 at 8:38 AM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
Am 16.04.19 um 02:35 schrieb Karol Herbst:
Kobjects are supposed to be dynamically allocated, but with recent changes
this rule was violated. Reverting those commits fixes crashes when a drm
driver using TTM gets loaded again.
The object in question is "ttm_mem_glob" declared inside
"include/drm/ttm/ttm_memory.h" and instatiated inside
"drivers/gpu/drm/ttm/ttm_memory.c".
from "Documentation/kobject.txt":
"Because kobjects are dynamic, they must not be declared statically or on
the stack, but instead, always allocated dynamically. Future versions of
the kernel will contain a run-time check for kobjects that are created
statically and will warn the developer of this improper usage."
Unloading ttm before reloading the driver workarounds that crash, because
the memory backing the kobject member "kobj" is cleaned up. The kobject_del
and kobject_put function never free or clean up the kobject object leaving
it in an undefined state.
I reverted a few more commits to make it less painful for me to rever this
rather big change.
Well, NAK. By reverting those change you also re-introduced the problems
we originally fixed with those patches.
Please work on a proper fix instead,
Christian.
And which problem was that besides duplicated code (or maybe even a
bit of memory consumption if multiple ttm driver were used)? If I had
to choose between duplicated code and a crash, I choose the former.
Maybe I missed the real reason why those changes are made, but the
commit messages don't really seem to tell me.
The old implementation crashed because different drivers tried to
allocate the same kobj.
Crashing in one way is not better than crashing in another way.
Christian.
how can that old crash be triggered? By loading two ttm based drivers
at the same time or by other means?
Yes, exactly. Even worse it could be triggered by one driver
instantiating multiple times at the same time, e.g two AMD GPUs in the
same system.
On the other hand I completely agree that using kobj static is
completely nuts. The problem is that using a kobj was the wrong approach
in the first place.
In other words when you have something which controls a global behavior
of a module, what do you use? Right, a module parameter.
Point is that we can't get away from those kobj without breaking the
uapi, so that is something which can't be done easily.
Regards,
Christian.
okay, thanks for the explanation!
Currently I will be testing your first patch you sent out on top of
5.0.7, so that we could propose that for stable. The second didn't
apply, but it didn't look relevant for the actual fix. Will reply to
the patch with my results then.
The second is just a further clean, please ignore that one for backporting.
Let me know if the first one works on 5.0.7, if yes I'm going to add a
CC stable tag before pushing.
Thanks,
Christian.
Thanks
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel