Hi,

I have been investigating the crash-on-exit problem with mpv in ports
with vo=gpu. I think I have made a little progress and thought I'd
share my findings.

The crash (SIGSEGV) happens when thread-local destructors are called
from _rthread_tls_destructors (/usr/src/lib/libc/thread/rthread_tls.c:182)
after the gpu thread, vo_thread in video/out/vo.c:1067, exits. The
crashing call stack looks like this:

#0  0x00000176ffdc9680 in ?? ()
#1  0x0000017748d347b5 in _rthread_tls_destructors (thread=0x17798917840) at /usr/src/lib/libc/thread/rthread_tls.c:182
#2  0x0000017748d98623 in _libc_pthread_exit (retval=<error reading variable: Unhandled dwarf expression opcode 0xa3>) at /usr/src/lib/libc/thread/rthread.c:150
#3  0x0000017795b22189 in _rthread_start (v=<error reading variable: Unhandled dwarf expression opcode 0xa3>) at /usr/src/lib/librthread/rthread.c:97
#4  0x0000017748d0c5ba in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84

Note that some of the traces were taken from different runs, so the
handles/addresses may not match between them.

It crashes because the destructor pointer is dangling. This mystified me
because, looking at the mpv source, there is no thread-local data for the
gpu thread. Indeed, right after the gpu thread starts running,
local_storage in its thread structure is null. However, if we look
at the same thread at the point of the crash, its local_storage is
populated:

(gdb) p *(*thread).local_storage
$3 = {
  keyid = 7,
  next = 0x177353442e0,
  data = 0x1770c276000
}

The keyid indexes into the rkeys array, from which
_rthread_tls_destructors fetches the destructor:

(gdb) p rkeys[7]
$6 = {
  used = 1,
  destructor = 0x43dd33e2680
}
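
To make the lookup concrete, here is a simplified model of that walk. It
is a sketch, not the libc source: the struct fields mirror the gdb output
above, but MODEL_KEYS_MAX, model_run_destructors and
model_count_destructor are names made up for illustration, and the model
does a single pass over the list.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, NOT the actual libc code. Field names follow the
 * gdb output; everything prefixed model_/MODEL_ is hypothetical. */
struct rthread_key {
    int used;
    void (*destructor)(void *);
};

struct rthread_storage {
    int keyid;
    struct rthread_storage *next;
    void *data;
};

#define MODEL_KEYS_MAX 16 /* assumed table size, for the model only */
static struct rthread_key rkeys[MODEL_KEYS_MAX];

/* Walk one thread's storage list and call the destructor registered
 * for each key that still has data. Returns the number of calls made. */
static int model_run_destructors(struct rthread_storage *head)
{
    int called = 0;

    for (struct rthread_storage *rs = head; rs != NULL; rs = rs->next) {
        assert(rs->keyid >= 0 && rs->keyid < MODEL_KEYS_MAX);
        struct rthread_key *k = &rkeys[rs->keyid];

        if (!k->used || k->destructor == NULL || rs->data == NULL)
            continue;

        void *data = rs->data;
        rs->data = NULL;
        /* In the crash, this pointer targets code that was unmapped. */
        k->destructor(data);
        called++;
    }
    return called;
}

/* Trivial destructor used to observe the calls. */
static int model_freed;

static void model_count_destructor(void *data)
{
    (void)data;
    model_freed++;
}
```

The point of the model is that the table stores only a raw function
pointer, so nothing in this walk can tell whether the code behind it is
still mapped.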

This destructor now points to invalid memory. It turns out the
thread-local storage is being initialised here:

#0  _libc_pthread_key_create (key=0x43dd380da08, destructor=0x43dd33e2680) at /usr/src/lib/libc/thread/rthread_tls.c:42
#1  0x0000043dd33e2667 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#2  0x0000043e793f82f7 in pthread_once (once_control=0x43dd380d9f8, init_routine=0x43e793db3c0 <_libc_pthread_key_create>) at /usr/src/lib/libc/thread/rthread_once.c:26
#3  0x0000043dd33e24bd in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#4  0x0000043dd305475f in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#5  0x0000043dd3036c70 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#6  0x0000043dd30e3ca3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#7  0x0000043dd30e4b96 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#8  0x0000043dd302d89a in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#9  0x0000043dd3031162 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#10 0x0000043dd3031ec6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#11 0x0000043dd30f7a8b in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#12 0x0000043dd311a94e in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#13 0x0000043dd311addf in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#14 0x0000043dd33ae4a6 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#15 0x0000043dd33ad6a2 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#16 0x0000043dd276e1d3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
#17 0x0000043b8f0816c6 in gl_clear (ra=0x43e50a61a50, dst=0x43e399aa950, color=0x43e0e382370, scissor=0x43e0e382390) at ../mpv-0.33.1/video/out/opengl/ra_gl.c:684
#18 0x0000043b8f061db8 in gl_video_render_frame (p=0x43db938c050, frame=0x43e399bb350, fbo=..., flags=3) at ../mpv-0.33.1/video/out/gpu/video.c:3251
#19 0x0000043b8f089a8f in draw_frame (vo=0x43e32b9f450, frame=0x43e399bb350) at ../mpv-0.33.1/video/out/vo_gpu.c:87
#20 0x0000043b8f0882a4 in render_frame (vo=0x43e32b9f450) at ../mpv-0.33.1/video/out/vo.c:957
#21 0x0000043b8f087735 in vo_thread (ptr=0x43e32b9f450) at ../mpv-0.33.1/video/out/vo.c:1095
#22 0x0000043da1682181 in _rthread_start (v=<error reading variable: Unhandled dwarf expression opcode 0xa3>) at /usr/src/lib/librthread/rthread.c:96
#23 0x0000043e793b35ba in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84

So the mpv code is not directly aware of the TLS; it is allocated
inside iris_dri.so. This trace was taken at the start of video
playback. At that point iris_dri.so has been loaded (using dlopen, see
the next trace) and the destructor is valid.

#0  dlopen (libname=0xbbddfb93320 "/usr/X11R6/lib/modules/dri/iris_dri.so", flags=258) at /usr/src/libexec/ld.so/dlfcn.c:51
#1  0x00000bbd76e126e6 in loader_open_driver (driver_name=0xbbd52b277e0 "iris", out_driver_handle=0xbbd417eda28, search_path_vars=<optimized out>) at /usr/xenocara/lib/mesa/mk/libloader/../../src/loader/loader.c:579
#2  0x00000bbd76e0a7a8 in dri2_open_driver (disp=<optimized out>) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:771
#3  dri2_load_driver_common (disp=<optimized out>, driver_extensions=<optimized out>) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:783
#4  dri2_load_driver_dri3 (disp=<error reading variable: Unhandled dwarf expression opcode 0xa3>) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:808
#5  0x00000bbd76e0331c in dri2_initialize_x11_dri3 (drv=<optimized out>, disp=0xbbd4180c000) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1393
#6  dri2_initialize_x11 (drv=<error reading variable: Unhandled dwarf expression opcode 0xa3>, disp=0xbbd4180c000) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/platform_x11.c:1554
#7  0x00000bbd76e0c352 in dri2_initialize (drv=0xbbd417fd200, disp=0xbbd4180c000) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1143
#8  0x00000bbd76e0649e in _eglMatchAndInitialize (disp=0xbbd4180c000) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:75
#9  _eglMatchDriver (disp=0xbbd4180c000) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/egldriver.c:98
#10 0x00000bbd76dfa1c1 in eglInitialize (dpy=<optimized out>, major=0x0, minor=0x0) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:617
#11 0x00000bbb39afbca0 in mpegl_init (ctx=0xbbd41809350) at ../mpv-0.33.1/video/out/opengl/context_x11egl.c:109
#12 0x00000bbb39ad0c96 in ra_ctx_create (vo=0xbbd5b2df050, context_type=0x0, context_name=0x0, opts=...) at ../mpv-0.33.1/video/out/gpu/context.c:185
#13 0x00000bbb39b083a7 in preinit (vo=0xbbd5b2df050) at ../mpv-0.33.1/video/out/vo_gpu.c:298
#14 0x00000bbb39b06679 in vo_thread (ptr=0xbbd5b2df050) at ../mpv-0.33.1/video/out/vo.c:1080
#15 0x00000bbe30c09181 in _rthread_start (v=<error reading variable: Unhandled dwarf expression opcode 0xa3>) at /usr/src/lib/librthread/rthread.c:96
#16 0x00000bbdb38aa5ba in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84

However, when mpv is shutting down, iris_dri.so is
unloaded here:

#0  dlclose (handle=0xb79c1fa5800) at /usr/src/libexec/ld.so/dlfcn.c:274
#1  0x00000b7a51b541e0 in dri2_display_destroy (disp=0xb7969481000) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1204
#2  0x00000b7a51b55407 in dri2_display_release (disp=0xb7969481000) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1188
#3  dri2_terminate (drv=<error reading variable: Unhandled dwarf expression opcode 0xa3>, disp=0xb7969481000) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/drivers/dri2/egl_dri2.c:1285
#4  0x00000b7a51b43db7 in eglTerminate (dpy=<optimized out>) at /usr/xenocara/lib/mesa/mk/libEGL/../../src/egl/main/eglapi.c:675
#5  0x00000b775b50a174 in mpegl_uninit (ctx=0xb796948f550) at ../mpv-0.33.1/video/out/opengl/context_x11egl.c:51
#6  0x00000b775b4dedc8 in ra_ctx_destroy (ctx_ptr=0xb796faf1158) at ../mpv-0.33.1/video/out/gpu/context.c:211
#7  0x00000b775b516d8d in uninit (vo=0xb796fabb650) at ../mpv-0.33.1/video/out/vo_gpu.c:286
#8  0x00000b775b514994 in vo_thread (ptr=0xb796fabb650) at ../mpv-0.33.1/video/out/vo.c:1136
#9  0x00000b796775c181 in _rthread_start (v=<error reading variable: Unhandled dwarf expression opcode 0xa3>) at /usr/src/lib/librthread/rthread.c:96
#10 0x00000b7a3e8ef5ba in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84

So iris_dri.so is unloaded from inside vo_thread, before
_rthread_tls_destructors gets to the destructor. This is the cause of
the crash. As a quick and dirty test I ran:
LD_PRELOAD=/usr/X11R6/lib/modules/dri/iris_dri.so ./mpv -v file.mp4
and indeed mpv no longer crashes on exit (vo=gpu is used by default),
presumably because the preloaded iris_dri.so stays mapped after the
dlclose, so the destructor address remains valid.
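
To illustrate the ordering, here is a small stand-alone sketch that uses
only POSIX pthreads calls. It does not load or unload any library, and
all demo_* names are made up for illustration. It shows that a TLS
destructor has not yet run when the thread's start routine is about to
return (the place where mpv's uninit ends up in dlclose), and has run by
the time pthread_join returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* All demo_* names are hypothetical; this is not mpv or Mesa code. */
static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static int demo_destructor_calls;
static int demo_calls_seen_in_thread = -1;

/* Stand-in for the destructor that the driver registers. */
static void demo_destructor(void *data)
{
    (void)data;
    demo_destructor_calls++;
}

/* Create the key once, as the key_create trace above does via pthread_once. */
static void demo_make_key(void)
{
    int rc = pthread_key_create(&demo_key, demo_destructor);
    assert(rc == 0);
}

static void *demo_thread(void *arg)
{
    static int value = 1;
    int rc;

    (void)arg;
    pthread_once(&demo_once, demo_make_key);
    rc = pthread_setspecific(demo_key, &value);
    assert(rc == 0);

    /* End of the thread body: any cleanup done here, such as a
     * dlclose, happens BEFORE the TLS destructors are run. */
    demo_calls_seen_in_thread = demo_destructor_calls;
    return NULL;
}

/* Run one thread; return the destructor call count seen after join,
 * or -1 if the thread could not be created or joined. */
static int demo_run(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, demo_thread, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return demo_destructor_calls;
}
```

After demo_run(), demo_calls_seen_in_thread is still 0 while the
returned count is 1. If demo_destructor lived in a library that the
thread body had already unloaded, that later call would jump into
unmapped memory.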

I intend to look at this on Linux to see why it does not crash there,
but haven't gotten to it yet. In the meantime, I wonder if we can
patch the OpenBSD port in some way to prevent the dangling TLS
destructor. If anyone has a clean solution based on the above
information, please feel free to chime in. I'd love to get this fixed.
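
One direction that might be worth testing, sketched here with plain
pthreads only: POSIX says that once pthread_key_delete has been called
for a key, its destructor is no longer invoked at thread exit. The fix_*
names below are hypothetical, and whether the driver can safely delete
its key before being unloaded is an assumption I have not checked
against the Mesa source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* All fix_* names are hypothetical; this is not Mesa code. */
static pthread_key_t fix_key;
static int fix_destructor_calls;

static void fix_destructor(void *data)
{
    (void)data;
    fix_destructor_calls++;
}

static void *fix_thread(void *arg)
{
    static int value = 1;
    int rc;

    (void)arg;
    rc = pthread_key_create(&fix_key, fix_destructor);
    assert(rc == 0);
    rc = pthread_setspecific(fix_key, &value);
    assert(rc == 0);

    /* Stand-in for a teardown step run before the library goes away.
     * After this call the destructor is not invoked at thread exit,
     * so the owner must release any per-thread data itself. */
    rc = pthread_key_delete(fix_key);
    assert(rc == 0);
    return NULL;
}

/* Run one thread; return how many times the destructor was called,
 * or -1 if the thread could not be created or joined. */
static int fix_run(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, fix_thread, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return fix_destructor_calls;
}
```

Here fix_run() returns 0 because the key was deleted before the thread
exited. The trade-off is that the per-thread data is no longer freed
automatically.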

Regards,
Anindya
