Hi all,

I was chasing an elusive bug that went away with GALLIUM_THREAD=0, so I
wanted to use ddebug with Gallium threads. That required some fixes to
how radeonsi compiles shaders. However, with that fixed, ddebug *also*
made the bug go away.

This series does a lot of things, but the overarching goal is to rewrite
ddebug so that it can be used with Gallium threading in a minimally
intrusive way, to reduce the chance of Heisenbugs.

Patches 1-4: Clean up some time handling and add a
util_queue_fence_wait_timeout. We'll use that later since (queue) fences
will be embedded inside (Gallium) fences, and for ddebug hang detection,
we really need waits with timeouts.

Patches 5-14: Add asynchronous flushing and fine-grained fences to
Gallium and radeonsi:

1. pipe_context::flush has always provided a stronger guarantee than
   what may be expected by glFlush() [0]. Specifically,
   pipe_context::flush establishes a happens-before relationship where
   all operations in all contexts of the screen that are called after
   flush() returns will happen after all operations that happened in
   the flushed context before the flush().

   glFlush() doesn't actually make this stronger guarantee as I
   understand the spec, at least the OpenGL one. I suspect that the
   stronger guarantee may be implied by GLX and other WSI, but I'm not
   sure. And certainly, I wouldn't be surprised if there's software out
   there that assumes it.

   Anyway, this series adds a new PIPE_FLUSH_ASYNC flag which can be
   used to tell the driver that a weaker guarantee suffices. This flag
   is handled by the Gallium threaded context: it will execute the
   flush asynchronously, in the separate driver thread. The same
   behavior is enabled for PIPE_FLUSH_DEFERRED flushes. Both require a
   new special protocol for Gallium threading-enabled drivers
   (currently only radeonsi).

2. ddebug hang detection needs a way of adding fences to individual
   draw calls, in order to pinpoint exactly which draw call causes a
   hang. This was previously done by inserting clear_buffer() calls,
   relying on the precise implementation of those by the driver. It so
   happened that radeonsi could decide to send those to SDMA, which
   stopped them from working. In general, this approach was a terrible
   abuse of the interface and a layering violation.

   This series adds PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE flags which can be
   used to specifically request a fine-grained, per-draw call fence.
   Waiting on those fences will mostly work like waiting on a deferred
   fence, but when a timeout is used (especially timeout == 0), the
   fence can be signaled earlier based on a value written by the GPU
   inside the command stream.

Patches 15-23: Rewrite the ddebug core to always use pipelined mode, and
streamline the GALLIUM_DDEBUG parsing. See the detailed comment on patch
#20, which is the main chunk of code. Also adds the option of treating
transfers as draw calls.

Patches 24 & 25: Turn on Gallium threading for debug contexts with
radeonsi.

Please review!

Thanks,
Nicolai

[0] This is true for Radeon. But since AFAIK amdgpu is the only kernel
    driver with a scheduler, I suspect the same is true for other
    drivers. If your driver *doesn't* provide this stronger guarantee,
    please speak up!
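
As an illustration of how the per-draw fences from item 2 could be used
to pinpoint a hang, here is a rough sketch. It assumes each draw has two
words that the GPU writes from the command stream, one at top-of-pipe
and one at bottom-of-pipe, and that polling them is what a wait with
timeout == 0 boils down to. The struct and function names are made up
for this example and are not the ddebug or radeonsi data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-draw record; names are illustrative only. */
struct sketch_draw_record {
   /* Words the GPU would write from the command stream: nonzero once
    * the draw has reached the top / left the bottom of the pipeline. */
   uint32_t top_value;
   uint32_t bottom_value;
};

/* Non-blocking check, analogous to a fence wait with timeout == 0.
 * The volatile read stands in for reading GPU-visible memory. */
static bool sketch_value_signalled(const uint32_t *gpu_written)
{
   return *(const volatile uint32_t *)gpu_written != 0;
}

/* Return the index of the first draw that started but never finished,
 * or -1 if there is no such draw. */
static int sketch_find_hung_draw(const struct sketch_draw_record *draws,
                                 size_t count)
{
   assert(draws != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      if (sketch_value_signalled(&draws[i].top_value) &&
          !sketch_value_signalled(&draws[i].bottom_value))
         return (int)i;
   }
   return -1;
}
```

The point of the design is that nothing here depends on how the driver
implements some unrelated operation like clear_buffer(); the markers are
requested explicitly per draw call.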

--
 include/c11/threads.h                        |   11 -
 include/c11/threads_posix.h                  |   39 +-
 include/c11/threads_win32.h                  |   37 +-
 src/egl/drivers/dri2/egl_dri2.c              |   24 +-
 src/gallium/auxiliary/Makefile.sources       |    3 -
 src/gallium/auxiliary/gallivm/lp_bld_init.c  |    2 +-
 src/gallium/auxiliary/hud/hud_cpu.c          |    2 +-
 src/gallium/auxiliary/hud/hud_cpufreq.c      |    2 +-
 src/gallium/auxiliary/hud/hud_diskstat.c     |    2 +-
 src/gallium/auxiliary/hud/hud_driver_query.c |    2 +-
 src/gallium/auxiliary/hud/hud_fps.c          |    2 +-
 src/gallium/auxiliary/hud/hud_nic.c          |    2 +-
 src/gallium/auxiliary/hud/hud_sensors_temp.c |    2 +-
 src/gallium/auxiliary/meson.build            |    3 -
 .../auxiliary/pipebuffer/pb_bufmgr_cache.c   |    1 -
 .../auxiliary/pipebuffer/pb_bufmgr_debug.c   |    1 -
 .../auxiliary/pipebuffer/pb_bufmgr_slab.c    |    1 -
 src/gallium/auxiliary/pipebuffer/pb_cache.c  |    2 +-
 src/gallium/auxiliary/util/u_debug.c         |   19 +-
 src/gallium/auxiliary/util/u_dump.h          |    9 +
 src/gallium/auxiliary/util/u_dump_defines.c  |   53 +
 src/gallium/auxiliary/util/u_dump_state.c    |   16 +-
 .../auxiliary/util/u_threaded_context.c      |  212 +++-
 .../auxiliary/util/u_threaded_context.h      |   58 +-
 .../util/u_threaded_context_calls.h          |    2 +
 src/gallium/auxiliary/util/u_time.h          |  150 ---
 src/gallium/docs/source/context.rst          |   23 +
 src/gallium/drivers/ddebug/dd_context.c      |  130 +-
 src/gallium/drivers/ddebug/dd_draw.c         | 1049 ++++++++++------
 src/gallium/drivers/ddebug/dd_pipe.h         |   93 +-
 src/gallium/drivers/ddebug/dd_screen.c       |  168 ++-
 src/gallium/drivers/ddebug/dd_util.h         |   32 +-
 .../drivers/etnaviv/etnaviv_query_sw.c       |    2 +-
 src/gallium/drivers/etnaviv/etnaviv_screen.c |    2 +-
 .../drivers/freedreno/freedreno_query_sw.c   |    2 +-
 .../drivers/freedreno/freedreno_screen.c     |    2 +-
 src/gallium/drivers/llvmpipe/lp_query.c      |    2 +-
 src/gallium/drivers/llvmpipe/lp_rast.c       |    2 +-
 src/gallium/drivers/llvmpipe/lp_screen.c     |    2 +-
 src/gallium/drivers/llvmpipe/lp_setup.c      |    2 +-
 src/gallium/drivers/llvmpipe/lp_state_fs.c   |    2 +-
 .../drivers/llvmpipe/lp_state_setup.c        |    2 +-
 src/gallium/drivers/nouveau/nouveau_fence.c  |    2 +-
 src/gallium/drivers/nouveau/nouveau_screen.c |    2 +-
 src/gallium/drivers/r300/r300_context.c      |    2 +-
 src/gallium/drivers/r300/r300_flush.c        |    2 +-
 src/gallium/drivers/r300/r300_screen.c       |    2 +-
 src/gallium/drivers/r600/r600_gpu_load.c     |    2 +-
 src/gallium/drivers/r600/r600_pipe.c         |    2 +-
 src/gallium/drivers/r600/r600_pipe_common.c  |    2 +-
 src/gallium/drivers/r600/r600_query.c        |    2 +-
 src/gallium/drivers/r600/r600_texture.c      |    2 +-
 src/gallium/drivers/r600/sb/sb_core.cpp      |    2 +-
 src/gallium/drivers/radeon/r600_gpu_load.c   |    2 +-
 .../drivers/radeon/r600_pipe_common.c        |  269 +---
 src/gallium/drivers/radeon/r600_query.c      |    2 +-
 src/gallium/drivers/radeon/r600_texture.c    |    2 +-
 .../drivers/radeonsi/Makefile.sources        |    1 +
 src/gallium/drivers/radeonsi/meson.build     |    1 +
 src/gallium/drivers/radeonsi/si_debug.c      |    5 +-
 src/gallium/drivers/radeonsi/si_fence.c      |  482 +++++++
 src/gallium/drivers/radeonsi/si_hw_context.c |    3 +
 src/gallium/drivers/radeonsi/si_pipe.c       |   14 +-
 src/gallium/drivers/radeonsi/si_pipe.h       |    7 +
 src/gallium/drivers/rbug/rbug_core.c         |    2 +-
 src/gallium/drivers/softpipe/sp_query.c      |    2 +-
 src/gallium/drivers/softpipe/sp_screen.c     |    2 +-
 src/gallium/drivers/svga/svga_context.h      |    2 +-
 src/gallium/drivers/svga/svga_pipe_draw.c    |    1 -
 src/gallium/drivers/swr/swr_fence.cpp        |    2 +-
 src/gallium/drivers/swr/swr_query.cpp        |    2 +-
 src/gallium/drivers/trace/tr_dump.c          |    2 +-
 src/gallium/drivers/virgl/virgl_screen.c     |    2 +-
 src/gallium/include/pipe/p_context.h         |   19 +-
 src/gallium/include/pipe/p_defines.h         |    4 +
 .../state_trackers/wgl/stw_framebuffer.c     |    2 +-
 src/gallium/tests/unit/pipe_barrier_test.c   |    3 +-
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c    |    2 +-
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c    |    3 +-
 .../winsys/radeon/drm/radeon_drm_bo.c        |    2 +-
 .../winsys/radeon/drm/radeon_drm_cs.c        |    2 +-
 .../winsys/virgl/drm/virgl_drm_winsys.c      |    2 +-
 .../winsys/virgl/vtest/virgl_vtest_winsys.c  |    2 +-
 src/util/Makefile.sources                    |    2 +
 src/util/futex.h                             |    9 +-
 src/util/meson.build                         |    2 +
 src/{gallium/auxiliary/os => util}/os_time.c |   19 +-
 src/{gallium/auxiliary/os => util}/os_time.h |   23 +-
 src/util/simple_mtx.h                        |    2 +-
 src/util/u_queue.c                           |   77 +-
 src/util/u_queue.h                           |   51 +-
 91 files changed, 1990 insertions(+), 1235 deletions(-)

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev