This patch series adds internal multithreading to gallium nine. The goal is to offload almost all gallium nine calls (and some other work) to a worker thread.
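The batching idea at the core of this series can be illustrated with a minimal sketch. The main thread appends commands to a preallocated batch and makes that batch visible to a worker thread only once a threshold is reached. A finish call then waits for the worker, which is the kind of wait the series tries to limit to a few points such as Present(). The sketch assumes POSIX threads and a simple double-buffered design. All names (`cmd_queue`, `queue_push`, `queue_publish`, `queue_finish`, `FLUSH_THRESHOLD`, ...) are hypothetical and are not taken from nine_queue.c or nine_csmt_helper.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sizes: slots per preallocated batch, and publish threshold. */
#define CMD_SLOTS 64
#define FLUSH_THRESHOLD 48

typedef void (*cmd_fn)(int arg);

struct cmd { cmd_fn fn; int arg; };
struct batch { struct cmd cmds[CMD_SLOTS]; unsigned count; };

struct cmd_queue {
    struct batch batches[2]; /* batches[fill] is owned by the main thread */
    unsigned fill;
    bool pending;            /* batches[fill ^ 1] published, not yet executed */
    bool quit;
    pthread_mutex_t lock;
    pthread_cond_t cond;     /* signalled on publish, completion and quit */
    pthread_t worker;
};

static void *worker_main(void *data)
{
    struct cmd_queue *q = data;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (!q->pending && !q->quit)
            pthread_cond_wait(&q->cond, &q->lock);
        if (!q->pending)
            break; /* quit requested and nothing left to execute */
        struct batch *b = &q->batches[q->fill ^ 1];
        /* Execute unlocked so the main thread keeps filling the other batch. */
        pthread_mutex_unlock(&q->lock);
        for (unsigned i = 0; i < b->count; i++)
            b->cmds[i].fn(b->cmds[i].arg);
        pthread_mutex_lock(&q->lock);
        q->pending = false;
        pthread_cond_broadcast(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Hand the current batch to the worker; blocks only if the previous one is still running. */
static void queue_publish(struct cmd_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->pending)
        pthread_cond_wait(&q->cond, &q->lock);
    if (q->batches[q->fill].count > 0) {
        q->pending = true;
        q->fill ^= 1;
        q->batches[q->fill].count = 0; /* reuse the batch the worker drained */
        pthread_cond_broadcast(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
}

/* Main thread only: append a command, publishing once enough are queued. */
static void queue_push(struct cmd_queue *q, cmd_fn fn, int arg)
{
    struct batch *b = &q->batches[q->fill];

    assert(b->count < CMD_SLOTS);
    b->cmds[b->count++] = (struct cmd){ fn, arg };
    if (b->count >= FLUSH_THRESHOLD)
        queue_publish(q);
}

/* Synchronization point: publish what is queued and wait for the worker. */
static void queue_finish(struct cmd_queue *q)
{
    queue_publish(q);
    pthread_mutex_lock(&q->lock);
    while (q->pending)
        pthread_cond_wait(&q->cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
}

static int queue_init(struct cmd_queue *q)
{
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    return pthread_create(&q->worker, NULL, worker_main, q);
}

static void queue_destroy(struct cmd_queue *q)
{
    queue_finish(q);
    pthread_mutex_lock(&q->lock);
    q->quit = true;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
    pthread_cond_destroy(&q->cond);
    pthread_mutex_destroy(&q->lock);
}

/* Example command, executed on the worker thread. */
static long executed_sum;
static void add_cmd(int arg) { executed_sum += arg; }
```

Between two queue_finish() calls, the main thread blocks only when it fills a batch while the worker is still executing the previous one. That is the kind of waiting the series aims to keep rare. The double buffering is just one simple way to get "preallocated, published in bulk" semantics. A ring of several batches would let the main thread run further ahead.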
The patch series first does a lot of refactoring, and introduces a new nine_context structure containing all the internal state required to make the gallium calls. This structure is used exclusively by the worker thread.

The pipe_context is also exclusive to the worker thread. The main thread needs special functions to access it, which either wait for all pending commands to execute or pause the thread. A secondary pipe_context is introduced for operations that don't need implicit synchronization with rendering (in particular, buffer uploads with DISCARD/NOOVERWRITE).

To maximize performance, commands are queued into preallocated queues, and a queue is made visible to the worker thread only once a significant number of commands have been queued. For this to be a performance gain, waiting for the worker thread to finish its job must be very rare. With all the patches of the series, synchronization basically happens only at:
. surface/volume destruction, if their content is in RAM and was needed for a pending command;
. buffer locks not using the MANAGED pool or DISCARD/NOOVERWRITE;
. surface/volume locks very close to a previous lock;
  ^^^ Usually these cases only happen at the beginning of a scene, where items are initialized, and do not happen in a normal frame.
. the end of a frame, when Present() is called.

Thus, for the great majority of games, the only moment we really wait for the worker thread to finish its job is when all frame commands have been sent.

Because we require driver thread safety (pipe_screen calls can be made in the main thread while the pipe_context is used in the worker thread), internal multithreading (dubbed CSMT, in reference to Wine's OpenGL internal multithreading mode) is enabled by default only on r600/radeonsi. It can be forced on or off via a setting (csmt_force=0 or 1).

One thing the patch series could improve is stateblock handling: the function that applies stateblocks to the nine_context can be optimized, and its overhead reduced.
Most games don't use stateblocks, though.

How does this compare to Wine CSMT?
I haven't looked much at the details of Wine CSMT. My understanding is that OpenGL calls are offloaded to a worker thread. I don't know exactly which optimizations are done to avoid waiting on the worker thread.

How does this compare to Windows internal multithreading?
The public Direct3D DDI documentation gives some indication of how the multithreading works, and some tests can be made to deduce more. Basically, most commands are said to be put into a worker thread, while buffer locks and query result checks are done with reentrant functions in the main thread. Some tests suggest that MANAGED pool uploads are done in the worker thread, as this gallium patch series implements. Thus we expect performance to be comparable.

Axel Davy (75):
  st/nine: Introduce nine_context
  st/nine: Move core of device clear to nine_state
  st/nine: Move draw calls to nine_state
  st/nine: Track changed.texture only for stateblocks
  st/nine: Move texture setting to nine_context_*
  st/nine: Back textures into nine_context
  st/nine: Move stream_usage_mask to nine_context
  st/nine: Move vtxbuf to nine_context
  st/nine: Move stream freq data to nine_context
  st/nine: Back vdecl to nine_context
  st/nine: Back vs to nine_context
  st/nine: Back sampler states to nine_context
  st/nine: Back all shader constants to nine_context
  st/nine: Back current index buffer to nine_context
  st/nine: Back RT to nine_context
  st/nine: Back scissor to nine_context
  st/nine: Back viewport to nine_context
  st/nine: Put ff data in a separate structure
  st/nine: Refactor SetLight
  st/nine: Refactor LightEnable
  st/nine: Back all ff states in nine_context
  st/nine: Back ds to nine_context
  st/nine: Back ps to nine_context
  st/nine: Back User Clip Planes to nine_context
  st/nine: Track dirty state groups in nine_context
  st/nine: Use atomics for nine_bind
  st/nine: Move query9 pipe calls to nine_context
  st/nine: Remove NineDevice9_GetCSO
  st/nine: Access pipe_context via NineDevice9_GetPipe
  st/nine: Rename cso in nine_context to cso_shader
  st/nine: Rename pipe to pipe_data in nine_context
  st/nine: Move pipe and cso to nine_context
  st/nine: Integrate nine_pipe_context_clear to nine_context_clear
  st/nine: Move Managed Pool handling out of nine_context
  st/nine: Do not use NineBaseTexture9 in nine_context
  st/nine: Decompose nine_context_set_stream_source
  st/nine: Decompose nine_context_set_indices
  st/nine: Decompose nine_context_set_texture
  st/nine: Reimplement nine_context_apply_stateblock
  st/nine: Change the way nine_shader gets the pipe
  st/nine: Back swvp in nine_context
  st/nine: Create pipe_surfaces on resource creation.
  st/nine: Simplify the logic to bind textures
  st/nine: Fix BASETEX_REGISTER_UPDATE
  st/nine: Track bindings for buffers
  st/nine: Upload Managed buffers just before draw call using them
  st/nine: Add nine_context_get_pipe_acquire/release
  st/nine: Add secondary pipe for device
  st/nine: Implement Fast path for dynamic buffers and csmt
  st/nine: use get_pipe_acquire/release when possible
  st/nine: Simplify ColorFill
  st/nine: Optimize ColorFill
  st/nine: Use nine_context_clear_render_target
  st/nine: Avoid flushing the queue for queries GetData
  st/nine: Simplify ARG_BIND_REF
  st/nine: Fix NineUnknown_Detach
  st/nine: Detach buffers in swapchain dtor.
  st/nine: Comment and simplify iunknown
  st/nine: Do not bind the container if forward is false
  st/nine: Implement nine_context_range_upload
  st/nine: Optimize managed buffer upload
  st/nine: Implement nine_context_gen_mipmap
  st/nine: Use nine_context_gen_mipmap in BaseTexture9
  st/nine: Implement nine_context_box_upload
  st/nine: Use nine_context_box_upload for surfaces
  st/nine: Fix leak with cubetexture dtor
  st/nine: Fix leak with volume dtor
  st/nine: Use nine_context_box_upload for volumes
  st/nine: Bind destination for surface/volume uploads
  st/nine: Idem for nine_context_gen_mipmap
  st/nine: Add arguments to context's blit and copy_region
  st/nine: Do not wait for DEFAULT lock for surfaces when we can
  st/nine: Do not wait for DEFAULT lock for volumes when we can
  st/nine: Allow non-zero resource offset for vertex buffers
  st/nine: Implement new buffer upload path

Patrick Rudolph (9):
  st/nine: Add nine_queue
  st/nine: Add struct nine_clipplane
  st/nine: Pass size of memory to nine_state
  st/nine: Implement gallium nine CSMT
  st/nine: Print threadid in debug log
  st/nine: Add NINE_DEBUG=tid to turn threadid on or off
  st/nine: Use nine_context for blit
  st/nine: Use nine_context for resource_copy_region
  st/nine: Add CSMT_NO_WAIT_WITH_COUNTER

 src/gallium/auxiliary/os/os_thread.h | 11 +
 src/gallium/state_trackers/nine/Makefile.sources | 5 +
 src/gallium/state_trackers/nine/adapter9.h | 1 +
 src/gallium/state_trackers/nine/basetexture9.c | 45 +-
 src/gallium/state_trackers/nine/basetexture9.h | 23 +-
 src/gallium/state_trackers/nine/buffer9.c | 155 +-
 src/gallium/state_trackers/nine/buffer9.h | 56 +-
 src/gallium/state_trackers/nine/cubetexture9.c | 2 +-
 src/gallium/state_trackers/nine/device9.c | 1001 +++----
 src/gallium/state_trackers/nine/device9.h | 17 +-
 src/gallium/state_trackers/nine/device9ex.c | 2 +-
 src/gallium/state_trackers/nine/indexbuffer9.c | 10 +-
 src/gallium/state_trackers/nine/indexbuffer9.h | 2 -
 src/gallium/state_trackers/nine/iunknown.c | 9 +-
 src/gallium/state_trackers/nine/iunknown.h | 40 +-
 .../state_trackers/nine/nine_buffer_upload.c | 288 ++
 .../state_trackers/nine/nine_buffer_upload.h | 59 +
 src/gallium/state_trackers/nine/nine_csmt_helper.h | 427 +++
 src/gallium/state_trackers/nine/nine_debug.c | 28 +-
 src/gallium/state_trackers/nine/nine_debug.h | 1 +
 src/gallium/state_trackers/nine/nine_ff.c | 280 +-
 src/gallium/state_trackers/nine/nine_ff.h | 18 +-
 src/gallium/state_trackers/nine/nine_pipe.c | 22 -
 src/gallium/state_trackers/nine/nine_pipe.h | 2 -
 src/gallium/state_trackers/nine/nine_queue.c | 275 ++
 src/gallium/state_trackers/nine/nine_queue.h | 54 +
 src/gallium/state_trackers/nine/nine_shader.c | 3 +-
 src/gallium/state_trackers/nine/nine_shader.h | 4 +-
 src/gallium/state_trackers/nine/nine_state.c | 2914 ++++++++++++++++----
 src/gallium/state_trackers/nine/nine_state.h | 446 ++-
 src/gallium/state_trackers/nine/pixelshader9.c | 20 +-
 src/gallium/state_trackers/nine/pixelshader9.h | 14 +-
 src/gallium/state_trackers/nine/query9.c | 32 +-
 src/gallium/state_trackers/nine/query9.h | 1 +
 src/gallium/state_trackers/nine/stateblock9.c | 201 +-
 src/gallium/state_trackers/nine/surface9.c | 130 +-
 src/gallium/state_trackers/nine/surface9.h | 8 +-
 src/gallium/state_trackers/nine/swapchain9.c | 25 +-
 src/gallium/state_trackers/nine/swapchain9.h | 2 -
 src/gallium/state_trackers/nine/vertexbuffer9.c | 4 +-
 src/gallium/state_trackers/nine/vertexbuffer9.h | 2 +-
 src/gallium/state_trackers/nine/vertexshader9.c | 28 +-
 src/gallium/state_trackers/nine/vertexshader9.h | 12 +-
 src/gallium/state_trackers/nine/volume9.c | 94 +-
 src/gallium/state_trackers/nine/volume9.h | 2 +-
 src/gallium/state_trackers/nine/volumetexture9.c | 2 +-
 src/gallium/targets/d3dadapter9/drm.c | 6 +
 src/mesa/drivers/dri/common/xmlpool/t_options.h | 5 +
 48 files changed, 5048 insertions(+), 1740 deletions(-)
 create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.c
 create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.h
 create mode 100644 src/gallium/state_trackers/nine/nine_csmt_helper.h
 create mode 100644 src/gallium/state_trackers/nine/nine_queue.c
 create mode 100644 src/gallium/state_trackers/nine/nine_queue.h

-- 
2.10.2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev