On May 16, 2018, at 9:25 AM, Alexandre <alexandre.gauthier-foic...@inria.fr> wrote:
> Thank you for your answer. I understand I can control the number of threads and prevent them from being assigned to actual hardware threads. Preventing oversubscription of the hardware threads is challenging when using OpenMP/TBB/OpenSWR in hybrid environments. I am wondering whether having N single-threaded SWR contexts (where N corresponds to the number of hardware threads) is *good enough*, i.e. not much slower than a single SWR context rendering the tasks serially. Do you have a take on this? That might do the trick.

A single-threaded SWR context would not give high performance; SWR was architected to parallelize the pipeline stages and depends on multiple threads/CPUs to deliver high performance. Notably, compared to llvmpipe, we can parallelize the geometry frontend and thus achieve much higher throughput.

Similar oversubscription problems occur in all applications that mix threading technologies (Cilk, TBB, OpenMP, ...), and there are few solutions short of rewriting the code to use only one of them. Yes, getting different threading libraries to agree can be tricky. Does your application overlap heavy compute with graphics rendering? If not, the oversubscription point may be moot.

One piece of advice we give to TBB library users is to initialize the TBB library before creating an OpenGL/SWR context. This allows TBB to size its thread pool to the entire machine, after which SWR comes in and creates all of its threads. The other way around, SWR binds its threads to cores, which TBB interprets as unavailable resources, resulting in a thread pool of size one.

If your concern is multiple SWR contexts running simultaneously and oversubscribing the machine: it's true that SWR thread pool creation is per-context, and, as Bruce says, the only way to prevent that currently is to set the environment variable that limits the number of worker threads. This number should be greater than 1 for good performance, though.
-Tim

> An alternative solution would be to have a callback mechanism in OpenSWR to launch its tasks on the application's scheduler.
>
> Cheers,
> Alex
>
> On 16 May 2018, at 14:34, Cherniak, Bruce <bruce.chern...@intel.com> wrote:
>
>> On May 14, 2018, at 8:59 AM, Alexandre <alexandre.gauthier-foic...@inria.fr> wrote:
>>
>>> Hello,
>>>
>>> Sorry for the inconvenience if this message is not appropriate for this mailing list; the following is a question for the developers of the gallium swr driver.
>>>
>>> I am the main developer of a motion graphics application. Internally, our application has a dependency graph whose nodes may run concurrently. We use OpenGL extensively in the implementation of the nodes (for example with Shadertoy). Our application has 2 main requirements:
>>>
>>> - A GPU backend, mainly for user interaction and fast results
>>> - A CPU backend for batch rendering
>>>
>>> Internally we use OSMesa for the CPU backend so that our code is mostly identical for both the GPU and CPU paths. On the CPU, however, our application is heavily multi-threaded: each processing node can potentially run in parallel with the others, following the dependency graph. We use Intel TBB to schedule the CPU threads, and for each actual hardware thread (not task) we allocate a new OSMesa context so that we can freely multi-thread the rendering of operators. This works fine with llvmpipe and so far also with SWR (with a patch to fix some static variables inside state_trackers/osmesa.c).
>>>
>>> However, since SWR uses its own thread pool, I'm afraid of over-threading introducing a bottleneck in thread scheduling: e.g. on a 32-core processor, we may already have, say, 24 threads busy on a TBB task on each core, each with 1 OSMesa context. I looked at the code: all those concurrent OSMesa contexts will create an SWR context, and each will try to initialise its own thread pool in CreateThreadPool in swr/rasterizer/core/api.cpp.
>>>
>>> Is there a way to have a single "static" thread pool shared across all contexts?
>> There is not currently a way to create a single thread pool shared across all contexts; each context creates its own worker threads. However, OpenSWR provides an environment variable, KNOB_MAX_WORKER_THREADS, which overrides the default thread allocation. Setting it will limit the number of threads created by an OpenSWR context *and* prevent those threads from being bound to physical cores. Please give this a try; by adjusting the value, you may find the optimum for your situation.
>>
>> Cheers,
>> Bruce
>
>>> Thank you,
>>> Alexandre
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev