Thank you for your answer. I understand that I can control the number of threads and prevent them from being bound to physical cores. Still, preventing oversubscription of the hardware threads is challenging when mixing OpenMP/TBB/OpenSWR in hybrid environments.
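
For reference, this is roughly how I would apply your KNOB_MAX_WORKER_THREADS suggestion (a minimal sketch only; I am assuming the knob is read when the SWR context is created, and the helper name is mine):

    #include <cstdlib>
    #include <GL/osmesa.h>

    // Create an OSMesa context whose SWR backend uses a single worker
    // thread (and, per your note, is not bound to a physical core).
    // Assumption: KNOB_MAX_WORKER_THREADS is read at context creation,
    // so it must be set before the first OSMesaCreateContextExt() call.
    static OSMesaContext createSingleWorkerContext()
    {
        setenv("KNOB_MAX_WORKER_THREADS", "1", /*overwrite=*/1);
        return OSMesaCreateContextExt(OSMESA_RGBA, /*depthBits=*/24,
                                      /*stencilBits=*/8, /*accumBits=*/0,
                                      /*sharelist=*/nullptr);
    }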
I am wondering whether having N SWR contexts (where N corresponds to the number of hardware threads), each single-threaded, would be *good enough*, i.e. not much slower than a single SWR context rendering the tasks serially. Do you have a take on this? That might do the trick; see the sketch after the quoted message below.

Similar oversubscription problems occur in any application that mixes threading technologies (Cilk, TBB, OpenMP, …), and there are few solutions short of rewriting the code to use a single one. An alternative would be a callback mechanism in OpenSWR that hands its tasks back to the application's own scheduler.

Cheers

Alex

> On 16 May 2018, at 14:34, Cherniak, Bruce <bruce.chern...@intel.com> wrote:
>
>> On May 14, 2018, at 8:59 AM, Alexandre <alexandre.gauthier-foic...@inria.fr> wrote:
>>
>> Hello,
>>
>> Sorry for the inconvenience if this message is not appropriate for this mailing list.
>>
>> The following is a question for the developers of the gallium swr driver.
>>
>> I am the main developer of a motion graphics application.
>> Our application is internally organised as a dependency graph where each node may run concurrently.
>> We use OpenGL extensively in the implementation of the nodes (for example with Shadertoy).
>>
>> Our application has 2 main requirements:
>> - a GPU backend, mainly for user interaction and fast results
>> - a CPU backend for batch rendering
>>
>> Internally we use OSMesa for the CPU backend so that our code is mostly identical for the GPU and CPU paths.
>> However, on the CPU our application is heavily multi-threaded: each processing node can potentially run in parallel with others, following the dependency graph.
>> We use Intel TBB to schedule the CPU threads.
>>
>> For each actual hardware thread (not task) we allocate a new OSMesa context so that we can freely multi-thread operator rendering. This works fine with llvmpipe and also with SWR so far (with a patch to fix some static variables inside state_trackers/osmesa.c).
>>
>> However, since SWR uses its own thread pool, I am afraid of over-threading introducing a bottleneck in thread scheduling.
>> E.g. on a 32-core processor we may already have, say, 24 threads busy with a TBB task, each with its own OSMesa context.
>> I looked at the code: each of those concurrent OSMesa contexts will create a SWR context, and each will try to initialise its own thread pool in CreateThreadPool in swr/rasterizer/core/api.cpp.
>>
>> Is there a way to have a single "static" thread pool shared across all contexts?
>
> There is not currently a way to create a single thread pool shared across all contexts. Each context creates unique worker threads.
>
> However, OpenSWR provides an environment variable, KNOB_MAX_WORKER_THREADS, that overrides the default thread allocation.
> Setting this will limit the number of threads created by an OpenSWR context *and* prevent the threads from being bound to physical cores.
>
> Please give this a try. By adjusting it, you may find the optimal value for your situation.
>
> Cheers,
> Bruce
>
>> Thank you
>>
>> Alexandre
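
P.S. To make the first question above concrete, the scheme I have in mind looks roughly like this (a sketch only: RenderTask stands in for one node of our dependency graph, renderReadyTasks and threadContext are illustrative names, and I am again assuming the knob is honoured per context):

    #include <cstdlib>
    #include <vector>
    #include <tbb/parallel_for_each.h>
    #include <GL/osmesa.h>

    // Stand-in for one node of our dependency graph (hypothetical type).
    struct RenderTask {
        int width = 512, height = 512;
        std::vector<unsigned char> pixels;              // RGBA8 output
        RenderTask() : pixels(width * height * 4) {}
    };

    // One single-threaded SWR context per TBB worker thread, created
    // lazily on first use (context teardown omitted for brevity).
    static thread_local OSMesaContext tls_ctx = nullptr;

    static OSMesaContext threadContext()
    {
        if (!tls_ctx) {
            // Assumption: the knob is read when the SWR context is created.
            setenv("KNOB_MAX_WORKER_THREADS", "1", 1);
            tls_ctx = OSMesaCreateContextExt(OSMESA_RGBA, 24, 8, 0, nullptr);
        }
        return tls_ctx;
    }

    void renderReadyTasks(std::vector<RenderTask> &tasks)
    {
        tbb::parallel_for_each(tasks.begin(), tasks.end(), [](RenderTask &t) {
            OSMesaMakeCurrent(threadContext(), t.pixels.data(),
                              GL_UNSIGNED_BYTE, t.width, t.height);
            // The actual per-node GL rendering goes here; a clear stands in.
            glClearColor(0.f, 0.f, 0.f, 1.f);
            glClear(GL_COLOR_BUFFER_BIT);
            glFinish();
        });
    }

With this, the number of live SWR worker threads never exceeds the number of TBB workers, at the cost of N contexts instead of one.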