Thank you for your answer.
I understand that I can control the number of threads and prevent them from 
being bound to physical cores.
Preventing oversubscription of the hardware threads is challenging when mixing 
OpenMP/TBB/OpenSWR in a hybrid environment.

I am wondering if having N SWR contexts (where N corresponds to the number of 
hardware threads), each single-threaded, 
is *good enough* (i.e. not much slower than a single SWR context that 
renders the tasks serially). 
Do you have a take on this?
That might do the trick.
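For what it's worth, the single-threaded-contexts experiment could presumably 
be tried with the KNOB_MAX_WORKER_THREADS variable mentioned below, without any 
code changes. A hedged sketch (the binary name is a placeholder, not a real 
command):

```shell
# Assumption: KNOB_MAX_WORKER_THREADS applies to every OpenSWR context
# created by the process. Limit each context to one worker thread and let
# TBB own the hardware threads instead.
export KNOB_MAX_WORKER_THREADS=1
./my_batch_renderer   # placeholder for our application's batch-render binary
```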

Similar oversubscription problems occur in any application that mixes 
threading technologies (Cilk, TBB, OpenMP, …), and there are few solutions 
to prevent it short of rewriting the code to use a single technology.

An alternative solution would be a callback mechanism in OpenSWR that lets 
the rasterizer launch its tasks on the application's own scheduler.
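To make the proposal concrete, here is a minimal, purely hypothetical sketch of 
such an interface: the rasterizer never spawns threads itself and instead hands 
each work item to a launcher callback supplied by the application (which, in 
our case, would forward it to TBB). None of these names exist in OpenSWR; this 
is only an illustration of the shape of the API.

```cpp
#include <atomic>
#include <functional>

// Hypothetical callback type: the rasterizer hands work items back to the
// application, which decides where (and on which scheduler) they run.
using TaskLauncher = std::function<void(std::function<void()>)>;

// Stand-in for rasterization work: submits one work item per tile through
// the application-supplied launcher and reports how many were processed.
int render_tiles(int tiles, const TaskLauncher& launch) {
    std::atomic<int> done{0};
    for (int t = 0; t < tiles; ++t)
        launch([&done] { done.fetch_add(1); });  // app decides where this runs
    return done.load();
}

int demo() {
    // Simplest possible launcher: run each work item inline on the calling
    // thread. A real application would enqueue it on its TBB scheduler.
    TaskLauncher inline_launcher = [](std::function<void()> fn) { fn(); };
    return render_tiles(8, inline_launcher);
}
```

With a launcher like this, thread creation and pinning would stay entirely 
under the application's control, which is exactly what the hybrid-environment 
case needs.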

Cheers

Alex


> On 16 May 2018, at 14:34, Cherniak, Bruce <bruce.chern...@intel.com> wrote:
> 
>> 
>> On May 14, 2018, at 8:59 AM, Alexandre <alexandre.gauthier-foic...@inria.fr> 
>> wrote:
>> 
>> Hello,
>> 
>> Sorry for the inconvenience if this message is not appropriate for this 
>> mailing list.
>> 
>> The following is a question for developers of the swr driver of gallium.
>> 
>> I am the main developer of a motion graphics application. 
>> Our application internally has a dependency graph where each node may run 
>> concurrently.
>> We use OpenGL extensively in the implementation of the nodes (for example 
>> with Shadertoy).
>> 
>> Our application has 2 main requirements: 
>> - A GPU backend, mainly for user interaction and fast results
>> - A CPU backend for batch rendering
>> 
>> Internally we use OSMesa for CPU backend so that our code is mostly 
>> identical for both GPU and CPU paths.
>> However, when it comes to the CPU, our application is heavily multi-threaded: 
>> each processing node can potentially run in parallel with others, following 
>> the dependency graph.
>> We use Intel TBB to schedule the CPU threads.
>> 
>> For each actual hardware thread (not task) we allocate a new OSMesa context 
>> so that we can freely multi-thread operator rendering. It works fine with 
>> llvmpipe and also with SWR so far (with a patch to fix some static variables 
>> inside state_trackers/osmesa.c).
>> 
>> However, with SWR using its own thread pool, I'm afraid of over-threading 
>> introducing a scheduling bottleneck, 
>> e.g. on a 32-core processor, we may already have, say, 24 threads busy on a 
>> TBB task, each with 1 OSMesa context. 
>> I looked at the code and all those concurrent OSMesa contexts will create a 
>> SWR context and each will try to initialise its own thread pool in 
>> CreateThreadPool in swr/rasterizer/core/api.cpp 
>> 
>> Is there a way to have a single “static” thread-pool shared across all 
>> contexts ?
> 
> There is not currently a way to create a single thread-pool shared across all 
> contexts.  Each context creates unique worker threads.
> 
> However, OpenSWR provides an environment variable, KNOB_MAX_WORKER_THREADS, 
> that overrides the default thread allocation.
> Setting this will limit the number of threads created by an OpenSWR context 
> *and* prevent the threads from being bound to physical cores.
> 
> Please, give this a try.  By adjusting the value, you may find the optimal 
> value for your situation.
> 
> Cheers,
> Bruce
> 
>> Thank you
>> 
>> Alexandre
>> 
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev