#1 is no big deal; we could just allocate one in a global class somewhere.
#2 actually seems quite desirable; is there any reason you don't want that?
#3 seems like a win for performance, since no locks have to be acquired to manage the collection of threads.
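For #1, a minimal sketch of "allocate one in a global class somewhere" might be a lazily-constructed, process-wide pool like the following. The accessor name is made up; it assumes the async()/wait() interface of llvm::ThreadPool from llvm/Support/ThreadPool.h.

  #include "llvm/Support/ThreadPool.h"

  // Illustrative only: one process-wide llvm::ThreadPool, created on first use.
  static llvm::ThreadPool &GetGlobalThreadPool() {
    static llvm::ThreadPool pool; // defaults to hardware_concurrency() threads
    return pool;
  }

  void Example() {
    auto future = GetGlobalThreadPool().async([] { /* do some work */ });
    future.wait();
  }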
On Sun, Apr 30, 2017 at 9:41 PM Scott Smith <scott.sm...@purestorage.com> wrote:

> The overall concept is similar; it comes down to implementation details like:
> 1. llvm doesn't have a global pool; it's probably instantiated on demand.
> 2. llvm keeps threads around until the pool is destroyed, rather than letting the threads exit when they have nothing to do.
> 3. llvm starts up all the threads immediately, rather than on demand.
>
> Overall I like the current lldb version better than the llvm version, but I haven't examined any of the use cases of the llvm version to know whether it could be dropped in without issue. However, neither does what I want, so I'll move forward prototyping what I think it should do, and then see how applicable it is to llvm.
>
> On Sun, Apr 30, 2017 at 9:02 PM, Zachary Turner <ztur...@google.com> wrote:
>
>> Have we examined llvm::ThreadPool to see if it can work for our needs? And if not, what kind of changes would be needed to llvm::ThreadPool to make it suitable?
>>
>> On Fri, Apr 28, 2017 at 8:04 AM Scott Smith via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>>
>>> Hmmm, ok, I don't like hard coding pools. Your idea about limiting the number of high level threads gave me an idea:
>>>
>>> 1. The system has one high level TaskPool.
>>> 2. TaskPools have up to one child and one parent (the parent for the high level TaskPool = nullptr).
>>> 3. When a worker starts up for a given TaskPool, it ensures a single child exists.
>>> 4. There is a thread local variable that indicates which TaskPool that thread enqueues into (via AddTask). If that variable is nullptr, then it is the high level TaskPool. Threads that are not workers enqueue into this TaskPool. If the thread is a worker thread, then the variable points to the worker's child.
>>> 5. When creating a thread in a TaskPool, its thread count AND the thread counts of the parent, grandparent, etc. are incremented.
>>> 6. In the main worker loop, if there is no more work to do, OR the thread count is too high, the worker "promotes" itself. Promotion means:
>>>    a. decrement the thread count for the current task pool
>>>    b. if there is no parent, exit; otherwise, become a worker for the parent task pool (and update the thread local TaskPool enqueue pointer).
>>>
>>> The main points are:
>>> 1. We don't hard code the number of task pools; the code automatically uses the fewest number of task pools needed, regardless of the number of places in the code that want task pools.
>>> 2. When the child task pools are busy, parent task pools reduce their number of workers over time to reduce oversubscription.
>>>
>>> You can fiddle with the number of allowed threads per level; for example, if you take into account the height of the pool and the number of child threads, you could allocate each level half as many threads as the level below it, unless the level below isn't using all of its threads; then the steady state would be 2 * cores rather than height * cores. I think that is probably overkill, though.
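For concreteness, the promotion scheme described in points 4-6 above might look roughly like the sketch below. This is only an illustration: none of these names exist in lldb's TaskPool, and locking, child creation, and task enqueueing are simplified.

  // Hypothetical sketch of the hierarchical pool / worker "promotion" idea.
  #include <atomic>
  #include <chrono>
  #include <condition_variable>
  #include <deque>
  #include <functional>
  #include <mutex>

  struct HierPool {
    HierPool *parent = nullptr;
    HierPool *child = nullptr;               // point 2: at most one child
    std::atomic<unsigned> thread_count{0};   // point 5: this pool plus descendants
    unsigned max_threads = 4;                // per-level budget, tunable
    std::deque<std::function<void()>> tasks;
    std::mutex mutex;
    std::condition_variable cv;

    // Wait briefly for a task; return false if the queue stayed empty.
    bool Pop(std::function<void()> &task) {
      std::unique_lock<std::mutex> lock(mutex);
      if (!cv.wait_for(lock, std::chrono::milliseconds(100),
                       [this] { return !tasks.empty(); }))
        return false;
      task = std::move(tasks.front());
      tasks.pop_front();
      return true;
    }
  };

  // Point 4: the pool that AddTask() on this thread would enqueue into.
  thread_local HierPool *g_enqueue_pool = nullptr;

  void WorkerLoop(HierPool *pool) {
    while (pool) {
      g_enqueue_pool = pool->child;          // workers enqueue one level down
      std::function<void()> task;
      if (pool->thread_count <= pool->max_threads && pool->Pop(task)) {
        task();
        continue;
      }
      // Point 6: no work, or too many threads -> "promote" to the parent.
      pool->thread_count--;                  // 6a
      pool = pool->parent;                   // 6b: nullptr means the thread exits
    }
  }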
>>> On Fri, Apr 28, 2017 at 4:37 AM, Pavel Labath <lab...@google.com> wrote:
>>>
>>>> On 27 April 2017 at 00:12, Scott Smith via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>>>> > After dealing with a bunch of microoptimizations, I'm back to parallelizing the loading of shared modules. My naive approach was to just create a new thread per shared library. I have a feeling some users may not like that; I think I read an email from someone who has thousands of shared libraries. That's a lot of threads :-)
>>>> >
>>>> > The problem is that loading a shared library can cause downstream parallelization through TaskPool. I can't then also have the loading of a shared library itself go through TaskPool, as that could cause a deadlock: if all the worker threads are waiting on work that TaskPool needs to run on a worker thread, then nothing will happen.
>>>> >
>>>> > Three possible solutions:
>>>> >
>>>> > 1. Remove the notion of a single global TaskPool, and instead have a static pool at each callsite that wants it. That way multiple paths into the same code would share the same pool, but different places in the code would have their own pool.
>>>>
>>>> I looked at this option in the past, and this was my preferred solution. My suggestion would be to have two task pools: one for low-level parallelism, which spawns std::thread::hardware_concurrency() threads, and another one for higher level tasks, which can only spawn a smaller number of threads (the algorithm for the exact number TBD). The high-level threads can access the low-level ones, but not the other way around, which guarantees progress.
>>>>
>>>> I propose to hardcode 2 pools, as I don't want to make it easy for people to create additional ones -- I think we should be having this discussion every time someone tries to add one, and have a very good justification for it (FWIW, I think your justification is good in this case, and I am grateful that you are pursuing this).
>>>>
>>>> pl
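The two-pool suggestion quoted above could be sketched as follows (all names invented, pool internals omitted; not lldb's actual TaskPool API). The key property is the one-way dependency: high-level tasks may hand work to the low-level pool and wait for it, but low-level tasks never wait on the high-level pool, which is what guarantees forward progress and avoids the deadlock described in the quoted message.

  #include <algorithm>
  #include <functional>
  #include <thread>

  class FixedTaskPool {
  public:
    explicit FixedTaskPool(unsigned num_threads) : num_threads_(num_threads) {}
    void AddTask(std::function<void()> task); // queue work for the worker threads
    void Wait();                              // block until the queue drains
  private:
    unsigned num_threads_;
    // ...worker threads, queue, mutex, condition variable...
  };

  // High-level pool: a small number of coarse tasks (e.g. "load one shared
  // module"). Exact sizing policy TBD; half the cores is just a placeholder.
  FixedTaskPool &GetHighLevelPool() {
    static FixedTaskPool pool(std::max(1u, std::thread::hardware_concurrency() / 2));
    return pool;
  }

  // Low-level pool: full hardware concurrency for the fine-grained work that
  // the coarse tasks fan out into.
  FixedTaskPool &GetLowLevelPool() {
    static FixedTaskPool pool(std::thread::hardware_concurrency());
    return pool;
  }

  // Allowed usage pattern: a high-level task enqueues into the low-level pool.
  //   GetHighLevelPool().AddTask([] {
  //     GetLowLevelPool().AddTask([] { /* parse symbols, etc. */ });
  //   });
  // The reverse (a low-level task waiting on the high-level pool) is forbidden.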