On Wed, Jun 5, 2019 at 10:42 PM 김규래 <msc...@naver.com> wrote:
> > On Wed, Jun 5, 2019 at 9:25 PM 김규래 <msc...@naver.com> wrote:
> > >
> > > Hi, thanks for the detailed explanation.
> > > I think I now get the picture.
> > > Judging from my current understanding, the task-parallelism currently works as follows:
> > > 1. Tasks are placed in a global shared queue.
> > > 2. Workers consume the tasks by bashing the queue in a while loop, just as in self-scheduling (dynamic scheduling).
> > >
> > > Then the improvements including work-stealing must be done by:
> > > 1. Each worker holds a dedicated task queue, reducing the resource contention.
> > > 2. The tasks are distributed in a round-robin fashion.
> >
> > For nested task submission (does OpenMP support that?) you probably
> > want to submit to the local queue rather than round-robin, no?
>
> Hi Janne,
> I believe you're talking about spawning child tasks within an already running task, which I believe is supported by OpenMP.
> In this case, pushing to the local queue could introduce a deadlock if the master thread waits on the spawned tasks.
> A short one, though, since work-stealing can resolve the deadlock.
> A better way to handle this is to round-robin the child tasks to the queues, excluding the queue of the thread consuming the current task.
> Then waiting on the tasks should never cause a deadlock.
> Or did I misunderstand your question?
> Maybe there is a specific reason for avoiding round-robin that I missed?

Better cache locality.
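
To make that concrete, here's a rough standalone sketch in plain C11 with pthreads (not libgomp code, and all names -- task_t, submit_task, worker_main and so on -- are made up for illustration): submissions round-robin over every queue except the submitter's own, and an idle worker falls back to stealing from its peers.

/* Rough sketch, not libgomp code: per-worker task queues with round-robin
   submission that skips the submitting worker's own queue, plus simple
   work stealing.  All names here are hypothetical.
   Build with: cc -std=c11 -pthread sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NWORKERS 4
#define QCAP 1024

typedef struct { void (*fn)(void *); void *arg; } task_t;

typedef struct {
    task_t buf[QCAP];
    int head, tail;              /* pop at head, push at tail */
    pthread_mutex_t lock;
} queue_t;

static queue_t q[NWORKERS];
static atomic_int rr;            /* round-robin cursor for submissions */
static atomic_int pending;       /* tasks submitted but not yet executed */

static int push(queue_t *qu, task_t t)
{
    int ok = 0;
    pthread_mutex_lock(&qu->lock);
    if (qu->tail - qu->head < QCAP) { qu->buf[qu->tail++ % QCAP] = t; ok = 1; }
    pthread_mutex_unlock(&qu->lock);
    return ok;
}

static int pop(queue_t *qu, task_t *t)
{
    int ok = 0;
    pthread_mutex_lock(&qu->lock);
    if (qu->head < qu->tail) { *t = qu->buf[qu->head++ % QCAP]; ok = 1; }
    pthread_mutex_unlock(&qu->lock);
    return ok;
}

/* Distribute to every queue except the submitter's own, so a worker that
   later waits on its children is never the only thread holding them. */
static void submit_task(int self, task_t t)
{
    atomic_fetch_add(&pending, 1);
    for (;;) {
        int i = (unsigned)atomic_fetch_add(&rr, 1) % NWORKERS;
        if (i != self && push(&q[i], t))
            return;
    }
}

static void hello(void *arg) { printf("task %ld\n", (long)(intptr_t)arg); }

static void *worker_main(void *arg)
{
    int self = (int)(intptr_t)arg;
    task_t t;
    while (atomic_load(&pending) > 0) {
        int got = pop(&q[self], &t);           /* local queue first */
        for (int i = 0; !got && i < NWORKERS; i++)
            if (i != self)
                got = pop(&q[i], &t);          /* otherwise try to steal */
        if (got) {
            t.fn(t.arg);
            atomic_fetch_sub(&pending, 1);
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t th[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_mutex_init(&q[i].lock, NULL);
    /* Pretend the main thread is worker 0: its submissions land only in
       the other workers' queues, and worker 0 gets work by stealing. */
    for (long i = 0; i < 16; i++)
        submit_task(0, (task_t){ hello, (void *)(intptr_t)i });
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&th[i], NULL, worker_main, (void *)(intptr_t)i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(th[i], NULL);
    return 0;
}

Because a worker is never the only thread holding the children it spawned, waiting on them can't deadlock even before any stealing happens; stealing still does the load balancing, and a real runtime would of course block or yield instead of spinning on the pending counter.
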
Another option, which I guess starts to go beyond the scope of your GSoC project, is parallel depth first (PDF) search (Blelloch 1999) as an alternative to work stealing. Here's a presentation about some recent work in this area, although for Julia and not OpenMP (no idea if PDF would fit with OpenMP at all): https://www.youtube.com/watch?v=YdiZa0Y3F3c

--
Janne Blomqvist
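
In case a picture in code helps: below is a toy, single-threaded C sketch of what I understand to be the core rule of a greedy PDF schedule -- at every step, run the up-to-P ready tasks that come earliest in the sequential depth-first order, which keeps the working set close to what the sequential program would touch. Real PDF schedulers are parallel and considerably more involved; the spawn tree, P, and all names here are invented purely for illustration.

/* Toy, single-threaded illustration of a greedy parallel depth-first (PDF)
   schedule on a small binary spawn tree: each step runs the up-to-P ready
   tasks earliest in sequential depth-first order.  Purely illustrative,
   unrelated to libgomp.  Build with: cc -std=c99 pdf_toy.c */
#include <stdio.h>

#define P 2                       /* simulated number of workers */
#define MAXN 64

struct node { int left, right; }; /* child task ids, -1 if none */

static struct node tree[MAXN];
static int nnodes;

/* Build a complete binary spawn tree; ids are handed out in sequential
   depth-first order, so an id doubles as the task's DF priority. */
static int build(int depth)
{
    int id = nnodes++;
    tree[id].left  = depth > 1 ? build(depth - 1) : -1;
    tree[id].right = depth > 1 ? build(depth - 1) : -1;
    return id;
}

int main(void)
{
    int ready[MAXN], nready = 0, step = 0;

    build(4);                     /* 15 tasks, task 0 is the root */
    ready[nready++] = 0;          /* only the root is ready at first */

    while (nready > 0) {
        int spawned[MAXN], nspawned = 0;
        printf("step %d:", ++step);
        /* Run the up-to-P ready tasks with the smallest DF priority. */
        for (int w = 0; w < P && nready > 0; w++) {
            int best = 0;
            for (int i = 1; i < nready; i++)
                if (ready[i] < ready[best])
                    best = i;
            int id = ready[best];
            ready[best] = ready[--nready];
            printf(" task%d", id);
            if (tree[id].left  >= 0) spawned[nspawned++] = tree[id].left;
            if (tree[id].right >= 0) spawned[nspawned++] = tree[id].right;
        }
        /* Children only become ready for the next step. */
        for (int i = 0; i < nspawned; i++)
            ready[nready++] = spawned[i];
        printf("\n");
    }
    return 0;
}

The hard part in a real implementation is, of course, maintaining those depth-first priorities cheaply and concurrently rather than with a global scan like this toy does.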