> > Implementation of competing systems:
> > The intel OpenMP implementation [1] is simpler.
> > It uses a single queue for each thread and a single subroutine for 
> > dequeuing and executing the tasks  [2, 3].
> > The taskgroup tasks and childen tasks are only counted (not queued) [4, 5, 
> > 6].
> > So the taskwait or barrier like constructs only have to check whether all 
> > the tasks of interest were computed.
> > This unifies the task queuing system and makes scheduling much simpler.

> I see the per-thread locks td_deque_lock in there, does it have anything
> like the per team lock (or per taskgroup lock) for some cases or not?
> What kind of locking does it use for task dependencies?  E.g. a task could
> depend on tasks running in multiple threads and some further tasks still
> queued in various threads' queues (plus some not yet satisfied tasks), for
> the dependencies certainly in libgomp data structures all the bookkeeping
> can't be done just with atomic accesses, so some mutex is needed. 
 
It doesn't seem like there are such locks.
Everything done outside of queue accesses are atomic count reads.
The dependencies, however, are handled in an interesting way.
They create a mutex for each dependency and try to acquire them before 
selecting a task [1, 2, 3].
 
> Where will a new task be placed when created?  Pick a random thread, take
> that thread's lock and put it into that queue?  What about threads that are
> currently sleeping and need to be woken up using gomp_sem_post, do we somehow
> prefer them for the new task placement?  Do we consider priorities just
> within each thread's queue and not globally?  I'm afraid if we want to avoid
> the team->task_lock altogether or as much as possible, we have to and in the
> standard, priority is just a best effort, so it might be fine.
> For the work-stealing, if the current thread doesn't have anything in its
> own queue, will it pick a random other thread (or say remember what it tried
> last or similar)?  And as said earlier, if there is no work to do by the
> current thread, but it needs to wait for some other tasks to complete,
> it shouldn't busy wait forever, but needs to sleep and needs to be awaken if
> there is further work for it or if the condition it is waiting on is finally
> met (gomp_team_barrier_wake, taskwait->taskwait_sem,
> taskgroup->taskgroup_sem).
 
I think this is more of a policy issue.
libomp currently queues the children tasks on the queue of the current thread.
When there's no work to do, (or the current task is waiting on dependencies),
it uniformly chooses a victim, tests the dependencies and steals it.
We could play around with different configurations to see what works once we 
get a running prototype.
There's also some recent literature done in this part (locality-aware victim 
selection stuff).
For the case of the priorities, libomp doesn't actually seem to care about 
priority at all.
Since the OpenMP standard doesn't require strict priorities, I think we can 
just relax the priority constraints on a per-thread basis. 
 
> Another option, which I guess starts to go out of scope of your gsoc, is
> parallel depth first (PDF) search (Blelloch 1999) as an alternative to work
> stealing. Here's a presentation about some recent work in this area,
> although for Julia and not OpenMP (no idea if PDF would fit with OpenMP at
> all): https://www.youtube.com/watch?v=YdiZa0Y3F3c 
 
I think PDF is also an interesting idea.
It seems the Julia fellows implemented PDF by assigning higher priorities to 
children tasks.
If that is the case, it should be trivial to implement since we already use 
prioirity queues (and libomp doesn't)
 
> > What to do on libgomp:
> > I think we should follow a similar path to libomp.
> > Instead of using 3 different queues, we could simply use one and only count 
> > the tasks of interest.

> We already do have such counts at least in some cases, e.g.
> taskgroup->num_children and sure, new ones can be added.
> The replacement of the 3 queues with either a single one or
> a per implicit task is the easy part. 
 
I'll first try to get libgomp work without the child and taskgroup queues.
 
Ray Kim
 
[1] Check done before stealing/executing a task. 
https://github.com/llvm-mirror/openmp/blob/a31ae2e44a8f91ca20aa03319603829d994a1635/runtime/src/kmp_tasking.cpp#L2721
[2] Acquirement of all the dependency mutexes. 
https://github.com/llvm-mirror/openmp/blob/a31ae2e44a8f91ca20aa03319603829d994a1635/runtime/src/kmp_tasking.cpp#L284
[3] Construction of mutexes for each dependency. 
https://github.com/llvm-mirror/openmp/blob/7e084d9fdffa07d74a4d4769c3681fbfbe86e6f3/runtime/src/kmp_taskdeps.cpp#L237

Reply via email to