On Tue, Apr 5, 2011 at 3:01 PM, Sho Nakatani <dev.laysak...@gmail.com> wrote:
> From: Jakub Jelinek <ja...@redhat.com>
> Date: Tue, 5 Apr 2011 08:33:23 +0200
>
>> On Tue, Apr 05, 2011 at 11:05:01AM +0900, Sho Nakatani wrote:
>>>   - When task A encounters a "#pragma omp task" directive, the worker
>>>     creates a task and immediately executes it, pushing A onto the head
>>>     of its deque.
>>
>> Immediately starting a freshly created task on #pragma omp task is nice
>> for cache locality, ...
>
> Yes. Lazy Task Creation also has another good aspect.
> Look at the pictures below. Both show the call tree of fibonacci(8),
> one compiled by gcc and the other by icc.
>
> (GCC)
> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-gcc.png
> (ICC)
> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-icc.png
>
> Nodes with the same color represent tasks processed on the same
> thread (or core).
> These pictures show that the program compiled by icc, which might use
> Lazy Task Creation, executes the nodes of a subtree on the same core,
> while the one compiled by gcc doesn't.
> Of course, fib(8) doesn't clearly show whether Lazy Task Creation
> really helps programs that need sophisticated parallelization, but
> other research, such as
>    https://iwomp.zih.tu-dresden.de/downloads/runtime-olivier.pdf
> also points this out.
>
> My teacher and senior colleagues know Lazy Task Creation well, and I
> also learned about it from a Japanese paper.
> My teacher recommended a related paper (in English) to me.
> Please google for it and take a look if you have time:
>    "Lazy Task Creation: A Technique for Increasing the Granularity of
>     Parallel Programs"
>
>
>> Immediately starting a freshly created task on #pragma omp task is nice
>> for cache locality, except it doesn't work at all for tied tasks which
>> you can't move to other threads.  For tied tasks (and GCC currently has
>> all tasks tied) it serializes everything.
>
> Then, how about implementing `untied tasks' as a GSoC project?
> If it goes well, I'll tackle better `tied task' scheduling as well.
> What do you think about this idea?
> I'd like to write my proposal tomorrow :-)

I think this could indeed be quite interesting, and a much needed
improvement to GCC's OpenMP runtime. The overall idea behind Lazy Task
Creation looks good.
However, I would recommend starting with bibliographical work. I'm
certain this solution could be an improvement over the current state,
but it would be nice to know how it compares to other techniques, and
also if there are slowdowns in less dynamic and unbalanced cases.
The work I'm aware of in this area is "Evaluation of OpenMP task
scheduling strategies" by Duran et al
(http://www.sarc-ip.org/files/null/Workshop/1234128788173__TSchedStrat-iwomp08.pdf).
It could be a reasonable starting point.

You may also want to make sure the scheduling policy it enforces
cannot interfere with OpenMP semantics. For example, how would lazy
task creation behave if a worker steals a bottom-of-the-stack task
that contains a taskwait, while the tasks bound to this synchronization
are left in the other worker's deque? Deadlocks or plain performance
problems could arise. I can't see any obvious issue, but it's worth
thinking about.

I believe the GSoC idea for working on the libGOMP scheduler is great!

Antoniu
