On Tue, Apr 5, 2011 at 3:01 PM, Sho Nakatani <dev.laysak...@gmail.com> wrote:
> From: Jakub Jelinek <ja...@redhat.com>
> Date: Tue, 5 Apr 2011 08:33:23 +0200
>
>> On Tue, Apr 05, 2011 at 11:05:01AM +0900, Sho Nakatani wrote:
>>> - When task A encounters the "#pragma omp task" directive, the worker creates
>>>   a task and immediately executes it. The worker pushes A onto the head of its
>>>   deque.
>>
>> Immediately starting a freshly created task on #pragma omp task is nice for
>> cache locality, ...
>
> Yes. Lazy Task Creation also has another good property.
> Look at the pictures below. Both show the call tree of fibonacci(8).
> One is compiled by gcc and the other by icc.
>
> (GCC)
> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-gcc.png
> (ICC)
> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-icc.png
>
> Nodes with the same color represent tasks processed on the same thread (or core).
> The pictures show that the program compiled by icc, which might use Lazy Task
> Creation, executes the nodes of a subtree on the same core, while the one
> compiled by gcc does not.
> Of course, fib(8) alone does not clearly show whether Lazy Task Creation really
> helps programs that need sophisticated parallelization, but other research, such as
> https://iwomp.zih.tu-dresden.de/downloads/runtime-olivier.pdf
> also points this out.
>
> My teacher and senior associates know Lazy Task Creation well, and I also learned
> about it from some papers in Japanese.
> My teacher recommended a paper related to Lazy Task Creation (in English).
> Please google the title and take a look at it if you have time:
> "Lazy Task Creation: A Technique for Increasing the Granularity of
>  Parallel Programs"
>
>
>> Immediately starting a freshly created task on #pragma omp task is nice for
>> cache locality, except it doesn't work at all for tied tasks which you can't
>> move to other threads. For tied tasks (and GCC currently has all tasks tied)
>> it serializes everything.
>
> Then, how about implementing `untied task' as a GSoC project?
> If it goes well, I'll also tackle a better `tied task' implementation.
> What do you think about this idea?
> I'd like to write my proposal tomorrow :-)
I think this could indeed be quite interesting, and a much-needed improvement to GCC's OpenMP runtime. The overall idea behind Lazy Task Creation looks good.

However, I would recommend starting with some bibliographical work. I'm certain this solution could be an improvement over the current state, but it would be nice to know how it compares to other techniques, and also whether there are slowdowns in less dynamic and less unbalanced cases. The work I'm aware of in this area is "Evaluation of OpenMP task scheduling strategies" by Duran et al. (http://www.sarc-ip.org/files/null/Workshop/1234128788173__TSchedStrat-iwomp08.pdf). It could be a reasonable starting point.

You may also want to make sure the scheduling policy it enforces cannot interfere with OpenMP semantics. For example, how would lazy task creation behave if a worker steals a bottom-of-the-stack task that has a taskwait, while the tasks bound to that synchronization are left in the other worker's deque? Deadlock issues, or just plain performance issues, can arise. I can't see any obvious problems, but it's worth thinking about.

I believe the GSoC idea of working on the libgomp scheduler is great!

Antoniu
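
P.S. To make the taskwait point concrete, below is a minimal sketch of the task-parallel
fib pattern discussed above. The kernel and the use of the "untied" clause are my own
illustration (plain C built with -fopenmp), not taken from the benchmark sources linked
in the thread. The parent task reaches the taskwait while its children may still sit in
another worker's deque, which is exactly the situation a work-first scheduler has to
handle:

  /* Illustrative sketch only; the fib kernel and "untied" clauses are
     assumptions, not the actual benchmark behind the call-tree pictures.  */
  #include <stdio.h>

  static long fib(int n)
  {
    long a, b;
    if (n < 2)
      return n;

    /* Each recursive call becomes a task.  With "untied", the parent
       suspended at the taskwait below may later be resumed by a different
       thread; GCC currently treats all tasks as tied.  */
  #pragma omp task shared(a) untied
    a = fib(n - 1);
  #pragma omp task shared(b) untied
    b = fib(n - 2);

    /* The parent waits for its two children here.  If the parent was
       stolen while the children are still queued on another worker's
       deque, the scheduler must still be able to make progress.  */
  #pragma omp taskwait
    return a + b;
  }

  int main(void)
  {
    long r;
  #pragma omp parallel
  #pragma omp single
    r = fib(8);   /* same input as the call-tree pictures above */
    printf("fib(8) = %ld\n", r);
    return 0;
  }

As far as I understand the task scheduling constraint, a thread suspended at that
taskwait on a tied task may only pick up descendants of its suspended tasks, so what
happens once the parent has been stolen is precisely what the untied-task work would
need to sort out.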