Re: Give me advice on GSoC OpenMP
On Tue, Apr 5, 2011 at 3:01 PM, Sho Nakatani wrote:
> From: Jakub Jelinek
> Date: Tue, 5 Apr 2011 08:33:23 +0200
>
>> On Tue, Apr 05, 2011 at 11:05:01AM +0900, Sho Nakatani wrote:
>>> - When task A encounters the "#pragma omp task" directive, the worker
>>>   creates a task and immediately executes it. The worker pushes A to the
>>>   head of its deque.
>>
>> Immediately starting a freshly created task on #pragma omp task is nice
>> for cache locality, ...
>
> Yes. Also, Lazy Task Creation has another good aspect.
> Look at the pictures below. Both show the call tree of fibonacci(8).
> One is compiled by gcc and the other by icc.
>
> (GCC)
> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-gcc.png
> (ICC)
> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-icc.png
>
> Nodes with the same color represent tasks processed on the same thread
> (or core). These pictures show that the program compiled by icc, which
> might use Lazy Task Creation, executes the nodes of a subtree on the same
> core, while the one compiled by gcc doesn't.
> Of course, fib(8) doesn't clearly show whether Lazy Task Creation really
> helps programs that need sophisticated parallelization, but other
> research, like
> https://iwomp.zih.tu-dresden.de/downloads/runtime-olivier.pdf
> also points this out.
>
> My teacher and senior colleagues know Lazy Task Creation well, and I also
> learned about it from some Japanese papers.
> My teacher recommended a paper on Lazy Task Creation (in English).
> Please google the title and take a look at it if you have time:
> "Lazy Task Creation: A Technique for Increasing the Granularity of
> Parallel Programs"
>
>> Immediately starting a freshly created task on #pragma omp task is nice
>> for cache locality, except it doesn't work at all for tied tasks, which
>> you can't move to other threads. For tied tasks (and GCC currently has
>> all tasks tied) it serializes everything.
>
> Then, how about implementing `untied task' as a GSoC project?
> If it goes well, I'll tackle a better `tied task' as well.
> What do you think about this idea?
> I'd like to write my proposal tomorrow :-)

I think this could indeed be quite interesting, and a much needed
improvement to GCC's OpenMP runtime. The overall idea behind Lazy Task
Creation looks good.

However, I would recommend starting with bibliographical work. I'm certain
this solution could be an improvement over the current state, but it would
be nice to know how it compares to other techniques, and also whether it
causes slowdowns in less dynamic and unbalanced cases. The work I'm aware
of in this area is "Evaluation of OpenMP Task Scheduling Strategies" by
Duran et al.
(http://www.sarc-ip.org/files/null/Workshop/1234128788173__TSchedStrat-iwomp08.pdf).
It could be a reasonable starting point.

You may also want to make sure the scheduling policy it enforces cannot
interfere with OpenMP semantics. For example, how would lazy task creation
behave if a worker steals a bottom-of-the-stack task that has a taskwait
while the tasks bound to that synchronization are left in the other
worker's deque? Deadlock issues or plain performance issues can arise.
I can't see any obvious problems, but it's worth thinking about.

I believe the GSoC idea of working on the libgomp scheduler is great!

Antoniu
Re: Give me advice on GSoC OpenMP
On Tue, Apr 5, 2011 at 4:37 PM, Jakub Jelinek wrote:
> On Tue, Apr 05, 2011 at 03:45:26PM +0200, Antoniu Pop wrote:
>> On Tue, Apr 5, 2011 at 3:01 PM, Sho Nakatani wrote:
>> > From: Jakub Jelinek
>> > Date: Tue, 5 Apr 2011 08:33:23 +0200
>> > Yes. Also, Lazy Task Creation has another good aspect.
>> > Look at the pictures below. Both show the call tree of fibonacci(8).
>> > One is compiled by gcc and the other by icc.
>> >
>> > (GCC)
>> > https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-gcc.png
>> > (ICC)
>> > https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-icc.png
>
> As your testcase doesn't have untied keywords, I doubt ICC uses such
> scheduling for it.
> See also the paper Antoniu referenced, e.g. figure 7 in there.
>
> If you want to do something about task scheduling as a GSoC project,
> you are certainly welcome, but starting with the idea that you want to
> implement Lazy Task Creation is probably not a good idea. You want to
> read a bunch of papers, download the Mercurium project with its runtime
> OpenMP library, read it, see what they actually implement and how, and
> whether they still have multiple strategies implemented in there; see
> how they perform on the various OpenMP task testcases (we have some in
> the libgomp testsuite, but the authors of Mercurium also have some
> testcases, called BOTS), then try to implement some changes in libgomp/
> and see how it performs on various testcases with various parameters.

Indeed, the Lazy Task Creation scheme is 20 years old. That does not mean
it isn't good, but it's very likely that much more has been done in that
area since.

>> You may also want to make sure the scheduling policy it enforces
>> cannot interfere with OpenMP semantics. For example, how would lazy
>> task creation behave if a worker steals a bottom-of-the-stack task
>> that has a taskwait while the tasks bound to that synchronization are
>> left in the other worker's deque? Deadlock issues or plain performance
>> issues can arise. I can't see any obvious problems, but it's worth
>> thinking about.
>
> Tied tasks have quite strict rules on what can actually be scheduled:
> a new task to be scheduled must be a descendant of all the tasks tied
> to the current thread that aren't waiting in a barrier.

Yes, but I don't think Sho means to implement that for tied tasks. I agree
that respecting the standard for tied tasks would preclude using this
scheduling model.

Antoniu
Re: Give me advice on GSoC OpenMP
> Thanks a lot for your kind advice.
> Then my latest plan is:
> - Learn other implementations (as Antoniu said)
> - Learn the Mercurium implementation
> - Implement the same/similar feature as Mercurium in libgomp/,
>   then evaluate it
>   (This is the goal for the GSoC project)
>
> However, I currently know nothing about Mercurium.
> (Of course I'll read the documentation of Mercurium from now on.)
> Does Mercurium implement both `tied task' and `untied task'?

As far as I know (I have not verified the source), they do.

> Should my goal be implementing both of them?

That would be great, though probably not an easy feat. Maybe just start
with tied for now and see if you have enough time.

Antoniu
Re: Give me advice on GSoC OpenMP
>>>> As far as I know (I have not verified the source), they do.
>>
> That's important information.
> If you remember the source, please tell me about it.

Sorry, I meant that I hadn't verified the "source code" :)
Jakub is right though: the paper clearly presents performance results on
tied vs. untied tasks, so this confirms they have both.

>>> Should my goal be implementing both of them?
>>
>> That would be great, though probably not an easy feat. Maybe just
>> start with tied for now and see if you have enough time.
>
> Yeah. Although it is difficult to implement both tied and untied tasks,
> I'll keep working on them after the GSoC term finishes.

Great to hear that. I hope you get the GSoC slot!

Antoniu
Re: OpenMP 3.0 libgomp ABI documentation for TASK construct
Hi Tim,

> From the gcc online docs (http://gcc.gnu.org/onlinedocs/libgomp/), I
> found documentation for most of the OpenMP constructs, except one very
> important construct: TASK.

I cannot answer this. It may be that the documentation was written before
tasks (which were introduced in a later version of OpenMP) were added to
GCC's OpenMP support.

> I don't know why it is missing, but I really need to find out how TASK
> gets transformed into GOMP_* routines. I posted this question before,
> but haven't got a reply yet. I would be very grateful if someone could
> point me in the right direction. For example, is there a pretty-print
> feature for the AST after processing the OpenMP pragmas? If so, how do
> I use it? Of course, if someone has the expertise and can directly show
> me how TASK gets transformed, that would be even better!

For the pretty-print, just use -fdump-tree-ompexp-all on the compile line
and look for the generated *.c.*ompexp file. It contains the dump just
after OpenMP expansion.

If you need to check out the code generation routines yourself, take a
look at gcc/omp-low.c. Almost everything happens there. There are two
passes: OpenMP lowering, then expansion. The generation of the GOMP_*
routine calls happens during the expansion pass, so you should start from
the "expand_omp_taskreg" function (in gcc/omp-low.c).

Best,
Antoniu
Re: Automatic Parallelization & Graphite - future plans
> I'd like to explore distributing threads across a heterogeneous NUMA
> architecture, i.e. input/output data would have to be transferred
> explicitly, and the compiler would have to have more than one backend.

I'm currently working on something that looks quite similar, in the
"streamization" branch. The gist is that tasks (or threads) with no access
to shared memory communicate through streams (i.e. input/output channels).
I'm using OpenMP annotations to help in the analysis, but they are not a
requirement.

> Would such work be appropriate for an existing branch, or should I
> rather work on my own branch?

The multiple-backend compilation is not directly related, so you should
use a separate branch. It makes sense to go in that direction.

> And do the current autoparallelization algorithms find or propagate
> sufficient alias information (not always, obviously, but at least
> sometimes) to determine whether offloading a job to another processor
> with separate memory is safe and likely to be worthwhile?

For safety, what matters is that no data dependences are violated. Alias
analysis is used to determine whether such dependences exist. The analysis
will not always be able to give a definite yes or no on the presence of
such dependences, but it is conservative, so if it says there are none,
you are safe. If the code is nasty, it will probably just decide that it
clobbers memory and reject it.

For the worthwhile part, it depends on many things: the communication
latencies and bandwidths, each node's computational capabilities, the
task's or thread's workload (or rather its arithmetic intensity), ...
I tend to believe this information is not currently available, and it
would probably be a most interesting addition.

Antoniu
Re: OpenMP to GPGPU languages
Hi,

> I would like to do backend work to convert OpenMP to GPGPU
> languages (Brook+).

I'm not sure this is best suited for backend work. The OpenMP pragma
expansion is done very early, and the decision to offload parts of the
computation to the GPGPU would probably need to be taken before the
expansion happens.

> Could you please send me pointers to documents which explain the
> source code details of OpenMP backend code generation?

AFAIK there is no specific backend code generation for OpenMP, nor the
possibility to compile for both the CPU and the GPGPU in one go. If you
want to look at the OpenMP code generation, the best way is to read the
code in gcc/omp-low.c.

> As a side point, I am very new to GCC development.

Welcome then :)

Antoniu
Re: OpenMP to GPGPU languages
> [Naganna] I would like to know all the phases of GCC OpenMP (parser
> level, intermediate representation, code generation, and the OpenMP
> runtime library).
>
> Could you please point to documents which explain the workflow of GOMP?

The GOMP project page: http://gcc.gnu.org/projects/gomp/
The only article AFAIK on the topic: http://www.airs.com/dnovillo/Papers/gcc2006.pdf

Antoniu
Re: GCC and OpenMP
Hi,

> I am currently working on installing OpenMP (2.5 or 3.0 specification)
> on my Linux machine (GCC 4.1.2, SuSE 10.2). It requires at least GCC
> 4.3. It seems that I need to upgrade to GCC 4.3.1 or GCC 4.4 from my
> current GCC 4.1.2. Which GCC version do you suggest in order to use
> OpenMP on my Linux machine?

For the time being, OpenMP 2.5 is supported in GCC 4.3, so that's what I'd
recommend. If you need OpenMP 3.0 features, you will have to wait for GCC
4.4; otherwise GCC 4.3.1 would be the best choice.

Best,
Antoniu