Antoniu Pop <antoniu....@gmail.com> wrote on 18/03/2009 18:55:52:

> > I'd like to explore distributing threads across a heterogenous NUMA
> > architecture.  I.e. input/output data would have to be transferred
> > explicitly, and the compiler would have to have more than one backend.
> 
> I'm currently working on something that looks quite similar, in the
> "streamization" branch. The gist is that tasks (or threads), with no
> access to shared memory, communicate through streams (or input/output
> channels). I'm using OpenMP annotations to help in the analysis, but
> they are not a requirement.
> 
> > Would such work be appropriate for an existing branch, or should I better
> > work on my own branch for that?
> 
> The multiple backends compilation is not directly related, so you
> should use a separate branch. It makes sense to go in that direction.
> 
> > And do the current autoparallelization algorithms find or propagate
> > sufficient alias information (not always, obviously, but at least
> > sometimes) to determine if offloading a job to another processor with
> > separate memories is safe and likely to be worthwhile?
> 
> For the safety, what matters is that no data dependences are violated.
> Alias analysis will be used to determine whether such dependences
> exist.
> The analysis will not be able to always tell you yes or no for the
> presence of such dependences, but it's conservative, so if it says
> there are none, then you're safe. If the code is nasty, it will
> probably just decide that it clobbers memory and reject it.

Yes. The current automatic parallelization in GCC distributes iterations
to different threads only if there are no dependences in the loop, except
for reduction dependences.

> 
> For the worthwhile part, it depends on many things ... the
> communication latencies and bandwidths, each node's computational
> capabilities, the task or thread's workload (or rather the arithmetic
> intensity) ...  I would tend to believe that this is not available and
> it would probably be a most interesting addition.

Yes, definitely.
In the current implementation the profitability part is very naive:
a loop is parallelized if it is hot enough according to profiling
information, and if there are enough iterations to justify creating new
threads. The number of threads is determined by the user, not by any
heuristic.

Razya

> 
> Antoniu
