Hi Elias,
I believe we should first find out how big the thread dispatch effort
actually is, because coalescing can also backfire by creating unequally
distributed intermediate results.
For scalar functions you have a parallel execution time of:
a + b×⌈N÷P where a = startup time (thread dispa
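A minimal sketch of the timing model above, to make the role of the startup term concrete. Only a (startup time) is defined in the visible text; reading b as the per-element cost, N as the element count and P as the thread count is an assumption, as is the sequential cost b×N used for comparison. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Parallel time a + b * ceil(N / P).
// Assumed meanings: a = startup (thread dispatch) cost, b = cost per element,
// n = number of elements, p = number of threads.
double parallel_time(double a, double b, std::uint64_t n, std::uint64_t p) {
    assert(p > 0);
    const std::uint64_t per_thread = (n + p - 1) / p;  // integer ceiling of n / p
    return a + b * static_cast<double>(per_thread);
}

// Assumed sequential cost: no startup term, every element on one thread.
double sequential_time(double b, std::uint64_t n) {
    return b * static_cast<double>(n);
}

// Smallest n in [1, limit] for which the parallel model is strictly faster
// than the sequential one; returns 0 if there is none.
std::uint64_t break_even(double a, double b, std::uint64_t p, std::uint64_t limit) {
    for (std::uint64_t n = 1; n <= limit; ++n) {
        if (parallel_time(a, b, n, p) < sequential_time(b, n)) return n;
    }
    return 0;
}
```

With a = 100, b = 1 and P = 4, for example, break_even reports N = 135: below that length the dispatch cost outweighs the gain, which is why measuring a first matters.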
Oh, and one more thing: have you given any thought to my comments regarding
the coalescing of certain functions to reduce thread dispatch effort (and
to adding some more functions to the no-copy optimisation)?
Regards,
Elias
On 11 March 2014 23:22, Elias Mårtenson wrote:
> I agree. I just wanted to poin
Thanks, Jürgen.
I'll try to work up some test cases this week.
In my quick scan of the OpenMP document yesterday, I noted that there
are different strategies for assigning work to threads. As with just
about everything else in OpenMP, the strategy is configurable.
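For illustration, here is a rough sketch of how the work-assignment strategy is chosen in OpenMP through the schedule clause. The function and enum names are hypothetical and this is not GNU APL code; the element-wise sqrt merely stands in for a scalar function. The pragmas are guarded so that the code also builds without OpenMP support, in which case the loops simply run sequentially.

```cpp
#include <cmath>
#include <vector>

enum class Strategy { Static, Dynamic, Runtime };  // hypothetical names

// Applies sqrt element-wise; only the schedule clause differs per case.
std::vector<double> sqrt_all(const std::vector<double>& in, Strategy s) {
    std::vector<double> out(in.size());
    const long n = static_cast<long>(in.size());
    switch (s) {
    case Strategy::Static: {
        // Iterations are divided into contiguous blocks up front.
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
        for (long i = 0; i < n; ++i) out[i] = std::sqrt(in[i]);
        break;
    }
    case Strategy::Dynamic: {
        // Threads fetch chunks of 1024 iterations as they become idle.
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic, 1024)
#endif
        for (long i = 0; i < n; ++i) out[i] = std::sqrt(in[i]);
        break;
    }
    case Strategy::Runtime: {
        // Strategy is taken from the OMP_SCHEDULE environment variable
        // (or from omp_set_schedule) when the program runs.
#ifdef _OPENMP
#pragma omp parallel for schedule(runtime)
#endif
        for (long i = 0; i < n; ++i) out[i] = std::sqrt(in[i]);
        break;
    }
    }
    return out;
}
```

All three variants compute the same result; they differ only in how iterations are handed to threads, which is the part worth covering in test cases.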
My initial thought for putting
I agree. I just wanted to point out that without a runtime option,
delivering binary versions will be hard, forcing the package maintainers to
choose a default that will surely be wrong for the majority of users.
That said, being able to choose a compile-time value is good too.
Regards,
Elias
O
Hi,
we could do it similarly to the LOG macro, where you can choose between
more efficient compile-time settings and less efficient run-time settings.
It is important that we do these things properly from the outset to avoid
too many changes later on.
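A possible shape for such a switch, applied to the core count, is sketched below. STATIC_CORE_COUNT, core_count and core_count_setting are hypothetical names chosen for this illustration, not existing GNU APL identifiers, and the sketch does not reproduce how the LOG macro itself is implemented.

```cpp
#include <thread>

#ifdef STATIC_CORE_COUNT
// Compile-time setting (hypothetical flag): the value is a constant,
// so the compiler can fold it into the callers.
inline unsigned core_count() { return STATIC_CORE_COUNT; }
#else
// Run-time setting: a mutable value that the user can change;
// 0 means "detect automatically".
inline unsigned& core_count_setting() {
    static unsigned value = 0;
    return value;
}

inline unsigned core_count() {
    const unsigned chosen = core_count_setting();
    if (chosen != 0) return chosen;
    // hardware_concurrency() may return 0 when the count is unknown.
    const unsigned detected = std::thread::hardware_concurrency();
    return detected != 0 ? detected : 1;
}
#endif
```

A binary package would be built without the flag, so that users can adjust the value through core_count_setting(); a local build could define STATIC_CORE_COUNT to avoid the run-time lookup.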
/// Jürgen
On 03/11/2014 04:10 PM, Elias Mår
May I suggest that being able to choose the number of cores at runtime
should actually be the default. Remember that most Linux distributions do
not compile the source on the local machine and instead distribute
binaries.
Having some #ifdefs would be good, and having runtime user-selected (or
a
Hi David,
looks good! Some comments, though.
1. You could adapt src/testcases/Performance.pt with some longer
scalar functions in order to get some performance figures. You can start
it like this:
./apl -T testcases/Performance.pt
2. I believe we should not bother the user with specifying
p