On 03/27/2014 10:19 AM, Andreas Schäfer wrote:
On 14:26 Wed 26 Mar, Ross Boylan wrote:
[Main part is at the bottom]
On Wed, 2014-03-26 at 19:28 +0100, Andreas Schäfer wrote:
If you have a complex workflow with varying computational loads, then
you might want to take a look at runtime systems which allow you to
express this directly through their API, e.g. HPX[1]. HPX has proven to
run with high efficiency on a wide range of architectures, and with a
multitude of different workloads.
Thanks for the pointer.
I might add that HPX can run on top of MPI, so you could gradually
migrate code towards it.
Another note which is relevant to this discussion:
In HPX we actually do oversubscribe the nodes. The worker threads, which
do the actual computation, are usually pinned to CPU cores (or hardware
threads, depending on your machine and how you want to pin threads). On
those worker threads we then schedule very lightweight user-level tasks,
which run the actual user code. You can have on the order of several
million concurrent HPX-Threads (the user-level tasks) running per node in
an application.
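To make the idea concrete, here is a minimal sketch in plain standard C++.
It is not HPX code and does not use the HPX API: WorkerPool and
run_many_tasks are made-up names, core pinning is left out because it is
platform-specific, and the tasks here simply run to completion, whereas
real user-level threads can typically also suspend and resume. It only
shows a small, fixed number of OS threads working through a much larger
number of tiny tasks.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration only; none of these names come from HPX.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t num_workers) {
        assert(num_workers > 0);
        for (std::size_t i = 0; i < num_workers; ++i) {
            // A real runtime would pin each worker to a core here
            // (platform-specific, omitted in this sketch).
            workers_.emplace_back([this] { run(); });
        }
    }

    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Enqueue one small task; having far more tasks than workers is
    // the expected case.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run the user code outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: starts after the rest
};

// Runs num_tasks tiny tasks on num_workers threads and returns how many ran.
long run_many_tasks(std::size_t num_workers, long num_tasks) {
    std::atomic<long> done{0};
    {
        WorkerPool pool(num_workers);
        for (long i = 0; i < num_tasks; ++i) {
            pool.submit([&done] { done.fetch_add(1, std::memory_order_relaxed); });
        }
    }  // the destructor drains the queue and joins the workers
    return done.load();
}
```

Calling run_many_tasks(4, 100000), for example, pushes 100000 tasks
through 4 OS threads. A single locked queue like this would become a
bottleneck at the task counts mentioned above; it is only meant to show
the split between few OS threads and many tasks.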
In addition to those worker threads, we have dedicated operating system
threads (pinned only to a certain socket or NUMA domain) that are
responsible for the actual communication. This is completely hidden
behind our API, which supports truly asynchronous communication. When
communication runs over MPI or directly on top of (native) ibverbs, those
threads busy-wait on the actual sends and receives. The performance
impact is negligible here, but keep in mind that we put quite some effort
into achieving that.
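As a rough illustration of what busy-waiting on sends and receives means,
here is a small sketch in plain standard C++. Again, this is not HPX code:
PendingOp, poll_until_done and demo_busy_wait are made-up names, and an
atomic flag stands in for a real network request. With MPI, the completion
check would be something like MPI_Test on a request rather than a flag.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a pending send or receive.
struct PendingOp {
    std::atomic<bool> completed{false};
};

// Spins over all pending operations until each one has completed and
// returns how many completions were observed.
std::size_t poll_until_done(std::vector<PendingOp>& ops) {
    std::vector<bool> seen(ops.size(), false);
    std::size_t finished = 0;
    while (finished < ops.size()) {
        for (std::size_t i = 0; i < ops.size(); ++i) {
            if (!seen[i] && ops[i].completed.load(std::memory_order_acquire)) {
                seen[i] = true;
                ++finished;
            }
        }
        // Busy wait: no sleeping or blocking, just poll again.
    }
    return finished;
}

// Starts a dedicated polling thread, then completes the operations from
// the calling thread to simulate the network finishing them.
std::size_t demo_busy_wait(std::size_t num_ops) {
    std::vector<PendingOp> ops(num_ops);
    std::size_t finished = 0;
    std::thread comm([&] { finished = poll_until_done(ops); });
    for (auto& op : ops) {
        op.completed.store(true, std::memory_order_release);
    }
    comm.join();  // join makes the write to finished visible here
    return finished;
}
```

The polling thread never blocks in the kernel, so it notices a completion
as soon as it happens, at the cost of keeping one hardware thread busy.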
Cheers,
Thomas
Cheers
-Andreas
--
Thomas Heller
Friedrich-Alexander-Universität Erlangen-Nürnberg
Department Informatik - Lehrstuhl Rechnerarchitektur