On 03/27/2014 10:19 AM, Andreas Schäfer wrote:
On 14:26 Wed 26 Mar, Ross Boylan wrote:
[Main part is at the bottom]
On Wed, 2014-03-26 at 19:28 +0100, Andreas Schäfer wrote:
If you have a complex workflow with varying computational loads, then
you might want to take a look at runtime systems which allow you to
express this directly through their API, e.g. HPX[1]. HPX has proven to
run with high efficiency on a wide range of architectures, and with a
multitude of different workloads.
Thanks for the pointer.

I might add that HPX can run on top of MPI, so you could gradually
migrate code towards it.

Another note which is relevant to this discussion:
In HPX we actually do oversubscribe the nodes. There are worker threads
which do the actual computations; those are usually pinned to the CPU
cores (or hardware threads, depending on your machine and the way you
want to do your thread pinning). On those worker threads, we then
schedule (very lightweight) user-level tasks which run the actual user
code. You can have on the order of several million concurrent
HPX-Threads (the user-level tasks) running in an application per node.

In addition to those worker threads, we have dedicated operating system
threads (only pinned to a certain socket or NUMA domain), which are
responsible for doing the actual communication. (This is, however,
completely hidden behind our API, which supports truly asynchronous
communication.) In the case you have communication running over MPI or
directly on top of (native) ibverbs, those threads do a busy wait on
the actual sends and receives. The impact on performance is negligible
here. But keep in mind that we put quite some effort into achieving
that.
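
To make the worker-thread idea a bit more concrete, here is a minimal
sketch in plain standard C++. It is not HPX code and does not use the
HPX API: WorkerPool and run_many_tasks are hypothetical names made up
for this illustration. It only shows a small, fixed number of OS
threads working through a much larger number of tiny tasks. Thread
pinning, the dedicated communication threads and real user-level
context switching are all left out.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration, NOT the HPX API: a fixed set of OS worker
// threads drains a shared queue of many small tasks.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t num_workers) {
        assert(num_workers > 0);
        for (std::size_t i = 0; i < num_workers; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Signals shutdown, lets the workers finish the remaining tasks,
    // then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutdown requested, queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run the task outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::vector<std::thread> workers_;
};

// Runs num_tasks tiny tasks on num_workers OS threads and returns how
// many tasks completed.
std::size_t run_many_tasks(std::size_t num_workers, std::size_t num_tasks) {
    std::atomic<std::size_t> completed{0};
    {
        WorkerPool pool(num_workers);
        for (std::size_t i = 0; i < num_tasks; ++i)
            pool.submit([&completed] {
                completed.fetch_add(1, std::memory_order_relaxed);
            });
    }  // pool destructor drains the queue and joins the workers
    return completed.load();
}
```

Calling run_many_tasks(4, 100000), for example, pushes 100000 tasks
through 4 OS threads. A single mutex-protected queue like this one is
just the simplest thing to write down for the sketch and should not be
read as a description of how HPX schedules its threads.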

Cheers,
Thomas



Cheers
-Andreas

--
Thomas Heller
Friedrich-Alexander-Universität Erlangen-Nürnberg
Department Informatik - Lehrstuhl Rechnerarchitektur
Martensstr. 3
91058 Erlangen
Tel.: 09131/85-27018
Fax:  09131/85-27912
Email: thomas.hel...@cs.fau.de
