Note that if your program is synchronous, it will run at the speed of the
slowest task.
(E.g. the tasks on node2, with 1 GB per task, will end up waiting for the
other tasks, which have 2 GB per task.)

You can use MPI_Comm_split_type to create per-node (intra-node)
communicators. Then each task can find how much memory is available to it,
MPI_Gather that on the master task, and use MPI_Scatterv/MPI_Gatherv to
distribute/consolidate the data in proportion to each task's memory.
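
Something along these lines (an untested sketch, not a drop-in solution:
it assumes Linux, where sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGE_SIZE)
approximates the node's physical memory, and it uses a placeholder N and
double-valued inputs; your real inputs/outputs will differ):

#include <mpi.h>
#include <unistd.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* split MPI_COMM_WORLD into one communicator per node
     * (ranks that can share memory) */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_size;
    MPI_Comm_size(node_comm, &node_size);

    /* rough per-task memory share: node physical memory divided by
     * the number of tasks on this node (Linux/POSIX assumption) */
    long mem_per_task = sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGE_SIZE)
                        / node_size;

    /* collect every task's share on rank 0 */
    std::vector<long> mem(world_size);
    MPI_Gather(&mem_per_task, 1, MPI_LONG,
               mem.data(), 1, MPI_LONG, 0, MPI_COMM_WORLD);

    /* rank 0 sizes each task's chunk of the N inputs in proportion
     * to its memory share */
    const long N = 1000000;           /* placeholder total input count */
    std::vector<int> counts(world_size), displs(world_size);
    if (world_rank == 0) {
        long total = 0;
        for (long m : mem) total += m;
        long assigned = 0;
        for (int i = 0; i < world_size; ++i) {
            counts[i] = static_cast<int>(N * mem[i] / total);
            displs[i] = static_cast<int>(assigned);
            assigned += counts[i];
        }
        counts[world_size - 1] += static_cast<int>(N - assigned); /* remainder */
    }

    /* each task learns its own count before the Scatterv */
    int my_count;
    MPI_Scatter(counts.data(), 1, MPI_INT, &my_count, 1, MPI_INT,
                0, MPI_COMM_WORLD);

    std::vector<double> inputs;       /* filled on rank 0 in a real run */
    if (world_rank == 0) inputs.resize(N);
    std::vector<double> my_inputs(my_count);
    MPI_Scatterv(inputs.data(), counts.data(), displs.data(), MPI_DOUBLE,
                 my_inputs.data(), my_count, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ... run the model on my_inputs, then MPI_Gatherv the results
     * back to rank 0 with the same counts/displs ... */

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

You may also want to cap each count at what the task's memory share can
actually hold, rather than splitting purely proportionally.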

Cheers,

Gilles

On Tuesday, June 14, 2016, MM <finjulh...@gmail.com> wrote:

> Hello,
> I have the following three single-socket nodes:
>
> node1:  4GB RAM 2-core: rank 0  rank 1
> node2:  4GB RAM 4-core: rank 2  rank 3 rank 4 rank 5
> node3:  8GB RAM 4-core: rank 6  rank 7 rank 8 rank 9
>
> I have a model that takes an input and produces an output, and I want to
> run this model for N possible combinations of inputs. N is very big and I
> am limited by memory capacity.
>
> I am using collectives on the world communicator, and I want to know how
> to distribute N over the 10 ranks, given the memory specs of each node.
>
> For now, I have simply been dividing N by the number of ranks and
> scattering/gathering that way.
> How can I improve on this without hardcoding the specs in my own C++ code?
>
> thanks,
>
>