Note that if your program is synchronous, it will run at the speed of the slowest task (e.g. the tasks on node2, with 1 GB per task, will wait for the other tasks, which have 2 GB per task).
You can use MPI_Comm_split_type in order to create intra-node communicators. Then each task can find how much memory is available to it, MPI_Gather that on the master task, and use MPI_Scatterv/MPI_Gatherv to distribute/consolidate the data. A minimal sketch is appended below the quoted message.

Cheers,

Gilles

On Tuesday, June 14, 2016, MM <finjulh...@gmail.com> wrote:
> Hello,
> I have the following three 1-socket nodes:
>
> node1: 4GB RAM, 2-core: rank 0, rank 1
> node2: 4GB RAM, 4-core: rank 2, rank 3, rank 4, rank 5
> node3: 8GB RAM, 4-core: rank 6, rank 7, rank 8, rank 9
>
> I have a model that takes an input and produces an output, and I want to
> run this model for N possible combinations of inputs. N is very big and I
> am limited by memory capacity.
>
> I am using collectives on the world communicator and I want to know how
> to distribute N over the 10 ranks, given the memory specs of each node.
>
> For now, I have simply been dividing N by the number of ranks and
> scattering/gathering that way.
> How can I improve this without hardcoding the specs in my own C++ code?
>
> thanks,
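P.S. In case it helps, here is a minimal sketch of that recipe. It is an
untested illustration, not Open MPI-specific code: it assumes Linux (where
sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGE_SIZE) gives the node's physical
RAM) and an example N of 1000000. Sizing each rank's slice in proportion to
its memory share keeps the per-rank memory footprint roughly equal, which is
what you are after; it is one possible policy, not the only one.

#include <mpi.h>
#include <unistd.h>
#include <cstdio>
#include <numeric>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* One communicator per node: MPI_COMM_TYPE_SHARED groups the
       ranks that can share memory, i.e. the ranks on the same node. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED,
                        world_rank, MPI_INFO_NULL, &node_comm);

    int node_size;
    MPI_Comm_size(node_comm, &node_size);

    /* Physical RAM on this node (Linux-specific sysconf queries),
       divided evenly among the ranks sharing the node. */
    long node_mem = (long)sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGE_SIZE);
    long mem_per_rank = node_mem / node_size;

    /* Collect every rank's memory share on the master task (rank 0). */
    std::vector<long> mem(world_size);
    MPI_Gather(&mem_per_rank, 1, MPI_LONG,
               mem.data(), 1, MPI_LONG, 0, MPI_COMM_WORLD);

    const long N = 1000000;  /* example total number of inputs */
    std::vector<int> counts(world_size), displs(world_size);
    if (world_rank == 0) {
        /* Slice N proportionally to each rank's memory share. */
        long total = std::accumulate(mem.begin(), mem.end(), 0L);
        int assigned = 0;
        for (int r = 0; r < world_size; ++r) {
            counts[r] = (int)(N * mem[r] / total);
            displs[r] = assigned;
            assigned += counts[r];
        }
        counts[world_size - 1] += (int)N - assigned;  /* rounding remainder */
    }

    /* Tell each rank its slice size; the inputs themselves would then be
       distributed with MPI_Scatterv and the results collected with
       MPI_Gatherv, reusing the same counts/displs arrays. */
    int my_count = 0;
    MPI_Scatter(counts.data(), 1, MPI_INT, &my_count, 1, MPI_INT,
                0, MPI_COMM_WORLD);
    std::printf("rank %d gets %d of %ld inputs\n", world_rank, my_count, N);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

With your three nodes this would hand the four ranks on node2 (1 GB each)
half as many inputs as the six ranks on node1 and node3 (2 GB each), with
nothing hardcoded: the counts follow whatever sysconf reports at run time.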