On 14 June 2016 at 13:56, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
On Tuesday, June 14, 2016, MM <finjulh...@gmail.com> wrote:
>
> Hello,
> I have the following 3 1-socket nodes:
>
> node1:  4GB RAM 2-core: rank 0  rank 1
> node2:  4GB RAM 4-core: rank 2  rank 3 rank 4 rank 5
> node3:  8GB RAM 4-core: rank 6  rank 7 rank 8 rank 9
>
> I have a model that takes an input and produces an output, and I want to run 
> this model for N possible combinations of inputs. N is very big, and I am 
> limited by memory capacity.
>
> I am using collectives over the world communicator, and I want to know how to 
> distribute N over the 10 ranks, given the memory specs of each node.
>
> For now, I have simply been dividing N by the number of ranks and 
> scattering/gathering that way.
> How can I improve on this without hardcoding the specs in my own C++ code?
>
> thanks,
>
> Note that if your program is synchronous, it will run at the speed of the 
> slowest task.
> (E.g. the tasks on node2, with 1 GB per task, will wait for the other tasks 
> with 2 GB per task.)

Yes, I think my program is synchronous: I generate the N possibilities
on rank 0 and then want to split them so as to achieve the fastest
runtime (assuming the nodes do nothing else on the side).
The gather/gatherv calls block until everything is gathered, so the
overall time is the time of the rank with the slowest compute time
(which depends on the rank's speed and on how much of N it is
processing, assuming each evaluation takes the same time).

I just realized that core frequency is a major factor, so I have added
it on top of my initial email:
in my case, node1's cores exhibit 4500 bogomips each, while node2's and
node3's cores exhibit 5300 bogomips.

Surely, I don't want to distribute N equally across the 10 ranks:
1. because the core frequencies differ; assuming I trust the bogomips
measure, I should gather those measures first (can orted help?), or I
could keep them in a config file;
2. the available memory really only matters when N is big enough that
I could not evaluate all N items if I limited myself to 4GB
throughout.

If I have a static config file, I can estimate from RAM + bogomips how
to distribute N myself. But your response hints at doing this
programmatically, which I would prefer.

> You can use MPI_Comm_split_type in order to create inter node communicators.
I don't quite understand this function. One starts with the world
communicator including all ranks 0..9 and splits it into multiple
subcommunicators? The only predefined split type appears to be
MPI_COMM_TYPE_SHARED.

> Then you can find how much memory is available per task,
How? by reading '/proc/self/statm' on linux?

>MPI_Gather that on the master task, and MPI_Scatterv/MPI_Gatherv to 
>distribute/consolidate the data


Apologies for my scattered comments, my question is not actually
totally clear in my head :-)
