Dear Ralph,

I am copying your email from the website because I have enabled the
option to receive all list emails in a single daily digest.
On 11/04/2012 05:27 PM, George Markomanolis wrote:
> Dear all,
>
> I am trying to run an experiment that oversubscribes the nodes. I
> have several clusters available (I can use up to 8-10 different
> clusters in one execution), totaling around 1300 cores. I am running
> the EP benchmark from the NAS suite, which means there are not many
> MPI messages, just a few collective MPI calls.
>
> The number of MPI processes per node depends on the available memory
> of each node. Thus, in the machinefile I have declared a node 13
> times if I want 13 MPI processes on it. Is that correct?
You *can* do it that way, or you could just use "slots=13" for that
node in the file, and list it only once.
OK, but I assume the result is the same, right?
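For illustration (the host name node01 here is hypothetical), listing a
node 13 times:

    node01
    node01
    (... 11 more identical lines ...)

should be equivalent to the single line:

    node01 slots=13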
> Giving a machinefile with 32768 nodes when I want to execute 32768
> processes, does Open MPI behave as if there is no oversubscription?
Yes, it should - I assume you mean "slots" and not "nodes" in the
above statement, since you indicate that you listed each node multiple
times to set the number of slots on that node.
Yes, I mean slots.
> If yes, how can I give a machinefile with a different number of MPI
> processes on each node? The maximum number of MPI processes I have on
> a single node is 388.
Just assign the number of slots on each node to be the number of
processes you want on that node.
OK
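So, as a sketch (host names hypothetical), a machinefile with a
different number of MPI processes per node might look like:

    node01 slots=13
    node02 slots=48
    node03 slots=388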
>
> My problem is that I can execute 16384 processes but not 32768. In
> the first case the execution takes around 3 minutes, but in the
> second case the benchmark does not even start after 7 hours. There is
> no error; I cancel the job myself, but I assume something is wrong
> because 7 hours is far too long. I should note that I ran the
> 16384-process instance without any problem. I added some debug output
> to the benchmark, and I can see that the execution stalls in
> MPI_Init; it never gets past that point. For the 16384-process
> instance, MPI_Init takes around 2 minutes to finish. I am checking
> the memory on all the nodes, and there is at least 0.5 GB of free
> memory on each node.
>
> I know about the mpi_yield_when_idle parameter, but I have read that
> it will not improve performance if there are not many MPI messages. I
> tried it anyway, and nothing changed. I also tried mpi_preconnect_mpi
> just in case, but again nothing. Could you please suggest a reason
> why this is happening?
You indicated that these jobs are actually spanning multiple clusters
- true? If so, when you cross that 16384 boundary, do you also cross
clusters? Is it possible one or more of the additional clusters is
blocking communications?
I have tried both configurations. I even used exactly the same nodes
with fewer MPI processes per node, to check whether one site was
blocking the others, and I tried half of the machinefile with the
16384-process instance, to see whether there was any issue with using
so many MPI processes per node. Both ran fine with 16384 MPI
processes. I also tried combining different quarters of the
machinefile, to check whether any specific combination of sites was a
problem, and again I had no issue.
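For reference, the MCA parameters mentioned above were passed on the
mpirun command line; a sketch of the invocation I used (same
machinefile and binary as below):

    mpirun --mca mpi_yield_when_idle 1 --mca mpi_preconnect_mpi 1 \
        -machinefile machines -np 32768 ep.D.32768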
>
> Moreover, I used a single node with 48 GB of memory to execute 2048
> MPI processes without any problem; of course, I just had to wait a
> long time.
>
> I am using Open MPI v1.4.1, and all the clusters are 64-bit.
>
> I execute the benchmark with the following command:
> mpirun --mca pml ob1 --mca btl tcp,self --mca btl_tcp_if_exclude \
>     ib0,lo,myri0 -machinefile machines -np 32768 ep.D.32768
You could just leave off the "-np N" part of the command line - we'll
assign one process to every slot specified in the machinefile.
OK, nice
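If I understand correctly, the command above could then be shortened
to the following, with one process launched per slot in the
machinefile:

    mpirun --mca pml ob1 --mca btl tcp,self --mca btl_tcp_if_exclude \
        ib0,lo,myri0 -machinefile machines ep.D.32768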
Best regards,
George Markomanolis
>
> Best regards,
> George Markomanolis
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users