Steve,

If I understand correctly, running on one node with 4 MPI tasks is three
times faster than running on 10 nodes with 40 (or 10?) tasks.

Did you try this test on an x86 cluster with a TCP interconnect, and
did you get better performance when increasing the number of nodes?

Can you try running on the Pi cluster with one task per node, increasing
the number of nodes one step at a time? Does the performance improve?
Then you can increase the number of tasks per node and see how it impacts
performance.
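
For example, here is a rough Python sketch of such a sweep. It only builds
the mpirun command lines and computes speedup and efficiency from timings
you measure yourself. The hostfile name "hosts", the solver name
"simpleFoam" and the node counts are placeholders I made up, not taken from
your setup, and I am assuming the case is decomposed again so that the
number of subdomains matches -np for every run.

```python
import shlex


def scaling_commands(node_counts, tasks_per_node=1, hostfile="hosts",
                     solver=("simpleFoam", "-parallel")):
    """Build one mpirun command line per node count.

    hostfile and solver are placeholders; substitute your own.
    """
    commands = []
    for nodes in node_counts:
        total_tasks = nodes * tasks_per_node
        argv = [
            "mpirun",
            "-np", str(total_tasks),
            "--hostfile", hostfile,
            # place exactly tasks_per_node processes on each node
            "--map-by", f"ppr:{tasks_per_node}:node",
            *solver,
        ]
        commands.append(shlex.join(argv))
    return commands


def speedup_and_efficiency(baseline_seconds, seconds, tasks):
    """Return (speedup, parallel efficiency) relative to a baseline run."""
    speedup = baseline_seconds / seconds
    return speedup, speedup / tasks


if __name__ == "__main__":
    # one task per node first, adding nodes one step at a time
    for cmd in scaling_commands([1, 2, 4, 8, 10]):
        print(cmd)
```

A speedup below 1 when going from one node to two would already point at
the interconnect rather than at the solver.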

You can also run some standard MPI benchmarks (OSU, IMB) and see if you get
the performance you expect.
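
To put a number on "expect", a simple latency plus bandwidth model is often
enough as a first approximation. The sketch below is only that: the latency
and bandwidth values are assumptions (a 100 Mbit/s link and a guessed
200 microsecond latency), to be replaced by what osu_latency and osu_bw
actually report on your cluster.

```python
def expected_transfer_seconds(message_bytes, latency_seconds,
                              bandwidth_bytes_per_second):
    """First-order cost of one point-to-point message: latency + size / bandwidth."""
    return latency_seconds + message_bytes / bandwidth_bytes_per_second


if __name__ == "__main__":
    latency = 200e-6         # assumed value, replace with a measured latency
    bandwidth = 100e6 / 8    # assumed 100 Mbit/s link, in bytes per second
    for size in (8, 1024, 1024 ** 2):
        seconds = expected_transfer_seconds(size, latency, bandwidth)
        print(f"{size:>8} bytes: {seconds * 1e3:.3f} ms")
```

If the benchmark numbers are far worse than this kind of estimate, the
network setup deserves a look before the solver does.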

Cheers,

Gilles

On Sunday, January 24, 2016, Steve O'Hara <soh...@pivotal-solutions.co.uk>
wrote:

> Hi,
>
>
>
> I’m afraid I’m pretty new to both OpenFOAM and openMPI so please excuse me
> if my questions are either stupid or badly framed.
>
>
>
> I’ve created a 10-node Raspberry Pi Beowulf cluster for testing out MPI
> concepts and seeing how they are harnessed in OpenFOAM.  After a helluva lot
> of hassle, I’ve got the thing running using OpenMPI to run a solver in
> parallel.
>
> The problem I have is that if I switch the server node to not use the
> cluster (still use 3 cores in an MPI job) the job finishes in x minutes. If
> I tell it to use the 9 other members of the cluster, the same job takes x
> times 3!
>
>
>
> This is what I’m doing:
>
>
>
> 1.       Create a mesh, adjust it with some other OF stuff
>
> 2.       Run the process to split the job into processes for each node
>
> 3.       Copy the process directories to each of the affected nodes using
> scp
>
> 4.       Run mpirun with a hosts file
>
> 5.       Re-constitute the case directory by copying back the processor
> folders
>
> 6.       Re-construct the case
>
>
>
> Only step 4 uses MPI and the other steps have a reasonably linear response
> time.
>
> Step 4 is characterised by a flurry of network activity, followed by all
> the Pis lighting up with CPU activity, followed by a long time of no CPU
> activity but huge network action.
>
> It’s this last bit that is consuming all the time – is this a tear-down
> phase of MPI?
>
> Each of the Pi nodes is set up as slots=4 max_slots=4
>
>
>
> What is all the network activity?  It seems to happen after the solver has
> completed its job so I’m guessing it has to be MPI.
>
> The network interface on the Pi is not a stellar performer so is there
> anything I can do to minimise the network traffic?
>
>
>
> Thanks,
>
> Steve
>
>
>
>
>
