Steve,

I'm curious how you got the cpu0 activity showing on LED0.  Do you have a 
quick write-up of how you did that?
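
My guess is that it's the standard sysfs LED trigger, something along these 
lines (assuming the Pi kernel ships the CPU LED trigger), but I'd like to confirm:

    # list the triggers available for the ACT LED
    cat /sys/class/leds/led0/trigger
    # if "cpu0" is listed, bind the LED to CPU 0 activity
    echo cpu0 | sudo tee /sys/class/leds/led0/trigger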


Thanks,


Spencer


________________________________
From: users <users-boun...@open-mpi.org> on behalf of Steve O'Hara 
<soh...@pivotal-solutions.co.uk>
Sent: Sunday, January 24, 2016 2:39 PM
To: Open MPI Users
Subject: Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM


Hi Gilles,



Yes, that's correct - one node with 3 cores takes about 1.5 minutes for a 10-second 
simulation; this turns into 4 minutes when I send the job to 36 cores on 9 IP-connected 
nodes.



I haven't set up an x86 cluster to do a comparison. I know this would be a lot 
easier than setting up the Pis, but to be honest, this is more about figuring 
out the performance characteristics of the technology, and the one thing the 
Pi gives you is total visibility of each of the components and how they 
perform.



I'll try a different strategy and come back to the list with some results.



No, I haven't tried the OSU and IMB tools; I'll do some reading and try to 
figure them out.



For those that are interested, the attached PDF shows what I'm up to. I'll be 
happy to share the images for both the master and slaves.



Thanks,

Steve







From: Gilles Gouaillardet [mailto:gilles.gouaillar...@gmail.com]
Sent: 24 January 2016 13:26
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM



Steve,



If I understand correctly, running on one node with 4 MPI tasks is three times 
faster than running on 10 nodes with 40 (10?) tasks.



Did you try this test on an x86 cluster with a TCP interconnect, and did you 
get better performance when increasing the number of nodes?



Can you try to run on the Pi cluster with one task per node, and increase the 
number of nodes one step at a time? Does the performance improve?

Then you can increase the number of tasks per node and see how it impacts 
performance.
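
For example, something along these lines (the solver name and options are just 
placeholders - use whatever you run now):

    # one MPI task per node, growing the node count one step at a time
    mpirun -np 2 -hostfile hosts -npernode 1 <solver> -parallel
    mpirun -np 3 -hostfile hosts -npernode 1 <solver> -parallel
    # ...then raise -npernode once the per-node scaling looks sane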



You can also run some standard MPI benchmarks (OSU, IMB) and see if you get the 
performance you expect.
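
For instance, with the OSU micro-benchmarks and IMB (paths depend on where you 
install them; these invocations are only illustrative):

    # point-to-point latency and bandwidth between two nodes
    mpirun -np 2 -hostfile hosts -npernode 1 ./osu_latency
    mpirun -np 2 -hostfile hosts -npernode 1 ./osu_bw
    # collective performance across the whole cluster
    mpirun -np 36 -hostfile hosts ./IMB-MPI1 Alltoall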



Cheers,



Gilles

On Sunday, January 24, 2016, Steve O'Hara 
<soh...@pivotal-solutions.co.uk<mailto:soh...@pivotal-solutions.co.uk>> wrote:

Hi,



I'm afraid I'm pretty new to both OpenFOAM and Open MPI, so please excuse me if 
my questions are either stupid or badly framed.



I've created a 10-node Raspberry Pi Beowulf cluster to test out MPI concepts and 
see how they are harnessed in OpenFOAM.  After a helluva lot of hassle, I've 
got the thing running using Open MPI to run a solver in parallel.

The problem I have is that if I tell the server node not to use the cluster 
(still using 3 cores in an MPI job), the job finishes in x minutes. If I tell it 
to use the 9 other members of the cluster, the same job takes 3x as long!



This is what I'm doing (rough commands are sketched after the list):



1. Create a mesh, adjust it with some other OF stuff
2. Run the process to split the job into processes for each node
3. Copy the process directories to each of the affected nodes using scp
4. Run mpirun with a hosts file
5. Re-constitute the case directory by copying back the processor folders
6. Re-construct the case
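
In terms of commands it's roughly this (the solver name, hostnames and paths are 
only examples from memory, not the exact script):

    decomposePar                                        # step 2: split the case into processor* dirs
    scp -r processor1 pi01:/home/pi/case/               # step 3: repeated for each node
    mpirun -np 36 -hostfile hosts simpleFoam -parallel  # step 4: solver here is just an example
    scp -r pi01:/home/pi/case/processor1 .              # step 5: copy the processor folders back
    reconstructPar                                      # step 6: rebuild the single case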



Only step 4 uses MPI; the other steps have a reasonably linear response time.

Step 4 is characterised by a flurry of network activity, followed by all the 
Pis lighting up with CPU activity, followed by a long period of no CPU activity 
but huge network activity.

It's this last bit that is consuming all the time - is this a tear-down phase 
of MPI?

Each of the Pi nodes is set up as slots=4 max_slots=4
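In other words, the hosts file has one line per Pi, something like this (the 
hostnames here are made up):

    pi01 slots=4 max_slots=4
    pi02 slots=4 max_slots=4
    # ...and so on for the remaining nodes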



What is all that network activity?  It seems to happen after the solver has 
completed its job, so I'm guessing it has to be MPI.

The network interface on the Pi is not a stellar performer, so is there anything 
I can do to minimise the network traffic?



Thanks,

Steve



