Steve,
I am curious as to how you set the cpu0 activity on the LED0. Do you have a quick read on how you did that?

Thanks,
Spencer

________________________________
From: users <users-boun...@open-mpi.org> on behalf of Steve O'Hara <soh...@pivotal-solutions.co.uk>
Sent: Sunday, January 24, 2016 2:39 PM
To: Open MPI Users
Subject: Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM

Hi Gilles,

Yes, that's correct - one node with 3 cores takes about 1.5 minutes for a 10-second simulation; this turns into 4 minutes when I send the job to 36 cores on 9 IP-connected nodes.

I haven't set up an x86 cluster to do a comparison. I know this would be a lot easier than setting up the Pis, but to be honest, this is more about figuring out the performance characteristics of the technology, and the one thing the Pi gives you is total visibility of each of the components and how they perform. I'll try a different strategy and come back to the list with some results.

No, I haven't tried the osu and imb tools; I'll do some reading and try to figure them out.

For those that are interested, the attached PDF shows what I'm up to. I'll be happy to share the images for both the master and slaves.

Thanks,
Steve

From: Gilles Gouaillardet [mailto:gilles.gouaillar...@gmail.com]
Sent: 24 January 2016 13:26
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM

Steve,

If I understand correctly, running on one node with 4 MPI tasks is three times faster than running on 10 nodes with 40 (10 ?) tasks.

Did you try this test on an x86 cluster with a TCP interconnect, and did you get better performance when increasing the number of nodes?

Can you try to run on the Pi cluster with one task per node, and increase the number of nodes one step at a time? Does the performance improve? Then you can increase the number of tasks per node and see how it impacts performance.
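The scaling test above could be scripted along these lines. This is only a sketch: the hostnames pi1..pi9 and the solver name simpleFoam are placeholders, the exact mpirun flags depend on your Open MPI version, and the loop deliberately just prints each invocation so you can review it before running anything on the cluster:

```shell
#!/bin/sh
# Sketch of a one-task-per-node scaling sweep (hostnames and solver
# are placeholders). Prints the commands instead of executing them.
hosts="pi1"
for n in 1 2 3 4 5 6 7 8 9; do
    echo "mpirun -np $n --host $hosts --map-by node simpleFoam -parallel"
    next=$((n + 1))
    hosts="$hosts,pi$next"
done
```

Once the single-task-per-node curve looks sane, the same loop can be repeated with 2, 3, and 4 tasks per node to see where the Pi's Ethernet interface becomes the bottleneck.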
You can also run some standard MPI benchmarks (OSU, IMB) and see whether you get the performance you expect.

Cheers,
Gilles

On Sunday, January 24, 2016, Steve O'Hara <soh...@pivotal-solutions.co.uk<mailto:soh...@pivotal-solutions.co.uk>> wrote:

Hi,

I'm afraid I'm pretty new to both OpenFOAM and Open MPI, so please excuse me if my questions are either stupid or badly framed.

I've created a 10-node Raspberry Pi Beowulf cluster for testing out MPI concepts and seeing how they are harnessed in OpenFOAM. After a helluva lot of hassle, I've got the thing running using OpenMPI to run a solver in parallel.

The problem I have is that if I switch the server node to not use the cluster (still using 3 cores in an MPI job), the job finishes in x minutes. If I tell it to use the 9 other members of the cluster, the same job takes 3x!

This is what I'm doing:

1. Create a mesh and adjust it with some other OF stuff
2. Run the process to split the job into processes for each node
3. Copy the process directories to each of the affected nodes using scp
4. Run mpirun with a hosts file
5. Re-constitute the case directory by copying back the processor folders
6. Re-construct the case

Only step 4 uses MPI, and the other steps have a reasonably linear response time. Step 4 is characterised by a flurry of network activity, followed by all the Pis lighting up with CPU activity, followed by a long period of no CPU activity but huge network action. It's this last bit that is consuming all the time - is this a tear-down phase of MPI?

Each of the Pi nodes is set up as slots=4 max_slots=4.

What is all the network activity? It seems to happen after the solver has completed its job, so I'm guessing it has to be MPI. The network interface on the Pi is not a stellar performer, so is there anything I can do to minimise the network traffic?

Thanks,
Steve
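For reference, an Open MPI hostfile matching that slots setup would look something like the fragment below (pi1, pi2, ... are placeholder hostnames; one line per Pi node, repeated for all nine workers):

```
# machines - hostfile passed to mpirun via --hostfile
pi1 slots=4 max_slots=4
pi2 slots=4 max_slots=4
pi3 slots=4 max_slots=4
# ... and so on for the remaining nodes
```

With slots=4, mpirun fills each node with up to four ranks before moving to the next host unless a mapping policy such as --map-by node is given, which is why per-node placement matters when probing network versus CPU limits.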