Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM

2016-01-24 Thread John Hearns
Hi Steve.
Regarding step 3, have you thought of using some shared storage?
An NFS shared drive perhaps, though there are many alternatives!

On 23 January 2016 at 20:47, Steve O'Hara 
wrote:

> Hi,
>
>
>
> I’m afraid I’m pretty new to both OpenFOAM and Open MPI, so please excuse me
> if my questions are either stupid or badly framed.
>
>
>
> I’ve created a 10-node Raspberry Pi Beowulf cluster for testing out MPI
> concepts and seeing how they are harnessed in OpenFOAM.  After a helluva lot
> of hassle, I’ve got the thing running using OpenMPI to run a solver in
> parallel.
>
> The problem I have is that if I switch the server node to not use the
> cluster (still use 3 cores in an MPI job) the job finishes in x minutes. If
> I tell it to use the 9 other members of the cluster, the same job takes x
> times 3!
>
>
>
> This is what I’m doing:
>
>
>
> 1.   Create a mesh, adjust it with some other OF stuff
>
> 2.   Run the process to split the job into processes for each node
>
> 3.   Copy the process directories to each of the affected nodes using
> scp
>
> 4.   Run mpirun with a hosts file
>
> 5.   Re-constitute the case directory by copying back the processor
> folders
>
> 6.   Re-construct the case
>
>
>
> Only step 4 uses MPI; the other steps have a reasonably linear response
> time.
>
> Step 4 is characterised by a flurry of network activity, followed by all
> the Pis lighting up with CPU activity, followed by a long period of no CPU
> activity but heavy network activity.
>
> It’s this last bit that is consuming all the time – is this a tear-down
> phase of MPI?
>
> Each of the Pi nodes is set up as slots=4 max_slots=4
>
>
>
> What is all the network activity?  It seems to happen after the solver has
> completed its job so I’m guessing it has to be MPI.
>
> The network interface on the Pi is not a stellar performer so is there
> anything I can do to minimise the network traffic?
>
>
>
> Thanks,
>
> Steve
>
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28349.php
>
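Steve's six steps above can be sketched as a script. This is only an illustration: the host names (pi01..pi09), the case path, the solver (simpleFoam), and the core count are assumptions, and the OpenFOAM commands are guarded so they run only where OpenFOAM is actually installed.

```shell
#!/bin/sh
# Sketch of the decompose/distribute/run/reconstruct cycle described above.
# Host names, case path, and solver are hypothetical.
CASE=$HOME/run/cavity

# Hostfile matching "slots=4 max_slots=4" on each Pi node
: > hostfile
for i in 1 2 3 4 5 6 7 8 9; do
  printf 'pi%02d slots=4 max_slots=4\n' "$i" >> hostfile
done
cat hostfile

# The OpenFOAM steps themselves; skipped when OpenFOAM is not on this machine
if command -v decomposePar >/dev/null 2>&1; then
  cd "$CASE"
  decomposePar                                            # step 2: split case into processor* dirs
  for i in 1 2 3 4 5 6 7 8 9; do
    scp -r processor* "pi$(printf '%02d' "$i"):$CASE/"    # step 3: distribute to nodes
  done
  mpirun --hostfile hostfile -np 36 simpleFoam -parallel  # step 4: parallel run
  # step 5: scp the processor* folders back from each node, then:
  reconstructPar                                          # step 6: rebuild the case
fi
```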


Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM

2016-01-24 Thread Gilles Gouaillardet
Steve,

if I understand correctly, running on one node with 4 MPI tasks is three
times faster than running on 10 nodes with 40 (10?) tasks.

did you try this test on an x86 cluster with a TCP interconnect, and
did you get better performance when increasing the number of nodes?

can you try to run on the Pi cluster with one task per node, and increase
the number of nodes one step at a time? does the performance improve?
then you can increase the number of tasks per node and see how it impacts
performance.

you can also run some standard MPI benchmark (osu, imb) and see if you get
the performance you expect.
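A sweep along those lines could be shaped as below. The hostfile name, solver path, and osu_latency binary are placeholders; the script only prints the mpirun commands it would run, so drop the echo to actually execute them.

```shell
#!/bin/sh
# Scaling sweep: one task per node first, then more tasks per node.
# Prints the planned mpirun invocations to sweep.txt rather than running them.
: > sweep.txt
for nodes in 1 2 4 8 10; do
  echo "mpirun --hostfile hostfile --map-by node -np $nodes ./solver" >> sweep.txt
done
for ppn in 1 2 4; do
  echo "mpirun --hostfile hostfile -np $((10 * ppn)) ./solver" >> sweep.txt
done
# A point-to-point benchmark between two nodes for the raw network numbers
echo "mpirun --hostfile hostfile --map-by node -np 2 osu_latency" >> sweep.txt
cat sweep.txt
```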

Cheers,

Gilles

On Sunday, January 24, 2016, Steve O'Hara 
wrote:

> [original message snipped]


[OMPI users] how to benchmark a server with openmpi?

2016-01-24 Thread Ibrahim Ikhlawi


Hello,

I am working on a server and running Java code with OpenMPI. I want to know 
which number of processes runs my code fastest.
For this reason I wrote a code that multiplies two matrices, but the 
differences between the results are not significant.

Therefore I want to know how I can benchmark my server. Or is there any 
example which I can run many times with different numbers of processes, so 
that I can see which number of processes is best?

Or are there examples with results that I can compare with mine?

Thanks in advance
Ibrahim

Re: [OMPI users] how to benchmark a server with openmpi?

2016-01-24 Thread Nick Papior
*All* codes scale differently.
So you should do these tests with your own code, and not a different code
(such as matrix multiplication).
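One reason a home-grown comparison can show "no significant difference" is measuring before the JVM's JIT has warmed up. A minimal timing harness is sketched below; it is pure Java (the mpirun launch and MPI calls are omitted), and the matrix-multiply workload is only a stand-in for your own code.

```java
import java.util.Random;

public class Harness {
    // Stand-in workload: naive matrix multiply. Replace with your own kernel.
    public static double work(int n, double[][] a, double[][] b) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    sum += a[i][k] * b[k][j];
        return sum;
    }

    public static void main(String[] args) {
        int n = 64;
        Random r = new Random(42);
        double[][] a = new double[n][n], b = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                a[i][j] = r.nextDouble();
                b[i][j] = r.nextDouble();
            }

        for (int w = 0; w < 5; w++) work(n, a, b);  // warm-up: let the JIT compile

        long best = Long.MAX_VALUE;
        for (int rep = 0; rep < 10; rep++) {        // report the best of several runs
            long t0 = System.nanoTime();
            work(n, a, b);
            best = Math.min(best, System.nanoTime() - t0);
        }
        System.out.println("best time (us): " + best / 1000);
    }
}
```

Running this under mpirun with different `-np` values, and printing each rank's timing, would then give comparable per-process numbers.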


2016-01-24 15:38 GMT+01:00 Ibrahim Ikhlawi :

>
>
> Hallo,
>
> I am working on a server and run java codes with OpenMPI. I want to know
> which number of process is the fastest to run my code with?
> For this reason I wrote a code that multiply two matrices but the
> differences between the results is not significant.
>
> Therefore I want to know how can I benchmark my server? Or is there any
> example which I can run many times on a different number of processes, so
> that I can see which number of process is the best?
>
> Or is there examples with results which I can compare with my results?
>
> Thanx in advance
> Ibrahim
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28352.php
>



-- 
Kind regards Nick


Re: [OMPI users] how to benchmark a server with openmpi?

2016-01-24 Thread Saliya Ekanayake
We've looked in detail at the performance of OpenMPI Java for large-scale
real data analytics. Here's a paper, still in submission, that identifies 5
rules you'd find useful for getting good performance. It also discusses how
the number of processes affects performance. Tests were done on up to 3,072
cores on a large Intel Haswell HPC cluster:

https://www.researchgate.net/publication/291695527_SPIDAL_Java_High_Performance_Data_Analytics_with_Java_and_MPI_on_Large_Multicore_HPC_Clusters

Thank you,
Saliya

On Sun, Jan 24, 2016 at 9:41 AM, Nick Papior  wrote:

> *All* codes scale differently.
> So you should do these tests with your own code, and not a different code
> (such as MM).
>
>
> 2016-01-24 15:38 GMT+01:00 Ibrahim Ikhlawi :
>
>>
>>
>> Hallo,
>>
>> I am working on a server and run java codes with OpenMPI. I want to know
>> which number of process is the fastest to run my code with?
>> For this reason I wrote a code that multiply two matrices but the
>> differences between the results is not significant.
>>
>> Therefore I want to know how can I benchmark my server? Or is there any
>> example which I can run many times on a different number of processes, so
>> that I can see which number of process is the best?
>>
>> Or is there examples with results which I can compare with my results?
>>
>> Thanx in advance
>> Ibrahim
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/01/28352.php
>>
>
>
>
> --
> Kind regards Nick
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28353.php
>



-- 
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Cell 812-391-4914
http://saliya.org


Re: [OMPI users] how to benchmark a server with openmpi?

2016-01-24 Thread Ibrahim Ikhlawi
Thanks for the reply.

But I want to get a picture of the behaviour of my server, so I need code 
that I can run on it.
Could anyone give me example code? My last code was a simple example and not 
enough to build up a picture.

Thanks in advance
Date: Sun, 24 Jan 2016 10:21:07 -0500
From: esal...@gmail.com
To: us...@open-mpi.org
Subject: Re: [OMPI users] how to benchmark a server with openmpi?

[quoted messages snipped]
___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/01/28354.php 
   

Re: [OMPI users] how to benchmark a server with openmpi?

2016-01-24 Thread Saliya Ekanayake
The code for the paper with details is in GitHub
https://github.com/DSC-SPIDAL/damds

On Sun, Jan 24, 2016 at 2:51 PM, Ibrahim Ikhlawi 
wrote:

> Thanks for reply.
>
> But I want to have an imagination about the behaviour of my server.
> Therefore I need an Code which I can run it on my server.
> Could anyone give me an example for any code? My last code was a simple
> example and not enough to get an imagination.
>
> Thanx in advance
> --
> Date: Sun, 24 Jan 2016 10:21:07 -0500
> From: esal...@gmail.com
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] how to benchmark a server with openmpi?
>
>
> We've looked at performance in detail with regard to OpenMPI Java for
> large scale real data analytics. Here's a paper still in submission that
> identifies 5 rules you'd find useful to get good performance. It talks
> about how the number of processes affect performance as well. Tests were
> done up 3072 cores on a large Intel Haswell HPC cluster
>
>
> https://www.researchgate.net/publication/291695527_SPIDAL_Java_High_Performance_Data_Analytics_with_Java_and_MPI_on_Large_Multicore_HPC_Clusters
>
> Thank you,
> Saliya
>
> On Sun, Jan 24, 2016 at 9:41 AM, Nick Papior  wrote:
>
> *All* codes scale differently.
> So you should do these tests with your own code, and not a different code
> (such as MM).
>
>
> 2016-01-24 15:38 GMT+01:00 Ibrahim Ikhlawi :
>
>
>
> Hallo,
>
> I am working on a server and run java codes with OpenMPI. I want to know
> which number of process is the fastest to run my code with?
> For this reason I wrote a code that multiply two matrices but the
> differences between the results is not significant.
>
> Therefore I want to know how can I benchmark my server? Or is there any
> example which I can run many times on a different number of processes, so
> that I can see which number of process is the best?
>
> Or is there examples with results which I can compare with my results?
>
> Thanx in advance
> Ibrahim
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28352.php
>
>
>
>
> --
> Kind regards Nick
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28353.php
>
>
>
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> Cell 812-391-4914
> http://saliya.org
>
> ___ users mailing list
> us...@open-mpi.org Subscription:
> http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28354.php
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28355.php
>



-- 
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Cell 812-391-4914
http://saliya.org


Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM

2016-01-24 Thread Steve O'Hara
Hi,

Yes I did; in fact the master node is an NFS server and the other 9 nodes 
auto-mount it on startup.  The NFS partition on the master is also a RAM disk.
The initial method was to create the OpenFOAM case in the NFS partition and 
then just point the nodes at it using -wdir on the mpirun command line. It 
worked, but wow, that lit up the network switch!
So I changed the method so that it copies the case's processor folders to 
each node prior to running mpirun.  That improved things, but there is still 
an enormous amount of IP traffic.

From: John Hearns [mailto:hear...@googlemail.com]
Sent: 24 January 2016 09:28
To: Open MPI Users 
Subject: Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM

[quoted messages snipped]



Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM

2016-01-24 Thread Steve O'Hara
Hi Gilles,

Yes, that’s correct: one node with 3 cores takes about 1.5 minutes for a 
10-second simulation; this turns into 4 minutes when I send the job to 36 
cores on 9 IP-connected nodes.

I haven’t set up an x86 cluster to do a comparison. I know this would be a 
lot easier than setting up the Pis, but to be honest this is more about 
figuring out the performance characteristics of the technology, and the one 
thing the Pi gives you is total visibility of each of the components and how 
they perform.

I’ll try a different strategy and come back to the list with some results.

No, I haven’t tried the osu and imb tools; I’ll do some reading and try to 
figure them out.

For those that are interested, the attached PDF shows what I’m up to. I’ll be 
happy to share the images for both the master and slaves.

Thanks,
Steve



From: Gilles Gouaillardet [mailto:gilles.gouaillar...@gmail.com]
Sent: 24 January 2016 13:26
To: Open MPI Users 
Subject: Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM

[quoted messages snipped]

SMARTset OpenFOAM Cluster.pdf
Description: SMARTset OpenFOAM Cluster.pdf


[OMPI users] segmentation fault with java MPI

2016-01-24 Thread Marko Blatzheim

Hi,

I want to load a saved object using Java MPI. Without MPI there is no 
problem in reading the file and casting it to the correct type. I tried 
to open the file as a byte array and convert this to an object, and I 
checked that all bytes are read correctly. In the example below the saved 
file is a serialized double array, and files up to 120 MB read without a 
problem. When I try the same with an ArrayList, however, even a few-kB 
file leads to a segmentation fault.



// creating the file

int num = 100;
Random r = new Random(1234);
ArrayList<Double> obj0 = new ArrayList<>(num);  // generics restored; angle brackets were stripped in the post
double[] obj1 = new double[num];
for (int j = 0; j < num; j++) {
    double d = r.nextGaussian();
    obj0.add(d);
    obj1[j] = d;
}
obj0.trimToSize();



// trying to read the file

String filename = "testfile";
byte[] readbuf;

File myfile = new File(MPI.COMM_SELF, filename, MPI.MODE_RDONLY);
int filesize = (int) myfile.getSize();
readbuf = new byte[filesize];

Status status = myfile.read(readbuf, filesize, MPI.BYTE);
Object test = null;
if (myrank == 0) {
    ByteArrayInputStream in = new ByteArrayInputStream(readbuf);
    ObjectInputStream is = new ObjectInputStream(in);
    test = is.readObject(); // This line causes a segmentation fault
}


Thanks for your help
Marko


Re: [OMPI users] segmentation fault with java MPI

2016-01-24 Thread Gilles Gouaillardet

Marko,

i wrote a test program based on your code snippet and it works for me.

could you please:
- post a standalone test case that is ready to be compiled and run
- tell us which version of OpenMPI you are using
- tell us which JVM you are using (vendor and version)
- post your full command line

Cheers,

Gilles

On 1/25/2016 8:23 AM, Marko Blatzheim wrote:

[code snippet quoted above snipped]
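Since the crash happens at readObject(), one way to narrow the problem down is to verify that the serialize/deserialize path is sound on its own, with the MPI file read taken out of the picture. The sketch below does a pure-Java round trip under the same conditions as the snippet (ArrayList<Double> of gaussians, read back through a byte array); the class name is mine. If this succeeds, suspicion falls on what myfile.read() actually put into readbuf.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Random;

public class RoundTrip {
    // Serialize an ArrayList<Double> to bytes and read it back, mimicking
    // what the MPI snippet does after myfile.read() fills readbuf.
    public static boolean roundTrip(int num, long seed) throws Exception {
        Random r = new Random(seed);
        ArrayList<Double> obj0 = new ArrayList<>(num);
        for (int j = 0; j < num; j++) obj0.add(r.nextGaussian());

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj0);  // stands in for writing "testfile"
        }

        ObjectInputStream is =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        @SuppressWarnings("unchecked")
        ArrayList<Double> test = (ArrayList<Double>) is.readObject();
        return test.equals(obj0);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(100, 1234L)); // prints "true"
    }
}
```

If the round trip passes but the MPI path still crashes, comparing the first bytes of readbuf against the file on disk (they should match byte for byte) would be the next step.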