Re: [OMPI users] MPI_THREAD_FUNNELED and enable-mpi-thread-multiple

2013-01-28 Thread Brian Budge
I believe that yes, you have to compile with enable-mpi-thread-multiple to get anything other than SINGLE. Brian On Tue, Jan 22, 2013 at 12:56 PM, Roland Schulz wrote: > Hi, > > compiling 1.6.1 or 1.6.2 without enable-mpi-thread-multiple returns from > MPI_Init_thread as provided level MPI_THREAD_S

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread George Bosilca
Based on the paper you linked, the answer is quite obvious. The proposed CRFS mechanism supports all of the checkpoint-enabled MPI implementations, thus you just have to go with the one providing and caring about the services you need. George. On Mon, Jan 28, 2013 at 3:46 PM, Maxime Boissonneault

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
Hi George, The problem here is not the bandwidth, but the number of IOPs. I wrote to the BLCR list, and they confirmed that: "While ideally the checkpoint would be written in sizable chunks, the current code in BLCR will issue a single write operation for each contiguous range of user memory,
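To illustrate the IOPs point in the message above, here is a minimal sketch (not BLCR code, and not how BLCR or Open MPI actually write checkpoints) comparing one write call per memory range against writes coalesced into larger chunks. The CountingSink class, the 512-byte range size, the 10,000-range workload, and the 1 MiB buffer are all assumptions chosen for illustration; only Python's standard io module is used.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical in-memory sink that counts the write calls reaching it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        # Each call here stands in for one I/O operation on the file system.
        self.calls += 1
        self.data += bytes(b)
        return len(b)


def write_ranges(sink, ranges):
    """Issue one write per contiguous range, as the quoted BLCR reply describes."""
    for r in ranges:
        sink.write(r)


# Assumed workload: 10,000 contiguous memory ranges of 512 bytes each.
ranges = [bytes(512) for _ in range(10_000)]

# Case 1: every range goes straight to the sink (one operation per range).
direct = CountingSink()
write_ranges(direct, ranges)

# Case 2: the same ranges pass through a 1 MiB buffer, so the sink only
# sees a handful of large writes.
coalesced = CountingSink()
buffered = io.BufferedWriter(coalesced, buffer_size=1 << 20)
write_ranges(buffered, ranges)
buffered.flush()

print("direct write calls:   ", direct.calls)
print("coalesced write calls:", coalesced.calls)
```

Both sinks end up with identical bytes; only the number of operations differs, which is the quantity a metadata- or IOPs-limited file system cares about.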

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread George Bosilca
At the scale you address you should have no trouble with the C/R if the file system is correctly configured. We get more bandwidth per node out of an NFS over 1Gb/s at 32 nodes. Have you run some parallel benchmarks on your cluster? George. PS: You can find some MPI I/O benchmarks at http://www.mcs.

Re: [OMPI users] very low performance over infiniband

2013-01-28 Thread John Hearns
Have you run ibstat on every single node and made sure all links are up at the correct speed? Have you checked the output to make sure that you are not somehow running over ethernet?

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Ralph Castain
On Jan 28, 2013, at 10:53 AM, Maxime Boissonneault wrote: > On 2013-01-28 13:15, Ralph Castain wrote: >> On Jan 28, 2013, at 9:52 AM, Maxime Boissonneault >> wrote: >> >>> On 2013-01-28 12:46, Ralph Castain wrote: On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault wrote: >>>

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
On 2013-01-28 13:15, Ralph Castain wrote: On Jan 28, 2013, at 9:52 AM, Maxime Boissonneault wrote: On 2013-01-28 12:46, Ralph Castain wrote: On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault wrote: Hello Ralph, I agree that ideally, someone would implement checkpointing in the appl

Re: [OMPI users] very low performance over infiniband

2013-01-28 Thread Shamis, Pavel
Also make sure that processes were not swapped out to a hard drive. Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Jan 27, 2013, at 6:39 AM, John Hearns <hear...@googlemail.com> wrote: 2 percent? Have yo

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Ralph Castain
On Jan 28, 2013, at 9:52 AM, Maxime Boissonneault wrote: > On 2013-01-28 12:46, Ralph Castain wrote: >> On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault >> wrote: >> >>> Hello Ralph, >>> I agree that ideally, someone would implement checkpointing in the >>> application itself, but that

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
On 2013-01-28 12:46, Ralph Castain wrote: On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault wrote: Hello Ralph, I agree that ideally, someone would implement checkpointing in the application itself, but that is not always possible (commercial applications, use of complicated libraries, a

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Ralph Castain
On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault wrote: > Hello Ralph, > I agree that ideally, someone would implement checkpointing in the > application itself, but that is not always possible (commercial applications, > use of complicated libraries, algorithms with no clear progression poi

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
Hello Ralph, I agree that ideally, someone would implement checkpointing in the application itself, but that is not always possible (commercial applications, use of complicated libraries, algorithms with no clear progression points at which you can interrupt the algorithm and start it back fro

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Ralph Castain
Our c/r person has moved on to a different career path, so we may not have anyone who can answer this question. What we can say is that checkpointing at any significant scale will always be a losing proposition. It just takes too long and hammers the file system. People have been working on ext

[OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
Hello, I am doing checkpointing tests (with BLCR) with an MPI application compiled with OpenMPI 1.6.3, and I am seeing behaviors that are quite strange. First, some details about the tests: - The only filesystems available on the nodes are 1) one tmpfs, 2) one lustre shared filesystem (tested

Re: [OMPI users] Error when attempting to run LAMMPS on Centos 6.2 with OpenMPI

2013-01-28 Thread #YEO JINGJIE#
I obtained exactly the same error: [NTU-2:24680] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 194 -- It looks like orte_init failed for some reason; your parallel process is likely to abort.