[OMPI users] OpenIB problem: error polling HP CQ...

2008-05-29 Thread Matt Hughes
I have a program which uses MPI::Comm::Spawn to start processes on compute nodes (c0-0, c0-1, etc). The communication between the compute nodes consists of ISend and IRecv pairs, while communication between the compute nodes consists of gather and bcast operations. After executing ~80 successful l

[OMPI users] Help: Program Terminated

2008-05-29 Thread Lee Amy
Hello, I use a bioinformatics software called MicroTar to do some work. But it seems that it doesn't finish well. MicroTar parallel version was terminated after 463 minutes with the following error messages: [gnode5:31982] [ 0] /lib64/tls/libpthread.so

[OMPI users] Process size

2008-05-29 Thread Leonardo Fialho
Hi All, I made some tests with a dummy "ping" application. Some memory problems occurred. On these tests I obtained the following results: 1) OpenMPI (without FT): - delaying 1 second to send token to other node: orted and application size stable; - delaying 0 seconds to send token to o

Re: [OMPI users] Process size

2008-05-29 Thread Josh Hursey
Leonardo, You are exactly correct. The CRCP module/component will grow the application size probably for every message that you send or receive. This is because the CRCP component tracks the signature {data_size, tag, communicator, peer} (*not* the contents of the message) of every messag

Re: [OMPI users] Help: Program Terminated

2008-05-29 Thread Andreas Schäfer
Hi Amy, On 16:10 Thu 29 May , Lee Amy wrote: > MicroTar parallel version was terminated after 463 minutes with following > error messages: > > [gnode5:31982] [ 0] /lib64/tls/libpthread.so.0 [0x345460c430] > [gnode5:31982] [ 1] microtar(LocateNuc

Re: [OMPI users] [torqueusers] Job dies randomly, but only through torque

2008-05-29 Thread Jim Kusznir
I have verified that maui is killing the job. I actually ran into this with another user all of a sudden. I don't know why it's only affecting a few currently. Here's the maui log extract for a current run of this user's program: --- [root@aeolus log]# grep 2120 * maui.log:05/29 09:01:45

Re: [OMPI users] [torqueusers] Job dies randomly, but only through torque

2008-05-29 Thread Jeff Squyres
I don't know much about Maui, but these lines from the log seem relevant: - maui.log:05/29 09:27:21 INFO: job 2120 exceeds requested proc limit (3.72 > 1.00) maui.log:05/29 09:27:21 MSysRegEvent(JOBRESVIOLATION: job '2120' in state 'Running' has exceeded PROC resource limit (372 >

[OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Davi Vercillo C. Garcia
Hi, I'm trying to run my program in my environment and some problems are happening. My environment is based on PVFS2 over NFS (PVFS is mounted over NFS partition), OpenMPI and Ubuntu. My program uses MPI-IO and BZ2 development libraries. When I tried to run, a message appears: File locking failed

Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Brock Palen
Well, don't run like this. Have PVFS, have NFS, but don't mix them like that; you're asking for pain, my 2 cents. I get this error all the time also, you have to disable a large portion of the caching that NFS does to make sure that all MPI-IO clients get true data on the file they are all trying t

[OMPI users] ulimit question from video open-fabrics-concepts...

2008-05-29 Thread twurgl
HI, I am in one of your MPI instructional videos and have a question. You said to make sure the registered memory ulimit is set to unlimited. I type the command "ulimit -a" and don't see a registered memory entry. Is this maybe the same as "max locked memory"? Or can you tell me where to check

Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Robert Latham
On Thu, May 29, 2008 at 04:24:18PM -0300, Davi Vercillo C. Garcia wrote: > Hi, > > I'm trying to run my program in my environment and some problems are > happening. My environment is based on PVFS2 over NFS (PVFS is mounted > over NFS partition), OpenMPI and Ubuntu. My program uses MPI-IO and > BZ

Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Davi Vercillo C. Garcia
Hi, I'm already using the "noac" option in my /etc/fstab but this error is still happening. Do I need to put this in another file? On Thu, May 29, 2008 at 4:33 PM, Brock Palen wrote: > Well don't run like this. Have PVFS have NFS, don't mix them like that your > asking for pain, my 2 cents. > I get t

Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Davi Vercillo C. Garcia
HI, > Oh, I see you want to use ordered i/o in your application. PVFS > doesn't support that mode. However, since you know how much data each > process wants to write, a combination of MPI_Scan (to compute each > processes offset) and MPI_File_write_at_all (to carry out the > collective i/o) wil

Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Brock Palen
That I don't know; we use Lustre for this stuff now. And our users don't use parallel IO (though I hope to change that). Sorry I can't help more. I would really use 'just' pvfs2 for your IO. The other reply pointed out you can have both and not use NFS at all for your IO but leave it mounte

Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Robert Latham
On Thu, May 29, 2008 at 04:48:49PM -0300, Davi Vercillo C. Garcia wrote: > > Oh, I see you want to use ordered i/o in your application. PVFS > > doesn't support that mode. However, since you know how much data each > > process wants to write, a combination of MPI_Scan (to compute each > > process
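
The MPI_Scan suggestion quoted above comes down to prefix-sum arithmetic. The following is a minimal sketch in plain Python, with no MPI calls, of how each process could derive its write offset from the inclusive running total that a sum scan would give it. The function name `write_offsets` and the sample byte counts are hypothetical and do not come from the thread.

```python
from itertools import accumulate


def write_offsets(sizes):
    """Return the starting byte offset for each rank.

    sizes[i] is the number of bytes rank i wants to write. An inclusive
    sum scan gives rank i the total for ranks 0..i, so subtracting the
    rank's own size yields the point where its data should begin.
    """
    inclusive = list(accumulate(sizes))  # simulated result of an inclusive sum scan
    return [total - size for total, size in zip(inclusive, sizes)]


# Hypothetical per-rank byte counts, for illustration only.
sizes = [100, 250, 50, 300]
offsets = write_offsets(sizes)
print(offsets)  # [0, 100, 350, 400]
```

In a real MPI program each rank would hold only its own entry and would pass its offset to the collective write call; the list form here just makes the arithmetic visible in one place.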

Re: [OMPI users] ulimit question from video open-fabrics-concepts...

2008-05-29 Thread Jeff Squyres
On May 29, 2008, at 3:41 PM, twu...@goodyear.com wrote: I am in one of your MPI instructional videos and have a question. You said to make sure the registered memory ulimit is set to unlimited. I type the command "ulimit -a" and don't see a registered memory entry. Is this maybe the same a
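
Assuming the limit being asked about is the locked-memory limit (the "max locked memory" line in `ulimit -a` output on Linux), the sketch below shows one way to read it from a Python process using the standard `resource` module. The helper name `describe_limit` is hypothetical, and the snippet above is truncated, so this is an illustration of how to check the value rather than a restatement of the reply.

```python
import resource


def describe_limit(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


# RLIMIT_MEMLOCK is the locked-memory limit; getrlimit returns (soft, hard).
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("max locked memory (soft):", describe_limit(soft))
print("max locked memory (hard):", describe_limit(hard))
```

Note that `getrlimit` reports bytes, while the shell's `ulimit -l` typically reports kilobytes, so the two numbers differ by a factor of 1024 when the limit is finite.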

[OMPI users] specifying hosts in mpi_spawn()

2008-05-29 Thread Bruno Coutinho
How does MPI handle the host string passed in the info argument to mpi_comm_spawn()? If I set host to: "host1,host2,host3,host2,host2,host1" will ranks 0 and 5 run on host1, ranks 1, 3, 4 on host2, and rank 2 on host3?
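
The placement the question describes can be written out as a short sketch. It assumes one rank per comma-separated entry, assigned in order, which is the reading the question proposes and not confirmed Open MPI behavior. The function name `ranks_by_host` is hypothetical.

```python
from collections import defaultdict


def ranks_by_host(host_string):
    """Group rank numbers by host, assuming entry i of the string gets rank i."""
    placement = defaultdict(list)
    for rank, host in enumerate(host_string.split(",")):
        placement[host].append(rank)
    return dict(placement)


print(ranks_by_host("host1,host2,host3,host2,host2,host1"))
# {'host1': [0, 5], 'host2': [1, 3, 4], 'host3': [2]}
```

Whether the runtime actually honors the order and repetition of entries this way is exactly what the question asks, so the output shows the hypothesis, not an answer.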