Re: [OMPI users] job fails to terminate

2006-10-20 Thread Lydia Heck
In answer to Ralph's request and question: indeed, the version number was incorrect; it should have been openmpi-1.3a1r12121. My configure command is:
#!/bin/ksh
CC="/opt/studio11/SUNWspro/bin/cc"
CFLAGS="-xarch=amd64a -I/opt/mx/include -I/opt/SUNWsge/include"
LDFLAGS="-xarch=amd64a -I/opt/m

Re: [OMPI users] job fails to terminate

2006-10-20 Thread Ralph H Castain
Hi Lydia,
Thanks - that does help! Could you try this without threads? We have tried to make the system work with threads, but our testing has been limited. First thing I would try is to make sure that we aren't hitting a thread-lock.
Thanks,
Ralph
On 10/20/06 2:11 AM, "Lydia Heck" wrote: >

[OMPI users] OMPI launching problem using TM and openib on 1920 nodes

2006-10-20 Thread Ogden, Jeffry Brandon
We are having quite a bit of trouble reliably launching larger jobs (1920 nodes, 1 ppn) with OMPI (1.1.2rc4 with gcc) at the moment. The launches usually either just hang or fail with output like:
Cbench numprocs: 1920
Cbench numnodes: 1921
Cbench ppn: 1
Cbench jobname: xhpl-1ppn-1920
Cbench jobl

Re: [OMPI users] users Digest, Vol 411, Issue 2

2006-10-20 Thread Lydia Heck
Hi Ralph,
which of the thread options should I remove:
> > --enable-mpi-threads \
> > --enable-progress-threads \
> > --with-threads=solaris
all of them?
Lydia
> > -- > > Message: 1 > Date: Fri, 20 Oct 2006 06:30:36 -

Re: [OMPI users] users Digest, Vol 411, Issue 2

2006-10-20 Thread Ralph H Castain
Sorry, I should have been clearer. Yes, please remove them all - let's just see if that's the problem.
Thanks
On 10/20/06 10:41 AM, "Lydia Heck" wrote:
> > Hi Ralph,
> > which of the thread options should I remove:
> >>> --enable-mpi-threads \
>>> --enable-progress-threads \
>>> --w
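As a rough illustration of "remove them all", the three thread-related options named in this thread can be filtered out of a configure argument list. This is only a sketch: `strip_thread_opts` is a hypothetical helper, and the `--prefix` value is a placeholder, not part of Lydia's actual (truncated) configure command.

```shell
#!/bin/sh
# Hypothetical helper: print every argument except the three
# thread-related configure options discussed in this thread.
strip_thread_opts() {
    for arg in "$@"; do
        case "$arg" in
            --enable-mpi-threads|--enable-progress-threads|--with-threads=*)
                ;;                      # drop thread options
            *)
                printf '%s\n' "$arg"    # keep everything else
                ;;
        esac
    done
}

# Placeholder argument list (the --prefix value is an assumption).
strip_thread_opts --prefix=/hypothetical/prefix \
    --enable-mpi-threads \
    --enable-progress-threads \
    --with-threads=solaris
```

The remaining arguments would then be passed to ./configure unchanged, so any difference in behaviour can be attributed to the thread options alone.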

Re: [OMPI users] OMPI launching problem using TM and openib on 1920 nodes

2006-10-20 Thread Jeff Squyres
This message is coming from torque:
[15:15] 69-94-204-35:~/Desktop/torque-2.1.2 % grep -r "out of space in buffer and cannot commit message" *
src/lib/Libifl/tcp_dis.c: DBPRT(("%s: error! out of space in buffer and cannot commit message (bufsize=%d, buflen=%d, ct=%d)\n",
Are you able

Re: [OMPI users] Problem with PGI 6.1 and OpenMPI-1.1.1

2006-10-20 Thread Jeff Squyres
Two questions:
1. Have you tried the just-released 1.1.2?
2. Are you closing stdin/out/err?
On Oct 19, 2006, at 3:31 PM, Jeffrey B. Layton wrote:
A small update. I was looking through the error file a bit more (it was 159MB). I found the following error message sequence: o1:22805] mca_oob_tc

Re: [OMPI users] Problem with PGI 6.1 and OpenMPI-1.1.1

2006-10-20 Thread Jeffrey B. Layton
Jeff Squyres wrote:
> Two questions: 1. Have you tried the just-released 1.1.2?
No, not yet.
> 2. Are you closing stdin/out/err?
How do you do this? I did get some help on how to fix the problem by adding ' < /dev/null' at the very end of the mpirun line. This seems to have fixed the problem.
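A minimal sketch of what the ' < /dev/null' workaround does at the shell level: the launched command sees end-of-file on stdin immediately instead of waiting on a terminal or an inherited descriptor. Here `run_job` is a hypothetical stand-in for the real mpirun command line, and the process count and binary name in the trailing comment are placeholders.

```shell
#!/bin/sh
# Hypothetical stand-in for a launcher that reads its stdin until EOF.
run_job() {
    cat > /dev/null        # consumes stdin; returns as soon as EOF is seen
    echo "job finished"
}

# Redirecting stdin from /dev/null gives the job an immediate EOF,
# so it cannot block waiting for input.
run_job < /dev/null

# With the real launcher the same redirection goes at the end of the line,
# for example (placeholder arguments):
#   mpirun -np 4 ./my_app < /dev/null
```

The same redirection can be placed in a batch script, where stdin is often not attached to anything useful in the first place.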

Re: [OMPI users] OMPI launching problem using TM and openib on 1920 nodes

2006-10-20 Thread Ogden, Jeffry Brandon
We don't actually have the capability to test the mpiexec + MVAPICH launch at the moment. I was able to get a job to launch at 1920 nodes and I'm waiting for it to finish. When it is done, I can at least try an mpiexec -comm=none launch to see how TM responds to it.
> -Original Message- > From: