In answer to Ralph's request and question.
Indeed, the version number was incorrect; it should have been
openmpi-1.3a1r12121
My configure command is:
#!/bin/ksh
CC="/opt/studio11/SUNWspro/bin/cc"
CFLAGS="-xarch=amd64a -I/opt/mx/include -I/opt/SUNWsge/include"
LDFLAGS="-xarch=amd64a -I/opt/m
Hi Lydia
Thanks - that does help!
Could you try this without threads? We have tried to make the system work
with threads, but our testing has been limited. First thing I would try is
to make sure that we aren't hitting a thread-lock.
Thanks
Ralph
On 10/20/06 2:11 AM, "Lydia Heck" wrote:
We are having quite a bit of trouble reliably launching larger jobs
(1920 nodes, 1 ppn) with OMPI (1.1.2rc4 with gcc) at the moment. The
launches usually either just hang or fail with output like:
Cbench numprocs: 1920
Cbench numnodes: 1921
Cbench ppn: 1
Cbench jobname: xhpl-1ppn-1920
Cbench jobl
Hi Ralph,
which of the thread options should I remove:
> > --enable-mpi-threads \
> > --enable-progress-threads \
> > --with-threads=solaris
all of them?
Lydia
Sorry, I should have been clearer. Yes, please remove them all - let's just
see if that's the problem.
Thanks
On 10/20/06 10:41 AM, "Lydia Heck" wrote:
>
> Hi Ralph,
>
> which of the thread options should I remove:
>
>>> --enable-mpi-threads \
>>> --enable-progress-threads \
>>> --w
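Concretely, Ralph's suggestion amounts to re-running configure with all three thread options dropped. A minimal sketch, reusing the compiler settings quoted earlier in the thread; the install prefix and the exact configure arguments beyond the removed flags are assumptions, not taken from the thread:

```shell
#!/bin/ksh
# Sketch only: same toolchain as before, but with
#   --enable-mpi-threads
#   --enable-progress-threads
#   --with-threads=solaris
# removed, to rule out a thread-lock.
CC="/opt/studio11/SUNWspro/bin/cc"
CFLAGS="-xarch=amd64a -I/opt/mx/include -I/opt/SUNWsge/include"
export CC CFLAGS
./configure --prefix=/opt/openmpi   # prefix is illustrative
```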
This message is coming from torque:
[15:15] 69-94-204-35:~/Desktop/torque-2.1.2 % grep -r "out of space
in buffer and cannot commit message" *
src/lib/Libifl/tcp_dis.c: DBPRT(("%s: error! out of space in
buffer and cannot commit message (bufsize=%d, buflen=%d, ct=%d)\n",
Are you able
Two questions:
1. Have you tried the just-released 1.1.2?
2. Are you closing stdin/out/err?
On Oct 19, 2006, at 3:31 PM, Jeffrey B. Layton wrote:
A small update. I was looking through the error file a bit more
(it was 159MB). I found the following error message sequence:
o1:22805] mca_oob_tc
Jeff Squyres wrote:
Two questions:
1. Have you tried the just-released 1.1.2?
No, not yet.
2. Are you closing stdin/out/err?
How do you do this?
I did get some help: adding '< /dev/null' at the very end of the
mpirun line seems to have fixed the problem.
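The workaround can be sketched as a job-script fragment. The mpirun line below is illustrative (program name and process count are assumptions based on the thread), and the runnable part uses cat as a stand-in to show why the redirection helps:

```shell
# Workaround from the thread (illustrative mpirun invocation):
#   mpirun -np 1920 ./xhpl < /dev/null
#
# With "< /dev/null", any read from stdin returns EOF immediately
# instead of blocking on (or fighting over) the batch system's stdin.
# Demonstration with cat standing in for the launched program:
cat < /dev/null | wc -l
```

Because /dev/null yields immediate EOF, the pipeline above prints 0: the stand-in program reads nothing and exits cleanly rather than hanging on stdin.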
We don't actually have the capability to test the mpiexec + MVAPICH
launch at the moment. I was able to get a job to launch at 1920 and I'm
waiting for it to finish. When it is done, I can at least try an mpiexec
-comm=none launch to see how TM responds to it.