Re: [OMPI users] OpenMPI+PGI errors

2008-05-28 Thread Jeff Squyres
On May 27, 2008, at 11:47 AM, Jim Kusznir wrote: I have updated to OpenMPI 1.2.6 and had the user rerun his jobs. He's getting similar output: [root@aeolus logs]# more 2047.aeolus.OU Warning: no access to tty (Bad file descriptor). Thus no job control in this shell. data directory is /mnt/pvf…

Re: [OMPI users] OpenMPI+PGI errors

2008-05-27 Thread Jim Kusznir
I have updated to OpenMPI 1.2.6 and had the user rerun his jobs. He's getting similar output: [root@aeolus logs]# more 2047.aeolus.OU Warning: no access to tty (Bad file descriptor). Thus no job control in this shell. data directory is /mnt/pvfs2/patton/data/chem/aa1 exec directory is /mnt/pvfs…
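If there is any doubt that the rebuilt Open MPI 1.2.6 is what the rerun jobs actually picked up, a quick check is to print the version from inside the batch job itself. A minimal sketch, assuming a Torque/PBS job like the one producing the output above (the PBS header values here are illustrative, not taken from the thread):

    #!/bin/sh
    #PBS -N ompi-version-check
    #PBS -l nodes=1:ppn=1,walltime=00:05:00

    # Show which mpirun the job environment resolves and which Open MPI it belongs to.
    which mpirun
    mpirun --version

    # ompi_info prints the installed Open MPI version and build details.
    ompi_info | head -n 5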

Re: [OMPI users] OpenMPI+PGI errors

2008-05-23 Thread Jim Kusznir
I've asked for verification, but I recall the original verbal complaint claiming the wall time was random and sometimes as short as 2 minutes into a job. They have said they've run more tests with more instrumentation on their code, and it always fails in a random place. Same job, different results…

Re: [OMPI users] OpenMPI+PGI errors

2008-05-23 Thread Jeff Squyres
This may be a dumb question, but is there a chance that his job is running beyond 30 minutes, and PBS/Torque/whatever is killing it? On May 20, 2008, at 4:23 PM, Jim Kusznir wrote: Hello all: I've got a user on our ROCKS 4.3 cluster that's having some strange errors. I have other users using…
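One way to test the walltime hypothesis on a Torque system is sketched below; the job ID 2047.aeolus is taken from the output file name quoted earlier in the thread, and the other values are illustrative:

    # While the job is queued or running, see what walltime limit actually applies:
    qstat -f 2047.aeolus | grep -i walltime

    # After it dies, Torque's tracejob can show whether the server killed it for
    # exceeding a resource limit (subject to log retention):
    tracejob 2047

    # In the job script, request an explicit walltime well above the expected run
    # time instead of relying on the queue default:
    #PBS -l walltime=04:00:00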