Re: [OMPI users] OpenMPI+PGI errors

2008-05-28 Thread Jeff Squyres
On May 27, 2008, at 11:47 AM, Jim Kusznir wrote: I have updated to OpenMPI 1.2.6 and had the user rerun his jobs. He's getting similar output: [root@aeolus logs]# more 2047.aeolus.OU Warning: no access to tty (Bad file descriptor). Thus no job control in this shell. data directory is /mnt/pvf...

Re: [OMPI users] OpenMPI+PGI errors

2008-05-27 Thread Jim Kusznir
I have updated to OpenMPI 1.2.6 and had the user rerun his jobs. He's getting similar output: [root@aeolus logs]# more 2047.aeolus.OU Warning: no access to tty (Bad file descriptor). Thus no job control in this shell. data directory is /mnt/pvfs2/patton/data/chem/aa1 exec directory is /mnt/pvfs...

Re: [OMPI users] OpenMPI+PGI errors

2008-05-23 Thread Jim Kusznir
I've asked for verification, but I recall the original verbal complaint claiming the wall time was random and sometimes as short as 2 minutes into a job. They have said they've run more tests with more instrumentation on their code, and it always fails in a random place. Same job, different resu...

Re: [OMPI users] OpenMPI+PGI errors

2008-05-23 Thread Jeff Squyres
This may be a dumb question, but is there a chance that his job is running beyond 30 minutes, and PBS/Torque/whatever is killing it? On May 20, 2008, at 4:23 PM, Jim Kusznir wrote: Hello all: I've got a user on our ROCKS 4.3 cluster that's having some strange errors. I have other users usin...
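
If the scheduler's walltime limit were the culprit, requesting a longer limit in the job script would rule it out. Below is a minimal Torque/PBS sketch of such a request; the job name, node count, 4-hour limit, and executable are placeholders, not details taken from this thread:

    #!/bin/sh
    #PBS -N chem_aa1              # job name (placeholder)
    #PBS -l nodes=4:ppn=2         # node/processor request (placeholder)
    #PBS -l walltime=04:00:00     # request 4 hours; the scheduler kills jobs that exceed this

    cd $PBS_O_WORKDIR             # start in the directory the job was submitted from
    mpirun ./a.out                # executable name is a placeholder

After submission, "qstat -f <jobid>" reports Resource_List.walltime alongside resources_used.walltime, which makes it easy to check whether a job that died was in fact running up against its limit.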

[OMPI users] OpenMPI+PGI errors

2008-05-20 Thread Jim Kusznir
Hello all: I've got a user on our ROCKS 4.3 cluster that's having some strange errors. I have other users using the cluster without any such errors reported, but this user also runs this code on other clusters without any problems, so I'm not really sure where the problem lies. They are getting...