On May 27, 2008, at 11:47 AM, Jim Kusznir wrote:
I have updated to OpenMPI 1.2.6 and had the user rerun his jobs. He's
getting similar output:
[root@aeolus logs]# more 2047.aeolus.OU
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
data directory is /mnt/pvfs2/patton/data/chem/aa1
exec directory is /mnt/pvfs
I've asked for verification, but I recall the original verbal
complaint claiming the wall time was random and sometimes as short as
2 minutes into a job.
They have said they've run more tests with more instrumentation on
their code, and it always fails in a random place. Same job,
different results.
This may be a dumb question, but is there a chance that his job is
running beyond 30 minutes, and PBS/Torque/whatever is killing it?
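If it is a walltime limit, that is easy to check. A minimal sketch, assuming
Torque/PBS is the scheduler (the walltime value, node counts, and queue name
below are only placeholders):

    #!/bin/sh
    # Request an explicit walltime so the queue's default limit does not apply
    #PBS -l walltime=04:00:00
    #PBS -l nodes=2:ppn=4
    cd $PBS_O_WORKDIR
    mpirun ./a.out

    # While the job runs, compare requested vs. used walltime:
    #   qstat -f <jobid> | grep -i walltime
    # And check the queue's default/max limits:
    #   qmgr -c 'print queue <queuename>'

If the jobs die right at a configured limit, the scheduler is the culprit
rather than Open MPI; random failure times would point elsewhere.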
On May 20, 2008, at 4:23 PM, Jim Kusznir wrote:
Hello all:
I've got a user on our ROCKS 4.3 cluster that's having some strange
errors. I have other users using the cluster without any such errors
reported, but this user also runs this code on other clusters without
any problems, so I'm not really sure where the problem lies. They are
getting