On May 27, 2008, at 11:47 AM, Jim Kusznir wrote:
I have updated to OpenMPI 1.2.6 and had the user rerun his jobs. He's
getting similar output:
[root@aeolus logs]# more 2047.aeolus.OU
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
data directory is /mnt/pvfs2/patton/data/chem/aa1
exec directory is /mnt/pvfs
I've asked for verification, but I recall the original verbal
complaint claiming the wall time was random and sometimes as short as
2 minutes into a job.
They have said they've run more tests with more instrumentation on
their code, and it always fails in a random place. Same job,
different results.
This may be a dumb question, but is there a chance that his job is
running beyond 30 minutes, and PBS/Torque/whatever is killing it?
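If it is Torque/PBS doing the killing, one quick way to check (the queue
name "default" below is just a guess, substitute the real one) is to
compare the job's requested walltime against the queue limit and look at
the scheduler's record of why the job exited:

  qstat -f 2047.aeolus | grep -i walltime
  qmgr -c 'print queue default' | grep -i walltime
  tracejob 2047

If the queue caps jobs at 30 minutes, the user can ask for more in his
submit script, e.g. "#PBS -l walltime=02:00:00", assuming the queue
allows it.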
On May 20, 2008, at 4:23 PM, Jim Kusznir wrote:
Hello all:
I've got a user on our ROCKS 4.3 cluster that's having some strange
errors. I have other users usin