How about setting TMPDIR to a local filesystem?
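
As a rough sketch of what that could look like in the job script (the helper name `pick_local_tmpdir` and the candidate directories `/tmp` and `/var/tmp` are assumptions for illustration, not anything from your setup):

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that exists
# and is writable. It does NOT verify that the directory is node-local
# rather than on Lustre or NFS; that still has to be checked by hand.
pick_local_tmpdir() {
    for candidate in "$@"; do
        if [ -d "$candidate" ] && [ -w "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Assumed candidates; substitute whatever local scratch the nodes provide.
TMPDIR="$(pick_local_tmpdir /tmp /var/tmp)" || {
    echo "no writable local tmp directory found" >&2
    exit 1
}
export TMPDIR
echo "TMPDIR=$TMPDIR"

# The launch line from the quoted job script would then follow unchanged:
# mpirun $TCP -np 64 -npernode 16 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
```

The point is only that TMPDIR is exported before mpirun starts, so the shared-memory backing files land on a local disk instead of the Lustre default.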

On Mar 3, 2014, at 3:43 PM, Beichuan Yan <beichuan....@colorado.edu> wrote:

> I agree there are two cases for pure-MPI mode: 1. Job fails with no apparent
> reason; 2 job complains shared-memory file on network file system, which can
> be resolved by " export TMPDIR=/home/yanb/tmp", /home/yanb/tmp is my local
> directory. The default TMPDIR points to a Lustre directory.
>
> There is no any other output. I checked my job with "qstat -n" and found that
> processes were actually not started on compute nodes even though PBS Pro has
> "started" my job.
>
> Beichuan
>
>> 3. Then I test pure-MPI mode: OPENMP is turned off, and each compute node
>> runs 16 processes (clearly shared-memory of MPI is used). Four combinations
>> of "TMPDIR" and "TCP" are tested:
>> case 1:
>> #export TMPDIR=/home/yanb/tmp
>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>> mpirun $TCP -np 64 -npernode 16 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
>> output:
>> Start Prologue v2.5 Mon Mar 3 15:47:16 EST 2014
>> End Prologue v2.5 Mon Mar 3 15:47:16 EST 2014
>> -bash: line 1: 448597 Terminated /var/spool/PBS/mom_priv/jobs/602244.service12.SC
>> Start Epilogue v2.5 Mon Mar 3 15:50:51 EST 2014
>> Statistics cpupercent=0,cput=00:00:00,mem=7028kb,ncpus=128,vmem=495768kb,walltime=00:03:24
>> End Epilogue v2.5 Mon Mar 3 15:50:52 EST 2014
>
> It looks like you have two general cases:
>
> 1. The job fails for no apparent reason (like above), or
> 2. The job complains that your TMPDIR is on a shared filesystem
>
> Right?
>
> I think the real issue, then, is to figure out why your jobs are failing with
> no output.
>
> Is there anything in the stderr output?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/