When I run NWChem standalone with OpenMPI, using the command below, the job runs fine:

mpirun --byslot --mca btl self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_exclude lo,eth1 $NWCHEM h2o.nw >& h2o.nwo.$$

When running the same job via the PBSPro scheduler I get errors. The PBS script is called nwrun and is run with the following command:

qsub -V -S /bin/bash ./nwrun

Nwrun listing:

#!/bin/tcsh
#PBS -N h2o
#PBS -l select=4:ncpus=4:mpiprocs=4
#PBS -l walltime=0:10:00
#PBS -e .
#PBS -j eo
#PBS -k eo
#
# set working directory
set echo
cd $PBS_O_WORKDIR
#
# make sure that the proper mpirun is installed
##module load hpc/openmpi-1.2.6-intel
#
# load NWChem
#module load hpc/nwchem-5.1
setenv LD_LIBRARY_PATH /share/apps/openmpi-1.2.6-intel/lib:/share/apps/intel/mkl/10.0.1.014/lib/em64t:/share/apps/intel/cce/10.1.015/lib:/share/apps/intel/fce/10.1.015/lib
setenv NWCHEM /share/apps/nwchem-5.1/bin/nwchem
setenv PERMANENT_DIR $PBS_O_WORKDIR
setenv SCRATCH_DIR $TMPDIR
#
setenv | grep LD_LIB
which mpirun
cat $PBS_NODEFILE
# run a parallel job
mpirun --byslot --mca btl self,sm,tcp --mca btl_tcp_if_exclude lo,eth1 $NWCHEM h2o.nw >& h2o.nwo.$$
exit

Error listing from error file:

ARMCI configured for 4 cluster nodes. Network protocol is 'TCP/IP Sockets'.
1:trying connect to host=compute-1-4.local, port=35506 t=5 111
1:armci_CreateSocketAndConnect: connect failed: -1
trying to connect:: Connection refused
1:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 1:: Connection refused
[compute-1-4.local:04739] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode -1
3:trying connect to host=compute-1-4.local, port=35508 t=5 111
trying to connect:: Connection refused
3:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 3:: Connection refused
3:armci_CreateSocketAndConnect: connect failed: -1
[compute-1-4.local:04741] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode -1
6:trying connect to host=compute-1-5.local, port=48920 t=5 111
10:trying connect to host=compute-1-6.local, port=36350 t=5 111
4:armci_CreateSocketAndConnect: connect failed: -1
4:trying connect to host=compute-1-5.local, port=48918 t=5 111
trying to connect:: Connection refused
4:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 4:: Connection refused
5:armci_CreateSocketAndConnect: connect failed: -1
5:trying connect to host=compute-1-5.local, port=48919 t=5 111
trying to connect:: Connection refused
5:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 5:: Connection refused
[compute-1-5.local:01175] MPI_ABORT invoked on rank 5 in communicator MPI_COMM_WORLD with errorcode -1
6:armci_CreateSocketAndConnect: connect failed: -1
trying to connect:: Connection refused
6:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 6:: Connection refused

Is anybody familiar with this error?

Robert C. Jackson
Software Systems Specialist III
The University of Texas - Pan American
1201 W. University Dr.
Edinburg Texas 78539
Academic Computing Department
ASB 2.162E
956-381-2455 office
956-381-2355 fax
email: rjack...@utpa.edu