On Jun 25, 2009, at 12:06 PM, Robert Jackson wrote:
When using OpenMPI and nwchem standalone (mpirun --byslot --mca btl
self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_exclude
lo,eth1 $NWCHEM h2o.nw >& h2o.nwo.$$) the job runs fine.
When running the same job via the PBSPro scheduler I get errors. The
PBS script is called nwrun and is run with the following command:
qsub -V -S /bin/bash ./nwrun
Odd.
I'm unfortunately unfamiliar with nwchem; it looks like the error is
coming from ARMCI. Have you checked with the nwchem authors to see
what this error means?
Error listing from error file:
ARMCI configured for 4 cluster nodes. Network protocol is 'TCP/IP
Sockets'.
1:trying connect to host=compute-1-4.local, port=35506 t=5 111
1:armci_CreateSocketAndConnect: connect failed: -1
trying to connect:: Connection refused
1:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 1:: Connection refused
[compute-1-4.local:04739] MPI_ABORT invoked on rank 1 in
communicator MPI_COMM_WORLD with errorcode -1
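
The 111 in the log matches ECONNREFUSED on Linux, which agrees with the
"Connection refused" text: the host was reachable but nothing accepted
the connection on that port. As a rough way to check plain TCP
reachability between compute nodes from inside a PBS job, a minimal
Python sketch follows. The probe() helper is hypothetical and is not
part of Open MPI, ARMCI, or nwchem; it only tests whether a given
host/port accepts a TCP connection.

```python
import errno
import socket


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection; return (ok, error_name).

    Hypothetical helper for illustration only. error_name is the
    symbolic errno (e.g. 'ECONNREFUSED') when one is available,
    otherwise the exception text or class name.
    """
    try:
        # create_connection resolves the host and performs the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        if exc.errno is None:
            # Timeouts, for example, carry no errno value.
            return False, type(exc).__name__
        return False, errno.errorcode.get(exc.errno, str(exc))
```

Calling probe("compute-1-4.local", 22) from another node inside the job
would show whether basic TCP traffic gets through on the interface the
scheduler environment selects. Port 22 is only an assumed example of a
port that is normally listening; the 35506 in the log is presumably
chosen per run, so probing it after the job has died will not tell you
much.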
--
Jeff Squyres
Cisco Systems