On Jun 25, 2009, at 12:06 PM, Robert Jackson wrote:

When using OpenMPI and nwchem standalone (mpirun --byslot --mca btl self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_exclude lo,eth1 $NWCHEM h2o.nw > & h2o.nwo.$$), the job runs fine.

When running the same job via the PBSPro scheduler, I get errors. The PBS script is called nwrun and is submitted with the following command: qsub -V -S /bin/bash ./nwrun

Odd.

I'm unfortunately unfamiliar with nwchem; it looks like the error is coming from ARMCI. Have you checked with the nwchem authors to see what this error means?

Error listing from error file:
ARMCI configured for 4 cluster nodes. Network protocol is 'TCP/IP Sockets'.
1:trying connect to host=compute-1-4.local, port=35506 t=5 111
1:armci_CreateSocketAndConnect: connect failed: -1
trying to connect:: Connection refused
1:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 1:: Connection refused
[compute-1-4.local:04739] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode -1
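For comparison, here's a minimal sketch of what a bash job script like nwrun might look like. The actual contents of nwrun weren't posted, so the job name and the `select` resource request below are assumptions, not a reconstruction of your script. Because `qsub -S /bin/bash` runs the script under bash, the csh-style `> &` redirection from the interactive command is written here as `> file 2>&1`; `-V` is what carries `$NWCHEM` into the job's environment.

```shell
#!/bin/bash
# Hypothetical sketch of a PBS Pro job script in the spirit of "nwrun".
# The job name and resource request (4 chunks, 1 MPI rank each) are assumptions.
#PBS -N h2o
#PBS -l select=4:ncpus=1:mpiprocs=1
#PBS -j oe

# PBS starts the job in $HOME; change to the directory qsub was run from.
cd "$PBS_O_WORKDIR" || exit 1

# Same MCA settings as the interactive run. -machinefile passes the nodes
# PBS allocated; it is only needed if Open MPI lacks PBS (tm) support.
# Output goes to h2o.nwo.<PID>, using bash redirection for stdout and stderr.
mpirun --byslot -machinefile "$PBS_NODEFILE" \
    --mca btl self,sm,tcp --mca btl_base_verbose 30 \
    --mca btl_tcp_if_exclude lo,eth1 \
    "$NWCHEM" h2o.nw > "h2o.nwo.$$" 2>&1
```

If Open MPI was built with tm support, mpirun should pick up the PBS allocation on its own and the `-machinefile` line can be dropped.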


--
Jeff Squyres
Cisco Systems

