On Aug 27, 2007, at 3:14 PM, Lev Givon wrote:
I have OpenMPI 1.2.3 installed on an XGrid cluster and a separate Mac client that I am using to submit jobs to the head (controller) node of the cluster. The cluster's compute nodes are all connected to the head node via a private network and are not running any firewalls. When I try running jobs with mpirun directly on the cluster's head node, they execute successfully; if I attempt to submit the jobs from the client (which can run jobs on the cluster using the xgrid command line tool) with mpirun, however, they appear to hang indefinitely (i.e., a job ID is created, but the mpirun itself never returns or terminates). Is it necessary to configure the firewall on the submission client to grant access to the cluster head node in order to remotely submit jobs to the cluster's head node?
Currently, every node on which an MPI process is launched must be able to open a connection to a random port on the machine running mpirun. So in your case, you'd have to configure the cluster's network so that the compute nodes can connect back to your workstation (and the workstation would have to allow incoming connections from all your cluster nodes). Far from ideal, but that's how it works at the moment.
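If you want a rough way to check whether that connect-back path exists, here is a minimal Python sketch. It is not part of Open MPI and does not reproduce what mpirun does internally; the function names are made up for illustration, and it assumes plain TCP. It just opens a listener on an OS-chosen port (standing in for the machine running mpirun) and tests whether a connection to that port succeeds.

```python
import socket


def open_ephemeral_listener(bind_addr="0.0.0.0"):
    """Bind a TCP listener to an OS-chosen port (hypothetical stand-in for the launcher side)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_addr, 0))  # port 0 asks the OS to pick any free port
    srv.listen(8)
    return srv, srv.getsockname()[1]


def can_connect_back(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Local demonstration only: both ends run on the same machine.
    listener, port = open_ephemeral_listener("127.0.0.1")
    print("listening on port", port)
    print("reachable:", can_connect_back("127.0.0.1", port))
    listener.close()
```

To test the real path, you would run the listener on the workstation (bound to an address the cluster can route to) and call can_connect_back from a compute node with the workstation's address and the printed port. If that fails from nodes on the private network, the launched processes will not be able to reach the machine running mpirun either.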
Brian