Received from Brian Barrett on Tue, Aug 28, 2007 at 12:22:29PM EDT:
> On Aug 27, 2007, at 3:14 PM, Lev Givon wrote:
> 
> > I have OpenMPI 1.2.3 installed on an XGrid cluster and a separate Mac
> > client that I am using to submit jobs to the head (controller) node of
> > the cluster. The cluster's compute nodes are all connected to the head
> > node via a private network and are not running any firewalls. When I
> > try running jobs with mpirun directly on the cluster's head node, they
> > execute successfully; if I attempt to submit the jobs from the client
> > (which can run jobs on the cluster using the xgrid command line tool)
> > with mpirun, however, they appear to hang indefinitely (i.e., a job ID
> > is created, but the mpirun itself never returns or terminates). Is it
> > necessary to configure the firewall on the submission client to
> > grant access to the cluster head node in order to remotely submit jobs
> > to the cluster's head node?
> 
> Currently, every node on which an MPI process is launched must be
> able to open a connection to a random port on the machine running
> mpirun.  So in your case, you'd have to configure the network on the
> cluster to be able to connect back to your workstation (and the
> workstation would have to allow connections from all your cluster
> nodes). Far from ideal, but it is what it is.
> 
> Brian
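
If I understand correctly, that would mean allowing inbound TCP
connections from the cluster nodes on the client side, e.g. with an
ipfw rule roughly like the following (10.0.0.0/24 here is only a
placeholder for the cluster's private network):

    sudo ipfw add allow tcp from 10.0.0.0/24 to me in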

Can this be avoided by submitting the "mpirun -n 10 myProg" command
directly to the controller node with the xgrid command line tool? For
some reason, sending the above command to the cluster results in a
"task: failed with status 255" error even though I can successfully
run other programs and commands on the cluster with the xgrid tool. I
know that OpenMPI on the cluster is working properly because I can run
programs with mpirun successfully when logged into the controller node
itself.
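
For reference, the submission takes roughly this form (the controller
hostname and password shown here are placeholders):

    xgrid -h controller.example.com -p password -job submit \
        mpirun -n 10 myProg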

                                                   L.G.
