Received from Brian Barrett on Tue, Aug 28, 2007 at 12:22:29PM EDT:

> On Aug 27, 2007, at 3:14 PM, Lev Givon wrote:
>
> > I have OpenMPI 1.2.3 installed on an XGrid cluster and a separate Mac
> > client that I am using to submit jobs to the head (controller) node of
> > the cluster. The cluster's compute nodes are all connected to the head
> > node via a private network and are not running any firewalls. When I
> > try running jobs with mpirun directly on the cluster's head node, they
> > execute successfully; if I attempt to submit the jobs from the client
> > (which can run jobs on the cluster using the xgrid command line tool)
> > with mpirun, however, they appear to hang indefinitely (i.e., a job ID
> > is created, but the mpirun itself never returns or terminates). Is it
> > necessary to configure the firewall on the submission client to
> > grant access to the cluster head node in order to remotely submit jobs
> > to the cluster's head node?
>
> Currently, every node on which an MPI process is launched must be
> able to open a connection to a random port on the machine running
> mpirun. So in your case, you'd have to configure the network on the
> cluster to be able to connect back to your workstation (and the
> workstation would have to allow connections from all your cluster
> nodes). Far from ideal, but it is what it is.
>
> Brian
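If I'm reading that correctly, on the client side this would amount to something along the lines of the following (ipfw is used here purely as an illustration, and 192.168.2.0/24 is just a placeholder for the cluster's private subnet), in addition to whatever routing/NAT is needed on the head node so that the compute nodes can reach the client in the first place:

    sudo ipfw add allow tcp from 192.168.2.0/24 to any in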
Can this be avoided by submitting the "mpirun -n 10 myProg" command directly to the controller node with the xgrid command line tool? For some reason, sending the above command to the cluster results in a "task: failed with status 255" error, even though I can successfully run other programs and commands on the cluster with the xgrid tool. I know that OpenMPI on the cluster is working properly because I can run programs with mpirun successfully when logged into the controller node itself.

L.G.
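P.S. The xgrid invocation I've been testing with is roughly of this form (controller hostname and password elided, and the exact submission mode may differ):

    xgrid -h <controller hostname> -p <password> -job run mpirun -n 10 myProg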