Received from Brian Barrett on Tue, Aug 28, 2007 at 05:07:51PM EDT:

> On Aug 28, 2007, at 10:59 AM, Lev Givon wrote:
>
> > Received from Brian Barrett on Tue, Aug 28, 2007 at 12:22:29PM EDT:
> >> On Aug 27, 2007, at 3:14 PM, Lev Givon wrote:
> >>
> >>> I have OpenMPI 1.2.3 installed on an XGrid cluster and a separate
> >>> Mac client that I am using to submit jobs to the head (controller)
> >>> node of the cluster. The cluster's compute nodes are all connected
> >>> to the head node via a private network and are not running any
> >>> firewalls. When I try running jobs with mpirun directly on the
> >>> cluster's head node, they execute successfully; if I attempt to
> >>> submit the jobs from the client (which can run jobs on the cluster
> >>> using the xgrid command line tool) with mpirun, however, they
> >>> appear to hang indefinitely (i.e., a job ID is created, but the
> >>> mpirun itself never returns or terminates). Is it necessary to
> >>> configure the firewall on the submission client to grant access to
> >>> the cluster head node in order to remotely submit jobs to the
> >>> cluster's head node?
> >>
> >> Currently, every node on which an MPI process is launched must be
> >> able to open a connection to a random port on the machine running
> >> mpirun. So in your case, you'd have to configure the network on the
> >> cluster to be able to connect back to your workstation (and the
> >> workstation would have to allow connections from all your cluster
> >> nodes). Far from ideal, but it is what it is.
> >>
> >> Brian
> >
> > Can this be avoided by submitting the "mpirun -n 10 myProg" command
> > directly to the controller node with the xgrid command line tool?
> > For some reason, sending the above command to the cluster results in
> > a "task: failed with status 255" error even though I can
> > successfully run other programs or commands on the cluster with the
> > xgrid tool. I know that OpenMPI on the cluster is running properly
> > because I can run programs with mpirun successfully when logged into
> > the controller node itself.
>
> Open MPI was designed to be the one calling XGrid's scheduling
> algorithm, so I'm pretty sure that you can't submit a job that just
> runs Open MPI's mpirun. That wasn't really in our original design
> space as an option.
>
> Brian
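(For reference, the direct submission described above was of the form
below; the controller hostname, password, and paths are placeholders
for whatever the actual setup uses, and this assumes the standard
"xgrid -job run" invocation of Apple's xgrid command line tool:

    xgrid -h controller.example.com -p password -job run \
        /usr/local/bin/mpirun -n 10 /path/to/myProg

It is this style of invocation that fails with status 255.)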
I see. Apart from employing a grid package with more features than
Xgrid (e.g., perhaps Sun Grid Engine), is anyone aware of a mechanism
that would allow MPI jobs to be submitted to a cluster's head node from
remote submit hosts without having to give every user an actual Unix
account on the head node?

L.G.
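P.S. Regarding the firewall requirement Brian described: if allowing
inbound connections to arbitrary ports on the submit host is the
sticking point, it may be possible to confine Open MPI's TCP traffic
to a fixed port range via MCA parameters and then open only that range
in the firewall. A sketch, assuming the btl_tcp_port_min_v4 and
btl_tcp_port_range_v4 parameters are present in this Open MPI build
(verify with "ompi_info --param btl tcp" before relying on them):

    # Confine MPI point-to-point TCP traffic to ports 10000-10099,
    # then allow that range through the submit host's firewall.
    mpirun --mca btl_tcp_port_min_v4 10000 \
           --mca btl_tcp_port_range_v4 100 \
           -n 10 myProg

Note that this covers only the MPI (BTL) traffic; whether the startup
(out-of-band) channel that connects back to mpirun can be pinned the
same way seems to depend on the release, so its port behavior would
need to be checked separately.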