On Feb 6, 2006, at 5:25 PM, Warner Yuen wrote:

Brian help!!!!!! :-)

On Feb 5, 2006, at 9:00 AM, users-requ...@open-mpi.org wrote:

If this is the case, my next question is, how do I supply the usual
xgrid options, such as working directory, standard input file, etc?
Or is that simply not possible?
Do I simply have to have some other way (eg ssh) to get files to/
from agent machines, like I would for a batch system like PBS?

It looks like I never implemented those options (shame on me).  I've
added that to my to-do list, although I can't give an accurate
timetable for implementation at this point.  One thing to note is that
rather than using Xgrid's standard input/output forwarding services,
we use Open MPI's services.  So if you do:

   mpirun -np 2 ./myapp < foo.txt

Under Xgrid with Open MPI, I'm trying to run applications that do more than read standard input and write standard output: they also create and write other intermediate files. For an application like HP Linpack that just reads and writes one file, things work fine. My guess is that the extra files are where things are getting hung up. Below, my application was trying to write out a file called "testrun.nex.run1.p" and failed. The MrBayes application writes out two files for each MPI process.

Initial log likelihoods for run 1:
Chain 1 -- -429.987779
Chain 2 -- -386.761468
Could not open file "testrun.nex.run1.p"
Memory allocation error on at least one processor
Error in command "Mcmc"
There was an error on at least one processor
Error in command "Execute"
Will exit with signal 1 (error) because quitonerror is set to yes

Am I just misunderstanding how to set up Open MPI to work with Xgrid?

Ah, yes, this would make sense. When password authentication is used to authenticate to an Xgrid controller, all jobs run as user 'nobody'. So all the files that MrBayes (for example) is trying to read or write must have permissions for user 'nobody'. If the files only need to be read, making them (and your home directory itself) world readable is an option. If the files need to be written, then there's a bit of a problem, since you probably (in general) don't want to allow user 'nobody' to write all over your home directory. One solution (if possible) would be to have the application write into /tmp and then collect the files after the job completes.
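
As a rough way to check the read-only case before submitting a job, something like the following Python sketch could be used. It is only an illustration, not part of Open MPI or Xgrid: the function name readable_by_others and its top parameter are made up here, and it assumes that user 'nobody' gets access purely through the "other" permission bits (it ignores ACLs and group membership).

```python
import os
import stat


def readable_by_others(path, top=os.sep):
    """Return True if the 'other' permission bits allow reading path.

    Hypothetical helper: checks the read bit on the file itself and the
    search (execute) bit on each directory from the file's parent up to
    top (by default the filesystem root). ACLs are not considered.
    """
    path = os.path.abspath(path)
    top = os.path.abspath(top)
    # The file itself must be readable by others.
    if not os.stat(path).st_mode & stat.S_IROTH:
        return False
    current = os.path.dirname(path)
    while True:
        # Every directory on the way must be searchable by others.
        if not os.stat(current).st_mode & stat.S_IXOTH:
            return False
        parent = os.path.dirname(current)
        if current == top or parent == current:
            return True
        current = parent
```

Calling readable_by_others on an input file in your home directory would return False if, say, the home directory itself is mode 700, which is the situation described above.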

If Kerberos authentication (aka Single Signon) is used for controller authentication, then the processes started by Xgrid run as the user who submitted the job. This makes I/O on the compute nodes significantly easier, but setting up the grid is more difficult. All the computers have to use the same Kerberos authentication realm, and I think there are some other restrictions. Also, because I didn't have access to such a setup, Open MPI 1.0.x does not support process startup with Single Signon authentication. This is something I'm hoping to have fixed for Open MPI 1.1, if I can find a properly configured cluster to test on.


Hope this made some sense...

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

