If you want, you can upgrade to the last release in the 1.2 series from the
www.open-mpi.org web site. Any 1.2 release will work with xgrid; later series
will not.


On Jun 21, 2010, at 1:40 PM, Barrett, Brian W wrote:

> You have to set two environment variables (XGRID_CONTROLLER_HOSTNAME and 
> XGRID_CONTROLLER_PASSWORD) with the correct information in order for the 
> XGrid starter to work.  Due to the way XGrid works, the nolocal option will 
> not work properly when launching with XGrid.
> 
> Brian
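
For anyone following along, the two variables Brian names could be set like this before launching. This is only a sketch: the hostname and password values are made-up placeholders, and it assumes a Bourne-style shell and an xgrid-enabled mpirun on the PATH.

```shell
#!/bin/sh
# Placeholder values (assumptions, not real settings): substitute the
# hostname and password of your own XGrid controller.
export XGRID_CONTROLLER_HOSTNAME="xgrid-controller.example.com"
export XGRID_CONTROLLER_PASSWORD="changeme"

# Launch only if mpirun is actually installed; otherwise report and move on.
if command -v mpirun >/dev/null 2>&1; then
    mpirun -n 2 /bin/hostname
else
    echo "mpirun not found in PATH" >&2
fi
```

The -nolocal option is left off on purpose, since it does not work properly when launching with XGrid.
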
> 
> On Jun 21, 2010, at 1:28 PM, charlie strauss wrote:
> 
>> Perhaps I was mistaken about 1.5rc1. As for the openMPI installed on Mac
>> OS X, my 10.5 machine has v1.2.3. When I try to run it, it works fine
>> locally, but it never finds the xgrid.
>> 
>> Any MPI job I run runs on the localhost, not on the xgrid agents. If I try
>> to force the issue by specifying -nolocal, it just complains that there are
>> no nodes.
>> 
>> So how do I use openMPI so that it uses the nodes of an xgrid cluster?
>> 
>> mpirun -nolocal -n 32 /bin/hostname
>> --------------------------------------------------------------------------
>> There are no available nodes allocated to this job. This could be because
>> no nodes were found or all the available nodes were already used.
>> 
>> Note that since the -nolocal option was given no processes can be 
>> launched on the local node.
>> --------------------------------------------------------------------------
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_support_fns.c at line 168
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/round_robin/rmaps_rr.c at line 402
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_map_job.c at line 210
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmgr/urm/rmgr_urm.c at line 372
>> 
>> On Jun 16, 2010, at 1:36 PM, Ralph Castain wrote:
>> 
>>> Where did you see that 1.5 works with xgrid? That support has been broken 
>>> since the 1.2 series, unfortunately, so it would help to ensure we don't 
>>> have stale docs out there to the contrary.
>>> 
>>> As for the 1.2 results, you are aware (I imagine) that OSX ships with the 
>>> last 1.2 release already installed? You don't have to do anything to use it 
>>> but run.
>>> 
>>> If you are getting peer timeouts, that is almost always a firewall issue. 
>>> But I would try the factory-installed version first to be sure.
>>> 
>>> On Jun 16, 2010, at 1:14 PM, Charlie E. Strauss wrote:
>>> 
>>>> I'm new to openMPI. I'm trying to set it up for use with xgrid. I have read
>>>> that v1.3 and v1.4 are broken on OSX 10.5 and 10.6, although I have seen
>>>> some discussions in the archives of this mailing list saying some people
>>>> have v1.4 running on 10.6.
>>>> 
>>>> I have now compiled both openMPI 1.2 and openMPI 1.5rc, and neither of
>>>> these is working for me with xgrid. Both of these say they work with
>>>> xgrid.
>>>> 
>>>> The failure modes are different.
>>>> 
>>>> Does anyone know how to get a working install? I am building this on an
>>>> OSX 10.5.8 machine. The xgrid controller is on an OSX 10.6 server machine.
>>>> I have tried configuring with and without the --with-xgrid option.
>>>> 
>>>> Behaviour of openMPI1.2
>>>> $ /usr/local/openmpi/bin/mpirun -nolocal -n 2 /bin/hostname
>>>> 
>>>> The job appears in the xgrid queue, and the logs show it is running on a
>>>> remote machine. However, nothing ever happens, and peeking at the xgrid
>>>> results I see:
>>>> 
>>>> $ xgrid -job results -id 8703
>>>> [brio.llnl.gov:38789] [0,0,1]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed: Operation timed out (60) - retrying
>>>> [brio.llnl.gov:38792] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed: Operation timed out (60) - retrying
>>>> 
>>>> Perhaps a firewall issue?
>>>> 
>>>> Of course, I'm more interested in getting the new openMPI 1.5 working.
>>>> When I run this, I again get an entry in the queue, and the job runs on a
>>>> remote machine, but I get a "job failed" message:
>>>> 
>>>> $ /usr/local/openmpi5/bin/mpirun -n 2 /bin/hostname
>>>> $ xgrid -job results -id 8702
>>>> [brio.llnl.gov:38776] Error: unknown option "-mca"
>>>> 
>>>> ----
>>>> 
>>>> Note I have NOT installed openMPI on any of the other computers in the
>>>> grid.  So perhaps that is the problem?  If I did install it on other
>>>> computers how would I tell mpirun where to find the path to the install
>>>> point?
>>>> 
>>>> ----
>>>> 
>>>> 
>>>> Finally, in both cases, I don't see any way to pass xgrid-specific
>>>> arguments on the MPI command line. An xgrid controller divides the agents
>>>> into sets of logical grids, and you need to specify which logical grid to
>>>> submit the job to. In xgrid CLI syntax one writes "xgrid -gid 2" for grid 2.
>>>> When I use openMPI, all the jobs get sent to the default grid, which is
>>>> the grid that xgrid uses if no gid is specified.
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>> 
>> Charlie Strauss
>> Bioscience Division
>> c...@lanl.gov
>> 505 665 4838
>> Quidquid latine dictum sit, altum sonatur.
>> 
> 
