[OMPI users] 'orte_ess_base_select failed'

2009-03-26 Thread Russell McQueeney
I installed Open MPI 1.3.1, and whenever I or mpirun try to start orted 
on any of the machines, it shows the message in the subject line, followed by

--> Returned value Not found (-13) instead of ORTE-SUCCESS
Is there anything obvious that I missed?
My machines are Intel x86-32, running Fedora (10 and 2).



Re: [OMPI users] 'orte_ess_base_select failed'

2009-03-27 Thread Russell McQueeney

command = mpirun --hostfile hostfile -np 2 echo `uname -a`
PATH = ...:/opt/openmpi/bin
LD_LIBRARY_PATH = /opt/openmpi/lib
no MCA parameters used

I set the default shell to bash and put some echo statements in 
.bash_profile and .bashrc. When I run the mpirun command, I see those 
echoes, but then it stops, and the job never completes.
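
One detail of the command above may matter when testing: in bash, and in POSIX shells generally, backquoted text is expanded by the local shell before mpirun is started. Each launched process would therefore echo the head node's uname output instead of running uname itself. The sketch below illustrates this; show_args and node_report are hypothetical stand-ins for mpirun and uname, not part of Open MPI.

```sh
#!/bin/sh
# show_args prints every argument it actually receives, one per line.
# It stands in for mpirun here (hypothetical helper).
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# node_report stands in for `uname -a` (hypothetical helper).
node_report() { echo "Linux headnode"; }

# Backquoted form: the substitution runs locally, before show_args starts,
# so show_args only ever sees the resulting words.
show_args echo `node_report`

# Plain form: the command name itself is passed through unexpanded.
show_args node_report
```

If the intent is for each process to report its own host, dropping the backquotes should do it, for example `mpirun --hostfile hostfile -np 2 uname -a`.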


Ralph Castain wrote:

Could you please send the info shown here:

http://www.open-mpi.org/community/help/

If the ess is failing, then we don't recognize the environment. 
Probably an issue with how it is configured vs being run.


Thanks
Ralph


___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/users








config.log.bz2
Description: application/bzip


ompi_info.bz2
Description: application/bzip


orted_errors.bz2
Description: application/bzip


ifconfig.bz2
Description: application/bzip


Re: [OMPI users] 'orte_ess_base_select failed'

2009-03-27 Thread Russell McQueeney

Jeff Squyres wrote:

Hmm -- puzzling -- the error file you sent shows the following:

bash: /opt/openmpi/orted: No such file or directory

But that shouldn't happen; according to your config.log, you installed 
with a prefix of /opt/openmpi, so Open MPI should be looking for orted 
in /opt/openmpi/bin/orted.


You said that the command was


command = mpirun --hostfile hostfile -np 2 echo `uname -a`


Is there any chance that you ran with mpirun's absolute filename, such 
as:


/opt/openmpi/bin/mpirun --hostfile hostfile -np 2 echo `uname -a`

Or do you have any aliases involved?  I can't imagine how you're 
getting that error message -- Open MPI should never use a full path 
name for orted unless you specified --prefix on the mpirun command 
line (which you didn't), or you used a full path name for mpirun (which 
it looks like you didn't, and even if you did use 
/opt/openmpi/bin/mpirun, it should use that path to look for 
/opt/openmpi/bin/orted on the other node).  Otherwise, Open MPI relies 
on the PATH set in your shell startup files on remote nodes to find 
the orted.
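
Since the lookup described above depends on PATH, one way to check it is to ask the shell where orted would resolve for a given PATH value. This is a minimal sketch: resolve_in_path is a hypothetical helper, and the temporary directory only stands in for /opt/openmpi/bin.

```sh
#!/bin/sh
# resolve_in_path NAME PATHLIST
# Print where NAME would be found if PATH were set to PATHLIST.
# The subshell keeps the caller's PATH untouched.
resolve_in_path() {
  ( PATH=$2; command -v "$1" )
}

# Demo: a throwaway directory with a dummy executable named orted.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\n' > "$demo_dir/orted"
chmod +x "$demo_dir/orted"

resolve_in_path orted "$demo_dir"                      # prints $demo_dir/orted
resolve_in_path orted "/nonexistent" || echo "orted not found"

rm -rf "$demo_dir"
```

On a real cluster the more relevant question is what a non-interactive remote shell sees, which could be checked with something like `ssh node1 'command -v orted'`, where node1 is a placeholder hostname.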


This is very odd -- can you look at the exact command that is being 
executed on the remote node?




 




Oops.  I just did `/opt/openmpi/orted 2>orted_errors ; bzip2 
orted_errors` and didn't check it before I attached it.  What ends up 
happening is that I have to ^C mpirun on the head node, and all the other 
nodes are left with a zombie, nonresponsive 'orted' process, which I have 
to kill manually.  Interestingly enough, no matter what environment 
variables I set, and no matter which machine, when I try to run `orted` or 
`/opt/openmpi/bin/orted`, I get the exact same error.  I have attached 
the real orted errors file here.  The reason that bash was whining was 
an incorrect syntax on the stderr redirect, `orted 2> orted_errors` 
instead of the correct version, `orted 2>orted_errors`.
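
For reference, bash's redirection syntax treats whitespace between `2>` and the filename as optional, so the two spellings should behave the same. The "No such file or directory" message in the first attachment looks more consistent with the missing bin/ component in /opt/openmpi/orted that Jeff pointed out. A small sketch to confirm the redirect behavior; noisy is a hypothetical stand-in that writes one line to stderr.

```sh
#!/bin/sh
# noisy writes a single line to stderr (stand-in for a failing orted).
noisy() { echo "simulated orted error" >&2; }

work=$(mktemp -d)

noisy 2>"$work/no_space"      # redirect written without a space
noisy 2> "$work/with_space"   # redirect written with a space

# cmp -s is silent and returns 0 when the two files have the same content.
if cmp -s "$work/no_space" "$work/with_space"; then
  echo "identical"
else
  echo "different"
fi

rm -rf "$work"
```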


orted_errors.bz2
Description: application/bzip


Re: [OMPI users] 'orte_ess_base_select failed'

2009-03-30 Thread Russell McQueeney
I only invoked orted manually to see the error message, as it wasn't 
showing up on the node's monitor or in the xterm window I used to run 
mpirun.  And no: no --prefix option, no aliases, no absolute path, and 
the environment variables are set.


Re: [OMPI users] 'orte_ess_base_select failed'

2009-04-06 Thread Russell McQueeney

Jeff Squyres wrote:
Run with "--mca ess_base_verbose 1000" on the mpirun command line and 
send the output, such as:


  mpirun --mca ess_base_verbose 1000 rest of your command here...
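
A sketch of how the suggestion above might be applied to the command from earlier in the thread, capturing both stdout and stderr in one plain-text file. run_verbose and the MPIRUN override are assumptions made for illustration, so the command line can be inspected without launching anything; only `--mca ess_base_verbose 1000` comes from the advice itself.

```sh
#!/bin/sh
# run_verbose LOG ARGS...
# Hypothetical wrapper: run mpirun with the ess verbosity option in front
# of the caller's arguments, sending stdout and stderr to LOG.
# Setting MPIRUN (an assumption of this sketch) swaps in another launcher.
run_verbose() {
  log=$1
  shift
  "${MPIRUN:-mpirun}" --mca ess_base_verbose 1000 "$@" >"$log" 2>&1
}

# Dry run: use echo in place of mpirun to see the arguments it would get.
log_file=$(mktemp)
MPIRUN=echo
run_verbose "$log_file" --hostfile hostfile -np 2 uname -a
cat "$log_file"
rm -f "$log_file"
unset MPIRUN
```

With MPIRUN unset, the same call would invoke the real mpirun, and the resulting log can be attached or pasted inline as plain text.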






Sorry, I was away for a few days.  Anyway, here's the verbose output.



a.doc
Description: MS-Word document