Re: [OMPI users] openmpi hangs when running on more than one node (unless i use --debug-daemons )

2010-12-28 Thread Advanced Computing Group University of Padova
Yes, I've tested them.
In fact, with the --debug-daemons switch everything works fine! (And I see
that a process called orted... is started on the nodes whenever I launch a
test application.)
I believe this is an environment variable problem.

On Mon, Dec 27, 2010 at 10:16 PM, David Zhang wrote:

> Have you tested your ssh key setup, firewall, and switch settings to
> ensure all nodes are talking to each other?
>
> On Mon, Dec 27, 2010 at 1:07 AM, Advanced Computing Group University of
> Padova wrote:
>
>> I am using Open MPI 1.4.2.
>>
>>
>> On Fri, Dec 24, 2010 at 11:17 AM, Advanced Computing Group University of
>> Padova wrote:
>>
>>> Hi,
>>> I am building a small 16-node Gentoo-based cluster.
>>> I successfully installed Open MPI and successfully ran some simple
>>> parallel test programs on a single host, but
>>> I can't run a parallel program on more than one node.
>>>
>>>
>>> The nodes are cloned (so they are identical).
>>> The mpiuser account (and its ssh certificates) uses /home/mpiuser, which
>>> is an NFS share.
>>> I modified .bashrc:
>>>
>>> -
>>> PATH=/usr/bin:$PATH ; export PATH ;
>>> LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
>>>
>>> # already present below
>>> if [[ $- != *i* ]] ; then
>>> # Shell is non-interactive.  Be done now!
>>> return
>>> fi
>>> -
>>>
>>> The very strange behaviour is that using --debug-daemons lets my
>>> program run successfully.
>>>
>>> Thank you in advance, and sorry for my bad English.
>>>
>>>
>>>
>>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
>
> --
> David Zhang
> University of California, San Diego


Re: [OMPI users] openmpi hangs when running on more than one node (unless i use --debug-daemons )

2010-12-28 Thread Ralph Castain
All --debug-daemons really does is keep the ssh session open after launching 
the remote daemon and turn on some output. Otherwise, we close that session as 
most systems only allow a limited number of concurrent ssh sessions to be open.

I suspect you have a system setting that kills any running job upon ssh close. 
It would be best if you removed that restriction. If you cannot, then you can 
always run your MPI jobs with --no-daemonize. This will keep the ssh session 
open, but without all the debug output.

That flag is just shorthand for an MCA param, so you can set it in your 
environment or put it in your default MCA param file.
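
A minimal sketch of those two options follows. It assumes the MCA parameter behind --no-daemonize is named orte_no_daemonize; that name is not stated in this thread, so please confirm it on your installation (for example with `ompi_info --param all all | grep daemonize`) before relying on it. The OMPI_MCA_ environment prefix and the per-user file $HOME/.openmpi/mca-params.conf are the usual Open MPI conventions.

```shell
# Option 1: set the parameter in the environment of the shell that runs mpirun.
# ASSUMPTION: orte_no_daemonize is the parameter name behind --no-daemonize.
export OMPI_MCA_orte_no_daemonize=1

# Option 2: make it persistent in the per-user default MCA parameter file.
mkdir -p "$HOME/.openmpi"
echo "orte_no_daemonize = 1" >> "$HOME/.openmpi/mca-params.conf"
```

Either form should behave like passing --no-daemonize on every mpirun command line. The file is probably the more convenient choice here, since /home/mpiuser is shared over NFS and one edit would cover all nodes.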


On Dec 28, 2010, at 3:31 AM, Advanced Computing Group University of Padova 
wrote:

> Yes, I've tested them.
> In fact, with the --debug-daemons switch everything works fine! (And I see 
> that a process called orted... is started on the nodes whenever I launch a 
> test application.)
> I believe this is an environment variable problem.