All --debug-daemons really does is keep the ssh session open after launching
the remote daemon and turn on some output. Otherwise, we close that session as
most systems only allow a limited number of concurrent ssh sessions to be open.
I suspect you have a system setting that kills any running job upon ssh close.
It would be best if you removed that restriction. If you cannot, then you can
always run your MPI jobs with --no-daemonize. This will keep the ssh session
open, but without all the debug output.
That flag is just shorthand for an MCA parameter, so you can set it in your
environment or put it in your default MCA parameter file.
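As a rough sketch of those two options: I am assuming here that the parameter
behind --no-daemonize is named orte_no_daemonize, so please confirm the exact
name on your installation first (for example with
"ompi_info --param all all | grep daemonize"). The OMPI_MCA_ prefix and the
$HOME/.openmpi/mca-params.conf location are the standard Open MPI mechanisms;
only the parameter name is my assumption.

```shell
# ASSUMPTION: orte_no_daemonize is the MCA parameter that --no-daemonize sets.
# Verify the name with ompi_info before relying on it.

# Option 1: environment variable. Any MCA parameter can be set by
# exporting OMPI_MCA_<parameter_name> before calling mpirun.
export OMPI_MCA_orte_no_daemonize=1

# Option 2: per-user default MCA parameter file, read on every run.
mkdir -p "$HOME/.openmpi"
printf '%s\n' "orte_no_daemonize = 1" >> "$HOME/.openmpi/mca-params.conf"
```

Either one should be enough on its own. Since your /home/mpiuser is an NFS
share, the file variant would be seen by every node without further setup.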
On Dec 28, 2010, at 3:31 AM, Advanced Computing Group University of Padova
wrote:
> Yes, I've tested them.
> In fact, with the --debug-daemons switch everything works fine! (And I see
> that a process called orted is started on the nodes whenever I launch a
> test application.)
> I believe this is an environment variable problem.
>
> On Mon, Dec 27, 2010 at 10:16 PM, David Zhang wrote:
> Have you tested your ssh key setup, firewall, and switch settings to ensure
> all nodes are talking to each other?
>
> On Mon, Dec 27, 2010 at 1:07 AM, Advanced Computing Group University of
> Padova wrote:
> I am using Open MPI 1.4.2.
>
>
> On Fri, Dec 24, 2010 at 11:17 AM, Advanced Computing Group University of
> Padova wrote:
> Hi,
> I am building a small 16-node Gentoo-based cluster.
> I successfully installed Open MPI and successfully ran some simple small
> test parallel programs on a single host, but
> I can't run a parallel program on more than one node.
>
>
> The nodes are cloned (so they are identical).
> The mpiuser account (and its ssh certificates) uses /home/mpiuser, which is
> an NFS share.
> I modified .bashrc:
>
> -
> PATH=/usr/bin:$PATH ; export PATH ;
> LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
>
> # already present below
> if [[ $- != *i* ]] ; then
> # Shell is non-interactive. Be done now!
> return
> fi
> -
>
> The very strange behaviour is that using --debug-daemons lets my
> program run successfully.
>
> Thank you in advance, and sorry for my bad English.
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> David Zhang
> University of California, San Diego