Thank you Ralph,

Your suspicion seems quite interesting :) I tried running the same program from node 192.168.1/2.11, also using 192.168.2.12 and "tracing" the activity on .12. I attach the two files (_succ: with --debug-daemons, _fail: without --debug-daemons). I notice that the orted daemon on the second node is invoked in a different way. Moreover, when I launch without --debug-daemons, a process called orted... remains active on the second node after I kill (Ctrl+C) the command on the first node.
Can you continue to help me?

On Tue, Dec 28, 2010 at 8:51 PM, Ralph Castain <r...@open-mpi.org> wrote:

> All --debug-daemons really does is keep the ssh session open after
> launching the remote daemon and turn on some output. Otherwise, we close
> that session as most systems only allow a limited number of concurrent ssh
> sessions to be open.
>
> I suspect you have a system setting that kills any running job upon ssh
> close. It would be best if you removed that restriction. If you cannot, then
> you can always run your MPI jobs with --no-daemonize. This will keep the ssh
> session open, but without all the debug output.
>
> That flag is just shorthand for an MCA param, so you can set it in your
> environment or put it in your default MCA param file.
>
>
> On Dec 28, 2010, at 3:31 AM, Advanced Computing Group University of Padova
> wrote:
>
> yes i've tested 'em
> In fact, using the --debug-daemons switch, everything works fine! (and i see
> that on the nodes a process called orted... is started whenever i launch a
> test application)
> I believe this is an environment variables problem....
>
> On Mon, Dec 27, 2010 at 10:16 PM, David Zhang <solarbik...@gmail.com> wrote:
>
>> have you tested your ssh key setup, firewall, and switch settings to
>> ensure all nodes are talking to each other?
>>
>> On Mon, Dec 27, 2010 at 1:07 AM, Advanced Computing Group University of
>> Padova <acg.un...@gmail.com> wrote:
>>
>>> using openmpi 1.4.2
>>>
>>>
>>> On Fri, Dec 24, 2010 at 11:17 AM, Advanced Computing Group University of
>>> Padova <acg.un...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> i am building a small 16-node Gentoo-based cluster.
>>>> I successfully installed openmpi and i successfully ran some simple
>>>> small parallel test programs on a single host but...
>>>> i can't run a parallel program on more than one node
>>>>
>>>>
>>>> The nodes are cloned (so they are identical).
>>>> The mpiuser (and its ssh certificates) uses /home/mpiuser, which is an
>>>> NFS share.
>>>> I modified .bashrc
>>>>
>>>> -------------------------
>>>> PATH=/usr/bin:$PATH ; export PATH ;
>>>> LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
>>>>
>>>> # already present below
>>>> if [[ $- != *i* ]] ; then
>>>>     # Shell is non-interactive. Be done now!
>>>>     return
>>>> fi
>>>> ---------------------
>>>>
>>>> The very, very strange behaviour is that using --debug-daemons lets my
>>>> program run successfully.....
>>>>
>>>> Thank you in advance and sorry for my bad English
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>> --
>> David Zhang
>> University of California, San Diego
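
For reference, Ralph's suggestion to set the flag "in your environ or put it in your default MCA param file" might look roughly like the sketch below. This is only an illustration under assumptions: the parameter name orte_no_daemonize is assumed to be the one behind --no-daemonize (check it on your own install, for example with `ompi_info --param orte all`), and set_mca_param is a hypothetical helper, not part of Open MPI. The script writes to a scratch file so it does not touch a real configuration.

```shell
#!/bin/sh
# Sketch: two ways to set an Open MPI MCA parameter without a command-line flag.
# ASSUMPTION: "orte_no_daemonize" is the parameter behind --no-daemonize;
# confirm the name on your install before relying on it.

# set_mca_param is a hypothetical helper, not part of Open MPI.
set_mca_param() {
    name=$1; value=$2; conf_file=$3
    # 1) Environment: Open MPI reads MCA params from OMPI_MCA_<name>.
    export "OMPI_MCA_${name}=${value}"
    # 2) Param file: one "name = value" entry per line.
    mkdir -p "$(dirname "$conf_file")"
    printf '%s = %s\n' "$name" "$value" >> "$conf_file"
}

# Write to a scratch file here; the per-user file Open MPI actually reads
# is $HOME/.openmpi/mca-params.conf.
MCA_CONF="${MCA_CONF:-$(mktemp -d)/mca-params.conf}"
set_mca_param orte_no_daemonize 1 "$MCA_CONF"
```

Either mechanism alone should be enough; the environment variable only affects shells where it is exported, while the param file applies to every job the user launches.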