Thank you Ralph, it works!!!! :)

On Wed, Dec 29, 2010 at 4:23 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Both look perfectly right to me. The difference is only because your
> "success" one still has the ssh session active.
>
> It looks to me like something is preventing communication when the ssh
> session is terminated, but I have no clue why.
>
> Given the small cluster size, I would just add this to your default param
> file and not worry about it:
>
> orte_leave_session_attached = 1
>
>
> On Dec 29, 2010, at 2:10 AM, Advanced Computing Group University of Padova
> wrote:
>
>
>
> On Wed, Dec 29, 2010 at 10:10 AM, Advanced Computing Group University of
> Padova <acg.un...@gmail.com> wrote:
>
>> Thank you Ralph,
>> Your suspicion seems quite interesting :)
>> I tried running the same program from node 192.168.1/2.11, also using
>> 192.168.2.12, while "tracing" the activity on .12.
>> I attach the two files (_succ: using --debug-daemons, _fail: without
>> --debug-daemons).
>> I notice that the orted daemon on the second node is invoked in a
>> different way.
>> Moreover, when I launch without --debug-daemons, a process called
>> orted...... remains active on the second node after I kill (Ctrl+C) the
>> command on the first node.
>>
>> Can you continue to help me?
>>
>>
>> On Tue, Dec 28, 2010 at 8:51 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> All --debug-daemons really does is keep the ssh session open after
>>> launching the remote daemon and turn on some output. Otherwise, we close
>>> that session as most systems only allow a limited number of concurrent ssh
>>> sessions to be open.
>>>
>>> I suspect you have a system setting that kills any running job upon ssh
>>> close. It would be best if you removed that restriction. If you cannot, then
>>> you can always run your MPI jobs with --no-daemonize. This will keep the ssh
>>> session open, but without all the debug output.
>>>
>>> That flag is just shorthand for an MCA param, so you can set it in your
>>> environ or put it in your default MCA param file.
>>>
>>>
>>> On Dec 28, 2010, at 3:31 AM, Advanced Computing Group University of
>>> Padova wrote:
>>>
>>> Yes, I've tested them.
>>> In fact, using the --debug-daemons switch everything works fine! (And I
>>> see that on the nodes a process called orted... is started whenever I launch
>>> a test application.)
>>> I believe this is an environment variables problem....
>>>
>>> On Mon, Dec 27, 2010 at 10:16 PM, David Zhang <solarbik...@gmail.com> wrote:
>>>
>>>> Have you tested your ssh key setup, firewall, and switch settings to
>>>> ensure all nodes are talking to each other?
>>>>
>>>> On Mon, Dec 27, 2010 at 1:07 AM, Advanced Computing Group University of
>>>> Padova <acg.un...@gmail.com> wrote:
>>>>
>>>>> Using Open MPI 1.4.2.
>>>>>
>>>>>
>>>>> On Fri, Dec 24, 2010 at 11:17 AM, Advanced Computing Group University
>>>>> of Padova <acg.un...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am building a small 16-node Gentoo-based cluster.
>>>>>> I successfully installed Open MPI and successfully ran some simple
>>>>>> small parallel test programs on a single host, but
>>>>>> I can't run a parallel program on more than one node.
>>>>>>
>>>>>>
>>>>>> The nodes are cloned (so they are identical).
>>>>>> The mpiuser account (and its ssh certificates) uses /home/mpiuser,
>>>>>> which is an NFS share.
>>>>>> I modified .bashrc:
>>>>>>
>>>>>> -------------------------
>>>>>> PATH=/usr/bin:$PATH ; export PATH ;
>>>>>> LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
>>>>>>
>>>>>> # already present below
>>>>>> if [[ $- != *i* ]] ; then
>>>>>>     # Shell is non-interactive. Be done now!
>>>>>>     return
>>>>>> fi
>>>>>> ---------------------
>>>>>>
>>>>>> The very strange behaviour is that using --debug-daemons lets
>>>>>> my program run successfully.....
>>>>>>
>>>>>> Thank you in advance, and sorry for my bad English.
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>> --
>>>> David Zhang
>>>> University of California, San Diego
>>>>
>>
> <dump_succ.txt><dump_fail.txt>
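
The fix that resolved this thread was setting the MCA parameter orte_leave_session_attached to 1. As Ralph notes, an MCA param can go in the default param file or in the environment. The sketch below shows both, plus the per-run command-line form. It assumes the per-user param file lives at $HOME/.openmpi/mca-params.conf (the usual location, but check your install); the variable names param_dir and param_file, the hostfile name, and the program name are placeholders for illustration only.

```shell
# Option 1: per-user default MCA parameter file.
# Assumed location; a system-wide file may also exist under the install prefix.
param_dir="$HOME/.openmpi"
param_file="$param_dir/mca-params.conf"
mkdir -p "$param_dir"
# Append the setting only if no line for this parameter is present yet.
grep -qs '^orte_leave_session_attached' "$param_file" || \
    echo 'orte_leave_session_attached = 1' >> "$param_file"

# Option 2: environment variable (affects only this shell and its children).
# Open MPI reads MCA parameters from variables prefixed with OMPI_MCA_.
export OMPI_MCA_orte_leave_session_attached=1

# Option 3: per invocation on the mpirun command line.
# "hosts" and "./a.out" are placeholders, so the line is left commented out:
#   mpirun --mca orte_leave_session_attached 1 --hostfile hosts -np 4 ./a.out
```

Since /home/mpiuser is an NFS share in this setup, the param file in Option 1 would be visible on every node; the environment variable in Option 2 only takes effect in shells where it is exported.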