Ah - okay, my misunderstanding. Would you be willing to give the trunk a try? It might help to know whether the problem is specific to 1.6 or persists in the trunk as well.
On Jul 26, 2012, at 4:32 PM, Brock Palen wrote:

> I think so; sorry if I gave you the impression that Rmpi changed.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
> On Jul 26, 2012, at 7:30 PM, Ralph Castain wrote:
>
>> Guess I'm confused - your original note indicated that something had changed
>> in Rmpi that broke things. Are you now saying it was something in OMPI?
>>
>> On Jul 26, 2012, at 4:22 PM, Brock Palen wrote:
>>
>>> Ok, will see. We had Rmpi working with 1.4, and it has not been updated
>>> since 2010, so this kinda stinks.
>>>
>>> I will keep digging into it; thanks for the help.
>>>
>>> On Jul 26, 2012, at 7:16 PM, Ralph Castain wrote:
>>>
>>>> Crud - afraid you'll have to ask them, then :-(
>>>>
>>>> On Jul 26, 2012, at 3:50 PM, Brock Palen wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> Rmpi wraps everything up, so I tried setting them with
>>>>>
>>>>> export OMPI_plm_base_verbose=5
>>>>> export OMPI_dpm_base_verbose=5
>>>>>
>>>>> and I get no extra messages, even with a simple MPI-1.0 hello-world
>>>>> example.
>>>>>
>>>>> On Jul 26, 2012, at 6:42 PM, Ralph Castain wrote:
>>>>>
>>>>>> Well, it looks like comm_spawn is working on 1.6. Afraid I don't know
>>>>>> enough about Rmpi/snow to advise on what changed, but you could add some
>>>>>> debug params to get an idea of where the problem is occurring:
>>>>>>
>>>>>> -mca plm_base_verbose 5 -mca dpm_base_verbose 5
>>>>>>
>>>>>> should tell you from an OMPI perspective. I can try to help debug that
>>>>>> end, at least.
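On the environment-variable attempt: Open MPI reads MCA parameters from environment variables named after the parameter with an `OMPI_MCA_` prefix, so the environment form of `-mca plm_base_verbose 5` would be `OMPI_MCA_plm_base_verbose=5`. The `OMPI_plm_base_verbose` spelling lacks the `MCA_` part, which may explain why no extra messages appeared. A minimal sketch, assuming a POSIX shell from which R (and therefore Rmpi) is started afterwards:

```shell
# Environment equivalent of "-mca plm_base_verbose 5 -mca dpm_base_verbose 5".
# Open MPI looks up MCA parameters in variables prefixed with OMPI_MCA_.
OMPI_MCA_plm_base_verbose=5
OMPI_MCA_dpm_base_verbose=5
export OMPI_MCA_plm_base_verbose OMPI_MCA_dpm_base_verbose

# Show the exported names, to confirm they carry the OMPI_MCA_ prefix.
env | grep '^OMPI_MCA_' | sort
```

R started from this shell inherits both variables; whether the processes Rmpi spawns on other nodes also see them is not verified in this sketch.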
>>>>>>
>>>>>> On Jul 26, 2012, at 3:02 PM, Ralph Castain wrote:
>>>>>>
>>>>>>> Weird - looks like it has done a comm_spawn and is having trouble
>>>>>>> connecting between the jobs. I can check the basic code and make sure
>>>>>>> it is working - I seem to recall someone else recently talking about
>>>>>>> Rmpi changes causing problems (different ones than this, IIRC), so you
>>>>>>> might want to search our user archives for rmpi to see what they ran
>>>>>>> into. Not sure what Rmpi changed, or why.
>>>>>>>
>>>>>>> On Jul 26, 2012, at 2:41 PM, Brock Palen wrote:
>>>>>>>
>>>>>>>> I have run into a problem using Rmpi with OpenMPI (trying to get snow
>>>>>>>> running).
>>>>>>>>
>>>>>>>> I built OpenMPI following another post, building it static:
>>>>>>>>
>>>>>>>> ./configure --prefix=$INSTALL/gcc-4.4.6-static \
>>>>>>>>   --mandir=$INSTALL/gcc-4.4.6-static/man --with-tm=/usr/local/torque/ \
>>>>>>>>   --with-openib --with-psm --enable-static CC=gcc CXX=g++ FC=gfortran \
>>>>>>>>   F77=gfortran
>>>>>>>>
>>>>>>>> Rmpi/snow work fine when I run on a single node. When I span more
>>>>>>>> than one node I get nasty errors (pasted below).
>>>>>>>>
>>>>>>>> I tested this MPI install with a simple hello world and that works.
>>>>>>>> Any thoughts on what is different about Rmpi/snow that could cause this?
>>>>>>>>
>>>>>>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>>>>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried routing message from [[48116,2],16] to [[48116,1],0]:16, can't find route
>>>>>>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>>>>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried routing message from [[48116,2],32] to [[48116,1],0]:16, can't find route
>>>>>>>> [0] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f) [0x2b7e9209e0df]
>>>>>>>> [1] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x9f77a) [0x2b7e9206577a]
>>>>>>>> [2] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(mca_oob_tcp_msg_recv_complete+0x27f) [0x2b7e920404af]
>>>>>>>> [3] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x7bed2) [0x2b7e92041ed2]
>>>>>>>> [4] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_event_base_loop+0x238) [0x2b7e92087e38]
>>>>>>>> [5] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(orte_daemon+0x8d8) [0x2b7e92016768]
>>>>>>>> [6] func:orted(main+0x66) [0x400966]
>>>>>>>> [7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d39c1ecdd]
>>>>>>>> [8] func:orted() [0x400839]
>>>>>>>> [nyx0397.engin.umich.edu:06959] [[48116,0],1] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>>>>>> [nyx0397.engin.umich.edu:06959] [[48116,0],1]:route_callback tried routing message from [[48116,2],7] to [[48116,1],0]:16, can't find route
>>>>>>>> [nyx0401.engin.umich.edu:07782] [[48116,0],5] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>>>>>> [nyx0401.engin.umich.edu:07782] [[48116,0],5]:route_callback tried routing message from [[48116,2],23] to [[48116,1],0]:16, can't find route
>>>>>>>> [nyx0406.engin.umich.edu:07743] [[48116,0],9] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>>>>>> [nyx0406.engin.umich.edu:07743] [[48116,0],9]:route_callback tried routing message from [[48116,2],39] to [[48116,1],0]:16, can't find route
>>>>>>>> [0] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f) [0x2ae2ad17d0df]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
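The route errors in the original report share one pattern: a daemon on each remote node says it cannot route a message from a process in job 2 to rank 0 of job 1. Reading the bracketed names as ORTE process names of the form `[[job family,job],rank]`, where job 0 holds mpirun and its daemons, this fits Ralph's comm_spawn diagnosis if job 1 is the R master and job 2 the spawned Rmpi workers; that mapping is an assumption, not something confirmed in the thread. The function below is a hypothetical helper (the name `summarize_routes` and the temporary log file are illustrative only) that extracts host, reporting daemon, source and destination from each such line:

```shell
# Hypothetical helper: summarize "can't find route" errors from an ORTE log.
# Assumes the line layout seen in the report:
#   [host:pid] [[fam,0],N]:route_callback tried routing message from SRC to DST:tag, ...
summarize_routes() {
  awk '/:route_callback tried routing message from/ {
    host = $1;   sub(/^\[/, "", host); sub(/:.*/, "", host)  # "[host:pid]" -> host
    daemon = $2; sub(/:.*/, "", daemon)                      # drop ":route_callback"
    dest = $9;   sub(/:.*/, "", dest)                        # drop the ":16," suffix
    print host, "daemon=" daemon, "from=" $7, "to=" dest
  }' "$1"
}

# Sample input: lines copied from the report (ORTE_ERROR_LOG lines are skipped).
log=$(mktemp)
cat > "$log" <<'EOF'
[nyx0400.engin.umich.edu:11927] [[48116,0],4] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
[nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried routing message from [[48116,2],16] to [[48116,1],0]:16, can't find route
[nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried routing message from [[48116,2],32] to [[48116,1],0]:16, can't find route
[nyx0397.engin.umich.edu:06959] [[48116,0],1]:route_callback tried routing message from [[48116,2],7] to [[48116,1],0]:16, can't find route
[nyx0401.engin.umich.edu:07782] [[48116,0],5]:route_callback tried routing message from [[48116,2],23] to [[48116,1],0]:16, can't find route
[nyx0406.engin.umich.edu:07743] [[48116,0],9]:route_callback tried routing message from [[48116,2],39] to [[48116,1],0]:16, can't find route
EOF

routes=$(summarize_routes "$log")
printf '%s\n' "$routes"

# Count failures per destination process.
printf '%s\n' "$routes" | awk '{ print $NF }' | sort | uniq -c
rm -f "$log"
```

On the five route errors quoted in the report, every failure has the same destination, `[[48116,1],0]`, while the sources are different job-2 ranks on different nodes.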