You should ping the Rocks maintainers and ask them to upgrade. Open MPI 1.4.3 was released in September of 2010.
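For reference, both of these commands ship with every Open MPI install and report which version a given node is actually running, so they are an easy way to confirm what the Rocks roll put on each machine before and after an upgrade:

    shell$ ompi_info | grep "Open MPI:"
    shell$ mpirun --version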
On Apr 8, 2014, at 5:37 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:

> the latest Rocks 6.2 carries this version only
>
>
> On Tue, Apr 8, 2014 at 3:49 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Open MPI 1.4.3 is *ancient*. Please upgrade -- we just released Open MPI 1.8 last week.
>
> Also, please look at this FAQ entry -- it steps you through a lot of basic troubleshooting steps for getting basic MPI programs working:
>
>     http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
>
> Once you get basic MPI programs working, then try with mpiBLAST.
>
>
> On Apr 5, 2014, at 3:11 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>
> > mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 -np 16 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o out.txt
> >
> > was the command I executed on the cluster...
> >
> >
> > On Sat, Apr 5, 2014 at 12:34 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> > sorry Ralph, my mistake, it's not "names"... it is "it does not happen on the same nodes."
> >
> >
> > On Sat, Apr 5, 2014 at 12:33 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> > the same VM on all machines, i.e. virt-manager
> >
> >
> > On Sat, Apr 5, 2014 at 12:32 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> > Open MPI version 1.4.3
> >
> >
> > On Fri, Apr 4, 2014 at 8:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > Okay, so if you run mpiBLAST on all the non-"names" nodes, everything is okay? What do you mean by "names nodes"?
> >
> >
> > On Apr 4, 2014, at 7:32 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >
> >> no, it does not happen on the "names" nodes
> >>
> >>
> >> On Fri, Apr 4, 2014 at 7:51 PM, Ralph Castain <r...@open-mpi.org> wrote:
> >> Hi Nisha
> >>
> >> I'm sorry if my questions appear abrasive - I'm just a little frustrated at the communication bottleneck, as I can't seem to get a clear picture of your situation. So you really don't need to keep calling me "sir" :-)
> >>
> >> The error you are hitting is very unusual - it means that the processes are able to make a connection, but are failing to correctly complete a simple handshake exchange of their process identifications. There are only a few ways that can happen, and I'm trying to get you to test for them.
> >>
> >> So let's try and see if we can narrow this down. You mention that it works on some machines, but not all. Is this consistent - i.e., is it always the same machines that work, and the same ones that generate the error? If you exclude the ones that show the error, does it work? If so, what is different about those nodes? Are they a different architecture?
> >>
> >>
> >> On Apr 3, 2014, at 11:09 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>
> >>> sir,
> >>> the same virt-manager is being used by all the PCs. No, I didn't enable openmpi-hetero. Yes, the Open MPI version is the same on all of them, through the same kickstart file.
> >>> ok... actually, sir, Rocks itself installed and configured Open MPI and MPICH on its own through the HPC roll.
> >>>
> >>>
> >>> On Fri, Apr 4, 2014 at 9:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >>>
> >>> On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>
> >>>> thank you Ralph.
> >>>> Yes, the cluster is heterogeneous...
> >>>
> >>> And did you configure OMPI with --enable-heterogeneous? And are you running it with --hetero-nodes? What version of OMPI are you using anyway?
> >>>
> >>> Note that we don't care if the host PCs are hetero - what we care about is the VM. If all the VMs are the same, then it shouldn't matter. However, most VM technologies don't handle hetero hardware very well - i.e., you can't emulate an x86 architecture on top of a Sparc or Power chip or vice versa.
> >>>
> >>>> And I haven't made compute nodes on the physical nodes (PCs) directly, because in college it is not possible to take a whole lab of 32 PCs for your work, so I ran on VMs.
> >>>
> >>> Yes, but at least it would let you test the setup by running MPI across even a couple of PCs - this is simple debugging practice.
> >>>
> >>>> In a Rocks cluster, the frontend gives the same kickstart to all the PCs, so the Open MPI version should be the same, I guess.
> >>>
> >>> Guess? or know? Makes a difference - might be worth testing.
> >>>
> >>>> Sir,
> >>>> mpiformatdb is a command to distribute database fragments to different compute nodes after partitioning of the database.
> >>>> And sir, have you done mpiblast?
> >>>
> >>> Nope - but that isn't the issue, is it? The issue is with the MPI setup.
> >>>
> >>>>
> >>>> On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >>>> What is "mpiformatdb"? We don't have an MPI database in our system, and I have no idea what that command means.
> >>>>
> >>>> As for that error - it means that the identifier we exchange between processes is failing to be recognized. This could mean a couple of things:
> >>>>
> >>>> 1. the OMPI version on the two ends is different - could be you aren't getting the right paths set on the various machines
> >>>>
> >>>> 2. the cluster is heterogeneous
> >>>>
> >>>> You say you have "virtual nodes" running on various PCs? That would be an unusual setup - VMs can be problematic given the way they handle TCP connections, so that might be another source of the problem if my understanding of your setup is correct. Have you tried running this across the PCs directly - i.e., without any VMs?
> >>>>
> >>>>
> >>>> On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>>
> >>>>> I first formatted my database with the mpiformatdb command, then I ran:
> >>>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
> >>>>> but then it gave this error 113 from some hosts and continued to run for the others, but with no results even after 2 hours had elapsed... on a Rocks 6.0 cluster with 12 virtual nodes on PCs - 2 on each, using virt-manager, 1 GB RAM each.
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>>> I also made a machine file which contains the IP addresses of all compute nodes, plus a .ncbirc file with the path to mpiblast and the shared and local storage paths....
> >>>>> Sir,
> >>>>> I ran the same mpirun command on my college supercomputer, 8 nodes each having 24 processors, but it just kept running... it gave no result even after 3 hours...
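A minimal sketch of the kind of multi-host sanity check the FAQ entry quoted above walks through, reusing the machinefile `mf` from the commands in this thread (the process count is illustrative, and ring_c.c is the example program shipped in the Open MPI source tarball's examples/ directory):

    # Step 1: no MPI communication yet -- just verify that mpirun can
    # launch something on every node listed in the machinefile.
    shell$ mpirun -np 4 -machinefile mf hostname

    # Step 2: a trivial MPI program, so the TCP BTL actually has to
    # pass messages between hosts (the binary must be on a path visible
    # to all nodes, e.g. a shared home directory).
    shell$ mpicc ring_c.c -o ring_c
    shell$ mpirun -np 4 -machinefile mf ./ring_c

If step 1 already fails, the problem is remote launch (ssh, or PATH/LD_LIBRARY_PATH on the compute nodes); if step 1 works but step 2 hangs or dies with error 113, the problem is TCP connectivity between the VMs rather than anything in mpiBLAST.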
> >>>>>
> >>>>> On Thu, Apr 3, 2014 at 10:39 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>>> I first formatted my database with the mpiformatdb command, then I ran:
> >>>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
> >>>>> but then it gave this error 113 from some hosts and continued to run for the others, but with no results even after 2 hours had elapsed... on a Rocks 6.0 cluster with 12 virtual nodes on PCs - 2 on each, using virt-manager, 1 GB RAM each.
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
> >>>>> I'm having trouble understanding your note, so perhaps I am getting this wrong. Let's see if I can figure out what you said:
> >>>>>
> >>>>> * your perl command fails with "no route to host" - but I don't see any host in your cmd. Maybe I'm just missing something.
> >>>>>
> >>>>> * you tried running a couple of "mpirun" commands, but the mpirun command wasn't recognized? Is that correct?
> >>>>>
> >>>>> * you then ran mpiblast and it sounds like it successfully started the processes, but then one aborted? Was there an error message beyond just the -1 return status?
> >>>>>
> >>>>>
> >>>>> On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>>>
> >>>>>> error: btl_tcp_endpoint.c:638 connection failed due to error 113
> >>>>>>
> >>>>>> In Open MPI, this error came when I ran my mpiblast program on the Rocks cluster. Connecting to the hosts at 10.1.255.236 and 10.1.255.244 failed. And when I run the following command:
> >>>>>>
> >>>>>>     linux_shell$ perl -e 'die$!=113'
> >>>>>>
> >>>>>> this message comes back: "No route to host at -e line 1."
> >>>>>>
> >>>>>>     shell$ mpirun --mca btl ^tcp
> >>>>>>     shell$ mpirun --mca btl_tcp_if_include eth1,eth2
> >>>>>>     shell$ mpirun --mca btl_tcp_if_include 10.1.255.244
> >>>>>>
> >>>>>> were also executed, but it did not recognize these commands... and aborted.... What should I do? When I ran my mpiblast program for the first time, it gave an mpi_abort error... "bailing out of signal -1 on rank 2 processor"... then I removed my public Ethernet cable... and then it gave the btl_tcp_endpoint error 113....
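Error 113 is Linux errno EHOSTUNREACH ("No route to host"), which is exactly what the perl one-liner above decodes: the failing processes cannot reach the peer's IP at all, so this points at IP routing and interface selection rather than anything inside MPI. A sketch of how one might narrow it down, assuming eth0 is the interface that carries the private 10.1.x.x addresses on these VMs (adjust the name to whatever the VMs actually use):

    # 1. Check plain IP reachability between the nodes that fail,
    #    with no MPI involved at all:
    shell$ ping -c 2 10.1.255.236
    shell$ ping -c 2 10.1.255.244

    # 2. Restrict the TCP BTL to the private cluster interface only.
    #    In the 1.4 series, btl_tcp_if_include takes interface names
    #    (e.g. eth0), not IP addresses.
    shell$ mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0 \
           -np 16 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o out.txt

If the pings themselves fail between the VMs, the issue is almost certainly the virt-manager networking (bridging/NAT between the host PCs) rather than any Open MPI setting.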
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/