Open MPI 1.4.3 is *ancient*. Please upgrade -- we just released Open MPI 1.8 last week.

Also, please look at this FAQ entry -- it steps you through a lot of basic troubleshooting steps about getting basic MPI programs working:

    http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems

Once you get basic MPI programs working, then try with MPI Blast.
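A quick way to check the basics before involving mpiBLAST is a tiny MPI program that just reports where each rank is running. Here is a minimal sketch (the file name and machinefile are placeholders -- adjust them to your setup):

    /* mpi_hostname.c -- minimal multi-host sanity check (illustrative sketch).
     * Each rank prints its rank and the host it is running on; if this works
     * across every machine in the machinefile, the launch path, library paths,
     * and basic TCP connectivity between the nodes are sound enough to move on
     * to mpiBLAST. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char name[MPI_MAX_PROCESSOR_NAME];
        int rank, size, len;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);
        printf("rank %d of %d running on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }

Build it with mpicc and launch it with the same machinefile you pass to mpiBLAST, e.g. "mpicc mpi_hostname.c -o mpi_hostname" followed by "mpirun -np 16 -machinefile mf ./mpi_hostname". If every rank prints its host, the MPI layer is fine; if some ranks fail with the same error 113, the problem is in the network/VM setup rather than in mpiBLAST.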
On Apr 5, 2014, at 3:11 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:

> mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 -np 16 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o out.txt
>
> was the command I executed on the cluster.
>
> On Sat, Apr 5, 2014 at 12:34 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> Sorry Ralph, my mistake -- it's not "names"... it is "it does not happen on the same nodes."
>
> On Sat, Apr 5, 2014 at 12:33 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> Same VM on all machines, that is, virt-manager.
>
> On Sat, Apr 5, 2014 at 12:32 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> Open MPI version 1.4.3.
>
> On Fri, Apr 4, 2014 at 8:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Okay, so if you run mpiBlast on all the non-name nodes, everything is okay? What do you mean by "names nodes"?
>
> On Apr 4, 2014, at 7:32 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>
>> No, it does not happen on names nodes.
>>
>> On Fri, Apr 4, 2014 at 7:51 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hi Nisha
>>
>> I'm sorry if my questions appear abrasive - I'm just a little frustrated at the communication bottleneck as I can't seem to get a clear picture of your situation. So you really don't need to keep calling me "sir" :-)
>>
>> The error you are hitting is very unusual - it means that the processes are able to make a connection, but are failing to correctly complete a simple handshake exchange of their process identifications. There are only a few ways that can happen, and I'm trying to get you to test for them.
>>
>> So let's try and see if we can narrow this down. You mention that it works on some machines, but not all. Is this consistent - i.e., is it always the same machines that work, and the same ones that generate the error? If you exclude the ones that show the error, does it work? If so, what is different about those nodes? Are they a different architecture?
>>
>> On Apr 3, 2014, at 11:09 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>
>>> Sir,
>>> the same virt-manager is being used by all PCs. No, I didn't enable openmpi-hetero. Yes, the Open MPI version is the same on all nodes, through the same kickstart file.
>>> OK... actually, sir, Rocks itself installed and configured Open MPI and MPICH on its own through the HPC roll.
>>>
>>> On Fri, Apr 4, 2014 at 9:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>
>>>> Thank you, Ralph.
>>>> Yes, the cluster is heterogeneous...
>>>
>>> And did you configure OMPI --enable-heterogeneous? And are you running it with --hetero-nodes? What version of OMPI are you using anyway?
>>>
>>> Note that we don't care if the host PCs are hetero - what we care about is the VM. If all the VMs are the same, then it shouldn't matter. However, most VM technologies don't handle hetero hardware very well - i.e., you can't emulate an x86 architecture on top of a Sparc or Power chip or vice versa.
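One way to check both of the things in question here -- whether every VM really runs the same Open MPI build, and whether the VMs all present the same architecture -- is a small diagnostic along the lines of the sketch below. The file name is just a placeholder, and note that the OMPI_*_VERSION macros are compile-time values from mpi.h, so they only describe the headers each build of the test saw (rebuild it per node if the install is not shared):

    /* arch_check.c -- illustrative sketch: does every node look alike to MPI?
     * Each rank reports its host, word size, and byte order, plus the Open MPI
     * version its mpi.h declared at compile time. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdint.h>

    int main(int argc, char *argv[])
    {
        char host[MPI_MAX_PROCESSOR_NAME];
        int rank, len;
        uint32_t probe = 1;   /* low byte first => little-endian */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(host, &len);

        printf("rank %d on %s: sizeof(long)=%zu, %s-endian, built against Open MPI %d.%d.%d\n",
               rank, host, sizeof(long),
               (*(uint8_t *)&probe == 1) ? "little" : "big",
               OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);

        MPI_Finalize();
        return 0;
    }

If any rank reports a different word size, byte order, or Open MPI version than the rest, that node is the place to start looking.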
>>>
>>>> And I haven't made compute nodes on the physical nodes (PCs) directly, because in college it is not possible to take a whole lab of 32 PCs for your work, so I ran on VMs.
>>>
>>> Yes, but at least it would let you test the setup to run MPI across even a couple of PCs - this is simple debugging practice.
>>>
>>>> In a Rocks cluster, the frontend gives the same kickstart to all the PCs, so the Open MPI version should be the same, I guess.
>>>
>>> Guess? or know? Makes a difference - might be worth testing.
>>>
>>>> Sir,
>>>> mpiformatdb is a command to distribute database fragments to different compute nodes after partitioning of the database.
>>>> And sir, have you done mpiBLAST?
>>>
>>> Nope - but that isn't the issue, is it? The issue is with the MPI setup.
>>>
>>>> On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> What is "mpiformatdb"? We don't have an MPI database in our system, and I have no idea what that command means.
>>>>
>>>> As for that error - it means that the identifier we exchange between processes is failing to be recognized. This could mean a couple of things:
>>>>
>>>> 1. the OMPI version on the two ends is different - could be you aren't getting the right paths set on the various machines
>>>>
>>>> 2. the cluster is heterogeneous
>>>>
>>>> You say you have "virtual nodes" running on various PCs? That would be an unusual setup - VMs can be problematic given the way they handle TCP connections, so that might be another source of the problem if my understanding of your setup is correct. Have you tried running this across the PCs directly - i.e., without any VMs?
>>>>
>>>> On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>
>>>>> I first formatted my database with the mpiformatdb command, then I ran the command:
>>>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>>>>> but then it gave this error 113 from some hosts and continued to run for the others, with no results even after 2 hours had elapsed... on a Rocks 6.0 cluster with 12 virtual nodes on PCs, 2 on each using virt-manager, 1 GB RAM for each.
>>>>>
>>>>> On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>> I also made a machine file which contains the IP addresses of all compute nodes, plus a .ncbirc file with the path to mpiblast and the shared and local storage paths...
>>>>> Sir,
>>>>> I ran the same mpirun command on my college supercomputer, 8 nodes each having 24 processors, but it just kept running... gave no result up till 3 hours...
>>>>>
>>>>> On Thu, Apr 3, 2014 at 10:39 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>> I first formatted my database with the mpiformatdb command, then I ran the command:
>>>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>>>>> but then it gave this error 113 from some hosts and continued to run for the others, with no results even after 2 hours had elapsed... on a Rocks 6.0 cluster with 12 virtual nodes on PCs, 2 on each using virt-manager, 1 GB RAM for each.
>>>>>
>>>>> On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> I'm having trouble understanding your note, so perhaps I am getting this wrong.
>>>>> Let's see if I can figure out what you said:
>>>>>
>>>>> * your perl command fails with "no route to host" - but I don't see any host in your cmd. Maybe I'm just missing something.
>>>>>
>>>>> * you tried running a couple of "mpirun" commands, but the mpirun command wasn't recognized? Is that correct?
>>>>>
>>>>> * you then ran mpiblast and it sounds like it successfully started the processes, but then one aborted? Was there an error message beyond just the -1 return status?
>>>>>
>>>>> On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>>
>>>>>> error btl_tcp_endpoint.c:638 connection failed due to error 113
>>>>>>
>>>>>> In Open MPI: this error came when I ran my mpiblast program on the Rocks cluster. Connecting to the hosts at IPs 10.1.255.236 and 10.1.255.244 failed. And when I run the following command:
>>>>>>
>>>>>>     linux_shell$ perl -e 'die$!=113'
>>>>>>
>>>>>> this message comes: "No route to host at -e line 1."
>>>>>>
>>>>>>     shell$ mpirun --mca btl ^tcp
>>>>>>     shell$ mpirun --mca btl_tcp_if_include eth1,eth2
>>>>>>     shell$ mpirun --mca btl_tcp_if_include 10.1.255.244
>>>>>>
>>>>>> were also executed, but it did not recognize these commands... and aborted... What should I do?
>>>>>>
>>>>>> When I ran my mpiblast program for the first time, it gave an MPI_Abort error... bailing out of signal -1 on the rank 2 processor... then I removed my public Ethernet cable... and then it gave the btl_tcp_endpoint error 113...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
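On the error code itself: on Linux, errno 113 is EHOSTUNREACH ("No route to host"), which is exactly what the perl one-liner in the quoted message prints, so the failure is at the IP routing / firewall level between those nodes rather than inside Open MPI. A couple of lines of C (just a sketch, nothing Open MPI-specific) confirm the mapping:

    /* errno113.c -- show what error code 113 means on this system.
     * On Linux this prints "No route to host" (EHOSTUNREACH). */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        printf("errno %d (EHOSTUNREACH=%d): %s\n",
               113, EHOSTUNREACH, strerror(113));
        return 0;
    }

When that is the cause, the usual suspects are an interface in the machinefile that the other VMs cannot actually route to (for example, a host-only or NAT interface created by virt-manager) or a firewall rule on the nodes, which is why restricting Open MPI to one reachable interface with --mca btl_tcp_if_include is a sensible first test.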