And thank you very much. (A few notes inline below on what I plan to try next.)
On Tue, Apr 8, 2014 at 3:07 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> The latest Rocks 6.2 carries this version only.
>
> On Tue, Apr 8, 2014 at 3:49 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> Open MPI 1.4.3 is *ancient*. Please upgrade -- we just released Open MPI 1.8 last week.
>>
>> Also, please look at this FAQ entry -- it steps you through a lot of basic troubleshooting steps about getting basic MPI programs working.
>>
>> http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
>>
>> Once you get basic MPI programs working, then try with MPI Blast.
>>
>> On Apr 5, 2014, at 3:11 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>
>> > mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 -np 16 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o out.txt
>> >
>> > was the command I executed on the cluster.
>> >
>> > On Sat, Apr 5, 2014 at 12:34 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>> > Sorry Ralph, my mistake - it's not "names"... it is "it does not happen on same nodes."
>> >
>> > On Sat, Apr 5, 2014 at 12:33 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>> > The same VM is used on all machines, that is, virt-manager.
>> >
>> > On Sat, Apr 5, 2014 at 12:32 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>> > Open MPI version 1.4.3.
>> >
>> > On Fri, Apr 4, 2014 at 8:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> > Okay, so if you run mpiBlast on all the non-name nodes, everything is okay? What do you mean by "names nodes"?
>> >
>> > On Apr 4, 2014, at 7:32 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>> >
>> >> No, it does not happen on names nodes.
>> >>
>> >> On Fri, Apr 4, 2014 at 7:51 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> >> Hi Nisha
>> >>
>> >> I'm sorry if my questions appear abrasive - I'm just a little frustrated at the communication bottleneck, as I can't seem to get a clear picture of your situation. So you really don't need to keep calling me "sir" :-)
>> >>
>> >> The error you are hitting is very unusual - it means that the processes are able to make a connection, but are failing to correctly complete a simple handshake exchange of their process identifications. There are only a few ways that can happen, and I'm trying to get you to test for them.
>> >>
>> >> So let's try and see if we can narrow this down. You mention that it works on some machines, but not all. Is this consistent - i.e., is it always the same machines that work, and the same ones that generate the error? If you exclude the ones that show the error, does it work? If so, what is different about those nodes? Are they a different architecture?
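
To help narrow this down from my side, I am going to compare the Open MPI installation on every node instead of guessing - a rough sketch of what I intend to run (it assumes my machinefile mf simply lists one compute-node IP per line):

    for h in $(cat mf); do ssh $h 'ompi_info | grep "Open MPI:"; which mpirun'; done

If any node reports a different version or a different mpirun path than the others, I will post it here. Please correct me if there is a better way to check this.
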
>> >> On Apr 3, 2014, at 11:09 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>> >>
>> >>> Sir, the same virt-manager is being used on all the PCs. No, I didn't enable openmpi-hetero. Yes, the Open MPI version is the same everywhere, through the same kickstart file.
>> >>> OK... actually, sir, Rocks itself installed and configured Open MPI and MPICH on its own through the HPC roll.
>> >>>
>> >>> On Fri, Apr 4, 2014 at 9:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> >>>
>> >>> On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>> >>>
>> >>>> Thank you, Ralph. Yes, the cluster is heterogeneous...
>> >>>
>> >>> And did you configure OMPI --enable-heterogeneous? And are you running it with --hetero-nodes? What version of OMPI are you using anyway?
>> >>>
>> >>> Note that we don't care if the host PCs are hetero - what we care about is the VM. If all the VMs are the same, then it shouldn't matter. However, most VM technologies don't handle hetero hardware very well - i.e., you can't emulate an x86 architecture on top of a Sparc or Power chip or vice versa.
>> >>>
>> >>>> And I haven't made compute nodes on the physical nodes (PCs) directly, because in college it is not possible to take the whole lab of 32 PCs for your work, so I ran on VMs.
>> >>>
>> >>> Yes, but at least it would let you test the setup to run MPI across even a couple of PCs - this is simple debugging practice.
>> >>>
>> >>>> In a Rocks cluster the frontend gives the same kickstart to all the PCs, so the Open MPI version should be the same, I guess.
>> >>>
>> >>> Guess? or know? Makes a difference - might be worth testing.
>> >>>
>> >>>> Sir, mpiformatdb is a command to distribute database fragments to different compute nodes after partitioning of the database. And sir, have you used mpiBlast?
>> >>>
>> >>> Nope - but that isn't the issue, is it? The issue is with the MPI setup.
>> >>>
>> >>>> On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> >>>> What is "mpiformatdb"? We don't have an MPI database in our system, and I have no idea what that command means.
>> >>>>
>> >>>> As for that error - it means that the identifier we exchange between processes is failing to be recognized. This could mean a couple of things:
>> >>>>
>> >>>> 1. the OMPI version on the two ends is different - could be you aren't getting the right paths set on the various machines
>> >>>>
>> >>>> 2. the cluster is heterogeneous
>> >>>>
>> >>>> You say you have "virtual nodes" running on various PCs? That would be an unusual setup - VMs can be problematic given the way they handle TCP connections, so that might be another source of the problem if my understanding of your setup is correct. Have you tried running this across the PCs directly - i.e., without any VMs?
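
Before touching mpiBlast again, I will follow the FAQ Jeff linked above and first check that a trivial launch works across the compute nodes - roughly like this (the examples/ path is my assumption from the Open MPI source tarball; if Rocks did not install it, any small MPI hello-world compiled with mpicc should do):

    mpirun -np 4 -machinefile mf hostname
    mpicc examples/hello_c.c -o hello_c
    mpirun -np 4 -machinefile mf ./hello_c

If even the plain hostname launch fails on the same hosts, then the problem is in the launch/TCP setup and not in mpiBlast at all.
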
>> >>>> On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>> >>>>
>> >>>>> I first formatted my database with the mpiformatdb command, then I ran the command:
>> >>>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>> >>>>> but then it gave this error 113 from some hosts and continued to run for the others, but with no results even after 2 hours had lapsed... on a Rocks 6.0 cluster with 12 virtual nodes on PCs (2 on each, using virt-manager, 1 GB RAM each).
>> >>>>>
>> >>>>> On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>> >>>>> I also made a machine file which contains the IP addresses of all compute nodes, plus a .ncbirc file for the path to mpiblast and the shared and local storage paths.
>> >>>>> Sir,
>> >>>>> I ran the same mpirun command on my college supercomputer, 8 nodes each having 24 processors, but it just keeps running... it gave no result even after 3 hours.
>> >>>>>
>> >>>>> On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> >>>>> I'm having trouble understanding your note, so perhaps I am getting this wrong. Let's see if I can figure out what you said:
>> >>>>>
>> >>>>> * your perl command fails with "no route to host" - but I don't see any host in your cmd. Maybe I'm just missing something.
>> >>>>>
>> >>>>> * you tried running a couple of "mpirun", but the mpirun command wasn't recognized? Is that correct?
>> >>>>>
>> >>>>> * you then ran mpiblast and it sounds like it successfully started the processes, but then one aborted? Was there an error message beyond just the -1 return status?
>> >>>>>
>> >>>>> On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>> >>>>>
>> >>>>>> error btl_tcp_endpoint.c:638: connection failed due to error 113
>> >>>>>>
>> >>>>>> In Open MPI, this error came when I ran my mpiBlast program on the Rocks cluster. Connecting to the hosts with IPs 10.1.255.236 and 10.1.255.244 failed. And when I run the following command:
>> >>>>>>   linux_shell$ perl -e 'die$!=113'
>> >>>>>> this message comes: "No route to host at -e line 1."
>> >>>>>>   shell$ mpirun --mca btl ^tcp
>> >>>>>>   shell$ mpirun --mca btl_tcp_if_include eth1,eth2
>> >>>>>>   shell$ mpirun --mca btl_tcp_if_include 10.1.255.244
>> >>>>>> were also executed, but it did not recognize these commands... and aborted... What should I do?
>> >>>>>> When I ran my mpiBlast program for the first time it gave an MPI_ABORT error... bailing out of signal -1 on rank 2 processor... then I removed my public Ethernet cable... and then it gives the btl_tcp_endpoint error 113...
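
One more note from my side: I looked up error 113, and on Linux it appears to be EHOSTUNREACH ("No route to host"), which matches what the perl one-liner above prints. So it seems the TCP BTL is picking an address or interface that is not reachable between the VMs. My plan is to restrict the TCP BTL to the one private interface the VMs share and rerun the basic test first, something along these lines (eth0 is only a placeholder - I still have to check what the private interface is actually called inside the VMs):

    mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -np 4 -machinefile mf hostname

Please correct me if that syntax is wrong. I also realize now that the bare "mpirun --mca ..." lines I quoted above could never have worked, since I did not give mpirun any program to run.
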
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/