The latest Rocks 6.2 carries only this version (Open MPI 1.4.3).
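(A quick way to confirm which Open MPI build each node is actually picking up, and a sketch of what installing a newer release alongside the Rocks-provided 1.4.3 could look like -- the prefix below is only illustrative, not a Rocks convention:)

    shell$ which mpirun
    shell$ mpirun --version                     # the Rocks-provided build should report 1.4.3
    shell$ ompi_info | grep "Open MPI:"         # same information via ompi_info

    shell$ # build a newer release under a separate, non-system prefix
    shell$ ./configure --prefix=$HOME/openmpi-1.8
    shell$ make -j4 all install
    shell$ # then prepend $HOME/openmpi-1.8/bin to PATH and $HOME/openmpi-1.8/lib
    shell$ # to LD_LIBRARY_PATH on every node before running mpirun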
On Tue, Apr 8, 2014 at 3:49 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Open MPI 1.4.3 is *ancient*. Please upgrade -- we just released Open MPI 1.8 last week.
>
> Also, please look at this FAQ entry -- it steps you through a lot of basic troubleshooting steps for getting basic MPI programs working:
>
> http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
>
> Once you get basic MPI programs working, then try MPI Blast.
>
>
> On Apr 5, 2014, at 3:11 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>
> > mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 -np 16 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o out.txt
> >
> > was the command I executed on the cluster.
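Following that FAQ, a minimal multi-host sanity check looks roughly like this (node1 and node2 are placeholders -- substitute two entries from the machinefile mf; hello_c is the hello-world example shipped in the Open MPI examples/ directory, built there with make):

    shell$ mpirun -np 2 -host node1,node2 hostname     # non-MPI program: exercises only the launcher, ssh, and paths
    shell$ mpirun -np 2 -host node1,node2 ./hello_c    # simple MPI program: exercises MPI communication itself

Only once both of these succeed across all of the nodes is it worth retrying mpiblast.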
> >
> > On Sat, Apr 5, 2014 at 12:34 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> > Sorry Ralph, my mistake -- it's not "names"; it should have read "it does not happen on the same nodes."
> >
> > On Sat, Apr 5, 2014 at 12:33 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> > The same VM is used on all machines, i.e. virt-manager.
> >
> > On Sat, Apr 5, 2014 at 12:32 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> > Open MPI version 1.4.3.
> >
> > On Fri, Apr 4, 2014 at 8:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > Okay, so if you run mpiBlast on all the non-"name" nodes, everything is okay? What do you mean by "names nodes"?
> >
> > On Apr 4, 2014, at 7:32 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >
> >> No, it does not happen on "names" nodes.
> >>
> >> On Fri, Apr 4, 2014 at 7:51 PM, Ralph Castain <r...@open-mpi.org> wrote:
> >> Hi Nisha
> >>
> >> I'm sorry if my questions appear abrasive - I'm just a little frustrated at the communication bottleneck, as I can't seem to get a clear picture of your situation. So you really don't need to keep calling me "sir" :-)
> >>
> >> The error you are hitting is very unusual - it means that the processes are able to make a connection, but are failing to correctly complete a simple handshake exchange of their process identifications. There are only a few ways that can happen, and I'm trying to get you to test for them.
> >>
> >> So let's try and see if we can narrow this down. You mention that it works on some machines, but not all. Is this consistent - i.e., is it always the same machines that work, and the same ones that generate the error? If you exclude the ones that show the error, does it work? If so, what is different about those nodes? Are they a different architecture?
> >>
> >> On Apr 3, 2014, at 11:09 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>
> >>> Sir,
> >>> The same virt-manager is being used by all PCs. No, I didn't enable openmpi-hetero. Yes, the Open MPI version is the same on all nodes, through the same kickstart file.
> >>> Actually, sir, Rocks itself installed and configured Open MPI and MPICH on its own through the HPC roll.
> >>>
> >>> On Fri, Apr 4, 2014 at 9:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >>>
> >>> On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>
> >>>> Thank you, Ralph.
> >>>> Yes, the cluster is heterogeneous...
> >>>
> >>> And did you configure OMPI --enable-heterogeneous? And are you running it with --hetero-nodes? What version of OMPI are you using anyway?
> >>>
> >>> Note that we don't care if the host PCs are hetero - what we care about is the VM. If all the VMs are the same, then it shouldn't matter. However, most VM technologies don't handle hetero hardware very well - i.e., you can't emulate an x86 architecture on top of a Sparc or Power chip, or vice versa.
> >>>
> >>>> And I haven't made compute nodes on the physical nodes (PCs) directly, because in college it is not possible to take the whole lab of 32 PCs for your work, so I ran on VMs.
> >>>
> >>> Yes, but at least it would let you test the setup by running MPI across even a couple of PCs - this is simple debugging practice.
> >>>
> >>>> In a Rocks cluster, the frontend gives the same kickstart to all the PCs, so the Open MPI version should be the same, I guess.
> >>>
> >>> Guess? Or know? Makes a difference - might be worth testing.
> >>>
> >>>> Sir,
> >>>> mpiformatdb is a command that distributes database fragments to the different compute nodes after partitioning the database.
> >>>> And sir, have you used mpiBLAST?
> >>>
> >>> Nope - but that isn't the issue, is it? The issue is with the MPI setup.
> >>>
> >>>> On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >>>> What is "mpiformatdb"? We don't have an MPI database in our system, and I have no idea what that command means.
> >>>>
> >>>> As for that error - it means that the identifier we exchange between processes is failing to be recognized. This could mean a couple of things:
> >>>>
> >>>> 1. the OMPI version on the two ends is different - could be you aren't getting the right paths set on the various machines
> >>>>
> >>>> 2. the cluster is heterogeneous
> >>>>
> >>>> You say you have "virtual nodes" running on various PCs? That would be an unusual setup - VMs can be problematic given the way they handle TCP connections, so that might be another source of the problem if my understanding of your setup is correct. Have you tried running this across the PCs directly - i.e., without any VMs?
> >>>>
> >>>> On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>>
> >>>>> I first formatted my database with the mpiformatdb command, then I ran:
> >>>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
> >>>>> but then it gave this error 113 from some hosts and continued to run on the others, though with no results even after 2 hours had elapsed... on a Rocks 6.0 cluster with 12 virtual nodes on the PCs (2 per PC, using virt-manager, 1 GB RAM each).
> >>>>>
> >>>>> On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>>> I also made a machine file containing the IP addresses of all compute nodes, plus a .ncbirc file with the path to mpiblast and the shared and local storage paths.
> >>>>> Sir,
> >>>>> I ran the same mpirun command on my college supercomputer (8 nodes, each with 24 processors), but it just kept running and gave no result even after 3 hours.
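One way to turn Ralph's "guess or know" question above into an actual test -- a rough sketch, assuming passwordless ssh to every host in the machinefile mf and one hostname or IP per line:

    shell$ for h in $(cat mf); do echo "== $h"; ssh $h 'which mpirun; ompi_info | grep "Open MPI:"'; done
    shell$ ompi_info | grep -i heterogeneous    # the build summary reports whether heterogeneous support was compiled in

If any node reports a different path or a different version, that alone can produce the failed identification handshake described above.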
> >>>>>
> >>>>> On Thu, Apr 3, 2014 at 10:39 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>>> I first formatted my database with the mpiformatdb command, then I ran:
> >>>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
> >>>>> but then it gave this error 113 from some hosts and continued to run on the others, though with no results even after 2 hours had elapsed... on a Rocks 6.0 cluster with 12 virtual nodes on the PCs (2 per PC, using virt-manager, 1 GB RAM each).
> >>>>>
> >>>>> On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
> >>>>> I'm having trouble understanding your note, so perhaps I am getting this wrong. Let's see if I can figure out what you said:
> >>>>>
> >>>>> * your perl command fails with "no route to host" - but I don't see any host in your cmd. Maybe I'm just missing something.
> >>>>>
> >>>>> * you tried running a couple of "mpirun" commands, but the mpirun command wasn't recognized? Is that correct?
> >>>>>
> >>>>> * you then ran mpiblast and it sounds like it successfully started the processes, but then one aborted? Was there an error message beyond just the -1 return status?
> >>>>>
> >>>>> On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
> >>>>>
> >>>>>> error: btl_tcp_endpoint.c:638 connection failed due to error 113
> >>>>>>
> >>>>>> In Open MPI, this error came up when I ran my mpiblast program on the Rocks cluster. Connecting to the hosts at 10.1.255.236 and 10.1.255.244 failed. And when I run the following command:
> >>>>>>     linux_shell$ perl -e 'die$!=113'
> >>>>>> this message comes back: "No route to host at -e line 1."
> >>>>>>     shell$ mpirun --mca btl ^tcp
> >>>>>>     shell$ mpirun --mca btl_tcp_if_include eth1,eth2
> >>>>>>     shell$ mpirun --mca btl_tcp_if_include 10.1.255.244
> >>>>>> were also executed, but it did not recognize these commands and aborted. What should I do? When I ran my mpiblast program for the first time, it gave an mpi_abort error ("bailing out of signal -1 on rank 2 processor"); then I removed my public ethernet cable, and after that it gave the btl_tcp_endpoint error 113.
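Error 113 is EHOSTUNREACH ("No route to host"), which is why the perl one-liner prints that message; it points at plain TCP reachability between the nodes rather than at MPI itself. A rough checklist, as a sketch only (the IPs are the ones quoted above; eth0 is just an assumption about which interface carries the private 10.1.x.x cluster network):

    shell$ ping -c 2 10.1.255.236
    shell$ ping -c 2 10.1.255.244
    shell$ /sbin/ifconfig       # confirm which interface (eth0, eth1, ...) actually holds the 10.1.x.x address

    shell$ # MCA options must be followed by a program to launch; on their own they give mpirun nothing to run:
    shell$ mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 -np 2 -machinefile mf hostname

Also note that btl_tcp_if_include expects interface names such as eth0 (newer releases also accept subnets in CIDR notation, as far as I know); passing a bare IP address like 10.1.255.244 is unlikely to do what was intended.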
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/