The command I executed on the cluster was:

    mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 -np 16 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o out.txt
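
Before rerunning mpiBlast, a quick sanity check of the same transport settings with a trivial program can show whether the TCP connection setup itself works across all 16 slots (a minimal sketch, assuming the same machinefile mf and that Open MPI is on each VM's default PATH):

    # Launch a trivial non-MPI program with the same BTL settings;
    # if this hangs or reports error 113, the problem is the network
    # setup between the VMs, not mpiBlast itself.
    mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 \
           -np 16 -machinefile mf hostname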

On Sat, Apr 5, 2014 at 12:34 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:

> Sorry Ralph, my mistake, it's not "names"... it is "it does not happen on same nodes."
>
> On Sat, Apr 5, 2014 at 12:33 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>
>> Same VM on all machines, that is, virt-manager.
>>
>> On Sat, Apr 5, 2014 at 12:32 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>
>>> Open MPI version 1.4.3.
>>>
>>> On Fri, Apr 4, 2014 at 8:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Okay, so if you run mpiBlast on all the non-name nodes, everything is okay? What do you mean by "names nodes"?
>>>>
>>>> On Apr 4, 2014, at 7:32 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>
>>>> No, it does not happen on names nodes.
>>>>
>>>> On Fri, Apr 4, 2014 at 7:51 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> Hi Nisha
>>>>>
>>>>> I'm sorry if my questions appear abrasive - I'm just a little frustrated at the communication bottleneck, as I can't seem to get a clear picture of your situation. So you really don't need to keep calling me "sir" :-)
>>>>>
>>>>> The error you are hitting is very unusual - it means that the processes are able to make a connection, but are failing to correctly complete a simple handshake exchange of their process identifications. There are only a few ways that can happen, and I'm trying to get you to test for them.
>>>>>
>>>>> So let's try and see if we can narrow this down. You mention that it works on some machines, but not all. Is this consistent - i.e., is it always the same machines that work, and the same ones that generate the error? If you exclude the ones that show the error, does it work? If so, what is different about those nodes? Are they a different architecture?
>>>>>
>>>>> On Apr 3, 2014, at 11:09 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>>
>>>>> Sir, the same virt-manager is being used by all the PCs. No, I didn't enable openmpi-hetero. Yes, the Open MPI version is the same on all of them, through the same kickstart file.
>>>>> OK... actually, sir, Rocks itself installed and configured Open MPI and MPICH on its own through the HPC roll.
>>>>>
>>>>> On Fri, Apr 4, 2014 at 9:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>> On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>>>
>>>>>> Thank you, Ralph.
>>>>>> Yes, the cluster is heterogeneous...
>>>>>>
>>>>>> And did you configure OMPI --enable-heterogeneous? And are you running it with --hetero-nodes? What version of OMPI are you using anyway?
>>>>>>
>>>>>> Note that we don't care if the host PCs are hetero - what we care about is the VM. If all the VMs are the same, then it shouldn't matter. However, most VM technologies don't handle hetero hardware very well - i.e., you can't emulate an x86 architecture on top of a Sparc or Power chip or vice versa.
>>>>>>
>>>>>> And I haven't made compute nodes directly on the physical nodes (PCs) because in college it is not possible to take the whole lab of 32 PCs for your work, so I ran on VMs.
>>>>>>
>>>>>> Yes, but at least it would let you test the setup to run MPI across even a couple of PCs - this is simple debugging practice.
>>>>>>
>>>>>> In a Rocks cluster the frontend gives the same kickstart to all the PCs, so the Open MPI version should be the same, I guess.
>>>>>>
>>>>>> Guess? or know? Makes a difference - might be worth testing.
>>>>>>
>>>>>> Sir, mpiformatdb is a command that distributes database fragments to the different compute nodes after partitioning the database. And sir, have you used mpiBlast?
>>>>>>
>>>>>> Nope - but that isn't the issue, is it? The issue is with the MPI setup.
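
One way to answer the version question directly, rather than guessing from the kickstart, is to ask every node (a sketch, assuming ompi_info is on each VM's default PATH and that this build's ompi_info output includes a "Heterogeneous support" line):

    # Local build: version and whether hetero support was compiled in.
    ompi_info | grep -E "Open MPI:|Heterogeneous support"

    # Run the same check on every slot in the machinefile; every line
    # of output should report the identical version string.
    mpirun -machinefile mf -np 16 ompi_info | grep "Open MPI:"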
>>>>>> On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>
>>>>>>> What is "mpiformatdb"? We don't have an MPI database in our system, and I have no idea what that command means.
>>>>>>>
>>>>>>> As for that error - it means that the identifier we exchange between processes is failing to be recognized. This could mean a couple of things:
>>>>>>>
>>>>>>> 1. the OMPI version on the two ends is different - could be you aren't getting the right paths set on the various machines
>>>>>>>
>>>>>>> 2. the cluster is heterogeneous
>>>>>>>
>>>>>>> You say you have "virtual nodes" running on various PCs? That would be an unusual setup - VMs can be problematic given the way they handle TCP connections, so that might be another source of the problem if my understanding of your setup is correct. Have you tried running this across the PCs directly - i.e., without any VMs?
>>>>>>>
>>>>>>> On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>>>>
>>>>>>> I first formatted my database with the mpiformatdb command, then I ran:
>>>>>>>
>>>>>>>     mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>>>>>>>
>>>>>>> It then gave this error 113 from some hosts and continued to run on the others, but with no results even after 2 hours had elapsed... on a Rocks 6.0 cluster with 12 virtual nodes on the PCs (2 on each, using virt-manager, 1 GB RAM each).
>>>>>>>
>>>>>>> On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>>>>
>>>>>>>> I also made a machine file which contains the IP addresses of all compute nodes, plus a .ncbirc file with the path to mpiBlast and the shared and local storage paths...
>>>>>>>> Sir,
>>>>>>>> I ran the same mpirun command on my college supercomputer (8 nodes, each with 24 processors), but it just kept running... it gave no result even after 3 hours...
>>>>>>>>
>>>>>>>> On Thu, Apr 3, 2014 at 10:39 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>>>>>
>>>>>>>>> I first formatted my database with the mpiformatdb command, then I ran:
>>>>>>>>>
>>>>>>>>>     mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>>>>>>>>>
>>>>>>>>> It then gave this error 113 from some hosts and continued to run on the others, but with no results even after 2 hours had elapsed... on a Rocks 6.0 cluster with 12 virtual nodes on the PCs (2 on each, using virt-manager, 1 GB RAM each).
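
For reference, the database-formatting step ahead of a 64-process run would look roughly like the sketch below. The flag names are assumptions recalled from the mpiBlast documentation (-N for the number of fragments, -i for the input FASTA, -p F for a nucleotide database), so verify them against mpiformatdb --help on your install; the fragment count also assumes the usual mpiBlast layout of one scheduler rank and one writer rank, leaving 62 workers out of -np 64.

    # Split the nucleotide database into one fragment per worker rank
    # before launching mpiblast with mpirun -np 64.
    mpiformatdb -N 62 -i all.fas -p F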
>>>>>>>>> On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>>
>>>>>>>>>> I'm having trouble understanding your note, so perhaps I am getting this wrong. Let's see if I can figure out what you said:
>>>>>>>>>>
>>>>>>>>>> * your perl command fails with "no route to host" - but I don't see any host in your cmd. Maybe I'm just missing something.
>>>>>>>>>>
>>>>>>>>>> * you tried running a couple of "mpirun", but the mpirun command wasn't recognized? Is that correct?
>>>>>>>>>>
>>>>>>>>>> * you then ran mpiblast and it sounds like it successfully started the processes, but then one aborted? Was there an error message beyond just the -1 return status?
>>>>>>>>>>
>>>>>>>>>> On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) <nishadhankher-coaese...@pau.edu> wrote:
>>>>>>>>>>
>>>>>>>>>> error btl_tcp_endpoint.c:638 connection failed due to error 113
>>>>>>>>>> <http://biosupport.se/questions/696/error-btl_tcp_endpintc-638-connection-failed-due-to-error-113>
>>>>>>>>>>
>>>>>>>>>> In Open MPI: this error came when I ran my mpiBlast program on the Rocks cluster. Connecting to the hosts at 10.1.255.236 and 10.1.255.244 failed. And when I run the following command:
>>>>>>>>>>
>>>>>>>>>>     shell$ perl -e 'die$!=113'
>>>>>>>>>>
>>>>>>>>>> this message comes back: "No route to host at -e line 1."
>>>>>>>>>>
>>>>>>>>>>     shell$ mpirun --mca btl ^tcp
>>>>>>>>>>     shell$ mpirun --mca btl_tcp_if_include eth1,eth2
>>>>>>>>>>     shell$ mpirun --mca btl_tcp_if_include 10.1.255.244
>>>>>>>>>>
>>>>>>>>>> were also executed, but it did not recognize these commands... and aborted... What should I do? When I ran my mpiBlast program for the first time, it gave an MPI_ABORT error... "bailing out of signal -1 on rank 2 processor"... Then I removed my public ethernet cable... and then it gave the btl_tcp_endpoint error 113...
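
Since errno 113 is literally EHOSTUNREACH ("No route to host"), the first thing worth checking from a node that reports it is plain IP reachability and which interface carries the 10.1.255.x addresses (a sketch only; interface names and any firewall rules will vary between the VMs):

    # From a node that logged error 113, check basic reachability
    # to the hosts that failed to connect.
    ping -c 3 10.1.255.236
    ping -c 3 10.1.255.244

    # Which interface/route the kernel would use for that address.
    ip route get 10.1.255.236

    # Run as root: a REJECT or DROP rule in the firewall also
    # shows up to the sender as "no route to host".
    /sbin/iptables -L -n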