Thank you, Ralph. Yes, the cluster is heterogeneous. And I haven't set up the compute nodes directly on the physical PCs because in college it isn't possible to take over the whole lab of 32 PCs for my work, so I ran them as VMs. On a Rocks cluster the frontend gives the same kickstart to all the PCs, so the Open MPI version should be the same everywhere, I think. Sir, mpiformatdb is the mpiBLAST command that partitions the database into fragments so they can be distributed to the different compute nodes. And sir, have you used mpiBLAST yourself?
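For reference, a typical mpiBLAST fragmentation step looks roughly like the following. This is only a sketch: the fragment count of 12 mirrors the 12 virtual nodes mentioned in the quoted messages below, all.fas is the database name used in this thread, and the option spellings (-N for the number of fragments, -i for the FASTA input, -p F for a nucleotide database, as inherited from NCBI formatdb) are assumptions that should be confirmed against the mpiformatdb documentation for the installed mpiBLAST version:

shell$ mpiformatdb -N 12 -i all.fas -p F    # assumed flags: split all.fas into 12 nucleotide fragments

The fragments then get served to the compute nodes from the shared storage path configured in .ncbirc, which is the file mentioned further down in this thread.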
On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <r...@open-mpi.org> wrote:

> What is "mpiformatdb"? We don't have an MPI database in our system, and I
> have no idea what that command means.
>
> As for that error - it means that the identifier we exchange between
> processes is failing to be recognized. This could mean a couple of things:
>
> 1. the OMPI version on the two ends is different - could be you aren't
> getting the right paths set on the various machines
>
> 2. the cluster is heterogeneous
>
> You say you have "virtual nodes" running on various PCs? That would be an
> unusual setup - VMs can be problematic given the way they handle TCP
> connections, so that might be another source of the problem if my
> understanding of your setup is correct. Have you tried running this across
> the PCs directly - i.e., without any VMs?
>
> On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) <
> nishadhankher-coaese...@pau.edu> wrote:
>
> I first formatted my database with the mpiformatdb command, then I ran:
> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
> It then gave error 113 from some hosts and continued running on the others,
> but produced no results even after 2 hours, on a Rocks 6.0 cluster with 12
> virtual nodes on the PCs (2 per PC using virt-manager, 1 GB RAM each).
>
> On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) <
> nishadhankher-coaese...@pau.edu> wrote:
>
>> I also made a machine file containing the IP addresses of all the compute
>> nodes, plus a .ncbirc file with the path to mpiblast and the shared and
>> local storage paths.
>> Sir, I ran the same mpirun command on my college supercomputer (8 nodes,
>> 24 processors each) but it just kept running and gave no result even
>> after 3 hours.
>>
>> On Thu, Apr 3, 2014 at 10:39 PM, Nisha Dhankher -M.Tech(CSE) <
>> nishadhankher-coaese...@pau.edu> wrote:
>>
>>> I first formatted my database with the mpiformatdb command, then I ran:
>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>>> It then gave error 113 from some hosts and continued running on the
>>> others, but produced no results even after 2 hours, on a Rocks 6.0
>>> cluster with 12 virtual nodes on the PCs (2 per PC using virt-manager,
>>> 1 GB RAM each).
>>>
>>> On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> I'm having trouble understanding your note, so perhaps I am getting
>>>> this wrong. Let's see if I can figure out what you said:
>>>>
>>>> * your perl command fails with "no route to host" - but I don't see any
>>>> host in your cmd. Maybe I'm just missing something.
>>>>
>>>> * you tried running a couple of "mpirun", but the mpirun command wasn't
>>>> recognized? Is that correct?
>>>>
>>>> * you then ran mpiblast and it sounds like it successfully started the
>>>> processes, but then one aborted? Was there an error message beyond just
>>>> the -1 return status?
>>>>
>>>> On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) <
>>>> nishadhankher-coaese...@pau.edu> wrote:
>>>>
>>>> error btl_tcp_endpoint.c:638 connection failed due to error 113
>>>> <http://biosupport.se/questions/696/error-btl_tcp_endpintc-638-connection-failed-due-to-error-113>
>>>>
>>>> In Open MPI, this error appeared when I ran my mpiblast program on the
>>>> Rocks cluster. Connecting to the hosts at 10.1.255.236 and 10.1.255.244
>>>> failed.
>>>> When I run the command
>>>> linux_shell$ perl -e 'die$!=113'
>>>> this message comes back: "No route to host at -e line 1."
>>>> I also tried
>>>> shell$ mpirun --mca btl ^tcp
>>>> shell$ mpirun --mca btl_tcp_if_include eth1,eth2
>>>> shell$ mpirun --mca btl_tcp_if_include 10.1.255.244
>>>> but these were not recognized and aborted. What should I do?
>>>> The first time I ran my mpiblast program it gave an MPI_ABORT error
>>>> ("bailing out of signal -1 on rank 2 processor"); then I removed my
>>>> public Ethernet cable, and after that it gave the btl_tcp_endpoint
>>>> error 113.
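Regarding Ralph's first point above (different Open MPI versions or paths on the two ends), one quick check is to ask every host in the machine file which mpirun it resolves and which Open MPI version it reports. A minimal sketch, assuming mf lists one host name or IP address per line with no extra slot annotations:

shell$ for h in $(cat mf); do echo "== $h"; ssh $h 'which mpirun; ompi_info | grep "Open MPI:"'; done

If any node prints a different path or version than the frontend, that would match explanation 1 for the rejected process identifier.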
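Regarding the --mca attempts quoted above: mpirun aborted mainly because no application was given after the MCA options; they need to be attached to the actual mpiblast command line. Also, btl_tcp_if_include takes interface names or CIDR subnets rather than a single host IP, and ^tcp disables TCP entirely, which on an Ethernet-only cluster leaves the nodes with no way to reach each other. A hedged sketch, assuming the Rocks private network sits on eth0 and that virt-manager's libvirt bridge is the usual virbr0 (both interface names are assumptions to check with ifconfig on the nodes):

shell$ mpirun --mca btl_tcp_if_include eth0 -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt

or alternatively, excluding the interfaces that should not carry MPI traffic (lo must be listed explicitly because setting btl_tcp_if_exclude replaces the built-in default):

shell$ mpirun --mca btl_tcp_if_exclude lo,virbr0 -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt

Error 113 is EHOSTUNREACH ("No route to host"), which is exactly what the perl one-liner above confirms, so pinning the TCP BTL to an interface or subnet that all the VMs can actually route to is the usual first step.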