Open MPI 1.4.3 is *ancient*.  Please upgrade -- we just released Open MPI 1.8 
last week.

Also, please look at this FAQ entry -- it walks you through a lot of basic 
troubleshooting for getting simple MPI programs working.  

http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems

Once you get basic MPI programs working, then try with MPI Blast.
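
As a sanity check before involving mpiBLAST at all, it can help to run a
trivial MPI "hello world" across the same hosts in your machinefile.  Here's a
minimal sketch (the file name hello_mpi.c is just illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);

        /* Each rank reports where it is running; if this hangs or aborts,
           the problem is in the MPI/TCP setup, not in mpiBLAST. */
        printf("Hello from rank %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }

Compile it with "mpicc hello_mpi.c -o hello_mpi" and launch it with the same
options you use for mpiBLAST, e.g. "mpirun -np 16 -machinefile mf ./hello_mpi".
If that already fails with error 113, you can debug the network setup without
mpiBLAST in the picture.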



On Apr 5, 2014, at 3:11 AM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:

> mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 -np 16 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o out.txt
> 
> was the command I executed on the cluster...
>  
> 
> 
> On Sat, Apr 5, 2014 at 12:34 PM, Nisha Dhankher -M.Tech(CSE) 
> <nishadhankher-coaese...@pau.edu> wrote:
> Sorry Ralph, my mistake: it's not "names"... it should be "it does not happen 
> on the same nodes."
> 
> 
> On Sat, Apr 5, 2014 at 12:33 PM, Nisha Dhankher -M.Tech(CSE) 
> <nishadhankher-coaese...@pau.edu> wrote:
> The same VM is on all machines, that is, virt-manager.
> 
> 
> On Sat, Apr 5, 2014 at 12:32 PM, Nisha Dhankher -M.Tech(CSE) 
> <nishadhankher-coaese...@pau.edu> wrote:
> Open MPI version 1.4.3
> 
> 
> On Fri, Apr 4, 2014 at 8:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Okay, so if you run mpiBlast on all the non-name nodes, everything is okay? 
> What do you mean by "names nodes"?
> 
> 
> On Apr 4, 2014, at 7:32 AM, Nisha Dhankher -M.Tech(CSE) 
> <nishadhankher-coaese...@pau.edu> wrote:
> 
>> No, it does not happen on "names" nodes. 
>> 
>> 
>> On Fri, Apr 4, 2014 at 7:51 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hi Nisha
>> 
>> I'm sorry if my questions appear abrasive - I'm just a little frustrated at 
>> the communication bottleneck as I can't seem to get a clear picture of your 
>> situation. So you really don't need to keep calling me "sir" :-)
>> 
>> The error you are hitting is very unusual - it means that the processes are 
>> able to make a connection, but are failing to correctly complete a simple 
>> handshake exchange of their process identifications. There are only a few 
>> ways that can happen, and I'm trying to get you to test for them.
>> 
>> So let's try and see if we can narrow this down. You mention that it works 
>> on some machines, but not all. Is this consistent - i.e., is it always the 
>> same machines that work, and the same ones that generate the error? If you 
>> exclude the ones that show the error, does it work? If so, what is different 
>> about those nodes? Are they a different architecture?
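>> 
>> One way to narrow this down further (just a sketch, nothing specific to your
>> setup) is a tiny MPI test in which every rank exchanges its host name with
>> every other rank.  Open MPI opens its TCP connections lazily, so this forces
>> a connection between every pair of processes, and the pairings that fail tell
>> you which hosts (or which route between them) to look at:
>> 
>>     #include <mpi.h>
>>     #include <stdio.h>
>> 
>>     int main(int argc, char **argv)
>>     {
>>         int rank, size, len, peer;
>>         char name[MPI_MAX_PROCESSOR_NAME];
>>         char peer_name[MPI_MAX_PROCESSOR_NAME];
>> 
>>         MPI_Init(&argc, &argv);
>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>>         MPI_Get_processor_name(name, &len);
>> 
>>         /* Exchange host names with every other rank so that every pair of
>>            processes has to complete the handshake that is failing for you. */
>>         for (peer = 0; peer < size; peer++) {
>>             if (peer == rank) continue;
>>             MPI_Sendrecv(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, peer, 0,
>>                          peer_name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, peer, 0,
>>                          MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>             printf("rank %d (%s) <-> rank %d (%s): ok\n",
>>                    rank, name, peer, peer_name);
>>         }
>> 
>>         MPI_Finalize();
>>         return 0;
>>     }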
>> 
>> 
>> On Apr 3, 2014, at 11:09 PM, Nisha Dhankher -M.Tech(CSE) 
>> <nishadhankher-coaese...@pau.edu> wrote:
>> 
>>> Sir,
>>> The same virt-manager is being used by all the PCs. No, I didn't enable the 
>>> Open MPI heterogeneous support. Yes, the Open MPI version is the same on all 
>>> of them, via the same kickstart file.
>>> Actually, sir, Rocks itself installed and configured Open MPI and MPICH on 
>>> its own through the HPC roll.
>>> 
>>> 
>>> On Fri, Apr 4, 2014 at 9:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>> On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) 
>>> <nishadhankher-coaese...@pau.edu> wrote:
>>> 
>>>> Thank you, Ralph.
>>>> Yes, the cluster is heterogeneous...
>>> 
>>> And did you configure OMPI with --enable-heterogeneous? And are you running it 
>>> with --hetero-nodes? What version of OMPI are you using anyway?
>>> 
>>> Note that we don't care if the host pc's are hetero - what we care about is 
>>> the VM. If all the VMs are the same, then it shouldn't matter. However, 
>>> most VM technologies don't handle hetero hardware very well - i.e., you 
>>> can't emulate an x86 architecture on top of a Sparc or Power chip or vice 
>>> versa.
>>> 
>>> 
>>>> And I haven't made the compute nodes directly on the physical nodes (PCs), 
>>>> because in college it is not possible to take a whole lab of 32 PCs for your 
>>>> own work, so I ran on VMs.
>>> 
>>> Yes, but at least it would let you test the setup to run MPI across even a 
>>> couple of pc's - this is simple debugging practice.
>>> 
>>>> In a Rocks cluster, the frontend gives the same kickstart to all the PCs, so 
>>>> the Open MPI version should be the same, I guess.
>>> 
>>> Guess? or know? Makes a difference - might be worth testing.
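>>> 
>>> If it helps, one quick way to check (purely a sketch, assuming the
>>> OMPI_*_VERSION macros from Open MPI's mpi.h) is to have each rank print the
>>> host it is on and the version it was built against, and to compare that with
>>> the output of ompi_info on each node:
>>> 
>>>     #include <mpi.h>
>>>     #include <stdio.h>
>>> 
>>>     int main(int argc, char **argv)
>>>     {
>>>         int rank, len;
>>>         char name[MPI_MAX_PROCESSOR_NAME];
>>> 
>>>         MPI_Init(&argc, &argv);
>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>         MPI_Get_processor_name(name, &len);
>>> 
>>>         /* These macros reflect the mpi.h used at compile time, so this only
>>>            catches header mismatches; ompi_info on each node checks the
>>>            installed runtime as well. */
>>>         printf("rank %d on %s: built against Open MPI %d.%d.%d\n",
>>>                rank, name, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION,
>>>                OMPI_RELEASE_VERSION);
>>> 
>>>         MPI_Finalize();
>>>         return 0;
>>>     }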
>>> 
>>>> Sir,
>>>> mpiformatdb is a command to distribute database fragments to different 
>>>> compute nodes after partitioning the database.
>>>> And sir, have you used mpiBLAST?
>>> 
>>> Nope - but that isn't the issue, is it? The issue is with the MPI setup.
>>> 
>>>> 
>>>> 
>>>> On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> What is "mpiformatdb"? We don't have an MPI database in our system, and I 
>>>> have no idea what that command means.
>>>> 
>>>> As for that error - it means that the identifier we exchange between 
>>>> processes is failing to be recognized. This could mean a couple of things:
>>>> 
>>>> 1. the OMPI version on the two ends is different - could be you aren't 
>>>> getting the right paths set on the various machines
>>>> 
>>>> 2. the cluster is heterogeneous
>>>> 
>>>> You say you have "virtual nodes" running on various PC's? That would be an 
>>>> unusual setup - VM's can be problematic given the way they handle TCP 
>>>> connections, so that might be another source of the problem if my 
>>>> understanding of your setup is correct. Have you tried running this across 
>>>> the PCs directly - i.e., without any VMs?
>>>> 
>>>> 
>>>> On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) 
>>>> <nishadhankher-coaese...@pau.edu> wrote:
>>>> 
>>>>> I first formatted my database with the mpiformatdb command, then I ran the 
>>>>> command:
>>>>> 
>>>>>   mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>>>>> 
>>>>> It then gave this error 113 from some hosts and continued to run on the 
>>>>> others, but with no results even after 2 hours had elapsed, on a Rocks 6.0 
>>>>> cluster with 12 virtual nodes on the PCs (2 on each, using virt-manager, 
>>>>> 1 GB RAM each).
>>>>> 
>>>>> 
>>>>> On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) 
>>>>> <nishadhankher-coaese...@pau.edu> wrote:
>>>>> I also made a machine file, which contains the IP addresses of all the 
>>>>> compute nodes, plus a .ncbirc file with the path to mpiBLAST and the shared 
>>>>> and local storage paths.
>>>>> Sir,
>>>>> I ran the same mpirun command on my college supercomputer (8 nodes, each 
>>>>> with 24 processors), but it just kept running and gave no result even after 
>>>>> 3 hours.
>>>>> 
>>>>> 
>>>>> On Thu, Apr 3, 2014 at 10:39 PM, Nisha Dhankher -M.Tech(CSE) 
>>>>> <nishadhankher-coaese...@pau.edu> wrote:
>>>>> I first formatted my database with the mpiformatdb command, then I ran the 
>>>>> command:
>>>>> 
>>>>>   mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>>>>> 
>>>>> It then gave this error 113 from some hosts and continued to run on the 
>>>>> others, but with no results even after 2 hours had elapsed, on a Rocks 6.0 
>>>>> cluster with 12 virtual nodes on the PCs (2 on each, using virt-manager, 
>>>>> 1 GB RAM each).
>>>>>  
>>>>> 
>>>>> 
>>>>> On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> I'm having trouble understanding your note, so perhaps I am getting this 
>>>>> wrong. Let's see if I can figure out what you said:
>>>>> 
>>>>> * your perl command fails with "no route to host" - but I don't see any 
>>>>> host in your cmd. Maybe I'm just missing something.
>>>>> 
>>>>> * you tried running a couple of "mpirun", but the mpirun command wasn't 
>>>>> recognized? Is that correct?
>>>>> 
>>>>> * you then ran mpiblast and it sounds like it successfully started the 
>>>>> processes, but then one aborted? Was there an error message beyond just 
>>>>> the -1 return status?
>>>>> 
>>>>> 
>>>>> On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) 
>>>>> <nishadhankher-coaese...@pau.edu> wrote:
>>>>> 
>>>>>> Error: btl_tcp_endpoint.c:638 connection failed due to error 113
>>>>>> 
>>>>>> In Open MPI, this error came up when I ran my mpiBLAST program on the Rocks 
>>>>>> cluster. Connecting to the hosts at IPs 10.1.255.236 and 10.1.255.244 failed. 
>>>>>> When I run the following command:
>>>>>> 
>>>>>>   shell$ perl -e 'die$!=113'
>>>>>> 
>>>>>> this message comes back: "No route to host at -e line 1."
>>>>>> 
>>>>>>   shell$ mpirun --mca btl ^tcp
>>>>>>   shell$ mpirun --mca btl_tcp_if_include eth1,eth2
>>>>>>   shell$ mpirun --mca btl_tcp_if_include 10.1.255.244
>>>>>> 
>>>>>> were also executed, but it did not recognize these commands and aborted. 
>>>>>> What should I do? When I ran my mpiBLAST program for the first time, it gave 
>>>>>> an MPI_Abort error ("bailing out of signal -1 on rank 2"); then I removed my 
>>>>>> public Ethernet cable, and after that it gave the btl_tcp_endpoint error 113.
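>>>>>> 
>>>>>> (For reference, the same errno lookup as the perl one-liner can be written 
>>>>>> in C; this is only an illustration, and 113 mapping to EHOSTUNREACH, i.e. 
>>>>>> "No route to host", is Linux-specific:)
>>>>>> 
>>>>>>     #include <stdio.h>
>>>>>>     #include <string.h>
>>>>>> 
>>>>>>     int main(void)
>>>>>>     {
>>>>>>         /* On Linux, errno 113 is EHOSTUNREACH, i.e. "No route to host",
>>>>>>            which is what the TCP BTL is reporting. */
>>>>>>         printf("errno 113 means: %s\n", strerror(113));
>>>>>>         return 0;
>>>>>>     }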
>>>>>> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
