Hello,
Thanks, I figured out the exact problem in my case.
I am now using the following execution line;
it directs the MPI communication ports to start from 10000:

mpiexec -n 2 --host karp,wirth --mca btl ^openib --mca btl_tcp_if_include
br0 --mca btl_tcp_port_min_v4 10000 ./a.out

and everything works again.
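For the archives: the same approach can bound the whole port range, not just its minimum. This is a sketch, assuming the Open MPI 1.x TCP BTL parameter names; I believe `btl_tcp_port_range_v4` (the number of ports above the minimum) pairs with `btl_tcp_port_min_v4`, but verify the names against your build with `ompi_info` before relying on them.

```shell
# Sketch: restrict Open MPI TCP BTL traffic to ports 10000-10099.
# Parameter names assume Open MPI 1.x; check what your build supports:
#   ompi_info --all | grep btl_tcp_port
mpiexec -n 2 --host karp,wirth \
    --mca btl ^openib \
    --mca btl_tcp_if_include br0 \
    --mca btl_tcp_port_min_v4 10000 \
    --mca btl_tcp_port_range_v4 100 \
    ./a.out
```

Pinning both ends of the range makes it easier to open a matching firewall rule on every node.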

Thanks.

Best regards.




On Tue, Mar 25, 2014 at 10:23 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:

> Hello,
> I am not sure what approach the MPI communication follows, but when I
> use
> --mca btl_base_verbose 30
>
> I observe the port mentioned below.
>
> [karp:23756] btl: tcp: attempting to connect() to address 134.106.3.252 on
> port 4
> [karp][[4612,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
> connect() to 134.106.3.252 failed: Connection refused (111)
>
>
> The information at
> http://www.open-mpi.org/community/lists/users/2011/11/17732.php
> is not enough; could you kindly explain:
>
> How can I restrict MPI communication to use ports starting from 1025,
> or use a specific port, somewhat like
> 59822?
>
> Regards.
>
>
>
> On Tue, Mar 25, 2014 at 9:15 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>
>> Hi,
>>
>> On 25.03.2014 at 08:34, Hamid Saeed wrote:
>>
>> > Is it possible to change the port number for the MPI communication?
>> >
>> > I can see that my program uses port 4 for the MPI communication.
>> >
>> > [karp:23756] btl: tcp: attempting to connect() to address 134.106.3.252
>> on port 4
>> >
>> [karp][[4612,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 134.106.3.252 failed: Connection refused (111)
>> >
>> > In my case the ports from 1 to 1024 are reserved.
>> > MPI tries to use one of the reserved ports and reports the connection
>> refused error.
>> >
>> > I would be very glad for any kind suggestions.
>>
>> There are certain parameters to set the range of used ports, but using
>> any port up to 1024 should not be the default:
>>
>> http://www.open-mpi.org/community/lists/users/2011/11/17732.php
>>
>> Were any of these set by accident beforehand in your environment?
>>
>> -- Reuti
>>
>>
>> > Regards.
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Mar 24, 2014 at 5:32 PM, Hamid Saeed <e.hamidsa...@gmail.com>
>> wrote:
>> > Hello Jeff,
>> >
>> > Thanks for your cooperation.
>> >
>> > --mca btl_tcp_if_include br0
>> >
>> > worked out of the box.
>> >
>> > The problem was on the network administrator's side. The machines on
>> the network were halting the MPI traffic...
>> >
>> > so cleaning up and killing everything worked.
>> >
>> > :)
>> >
>> > regards.
>> >
>> >
>> > On Mon, Mar 24, 2014 at 4:34 PM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>> > There is no "self" IP interface in the Linux kernel.
>> >
>> > Try using btl_tcp_if_include and list just the interface(s) that you
>> want to use.  From your prior email, I'm *guessing* it's just br2 (i.e.,
>> the 10.x address inside your cluster).
>> >
>> > Also, it looks like you didn't set up your SSH keys properly for logging
>> in to remote nodes automatically.
>> >
>> >
>> >
>> > On Mar 24, 2014, at 10:56 AM, Hamid Saeed <e.hamidsa...@gmail.com>
>> wrote:
>> >
>> > > Hello,
>> > >
>> > > I added the "self", e.g.:
>> > >
>> > > hsaeed@karp:~/Task4_mpi/scatterv$ mpirun -np 8 --mca btl ^openib
>> --mca btl_tcp_if_exclude sm,self,lo,br0,br1,ib0,br2 --host karp,wirth
>> ./scatterv
>> > >
>> > > Enter passphrase for key '/home/hsaeed/.ssh/id_rsa':
>> > >
>> --------------------------------------------------------------------------
>> > >
>> > > ERROR::
>> > >
>> > > At least one pair of MPI processes are unable to reach each other for
>> > > MPI communications.  This means that no Open MPI device has indicated
>> > > that it can be used to communicate between these processes.  This is
>> > > an error; Open MPI requires that all MPI processes be able to reach
>> > > each other.  This error can sometimes be the result of forgetting to
>> > > specify the "self" BTL.
>> > >
>> > >   Process 1 ([[15751,1],7]) is on host: wirth
>> > >   Process 2 ([[15751,1],0]) is on host: karp
>> > >   BTLs attempted: self sm
>> > >
>> > > Your MPI job is now going to abort; sorry.
>> > >
>> --------------------------------------------------------------------------
>> > >
>> --------------------------------------------------------------------------
>> > > MPI_INIT has failed because at least one MPI process is unreachable
>> > > from another.  This *usually* means that an underlying communication
>> > > plugin -- such as a BTL or an MTL -- has either not loaded or not
>> > > allowed itself to be used.  Your MPI job will now abort.
>> > >
>> > > You may wish to try to narrow down the problem;
>> > >
>> > >  * Check the output of ompi_info to see which BTL/MTL plugins are
>> > >    available.
>> > >  * Run your application with MPI_THREAD_SINGLE.
>> > >  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>> > >    if using MTL-based communications) to see exactly which
>> > >    communication plugins were considered and/or discarded.
>> > >
>> --------------------------------------------------------------------------
>> > > [wirth:40329] *** An error occurred in MPI_Init
>> > > [wirth:40329] *** on a NULL communicator
>> > > [wirth:40329] *** Unknown error
>> > > [wirth:40329] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>> > >
>> --------------------------------------------------------------------------
>> > > An MPI process is aborting at a time when it cannot guarantee that all
>> > > of its peer processes in the job will be killed properly.  You should
>> > > double check that everything has shut down cleanly.
>> > >
>> > >   Reason:     Before MPI_INIT completed
>> > >   Local host: wirth
>> > >   PID:        40329
>> > >
>> --------------------------------------------------------------------------
>> > >
>> --------------------------------------------------------------------------
>> > > mpirun has exited due to process rank 7 with PID 40329 on
>> > > node wirth exiting improperly. There are two reasons this could occur:
>> > >
>> > > 1. this process did not call "init" before exiting, but others in
>> > > the job did. This can cause a job to hang indefinitely while it waits
>> > > for all processes to call "init". By rule, if one process calls
>> "init",
>> > > then ALL processes must call "init" prior to termination.
>> > >
>> > > 2. this process called "init", but exited without calling "finalize".
>> > > By rule, all processes that call "init" MUST call "finalize" prior to
>> > > exiting or it will be considered an "abnormal termination"
>> > >
>> > > This may have caused other processes in the application to be
>> > > terminated by signals sent by mpirun (as reported here).
>> > >
>> --------------------------------------------------------------------------
>> > > [karp:29513] 1 more process has sent help message help-mca-bml-r2.txt
>> / unreachable proc
>> > > [karp:29513] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> all help / error messages
>> > > [karp:29513] 1 more process has sent help message help-mpi-runtime /
>> mpi_init:startup:pml-add-procs-fail
>> > > [karp:29513] 1 more process has sent help message help-mpi-errors.txt
>> / mpi_errors_are_fatal unknown handle
>> > > [karp:29513] 1 more process has sent help message
>> help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
>> > >
>> > > I tried every combination of btl_tcp_if_include and exclude...
>> > >
>> > > I can't figure out what is wrong.
>> > > I can easily talk to the remote PC using netcat.
>> > > I am sure I am very near to the solution, but...
>> > >
>> > > regards.
>> > >
>> > >
>> > >
>> > > On Mon, Mar 24, 2014 at 3:25 PM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>> > > If you you use btl_tcp_if_exclude, you also need to exclude the
>> loopback interface.  Loopback is excluded by the default value of
>> btl_tcp_if_exclude, but if you overwrite that value, then you need to
>> *also* include the loopback interface in the new value.
>> > >
>> > >
>> > >
>> > > On Mar 24, 2014, at 4:57 AM, Hamid Saeed <e.hamidsa...@gmail.com>
>> wrote:
>> > >
>> > > > Hello,
>> > > > Still I am facing problems.
>> > > > I checked; there is no firewall acting as a barrier to the
>> MPI communication.
>> > > >
>> > > > I even used an execution line like
>> > > > hsaeed@karp:~/Task4_mpi/scatterv$ mpiexec -n 2 --mca
>> btl_tcp_if_exclude br2 -host wirth,karp ./a.out
>> > > >
>> > > > Now the run hangs without displaying any error.
>> > > >
>> > > > I used "...exclude br2" because the failed connection was to br2, as
>> you can see in the "ifconfig" output mentioned above.
>> > > >
>> > > > I know there is something wrong, but I am almost unable to figure it
>> out.
>> > > >
>> > > > I need some more kind suggestions.
>> > > >
>> > > > regards.
>> > > >
>> > > >
>> > > > On Fri, Mar 21, 2014 at 6:05 PM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>> > > > Do you have any firewalling enabled on these machines?  If so,
>> you'll want to either disable it, or allow random TCP connections between
>> any of the cluster nodes.
>> > > >
>> > > >
>> > > > On Mar 21, 2014, at 10:24 AM, Hamid Saeed <e.hamidsa...@gmail.com>
>> wrote:
>> > > >
>> > > > > /sbin/ifconfig
>> > > > >
>> > > > > hsaeed@karp:~$ /sbin/ifconfig
>> > > > > br0       Link encap:Ethernet  HWaddr 00:25:90:59:c9:ba
>> > > > >           inet addr:134.106.3.231  Bcast:134.106.3.255
>>  Mask:255.255.255.0
>> > > > >           inet6 addr: fe80::225:90ff:fe59:c9ba/64 Scope:Link
>> > > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>> > > > >           RX packets:49080961 errors:0 dropped:50263 overruns:0
>> frame:0
>> > > > >           TX packets:43279252 errors:0 dropped:0 overruns:0
>> carrier:0
>> > > > >           collisions:0 txqueuelen:0
>> > > > >           RX bytes:41348407558 (38.5 GiB)  TX bytes:80505842745
>> (74.9 GiB)
>> > > > >
>> > > > > br1       Link encap:Ethernet  HWaddr 00:25:90:59:c9:bb
>> > > > >           inet addr:134.106.53.231  Bcast:134.106.53.255
>>  Mask:255.255.255.0
>> > > > >           inet6 addr: fe80::225:90ff:fe59:c9bb/64 Scope:Link
>> > > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>> > > > >           RX packets:41573060 errors:0 dropped:50261 overruns:0
>> frame:0
>> > > > >           TX packets:1693509 errors:0 dropped:0 overruns:0
>> carrier:0
>> > > > >           collisions:0 txqueuelen:0
>> > > > >           RX bytes:6177072160 (5.7 GiB)  TX bytes:230617435
>> (219.9 MiB)
>> > > > >
>> > > > > br2       Link encap:Ethernet  HWaddr 00:c0:0a:ec:02:e7
>> > > > >           inet addr:10.231.2.231  Bcast:10.231.2.255
>>  Mask:255.255.255.0
>> > > > >           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>> > > > >           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> > > > >           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> > > > >           collisions:0 txqueuelen:0
>> > > > >           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>> > > > >
>> > > > > eth0      Link encap:Ethernet  HWaddr 00:25:90:59:c9:ba
>> > > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>> > > > >           RX packets:69108377 errors:0 dropped:0 overruns:0
>> frame:0
>> > > > >           TX packets:86459066 errors:0 dropped:0 overruns:0
>> carrier:0
>> > > > >           collisions:0 txqueuelen:1000
>> > > > >           RX bytes:43533091399 (40.5 GiB)  TX bytes:83359370885
>> (77.6 GiB)
>> > > > >           Memory:dfe60000-dfe80000
>> > > > >
>> > > > > eth1      Link encap:Ethernet  HWaddr 00:25:90:59:c9:bb
>> > > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>> > > > >           RX packets:43531546 errors:0 dropped:0 overruns:0
>> frame:0
>> > > > >           TX packets:1716151 errors:0 dropped:0 overruns:0
>> carrier:0
>> > > > >           collisions:0 txqueuelen:1000
>> > > > >           RX bytes:7201915977 (6.7 GiB)  TX bytes:232026383
>> (221.2 MiB)
>> > > > >           Memory:dfee0000-dff00000
>> > > > >
>> > > > > lo        Link encap:Local Loopback
>> > > > >           inet addr:127.0.0.1  Mask:255.0.0.0
>> > > > >           inet6 addr: ::1/128 Scope:Host
>> > > > >           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>> > > > >           RX packets:10890707 errors:0 dropped:0 overruns:0
>> frame:0
>> > > > >           TX packets:10890707 errors:0 dropped:0 overruns:0
>> carrier:0
>> > > > >           collisions:0 txqueuelen:0
>> > > > >           RX bytes:36194379576 (33.7 GiB)  TX bytes:36194379576
>> (33.7 GiB)
>> > > > >
>> > > > > tap0      Link encap:Ethernet  HWaddr 00:c0:0a:ec:02:e7
>> > > > >           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>> > > > >           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> > > > >           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> > > > >           collisions:0 txqueuelen:500
>> > > > >           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>> > > > >
>> > > > > When i execute the following line
>> > > > >
>> > > > > hsaeed@karp:~/Task4_mpi/scatterv$ mpiexec -n 2 -host wirth,karp
>> ./a.out
>> > > > >
>> > > > > I receive this error:
>> > > > >
>> > > > >
>> [wirth][[59430,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 10.231.2.231 failed: Connection refused (111)
>> > > > >
>> > > > >
>> > > > > NOTE: Karp and wirth are two machines on ssh cluster.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Fri, Mar 21, 2014 at 3:13 PM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>> > > > > On Mar 21, 2014, at 10:09 AM, Hamid Saeed <e.hamidsa...@gmail.com>
>> wrote:
>> > > > >
>> > > > > > > I think I have a TCP connection. As far as I know, my cluster
>> is not configured for InfiniBand (IB).
>> > > > >
>> > > > > Ok.
>> > > > >
>> > > > > > > But even for TCP connections:
>> > > > > > >
>> > > > > > > mpirun -n 2 -host master,node001 --mca btl tcp,sm,self
>> ./helloworldmpi
>> > > > > > > mpirun -n 2 -host master,node001 ./helloworldmpi
>> > > > > > >
>> > > > > > > These lines are not working; they output
>> > > > > > > an error like
>> > > > > > >
>> [btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to
>> xx.xxx.x.xxx failed: Connection refused (111)
>> > > > >
>> > > > > What are the IP addresses reported by connect()?  (i.e., the
>> address you X'ed out)
>> > > > >
>> > > > > Send the output from ifconfig on each of your servers.  Note that
>> some Linux distributions do not put ifconfig in the default PATH of normal
>> users; look for it in /sbin/ifconfig or /usr/sbin/ifconfig.
>> > > > >
>> > > > > --
>> > > > > Jeff Squyres
>> > > > > jsquy...@cisco.com
>> > > > > For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> > > > >
>> > > > > _______________________________________________
>> > > > > users mailing list
>> > > > > us...@open-mpi.org
>> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > _______________________________________________
>> > > > > Hamid Saeed
>> > > > > CoSynth GmbH & Co. KG
>> > > > > Escherweg 2 - 26121 Oldenburg - Germany
>> > > > > Tel +49 441 9722 738 | Fax -278
>> > > > > http://www.cosynth.com
>> > > > > _______________________________________________
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> >
>> >
>>
>>
>
>
>


