Re: [OMPI users] libfabric verb provider for iWARP RNIC

2016-04-04 Thread Howard Pritchard
Hi Durga,

I'd suggest reposting this to the libfabric-users mail list.
You can join that list at
http://lists.openfabrics.org/mailman/listinfo/libfabric-users

I'd suggest including the output of config.log.  If you installed
OFED in a non-canonical location, you may need to give an explicit
path as an argument to the --enable-verbs configury option.
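
For example (a sketch, not your exact paths): if OFED landed under
/opt/ofed, something like

  ./configure --enable-verbs=/opt/ofed ...

and then check config.log to see whether libibverbs was actually found:

  grep -i verbs config.log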

Note that if you're trying to use libfabric with the Open MPI ofi
mtl, you will need the very latest version of libfabric, either
from github or the 1.3rc2 tarball at

http://www.openfabrics.org/downloads/ofi/
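
If you go the github route, a minimal sketch:

  git clone https://github.com/ofiwg/libfabric.git
  cd libfabric
  ./autogen.sh
  ./configure && make && sudo make install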

Good luck,

Howard


2016-04-02 13:41 GMT-06:00 dpchoudh . :

> Hello all
>
> My machine has 3 network cards:
>
> 1. Broadcom GbE (vanilla type, with some offload capability)
> 2. Chelsio S310 10Gb iWARP
> 3. QLogic DDR 4X InfiniBand.
>
> With this setup, I built libfabric like this:
>
> ./configure --enable-udp=auto --enable-gni=auto --enable-mxm=auto
> --enable-usnic=auto --enable-verbs=auto --enable-sockets=auto
> --enable-psm2=auto --enable-psm=auto && make && sudo make install
>
> However, in the built libfabric, I do not see a verbs provider, which I'd
> expect for the iWARP card, at least.
>
> [durga@smallMPI libfabric]$ fi_info
> psm: psm
>     version: 0.9
>     type: FI_EP_RDM
>     protocol: FI_PROTO_PSMX
> UDP: UDP-IP
>     version: 1.0
>     type: FI_EP_DGRAM
>     protocol: FI_PROTO_UDP
> sockets: IP
>     version: 1.0
>     type: FI_EP_MSG
>     protocol: FI_PROTO_SOCK_TCP
> sockets: IP
>     version: 1.0
>     type: FI_EP_DGRAM
>     protocol: FI_PROTO_SOCK_TCP
> sockets: IP
>     version: 1.0
>     type: FI_EP_RDM
>     protocol: FI_PROTO_SOCK_TCP
>
>
> Am I doing something wrong or misunderstanding how libfabric works?
>
> Thanks in advance
> Durga
>
> We learn from history that we never learn from history.
>


Re: [OMPI users] Fault tolerant feature in Open MPI

2016-04-04 Thread Xavier Besseron
Hi Husen,

Sorry for the late reply.
I gave FTB a quick try and managed to get it to work on my local
machine.
I just had to apply this patch to prevent the agent from crashing; maybe
this was your issue:
https://github.com/besserox/ftb/commit/01aa44f5ed34e35429ddf99084395e4e8ba67b7c
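
One way to pick up the fix (a sketch, assuming you build from a git clone
of that tree):

git clone https://github.com/besserox/ftb.git
cd ftb
git log --oneline -1 01aa44f   # check the fix is present before building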

Here is a (very) quick tutorial:

# Compile FTB (after applying patch)
./configure --enable-debug --prefix="${FTB_INSTALL_PATH}"
make
make install

# Start server
export FTB_BSTRAP_SERVER=127.0.0.1
"${FTB_INSTALL_PATH}/sbin/ftb_database_server"

# Start agent
export FTB_BSTRAP_SERVER=127.0.0.1
"${FTB_INSTALL_PATH}/sbin/ftb_agent"

# First check that server and agent are running
ps aux | grep 'ftb_'

# You should see the 2 processes running



# Compile examples
cd components
./autogen.sh
./configure --with-ftb="${FTB_INSTALL_PATH}"
make

# Start subscriber example
export FTB_BSTRAP_SERVER=127.0.0.1
export LD_LIBRARY_PATH="${FTB_INSTALL_PATH}/lib:${LD_LIBRARY_PATH}"
./examples/ftb_simple_subscriber


# Start publisher example
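# (in a second terminal, while the subscriber is still running)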
export FTB_BSTRAP_SERVER=127.0.0.1
export LD_LIBRARY_PATH="${FTB_INSTALL_PATH}/lib:${LD_LIBRARY_PATH}"

./examples/ftb_simple_publisher


The subscriber should output something like:

Caught event: event_space: FTB.FTB_EXAMPLES.SIMPLE, severity: INFO,
event_name: SIMPLE_EVENT from host: 10.91.2.156 and pid: 9654




I hope this helps.
Unfortunately, FTB (and the whole CIFTS project) has been discontinued for
quite some time now, so it will be difficult to get real support for it.


Best regards,

Xavier




On Mon, Mar 21, 2016 at 3:52 AM, Husen R  wrote:

> Dear Xavier,
>
> Yes, I did. I followed the instructions in that file, especially those in
> sub-section 4.1.
>
> I configured the boot-strap IP using the ./configure options.
> On the front-end node, the boot-strap IP is its own IP address because I
> want it to act as the ftb_database_server.
> On every compute node, the boot-strap IP is the front-end's IP address.
> Finally, I use the default values for the boot-strap port and agent port.
>
>
> I asked the MVAPICH maintainers about this issue, along with the process
> migration issue, and they said it looks like the feature is broken and that
> they will look at it at low priority due to other ongoing activities in the
> project.
> Thank you.
>
> Regards,
>
> Husen
>
>
>
> On Sun, Mar 20, 2016 at 3:04 AM, Xavier Besseron 
> wrote:
>
>> Dear Husen,
>>
>> Did you check the information in file
>> ./docs/chapters/01_FTB_on_Linux.txt inside the ftb tarball?
>> You might want to look at sub-section 4.1.
>>
>> You can also try to get support on this via the MVAPICH2 mailing list.
>>
>>
>> Best regards,
>>
>> Xavier
>>
>>
>> On Fri, Mar 18, 2016 at 11:24 AM, Husen R  wrote:
>> > Dear all,
>> >
>> > Thanks for the reply and the valuable information.
>> >
>> > I have configured MVAPICH2 using the instructions available in a
>> > resource provided by Xavier.
>> > I also have installed FTB (Fault-Tolerant Backplane) in order for
>> > MVAPICH2 to have the process migration feature.
>> >
>> > However, I got the following error message when I tried to run
>> > ftb_database_server.
>> >
>> > ----------------------------------------------------------------------
>> > pro@head-node:/usr/local/sbin$ ftb_database_server &
>> > [2] 10678
>> > pro@head-node:/usr/local/sbin$
>> >
>> [FTB_ERROR][/home/pro/ftb-0.6.2/src/manager_lib/network/network_sock/include/ftb_network_sock.h:
>> > line 205][hostname:head-node]Cannot find boot-strap server ip address
>> >
>> > ----------------------------------------------------------------------
>> > The error message is "cannot find boot-strap server ip address", yet I
>> > had configured the boot-strap IP address when I installed FTB.
>> >
>> > Does anyone have experience solving this problem when using FTB with
>> > Open MPI?
>> > I need help.
>> >
>> > Regards,
>> >
>> >
>> > Husen
>> >
>> >
>> > On Fri, Mar 18, 2016 at 12:06 AM, Xavier Besseron <
>> xavier.besse...@uni.lu>
>> > wrote:
>> >>
>> >> On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain 
>> wrote:
>> >> > Just to clarify: I am not aware of any MPI that will allow you to
>> >> > relocate a process while it is running. You have to checkpoint the
>> >> > job, terminate it, and then restart the entire thing with the desired
>> >> > process on the new node.
>> >> >
>> >>
>> >>
>> >> Dear all,
>> >>
>> >> For your information, MVAPICH2 supports live migration of MPI
>> >> processes, without the need to terminate and restart the whole job.
>> >>
>> >> All the details are in the MVAPICH2 user guide:
>> >>   - How to configure MVAPICH2 for migration:
>> >> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-120004.4
>> >>   - How to trigger process migration:
>> >> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-760006.14.3
>> >>
>> >> You can also check the paper

Re: [OMPI users] libfabric verb provider for iWARP RNIC

2016-04-04 Thread dpchoudh .
Hi Howard

Thank you very much for your suggestions. All the installation locations in
my case are the default ones, so that is likely not the issue.

What I find a bit confusing is this:

As I mentioned, my cluster has both QLogic InfiniBand and Chelsio iWARP cards
(which are exposed to Open MPI natively as well as through an IP interface).

With this configuration, if I build libfabric with the configure options
--enable-psm=auto --enable-verbs=auto, as I mentioned earlier, only the PSM
interface shows up in the fi_info listing, and Open MPI programs using the ofi
MTL *do* work. However, I do not know whether the traffic is going through the
QLogic card or the Chelsio card; it is likely the former.
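
One way I could check which provider the ofi MTL picks (a sketch on my part;
FI_LOG_LEVEL is libfabric's logging variable, and ./mpi_hello is just a
placeholder for any MPI binary):

FI_LOG_LEVEL=info mpirun -np 2 --mca pml cm --mca mtl ofi \
    --mca mtl_base_verbose 100 ./mpi_hello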

I am going to ask this on the libfabric list, but perhaps the following
question is relevant on the Open MPI list:

My understanding of the OFI MTL is the following; please correct me
where I am wrong:
As I understand it, ANY type of transport that exposes a verbs interface
(iWARP, RoCE, InfiniBand from any manufacturer) can be driven by the libfabric
verbs provider (when libfabric is compiled with the --enable-verbs option) and
thus support the OFI MTL (and thus the cm PML?)

Is the above true?

Best regards
Durga

We learn from history that we never learn from history.

On Mon, Apr 4, 2016 at 7:29 AM, Howard Pritchard 
wrote:

> Hi Durga,
>
> I'd suggest reposting this to the libfabric-users mail list.
> You can join that list at
> http://lists.openfabrics.org/mailman/listinfo/libfabric-users
>
> I'd suggest including the output of config.log.  If you installed
> OFED in a non-canonical location, you may need to give an explicit
> path as an argument to the --enable-verbs configury option.
>
> Note that if you're trying to use libfabric with the Open MPI ofi
> mtl, you will need the very latest version of libfabric, either
> from github or the 1.3rc2 tarball at
>
> http://www.openfabrics.org/downloads/ofi/
>
> Good luck,
>
> Howard
>
>
> 2016-04-02 13:41 GMT-06:00 dpchoudh . :
>
>> Hello all
>>
>> My machine has 3 network cards:
>>
>> 1. Broadcom GbE (vanilla type, with some offload capability)
>> 2. Chelsio S310 10Gb iWARP
>> 3. QLogic DDR 4X InfiniBand.
>>
>> With this setup, I built libfabric like this:
>>
>> ./configure --enable-udp=auto --enable-gni=auto --enable-mxm=auto
>> --enable-usnic=auto --enable-verbs=auto --enable-sockets=auto
>> --enable-psm2=auto --enable-psm=auto && make && sudo make install
>>
>> However, in the built libfabric, I do not see a verbs provider, which I'd
>> expect for the iWARP card, at least.
>>
>> [durga@smallMPI libfabric]$ fi_info
>> psm: psm
>>     version: 0.9
>>     type: FI_EP_RDM
>>     protocol: FI_PROTO_PSMX
>> UDP: UDP-IP
>>     version: 1.0
>>     type: FI_EP_DGRAM
>>     protocol: FI_PROTO_UDP
>> sockets: IP
>>     version: 1.0
>>     type: FI_EP_MSG
>>     protocol: FI_PROTO_SOCK_TCP
>> sockets: IP
>>     version: 1.0
>>     type: FI_EP_DGRAM
>>     protocol: FI_PROTO_SOCK_TCP
>> sockets: IP
>>     version: 1.0
>>     type: FI_EP_RDM
>>     protocol: FI_PROTO_SOCK_TCP
>>
>>
>> Am I doing something wrong or misunderstanding how libfabric works?
>>
>> Thanks in advance
>> Durga
>>
>> We learn from history that we never learn from history.
>>