Re: [OMPI users] libfabric verb provider for iWARP RNIC
Hi Durga,

I'd suggest reposting this to the libfabric-users mailing list. You can
join that list at
http://lists.openfabrics.org/mailman/listinfo/libfabric-users

I'd suggest including the output of config.log. If you installed OFED in a
non-canonical location, you may need to give an explicit path as an
argument to the --enable-verbs configure option.

Note that if you're trying to use libfabric with the Open MPI ofi mtl, you
will need the very latest version of libfabric, either from github or the
1.3rc2 tarball at http://www.openfabrics.org/downloads/ofi/

Good luck,

Howard


2016-04-02 13:41 GMT-06:00 dpchoudh . :

> Hello all
>
> My machine has 3 network cards:
>
> 1. Broadcom GbE (vanilla type, with some offload capability)
> 2. Chelsio S310 10Gb iWARP
> 3. QLogic DDR 4X InfiniBand
>
> With this setup, I built libfabric like this:
>
> ./configure --enable-udp=auto --enable-gni=auto --enable-mxm=auto \
>   --enable-usnic=auto --enable-verbs=auto --enable-sockets=auto \
>   --enable-psm2=auto --enable-psm=auto && make && sudo make install
>
> However, in the built libfabric, I do not see a verbs provider, which I'd
> expect for the iWARP card, at least.
>
> [durga@smallMPI libfabric]$ fi_info
> psm: psm
>     version: 0.9
>     type: FI_EP_RDM
>     protocol: FI_PROTO_PSMX
> UDP: UDP-IP
>     version: 1.0
>     type: FI_EP_DGRAM
>     protocol: FI_PROTO_UDP
> sockets: IP
>     version: 1.0
>     type: FI_EP_MSG
>     protocol: FI_PROTO_SOCK_TCP
> sockets: IP
>     version: 1.0
>     type: FI_EP_DGRAM
>     protocol: FI_PROTO_SOCK_TCP
> sockets: IP
>     version: 1.0
>     type: FI_EP_RDM
>     protocol: FI_PROTO_SOCK_TCP
>
> Am I doing something wrong or misunderstanding how libfabric works?
>
> Thanks in advance
> Durga
>
> We learn from history that we never learn from history.
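A rough sketch of the kind of check Howard is suggesting; the grep pattern
and the /usr prefix below are illustrative assumptions, not details taken
from this thread:

# See why configure decided to skip the verbs provider
grep -i verbs config.log

# Rebuild, passing an explicit path to where libibverbs/librdmacm are
# installed (substitute the actual OFED install prefix for /usr)
./configure --enable-verbs=/usr --enable-psm=auto && make && sudo make install

# Confirm that a verbs provider is now reported
fi_info | grep -i verbs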
Re: [OMPI users] Fault tolerant feature in Open MPI
Hi Husen,

Sorry for this late reply.

I gave a quick try at FTB and I managed to get it to work on my local
machine. I just had to apply this patch to prevent the agent from
crashing; maybe this was your issue:
https://github.com/besserox/ftb/commit/01aa44f5ed34e35429ddf99084395e4e8ba67b7c

Here is a (very) quick tutorial:

# Compile FTB (after applying the patch)
./configure --enable-debug --prefix="${FTB_INSTALL_PATH}"
make
make install

# Start the server
export FTB_BSTRAP_SERVER=127.0.0.1
"${FTB_INSTALL_PATH}/sbin/ftb_database_server"

# Start the agent
export FTB_BSTRAP_SERVER=127.0.0.1
"${FTB_INSTALL_PATH}/sbin/ftb_agent"

# First check that the server and agent are running
ps aux | grep 'ftb_'
# You should see the 2 processes running

# Compile the examples
cd components
./autogen.sh
./configure --with-ftb="${FTB_INSTALL_PATH}"
make

# Start the subscriber example
export FTB_BSTRAP_SERVER=127.0.0.1
export LD_LIBRARY_PATH="${FTB_INSTALL_PATH}/lib:${LD_LIBRARY_PATH}"
./examples/ftb_simple_subscriber

# Start the publisher example
export FTB_BSTRAP_SERVER=127.0.0.1
export LD_LIBRARY_PATH="${FTB_INSTALL_PATH}/lib:${LD_LIBRARY_PATH}"
./examples/ftb_simple_publisher

The subscriber should output something like:

Caught event: event_space: FTB.FTB_EXAMPLES.SIMPLE, severity: INFO,
event_name: SIMPLE_EVENT from host: 10.91.2.156 and pid: 9654

I hope this will help you. Unfortunately, FTB (and the CIFTS project) have
been discontinued for quite some time now, so it will be difficult to get
real help on this.

Best regards,

Xavier

On Mon, Mar 21, 2016 at 3:52 AM, Husen R wrote:

> Dear Xavier,
>
> Yes, I did. I followed the instructions available in that file,
> especially sub-section 4.1.
>
> I configured the boot-strap IP using the ./configure options.
> On the front-end node, the boot-strap IP is its own IP address, because I
> want to make it the ftb_database_server.
> On every compute node, the boot-strap IP is the front-end's IP address.
> Finally, I use the default values for the boot-strap port and agent port.
>
> I asked the MVAPICH maintainers about this issue, along with the process
> migration issue, and they said it looks like the feature is broken and
> that they will take a look at it at low priority due to other ongoing
> activities in the project.
> Thank you.
>
> Regards,
>
> Husen
>
> On Sun, Mar 20, 2016 at 3:04 AM, Xavier Besseron wrote:
>
>> Dear Husen,
>>
>> Did you check the information in the file
>> ./docs/chapters/01_FTB_on_Linux.txt inside the ftb tarball?
>> You might want to look at sub-section 4.1.
>>
>> You can also try to get support on this via the MVAPICH2 mailing list.
>>
>> Best regards,
>>
>> Xavier
>>
>> On Fri, Mar 18, 2016 at 11:24 AM, Husen R wrote:
>> > Dear all,
>> >
>> > Thanks for the reply and the valuable information.
>> >
>> > I have configured MVAPICH2 using the instructions available in a
>> > resource provided by Xavier.
>> > I have also installed FTB (Fault-Tolerant Backplane) in order for
>> > MVAPICH2 to have the process migration feature.
>> >
>> > However, I got the following error message when I tried to run
>> > ftb_database_server:
>> >
>> > pro@head-node:/usr/local/sbin$ ftb_database_server &
>> > [2] 10678
>> > pro@head-node:/usr/local/sbin$
>> > [FTB_ERROR][/home/pro/ftb-0.6.2/src/manager_lib/network/network_sock/include/ftb_network_sock.h:
>> > line 205][hostname:head-node]Cannot find boot-strap server ip address
>> >
>> > The error message is "cannot find boot-strap server ip address", but I
>> > configured the boot-strap IP address when I installed FTB.
>> >
>> > Does anyone have experience solving this problem when using FTB with
>> > Open MPI? I need help.
>> >
>> > Regards,
>> >
>> > Husen
>> >
>> > On Fri, Mar 18, 2016 at 12:06 AM, Xavier Besseron
>> > <xavier.besse...@uni.lu> wrote:
>> >>
>> >> On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain wrote:
>> >> > Just to clarify: I am not aware of any MPI that will allow you to
>> >> > relocate a process while it is running. You have to checkpoint the
>> >> > job, terminate it, and then restart the entire thing with the
>> >> > desired process on the new node.
>> >>
>> >> Dear all,
>> >>
>> >> For your information, MVAPICH2 supports live migration of MPI
>> >> processes, without the need to terminate and restart the whole job.
>> >>
>> >> All the details are in the MVAPICH2 user guide:
>> >> - How to configure MVAPICH2 for migration
>> >>   http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-120004.4
>> >> - How to trigger process migration
>> >>   http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-760006.14.3
>> >>
>> >> You can also check the paper
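For the "Cannot find boot-strap server ip address" error quoted above, a
rough sketch of how Xavier's single-node exports extend to a front-end
plus compute nodes; the 192.168.1.10 address and the 14455 port below are
placeholders for illustration, not values taken from this thread:

# On the front-end node (runs the database server); use its own IP
export FTB_BSTRAP_SERVER=192.168.1.10
"${FTB_INSTALL_PATH}/sbin/ftb_database_server" &

# On each compute node, point at the front-end's IP before starting the agent
export FTB_BSTRAP_SERVER=192.168.1.10
"${FTB_INSTALL_PATH}/sbin/ftb_agent" &

# Sanity check from a compute node that the boot-strap server is reachable
# (replace 14455 with the boot-strap port FTB was actually configured with)
ping -c 1 192.168.1.10
nc -zv 192.168.1.10 14455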
Re: [OMPI users] libfabric verb provider for iWARP RNIC
Hi Howard,

Thank you very much for your suggestions. All the installation locations
in my case are the default ones, so that is likely not the issue.

What I find a bit confusing is this: as I mentioned, my cluster has both
QLogic InfiniBand and Chelsio iWARP (which are exposed to Open MPI
natively as well as via an IP interface). With this configuration, if I
build libfabric with the configure options --enable-psm=auto
--enable-verbs=auto, then, as I mentioned earlier, only the PSM interface
shows up in the fi_info listing, and Open MPI programs using the ofi MTL
*do* work. However, I do not know whether the traffic is going through the
QLogic card or the Chelsio card; it is likely the former.

I am going to ask this on the libfabric list, but perhaps the following
question is relevant on the Open MPI list. My understanding of the OFI MTL
is the following; please correct me where I am wrong: any type of
transport that exposes a verbs interface (iWARP, RoCE, InfiniBand from any
manufacturer) can become a libfabric provider (when libfabric is compiled
with the --enable-verbs option) and thus support the OFI MTL (and thus the
cm PML?).

Is the above true?

Best regards
Durga

We learn from history that we never learn from history.

On Mon, Apr 4, 2016 at 7:29 AM, Howard Pritchard wrote:

> Hi Durga,
>
> I'd suggest reposting this to the libfabric-users mailing list.
> You can join that list at
> http://lists.openfabrics.org/mailman/listinfo/libfabric-users
>
> I'd suggest including the output of config.log. If you installed
> OFED in a non-canonical location, you may need to give an explicit
> path as an argument to the --enable-verbs configure option.
>
> Note that if you're trying to use libfabric with the Open MPI ofi
> mtl, you will need the very latest version of libfabric, either from
> github or the 1.3rc2 tarball at
> http://www.openfabrics.org/downloads/ofi/
>
> Good luck,
>
> Howard
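A rough sketch of how the selection Durga is asking about is typically
exercised from the Open MPI side; the MCA parameters, the FI_PROVIDER
environment variable, and the fi_info -p flag are used here as hedged
illustrations and may behave differently across Open MPI and libfabric
versions:

# Ask Open MPI to use the cm PML with the ofi MTL
mpirun --mca pml cm --mca mtl ofi -np 2 ./a.out

# Restrict libfabric to a single provider to see which card carries the
# traffic (e.g. psm for the QLogic HCA, verbs for the iWARP/verbs path)
FI_PROVIDER=psm mpirun --mca pml cm --mca mtl ofi -np 2 ./a.out

# Check whether a verbs provider is present at all
fi_info -p verbs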