Hi Husen,

Sorry for the late reply. I gave FTB a quick try and managed to get it to work on my local machine. I just had to apply this patch to prevent the agent from crashing. Maybe this was your issue:
https://github.com/besserox/ftb/commit/01aa44f5ed34e35429ddf99084395e4e8ba67b7c
Here is a (very) quick tutorial:

# Compile FTB (after applying the patch)
./configure --enable-debug --prefix="${FTB_INSTALL_PATH}"
make
make install

# Start the server
export FTB_BSTRAP_SERVER=127.0.0.1
"${FTB_INSTALL_PATH}/sbin/ftb_database_server"

# Start the agent
export FTB_BSTRAP_SERVER=127.0.0.1
"${FTB_INSTALL_PATH}/sbin/ftb_agent"

# First check that the server and agent are running
ps aux | grep 'ftb_'
# You should see the 2 processes running

# Compile the examples
cd components
./autogen.sh
./configure --with-ftb="${FTB_INSTALL_PATH}"
make

# Start the subscriber example
export FTB_BSTRAP_SERVER=127.0.0.1
export LD_LIBRARY_PATH="${FTB_INSTALL_PATH}/lib:${LD_LIBRARY_PATH}"
./examples/ftb_simple_subscriber

# Start the publisher example
export FTB_BSTRAP_SERVER=127.0.0.1
export LD_LIBRARY_PATH="${FTB_INSTALL_PATH}/lib:${LD_LIBRARY_PATH}"
./examples/ftb_simple_publisher

The subscriber should output something like:

Caught event: event_space: FTB.FTB_EXAMPLES.SIMPLE, severity: INFO, event_name: SIMPLE_EVENT from host: 10.91.2.156 and pid: 9654

I hope this will help you. Unfortunately, FTB (and the CIFTS project) has been discontinued for quite some time now, so it will be difficult to get real help on this.

Best regards,

Xavier


On Mon, Mar 21, 2016 at 3:52 AM, Husen R <hus...@gmail.com> wrote:
> Dear Xavier,
>
> Yes, I did. I followed the instructions available in that file, especially
> at sub-section 4.1.
>
> I configured the boot-strap IP using the ./configure options.
> On the front-end node, the boot-strap IP is its own IP address, because I
> want to make it the ftb_database_server.
> On every compute node, the boot-strap IP is the front-end's IP address.
> Finally, I use the default values for the boot-strap port and agent port.
>
> I asked the MVAPICH maintainers about this issue, along with the process
> migration issue, and they said it looks like the feature is broken and that
> they will take a look at it at low priority due to other on-going
> activities in the project.
> Thank you.
>
> Regards,
>
> Husen
>
>
> On Sun, Mar 20, 2016 at 3:04 AM, Xavier Besseron <xavier.besse...@uni.lu> wrote:
>> Dear Husen,
>>
>> Did you check the information in the file
>> ./docs/chapters/01_FTB_on_Linux.txt inside the ftb tarball?
>> You might want to look at sub-section 4.1.
>>
>> You can also try to get support on this via the MVAPICH2 mailing list.
>>
>> Best regards,
>>
>> Xavier
>>
>>
>> On Fri, Mar 18, 2016 at 11:24 AM, Husen R <hus...@gmail.com> wrote:
>> > Dear all,
>> >
>> > Thanks for the replies and the valuable information.
>> >
>> > I have configured MVAPICH2 using the instructions available in a
>> > resource provided by Xavier.
>> > I have also installed FTB (Fault-Tolerant Backplane) in order for
>> > MVAPICH2 to have the process migration feature.
>> >
>> > However, I got the following error message when I tried to run
>> > ftb_database_server:
>> >
>> > ------------------------------------------------------------
>> > pro@head-node:/usr/local/sbin$ ftb_database_server &
>> > [2] 10678
>> > pro@head-node:/usr/local/sbin$
>> > [FTB_ERROR][/home/pro/ftb-0.6.2/src/manager_lib/network/network_sock/include/ftb_network_sock.h:
>> > line 205][hostname:head-node]Cannot find boot-strap server ip address
>> > ------------------------------------------------------------
>> >
>> > Error message: "cannot find boot-strap server ip address".
>> > I configured the boot-strap IP address when I installed FTB.
>> >
>> > Does anyone have experience solving this problem when using FTB with
>> > Open MPI? I need help.
>> >
>> > Regards,
>> >
>> > Husen
>> >
>> >
>> > On Fri, Mar 18, 2016 at 12:06 AM, Xavier Besseron <xavier.besse...@uni.lu> wrote:
>> >>
>> >> On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> >> > Just to clarify: I am not aware of any MPI that will allow you to
>> >> > relocate a process while it is running. You have to checkpoint the
>> >> > job, terminate it, and then restart the entire thing with the
>> >> > desired process on the new node.
>> >>
>> >> Dear all,
>> >>
>> >> For your information, MVAPICH2 supports live migration of MPI
>> >> processes, without the need to terminate and restart the whole job.
>> >>
>> >> All the details are in the MVAPICH2 user guide:
>> >> - How to configure MVAPICH2 for migration:
>> >> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-120004.4
>> >> - How to trigger process migration:
>> >> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-760006.14.3
>> >>
>> >> You can also check the paper "High Performance Pipelined Process
>> >> Migration with RDMA":
>> >> http://mvapich.cse.ohio-state.edu/static/media/publications/abstract/ouyangx-2011-ccgrid.pdf
>> >>
>> >> Best regards,
>> >>
>> >> Xavier
>> >>
>> >>
>> >> > On Mar 16, 2016, at 3:15 AM, Husen R <hus...@gmail.com> wrote:
>> >> >
>> >> > In the case of an MPI application (not Gromacs), how do I relocate
>> >> > an MPI application from one node to another node while it is running?
>> >> > I'm sorry, but as far as I know the ompi-restart command is used to
>> >> > restart an application from a checkpoint file once the application
>> >> > has already terminated (is no longer running).
>> >> >
>> >> > Thanks
>> >> >
>> >> > Regards,
>> >> >
>> >> > Husen
>> >> >
>> >> > On Wed, Mar 16, 2016 at 4:29 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
>> >> >>
>> >> >> Just checkpoint-restart the app to relocate it. The overhead will
>> >> >> be lower than trying to do it with MPI.
>> >> >>
>> >> >> Jeff
>> >> >>
>> >> >> On Wednesday, March 16, 2016, Husen R <hus...@gmail.com> wrote:
>> >> >>>
>> >> >>> Hi Jeff,
>> >> >>>
>> >> >>> Thanks for the reply.
>> >> >>>
>> >> >>> After consulting the Gromacs docs, as you suggested, Gromacs
>> >> >>> already supports checkpoint/restart. Thanks for the suggestion.
>> >> >>>
>> >> >>> Previously, I asked about checkpoint/restart in Open MPI because
>> >> >>> I want to checkpoint an MPI application and restart/migrate it
>> >> >>> while it is running.
>> >> >>> For example, I run an MPI application on nodes A, B and C in a
>> >> >>> cluster, and I want to migrate the process running on node A to
>> >> >>> another node, let's say node C.
>> >> >>> Is there a way to do this with Open MPI? Thanks.
>> >> >>>
>> >> >>> Regards,
>> >> >>>
>> >> >>> Husen
>> >> >>>
>> >> >>>
>> >> >>> On Wed, Mar 16, 2016 at 12:37 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
>> >> >>>>
>> >> >>>> Why do you need Open MPI to do this? Molecular dynamics
>> >> >>>> trajectories are trivial to checkpoint and restart at the
>> >> >>>> application level. I'm sure Gromacs already supports this.
>> >> >>>> Please consult the Gromacs docs or user support for details.
>> >> >>>>
>> >> >>>> Jeff
>> >> >>>>
>> >> >>>>
>> >> >>>> On Tuesday, March 15, 2016, Husen R <hus...@gmail.com> wrote:
>> >> >>>>>
>> >> >>>>> Dear Open MPI Users,
>> >> >>>>>
>> >> >>>>> Does the current stable release of Open MPI (the v1.10 series)
>> >> >>>>> support fault-tolerance features?
>> >> >>>>> I got the information from the Open MPI FAQ that
>> >> >>>>> checkpoint/restart support was last released as part of the
>> >> >>>>> v1.6 series.
>> >> >>>>> I just want to make sure about this.
>> >> >>>>>
>> >> >>>>> And by the way, is Open MPI able to checkpoint or restart an
>> >> >>>>> MPI application/GROMACS automatically?
>> >> >>>>> Please, I really need help.
>> >> >>>>>
>> >> >>>>> Regards,
>> >> >>>>>
>> >> >>>>> Husen
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> Jeff Hammond
>> >> >>>> jeff.scie...@gmail.com
>> >> >>>> http://jeffhammond.github.io/
>> >> >>>>
>> >> >>>> _______________________________________________
>> >> >>>> users mailing list
>> >> >>>> us...@open-mpi.org
>> >> >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >> >>>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28705.php
>> >> >>
>> >> >> --
>> >> >> Jeff Hammond
>> >> >> jeff.scie...@gmail.com
>> >> >> http://jeffhammond.github.io/
>> >> >>
>> >> >> _______________________________________________
>> >> >> users mailing list
>> >> >> us...@open-mpi.org
>> >> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >> >> Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28709.php
>> >> >
>> >> > _______________________________________________
>> >> > users mailing list
>> >> > us...@open-mpi.org
>> >> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >> > Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28710.php
>> >> >
>> >> > _______________________________________________
>> >> > users mailing list
>> >> > us...@open-mpi.org
>> >> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >> > Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28731.php
>> >> _______________________________________________
>> >> users mailing list
>> >> us...@open-mpi.org
>> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >> Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28742.php
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28752.php
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28759.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28765.php
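
[Editor's note] On the "Cannot find boot-strap server ip address" error discussed in this thread: the tutorial above sets FTB_BSTRAP_SERVER in the environment rather than baking the address in at configure time, and setting it on every node may be worth trying. A minimal sketch, assuming a front-end node whose address is 192.168.1.10 (a made-up example, substitute your own):

```shell
# Hypothetical cluster: the front-end node's IP is 192.168.1.10.

# On the front-end node, before starting ftb_database_server,
# point FTB at the front-end's own address:
export FTB_BSTRAP_SERVER=192.168.1.10

# On every compute node, before starting ftb_agent,
# point FTB at the front-end's address as well:
export FTB_BSTRAP_SERVER=192.168.1.10

# Quick sanity check that the variable is exported to child processes,
# since the FTB daemons read it from their environment:
sh -c 'echo "bootstrap server is $FTB_BSTRAP_SERVER"'
```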
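
[Editor's note] Jeff's point that checkpoint/restart is often easiest at the application level can be sketched in a few lines of shell. This is a toy example with a hypothetical "counter" job, not GROMACS: the job persists its entire state (here, one integer) after every step, so it can be killed on node A and re-launched on node C, resuming from the last completed step. A real MD code would instead write coordinates and velocities, as GROMACS does with its checkpoint files.

```shell
#!/bin/sh
# Toy application-level checkpoint/restart: the whole "state" is one counter.
CKPT=counter.ckpt

i=0
# Resume from the checkpoint file if one exists (e.g. after migration).
[ -f "$CKPT" ] && i=$(cat "$CKPT")

while [ "$i" -lt 10 ]; do
    i=$((i + 1))
    # ... one step of real work would happen here ...
    echo "$i" > "$CKPT"   # checkpoint after each completed step
done

echo "finished at step $i"
rm -f "$CKPT"             # clean up once the job is done
```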