Dear Husen,

Did you check the information in file ./docs/chapters/01_FTB_on_Linux.txt inside the ftb tarball?
You might want to look at sub-section 4.1.
You can also try to get support on this via the MVAPICH2 mailing list.

Best regards,

Xavier

On Fri, Mar 18, 2016 at 11:24 AM, Husen R <hus...@gmail.com> wrote:
> Dear all,
>
> Thanks for the reply and the valuable information.
>
> I have configured MVAPICH2 using the instructions in the resource
> provided by Xavier.
> I have also installed FTB (Fault-Tolerant Backplane) so that MVAPICH2
> has the process migration feature.
>
> However, I got the following error message when I tried to run
> ftb_database_server.
> ------------------------------------------------------------
> pro@head-node:/usr/local/sbin$ ftb_database_server &
> [2] 10678
> pro@head-node:/usr/local/sbin$
> [FTB_ERROR][/home/pro/ftb-0.6.2/src/manager_lib/network/network_sock/include/ftb_network_sock.h: line 205][hostname:head-node]Cannot find boot-strap server ip address
> ------------------------------------------------------------
> Error message: "cannot find boot-strap server ip address".
> I configured the bootstrap IP address when I installed FTB.
>
> Does anyone have experience solving this problem when using FTB in Open MPI?
> I need help.
>
> Regards,
>
> Husen
>
> On Fri, Mar 18, 2016 at 12:06 AM, Xavier Besseron <xavier.besse...@uni.lu>
> wrote:
>>
>> On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> > Just to clarify: I am not aware of any MPI that will allow you to
>> > relocate a process while it is running. You have to checkpoint the job,
>> > terminate it, and then restart the entire thing with the desired process
>> > on the new node.
>> >
>>
>> Dear all,
>>
>> For your information, MVAPICH2 supports live migration of MPI
>> processes, without the need to terminate and restart the whole job.
>>
>> All the details are in the MVAPICH2 user guide:
>> - How to configure MVAPICH2 for migration
>>
>> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-120004.4
>> - How to trigger process migration
>>
>> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-760006.14.3
>>
>> You can also check the paper "High Performance Pipelined Process
>> Migration with RDMA"
>>
>> http://mvapich.cse.ohio-state.edu/static/media/publications/abstract/ouyangx-2011-ccgrid.pdf
>>
>>
>> Best regards,
>>
>> Xavier
>>
>> >
>> > On Mar 16, 2016, at 3:15 AM, Husen R <hus...@gmail.com> wrote:
>> >
>> > In the case of an MPI application (not Gromacs), how do I relocate the
>> > MPI application from one node to another node while it is running?
>> > I'm sorry, as far as I know the ompi-restart command is used to restart
>> > an application, based on a checkpoint file, once the application has
>> > already terminated (is no longer running).
>> >
>> > Thanks
>> >
>> > Regards,
>> >
>> > Husen
>> >
>> > On Wed, Mar 16, 2016 at 4:29 PM, Jeff Hammond <jeff.scie...@gmail.com>
>> > wrote:
>> >>
>> >> Just checkpoint-restart the app to relocate. The overhead will be lower
>> >> than trying to do it with MPI.
>> >>
>> >> Jeff
>> >>
>> >>
>> >> On Wednesday, March 16, 2016, Husen R <hus...@gmail.com> wrote:
>> >>>
>> >>> Hi Jeff,
>> >>>
>> >>> Thanks for the reply.
>> >>>
>> >>> After consulting the Gromacs docs, as you suggested, I see that Gromacs
>> >>> already supports checkpoint/restart. Thanks for the suggestion.
>> >>>
>> >>> Previously, I asked about checkpoint/restart in Open MPI because I
>> >>> want to checkpoint an MPI application and restart/migrate it while it
>> >>> is running.
>> >>> For example, I run an MPI application on nodes A, B and C in a cluster
>> >>> and I want to migrate a process running on node A to another node,
>> >>> let's say to node C.
>> >>> Is there a way to do this with Open MPI? Thanks.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Husen
>> >>>
>> >>>
>> >>> On Wed, Mar 16, 2016 at 12:37 PM, Jeff Hammond
>> >>> <jeff.scie...@gmail.com> wrote:
>> >>>>
>> >>>> Why do you need OpenMPI to do this? Molecular dynamics trajectories
>> >>>> are trivial to checkpoint and restart at the application level. I'm
>> >>>> sure Gromacs already supports this. Please consult the Gromacs docs
>> >>>> or user support for details.
>> >>>>
>> >>>> Jeff
>> >>>>
>> >>>>
>> >>>> On Tuesday, March 15, 2016, Husen R <hus...@gmail.com> wrote:
>> >>>>>
>> >>>>> Dear Open MPI Users,
>> >>>>>
>> >>>>>
>> >>>>> Does the current stable release of Open MPI (v1.10 series) support
>> >>>>> fault-tolerance features?
>> >>>>> I read in the Open MPI FAQ that the checkpoint/restart support was
>> >>>>> last released as part of the v1.6 series.
>> >>>>> I just want to make sure about this.
>> >>>>>
>> >>>>> And by the way, is Open MPI able to checkpoint or restart an MPI
>> >>>>> application/GROMACS automatically?
>> >>>>> Please, I really need help.
>> >>>>>
>> >>>>> Regards,
>> >>>>>
>> >>>>>
>> >>>>> Husen
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Jeff Hammond
>> >>>> jeff.scie...@gmail.com
>> >>>> http://jeffhammond.github.io/
>> >>>>
>> >>>> _______________________________________________
>> >>>> users mailing list
>> >>>> us...@open-mpi.org
>> >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >>>> Link to this post:
>> >>>> http://www.open-mpi.org/community/lists/users/2016/03/28705.php
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Jeff Hammond
>> >> jeff.scie...@gmail.com
>> >> http://jeffhammond.github.io/
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> us...@open-mpi.org
>> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >> Link to this post:
>> >> http://www.open-mpi.org/community/lists/users/2016/03/28709.php
>> >
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > Link to this post:
>> > http://www.open-mpi.org/community/lists/users/2016/03/28710.php
>> >
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > Link to this post:
>> > http://www.open-mpi.org/community/lists/users/2016/03/28731.php
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/03/28742.php
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/03/28752.php
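
The error quoted in this thread says that ftb_database_server could not find a bootstrap server IP address. As a rough illustration only, here is a small Python pre-flight check that tests whether a bootstrap address is visible to the process and whether it resolves. It assumes the address is supplied through an environment variable; the variable name FTB_BSTRAP_SERVER and the helper resolve_bootstrap are assumptions made for this sketch, not something confirmed in the thread, so please verify the real mechanism against the FTB documentation (./docs/chapters/01_FTB_on_Linux.txt) before relying on it.

```python
import os
import socket


def resolve_bootstrap(env, var="FTB_BSTRAP_SERVER"):
    """Check that a bootstrap address is set in `env` and resolves.

    `var` is an ASSUMED variable name used for illustration only.
    Returns a (ok, message) tuple.
    """
    value = env.get(var, "").strip()
    if not value:
        return False, f"{var} is not set"
    try:
        # Accepts either a hostname or a dotted IPv4 address.
        addr = socket.gethostbyname(value)
    except OSError as exc:
        return False, f"{var}={value!r} does not resolve: {exc}"
    return True, f"{var}={value!r} resolves to {addr}"


if __name__ == "__main__":
    ok, message = resolve_bootstrap(os.environ)
    print("OK:" if ok else "PROBLEM:", message)
```

Running this in the same shell that launches ftb_database_server shows whether the setting is actually present in that environment. It does not prove that FTB reads the variable, only that a value is set and resolves to an address.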