I don’t believe OMPI supports FTB, I’m afraid - you might want to post your question on an FTB mailing list (I don’t recall if that project is even active any more?)
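(For the FTB error quoted below: "Cannot find boot-strap server ip address" generally means the FTB daemons cannot work out where the bootstrap/database server is running. A rough sketch of one way to set this up, assuming the FTB_BSTRAP_SERVER environment variable and the MVAPICH2 --enable-ckpt-migration/--with-ftb configure options as described in the FTB 0.6.x README and the MVAPICH2 user guide linked below; verify the exact names against your installed versions:

    # On the head node, start the FTB bootstrap/database server.
    $ ftb_database_server &

    # On every node (including the head node), tell FTB where the
    # bootstrap server is before starting the agent; "head-node" must
    # resolve to an IP reachable from all nodes, not to 127.0.0.1.
    $ export FTB_BSTRAP_SERVER=head-node
    $ ftb_agent &

    # Build MVAPICH2 with migration support (user guide section 4.4);
    # the BLCR and FTB install paths are illustrative.
    $ ./configure --enable-ckpt-migration \
          --with-blcr=/usr/local/blcr --with-ftb=/usr/local/ftb
    $ make && make install

If the bootstrap address was only set at configure time, exporting it in the environment of every daemon is worth trying first.)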
> On Mar 18, 2016, at 3:24 AM, Husen R <hus...@gmail.com> wrote:
>
> Dear all,
>
> Thanks for the replies and the valuable information.
>
> I have configured MVAPICH2 using the instructions in the resource
> provided by Xavier. I also installed FTB (Fault-Tolerant Backplane)
> so that MVAPICH2 can use its process migration feature.
>
> However, I got the following error message when I tried to run
> ftb_database_server:
>
> ------------------------------------------------------------------
> pro@head-node:/usr/local/sbin$ ftb_database_server &
> [2] 10678
> pro@head-node:/usr/local/sbin$
> [FTB_ERROR][/home/pro/ftb-0.6.2/src/manager_lib/network/network_sock/include/ftb_network_sock.h:
> line 205][hostname:head-node]Cannot find boot-strap server ip address
> ------------------------------------------------------------------
>
> The error message is "cannot find boot-strap server ip address", yet
> I configured the bootstrap IP address when I installed FTB.
>
> Does anyone have experience solving this problem when using FTB with
> Open MPI? I need help.
>
> Regards,
>
> Husen
>
>
> On Fri, Mar 18, 2016 at 12:06 AM, Xavier Besseron <xavier.besse...@uni.lu> wrote:
> On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > Just to clarify: I am not aware of any MPI that will allow you to relocate a
> > process while it is running. You have to checkpoint the job, terminate it,
> > and then restart the entire thing with the desired process on the new node.
>
> Dear all,
>
> For your information, MVAPICH2 supports live migration of MPI
> processes, without the need to terminate and restart the whole job.
>
> All the details are in the MVAPICH2 user guide:
> - How to configure MVAPICH2 for migration:
>   http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-120004.4
> - How to trigger process migration:
>   http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-760006.14.3
>
> You can also check the paper "High Performance Pipelined Process
> Migration with RDMA":
> http://mvapich.cse.ohio-state.edu/static/media/publications/abstract/ouyangx-2011-ccgrid.pdf
>
> Best regards,
>
> Xavier
>
> > On Mar 16, 2016, at 3:15 AM, Husen R <hus...@gmail.com> wrote:
> >
> > In the case of an MPI application (not Gromacs), how do I relocate
> > an MPI application from one node to another while it is running?
> > I'm sorry, but as far as I know the ompi-restart command is used to
> > restart an application from a checkpoint file once the application
> > has already terminated (is no longer running).
> >
> > Thanks
> >
> > Regards,
> >
> > Husen
> >
> > On Wed, Mar 16, 2016 at 4:29 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
> >> Just checkpoint-restart the app to relocate. The overhead will be
> >> lower than trying to do it with MPI.
> >>
> >> Jeff
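(On doing that relocation with Open MPI itself: this only worked with the old v1.6-series checkpoint/restart support, and, as Ralph says, it restarts the whole job rather than migrating a single live process. A minimal sketch, assuming a 1.6 build configured with --with-ft=cr and BLCR; the PID, snapshot name, and host files are illustrative:

    # Launch with the checkpoint/restart machinery enabled.
    $ mpirun -np 3 -host nodeA,nodeB,nodeC -am ft-enable-cr ./my_app &

    # Checkpoint the job and terminate it; the argument is the PID of
    # mpirun. A global snapshot directory is written (under $HOME by
    # default).
    $ ompi-checkpoint --term <pid_of_mpirun>

    # Restart from the snapshot on a different node set, e.g. with the
    # rank that ran on nodeA now placed on nodeC.
    $ ompi-restart -machinefile new_hosts ompi_global_snapshot_<pid>.ckpt
)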
> >> On Wednesday, March 16, 2016, Husen R <hus...@gmail.com> wrote:
> >>> Hi Jeff,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> After consulting the Gromacs docs, as you suggested, I found that
> >>> Gromacs already supports checkpoint/restart. Thanks for the
> >>> suggestion.
> >>>
> >>> Previously, I asked about checkpoint/restart in Open MPI because
> >>> I want to checkpoint an MPI application and restart/migrate it
> >>> while it is running. For example, I run an MPI application on
> >>> nodes A, B, and C in a cluster, and I want to migrate the process
> >>> running on node A to another node, say node C.
> >>> Is there a way to do this with Open MPI? Thanks.
> >>>
> >>> Regards,
> >>>
> >>> Husen
> >>>
> >>> On Wed, Mar 16, 2016 at 12:37 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
> >>>> Why do you need Open MPI to do this? Molecular dynamics
> >>>> trajectories are trivial to checkpoint and restart at the
> >>>> application level. I'm sure Gromacs already supports this.
> >>>> Please consult the Gromacs docs or user support for details.
> >>>>
> >>>> Jeff
> >>>>
> >>>> On Tuesday, March 15, 2016, Husen R <hus...@gmail.com> wrote:
> >>>>> Dear Open MPI Users,
> >>>>>
> >>>>> Does the current stable release of Open MPI (the v1.10 series)
> >>>>> support the fault tolerance features?
> >>>>> I got the information from the Open MPI FAQ that
> >>>>> checkpoint/restart support was last released as part of the
> >>>>> v1.6 series. I just want to make sure about this.
> >>>>>
> >>>>> And by the way, is Open MPI able to checkpoint or restart an
> >>>>> MPI application/GROMACS automatically?
> >>>>> Please, I really need help.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Husen
> >>>>
> >>>> --
> >>>> Jeff Hammond
> >>>> jeff.scie...@gmail.com
> >>>> http://jeffhammond.github.io/
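(And for the Gromacs part of the original question: as Jeff says, mdrun checkpoints itself, so nothing is needed from the MPI layer. A minimal example, assuming a GROMACS 5.x MPI build whose binary is typically installed as gmx_mpi (older releases use a bare mdrun); file and node names are illustrative:

    # Run, writing a checkpoint file (state.cpt) every 15 minutes (-cpt).
    $ mpirun -np 3 -host nodeA,nodeB,nodeC \
          gmx_mpi mdrun -s topol.tpr -cpt 15

    # After stopping the job, continue from the last checkpoint on a
    # different node set: -cpi reads the checkpoint and -append keeps
    # writing to the existing output files.
    $ mpirun -np 3 -host nodeB,nodeC,nodeD \
          gmx_mpi mdrun -s topol.tpr -cpi state.cpt -append
)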