I don’t believe OMPI supports FTB, I’m afraid - you might want to post your 
question on an FTB mailing list (I don’t recall if that project is even active 
any more?)
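
On the relocation question discussed further down this thread (checkpoint the job, terminate it, restart it on the new node), the application-level approach can be sketched roughly as below. This is only an illustration under stated assumptions, not Open MPI or GROMACS code: the function names, the JSON state layout, and the `stop_after` parameter are all hypothetical, and in a real MPI job each rank would write its own file to storage visible from the new node.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic on the same filesystem, so a crash never
    # leaves a half-written checkpoint under the final name.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0, "value": 0.0}
    with open(path) as handle:
        return json.load(handle)


def run(path, total_steps, checkpoint_every=10, stop_after=None):
    """Resume from the last checkpoint and advance towards total_steps.

    stop_after (hypothetical) simulates the job being terminated early,
    for example so that it can be restarted on a different node.
    """
    state = load_checkpoint(path)
    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            break
        state["value"] += 1.0  # stand-in for one unit of real work
        state["step"] += 1
        executed += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

Calling `run()` a second time with the same checkpoint path, on whichever node the job lands, picks up from the last saved step rather than from the step at which the job was stopped.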



> On Mar 18, 2016, at 3:24 AM, Husen R <hus...@gmail.com> wrote:
> 
> Dear all,
> 
> Thanks for the reply and the valuable information.
> 
> I have configured MVAPICH2 following the instructions in the resource 
> provided by Xavier.
> I have also installed FTB (Fault Tolerance Backplane) so that MVAPICH2 can 
> use its process migration feature.
> 
> However, I got the following error message when I tried to run 
> ftb_database_server.
> ----------------------------------------------------------------------
> pro@head-node:/usr/local/sbin$ ftb_database_server &
> [2] 10678
> pro@head-node:/usr/local/sbin$ 
> [FTB_ERROR][/home/pro/ftb-0.6.2/src/manager_lib/network/network_sock/include/ftb_network_sock.h: line 205][hostname:head-node]Cannot find boot-strap server ip address
> ----------------------------------------------------------------------
> The error message is "Cannot find boot-strap server ip address", even though 
> I configured the bootstrap IP address when I installed FTB.
> 
> Does anyone have experience solving this problem when using FTB with Open MPI?
> I need help.
> 
> Regards,
> 
> 
> Husen
> 
> 
> On Fri, Mar 18, 2016 at 12:06 AM, Xavier Besseron <xavier.besse...@uni.lu> wrote:
> On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > Just to clarify: I am not aware of any MPI that will allow you to relocate a
> > process while it is running. You have to checkpoint the job, terminate it,
> > and then restart the entire thing with the desired process on the new node.
> >
> 
> 
> Dear all,
> 
> For your information, MVAPICH2 supports live migration of MPI
> processes, without the need to terminate and restart the whole job.
> 
> All the details are in the MVAPICH2 user guide:
>   - How to configure MVAPICH2 for migration
>     http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-120004.4
>   - How to trigger process migration
>     http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-760006.14.3
> 
> You can also check the paper "High Performance Pipelined Process
> Migration with RDMA":
> http://mvapich.cse.ohio-state.edu/static/media/publications/abstract/ouyangx-2011-ccgrid.pdf
> 
> 
> Best regards,
> 
> Xavier
> 
> 
> 
> >
> > On Mar 16, 2016, at 3:15 AM, Husen R <hus...@gmail.com> wrote:
> >
> > In the case of a general MPI application (not GROMACS), how do I relocate
> > the application from one node to another while it is running?
> > As far as I know, the ompi-restart command restarts an application from a
> > checkpoint file only after the application has terminated (is no longer
> > running).
> >
> > Thanks
> >
> > regards,
> >
> > Husen
> >
> > On Wed, Mar 16, 2016 at 4:29 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
> >>
> >> Just checkpoint-restart the app to relocate. The overhead will be lower
> >> than trying to do it with MPI.
> >>
> >> Jeff
> >>
> >>
> >> On Wednesday, March 16, 2016, Husen R <hus...@gmail.com> wrote:
> >>>
> >>> Hi Jeff,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> After consulting the Gromacs docs, as you suggested, I found that Gromacs
> >>> already supports checkpoint/restart. Thanks for the suggestion.
> >>>
> >>> Previously, I asked about checkpoint/restart in Open MPI because I want
> >>> to checkpoint an MPI application and restart/migrate it while it is running.
> >>> For example, I run an MPI application on nodes A, B, and C in a cluster, and
> >>> I want to migrate the process running on node A to another node, say node C.
> >>> Is there a way to do this with Open MPI? Thanks.
> >>>
> >>> Regards,
> >>>
> >>> Husen
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Mar 16, 2016 at 12:37 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
> >>>>
> >>>> Why do you need OpenMPI to do this? Molecular dynamics trajectories are
> >>>> trivial to checkpoint and restart at the application level. I'm sure
> >>>> Gromacs already supports this. Please consult the Gromacs docs or user
> >>>> support for details.
> >>>>
> >>>> Jeff
> >>>>
> >>>>
> >>>> On Tuesday, March 15, 2016, Husen R <hus...@gmail.com> wrote:
> >>>>>
> >>>>> Dear Open MPI Users,
> >>>>>
> >>>>>
> >>>>> Does the current stable release of Open MPI (v1.10 series) support
> >>>>> fault tolerance features?
> >>>>> According to the Open MPI FAQ, checkpoint/restart support was last
> >>>>> released as part of the v1.6 series.
> >>>>> I just want to confirm this.
> >>>>>
> >>>>> Also, is Open MPI able to checkpoint or restart an MPI
> >>>>> application (such as GROMACS) automatically?
> >>>>> Please, I really need help.
> >>>>> Please, I really need help.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>>
> >>>>> Husen
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Jeff Hammond
> >>>> jeff.scie...@gmail.com
> >>>> http://jeffhammond.github.io/
> >>>>
> >>>> _______________________________________________
> >>>> users mailing list
> >>>> us...@open-mpi.org
> >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>> Link to this post:
> >>>> http://www.open-mpi.org/community/lists/users/2016/03/28705.php
> >>>
> >>>
> >>
> >>
> >
> >
