Dear all,

Thanks for the replies and the valuable information.

I have configured MVAPICH2 following the instructions in the resource
provided by Xavier.
I have also installed FTB (Fault-Tolerant Backplane) so that MVAPICH2
can use its process migration feature.

However, I got the following error message when I tried to run
ftb_database_server:
------------------------------------------------------------------------------------------------------------------------------------------------
pro@head-node:/usr/local/sbin$ ftb_database_server &
[2] 10678
pro@head-node:/usr/local/sbin$
[FTB_ERROR][/home/pro/ftb-0.6.2/src/manager_lib/network/network_sock/include/ftb_network_sock.h: line 205][hostname:head-node]Cannot find boot-strap server ip address
----------------------------------------------------------------------------------------------------------
The error message is "Cannot find boot-strap server ip address", even
though I configured the bootstrap IP address when I installed FTB.

Does anyone have experience solving this problem when using FTB in Open MPI?
I need help.

Regards,


Husen


On Fri, Mar 18, 2016 at 12:06 AM, Xavier Besseron <xavier.besse...@uni.lu>
wrote:

> On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > Just to clarify: I am not aware of any MPI that will allow you to
> > relocate a process while it is running. You have to checkpoint the
> > job, terminate it, and then restart the entire thing with the desired
> > process on the new node.
> >
>
>
> Dear all,
>
> For your information, MVAPICH2 supports live migration of MPI
> processes, without the need to terminate and restart the whole job.
>
> All the details are in the MVAPICH2 user guide:
>   - How to configure MVAPICH2 for migration
>
> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-120004.4
>   - How to trigger process migration
>
> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-760006.14.3
>
> You can also check the paper "High Performance Pipelined Process
> Migration with RDMA"
>
> http://mvapich.cse.ohio-state.edu/static/media/publications/abstract/ouyangx-2011-ccgrid.pdf
>
>
> Best regards,
>
> Xavier
>
>
>
> >
> > On Mar 16, 2016, at 3:15 AM, Husen R <hus...@gmail.com> wrote:
> >
> > In the case of an MPI application (not GROMACS), how do I relocate the
> > application from one node to another while it is running?
> > I'm sorry, but as far as I know the ompi-restart command is used to
> > restart an application from a checkpoint file once the application has
> > already terminated (is no longer running).
> >
> > Thanks
> >
> > regards,
> >
> > Husen
> >
> > On Wed, Mar 16, 2016 at 4:29 PM, Jeff Hammond <jeff.scie...@gmail.com>
> > wrote:
> >>
> >> Just checkpoint-restart the app to relocate. The overhead will be lower
> >> than trying to do it with MPI.
> >>
> >> Jeff
> >>
> >>
> >> On Wednesday, March 16, 2016, Husen R <hus...@gmail.com> wrote:
> >>>
> >>> Hi Jeff,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> After consulting the Gromacs docs, as you suggested, I found that
> >>> Gromacs already supports checkpoint/restart. Thanks for the suggestion.
> >>>
> >>> Previously, I asked about checkpoint/restart in Open MPI because I want
> >>> to checkpoint an MPI application and restart/migrate it while it is
> >>> running. For example, I run an MPI application on nodes A, B, and C in a
> >>> cluster and I want to migrate the process running on node A to another
> >>> node, let's say node C.
> >>> Is there a way to do this with Open MPI? Thanks.
> >>>
> >>> Regards,
> >>>
> >>> Husen
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Mar 16, 2016 at 12:37 PM, Jeff Hammond <jeff.scie...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Why do you need OpenMPI to do this? Molecular dynamics trajectories
> >>>> are trivial to checkpoint and restart at the application level. I'm
> >>>> sure Gromacs already supports this. Please consult the Gromacs docs
> >>>> or user support for details.
> >>>>
> >>>> Jeff
> >>>>
> >>>>
> >>>> On Tuesday, March 15, 2016, Husen R <hus...@gmail.com> wrote:
> >>>>>
> >>>>> Dear Open MPI Users,
> >>>>>
> >>>>>
> >>>>> Does the current stable release of Open MPI (v1.10 series) support
> >>>>> fault tolerance?
> >>>>> According to the Open MPI FAQ, checkpoint/restart
> >>>>> support was last released as part of the v1.6 series.
> >>>>> I just want to make sure about this.
> >>>>>
> >>>>> By the way, is Open MPI able to checkpoint or restart an MPI
> >>>>> application/GROMACS automatically?
> >>>>> Please, I really need help.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>>
> >>>>> Husen
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Jeff Hammond
> >>>> jeff.scie...@gmail.com
> >>>> http://jeffhammond.github.io/
> >>>>
> >>>> _______________________________________________
> >>>> users mailing list
> >>>> us...@open-mpi.org
> >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>> Link to this post:
> >>>> http://www.open-mpi.org/community/lists/users/2016/03/28705.php
> >>>
> >>>
> >>
> >>
> >> --
> >> Jeff Hammond
> >> jeff.scie...@gmail.com
> >> http://jeffhammond.github.io/
> >>
> >
> >
>