Dear all,

Thanks for the reply and the valuable information.
I have configured MVAPICH2 using the instructions in the resource provided by Xavier. I have also installed FTB (Fault-Tolerant Backplane) so that MVAPICH2 has the process migration feature. However, I got the following error message when I tried to run ftb_database_server:

----------------------------------------------------------------------
pro@head-node:/usr/local/sbin$ ftb_database_server &
[2] 10678
pro@head-node:/usr/local/sbin$ [FTB_ERROR][/home/pro/ftb-0.6.2/src/manager_lib/network/network_sock/include/ftb_network_sock.h: line 205][hostname:head-node]Cannot find boot-strap server ip address
----------------------------------------------------------------------

The error message is "Cannot find boot-strap server ip address", even though I configured the bootstrap IP address when I installed FTB.
Does anyone have experience solving this problem when using FTB in Open MPI? I need help.

Regards,

Husen

On Fri, Mar 18, 2016 at 12:06 AM, Xavier Besseron <xavier.besse...@uni.lu> wrote:

> On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > Just to clarify: I am not aware of any MPI that will allow you to relocate a
> > process while it is running. You have to checkpoint the job, terminate it,
> > and then restart the entire thing with the desired process on the new node.
> >
>
> Dear all,
>
> For your information, MVAPICH2 supports live migration of MPI
> processes, without the need to terminate and restart the whole job.
>
> All the details are in the MVAPICH2 user guide:
>   - How to configure MVAPICH2 for migration
>     http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-120004.4
>   - How to trigger process migration
>     http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-760006.14.3
>
> You can also check the paper "High Performance Pipelined Process
> Migration with RDMA"
> http://mvapich.cse.ohio-state.edu/static/media/publications/abstract/ouyangx-2011-ccgrid.pdf
>
> Best regards,
>
> Xavier
>
> > On Mar 16, 2016, at 3:15 AM, Husen R <hus...@gmail.com> wrote:
> >
> > In the case of MPI application (not gromacs), How do I relocate MPI
> > application from one node to another node while it is running ?
> > I'm sorry, as far as I know the ompi-restart command is used to restart
> > application, based on checkpoint file, once the application already
> > terminated (no longer running).
> >
> > Thanks
> >
> > regards,
> >
> > Husen
> >
> > On Wed, Mar 16, 2016 at 4:29 PM, Jeff Hammond <jeff.scie...@gmail.com>
> > wrote:
> >>
> >> Just checkpoint-restart the app to relocate. The overhead will be lower
> >> than trying to do with MPI.
> >>
> >> Jeff
> >>
> >>
> >> On Wednesday, March 16, 2016, Husen R <hus...@gmail.com> wrote:
> >>>
> >>> Hi Jeff,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> After consulting the Gromacs docs, as you suggested, Gromacs already
> >>> supports checkpoint/restart. thanks for the suggestion.
> >>>
> >>> Previously, I asked about checkpoint/restart in Open MPI because I want
> >>> to checkpoint MPI Application and restart/migrate it while it is running.
> >>> For the example, I run MPI application in node A,B and C in a cluster and
> >>> I want to migrate process running in node A to other node, let's say to node
> >>> C.
> >>> is there a way to do this with open MPI ? thanks.
> >>>
> >>> Regards,
> >>>
> >>> Husen
> >>>
> >>>
> >>> On Wed, Mar 16, 2016 at 12:37 PM, Jeff Hammond <jeff.scie...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Why do you need OpenMPI to do this? Molecular dynamics trajectories are
> >>>> trivial to checkpoint and restart at the application level. I'm sure Gromacs
> >>>> already supports this. Please consult the Gromacs docs or user support for
> >>>> details.
> >>>>
> >>>> Jeff
> >>>>
> >>>>
> >>>> On Tuesday, March 15, 2016, Husen R <hus...@gmail.com> wrote:
> >>>>>
> >>>>> Dear Open MPI Users,
> >>>>>
> >>>>>
> >>>>> Does the current stable release of Open MPI (v1.10 series) support
> >>>>> fault tolerant feature ?
> >>>>> I got the information from Open MPI FAQ that The checkpoint/restart
> >>>>> support was last released as part of the v1.6 series.
> >>>>> I just want to make sure about this.
> >>>>>
> >>>>> and by the way, does Open MPI able to checkpoint or restart mpi
> >>>>> application/GROMACS automatically ?
> >>>>> Please, I really need help.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>>
> >>>>> Husen
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Jeff Hammond
> >>>> jeff.scie...@gmail.com
> >>>> http://jeffhammond.github.io/
> >>>>
> >>>> _______________________________________________
> >>>> users mailing list
> >>>> us...@open-mpi.org
> >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>> Link to this post:
> >>>> http://www.open-mpi.org/community/lists/users/2016/03/28705.php
> >>>
> >>>
> >>
> >>
> >> --
> >> Jeff Hammond
> >> jeff.scie...@gmail.com
> >> http://jeffhammond.github.io/
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/users/2016/03/28709.php
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/03/28710.php
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/03/28731.php
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/03/28742.php
>