Re: [OMPI users] Hibernating/Wakeup MPI processes

2010-04-13 Thread Josh Hursey
So what you are looking for is checkpoint/restart support, which you can find some details about at the link below: http://osl.iu.edu/research/ft/ompi-cr/ Additionally, we relatively recently added the ability to checkpoint and 'stop' the application. This generates a usable checkpoint of t

Re: [OMPI users] Don't crash on node failures

2010-04-13 Thread Durga Choudhury
This would be a very welcome new feature for me as well. Two thumbs up from me when it happens. Best regards Durga On Tue, Apr 13, 2010 at 10:28 AM, Ralph Castain wrote: > Not right now, but coming later this year... > > On Apr 13, 2010, at 7:21 AM, Jürgen Kaiser wrote: > >> Hi, >> >> Can I force

Re: [OMPI users] Hibernating/Wakeup MPI processes

2010-04-13 Thread Ralph Castain
I believe that is called "checkpoint/restart" - see the FAQ page on that subject. On Apr 13, 2010, at 7:30 AM, Hoelzlwimmer Andreas - S0810595005 wrote: > Hi, > > I found in the FAQ that it is possible to suspend/resume MPI jobs. Would it > also be possible to Hibernate the jobs (free the memo

Re: [OMPI users] Don't crash on node failures

2010-04-13 Thread Ralph Castain
Not right now, but coming later this year... On Apr 13, 2010, at 7:21 AM, Jürgen Kaiser wrote: > Hi, > > Can I force MPI to not abort the whole job when a node crashes? I would > like to let the remaining MPI-processes perform some action in that case > and then proceed. > > Thanks, > Jürgen >

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Gabriele Fatigati
OK Jeff, I understand. Thanks very much for your help! Regards. 2010/4/13 Jeff Squyres > On Apr 13, 2010, at 9:17 AM, Gabriele Fatigati wrote: > > > My actual configuration is: > > > > btl = ^tcp > > btl_tcp_if_exclude = eth0,ib0,ib1 > > oob_tcp_include = eth1,lo > > > > But is it right?

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Jeff Squyres
On Apr 13, 2010, at 9:17 AM, Gabriele Fatigati wrote: > My actual configuration is: > > btl = ^tcp > btl_tcp_if_exclude = eth0,ib0,ib1 > oob_tcp_include = eth1,lo > > But is it right? I have some doubt.. It depends on what "right" is in your environment. :-) Your default config excludes the B
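In Open MPI's MCA syntax, a leading `^` on a component list means "use every component except these", so `btl = ^tcp` turns the TCP BTL off entirely, and a `btl_tcp_*` setting such as `btl_tcp_if_exclude` then has nothing to act on. Below is a minimal sketch of a hypothetical Python helper (not an Open MPI tool) that reads `name = value` lines like the configuration quoted above and lists the TCP BTL parameters that would be ignored. It assumes one parameter per line and `#` comment lines; it does not model every rule of the real MCA parameter file parser, nor values set via environment variables or the `mpirun` command line.

```python
# Hypothetical helper, not part of Open MPI: flags btl_tcp_* settings that
# cannot take effect because the "btl" selection list rules out tcp.

def parse_mca_conf(text):
    """Parse 'name = value' lines; skip blank lines and lines starting with '#'."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def tcp_btl_enabled(params):
    """Return False if the 'btl' component list excludes the tcp BTL."""
    selection = params.get("btl")
    if selection is None:
        return True  # no 'btl' restriction in this file
    if selection.startswith("^"):
        # '^a,b' means: use every available component except a and b
        excluded = {c.strip() for c in selection[1:].split(",")}
        return "tcp" not in excluded
    # 'a,b' means: use only components a and b
    included = {c.strip() for c in selection.split(",")}
    return "tcp" in included


def ignored_tcp_params(params):
    """List btl_tcp_* parameters that are set while the tcp BTL is disabled."""
    if tcp_btl_enabled(params):
        return []
    return sorted(name for name in params if name.startswith("btl_tcp_"))


if __name__ == "__main__":
    conf = """btl = ^tcp
btl_tcp_if_exclude = eth0,ib0,ib1
oob_tcp_include = eth1,lo
"""
    print(ignored_tcp_params(parse_mca_conf(conf)))
```

With Gabriele's configuration this reports only `btl_tcp_if_exclude`; the `oob_tcp_include` line is left alone because it does not belong to the BTL framework.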

[OMPI users] Hibernating/Wakeup MPI processes

2010-04-13 Thread Hoelzlwimmer Andreas - S0810595005
Hi, I found in the FAQ that it is possible to suspend/resume MPI jobs. Would it also be possible to hibernate the jobs (free the memory, serialize it to the hard drive) and continue/wake them up later, possibly at different locations? Cheers, Andreas

[OMPI users] Don't crash on node failures

2010-04-13 Thread Jürgen Kaiser
Hi, Can I force MPI to not abort the whole job when a node crashes? I would like to let the remaining MPI-processes perform some action in that case and then proceed. Thanks, Jürgen

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Gabriele Fatigati
Yes, that's right! Now I can see the btl_tcp_if_include parameter: MCA btl: parameter "btl_tcp_if_include" (current value: , data source: default value) MCA btl: parameter "btl_tcp_if_exclude" (current value: "eth0,ib0,ib1", data source: file [/cineca/prod/opt/compilers/openmpi/1.3.3/intel--11.1--binary/et
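The "data source" field in this output is what tells you whether a parameter was picked up from a file or is still at its default. Below is a minimal Python sketch that extracts each parameter's framework, current value and data source from human-readable `ompi_info --param` text. It assumes the 1.3.3-style layout quoted in this thread (`MCA <framework>: parameter "<name>" (current value: ..., data source: ...)`), tolerates entries wrapped across lines, and is not an official parser; other Open MPI versions format this output differently. The file path in the usage sample is a placeholder.

```python
import re

# Matches 1.3.3-style entries such as
#   MCA btl: parameter "btl_tcp_if_include" (current value: , data source: default value)
# An empty value is printed without quotes, so the quoted value is optional.
# \s+ between tokens lets an entry span wrapped lines.
LINE_RE = re.compile(
    r'MCA\s+(?P<framework>\w+):\s+parameter\s+"(?P<name>[^"]+)"\s+'
    r'\(current\s+value:\s*(?:"(?P<value>[^"]*)")?,\s+'
    r'data\s+source:\s+(?P<source>[^)]*)\)'
)


def parse_ompi_info(text):
    """Map parameter name -> {'framework', 'value', 'source'}."""
    params = {}
    for m in LINE_RE.finditer(text):
        params[m.group("name")] = {
            "framework": m.group("framework"),
            "value": m.group("value") or "",
            # collapse any line wrapping inside the data source
            "source": " ".join(m.group("source").split()),
        }
    return params


if __name__ == "__main__":
    # The file path below is a placeholder, not a real installation path.
    sample = (
        'MCA btl: parameter "btl_tcp_if_include" (current value: , data source: default value)\n'
        'MCA btl: parameter "btl_tcp_if_exclude" (current value: "eth0,ib0,ib1", '
        'data source: file [/path/to/openmpi-mca-params.conf])\n'
    )
    for name, info in parse_ompi_info(sample).items():
        print(name, repr(info["value"]), info["source"])
```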

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Jeff Squyres
On Apr 13, 2010, at 9:03 AM, Gabriele Fatigati wrote: > ompi_info --param btl tcp Ah ha... this is revealing: > MCA btl: parameter "btl" (current value: "^tcp", data > source: file > > [/cineca/prod/opt/compilers/openmpi/1.3.3/intel--11.1--binary/etc/

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Gabriele Fatigati
Ok, this is my output: ompi_info --param btl tcp MCA btl: parameter "btl_base_verbose" (current value: "0", data source: default value) Verbosity level of the BTL framework MCA btl: parameter "btl" (current value: "^tcp", data source: file [/cineca/pro

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Jeff Squyres
Oops! I neglected to see that you built statically -- hence, all the OMPI plugins got slurped up into their respective libraries (e.g., libmpi.a). If you run ompi_info --param btl tcp, do you see anything at all? If not, that would indicate that the TCP BTL wasn't built. If so, can you send

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Gabriele Fatigati
Hmm, my OpenMPI installation doesn't have this library. How can I install it? Is it very important, or can I use OpenMPI without this module? 2010/4/13 Jeff Squyres > Check in your installation directory under $lib/openmpi -- see if > mca_btl_tcp.* is there. There should be a .so file (and pro

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Jeff Squyres
Check in your installation directory under $lib/openmpi -- see if mca_btl_tcp.* is there. There should be a .so file (and probably a .la file as well). If the .so is not there, then the BTL TCP plugin is not installed (which would be darn weird, to be honest...). On Apr 13, 2010, at 8:23 AM,
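Jeff's check can be scripted, with one caveat raised elsewhere in this thread: in a static build the plugins are linked into the libraries (e.g., libmpi.a), so a missing `mca_btl_tcp.so` does not by itself mean the TCP BTL is absent. Below is a minimal Python sketch of that diagnosis. The function names are hypothetical, and treating `--disable-dlopen` or `--enable-static` as "plugins built into the libraries" is a heuristic drawn from this thread, not an exhaustive rule of Open MPI's configure script.

```python
from pathlib import Path
import tempfile

# Heuristic: configure flags that, per this thread, lead to plugins being
# linked into the libraries instead of installed as separate mca_*.so files.
BUILTIN_PLUGIN_FLAGS = {"--disable-dlopen", "--enable-static"}


def tcp_btl_files(libdir):
    """List mca_btl_tcp.* files in <libdir>/openmpi (empty if none or no dir)."""
    plugin_dir = Path(libdir) / "openmpi"
    if not plugin_dir.is_dir():
        return []
    return sorted(p.name for p in plugin_dir.glob("mca_btl_tcp.*"))


def expects_separate_plugins(configure_args):
    """False when the configure flags suggest plugins were built into the libraries."""
    return not (set(configure_args.split()) & BUILTIN_PLUGIN_FLAGS)


def diagnose(libdir, configure_args):
    """Combine the file check with the configure-flag heuristic."""
    files = tcp_btl_files(libdir)
    if any(name.endswith(".so") for name in files):
        return "TCP BTL plugin installed as a DSO"
    if not expects_separate_plugins(configure_args):
        return "no DSO expected: plugins are likely built into the libraries"
    return "TCP BTL DSO missing"


if __name__ == "__main__":
    # Demo against a throwaway directory rather than a real installation.
    with tempfile.TemporaryDirectory() as libdir:
        (Path(libdir) / "openmpi").mkdir()
        print(diagnose(libdir, "--disable-ipv6 --disable-dlopen --enable-static --with-openib"))
```

To check a real installation, pass its library directory (the `$lib` Jeff refers to) as `libdir`. In that case, `ompi_info --param btl tcp` remains the more direct test of whether the component was built.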

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Gabriele Fatigati
Hi Jeff, thanks for your reply! If I run your command, the response is empty. Does this mean I haven't installed the TCP BTL plugin? How can I check? These are my build flags: --disable-ipv6 --disable-dlopen --enable-static --with-openib --with-memory-manager=none --with-mpi-f90-size=medium --with

Re: [OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Jeff Squyres
No, that param is still there: $ ompi_info --param btl tcp --parsable | grep clude: mca:btl:tcp:param:btl_tcp_if_include:value: mca:btl:tcp:param:btl_tcp_if_include:data_source:default value mca:btl:tcp:param:btl_tcp_if_include:status:writable mca:btl:tcp:param:btl_tcp_if_include:help:Comma-delimi
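The `--parsable` output shown here is easier to process by machine than the default format: each line has the form `mca:<framework>:<component>:param:<name>:<attribute>:<value>`. Below is a minimal Python sketch that groups those lines by parameter name. It assumes the field layout visible in Jeff's output and is not an official parser. It splits at most six times so that a value containing colons (help text, for instance) stays intact.

```python
def parse_parsable(text):
    """Group 'mca:<fw>:<comp>:param:<name>:<attr>:<value>' lines by parameter name."""
    params = {}
    for line in text.splitlines():
        # maxsplit=6 keeps any ':' inside the value (e.g. help text) intact
        parts = line.split(":", 6)
        if len(parts) != 7 or parts[0] != "mca" or parts[3] != "param":
            continue  # not a parameter line
        _, framework, component, _, name, attr, value = parts
        entry = params.setdefault(name, {"framework": framework, "component": component})
        entry[attr] = value
    return params


if __name__ == "__main__":
    sample = """mca:btl:tcp:param:btl_tcp_if_include:value:
mca:btl:tcp:param:btl_tcp_if_include:data_source:default value
mca:btl:tcp:param:btl_tcp_if_include:status:writable
"""
    info = parse_parsable(sample)["btl_tcp_if_include"]
    print(info["data_source"], info["status"], repr(info["value"]))
```

An empty `value` together with a `data_source` of `default value`, as in Jeff's output, means the parameter exists and is simply still at its (empty) default. It has not been removed or renamed.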

[OMPI users] btl_tcp_if_exclude param

2010-04-13 Thread Gabriele Fatigati
Dear OpenMPI users and developers, I'm trying OpenMPI 1.3.3 and I've noticed that btl_tcp_if_exclude is not supported in the new version: the output of this command: ompi_info --param all all | grep btl_tcp_if_exclude is empty. Has that parameter been renamed? Thanks in advance -- Ing. Gabriel