[OMPI users] Segmentation fault with SLURM and non-local nodes

2011-01-27 Thread Michael Curtis
Hi, I'm not sure whether this problem is with SLURM or OpenMPI, but the stack traces (below) point to an issue within OpenMPI. Whenever I try to launch an MPI job within SLURM, mpirun immediately segmentation faults -- but only if the machine that SLURM allocated to MPI is different to the one

[OMPI users] Argument parsing issue

2011-01-27 Thread Gabriele Fatigati
Dear OpenMPI users and developers, I'm using OpenMPI 1.4.3 and the Intel compiler. My simple application requires 3 command-line arguments to work. If I use the following command: `mpirun -np 2 ./a.out a b "c d"` it works well. Debugging my application with TotalView: `mpirun -np 2 --debug ./a.out a b "c d"` Ar

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Reuti
Hi, On 27.01.2011 at 09:48, Gabriele Fatigati wrote: > Dear OpenMPI users and developers, > > i'm using OpenMPI 1.4.3 and Intel compiler. My simple application require 3 > line arguments to work. If i use the follow command: > > mpirun -np 2 ./a.out a b "c d" > > It works well. > > Debuggin

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Gabriele Fatigati
Mm, doing as you suggest, the output is four separate arguments, a, b, "c and d", and not a, b, "c d".
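Reuti's exact suggestion is cut off in this preview. A common way to see how arguments actually arrive is to launch a tiny program that prints each argument on its own line in place of ./a.out. The Python sketch below is a hypothetical stand-in for such a test program (the function name `format_args` is an assumption, not something from the thread); it brackets each argument so embedded spaces and stray quote characters are visible.

```python
import sys


def format_args(argv):
    """Return one line per argument, bracketed so spaces and quotes stay visible."""
    return [f"[{arg}]" for arg in argv]


if __name__ == "__main__":
    # sys.argv[0] is the program name; print only the real arguments.
    for line in format_args(sys.argv[1:]):
        print(line)
```

Launched with the arguments `a b "c d"`, each rank should print three lines, `[a]`, `[b]` and `[c d]`; the behavior Gabriele reports would instead show four lines ending in `["c]` and `[d"]`.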

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Reuti
On 27.01.2011 at 10:32, Gabriele Fatigati wrote: > Mm, > > doing as you suggest the output is: > > a > b > "c > d" Whoa, your application runs fine without the debugger, so I don't think that it's a problem with `mpirun` per se. The same happens with single quotes inside double quot

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Gabriele Fatigati
The problem is how mpirun scans its input parameters when TotalView is invoked. Something goes wrong somewhere in between :(

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Ralph Castain
The problem is that mpirun re-executes itself as a command of the form "totalview mpirun ", and the quotes are lost in the process. Just start your debugged job with "totalview mpirun ..." and it should work fine.

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Gabriele Fatigati
The command "totalview mpirun..." starts debugging mpirun, not my executable :( The code shown is OpenMPI's main.c.

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Jeff Squyres
I found the code in OMPI that is dropping the quoting. Specifically: it *is* OMPI that is dropping your quoting / splitting "foo bar" into 2 arguments when re-execing totalview. Let me see if I can gin up a patch...
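Jeff's description (the argument "foo bar" is split into two when mpirun re-execs itself under TotalView) matches a familiar failure pattern: an argument vector is flattened into one string and later split again on whitespace. The Python sketch below illustrates that general pattern and one way to avoid it, shell-quoting each argument with `shlex` before joining. It is not Open MPI's code, and the helper names `naive_rebuild` and `quoted_rebuild` are hypothetical.

```python
import shlex


def naive_rebuild(argv):
    # Flatten to one string, then split on whitespace: "c d" becomes "c", "d".
    return " ".join(argv).split()


def quoted_rebuild(argv):
    # shlex.join (Python 3.8+) shell-quotes each argument, so shlex.split
    # recovers the original argument vector exactly.
    return shlex.split(shlex.join(argv))


if __name__ == "__main__":
    argv = ["./a.out", "a", "b", "c d"]  # the arguments from Gabriele's command
    print(naive_rebuild(argv))   # ['./a.out', 'a', 'b', 'c', 'd']
    print(quoted_rebuild(argv))  # ['./a.out', 'a', 'b', 'c d']
```

Passing the argument vector straight to an exec-family call (for example `execvp` in C) avoids the string round trip entirely; per-argument quoting only matters when the command has to travel as a single string.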

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Gabriele Fatigati
OK Jeff, tell me where the code is and I'll try to fix it. Thanks a lot.

[OMPI users] allow job to survive process death

2011-01-27 Thread Kirk Stako
Hi, I was wondering what support Open MPI has for allowing a job to continue running when one or more of its processes die unexpectedly. Is there a special mpirun flag for this? Any other ways? It seems obvious that collectives will fail once a process dies, but would it be possible to create

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Joshua Hursey
The current version of Open MPI does not support continued operation of an MPI application after process failure within a job. If a process dies, so will the MPI job. Note that this is true of many MPI implementations out there at the moment. At Oak Ridge National Laboratory, we are working on

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Reuti
On 27.01.2011 at 15:23, Joshua Hursey wrote: > The current version of Open MPI does not support continued operation of an > MPI application after process failure within a job. If a process dies, so > will the MPI job. Note that this is true of many MPI implementations out > there at the moment

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Ralph Castain
On Jan 27, 2011, at 7:47 AM, Reuti wrote: > On 27.01.2011 at 15:23, Joshua Hursey wrote: > >> The current version of Open MPI does not support continued operation of an >> MPI application after process failure within a job. If a process dies, so >> will the MPI job. Note that this is true of

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Joshua Hursey
On Jan 27, 2011, at 9:47 AM, Reuti wrote: > On 27.01.2011 at 15:23, Joshua Hursey wrote: > >> The current version of Open MPI does not support continued operation of an >> MPI application after process failure within a job. If a process dies, so >> will the MPI job. Note that this is true of

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Reuti
On 27.01.2011 at 16:10, Joshua Hursey wrote: > > On Jan 27, 2011, at 9:47 AM, Reuti wrote: > >> On 27.01.2011 at 15:23, Joshua Hursey wrote: >> >>> The current version of Open MPI does not support continued operation of an >>> MPI application after process failure within a job. If a process

Re: [OMPI users] Argument parsing issue

2011-01-27 Thread Jeff Squyres
I did my patch against the development trunk; could you apply the attached patch to a trunk nightly tarball and see if that works for you? If it does, I can provide patches for v1.4 and v1.5 (the code moved a bit between these 3 versions, so I would need to adapt the patches a little).

[OMPI users] Experiences with Mellanox Connect-X HCA?

2011-01-27 Thread Kevin Buckley
Just touting around for any experiences with the following combination (if any are already out there?) ahead of fully spec-ing a required software stack: Mellanox Connect-X HCAs talking through a Voltaire ISR4036 IB QDR switch; RHEL (yep, not the usual NetBSD!); OFED (built with