[OMPI users] Segfault on any MPI communication on head node

2011-09-23 Thread Vassenkov, Phillip
Hey all, I've been racking my brains over this for several days and was hoping anyone could enlighten me. I'll describe only the relevant parts of the network/computer systems. There is one head node and a multitude of regular nodes. The regular nodes are all identical to each other. If I run an

Re: [OMPI users] Fault Tolerant with openib

2011-09-23 Thread Ralph Castain
On Sep 23, 2011, at 1:21 PM, Guilherme V wrote:
> I'm using version 1.4.3 and I forgot to tell that I have made a change in the orterun.c line 792:
>
> if (ORTE_JOB_STATE_TERMINATED != exit_state) {
>     exit(0); /* patch*/
>
I don't see how that change can keep your job

Re: [OMPI users] ompi-checkpoint problem on shared storage

2011-09-23 Thread Josh Hursey
It sounds like there is a race happening in the shutdown of the processes. I wonder if the app is shutting down in a way that mpirun does not quite like. I have not tested the C/R functionality in the 1.4 series in a long time. Can you give it a try with the 1.5 series, and see if there is any var

Re: [OMPI users] Fault Tolerant with openib

2011-09-23 Thread Guilherme V
I'm using version 1.4.3 and I forgot to tell that I have made a change in the orterun.c line 792:

if (ORTE_JOB_STATE_TERMINATED != exit_state) {
    exit(0); /* patch*/

Regards

> What version of OMPI are you using? The job should terminate in either case - what did you do to

[OMPI users] ompi-checkpoint problem on shared storage

2011-09-23 Thread Dave Schulz
Hi Everyone. I've been trying to figure out an issue with ompi-checkpoint/blcr. The symptoms seem to be related to which filesystem the snapc_base_global_snapshot_dir is located on. I wrote a simple MPI program where rank 0 sends to 1, 1 to 2, etc., then the highest rank sends to 0. Then it waits
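The test program described above passes a message around a ring of ranks. A minimal sketch of the neighbour arithmetic, assuming ranks 0 to size-1 in MPI_COMM_WORLD and size > 1; the helper names ring_next and ring_prev are hypothetical, and the MPI calls appear only in a comment so the block compiles without an MPI installation:

```c
/* Hypothetical helpers for a ring of `size` ranks (0 .. size-1):
 * each rank sends to the next one and the highest rank wraps to 0. */

/* Destination rank for `rank`. */
int ring_next(int rank, int size)
{
    return (rank + 1) % size;
}

/* Source rank for `rank`; adding `size` keeps the value non-negative
 * when rank is 0, since % can yield a negative result in C. */
int ring_prev(int rank, int size)
{
    return (rank - 1 + size) % size;
}

/* In an MPI program (sketch only, not compiled here):
 *   int token = 0;
 *   if (rank == 0) {
 *       MPI_Send(&token, 1, MPI_INT, ring_next(rank, size), 0, MPI_COMM_WORLD);
 *       MPI_Recv(&token, 1, MPI_INT, ring_prev(rank, size), 0, MPI_COMM_WORLD,
 *                MPI_STATUS_IGNORE);
 *   } else {
 *       MPI_Recv(&token, 1, MPI_INT, ring_prev(rank, size), 0, MPI_COMM_WORLD,
 *                MPI_STATUS_IGNORE);
 *       MPI_Send(&token, 1, MPI_INT, ring_next(rank, size), 0, MPI_COMM_WORLD);
 *   }
 * Rank 0 sends first and every other rank receives first, so the
 * blocking calls cannot all end up waiting on each other. */
```

With four ranks, rank 3 sends to rank 0 and rank 0 receives from rank 3, which closes the ring.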

Re: [OMPI users] Fault Tolerant with openib

2011-09-23 Thread Ralph Castain
What version of OMPI are you using? The job should terminate in either case - what did you do to keep it running after node failure with tcp?

On Sep 23, 2011, at 12:34 PM, Guilherme V wrote:
> Hi,
> I want to know if anybody is having problems with fault tolerant job using infiniband. When I

[OMPI users] Fault Tolerant with openib

2011-09-23 Thread Guilherme V
Hi, I want to know if anybody is having problems with fault-tolerant jobs using InfiniBand. When I run my job over TCP and something happens to one node, my job keeps running, but if I switch my job to InfiniBand and something happens to the InfiniBand link (e.g., cable problems), my job fails. Anybody

Re: [OMPI users] openmpi -cc= option

2011-09-23 Thread Waclaw Kusnierczyk
On 09/23/2011 09:48 AM, Jeff Squyres wrote:
(...)
> However, we ultimately discarded it when someone showed a real-world code that used *multiple* wrapper compilers (i.e., one wrapper compiler invoked another, which, in turn, invoked another, and then finally invoked the real/underlying

Re: [OMPI users] openmpi -cc= option

2011-09-23 Thread Jeff Squyres
On Sep 23, 2011, at 10:07 AM, Waclaw Kusnierczyk wrote:
> it's not unusual to use double-hyphen ('--') to separate options intended for the wrapper from the options intended for the wrapped. so you could have
>
>    wrapper --foo -- --foo
>
> with the first --foo interpreted by the wrapper
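The double-hyphen convention quoted above can be sketched as a small helper. This is hypothetical code, not taken from Open MPI or any other wrapper; it only locates the separator so the caller can split argv into wrapper options and pass-through options:

```c
#include <string.h>

/* Hypothetical helper for the "--" convention: return the index of the
 * first "--" in argv. Arguments before it (argv[1] .. argv[i-1]) belong to
 * the wrapper; arguments after it (argv[i+1] .. argv[argc-1]) are passed
 * unchanged to the wrapped program. Returns argc if there is no "--". */
int find_separator(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {   /* skip argv[0], the program name */
        if (strcmp(argv[i], "--") == 0)
            return i;
    }
    return argc;
}
```

For `wrapper --foo -- --foo` (argc 4) it returns 2, so argv[1] is the wrapper's `--foo` and argv[3] is forwarded. Only the first `--` is treated as the separator, so a later `--` reaches the wrapped program intact.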

Re: [OMPI users] openmpi -cc= option

2011-09-23 Thread Waclaw Kusnierczyk
On 09/23/2011 06:40 AM, Jeff Squyres wrote:
(...)
> Unless there is an effort undertaken to standardize wrapper compiler flags, this is not going to happen. Indeed, as I mentioned in a prior email, some MPI implementations do not have wrapper compilers at all. This makes standardizatio

Re: [OMPI users] openmpi -cc= option

2011-09-23 Thread Jeff Squyres
On Sep 23, 2011, at 6:52 AM, Uday Kumar Reddy B wrote:
> But that's not really the point - to re-install MPI from sources! One would like to choose between compilers depending on what's on the system, and also switch between them to experiment. And if I'm packaging a software that makes use

Re: [OMPI users] openmpi -cc= option

2011-09-23 Thread Jeff Squyres
On Sep 23, 2011, at 6:59 AM, Uday Kumar Reddy B wrote:
> MVAPICH, for example, supports the same set of options as MPICH (-help output is identical); so it would be good if you can too. I don't know if any other MPIs follow it as well.

That's because MVAPICH is a fork of MPICH.

-- 
Jeff Squyres

Re: [OMPI users] openmpi -cc= option

2011-09-23 Thread Uday Kumar Reddy B
On Fri, Sep 23, 2011 at 3:33 PM, Jeff Squyres wrote:
> On Sep 23, 2011, at 2:04 AM, Uday Kumar Reddy B wrote:
>
>> Okay. BTW, mpicc only has 7 cmdline options, and you probably already support some of them (-show), and they are all provided for good reason.
>
> I assume you mean that *MPICH'

Re: [OMPI users] openmpi -cc= option

2011-09-23 Thread Uday Kumar Reddy B
On Fri, Sep 23, 2011 at 2:39 AM, Gus Correa wrote:
> Jeff Squyres wrote:
>>
>> On Sep 22, 2011, at 4:17 PM, Uday Kumar Reddy B wrote:
>>
>>> Yes, but I can't find anything about -cc in openmpi's mpicc man page. So, -cc should either not be supported or work as per mpich's mpicc if you are
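MPICH's mpicc accepts `-cc=<compiler>` to choose the underlying C compiler, which is the behaviour being asked about here. A hypothetical sketch of how a wrapper could consume that option before forwarding the remaining arguments; this is not MPICH's or Open MPI's actual implementation, and the "last `-cc=` wins" rule and the `fallback` parameter are assumptions of the sketch:

```c
#include <string.h>

/* Hypothetical sketch: remove every "-cc=<compiler>" option from argv in
 * place and return the compiler named by the last one, or `fallback` if
 * there is none. On return *argc counts only the arguments to forward to
 * the underlying compiler, and argv stays NULL-terminated for execvp().
 * Assumes argv has a slot at argv[*argc], as main()'s argv does. */
const char *take_cc_option(int *argc, char *argv[], const char *fallback)
{
    static const char prefix[] = "-cc=";
    const size_t plen = sizeof prefix - 1;   /* length without the '\0' */
    const char *chosen = fallback;
    int out = 1;                              /* argv[0] is kept as-is */

    if (*argc < 1)
        return fallback;

    for (int i = 1; i < *argc; i++) {
        if (strncmp(argv[i], prefix, plen) == 0)
            chosen = argv[i] + plen;          /* text after "-cc=" */
        else
            argv[out++] = argv[i];            /* compact the rest */
    }
    argv[out] = NULL;
    *argc = out;
    return chosen;
}
```

Removing the option, rather than leaving it in argv, matters because the underlying compiler would most likely reject `-cc=...` as an unknown flag.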

Re: [OMPI users] openmpi -cc= option

2011-09-23 Thread Jeff Squyres
On Sep 23, 2011, at 2:04 AM, Uday Kumar Reddy B wrote:
> Okay. BTW, mpicc only has 7 cmdline options, and you probably already support some of them (-show), and they are all provided for good reason.

I assume you mean that *MPICH's* mpicc has 7 command line options. Open MPI's mpicc only
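The `-show` option mentioned above prints the command the wrapper would run instead of running it; Open MPI's wrappers offer the same through `--showme`. A hypothetical sketch of assembling that command line, with a simplified argument order (user arguments first, then the wrapper's own flags); the function name and parameters are illustrative only:

```c
#include <stdio.h>

/* Hypothetical sketch of a "-show" / "--showme" style option: write the
 * underlying compiler, the user's arguments and the wrapper's own flags
 * into `buf` as one space-separated command line, without executing it.
 * Returns 0 on success, or -1 if `buf` is too small. */
int build_command(char *buf, size_t bufsize, const char *compiler,
                  const char *const user_args[], int nargs,
                  const char *const wrapper_flags[], int nflags)
{
    int n = snprintf(buf, bufsize, "%s", compiler);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    size_t used = (size_t)n;

    for (int i = 0; i < nargs + nflags; i++) {
        /* User arguments come first, then the wrapper's flags (simplified). */
        const char *word = (i < nargs) ? user_args[i] : wrapper_flags[i - nargs];
        n = snprintf(buf + used, bufsize - used, " %s", word);
        if (n < 0 || (size_t)n >= bufsize - used)
            return -1;                 /* output was truncated */
        used += (size_t)n;
    }
    return 0;
}
```

A wrapper would print the result with `puts(buf)` for `-show`, and otherwise hand the same words to `execvp`.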

[OMPI users] problems with Intel 12.x compilers and OpenMPI (1.4.3)

2011-09-23 Thread Paul Kapinos
Hi Open MPI folks, we see some quite strange effects with our installations of Open MPI 1.4.3 with the Intel 12.x compilers, which puzzle us: different programs reproducibly deadlock or die with errors like the ones listed below. Some of the errors look like programming issues at first

Re: [OMPI users] openmpi -cc= option

2011-09-23 Thread Uday Kumar Reddy B
On Fri, Sep 23, 2011 at 2:55 AM, Jeff Squyres wrote:
> On Sep 22, 2011, at 4:39 PM, Uday Kumar Reddy B wrote:
>
>>> More specifically: how is mpicc supposed to know that any given option was intended for mpicc and not the underlying compiler, particularly the ones that it doesn't support