Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-24 Thread Gilles Gouaillardet
Daniel, thanks for the logs. Another workaround is to mpirun --mca coll ^hcoll ... I was able to reproduce the issue, and it surprisingly occurs only if the coll_ml module is loaded *before* the hcoll module. /* this is not the case on my system, so I had to hack my mca_base_component_path [...]
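
The workaround above excludes the hcoll component from the coll framework at launch time; the ^ prefix tells the MCA selection logic to skip the listed component rather than require it. A minimal sketch, assuming a hello-world binary named ./hello (hypothetical name):

    mpirun --mca coll ^hcoll -np 4 ./hello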

Re: [OMPI users] Error while launching Jobs in LSF with OpenMPI

2015-06-24 Thread Ralph Castain
You probably should update to OMPI 1.8.6, as we spent some time in the 1.8 series refreshing the LSF support. On Wed, Jun 24, 2015 at 3:04 PM, Rahul Pisharody wrote: > Hello all, > > I'm trying to launch a job with OpenMPI using the LSF Scheduler. > However, when I execute the job, I get the following [...]

[OMPI users] Error while launching Jobs in LSF with OpenMPI

2015-06-24 Thread Rahul Pisharody
Hello all, I'm trying to launch a job with OpenMPI using the LSF Scheduler. However, when I execute the job, I get the following error: ORTE_ERROR_LOG: The specified application failed to start in file plm_lsf_module.c at line 305; lsb_launch failed: 0. I'm using OpenMPI 1.6.4. The LSF version [...]

[OMPI users] vader/sm not being picked up

2015-06-24 Thread Dave Turner
Running OpenMPI 1.8.4, one application running on 16 cores of a single node takes over an hour, compared to just 7 minutes for MPICH. If I use --mca btl vader,sm,self it runs in the same 7 minutes as MPICH. If I throw in the tcp and openib btl's it also runs quickly, so it seems to just not be [...]
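
For context, the two runs being compared look roughly like this (a sketch; ./app is a stand-in name, and the core count comes from the report above):

    # default BTL selection: over an hour in the report above
    mpirun -np 16 ./app

    # restrict the byte transfer layers to shared memory and self: ~7 minutes
    mpirun --mca btl vader,sm,self -np 16 ./app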

[OMPI users] Update to Open MPI version number scheme

2015-06-24 Thread Jeff Squyres (jsquyres)
Greetings Open MPI users and system administrators. In response to user feedback, Open MPI is changing how its releases will be numbered. In short, Open MPI will no longer be released using an "odd/even" cadence corresponding to "feature development" and "super stable" releases. Instead, each [...]

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-24 Thread Ralph Castain
I think trying with --mca btl ^sm makes a lot of sense and may solve the problem. I also noted that we are having trouble with the topology of several of the nodes - seeing only one socket, non-HT, where you say we should see two sockets and HT enabled. In those cases, the locality is "unknown" [...]
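
A sketch of the suggested run, assuming the same job as in the thread (binary name hypothetical):

    mpirun --mca btl ^sm -np 16 ./app

Excluding the sm BTL forces on-node traffic onto another available transport, which helps isolate whether the shared-memory code path is the crash site.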

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-24 Thread Gilles Gouaillardet
Bill, were you able to get a core file and analyze the stack with gdb? I suspect the error occurs in mca_btl_sm_add_procs, but this is just my best guess. If this is correct, can you check the value of mca_btl_sm_component.num_smp_procs? As a workaround, can you try mpirun --mca btl ^sm ... [...]
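
A typical core-file workflow for the check Gilles requests (a sketch; the binary name and core file name are hypothetical, and core dumps must be enabled in the shell that launches the job):

    ulimit -c unlimited        # allow core dumps before reproducing
    mpirun -np 16 ./app        # reproduce the crash
    gdb ./app core             # open the resulting core file
    (gdb) bt                   # does the stack end in mca_btl_sm_add_procs?
    (gdb) print mca_btl_sm_component.num_smp_procs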

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-24 Thread Lane, William
Gilles, All the blades only have two-core Xeons (without hyperthreading) populating both their sockets. All the x3550 nodes have hyperthreading-capable Xeons and Sandy Bridge server CPUs. It's possible hyperthreading has been disabled on some of these nodes, though. The 3-0-n nodes are all IBM x [...]
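
One way to confirm per node whether hyperthreading is actually enabled (a sketch using lscpu from util-linux):

    lscpu | egrep 'Thread|Core|Socket'

A "Thread(s) per core: 1" line on a hyperthreading-capable CPU indicates HT has been disabled in firmware.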

Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-24 Thread Daniel Letai
Gilles, Attached are the two output logs. Thanks, Daniel On 06/22/2015 08:08 AM, Gilles Gouaillardet wrote: Daniel, I double checked this and I cannot make any sense of these logs. If coll_ml_priority is zero, then I do not see any way ml_coll_hier_barrier_setup can be invoked. Could you please [...]
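
To confirm what priority the coll_ml component actually reports on a given install, ompi_info can dump the MCA parameters (a sketch):

    ompi_info --all | grep coll_ml_priority

The component can also be excluded outright for a run with mpirun --mca coll ^ml ..., matching the disable-coll-ml workaround in this thread's subject.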