Re: [OMPI users] Problem getting job to start

2015-06-23 Thread Jeff Squyres (jsquyres)
Specifically, it means that Open MPI could not find the "orted" executable on some nodes ("orted" is the Open MPI helper daemon). Hence, your Open MPI install is either not in your PATH / LD_LIBRARY_PATH on those nodes, or, as Gilles mentioned, Open MPI is not installed on those nodes. Check o
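
A quick way to check this, sketched here with a placeholder hostname (node01) and assuming passwordless ssh, is to run a non-interactive remote shell and see whether orted resolves:

    # A non-interactive ssh gets the same environment that mpirun's
    # launcher sees; if this prints nothing, orted is not in the
    # default PATH on that node.
    ssh node01 which orted
    ssh node01 'echo $PATH'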

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-23 Thread Ralph Castain
Wow - that is one sick puppy! I see that some nodes are reporting not-bound for their procs, and the rest are binding to socket (as they should). Some of your nodes clearly do not have hyperthreads enabled (or only have single-thread cores on them), and have 2 cores/socket. Other nodes have 8 core
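
One way to surface exactly what each node reports, sketched with a hypothetical hostfile, is mpirun's --report-bindings option, which prints every process's binding to stderr:

    # Bound ranks print something like "[B/././.]" per socket; lines
    # reporting not-bound identify the nodes in question. hostname is
    # used as a cheap payload so the run finishes immediately.
    mpirun --hostfile myhosts -np 16 --report-bindings --bind-to socket hostname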

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-23 Thread Lane, William
Gilles, The nodes do not all have the same configuration. There are probably 6 different hardware configurations (as to memory, number of sockets populated, and types of CPU). Some of the systems are older dual-core Xeons (5160 and L5240 CPUs) installed in a blade chassis (some of these blades hav
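
Since the 1.8 series discovers topology through hwloc, one way to confirm what each hardware configuration actually exposes (node names below are placeholders) is to dump the topology per node:

    # lstopo-no-graphics ships with hwloc and prints the socket/core/
    # hardware-thread layout exactly as Open MPI will see it.
    for n in node01 node02; do
        echo "== $n =="
        ssh $n lstopo-no-graphics
    done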

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-23 Thread Lane, William
Ralph, There is something funny going on: the traces from the runs w/the debug build aren't showing any differences from what I got earlier. However, I did do a run w/the --bind-to core switch and was surprised to see that hyperthreading cores were sometimes being used. Here are the traces that I ha
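
For what it's worth, a minimal way to reproduce that observation (slot count and binary are placeholders) is to pair --bind-to core with --report-bindings and read the hwt fields in the output:

    # With --bind-to core each rank is bound to all hardware threads of
    # one core; a rank whose binding shows only hwt 1 of a core would
    # suggest hyperthreads are being counted as separate slots.
    mpirun -np 8 --bind-to core --report-bindings ./a.out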

Re: [OMPI users] Problem getting job to start

2015-06-23 Thread Gilles Gouaillardet
Jeff, it sounds like Open MPI is not available on some nodes! Another possibility is that it is installed but in another directory, or maybe it is not in your path and you did not configure with --enable-mpirun-prefix-by-default. Cheers, Gilles On Wednesday, June 24, 2015, Jeff Layton wrote: > Go
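
For reference, the configure option Gilles mentions bakes the install prefix into mpirun, so remote orted daemons are found even when the shell startup files set no PATH. A minimal sketch, with an example prefix:

    # Equivalent to passing --prefix /opt/openmpi-1.8.6 on every mpirun
    # invocation; remote nodes then locate orted without PATH or
    # LD_LIBRARY_PATH changes.
    ./configure --prefix=/opt/openmpi-1.8.6 --enable-mpirun-prefix-by-default
    make -j4 && make install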

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-23 Thread Ralph Castain
You shouldn't need any special flags for mpicc or mpirun to replicate the problem. This will just let us see the line numbers associated with the crash so we can narrow down the problem. Once we get that, we may need to rerun with specific params to narrow it down further. BTW: when you get the ba
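
Once the debug build is in place, the usual way to pull file-and-line information out of a crash (binary and core-file names here are placeholders) is:

    # The --enable-debug build keeps symbols, so gdb can map the crash
    # address back to source lines inside the Open MPI libraries.
    gdb ./mpi_app core.12345
    (gdb) bt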

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-23 Thread Lane, William
Ralph, I've had OpenMPI 1.8.6 installed on our cluster w/the --enable-debug option. Here's what I think are the relevant flags returned from ompi_info:

    openMPI 1.8.6 build info
    Fort MPI_SIZEOF: no
    C profiling: yes
    C++ profiling: yes
    Fort mpif.h profiling: yes
    Fort use mpi profiling: y
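
A quicker cross-check that the debug build is the one actually being picked up (the hostname and grep pattern are just suggestions):

    # "Internal debug support: yes" confirms --enable-debug took effect;
    # running the same check over ssh verifies the copy each node resolves.
    ompi_info | grep -i debug
    ssh node01 'ompi_info | grep -i debug'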

[OMPI users] Problem getting job to start

2015-06-23 Thread Jeff Layton
Good afternoon, sports fans! I'm trying to run the ft code of NPB, class D, over 128 processors. I built the code with gfortran 4.4.7 (CentOS 6 platform) and Open MPI 1.8.1. I'm using openlava as the resource manager. The error output is the following:

    [ec2-user@ip-172-31-42-106 bin]$ more runit
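
For context, an NPB run at that scale typically looks like the sketch below; the binary name follows the standard NPB convention (ft.D.128 for class D built for 128 ranks), and the hostfile is hypothetical since openlava normally supplies the host list itself:

    # 128 ranks of the class-D FT benchmark; under a resource manager
    # the host list usually comes from the scheduler's allocation
    # rather than an explicit --hostfile.
    mpirun -np 128 --hostfile hosts ./ft.D.128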