I'm having problems running OpenMPI jobs
(using a hostfile) on an HPC cluster running
ROCKS on CentOS 6.3. I'm running OpenMPI
outside of Sun Grid Engine (i.e. the jobs are not
submitted to SGE). The program being run is a LAPACK
benchmark. The command line I'm
using to run the jobs is:
The warning about binding to memory is due to not having numactl-devel
installed on the system. The job will still run, but we are warning you
that we cannot bind memory to the same domain as the core where we bind the
process. This can cause poor performance, but it is not fatal. I forget the name of
the pa
Ralph,
Here's the associated hostfile:
#openMPI hostfile for csclprd3
#max slots prevents oversubscribing csclprd3-0-9
csclprd3-0-0 slots=12 max-slots=12
csclprd3-0-1 slots=6 max-slots=6
csclprd3-0-2 slots=6 max-slots=6
csclprd3-0-3 slots=6 max-slots=6
csclprd3-0-4 slots=6 max-slots=6
csclprd3-0-
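As a quick sanity check on a hostfile like the one above, the slot counts can be totalled and compared with the number of processes being launched. The following is a minimal sketch, not Open MPI's own parser: it assumes the simple "host key=value ..." layout shown here, uses only the five complete lines from the excerpt, and the function names are made up for illustration.

```python
from typing import Dict, List, Tuple

# The five complete hostfile lines from the excerpt above; the truncated
# last entry is deliberately left out.
SAMPLE = """\
#openMPI hostfile for csclprd3
csclprd3-0-0 slots=12 max-slots=12
csclprd3-0-1 slots=6 max-slots=6
csclprd3-0-2 slots=6 max-slots=6
csclprd3-0-3 slots=6 max-slots=6
csclprd3-0-4 slots=6 max-slots=6
"""


def parse_hostfile(text: str) -> List[Tuple[str, Dict[str, int]]]:
    """Return (hostname, {key: int}) pairs, skipping comments and blank lines."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if not line:
            continue
        name, *fields = line.split()
        opts = {}
        for field in fields:
            key, sep, value = field.partition("=")
            if sep and value.isdigit():
                opts[key] = int(value)
        hosts.append((name, opts))
    return hosts


def total_slots(hosts: List[Tuple[str, Dict[str, int]]]) -> int:
    """Sum the explicit slots= values; hosts without one count as 0 here
    (an assumption of this sketch, not necessarily what Open MPI does)."""
    return sum(opts.get("slots", 0) for _, opts in hosts)


hosts = parse_hostfile(SAMPLE)
print(total_slots(hosts))  # 36 for the five lines above
```

If the -np value passed to mpirun is larger than this total and max-slots equals slots on every host, the launch is asking for more processes than the hostfile allows.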
Bingo - you said the magic word. This is a terminology issue. When we say
"core", we mean the old definition of "core", not "hyperthreads". If you
want to use HTs as your base processing unit and bind to them, then you
need to specify --bind-to hwthread. That warning should then go away.
We don't
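To make the --bind-to hwthread suggestion concrete, here is a minimal sketch that assembles such a command line without running it. The hostfile name, process count, and benchmark path are placeholders, not values from this thread, and --report-bindings is an addition of this sketch (a standard mpirun option in the 1.8 series) included so the resulting bindings can be inspected.

```python
import shlex
from typing import List


def build_mpirun_argv(hostfile: str, nprocs: int, program: List[str],
                      bind_to: str = "hwthread") -> List[str]:
    """Assemble an mpirun argument list that binds each process to bind_to."""
    return [
        "mpirun",
        "--hostfile", hostfile,
        "-np", str(nprocs),
        "--bind-to", bind_to,   # "hwthread" binds to hyperthreads, "core" to cores
        "--report-bindings",    # ask mpirun to print where each rank was bound
        *program,
    ]


# Placeholder values for illustration only.
argv = build_mpirun_argv("hostfile.txt", 12, ["./lapack_bench"])
print(" ".join(shlex.quote(arg) for arg in argv))
```

The list form can be passed to subprocess.run directly, which avoids shell quoting problems when the program arguments contain spaces.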
> 2. Unable to resolve: can you be more specific on this?
This was my mistake. I used "xxx.yyy.zzz" instead of "localhost" in the
startup options for orterun. (More precisely the GUI did it, but I knew
that code.) No idea how 1.6.5 managed to get around the fact that not even
"dig xxx.yyy.zzz" can
I know 1.8.4 is better than 1.6.5 in some regards, but I obviously can't
say if we fixed the specific bug you're referring to in your software. As
you know, thread bugs are really hard to nail down.
That event_base_loop warning could be flagging a known problem in the
openib module during inter-pr
> You might double-check by running with "--mca btl ^openib" to see if that
> is the source of the warning
The warning always appears, independent of the interconnect, and even when
running with "--mca btl ^openib".
> Does it only crash when you pause it? Or does it crash while normally
> running?
Would it be possible to get a backtrace from one of the crashes? It would
be especially helpful if you could add --enable-debug to the OMPI configure
options first.
On Wed, Apr 1, 2015 at 1:09 PM, Thomas Klimpel wrote:
> > You might double-check by running with "--mca btl ^openib" to see if
> > that is the source o