Well, it's telling you that your program segfaulted - so I'd start with that,
perhaps looking for any core it might have dropped.
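For example, a minimal sketch of digging into a core file with gdb (the
executable name and process count here are placeholders):

    ulimit -c unlimited          # allow core files to be written
    mpiexec -n 2 ./my_app        # reproduce the segfault
    gdb ./my_app core            # load the core (may be named core.<pid>)
    (gdb) bt                     # print the stack trace at the crash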
On Jul 4, 2013, at 8:36 PM, Rick White wrote:
> Hello,
>
> I have this error:
> mpiexec noticed that process rank 1 with PID 16087 on node server exited on
> signal 11 (Segmentation fault).
Compile with -traceback and -check all if you're using the Intel compilers.
Otherwise, find the equivalent compiler options to check array-bounds accesses
and to dump a stack trace. Then build a debug version and run that. Assuming
it still fails, you will probably get good information on the source of the
problem. If it doesn't fail, th…
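A minimal sketch of that debug workflow with the Intel Fortran compiler (the
source and executable names are placeholders):

    # debug build: no optimization, bounds checking, runtime tracebacks
    mpif90 -g -O0 -traceback -check all -o model model.f90
    mpiexec -n 4 ./model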
Dear Gus,
Thanks for your help - your clue solved my problem!
The ultimate solution was to limit MPI communications to the local,
unrouted subnet. I made this the default behavior for all users of my
cluster by adding the following line to the bottom of my
$prefix/etc/openmpi-mca-params.conf file.
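The exact line is not preserved in this excerpt; a minimal sketch of the kind
of entry that pins the TCP BTL to one subnet (the CIDR range is a placeholder
for the cluster's private subnet) would be:

    # $prefix/etc/openmpi-mca-params.conf
    btl_tcp_if_include = 192.168.1.0/24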
I'm part of a team that maintains a global climate model running under
MPI. Recently we have been trying it out with different MPI stacks
at high resolution/processor counts.
At one point in the code there is a large number of mpi_isend/mpi_recv calls
(tens to hundreds of thousands) when data is distrib…
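For context, a minimal C sketch of that general communication pattern (the
function, buffer names, counts, and tags are illustrative, not taken from the
model):

    #include <mpi.h>
    #include <stdlib.h>

    /* Rank 0 scatters many small messages with nonblocking sends;
     * every other rank posts a matching blocking receive. */
    void distribute(double *chunks, int chunk_len, int nprocs, int rank)
    {
        if (rank == 0) {
            MPI_Request *reqs = malloc((nprocs - 1) * sizeof(MPI_Request));
            for (int dst = 1; dst < nprocs; dst++)
                MPI_Isend(&chunks[dst * chunk_len], chunk_len, MPI_DOUBLE,
                          dst, 0, MPI_COMM_WORLD, &reqs[dst - 1]);
            /* Complete all sends before the buffers are reused. */
            MPI_Waitall(nprocs - 1, reqs, MPI_STATUSES_IGNORE);
            free(reqs);
        } else {
            MPI_Recv(chunks, chunk_len, MPI_DOUBLE, 0, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }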
Hello OpenMPI
We are seriously considering deploying OpenMPI 1.6.5 for production (and
1.7.2 for testing) on HPC clusters that consist of nodes with *different
types of networking interfaces*.
1) Interface selection
We are using OpenMPI 1.6.5 and were wondering how one would go about
selecting…
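For reference, a sketch of how transport selection can be steered at run time
in the 1.6 series (process counts and interface names are illustrative):

    # prefer InfiniBand, with shared memory on-node and self loopback
    mpirun -n 64 --mca btl openib,sm,self ./app
    # or restrict the TCP BTL to a particular interface
    mpirun -n 64 --mca btl_tcp_if_include eth1 ./app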
I can't speak for MVAPICH - you probably need to ask them about this scenario.
OMPI will automatically select whichever available transport can reach the
intended process. This requires that each communicating pair of processes has
access to at least one common transport.
So if a process t…
Sorry on the mvapich2 reference :)
All nodes are attached over a common 1GigE network. We of course wish that,
if a node-pair is *also* connected via a higher-speed fabric (IB FDR or
10GigE), that fabric would be leveraged instead of the common 1GigE.
One question: suppose that we use nodes ha…
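One way to confirm which transport is actually chosen between a given pair of
nodes (the host names and verbosity level are example values):

    # print BTL component selection during startup
    mpirun -n 2 --host node1,node2 --mca btl_base_verbose 30 ./app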
As long as the IB interfaces can communicate with each other, you should be fine.
On Jul 5, 2013, at 3:26 PM, Michael Thomadakis wrote:
> Sorry on the mvapich2 reference :)
>
> All nodes are attached over a common 1GigE network. We of course wish that, if
> a node-pair is *also* connected via a higher-speed fabric (IB FDR or 10GigE)…
From: basmaabdelaz...@hotmail.com
To: us...@open-mpi.org
Subject: checkpoint-restart of version 1.6.5
Date: Fri, 5 Jul 2013 02:40:36 +0200
Does Open MPI 1.6.5 support checkpoint-restart (self or BLCR)?
I did not find ompi-checkpoint or ompi-restart in the documentation for
version 1.6.5.
If you check the installed bin directory, you will find they are still there.
Whether they work or not is something I wouldn't know - I suggest trying them
to see.
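A quick sketch of how to check (the install prefix is a placeholder;
ompi-checkpoint takes the PID of the running mpirun, and ompi-restart takes
the snapshot handle it produces):

    ls $prefix/bin | grep ompi-          # look for ompi-checkpoint / ompi-restart
    ompi-checkpoint <pid_of_mpirun>      # snapshot a running job
    ompi-restart <snapshot_reference>    # restart from that snapshot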
On Jul 5, 2013, at 6:05 PM, basma a.azeem wrote:
>
>
> From: basmaabdelaz...@hotmail.com
> To: us...@open-mpi.org
> Subject: checkpoint-restart of version 1.6.5
Great ... thanks. We will try it out as soon as the common backbone IB is
in place.
cheers
Michael
On Fri, Jul 5, 2013 at 6:10 PM, Ralph Castain wrote:
> As long as the IB interfaces can communicate with each other, you should be
> fine.
>
> On Jul 5, 2013, at 3:26 PM, Michael Thomadakis wrote: