Hi OpenMPI users,
I have been using OpenMPI for quite a few years now. Recently I noticed
some memory-related issues that are quite bothering me.
I have OpenMPI version 1.8.3 installed on different machines. All machines
are SMPs running Linux x86_64. Machines one and one-1 are installed with
Hi,
today I tried to build openmpi-v1.10-dev-41-g57faa88 on my machines
(Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1
x86_64) with gcc-4.9.2 and Sun C 5.13 and I got the following error
on all platforms with gcc.
...
make[2]: Entering directory
`/export2/src/openmpi-1.10.0/openmp
Hi,
today I tried to build openmpi-v1.10-dev-41-g57faa88 on my machines
(Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1
x86_64) with gcc-4.9.2 and Sun C 5.13 and I got the following error
on all platforms with cc.
...
make[2]: Entering directory
`/export2/src/openmpi-1.10.0/openmpi
Hi,
today I tried to build openmpi-v1.8.5-40-g7b9e672 on my machines
(Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1
x86_64) with gcc-4.9.2 and Sun C 5.13 and I got the same error
on all three platforms with both compilers.
...
make[2]: Entering directory
`/export2/src/openmpi-1.8.
Hello,
I'm seeing an error when trying to run a simple OMPI job on a 2-node cluster where
one node is PPC64 (big-endian byte order) and the other is an
x86_64 (little-endian byte order) node. OMPI 1.8.4 is configured with
--enable-heterogeneous:
./configure --with-openib=/usr CC=gcc CXX=g++ F77=gfortran FC=gfortran
--e
Just to check the obvious: I assume that the /usr/mpi directory is not network
mounted, and both application and OMPI code are appropriately compiled on each
side?
There is another mpirun flag --hetero-apps that you may need to provide. It has
been so long since someone tried this that I’d have
On May 30, 2015, at 9:42 AM, Jeff Layton wrote:
>
> The error happens during the configure step before compiling.
Hmm -- I'm confused. You show output from "make" in your previous mails...?
> However, I ran the make command as you indicated and I'm
> attaching the output to this email.
Ok, th
Dear OpenMPI users/developers,
We are experiencing a problem when debugging the message queues:
Summary: Message queues debugging broken on recent OpenMPI versions.
Affected OpenMPI versions: 1.8.3, 1.8.4 and 1.8.5 (at least).
The debug message queue library is not returning any pending message
Well, I checked and it looks to me like --hetero-apps is a stale option in the
master at least - I don’t see where it gets used.
Looking at the code, I would suspect that something didn’t get configured
correctly - either the --enable-heterogeneous flag didn’t get set on one side,
or we incorrect
Just to be sure: how are you measuring the memory usage? If you are
using /proc/meminfo, are you subtracting out the Cached memory usage?
-Nathan
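(Not part of the original message - a minimal sketch of the kind of check Nathan
is suggesting, assuming the standard Linux /proc/meminfo fields; the arithmetic
is only illustrative:)

  # Memory in use, excluding buffers and page cache (values in kB):
  awk '/^MemTotal:/{t=$2} /^MemFree:/{f=$2} /^Buffers:/{b=$2} /^Cached:/{c=$2}
       END{print t-f-b-c " kB in use (excluding buffers/cache)"}' /proc/meminfo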
On Mon, Jun 01, 2015 at 04:54:45AM -0400, Manoj Vaghela wrote:
>Hi OpenMPI users,
>
>I have been using OpenMPI for quite a few years now. Recen
Hmm, a master-ism that made it into 1.10. Wasn't caught by Jenkins. Will
fix now.
-Nathan
On Mon, Jun 01, 2015 at 01:06:43PM +0200, Siegmar Gross wrote:
> Hi,
>
> today I tried to build openmpi-v1.10-dev-41-g57faa88 on my machines
> (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1
https://github.com/open-mpi/ompi-release/pull/299
On Mon, Jun 01, 2015 at 01:06:43PM +0200, Siegmar Gross wrote:
> Hi,
>
> today I tried to build openmpi-v1.10-dev-41-g57faa88 on my machines
> (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1
> x86_64) with gcc-4.9.2 and Sun C 5.13 a
How was this configured? We aren’t seeing this problem elsewhere.
> On Jun 1, 2015, at 4:06 AM, Siegmar Gross
> wrote:
>
> Hi,
>
> today I tried to build openmpi-v1.8.5-40-g7b9e672 on my machines
> (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1
> x86_64) with gcc-4.9.2 and Sun
It looks to me like the default queue pair settings are different on the
different systems. You can try setting the mca_btl_openib_receive_queues
variable by hand. If this is InfiniBand, I recommend not using any
per-peer queue pairs and using something like:
S,2048,1024,1008,64:S,12288,1024,1008,64
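(For illustration only, not from the original thread: one way to pass such a
specification is on the mpirun command line, so every rank sees the same value.
The hostfile, rank count, and application name below are placeholders:)

  mpirun --mca btl_openib_receive_queues \
         S,2048,1024,1008,64:S,12288,1024,1008,64 \
         -np 16 --hostfile hosts ./my_app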
> On Apr 30, 2015, at 1:16 PM, Noam Bernstein
> wrote:
>
>> On Apr 30, 2015, at 12:03 PM, Ralph Castain wrote:
>>
>> The planning is pretty simple: at startup, mpirun launches a daemon on each
>> node. If --hetero-nodes is provided, each daemon returns the topology
>> discovered by hwloc - ot
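(A hedged usage sketch, not from the original thread: how --hetero-nodes would
typically be passed on the command line; the hostfile, rank count, and
application name are placeholders:)

  mpirun --hetero-nodes --hostfile hosts -np 8 ./my_app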
This probably isn’t very helpful, but fwiw: we added an automatic “fingerprint”
capability in the later OMPI versions just to detect things like this. If the
fingerprint of a backend node doesn’t match the head node, we automatically
assume hetero-nodes. It isn’t foolproof, but it would have pic
> On Jun 1, 2015, at 5:09 PM, Ralph Castain wrote:
>
> This probably isn’t very helpful, but fwiw: we added an automatic
> “fingerprint” capability in the later OMPI versions just to detect things
> like this. If the fingerprint of a backend node doesn’t match the head node,
> we automatically
On 6/1/2015 9:40 AM, Ralph Castain wrote:
Just to check the obvious: I assume that the /usr/mpi directory is not network
mounted, and both application and OMPI code are appropriately compiled on each
side?
Yes.
There is another mpirun flag --hetero-apps that you may need to provide. It has
On 6/1/2015 2:45 PM, Nathan Hjelm wrote:
It looks to me like the default queue pair settings are different on the
different systems. You can try setting the mca_btl_openib_receive_queues
variable by hand. If this is InfiniBand, I recommend not using any
per-peer queue pairs and using something like:
On 6/1/2015 9:53 AM, Ralph Castain wrote:
Well, I checked and it looks to me like --hetero-apps is a stale option in the
master at least - I don’t see where it gets used.
Looking at the code, I would suspect that something didn’t get configured
correctly - either the --enable-heterogeneous flag
This is not a heterogeneous run-time issue -- it's the issue that Nathan cited:
that OMPI detected different receive queue setups on different machines.
As the error message states, the openib BTL simply cannot handle it when different
MPI processes specify different receive queue specifications.
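(An illustrative check, not from the original thread: ompi_info can show the
receive_queues default each machine is actually using, so the two sides can be
compared; exact output varies by OMPI version and device .ini settings:)

  ompi_info --param btl openib --level 9 | grep receive_queues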
I’m wondering if it is also possible that the error message is simply printing
that ID incorrectly. Looking at the code, it appears that we do perform the
network byte translation correctly when we set up the data for transmission
between the processes. However, I don’t see that translation being