Ah, I see the "sh: tcp://10.1.25.142,172.31.1.254,10.12.25.142:41686: No such
file or directory" message now -- I was looking for something like that when I
replied before and missed it.
I really wish I understood why the heck that is happening; it doesn't seem to
make sense.
Matt: Random th
I can answer that for you right now. The launch of the orteds is what is
failing, and they are "silently" failing at this time. The reason is simple:
1. We are failing due to truncation of the HNP URI at the first semicolon. This
causes the orted to emit an ORTE_ERROR_LOG message and then abort
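For reference, the "sh: tcp://...: No such file or directory" line in the first
note is exactly what an unquoted semicolon produces: the shell treats whatever
follows it as a new command. A minimal sketch, using echo as a stand-in for the
real orted launch line and a made-up address:

  # The HNP URI passed to each orted looks roughly like
  #   "1234;tcp://10.1.25.142:41686;..."   (made-up values)
  # If it reaches a remote shell unquoted, the shell splits the command at
  # the first semicolon and tries to "run" the rest of the URI:
  sh -c 'echo 1234;tcp://10.1.25.142:41686'
  #   -> echo sees only "1234" (the truncated URI)
  #   -> the shell then reports something like
  #      "sh: tcp://10.1.25.142:41686: No such file or directory"
  # Quoting the whole argument keeps the URI intact:
  sh -c 'echo "1234;tcp://10.1.25.142:41686"'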
Ah, ok -- I think I missed this part of the thread: each of your individual MPI
processes sucks up huge gobs of memory.
So just to be clear, in general: you don't intend to run more MPI processes
than cores per server, *and* you intend to run fewer MPI processes per server
than would consume th
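In case it helps, the 1.8 series lets you cap the rank count per node directly
with the ppr mapper, so each rank keeps a larger share of the node's memory; a
rough sketch, where the hostfile name and binary are placeholders:

  # 32 ranks total, but no more than 4 on any one node:
  mpirun -np 32 --map-by ppr:4:node --hostfile myhosts ./my_mpi_app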
Matt --
We were discussing this issue on our weekly OMPI engineering call today.
Can you check one thing for me? With the un-edited 1.8.2 tarball installation,
I see that you're getting no output for commands that you run -- but also no
errors.
Can you verify and see if your commands are actu
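One quick way to check that is to run something that must print and then look
at the exit status; a minimal sketch:

  mpirun -np 2 hostname
  echo "mpirun exit status: $?"
  # no hostnames plus a zero status would suggest output is being lost,
  # while a non-zero status means the launch itself failed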
Argh - yeah, I got confused as things context switched a few too many times.
The 1.8.2 release should certainly understand that arrangement, and
--hetero-nodes. The only way it wouldn't see the latter is if you configured it
--without-hwloc, or hwloc refused to build.
Since there was a question
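For what it's worth, whether hwloc made it into a given install can be checked
with ompi_info before reaching for --hetero-nodes; a rough sketch (the hostfile
and binary names are placeholders):

  # an hwloc component should be listed if hwloc was built in:
  ompi_info | grep -i hwloc
  # if it is there, the heterogeneous layout can be handled explicitly:
  mpirun --hetero-nodes -np 16 --hostfile myhosts ./my_mpi_app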
Please send the information listed here:
http://www.open-mpi.org/community/help/
On Sep 2, 2014, at 2:10 PM, Swamy Kandadai wrote:
> Hi:
> While building OpenMPI (1.6.5 or 1.8.1) using openib on our power8 cluster
> with Mellanox IB (FDR) I get the following error:
>
> configure: WARNING
Ralph,
These latest issues (since 8/28/14) all occurred after we upgraded our cluster
to OpenMPI 1.8.2. Maybe I should've created a new thread rather
than tacking on these issues to my existing thread.
-Bill Lane
The difficulty here is that you have bundled several errors again into a single
message, making it hard to keep the conversation from getting terribly
confused. I was trying to address the segfault errors on cleanup, which have
nothing to do with the accept being rejected.
It looks like those a
Hi:
While building OpenMPI (1.6.5 or 1.8.1) using openib on our power8 cluster
with Mellanox IB (FDR) I get the following error:
configure: WARNING: infiniband/verbs.h: present but cannot be compiled
configure: WARNING: infiniband/verbs.h: check for missing prerequisite
headers?
configure: WA
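"Present but cannot be compiled" means the header exists but configure's compile
test failed; config.log records the actual compiler error next to that check.
One way to reproduce it by hand, substituting whatever CC/CFLAGS were given to
configure:

  cat > conftest.c <<'EOF'
  #include <infiniband/verbs.h>
  int main(void) { return 0; }
  EOF
  gcc -c conftest.c
  # the error printed here (a missing prerequisite header, a wrong -I path,
  # a 32- vs 64-bit mismatch, ...) is usually the real cause of the warning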
On Sep 2, 2014, at 10:48 AM, Lane, William wrote:
> Ralph,
>
> There are at least three different permutations of CPU configurations in
> the cluster involved. Some are blades that have two sockets with two cores
> per Intel CPU (and not all sockets are filled). Some are IBM x3550 systems
I don't see any line numbers on the errors I flagged - all I see are the usual
memory offsets in bytes, which is of little help. I'm afraid I don't know what
you'd have to do under SunOS to get line numbers, but I can't do much without them.
On Sep 2, 2014, at 10:26 AM, Siegmar Gross wrote:
> H
Ralph,
There are at least three different permutations of CPU configurations in the
cluster involved. Some are blades that have two sockets with two cores per
Intel CPU (and not all sockets are filled). Some are IBM x3550 systems having
two sockets with three cores per Intel CPU (and not all so
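If hwloc is installed on those machines, running lstopo on one node of each
type shows the socket/core layout Open MPI will see, which makes layouts like
these easy to compare side by side:

  # on one blade and on one x3550:
  lstopo-no-graphics
  # ("lstopo" gives the same information graphically if X is available)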
Hi Siegmar
Could you please configure this OMPI install with --enable-debug so that gdb
will provide line numbers where the error is occurring? Otherwise, I'm having a
hard time chasing this problem down.
Thanks
Ralph
On Sep 2, 2014, at 6:01 AM, Siegmar Gross wrote:
> C problem:
> =
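For anyone reproducing this, a rough sketch of a debug rebuild so that gdb can
resolve file and line numbers; the install prefix and program name below are
placeholders:

  ./configure --prefix=$HOME/ompi-1.8.2-debug --enable-debug
  make -j4 && make install
  # then reproduce under gdb and capture a backtrace with line numbers:
  gdb --args ./my_failing_prog
  (gdb) run
  (gdb) bt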
I believe this was fixed in the trunk and is now scheduled to come across to
1.8.3
On Sep 2, 2014, at 4:21 AM, Siegmar Gross wrote:
> Hi,
>
> yesterday I installed openmpi-1.8.2 on my machines (Solaris 10 Sparc
> (tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1 x86_64
> (linpc0)) wi
Would you please try r32662? I believe I finally found and fixed this problem.
On Sep 2, 2014, at 6:12 AM, Siegmar Gross wrote:
> Hi,
>
> yesterday I installed openmpi-1.9a1r32657 on my machines (Solaris
> 10 Sparc (tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1
> x86_64 (linpc0))
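A quick way to confirm which build is actually being picked up before retesting
is the ompi_info banner; the nightly snapshots carry the r number in the
version string:

  ompi_info | head -n 3
  # the "Open MPI:" line should show the exact snapshot, e.g. 1.9a1r32662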
Thanks for the advice. Our jobs vary in size, from just a few MPI processes to
about 64. Jobs are submitted at random, which is why I want to map by socket.
If the cluster is empty, and someone submits a job with 16 MPI processes, I
would think it would run most efficiently if it used 8 nodes, 2
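In 1.8 that spread can be requested directly; a rough sketch (hostfile and
binary names are placeholders) that puts one rank per socket, i.e. two per
dual-socket node, so 16 ranks land on 8 nodes:

  mpirun -np 16 --hostfile myhosts --map-by ppr:1:socket --bind-to socket ./my_mpi_app
  # plain "--map-by socket" alternates ranks between sockets but fills each
  # node's slots before moving on, so ppr is what gives the spread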
On that machine, it would be SLES 11 SP1. I think it's soon transitioning
to SLES 11 SP3.
I also use Open MPI on an RHEL 6.5 box (possibly soon to be RHEL 7).
On Mon, Sep 1, 2014 at 8:41 PM, Ralph Castain wrote:
> Thanks - I expect we'll have to release 1.8.3 soon to fix this in case
> others
Hi,
yesterday I installed openmpi-1.9a1r32657 on my machines (Solaris
10 Sparc (tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1
x86_64 (linpc0)) with Sun C 5.12 and gcc-4.9.0.
I have the following problems with my gcc version. First, once more, my
problems with Java, and below, my problems
Hi,
yesterday I installed openmpi-1.9a1r32657 on my machines (Solaris
10 Sparc (tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1
x86_64 (linpc0)) with Sun C 5.12 and gcc-4.9.0.
I have the following problems with my Sun C version. First my
problem with Java and below my problem with C.
Hi,
yesterday I installed openmpi-1.8.3a1r32641 on my machines (Solaris
10 Sparc (tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1
x86_64 (linpc0)) with Sun C 5.12 and gcc-4.9.0. A small Java program
breaks with SIGSEGV. gdb shows the following backtrace for the Sun C
version.
tyr java
Hi Takahiro,
> I forgot to follow the previous report, sorry.
> The patch I suggested is not included in Open MPI 1.8.2.
> The backtrace Siegmar reported points to the problem that I fixed
> in the patch.
>
> http://www.open-mpi.org/community/lists/users/2014/08/24968.php
>
> Siegmar:
> Could you
Hi,
yesterday I installed openmpi-1.8.2 on my machines (Solaris 10 Sparc
(tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1 x86_64
(linpc0)) with Sun C 5.12. A small Java program works on Linux,
but breaks with a segmentation fault on Solaris 10.
tyr java 172 where mpijavac
mpijavac is a
Hi,
yesterday I installed openmpi-1.8.2 on my machines (Solaris 10 Sparc
(tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1 x86_64
(linpc0)) with gcc-4.9.0. A small program works on some machines,
but breaks with a bus error on Solaris 10 Sparc.
tyr small_prog 118 which mpicc
/usr/local/