the setup of the network.
Unfortunately, the person who knows the most has recently left the
organization.)
Greg
From: Pritchard Jr., Howard
Sent: Thursday, October 14, 2021 5:45 PM
To: Fischer, Greg A. ; Open MPI Users
Subject: Re: [EXTERNAL] [OMPI users] OpenMPI 3.1.6 openib failure: "mlx4_0 errno says Success"
Sent: 2:28 PM
To: Open MPI Users
Cc: Fischer, Greg A.
Subject: Re: [EXTERNAL] [OMPI users] OpenMPI 3.1.6 openib failure: "mlx4_0 errno says Success"
Hi Greg,
It's the aging of the openib btl.
You may be able to apply the attached patch. Note the 3.1.x release stream
The version of librdmacm we have comes from
librdmacm-devel-41mlnx1-OFED.4.1.0.1.0.41102.x86_64, which seems to date from
mid-2017. I wonder if that's too old?
Greg
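A quick way to check the package's age on a system like this (assuming an RPM-based install, as the package name suggests):

    rpm -qi librdmacm-devel | grep -E 'Version|Release|Build Date'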
From: Pritchard Jr., Howard
Sent: Thursday, October 14, 2021 3:31 PM
To: Fischer, Greg A. ; Open MPI Users
Subject: Re: [EXTERNAL] [OMPI users] OpenMPI 3.1.6 openib failure: "mlx4_0 errno says Success"
Hello,
I have compiled OpenMPI 3.1.6 from source on SLES12-SP3, and I am seeing the
following errors when I try to use the openib btl:
WARNING: There was an error initializing an OpenFabrics device.
Local host: bl1308
Local device: mlx4_0
--------------------------------------------------------------------------
Hello,
I'm trying to run the "connectivity_c" test on a variety of systems using
OpenMPI 1.8.4. The test returns segmentation faults when running across nodes
on one particular type of system, and only when using the openib BTL. (The test
runs without error if I stipulate "--mca btl tcp,self".)
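For reference, the two invocations differ only in the BTL list; a sketch, with the process count and hostfile assumed:

    mpirun -np 2 --hostfile hosts --mca btl openib,self connectivity_c   # segfaults across nodes
    mpirun -np 2 --hostfile hosts --mca btl tcp,self connectivity_c      # runs cleanly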
attached file, to see if it was enabled or not.
Maxime
On 2014-06-25 10:46, Fischer, Greg A. wrote:
Attached are the results of "grep thread" on my configure output. There appears
to be some amount of threading, but is there anything I should look for in
particular?
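A quick way to see how a build was configured with respect to threading (assuming the ompi_info from the same install is on the PATH):

    ompi_info | grep -i thread
    # e.g.:  Thread support: posix (MPI_THREAD_MULTIPLE: no, ...) - exact output varies by version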
I see Mike Dubman
[OMPI users] poor performance using the openib btl
What are your threading options for OpenMPI (when it was built)?
I have seen the openib BTL lock up completely before when some level of threading
was enabled.
Maxime Boissonneault
On 2014-06-24 18:18, Fischer, Greg A. wrote:
Hello openmpi-users,
A few weeks ago, I posted to the list about difficulties I was having getting
openib to work with Torque (see "openib segfaults with Torque", June 6, 2014).
The issues were related to Torque imposing restrictive limits on locked memory,
and have since been resolved.
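For anyone hitting the same problem: the usual fix is to raise the locked-memory (memlock) limit on the compute nodes and restart the Torque daemon so launched jobs inherit it. A sketch, with values assumed:

    # /etc/security/limits.conf on each compute node
    *  soft  memlock  unlimited
    *  hard  memlock  unlimited
    # then restart pbs_mom so Torque-launched MPI processes pick up the new limit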
However,
>>>> RDMACM CPC (instead of UDCM, which is a pretty recent addition to the openIB BTL) by setting:
>>>>
>>>> -mca btl_openib_cpc_include rdmacm
>>>>
>>>> Josh
>>>>
>>>> On Wed, Jun 11
Is there any other workaround that I might try? Something that avoids UDCM?
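Putting Josh's suggestion together, the invocation would look something like this (process count and program name assumed):

    mpirun -np 2 --mca btl openib,self --mca btl_openib_cpc_include rdmacm ring_c   # use RDMACM instead of UDCM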
-----Original Message-----
From: Fischer, Greg A.
Sent: Tuesday, June 10, 2014 2:59 PM
To: Nathan Hjelm
Cc: Open MPI Users; Fischer, Greg A.
Subject: RE: [OMPI users] openib segfaults with Torque
[binf316:fischega] $ ulimit -m
unlimited
Greg
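Worth noting: ulimit -m reports the maximum resident set size, which is not the limit openib cares about; registered memory is governed by the locked-memory limit:

    ulimit -l   # max locked memory (memlock) - the limit relevant to openib
    ulimit -m   # max resident set size - a different limit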
-----Original Message-----
From: Nathan Hjelm [mailto:hje...@lanl.gov]
Sent: Tuesday, June 10, 2014 2:58 PM
To: Fischer, Greg A.
Cc: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Out of curiosity, what is the mlock limit on your nodes?
Yes, this fails on all nodes on the system, except for the head node.
The uptime of the system isn't significant. Maybe 1 week, and it's received
basically no use.
-----Original Message-----
From: Nathan Hjelm [mailto:hje...@lanl.gov]
Sent: Tuesday, June 10, 2014 2:49 PM
To: Fischer, Greg A.
Jeff/Nathan,
I ran the following with my debug build of OpenMPI 1.8.1 - after opening a
terminal on a compute node with "qsub -l nodes=2 -I":
mpirun -mca btl openib,self -mca btl_base_verbose 100 -np 2 ring_c &>
output.txt
Output and backtrace are attached. Let me know if I can provide anything else.
On Jun 4, 2014, at 5:15 PM, Ralph Castain wrote:
> Aha!! I found this in our users mailing list archives:
>
> http://www.open-mpi.org/community/lists/users/2012/01/18091.php
>
> Looks like this is a known compiler vectorization issue.
>
>
> On Jun 4, 2014, a
r the launch is complete.
Are you able to run this with btl tcp,sm,self? If so, that would confirm that
everything else is correct, and the problem truly is limited to the udcm
itself...which shouldn't have anything to do with how the proc was launched.
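That isolation test would look something like this (process count and program assumed):

    mpirun -np 2 --mca btl tcp,sm,self ring_c   # bypasses openib entirely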
On Jun 6, 2014, at 6:47 AM, Fischer, Greg A. wrote:
protected against that scenario). If you run with -np 2 -mca
btl openib,sm,self, is it happy?
On Jun 5, 2014, at 2:16 PM, Fischer, Greg A. <fisch...@westinghouse.com> wrote:
Here's the command I'm invoking and the terminal output. (Some of this
information doesn't a
binf316 exited on signal 6 (Aborted).
----------
From: Fischer, Greg A.
Sent: Thursday, June 05, 2014 5:10 PM
To: us...@open-mpi.org
Cc: Fischer, Greg A.
Subject: openib segfaults with Torque
OpenMPI Users,
After encountering difficulty with the Intel compilers (see the "intermittent
segfaults with openib on ring_c.c" thread), I installed GCC-4.8.3 and
recompiled OpenMPI. I ran the simple examples (ring, etc.) with the openib BTL
in a typical BASH environment. Everything appeared to work.
but in the same part of the interceptor, which makes me
suspicious. Don't know how much testing we've seen on SLES...
On Jun 4, 2014, at 1:18 PM, Fischer, Greg A. wrote:
> Ralph,
>
> It segfaults. Here's the backtrace:
>
> Core was generated by `ring_c'.
>
l to your mca parameter?
>>
>> mpirun -np 2 --mca btl openib,sm,self ring_c
>>
>> As far as I know, sm is the preferred transport layer for intra-node
>> communication.
>>
>> Gus Correa
>>
>>
>> On 06/04/2014 11:13 AM, Ralph Castain wrote:
ble
might help identify the source
On Jun 4, 2014, at 7:55 AM, Fischer, Greg A. <fisch...@westinghouse.com> wrote:
Oops, ulimit was set improperly. I generated a core file, loaded it in GDB, and
ran a backtrace:
Core was generated by `ring_c'.
Program terminated
Greg
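The general recipe for getting such a backtrace (paths assumed; the core file name may include a PID, e.g. core.12345):

    ulimit -c unlimited    # allow core files to be written
    mpirun -np 2 ring_c    # reproduce the segfault
    gdb ring_c core        # load the binary and the core file
    (gdb) bt               # print the backtrace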
From: Fischer, Greg A.
Sent: Wednesday, June 04, 2014 10:17 AM
To: 'Open MPI Users'
Cc: Fischer, Greg A.
Subject: RE: [OMPI users] intermittent segfaults with openib on ring_c.c
I recompiled with "--enable-debug" but it doesn't seem to be providing any more
information.
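For reference, a debug rebuild would look roughly like this (install prefix assumed; --enable-debug adds debugging symbols and extra internal checking):

    ./configure --prefix=$HOME/openmpi-1.8.1-debug --enable-debug
    make -j4 all install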
us the line number where it
is failing?
On Jun 3, 2014, at 9:58 AM, Fischer, Greg A. <fisch...@westinghouse.com> wrote:
Apologies - I forgot to add some of the information requested by the FAQ:
1. OpenFabrics is provided by the Linux distribution:
[binf102:fischega] $ rpm -
3. Which subnet manager is being used - I think OpenSM, but I'll
need to check with my administrators.
4. Output of ibv_devinfo is attached.
5. Ifconfig output is attached.
6. Ulimit -l output:
[binf102:fischega] $ ulimit -l
unlimited
Greg
From: Fischer, Greg A.
Sent: Tuesday, June 03, 2014
Hello openmpi-users,
I'm running into a perplexing problem on a new system, whereby I'm experiencing
intermittent segmentation faults when I run the ring_c.c example and use the
openib BTL. See an example below. Approximately 50% of the time it provides the
expected output, but the other 50% of
simply an incompatibility lurking in there
somewhere that would trip openmpi-1.6.5 but not openmpi-1.4.3?
Greg
>-----Original Message-----
>From: Fischer, Greg A.
>Sent: Friday, January 24, 2014 11:41 AM
>To: 'Open MPI Users'
>Cc: Fischer, Greg A.
>Subject: RE: [OMPI users] simple test problem hangs on mpi_finalize and
>consumes all system resources
>symbol: mca_base_param_reg_int" type of message is almost always an
>indicator of two different versions being installed into the same tree.
>
>
>On Jan 24, 2014, at 11:26 AM, "Fischer, Greg A."
> wrote:
>
>> Version 1.4.3 and 1.6.5 were and are installed
>Sent: Friday, January 24, 2014 11:07 AM
>To: Open MPI Users
>Subject: Re: [OMPI users] simple test problem hangs on mpi_finalize and
>consumes all system resources
>
>On Jan 22, 2014, at 10:21 AM, "Fischer, Greg A."
> wrote:
>
>> The reason for deleting the openmpi-
>and ring_c.c?
>I.e., let's get the Fortran out of the way and use just the base C bindings,
>and see what happens.
>
>
>On Jan 19, 2014, at 6:18 PM, "Fischer, Greg A."
>wrote:
>
>> I just tried running "hello_f90.f90" and see the same behavior.
correctly set to pick up the
OMPI libs you installed - most Linux distros come with an older version, and
that can cause problems if you inadvertently pick them up.
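A quick sanity check along these lines (install prefix and test binary name assumed):

    export PATH=/opt/openmpi-1.6.5/bin:$PATH
    export LD_LIBRARY_PATH=/opt/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
    which mpirun              # should point at the intended install
    ldd ./pi_test | grep mpi  # the test binary should resolve to the same libs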
On Jan 19, 2014, at 5:51 AM, Fischer, Greg A. <fisch...@westinghouse.com> wrote:
Hello,
I have a simple, 1-process test case that gets stuck on the mpi_finalize call.
The test case is a dead-simple calculation of pi - 50 lines of Fortran. The
process gradually consumes more and more memory until the system becomes
unresponsive and needs to be rebooted, unless the job is killed.