On Tue, Jun 10, 2014 at 12:10:28AM +, Jeff Squyres (jsquyres) wrote:
> I seem to recall that you have an IB-based cluster, right?
>
> From a *very quick* glance at the code, it looks like this might be a simple
> incorrect-finalization issue. That is:
>
> - you run the job on a single serve
Greg:
Can you run with "--mca btl_base_verbose 100" on your debug build so that we
can get some additional output to see why UDCM is failing to set up properly?
On Jun 10, 2014, at 10:25 AM, Nathan Hjelm wrote:
> On Tue, Jun 10, 2014 at 12:10:28AM +, Jeff Squyres (jsquyres) wrote:
>> I s
Yes, it should be possible for me to get an upgraded Intel compiler on that
system. However, as you suggest, I'm more focused on getting it working with
GCC right now.
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Monday, J
Jeff/Nathan,
I ran the following with my debug build of OpenMPI 1.8.1 - after opening a
terminal on a compute node with "qsub -l nodes 2 -I":
mpirun -mca btl openib,self -mca btl_base_verbose 100 -np 2 ring_c &>
output.txt
Output and backtrace are attached. Let me know if I can provide
Well, that's interesting. The output shows that ibv_create_cq is
failing. Strange since an identical call had just succeeded (udcm
creates two completion queues). Some questions that might indicate where
the failure might be:
Does this fail on any other node in your system?
How long has the node
Yes, this fails on all nodes on the system, except for the head node.
The uptime of the system isn't significant. Maybe 1 week, and it's received
basically no use.
-----Original Message-----
From: Nathan Hjelm [mailto:hje...@lanl.gov]
Sent: Tuesday, June 10, 2014 2:49 PM
To: Fischer, Greg A.
Cc:
Out of curiosity, what is the mlock limit on your system? If it is too
low, that can cause ibv_create_cq to fail. To check, run ulimit -m.
-Nathan Hjelm
Application Readiness, HPC-5, LANL
On Tue, Jun 10, 2014 at 02:53:58PM -0400, Fischer, Greg A. wrote:
> Yes, this fails on all nodes on the system,
[binf316:fischega] $ ulimit -m
unlimited
Greg
-----Original Message-----
From: Nathan Hjelm [mailto:hje...@lanl.gov]
Sent: Tuesday, June 10, 2014 2:58 PM
To: Fischer, Greg A.
Cc: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Out of curiosity what is the mlock limit on you
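
A note on the check above: in bash, "ulimit -m" reports the maximum resident
set size, while the locked-memory (mlock) limit is reported by "ulimit -l".
Below is a minimal Python sketch, assuming a Linux node with the standard
"resource" module, that prints both limits so they can be compared between the
head node and a compute node. The helper names describe() and limits() are
made up for illustration and are not part of Open MPI or Torque.

```python
import resource


def describe(limit_bytes):
    """Format an rlimit value in KiB, or 'unlimited' for RLIM_INFINITY."""
    if limit_bytes == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit_bytes // 1024} KiB"


def limits():
    """Return the (soft, hard) limits for locked memory and resident set size."""
    result = {}
    for name, which in (("memlock", resource.RLIMIT_MEMLOCK),
                        ("rss", resource.RLIMIT_RSS)):
        soft, hard = resource.getrlimit(which)
        result[name] = (describe(soft), describe(hard))
    return result


if __name__ == "__main__":
    # "memlock" corresponds to `ulimit -l`, "rss" to `ulimit -m`.
    for name, (soft, hard) in limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

Limits inside a batch job can differ from those of a login shell, so it may be
worth running this from inside the interactive "qsub" session as well as on
the head node.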
btw, the output comes from ompi's libevent and not from slurm itself (sorry
about the confusion, and thanks to Yossi for catching this)
opal/mca/event/libevent2021/libevent/epoll.c:
event_warn("Epoll %s(%d) on fd %d failed. Old events were %d; read change
was %d (%s); write change was %d (%s)",
opal/
Artem is investigating with Timur
On Jun 10, 2014, at 12:34 PM, Mike Dubman wrote:
> btw, the output comes from ompi's libevent and not from slurm itself (sorry
> about the confusion, and thanks to Yossi for catching this)
>
>
> opal/mca/event/libevent2021/libevent/epoll.c: