Cross your fingers; we might release tomorrow (I've probably now
jinxed it by saying that!).
On Jan 12, 2009, at 1:54 PM, Justin wrote:
In order for me to test this out I need to wait for TACC to install this
version on Ranger. Right now they have version 1.3a1r19685 installed.
I'm guessing this is probably an older version. I'm not sure when TACC
will get around to updating their OpenMPI version. I could request them
to update it ...
Justin --
Could you actually give your code a whirl with 1.3rc3 to ensure that
it fixes the problem for you?
http://www.open-mpi.org/software/ompi/v1.3/
On Jan 12, 2009, at 1:30 PM, Tim Mattox wrote:
Hi Justin,
I applied the fixes for this particular deadlock to the 1.3 code base
late last week; see ticket #1725:
https://svn.open-mpi.org/trac/ompi/ticket/1725
This should fix the described problem, but I personally have not tested
to see if the deadlock in question is now gone. Everyone should ...
Hi, has this deadlock been fixed in the 1.3 source yet?
Thanks,
Justin
Jeff Squyres wrote:
On Dec 11, 2008, at 5:30 PM, Justin wrote:
The more I look at this bug the more I'm convinced it is with OpenMPI
and not our code. Here is why: our code generates a
communication/execution schedule. At each timestep this schedule is
executed and all communication and execution is performed. Our problem
is AMR, which means the communication ...
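(As an illustration only, and not Justin's actual code: the struct layout and the names sched_entry_t and execute_schedule below are invented. The sketch just shows the kind of per-timestep schedule of explicitly matched nonblocking sends and receives being described.)

#include <mpi.h>
#include <stdlib.h>

/* One entry of a hypothetical per-timestep communication schedule. */
typedef struct {
    int is_send;   /* 1 = post a send, 0 = post a receive          */
    int peer;      /* explicit peer rank -- never MPI_ANY_SOURCE   */
    int tag;       /* explicit tag       -- never MPI_ANY_TAG      */
    int count;     /* number of doubles to move                    */
    double *buf;
} sched_entry_t;

/* Execute one timestep: post every send and receive in the schedule,
 * then wait for all of them.  If any rank's schedule lacks the receive
 * that matches a peer's send, MPI_Waitall never returns once the message
 * is too large to be buffered eagerly. */
static void execute_schedule(const sched_entry_t *e, int n, MPI_Comm comm)
{
    MPI_Request *reqs = malloc((size_t)n * sizeof *reqs);
    for (int i = 0; i < n; i++) {
        if (e[i].is_send)
            MPI_Isend(e[i].buf, e[i].count, MPI_DOUBLE, e[i].peer,
                      e[i].tag, comm, &reqs[i]);
        else
            MPI_Irecv(e[i].buf, e[i].count, MPI_DOUBLE, e[i].peer,
                      e[i].tag, comm, &reqs[i]);
    }
    MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}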
George --
Is this the same issue that you're working on?
(we have a "blocker" bug for v1.3 about deadlock at heavy messaging
volume -- on Tuesday, it looked like a bug in our freelist...)
On Dec 9, 2008, at 10:28 AM, Justin wrote:
I have tried disabling the shared memory by running with the following
parameters to mpirun:
--mca btl openib,self --mca btl_openib_ib_timeout 23 --mca
btl_openib_use_srq 1 --mca btl_openib_use_rd_max 2048
Unfortunately this did not get rid of the hangs and, if anything, seems
to have made them more common.
The current version of Open MPI installed on Ranger is 1.3a1r19685, which
is from early October. This version has a fix for ticket #1378. Ticket
#1449 is not an issue in this case because each node has 16 processors
and #1449 is for larger SMPs.
However, I am wondering if this is because of ...
also see https://svn.open-mpi.org/trac/ompi/ticket/1449
On 12/9/08, Lenny Verkhovsky wrote:
maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ??
On 12/5/08, Justin wrote:
The reason I'd like to disable these eager buffers is to help detect the
deadlock better. I would not run with this for a normal run, but it
would be useful for debugging. If the deadlock is indeed due to our
code then disabling any shared buffers or eager sends would make that
deadlock reproducible.
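(To make the eager-vs-rendezvous point concrete, here is a minimal sketch of my own, not code from this thread; the message size, the tag and the pairing of ranks are arbitrary. Both ranks send before they receive, which only completes while the message still fits under the BTL's eager limit.)

#include <mpi.h>

/* Needs at least two ranks.  While N doubles fit under the eager limit both
 * MPI_Send calls return and the bug stays hidden; once the rendezvous
 * protocol kicks in, both ranks block in MPI_Send waiting for a receive
 * that is never reached, and the deadlock becomes reproducible. */
enum { N = 1 << 16 };                 /* illustrative; tune against eager_limit */
static double sendbuf[N], recvbuf[N];

int main(int argc, char **argv)
{
    int rank, size, peer;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    peer = rank ^ 1;                  /* pair ranks 0<->1, 2<->3, ... */

    if (peer < size) {
        /* Send-before-receive on both sides: only "safe" under eager buffering. */
        MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 42, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 42, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}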
OpenMPI has different eager limits for all the network types. On your
system run:
ompi_info --param btl all
and look for the eager_limits.
You can set these values to 0 using the syntax I showed you before.
That would disable eager messages.
There might be a better way to disable eager messages ...
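(One application-level option, offered here only as a sketch and not something this thread spells out: have a debug build route its sends through MPI_Ssend, which cannot complete until the matching receive has been posted, so eager buffering cannot hide a missing receive for those calls.)

#include <mpi.h>

/* Hypothetical debugging aid: DEBUG_SYNCHRONOUS_SENDS and APP_SEND are
 * invented names.  In a debug build every application send becomes a
 * synchronous send, removing the masking effect of eager buffering
 * without touching any MCA parameters. */
#ifdef DEBUG_SYNCHRONOUS_SENDS
#  define APP_SEND(buf, cnt, type, dst, tag, comm) \
          MPI_Ssend((buf), (cnt), (type), (dst), (tag), (comm))
#else
#  define APP_SEND(buf, cnt, type, dst, tag, comm) \
          MPI_Send((buf), (cnt), (type), (dst), (tag), (comm))
#endif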
Thank you for this info. I should add that our code tends to post a lot
of sends prior to the other side posting receives. This causes a lot of
unexpected messages to exist. Our code explicitly matches up all tags
and processors (that is, we do not use MPI wildcards). If we had a
deadlock ...
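(A hypothetical illustration of that pattern, not taken from the application: rank 0 posts all of its sends before rank 1 posts any receives, so every message arrives unexpected and is later matched purely by its explicit source and tag.)

#include <mpi.h>

#define NMSG 64                       /* arbitrary number of messages */

int main(int argc, char **argv)       /* run with at least two ranks */
{
    int rank, i, payload[NMSG] = {0};
    MPI_Request reqs[NMSG];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* All sends go out before the peer has posted a single receive,
         * so they land in rank 1's unexpected-message queue. */
        for (i = 0; i < NMSG; i++)
            MPI_Isend(&payload[i], 1, MPI_INT, 1, 100 + i,
                      MPI_COMM_WORLD, &reqs[i]);
    } else if (rank == 1) {
        /* Receives are posted late and in reverse order; matching is still
         * deterministic because each (source, tag) pair is explicit. */
        for (i = NMSG - 1; i >= 0; i--)
            MPI_Irecv(&payload[i], 1, MPI_INT, 0, 100 + i,
                      MPI_COMM_WORLD, &reqs[i]);
    }
    if (rank < 2)
        MPI_Waitall(NMSG, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}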
Whenever this has happened we found the code to have a deadlock; users
never saw it until they crossed the eager->rendezvous threshold.
Yes, you can disable shared memory with:
mpirun --mca btl ^sm
Or you can try increasing the eager limit:
ompi_info --param btl sm
MCA btl: parameter "btl_sm_eager_limit" ...
On Dec 5, 2008, at 12:22 PM, Justin wrote:
Does OpenMPI have any known deadlocks that might be causing our
deadlocks?
Known deadlocks, no. We are assisting a customer, however, with a
deadlock that occurs in IMB Alltoall (and some other IMB tests) when
using 128 hosts and the MX BTL. We have ...
Hi,
We are currently using OpenMPI 1.3 on Ranger for large processor jobs
(8K+). Our code appears to be occasionally deadlocking at random within
point-to-point communication (see stack trace below). This code has been
tested on many different MPI versions and as far as we know it does not
contain any deadlocks.