Well, according to George Bosilca
(http://www.open-mpi.org/community/lists/users/2005/02/0005.php),
threads are supported in OpenMPI.
The program I am trying to run works with the TCP stack, and the MX driver is
thread-safe, so I guess the problem comes from the MX BTL or MTL.
Francois
Scott Atchley wr
Hi
After updating all my nodes to Open-MPI 1.3.2 (with
--enable-mpi-threads), some of them fail to execute a simple MPI test
program - they seem to hang.
With --debug-daemons the application seems to execute (two lines of
output) but hangs before returning:
[jody@aplankton neander]$ mpirun -np 2 --h
More info:
I checked and found that not all nodes are equal:
the ones that don't work have mpi-threads *and* progress-threads enabled,
whereas the ones that work have only mpi-threads enabled
Is there a problem when both thread-types are enabled?
Jody
On Thu, Jun 11, 2009 at 12:19 PM, jody wrote
Gleb,
I am trying to use BLCR as well. What levels of OpenMPI, OFED, and BLCR
are you using? I can get a serial checkpoint/restart to work but not the
parallel case. I built my system using OFED 1.3.1, OpenMPI 1.3.1, and BLCR
0.8.1-1. I also used your same BLCR configuration options for OpenM
It's the --enable-progress-threads flag that causes the problem - we
don't really support that yet. Maybe someday.
Take that out and you should be okay, with the caveats expressed on
the OMPI web site (i.e., not everything works with threads yet).
On Jun 11, 2009, at 4:56 AM, jody wrote:
Francois,
For threads, the FAQ has:
http://www.open-mpi.org/faq/?category=supported-systems#thread-support
It mentions that thread support is designed in, but lightly tested. It
is also possible that the FAQ is out of date and MPI_THREAD_MULTIPLE
is fully supported.
The stack trace below
The comment on the FAQ (and on the other thread) is only true for some
BTLs (TCP, SM and MX). I don't have the resources to test the other
BTLs; it is their developers' responsibility to make the modifications
required to make them thread safe.
In addition, I have to confess that I never teste
Neither the CM PML nor the MX MTL has been looked at for thread
safety. There's not much code to cause problems in the CM PML. The
MX MTL would likely need some work to ensure the restrictions Scott
mentioned are met (currently, there's no such guarantee in the MX MTL).
Brian
On Jun 11, 2
Hello. I've got the following problem: I'm trying to restart a parallel job
over our cluster using the following command line:
/common/openmpi-1.3.2/ompi-restart -mca plm-rsh-agent rsh -verbose
-hostfile hfile ompi_global_snapshot_25229.ckpt
Despite using this mca option I got the following error message:
The problem is that you misspelled the mca param - it should be:
-mca plm_rsh_agent rsh
On Jun 11, 2009, at 10:34 AM, Gleb Crazy Sage Igumnov wrote:
Hello. I've got the following problem: I'm trying to restart a parallel job
over our cluster using the following command line:
/common/openmpi-1.3.2/ompi-
Brian and George,
I do not know if the stack trace is complete, but I do not see any
mx_* functions called which would indicate a crash inside MX due to
multiple threads trying to complete the same request. It does show an
assert failed.
Francois, is the stack trace from the MX MTL or BTL
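The hazard described above, multiple threads trying to complete the same request, can be illustrated outside of MPI. The sketch below is not Open MPI or MX code: the `Request` class, `try_complete` and `race` are hypothetical names made up for this illustration. It only shows one common way to guarantee that a shared request is completed exactly once, by guarding a "completed" flag with a lock.

```python
import threading


class Request:
    """Hypothetical stand-in for a pending communication request."""

    def __init__(self):
        self._lock = threading.Lock()
        self._completed = False
        self.completions = 0  # number of times completion actually ran

    def try_complete(self):
        """Return True only for the single thread that completes the request."""
        with self._lock:
            if self._completed:
                return False  # another thread already completed it
            self._completed = True
            self.completions += 1
            return True


def race(num_threads=8):
    """Let num_threads threads race to complete one shared request.

    Returns (times the request was completed, threads that saw success).
    """
    req = Request()
    results = []
    results_lock = threading.Lock()
    barrier = threading.Barrier(num_threads)  # release all threads together

    def worker():
        barrier.wait()
        won = req.try_complete()
        with results_lock:
            results.append(won)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return req.completions, results.count(True)
```

Calling `race(2)` mirrors the failing case reported in this thread (two threads running concurrently): with the lock in place, only one thread wins. Whether the MX MTL needs this kind of guard, or a different one, is not established by the messages here.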
The stack trace is from the MX MTL (I attach the backtraces I get with
both MX MTL and MX BTL)
Here is the program that I use. It is quite simple. It runs ping pongs
concurrently (with one thread per node, then with two threads per node,
etc.)
The error occurs when two threads run concurrently.
Almost assuredly, the MTL is not thread safe, and such support is
unlikely to happen in the short term. You might be better off
concentrating on the BTL, as George has done significant work on that
front.
Brian
On Jun 11, 2009, at 12:20 PM, François Trahay wrote:
The stack trace is from
Based on the stack trace, at one point (depth 4) we are in the MX MTL
and then we call free. It might happen that two threads call free
simultaneously ... It is a guess, as there is not enough information
to corroborate this.
george.
On Jun 11, 2009, at 13:17 , Scott Atchley wrote:
Br
On Jun 11, 2009, at 2:20 PM, François Trahay wrote:
The stack trace is from the MX MTL (I attach the backtraces I get
with both MX MTL and MX BTL)
Here is the program that I use. It is quite simple. It runs ping
pongs concurrently (with one thread per node, then with two threads
per node, e
Oops. Here's the trace using the BTL.
Francois
Scott Atchley wrote:
By specifying --mca pml cm, both traces are using the MTL. To use the
BTL, try:
$ mpiexec --mca btl mx,sm,self -machinefile ./joe -np 2 ./concurrent_ping
or simply:
$ mpiexec -machinefile ./joe -np 2 ./concurrent_ping
Scot
Hi,
I'm developing under OSX 10.5.7 with Open-MPI 1.3.2 and am running
into intermittent corruption when sending / receiving a user-defined datatype.
When running with fewer than four processes (i.e. mpirun -np [2,3]),
the data is fine; when running with 4 or more, the received data is
intermittent
I will take a look at the BTL problem. Can you provide a copy of the
benchmarks please.
Thanks,
george.
On Jun 11, 2009, at 16:05 , François Trahay wrote:
concurrent_ping
Hello,
I'm attempting to wrap my brain around the MPI I/O mechanisms, and I was
hoping to find some guidance. I'm trying to read a file that contains a
117-character string, followed by a series of records that contain integers and
reals. The following code would read it in serial:
---
character(l
Did you try to follow the advice on the LAPACK mailing list, i.e.
upgrade your compiler from the Mac OS X default (4.0.1) to 4.3.0?
Btw, what is the test you're running? Can you create a small test case
so I can try to reproduce it?
Thanks,
george.
On Jun 11, 2009, at 17:02 , Nick Colli