be surprised if that's the reason
why it isn't working.
MVAPICH2's uDAPL support requires uDAPL 1.2 or greater, so I wouldn't be surprised if the
story is similar for Open MPI.
I'd add that it may be useful to others if the documentation or FAQ mentioned
which version(s) of uDAPL work with Open MPI.
--
Troy Telford
On Monday 22 October 2007, Troy Telford wrote:
> WARNING: Failed to open "ib0"
Whoops; I typed in the wrong text here. The failure was actually "Failed to
open 'InfiniHost0'" - i.e., the name listed in the warning matches the name
in /etc/dat.conf.
--
Troy Telford
specify a DAT provider, and when I play with the provider
name in /etc/dat.conf, Open MPI seems aware of the name change; it will
list 'failed to open "newname"'.
My /etc/dat.conf looks like this:
InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " "
Any ideas on why I'm not able to get Open MPI to use uDAPL?
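(For reference, the sort of invocation I'd expect to force the uDAPL BTL --
assuming Open MPI was actually built with uDAPL support -- is something like:
mpirun --mca btl udapl,self -np 2 ./IMB-MPI1
with the binary name being just a placeholder.)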
--
Troy Telford
dbit that is probably insignificant, but I'll mention anyway: We
are running IBM's GPFS via IPoIB, so there is a little bit of IB traffic from
GPFS - which is also a configuration we've used with no problems in the past.
Any ideas on what I can do to verify that OpenMPI is in fact using the IB
fabric?
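(One sanity check that comes to mind, assuming the fabric is driven by the
openib BTL: run a benchmark with TCP excluded, e.g.
mpirun --mca btl openib,self -np 2 ./IMB-MPI1 PingPong
and compare against a run with --mca btl tcp,self; if the IB fabric is really
being used, the latency and bandwidth should be dramatically better than the
TCP numbers.)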
--
Troy Telford
n
rank
by slot or node
This is good to know; for some reason it seemed logical that the batch
scheduler would know how many processes to run per node, and that TM would be
able to pass that information along. But that's making assumptions...
Thanks!
--
Troy Telford
r some reason, I
recall that OMPI used to be able to get the number of processes to run
from PBS; am I just 'remembering' something that never existed?
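(Concretely, what I'm thinking of is something like the following, assuming a
TM-enabled build and a PBS allocation of nodes=2:ppn=2:
qsub -I -l nodes=2:ppn=2
mpirun ./a.out
i.e. mpirun with no -np at all, launching one process per slot PBS handed out.
Whether that ever actually worked is exactly my question.)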
--
Troy Telford
all this being one of the
errors I've seen).
Also, is there any chance that the error can be caused by mismatched
libraries (from a different compile of Open MPI?)
(And I apologize for firing off this without knowing more; I'm still
gathering data as I learn more...)
--
Troy Telford
On Tue, 31 Oct 2006 08:43:10 -0700, Galen M. Shipman
wrote:
Okay, so those are percentages, not a modulus; the formula makes some sense
now...
So the timeout is between 4.9 and 10.3 ms; you had better plug the cable
in/out very quickly.
The Flash could do it.
--
Troy Telford
e
(and/or had enough information) to figure out that it can't continue at
all, and will abort the job.
--
Troy Telford
I'll take a deeper look, and can provide things like
the config.log, etc. I just don't want to flood the list at the moment.)
--
Troy Telford
, the thought occurs (and it may just be my ignorance of MPI):
After a network connection times out (as was apparently the case with IB),
is the job salvageable? If the job is not salvageable, why didn't Open
MPI abort the job (and clean up the running processes on the nodes)?
--
Troy Telford
On Tue, 03 Oct 2006 11:48:14 -0600, Janet Tvedt wrote:
I was curious if there is a list showing which InfiniBand HCAs are known
to work with Open MPI.
I can't claim to know which ones are *known* to work, but I've never seen
an IB HCA that didn't work with Open MPI.
That being said, here'
I've never set up dapl before, however I now have a reason to try...
The problem is, I can't seem to find any documentation on how to set it
up. I've tried the sample /etc/dat.conf (modified for the IPoIB address
on the system), but I'm not sure I've been successful.
I've:
* compiled from O
l Message-
From: users-boun...@open-mpi.org
[mailto:users-boun...@open-mpi.org] On Behalf Of Troy Telford
Sent: Friday, June 02, 2006 12:46 PM
To: Open MPI Users
Subject: Re: [OMPI users] openib /compiler issue?
On Thu, 01 Jun 2006 17:49:53 -0600, Troy Telford
wrote:
> the 'com
On Thu, 01 Jun 2006 17:49:53 -0600, Troy Telford
wrote:
the 'com' test ends with:
[n1:04941] *** An error occurred in MPI_Gather
[n1:04941] *** on communicator MPI_COMM_WORLD
[n1:04941] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:04941] *** MPI_ERRORS_ARE_FATAL (goo
The kernel being used is 2.6.16 -- so it's unlikely that the kernel is too
old. But it may not be explicitly enabled, etc...
--
Troy Telford
On Fri, 02 Jun 2006 09:15:06 -0600, Troy Telford
wrote:
Can you confirm that your Linux installation thinks that it has 4
processors and will schedule 4 processes simultaneously?
D'oh. Still too early in the morning...
OK, Linux thinks it has two CPUs. Period.
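(The quick check, for what it's worth, is just:
grep -c ^processor /proc/cpuinfo
which counts the processors the kernel has brought up.)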
For some reason I f
'll have to see if the system
behaves similarly with non-mpi processes (ie. it doesn't use all of the
available cores). It may very well be a problem with the hardware or OS;
it's the pre-release distro I wrote about in another posting yesterday...
I'm wondering if there is something happening behind the scenes... I'll
have to check...
--
Troy Telford
et #41).
And yes, I'm going to try out the dev snapshots of 1.0.3 and 1.1... I'm
just not there yet...
(For those tracking tickets #40 and #41 -- I know it would be nice to see
if distro X has the same behavior I see with FC4, but I don't have the
hardware to do any sort of scale testing with distro X.)
--
Troy Telford
ots=4. Except 'slots=4' makes it run
a few orders of magnitude slower.
Thoughts?
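(For context, the hostfile entries in question are of the usual form, e.g.
n1 slots=4
n2 slots=4
with n1/n2 as placeholder hostnames.)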
--
Troy Telford
led with errno=113
[0,1,3][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=113
If I use -np 2 (i.e. the job doesn't leave the node, it being a dual-CPU
machine), it works fine. (For reference, errno 113 on Linux is EHOSTUNREACH,
"No route to host.")
--
Troy Telford
status 5 for wr_id 8030232 opcode 0
[0,1,144][btl_openib_component.c:782:mca_btl_openib_component_progress]
error polling HP CQ with status 5 for wr_id 8042822 opcode 0
[0,1,144][btl_openib_component.c:782:mca_btl_openib_component_progress]
error polling HP CQ with status 5 for wr_id 8055412 opcode 0
--
Troy Telford
ssors)
PCI Express IB HCA's
Myrinet 10G (MX10G)
Gigabit Ethernet
configured and built with (both) GCC 3.4 and 4.0 -- it didn't seem to make
much difference.
/configure --enable-cxx-exceptions
(Note, I use LDFLAGS and CFLAGS to point to the MX & InfiniBand headers.)
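Roughly, the configure invocation looks like the following; the include/lib
paths are placeholders for wherever MX and the IB stack are actually installed:
./configure --enable-cxx-exceptions CFLAGS="-I/opt/mx/include -I/opt/ib/include" LDFLAGS="-L/opt/mx/lib64 -L/opt/ib/lib64"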
--
Troy Tel
ssage ***
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x6
*** End of error message ***
4 additional processes aborted (not shown)
Any Thoughts/Ideas on how to fix it?
--
Troy Telford
was no subnet manager on the IB fabric (which may well have been the
case, actually). It's working now, though...
--
Troy Telford
\
--enable-mpi-threads \
--enable-progress-threads \
--with-threads \
--enable-static \
--enable-shared \
--enable-cxx-exceptions
(Note that I'm not disabling ROMIO)
But I can compile it fine with:
icc (ICC) 9.0 20060222
ifort (IFORT) 9.0 20060222
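For completeness, the compilers are passed in as the usual configure variables
(icpc for C++ is an assumption on my part; I only listed the C and Fortran
compilers above):
./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-mpi-threads --enable-progress-threads --with-threads --enable-static --enable-shared --enable-cxx-exceptions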
--
Troy Telford
or hopefully anybody
else's) part.
On Wed, 12 Apr 2006 10:56:24 -0600, Troy Telford
wrote:
On Wed, 12 Apr 2006 10:04:18 -0600, Brian Barrett
wrote:
We've tested against the SilverStorm drivers for OS X with success,
but I don't think anyone has tried the Linux drivers.
m' has different size in shared object,
consider re-linking
IMB-MPI1.ss: Symbol `ompi_mpi_float' has different size in shared object,
consider re-linking
IMB-MPI1.ss: Symbol `ompi_mpi_comm_world' has different size in shared
object, consider re-linking
IMB-MPI1.ss: Symbol `ompi_mpi_double' has different size in shared object,
consider re-linking
IMB-MPI1.ss: Symbol `ompi_mpi_op_null' has different size in shared
object, consider re-linking
IMB-MPI1.ss: Symbol `ompi_mpi_comm_self' has different size in shared
object, consider re-linking
Signal:11 info.si_errno:0(Success) si_code:2(SEGV_ACCERR)
Failing at addr:0x2a99610600
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xa8
--
Troy Telford
ny
development, but if there is a near-zero-effort method somebody knows of
to get it to work, I'd be interested. Beyond that... well, the 2nd-gen
OpenIB.org drivers are on the horizon...
--
Troy Telford
On Tue, 11 Apr 2006 13:48:43 -0600, Troy Telford
wrote:
I have compiled Open MPI (on an Opteron) with the Intel 9 EM64T
compilers;
It's been a while since I've used the 8.1 series, but I'll give it a shot
with Intel 8.1 and tell y
On Tue, 11 Apr 2006 13:19:43 -0600, Hugh Merz
wrote:
I couldn't find any other threads in the mailing list concerning usage
of the Intel EM64T compilers - has anyone successfully compiled OpenMPI
using this combination? It also occurs on the Athlon 64 processor.
Logs attached.
Thanks
confusion...
Whenever --enable-X is used to DISable something, it's bound to cause some
head scratching.
default:enabled -- does this mean that the option which /dis/ables the SMP
locks is the default, or does it mean that SMP locks are enabled by
default?
--
Troy Telford
user in the same situation goes
searching through the archives...) Well, that and I need to subscribe to
-dev.
--
Troy Telford
the info; I guess when PBS Pro is there, there's only
one option...
--
Troy Telford
mentation
from the x86-64 project:
http://www.x86-64.org/lists/discuss/msg05760.html (basically, a plea to
suppliers of static libraries to compile with -fPIC on x86-64).
Sigh...
--
Troy Telford
I have added max_btls to the openib component on the trunk, try:
mpirun --mca btl_openib_max_btls 1 ...etc
I don't have a dual-NIC machine handy to test on; if this checks out, we
can patch the release branch.
Thanks,
Galen
I'll get to it as soon as I can; but it may be a few day
/need/ for this.
--
Troy Telford
On Mar 9, 2006, at 9:18 PM, Brian Barrett wrote:
On Mar 9, 2006, at 6:41 PM, Troy Telford wrote:
I've got a machine that has the following config:
Each node has two InfiniBand ports:
* The first port is on fabric 'a' with switches for 'a'
* The second port is on
bgm.so' to ensure one of 'em
is 64-bit, and that the 64-bit library is in a path where the linker can
find it (ld.so.conf or LD_LIBRARY_PATH).
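(What I mean concretely is something like:
file /opt/gm/lib64/libgm.so
export LD_LIBRARY_PATH=/opt/gm/lib64:$LD_LIBRARY_PATH
assuming the library in question is Myrinet's libgm.so, and with /opt/gm/lib64
as a placeholder for wherever the 64-bit copy actually lives.)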
--
Troy Telford
first IB port (ie. fabric 'a'), and leaves the second IB port (ie.
fabric 'b') free for other uses (I'll use NFS as a humorous example).
If so, is there any magic required to configure it thusly?
Troy
Telford
.0 to test with.
No. I refuse :p
Attached is a tar.bz2 with the config.log and the output of 'make'.
I wouldn't be surprised if it's just a problem with the way I have PGI 6.1 set
up; I just haven't had time to investigate it yet.
--
Troy Telford
PGI6.1_problem.tar.bz2
Description: application/bzip2
to it than that, but most of the differences
have to do with the installation prefix, for package management purposes)
That being said, I have been unable to get OpenMPI to compile with PGI 6.1
(but it does finish ./configure; it breaks during 'make').
--
Troy Telford
On Mon, 28 Nov 2005 03:05:05 -0700, Dries Kimpe
wrote:
Hi,
is somebody here building OpenMPI (svn trunk) with PathScale compilers?
I've been building OpenMPI with PathScale 2.2.1 with no issues. It would
be more helpful if you had attached the configure.log as directed in the
maili
On Mon, 21 Nov 2005 06:00:05 -0700, Jeff Squyres
wrote:
Although George fixed the MX-abort error, let me clarify the rationale
here...
You are correct that at run-time, OMPI tries to load and run every
component that it finds. So if you have BTL components built for all
interconnects, OMPI w
I wouldn't be surprised if this is simply an issue of configuration:
In my test cluster, I've got Myrinet, InfiniBand, and Gigabit Ethernet
support.
My understanding is that when you use 'mpirun' without specifying an MCA
(including systemwide and/or user configurations in ~/.openmpi), Open
On Wed, 16 Nov 2005 14:16:20 -0700, Enrique Curchitser
wrote:
Hi,
I put together a small cluster (4 computers) which has one head node
that sees the world
and 3 that are on a private network. If I want to use the head node
(which has 2 NICs)
as part of the ring, how do I tell it to go over th
On Mon, 14 Nov 2005 17:28:15 -0700, Troy Telford
wrote:
I've just finished a build of RC7, so I'll go give that a whirl and
report.
RC7:
With *both* mvapi and openib, I receive the following when using IMB-MPI1:
***mvapi***
[0,1,3][btl_mvapi_compo
On Mon, 14 Nov 2005 10:38:03 -0700, Troy Telford
wrote:
My mvapi config is using the Mellanox IB Gold 1.8 IB software release.
Kernel 2.6.5-7.201 (SLES 9 SP2)
When I ran IMB using mvapi, I received the following error:
***
[0,1,2][btl_mvapi_component.c:637:mca_btl_mvapi_component_progress
Thus far, it appears that moving to MX 1.1.0 didn't change the error
message I've been getting about parts being 'not implemented.'
I also re-provisioned four of the IB nodes (leaving me with 3 four-node
clusters: One using mvapi, one using openib, and one using myrinet)
My mvapi config is
On Sun, 13 Nov 2005 17:53:40 -0700, Jeff Squyres
wrote:
I can't believe I missed that, sorry. :-(
None of the btl's are capable of doing loopback communication except
"self." Hence, you really can't run "--mca btl foo" if your app ever
sends to itself -- you really need to run "--mca btl f
We have very limited openib resources for testing at the moment. Can
you provide details on how to reproduce?
My bad; I must've been in a bigger hurry to go home for the weekend
than I thought.
I'm going to start with the assumption you're interested in the steps
to reproduce it in OpenMPI
On Fri, 11 Nov 2005 13:12:13 -0700, Jeff Squyres
wrote:
At long last, 1.0rc5 is available for download. It fixes all known
issues reported here on the mailing list. We still have a few minor
issues to work out, but things appear to generally be working now.
Please try to break it:
On Wed, 09 Nov 2005 08:44:50 -0700, Galen M. Shipman
wrote:
This error is occurring when Open MPI attempts to open the Infiniband
device mthca0. This doesn't appear to be an Open MPI issue; it looks
like a configuration issue with OpenIB. What do you find under
/sys/class/infiniband/ ?
Und
I decided to try OpenMPI using the 'openib' module, rather than 'mvapi';
however I'm having a bit of difficulty:
The test hardware is the same as in my earlier posts, the only software
difference is:
Linux 2.6.14 (OpenIB 2nd gen IB drivers)
OpenIB userspace tools (svn from openib.org)
OpenM
On Fri, 04 Nov 2005 16:45:59 -0700, Troy Telford
wrote:
the 'globalop' test was a dog on 4 nodes (some 360-odd times slower on
mvapi than on mx); it'll take a while to verify whether it tickles the
65-process issue or not.
Globalop runs fine on 100 pro
(Using svn 'trunk' revision 7927 of OpenMPI):
I've found an interesting issue with OpenMPI and the mvapi btl mca: Most
of the benchmarks I've tried (HPL, HPCC, Presta, IMB), do not seem to run
properly when the number of processes is sufficiently large (the barrier
seems to be at 65 proces
On Mon, 31 Oct 2005 20:33:06 -0700, Jeff Squyres
wrote:
On Oct 28, 2005, at 3:08 PM, Jeff Squyres wrote:
1. I'm concerned about the MPI_Reduce error -- that one shouldn't be
happening at all. We have table lookups for the MPI_Op/MPI_Datatype
combinations that are supposed to work; the fact
a whirl?
Sure, I'll give it a whirl.
Just out of curiosity -- do you test OpenMPI for memory leaks using
Valgrind (or similar)?
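(What I have in mind is nothing fancier than running each rank under valgrind,
e.g.
mpirun -np 2 valgrind --leak-check=full ./a.out
with ./a.out as a stand-in for whatever test program.)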
--
Troy Telford
I've been running a number of benchmarks & tests with OpenMPI 1.0rc4.
I've run into a few issues that I believe are related to OpenMPI; if they
aren't, I'd appreciate the education. :)
The attached tarball does not have the MPICH variant results (the tarball
is 87 kb as it is)
I can run
Thanks; this workaround does allow it to complete its run.
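(For anyone who finds this in the archives: the define presumably goes on the
HPL_OPTS line of HPL's Make.<arch> file, i.e. something like
HPL_OPTS = -DHPL_NO_MPI_DATATYPE
followed by a rebuild of xhpl.)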
On Tue, 25 Oct 2005 10:19:54 -0600, Galen M. Shipman
wrote:
Correction: HPL_NO_DATATYPE should be: HPL_NO_MPI_DATATYPE.
- Galen
On Oct 25, 2005, at 10:13 AM, Galen M. Shipman wrote:
Hi Troy,
Sorry for the delay, I am now able t
I've been trying out the RC4 builds of OpenMPI; I've been using Myrinet
(gm), Infiniband (mvapi), and TCP.
When running a benchmark such as IMB (formerly PALLAS, IIRC), or even a
simple hello world, there are no problems.
However, when running HPL (and HPCC, which is a superset of HPL), I h