Re: [OMPI users] OMPI & uDAPL

2007-10-23 Thread Troy Telford
be surprised if that's the reason why it isn't working. MVAPICH2's uDAPL support is 1.2 or greater, so I wouldn't be surprised if the story is similar for Open MPI. I'd add that it may be useful to others to mention what version(s) of uDAPL work with Open MPI in the documentation or FAQ. -- Troy Telford

Re: [OMPI users] OMPI & uDAPL

2007-10-22 Thread Troy Telford
On Monday 22 October 2007, Troy Telford wrote: > WARNING: Failed to open "ib0" Whoops; I typed in the wrong text here. The failure was 'Failed to open "InfiniHost0"' - i.e. the name listed in the warning matches the name in /etc/dat.conf. -- Troy Telford

[OMPI users] OMPI & uDAPL

2007-10-22 Thread Troy Telford
specify a DAT provider, and I play with the name in /etc/dat.conf, Open MPI seems aware of the name change; it will list 'failed to open "newname"'. My /etc/dat.conf looks like this: InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " " Any ideas on why I'm not able to get Open MPI to use uDAPL? -- Troy Telford
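A quick way to sanity-check a dat.conf entry like the one quoted above is to split it into its fields and look at each one. The sketch below is a minimal illustration, assuming the usual eight-column layout of a DAT registry line; the field names in `DatEntry` and the helper `parse_dat_line` are hypothetical labels chosen for readability, not anything defined by Open MPI or uDAPL.

```python
import shlex
from typing import NamedTuple


class DatEntry(NamedTuple):
    # Field names are illustrative assumptions about the eight columns.
    ia_name: str
    api_version: str
    thread_safety: str
    default: str
    lib_path: str
    provider_version: str
    ia_params: str
    platform_params: str


def parse_dat_line(line: str) -> DatEntry:
    """Split one dat.conf line into fields, honouring quoted strings."""
    fields = shlex.split(line, comments=True)
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {fields!r}")
    return DatEntry(*fields)


# The entry quoted in the post.
entry = parse_dat_line(
    'InfiniHost0 u1.1 nonthreadsafe default /usr/lib64/libdapl.so ri.1.1 " " " "'
)
print(entry.ia_name, entry.api_version, entry.lib_path)
```

With the fields separated, it is easier to compare the interface name against the one in the warning and to check by hand that the library path exists on the node.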

[OMPI users] Bad performance - OpenIB 1.2.3

2007-09-20 Thread Troy Telford
dbit that is probably insignificant, but I'll mention anyway: We are running IBM's GPFS via IPoIB, so there is a little bit of IB traffic from GPFS - which is also a configuration we've used with no problems in the past. Any ideas on what I can do to verify that OpenMPI is in fact using the IB fabric? -- Troy Telford

Re: [OMPI users] Open MPI and PBS Pro 8

2007-02-13 Thread Troy Telford
n rank by slot or node This is good to know; for some reason it seemed logical that the batch scheduler should know how many processes per node, and TM should be able to get the information. But that's making assumptions... Thanks! -- Troy Telford

[OMPI users] Open MPI and PBS Pro 8

2007-02-13 Thread Troy Telford
r some reason, I recall that OMPI used to be able to get the number of processes to run from PBS; am I just 'remembering' something that never existed? -- Troy Telford

[OMPI users] Open MPI/OpenIB Error/Problem

2007-02-08 Thread Troy Telford
all this being one of the errors I've seen). Also, is there any chance that the error can be caused by mismatched libraries (from a different compile of Open MPI?) (And I apologize for firing off this without knowing more; I'm still gathering data as I learn more...) -- Troy Telford

Re: [OMPI users] Fault Tolerance & Behavior

2006-10-31 Thread Troy Telford
On Tue, 31 Oct 2006 08:43:10 -0700, Galen M. Shipman wrote: Okay, so these are percentage not modulus, the formula makes some sense now.. so the timeout is between 4.9 and 10.3 ms, you had better plug the cable in/out very quickly. The Flash could do it. -- Troy Telford

Re: [OMPI users] Fault Tolerance & Behavior

2006-10-30 Thread Troy Telford
e (and/or had enough information) to figure out that it can't continue at all, and will abort the job. -- Troy Telford

Re: [OMPI users] Fault Tolerance & Behavior

2006-10-26 Thread Troy Telford
I'll take a deeper look, and can provide things like the config.log, etc. I just don't want to flood the list at the moment.) -- Troy Telford

[OMPI users] Fault Tolerance & Behavior

2006-10-26 Thread Troy Telford
, the thought occurs (and it may just be my ignorance of MPI): After a network connection times out (as was apparently the case with IB), is the job salvageable? If the jobs are not salvageable, why didn't Open MPI abort the job (and clean up the running processes on the nodes)? -- Troy Telford

Re: [OMPI users] IB HCA support

2006-10-03 Thread Troy Telford
On Tue, 03 Oct 2006 11:48:14 -0600, Janet Tvedt wrote: I was curious if there is a list showing which InfiniBand HCAs are known to work with Open MPI. I can't claim to know which ones are *known* to work, but I've never seen an IB HCA that didn't work with Open MPI. That being said, here'

[OMPI users] DAPL setup/config help

2006-09-26 Thread Troy Telford
I've never set up dapl before, however I now have a reason to try... The problem is, I can't seem to find any documentation on how to set it up. I've tried the sample /etc/dat.conf (modified for the IPoIB address on the system), but I'm not sure I've been successful. I've: * compiled from O

Re: [OMPI users] openib /compiler issue?

2006-06-02 Thread Troy Telford
l Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Troy Telford Sent: Friday, June 02, 2006 12:46 PM To: Open MPI Users Subject: Re: [OMPI users] openib /compiler issue? On Thu, 01 Jun 2006 17:49:53 -0600, Troy Telford wrote: > the 'com'

Re: [OMPI users] openib /compiler issue?

2006-06-02 Thread Troy Telford
On Thu, 01 Jun 2006 17:49:53 -0600, Troy Telford wrote: the 'com' test ends with: [n1:04941] *** An error occurred in MPI_Gather [n1:04941] *** on communicator MPI_COMM_WORLD [n1:04941] *** MPI_ERR_ARG: invalid argument of some other kind [n1:04941] *** MPI_ERRORS_ARE_FATAL (goo

Re: [OMPI users] Open MPI and Dual Core (machinefile)

2006-06-02 Thread Troy Telford
The kernel being used is 2.6.16 -- so it's unlikely that the kernel is too old. But it may not be explicitly enabled, etc... -- Troy Telford

Re: [OMPI users] Open MPI and Dual Core (machinefile)

2006-06-02 Thread Troy Telford
On Fri, 02 Jun 2006 09:15:06 -0600, Troy Telford wrote: Can you confirm that your Linux installation thinks that it has 4 processors and will schedule 4 processes simultaneously? D'oh. Still too early in the morning... OK, Linux thinks it has two CPUs. Period. For some reason I f
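One way to answer the question of how many processors Linux thinks it has is to count the `processor` entries in /proc/cpuinfo. The sketch below assumes the x86 Linux layout of that file; the helper name `count_cpuinfo_processors` and the sample text are hypothetical and only there for illustration.

```python
import os


def count_cpuinfo_processors(cpuinfo_text: str) -> int:
    """Count 'processor : N' entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if key == "processor":
            count += 1
    return count


# Made-up sample in the x86 /proc/cpuinfo layout (two logical CPUs).
sample = (
    "processor\t: 0\n"
    "model name\t: Example CPU\n"
    "\n"
    "processor\t: 1\n"
    "model name\t: Example CPU\n"
)
print(count_cpuinfo_processors(sample))

# For comparison, the count Python reports for the machine running this.
print(os.cpu_count())
```

On a real node, the text would come from reading /proc/cpuinfo; if the count is lower than the number of cores installed, the kernel is not exposing them all.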

Re: [OMPI users] Open MPI and Dual Core (machinefile)

2006-06-02 Thread Troy Telford
'll have to see if the system behaves similarly with non-mpi processes (ie. it doesn't use all of the available cores). It may very well be a problem with the hardware or OS; it's the pre-release distro I wrote about in another posting yesterday... I'm wondering if there is something happening behind the scenes... I'll have to check... -- Troy Telford

[OMPI users] openib /compiler issue?

2006-06-01 Thread Troy Telford
et #41). And yes, I'm going to try out the dev snapshots of 1.0.3 and 1.1... I'm just not there yet... (For those tracking tickets #40 and #41 -- I know it would be nice to see if distro X has the same behavior I see with FC4, but I don't have the hardware to do any sort of scale testing with distro X.) -- Troy Telford

[OMPI users] Open MPI and Dual Core (machinefile)

2006-06-01 Thread Troy Telford
ots=4. Except 'slots=4' makes it run a few orders of magnitude slower. Thoughts? -- Troy Telford

Re: [OMPI users] Open MPI 1.0.2 and np >=64

2006-06-01 Thread Troy Telford
led with errno=113 [0,1,3][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113 If I use -np 2 (ie. the job doesn't leave the node, it being a dual-cpu machine), it works fine. -- Troy Telford
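The number in `connect() failed with errno=113` can be turned into a symbolic name and message with the standard library. This is a small sketch; errno values are platform specific, and on Linux 113 is typically EHOSTUNREACH ("No route to host"), so it only gives a meaningful answer when run on the same kind of system that produced the log. The helper name `describe_errno` is hypothetical.

```python
import errno
import os
from typing import Tuple


def describe_errno(code: int) -> Tuple[str, str]:
    """Return (symbolic name, human-readable message) for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_errno(113)
print(f"errno=113: {name} ({message})")
```

If the result is a routing error, the TCP BTL reached for an address that the node could not route to, which is worth checking against the interfaces configured on each host.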

Re: [OMPI users] Open MPI 1.0.2 and np >=64

2006-06-01 Thread Troy Telford
status 5 for wr_id 8030232 opcode 0 [0,1,144][btl_openib_component.c:782:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 8042822 opcode 0 [0,1,144][btl_openib_component.c:782:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 8055412 opcode 0 -- Troy Telford
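When the same completion-queue error repeats many times, it can help to pull the fields out of each message and tally them per process. The sketch below is a hypothetical helper built around the message layout quoted above; the regular expression and the name `parse_cq_errors` are assumptions for illustration, not part of Open MPI.

```python
import re
from collections import Counter
from typing import Dict, List

# Matches messages shaped like the ones quoted in the post.
CQ_ERROR = re.compile(
    r"\[(?P<proc>[\d,]+)\]"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"error polling (?P<cq>\w+) CQ with status (?P<status>\d+) "
    r"for wr_id (?P<wr_id>\d+) opcode (?P<opcode>\d+)"
)


def parse_cq_errors(text: str) -> List[Dict[str, object]]:
    """Extract every CQ polling error found in a blob of log text."""
    records = []
    for m in CQ_ERROR.finditer(text):
        rec: Dict[str, object] = dict(m.groupdict())
        for key in ("line", "status", "wr_id", "opcode"):
            rec[key] = int(m.group(key))
        records.append(rec)
    return records


log = (
    "[0,1,144][btl_openib_component.c:782:mca_btl_openib_component_progress] "
    "error polling HP CQ with status 5 for wr_id 8042822 opcode 0 "
    "[0,1,144][btl_openib_component.c:782:mca_btl_openib_component_progress] "
    "error polling HP CQ with status 5 for wr_id 8055412 opcode 0"
)
records = parse_cq_errors(log)
per_process = Counter((r["proc"], r["status"]) for r in records)
print(per_process)
```

Grouping by process and status shows quickly whether one rank is producing all the errors or whether they are spread across the job.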

Re: [OMPI users] Open MPI 1.0.2 and np >=64

2006-05-31 Thread Troy Telford
ssors) PCI Express IB HCA's Myrinet 10G (MX10G) Gigabit Ethernet configured with built with (both) GCC 3.4 and 4.0 -- didn't seem to make much difference. /configure --enable-cxx-exceptions (Note, I use LDFLAGS and CFLAGS to point to the MX & InfiniBand headers.) -- Troy Tel

[OMPI users] Open MPI 1.0.2 and np >=64

2006-05-30 Thread Troy Telford
ssage *** Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR) Failing at addr:0x6 *** End of error message *** 4 additional processes aborted (not shown) Any Thoughts/Ideas on how to fix it? -- Troy Telford

Re: [OMPI users] SilverStorm IB

2006-04-13 Thread Troy Telford
was no subnet manager on the IB fabric (which may well have been the case, actually). It's working now, though... -- Troy Telford

Re: [OMPI users] Building 1.0.2 with Intel 9.0

2006-04-12 Thread Troy Telford
\ --enable-mpi-threads \ --enable-progress-threads \ --with-threads \ --enable-static \ --enable-shared \ --enable-cxx-exceptions (Note that I'm not disabling ROMIO) But I can compile it fine with: icc (ICC) 9.0 20060222 ifort (IFORT) 9.0 20060222 -- Troy Telford

Re: [OMPI users] SilverStorm IB

2006-04-12 Thread Troy Telford
or hopefully anybody else's) part. On Wed, 12 Apr 2006 10:56:24 -0600, Troy Telford wrote: On Wed, 12 Apr 2006 10:04:18 -0600, Brian Barrett wrote: We've tested against the SilverStorm drivers for OS X with success, but I don't think anyone has tried the Linux drivers.

Re: [OMPI users] SilverStorm IB

2006-04-12 Thread Troy Telford
m' has different size in shared object, consider re-linking IMB-MPI1.ss: Symbol `ompi_mpi_float' has different size in shared object, consider re-linking IMB-MPI1.ss: Symbol `ompi_mpi_comm_world' has different size in shared object, consider re-linking IMB-MPI1.ss: Symbol `ompi_mpi_double' has different size in shared object, consider re-linking IMB-MPI1.ss: Symbol `ompi_mpi_op_null' has different size in shared object, consider re-linking IMB-MPI1.ss: Symbol `ompi_mpi_comm_self' has different size in shared object, consider re-linking Signal:11 info.si_errno:0(Success) si_code:2(SEGV_ACCERR) Failing at addr:0x2a99610600 Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR) Failing at addr:0xa8 -- Troy Telford

[OMPI users] SilverStorm IB

2006-04-12 Thread Troy Telford
ny development; but if there is a near zero effort method somebody knows of to get it to work, I'd be interested. Beyond that... well, the 2nd gen OpenIB.org drivers are on the horizon... -- Troy Telford

Re: [OMPI users] Intel EM64T Compiler error on Opteron

2006-04-11 Thread Troy Telford
On Tue, 11 Apr 2006 13:48:43 -0600, Troy Telford wrote: I have compiled Open MPI (on an Opteron) with the Intel 9 EM64T compilers; It's been a while since I've used the 8.1 series, but I'll give it a shot with Intel 8.1 and tell y

Re: [OMPI users] Intel EM64T Compiler error on Opteron

2006-04-11 Thread Troy Telford
On Tue, 11 Apr 2006 13:19:43 -0600, Hugh Merz wrote: I couldn't find any other threads in the mailing list concerning usage of the Intel EM64T compilers - has anyone successfully compiled OpenMPI using this combination? It also occurs on the Athlon 64 processor. Logs attached. Thanks

[OMPI users] Funny ./configure option

2006-04-10 Thread Troy Telford
confusion... Whenever --enable-X is used to DISable something, it's bound to cause some head scratching. default:enabled -- does this mean the option which /dis/ables the SMP locks is the default, or does it mean that SMP locks are enabled by default? -- Troy Telford

Re: [OMPI users] PBS Professional

2006-03-24 Thread Troy Telford
user in my same situation goes searching through the archives...) Well, that and I need to subscribe to -dev. -- Troy Telford

Re: [OMPI users] PBS Professional

2006-03-24 Thread Troy Telford
the info; I guess when PBS Pro is there, there's only one option... -- Troy Telford

[OMPI users] PBS Professional

2006-03-24 Thread Troy Telford
mentation from the x86-64 project: http://www.x86-64.org/lists/discuss/msg05760.html (Basically, a plea to suppliers of static libraries to compile with -fPIC on x86-64). Sigh... -- Troy Telford

Re: [OMPI users] Open MPI and MultiRail InfiniBand

2006-03-13 Thread Troy Telford
I have added max_btls to the openib component on the trunk, try: mpirun --mca btl_openib_max_btls 1 ...etc I don't have a dual nic machine handy to test on, if this checks out we can patch the release branch. Thanks, Galen I'll get to it as soon as I can; but it may be a few day

Re: [OMPI users] Open MPI and MultiRail InfiniBand

2006-03-13 Thread Troy Telford
/need/ for this. -- Troy Telford

Re: [OMPI users] Open MPI and MultiRail InfiniBand

2006-03-10 Thread Troy Telford
On Mar 9, 2006, at 9:18 PM, Brian Barrett wrote: On Mar 9, 2006, at 6:41 PM, Troy Telford wrote: I've got a machine that has the following config: Each node has two InfiniBand ports: * The first port is on fabric 'a' with switches for 'a' * The second port is on

Re: [OMPI users] Myrinet on linux cluster

2006-03-09 Thread Troy Telford
bgm.so' to ensure one of 'em is 64-bit, and that the 64-bit library is in a path where the linker can find it (ld.so.conf or LD_LIBRARY_PATH). -- Troy Telford
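Whether a shared library is 32-bit or 64-bit can be read from the fifth byte of its ELF header (the EI_CLASS field: 1 for 32-bit, 2 for 64-bit). The sketch below is one way to check that; the function names are hypothetical, and the example feeds in synthetic header bytes rather than a real library.

```python
def elf_class(header: bytes) -> str:
    """Classify the first bytes of a file by its ELF EI_CLASS field."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")


def elf_class_of_file(path: str) -> str:
    """Read just enough of a file to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))


# Synthetic headers standing in for two builds of the same library.
print(elf_class(b"\x7fELF\x01"))
print(elf_class(b"\x7fELF\x02"))
```

Running `elf_class_of_file` over each copy of the library found on the system shows which one the 64-bit build needs on the linker's search path.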

[OMPI users] Open MPI and MultiRail InfiniBand

2006-03-09 Thread Troy Telford
first IB port (ie. fabric 'a'), and leaves the second IB port (ie. fabric 'b') free for other uses (I'll use NFS as a humorous example). If so, is there any magic required to configure it thusly? Troy Telford

Re: [OMPI users] OpenMPI 1.0.x and PGI pgf90

2006-03-03 Thread Troy Telford
.0 to test with. No. I refuse :p Attached is a tar.bz2 with the config.log and the output of 'make'. I wouldn't doubt it if it's just a problem with the way I have PGI 6.1 set up; I just haven't had time to investigate it yet. -- Troy Telford PGI6.1_problem.tar.bz2 Description: application/bzip2

Re: [OMPI users] OpenMPI 1.0.x and PGI pgf90

2006-03-01 Thread Troy Telford
to it than that, but most of the differences have to do with the installation prefix, for package management purposes) That being said, I have been unable to get OpenMPI to compile with PGI 6.1 (but it does finish ./configure; it breaks during 'make'). -- Troy Telford

Re: [O-MPI users] pathscale 2.1/2.3 build problem

2005-11-28 Thread Troy Telford
On Mon, 28 Nov 2005 03:05:05 -0700, Dries Kimpe wrote: Hi, is somebody here building OpenMPI (svn trunk) with PathScale compilers? I've been building OpenMPI with PathScale 2.2.1 with no issues. It would be more helpful if you had attached the configure.log as directed in the maili

Re: [O-MPI users] Minor issue: Failthrough of MCA components.

2005-11-21 Thread Troy Telford
On Mon, 21 Nov 2005 06:00:05 -0700, Jeff Squyres wrote: Although George fixed the MX-abort error, let me clarify the rationale here... You are correct that at run-time, OMPI tries to load and run every component that it finds. So if you have BTL components built for all interconnects, OMPI w

[O-MPI users] Minor issue: Failthrough of MCA components.

2005-11-17 Thread Troy Telford
I wouldn't be surprised if this is simply an issue of configuration: In my test cluster, I've got Myrinet, InfiniBand, and Gigabit Ethernet support. My understanding is that when you use 'mpirun' without specifying an MCA (including systemwide and/or user configurations in ~/.openmpi), Open

Re: [O-MPI users] Configuring port

2005-11-16 Thread Troy Telford
On Wed, 16 Nov 2005 14:16:20 -0700, Enrique Curchitser wrote: Hi, I put together a small cluster (4 computers) which has one head node that sees the world and 3 that are on a private network. If I want to use the head node (which has 2 NICs) as part of the ring, how do I tell it to go over th

Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Troy Telford
On Mon, 14 Nov 2005 17:28:15 -0700, Troy Telford wrote: I've just finished a build of RC7, so I'll go give that a whirl and report. RC7: With *both* mvapi and openib, I receive the following when using IMB-MPI1: ***mvapi*** [0,1,3][btl_mvapi_compo

Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Troy Telford
On Mon, 14 Nov 2005 10:38:03 -0700, Troy Telford wrote: My mvapi config is using the Mellanox IB Gold 1.8 IB software release. Kernel 2.6.5-7.201 (SLES 9 SP2) When I ran IMB using mvapi, I received the following error: *** [0,1,2][btl_mvapi_component.c:637:mca_btl_mvapi_component_progress

Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Troy Telford
Thus far, it appears that moving to MX 1.1.0 didn't change the error message I've been getting about parts being 'not implemented.' I also re-provisioned four of the IB nodes (leaving me with 3 four-node clusters: One using mvapi, one using openib, and one using myrinet) My mvapi config is

Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Troy Telford
On Sun, 13 Nov 2005 17:53:40 -0700, Jeff Squyres wrote: I can't believe I missed that, sorry. :-( None of the btl's are capable of doing loopback communication except "self." Hence, you really can't run "--mca btl foo" if your app ever sends to itself -- you really need to run "--mca btl f
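Since only the "self" BTL handles loopback, a launcher script can guard against forgetting it. The sketch below assumes the truncated advice above goes on to recommend listing "self" alongside the chosen BTL; the helper `with_self_btl`, the process count, and the `./a.out` program name are hypothetical, and leaving exclusion lists (those starting with `^`) untouched is a design assumption of this sketch.

```python
def with_self_btl(btl_list: str) -> str:
    """Return a comma-separated BTL list that includes 'self'."""
    if btl_list.startswith("^"):
        # Exclusion list: left unchanged (assumption of this sketch).
        return btl_list
    names = [n.strip() for n in btl_list.split(",") if n.strip()]
    if "self" not in names:
        names.append("self")
    return ",".join(names)


# Hypothetical launch line; only '--mca btl' and '-np' appear in the posts.
argv = ["mpirun", "--mca", "btl", with_self_btl("openib"), "-np", "4", "./a.out"]
print(" ".join(argv))
```

The helper leaves a list that already names "self" as it is, so it is safe to apply to whatever value a user passes in.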

Re: [O-MPI users] 1.0rc5 is up

2005-11-12 Thread Troy Telford
We have very limited openib resources for testing at the moment. Can you provide details on how to reproduce? My bad; I must've been in a bigger hurry to go home for the weekend than I thought. I'm going to start with the assumption you're interested in the steps to reproduce it in OpenMPI

Re: [O-MPI users] 1.0rc5 is up

2005-11-11 Thread Troy Telford
On Fri, 11 Nov 2005 13:12:13 -0700, Jeff Squyres wrote: At long last, 1.0rc5 is available for download. It fixes all known issues reported here on the mailing list. We still have a few minor issues to work out, but things appear to generally be working now. Please try to break it:

Re: [O-MPI users] OpenIB module problem/questions:

2005-11-09 Thread Troy Telford
On Wed, 09 Nov 2005 08:44:50 -0700, Galen M. Shipman wrote: This error is occurring when Open MPI attempts to open the Infiniband device mthca0. This doesn't appear to be an Open MPI issue, it looks like a configuration issue with OpenIB. What do you find under /sys/class/infiniband/ ? Und

[O-MPI users] OpenIB module problem/questions:

2005-11-08 Thread Troy Telford
I decided to try OpenMPI using the 'openib' module, rather than 'mvapi'; however I'm having a bit of difficulty: The test hardware is the same as in my earlier posts, the only software difference is: Linux 2.6.14 (OpenIB 2nd gen IB drivers) OpenIB userspace tools (svn from openib.org) OpenM

Re: [O-MPI users] OpenMPI Scaling on mvapi interface:

2005-11-04 Thread Troy Telford
On Fri, 04 Nov 2005 16:45:59 -0700, Troy Telford wrote: the 'globalop' test was a dog on 4 nodes (some odd 360 times slower on mvapi than on mx); it'll take a while to verify whether it tickles the 65-process issue or not. Globalop runs fine on 100 pro

[O-MPI users] OpenMPI Scaling on mvapi interface:

2005-11-04 Thread Troy Telford
(Using svn 'trunk' revision 7927 of OpenMPI): I've found an interesting issue with OpenMPI and the mvapi btl mca: Most of the benchmarks I've tried (HPL, HPCC, Presta, IMB), do not seem to run properly when the number of processes is sufficiently large (the barrier seems to be at 65 proces

Re: [O-MPI users] Tests and Bugs (RC4):

2005-11-01 Thread Troy Telford
On Mon, 31 Oct 2005 20:33:06 -0700, Jeff Squyres wrote: On Oct 28, 2005, at 3:08 PM, Jeff Squyres wrote: 1. I'm concerned about the MPI_Reduce error -- that one shouldn't be happening at all. We have table lookups for the MPI_Op/MPI_Datatype combinations that are supposed to work; the fact

Re: [O-MPI users] Tests and Bugs (RC4):

2005-10-28 Thread Troy Telford
a whirl? Sure, I'll give it a whirl. Just out of curiosity -- do you test OpenMPI for memory leaks using Valgrind (or similar)? -- Troy Telford

[O-MPI users] Tests and Bugs (RC4):

2005-10-27 Thread Troy Telford
I've been running a number of benchmarks & tests with OpenMPI 1.0rc4. I've run into a few issues that I believe are related to OpenMPI; if they aren't, I'd appreciate the education. :) The attached tarball does not have the MPICH variant results (the tarball is 87 kb as it is) I can run

Re: [O-MPI users] HPL & HPCC: Wedged

2005-10-25 Thread Troy Telford
Thanks; this workaround does allow it to complete its run. On Tue, 25 Oct 2005 10:19:54 -0600, Galen M. Shipman wrote: Correction: HPL_NO_DATATYPE should be: HPL_NO_MPI_DATATYPE. - Galen On Oct 25, 2005, at 10:13 AM, Galen M. Shipman wrote: Hi Troy, Sorry for the delay, I am now able t

[O-MPI users] HPL & HPCC: Wedged

2005-10-21 Thread Troy Telford
I've been trying out the RC4 builds of OpenMPI; I've been using Myrinet (gm), Infiniband (mvapi), and TCP. When running a benchmark such as IMB (formerly PALLAS, IIRC), or even a simple hello world, there are no problems. However, when running HPL (and HPCC, which is a superset of HPL), I h