Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-28 Thread Scott Atchley
On Oct 28, 2010, at 2:50 PM, Ray Muno wrote: > On 10/28/2010 01:40 PM, Scott Atchley wrote: > >> >> Does your environment have LD_LIBRARY_PATH set to point to $OMPI/lib and >> $MX/lib? Does it get set on login? Is $OMPI/bin in your PATH? >> >> Scott >
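The environment questions above can be checked mechanically. Below is a minimal Python sketch, not anything taken from the thread: the helper name `dir_in_path_var` is hypothetical, and it assumes the install prefixes are exported as the environment variables `OMPI` and `MX`, mirroring the `$OMPI` and `$MX` placeholders in the message.

```python
import os

def dir_in_path_var(var_value, directory):
    """Return True if `directory` is an entry of a colon-separated path list."""
    want = os.path.normpath(directory)
    entries = [p for p in var_value.split(":") if p]
    return any(os.path.normpath(p) == want for p in entries)

if __name__ == "__main__":
    # Assumption: OMPI and MX hold the install prefixes (hypothetical variable names).
    ompi = os.environ.get("OMPI", "")
    mx = os.environ.get("MX", "")
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    path = os.environ.get("PATH", "")

    checks = [
        ("$OMPI/lib in LD_LIBRARY_PATH", ld_path, os.path.join(ompi, "lib")),
        ("$MX/lib in LD_LIBRARY_PATH", ld_path, os.path.join(mx, "lib")),
        ("$OMPI/bin in PATH", path, os.path.join(ompi, "bin")),
    ]
    for label, value, directory in checks:
        print(label, "->", dir_in_path_var(value, directory))
```

Running the script both in an interactive login and through the remote launcher (for example via a non-interactive ssh command) would show whether the settings really are applied at login on every node.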

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-28 Thread Scott Atchley
On Oct 28, 2010, at 2:18 PM, Ray Muno wrote: > On 10/22/2010 07:36 AM, Scott Atchley wrote: >> Ray, >> >> Looking back at your original message, you say that it works if you use the >> Myricom supplied mpirun from the Myrinet roll. I wonder if this is a >>

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-22 Thread Scott Atchley
I am wondering if ldd finds the libraries from your compile or the Myrinet roll. Scott On Oct 21, 2010, at 10:39 AM, Raymond Muno wrote: > On 10/20/2010 8:30 PM, Scott Atchley wrote: >> We have fixed this bug in the most recent 1.4.x and 1.5.x releases. >> >> Scott > OK, a few

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-22 Thread Scott Atchley
On Oct 20, 2010, at 9:43 PM, Raymond Muno wrote: > On 10/20/2010 8:30 PM, Scott Atchley wrote >> Are you building OMPI with support for both MX and IB? If not and you only >> want MX support, try configuring OMPI using --disable-memory-manager (check >> configure

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Scott Atchley
On Oct 20, 2010, at 9:22 PM, Raymond Muno wrote: > On 10/20/2010 7:59 PM, Ralph Castain wrote: >> The error message seems to imply that mpirun itself didn't segfault, but >> that something else did. Is that segfault pid from mpirun? >> >> This kind of problem usually is caused by mismatched buil

Re: [OMPI users] MPICH2 is working OpenMPI Not

2010-07-19 Thread Scott Atchley
Hi Bibrak,
The message about malloc looks like an MX message. Which interconnects did you compile support for? If you are using MX, does it appear when you run with:
$ mpirun --mca pml cm -np 4 ./exec 98
which uses the MX MTL instead of the MX BTL.
Scott
On Jul 18, 2010, at 9:23 AM, Bibrak Qamar

Re: [OMPI users] error in (Open MPI) 1.3.3r21324-ct8.2-b09b-r31

2010-07-15 Thread Scott Atchley
Lydia, Which interconnect is this running over? Scott On Jul 15, 2010, at 5:19 AM, Lydia Heck wrote: > We are running Sun's build of Open Mpi 1.3.3r21324-ct8.2-b09b-r31 > (HPC8.2) and one code that runs perfectly fine under > HPC8.1 (Open MPI) 1.3r19845-ct8.1-b06b-r21 and before fails with >

Re: [OMPI users] Unable to connect to a server using MX MTL with TCP

2010-06-05 Thread Scott Atchley
On Jun 4, 2010, at 7:18 PM, Audet, Martin wrote: > Hi OpenMPI_Users and OpenMPI_Developers, > > I'm unable to connect a client application using MPI_Comm_connect() to a > server job (the server job calls MPI_Open_port() before calling by > MPI_Comm_accept()) when the server job uses MX MTL (alt

Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.

2010-06-03 Thread Scott Atchley
On Jun 3, 2010, at 8:54 AM, guillaume ranquet wrote: > granquet@bordeplage-15 ~ $ mpirun --mca btl mx,openib,sm,self --mca pml > ^cm --mca mpi_leave_pinned 0 ~/bwlat/mpi_helloworld > [bordeplage-15.bordeaux.grid5000.fr:02707] Error in mx_init (error No MX > device entry in /dev.) > Hello world fro

Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.

2010-06-02 Thread Scott Atchley
On Jun 2, 2010, at 1:31 PM, guillaume ranquet wrote: > granquet@bordeplage-9 ~/openmpi-1.4.2 $ ~/openmpi-1.4.2-bin/bin/mpirun > - --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld > Hello world from process 0 of 1 > granquet@bordeplage-9 ~/openmpi-1.4.2 $ > > I can tell it works :) O

Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.

2010-06-02 Thread Scott Atchley
On Jun 2, 2010, at 1:51 PM, Jeff Squyres wrote:
>>> Ok, there is no segfault when it can't find IB.
>
> I'm not sure I follow this comment.
MX initialization is interfering on IB nodes (that do not have MX). I wanted to make sure the opposite was not true (and it is not). :-)
Scott

Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.

2010-06-02 Thread Scott Atchley
On Jun 2, 2010, at 11:52 AM, Scott Atchley wrote:
> What if you explicitly disable MX?
>
> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca btl ^mx
> ~/bwlat/mpi_helloworld
And can you try this as well?
~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self -

Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.

2010-06-02 Thread Scott Atchley
On Jun 2, 2010, at 11:14 AM, guillaume ranquet wrote: >> What happens if you run: >> >> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self >> ~/bwlat/mpi_helloworld >> >> (i.e., MX support is still compiled in, but remove MX from the run-time) > > sadly, exactly the same thing :( > it doe

Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.

2010-06-02 Thread Scott Atchley
On Jun 2, 2010, at 9:54 AM, Jeff Squyres wrote: >> this is the output I get on a node with ethernet and infiniband hardware. >> note the Error regarding mx. >> >> $ ~/openmpi-1.4.2-bin/bin/mpirun ~/bwlat/mpi_helloworld >> [bordeplage-9.bordeaux.grid5000.fr:32365] Error in mx_init (error No MX >>

Re: [OMPI users] gadget-3 locks up using openmpi and infiniband (or myrinet)

2010-05-16 Thread Scott Atchley
On May 16, 2010, at 1:32 PM, Lydia Heck wrote: > When running over gigabit using -mca btl tcp,self,sm the code runs alright, > which is good as the largest part of our cluster is over gigabit, and as > Gadget-3 scales rather well, the penalty for running over gigabit is not > prohibitive. > We

Re: [OMPI users] Rapid I/O support

2010-01-15 Thread Scott Atchley
On Jan 14, 2010, at 3:08 PM, Jeff Squyres wrote: On Jan 14, 2010, at 1:59 PM, TONY BASIL wrote: I am doing a project with an HPC set up on multicore Power PC..Nodes will be connected using Rapid I/O instead for Gigabit Ethernet...I would like to know if OpenMPI supports Rapid I/O... I'm

Re: [OMPI users] Using OPENMPI configured for MX, GM and OPENIB interconnects

2009-08-26 Thread Scott Atchley
On Aug 26, 2009, at 4:20 PM, twu...@goodyear.com wrote: I see. My one script for all clusters calls mpirun --mca btl openib,mx,gm,tcp,sm,self so I'd need to add some logic above the mpirun line to figure out what cluster I am on to setup the correct mpirun line. still seems like I shou

Re: [OMPI users] Using OPENMPI configured for MX, GM and OPENIB interconnects

2009-08-26 Thread Scott Atchley
On Aug 26, 2009, at 3:41 PM, twu...@goodyear.com wrote: When, for example, I run on an IB cluster, I get warning messages about not finding GM NICS and another transport will be used etc. And warnings about mca btl mx components not found etc. It DOES run the IB, but it never says that in

Re: [OMPI users] How to make a job abort when one host dies?

2009-08-18 Thread Scott Atchley
On Aug 18, 2009, at 10:59 AM, Oskar Enoksson wrote: The question is, however, why is cl120 not acking messages? What is the application? What MPI calls does this application use? Scott The reason in this case was that cl120 had some kind of hardware problem, perhaps memory error or myrin

Re: [OMPI users] How to make a job abort when one host dies?

2009-08-17 Thread Scott Atchley
On Aug 17, 2009, at 2:43 PM, Jeff Squyres wrote: George / Myricom -- Does the MX MTL abort if it gets a "disconnected" error back from libmyriexpress? Short answer: yes. Long answer: The messages below indicate that these processes were all trying to send to cl120. It did not ack their

Re: [OMPI users] Problems with MPI_Issend and MX

2009-07-03 Thread Scott Atchley
ase I attached in my previous mail work. Very suspicious, but at least this does make a functional solution (however, if I understand OpenMPI correctly, I shouldn't be able to use the CM PML over a network where some nodes have MX and some don't, correct?). Scott Atchley atchley-at-myri.com |o

Re: [OMPI users] Problems with MPI_Issend and MX

2009-07-02 Thread Scott Atchley
Hi Kris, I have not run your code yet, but I will try to this weekend. You can have MX checksum its messages if you set MX_CSUM=1 and use the MX debug library (e.g. LD_LIBRARY_PATH to /opt/mx/lib/debug). Do you have the problem if you use the MX MTL? To test it modify your mpirun as follow

Re: [OMPI users] MX questions

2009-06-28 Thread Scott Atchley
On Jun 28, 2009, at 7:14 AM, Dave Love wrote: Scott Atchley writes: George's answer supersedes mine. You must be using the MX bonding driver to use more than one NIC per host. Will that be relevant for Open-MX, which I'm using rather than normal MX? (I'm afraid I don't

Re: [OMPI users] MX questions

2009-06-26 Thread Scott Atchley
On Jun 26, 2009, at 9:45 AM, Dave Love wrote: Scott Atchley writes: I believe the answer is yes as long as all NICs are in the same fabric (they usually are). Thanks. Do you mean it won't if, in this case, the two NICs are on separate switches? Dave, George's answer super

Re: [OMPI users] MX questions

2009-06-25 Thread Scott Atchley
On Jun 25, 2009, at 1:02 PM, Dave Love wrote: Also, Brice Goglin, the Open-MX author had a couple of questions concerning multi-rail MX while I'm on: 1. Does the MX MTL work with multi-rail? I believe the answer is yes as long as all NICs are in the same fabric (they usually are). 2. "Yo

Re: [OMPI users] Error in mx_init (error MX library incompatible with driver version)

2009-06-21 Thread Scott Atchley
tough2_mp with the Aztec library, however building it with mpich_mx instead of OpenMPI does not give problems. Thanks Henk -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Scott Atchley Sent: 19 June 2009 23:23 To: Open MPI Users Subject: Re

Re: [OMPI users] Error in mx_init (error MX library incompatible with driver version)

2009-06-19 Thread Scott Atchley
On Jun 19, 2009, at 1:05 PM, SLIM H.A. wrote: Although the mismatch between MX lib version and the kernel version appears to cause the mx_init error this should never be called as there is no mx card on those nodes. Thanks in advance for any advice to solve this Henk Henk, Is MX statical

Re: [OMPI users] Problem with OpenMPI (MX btl and mtl) and threads

2009-06-12 Thread Scott Atchley
Francois, How many cores do your machines have? The file specifies THREADS_DEFAULT 16. Does this spawn 16 threads per MPI rank? I see crashes when I run this with MX (BTL with mx,sm,self and MTL). If I change THREADS_DEFAULT to 4, I see crashes with TCP (BTL with tcp,sm,self) as well.

Re: [OMPI users] Problem with OpenMPI (MX btl and mtl) and threads

2009-06-11 Thread Scott Atchley
On Jun 11, 2009, at 2:20 PM, François Trahay wrote: The stack trace is from the MX MTL (I attach the backtraces I get with both MX MTL and MX BTL) Here is the program that I use. It is quite simple. It runs ping pongs concurrently (with one thread per node, then with two threads per node, e

Re: [OMPI users] Problem with OpenMPI (MX btl and mtl) and threads

2009-06-11 Thread Scott Atchley
directly on top of the underlying network capabilities. However, there are clearly few places where thread safety should be enforced in the MTL layer, and I don't know if this is the case. george. On Jun 11, 2009, at 09:35 , Scott Atchley wrote: Francois, For threads, the FAQ h

Re: [OMPI users] Problem with OpenMPI (MX btl and mtl) and threads

2009-06-11 Thread Scott Atchley
. Francois Scott Atchley wrote: Hi Francois, I am not familiar with the internals of the OMPI code. Are you sure, however, that threads are fully supported yet? I was under the impression that thread support was still partial. Can anyone else comment? Scott On Jun 8, 2009, at 8:43 AM

Re: [OMPI users] Problem with OpenMPI (MX btl and mtl) and threads

2009-06-09 Thread Scott Atchley
Hi Francois, I am not familiar with the internals of the OMPI code. Are you sure, however, that threads are fully supported yet? I was under the impression that thread support was still partial. Can anyone else comment? Scott On Jun 8, 2009, at 8:43 AM, François Trahay wrote: Hi, I'm en

Re: [OMPI users] top question

2009-06-03 Thread Scott Atchley
On Jun 3, 2009, at 6:05 AM, tsi...@coas.oregonstate.edu wrote: Top always shows all the parallel processes at 100% in the %CPU field, although some of the time these must be waiting for a communication to complete. How can I see actual processing as opposed to waiting at a barrier? Thanks

Re: [OMPI users] Myrinet optimization with OMP1.3 and macosX

2009-05-06 Thread Scott Atchley
On May 4, 2009, at 10:54 AM, Ricardo Fernández-Perea wrote: I finally have opportunity to run the imb-3.2 benchmark over myrinet I am running in a cluster of 16 node Xservers connected with myrinet 15 of them are 8core ones and the last one is a 4 cores one. Having a limit of 124 process

Re: [OMPI users] Slightly off topic: Ethernet and InfiniBand speed evolution

2009-05-05 Thread Scott Atchley
On May 5, 2009, at 2:47 PM, Jeff Squyres wrote: On May 5, 2009, at 1:59 PM, Robert Kubrick wrote: I am preparing a presentation where I will discuss commodity interconnects and the evolution of Ethernet and Infiniband NICs. The idea is to show the advance in network interfaces speed over time

Re: [OMPI users] Myrinet optimization with OMP1.3 and macosX

2009-03-20 Thread Scott Atchley
On Mar 20, 2009, at 11:33 AM, Ricardo Fernández-Perea wrote: These are the results initially:
Running 1000 iterations.
Length    Latency(us)    Bandwidth(MB/s)
0         2.738          0.000
1         2.718          0.368
2         2.707          0.739
1048576   4392.217

Re: [OMPI users] Myrinet optimization with OMP1.3 and macosX

2009-03-20 Thread Scott Atchley
On Mar 20, 2009, at 5:59 AM, Ricardo Fernández-Perea wrote: Hello, I am running DL_POLY on various 8-processor Xservers with a Myrinet network, using mx-1.2.7. While I stay within the same node, the process scales reasonably well, but the moment I hit the network ... I would like to try to max

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Scott Atchley
On Dec 5, 2008, at 12:22 PM, Justin wrote: Does OpenMPI have any known deadlocks that might be causing our deadlocks? Known deadlocks, no. We are assisting a customer, however, with a deadlock that occurs in IMB Alltoall (and some other IMB tests) when using 128 hosts and the MX BTL. We h

[OMPI users] Improving error messages

2008-06-20 Thread Scott Atchley
Hi all, We had a customer using 1.2.6 with MX. We were running his jobs, some of which used the MX BTL and some used the MX MTL. He added a few more nodes to the cluster and installed the same OMPI. When we tried to run jobs that spanned the new nodes, the jobs failed. I did not keep the

Re: [OMPI users] Open MPI instructional videos

2008-06-04 Thread Scott Atchley
Jeff, If I remember correctly, Microsoft dropped support for .AVI 3-4 years ago so it can no longer be played by their media player. It is also not native to QT, so you will have to download a plugin (I have it somewhere if you want me to look for it). I do not know if there is a containe

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
On Jul 10, 2007, at 3:24 PM, Tim Prins wrote: On Tuesday 10 July 2007 03:11:45 pm Scott Atchley wrote: On Jul 10, 2007, at 2:58 PM, Scott Atchley wrote: Tim, starting with the recently released 1.2.1, it is the default. To clarify, MX_RCACHE=1 is the default. It would be good for the

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
On Jul 10, 2007, at 2:58 PM, Scott Atchley wrote: Tim, starting with the recently released 1.2.1, it is the default. To clarify, MX_RCACHE=1 is the default. Scott

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
always use MX_RCACHE=2 for both MTL and BTL. So far I haven't had any problems with it. george. On Jul 10, 2007, at 2:37 PM, Brian Barrett wrote: On Jul 10, 2007, at 11:40 AM, Scott Atchley wrote: On Jul 10, 2007, at 1:14 PM, Christopher D. Maestas wrote: Has anyone seen the following me

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
On Jul 10, 2007, at 2:37 PM, Brian Barrett wrote: Scott - I'm having trouble getting the warning to go away with Open MPI. I've disabled our copy of ptmalloc2, so we're not providing a malloc anymore. I'm wondering if there's also something with the use of DSOs to load libmyriexpress? Is your

Re: [OMPI users] warning:regcache incompatible with malloc

2007-07-10 Thread Scott Atchley
anage memory, so MX_RCACHE=1 is safe to use unless the user's application manages memory. Scott -- Scott Atchley Myricom Inc. http://www.myri.com

Re: [OMPI users] openmpi fails on mx endpoint busy

2007-07-06 Thread Scott Atchley
On Jul 6, 2007, at 7:37 AM, SLIM H.A. wrote: Dear Michael I have now tried both mpirun --mca btl mx,sm -np 4 ./cpi which gives the same error message again, and, mpirun --mca btl mx,sm,self -np 4 ./cpi_gcc_ompi_mx actually locks some of the mx ports but not all 4, ie this is the output fr

Re: [OMPI users] problems with HPLinpack over myrinet MX-10G

2007-02-14 Thread Scott Atchley
On Feb 14, 2007, at 12:33 PM, Alex Tumanov wrote: Hello, I recently tried running HPLinpack, compiled with OMPI, over myrinet MX interconnect. Running a simple hello world program works, but XHPL fails with an error occurring when it tries to MPI_Send: # mpirun -np 4 -H l0-0,c0-2 --prefix $MPI

Re: [OMPI users] IB bandwidth vs. kernels

2007-01-18 Thread Scott Atchley
On Jan 18, 2007, at 8:11 AM, Peter Kjellstrom wrote: with Lustre, which is about 55% of the theoretical 20 Gb/s advertised speed. I think this should be calculated against 16 Gbps, not 20 Gbps. What is the advertised speed of an IB DDR card? http://mellanox.com/products/hca_cards.php http://
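The 16 Gbps versus 20 Gbps point comes down to line encoding: a 4x DDR InfiniBand link signals at 4 lanes x 5 Gb/s = 20 Gb/s, and 8b/10b encoding leaves 16 Gb/s for data. A small Python sketch of that arithmetic follows. The constant and function names are illustrative only, and the measured figure is simply derived from the "about 55%" quoted above rather than taken from any benchmark output.

```python
# 4x DDR InfiniBand: 4 lanes at 5 Gb/s signaling each, with 8b/10b line encoding.
LANES = 4
SIGNAL_GBPS_PER_LANE = 5.0
ENCODING_EFFICIENCY = 8 / 10  # 8 data bits carried per 10 bits on the wire

signal_rate = LANES * SIGNAL_GBPS_PER_LANE     # advertised signaling rate, Gb/s
data_rate = signal_rate * ENCODING_EFFICIENCY  # usable data rate, Gb/s

def efficiency(measured_gbps, reference_gbps):
    """Measured throughput as a percentage of a reference rate."""
    return 100.0 * measured_gbps / reference_gbps

# "About 55% of 20 Gb/s" from the quoted message, expressed in Gb/s.
measured = 0.55 * signal_rate

print(f"signaling rate: {signal_rate:.1f} Gb/s, data rate: {data_rate:.1f} Gb/s")
print(f"vs signaling rate: {efficiency(measured, signal_rate):.2f}%")
print(f"vs data rate:      {efficiency(measured, data_rate):.2f}%")
```

Measured against the data rate, the same throughput is roughly 69% of what the link can carry rather than 55%.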

Re: [OMPI users] IB bandwidth vs. kernels

2007-01-18 Thread Scott Atchley
On Jan 18, 2007, at 5:05 AM, Peter Kjellstrom wrote: On Thursday 18 January 2007 09:52, Robin Humble wrote: ... is ~10Gbit the best I can expect from 4x DDR IB with MPI? some docs @HP suggest up to 16Gbit (data rate) should be possible, and I've heard that 13 or 14 has been achieved before.

Re: [OMPI users] ld_library_path not being updated

2007-01-17 Thread Scott Atchley
On Jan 17, 2007, at 10:45 AM, Brian Budge wrote: Hi Adrian - Thanks for the reply. I have been investigating this further. It appears that ssh isn't starting my .zshrc file. This is strange. You should check the zsh man page. .zshrc is for interactive logins only. You may want to use .

Re: [OMPI users] running with the dr pml.

2006-12-07 Thread Scott Atchley
son working on the DR PML like me to try anymore tests? Brock Palen Center for Advanced Computing bro...@umich.edu (734)936-1985 On Dec 7, 2006, at 9:50 AM, Scott Atchley wrote: On Dec 6, 2006, at 3:09 PM, Scott Atchley wrote: Brock and Galen, We are willing to assist. Our best guess is t

Re: [OMPI users] running with the dr pml.

2006-12-07 Thread Scott Atchley
On Dec 6, 2006, at 3:09 PM, Scott Atchley wrote: Brock and Galen, We are willing to assist. Our best guess is that OMPI is using the code in a way different than MPICH-GM does. One of our other developers who is more comfortable with the GM API is looking into it. We tried running with HPCC

Re: [OMPI users] running with the dr pml.

2006-12-06 Thread Scott Atchley
On Dec 6, 2006, at 2:29 PM, Brock Palen wrote: I wonder if we can narrow this down a bit to perhaps a PML protocol issue. Start by disabling RDMA by using: -mca btl_gm_flags 1 On the other-hand, with OB1 using btl_gm_flags 1 fixed the error problem with OMPI! Which is a great first step.

Re: [OMPI users] running with the dr pml.

2006-12-05 Thread Scott Atchley
On Dec 5, 2006, at 6:15 PM, Galen M. Shipman wrote: Brock Palen wrote: I was asked by Myricom to run a test using the data reliability pml. (dr) I ran it like so: $ mpirun --mca pml dr -np 4 ./xhpl Is this the right format for running the dr pml? This should be fine, yes. I can running H

Re: [OMPI users] myirnet problems on OSX

2006-11-29 Thread Scott Atchley
On Nov 29, 2006, at 8:44 AM, Scott Atchley wrote: My last few runs all completed successfully without hanging. The job I am currently running just hung one node (can respond to ping, cannot ssh into it, cannot use any terminals connected to it). There are no messages in dmesg and vmstat shows

Re: [OMPI users] myirnet problems on OSX

2006-11-29 Thread Scott Atchley
On Nov 21, 2006, at 1:27 PM, Brock Palen wrote: I had sent a message two weeks ago about this problem and talked with jeff at SC06 about how it might not be a OMPI problem. But it appears now working with myricom that it is a problem in both lam-7.1.2 and openmpi-1.1.2/1.1.1. Basically the re

Re: [OMPI users] myirnet problems on OSX

2006-11-21 Thread Scott Atchley
On Nov 21, 2006, at 1:27 PM, Brock Palen wrote: I had sent a message two weeks ago about this problem and talked with jeff at SC06 about how it might not be a OMPI problem. But it appears now working with myricom that it is a problem in both lam-7.1.2 and openmpi-1.1.2/1.1.1. Basically the re