Re: [OMPI users] QP creation failure on iWARP adapter

2016-02-06 Thread Steve Wise
On 2/5/2016 2:38 AM, dpchoudh . wrote: Dear all This is a slightly off-topic post, and hopefully people won't mind helping me out. I have a very simple setup with two PCs, both with identical Chelsio 10GE iWARP adapters connected back-to-back. With this setup, the TCP channel works fine (with

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-11 Thread Steve Wise
Hey Jeff, what did you run to generate the memory corruption? Can you run the same test with --mca btl_openib_memalign_threshold 12288 and see if you get the same corruption? I'm not hitting any corruption over iw_cxgb4 with a simple test. On 6/10/2015 2:39 PM, Jeff Squyres (jsquyres) wrote
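The experiment Steve proposes can be reproduced from the command line. A sketch of the invocation, using the 12288-byte threshold quoted in the thread; the application name is a placeholder, not from the thread:

```shell
# Re-run the suspect test with the larger memalign threshold from the thread.
# "./your_test_app" is illustrative; substitute the actual failing test.
mpirun -np 2 --mca btl openib,self \
       --mca btl_openib_memalign_threshold 12288 ./your_test_app
```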

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-11 Thread Steve Wise
FYI: I opened: https://github.com/open-mpi/ompi/issues/638 to track this. Steve. On 6/10/2015 4:07 PM, Ralph Castain wrote: Done On Jun 10, 2015, at 1:55 PM, Steve Wise <sw...@opengridcomputing.com> wrote: If you're trying to release 1.8.6, I recommend you revert th

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-10 Thread Steve Wise
ist > Cc: Nathan Hjelm; Steve Wise > Subject: Re: [OMPI users] Default value of btl_openib_memalign_threshold > > Nathan / Steve -- you guys are nominally the owners of the openib BTL: can > you please investigate? > > > > On Jun 10, 2015, at 4:15 PM, Ralph Castain wro

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-02 Thread Steve Wise
t; >> On Jun 2, 2015, at 7:10 AM, Steve Wise > wrote: >> >> On 6/1/2015 9:51 PM, Ralph Castain wrote: >>> I’m wondering if it is also possible that the error message is simply printing that ID incorrectly. Looking at the code, it appears

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-02 Thread Steve Wise
b BTL bootstrapping). :-) On Jun 2, 2015, at 10:04 AM, Ralph Castain wrote: On Jun 2, 2015, at 7:10 AM, Steve Wise wrote: On 6/1/2015 9:51 PM, Ralph Castain wrote: I’m wondering if it is also possible that the error message is simply printing that ID incorrectly. Looking at the code, it ap

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-02 Thread Steve Wise
On 6/2/2015 10:04 AM, Ralph Castain wrote: On Jun 2, 2015, at 7:10 AM, Steve Wise <sw...@opengridcomputing.com> wrote: On 6/1/2015 9:51 PM, Ralph Castain wrote: I’m wondering if it is also possible that the error message is simply printing that ID incorrectly. Looking at the

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-02 Thread Steve Wise
erent MPI processes specify different receive queue specifications. You mentioned that the device ID is being incorrectly identified: is that OMPI's fault, or something wrong with the device itself? On Jun 1, 2015, at 6:06 PM, Steve Wise wrote: On 6/1/2015 9:53 AM, Ralph Castain wrote

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-01 Thread Steve Wise
On 6/1/2015 9:53 AM, Ralph Castain wrote: Well, I checked and it looks to me like --hetero-apps is a stale option in the master at least - I don’t see where it gets used. Looking at the code, I would suspect that something didn’t get configured correctly - either the --enable-heterogeneous flag

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-01 Thread Steve Wise
one of the settings that were printed out: P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64 or P,65536,64 -Nathan On Mon, Jun 01, 2015 at 09:28:28AM -0500, Steve Wise wrote: Hello, I'm seeing an error trying to run a simple OMPI job on a 2 node cluster where

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-01 Thread Steve Wise
been so long since someone tried this that I’d have to look to remember what it does. On Jun 1, 2015, at 7:28 AM, Steve Wise wrote: Hello, I'm seeing an error trying to run a simple OMPI job on a 2 node cluster where one node is a PPC64 BE byte order and the other is a X86_64 LE byte

[OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-01 Thread Steve Wise
Hello, I'm seeing an error trying to run a simple OMPI job on a 2 node cluster where one node is a PPC64 BE byte order and the other is a X86_64 LE byte order node. OMPI 1.8.4 is configured with --enable-heterogeneous: ./configure --with-openib=/usr CC=gcc CXX=g++ F77=gfortran FC=gfortran --e

[OMPI users] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Steve Wise
Hey Open MPI wizards, I'm trying to debug something in my library that gets loaded into my mpi processes when they are started via mpirun. With other MPIs, I've been able to deliver SIGUSR2 to the process and trigger some debug code I have in my library that sets up a handler for SIGUSR2. Ho

[OMPI users] padb and openmpi

2010-08-17 Thread Steve Wise
Hi, I'm trying to use padb 3.0 to get stack traces on open-mpi / IMB1 runs. While the job is running, I do run this, but get an error: [ompi@hpc-hn1 ~]$ padb --show-jobs --config-option rmgr=orte 65427 [ompi@hpc-hn1 ~]$ padb --all --proc-summary --config-option rmgr=orte Warning, failed to l

Re: [OMPI users] TCP Bandwidth

2008-08-18 Thread Steve Wise
Andy Georgi wrote: Steve Wise wrote: Are you using Chelsio's TOE drivers? Or just a driver from the distro? We use the Chelsio TOE drivers. Steve Wise wrote: Ok. Did you run their perftune.sh script? Yes, if not we wouldn't get the 1.15 GB/s on the TCP level. We had ~800 M

Re: [OMPI users] TCP Bandwidth

2008-08-18 Thread Steve Wise
Jon Mason wrote: On Mon, Aug 18, 2008 at 10:00:24AM +0200, Andy Georgi wrote: Steve Wise wrote: Are you using Chelsio's TOE drivers? Or just a driver from the distro? We use the Chelsio TOE drivers. Steve Wise wrote: Ok. Did you run their perftune.sh s

Re: [OMPI users] TCP Bandwidth

2008-08-17 Thread Steve Wise
Andy Georgi wrote: Hello again ;), after getting acceptable latency on our Chelsio S320E-CXA adapters we now want to check if we can also tune the bandwidth. On TCP level (measured via iperf) we get 1.15 GB/s, on MPI level (measured via MPI-Ping-Pong) just 930 MB/s. We already set btl_tcp_sndb
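The socket-buffer tuning Andy describes is done through MCA parameters on the mpirun line. A sketch; the buffer values and benchmark name are illustrative, not from the thread, and should be tuned against the measured iperf numbers:

```shell
# Raise the TCP BTL send/receive socket buffers (values illustrative).
mpirun -np 2 --mca btl tcp,self \
       --mca btl_tcp_sndbuf 4194304 \
       --mca btl_tcp_rcvbuf 4194304 ./your_bandwidth_test
```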

Re: [OMPI users] TCP Latency

2008-08-17 Thread Steve Wise
With OpenMPI 1.3 / iWARP you should get around 8us latency using mpi pingpong tests. Andy Georgi wrote: Thanks again for all the answers. It seems that there was a bug in the driver in combination with Suse Linux Enterprise Server 10. It was fixed with version 1.0.146. Now we have 12us with NP

Re: [OMPI users] newbie question

2007-05-10 Thread Steve Wise
On Thu, 2007-05-10 at 20:07 -0400, Jeff Squyres wrote: > Brian -- > > Didn't you add something to fix exactly this problem recently? I > have a dim recollection of seeing a commit go by about this...? > > (I advised Steve in IM to use --disable-ipv6 in the meantime) > Yes, disabling it worke
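The interim workaround Jeff mentions (building without IPv6 support so the TCP BTL never tries to listen on IPv6 sockets) is a configure-time option; the remaining configure arguments below are placeholders for whatever the build otherwise uses:

```shell
# Build Open MPI without IPv6 support, per the workaround in this thread.
./configure --disable-ipv6 [...other configure options...]
make && make install
```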

[OMPI users] newbie question

2007-05-10 Thread Steve Wise
I'm trying to run a job specifically over tcp and the eth1 interface. It seems to be barfing on trying to listen via ipv6. I don't want ipv6. How can I disable it? Here's my mpirun line: [root@vic12-10g ~]# mpirun --n 2 --host vic12,vic20 --mca btl self,tcp -mca btl_tcp_if_include eth1 /root/IM