Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-10 Thread Randolph Pullen
That's very interesting, Yevgeny. Yes: tcp,self ran in 12 seconds; tcp,self,sm ran in 27 seconds. Does anyone have any idea how this can be? About half the data would go to local processes, so SM should pay dividends. From: Yevgeny Kliteynik To: Randolph Pullen

[OMPI users] test for sctp on FreeBSD too narrow

2012-09-10 Thread Brooks Davis
The test for SCTP support in libc on FreeBSD only allows it to work on FreeBSD 7 (or I suppose 70 :). The attached patch expands the test to 7 through 19, which should be enough for a while. Hopefully by the time FreeBSD 19 is out everything will have sctp support in libc or have dropped it. :) --
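
The patch itself is not reproduced in this archive entry, so the following is only a rough sketch of the kind of pattern problem described above. The glob patterns and host strings are hypothetical stand-ins, not the actual configure test; the point is that a pattern of the form `freebsd7*` accepts 7.x (and, incidentally, 70.x) but rejects 8 and later, while a pair of patterns can cover 7 through 19.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns: the real configure test is not shown in the message.
NARROW_PATTERN = "*freebsd7*"                          # assumed shape of the old test
WIDER_PATTERNS = ("*freebsd[7-9].*", "*freebsd1[0-9].*")  # assumed shape of a 7..19 test


def matches_narrow(host: str) -> bool:
    """True if the host string passes the assumed original (narrow) test."""
    return fnmatchcase(host, NARROW_PATTERN)


def matches_wider(host: str) -> bool:
    """True if the host string passes the assumed expanded test (7 through 19)."""
    return any(fnmatchcase(host, pattern) for pattern in WIDER_PATTERNS)


if __name__ == "__main__":
    # Illustrative host strings only; they are not taken from the thread.
    for host in (
        "amd64-unknown-freebsd7.4",
        "amd64-unknown-freebsd9.0",
        "amd64-unknown-freebsd19.1",
        "amd64-unknown-freebsd70.0",
    ):
        print(host, matches_narrow(host), matches_wider(host))
```

Splitting the range into two patterns (single digit 7 to 9, then 10 to 19) is one way to avoid the accidental match on a two-digit version that merely starts with 7.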

Re: [OMPI users] problem with rankfile

2012-09-10 Thread Jeff Squyres
We actually include hwloc v1.3.2 in the OMPI v1.6 series. Can you download and try that on your machines? http://www.open-mpi.org/software/hwloc/v1.3/ In particular try the hwloc-bind executable (outside of OMPI), and see if binding works properly on your machines. I typically run a te

Re: [OMPI users] error compiling openmpi-1.6.1 on Windows 7

2012-09-10 Thread Jeff Squyres
We've added "lo" back into the list, but I'm curious as to why 127.0.0.1/8 doesn't work. If you run ipconfig, what does it say for the localhost entry? I.e., what's its IP address and netmask? On Sep 9, 2012, at 1:27 PM, Siegmar Gross wrote: > Hi Shiqing, > > I disabled IPv6 in my network a

Re: [OMPI users] gcc problem compiling openmpi-1.6.1 on Solaris 10 sparc

2012-09-10 Thread Jeff Squyres
Ok, so this is 2 errors. 1. Something in the C++ bindings (which is weird because it's new; I don't think this code has changed in a long, long time). This actually looks like a problem in your C++ compiler, however -- can you compile other C++ applications at all? 2. Same issue in VT. I'll

Re: [OMPI users] problem with rankfile

2012-09-10 Thread Ralph Castain
Hmmm...well, let's try to isolate this a little. Would you mind installing a copy of the current trunk on this machine and trying it? I ask because I'd like to better understand if the problem is in the actual binding mechanism (i.e., hwloc), or in the code that computes where to bind the proce

Re: [OMPI users] [omx-devel] Open-mx issue with ompi 1.6.1

2012-09-10 Thread Brice Goglin
I replied a couple days ago (with OMPI users in CC) but got an error last night: Action: failed Status: 5.0.0 (permanent failure) Diagnostic-Code: smtp; 5.4.7 - Delivery expired (message too old) 'timeout' (delivery attempts: 0) I resent the mail this morning, it looks like it wasn't delivered

Re: [OMPI users] Open-mx issue with ompi 1.6.1

2012-09-10 Thread Douglas Eadline
I got no response for this question. Is Open-MX no longer supported in Open MPI? Or is there someplace else I should submit this information? I also attached my ompi_info and omx_info output -- Doug > I built open-mpi 1.6.1 using the open-mx libraries. > This worked previously and now I get the

Re: [OMPI users] segmentation fault with openmpi-1.6.2

2012-09-10 Thread Ralph Castain
Just following up on this comment about running from a backend node while under slurm - I just tested this (using the patched 1.6 branch) and found it works just fine. However, note that you will only be able to execute on that local node as we cannot detect the full allocation anywhere but on t

Re: [OMPI users] segmentation fault with openmpi-1.6.2

2012-09-10 Thread Ralph Castain
Yes, 1.6.2rc1 had a problem - now fixed, will be in tomorrow's nightly 1.6 tarball. On Sep 10, 2012, at 9:50 AM, Siegmar Gross wrote: > Hi, > > thank you very much for your fast answer. > >> On 10/09/2012 15:41, Siegmar Gross wrote: >>> Hi, >>> >>> I have built openmpi-1.6.2rc1 and get the

Re: [OMPI users] segmentation fault with openmpi-1.6.2

2012-09-10 Thread Siegmar Gross
Hi, thank you very much for your fast answer. > On 10/09/2012 15:41, Siegmar Gross wrote: > > Hi, > > > > I have built openmpi-1.6.2rc1 and get the following error. > > > > tyr small_prog 123 mpicc -showme > > cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt > >-L/usr/local/openmpi-1.6.2_32_cc

Re: [OMPI users] segmentation fault with openmpi-1.6.2

2012-09-10 Thread Ralph Castain
Wow - okay, I'll have to investigate. Be aware, though, that you just described a completely different failure. Oracle isn't using slurm, last I heard - you were using rsh/qrsh. And you aren't running from a backend node, but from the same frontend - just have two hosts listed in your -host entr

Re: [OMPI users] segmentation fault with openmpi-1.6.2

2012-09-10 Thread Eugene Loh
On 09/10/12 11:37, Ralph Castain wrote: On Sep 10, 2012, at 8:12 AM, Aleksey Senin wrote: On 10/09/2012 15:41, Siegmar Gross wrote: Hi, I have built openmpi-1.6.2rc1 and get the following error. tyr small_prog 123 mpicc -showme cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt -L/usr/local

Re: [OMPI users] segmentation fault with openmpi-1.6.2

2012-09-10 Thread Ralph Castain
On Sep 10, 2012, at 8:12 AM, Aleksey Senin wrote: > On 10/09/2012 15:41, Siegmar Gross wrote: >> Hi, >> >> I have built openmpi-1.6.2rc1 and get the following error. >> >> tyr small_prog 123 mpicc -showme >> cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt >> -L/usr/local/openmpi-1.6.2_32_cc/

Re: [OMPI users] segmentation fault with openmpi-1.6.2

2012-09-10 Thread Aleksey Senin
On 10/09/2012 15:41, Siegmar Gross wrote: Hi, I have built openmpi-1.6.2rc1 and get the following error. tyr small_prog 123 mpicc -showme cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt -L/usr/local/openmpi-1.6.2_32_cc/lib -lmpi -lm -lkstat -llgrp -lsocket -lnsl -lrt -lm tyr small_prog 12

[OMPI users] segmentation fault with openmpi-1.6.2

2012-09-10 Thread Siegmar Gross
Hi, I have built openmpi-1.6.2rc1 and get the following error. tyr small_prog 123 mpicc -showme cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt -L/usr/local/openmpi-1.6.2_32_cc/lib -lmpi -lm -lkstat -llgrp -lsocket -lnsl -lrt -lm tyr small_prog 124 mpiexec -np 2 -host tyr init_finalize Hello

Re: [OMPI users] problem with rankfile

2012-09-10 Thread Siegmar Gross
Hi, > > are the following outputs helpful to find the error with > > a rankfile on Solaris? > > If you can't bind on the new Solaris machine, then the rankfile > won't do you any good. It looks like we are getting the incorrect > number of cores on that machine - is it possible that it has > hard

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-10 Thread Yevgeny Kliteynik
Randolph, So what you are saying, in short, leaving all the numbers aside, is the following: In your particular application on your particular setup with this particular OMPI version, 1. openib BTL performs faster than shared memory BTL; 2. TCP BTL performs faster than shared memory. IMHO, this indic