Re: [OMPI users] Private and public IP mixing.

2011-10-05 Thread George Bosilca
The real solution is to evict the private addresses from both levels (MPI and ORTE). However, based on the ordering of the interfaces, I guess you cannot do it by name (eth0 has a private address on one side but a public one on the other). No panic! There is support for this. Look at the outpu
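George's point can be made concrete. An exclusion rule keyed on interface names cannot work when eth0 is private on one host and public on the other, but a rule keyed on an address range can. The minimal Python sketch below compares the two. It assumes hypothetical hosts "a" and "b", hypothetical addresses, and 10.0.0.0/8 as the private range. None of these values come from the thread.

```python
import ipaddress

# Hypothetical interface tables: on host "a" eth0 is private,
# on host "b" eth0 is public (the mismatch described in the thread).
HOSTS = {
    "a": {"eth0": "10.1.2.3", "eth1": "198.51.100.7"},
    "b": {"eth0": "198.51.100.8"},
}

# Assumed private range for this illustration.
PRIVATE = ipaddress.ip_network("10.0.0.0/8")


def keep_by_name(ifaces, excluded_names):
    """Name-based exclusion: drop every interface whose name is listed."""
    return sorted(addr for name, addr in ifaces.items()
                  if name not in excluded_names)


def keep_by_subnet(ifaces, excluded_net):
    """Address-based exclusion: drop every address inside excluded_net."""
    return sorted(addr for addr in ifaces.values()
                  if ipaddress.ip_address(addr) not in excluded_net)


for host, ifaces in sorted(HOSTS.items()):
    print(host, keep_by_name(ifaces, {"eth0"}), keep_by_subnet(ifaces, PRIVATE))
```

Excluding eth0 by name leaves host b with no address at all. The subnet rule keeps exactly the public address on each host. Check the FAQ entry cited in this thread to confirm whether a given Open MPI version accepts address-range values in the *_if_include / *_if_exclude parameters.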

Re: [OMPI users] Private and public IP mixing.

2011-10-05 Thread Jeff Squyres
Ah...you're dealing with NAT. Sorry, I didn't understand that. OMPI currently doesn't handle NAT well. :-( There was some work at U. Tennessee to handle NAT nicely, but I think they forked off and made their own release based on an older version of Open MPI. ...or maybe I'm remembering that

Re: [OMPI users] Private and public IP mixing.

2011-10-05 Thread (.-=Kiwi=-.)
The thing is that there's just one interface: eth0. Computer 1 thinks that it has 212... but it actually has a 210 when accessed from outside. There's no other interface to choose from, just the one that thinks it's a 212, the eth0. Or maybe I'm just not understanding correctly.
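What Kiwi describes is NAT. The host only knows its own private address, and the translation to the public address happens outside the host. The standard-library sketch below shows which local address a host would use toward a given peer, and therefore which address it could advertise. `local_source_address` is a hypothetical helper name. The port number is arbitrary because nothing is actually sent.

```python
import socket


def local_source_address(peer_ip, port=9):
    """Return the local IPv4 address the kernel would use to reach peer_ip.

    connect() on a UDP socket only selects a route and a source
    address; no datagram is transmitted.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, port))
        return s.getsockname()[0]
```

Run it on Computer 1 with the other machine's address as peer_ip. It should print the address configured on eth0, not the public address that the NAT device presents to the outside world.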

Re: [OMPI users] Private and public IP mixing.

2011-10-05 Thread Jeff Squyres
Check out this FAQ entry: http://www.open-mpi.org/faq/?category=tcp#tcp-selection Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control MPI-level communications. There's also oob_tcp_if_include / oob_tcp_if_exclude (that take the same kinds of values as btl_tcp_if_incl
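The BTL and OOB parameters are set independently, so it is easy to restrict one layer and forget the other. The Python sketch below builds an mpirun command line that passes the same exclusion list to both layers. The interface names, process count and program path used with it are hypothetical placeholders.

```python
def mpirun_argv(np, program, exclude):
    """Build an mpirun command line that applies one interface exclusion
    list to both the MPI layer (btl_tcp_*) and the run-time layer (oob_tcp_*)."""
    value = ",".join(exclude)
    return [
        "mpirun", "-np", str(np),
        "--mca", "btl_tcp_if_exclude", value,
        "--mca", "oob_tcp_if_exclude", value,
        program,
    ]
```

A typical call would be something like subprocess.run(mpirun_argv(4, "./hello", ["lo", "eth1"])). Loopback is listed explicitly because setting a parameter replaces its default value. `ompi_info --param btl tcp` shows what that default is on a given installation.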

Re: [OMPI users] Private and public IP mixing.

2011-10-05 Thread (.-=Kiwi=-.)
"OMPI always tries to use the lowest numbered address first - just a natural ordering." That doesn't seem to be the reason. We changed the private IPs to 212... (a higher number than the public 210... IPs) and still MPI tries to go to 212 afterwards. We're reading the oob_tcp and btl_tcp paramete
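The quoted rule compares addresses as numbers, not as strings. The short sketch below shows what "lowest numbered" means, assuming plain IPv4 dotted-quad strings. It illustrates the ordering only and makes no claim about how Open MPI actually picks an address.

```python
import ipaddress


def lowest_address(addrs):
    """Return the numerically lowest IPv4 address from a list of strings."""
    return min(addrs, key=ipaddress.IPv4Address)


addrs = ["198.51.100.7", "10.1.2.3", "9.9.9.9"]  # hypothetical addresses
print(lowest_address(addrs))  # numeric order
print(min(addrs))             # plain string order, for comparison
```

String comparison picks 10.1.2.3 here because "1" sorts before "9". Numeric comparison picks 9.9.9.9. As Kiwi's experiment shows, this ordering alone did not explain the behavior observed.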

Re: [OMPI users] Private and public IP mixing.

2011-10-05 Thread Jeff Squyres
Does a "hello world" MPI app (with no MPI_SEND/MPI_RECV) in it work without those params, but an MPI app with MPI_SEND/MPI_RECV hang? If so, that's a little disappointing -- OMPI's MPI layer should be able to tell the difference between the different networks and should be able to figure out ro
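A hang that appears only when MPI_SEND/MPI_RECV are involved is consistent with one process trying to open a TCP connection to an address that its peer cannot reach. This can be checked outside MPI by trying to connect from each host to every address the other host owns. The sketch below uses only the standard library. `can_connect` is a hypothetical helper. For a True result, something must be listening on the chosen port on the target host.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Refused connections, unreachable networks and timeouts all surface
    as OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the private address of the NAT'd host returns False from the other side while its public address returns True, that matches the situation described in this thread.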

Re: [OMPI users] problem running with RoCE over 10GbE

2011-10-05 Thread Jeff Squyres
On Oct 5, 2011, at 9:35 AM, Yevgeny Kliteynik wrote: >> Yevgeny -- can you check that out? > > Yep, indeed - configure doesn't abort when "--enable-openib-rdmacm" > is provided and "rdma/rdma_cma.h" is not found. Can you fix? It might also be worthwhile to at least print a warning if IBoIP supp

Re: [OMPI users] problem running with RoCE over 10GbE

2011-10-05 Thread Yevgeny Kliteynik
On 05-Oct-11 3:15 PM, Jeff Squyres wrote: >> You shouldn't use the "--enable-openib-rdmacm" option - rdmacm >> support is enabled by default, providing librdmacm is found on >> the machine. > > Actually, this might be a configure bug. We have lots of other configure > options that, even if "foo"

Re: [OMPI users] problem running with RoCE over 10GbE

2011-10-05 Thread Konz, Jeffrey (SSA Solution Centers)
Jeff, Yevgeny, Thanks for your responses. We found the problem. Issue was that the librdmacm-devel rpm was not installed on the build system. Installed the rpm and re-built OpenMPI. Now RoCE works fine. You might add the requirement for the librdmacm-devel rpm to the install readme. -Jeff

Re: [OMPI users] problem running with RoCE over 10GbE

2011-10-05 Thread Jeff Squyres
On Oct 5, 2011, at 9:04 AM, Yevgeny Kliteynik wrote: >> Built OpenMPI with this option "--enable-openib-rdmacm". >> Our system has OFED 1.5.2 with librdmacm-1.0.13-1 >> >> I noticed this output from configure script: >> checking rdma/rdma_cma.h usability... no >> checking rdma/rdma_cma.h presence
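At its core, a failed "presence" check for rdma/rdma_cma.h means the preprocessor could not find the header. Before re-running configure, a rough check on the build machine can confirm whether the header is installed. According to the Konz post in this thread, the header came from the librdmacm-devel rpm. The include directories below are assumptions, since an OFED install may use a different prefix. This is also a simplification of what configure actually tests.

```python
import os


def find_header(relpath, include_dirs=("/usr/include", "/usr/local/include")):
    """Return the full path of relpath under the first include dir that has it,
    or None if no listed directory contains it."""
    for d in include_dirs:
        candidate = os.path.join(d, relpath)
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_header("rdma/rdma_cma.h"))
```

A result of None on the build host suggests installing the development package before building Open MPI, not after.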

Re: [OMPI users] OpenMPI with CPU of different speed.

2011-10-05 Thread Andreas Schäfer
On 16:58 Wed 05 Oct , Dmitry N. Mikushin wrote: > Maybe Mickaël means load balancing could be achieved simply by > spawning various number of MPI processes, depending on how many cores > particular node has? Varying numbers of slots are not a problem. But Mickaël's mail indicates that he woul

Re: [OMPI users] problem running with RoCE over 10GbE

2011-10-05 Thread Yevgeny Kliteynik
Jeff, On 01-Oct-11 1:01 AM, Konz, Jeffrey (SSA Solution Centers) wrote: > Encountered a problem when trying to run OpenMPI 1.5.4 with RoCE over 10GbE > fabric. > > Got this run time error: > > An invalid CPC name was specified via the btl_openib_cpc_include MCA > parameter. > >Local host:
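The error means the btl_openib_cpc_include value named a connection manager (CPC) that is not available in this build. In this thread, rdmacm was missing because librdmacm's headers were absent when Open MPI was compiled. The sketch below shows that kind of validation. The set of built CPC names is a hypothetical example.

```python
def invalid_cpcs(requested, available):
    """Return the names in a comma-separated CPC list that are not available."""
    names = [n.strip() for n in requested.split(",") if n.strip()]
    return [n for n in names if n not in available]


built = {"oob", "xoob"}  # hypothetical: a build without rdmacm support
print(invalid_cpcs("rdmacm", built))
```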

Re: [OMPI users] OpenMPI with CPU of different speed.

2011-10-05 Thread Dmitry N. Mikushin
Hi, Maybe Mickaël means load balancing could be achieved simply by spawning various number of MPI processes, depending on how many cores particular node has? This should be possible, but accuracy of such balancing will be task-dependent due to other factors, like memory operations and communicatio
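Dmitry's suggestion maps onto the slots= keyword that Mickaël's hostfile already uses: give each node as many slots as it has cores. The small sketch below renders such a hostfile. The host names and core counts are hypothetical.

```python
def hostfile_text(cores_by_host):
    """Render hostfile lines giving each host one slot per core,
    in the dict's insertion order."""
    return "".join(f"{host} slots={n}\n" for host, n in cores_by_host.items())


print(hostfile_text({"oldnode": 4, "newnode": 8}), end="")
```

The result can be written to a file and passed to mpirun with --hostfile. As Dmitry notes, how well this balances depends on the task, because memory and communication costs do not scale with core count.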

Re: [OMPI users] OpenMPI with CPU of different speed.

2011-10-05 Thread Andreas Schäfer
I'm afraid you'll have to do this kind of load balancing in your application itself as Open MPI (just like any other MPI implementation) has no notion of how your application manages its workload. HTH -Andreas On 14:05 Wed 05 Oct , Mickaël CANÉVET wrote: > Hi, > > Is there a way to define a
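Andreas's answer puts the decision in the application. One common form is static proportional partitioning: each rank receives work in proportion to an application-chosen speed weight. The minimal sketch below assumes the weights are already known to every rank. How they are obtained is up to the application.

```python
def partition(total, weights):
    """Split `total` work items across ranks in proportion to `weights`.

    Uses largest-remainder rounding so the counts always sum to `total`.
    """
    s = sum(weights)
    exact = [total * w / s for w in weights]
    counts = [int(x) for x in exact]  # floor, since all values are >= 0
    leftover = total - sum(counts)
    # Hand the remaining items to the ranks with the largest fractional parts.
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


print(partition(100, [0.75, 1.0]))
```

Each rank can compute the same counts and take its own slice, or rank 0 can distribute the work, for example with MPI_Scatterv. Here 0.75 for the older node mirrors Mickaël's example.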

[OMPI users] OpenMPI with CPU of different speed.

2011-10-05 Thread Mickaël CANÉVET
Hi, Is there a way to assign a weight to the CPUs of the hosts? I have a cluster made of machines from different generations, and when I run a process on it, the whole cluster is slowed down by the slowest node. What I'd like to do is something like this in my hostfile: oldest slots=4 weight=0.75 n
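Open MPI's hostfile has no weight= keyword. That is what Mickaël is asking for, and Andreas's reply puts this in the application's hands. One option is for the application to read a file in the proposed format itself. The sketch below assumes that ranks fill each host's slots in the order the hosts are listed, so check this against the mapping policy actually used. `per_rank_weights` and the host name nodeB are hypothetical.

```python
def per_rank_weights(text):
    """Parse lines like 'host slots=4 weight=0.75' into one weight per rank.

    Assumes ranks fill each host's slots in file order. A missing
    slots= defaults to 1 and a missing weight= defaults to 1.0.
    """
    weights = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        _host, *fields = line.split()
        opts = dict(f.split("=", 1) for f in fields)
        slots = int(opts.get("slots", 1))
        weight = float(opts.get("weight", 1.0))
        weights.extend([weight] * slots)
    return weights


print(per_rank_weights("oldest slots=2 weight=0.75\nnodeB slots=3\n"))
```

The resulting list can feed a proportional work split inside the application. mpirun itself is not involved.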