The real solution is to evict the private addresses from both levels (MPI and
ORTE). However, based on the ordering of the interfaces, I guess you cannot do
it by name (eth0 has a private address on one side but a public one on the
other).
Don't panic! There is support for this.
Look at the outpu
Ah...you're dealing with NAT. Sorry, I didn't understand that.
OMPI currently doesn't handle NAT well. :-(
There was some work at U. Tennessee to handle NAT nicely, but I think they
forked off and made their own release based on an older version of Open MPI.
...or maybe I'm remembering that
The thing is that there's just one interface: eth0.
Computer 1 thinks that it has 212... but it actually has a 210 when accessed
from outside. There's no other interface to choose from, just the one that
thinks it's a 212, the eth0.
Or maybe I'm just not understanding correctly.
---
On Wed,
Check out this FAQ entry:
http://www.open-mpi.org/faq/?category=tcp#tcp-selection
Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control
MPI-level communications. There's also oob_tcp_if_include / oob_tcp_if_exclude
(that take the same kinds of values as btl_tcp_if_incl
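As a rough illustration of the two parameter families named above, the sketch below assembles an mpirun command line that restricts both the MPI-level (btl_tcp) and run-time (oob_tcp) TCP traffic to one interface. The helper name build_mpirun_command, the program path ./hello, the process count, and the interface name eth0 are placeholders chosen for the example, not values taken from this thread.

```python
import shlex


def build_mpirun_command(program, nprocs, interface):
    """Return an mpirun argv that limits both TCP layers to one interface.

    All argument values are supplied by the caller; nothing here is
    detected automatically.
    """
    return [
        "mpirun",
        "-np", str(nprocs),
        # MPI-level point-to-point traffic over TCP
        "--mca", "btl_tcp_if_include", interface,
        # Run-time (out-of-band) traffic over TCP
        "--mca", "oob_tcp_if_include", interface,
        program,
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    argv = build_mpirun_command("./hello", 4, "eth0")
    print(shlex.join(argv))
```

The same pattern applies to the *_if_exclude variants; whether including or excluding is the better fit depends on how the interfaces are named on each host.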
"OMPI always tries to use the lowest numbered address first - just a natural
ordering."
That doesn't seem to be the reason. We changed the private IPs to 212... (a
higher number than the public 210... IPs) and still MPI tries to go to 212
afterwards.
We're reading the oob_tcp and btl_tcp paramete
Does a "hello world" MPI app (with no MPI_SEND/MPI_RECV in it) work without
those params, but an MPI app with MPI_SEND/MPI_RECV hang?
If so, that's a little disappointing -- OMPI's MPI layer should be able to tell
the difference between the different networks and should be able to figure out
ro
On Oct 5, 2011, at 9:35 AM, Yevgeny Kliteynik wrote:
>> Yevgeny -- can you check that out?
>
> Yep, indeed - configure doesn't abort when "--enable-openib-rdmacm"
> is provided and "rdma/rdma_cma.h" is not found.
Can you fix?
It might also be worthwhile to at least print a warning if IBoIP supp
On 05-Oct-11 3:15 PM, Jeff Squyres wrote:
>> You shouldn't use the "--enable-openib-rdmacm" option - rdmacm
>> support is enabled by default, providing librdmacm is found on
>> the machine.
>
> Actually, this might be a configure bug. We have lots of other configure
> options that, even if "foo"
Jeff, Yevgeny,
Thanks for your responses.
We found the problem. Issue was that the librdmacm-devel rpm was not installed
on the build system.
Installed the rpm and re-built OpenMPI. Now RoCE works fine.
You might add the requirement for the librdmacm-devel rpm to the install readme.
-Jeff
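A quick pre-build check along these lines might have caught the missing development package earlier. This is only a sketch: the helper name find_header is made up for the example, and the include directories listed are assumed typical locations, not something configure itself is known to search in that order.

```python
import os


def find_header(relative_path, include_dirs):
    """Return the first existing path to the header, or None if absent."""
    for directory in include_dirs:
        candidate = os.path.join(directory, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Assumed search locations; adjust for the build system in question.
    dirs = ["/usr/include", "/usr/local/include"]
    found = find_header(os.path.join("rdma", "rdma_cma.h"), dirs)
    if found is None:
        print("rdma/rdma_cma.h not found; is librdmacm-devel installed?")
    else:
        print("found", found)
```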
> ---
On Oct 5, 2011, at 9:04 AM, Yevgeny Kliteynik wrote:
>> Built OpenMPI with this option "--enable-openib-rdmacm".
>> Our system has OFED 1.5.2 with librdmacm-1.0.13-1
>>
>> I noticed this output from configure script:
>> checking rdma/rdma_cma.h usability... no
>> checking rdma/rdma_cma.h presence
On 16:58 Wed 05 Oct , Dmitry N. Mikushin wrote:
> Maybe Mickaël means load balancing could be achieved simply by
> spawning various number of MPI processes, depending on how many cores
> particular node has?
Varying numbers of slots are not a problem. But Mickaël's mail
indicates that he woul
Jeff,
On 01-Oct-11 1:01 AM, Konz, Jeffrey (SSA Solution Centers) wrote:
> Encountered a problem when trying to run OpenMPI 1.5.4 with RoCE over 10GbE
> fabric.
>
> Got this run time error:
>
> An invalid CPC name was specified via the btl_openib_cpc_include MCA
> parameter.
>
>Local host:
Hi,
Maybe Mickaël means that load balancing could be achieved simply by
spawning a different number of MPI processes on each node, depending on how
many cores that node has? This should be possible, but the accuracy of such
balancing will be task-dependent due to other factors, like memory
operations and communicatio
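A minimal sketch of that idea, assuming the core count of each node is already known: it writes one hostfile line per node using the "slots=" keyword, so that nodes with more cores receive more processes. The host names and core counts are invented for the example, and the helper hostfile_lines is hypothetical.

```python
def hostfile_lines(cores_by_host):
    """Build hostfile lines giving each host one slot per core."""
    return [f"{host} slots={cores}" for host, cores in cores_by_host.items()]


if __name__ == "__main__":
    # Hypothetical cluster: an older 4-core node and a newer 8-core node.
    for line in hostfile_lines({"oldest": 4, "newest": 8}):
        print(line)
```

This only varies how many processes land on each node; it does not account for per-core speed differences.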
I'm afraid you'll have to do this kind of load balancing in your
application itself as Open MPI (just like any other MPI implementation)
has no notion of how your application manages its workload.
HTH
-Andreas
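One way an application could do this itself is to split its work items in proportion to a per-rank weight. The sketch below is an assumption about how that might look, not anything Open MPI provides: split_by_weight is a made-up helper, the weights are values the application would have to measure or configure on its own, and leftover items are handed out by largest fractional share.

```python
def split_by_weight(n_items, weights):
    """Divide n_items among ranks in proportion to their weights.

    Returns a list of item counts, one per weight, summing to n_items.
    """
    total = sum(weights)
    shares = [n_items * w / total for w in weights]
    counts = [int(s) for s in shares]  # round every share down first
    leftover = n_items - sum(counts)
    # Give the remaining items to the ranks with the largest fractional part.
    order = sorted(
        range(len(weights)),
        key=lambda i: shares[i] - counts[i],
        reverse=True,
    )
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical weights: one slower rank (0.75) and two full-speed ranks.
    print(split_by_weight(100, [0.75, 1.0, 1.0]))
```

Each rank would then process only its own count of items, so a slower node finishes at roughly the same time as the faster ones if the weights are realistic.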
On 14:05 Wed 05 Oct , Mickaël CANÉVET wrote:
> Hi,
>
> Is there a way to define a
Hi,
Is there a way to assign a weight to the CPUs of the hosts? I have a
cluster made of machines from different generations, and when I run a
process on it, the whole cluster is slowed down by the slowest node.
What I'd like to do is something like this in my hostfile:
oldest slots=4 weight=0.75
n