Re: [OMPI users] v2.1.1 How to utilise multiple NIC ports

2018-12-22 Thread Bob Beattie

Hi Jeff,

> How are you measuring that it hasn't been successful?
A network switch sits between the two machines and I am watching the link activity on the 
ports.


> One thing to make sure of is that your interfaces are on different subnets.
Oh.  I had them all on the same subnet.  Now the first port shares the same subnet so I 
can ssh in and the other ports have their own just as you suggested.
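
A quick way to confirm that two ports really are on different subnets is to mask both addresses with the netmask and compare the results. The sketch below is only an illustration: the function name `same_subnet` and the addresses in the usage note are hypothetical, not taken from this thread.

```c
#include <arpa/inet.h>   /* inet_pton */
#include <netinet/in.h>  /* struct in_addr */
#include <sys/socket.h>  /* AF_INET */

/* Returns 1 if the two IPv4 addresses share a subnet under the given
 * netmask, 0 if they do not, and -1 if any of the strings fails to parse.
 * (Hypothetical helper for illustration only.) */
int same_subnet(const char *addr_a, const char *addr_b, const char *netmask)
{
    struct in_addr a, b, m;

    if (inet_pton(AF_INET, addr_a, &a) != 1 ||
        inet_pton(AF_INET, addr_b, &b) != 1 ||
        inet_pton(AF_INET, netmask, &m) != 1) {
        return -1;
    }
    /* All three values are in network byte order, so the masked
     * values can be compared directly without byte swapping. */
    return (a.s_addr & m.s_addr) == (b.s_addr & m.s_addr);
}
```

For example, with a 255.255.255.0 netmask, 192.168.1.10 and 192.168.1.20 share a subnet, while 192.168.1.10 and 192.168.2.10 do not.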


> Bad Things(tm) can happen...
:)

How do I now go about setting up /etc/hosts, -hostfile entries and bringing them all 
together on the mpirun run line ?
For example, my 2nd machine is a quad core Dell T3500.  Should I create a separate entry 
in /etc/hosts for each NIC port ? (T3500-eth1, T3500-eth2, T3500-eth3):

and for the -hostfile should I also create separate entries for each core ?

Cheers,
Bob.
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] v2.1.1 How to utilise multiple NIC ports

2018-12-22 Thread Jeff Squyres (jsquyres) via users
On Dec 22, 2018, at 10:56 AM, Bob Beattie wrote:
> 
> How do I now go about setting up /etc/hosts, -hostfile entries and bringing 
> them all together on the mpirun run line ?
> For example, my 2nd machine is a quad core Dell T3500.  Should I create a 
> separate entry in /etc/hosts for each NIC port ? (T3500-eth1, T3500-eth2, 
> T3500-eth3):
> and for the -hostfile should I also create separate entries for each core ?

You can add entries in /etc/hosts for the new IP interfaces if you like, but 
Open MPI won't care.

Open MPI deals with IP addresses, and it'll auto-discover them (by looking at 
all the IP interfaces exported by the kernel) and use them as it finds them.

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] v2.1.1 How to utilise multiple NIC ports

2018-12-22 Thread Bob Beattie

Many, many thanks.
Couldn't see the wood for the trees !
I now have the two machines using all their 1Gb ports to talk to each other.

Cheers Jeff,
Happy holidays.
Bob. South UK.


[OMPI users] open-mpi.org 3.1.3.tar.gz needs a refresh?

2018-12-22 Thread Bennet Fauber
Maybe the distribution tar ball at

https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.3.tar.gz

did not get refreshed after the fix in

https://github.com/bosilca/ompi/commit/b902cd5eb765ada57f06c75048509d0716953549

was implemented? I downloaded the tarball from open-mpi.org today, 22
Dec, compiled it, and I get these warnings:

ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0xd82 valid_mask = 0x1)
[bn01][[37143,17005],0][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_0 errno says Invalid argument
ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0xd810002 valid_mask = 0x1)
[bn01][[37143,17005],1][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_0 errno says Invalid argument
--
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   bn01
  Local device: mlx4_0
--
--
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   bn01
  Local device: mlx4_0
--

It looks like Howard merged the fix on Dec 4, but the date listed for
the 3.1.3 tarball on the open-mpi.org site is in Oct.

The relevant lines in opal/mca/btl/openib/btl_openib_component.c from the
tarball are shown below. They are missing the call

memset(&device->ib_exp_dev_attr, 0, sizeof(device->ib_exp_dev_attr));

that should have been inserted at line 1667.

1666 #if HAVE_DECL_IBV_EXP_QUERY_DEVICE
1667     device->ib_exp_dev_attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1;
1668     if(ibv_exp_query_device(device->ib_dev_context, &device->ib_exp_dev_attr)){
1669         BTL_ERROR(("error obtaining device attributes for %s errno says %s",
1670                    ibv_get_device_name(device->ib_dev), strerror(errno)));
1671         goto error;
1672     }
1673 #endif
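
The sketch below illustrates why the missing memset matters. It uses mock types rather than the real verbs API: `struct demo_attr`, `demo_query`, the field `extra_mask`, and the mask values are all hypothetical stand-ins. The assumption is only that the query call validates more of the attribute struct than the one field the caller sets, so any leftover garbage in the rest of the struct can cause the request to be rejected.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an attribute struct; NOT the real verbs layout. */
struct demo_attr {
    uint32_t comp_mask;   /* the field the caller sets explicitly */
    uint32_t extra_mask;  /* a second field the caller never touches */
};

/* Mock "library" call: rejects any bits outside what it supports. */
int demo_query(const struct demo_attr *attr)
{
    const uint32_t valid_comp = 0xFu;
    const uint32_t valid_extra = 0x1u;

    if ((attr->comp_mask & ~valid_comp) != 0 ||
        (attr->extra_mask & ~valid_extra) != 0) {
        return -1;  /* analogous to "invalid comp_mask" */
    }
    return 0;
}

/* Unpatched pattern: only comp_mask is set, so whatever bytes were
 * already in the rest of the struct are passed to the query. */
int demo_query_unzeroed(struct demo_attr *attr)
{
    attr->comp_mask = 0xFu;
    return demo_query(attr);
}

/* Patched pattern: zero the whole struct first, then set the mask. */
int demo_query_zeroed(struct demo_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->comp_mask = 0xFu;
    return demo_query(attr);
}
```

If the struct is pre-filled with junk bytes (for example with `memset(&a, 0xAB, sizeof a)` to imitate uninitialized memory), `demo_query_unzeroed` fails while `demo_query_zeroed` succeeds. Differing garbage between processes would also be consistent with the two different comp_mask values in the warnings above.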

I added a comment to the GitHub issue, but it was closed and I am not
sure that will be noticed.  Sorry for the double-posting if that was
sufficient.

-- bennet


[OMPI users] open-mpi.org is DOWN

2018-12-22 Thread Ralph H Castain
Hello all

Apologies to everyone, but I received an alert this morning that malware has 
been detected on the www.open-mpi.org site. I have tried to contact the hosting 
agency and the security scanners, but nobody is around on this pre-holiday 
weekend.

Accordingly, I have taken the site OFFLINE for the indeterminate future until 
we can get this resolved. Sadly, with the holidays upon us, I don’t know how 
long it will take to get responses from either company. Until we do, the site 
will remain offline for safety reasons.

Ralph
