If you are using Intel 16, yes, 1.10.1 would be a good choice.
If you're not using Fortran, you can disable the MPI Fortran bindings, and you
should be ok, too.
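(For reference, a minimal sketch of such a build, assuming a recent Open MPI
tarball and the Intel C/C++ compilers; the install prefix is only an
illustration and the exact flag spelling may differ between releases:)

  ./configure CC=icc CXX=icpc --prefix=/opt/openmpi --disable-mpi-fortran
  make -j8 && make install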
> On Oct 16, 2015, at 3:54 PM, Nick Papior wrote:
>
> @Jeff, Kevin
>
> Shouldn't Kevin wait for 1.10.1 with the Intel 16 compiler?
A bugfix for Intel 16 has been committed with
fb49a2d71ed9115be892e8a22643d9a1c069a8f9.
(At least, I am anxiously awaiting 1.10.1 because I cannot get my builds
to complete successfully.)
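(As a sketch, for anyone building from a git checkout rather than waiting for
the 1.10.1 release, the fix could presumably be pulled into a local branch
with something like the following, assuming the commit is reachable from a
fetched remote of the branch it was made against:)

  git cherry-pick fb49a2d71ed9115be892e8a22643d9a1c069a8f9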
> On Oct 16, 2015, at 3:25 PM, McGrattan, Kevin B. Dr. wrote:
>
> I cannot nail this down any better because this happens like every other
> night, with about 1 out of a hundred jobs. Can anyone think of a reason why
> the job would seg fault in MPI_FINALIZE, but only under conditions where
My group is running a fairly large CFD code compiled with Intel Fortran 16.0.0
and OpenMPI 1.8.4. Each night we run hundreds of simple test cases, using a
range of MPI processes from 1 to 16. I have noticed that if we submit these
jobs on our Linux cluster and assign each job exclusive rights to
On 10/16/2015 02:27 PM, Shamis, Pavel wrote:
Well, OMPI will see this as 14 separate devices and will create ~28 openib
btl instances (one per port).
Can you try to limit Open MPI to run with a single device/port and see what
happens?
We are running inside an LXC container and only 1 i
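(One way to try Pasha's suggestion, as a sketch; btl_openib_if_include is the
usual parameter for this, and the device/port name mlx4_0:1 and the binary
are only illustrations:)

  mpirun --mca btl openib,self,sm --mca btl_openib_if_include mlx4_0:1 ./a.out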
On 10/16/2015 01:35 PM, Shamis, Pavel wrote:
Did you try to run ibdiagnet to check the network?
Also, how many devices do you have on the same node?
It says "mlx4_14" - do you have 14 HCAs on the same machine?!
Yes. ibdiagnet seems to check out fine except for a few warnings which do
not seem to b
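(For the device-count question, assuming the standard InfiniBand userspace
utilities are installed, either of these lists the HCAs visible on a node:)

  ibv_devices
  ibstat -l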
Hi,
I have encountered a problem when running with 1.6.5 over IB (openib,
ConnectX-3):
[[51298,1],2][btl_openib_component.c:3496:handle_wc] from
ib7-bc2qq42-be01p02 to: 3 error polling LP CQ with status RETRY EXCEEDED ERROR
status number 12 for wr_id 217ce00 opcode 0 vendor error 129 qp_i
David,
ib0 means IP over IB.
This is *not* what you want to use, since it is way slower than native
InfiniBand.
If you run
mpirun --mca btl self,sm,openib ...
on more than one node, the only btl usable for inter-node communication is
openib,
so if communication happens, that means openib is used.
In order
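(A sketch of one way to see which BTL actually gets selected for a run,
assuming btl_base_verbose behaves as in other 1.x releases; the host names and
binary are placeholders:)

  mpirun --mca btl self,sm,openib --mca btl_base_verbose 30 \
         -np 2 -host node1,node2 ./a.out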
Hi,
We appear to have a correctly set up Mellanox IB network (ibdiagnet, ibstat,
iblinkinfo, ibqueryerrors(*)). It's operating at Rate 40 FDR10.
But Open MPI programs (test and user) that are specifying the
'openib,self,sm'
parameters do not seem to be using the IB network according to network-m
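(It may also be worth confirming that the openib component is present in the
installation at all; on the 1.6/1.8 series something like the following should
list its parameters, assuming ompi_info is on the PATH:)

  ompi_info --param btl openib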
On Oct 15, 2015, at 2:58 PM, Brant Abbott wrote:
>
> If I use mpirun.openmpi everything works as normal. I suppose mpirun is
> executing the MPICH version. I'm not entirely sure why, when logged in as
> root, it behaves differently, but it's good enough for me to just use the
> alternative command.
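(If the machine uses the Debian/Ubuntu alternatives mechanism, which is an
assumption here, the following shows what the bare mpirun actually resolves
to:)

  ls -l $(which mpirun)
  update-alternatives --display mpirun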
Dear George,
thanks a lot for your explanations. Now everything works and it is clearer to
me.
Best Regards,
Diego
On 14 October 2015 at 17:16, Georg Geiser wrote:
> Hi Diego,
>
> displacements start at 0, so 0 means no displacement, i.e., the
> corresponding data starts at the first entry
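(To illustrate the point with a small, self-contained sketch in C that is not
taken from the original thread: with MPI_Gatherv, a displacement of 0 for the
first block simply means that block lands at the very start of the receive
buffer.)

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      int sendval = rank + 1;      /* each rank contributes one int        */
      int recvbuf[64];             /* big enough for a small test run      */
      int counts[64], displs[64];
      for (int i = 0; i < size; i++) {
          counts[i] = 1;           /* one element from every rank          */
          displs[i] = i;           /* rank 0 gets displacement 0, i.e. its */
      }                            /* data starts at the first entry       */

      MPI_Gatherv(&sendval, 1, MPI_INT,
                  recvbuf, counts, displs, MPI_INT,
                  0, MPI_COMM_WORLD);

      if (rank == 0)
          printf("recvbuf[0] = %d (displacement 0 => first entry)\n", recvbuf[0]);

      MPI_Finalize();
      return 0;
  }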