Dear George,
thanks a lot for your explanations. Everything works now and it is much
clearer to me.
Best Regards,
Diego
On 14 October 2015 at 17:16, Georg Geiser wrote:
> Hi Diego,
>
> displacements start at 0, so 0 means no displacement, i.e., the
> corresponding data starts at the first entry
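(For anyone reading this in the archives: below is a minimal sketch, not taken from the original thread, of what zero-based displacements mean in an MPI_Gatherv-style call; the buffer names and counts are made up purely for illustration.)

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int sendval = rank;                    /* each rank contributes one int */
    int *recvbuf = NULL, *recvcounts = NULL, *displs = NULL;

    if (rank == 0) {
        recvbuf    = malloc(size * sizeof(int));
        recvcounts = malloc(size * sizeof(int));
        displs     = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) {
            recvcounts[i] = 1;    /* one element from each rank            */
            displs[i]     = i;    /* displs[0] == 0: rank 0's data lands   */
                                  /* at the very start of recvbuf          */
        }
    }

    MPI_Gatherv(&sendval, 1, MPI_INT,
                recvbuf, recvcounts, displs, MPI_INT,
                0, MPI_COMM_WORLD);

    if (rank == 0) {
        free(recvbuf); free(recvcounts); free(displs);
    }
    MPI_Finalize();
    return 0;
}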
On Oct 15, 2015, at 2:58 PM, Brant Abbott wrote:
>
> If I use mpirun.openmpi everything works as normal. I suppose mpirun is
> executing the MPICH version. I'm not entirely sure why it behaves differently
> when logged in as root, but it is good enough for me to just use the
> alternative command.
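(A quick way to confirm which implementation a given launcher belongs to is to run it with --version, e.g. mpirun --version versus mpirun.openmpi --version; on Debian-style systems the mpirun symlink is usually managed through update-alternatives, though the exact alternatives name can vary.)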
Hi,
We appear to have a correctly set up Mellanox IB network (ibdiagnet, ibstat,
iblinkinfo, ibqueryerrors(*)). It's operating at Rate 40 FDR10.
But Open MPI programs (test and user) that specify the 'openib,self,sm'
parameters do not seem to be using the IB network according to network-m
David,
ib0 means IP over IB (IPoIB);
this is *not* what you want to use, since it is much slower than native
InfiniBand.
If you run
mpirun --mca btl self,sm,openib ...
on more than one node, the only BTL usable for inter-node communication is
openib,
so if communication happens, that means openib is being used.
in order
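(For later readers: if there is any doubt about which BTL is actually selected, adding something like --mca btl_base_verbose 100 to the mpirun line should make Open MPI print its BTL selection; the exact parameter name and output vary by release, so check ompi_info --all if it does not.)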
Hi,
I have encountered a problem when running with 1.6.5 over IB (openib,
ConnectX-3):
[[51298,1],2][btl_openib_component.c:3496:handle_wc] from
ib7-bc2qq42-be01p02 to: 3 error polling LP CQ with status RETRY EXCEEDED ERROR
status number 12 for wr_id 217ce00 opcode 0 vendor error 129 qp_i
Did you try to run ibdiagnet to check the network?
Also, how many devices do you have on the same node?
It says "mlx4_14" - do you have 14 HCAs on the same machine?!
Best,
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On 10/16/2015 01:35 PM, Shamis, Pavel wrote:
> Did you try to run ibdiagnet to check the network?
> Also, how many devices do you have on the same node?
> It says "mlx4_14" - do you have 14 HCAs on the same machine?!
Yes. ibdiagnet seems to check out fine except for a few warnings which do
not seem to b
Well, OMPI will see this as 14 separate devices and will create ~28 openib
BTL instances (one per port).
Can you try to limit Open MPI to run with a single device/port and see what
happens?
Best,
Pasha
From: users <users-boun...@open-mpi.org> on behalf of John Marshall <jo
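(For the archives: the usual knob for this is the btl_openib_if_include MCA parameter, e.g. --mca btl_openib_if_include mlx4_0:1 to restrict the openib BTL to a single HCA and port; the device and port names here are only an example, take the real ones from ibstat on the node.)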
On 10/16/2015 02:27 PM, Shamis, Pavel wrote:
> Well, OMPI will see this as 14 separate devices and will create ~28 openib
> BTL instances (one per port).
> Can you try to limit Open MPI to run with a single device/port and see what
> happens?
We are running inside an LXC container and only 1 i
My group is running a fairly large CFD code compiled with Intel Fortran 16.0.0
and OpenMPI 1.8.4. Each night we run hundreds of simple test cases, using a
range of MPI processes from 1 to 16. I have noticed that if we submit these
jobs on our Linux cluster and assign each job exclusive rights to
> On Oct 16, 2015, at 3:25 PM, McGrattan, Kevin B. Dr.
> wrote:
>
> I cannot nail this down any better because this happens roughly every other
> night, with about 1 out of a hundred jobs. Can anyone think of a reason why
> the job would seg fault in MPI_FINALIZE, but only under conditions where
@Jeff, Kevin
Shouldn't Kevin wait for 1.10.1 with the Intel 16 compiler?
A bugfix for Intel 16 has been committed with
fb49a2d71ed9115be892e8a22643d9a1c069a8f9.
(At least, I am anxiously awaiting 1.10.1 because I cannot get my builds
to complete successfully.)
2015-10-16 19:33 GMT+00:00 Jeff
If you are using Intel 16, yes, 1.10.1 would be a good choice.
If you're not using Fortran, you can disable the MPI Fortran bindings, and you
should be OK, too.
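(For reference: the Fortran bindings can be left out at configure time, e.g. by passing something like --enable-mpi-fortran=no to Open MPI's configure; the exact option spelling differs between release series, so check ./configure --help for the version being built.)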
> On Oct 16, 2015, at 3:54 PM, Nick Papior wrote:
>
> @Jeff, Kevin
>
> Shouldn't Kevin wait for 1.10.1 with the intel 16 compiler?