>>You can try a more recent version of openmpi
>>1.10.2 was released recently, or try with a nightly snapshot of master.
>>If all of these still fail, can you post a trimmed version of your program so we can investigate ?
Hi Gilles,
I tried 1.10.2. My program has been running successfully without
Thanks Jeff.
>>1. Can you create a small example to reproduce the problem?
>>2. The TCP and verbs-based transports use different thresholds and
>>protocols, and can sometimes bring to light errors in the application
>>(e.g., the application is making assumptions that just happen to be true
>>for TCP, but not necessarily for other transports).
Can you create a small example to reproduce the problem?
The TCP and verbs-based transports use different thresholds and protocols, and
can sometimes bring to light errors in the application (e.g., the application
is making assumptions that just happen to be true for TCP, but not necessarily
for other transports).
That could be a bug in openib, openmpi and/or your application.
for example, a memory corruption could go unnoticed with tcp, but might
cause a hang with openib.
you can start by running your program under a memory debugger
(valgrind, ddt or other) and confirm your application works fine.
you can also up
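For illustration, a memory-debugger run under mpirun could look like the following (the process count, application name, and install prefix are placeholders):

  mpirun -np 4 valgrind --track-origins=yes \
      --suppressions=<ompi-install-prefix>/share/openmpi/openmpi-valgrind.supp ./my_app

Open MPI ships that suppression file to hide known false positives from its own internals; drop the option if your install does not include it.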
Gilles,
Actually, there are some more strange things.
With the same environment and MPI version, I wrote a simple program
using the same communication logic as my hanging program.
The simple program works without hanging.
So what are the possible reasons? I can try them one by one.
Or can I debug
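For illustration, one way to see where each rank is stuck (the hostname and PID below are placeholders) is to attach gdb to the hung processes and dump all thread backtraces:

  ssh node1
  gdb -batch -p <pid-of-hung-rank> -ex "thread apply all bt"

Comparing the backtraces of all four ranks usually makes it clear which send/receive pair is blocked.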
You can try a more recent version of openmpi
1.10.2 was released recently, or try with a nightly snapshot of master.
If all of these still fail, can you post a trimmed version of your program so
we can investigate ?
Cheers,
Gilles
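A quick way to confirm which Open MPI version is actually picked up (run it on every node of the job):

  mpirun --version
  ompi_info | grep "Open MPI:"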
Gilles,
>>Can you try to
>>mpirun --mca btl tcp,self --mca btl_tcp_eager_limit 56 ...
>>and confirm it works fine with TCP *and* without eager ?
I have tried this and it works.
So what should I do next?
2016-01-21 16:25 GMT+08:00 Eva :
> Thanks Gilles.
> it works fine on tcp
> So I use this to disable eager:
> -mca btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0
Can you try to
mpirun --mca btl tcp,self --mca btl_tcp_eager_limit 56 ...
and confirm it works fine with TCP *and* without eager ?
Cheers,
Gilles
On 1/21/2016 5:25 PM, Eva wrote:
Thanks Gilles.
it works fine on tcp
So I use this to disable eager:
-mca btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0
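Spelled out for a two-node, two-processes-per-node job (the hostnames and application name are made up), the suggested test could look like:

  mpirun -np 4 -host node0,node0,node1,node1 \
      --mca btl tcp,self --mca btl_tcp_eager_limit 56 ./my_app

Restricting the btl list to tcp,self keeps openib out of the run entirely, and the tiny eager limit forces the rendezvous protocol for essentially every message.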
Thanks Gilles.
it works fine on tcp
So I use this to disable eager:
-mca btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0
2016-01-21 13:10 GMT+08:00 Eva :
> I run with two machines, 2 processes per node: process0, process1, process2,
> process3.
> After some random rounds of communications, the communication hangs.
and by the way, you did
mpirun --mca btl_tcp_eager_limit 56
in order to disable eager mode, right ?
--mca btl_tcp_rndv_eager_limit 0
does something different
Cheers,
Gilles
On 1/21/2016 2:10 PM, Eva wrote:
I run with two machines, 2 processes per node: process0, process1,
process2, process3.
After some random rounds of communications, the communication hangs.
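To see what the two parameters control and their current values (syntax for Open MPI 1.8/1.10; the exact wording of the output varies by version):

  ompi_info --param btl tcp --level 9 | grep eager

btl_tcp_eager_limit is the largest message that is sent eagerly, whereas btl_tcp_rndv_eager_limit only bounds the eagerly sent first fragment of a rendezvous message, so setting the latter to 0 does not disable eager sends.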
Hi,
can you post a trimmed version of your program so we can reproduce and
analyze the hang ?
Cheers,
Gilles
On 1/21/2016 2:10 PM, Eva wrote:
I run with two machines, 2 processes per node: process0, process1,
process2, process3.
After some random rounds of communications, the communication hangs.
I run with two machines, 2 processes per node: process0, process1, process2,
process3.
After some random rounds of communications, the communication hangs. When I
debugged into the program, I found:
process1 sent a message to process2;
process2 received the message from process1 and then started to receive
Running MPI_Send on Open MPI 1.8.5 without multithreading enabled:
it hangs in mca_pml_ob1_send() -> opal_progress() ->
btl_openib_component_progress() -> poll_device() -> libmlx4-rdmav2.so -> cq
-> pthread_spin_unlock
The program runs on TCP with no error.
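For reference, a trimmed reproducer of the kind being requested might look like the sketch below. The message size, number of rounds, and exact send/receive pairing are assumptions, not the actual application logic:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal sketch: each round, every rank posts a receive from the previous
 * rank, then blocks in MPI_Send to the next rank. Sizes and round count are
 * placeholders, not taken from the real application. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nrounds = 10000;      /* assumed round count */
    const int count   = 1 << 16;    /* assumed message size: 64K ints */
    int *sendbuf = malloc(count * sizeof(int));
    int *recvbuf = malloc(count * sizeof(int));
    for (int i = 0; i < count; i++)
        sendbuf[i] = rank;

    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    for (int r = 0; r < nrounds; r++) {
        /* Post the receive first so the blocking send cannot deadlock. */
        MPI_Request req;
        MPI_Irecv(recvbuf, count, MPI_INT, prev, 0, MPI_COMM_WORLD, &req);
        MPI_Send(sendbuf, count, MPI_INT, next, 0, MPI_COMM_WORLD);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        if (rank == 0 && r % 1000 == 0)
            printf("round %d done\n", r);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Build it with mpicc, run it with mpirun -np 4 across the two nodes, and switch --mca btl between openib,self and tcp,self to compare the two transports.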