Re: [OMPI users] openmpi-1.10.2rc3 is slower than 1.4.1

2016-02-26 Thread Eva
Thanks Gilles. Got it. I will run it.

Re: [OMPI users] openmpi-1.10.2rc3 is slower than 1.4.1

2016-02-26 Thread Eva
Thanks Gilles. What do you mean by "standard MPI benchmark"? Where can I find it?
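
[For reference, commonly used standard MPI benchmarks include the OSU Micro-Benchmarks and the Intel MPI Benchmarks (IMB). A minimal ping-pong latency sketch in the same spirit, assuming at least two ranks; buffer size and iteration count are illustrative, not from the thread:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        /* Minimal ping-pong between ranks 0 and 1; run with >= 2 ranks. */
        int rank;
        char buf[1024] = {0};
        const int iters = 10000;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, (int)sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, (int)sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("avg round trip for %zu bytes: %g us\n",
                   sizeof(buf), (t1 - t0) / iters * 1e6);
        MPI_Finalize();
        return 0;
    }

Running the same binary against both MPI installs gives a per-message-size latency figure to compare.]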

Re: [OMPI users] openmpi-1.10.2rc3 is slower than 1.4.1

2016-02-26 Thread Eva
I measure communication time for MPI_Send and end-to-end training time (including model training and communication time). MPI 1.4.1 is faster than MPI 1.10.2:

    MPI_Send + MPI_Recv:    2.83%
    end2end training time:  8.89%
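
[A minimal sketch of how such a split between communication time and total end-to-end time might be collected with MPI_Wtime; the training step is a placeholder and all sizes and counts are illustrative assumptions:

    #include <mpi.h>
    #include <stdio.h>

    static double comm_time = 0.0;

    /* Wrap MPI_Send so time spent inside it accumulates separately. */
    static void timed_send(void *buf, int n, int dest, MPI_Comm comm) {
        double t = MPI_Wtime();
        MPI_Send(buf, n, MPI_CHAR, dest, 0, comm);
        comm_time += MPI_Wtime() - t;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        char buf[4096] = {0};
        double start = MPI_Wtime();
        for (int step = 0; step < 100; step++) {
            /* ... model training work would run here ... */
            if (rank == 0) {
                timed_send(buf, (int)sizeof(buf), 1, MPI_COMM_WORLD);
            } else if (rank == 1) {
                double t = MPI_Wtime();
                MPI_Recv(buf, (int)sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                comm_time += MPI_Wtime() - t;
            }
        }
        double total = MPI_Wtime() - start;
        printf("rank %d: comm %.4fs of %.4fs total (%.2f%%)\n",
               rank, comm_time, total, 100.0 * comm_time / total);
        MPI_Finalize();
        return 0;
    }
]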

Re: [OMPI users] openmpi-1.10.2rc3 is slower than 1.4.1

2016-02-26 Thread Eva
I measure communication time for MPI_Send and end-to-end training time (including model training and communication time).

                                             MPI_Send+MPI_Recv   end2end training
    MPI 1.4.1 is faster than MPI 1.10.2 by:        2.83%              8.89%

[OMPI users] openmpi-1.10.2rc3 is slower than 1.4.1

2016-02-24 Thread Eva
I compile the same program with 1.4.1 and 1.10.2rc3 and then run both under the same environment. 1.4.1 is 8.89% faster than 1.10.2rc3. Is there an official performance report for each version upgrade?

Re: [OMPI users] openmpi-1.10.2 cores at mca_coll_libnbc.so

2016-01-26 Thread Eva
No, I didn't use MPI_Type_free. Is there any other possible reason?
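
[For context, the MPI_Type_free question likely probes a pattern like the one sketched below (illustrative, not code from this thread): freeing a derived datatype while a nonblocking collective such as MPI_Igather is still in flight exercises the libnbc path shown in the backtrace. The conservative ordering is to complete the wait first, then free:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Datatype vec;
        MPI_Type_contiguous(4, MPI_INT, &vec);
        MPI_Type_commit(&vec);

        int sendbuf[4] = {rank, rank, rank, rank};
        int *recvbuf = malloc(4 * (size_t)size * sizeof(int));

        MPI_Request req;
        MPI_Igather(sendbuf, 1, vec, recvbuf, 1, vec, 0, MPI_COMM_WORLD, &req);

        /* Conservative ordering: complete the collective first ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        /* ... then free the datatype. Calling MPI_Type_free before the
           MPI_Wait is the pattern the question above is probing. */
        MPI_Type_free(&vec);

        free(recvbuf);
        MPI_Finalize();
        return 0;
    }
]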

[OMPI users] openmpi-1.10.2 cores at mca_coll_libnbc.so

2016-01-26 Thread Eva
openmpi-1.10.2 dumps core in mca_coll_libnbc.so. My program was moved from 1.8.5 to 1.10.2, but when I run it, it dumps core as below.

    Program terminated with signal 11, Segmentation fault.
    #0  0x7fa3550f51d2 in ompi_coll_libnbc_igather () from /home/work/wuzhihua/install/openmpi-1.10.2rc3-gcc4.8/

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-22 Thread Eva
The program has run successfully without any hang for 4 hours now. I will continue to watch its status. By the way, have you fixed any such hang issues between 1.8.5 and 1.10.2?

2016-01-21 20:40 GMT+08:00 Eva:
> Thanks Jeff.
>
> >> 1. Can you create a small example to reproduce the problem?

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
process1/process3:
    foreach to_id in {process0, process2}:
        MPI_Send(send_buf, sendlen, to_id, TAG);
        MPI_Recv(recv_buf, recvlen, to_id, TAG);

process0/process2:
    while (true):
        MPI_Recv(recv_buf, any_source, TAG);
        MPI_Send(send_buf, source_id, TAG);
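
[A self-contained rendering of that pattern in C, for exactly 4 ranks; the fixed iteration count replaces the infinite loop, and buffer sizes are illustrative assumptions:

    #include <mpi.h>

    #define TAG 42

    int main(int argc, char **argv) {
        /* Pattern from above: odd ranks (1, 3) ping each even rank (0, 2);
           even ranks echo back to whoever contacted them. */
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        char sbuf[256] = {0}, rbuf[256];

        if (rank % 2 == 1) {                       /* process1 / process3 */
            for (int iter = 0; iter < 1000; iter++) {
                for (int to_id = 0; to_id <= 2; to_id += 2) {
                    MPI_Send(sbuf, (int)sizeof(sbuf), MPI_CHAR, to_id, TAG,
                             MPI_COMM_WORLD);
                    MPI_Recv(rbuf, (int)sizeof(rbuf), MPI_CHAR, to_id, TAG,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                }
            }
        } else {                                   /* process0 / process2 */
            for (int iter = 0; iter < 2000; iter++) {  /* 2 clients x 1000 */
                MPI_Status st;
                MPI_Recv(rbuf, (int)sizeof(rbuf), MPI_CHAR, MPI_ANY_SOURCE, TAG,
                         MPI_COMM_WORLD, &st);
                MPI_Send(sbuf, (int)sizeof(sbuf), MPI_CHAR, st.MPI_SOURCE, TAG,
                         MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }
]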

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
Could I debug into the openib source code to find the root cause, with your instructions or guidance?

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
Gilles,

>> Can you try to
>> mpirun --mca btl tcp,self --mca btl_tcp_eager_limit 56 ...
>> and confirm it works fine with TCP *and* without eager?

I have tried this and it works. So what should I do next?

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
Thanks Gilles. It works fine on TCP. So I use this to disable eager RDMA:

    -mca btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0

2016-01-21 13:10 GMT+08:00 Eva:
> I run with two machines, 2 processes per node: process0, process1,
> process2, process3. After some random [...]
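
[Put together on the launch line, that would look something like the following; the executable name and the explicit btl selection are illustrative assumptions, only the two eager-RDMA flags come from the thread:

    mpirun --mca btl openib,self \
           --mca btl_openib_use_eager_rdma 0 \
           --mca btl_openib_max_eager_rdma 0 ./my_app
]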

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Eva
I tried MPI_Bsend, but it still hangs. The same program has worked fine on TCP for more than one year. After I moved it onto RDMA, it started to hang, and I can't debug into any of the RDMA details.
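
[For completeness, MPI_Bsend only works with an explicitly attached buffer; a minimal usage sketch, with sizes as illustrative assumptions:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Attach a user buffer before issuing buffered sends. */
        int bufsize = 1 << 20;
        char *attach = malloc(bufsize + MPI_BSEND_OVERHEAD);
        MPI_Buffer_attach(attach, bufsize + MPI_BSEND_OVERHEAD);

        char msg[256] = {0};
        if (rank == 0)
            MPI_Bsend(msg, (int)sizeof(msg), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(msg, (int)sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        /* Detach waits until buffered messages have been delivered. */
        MPI_Buffer_detach(&attach, &bufsize);
        free(attach);
        MPI_Finalize();
        return 0;
    }
]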

[OMPI users] MPI hangs on poll_device() with rdma

2016-01-20 Thread Eva
Running MPI_Send on MPI 1.8.5 without multithreading enabled, it hangs in:

    mca_pml_ob1_send() -> opal_progress() -> btl_openib_component_progress()
    -> poll_device() -> libmlx4-rdmav2.so -> cq -> pthread_spin_unlock

The program runs on TCP with no error.