Hi,

I've been tracking down an intermittent segmentation fault that seems to occur 
during a collective communication when using the openib btl (see the thread 
"[OMPI users] [openib] segfault when using openib btl").

During my tests, I came across two different errors reported by Open MPI 1.4.2:

1/ 
[[12770,1],0][btl_openib_component.c:3227:handle_wc] from bn0103 to: bn0122 
error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 
560618664 opcode 1  vendor error 105 qp_idx 3

2/
[[992,1],6][btl_openib_component.c:3227:handle_wc] from pbn04 to: pbn05 error 
polling LP CQ with status REMOTE ACCESS ERROR status number 10 for wr_id 
162858496 opcode 1  vendor error 136 qp_idx 0
[[992,1],5][btl_openib_component.c:3227:handle_wc] from pbn05 to: pbn04 error 
polling HP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 
485900928 opcode 0  vendor error 249 qp_idx 0

--------------------------------------------------------------------------
The OpenFabrics stack has reported a network error event.  Open MPI will try to 
continue, but your job may end up failing.

  Local host:        p'"
  MPI process PID:   20743
  Error number:      3 (IBV_EVENT_QP_ACCESS_ERR)

This error may indicate connectivity problems within the fabric; please contact 
your system administrator.
--------------------------------------------------------------------------

I'd like to know what these two errors mean and where they come from.
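
In case it helps when reading the logs: the "status number" values appear to 
match the ibv_wc_status enum from libibverbs (infiniband/verbs.h). Below is a 
small Python sketch that pulls the fields out of a handle_wc message and maps 
the status number to the enum name. The regular expression, the WC_STATUS table 
and the parse_handle_wc helper are my own illustration, not part of Open MPI, 
and the table is only a hand-copied subset of the enum.

```python
import re

# Hand-copied subset of libibverbs' enum ibv_wc_status (infiniband/verbs.h).
# Assumption: the "status number" printed by the openib btl is this enum value.
WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    6: "IBV_WC_MW_BIND_ERR",
    7: "IBV_WC_BAD_RESP_ERR",
    8: "IBV_WC_LOC_ACCESS_ERR",
    9: "IBV_WC_REM_INV_REQ_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    11: "IBV_WC_REM_OP_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# Hypothetical pattern, written against the message layout quoted above.
PATTERN = re.compile(
    r"from\s+(?P<src>\S+)\s+to:\s+(?P<dst>\S+)\s+error\s+polling\s+"
    r"(?P<cq>LP|HP)\s+CQ\s+with\s+status\s+(?P<text>.+?)\s+status\s+number\s+"
    r"(?P<status>\d+)\s+for\s+wr_id\s+(?P<wr_id>\d+)\s+opcode\s+(?P<opcode>\d+)\s+"
    r"vendor\s+error\s+(?P<vendor>\d+)\s+qp_idx\s+(?P<qp_idx>\d+)"
)


def parse_handle_wc(message):
    """Return the fields of one handle_wc error message, or None if no match."""
    # Collapse line wraps and repeated spaces introduced by the mail client.
    flat = " ".join(message.split())
    m = PATTERN.search(flat)
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("status", "wr_id", "opcode", "vendor", "qp_idx"):
        fields[key] = int(fields[key])
    fields["status_name"] = WC_STATUS.get(fields["status"], "unknown")
    return fields


if __name__ == "__main__":
    sample = """[[12770,1],0][btl_openib_component.c:3227:handle_wc] from bn0103 to: bn0122
    error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id
    560618664 opcode 1  vendor error 105 qp_idx 3"""
    print(parse_handle_wc(sample))
```

The vendor error codes are specific to the HCA driver, so the sketch only 
reports them as numbers and makes no attempt to interpret them.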

Thanks for your help,
Eloi

-- 


Eloi Gaudry

Free Field Technologies
Company Website: http://www.fft.be
Company Phone:   +32 10 487 959
