Hi,

Could someone have a look at these two error messages? I'd like to know why they were displayed and what they actually mean.
Thanks,
Eloi

On Monday 19 July 2010 16:38:57 Eloi Gaudry wrote:
> Hi,
>
> I've been working on a random segmentation fault that seems to occur during
> a collective communication when using the openib btl (see [OMPI users]
> [openib] segfault when using openib btl).
>
> During my tests, I've come across different issues reported by
> OpenMPI-1.4.2:
>
> 1/
> [[12770,1],0][btl_openib_component.c:3227:handle_wc] from bn0103 to: bn0122
> error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for
> wr_id 560618664 opcode 1 vendor error 105 qp_idx 3
>
> 2/
> [[992,1],6][btl_openib_component.c:3227:handle_wc] from pbn04 to: pbn05
> error polling LP CQ with status REMOTE ACCESS ERROR status number 10 for
> wr_id 162858496 opcode 1 vendor error 136 qp_idx 0
> [[992,1],5][btl_openib_component.c:3227:handle_wc] from pbn05 to: pbn04
> error polling HP CQ with status WORK REQUEST FLUSHED ERROR status number 5
> for wr_id 485900928 opcode 0 vendor error 249 qp_idx 0
>
> --------------------------------------------------------------------------
> The OpenFabrics stack has reported a network error event. Open MPI will
> try to continue, but your job may end up failing.
>
> Local host: p'"
> MPI process PID: 20743
> Error number: 3 (IBV_EVENT_QP_ACCESS_ERR)
>
> This error may indicate connectivity problems within the fabric; please
> contact your system administrator.
> --------------------------------------------------------------------------
>
> I'd like to know what these two errors mean and where they come from.
>
> Thanks for your help,
> Eloi

--
Eloi Gaudry

Free Field Technologies
Company Website: http://www.fft.be
Company Phone: +32 10 487 959