On Sun, 29 Oct 2006 01:34:06 -0700, Gleb Natapov <gl...@voltaire.com> wrote:

If you use the OB1 PML (the default one), it will never recover from a
link-down error, no matter how many other transports you have. The reason
is that OB1 never tracks what happens to buffers submitted to a BTL. So if
a BTL can't, for any reason, transmit a packet passed to it by OB1, the job
will get stuck, because OB1 doesn't have enough information to try to
resend the packet via another transport. The DR PML exists for this kind of
resource tracking. In the case of the IB BTL, a link-down event generates
an error for each packet submitted to the device for sending. The IB BTL
simply discards all these packets and relies on the PML to resend them, so
even after a link-up event a job will not recover if the OB1 PML is used
with the IB BTL. This may be different with other transports.
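
To make the distinction above concrete, here is a toy Python model of the two behaviours. It is only a sketch and is not Open MPI code: every class and function name in it is made up for illustration, and the real OB1, DR and BTL interfaces look nothing like this. It assumes a transport that silently discards packets while its link is down, and compares a layer that keeps no record of what it submitted with one that holds each packet until delivery is reported.

```python
# Toy model only: none of these names correspond to real Open MPI code.

class FlakyTransport:
    """Hypothetical transport that discards every packet while its link is down."""

    def __init__(self, name, link_up=True):
        self.name = name
        self.link_up = link_up
        self.delivered = []

    def send(self, packet):
        # Returns True on delivery; while the link is down the packet is dropped.
        if self.link_up:
            self.delivered.append(packet)
            return True
        return False


class FireAndForgetLayer:
    """Hands each packet to the first transport and keeps no record of it."""

    def __init__(self, transports):
        self.transports = transports

    def send(self, packet):
        # The result is ignored and the packet is not retained anywhere.
        self.transports[0].send(packet)


class TrackingLayer:
    """Keeps each packet until some transport reports that it was delivered."""

    def __init__(self, transports):
        self.transports = transports
        self.pending = []

    def send(self, packet):
        self.pending.append(packet)
        for transport in self.transports:
            if transport.send(packet):
                self.pending.remove(packet)
                return True
        return False  # packet stays in self.pending and could be retried later


def delivered_count(layer_cls):
    """Send three packets with the first link down; count what arrived anywhere."""
    ib = FlakyTransport("ib", link_up=False)
    tcp = FlakyTransport("tcp")
    layer = layer_cls([ib, tcp])
    for packet in range(3):
        layer.send(packet)
    return len(ib.delivered) + len(tcp.delivered)
```

In this model the fire-and-forget layer loses every packet even though a working second transport is available, while the tracking layer fails over to it. Bringing the first link back up would not help the fire-and-forget layer either, since the discarded packets are no longer held anywhere.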

This makes sense. One thing I'm wondering now is whether the OB1 PML is able to (and/or has enough information to) figure out that it can't continue at all, and will then abort the job.
--
Troy Telford
