On Thu, Nov 05, 2015 at 02:33:44PM -0200, Guilherme G. Piccoli wrote:
> Hello Shlomo and Or,
>
> I'm Guilherme Piccoli from LTC/IBM - firstly, sorry to bother you.
>
>
> We are running some tests with iSCSI and we found an issue possibly caused
> by commit 659743b02c41 ("libiscsi: Reduce locking contention in fast path").
>
> After about an hour of testing against a hardware target (using the fio
> benchmark tool), we got a kernel oops; the following link is a pastebin of
> the error message (we got lots of these messages, since our system has
> multiple cores): http://codepad.org/KS2C9Jjt

Interesting. From the trace, the list debugging code is detecting
corruption when removing a task from some list. That could be the
connection's mgmtqueue, cmdqueue, or requeue.

After the locking change, adding a task to any of those lists happens
under the session frwd_lock, but the call to iscsi_complete_task(), which
deletes the task from whatever list it's on, happens under the back_lock.

Am I missing something, or is splitting a linked list across two locks a
major failing of this change?
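
To make the concern concrete, here is a minimal userspace sketch. It is
not the libiscsi code: every name in it (guarded_list, glist_add_tail,
glist_del, the demo_* helpers) is hypothetical, and it models "which lock
is held" as a plain enum instead of taking real spinlocks. It only counts
list operations performed under a lock other than the one that is supposed
to guard the list, which is the pattern described above.

```c
#include <assert.h>

/* Hypothetical model of the two session locks; not kernel code. */
enum lock_id { NO_LOCK, FRWD_LOCK, BACK_LOCK };

struct node {
	struct node *prev, *next;
};

struct guarded_list {
	struct node head;
	enum lock_id guard;      /* the one lock meant to cover every list op */
	unsigned int violations; /* ops performed while holding another lock */
};

static void glist_init(struct guarded_list *l, enum lock_id guard)
{
	l->head.prev = l->head.next = &l->head;
	l->guard = guard;
	l->violations = 0;
}

/* Append n to the list; 'held' is the lock the caller claims to hold. */
static void glist_add_tail(struct guarded_list *l, struct node *n,
			   enum lock_id held)
{
	if (held != l->guard)
		l->violations++;
	n->prev = l->head.prev;
	n->next = &l->head;
	l->head.prev->next = n;
	l->head.prev = n;
}

/* Unlink n; 'held' is the lock the caller claims to hold. */
static void glist_del(struct guarded_list *l, struct node *n,
		      enum lock_id held)
{
	if (held != l->guard)
		l->violations++;
	/* Same kind of neighbour consistency check that list debugging does. */
	assert(n->prev->next == n && n->next->prev == n);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Add under one lock, delete under the other: the suspected pattern. */
static unsigned int demo_split_locking(void)
{
	struct guarded_list cmdqueue;
	struct node task;

	glist_init(&cmdqueue, FRWD_LOCK);
	glist_add_tail(&cmdqueue, &task, FRWD_LOCK); /* queueing path */
	glist_del(&cmdqueue, &task, BACK_LOCK);      /* completion path */
	return cmdqueue.violations;
}

/* Add and delete under the same lock: no unguarded list access. */
static unsigned int demo_single_lock(void)
{
	struct guarded_list cmdqueue;
	struct node task;

	glist_init(&cmdqueue, FRWD_LOCK);
	glist_add_tail(&cmdqueue, &task, FRWD_LOCK);
	glist_del(&cmdqueue, &task, FRWD_LOCK);
	return cmdqueue.violations;
}
```

In this single-threaded model the pointers stay consistent either way; the
point is only that demo_split_locking() performs a delete the guarding lock
never covered. With real concurrent CPUs, an add under one lock and a
delete under the other can interleave and leave the prev/next pointers
inconsistent, which is what the list debugging code would report.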
- Chris

> With some debugging, we found the exact point of the crash, caused by a
> null-pointer read: sc == NULL on sc->device->lun at libiscsi.c:369. But as
> you can see in the error messages, some list issue seems to be leading
> to this null-pointer situation.
>
> After reverting the aforementioned commit, the issue is gone and we can run
> the benchmark many times without a single failure. The issue is hard to
> reproduce; we were only able to reproduce it in a high-bandwidth environment
> (10Gb network) with our hardware target (IBM FlashSystem 840). Note
> that on the initiator side we're using software iSCSI
> (iscsi_tcp/libiscsi_tcp).
>
>
> We'd really appreciate it if you could give us some direction to help us
> figure out what's going on: what path might have been taken leading to that
> null-pointer read? It's hard to debug since I'm no expert in iSCSI, so any
> clues or suggestions you can provide would be really appreciated and
> helpful.
>
> If you need any additional information, please let me know and I'd be glad
> to provide it. Again, sorry to bother you.
>
> Thanks in advance,
>
>
>
> Guilherme