Hello Shlomo and Or,

I'm Guilherme Piccoli from LTC/IBM. First of all, sorry to bother you.


We are running some tests with iSCSI and we found an issue possibly caused by commit 659743b02c41 ("libiscsi: Reduce locking contention in fast path").

After some time (roughly 1 hour) of testing against a hardware target with the fio benchmark tool, we got a kernel oops. The following link is a pastebin of the error message (we got lots of these messages, since our system has multiple cores): http://codepad.org/KS2C9Jjt

With some debugging, we found the exact point of the crash: a NULL pointer read, with sc == NULL when evaluating sc->device->lun at libiscsi.c:369. However, as you can see in the error messages, some list issue seems to be what leads to this NULL pointer situation.
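
To make the failure mode concrete, here is a minimal userspace sketch of the dereference pattern. The struct and function names (fake_device, fake_cmnd, fake_task, fake_task_lun) are hypothetical, simplified stand-ins and not the actual libiscsi code. It only illustrates why a task whose sc pointer is NULL faults on sc->device->lun.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures.
 * These are NOT the real struct scsi_device / scsi_cmnd / iscsi_task. */
struct fake_device { unsigned int lun; };
struct fake_cmnd   { struct fake_device *device; };
struct fake_task   { struct fake_cmnd *sc; };

/* Returns 0 and stores the LUN on success, or -1 if the task has no
 * command (or device) attached, which is the sc == NULL case we hit. */
static int fake_task_lun(const struct fake_task *task, unsigned int *lun)
{
    const struct fake_cmnd *sc = task ? task->sc : NULL;

    /* Without this check, sc->device->lun reads through a NULL pointer. */
    if (!sc || !sc->device)
        return -1;

    *lun = sc->device->lun;
    return 0;
}
```

A check like this would at best hide the symptom. What we are really trying to understand is how a task can reach that code with sc already cleared.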

After reverting the aforementioned commit, the issue is gone and we can run the benchmark many times without a single failure. The issue is hard to reproduce; we were only able to reproduce it in a high-bandwidth environment (10Gb network) with our hardware target (IBM FlashSystem 840). Note that on the initiator side we're using software iSCSI (iscsi_tcp/libiscsi_tcp).


We'd really appreciate it if you could give us some directions to help us figure out what's going on: what path might have been taken that leads to this NULL pointer read? It's hard for me to debug since I'm no expert in iSCSI, so any clues or suggestions you can provide would be very helpful.

If you need any additional information, please let me know and I'll be glad to provide it. Again, sorry to bother you.

Thanks in advance,



Guilherme

--
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.
