On Tuesday, November 25, 2014 9:42:59 AM UTC-8, Mike Christie wrote:
> On 11/24/2014 11:04 AM, The Lee-Man wrote:
> > Okay, I spent most of the day yesterday playing with the
> > code in question, i.e. the open-iscsi code that rescans
> > the session list looking for the current session.
> >
> > In particular, I was looking at update_sessions().
> >
> > One thing I noticed is that this code only gets executed
> > if discovery.sendtargets.use_discoveryd is set to Yes for
> > a particular target, by the way.
>
> So how does your interconnect test come into play for this issue? It
> seems like you should be hitting this issue all the time even when the
> connection is ok, because that code polls every N seconds.
When the connections are being dropped and then reconnected, the sessions (created by the kernel) are coming and going. And each time a connection goes away and comes back, it gets a new session ID and a new symlink in /sys/class/iscsi_session, which of course is not cached. (This is my theory, since I don't have a configuration that can test this at the moment.) This is backed up by the fact that I see lots of messages like:

Oct 13 13:00:09 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:09 bumble1 iscsid: could not find session info for session11
Oct 13 13:00:09 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:09 bumble1 iscsid: could not find session info for session11
Oct 13 13:00:10 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:10 bumble1 iscsid: could not find session info for session5
Oct 13 13:00:10 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:10 bumble1 iscsid: could not find session info for session5
Oct 13 13:00:10 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:10 bumble1 iscsid: could not find session info for session10
Oct 13 13:00:11 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:11 bumble1 iscsid: could not find session info for session11
Oct 13 13:00:12 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:12 bumble1 iscsid: could not find session info for session11
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session13
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session17
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session14
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session18
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session15
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session16
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session19
Oct 13 13:00:14 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:14 bumble1 iscsid: could not find session info for session20

So not only are we doing I/O to discover info about the new sessions, but existing sessions are also going away, causing lots of error output, again swamping the time taken to compute anything.

> > Bottom line: I did not find any way to significantly speed
> > up the search other than the "cache the last session"
> > patch I already posted.
>
> Can you explain the problem again? I thought originally you were
> thinking it was due to the sysfs operations taking too long and then
> compounded by them being repeated. However, I thought for the discoveryd
> daemon process, sysfs.c is caching that info, so we are not actually
> reading from sysfs every time.
>
> Is the issue just a normal old bad-search CPU-time type of issue and not
> really sysfs read/scan operations taking a long time? If so, then the
> patch makes sense.

I know sysfs attributes are cached here, after spending a day playing with the code. And, as I said, I'm guessing about the "sysfs read delays" part, since I can't recreate the problem. But I'm sure it is not the sort causing the delay: the supplied patch fixed the problem, and the sort is still present. Think about it: sorting one or two hundred entries is not going to take very long compared to reading a dozen attributes for each new session since the last discoveryd scan.
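To put a rough number on that claim, here is an illustrative sketch (Python, not the actual open-iscsi C; the session IDs are made up) that counts the comparisons an in-memory sort of ~200 entries actually performs:

```python
import random

class CountingKey:
    """Sort key that counts how many comparisons the sort makes."""
    count = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        CountingKey.count += 1
        return self.value < other.value

# Stand-in for a couple hundred sessions with arbitrary IDs.
random.seed(42)
session_ids = [random.randrange(10_000) for _ in range(200)]

ordered = sorted(session_ids, key=CountingKey)

# n*log2(n) for n=200 is roughly 1500 comparisons -- microseconds of
# in-memory work, versus a dozen sysfs open/read/close round trips
# per new session on the other side of the comparison.
print("comparisons for 200 entries:", CountingKey.count)
```

So even a worst-case sort of the whole session list is cheap next to a single batch of attribute reads.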
It seems to me that the problem should be related to the discoveryd poll time: if the poll time is less than the time it takes to rescan the sessions, then things will back up. So even if we don't have a problem with the default rescan time of 30 seconds, for example, what happens if we set the poll time to 10 or 5 seconds?

In simple terms, this patch just caches the last session and sorts it to the front of the list. If a CPU-bound issue remained after the patch, we could check the cached session first, before even building and sorting the session list. I will leave that for a future update, if and when it's needed.
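The patch itself sorts the cached session to the front of the list; the further refinement mentioned above (checking the cached session before building the list at all) could be sketched like this (illustrative Python, not the actual open-iscsi C; `SessionCache` and `lookup_fn` are names I made up for the sketch):

```python
class SessionCache:
    """Remember the session matched on the previous poll and try it first.

    `lookup_fn` stands in for the expensive part of the real code: a walk
    over the sysfs session entries, reading a dozen attributes apiece.
    """

    def __init__(self, lookup_fn):
        self.lookup_fn = lookup_fn
        self.last = None  # (session_id, info) from the previous hit

    def find(self, session_id):
        # Fast path: same session as last time, no rescan needed.
        if self.last is not None and self.last[0] == session_id:
            return self.last[1]
        info = self.lookup_fn(session_id)  # slow path: full scan
        if info is not None:
            self.last = (session_id, info)
        return info


# Demonstration: count how often the slow path is taken.
scans = []

def slow_lookup(sid):
    scans.append(sid)  # record each full scan
    return {"session5": "iqn.2014-11.example:tgt1"}.get(sid)

cache = SessionCache(slow_lookup)
cache.find("session5")
cache.find("session5")  # second call is served from the cache
print("slow-path scans:", len(scans))  # only one
```

In the steady state, where the same session matches poll after poll, this turns the per-poll cost into a single comparison; the full scan only happens when connections actually come and go.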
