On Tuesday, November 25, 2014 9:42:59 AM UTC-8, Mike Christie wrote:
>
> On 11/24/2014 11:04 AM, The Lee-Man wrote: 
> > Okay, I spent most the day yesterday playing with the 
> > code in question, i.e. the open-iscsi code that rescans 
> > the session list looking for the current session. 
> > 
> > In particular, I was looking at update_sessions(). 
> > 
> > One thing I noticed is that this code only gets executed 
> > if discovery.sendtargets.use_discoveryd is set to Yes for 
> > a particular target, by the way. 
>
> So how does your interconnect test come into play for this issue? It 
> seems like you should be hitting this issue all the time even when the 
> connection is ok, because that code polls every N seconds.


When the connections are being dropped and then reconnected, the
sessions (created by the kernel) are coming and going. And each time
a connection goes away and comes back, it gets a new session ID,
and a new symlink in /sys/class/iscsi_sessions, which of course
is not cached. (This is my theory, since I don't have a configuration
which can test this ATM.)
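That failure mode can be sketched in a few lines (purely illustrative; iscsid itself is C, and the names and cache shape here are made up, not the real sysfs.c code): a cache keyed by sysfs session name never helps a flapping connection, because every reconnect shows up under a new name.

```python
# Hypothetical sketch of the theory above: a cache keyed by session
# name. A stable session is read once; a session that drops and
# reconnects gets a fresh ID each time, so every poll is a cache miss
# and pays the full "read attributes from sysfs" cost again.

sysfs_reads = 0  # counts simulated expensive sysfs attribute reads


def read_session_info(cache, session_name):
    """Return cached info for a session, or simulate reading it from sysfs."""
    global sysfs_reads
    if session_name in cache:
        return cache[session_name]
    sysfs_reads += 1  # cache miss: would read targetname, address, etc.
    info = {"name": session_name}
    cache[session_name] = info
    return info


cache = {}
# A stable session: read from sysfs once, served from the cache after that.
for _ in range(5):
    read_session_info(cache, "session1")
# A flapping connection: a new session ID on every reconnect, so every
# lookup misses the cache.
for i in range(2, 7):
    read_session_info(cache, "session%d" % i)

print(sysfs_reads)  # 1 read for session1 plus 5 for the reborn sessions
```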

This is backed up by the fact that I see a lot of messages like:

Oct 13 13:00:09 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:09 bumble1 iscsid: could not find session info for session11
Oct 13 13:00:09 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:09 bumble1 iscsid: could not find session info for session11
Oct 13 13:00:10 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:10 bumble1 iscsid: could not find session info for session5
Oct 13 13:00:10 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:10 bumble1 iscsid: could not find session info for session5
Oct 13 13:00:10 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:10 bumble1 iscsid: could not find session info for session10
Oct 13 13:00:11 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:11 bumble1 iscsid: could not find session info for session11
Oct 13 13:00:12 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:12 bumble1 iscsid: could not find session info for session11
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session13
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session17
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session14
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session18
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session15
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session16
Oct 13 13:00:13 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:13 bumble1 iscsid: could not find session info for session19
Oct 13 13:00:14 bumble1 iscsid: could not read session targetname: 5
Oct 13 13:00:14 bumble1 iscsid: could not find session info for session20

So not only are we doing sysfs I/O to discover info about new sessions,
but existing sessions are going away, generating lots of error output.
That I/O again swamps the time spent actually computing anything.


> 
> > Bottom line: I did not find any way to significantly speed 
> > up the search other than the "cache the last session" 
> > patch I already posted. 
>
> Can you explain the problem again? I thought originally you were 
> thinking it was due to the sysfs operations taking too long and then 
> compounded by them being repeated. However, I thought for the discoveryd 
> daemon process, sysfs.c is caching that info, so we are not actually 
> reading from sysfs every time. 
>
> Is the issue just a normal old bad search cpu time type of issue and not 
> really sysfs read/scan operations taking a long time? If so, then the 
> patch makes sense. 
>

After spending a day playing with the code, I know sysfs attributes are
cached here. And, as I said, I'm guessing about the "sysfs read delays"
part, since I can't recreate the problem. But I'm sure the sort is not
what is causing the delay: the supplied patch fixed the problem, and
the sort is still present.

Think about it: sorting one or two hundred entries is not going to take
very long compared to reading a dozen sysfs attributes for each new
session created since the last discoveryd scan.

It seems to me like the problem should be related to the discoveryd
poll time: if that poll time is less than the time it takes to rescan
sessions, then things will back up. So even if we didn't have
a problem, for example, with the default rescan time of 30 seconds,
what happens if we set the poll time to 10 or 5 seconds?
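A back-of-the-envelope sketch of that backlog argument (the numbers are made up for illustration, not measurements): if one rescan takes longer than the poll interval, the daemon falls further behind on every poll.

```python
# Toy model of the poll-time argument: each poll adds (scan_time -
# poll_interval) seconds of unfinished work if the scan is slower than
# the interval, and nothing if the scan keeps up.

def backlog_after(polls, poll_interval_s, scan_time_s):
    """Seconds of unfinished scan work queued up after `polls` polls."""
    deficit_per_poll = max(0.0, scan_time_s - poll_interval_s)
    return deficit_per_poll * polls


# With a 30-second poll and a (hypothetical) 12-second scan, we keep up:
print(backlog_after(polls=6, poll_interval_s=30, scan_time_s=12))  # 0.0
# Drop the poll time to 5 seconds and the same scan falls behind:
print(backlog_after(polls=6, poll_interval_s=5, scan_time_s=12))   # 42.0
```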

In simple terms, this patch just caches the last session, and
sorts it to the front of the list.
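The idea behind the patch can be sketched like this (names are illustrative, not the real open-iscsi identifiers): remember the session that matched last time and sort it to the front of the candidate list, so the common case of the same session reconnecting repeatedly is found without walking, and re-reading attributes for, the whole list.

```python
# Sketch of "cache the last session and sort it to the front". The
# cached entry sorts first because False < True in the sort key; the
# sort is stable, so every other session keeps its original order.

last_matched = None


def order_sessions(sessions):
    """Return sessions with the cached last match sorted to the front."""
    return sorted(sessions, key=lambda s: s != last_matched)


def find_session(sessions, wanted):
    """Linear search, checking the cached last match first."""
    global last_matched
    for s in order_sessions(sessions):
        if s == wanted:  # the real code compares target name, address, port
            last_matched = s
            return s
    return None


sessions = ["session%d" % i for i in range(1, 8)]
find_session(sessions, "session6")  # first lookup walks the list
print(order_sessions(sessions)[0])  # "session6" now sorts to the front
```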

After the patch, if there were still a CPU-bound issue, we could check
the cached session first, before even building and sorting the session
list. I will leave that for the next update, if and when it's needed.
