On Fri, 2 Oct 2015, Felipe Balbi wrote:
> Hi Alan,
>
> here's a question which I hope you can help me understand :-)
>
> Why do we have that kthread for the mass storage gadgets ? I noticed a very
> interesting behavior.
Basically, the mass-storage gadget uses a kthread in order to access
the backing file.
Obviously a kernel driver doesn't _need_ to use a separate thread to
read or write a file (think of /dev/loop, for example). But doing it
that way is very easy because the driver merely has to imitate read(2)
and write(2) system calls, which have a simple and well defined
interface. To do anything more direct would mean getting deeply
involved in the block and filesystem layers, and that wasn't my goal --
I wanted to write a USB gadget driver, not re-implement the loop
driver.
Also, remember that at the time the MS Gadget was written, speed over
USB wasn't a pressing issue. You couldn't transfer more than about 40
MB/s anyway, so efficiency in the gadget wasn't a terribly high
priority.
> Basically, the MSC class works in a loop like so:
>
> CBW
> Data Transfer
> CSW
>
> In our implemention, what we do is:
>
> CBW
> wake_up_process()
> Data Transfer
> wake_up_process()
> CSW
> wake_up_process()
The wake_up_process() call is the notification from the completion
routine telling the kthread that a request has completed. The kthread
doesn't necessarily stop and wait for these notifications; it can
continue processing in parallel.
> Now here's the interesting bit. Every time we wake_up_process(), we basically
> don't do anything until MSC's kthread gets finally scheduled and has a chance
> of
> doing its job.
That's not entirely true; the kthread may already be running. For
example, the default MS gadget uses two I/O buffers (there's a Kconfig
option to use more if you want). When carrying out a READ, the kthread
fills up the first buffer and submits it to the UDC. But it doesn't
stop and wait for wake_up_process(); instead it goes on to fill up the
second buffer while the first is being sent back to the host. The
wake_up_process() may very well occur while the second buffer is being
filled, in which case it won't do anything (since the kthread isn't
asleep).
> This means that the host keeps sending us tokens but the UDC
> doesn't have any request queued to start a transfer. This happens specially
> with
> IN endpoints, not so much on OUT directions. See figure one [1] we can see
> that
> host issues over 7 POLLs before UDC has finally started a usb_request,
> sometimes
> this goes for even longer (see image [3]).
Figure [1] is misleading. The 7 POLLs you see are at the very start of
a READ. It's not surprising that the host can poll 7 times before the
gadget manages to read the first block of data from the backing file.
This isn't a case where the kthread could have been doing something
useful instead of waiting around to hear from the host.
Figure [3] is difficult to interpret because it doesn't include the
transfer lengths. I can't tell what was going on during the 37 + 50
POLLs before the first OUT transfer. If that was a CDB starting a new
command, I don't see why it would take that long to schedule the
kthread unless the CPU was busy with other tasks.
> On figure two we can see that on this particular session, I had as much as 15%
> of the bandwidth wasted on POLLs. With this current setup I'm 34MB/sec and
> with
> the added 15% that would get really close to 40MB/sec.
So high speed, right? Are the numbers in the figure handshake _counts_
or handshake _times_? A simple NAK doesn't use much bandwidth. Even
if 15% of the handshakes are NAKs, it doesn't mean you're wasting 15%
of the bandwidth.
> So the question is, why do we have to wait for that kthread to get scheduled ?
> Why couldn't we skip it completely ? Is there really anything left in there
> that
> couldn't be done from within usb_request->complete() itself ?
The real answer is the calls to vfs_read() and vfs_write() -- those
have to occur in process context.
In theory, the CBW packet doesn't have to be processed by the kthread.
But processing it in the completion routine wouldn't help, because the
kthread would still have to be scheduled in order to carry out the READ
or WRITE command. In other words, changing:
Completion handler Kthread
------------------ -------
Receive CBW
Wake up kthread
Process CBW
Start READ or WRITE
into:
Completion handler Kthread
------------------ -------
Receive CBW
Process CBW
Wake up kthread
Start READ or WRITE
doesn't really give any significant advantage.
> I'll spend some time on that today and really dig that thing up, but if you
> know
> the answer off the top of your head, I'd be happy to hear.
I hope this explains the issues clearly enough.
Alan Stern
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html