On 03/29/2010 03:54 PM, Avi Kivity wrote:
On 03/29/2010 11:42 PM, Anthony Liguori wrote:
For individual device models or host services, I think (3) is
probably the worst model overall. I personally think that (1) is
better in the long run but ultimately would need an existence proof
to compare against (2). (2) looks appealing until you actually try
to have the device handle multiple requests at a time.
Sooner or later nature and the ever more complicated code will force
us towards (3). As an example, we've observed live migration to
throttle vcpus when sending a large guest's zeroed memory over; the
bandwidth control doesn't kick in since zero pages are compressed,
so the iothread spends large amounts of time reading memory.
Making things re-entrant is different from (3) in my mind.
There's no reason that VCPU threads should run in lock-step with live
migration during the live phase. Making device models re-entrant and
making live migration not depend on the big global lock is a
good thing to do.
It's not sufficient. If you have a single thread that runs both live
migrations and timers, then timers will be backlogged behind live
migration, or you'll have to yield often. This is regardless of the
locking model (and of course having threads without fixing the locking
is insufficient as well, live migration accesses guest memory so it
needs the big qemu lock).
But what's the solution? Running every timer in a separate thread?
We'll hit the same problem if we impose an arbitrary limit on the
number of threads.
What I'm skeptical of, is whether converting virtio-9p or qcow2 to
handle each request in a separate thread is really going to improve
things.
Currently qcow2 isn't even fully asynchronous, so it can't fail to
improve things.
Unless it introduces more data corruption, which is my concern with any
significant change to qcow2.
The VNC server is another area that I think multithreading would be a
bad idea.
If the vnc server is stuffing a few megabytes of screen into a socket,
then timers will be delayed behind it, unless you litter the code with
calls to bottom halves. Even worse if it does complicated compression
and encryption.
Sticking the VNC server in its own thread would be fine. Trying to
make the VNC server multithreaded though would be problematic.
Basically, sticking isolated components in a single thread should be
pretty reasonable.
But if those system calls are blocking, you need a thread?
You can dispatch just the system call to a thread pool. The
advantage of doing that is that you don't need to worry about locking
since the system calls are not (usually) handling shared state.
There is always implied shared state. If you're doing direct guest
memory access, you need to lock memory against hotunplug, or the
syscall will end up writing into freed memory. If the device can be
hotunplugged, you need to make sure all threads have returned before
unplugging it.
There are other ways to handle hot unplug (like reference counting) that
avoid this problem.
Ultimately, this comes down to a question of lock granularity and thread
granularity. I don't think it's a good idea to start with the
assumption that we want extremely fine granularity. There's certainly
very low hanging fruit with respect to threading.
On a philosophical note, threads may make it easier to model complex
hardware that includes a processor, for example our scsi card (and
how about using tcg as a jit to boost it :)
Yeah, it's hard to argue that script evaluation shouldn't be done in
a thread. But that doesn't prevent me from being very cautious about
how and where we use threading :-)
Caution where threads are involved is a good thing. They are
inevitable however, IMO.
We already are using threads so they aren't just inevitable, they're
reality. I still don't think using threads would significantly simplify
virtio-9p.
Regards,
Anthony Liguori