On 03/29/2010 11:42 PM, Anthony Liguori wrote:
For individual device models or host services, I think (3) is
probably the worst model overall. I personally think that (1) is
better in the long run but ultimately would need an existence proof
to compare against (2). (2) looks appealing until you actually try
to have the device handle multiple requests at a time.
Sooner or later nature and the ever more complicated code will force
us towards (3). As an example, we've observed live migration throttling
vcpus when sending a large guest's zeroed memory over; the
bandwidth control doesn't kick in since zero pages are compressed, so
the iothread spends large amounts of time reading memory.
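Here is a minimal C sketch of the zero-page effect. It assumes a hypothetical byte-budget throttle: the names migrate_slice, GUEST_PAGE_SIZE, ZERO_PAGE_COST and FULL_PAGE_COST, and the cost values, are made up for illustration and are not taken from QEMU. The budget counts only bytes written to the wire. Each zero page still has to be read and scanned in full, but it barely consumes the budget, so a run of zero pages keeps the loop going:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page size and wire costs, for illustration only. */
#define GUEST_PAGE_SIZE 4096
#define ZERO_PAGE_COST 9                      /* small marker sent for an all-zero page */
#define FULL_PAGE_COST (GUEST_PAGE_SIZE + 8)  /* header plus full page contents */

/* Reading every byte is the expensive part, even when the page is all zeros. */
static int page_is_zero(const unsigned char *page)
{
    for (size_t i = 0; i < GUEST_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/*
 * Scan pages from *cursor until this slice's byte budget is spent.
 * Returns the number of pages scanned. The budget counts only bytes sent,
 * so zero pages, cheap on the wire but still read in full, barely use it.
 */
static size_t migrate_slice(const unsigned char *ram, size_t npages,
                            size_t *cursor, size_t byte_budget)
{
    size_t sent = 0, scanned = 0;

    while (*cursor < npages && sent < byte_budget) {
        const unsigned char *page = ram + *cursor * GUEST_PAGE_SIZE;
        sent += page_is_zero(page) ? ZERO_PAGE_COST : FULL_PAGE_COST;
        (*cursor)++;
        scanned++;
    }
    /* Either the budget ran out or we ran out of memory to scan. */
    assert(sent >= byte_budget || *cursor == npages);
    return scanned;
}
```

With 64 all-zero pages and a 4096-byte budget, one slice scans all 64 pages. With non-zero pages the same budget stops the slice after the first page.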
Making things re-entrant is different than (3) in my mind.
There's no reason that VCPU threads should run in lock-step with live
migration during the live phase. Making device models re-entrant and
making live migration not depend on the big global lock is a
good thing to do.
It's not sufficient. If you have a single thread that runs both live
migrations and timers, then timers will be backlogged behind live
migration, or you'll have to yield often. This is regardless of the
locking model (and of course having threads without fixing the locking
is insufficient as well: live migration accesses guest memory, so it
needs the big qemu lock).
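The backlog argument can be modeled with a simulated clock. This is a hypothetical sketch: struct sim, run_job and the tick-based costs are illustrative assumptions, not QEMU's main loop. One long job shares a single thread with a periodic timer. Timers are only checked between chunks of job work, so the worst-case timer lateness grows with the chunk size, that is, with how rarely the job yields:

```c
#include <assert.h>

/*
 * Simulated single-threaded loop: one long job (e.g. migration) and a
 * periodic timer share the thread. Each unit of job work costs one tick.
 */
struct sim {
    unsigned long now;          /* current simulated time */
    unsigned long next_timer;   /* when the timer is next due */
    unsigned long period;       /* timer period */
    unsigned long max_lateness; /* worst observed delay past a deadline */
};

/* Fire every timer expiry that has passed, recording how late it was. */
static void run_due_timers(struct sim *s)
{
    while (s->now >= s->next_timer) {
        unsigned long late = s->now - s->next_timer;
        if (late > s->max_lateness) {
            s->max_lateness = late;
        }
        s->next_timer += s->period;
    }
}

/* Do `total` units of job work, returning to the loop every `chunk` units. */
static unsigned long run_job(unsigned long total, unsigned long chunk,
                             unsigned long period)
{
    struct sim s = { 0, period, period, 0 };

    assert(chunk > 0 && period > 0);
    while (total > 0) {
        unsigned long n = total < chunk ? total : chunk;
        s.now += n;           /* the job runs uninterrupted for n ticks */
        total -= n;
        run_due_timers(&s);   /* timers only get a look-in between chunks */
    }
    return s.max_lateness;
}
```

run_job(1000, 1000, 10) reports a timer expiry that fired 990 ticks late. Yielding every 10 ticks keeps every expiry on time, which is the "yield often" cost Avi describes.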
What I'm skeptical of is whether converting virtio-9p or qcow2 to
handle each request in a separate thread is really going to improve
things.
Currently qcow2 isn't even fully asynchronous, so it can't fail to
improve things.
The VNC server is another area that I think multithreading would be a
bad idea.
If the vnc server is stuffing a few megabytes of screen into a socket,
then timers will be delayed behind it, unless you litter the code with
calls to bottom halves. Even worse if it does complicated compression
and encryption.
But if those system calls are blocking, you need a thread?
You can dispatch just the system call to a thread pool. The advantage
of doing that is that you don't need to worry about locking since the
system calls are not (usually) handling shared state.
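Here is a minimal sketch of dispatching only the blocking call. It assumes POSIX threads and uses a pipe as the completion notification. struct sys_req and its helpers are hypothetical names. The worker touches nothing but the request it was handed. For brevity the sketch spawns one thread per request and uses read(2). A real pool would hand requests to long-lived workers, and disk I/O would more likely use pread(2):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * A request owns everything the worker touches: fd, buffer and result.
 * The worker never looks at device or emulator state.
 */
struct sys_req {
    int fd;
    void *buf;
    size_t len;
    ssize_t ret;        /* filled in by the worker */
    int notify_fd;      /* write end of a pipe the main loop polls */
    pthread_t thread;
};

static void *sys_req_worker(void *opaque)
{
    struct sys_req *req = opaque;

    req->ret = read(req->fd, req->buf, req->len);   /* may block */

    /* Wake the main loop; the byte's value is irrelevant. */
    char c = 0;
    ssize_t n = write(req->notify_fd, &c, 1);
    (void)n;
    return NULL;
}

/* Hand the call to a helper thread; returns 0 on success. */
static int sys_req_submit(struct sys_req *req)
{
    return pthread_create(&req->thread, NULL, sys_req_worker, req);
}

/* Main loop side: call once the notify pipe is readable (e.g. after poll()). */
static ssize_t sys_req_complete(struct sys_req *req, int notify_read_fd)
{
    char c;
    ssize_t n = read(notify_read_fd, &c, 1);
    assert(n == 1);
    (void)n;
    pthread_join(req->thread, NULL);   /* also makes req->ret visible */
    return req->ret;
}
```

The main loop keeps the notify pipe's read end in its poll set. Only the completion handler, which runs back on the main thread, touches shared device state.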
There is always implied shared state. If you're doing direct guest
memory access, you need to lock memory against hotunplug, or the syscall
will end up writing into freed memory. If the device can be
hotunplugged, you need to make sure all threads have returned before
unplugging it.
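The hot-unplug point amounts to reference-counting in-flight requests. Here is a minimal sketch, assuming pthreads and a hypothetical struct dev. New requests are refused once unplug has started. Unplug then blocks until the last outstanding call has returned, and only after that can the device and its memory be freed:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state: requests still being executed by threads. */
struct dev {
    pthread_mutex_t lock;
    pthread_cond_t drained;
    unsigned inflight;
    bool unplugging;
};

static void dev_init(struct dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->drained, NULL);
    d->inflight = 0;
    d->unplugging = false;
}

/* Take a reference before handing work to a thread; fails once unplug began. */
static bool dev_request_begin(struct dev *d)
{
    bool ok;

    pthread_mutex_lock(&d->lock);
    ok = !d->unplugging;
    if (ok) {
        d->inflight++;
    }
    pthread_mutex_unlock(&d->lock);
    return ok;
}

/* Called by the worker once its system call has returned. */
static void dev_request_end(struct dev *d)
{
    pthread_mutex_lock(&d->lock);
    assert(d->inflight > 0);
    if (--d->inflight == 0) {
        pthread_cond_broadcast(&d->drained);
    }
    pthread_mutex_unlock(&d->lock);
}

/* Block new requests, then wait for every in-flight one to finish. */
static void dev_unplug(struct dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->unplugging = true;
    while (d->inflight > 0) {
        pthread_cond_wait(&d->drained, &d->lock);
    }
    pthread_mutex_unlock(&d->lock);
    /* Only now is it safe to free the device and unmap its memory. */
}
```

A worker calls dev_request_begin() before issuing its system call and dev_request_end() after the call returns. dev_unplug() runs on whichever thread performs the unplug.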
On a philosophical note, threads may make it easier to model complex
hardware that includes a processor, for example our scsi card (and
how about using tcg as a jit to boost it :)
Yeah, it's hard to argue that script evaluation shouldn't be done in a
thread. But that doesn't prevent me from being very cautious about
how and where we use threading :-)
Caution where threads are involved is a good thing. They are, however,
inevitable IMO.