On 03/29/2010 03:31 PM, Avi Kivity wrote:
On 03/29/2010 06:00 PM, Anthony Liguori wrote:
In qemu, we tend to prefer state-machine based code.
There are a few different models that we have discussed at different
points:
1) state machines with a thread pool to make blocking functions
asynchronous (what we have today)
2) co-operative threading
3) pre-emptive threading
All three models are independent of making device models re-entrant.
In order to allow VCPU threads to run in qemu simultaneously, we need
each device to carry a lock, and that lock must be acquired on entry to
any of the public functions of the device model.
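
As a rough illustration of the per-device locking described above, here
is a minimal sketch in C using pthreads. DemoDevice and the
demo_device_* functions are hypothetical names made up for this example,
not qemu APIs; the only point is that every public entry point takes the
device's own lock before touching device state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical device model; not a real qemu structure. */
typedef struct DemoDevice {
    pthread_mutex_t lock;   /* protects every field below */
    uint32_t reg;           /* a single guest-visible register */
    uint64_t writes;        /* number of register writes seen */
} DemoDevice;

void demo_device_init(DemoDevice *d)
{
    int rc = pthread_mutex_init(&d->lock, NULL);
    assert(rc == 0);
    (void)rc;
    d->reg = 0;
    d->writes = 0;
}

/* Public entry point: take the device lock before touching state. */
void demo_device_write(DemoDevice *d, uint32_t val)
{
    pthread_mutex_lock(&d->lock);
    d->reg = val;
    d->writes++;
    pthread_mutex_unlock(&d->lock);
}

/* Public entry point: reads go through the same lock. */
uint32_t demo_device_read(DemoDevice *d)
{
    pthread_mutex_lock(&d->lock);
    uint32_t val = d->reg;
    pthread_mutex_unlock(&d->lock);
    return val;
}

uint64_t demo_device_write_count(DemoDevice *d)
{
    pthread_mutex_lock(&d->lock);
    uint64_t n = d->writes;
    pthread_mutex_unlock(&d->lock);
    return n;
}
```

With this shape, two VCPU threads can call into different devices
without contending on one global lock, and calls into the same device
are serialized by that device's lock.
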
For individual device models or host services, I think (3) is
probably the worst model overall. I personally think that (1) is
better in the long run but ultimately would need an existence proof
to compare against (2). (2) looks appealing until you actually try
to have the device handle multiple requests at a time.
Sooner or later nature and the ever more complicated code will force
us towards (3). As an example, we've observed live migration
throttling vcpus when sending a large guest's zeroed memory over; the
bandwidth control doesn't kick in since zero pages are compressed, so
the iothread spends large amounts of time reading memory.
Making things re-entrant is different than (3) in my mind.
There's no reason that VCPU threads should run in lock-step with live
migration during the live phase. Making device models re-entrant and
making live migration not depend on the big global lock is a good
thing to do.
What I'm skeptical of is whether converting virtio-9p or qcow2 to
handle each request in a separate thread is really going to improve
things. The VNC server is another area where I think multithreading
would be a bad idea.
We could fix this by yielding every so often (and a patch will be
posted soon), but it remains an issue. We have too much work in the
I/O thread and that defers I/O completion and timers.
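
To make "yielding every so often" concrete, here is a small sketch in C
of a scan loop that does a bounded amount of work per call and then
returns, so the caller's main loop can service timers and I/O
completions in between. ScanState, scan_step and DEMO_PAGE_SIZE are
hypothetical names for illustration only; this is not the patch
mentioned above and not qemu's migration code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096

/* Hypothetical scan state; not a real qemu structure. */
typedef struct ScanState {
    const uint8_t *ram;   /* start of the memory being scanned */
    size_t npages;        /* total number of pages */
    size_t next;          /* index of the next page to scan */
    size_t zero_pages;    /* zero pages found so far */
} ScanState;

static bool page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Scan at most 'budget' pages, then return to the caller.
 * Returns true if pages remain, so the caller can reschedule the scan.
 */
bool scan_step(ScanState *s, size_t budget)
{
    assert(s->next <= s->npages);
    for (size_t done = 0; done < budget && s->next < s->npages; done++) {
        if (page_is_zero(s->ram + s->next * DEMO_PAGE_SIZE)) {
            s->zero_pages++;
        }
        s->next++;
    }
    return s->next < s->npages;
}
```

The budget bounds how long one call can hold up the thread regardless of
how well the pages compress, which a bandwidth limit alone does not do.
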
For virtio-9p, I don't think (1) is much of a burden to be honest.
In particular, when the 9p2000.L dialect is used, there should be a
1-1 correlation between protocol operations and system calls which
means that for the most part, there's really no advantage to
threading from a complexity point of view.
But if those system calls are blocking, you need a thread?
You can dispatch just the system call to a thread pool. The advantage
of doing that is that you don't need to worry about locking since the
system calls are not (usually) handling shared state.
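
A minimal sketch of that idea in C with pthreads follows. All names
(SyscallReq, req_submit, req_complete_one, req_store_result) are
hypothetical and not qemu APIs. To stay short it starts one detached
thread per request rather than keeping a real pool, and it uses read()
as the stand-in blocking call. The worker runs only the system call; the
completion callback runs in the main loop's thread, so the state machine
itself needs no locking.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct SyscallReq SyscallReq;
typedef void ReqDoneFn(SyscallReq *req);

/* Hypothetical request: one blocking read() plus a completion hook. */
struct SyscallReq {
    int fd;            /* file descriptor to read from */
    void *buf;         /* destination buffer */
    size_t len;        /* maximum bytes to read */
    ssize_t ret;       /* bytes read, or -errno on failure */
    int notify_fd;     /* write end of the completion pipe */
    ReqDoneFn *done;   /* runs in the main loop thread */
    void *opaque;      /* caller data for 'done' */
};

/* Worker: runs only the blocking call, touches no shared state. */
static void *req_worker(void *arg)
{
    SyscallReq *req = arg;

    req->ret = read(req->fd, req->buf, req->len);
    if (req->ret < 0) {
        req->ret = -errno;
    }
    /* Hand the request back; a pointer-sized pipe write is atomic. */
    ssize_t n = write(req->notify_fd, &req, sizeof(req));
    assert(n == (ssize_t)sizeof(req));
    (void)n;
    return NULL;
}

/* Called from the main loop: push the system call to a thread. */
int req_submit(SyscallReq *req)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, req_worker, req);

    if (rc != 0) {
        return -rc;
    }
    pthread_detach(tid);
    return 0;
}

/* Called from the main loop when the completion pipe is readable. */
int req_complete_one(int completion_fd)
{
    SyscallReq *req;
    ssize_t n = read(completion_fd, &req, sizeof(req));

    if (n != (ssize_t)sizeof(req)) {
        return -1;
    }
    req->done(req);   /* state machine continues here, single-threaded */
    return 0;
}

/* Example completion hook: store the result where the caller asked. */
void req_store_result(SyscallReq *req)
{
    *(ssize_t *)req->opaque = req->ret;
}
```

In a real main loop the read end of the completion pipe would sit in the
select/poll set, and req_complete_one would be its read handler.
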
On a philosophical note, threads may make it easier to model complex
hardware that includes a processor, for example our scsi card (and how
about using tcg as a jit to boost it :)
Yeah, it's hard to argue that script evaluation shouldn't be done in a
thread. But that doesn't prevent me from being very cautious about how
and where we use threading :-)
Regards,
Anthony Liguori