On 03/29/2010 03:54 PM, Avi Kivity wrote:
On 03/29/2010 11:42 PM, Anthony Liguori wrote:
For individual device models or host services, I think (3) is
probably the worst model overall. I personally think that (1) is
better in the long run but ultimately would need an existence proof
to compare against (2). (2) looks appealing until you actually try
to have the device handle multiple requests at a time.
Sooner or later nature and the ever more complicated code will force
us towards (3). As an example, we've observed live migration to
throttle vcpus when sending a large guest's zeroed memory over; the
bandwidth control doesn't kick in since zero pages are compressed,
so the iothread spends large amounts of time reading memory.
Making things re-entrant is different from (3) in my mind.
There's no reason that VCPU threads should run in lock-step with live
migration during the live phase. Making device models re-entrant and
making live migration not depend on the big global lock is a
good thing to do.
It's not sufficient. If you have a single thread that runs both live
migrations and timers, then timers will be backlogged behind live
migration, or you'll have to yield often. This is regardless of the
locking model (and of course having threads without fixing the locking
is insufficient as well, live migration accesses guest memory so it
needs the big qemu lock).
But what's the solution? Running every timer in a separate thread?
We'll hit the same problem if we impose an arbitrary limit on the
number of threads.
What I'm skeptical of, is whether converting virtio-9p or qcow2 to
handle each request in a separate thread is really going to improve
things.
Currently qcow2 isn't even fully asynchronous, so it can't fail to
improve things.
Unless it introduces more data corruption, which is my concern with any
significant change to qcow2.
The VNC server is another area that I think multithreading would be a
bad idea.
If the vnc server is stuffing a few megabytes of screen into a socket,
then timers will be delayed behind it, unless you litter the code with
calls to bottom halves. Even worse if it does complicated compression
and encryption.
Sticking the VNC server in its own thread would be fine. Trying to
make the VNC server multithreaded though would be problematic.
Basically, sticking isolated components in a single thread should be
pretty reasonable.
But if those system calls are blocking, you need a thread?
You can dispatch just the system call to a thread pool. The
advantage of doing that is that you don't need to worry about locking
since the system calls are not (usually) handling shared state.
There is always implied shared state. If you're doing direct guest
memory access, you need to lock memory against hotunplug, or the
syscall will end up writing into freed memory. If the device can be
hotunplugged, you need to make sure all threads have returned before
unplugging it.
There are other ways to handle hot unplug (like reference counting) that
avoid this problem.
Ultimately, this comes down to a question of lock granularity and thread
granularity. I don't think it's a good idea to start with the
assumption that we want extremely fine granularity. There's certainly
very low hanging fruit with respect to threading.
On a philosophical note, threads may make it easier to model complex
hardware that includes a processor, for example our scsi card (and
how about using tcg as a jit to boost it :)
Yeah, it's hard to argue that script evaluation shouldn't be done in
a thread. But that doesn't prevent me from being very cautious about
how and where we use threading :-)
Caution where threads are involved is a good thing. They are
inevitable however, IMO.
We already are using threads so they aren't just inevitable, they're
reality. I still don't think using threads would significantly simplify
virtio-9p.
Regards,
Anthony Liguori