On Saturday 12 December 2009, Dave Airlie wrote:
> So I've been musing on the addition of some sort of 3D passthrough for
> qemu (as I'm sure have lots of ppl)
IIUC a typical graphics system consists of several operations:

1) Allocate space for data objects[2] on the server[1].
2) Upload data from the client to the server.
3) Issue data processing commands that manipulate (combine/modify) data objects. For example, a 3D rasterizer takes an image and sets of coordinates, and writes pixels to an image.
4) Display a data object to the user.
5) Read data back to the client. In modern systems this should almost never happen.

I'd expect this to be the same for both 2D and 3D subsystems. The only real wart is that some 2D systems do not provide sufficient offload, and some processing is still done by the guest CPU. This means (5) is common, and you're effectively limited to a local implementation.

With remote rendering the main difference is that you have a relatively high latency connection between client and server. If you have more than a few round trips per frame you probably aren't going to get acceptable performance. IIUC this is why remote X connections perform so poorly: the protocol is effectively synchronous, so the client must wait for a response from the server before sending the next command.

In practical terms this means that the state of the graphics pipeline should not be guest visible. Considering the above pipeline, the only place where guest state is visible is (5), and I'd expect that this almost never happens in normal circumstances. The fact that SLI/Crossfire setups can operate in AFR (alternate frame rendering) mode supports this theory.

One prerequisite for isolating the graphics pipeline is that commands may not fail. I guess this may require that step (1) be a synchronous operation. However, steps (2), (3) and (4) should be fire-and-forget operations. If step (2) is defined as completing any time between the issue of the upload and the actual use, then this allows both local zero-copy and remote explicit-upload implementations. (I'll sketch what such an interface might look like below.)

A protocol that meets these requirements should be largely transport agnostic. While a full paravirtual interface may be desirable to squeeze the last bits of performance out, it should be possible to get acceptable performance over e.g. TCP, in the same way that the main benefit of virtio block/net drivers is simplicity and consistency rather than actual performance[3].

My understanding is that Chromium effectively implements the system described above, and I guess the VirtualBox implementation is just a custom transport backend and some modesetting tweaks. I have no specific knowledge of the VMware implementation.

Once you have remote rendering the next problem is hotplug. IMO transparently migrating state is not a realistic option: it would effectively require mirroring all of the server data on the guest. For source data (e.g. textures) this is fairly trivial. However, for intermediate images (think redirected rendering of a 3D application window in a composited environment) this is not feasible. You could try to record the commands used to generate all intermediate data, but this also becomes infeasible: command sequences may be large, and the original source data may no longer be available.

Instead I suggest adding some sort of "damage" notification whereby the server can inform the client that data objects have been lost. When hotplug (switching to a different terminal) occurs, we immediately complete all pending commands and report that all objects have been lost. The guest should then re-upload and regenerate as necessary and proceed to render the next frame.
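Going back to the fail-semantics question above, here's a very rough sketch of what such a guest-visible interface might look like. All the names here are made up for illustration; the point is only that allocation is the single call needing a round trip:

    #include <stdint.h>
    #include <stddef.h>

    typedef uint32_t vgfx_handle_t;

    enum vgfx_cmd_type {
        VGFX_CMD_UPLOAD,    /* (2): copy client data into a server object */
        VGFX_CMD_PROCESS,   /* (3): draw/blit/etc. between objects        */
        VGFX_CMD_PRESENT,   /* (4): display an object to the user         */
    };

    struct vgfx_cmd {
        enum vgfx_cmd_type type;
        vgfx_handle_t dst;      /* object being written                   */
        vgfx_handle_t src;      /* source object, if any                  */
        uint32_t payload_len;   /* inline data length for UPLOAD          */
        /* payload bytes follow the header on the wire                    */
    };

    /* (1) is the only synchronous call: it may fail, so the client must
     * wait for the reply before using the handle. */
    int vgfx_alloc(size_t size, vgfx_handle_t *out);

    /* (2)-(4) are fire-and-forget: queued in order, no reply, no round
     * trip.  An UPLOAD may complete any time between submission and the
     * first command that reads from dst, which permits both a local
     * zero-copy backend and a remote explicit-copy one. */
    void vgfx_submit(const struct vgfx_cmd *cmd, const void *payload);

Whether the command queue sits on top of a paravirtual ring or a TCP socket is then purely a backend decision.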
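The guest-side handling of the lost-objects notification could then be something like the following (again purely illustrative; the re-upload/regenerate split mirrors the texture vs. intermediate-image distinction above):

    #include <stdbool.h>
    #include <stdint.h>

    #define VGFX_INVALID_HANDLE 0

    struct guest_object {
        uint32_t server_handle;
        void *backing_data;     /* guest copy; NULL for render targets    */
        bool needs_upload;
        bool needs_render;
    };

    /* Called when the server reports that all objects have been lost,
     * e.g. after a terminal switch. */
    void vgfx_all_objects_lost(struct guest_object *objs, int nr)
    {
        for (int i = 0; i < nr; i++) {
            objs[i].server_handle = VGFX_INVALID_HANDLE;
            if (objs[i].backing_data)
                objs[i].needs_upload = true;   /* source data: re-upload  */
            else
                objs[i].needs_render = true;   /* intermediate: redraw    */
        }
        /* The next frame's command stream re-allocates and re-uploads as
         * it goes; nothing blocks on the notification itself. */
    }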
I'd expect that clients already have logic much like this as part of the VRAM handling for local video cards. As long as the guest never tries to read back the image, a null implementation is also trivial.

Obviously all this is predicated on having a virtual display driver in the guest. For simple framebuffer devices, and actual VGA hardware, our initial premise that GPU state is not guest visible fails. In practice this means that there's little scope for doing remote server-side acceleration, and you're reduced to implementing everything in the client and trying to optimize (2).

Paul

[1] I'm using X client/server terminology: the client is the guest OS and the server is the user's terminal.
[2] Data objects include textures/bitmaps, vertex buffers, fragment programs, and probably command buffers.
[3] Obviously if you emulate lame hardware like ne2k or IDE then performance will suck. However, emulation of a high-end NIC or SCSI HBA should get within spitting distance of virtio.