On Saturday 12 December 2009, Dave Airlie wrote:
> So I've been musing on the addition of some sort of 3D passthrough for
> qemu (as I'm sure have lots of ppl)
IIUC a typical graphics system consists of several operations:

1) Allocate space for data objects[2] on the server[1].
2) Upload data from the client to the server.
3) Issue data processing commands that manipulate (combine/modify) data objects. For example, a 3D rasterizer takes an image and sets of coordinates, and writes pixels to an image.
4) Display a data object to the user.
5) Read data back to the client. In modern systems this should almost never happen.

I'd expect this to be the same for both 2D and 3D subsystems. The only real wart is that some 2D systems do not provide sufficient offload, and some processing is still done by the guest CPU. This means (5) is common, and you're effectively limited to a local implementation.

With remote rendering the main difference is that you have a relatively high latency connection between client and server. If you have more than a few round trips per frame you probably aren't going to get acceptable performance. IIUC this is why remote X connections perform so poorly: the protocol is effectively synchronous, so the client must wait for a response from the server before sending the next command.

In practical terms this means that the state of the graphics pipeline should not be guest visible. Considering the above pipeline, the only place where guest state is visible is (5), and I'd expect that this almost never happens in normal circumstances. The fact that SLI/Crossfire setups can operate in AFR (alternate frame rendering) mode supports this theory.

One prerequisite for isolating the graphics pipeline is that commands may not fail. I guess this may require that step (1) be a synchronous operation. However, steps (2), (3) and (4) should be fire-and-forget operations. If step (2) is defined as completing any time between the issue of the upload and the actual use, then this allows both local zero-copy and remote explicit-upload implementations. (I'll sketch what such an interface might look like below.)

A protocol that meets these requirements should be largely transport agnostic. While a full paravirtual interface may be desirable to squeeze the last bits of performance out, it should be possible to get acceptable performance over e.g. TCP, in the same way that the main benefit of virtio block/net drivers is simplicity and consistency rather than actual performance[3].

My understanding is that Chromium effectively implements the system described above, and I guess the VirtualBox implementation is just a custom transport backend and some modesetting tweaks. I have no specific knowledge of the VMware implementation.

Once you have remote rendering the next problem is hotplug. IMO transparently migrating state is not a realistic option: it would effectively require mirroring all of the server data on the guest. For source data (e.g. textures) this is fairly trivial. However, for intermediate images (think redirected rendering of a 3D application window in a composited environment) this is not feasible. You could try to record the commands used to generate all intermediate data, but this also becomes infeasible: command sequences may be large, and the original source data may no longer be available.

Instead I suggest adding some sort of "damage" notification whereby the server can inform the client that data objects have been lost. When hotplug (switching to a different terminal) occurs, we immediately complete all pending commands and report that all objects have been lost. The guest should then re-upload and regenerate as necessary and proceed to render the next frame.
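Going back to the fail-semantics question above, here's a very rough sketch of what such a guest-visible interface might look like. All the names here are made up for illustration; the point is only that allocation is the single call needing a round trip:

    #include <stdint.h>
    #include <stddef.h>

    typedef uint32_t vgfx_handle_t;

    enum vgfx_cmd_type {
        VGFX_CMD_UPLOAD,    /* (2): copy client data into a server object */
        VGFX_CMD_PROCESS,   /* (3): draw/blit/etc. between objects        */
        VGFX_CMD_PRESENT,   /* (4): display an object to the user         */
    };

    struct vgfx_cmd {
        enum vgfx_cmd_type type;
        vgfx_handle_t dst;      /* object being written                   */
        vgfx_handle_t src;      /* source object, if any                  */
        uint32_t payload_len;   /* inline data length for UPLOAD          */
        /* payload bytes follow the header on the wire                    */
    };

    /* (1) is the only synchronous call: it may fail, so the client must
     * wait for the reply before using the handle. */
    int vgfx_alloc(size_t size, vgfx_handle_t *out);

    /* (2)-(4) are fire-and-forget: queued in order, no reply, no round
     * trip.  An UPLOAD may complete any time between submission and the
     * first command that reads from dst, which permits both a local
     * zero-copy backend and a remote explicit-copy one. */
    void vgfx_submit(const struct vgfx_cmd *cmd, const void *payload);

Whether the command queue sits on top of a paravirtual ring or a TCP socket is then purely a backend decision.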
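The guest-side handling of the lost-objects notification could then be something like the following (again purely illustrative; the re-upload/regenerate split mirrors the texture vs. intermediate-image distinction above):

    #include <stdbool.h>
    #include <stdint.h>

    #define VGFX_INVALID_HANDLE 0

    struct guest_object {
        uint32_t server_handle;
        void *backing_data;     /* guest copy; NULL for render targets    */
        bool needs_upload;
        bool needs_render;
    };

    /* Called when the server reports that all objects have been lost,
     * e.g. after a terminal switch. */
    void vgfx_all_objects_lost(struct guest_object *objs, int nr)
    {
        for (int i = 0; i < nr; i++) {
            objs[i].server_handle = VGFX_INVALID_HANDLE;
            if (objs[i].backing_data)
                objs[i].needs_upload = true;   /* source data: re-upload  */
            else
                objs[i].needs_render = true;   /* intermediate: redraw    */
        }
        /* The next frame's command stream re-allocates and re-uploads as
         * it goes; nothing blocks on the notification itself. */
    }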
I'd expect that clients already have logic much like this as part of the VRAM handling for local video cards. As long as the guest never tries to read back the image, a null implementation is also trivial.

Obviously all this is predicated on having a virtual display driver in the guest. For simple framebuffer devices, and actual VGA hardware, our initial premise that GPU state is not guest visible fails. In practice this means that there's little scope for doing remote server-side acceleration, and you're reduced to implementing everything in the client and trying to optimize (2).

Paul

[1] I'm using X client/server terminology: the client is the guest OS and the server is the user's terminal.
[2] Data objects include textures/bitmaps, vertex buffers, fragment programs, and probably command buffers.
[3] Obviously if you emulate lame hardware like ne2k or IDE then performance will suck. However, emulation of a high-end NIC or SCSI HBA should get within spitting distance of virtio.