Hi, this is a partial version of a "qdev for programmers" document I've been working on. Comments are welcome.
Paolo

--------------------------------- 8< ---------------------------------

== qdev overview and concepts ==

qdev is the factory interface that QEMU uses to create guest devices
and connect them to each other. It also provides a uniform way to
expose host devices (character, network and block) to the guest. In
the remainder, unless specified explicitly, "device" will refer to
_guest_ devices.

qdev exposes a device tree that alternates buses (qbuses) and devices
(qdevs). The root of the tree is the system bus SysBus. Devices can be
leaves, or they can expose buses and talk to the devices on those
buses. This relationship does not cover host counterparts of the
devices, which are not part of the device tree.

Devices interact by invoking services specific to the kind of bus. In
general, if a device or bus X wants to request something from Y, X
needs to know the type of Y, and of course needs to have a pointer to
it.

In a properly "qdevified" board, these assumptions hold:

- qdev enforces what bus a device is placed on;

- buses are not user-visible;

- initialization of buses is driven exclusively by the parent device,
  and initialization of devices is driven by the parent bus and a
  well-defined set of properties (defined per-device);

- buses do not know what device exposes them;

- devices do not know what device exposes their bus.

With these assumptions in place, leaf devices are the simplest to
understand. They only make requests to the bus and/or to the
character/block/network subsystems; and possibly, they provide
services (routines) used by the bus and the grandparent device.

Intermediate devices also have to provide glue between their parent
bus and their child bus(es), and buses likewise glue two devices.

Depending on the kind of bus and the relationship of a device with the
bus (parent or child), different sets of services may be defined.
For example, a SCSI bus mediates many kinds of interaction:

- from a SCSI controller to a SCSI device (e.g., start processing this
  command);

- from a SCSI bus to a child device (e.g., cancel this command due to
  a bus reset);

- from a SCSI bus to its parent controller (e.g., this piece of data
  was sent to you by a SCSI device);

- from a SCSI controller to its child bus (e.g., I dealt with this
  data, please transfer more);

- from a SCSI device to its parent bus (e.g., please pass this data on
  to the controller).

In general, the following rules and best practices are common:

- devices interact with their parent bus, and vice versa;

- buses interact with their parent devices, and vice versa;

- occasionally, devices may interact directly with their grandchildren
  devices, but _not_ vice versa; interaction with the grandparent
  device is mediated by the parent bus;

- in addition, devices interact freely with their host counterparts
  (that is, character/block/network devices).

qdev defines a set of data structures, and devices use them to expose
metainformation to the rest of QEMU and to the user. The qdev system
is object-oriented; qdev data structures can be subclassed and used to
store additional information, including function pointers for
bus-specific services.

The remainder of this document explains how to define and use these
data structures.

== Implementation techniques ==

qdev exposes an object-oriented mechanism in C through containment (C
replacement for inheritance, so to speak) and tries to make this as
type-safe as possible by leveraging the DO_UPCAST macro.

Sample structure definitions for a superclass and subclass are as
follows:

    typedef struct Superclass {
        int field1;
        int field2;
        struct Superclass *field3;
    } Superclass;

    typedef struct Subclass {
        struct Superclass sup;
        int subfield1;
        int subfield2;
    } Subclass;

In many cases, C programmers pass such objects using an opaque pointer
(void *).
These are then cast to the appropriate subtype, like this:

    void func(void *opaque)
    {
        Subclass *s = (Subclass *) opaque;
        ...
    }

QEMU prefers to always use a more type-safe approach that passes the
pointer to the superclass. The cast is then done using the
aforementioned macro:

    void func(Superclass *state)
    {
        /* More type-safe version of (Subclass *) state, which also
           verifies that &sub->sup == state.
           - First argument: subclass type.
           - Second argument: field being accessed.
           - Third argument: variable being cast.  */
        Subclass *sub = DO_UPCAST(Subclass, sup, state);
    }

Casts to a superclass are done by taking the address of the embedded
field, e.g. &sub->sup.

This scheme is quite handy to use, even though it may look a bit
strange at first.

== qdev data structures ==

This part of the document explains the data structures used by qdev.
These include the class hierarchies for buses and devices, together
with the corresponding metaclass hierarchies, and a registry of
devices and corresponding metainformation.

=== Bus and device hierarchies ===

Buses and devices reside on two parallel hierarchies, BusState and
DeviceState. Devices that work on the same bus usually share a
superclass. Hence, each bus defines a subclass of BusState and an
abstract subclass of DeviceState. Each device then adds its concrete
subclass in the DeviceState hierarchy. For example:

    BusState
        PCIBus
        ISABus
        i2c_bus

    DeviceState
        PCIState              /* bus common superclass */
            LSIState          /* device-specific class */
            ...
        ISADevice
            IB700State
            ISASerialState
            ...
        i2c_slave
            WM8750State
            ...

Here is how tasks are separated between these classes:

1) bus classes (e.g. i2c_bus) are usually the least interesting of
all. Their fields are mostly private and used at device creation
time. For example, you could place here the highest IRQ allocated to
devices on the bus. In some cases the bus class is even absent; for
example, SysBus reuses BusState.

2) bus superclasses (e.g. i2c_slave) typically include the address of
the device and the interrupt lines that it is connected to.
3) device subclasses contain device-specific configuration information
(e.g. the character or block devices to connect to) and registers.

=== Describing qdev data structures ===

In addition to defining the structs, each bus and device should
describe them as "properties". Since the description that a device
exposes is shared between the bus superclasses and the device
subclasses, a device is described completely by the union of "bus
properties" (representing fields of the abstract per-bus superclass)
and "device properties" (representing fields of the device subclass).

Example:

    /* This is a bus superclass */
    struct i2c_slave {
        DeviceState qdev;
        I2CSlaveInfo *info;        /* explained later */
        uint8_t address;
    };

    /* This is how we explain it to QEMU */
    static struct BusInfo i2c_bus_info = {
        .name = "I2C",
        .size = sizeof(i2c_bus),
        .props = (Property[]) {
            /* This means: "address" is a uint8_t property with a
               default value of 0. Store it in field address of
               struct i2c_slave.  */
            DEFINE_PROP_UINT8("address", struct i2c_slave, address, 0),
            DEFINE_PROP_END_OF_LIST(),
        }
    };

    /* This is a device that exposes no properties. */
    static I2CSlaveInfo wm8750_info = {
        .qdev.name = "wm8750",
        .qdev.size = sizeof(WM8750State),

        /* For migration and save/restore; do not care yet. */
        .qdev.vmsd = &vmstate_wm8750,

        /* These functions are exposed to the bus and possibly to the
           grandparent device.  */
        .init = wm8750_init,
        .event = wm8750_event,
        .recv = wm8750_rx,
        .send = wm8750_tx
    };

Another example:

    /* ISA defines no bus properties */
    static struct BusInfo isa_bus_info = {
        .name = "ISA",
        .size = sizeof(ISABus),

        /* ISA defines a couple of bus-specific callbacks. */
        .print_dev = isabus_dev_print,
        .get_fw_dev_path = isabus_get_fw_dev_path,
    };

    /* However, a parallel port does define device properties: */
    static ISADeviceInfo parallel_isa_info = {
        .qdev.name = "isa-parallel",
        .qdev.size = sizeof(ISAParallelState),
        .init = parallel_isa_initfn,
        .qdev.props = (Property[]) {
            DEFINE_PROP_UINT32("index", ISAParallelState, index, -1),
            DEFINE_PROP_HEX32("iobase", ISAParallelState, iobase, -1),
            DEFINE_PROP_UINT32("irq", ISAParallelState, isairq, 7),
            DEFINE_PROP_CHR("chardev", ISAParallelState, state.chr),
            DEFINE_PROP_END_OF_LIST(),
        },
    };

In general, a device may have both bus properties and device
properties. Simple examples appropriate for documentation
unfortunately don't. :)

=== Metainformation hierarchy ===

Above you may have noticed some new type names: BusInfo, I2CSlaveInfo,
DeviceInfo. These are the types used to store information on the
class: properties of course, and also virtual functions. In some sense
these *are* metaclass objects. Their hierarchy mimics the BusState and
DeviceState ones.

The BusInfo/DeviceInfo hierarchy includes a struct for each abstract
class in the BusState/DeviceState hierarchy, and an instance for each
concrete class:

    BusState <=> BusInfo
        PCIBus  -> struct BusInfo pci_bus_info = ...
        ISABus  -> struct BusInfo isa_bus_info = ...
        i2c_bus -> struct BusInfo i2c_bus_info = ...

    DeviceState <=> DeviceInfo
        PCIState <=> PCIDeviceInfo
            LSIState -> static PCIDeviceInfo lsi_info = ...
            ...
        ISADevice <=> ISADeviceInfo
            IB700State     -> static ISADeviceInfo wdt_ib700_info = ...
            ISASerialState -> static ISADeviceInfo serial_isa_info = ...
            ...
        i2c_slave <=> I2CSlaveInfo
            WM8750State -> static I2CSlaveInfo wm8750_info = ...
            ...

Structs such as I2CSlaveInfo are the place where devices declare
virtual functions requested by the bus, in addition to those already
in DeviceInfo. In many cases, these functions correspond to additional
"services" that only make sense for that bus (example:
event/recv/send in the i2c bus).
Sometimes, instead, they replace the ones in the superclass because
the bus needs to pass extra information. The init function is always
overridden in this way; there is an internal init member in
DeviceInfo:

    typedef int (*qdev_initfn)(DeviceState *dev, DeviceInfo *info);

and one per bus, for example:

    typedef int (*i2c_slave_initfn)(i2c_slave *dev);
    typedef int (*isa_qdev_initfn)(ISADevice *dev);
    typedef int (*pci_qdev_initfn)(PCIDevice *dev);

Here is the way the I2C bus defines its qdev_initfn in terms of
i2c_slave_initfn:

    static int i2c_slave_qdev_init(DeviceState *dev, DeviceInfo *base)
    {
        I2CSlaveInfo *info = DO_UPCAST(I2CSlaveInfo, qdev, base);
        i2c_slave *s = DO_UPCAST(i2c_slave, qdev, dev);

        /* Store virtual function table for later use. */
        s->info = info;

        return info->init(s);
    }

=== Registering devices and making them public ===

The last part of qdev is the registry of all devices defined by the
target system. This is a fundamental piece of metainformation, because
it allows the "-device" option to work, at least for devices that do
not rely on DEFINE_PROP_PTR or sysbus_create_varargs (those devices
can only be instantiated from QEMU's machine initialization code).

Registering a device's name is done with the qdev_register function.
This function however is used only internally. The actual function to
be used varies per-bus, so that the bus can first perform some checks
and do some initialization that is common to all DeviceInfo objects
for that bus.
To this end, each bus defines a wrapper function that initializes the
common part of the struct DeviceInfo, and passes it to qdev_register:

    void i2c_register_slave(I2CSlaveInfo *info)
    {
        assert(info->qdev.size >= sizeof(i2c_slave));
        info->qdev.init = i2c_slave_qdev_init;
        info->qdev.bus_info = &i2c_bus_info;
        qdev_register(&info->qdev);
    }

Each device then calls this function:

    static void wm8750_register_devices(void)
    {
        i2c_register_slave(&wm8750_info);
    }

In turn, wm8750_register_devices is called at startup (as if it were a
C++ global constructor; a gcc extension makes this possible in C):

    device_init(wm8750_register_devices)

== Letting buses and devices "talk" ==

In this part of the document, we will examine the mechanisms by which
buses and devices are connected. The first section will explain how
buses convert human-readable properties into pointers to internal data
structures. The second section will explain how devices take care of
creating buses. Finally, we will describe SysBus, which is the root of
the qdev system and connects qdev with the rest of the QEMU device
model.

=== Using buses to connect device layers ===

As mentioned above, buses sit in a unique location, as they have
access to services from both the parent device and the child device.
As such, they provide the "glue" between two layers of devices.

As part of this, they may simply expose some of the services of the
parent devices to the children. For example, a USB host controller
interface exposes a bus with one or more "ports", and defines a set of
functions to operate on ports. USB devices do not operate directly on
these functions; they always go through helpers such as this one:

    void usb_wakeup(USBDevice *dev)
    {
        if (dev->remote_wakeup && dev->port && dev->port->ops->wakeup) {
            dev->port->ops->wakeup(dev);
        }
    }

Helpers like this make changes easier, for example if a function used
to be mandatory and you want to make it optional.

Another very important piece of glue is initialization.
When the bus's init function is called, properties have been set
already and the parent bus is known too. Hence, the bus has the
opportunity to take the values of the properties, and convert them to
pointers for internal data structures (or for example qemu_irqs).

Here is an example:

1) the bus defines a property (irq, the IRQ number):

    static struct BusInfo spapr_vio_bus_info = {
        .name = "spapr-vio",
        .size = sizeof(VIOsPAPRBus),
        .props = (Property[]) {
            DEFINE_PROP_UINT32("irq", VIOsPAPRDevice, vio_irq_num, 0),
            DEFINE_PROP_END_OF_LIST(),
        },
    };

2) the bus init function talks to the parent device (spapr) in order
to get a default value and especially a qemu_irq:

    if (!dev->vio_irq_num) {
        dev->vio_irq_num = spapr_allocate_irq(spapr);
    }
    dev->qirq = xics_find_qirq(spapr->icp, dev->vio_irq_num);

So this is how qdev manages to convert human-readable configuration
into pointers. Since you cannot go "turtles all the way down", there
are two fallback mechanisms to pass pointers directly to devices:

1) one is DEFINE_PROP_PTR, which you probably shouldn't use;

2) one is specific to qemu_irq and devices from sysbus; see
sysbus_create_varargs.

=== Defining a child bus ===

[...]

=== SysBus: the root ===

[...]

== A quick guide to qdev conversion ==

Converting devices to qdev is a three-step process:

1) ensuring that an appropriate bus type is defined to which the
device can be attached;

2) defining a device's properties (the "schema" exposed by the
device);

3) converting board initialization functions to use qdev services.

The first step is very important to achieve a "quality" conversion to
qdev. QEMU includes partial conversions to qdev that have a large
number of SysBus devices, or devices that use DEFINE_PROP_PTR. In many
cases, this is because the authors did not introduce a board-specific
bus type to mediate access to the board resources.

Together with such a bus type there should be a single root
board-specific device that is attached to SysBus.
An interrupt controller is usually a good candidate for this because
it takes qemu_irqs from the outside, and can make good use of the
specific features of SysBus.

A good design will make the conversion simpler (this is important,
because it is usually hard to convert only a small part of the
devices), and the second step in particular might be mostly trivial.

The third step is also very important. If the conversion was done
well, a lot of board-specific initialization code may be removed and
replaced by command-line options. This will also give the user the
flexibility of working with "dumbed down" versions of the board, with
some devices removed. If necessary, standard versions of the board may
be described with configuration files.

Old code not yet converted to qdev uses a specific function for each
device type:

    goldfish_timer_and_rtc_init(0xff003000, 3);
    ...

    static struct goldfish_timer_state timer_state;

    void goldfish_timer_and_rtc_init(uint32_t timerbase, int timerirq)
    {
        timer_state.dev.base = timerbase;
        timer_state.dev.irq = timerirq;
        timer_state.timer = qemu_new_timer_ns(vm_clock,
                                              goldfish_timer_tick,
                                              &timer_state);
        goldfish_device_add(&timer_state.dev,
                            goldfish_timer_readfn,
                            goldfish_timer_writefn,
                            &timer_state);
    }

Here, the "timer_state.dev" field is a sub-structure that is common to
all devices in the board. This is an embryonic separation between
bus-specific and device-specific data that can be exploited when
converting to qdev.
However, there are substantial differences between this code and what
will be required after qdev conversion:

- the timerbase and timerirq are set via properties before the qdev is
  actually initialized; qdev takes care of initializing the
  structure's fields;

- creation of the timer is moved into the init virtual function for
  the device;

- of all the arguments to goldfish_device_add, only "&timer_state"
  matters, because the goldfish_timer_readfn and
  goldfish_timer_writefn arguments will be stored in the
  GoldfishDeviceInfo;

- last but not least, everything will be allocated dynamically, so
  static device objects such as "timer_state" will have to go.

qdev's metainformation structures BusInfo and DeviceInfo provide a
place for all this information, including even initializers for the
static "timer_state" object. These, for example, can become bus
property defaults, or can be moved to the DeviceInfo subclass.

So, the call to goldfish_timer_and_rtc_init can be described entirely
in terms of qdev properties. This can in turn be expressed in
different ways:

1) command-line

    -device goldfish_timer,base=0xff003000,irq=3

2) configuration files (for -readconfig):

    [device "goldfish_timer"]
        base = 0xff003000
        irq = 3

3) C code:

    /* The first argument is the bus. See below for how to create a
       bus-specific wrapper to qdev_create.  */
    dev = qdev_create(&goldfish_bus->qbus, "goldfish_timer");
    qdev_prop_set_uint32(dev, "base", 0xff003000);
    qdev_prop_set_uint32(dev, "irq", 3);
    qdev_init_nofail(dev);

The last form will appear in the machine initialization function in
several situations: devices using DEFINE_PROP_PTR; devices that are
present in the board by default (though in the long term we would like
to move those to configuration files); code that creates devices based
on legacy command-line interfaces.
It will often be hidden behind a helper function not unlike
goldfish_timer_and_rtc_init; for example (slightly edited from the
actual QEMU code):

    static ISABus *isabus;

    ISADevice *isa_create(const char *name)
    {
        DeviceState *dev;

        dev = qdev_create(&isabus->qbus, name);
        return DO_UPCAST(ISADevice, qdev, dev);
    }

    static inline void serial_isa_init(int index, CharDriverState *chr)
    {
        ISADevice *dev;

        dev = isa_create("isa-serial");
        qdev_prop_set_uint32(&dev->qdev, "index", index);
        qdev_prop_set_chr(&dev->qdev, "chardev", chr);
        qdev_init_nofail(&dev->qdev);
    }

    ...

    /* Here we create ISA serial ports for each -serial option on the
       command line.  */
    for (i = 0; i < MAX_SERIAL_PORTS; i++) {
        if (serial_hds[i]) {
            serial_isa_init(i, serial_hds[i]);
        }
    }
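
Since almost every snippet in this document leans on the containment
trick from "Implementation techniques", here is a small standalone
sketch of it that compiles outside QEMU. MY_UPCAST, get_subfield1 and
roundtrip_ok are hypothetical names made up for illustration; the
macro is only an approximation of what DO_UPCAST does and is not
QEMU's actual definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DO_UPCAST (an assumption, not QEMU code):
 * recover the enclosing struct from a pointer to its embedded field. */
#define MY_UPCAST(type, field, ptr) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Superclass {
    int field1;
} Superclass;

typedef struct Subclass {
    Superclass sup;     /* embedded first, so &sub->sup == (void *)sub */
    int subfield1;
} Subclass;

/* Compile-time check that the superclass sits at offset zero:
 * the array size becomes negative (an error) otherwise. */
typedef char sup_must_be_first[offsetof(Subclass, sup) == 0 ? 1 : -1];

/* Receives the superclass pointer, as a bus-level callback would,
 * and reaches the subclass fields through the upcast. */
static int get_subfield1(Superclass *state)
{
    assert(state != NULL);
    Subclass *sub = MY_UPCAST(Subclass, sup, state);
    return sub->subfield1;
}

/* Returns 1 if going to the superclass and back yields the same object. */
static int roundtrip_ok(Subclass *s)
{
    return MY_UPCAST(Subclass, sup, &s->sup) == s;
}
```

A caller would pass &obj.sup wherever a Superclass pointer is
expected, exactly as the qdev snippets above pass &dev->qdev. The
typedef with a possibly negative array size is one way to turn the
"field must be first" requirement into a compile-time error.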