Re: [Qemu-devel] Running KVM guest on X86
On Tue, Aug 7, 2012 at 1:30 AM, Bhushan Bharat-R65777 wrote:
>
>> -----Original Message-----
>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>> Sent: Monday, August 06, 2012 9:27 PM
>> To: Bhushan Bharat-R65777
>> Cc: qemu-devel@nongnu.org; Avi Kivity
>> Subject: Re: Running KVM guest on X86
>>
>> On Mon, 2012-08-06 at 15:40 +0000, Bhushan Bharat-R65777 wrote:
>> > Hi Avi/All,
>> >
>> > I am facing an issue booting a KVM guest on x86 (I used to work on the
>> > PowerPC platform and do not have much knowledge of x86). I am working on
>> > getting VFIO working on PowerPC BookE, so I have cloned Alex Williamson's
>> > git repository and compiled the kernel for x86 on Fedora with the
>> > virtualization configuration (selected all related kernel config
>> > options). I run the command below to boot the guest (I have not provided
>> > a vfio device yet):
>> >
>> > "qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
>> > arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial
>> > tcp::,server,telnet"
>> >
>> > After that I can see the qemu command line (and am able to run various
>> > commands like "info registers"), but the guest does not boot (not even
>> > the first print appears).
>> >
>> > Can anyone help with what I am missing or doing wrong?
>>
>> x86 doesn't use the serial port for console by default, so you're making
>> things quite a bit more difficult that way. Typically you'll want to
>> provide a disk image (the -hda option is the easiest way to do this), a
>> display (-vga std -vnc :0 is again easiest), and probably something to
>> install from (-cdrom ). You can also add a -boot d to get it to choose
>> the cdrom the first time for install. Thanks,
>
> Thanks Avi and Alex, I can see the KVM guest boot prints by adding
> -append "console=ttyS0"

Note, once you get to user space you will need a getty specified in inittab
in order to get a login on your serial port. Something like:

   T0:23:respawn:/sbin/getty -L ttyS0

Stuart
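Putting the pieces of this thread together, a minimal serial-console invocation looks something like the sketch below. The kernel and initrd paths are examples (not the exact ones from this thread), and the getty speed/terminal values are typical placeholders to adapt:

```shell
# Direct the guest console at the emulated serial port so boot
# messages appear on stdio/telnet (paths are illustrative):
qemu-system-x86_64 -enable-kvm -m 1024 -nographic \
    -kernel arch/x86_64/boot/bzImage \
    -initrd /boot/initramfs.img \
    -append "console=ttyS0"

# Inside the guest, /etc/inittab needs a getty on ttyS0 to get a
# login prompt (speed/terminal below are common values, adjust):
#   T0:23:respawn:/sbin/getty -L ttyS0 115200 vt100
```

Without the `-append "console=ttyS0"` the kernel logs only go to the (absent) VGA console, which is exactly the symptom reported above.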
[Qemu-devel] [PATCH] PPC: e500: set has-idle in guest device tree
From: Stuart Yoder

If the host kernel supports the idle hcall, then advertise that to the
guest kernel via the device tree.

Signed-off-by: Stuart Yoder
---
 hw/ppce500_mpc8544ds.c |    5 +
 target-ppc/kvm.c       |   26 +++---
 target-ppc/kvm_ppc.h   |    1 +
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index b1a0b8c..7d111bb 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -124,6 +124,11 @@ static int mpc8544_load_device_tree(CPUPPCState *env,
         kvmppc_get_hypercall(env, hypercall, sizeof(hypercall));
         qemu_devtree_setprop(fdt, "/hypervisor", "hcall-instructions",
                              hypercall, sizeof(hypercall));
+
+        /* add "has-idle" prop if KVM supports ePAPR idle */
+        if (kvmppc_get_hasidle(env)) {
+            qemu_devtree_setprop(fdt, "/hypervisor", "has-idle", NULL, 0);
+        }
     }

     /* We need to generate the cpu nodes in reverse order, so Linux can pick
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index c09cc39..f986de3 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -705,16 +705,36 @@ uint32_t kvmppc_get_dfp(void)
     return kvmppc_read_int_cpu_dt("ibm,dfp");
 }

+static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
+{
+    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
+        !kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO, pvinfo)) {
+        return 0;
+    }
+
+    return 1;
+}
+
+int kvmppc_get_hasidle(CPUPPCState *env)
+{
+    struct kvm_ppc_pvinfo pvinfo;
+
+    if (!kvmppc_get_pvinfo(env, &pvinfo) &&
+        (pvinfo.flags & KVM_PPC_PVINFO_FLAGS_EV_IDLE)) {
+        return 1;
+    }
+
+    return 0;
+}
+
 int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len)
 {
     uint32_t *hc = (uint32_t*)buf;
     struct kvm_ppc_pvinfo pvinfo;

-    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
-        !kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO, &pvinfo)) {
+    if (!kvmppc_get_pvinfo(env, &pvinfo)) {
         memcpy(buf, pvinfo.hcall, buf_len);
-
         return 0;
     }
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 34ecad3..c2c75b6 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -19,6 +19,7 @@ uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
 uint32_t kvmppc_get_vmx(void);
 uint32_t kvmppc_get_dfp(void);
+int kvmppc_get_hasidle(CPUPPCState *env);
 int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUPPCState *env, int irq, int level);
 void kvmppc_set_papr(CPUPPCState *env);
--
1.7.3.4
Re: [Qemu-devel] [PATCH 4/4] PPC: e500: add generic e500 platform
On Wed, Jun 27, 2012 at 6:51 PM, Scott Wood wrote:
> This gives the kernel a paravirtualized machine to target, without
> requiring both sides to pretend to be targeting a specific board
> that likely has little to do with the host in KVM scenarios.  This
> avoids the need to add new boards to QEMU, just to be able to
> run KVM on new CPUs.
>
> Signed-off-by: Scott Wood
> ---
>  hw/ppc/Makefile.objs |    3 +-
>  hw/ppc/e500plat.c    |   60 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 62 insertions(+), 1 deletions(-)
>  create mode 100644 hw/ppc/e500plat.c
>
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 23eb8ca..58d82c9 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -15,7 +15,8 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o pci-hotplug.o spapr_iommu.o
>  obj-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>  obj-y += ppc440_bamboo.o
>  # PowerPC E500 boards
> -obj-$(CONFIG_FDT) += ppc/e500.o mpc8544_guts.o ppce500_spin.o ppc/mpc8544ds.o
> +obj-$(CONFIG_FDT) += ppc/e500.o mpc8544_guts.o ppce500_spin.o ppc/mpc8544ds.o \
> +                     ppc/e500plat.o
>  # PowerPC 440 Xilinx ML507 reference board.
>  obj-y += virtex_ml507.o
>  # PowerPC OpenPIC
> diff --git a/hw/ppc/e500plat.c b/hw/ppc/e500plat.c
> new file mode 100644
> index 000..a9ef5f8
> --- /dev/null
> +++ b/hw/ppc/e500plat.c
> @@ -0,0 +1,60 @@
> +/*
> + * Generic device-tree-driven paravirt PPC e500 platform
> + *
> + * Copyright 2012 Freescale Semiconductor, Inc.
> + *
> + * This is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include "config.h"
> +#include "qemu-common.h"
> +#include "e500.h"
> +#include "../boards.h"
> +#include "device_tree.h"
> +
> +static void e500plat_fixup_devtree(PPCE500Params *params, void *fdt)
> +{
> +    const char model[] = "QEMU e500plat";
> +    const char compatible[] = "fsl,qemu-e500";
> +
> +    qemu_devtree_setprop(fdt, "/", "model", model, sizeof(model));
> +    qemu_devtree_setprop(fdt, "/", "compatible", compatible,
> +                         sizeof(compatible));
> +}
> +
> +static void e500plat_init(ram_addr_t ram_size,
> +                          const char *boot_device,
> +                          const char *kernel_filename,
> +                          const char *kernel_cmdline,
> +                          const char *initrd_filename,
> +                          const char *cpu_model)
> +{
> +    PPCE500Params params = {
> +        .ram_size = ram_size,
> +        .boot_device = boot_device,
> +        .kernel_filename = kernel_filename,
> +        .kernel_cmdline = kernel_cmdline,
> +        .initrd_filename = initrd_filename,
> +        .cpu_model = cpu_model,
> +        .fixup_devtree = e500plat_fixup_devtree,
> +    };
> +
> +    ppce500_init(&params);
> +}
> +
> +static QEMUMachine e500plat_machine = {
> +    .name = "e500plat",
> +    .desc = "e500plat",
> +    .init = e500plat_init,
> +    .max_cpus = 15,
> +};

Can we call the generic e500 machine "ppce500"?  I like that better as it
denotes both the "Power/PPC" architecture and "e500". I aesthetically like
-M ppce500 over -M e500plat when running QEMU.

Stuart
[Qemu-devel] [PATCH] QEMU: PPC: set has-idle in guest device tree
From: Stuart Yoder

Signed-off-by: Stuart Yoder
---
-note: this patch requires a kernel headers update for the pvinfo idle
 flag. See Bharat Bhushan's recent patch "Synchronized the linux headers"
---
 hw/ppc/e500.c        |    4 +
 target-ppc/kvm.c     |   29 ++---
 target-ppc/kvm_ppc.h |    1 +
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index aa54fd8..a19f094 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -225,6 +225,10 @@ static int ppce500_load_device_tree(CPUPPCState *env,
         kvmppc_get_hypercall(env, hypercall, sizeof(hypercall));
         qemu_devtree_setprop(fdt, "/hypervisor", "hcall-instructions",
                              hypercall, sizeof(hypercall));
+        /* if KVM supports the idle hcall, set property indicating this */
+        if (kvmppc_get_hasidle(env)) {
+            qemu_devtree_setprop(fdt, "/hypervisor", "has-idle", NULL, 0);
+        }
     }

     /* Create CPU nodes */
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 88650d4..10adf26 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -967,16 +967,39 @@ uint32_t kvmppc_get_dfp(void)
     return kvmppc_read_int_cpu_dt("ibm,dfp");
 }

+static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
+{
+    PowerPCCPU *cpu = ppc_env_get_cpu(env);
+    CPUState *cs = CPU(cpu);
+
+    if (kvm_check_extension(cs->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
+        !kvm_vm_ioctl(cs->kvm_state, KVM_PPC_GET_PVINFO, pvinfo)) {
+        return 0;
+    }
+
+    return 1;
+}
+
+int kvmppc_get_hasidle(CPUPPCState *env)
+{
+    struct kvm_ppc_pvinfo pvinfo;
+
+    if (!kvmppc_get_pvinfo(env, &pvinfo) &&
+        (pvinfo.flags & KVM_PPC_PVINFO_FLAGS_EV_IDLE)) {
+        return 1;
+    }
+
+    return 0;
+}
+
 int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len)
 {
     uint32_t *hc = (uint32_t*)buf;
     struct kvm_ppc_pvinfo pvinfo;

-    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
-        !kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO, &pvinfo)) {
+    if (!kvmppc_get_pvinfo(env, &pvinfo)) {
         memcpy(buf, pvinfo.hcall, buf_len);
-
         return 0;
     }
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 83f9872..b1f096f 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -19,6 +19,7 @@ uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
 uint32_t kvmppc_get_vmx(void);
 uint32_t kvmppc_get_dfp(void);
+int kvmppc_get_hasidle(CPUPPCState *env);
 int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUPPCState *env, int irq, int level);
 void kvmppc_set_papr(CPUPPCState *env);
--
1.7.9.7
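On the guest side, an ePAPR property like "has-idle" is detected simply by its presence in the device tree; under Linux the flattened tree is exposed under /proc/device-tree. A small sketch of that check (the directory is a parameter here only so the function can be exercised off-target; on a real guest it would be /proc/device-tree):

```shell
#!/bin/sh
# has_idle DIR -- report whether DIR/hypervisor/has-idle exists.
# DIR stands in for /proc/device-tree on a running ePAPR guest.
has_idle() {
    if [ -e "$1/hypervisor/has-idle" ]; then
        echo "idle hcall available"
    else
        echo "idle hcall not available"
    fi
}

has_idle /proc/device-tree
```

The property carries no payload (it is set with a NULL value and zero length in the patch), so existence is the whole test.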
[Qemu-devel] [PATCH] configure: change endian cross compilation test
From: Stuart Yoder

The previous check in configure's endian test determined whether this is
a cross-compile build by testing whether --cross-prefix was used. This
does not work for cross-build environments like Yocto that may set CC
instead of --cross-prefix. Instead, test whether the host compiler is the
same as the target compiler, which also works when --cross-prefix is used.

Signed-off-by: Stuart Yoder
---
 configure |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index fe4fc4f..c5333bf 100755
--- a/configure
+++ b/configure
@@ -1269,7 +1269,7 @@ feature_not_found() {
   exit 1;
 }

-if test -z "$cross_prefix" ; then
+if test $cc = $host_cc; then

 # ---
 # big/little endian test
--
1.7.3.4
[Qemu-devel] [PATCH][v2] configure: change endianness test
From: Stuart Yoder

Remove the runtime check for endianness and, for platforms that can be
big or little endian, do a compile-time check. This resolves an issue
encountered building QEMU under Yocto, which was not setting
--cross-prefix.

Signed-off-by: Stuart Yoder
---
-v2: removed the dynamic runtime test completely, added compile time
    check for mips

 configure |   33 -
 1 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/configure b/configure
index fe4fc4f..d9c5999 100755
--- a/configure
+++ b/configure
@@ -1269,41 +1269,24 @@ feature_not_found() {
   exit 1;
 }

-if test -z "$cross_prefix" ; then
-
-# ---
-# big/little endian test
-cat > $TMPC << EOF
-#include <inttypes.h>
-int main(int argc, char ** argv){
-volatile uint32_t i=0x01234567;
-return (*((uint8_t*)(&i))) == 0x67;
-}
-EOF
-
-if compile_prog "" "" ; then
-$TMPE && bigendian="yes"
-else
-echo big/little test failed
-fi
-
-else
-
-# if cross compiling, cannot launch a program, so make a static guess
+##
+# endianness check
 case "$cpu" in
   arm)
-# ARM can be either way; ask the compiler which one we are
     if check_define __ARMEB__; then
         bigendian=yes
     fi
   ;;
-  hppa|m68k|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64)
+  mips|mips64)
+    if check_define __MIPSEB__; then
+        bigendian=yes
+    fi
+  ;;
+  hppa|m68k|ppc|ppc64|s390|s390x|sparc|sparc64)
     bigendian=yes
   ;;
 esac

-fi
-
 ##
 # NPTL probe
--
1.7.3.4
[Qemu-devel] [PATCH] remove cross prefix from pkg-config command
From: Stuart Yoder

The host pkg-config tool should be used, with the location of the
pkg-config *.pc files being specified via PKG_CONFIG_PATH.

Signed-off-by: Stuart Yoder
---
The Freescale cross-build environment is multilib, which means there are
separate library paths for different cpu types. The pkg-config installed
along with the other cross tools does not work...it has no clue how to
find the right libraries. So using cross-prefix does no good. The only
thing I can get to work is explicitly setting PKG_CONFIG_PATH before
running configure.

Not sure how other cross-build environments work, but it seems like the
above method should be sufficiently general.

 configure |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index 38e3724..59f6e09 100755
--- a/configure
+++ b/configure
@@ -223,7 +223,7 @@ objcopy="${cross_prefix}${OBJCOPY-objcopy}"
 ld="${cross_prefix}${LD-ld}"
 strip="${cross_prefix}${STRIP-strip}"
 windres="${cross_prefix}${WINDRES-windres}"
-pkg_config="${cross_prefix}${PKG_CONFIG-pkg-config}"
+pkg_config="${PKG_CONFIG-pkg-config}"
 sdl_config="${cross_prefix}${SDL_CONFIG-sdl-config}"

 # default flags for all hosts
--
1.7.3.4
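To illustrate the workflow the commit message describes, here is a sketch: the .pc file and its contents are invented for illustration; the point is only that the plain host pkg-config resolves package metadata from whatever directory PKG_CONFIG_PATH names, so no cross-prefixed binary is needed.

```shell
#!/bin/sh
# Create a stand-in for a multilib-specific pkgconfig directory with
# one (fake) package description file.
pcdir=$(mktemp -d)
cat > "$pcdir/demo.pc" <<'EOF'
Name: demo
Description: illustrative package (not a real library)
Version: 1.2.3
Libs: -ldemo
Cflags:
EOF

# The host pkg-config -- no cross prefix -- finds it via the path:
PKG_CONFIG_PATH=$pcdir pkg-config --modversion demo

# A configure run in such an environment would look like:
#   PKG_CONFIG_PATH=/path/to/target/lib/pkgconfig ./configure ...
```

The .pc files shipped in the target sysroot already encode the per-CPU library paths, which is why pointing the host tool at them is sufficient.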
[Qemu-devel] [PATCH] when overriding default tool names don't add cross-prefix
From: Stuart Yoder

When overriding a tool name via a shell variable, don't tack on the
cross-prefix. This specifically allows the pkg-config command to be
overridden and work where it does not exist in some cross build
environments.

Signed-off-by: Stuart Yoder
---
 configure |   16
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index 38e3724..ad522f2 100755
--- a/configure
+++ b/configure
@@ -217,14 +217,14 @@ done
 # Using uname is really, really broken.  Once we have the right set of checks
 # we can eliminate it's usage altogether

-cc="${cross_prefix}${CC-gcc}"
-ar="${cross_prefix}${AR-ar}"
-objcopy="${cross_prefix}${OBJCOPY-objcopy}"
-ld="${cross_prefix}${LD-ld}"
-strip="${cross_prefix}${STRIP-strip}"
-windres="${cross_prefix}${WINDRES-windres}"
-pkg_config="${cross_prefix}${PKG_CONFIG-pkg-config}"
-sdl_config="${cross_prefix}${SDL_CONFIG-sdl-config}"
+cc="${CC-${cross_prefix}gcc}"
+ar="${AR-${cross_prefix}ar}"
+objcopy="${OBJCOPY-${cross_prefix}objcopy}"
+ld="${LD-${cross_prefix}ld}"
+strip="${STRIP-${cross_prefix}strip}"
+windres="${WINDRES-${cross_prefix}windres}"
+pkg_config="${PKG_CONFIG-${cross_prefix}pkg-config}"
+sdl_config="${SDL_CONFIG-${cross_prefix}sdl-config}"

 # default flags for all hosts
 QEMU_CFLAGS="-fno-strict-aliasing $QEMU_CFLAGS"
--
1.7.3.4
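The change relies on POSIX `${VAR-default}` expansion: the default (with the cross prefix baked in) is used only when the variable is unset. A quick sketch of the before/after behavior (the prefix value is an example):

```shell
#!/bin/sh
cross_prefix=powerpc-linux-gnu-   # example prefix, for illustration

# Old scheme: the prefix is always prepended, even to an override.
unset PKG_CONFIG
echo "old, no override:  ${cross_prefix}${PKG_CONFIG-pkg-config}"
# powerpc-linux-gnu-pkg-config

PKG_CONFIG=/usr/bin/pkg-config
echo "old, overridden:   ${cross_prefix}${PKG_CONFIG-pkg-config}"
# powerpc-linux-gnu-/usr/bin/pkg-config  (broken path)

# New scheme: an override replaces the whole prefixed default.
echo "new, overridden:   ${PKG_CONFIG-${cross_prefix}pkg-config}"
# /usr/bin/pkg-config

unset PKG_CONFIG
echo "new, no override:  ${PKG_CONFIG-${cross_prefix}pkg-config}"
# powerpc-linux-gnu-pkg-config  (same default as before)
```

So the default cross-tool behavior is unchanged, while an explicit `PKG_CONFIG=...` (or `CC=...`, etc.) now wins outright.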
[Qemu-devel] bringing up virtio network device hangs guest
I'm trying to solve a problem where bringing up a virtio network device
in a KVM guest hangs the guest.

Start QEMU with these options:

   -net nic,model=virtio -net tap,script=/root/qemu-ifup

The qemu-ifup script is pretty simple -- it just adds the interface
passed in to a bridge:

   #!/bin/sh
   bridge=br0
   guest_device=$1
   ifconfig $guest_device up
   brctl addif $bridge $guest_device

The guest kernel finds the virtio device, but when bringing up the
virtio eth0 device in the guest, ifconfig hangs the guest.

In the host, the bridge (br0) is able to send and receive packets. The
tap device created by QEMU shows packets sent, but none received:

   tap0   Link encap:Ethernet  HWaddr 0A:02:56:5A:72:C6
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:257 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:0 (0.0 B)  TX bytes:17592 (17.1 KiB)

Wondered if anyone had any ideas on how to further diagnose what might
be going on.

Thanks,
Stuart
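One low-level way to narrow a problem like this down is to watch the per-device packet counters from sysfs on the host (on both the tap and bridge ends) rather than relying on a single ifconfig snapshot. A sketch (the sysfs base directory is a parameter only so the function can be exercised against a fake tree; on a real host it defaults to /sys/class/net):

```shell
#!/bin/sh
# show_counters DEV [BASE] -- print RX/TX packet counts for DEV.
# BASE defaults to the real sysfs tree.
show_counters() {
    dev=$1
    base=${2:-/sys/class/net}
    echo "$dev RX=$(cat "$base/$dev/statistics/rx_packets")" \
         "TX=$(cat "$base/$dev/statistics/tx_packets")"
}

# e.g., sample before and after an ifconfig attempt in the guest:
#   show_counters tap0
#   show_counters br0
```

In the report above, tap0's TX counter moving while RX stays at zero says the host never hands any frames back to the guest, pointing at the bridge/tap path rather than the virtio device model itself.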
Re: [Qemu-devel] State of KVM guest debugging support on Power
On Tue, Nov 1, 2011 at 9:22 AM, Jan Kiszka wrote:
> Hi there,
>
> I'm generating some slides on guest debugging via kvm. What's the
> current state for Book-E and Book-S? Works out of box, mostly usable, or
> to be implemented? Is anyone using it?

Are you talking about guest debug using the QEMU stub or using the
virtual CPU's debug registers and interrupts?  For Freescale Book E we
have both approaches working, and patches to be sent upstream soon.

Stuart
Re: [Qemu-devel] State of KVM guest debugging support on Power
On Thu, Nov 3, 2011 at 2:14 PM, Jan Kiszka wrote:
> On 2011-11-03 19:59, Stuart Yoder wrote:
>> On Tue, Nov 1, 2011 at 9:22 AM, Jan Kiszka wrote:
>>> Hi there,
>>>
>>> I'm generating some slides on guest debugging via kvm. What's the
>>> current state for Book-E and Book-S? Works out of box, mostly usable,
>>> or to be implemented? Is anyone using it?
>>
>> Are you talking about guest debug using the QEMU stub or using
>> the virtual CPU's debug registers and interrupts?  For Freescale
>> Book E we have both approaches working, and patches to be sent
>> upstream soon.
>
> It's good to see both features coming into mainline.
>
> I'll talk about the former, ie. debugging without guest help/awareness
> (virtual hardware debugger). Is gdb well prepared for it (all registers
> available, real-mode support, etc.)?

Right now the basic register set is available, but not SPRs -- though
not because of any gdb limitation as far as I know. gdb supports
additional registers when the correct target description xml file is
passed back to it as it connects to the target. In the QEMU monitor, we
currently show key SPRs for 'info registers'.

As far as real mode goes, I don't have direct experience with it, as
everything we are doing is BookE, which has no real mode.

Stuart
[Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
Based on the discussions over the last couple of weeks I have updated
the device fd file layout proposal and tried to specify it a bit more
formally.

===

1. Overview

This specification describes the layout of device files used in the
context of vfio, which gives user space direct access to I/O devices
that have been bound to vfio.

When a device fd is opened and read, offset 0x0 contains a fixed sized
header followed by a number of variable length records that describe
different characteristics of the device -- addressable regions,
interrupts, etc.

   0x0  +-------------------+
        |      magic        |  u32  // identifies this as a vfio device
        +-------------------+       //  file and identifies the type of bus
        |      version      |  u32  // specifies the version of this
        +-------------------+
        |      flags        |  u32  // encodes any flags
        +-------------------+
        | dev info record 0 |
        |    type           |  u32  // type of record
        |    rec_len        |  u32  // length in bytes of record
        |                   |       //  (including record header)
        |    flags          |  u32  // type specific flags
        |    ...content...  |       // record content, which could
        +-------------------+       //  include sub-records
        | dev info record 1 |
        +-------------------+
        | dev info record N |
        +-------------------+

The device info records following the file header may have the following
record types, each with content encoded in a record-specific way:

   +------------------+-----+---------------------------------------------
   | type             | num | Description
   +------------------+-----+---------------------------------------------
    REGION              1     describes an addressable address range for
                              the device
    DTPATH              2     describes the device tree path for the
                              device
    DTINDEX             3     describes the index into the related device
                              tree property
                              (reg,ranges,interrupts,interrupt-map)
    INTERRUPT           4     describes an interrupt for the device
    PCI_CONFIG_SPACE    5     property identifying a region as PCI config
                              space
    PCI_BAR_INDEX       6     describes the BAR index for a PCI region
    PHYS_ADDR           7     describes the physical address of the region
   -----------------------------------------------------------------------

2. Header

The header is located at offset 0x0 in the device fd and has the
following format:

   struct devfd_header {
       __u32 magic;
       __u32 version;
       __u32 flags;
   };

The 'magic' field contains a magic value that will identify the type of
bus the device is on. Valid values are:

   0x70636900   // "pci" - PCI device
   0x6474       // "dt"  - device tree (system bus)

3. Region

A REGION record describes an addressable address region for the device.

   struct devfd_region {
       __u32 type;         // must be 0x1
       __u32 record_len;
       __u32 flags;
       __u64 offset;       // seek offset to region from beginning
                           //  of file
       __u64 len;          // length of the region
   };

The 'flags' field supports one flag:

   IS_MMAPABLE

4. Device Tree Path (DTPATH)

A DTPATH record is a sub-record of a REGION and describes the path to a
device tree node for the region.

   struct devfd_dtpath {
       __u32 type;         // must be 0x2
       __u32 record_len;
       __u64 char[];       // length of the region
   };

5. Device Tree Index (DTINDEX)

A DTINDEX record is a sub-record of a REGION and specifies the index
into the resource list encoded in the associated device tree property --
"reg", "ranges", "interrupts", or "interrupt-map".

   struct devfd_dtindex {
       __u32 type;         // must be 0x3
       __u32 record_len;
       __u32 prop_type;
       __u32 prop_index;   // index into the resource list
   };

   prop_type must have one of the following values:
       1   // "reg" property
       2   // "ranges" property
       3   // "interrupts" property
       4   // "interrupt-map" property

   Note: prop_index is not the byte offset into the property, but the
   logical index.

6. Interrupts (INTERRUPT)

An INTERRUPT record describes one of a device's interrupts. The handle
field is an argument to VFIO_DEVICE_GET_IRQ_FD, which user space can use
to receive device interrupts.

   struct devfd_interrupts {
       __u32 type;         // must be 0x4
       __u32 record_len;
       __u32 flags;
       __u32 handle;       // parameter to VFIO_DEVICE_GET_IRQ_FD
   };

7. PCI Config Space (PCI_CONFIG_SPACE)

A PCI_CONFIG_SPACE record is a sub-record of a REGION record and
identifies the region as PCI configuration space.

   struct de
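As a quick sanity check of the magic encodings (big-endian packing of the ASCII characters, which is my reading of the proposal rather than something it states explicitly), the shell can reproduce the PCI value from the character codes -- POSIX printf treats a leading-quote argument like `"'p"` as the character's code:

```shell
#!/bin/sh
# Build the "pci" magic from its ASCII bytes: 'p'=0x70 'c'=0x63 'i'=0x69,
# with a trailing NUL byte (00).
pci_magic=$(printf '%x%x%x00' "'p" "'c" "'i")
echo "0x$pci_magic"   # 0x70636900, matching the PCI magic above
```

The "dt" value follows the same pattern from 'd'=0x64 and 't'=0x74, giving the 0x6474 prefix listed above.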
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
Meant to identify the changes in v2 of this proposal:

v2:
   -removed PCI_INFO record type
   -removed PCI_BAR_INFO record type
   -PCI_CONFIG_SPACE is now a sub-record/property of a REGION
   -removed physical address from region and made it a
    sub-record/property of a REGION
   -added PCI_BAR_INDEX sub-record type
   -updated magic numbers

Stuart

On Fri, Sep 9, 2011 at 8:11 AM, Stuart Yoder wrote:
> Based on the discussions over the last couple of weeks
> I have updated the device fd file layout proposal and
> tried to specify it a bit more formally.
> [...]
[Qemu-devel] [PATCH] support compiling and installing DTBs
From: Stuart Yoder

Also adds configure options to enable/disable installing DTBs and to
override the location of dtc.

Signed-off-by: Stuart Yoder
---
 Makefile  |   17 +++--
 configure |   24
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 7e9382f..1fe0a0a 100644
--- a/Makefile
+++ b/Makefile
@@ -245,7 +245,6 @@ ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc \
 pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
 pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
 bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \
-mpc8544ds.dtb \
 multiboot.bin linuxboot.bin \
 s390-zipl.rom \
 spapr-rtas.bin slof.bin
@@ -253,6 +252,20 @@ else
 BLOBS=
 endif

+ifdef INSTALL_DTBS
+DTBS=$(SRC_PATH)/pc-bios/mpc8544ds.dtb
+
+%.dtb: %.dts
+	$(call quiet-command,$(DTC) -I dts -O dtb -b 0 -o $@ $<, "  DTC   $@")
+
+install-dtbs: $(DTBS)
+	$(INSTALL_DIR) "$(DESTDIR)$(datadir)"
+	set -e; for x in $(DTBS); do \
+		$(INSTALL_DATA) $$x "$(DESTDIR)$(datadir)"; \
+	done
+
+endif
+
 install-doc: $(DOCS)
 	$(INSTALL_DIR) "$(DESTDIR)$(docdir)"
 	$(INSTALL_DATA) qemu-doc.html qemu-tech.html "$(DESTDIR)$(docdir)"
@@ -267,7 +280,7 @@ install-sysconfig:
 	$(INSTALL_DIR) "$(DESTDIR)$(sysconfdir)/qemu"
 	$(INSTALL_DATA) $(SRC_PATH)/sysconfigs/target/target-x86_64.conf "$(DESTDIR)$(sysconfdir)/qemu"

-install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig
+install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig $(if $(INSTALL_DTBS),install-dtbs)
 	$(INSTALL_DIR) "$(DESTDIR)$(bindir)"
 ifneq ($(TOOLS),)
 	$(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)"
diff --git a/configure b/configure
index 0875f95..7d9447f 100755
--- a/configure
+++ b/configure
@@ -182,6 +182,7 @@ usb_redir=""
 opengl=""
 zlib="yes"
 guest_agent="yes"
+dtc=dtc

 # parse CC options first
 for opt do
@@ -728,6 +729,12 @@ for opt do
   ;;
   --disable-blobs) blobs="no"
   ;;
+  --enable-dtbs) install_dtbs="yes"
+  ;;
+  --disable-dtbs) install_dtbs="no"
+  ;;
+  --with-dtc=*) dtc=$optarg
+  ;;
   --with-pkgversion=*) pkgversion=" ($optarg)"
   ;;
   --disable-docs) docs="no"
@@ -771,6 +778,13 @@ for opt do
   esac
 done

+# set default for $install_dtbs, may be overriden by command line
+if has $dtc; then
+    install_dtbs="yes"
+else
+    install_dtbs="no"
+fi
+
 #
 # If cpu ~= sparc and sparc_cpu hasn't been defined, plug in the right
 # QEMU_CFLAGS/LDFLAGS (assume sparc_v8plus for 32-bit and sparc_v9 for 64-bit)
@@ -1028,6 +1042,9 @@ echo "  --enable-linux-aio       enable Linux AIO support"
 echo "  --disable-attr           disables attr and xattr support"
 echo "  --enable-attr            enable attr and xattr support"
 echo "  --disable-blobs          disable installing provided firmware blobs"
+echo "  --enable-dtbs            build/install provided device trees (requires dtc)"
+echo "  --disable-dtbs           don't build/install provided device trees"
+echo "  --with-dtc               full path to device tree compiler (overrides default)"
 echo "  --enable-docs            enable documentation build"
 echo "  --disable-docs           disable documentation build"
 echo "  --disable-vhost-net      disable vhost-net acceleration support"
@@ -2713,6 +2730,7 @@ echo "vde support       $vde"
 echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
+echo "Install dtbs      $install_dtbs"
 echo "KVM support       $kvm"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
@@ -2982,6 +3000,9 @@ fi
 if test "$blobs" = "yes" ; then
   echo "INSTALL_BLOBS=yes" >> $config_host_mak
 fi
+if test "$install_dtbs" = "yes" ; then
+  echo "INSTALL_DTBS=yes" >> $config_host_mak
+fi
 if test "$iovec" = "yes" ; then
   echo "CONFIG_IOVEC=y" >> $config_host_mak
 fi
@@ -3122,6 +3143,9 @@ echo "ARLIBS_END=$arlibs_end" >> $config_host_mak
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "LIBS_TOOLS+=$libs_tools" >> $config_host_mak
 echo "EXESUF=$EXESUF" >> $config_host_mak
+if test "$install_dtbs" = "yes" ; then
+  echo "DTC=$dtc" >> $config_host_mak
+fi
 echo "LIBS_QGA+=$libs_qga" >> $config_host_mak

 # generate list of library paths for linker script
--
1.7.3.4
[Qemu-devel] [PATCH] remove mpc8544ds.dtb
From: Stuart Yoder make install now compiles dtb Signed-off-by: Stuart Yoder --- apply after 'support compiling and installing DTBs' pc-bios/mpc8544ds.dtb | Bin 2277 -> 0 bytes 1 files changed, 0 insertions(+), 0 deletions(-) delete mode 100644 pc-bios/mpc8544ds.dtb diff --git a/pc-bios/mpc8544ds.dtb b/pc-bios/mpc8544ds.dtb deleted file mode 100644 index ae318b1fe83846cc2e133951a3666fcfcdf87f79.. GIT binary patch literal 0 HcmV?d1 literal 2277 zcmb7Fy>C=U5Z@O^han^~5(T1&EZjuso_vEG~YMsIJg>4f2Eiz1{8M@ZMj+0sBqj-i4i|`8*)t|B)DTA(!au zQBShmX5M&G+4n1r{Y_tKOfoWqK%3q;o8_b7_Fw?y1$KBjcBhN|g<^i+thMEwyOfsG zs2^xZbUot&V&2Q@MMSb+{cGI*UZ3j=Nn}l9$^(q(51l!;_;Ggm26TcfE>5H{csS;yqBG&_{S74_@7AV zd0W)ENe|*K19zSHj98uHvxX1n=X%arWuxSjwI)lQApX}J5_0f51CFi0AaAeqTRYp^ zd}v~$wcjTfx4Gn!eH6@F0^B9W-U#Mh3<~=dyEAs2U{I6v{CC#$M+|M}`|g@*t4BlJ z9?C?7nn3>%OMN&;P}CEQvc}LCxC!8Z!5!-9sjK@{ZOW8pU>C<^<^%zo_|1NDpUC_pXLIQCVmr#1JKS=!5(` z$G6vBYE#|5StERBu0cw1WNNYFj!xKww)TJcg>c50|D1e#N<9@jMywyq^(^_KjbjP3 z5Qlt-!znELqD^u?CrC?(0mt|SpUfBI7kn~b#4Pxh=4Jh>;`O61;t& zFT}77h|d53okfPcPK?5Z-ir{7h(#T*Rz8hh(inB_T_D82PO{7oJo*SeF!#*4Iwf(9 zIGl-|R^F6Q8AJb}a-%k~$;27& zQ*H9xs|{}-n<$6(YM{cnrKr~iYGRJ9Hh(viUpaHACaKQ!)TFsBO^RS;zp)5r!Ocoz z3v)m}+w_q$|IAFOPZT}L9ZlzzKI-T$D-Y?{jABz;PRv}{vN6uhYPy51nAVvyWz#X} zs%%`9=fuFX_XhU$br&GJmva;&Qnt71H=-Qq3AF HpqZ+#N9HOV -- 1.7.3.4
Re: [Qemu-devel] KVM call agenfda for 2014-04-01
> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, April 10, 2014 10:57 AM
> To: Peter Maydell
> Cc: Juan Quintela; KVM devel mailing list; qemu list; Yoder Stuart-
> B08248; Alistair Francis; Peter Crosthwaite; Christoffer Dall
> Subject: Re: [Qemu-devel] KVM call agenfda for 2014-04-01
>
> On 10.04.2014, at 17:52, Peter Maydell wrote:
>
> > On 10 April 2014 16:49, Alexander Graf wrote:
> >> For the next call, I would propose to revive the "platform bus"
> >> (aka: how to create non-PCI devices with -device) discussions
> >> to make sure we're all on the same page.
> >
> > I rather suspect we are not :-) Do you have a link to
> > the current proposals for prior reading?
>
> The only thing I could find is the old thread about my platform bus
> approach (which Anthony disliked):
>
> https://lists.gnu.org/archive/html/qemu-devel/2013-07/msg03614.html
>
> So from what I remember the plan moving forward was to have a special
> device type similar to my platform bus devices that you can just create
> using -device, no bus involved. The machine file would then loop through
> them, interpret the "I sit at address x" and "I want interrupt number y"
> fields to link them to whatever the machine model thinks is a good fit.
>
> The same way the machine model today has to have knowledge on each device
> tree node type it generates, it would do the same for these devices. So
> the machine has to have awareness of all the "funky special options" a
> device tree node receives - the same as for any other device. Just that
> in this case it wouldn't be able to hardcode them, but have to generate
> them on the fly when it sees a device in the object tree.

Another link that may help, from a call we had back in September:

https://lists.cs.columbia.edu/pipermail/kvmarm/2013-September/005532.html

Stuart
[Qemu-devel] [PATCH 1/2] QEMU: PPC: specify PVRs for all e500 cores
From: Stuart Yoder Signed-off-by: Stuart Yoder --- target-ppc/cpu-models.c | 64 +-- target-ppc/cpu-models.h | 30 -- 2 files changed, 90 insertions(+), 4 deletions(-) diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c index f6c9b3a..9cf603b 100644 --- a/target-ppc/cpu-models.c +++ b/target-ppc/cpu-models.c @@ -677,19 +677,79 @@ "PowerPC e500 v2.0 core") POWERPC_DEF("e500v2_v10",CPU_POWERPC_e500v2_v10, e500v2, "PowerPC e500v2 v1.0 core") +POWERPC_DEF("e500v2_v11",CPU_POWERPC_e500v2_v11, e500v2, +"PowerPC e500v2 v1.1 core") POWERPC_DEF("e500v2_v20",CPU_POWERPC_e500v2_v20, e500v2, "PowerPC e500v2 v2.0 core") POWERPC_DEF("e500v2_v21",CPU_POWERPC_e500v2_v21, e500v2, "PowerPC e500v2 v2.1 core") POWERPC_DEF("e500v2_v22",CPU_POWERPC_e500v2_v22, e500v2, "PowerPC e500v2 v2.2 core") +POWERPC_DEF("e500v2_v23",CPU_POWERPC_e500v2_v23, e500v2, +"PowerPC e500v2 v2.3 core") POWERPC_DEF("e500v2_v30",CPU_POWERPC_e500v2_v30, e500v2, "PowerPC e500v2 v3.0 core") +POWERPC_DEF("e500v2_v31",CPU_POWERPC_e500v2_v31, e500v2, +"PowerPC e500v2 v3.1 core") +POWERPC_DEF("e500v2_v1040", CPU_POWERPC_e500v2_v1040, e500v2, +"PowerPC e500v2 v104.0 core") +POWERPC_DEF("e500v2_v1050", CPU_POWERPC_e500v2_v1050, e500v2, +"PowerPC e500v2 v105.0 core") +POWERPC_DEF("e500v2_v1051", CPU_POWERPC_e500v2_v1051, e500v2, +"PowerPC e500v2 v105.1 core") +POWERPC_DEF("e500v2_v1151", CPU_POWERPC_e500v2_v1151, e500v2, +"PowerPC e500v2 v115.1 core") +POWERPC_DEF("e500v2_v1152", CPU_POWERPC_e500v2_v1152, e500v2, +"PowerPC e500v2 v115.2 core") +POWERPC_DEF("e500v2_v2050", CPU_POWERPC_e500v2_v2050, e500v2, +"PowerPC e500v2 v205.0 core") +POWERPC_DEF("e500v2_v2051", CPU_POWERPC_e500v2_v2051, e500v2, +"PowerPC e500v2 v205.1 core") +POWERPC_DEF("e500v2_v2151", CPU_POWERPC_e500v2_v2151, e500v2, +"PowerPC e500v2 v215.1 core") +POWERPC_DEF("e500v2_v2152", CPU_POWERPC_e500v2_v2152, e500v2, +"PowerPC e500v2 v215.2 core") +POWERPC_DEF("e500v2_v2251", CPU_POWERPC_e500v2_v2251, e500v2, +"PowerPC e500v2 v225.1 core") + +/* 
e500mc family */ POWERPC_DEF_SVR("e500mc", "e500mc", -CPU_POWERPC_e500mc, POWERPC_SVR_E500, e500mc) +CPU_POWERPC_e500mc_v20, POWERPC_SVR_E500, e500mc) +POWERPC_DEF_SVR("e500mc_v10", "PowerPC e500mc v1.0 core", +CPU_POWERPC_e500mc_v10, POWERPC_SVR_E500, e500mc) +POWERPC_DEF_SVR("e500mc_v21", "PowerPC e500mc v2.1 core", +CPU_POWERPC_e500mc_v21, POWERPC_SVR_E500, e500mc) +POWERPC_DEF_SVR("e500mc_v22", "PowerPC e500mc v2.2 core", +CPU_POWERPC_e500mc_v22, POWERPC_SVR_E500, e500mc) +POWERPC_DEF_SVR("e500mc_v23", "PowerPC e500mc v2.3 core", +CPU_POWERPC_e500mc_v23, POWERPC_SVR_E500, e500mc) +POWERPC_DEF_SVR("e500mc_v1030", "PowerPC e500mc v103.0 core", +CPU_POWERPC_e500mc_v1030, POWERPC_SVR_E500, e500mc) +POWERPC_DEF_SVR("e500mc_v30", "PowerPC e500mc v3.0 core", +CPU_POWERPC_e500mc_v30, POWERPC_SVR_E500, e500mc) +POWERPC_DEF_SVR("e500mc_v31", "PowerPC e500mc v3.1 core", +CPU_POWERPC_e500mc_v31, POWERPC_SVR_E500, e500mc) +POWERPC_DEF_SVR("e500mc_v32", "PowerPC e500mc v3.2 core", +CPU_POWERPC_e500mc_v32, POWERPC_SVR_E500, e500mc) + #ifdef TARGET_PPC64 +/* e5500 family */ POWERPC_DEF_SVR("e5500", "e5500", -CPU_POWERPC_e5500,POWERPC_SVR_E500, e5500) +CPU_POWERPC_e5500_v12,POWERPC_SVR_E500, e5500) +POWERPC_DEF_SVR("e5500_v10", "PowerPC e5500 v1.0 core", +CPU_POWERPC_e5500_v10,POWERPC_SVR_E500,
[Qemu-devel] [PATCH 2/2] QEMU: PPC: set default cpu type to be 'host'
From: Stuart Yoder -for KVM we always want the cpu to be that of the host system, so make that the default -for TCG mode, the emulated cpu type should be explicitly set Signed-off-by: Stuart Yoder --- hw/ppc/e500.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index b37ce9d..69dbf47 100644 --- a/hw/ppc/e500.c +++ b/hw/ppc/e500.c @@ -621,7 +621,11 @@ void ppce500_init(QEMUMachineInitArgs *args, PPCE500Params *params) /* Setup CPUs */ if (args->cpu_model == NULL) { -args->cpu_model = "e500v2_v30"; +if (kvm_enabled()) { +args->cpu_model = "host"; +} else { +args->cpu_model = "e500v2_v30"; +} } irqs = g_malloc0(smp_cpus * sizeof(qemu_irq *)); -- 1.7.9.7
Re: [Qemu-devel] [PATCH 1/2] QEMU: PPC: specify PVRs for all e500 cores
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Monday, April 14, 2014 6:01 AM > To: Yoder Stuart-B08248 > Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org > Subject: Re: [PATCH 1/2] QEMU: PPC: specify PVRs for all e500 cores > > > On 14.02.14 20:22, Stuart Yoder wrote: > > From: Stuart Yoder > > > > Signed-off-by: Stuart Yoder > > --- > > target-ppc/cpu-models.c | 64 > +-- > > target-ppc/cpu-models.h | 30 -- > > 2 files changed, 90 insertions(+), 4 deletions(-) > > > > diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c > > index f6c9b3a..9cf603b 100644 > > --- a/target-ppc/cpu-models.c > > +++ b/target-ppc/cpu-models.c > > @@ -677,19 +677,79 @@ > > "PowerPC e500 v2.0 core") > > POWERPC_DEF("e500v2_v10",CPU_POWERPC_e500v2_v10, > e500v2, > > "PowerPC e500v2 v1.0 core") > > +POWERPC_DEF("e500v2_v11",CPU_POWERPC_e500v2_v11, > e500v2, > > +"PowerPC e500v2 v1.1 core") > > POWERPC_DEF("e500v2_v20",CPU_POWERPC_e500v2_v20, > e500v2, > > "PowerPC e500v2 v2.0 core") > > POWERPC_DEF("e500v2_v21",CPU_POWERPC_e500v2_v21, > e500v2, > > "PowerPC e500v2 v2.1 core") > > POWERPC_DEF("e500v2_v22",CPU_POWERPC_e500v2_v22, > e500v2, > > "PowerPC e500v2 v2.2 core") > > +POWERPC_DEF("e500v2_v23",CPU_POWERPC_e500v2_v23, > e500v2, > > +"PowerPC e500v2 v2.3 core") > > POWERPC_DEF("e500v2_v30",CPU_POWERPC_e500v2_v30, > e500v2, > > "PowerPC e500v2 v3.0 core") > > +POWERPC_DEF("e500v2_v31",CPU_POWERPC_e500v2_v31, > e500v2, > > +"PowerPC e500v2 v3.1 core") > > +POWERPC_DEF("e500v2_v1040", CPU_POWERPC_e500v2_v1040, > e500v2, > > +"PowerPC e500v2 v104.0 core") > > +POWERPC_DEF("e500v2_v1050", CPU_POWERPC_e500v2_v1050, > e500v2, > > +"PowerPC e500v2 v105.0 core") > > +POWERPC_DEF("e500v2_v1051", CPU_POWERPC_e500v2_v1051, > e500v2, > > +"PowerPC e500v2 v105.1 core") > > +POWERPC_DEF("e500v2_v1151", CPU_POWERPC_e500v2_v1151, > e500v2, > > +"PowerPC e500v2 v115.1 core") > > +POWERPC_DEF("e500v2_v1152", CPU_POWERPC_e500v2_v1152, > e500v2, > > +"PowerPC e500v2 v115.2 
core") > > +POWERPC_DEF("e500v2_v2050", CPU_POWERPC_e500v2_v2050, > e500v2, > > +"PowerPC e500v2 v205.0 core") > > +POWERPC_DEF("e500v2_v2051", CPU_POWERPC_e500v2_v2051, > e500v2, > > +"PowerPC e500v2 v205.1 core") > > +POWERPC_DEF("e500v2_v2151", CPU_POWERPC_e500v2_v2151, > e500v2, > > +"PowerPC e500v2 v215.1 core") > > +POWERPC_DEF("e500v2_v2152", CPU_POWERPC_e500v2_v2152, > e500v2, > > +"PowerPC e500v2 v215.2 core") > > +POWERPC_DEF("e500v2_v2251", CPU_POWERPC_e500v2_v2251, > e500v2, > > +"PowerPC e500v2 v225.1 core") > > + > > +/* e500mc family */ > > POWERPC_DEF_SVR("e500mc", "e500mc", > > -CPU_POWERPC_e500mc, POWERPC_SVR_E500, > e500mc) > > +CPU_POWERPC_e500mc_v20, POWERPC_SVR_E500, > e500mc) > > How about we use aliases instead like the other CPU types? :) > > > +POWERPC_DEF_SVR("e500mc_v10", "PowerPC e500mc v1.0 core", > > +CPU_POWERPC_e500mc_v10, POWERPC_SVR_E500, > e500mc) > > +POWERPC_DEF_SVR("e500mc_v21", "PowerPC e500mc v2.1 core", > > +CPU_POWERPC_e500mc_v21, POWERPC_SVR_E500, > e500mc) > > +POWERPC_DEF_SVR("e500mc_v22", "PowerPC e500mc v2.2 core", > > +CPU_POWERPC_e500mc_v22, POWERPC_SVR_E500, > e500mc) > > +POWERPC_DEF_SVR("e500mc_v23", "PowerPC e500mc v2.3 core", > > +CPU_POWERPC_e500mc_v23, POWERPC_SVR_E500, > e500mc) > > +POWERPC_DEF_SVR("e500mc_v1030", "PowerPC e500mc v103.0 core", > > +CPU_POWERPC_e500mc_v1030, POWERPC_SVR_E500, > e500mc) > > +POWERPC_DEF_SVR("e500mc_v30", "PowerPC e500mc v3.0 core", > > +CPU_POWERPC_e500mc_v30, POWERPC_SVR_E500, > e500mc) > > +POWERPC_DEF_SVR("e500mc_v31", "PowerPC e500mc v3.1 core", > > +CPU_POWERPC_e500mc_v31, POWERPC_SVR_E500, > e500mc) > > +POWERPC_DEF_SVR("e500mc_v32", "PowerPC e500mc v3.2 core", > > +CPU_POWERPC_e500mc_v32, POWERPC_SVR_E500, > e500mc) > > + > > #ifdef TARGET_PPC64 > > +/* e5500 family */ > > POWERPC_DEF_SVR("e5500", "e5500", > > -CPU_POWERPC_e5500,POWERPC_SVR_E500, > e5500) > > Same here Can you explain what you mean? What kind of alias? Thanks, Stuart
[Qemu-devel] USB passthru of hub
Hi Gerd, In the "USB 2.0 Quick Start" write-up you said about USB passthrough: > (2) hostbus+hostport -- match for a specific physical port in the > host, any device which is plugged in there gets passed to the > guest A customer is asking what happens if a USB hub is plugged into the specified port (instead of a device). Will the hub get passed through and the guest will see all devices plugged into that hub? Thanks, Stuart
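For reference, the host-port form of passthrough is requested with the usb-host device; an invocation would look something like the following (the bus and port numbers here are illustrative, taken from whatever `info usbhost` reports on the host):

```shell
# Pass through whatever is plugged into physical port 2 of host bus 1.
qemu-system-x86_64 -enable-kvm -m 1024 -hda guest.img \
    -usb -device usb-host,hostbus=1,hostport=2
```

As I understand the docs, devices sitting behind a hub are identified on the host side by dotted port paths (e.g. hostport=2.1 for port 1 of a hub on port 2), which is part of why the hub question above is interesting.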
Re: [Qemu-devel] [Qemu-ppc] qemu does not support PAPR
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Thursday, June 05, 2014 2:59 AM > To: sonia verma > Cc: Paolo Bonzini; abhishek jain; qemu-...@nongnu.org; qemu-devel; Yoder > Stuart-B08248 > Subject: Re: [Qemu-ppc] qemu does not support PAPR > > > > > On 05.06.2014 at 09:45, sonia verma wrote: > > > > > > So what can be the solution for this? > > Please don't top post. > > You're running on QorIQ which is a Freescale chip. The pseries target > only works on IBM PPC chips. > > The target you're looking for is ppce500: > > $ qemu-system-ppc64 -M ppce500 -nographic -kernel uImage ... > > I don't know about the status of libvirt in your sdk version, but I'm > sure Stuart does, so let's ask him ;). libvirt is supported in Freescale SDK 1.4 in the Yocto-built rootfs, so I would expect anything that uses libvirt interfaces to work there. But, we haven't specifically tested libguestfs. However, it looks like you may not be using the SDK Yocto rootfs, but a ubuntu rootfs. In that case there may be an issue with the ubuntu libvirt. There are some patches that we apply to libvirt that you will find only in the Freescale SDK...you can find them if you look at the libvirt Yocto recipe. Stuart
Re: [Qemu-devel] [Qemu-ppc] qemu does not support PAPR
> From: sonia verma [mailto:soniaverma9...@gmail.com] > Sent: Thursday, June 05, 2014 12:13 PM > To: Yoder Stuart-B08248 > Cc: abhishek jain; Alexander Graf; qemu-...@nongnu.org; qemu-devel; Paolo > Bonzini > Subject: RE: [Qemu-ppc] qemu does not support PAPR > > Hi Stuart. > Thanks for the information. I need to run libguestfs on powerpc ubuntu. How can > I do that? > Are there any patches for the same. I think you will have to build/install libvirt yourself on ubuntu with the Freescale patches. For SDK 1.4, you can see the Yocto recipe used here: http://git.freescale.com/git/cgit.cgi/ppc/sdk/meta-fsl-networking.git/tree/recipes-append/libvirt ...we use the stock Yocto libvirt recipe and apply 2 patches. The 2 patches are here: http://git.freescale.com/git/cgit.cgi/ppc/sdk/meta-fsl-networking.git/tree/recipes-append/libvirt/files I have not built libvirt from scratch myself, so can't help you there. Thanks, Stuart
Re: [Qemu-devel] [Qemu-ppc] qemu does not support PAPR
Based on your original description of the problem, I thought the error was that libvirt was starting a PAPR machine instead of an e500 machine. So my take is that libvirt needed to be rebuilt with the patches to support e500 machines (that I mentioned in my last email). If you are actually able to start e500 machines successfully with libvirt, then the problem must lie with how libguestfs interfaces to libvirt or something. I have never used libguestfs, and don’t think I can help you with that. Stuart From: sonia verma [mailto:soniaverma9...@gmail.com] Sent: Thursday, June 05, 2014 11:11 PM To: Yoder Stuart-B08248 Cc: abhishek jain; Alexander Graf; qemu-...@nongnu.org; qemu-devel; Paolo Bonzini Subject: Re: [Qemu-ppc] qemu does not support PAPR Hi Stuart Thanks for the information. I'm able to run libvirt on my system. I need to run libguestfs on the same powerpc ubuntu. Can you help me regarding that? On Thu, Jun 5, 2014 at 11:10 PM, Stuart Yoder wrote: > From: sonia verma > [mailto:soniaverma9...@gmail.com] > Sent: Thursday, June 05, 2014 12:13 PM > To: Yoder Stuart-B08248 > Cc: abhishek jain; Alexander Graf; > qemu-...@nongnu.org; qemu-devel; Paolo Bonzini > Subject: RE: [Qemu-ppc] qemu does not support PAPR > > Hi Stuart. > Thanks for the information. I need to run libguestfs on powerpc ubuntu. How can > I do that? > Are there any patches for the same. I think you will have to build/install libvirt yourself on ubuntu with the Freescale patches. For SDK 1.4, you can see the Yocto recipe used here: http://git.freescale.com/git/cgit.cgi/ppc/sdk/meta-fsl-networking.git/tree/recipes-append/libvirt ...we use the stock Yocto libvirt recipe and apply 2 patches. The 2 patches are here: http://git.freescale.com/git/cgit.cgi/ppc/sdk/meta-fsl-networking.git/tree/recipes-append/libvirt/files I have not built libvirt from scratch myself, so can't help you there. Thanks, Stuart
Re: [Qemu-devel] [RFC PATCH v4 1/3] use image_file_reset to reload initrd image
> /* read()-like version */ > ssize_t read_targphys(const char *name, >int fd, hwaddr dst_addr, size_t nbytes) > @@ -113,6 +146,12 @@ int load_image_targphys(const char *filename, > } > if (size > 0) { > rom_add_file_fixed(filename, addr, -1); > +ImageFile *image; > +image = g_malloc0(sizeof(*image)); > +image->name = g_strdup(filename); > +image->addr = addr; > + > +qemu_register_reset(image_file_reset, image); You need to remove the call to rom_add_file_fixed(), no? Stuart
Re: [Qemu-devel] [RFC PATCH v4 2/3] use uimage_reset to reload uimage
On Wed, Nov 14, 2012 at 2:28 PM, Olivia Yin wrote: > Signed-off-by: Olivia Yin > --- > hw/loader.c | 64 ++ > 1 files changed, 46 insertions(+), 18 deletions(-) > > diff --git a/hw/loader.c b/hw/loader.c > index a8a0a09..1a909d0 100644 > --- a/hw/loader.c > +++ b/hw/loader.c > @@ -55,6 +55,8 @@ > #include > > static int roms_loaded; > +static int kernel_loaded; > +static int *is_linux; Why do we need these 2 new variables? > /* return the size or -1 if error */ > int get_image_size(const char *filename) > @@ -472,15 +474,14 @@ static ssize_t gunzip(void *dst, size_t dstlen, uint8_t > *src, > return dstbytes; > } > > -/* Load a U-Boot image. */ > -int load_uimage(const char *filename, hwaddr *ep, > -hwaddr *loadaddr, int *is_linux) > +/* write uimage into memory */ > +static int uimage_physical_loader(const char *filename, uint8_t **data, > + hwaddr *loadaddr, int *is_linux) > { > int fd; > int size; > uboot_image_header_t h; > uboot_image_header_t *hdr = &h; > -uint8_t *data = NULL; > int ret = -1; > > fd = open(filename, O_RDONLY | O_BINARY); > @@ -521,10 +522,9 @@ int load_uimage(const char *filename, hwaddr *ep, > *is_linux = 0; > } > > -*ep = hdr->ih_ep; > -data = g_malloc(hdr->ih_size); > +*data = g_malloc(hdr->ih_size); > > -if (read(fd, data, hdr->ih_size) != hdr->ih_size) { > +if (read(fd, *data, hdr->ih_size) != hdr->ih_size) { > fprintf(stderr, "Error reading file\n"); > goto out; > } > @@ -534,11 +534,11 @@ int load_uimage(const char *filename, hwaddr *ep, > size_t max_bytes; > ssize_t bytes; > > -compressed_data = data; > +compressed_data = *data; > max_bytes = UBOOT_MAX_GUNZIP_BYTES; > -data = g_malloc(max_bytes); > +*data = g_malloc(max_bytes); > > -bytes = gunzip(data, max_bytes, compressed_data, hdr->ih_size); > +bytes = gunzip(*data, max_bytes, compressed_data, hdr->ih_size); > g_free(compressed_data); > if (bytes < 0) { > fprintf(stderr, "Unable to decompress gzipped image!\n"); > @@ -547,7 +547,9 @@ int load_uimage(const char *filename, hwaddr 
*ep, > hdr->ih_size = bytes; > } > > -rom_add_blob_fixed(filename, data, hdr->ih_size, hdr->ih_load); > +if (!kernel_loaded) { > +rom_add_blob_fixed(filename, *data, hdr->ih_size, hdr->ih_load); > +} Why do we need to keep the rom_add_blob_fixed() call? I thought the point of this was to remove adding roms. > if (loadaddr) > *loadaddr = hdr->ih_load; > @@ -555,12 +557,41 @@ int load_uimage(const char *filename, hwaddr *ep, > ret = hdr->ih_size; > > out: > -if (data) > -g_free(data); > close(fd); > return ret; > } > > +static void uimage_reset(void *opaque) > +{ > +ImageFile *image = opaque; > +uint8_t *data = NULL; > +int size; > + > +size = uimage_physical_loader(image->name, &data, &image->addr, > is_linux); Not sure why we need is_linux here. I think you should just set that parameter to NULL. Nothing will look at is_linux. Stuart
Re: [Qemu-devel] [RFC PATCH v5 4/4] free the memory malloced by load_at()
On Wed, Nov 21, 2012 at 8:38 AM, Olivia Yin wrote: > Signed-off-by: Olivia Yin > --- > hw/elf_ops.h |2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/hw/elf_ops.h b/hw/elf_ops.h > index b346861..9c76a75 100644 > --- a/hw/elf_ops.h > +++ b/hw/elf_ops.h > @@ -178,6 +178,8 @@ static int glue(load_symbols, SZ)(struct elfhdr *ehdr, > int fd, int must_swab, > s->disas_strtab = str; > s->next = syminfos; > syminfos = s; > +g_free(syms); > +g_free(str); > g_free(shdr_table); > return 0; > fail: Olivia, as Alex pointed out, there are references to syms and str in the struct "s", so I don't think you can just free those. The problem that leaves us with is that on every reset, when we call load_elf(), we re-load and re-malloc space for the symbols. I think the solution may be to factor out the call to load_symbols() from load_elf(). It looks like what load_symbols() does in the end is set the variable syminfos to point to the loaded symbol info. If you factor load_symbols() out, then in load_elf_32/64() you would do something like: elf_phy_loader_32/64() followed by load_symbols_32/64(). We don't need to be reloading symbols on every reset. Alex, does that make sense? Stuart
[Qemu-devel] [PATCH] PPC: e500: advertise 4.2 MPIC only if KVM supports EPR
From: Stuart Yoder Signed-off-by: Stuart Yoder --- hw/ppc/e500plat.c |5 + 1 file changed, 5 insertions(+) diff --git a/hw/ppc/e500plat.c b/hw/ppc/e500plat.c index 25ac4b1..2cd7cad 100644 --- a/hw/ppc/e500plat.c +++ b/hw/ppc/e500plat.c @@ -16,6 +16,7 @@ #include "sysemu/device_tree.h" #include "hw/pci/pci.h" #include "hw/openpic.h" +#include "sysemu/kvm.h" static void e500plat_fixup_devtree(PPCE500Params *params, void *fdt) { @@ -48,6 +49,10 @@ static void e500plat_init(QEMUMachineInitArgs *args) .mpic_version = OPENPIC_MODEL_FSL_MPIC_42, }; +if (kvm_enabled() && !kvm_check_extension(kvm_state, KVM_CAP_PPC_EPR)) { +params.mpic_version = OPENPIC_MODEL_FSL_MPIC_20; +} + ppce500_init(¶ms); } -- 1.7.9.7
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood wrote: > On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote: >> >> Alex, >> >> We are in the process of implementing vfio-pci support for the Freescale >> IOMMU (PAMU). It is an aperture/window-based IOMMU and is quite different >> than x86, and will involve creating a 'type 2' vfio implementation. >> >> For each device's DMA mappings, PAMU has an overall aperture and a number >> of windows. All sizes and window counts must be power of 2. To >> illustrate, >> below is a mapping for a 256MB guest, including guest memory (backed by >> 64MB huge pages) and some windows for MSIs: >> >> Total aperture: 512MB >> # of windows: 8 >> >> win gphys/ >> # iovaphys size >> --- >> 0 0x 0xX_XX00 64MB >> 1 0x0400 0xX_XX00 64MB >> 2 0x0800 0xX_XX00 64MB >> 3 0x0C00 0xX_XX00 64MB >> 4 0x1000 0xf_fe044000 4KB// msi bank 1 >> 5 0x1400 0xf_fe045000 4KB// msi bank 2 >> 6 0x1800 0xf_fe046000 4KB// msi bank 3 >> 7- - disabled >> >> There are a couple of updates needed to the vfio user->kernel interface >> that we would like your feedback on. >> >> 1. IOMMU geometry >> >>The kernel IOMMU driver now has an interface (see domain_set_attr, >>domain_get_attr) that lets us set the domain geometry using >>"attributes". >> >>We want to expose that to user space, so envision needing a couple >>of new ioctls to do this: >> VFIO_IOMMU_SET_ATTR >> VFIO_IOMMU_GET_ATTR > > > Note that this means attributes need to be updated for user-API > appropriateness, such as using fixed-size types. > > >> 2. MSI window mappings >> >>The more problematic question is how to deal with MSIs. We need to >>create mappings for up to 3 MSI banks that a device may need to target >>to generate interrupts. The Linux MSI driver can allocate MSIs from >>the 3 banks any way it wants, and currently user space has no way of >>knowing which bank may be used for a given device. >> >>There are 3 options we have discussed and would like your direction: >> >>A. 
Implicit mappings -- with this approach user space would not >>explicitly map MSIs. User space would be required to set the >>geometry so that there are 3 unused windows (the last 3 windows) > > > Where does userspace get the number "3" from? E.g. on newer chips there are > 4 MSI banks. Maybe future chips have even more. Ok, then make the number 4. The chance of more MSI banks in future chips is nil, and if it ever happened user space could adjust. Also, practically speaking, since memory is typically allocated in powers of 2, you need to approximately double the window geometry anyway. >>B. Explicit mapping using DMA map flags. The idea is that a new >>flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that >>a mapping is to be created for the supplied iova. No vaddr >>is given though. So in the above example there would be a >>a dma map at 0x1000 for 24KB (and no vaddr). > > > A single 24 KiB mapping wouldn't work (and why 24KB? What if only one MSI > group is involved in this VFIO group? What if four MSI groups are > involved?). You'd need to either have a naturally aligned, power-of-two > sized mapping that covers exactly the pages you want to map and no more, or > you'd need to create a separate mapping for each MSI bank, and due to PAMU > subwindow alignment restrictions these mappings could not be contiguous in > iova-space. You're right, a single 24KB mapping wouldn't work-- in the case of 3 MSI banks perhaps we could just do one 64MB*3 mapping to identify which windows are used for MSIs. If only one MSI bank was involved the kernel could get clever and only enable the banks actually needed. Stuart
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
On Tue, Apr 2, 2013 at 3:32 PM, Alex Williamson wrote: >> 2. MSI window mappings >> >>The more problematic question is how to deal with MSIs. We need to >>create mappings for up to 3 MSI banks that a device may need to target >>to generate interrupts. The Linux MSI driver can allocate MSIs from >>the 3 banks any way it wants, and currently user space has no way of >>knowing which bank may be used for a given device. >> >>There are 3 options we have discussed and would like your direction: >> >>A. Implicit mappings -- with this approach user space would not >>explicitly map MSIs. User space would be required to set the >>geometry so that there are 3 unused windows (the last 3 windows) >>for MSIs, and it would be up to the kernel to create the mappings. >>This approach requires some specific semantics (leaving 3 windows) >>and it potentially gets a little weird-- when should the kernel >>actually create the MSI mappings? When should they be unmapped? >>Some convention would need to be established. > > VFIO would have control of SET/GET_ATTR, right? So we could reduce the > number exposed to userspace on GET and transparently add MSI entries on > SET. The number of windows is always power of 2 (and max is 256). And to reduce PAMU cache pressure you want to use the fewest number of windows you can. So, I don't see practically how we could transparently steal entries to add the MSIs. Either user space knows to leave empty windows for MSIs and by convention the kernel knows which windows those are (as in option #A), or it explicitly tells the kernel which windows (as in option #B). > On x86 the interrupt remapper handles this transparently when MSI > is enabled and userspace never gets direct access to the device MSI > address/data registers. What kind of restrictions do you have around > adding and removing windows while the aperture is enabled? The windows can be enabled/disabled even while the aperture is enabled (pretty sure)...
The idea is that a new >>flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that >>a mapping is to be created for the supplied iova. No vaddr >>is given though. So in the above example there would be a >>a dma map at 0x1000 for 24KB (and no vaddr). It's >>up to the kernel to determine which bank gets mapped where. >>So, this option puts user space in control of which windows >>are used for MSIs and when MSIs are mapped/unmapped. There >>would need to be some semantics as to how this is used-- it >>only makes sense > > This could also be done as another "type2" ioctl extension. What's the > value to userspace in determining which windows are used by which banks? > It sounds like the case that there are X banks and if userspace wants to > use MSI it needs to leave X windows available for that. Is this just > buying userspace a few more windows to allow them the choice between MSI > or RAM? Yes, it would potentially give user space the flexibility of some more windows. It also makes it more explicit when the MSI mappings are created. In option #A the MSI mappings would probably get created at the time of the first normal DMA map. So, you're saying with this approach you'd rather see a new type 2 ioctl instead of adding new flags to DMA map, right?
It would be less overhead to just > add an ioctl to enable MSI, maybe letting userspace pick which windows > get used, but I'm still not sure what the value is to userspace in > exposing it. Thanks, Thanks, Stuart
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
On Tue, Apr 2, 2013 at 3:47 PM, Scott Wood wrote: > On 04/02/2013 03:38:42 PM, Stuart Yoder wrote: >> >> On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood >> wrote: >> > On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote: >> >> >> >> Alex, >> >> >> >> We are in the process of implementing vfio-pci support for the >> >> Freescale >> >> IOMMU (PAMU). It is an aperture/window-based IOMMU and is quite >> >> different >> >> than x86, and will involve creating a 'type 2' vfio implementation. >> >> >> >> For each device's DMA mappings, PAMU has an overall aperture and a >> >> number >> >> of windows. All sizes and window counts must be power of 2. To >> >> illustrate, >> >> below is a mapping for a 256MB guest, including guest memory (backed by >> >> 64MB huge pages) and some windows for MSIs: >> >> >> >> Total aperture: 512MB >> >> # of windows: 8 >> >> >> >> win gphys/ >> >> # iovaphys size >> >> --- >> >> 0 0x 0xX_XX00 64MB >> >> 1 0x0400 0xX_XX00 64MB >> >> 2 0x0800 0xX_XX00 64MB >> >> 3 0x0C00 0xX_XX00 64MB >> >> 4 0x1000 0xf_fe044000 4KB// msi bank 1 >> >> 5 0x1400 0xf_fe045000 4KB// msi bank 2 >> >> 6 0x1800 0xf_fe046000 4KB// msi bank 3 >> >> 7- - disabled >> >> >> >> There are a couple of updates needed to the vfio user->kernel interface >> >> that we would like your feedback on. >> >> >> >> 1. IOMMU geometry >> >> >> >>The kernel IOMMU driver now has an interface (see domain_set_attr, >> >>domain_get_attr) that lets us set the domain geometry using >> >>"attributes". >> >> >> >>We want to expose that to user space, so envision needing a couple >> >>of new ioctls to do this: >> >> VFIO_IOMMU_SET_ATTR >> >> VFIO_IOMMU_GET_ATTR >> > >> > >> > Note that this means attributes need to be updated for user-API >> > appropriateness, such as using fixed-size types. >> > >> > >> >> 2. MSI window mappings >> >> >> >>The more problematic question is how to deal with MSIs. 
We need to >> >>create mappings for up to 3 MSI banks that a device may need to >> >> target >> >>to generate interrupts. The Linux MSI driver can allocate MSIs from >> >>the 3 banks any way it wants, and currently user space has no way of >> >>knowing which bank may be used for a given device. >> >> >> >>There are 3 options we have discussed and would like your direction: >> >> >> >>A. Implicit mappings -- with this approach user space would not >> >>explicitly map MSIs. User space would be required to set the >> >>geometry so that there are 3 unused windows (the last 3 windows) >> > >> > >> > Where does userspace get the number "3" from? E.g. on newer chips there >> > are >> > 4 MSI banks. Maybe future chips have even more. >> >> Ok, then make the number 4. The chance of more MSI banks in future chips >> is nil, > > > What makes you so sure? Especially since you seem to be presenting this as > not specifically an MPIC API. > > >> and if it ever happened user space could adjust. > > > What bit of API is going to tell it that it needs to adjust? Haven't thought through that completely, but I guess we could add an API to return the number of MSI banks for type 2 iommus. >> Also, practically speaking since memory is typically allocate in powers of >> 2 way you need to approximately double the window geometry anyway. > > > Only if your existing mapping needs fit exactly in a power of two. > > >> >>B. Explicit mapping using DMA map flags. The idea is that a new >> >>flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that >> >>a mapping is to be created for the supplied iova. No vaddr >> >>is given though. So in the above example there would be a >> >>a dma map at 0x1000 for 24KB (and no vaddr). >> > >> > >> > A single 24 KiB mapping wouldn't work (and why 24KB? What if only one >> > MSI >> > group is involved in this VFIO group? What if four MSI groups are >> > involved?). 
You'd need to either have a naturally aligned, power-of-two >> > sized mapping that covers exactly the pages you want to map and no more, >> > or >> > you'd need to create a separate mapping for each MSI bank, and due to >> > PAMU >> > subwindow alignment restrictions these mappings could not be contiguous >> > in >> > iova-space. >> >> You're right, a single 24KB mapping wouldn't work-- in the case of 3 MSI >> banks >> perhaps we could just do one 64MB*3 mapping to identify which windows >> are used for MSIs. > > > Where did the assumption of a 64MiB subwindow size come from? The example I was using. User space would need to create a mapping for window_size * msi_bank_count. Stuart
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood wrote: >> This could also be done as another "type2" ioctl extension. > > > Again, what is "type2", specifically? If someone else is adding their own > IOMMU that is kind of, sort of like PAMU, how would they know if it's close > enough? What assumptions can a user make when they see that they're dealing > with "type2"? We will define that as part of the type2 implementation. Highly unlikely anything but a PAMU will comply. >> What's the value to userspace in determining which windows are used by >> which banks? > > > That depends on who programs the MSI config space address. What is > important is userspace controlling which iovas will be dedicated to this, in > case it wants to put something else there. > > >> It sounds like the case that there are X banks and if userspace wants to >> use MSI it needs to leave X windows available for that. Is this just >> buying userspace a few more windows to allow them the choice between MSI >> or RAM? > > > Well, there could be that. But also, userspace will generally have a much > better idea of the type of mappings it's creating, so it's easier to keep > everything explicit at the kernel/user interface than require more > complicated code in the kernel to figure things out automatically (not just > for MSIs but in general). > > If the kernel automatically creates the MSI mappings, when does it assume > that userspace is done creating its own? What if userspace doesn't need any > DMA other than the MSIs? What if userspace wants to continue dynamically > modifying its other mappings? > > >> >C. Explicit mapping using normal DMA map. The last idea is that >> >we would introduce a new ioctl to give user-space an fd to >> >the MSI bank, which could be mmapped. 
>> > The flow would be something like this:
>> >    - for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
>> >    - user space mmaps the fd, getting a vaddr
>> >    - user space does a normal DMA map for desired iova
>> > This approach makes everything explicit, but adds a new ioctl
>> > applicable most likely only to the PAMU (type2 iommu).
>>
>> And the DMA_MAP of that mmap then allows userspace to select the window
>> used? This one seems like a lot of overhead, adding a new ioctl, new
>> fd, mmap, special mapping path, etc.
>
> There's going to be special stuff no matter what. This would keep it
> separated from the IOMMU map code.
>
> I'm not sure what you mean by "overhead" here... the runtime overhead of
> setting things up is not particularly relevant as long as it's reasonable.
> If you mean development and maintenance effort, keeping things well
> separated should help.

We don't need to change DMA_MAP. If we can simply add a new "type 2"
ioctl that allows user space to set which windows are MSIs, it seems
vastly less complex than an ioctl to supply a new fd, mmap of it, etc.

So maybe 2 ioctls:
   VFIO_IOMMU_GET_MSI_COUNT
   VFIO_IOMMU_MAP_MSI(iova, size)

Stuart
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
>> > Type1 is arbitrary. It might as well be named "brown" and this one >> > can be >> > "blue". >> >> The difference is that "type1" seems to refer to hardware that can do >> arbitrary 4K page mappings, possibly constrained by an aperture but >> nothing else. More than one IOMMU can reasonably fit that. The odds >> that another IOMMU would have exactly the same restrictions as PAMU >> seem smaller in comparison. >> >> In any case, if you had to deal with some Intel-only quirk, would it >> make sense to call it a "type1 attribute"? I'm not advocating one way >> or the other on whether an abstraction is viable here (though Stuart >> seems to think it's "highly unlikely anything but a PAMU will comply"), >> just that if it is to be abstracted rather than a hardware-specific >> interface, we need to document what is and is not part of the >> abstraction. Otherwise a non-PAMU-specific user won't know what they >> can rely on, and someone adding support for a new windowed IOMMU won't >> know if theirs is close enough, or they need to introduce a "type3". > > So Alexey named the SPAPR IOMMU something related to spapr... > surprisingly enough. I'm fine with that. If you think it's unique > enough, name it something appropriately. I haven't seen the code and > don't know the architecture sufficiently to have an opinion. The only reason I suggested "type 2" is that I thought that was the convention...we would enumerate different iommus. I think that calling it "pamu" is better and more clear. Stuart
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
On Tue, Apr 2, 2013 at 5:50 PM, Scott Wood wrote: > On 04/02/2013 04:38:45 PM, Alex Williamson wrote: >> >> On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote: >> > On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood >> > wrote: >> > >> >C. Explicit mapping using normal DMA map. The last idea is >> > >> > that >> > >> >we would introduce a new ioctl to give user-space an fd to >> > >> >the MSI bank, which could be mmapped. The flow would be >> > >> >something like this: >> > >> > -for each group user space calls new ioctl >> > >> > VFIO_GROUP_GET_MSI_FD >> > >> > -user space mmaps the fd, getting a vaddr >> > >> > -user space does a normal DMA map for desired iova >> > >> >This approach makes everything explicit, but adds a new >> > >> > ioctl >> > >> >applicable most likely only to the PAMU (type2 iommu). >> > >> >> > >> And the DMA_MAP of that mmap then allows userspace to select the >> > >> window >> > >> used? This one seems like a lot of overhead, adding a new ioctl, new >> > >> fd, mmap, special mapping path, etc. >> > > >> > > >> > > There's going to be special stuff no matter what. This would keep it >> > > separated from the IOMMU map code. >> > > >> > > I'm not sure what you mean by "overhead" here... the runtime overhead >> > > of >> > > setting things up is not particularly relevant as long as it's >> > > reasonable. >> > > If you mean development and maintenance effort, keeping things well >> > > separated should help. >> > >> > We don't need to change DMA_MAP. If we can simply add a new "type 2" >> > ioctl that allows user space to set which windows are MSIs, it seems >> > vastly >> > less complex than an ioctl to supply a new fd, mmap of it, etc. >> > >> > So maybe 2 ioctls: >> > VFIO_IOMMU_GET_MSI_COUNT > > > Do you mean a count of actual MSIs or a count of MSI banks used by the whole > VFIO group? I meant # of MSI banks, so VFIO_IOMMU_GET_MSI_BANK_COUNT would be better. 
>> > VFIO_IOMMU_MAP_MSI(iova, size) > > > Not sure how you mean "size" to be used -- for MPIC it would be 4K per bank, > and you can only map one bank at a time (which bank you're mapping should be > a parameter, if only so that the kernel doesn't have to keep iteration state > for you). The intent was for user space to tell the kernel which windows to use for MSI. So I envisioned a total size of window-size * msi-bank-count. Stuart
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
> Would it be possible for userspace to simply leave room for MSI bank
> mapping (how much room could be determined by something like
> VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
> DMA_MAP starting at the 0x0 address of the aperture, growing up, and
> VFIO will map banks on demand at the top of the aperture, growing down?
> Wouldn't that avoid a lot of issues with userspace needing to know
> anything about MSI banks (other than count) and coordinating irq numbers
> and enabling handlers?

This is basically option #A in the original proposals sent. I like this
approach in that it is simpler and keeps user space mostly out of this,
which is consistent with how things are done on x86. User space just
needs to know how many windows to leave at the top of the aperture; the
kernel then has the flexibility to use those windows however it wants.

But one question is: when should the kernel actually map (and unmap) the
MSI banks? One thing we need to do is enable the aperture, and current
thinking is that is done on the first DMA_MAP. Similarly, when the last
mapping is unmapped we could also unmap the MSI banks. The sequence
would be something like:

   VFIO_GROUP_SET_CONTAINER             // add groups to the container
   VFIO_SET_IOMMU(VFIO_FSL_PAMU)        // set iommu model
   cnt = VFIO_IOMMU_GET_MSI_BANK_COUNT  // returns max # of MSI banks
   VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)   // set overall aperture
   VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)    // set # of windows, including MSI banks
   VFIO_IOMMU_MAP_DMA                   // map the guest's memory
     ---> kernel enables aperture and maps needed MSI banks here
   VFIO_DEVICE_SET_IRQS
     ---> kernel sets actual MSI addr/data in physical device here (I think)

Stuart
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
On Wed, Apr 3, 2013 at 2:18 PM, Scott Wood wrote:
> On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:
>> > Would it be possible for userspace to simply leave room for MSI bank
>> > mapping (how much room could be determined by something like
>> > VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
>> > DMA_MAP starting at the 0x0 address of the aperture, growing up, and
>> > VFIO will map banks on demand at the top of the aperture, growing down?
>> > Wouldn't that avoid a lot of issues with userspace needing to know
>> > anything about MSI banks (other than count) and coordinating irq numbers
>> > and enabling handlers?
>>
>> This is basically option #A in the original proposals sent. I like
>> this approach, in that it is simpler and keeps user space mostly out of
>> this... which is consistent with how things are done on x86. User space
>> just needs to know how many windows to leave at the top of the aperture.
>> The kernel then has the flexibility to use those windows how it wants.
>>
>> But one question is when should the kernel actually map (and unmap)
>> the MSI banks.
>
> I think userspace should explicitly request it. Userspace still wouldn't
> need to know anything but the count:
>
>   count = VFIO_IOMMU_GET_MSI_BANK_COUNT
>   VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
>   VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
>   // do other DMA maps now, or later, or not at all, doesn't matter
>   for (i = 0; i < count; i++)
>       VFIO_IOMMU_MAP_MSI_BANK(iova, i);
>   // The kernel now knows where each bank has been mapped, and can
>   // update PCI config space appropriately.

And the overall aperture enable/disable would occur on the first
dma/msi map() and last dma/msi unmap()?

Stuart
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
On Mon, Sep 26, 2011 at 2:51 AM, David Gibson wrote:
> On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
>> Based on the discussions over the last couple of weeks
>> I have updated the device fd file layout proposal and
>> tried to specify it a bit more formally.
>>
>> ===
>>
>> 1. Overview
>>
>> This specification describes the layout of device files
>> used in the context of vfio, which gives user space
>> direct access to I/O devices that have been bound to
>> vfio.
>>
>> When a device fd is opened and read, offset 0x0 contains
>> a fixed sized header followed by a number of variable length
>> records that describe different characteristics
>> of the device-- addressable regions, interrupts, etc.
>>
>>   0x0  +-------------------+
>>        | magic             |  u32  // identifies this as a vfio device file
>>        +-------------------+          and identifies the type of bus
>>        | version           |  u32  // specifies the version of this
>>        +-------------------+
>>        | flags             |  u32  // encodes any flags
>>        +-------------------+
>>        | dev info record 0 |
>>        |   type            |  u32  // type of record
>>        |   rec_len         |  u32  // length in bytes of record
>>        |                   |          (including record header)
>>        |   flags           |  u32  // type specific flags
>>        |   ...content...   |       // record content, which could
>>        +-------------------+       // include sub-records
>>        | dev info record 1 |
>>        +-------------------+
>>        | dev info record N |
>>        +-------------------+
>
> I really should have chimed in on this earlier, but I've been very
> busy.
>
> Um, not to put too fine a point on it, this is madness.
>
> Yes, it's very flexible and can thereby cover a very wide range of
> cases. But it's much, much too complex. Userspace has to parse a
> complex, multilayered data structure, with variable length elements
> just to get an address at which to do IO. I can pretty much guarantee
> that if we went with this, most userspace programs using this
> interface would just ignore this metadata and directly map the
> offsets at which they happen to know the kernel will put things for
> the type of device they care about.
> _At least_ for PCI, I think the original VFIO layout of each BAR at a
> fixed, well known offset is much better. Despite its limitations,
> just advertising a "device type" ID which describes one of a few fixed
> layouts would be preferable to this. I'm still hoping that we can do
> a bit better than that. But we should try really hard to at the very
> least force the metadata into a simple array of resources, each with a
> fixed size record describing it, even if it means some space wastage
> with occasionally-used fields. Anything more complex than that and
> userspace is just never going to use it properly.

So, is your issue really the variable length nature of what was
proposed? I don't think it would be that hard to make the different
resources fixed length. I think we have 2 types of resources now --
address regions and interrupts. The only thing that gets a bit tricky
is device tree paths, which are obviously variable length. We could put
a description of all the resources in an array with each element being
something like 4KB??

Stuart
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
> The other obvious possibility is a pure ioctl interface. To match what
> this proposal is trying to describe, plus the runtime interfaces, we'd
> need something like:
>
> /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */
> #define VFIO_DEVICE_GET_FLAGS              _IOR(, , u64)
>
> /* Return number of mmio/iop/config regions.
>  * For PCI this is always 8 (BAR0-5 + ROM + Config) */
> #define VFIO_DEVICE_GET_NUM_REGIONS        _IOR(, , int)
>
> /* Return length for region index (may be zero) */
> #define VFIO_DEVICE_GET_REGION_LEN         _IOWR(, , u64)
>
> /* Return flags for region index
>  * :0 - mmap'able, :1 - read-only, 63:2 - reserved */
> #define VFIO_DEVICE_GET_REGION_FLAGS       _IOR(, , u64)
>
> /* Return file offset for region index */
> #define VFIO_DEVICE_GET_REGION_OFFSET      _IOWR(, , u64)
>
> /* Return physical address for region index - not implemented for PCI */
> #define VFIO_DEVICE_GET_REGION_PHYS_ADDR   _IOWR(, , u64)
>
> /* Return number of IRQs (not including MSI/MSI-X for PCI) */
> #define VFIO_DEVICE_GET_NUM_IRQ            _IOR(, , int)
>
> /* Set IRQ eventfd for IRQ index, arg[0] = index, arg[1] = fd */
> #define VFIO_DEVICE_SET_IRQ_EVENTFD        _IOW(, , int)
>
> /* Unmask IRQ index */
> #define VFIO_DEVICE_UNMASK_IRQ             _IOW(, , int)
>
> /* Set unmask eventfd for index, arg[0] = index, arg[1] = fd */
> #define VFIO_DEVICE_SET_UNMASK_IRQ_EVENTFD _IOW(, , int)
>
> /* Return the device tree path for type/index into the user
>  * allocated buffer */
> struct dtpath {
>     u32   type;    /* 0 = region, 1 = IRQ */
>     u32   index;
>     u32   buf_len;
>     char *buf;
> };
> #define VFIO_DEVICE_GET_DTPATH             _IOWR(, , struct dtpath)
>
> /* Return the device tree index for type/index */
> struct dtindex {
>     u32 type;      /* 0 = region, 1 = IRQ */
>     u32 index;
>     u32 prop_type;
>     u32 prop_index;
> };
> #define VFIO_DEVICE_GET_DTINDEX            _IOWR(, , struct dtindex)
>
> /* Reset the device */
> #define VFIO_DEVICE_RESET                  _IO(, ,)
>
> /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
> #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS   _IOW(, , int)
> #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS  _IOW(, , int)
>
> Hope that covers it. Something I prefer about this interface is that
> everything can easily be generated on the fly, whereas reading out a
> table from the device means we really need to have that table somewhere
> in kernel memory to easily support reading random offsets. Thoughts?

I think this could work, but I'm not sure it makes the problem David
raised any better -- you substitute the complexity of parsing the
variable-length regions with invoking a set of APIs.

Stuart
Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
> BTW, github now has updated trees:
>
> git://github.com/awilliam/linux-vfio.git vfio-next-2029
> git://github.com/awilliam/qemu-vfio.git vfio-ng

Hi Alex,

Have been looking at vfio a bit. A few observations and things we'll
need to figure out as they relate to the Freescale iommu.

__vfio_dma_map() assumes that mappings are broken into 4KB pages. That
will not be true for us. We normally will be mapping much larger
physically contiguous chunks for our guests. Guests will get hugetlbfs
backed memory with very large pages (e.g. 16MB, 64MB) or very large
chunks allocated by some proprietary means.

Also, mappings will have additional Freescale-specific attributes that
need to get passed through to dma_map somehow. For example, the iommu
can stash directly into a CPU's cache, and we have iommu mapping
properties like the cache stash target id and an operation mapping
attribute.

How do you envision handling proprietary attributes in
struct vfio_dma_map?

Stuart
Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
On Tue, Nov 29, 2011 at 5:44 PM, Alex Williamson wrote: > On Tue, 2011-11-29 at 17:20 -0600, Stuart Yoder wrote: >> > >> > BTW, github now has updated trees: >> > >> > git://github.com/awilliam/linux-vfio.git vfio-next-2029 >> > git://github.com/awilliam/qemu-vfio.git vfio-ng >> >> Hi Alex, >> >> Have been looking at vfio a bit. A few observations and things >> we'll need to figure out as it relates to the Freescale iommu. >> >> __vfio_dma_map() assumes that mappings are broken into >> 4KB pages. That will not be true for us. We normally will be mapping >> much larger physically contiguous chunks for our guests. Guests will >> get hugetlbfs backed memory with very large pages (e.g. 16MB, >> 64MB) or very large chunks allocated by some proprietary >> means. > > Hi Stuart, > > I think practically everyone has commented on the 4k mappings ;) There > are a few problems around this. The first is that iommu drivers don't > necessarily support sub-region unmapping, so if we map 1GB and later > want to unmap 4k, we can't do it atomically. 4k gives us the most > flexibility for supporting fine granularities. Another problem is that > we're using get_user_pages to pin memory. It's been suggested that we > should use mlock for this, but I can't find anything that prevents a > user from later munlock'ing the memory and then getting access to memory > they shouldn't have. Those kind of limit us, but I don't see it being > an API problem for VFIO, just implementation. Ok. >> Also, mappings will have additional Freescale-specific attributes >> that need to get passed through to dma_map somehow. For >> example, the iommu can stash directly into a CPU's cache >> and we have iommu mapping properties like the cache stash >> target id and an operation mapping attribute. >> >> How do you envision handling proprietary attributes >> in struct vfio_dma_map? > > Let me turn the question around, how do you plan to support proprietary > attributes in the IOMMU API? 
> Is the user level the appropriate place to specify them, or are they an
> intrinsic feature of the domain? We've designed struct vfio_dma_map for
> extension, so depending on how many bits you need, we can make a conduit
> using the flags directly or setting a new flag to indicate presence of
> an arch specific attributes field.

The attributes are not intrinsic features of the domain. User space will
need to set them. But in thinking about it a bit more, I think the
attributes are more properties of the domain rather than a per map()
operation characteristic, so a separate API might be appropriate: define
a new set_domain_attrs() op in the iommu_ops. In user space, perhaps a
new vfio group API -- VFIO_GROUP_SET_ATTRS, VFIO_GROUP_GET_ATTRS.

Stuart
Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
>> The attributes are not intrinsic features of the domain. User space will
>> need to set them. But in thinking about it a bit more I think the
>> attributes are more properties of the domain rather than a per map()
>> operation characteristic. I think a separate API might be appropriate.
>> Define a new set_domain_attrs() op in the iommu_ops. In user space,
>> perhaps a new vfio group API -- VFIO_GROUP_SET_ATTRS,
>> VFIO_GROUP_GET_ATTRS.
>
> In that case, you should definitely be following what Alexey is thinking
> about with an iommu_setup IOMMU API callback. I think it's shaping up
> to do:
>
>   x86:
>     - Report any IOVA range restrictions imposed by hw implementation
>   POWER:
>     - Request IOVA window size, report size and base
>   powerpc:
>     - Set domain attributes, probably report range as well.

One other mechanism we need as well is the ability to enable/disable a
domain.

For example -- suppose a device is assigned to a VM, and the device is in
use when the VM is abruptly terminated. The VM terminate would shut off
DMA at the IOMMU, but now the device is in an indeterminate state. Some
devices have no simple reset bit, and getting the device back into a sane
state could be complicated -- something the hypervisor doesn't want to do.

So now KVM restarts the VM, vfio init happens for the device, the IOMMU
for that device is re-configured, etc., but we really can't re-enable DMA
until the guest OS tells us (via an hcall) that it is ready. The guest
needs to get the assigned device into a sane state before DMA is enabled.

Does this warrant a new domain enable/disable API, or should we make this
part of the setup API we are discussing here?

Stuart
Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
On Thu, Dec 1, 2011 at 3:25 PM, Alex Williamson wrote: > On Thu, 2011-12-01 at 14:58 -0600, Stuart Yoder wrote: >> One other mechanism we need as well is the ability to >> enable/disable a domain. >> >> For example-- suppose a device is assigned to a VM, the >> device is in use when the VM is abruptly terminated. The >> VM terminate would shut off DMA at the IOMMU, but now >> the device is in an indeterminate state. Some devices >> have no simple reset bit and getting the device back into >> a sane state could be complicated-- something the hypervisor >> doesn't want to do. >> >> So now KVM restarts the VM, vfio init happens for the device >> and the IOMMU for that device is re-configured, >> etc, but we really can't re-enable DMA until the guest OS tells us >> (via an hcall) that it is ready. The guest needs to get the >> assigned device in a sane state before DMA is enabled. > > Giant red flag. We need to paravirtualize the guest? Not on x86. It's the reality we have to deal with, but doing this would obviously only apply to platforms that need it. > Some > devices are better for assignment than others. PCI devices are moving > towards supporting standard reset mechanisms. > >> Does this warrant a new domain enable/disable API, or should >> we make this part of the setup API we are discussing >> here? > > What's wrong with simply not adding any DMA mapping entries until you > think your guest is ready? Isn't that effectively the same thing? > Unmap ~= disable. If the IOMMU API had a mechanism to toggle the iommu > domain on and off, I wouldn't be opposed to adding an ioctl to do it, > but it really seems like just a shortcut vs map/unmap. Thanks, Yes, we could do something like that I guess. Stuart
[Qemu-devel] vfio / iommu domain attributes
In the vfio RFC thread there seemed to be convergence that some new
iommu_ops API is needed to set some platform specific aspects of an
iommu domain.

On Wed, Nov 30, 2011 at 10:58 AM, Alex Williamson wrote:
[cut]
> In that case, you should definitely be following what Alexey is thinking
> about with an iommu_setup IOMMU API callback. I think it's shaping up
> to do:
>
>   x86:
>     - Report any IOVA range restrictions imposed by hw implementation
>   POWER:
>     - Request IOVA window size, report size and base
>   powerpc:
>     - Set domain attributes, probably report range as well.

Alex, Alexey, I'm wondering if you've had any new thoughts on this over
the last week.

For Freescale, our iommu domain attributes would look something like:
   - domain iova base address
   - domain iova window size
   - domain enable/disable
   - number of subwindows
   - operation mapping table index
   - stash destination CPU
   - stash target (cache: L1, L2, L3)

These are all things that need to be set by the creator of the domain.

Since the domain attributes are going to be so different for each
platform, does it make sense to define a new iommu_ops callback that
just takes a void pointer that can be implemented in a platform specific
way? For example:

   struct iommu_ops {
        [cut]
        int (*domain_set_attrs)(struct iommu_domain *domain,
                                void *attrs);
        int (*domain_get_attrs)(struct iommu_domain *domain,
                                void *attrs);
   };

Whatever this API winds up looking like, it needs to be reflected in the
vfio interface to user space as well.

Thanks,
Stuart
Re: [Qemu-devel] vfio / iommu domain attributes
On Wed, Dec 7, 2011 at 10:38 AM, Joerg Roedel wrote:
> On Wed, Dec 07, 2011 at 09:54:39AM -0600, Stuart Yoder wrote:
>> Alex, Alexey, I'm wondering if you've had any new thoughts on this over
>> the last week.
>>
>> For Freescale, our iommu domain attributes would look something like:
>>   - domain iova base address
>>   - domain iova window size
>
> I agree with that.
>
>>   - domain enable/disable
>>   - number of subwindows
>>   - operation mapping table index
>>   - stash destination CPU
>>   - stash target (cache: L1, L2, L3)
>
> Why does the user of the IOMMU-API need to have control over these
> things?

Our IOMMU complicates things in that it is used for more than just
memory protection and address translation. It has a concept of
operation translation as well. Some devices could do a 'write'
transaction that, when passing through the iommu, gets translated to a
'write-with-stash'. Stashed transactions get pushed directly into some
cache. It's the entity creating and setting up the domain that will
have the knowledge of what cache is to be stashed to. Right now,
software that uses stashing is pinned to a CPU, but in the future it's
possible that we'll want to work without pinning and may need to update
stashing attributes on the fly.

The overall iova window for the domain can be divided into a
configurable number of subwindows (a power of 2, up to 256), which
means we can have a contiguous iova region backed by up to 256
physically dis-contiguous subwindows. The creator of the iommu domain
is in the best position to know how many subwindows are needed (the
fewer the better for performance reasons).

So, in short, the above list of attributes are the attributes of our
iommu hardware, and the knowledge of how they should be set is with the
domain creator.
>> These are all things that need to be set by the creator of the domain.
>>
>> Since the domain attributes are going to be so different for each
>> platform, does it make sense to define a new iommu_ops callback that
>> just takes a void pointer that can be implemented in a platform
>> specific way? For example:
>>
>>   struct iommu_ops {
>>        [cut]
>>        int (*domain_set_attrs)(struct iommu_domain *domain,
>>                                void *attrs);
>>        int (*domain_get_attrs)(struct iommu_domain *domain,
>>                                void *attrs);
>>   };
>
> A void pointer is certainly the worst choice for an interface. I think
> it is better to have at least a set of common attributes. Something
> like this:
>
>   iommu_domain_set_attr(struct iommu_domain *domain, enum attr_type, void *data)
>   iommu_domain_get_attr(struct iommu_domain *domain, enum attr_type, void *data)

That would be fine.

Stuart