Re: [Qemu-devel] Running KVM guest on X86

2012-08-09 Thread Stuart Yoder
On Tue, Aug 7, 2012 at 1:30 AM, Bhushan Bharat-R65777
 wrote:
>
>
>> -Original Message-
>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>> Sent: Monday, August 06, 2012 9:27 PM
>> To: Bhushan Bharat-R65777
>> Cc: qemu-devel@nongnu.org; Avi Kivity
>> Subject: Re: Running KVM guest on X86
>>
>> On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote:
>> > Hi Avi/All,
>> >
>> > I am facing an issue booting a KVM guest on x86 (I used to work on the
>> > PowerPC platform and do not have enough knowledge of x86). I am working
>> > on making VFIO work on PowerPC BookE, so I have cloned Alex Williamson's
>> > git repository and compiled the kernel for x86 on Fedora with the
>> > virtualization configuration (selected all kernel config options for
>> > it). I run the command below to boot the guest (I have not provided a
>> > vfio device yet):
>> >
>> > "qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
>> > arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img
>> > -serial tcp::,server,telnet"
>> >
>> > After that I can see the QEMU monitor prompt (I am able to run various
>> > commands like "info registers" etc.), but the guest does not boot (not
>> > even the first print appears).
>> >
>> > Can anyone help in what I am missing or doing wrong?
>>
>> x86 doesn't use the serial port for console by default, so you're making
>> things quite a bit more difficult that way.  Typically you'll want to
>> provide a disk image (the -hda option is the easiest way to do this), a
>> display (-vga std -vnc :0 is again easiest), and probably something to
>> install from (-cdrom ).  You can also add a -boot d to get it to choose
>> the cdrom the first time for install.  Thanks,
>
> Thanks Avi and Alex, I can see the KVM guest boot prints by adding
> -append "console=ttyS0"

Note, once you get to user space you will need a getty specified in
inittab in order to get a login on your serial port.   Something like:

   T0:23:respawn:/sbin/getty -L ttyS0
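
Depending on the getty implementation, the inittab entry usually also names
a baud rate and terminal type; an illustrative variant (the 115200/vt100
values are assumptions, match them to your serial setup):

```
   T0:23:respawn:/sbin/getty -L ttyS0 115200 vt100
```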

Stuart



[Qemu-devel] [PATCH] PPC: e500: set has-idle in guest device tree

2012-06-22 Thread Stuart Yoder
From: Stuart Yoder 

If the host kernel supports the idle hcall, then advertise
that to the guest kernel via the device tree.

Signed-off-by: Stuart Yoder 
---
 hw/ppce500_mpc8544ds.c |5 +
 target-ppc/kvm.c   |   26 +++---
 target-ppc/kvm_ppc.h   |1 +
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c
index b1a0b8c..7d111bb 100644
--- a/hw/ppce500_mpc8544ds.c
+++ b/hw/ppce500_mpc8544ds.c
@@ -124,6 +124,11 @@ static int mpc8544_load_device_tree(CPUPPCState *env,
 kvmppc_get_hypercall(env, hypercall, sizeof(hypercall));
 qemu_devtree_setprop(fdt, "/hypervisor", "hcall-instructions",
  hypercall, sizeof(hypercall));
+
+/* add "has-idle" prop if KVM supports ePAPR idle */
+if (kvmppc_get_hasidle(env)) {
+qemu_devtree_setprop(fdt, "/hypervisor", "has-idle", NULL, 0);
+}
 }
 
 /* We need to generate the cpu nodes in reverse order, so Linux can pick
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index c09cc39..f986de3 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -705,16 +705,36 @@ uint32_t kvmppc_get_dfp(void)
 return kvmppc_read_int_cpu_dt("ibm,dfp");
 }
 
+static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
+{
+if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
+!kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO, pvinfo)) {
+return 0;
+}
+
+return 1;
+}
+
+int kvmppc_get_hasidle(CPUPPCState *env)
+{
+struct kvm_ppc_pvinfo pvinfo;
+
+if (!kvmppc_get_pvinfo(env, &pvinfo) &&
+(pvinfo.flags & KVM_PPC_PVINFO_FLAGS_EV_IDLE)) {
+return 1;
+}
+
+return 0;
+}
+
 int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len)
 {
 uint32_t *hc = (uint32_t*)buf;
 
 struct kvm_ppc_pvinfo pvinfo;
 
-if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
-!kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO, &pvinfo)) {
+if (!kvmppc_get_pvinfo(env, &pvinfo)) {
 memcpy(buf, pvinfo.hcall, buf_len);
-
 return 0;
 }
 
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 34ecad3..c2c75b6 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -19,6 +19,7 @@ uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
 uint32_t kvmppc_get_vmx(void);
 uint32_t kvmppc_get_dfp(void);
+int kvmppc_get_hasidle(CPUPPCState *env);
 int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUPPCState *env, int irq, int level);
 void kvmppc_set_papr(CPUPPCState *env);
-- 
1.7.3.4
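
For reference, the guest-visible result of this patch is an empty property in
the /hypervisor node of the generated device tree; a conceptual sketch of the
node (everything other than has-idle, including the compatible string and the
hcall opcode words, is illustrative and not taken from this patch):

```dts
hypervisor {
        compatible = "linux,kvm";
        /* opcode words filled in by kvmppc_get_hypercall(); values illustrative */
        hcall-instructions = <0x44000022 0x4e800020 0x0 0x0>;
        has-idle;  /* zero-length property: presence alone advertises the idle hcall */
};
```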





Re: [Qemu-devel] [PATCH 4/4] PPC: e500: add generic e500 platform

2012-07-02 Thread Stuart Yoder
On Wed, Jun 27, 2012 at 6:51 PM, Scott Wood  wrote:
> This gives the kernel a paravirtualized machine to target, without
> requiring both sides to pretend to be targeting a specific board
> that likely has little to do with the host in KVM scenarios.  This
> avoids the need to add new boards to QEMU, just to be able to
> run KVM on new CPUs.
>
> Signed-off-by: Scott Wood 
> ---
>  hw/ppc/Makefile.objs |    3 +-
>  hw/ppc/e500plat.c    |   60 ++
>  2 files changed, 62 insertions(+), 1 deletions(-)
>  create mode 100644 hw/ppc/e500plat.c
>
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 23eb8ca..58d82c9 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -15,7 +15,8 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o pci-hotplug.o spapr_iommu.o
>  obj-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>  obj-y += ppc440_bamboo.o
>  # PowerPC E500 boards
> -obj-$(CONFIG_FDT) += ppc/e500.o mpc8544_guts.o ppce500_spin.o ppc/mpc8544ds.o
> -obj-$(CONFIG_FDT) += ppc/e500.o mpc8544_guts.o ppce500_spin.o ppc/mpc8544ds.o
> +obj-$(CONFIG_FDT) += ppc/e500.o mpc8544_guts.o ppce500_spin.o ppc/mpc8544ds.o \
> +       ppc/e500plat.o
>  # PowerPC 440 Xilinx ML507 reference board.
>  obj-y += virtex_ml507.o
>  # PowerPC OpenPIC
> diff --git a/hw/ppc/e500plat.c b/hw/ppc/e500plat.c
> new file mode 100644
> index 000..a9ef5f8
> --- /dev/null
> +++ b/hw/ppc/e500plat.c
> @@ -0,0 +1,60 @@
> +/*
> + * Generic device-tree-driven paravirt PPC e500 platform
> + *
> + * Copyright 2012 Freescale Semiconductor, Inc.
> + *
> + * This is free software; you can redistribute it and/or modify
> + * it under the terms of  the GNU General  Public License as published by
> + * the Free Software Foundation;  either version 2 of the  License, or
> + * (at your option) any later version.
> + */
> +
> +#include "config.h"
> +#include "qemu-common.h"
> +#include "e500.h"
> +#include "../boards.h"
> +#include "device_tree.h"
> +
> +static void e500plat_fixup_devtree(PPCE500Params *params, void *fdt)
> +{
> +    const char model[] = "QEMU e500plat";
> +    const char compatible[] = "fsl,qemu-e500";
> +
> +    qemu_devtree_setprop(fdt, "/", "model", model, sizeof(model));
> +    qemu_devtree_setprop(fdt, "/", "compatible", compatible,
> +                         sizeof(compatible));
> +}
> +
> +static void e500plat_init(ram_addr_t ram_size,
> +                           const char *boot_device,
> +                           const char *kernel_filename,
> +                           const char *kernel_cmdline,
> +                           const char *initrd_filename,
> +                           const char *cpu_model)
> +{
> +    PPCE500Params params = {
> +        .ram_size = ram_size,
> +        .boot_device = boot_device,
> +        .kernel_filename = kernel_filename,
> +        .kernel_cmdline = kernel_cmdline,
> +        .initrd_filename = initrd_filename,
> +        .cpu_model = cpu_model,
> +        .fixup_devtree = e500plat_fixup_devtree,
> +    };
> +
> +    ppce500_init(&params);
> +}
> +
> +static QEMUMachine e500plat_machine = {
> +    .name = "e500plat",
> +    .desc = "e500plat",
> +    .init = e500plat_init,
> +    .max_cpus = 15,
> +};

Can we call the generic e500 machine "ppce500"?  I like that better, as
it denotes both the "Power/PPC" architecture and "e500".  Aesthetically
I prefer -M ppce500 over -M e500plat when running QEMU.

Stuart



[Qemu-devel] [PATCH] QEMU: PPC: set has-idle in guest device tree

2013-01-03 Thread Stuart Yoder
From: Stuart Yoder 

Signed-off-by: Stuart Yoder 
---

-note: this patch requires a kernel headers update
 for the pvinfo idle flag.  See Bharat Bhushan's
 recent patch "Synchronized the linux headers"
---
 hw/ppc/e500.c|4 
 target-ppc/kvm.c |   29 ++---
 target-ppc/kvm_ppc.h |1 +
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index aa54fd8..a19f094 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -225,6 +225,10 @@ static int ppce500_load_device_tree(CPUPPCState *env,
 kvmppc_get_hypercall(env, hypercall, sizeof(hypercall));
 qemu_devtree_setprop(fdt, "/hypervisor", "hcall-instructions",
  hypercall, sizeof(hypercall));
+/* if KVM supports the idle hcall, set property indicating this */
+if (kvmppc_get_hasidle(env)) {
+qemu_devtree_setprop(fdt, "/hypervisor", "has-idle", NULL, 0);
+}
 }
 
 /* Create CPU nodes */
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 88650d4..10adf26 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -967,16 +967,39 @@ uint32_t kvmppc_get_dfp(void)
 return kvmppc_read_int_cpu_dt("ibm,dfp");
 }
 
+static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
+ {
+ PowerPCCPU *cpu = ppc_env_get_cpu(env);
+ CPUState *cs = CPU(cpu);
+
+if (kvm_check_extension(cs->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
+!kvm_vm_ioctl(cs->kvm_state, KVM_PPC_GET_PVINFO, pvinfo)) {
+return 0;
+}
+
+return 1;
+}
+
+int kvmppc_get_hasidle(CPUPPCState *env)
+{
+struct kvm_ppc_pvinfo pvinfo;
+
+if (!kvmppc_get_pvinfo(env, &pvinfo) &&
+(pvinfo.flags & KVM_PPC_PVINFO_FLAGS_EV_IDLE)) {
+return 1;
+}
+
+return 0;
+}
+
 int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len)
 {
 uint32_t *hc = (uint32_t*)buf;
 
 struct kvm_ppc_pvinfo pvinfo;
 
-if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
-!kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO, &pvinfo)) {
+if (!kvmppc_get_pvinfo(env, &pvinfo)) {
 memcpy(buf, pvinfo.hcall, buf_len);
-
 return 0;
 }
 
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 83f9872..b1f096f 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -19,6 +19,7 @@ uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_get_clockfreq(void);
 uint32_t kvmppc_get_vmx(void);
 uint32_t kvmppc_get_dfp(void);
+int kvmppc_get_hasidle(CPUPPCState *env);
 int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUPPCState *env, int irq, int level);
 void kvmppc_set_papr(CPUPPCState *env);
-- 
1.7.9.7





[Qemu-devel] [PATCH] configure: change endian cross compilation test

2012-03-14 Thread Stuart Yoder
From: Stuart Yoder 

The previous check in configure's endianness test determined whether
this is a cross-compile build by testing whether --cross-prefix
was used.  This does not work for cross-build environments
like Yocto that may set CC instead of --cross-prefix.

Instead, test whether the host compiler is the same as the target
compiler, which also works when --cross-prefix is used.

Signed-off-by: Stuart Yoder 
---
 configure |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index fe4fc4f..c5333bf 100755
--- a/configure
+++ b/configure
@@ -1269,7 +1269,7 @@ feature_not_found() {
   exit 1;
 }
 
-if test -z "$cross_prefix" ; then
+if test $cc = $host_cc; then
 
 # ---
 # big/little endian test
-- 
1.7.3.4





[Qemu-devel] [PATCH][v2] configure: change endianness test

2012-03-14 Thread Stuart Yoder
From: Stuart Yoder 

Remove the runtime check for endianness and, for platforms that
can be big or little endian, do a compile-time check.

This resolves an issue encountered building QEMU
under Yocto which was not setting --cross-prefix.

Signed-off-by: Stuart Yoder 
---

-v2: removed the dynamic runtime test completely,
 added compile time check for mips

 configure |   33 -
 1 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/configure b/configure
index fe4fc4f..d9c5999 100755
--- a/configure
+++ b/configure
@@ -1269,41 +1269,24 @@ feature_not_found() {
   exit 1;
 }
 
-if test -z "$cross_prefix" ; then
-
-# ---
-# big/little endian test
-cat > $TMPC << EOF
-#include <stdint.h>
-int main(int argc, char ** argv){
-volatile uint32_t i=0x01234567;
-return (*((uint8_t*)(&i))) == 0x67;
-}
-EOF
-
-if compile_prog "" "" ; then
-$TMPE && bigendian="yes"
-else
-echo big/little test failed
-fi
-
-else
-
-# if cross compiling, cannot launch a program, so make a static guess
+##
+# endianness check
 case "$cpu" in
   arm)
-# ARM can be either way; ask the compiler which one we are
 if check_define __ARMEB__; then
   bigendian=yes
 fi
   ;;
-  hppa|m68k|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64)
+  mips|mips64)
+if check_define __MIPSEB__; then
+  bigendian=yes
+fi
+  ;;
+  hppa|m68k|ppc|ppc64|s390|s390x|sparc|sparc64)
 bigendian=yes
   ;;
 esac
 
-fi
-
 ##
 # NPTL probe
 
-- 
1.7.3.4





[Qemu-devel] [PATCH] remove cross prefix from pkg-config command

2011-08-02 Thread Stuart yoder
From: Stuart Yoder 

The host pkg-config tool should be used, with the location of the
pkg-config *.pc files specified via PKG_CONFIG_PATH.

Signed-off-by: Stuart Yoder 
---

 The Freescale cross build environment is multilib which
 means there are special library paths for different
 cpu types.  pkg-config installed along with the 
 other cross tools does not work...it has no clue how
 to find the right libraries.   So using cross-prefix
 does no good.  Only thing I can get to work is 
 explicitly setting PKG_CONFIG_PATH before running
 configure.

 Not sure how other cross build environments work, but
 it seems like the above method should be sufficiently
 general.

 configure |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index 38e3724..59f6e09 100755
--- a/configure
+++ b/configure
@@ -223,7 +223,7 @@ objcopy="${cross_prefix}${OBJCOPY-objcopy}"
 ld="${cross_prefix}${LD-ld}"
 strip="${cross_prefix}${STRIP-strip}"
 windres="${cross_prefix}${WINDRES-windres}"
-pkg_config="${cross_prefix}${PKG_CONFIG-pkg-config}"
+pkg_config="${PKG_CONFIG-pkg-config}"
 sdl_config="${cross_prefix}${SDL_CONFIG-sdl-config}"
 
 # default flags for all hosts
-- 
1.7.3.4
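
With this change, a multilib cross build points the host pkg-config at the
right .pc files purely through the environment; an illustrative session (the
sysroot path and target list are assumptions, not from the patch):

```shell
# Use the host pkg-config, but resolve .pc files from the target sysroot
export PKG_CONFIG_PATH=/opt/fsl/sysroot/usr/lib/pkgconfig
./configure --cross-prefix=powerpc-linux-gnu- --target-list=ppc-softmmu
```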





[Qemu-devel] [PATCH] when overriding default tool names don't add cross-prefix

2011-08-04 Thread Stuart yoder
From: Stuart Yoder 

When overriding a tool name via a shell variable, don't
tack on the cross-prefix.  This specifically allows the
pkg-config command to be overridden and work where it
does not exist in some cross build environments.

Signed-off-by: Stuart Yoder 
---
 configure |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index 38e3724..ad522f2 100755
--- a/configure
+++ b/configure
@@ -217,14 +217,14 @@ done
 # Using uname is really, really broken.  Once we have the right set of checks
 # we can eliminate it's usage altogether
 
-cc="${cross_prefix}${CC-gcc}"
-ar="${cross_prefix}${AR-ar}"
-objcopy="${cross_prefix}${OBJCOPY-objcopy}"
-ld="${cross_prefix}${LD-ld}"
-strip="${cross_prefix}${STRIP-strip}"
-windres="${cross_prefix}${WINDRES-windres}"
-pkg_config="${cross_prefix}${PKG_CONFIG-pkg-config}"
-sdl_config="${cross_prefix}${SDL_CONFIG-sdl-config}"
+cc="${CC-${cross_prefix}gcc}"
+ar="${AR-${cross_prefix}ar}"
+objcopy="${OBJCOPY-${cross_prefix}objcopy}"
+ld="${LD-${cross_prefix}ld}"
+strip="${STRIP-${cross_prefix}strip}"
+windres="${WINDRES-${cross_prefix}windres}"
+pkg_config="${PKG_CONFIG-${cross_prefix}pkg-config}"
+sdl_config="${SDL_CONFIG-${cross_prefix}sdl-config}"
 
 # default flags for all hosts
 QEMU_CFLAGS="-fno-strict-aliasing $QEMU_CFLAGS"
-- 
1.7.3.4
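
The change leans on POSIX ${VAR-default} parameter expansion: the cross-prefixed
default is used only when the variable is unset, and an explicitly set variable
is taken verbatim; a small demonstration (variable values illustrative):

```shell
cross_prefix=powerpc-linux-gnu-

unset CC
cc="${CC-${cross_prefix}gcc}"   # CC unset -> cross-prefixed default
echo "$cc"                      # powerpc-linux-gnu-gcc

CC=gcc
cc="${CC-${cross_prefix}gcc}"   # CC set -> used verbatim, no prefix added
echo "$cc"                      # gcc
```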





[Qemu-devel] bringing up virtio network device hangs guest

2012-03-05 Thread Stuart Yoder
I'm trying to solve a problem where bringing up a virtio network
device in a KVM guest hangs the guest.

Start QEMU with these options:
   -net nic,model=virtio -net tap,script=/root/qemu-ifup

The qemu-ifup script is pretty simple, just adds the interface passed
in to a bridge:
  #!/bin/sh
  bridge=br0
  guest_device=$1
  ifconfig $guest_device up
  brctl addif $bridge $guest_device
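
A slightly more defensive version of such an ifup script fails loudly when the
tap name is missing or a command fails, so QEMU reports the error instead of
proceeding with a half-configured interface (bridge name unchanged; the checks
are my addition):

```shell
#!/bin/sh
set -e                      # abort on any failure so QEMU sees an error
bridge=br0
guest_device=$1

if [ -z "$guest_device" ]; then
    echo "usage: $0 <tap-device>" >&2
    exit 1
fi

ifconfig "$guest_device" up
brctl addif "$bridge" "$guest_device"
```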

The guest kernel finds the virtio device, but bringing up the virtio
eth0 device in the guest with ifconfig hangs the guest.

In the host, the bridge (br0) is able to send and receive packets.

The tap device created by QEMU shows packets sent, but none received:

tap0  Link encap:Ethernet  HWaddr 0A:02:56:5A:72:C6
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:257 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:500
  RX bytes:0 (0.0 B)  TX bytes:17592 (17.1 KiB)

Wondered if anyone had any ideas on how to further diagnose what might
be going on.

Thanks,
Stuart



Re: [Qemu-devel] State of KVM guest debugging support on Power

2011-11-03 Thread Stuart Yoder
On Tue, Nov 1, 2011 at 9:22 AM, Jan Kiszka  wrote:
> Hi there,
>
> I'm generating some slides on guest debugging via kvm. What's the
> current state for Book-E and Book-S? Works out of box, mostly usable, or
> to be implemented? Is anyone using it?

Are you talking about guest debug using the QEMU stub, or using
the virtual CPU's debug registers and interrupts?  For Freescale
Book E we have both approaches working, and patches to be sent
upstream soon.

Stuart



Re: [Qemu-devel] State of KVM guest debugging support on Power

2011-11-04 Thread Stuart Yoder
On Thu, Nov 3, 2011 at 2:14 PM, Jan Kiszka  wrote:
> On 2011-11-03 19:59, Stuart Yoder wrote:
>> On Tue, Nov 1, 2011 at 9:22 AM, Jan Kiszka  wrote:
>>> Hi there,
>>>
>>> I'm generating some slides on guest debugging via kvm. What's the
>>> current state for Book-E and Book-S? Works out of box, mostly usable, or
>>> to be implemented? Is anyone using it?
>>
>> Are you talking about guest debug using the QEMU stub or using
>> the virtual CPU's debug registers and interrupts?    For Freescale
>> Book E we have both approaches working, and patches to be sent upstream
>> soon.
>
> It's good to see both features coming into mainline.
>
> I'll talk about the former, ie. debugging without guest help/awareness
> (virtual hardware debugger). Is gdb well prepared for it (all registers
> available, real-mode support, etc.)?

Right now it looks like the basic register set is available, but not
SPRs, though not because of any gdb limitation as far as I know.  gdb
supports additional registers when the correct target description XML
file is passed back to it when it connects to the target.

In the QEMU monitor, we currently show key SPRs for 'info registers'.

As far as real mode goes, we don't have direct experience with it, as
everything we are doing is BookE, which has no real mode.

Stuart



[Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-09 Thread Stuart Yoder
Based on the discussions over the last couple of weeks
I have updated the device fd file layout proposal and
tried to specify it a bit more formally.

===

1.  Overview

  This specification describes the layout of device files
  used in the context of vfio, which gives user space
  direct access to I/O devices that have been bound to
  vfio.

  When a device fd is opened and read, offset 0x0 contains
  a fixed-size header followed by a number of variable-length
  records that describe different characteristics
  of the device-- addressable regions, interrupts, etc.

  0x0  +---------------------------+
       |         magic             | u32  // identifies this as a vfio
       |                           |      //   device file and identifies
       +---------------------------+      //   the type of bus
       |         version           | u32  // specifies the version of this
       +---------------------------+
       |         flags             | u32  // encodes any flags
       +---------------------------+
       |  dev info record 0        |
       |    type                   | u32   // type of record
       |    rec_len                | u32   // length in bytes of record
       |                           |       //   (including record header)
       |    flags                  | u32   // type specific flags
       |    ...content...          |       // record content, which could
       +---------------------------+       //   include sub-records
       |  dev info record 1        |
       +---------------------------+
       |  dev info record N        |
       +---------------------------+

  The device info records following the file header may have the
  following record types, each with content encoded in a
  record-specific way:

  -----------------------------------------------------------------
                    | type |
   Record           | num  | Description
  -----------------------------------------------------------------
  REGION               1     describes an addressable address range
                             for the device
  DTPATH               2     describes the device tree path for the
                             device
  DTINDEX              3     describes the index into the related
                             device tree property
                             (reg, ranges, interrupts, interrupt-map)
  INTERRUPT            4     describes an interrupt for the device
  PCI_CONFIG_SPACE     5     property identifying a region as PCI
                             config space
  PCI_BAR_INDEX        6     describes the BAR index for a PCI region
  PHYS_ADDR            7     describes the physical address of the
                             region
  -----------------------------------------------------------------

2. Header

The header is located at offset 0x0 in the device fd
and has the following format:

    struct devfd_header {
        __u32 magic;
        __u32 version;
        __u32 flags;
    };

    The 'magic' field contains a magic value that identifies
    the type of bus the device is on.  Valid values are:

        0x70636900   // "pci" - PCI device
        0x6474       // "dt"  - device tree (system bus)

3. Region

  A REGION record describes an addressable address range for the device.

    struct devfd_region {
        __u32 type;       // must be 0x1
        __u32 record_len;
        __u32 flags;
        __u64 offset;     // seek offset to region from beginning
                          // of file
        __u64 len;        // length of the region
    };

  The 'flags' field supports one flag:

  IS_MMAPABLE

4. Device Tree Path (DTPATH)

  A DTPATH record is a sub-record of a REGION and describes
  the path to a device tree node for the region.

    struct devfd_dtpath {
        __u32 type;       // must be 0x2
        __u32 record_len;
        char  path[];     // device tree path string
    };

5. Device Tree Index (DTINDEX)

  A DTINDEX record is a sub-record of a REGION and specifies
  the index into the resource list encoded in the associated
  device tree property-- "reg", "ranges", "interrupts", or
  "interrupt-map".

    struct devfd_dtindex {
        __u32 type;       // must be 0x3
        __u32 record_len;
        __u32 prop_type;
        __u32 prop_index; // index into the resource list
    };

    prop_type must have one of the following values:
        1   // "reg" property
        2   // "ranges" property
        3   // "interrupts" property
        4   // "interrupt-map" property

Note: prop_index is not the byte offset into the property,
but the logical index.

6. Interrupts (INTERRUPT)

  An INTERRUPT record describes one of a device's interrupts.
  The handle field is an argument to VFIO_DEVICE_GET_IRQ_FD
  which user space can use to receive device interrupts.

    struct devfd_interrupts {
        __u32 type;       // must be 0x4
        __u32 record_len;
        __u32 flags;
        __u32 handle;     // parameter to VFIO_DEVICE_GET_IRQ_FD
    };

7.  PCI Config Space (PCI_CONFIG_SPACE)

A PCI_CONFIG_SPACE record is a sub-record of a REGION record
and identifies the region as PCI configuration space.

struct de

Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-09 Thread Stuart Yoder
Meant to identify the changes in v2 of this proposal:

v2:
   -removed PCI_INFO record type
   -removed PCI_BAR_INFO record type
   -PCI_CONFIG_SPACE is now a sub-record/property of a REGION
   -removed physical address from region and made it
a subrecord/property of a REGION
   -added PCI_BAR_INDEX sub-record type
   -updated magic numbers

Stuart


[Qemu-devel] [PATCH] support compiling and installing DTBs

2011-09-16 Thread Stuart Yoder
From: Stuart Yoder 

Also adds configure options to enable/disable installing DTBs
and to override the location of dtc.

Signed-off-by: Stuart Yoder 
---
 Makefile  |   17 +++--
 configure |   24 
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 7e9382f..1fe0a0a 100644
--- a/Makefile
+++ b/Makefile
@@ -245,7 +245,6 @@ ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc \
 pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
 pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
 bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \
-mpc8544ds.dtb \
 multiboot.bin linuxboot.bin \
 s390-zipl.rom \
 spapr-rtas.bin slof.bin
@@ -253,6 +252,20 @@ else
 BLOBS=
 endif
 
+ifdef INSTALL_DTBS
+DTBS=$(SRC_PATH)/pc-bios/mpc8544ds.dtb
+
+%.dtb: %.dts
+   $(call quiet-command,$(DTC) -I dts -O dtb -b 0 -o $@ $<, "  DTC   $@")
+
+install-dtbs: $(DTBS)
+   $(INSTALL_DIR) "$(DESTDIR)$(datadir)"
+   set -e; for x in $(DTBS); do \
+   $(INSTALL_DATA) $$x "$(DESTDIR)$(datadir)"; \
+   done
+
+endif
+
 install-doc: $(DOCS)
$(INSTALL_DIR) "$(DESTDIR)$(docdir)"
$(INSTALL_DATA) qemu-doc.html  qemu-tech.html "$(DESTDIR)$(docdir)"
@@ -267,7 +280,7 @@ install-sysconfig:
$(INSTALL_DIR) "$(DESTDIR)$(sysconfdir)/qemu"
	$(INSTALL_DATA) $(SRC_PATH)/sysconfigs/target/target-x86_64.conf "$(DESTDIR)$(sysconfdir)/qemu"
 
-install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig
+install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig $(if $(INSTALL_DTBS),install-dtbs)
$(INSTALL_DIR) "$(DESTDIR)$(bindir)"
 ifneq ($(TOOLS),)
$(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)"
diff --git a/configure b/configure
index 0875f95..7d9447f 100755
--- a/configure
+++ b/configure
@@ -182,6 +182,7 @@ usb_redir=""
 opengl=""
 zlib="yes"
 guest_agent="yes"
+dtc=dtc
 
 # parse CC options first
 for opt do
@@ -728,6 +729,12 @@ for opt do
   ;;
   --disable-blobs) blobs="no"
   ;;
+  --enable-dtbs) install_dtbs="yes"
+  ;;
+  --disable-dtbs) install_dtbs="no"
+  ;;
+  --with-dtc=*) dtc=$optarg
+  ;;
   --with-pkgversion=*) pkgversion=" ($optarg)"
   ;;
   --disable-docs) docs="no"
@@ -771,6 +778,13 @@ for opt do
   esac
 done
 
+# set the default for $install_dtbs if not given on the command line
+if test -z "$install_dtbs"; then
+if has $dtc; then
+install_dtbs="yes"
+else
+install_dtbs="no"
+fi
+fi
+
 #
 # If cpu ~= sparc and  sparc_cpu hasn't been defined, plug in the right
 # QEMU_CFLAGS/LDFLAGS (assume sparc_v8plus for 32-bit and sparc_v9 for 64-bit)
@@ -1028,6 +1042,9 @@ echo "  --enable-linux-aio   enable Linux AIO support"
 echo "  --disable-attr   disables attr and xattr support"
echo "  --enable-attr    enable attr and xattr support"
 echo "  --disable-blobs  disable installing provided firmware blobs"
echo "  --enable-dtbs    build/install provided device trees (requires dtc)"
echo "  --disable-dtbs   don't build/install provided device trees"
echo "  --with-dtc       full path to device tree compiler (overrides default)"
echo "  --enable-docs    enable documentation build"
echo "  --disable-docs   disable documentation build"
 echo "  --disable-vhost-net  disable vhost-net acceleration support"
@@ -2713,6 +2730,7 @@ echo "vde support   $vde"
 echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
+echo "Install dtbs  $install_dtbs"
 echo "KVM support   $kvm"
 echo "fdt support   $fdt"
 echo "preadv support$preadv"
@@ -2982,6 +3000,9 @@ fi
 if test "$blobs" = "yes" ; then
   echo "INSTALL_BLOBS=yes" >> $config_host_mak
 fi
+if test "$install_dtbs" = "yes" ; then
+  echo "INSTALL_DTBS=yes" >> $config_host_mak
+fi
 if test "$iovec" = "yes" ; then
   echo "CONFIG_IOVEC=y" >> $config_host_mak
 fi
@@ -3122,6 +3143,9 @@ echo "ARLIBS_END=$arlibs_end" >> $config_host_mak
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "LIBS_TOOLS+=$libs_tools" >> $config_host_mak
 echo "EXESUF=$EXESUF" >> $config_host_mak
+if test "$install_dtbs" = "yes" ; then
+  echo "DTC=$dtc" >> $config_host_mak
+fi
 echo "LIBS_QGA+=$libs_qga" >> $config_host_mak
 
 # generate list of library paths for linker script
-- 
1.7.3.4
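
The new Makefile rule boils down to a plain dtc invocation, which can also be
run by hand (file names illustrative; requires dtc on PATH):

```shell
# -I/-O select input/output formats; -b 0 sets the boot CPU physical ID
dtc -I dts -O dtb -b 0 -o mpc8544ds.dtb mpc8544ds.dts
```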





[Qemu-devel] [PATCH] remove mpc8544ds.dtb

2011-09-16 Thread Stuart Yoder
From: Stuart Yoder 

make install now compiles the dtb from the dts source

Signed-off-by: Stuart Yoder 
---

apply after 'support compiling and installing DTBs'

 pc-bios/mpc8544ds.dtb |  Bin 2277 -> 0 bytes
 1 files changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 pc-bios/mpc8544ds.dtb

diff --git a/pc-bios/mpc8544ds.dtb b/pc-bios/mpc8544ds.dtb
deleted file mode 100644
index 
ae318b1fe83846cc2e133951a3666fcfcdf87f79..
GIT binary patch
literal 0
HcmV?d1

literal 2277
zcmb7Fy>C=U5Z@O^han^~5(T1&EZjuso_vEG~YMsIJg>4f2Eiz1{8M@ZMj+0sBqj-i4i|`8*)t|B)DTA(!au
zQBShmX5M&G+4n1r{Y_tKOfoWqK%3q;o8_b7_Fw?y1$KBjcBhN|g<^i+thMEwyOfsG
zs2^xZbUot&V&2Q@MMSb+{cGI*UZ3j=Nn}l9$^(q(51l!;_;Ggm26TcfE>5H{csS;yqBG&_{S74_@7AV
zd0W)ENe|*K19zSHj98uHvxX1n=X%arWuxSjwI)lQApX}J5_0f51CFi0AaAeqTRYp^
zd}v~$wcjTfx4Gn!eH6@F0^B9W-U#Mh3<~=dyEAs2U{I6v{CC#$M+|M}`|g@*t4BlJ
z9?C?7nn3>%OMN&;P}CEQvc}LCxC!8Z!5!-9sjK@{ZOW8pU>C<^<^%zo_|1NDpUC_pXLIQCVmr#1JKS=!5(`
z$G6vBYE#|5StERBu0cw1WNNYFj!xKww)TJcg>c50|D1e#N<9@jMywyq^(^_KjbjP3
z5Qlt-!znELqD^u?CrC?(0mt|SpUfBI7kn~b#4Pxh=4Jh>;`O61;t&
zFT}77h|d53okfPcPK?5Z-ir{7h(#T*Rz8hh(inB_T_D82PO{7oJo*SeF!#*4Iwf(9
zIGl-|R^F6Q8AJb}a-%k~$;27&
zQ*H9xs|{}-n<$6(YM{cnrKr~iYGRJ9Hh(viUpaHACaKQ!)TFsBO^RS;zp)5r!Ocoz
z3v)m}+w_q$|IAFOPZT}L9ZlzzKI-T$D-Y?{jABz;PRv}{vN6uhYPy51nAVvyWz#X}
zs%%`9=fuFX_XhU$br&GJmva;&Qnt71H=-Qq3AF
HpqZ+#N9HOV

-- 
1.7.3.4





Re: [Qemu-devel] KVM call agenda for 2014-04-01

2014-04-10 Thread Stuart Yoder


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, April 10, 2014 10:57 AM
> To: Peter Maydell
> Cc: Juan Quintela; KVM devel mailing list; qemu list; Yoder Stuart-
> B08248; Alistair Francis; Peter Crosthwaite; Christoffer Dall
> Subject: Re: [Qemu-devel] KVM call agenfda for 2014-04-01
> 
> 
> On 10.04.2014, at 17:52, Peter Maydell  wrote:
> 
> > On 10 April 2014 16:49, Alexander Graf  wrote:
> >> For the next call, I would propose to revive the "platform bus"
> >> (aka: how to create non-PCI devices with -device) discussions
> >> to make sure we're all on the same page.
> >
> > I rather suspect we are not :-)  Do you have a link to
> > the current proposals for prior reading?
> 
> The only thing I could find is the old thread about my platform bus
> approach (which Anthony disliked):
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2013-07/msg03614.html
> 
> So from what I remember the plan moving forward was to have a special
> device type similar to my platform bus devices that you can just create
> using -device, no bus involved. The machine file would then loop through
> them, interpret the "I sit at address x" and "I want interrupt number y"
> fields to link them to whatever the machine model thinks is a good fit.
> 
> The same way the machine model today has to have knowledge on each device
> tree node type it generates, it would do the same for these devices. So
> the machine has to have awareness of all the "funky special options" a
> device tree node receives - the same as for any other device. Just that
> in this case it wouldn't be able to hardcode them, but have to generate
> them on the fly when it sees a device in the object tree.

Another link that may help from a call we had back in Sept:
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-September/005532.html

Stuart



[Qemu-devel] [PATCH 1/2] QEMU: PPC: specify PVRs for all e500 cores

2014-02-14 Thread Stuart Yoder
From: Stuart Yoder 

Signed-off-by: Stuart Yoder 
---
 target-ppc/cpu-models.c |   64 +--
 target-ppc/cpu-models.h |   30 --
 2 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
index f6c9b3a..9cf603b 100644
--- a/target-ppc/cpu-models.c
+++ b/target-ppc/cpu-models.c
@@ -677,19 +677,79 @@
 "PowerPC e500 v2.0 core")
 POWERPC_DEF("e500v2_v10",CPU_POWERPC_e500v2_v10, e500v2,
 "PowerPC e500v2 v1.0 core")
+POWERPC_DEF("e500v2_v11",CPU_POWERPC_e500v2_v11, e500v2,
+"PowerPC e500v2 v1.1 core")
 POWERPC_DEF("e500v2_v20",CPU_POWERPC_e500v2_v20, e500v2,
 "PowerPC e500v2 v2.0 core")
 POWERPC_DEF("e500v2_v21",CPU_POWERPC_e500v2_v21, e500v2,
 "PowerPC e500v2 v2.1 core")
 POWERPC_DEF("e500v2_v22",CPU_POWERPC_e500v2_v22, e500v2,
 "PowerPC e500v2 v2.2 core")
+POWERPC_DEF("e500v2_v23",CPU_POWERPC_e500v2_v23, e500v2,
+"PowerPC e500v2 v2.3 core")
 POWERPC_DEF("e500v2_v30",CPU_POWERPC_e500v2_v30, e500v2,
 "PowerPC e500v2 v3.0 core")
+POWERPC_DEF("e500v2_v31",CPU_POWERPC_e500v2_v31, e500v2,
+"PowerPC e500v2 v3.1 core")
+POWERPC_DEF("e500v2_v1040", CPU_POWERPC_e500v2_v1040,   e500v2,
+"PowerPC e500v2 v104.0 core")
+POWERPC_DEF("e500v2_v1050", CPU_POWERPC_e500v2_v1050,   e500v2,
+"PowerPC e500v2 v105.0 core")
+POWERPC_DEF("e500v2_v1051", CPU_POWERPC_e500v2_v1051,   e500v2,
+"PowerPC e500v2 v105.1 core")
+POWERPC_DEF("e500v2_v1151", CPU_POWERPC_e500v2_v1151,   e500v2,
+"PowerPC e500v2 v115.1 core")
+POWERPC_DEF("e500v2_v1152", CPU_POWERPC_e500v2_v1152,   e500v2,
+"PowerPC e500v2 v115.2 core")
+POWERPC_DEF("e500v2_v2050", CPU_POWERPC_e500v2_v2050,   e500v2,
+"PowerPC e500v2 v205.0 core")
+POWERPC_DEF("e500v2_v2051", CPU_POWERPC_e500v2_v2051,   e500v2,
+"PowerPC e500v2 v205.1 core")
+POWERPC_DEF("e500v2_v2151", CPU_POWERPC_e500v2_v2151,   e500v2,
+"PowerPC e500v2 v215.1 core")
+POWERPC_DEF("e500v2_v2152", CPU_POWERPC_e500v2_v2152,   e500v2,
+"PowerPC e500v2 v215.2 core")
+POWERPC_DEF("e500v2_v2251", CPU_POWERPC_e500v2_v2251,   e500v2,
+"PowerPC e500v2 v225.1 core")
+
+/* e500mc family */
 POWERPC_DEF_SVR("e500mc", "e500mc",
-CPU_POWERPC_e500mc,   POWERPC_SVR_E500,  e500mc)
+CPU_POWERPC_e500mc_v20,   POWERPC_SVR_E500,  e500mc)
+POWERPC_DEF_SVR("e500mc_v10", "PowerPC e500mc v1.0 core",
+CPU_POWERPC_e500mc_v10,   POWERPC_SVR_E500,  e500mc)
+POWERPC_DEF_SVR("e500mc_v21", "PowerPC e500mc v2.1 core",
+CPU_POWERPC_e500mc_v21,   POWERPC_SVR_E500,  e500mc)
+POWERPC_DEF_SVR("e500mc_v22", "PowerPC e500mc v2.2 core",
+CPU_POWERPC_e500mc_v22,   POWERPC_SVR_E500,  e500mc)
+POWERPC_DEF_SVR("e500mc_v23", "PowerPC e500mc v2.3 core",
+CPU_POWERPC_e500mc_v23,   POWERPC_SVR_E500,  e500mc)
+POWERPC_DEF_SVR("e500mc_v1030", "PowerPC e500mc v103.0 core",
+CPU_POWERPC_e500mc_v1030, POWERPC_SVR_E500,  e500mc)
+POWERPC_DEF_SVR("e500mc_v30", "PowerPC e500mc v3.0 core",
+CPU_POWERPC_e500mc_v30,   POWERPC_SVR_E500,  e500mc)
+POWERPC_DEF_SVR("e500mc_v31", "PowerPC e500mc v3.1 core",
+CPU_POWERPC_e500mc_v31,   POWERPC_SVR_E500,  e500mc)
+POWERPC_DEF_SVR("e500mc_v32", "PowerPC e500mc v3.2 core",
+CPU_POWERPC_e500mc_v32,   POWERPC_SVR_E500,  e500mc)
+
 #ifdef TARGET_PPC64
+/* e5500 family */
 POWERPC_DEF_SVR("e5500", "e5500",
-CPU_POWERPC_e5500,POWERPC_SVR_E500,  e5500)
+CPU_POWERPC_e5500_v12,POWERPC_SVR_E500,  e5500)
+POWERPC_DEF_SVR("e5500_v10", "PowerPC e5500 v1.0 core",
+CPU_POWERPC_e5500_v10,POWERPC_SVR_E500,   

[Qemu-devel] [PATCH 2/2] QEMU: PPC: set default cpu type to be 'host'

2014-02-14 Thread Stuart Yoder
From: Stuart Yoder 

-for KVM we always want the cpu to be that of the
 host system, so make that the default

-for TCG mode, the emulated cpu type should be explicitly
 set

Signed-off-by: Stuart Yoder 
---
 hw/ppc/e500.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index b37ce9d..69dbf47 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -621,7 +621,11 @@ void ppce500_init(QEMUMachineInitArgs *args, PPCE500Params *params)
 
 /* Setup CPUs */
 if (args->cpu_model == NULL) {
-args->cpu_model = "e500v2_v30";
+if (kvm_enabled()) {
+args->cpu_model = "host";
+} else {
+args->cpu_model = "e500v2_v30";
+}
 }
 
 irqs = g_malloc0(smp_cpus * sizeof(qemu_irq *));
-- 
1.7.9.7





Re: [Qemu-devel] [PATCH 1/2] QEMU: PPC: specify PVRs for all e500 cores

2014-04-16 Thread Stuart Yoder


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, April 14, 2014 6:01 AM
> To: Yoder Stuart-B08248
> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH 1/2] QEMU: PPC: specify PVRs for all e500 cores
> 
> 
> On 14.02.14 20:22, Stuart Yoder wrote:
> > From: Stuart Yoder 
> >
> > Signed-off-by: Stuart Yoder 
> > ---
> >   target-ppc/cpu-models.c |   64
> +--
> >   target-ppc/cpu-models.h |   30 --
> >   2 files changed, 90 insertions(+), 4 deletions(-)
> >
> > diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
> > index f6c9b3a..9cf603b 100644
> > --- a/target-ppc/cpu-models.c
> > +++ b/target-ppc/cpu-models.c
> > @@ -677,19 +677,79 @@
> >   "PowerPC e500 v2.0 core")
> >   POWERPC_DEF("e500v2_v10",CPU_POWERPC_e500v2_v10,
> e500v2,
> >   "PowerPC e500v2 v1.0 core")
> > +POWERPC_DEF("e500v2_v11",CPU_POWERPC_e500v2_v11,
> e500v2,
> > +"PowerPC e500v2 v1.1 core")
> >   POWERPC_DEF("e500v2_v20",CPU_POWERPC_e500v2_v20,
> e500v2,
> >   "PowerPC e500v2 v2.0 core")
> >   POWERPC_DEF("e500v2_v21",CPU_POWERPC_e500v2_v21,
> e500v2,
> >   "PowerPC e500v2 v2.1 core")
> >   POWERPC_DEF("e500v2_v22",CPU_POWERPC_e500v2_v22,
> e500v2,
> >   "PowerPC e500v2 v2.2 core")
> > +POWERPC_DEF("e500v2_v23",CPU_POWERPC_e500v2_v23,
> e500v2,
> > +"PowerPC e500v2 v2.3 core")
> >   POWERPC_DEF("e500v2_v30",CPU_POWERPC_e500v2_v30,
> e500v2,
> >   "PowerPC e500v2 v3.0 core")
> > +POWERPC_DEF("e500v2_v31",CPU_POWERPC_e500v2_v31,
> e500v2,
> > +"PowerPC e500v2 v3.1 core")
> > +POWERPC_DEF("e500v2_v1040", CPU_POWERPC_e500v2_v1040,
> e500v2,
> > +"PowerPC e500v2 v104.0 core")
> > +POWERPC_DEF("e500v2_v1050", CPU_POWERPC_e500v2_v1050,
> e500v2,
> > +"PowerPC e500v2 v105.0 core")
> > +POWERPC_DEF("e500v2_v1051", CPU_POWERPC_e500v2_v1051,
> e500v2,
> > +"PowerPC e500v2 v105.1 core")
> > +POWERPC_DEF("e500v2_v1151", CPU_POWERPC_e500v2_v1151,
> e500v2,
> > +"PowerPC e500v2 v115.1 core")
> > +POWERPC_DEF("e500v2_v1152", CPU_POWERPC_e500v2_v1152,
> e500v2,
> > +"PowerPC e500v2 v115.2 core")
> > +POWERPC_DEF("e500v2_v2050", CPU_POWERPC_e500v2_v2050,
> e500v2,
> > +"PowerPC e500v2 v205.0 core")
> > +POWERPC_DEF("e500v2_v2051", CPU_POWERPC_e500v2_v2051,
> e500v2,
> > +"PowerPC e500v2 v205.1 core")
> > +POWERPC_DEF("e500v2_v2151", CPU_POWERPC_e500v2_v2151,
> e500v2,
> > +"PowerPC e500v2 v215.1 core")
> > +POWERPC_DEF("e500v2_v2152", CPU_POWERPC_e500v2_v2152,
> e500v2,
> > +"PowerPC e500v2 v215.2 core")
> > +POWERPC_DEF("e500v2_v2251", CPU_POWERPC_e500v2_v2251,
> e500v2,
> > +"PowerPC e500v2 v225.1 core")
> > +
> > +/* e500mc family */
> >   POWERPC_DEF_SVR("e500mc", "e500mc",
> > -CPU_POWERPC_e500mc,   POWERPC_SVR_E500,
> e500mc)
> > +CPU_POWERPC_e500mc_v20,   POWERPC_SVR_E500,
> e500mc)
> 
> How about we use aliases instead like the other CPU types? :)
> 
> > +POWERPC_DEF_SVR("e500mc_v10", "PowerPC e500mc v1.0 core",
> > +CPU_POWERPC_e500mc_v10,   POWERPC_SVR_E500,
> e500mc)
> > +POWERPC_DEF_SVR("e500mc_v21", "PowerPC e500mc v2.1 core",
> > +CPU_POWERPC_e500mc_v21,   POWERPC_SVR_E500,
> e500mc)
> > +POWERPC_DEF_SVR("e500mc_v22", "PowerPC e500mc v2.2 core",
> > +CPU_POWERPC_e500mc_v22,   POWERPC_SVR_E500,
> e500mc)
> > +POWERPC_DEF_SVR("e500mc_v23", "PowerPC e500mc v2.3 core",
> > +CPU_POWERPC_e500mc_v23,   POWERPC_SVR_E500,
> e500mc)
> > +POWERPC_DEF_SVR("e500mc_v1030", "PowerPC e500mc v103.0 core",
> > +CPU_POWERPC_e500mc_v1030, POWERPC_SVR_E500,
> e500mc)
> > +POWERPC_DEF_SVR("e500mc_v30", "PowerPC e500mc v3.0 core",
> > +CPU_POWERPC_e500mc_v30,   POWERPC_SVR_E500,
> e500mc)
> > +POWERPC_DEF_SVR("e500mc_v31", "PowerPC e500mc v3.1 core",
> > +CPU_POWERPC_e500mc_v31,   POWERPC_SVR_E500,
> e500mc)
> > +POWERPC_DEF_SVR("e500mc_v32", "PowerPC e500mc v3.2 core",
> > +CPU_POWERPC_e500mc_v32,   POWERPC_SVR_E500,
> e500mc)
> > +
> >   #ifdef TARGET_PPC64
> > +/* e5500 family */
> >   POWERPC_DEF_SVR("e5500", "e5500",
> > -CPU_POWERPC_e5500,POWERPC_SVR_E500,
> e5500)
> 
> Same here

Can you explain what you mean?  What kind of alias?

Thanks,
Stuart



[Qemu-devel] USB passthru of hub

2014-03-19 Thread Stuart Yoder
Hi Gerd,

In the "USB 2.0 Quick Start" write-up you said about USB
passthrough:

>  (2) hostbus+hostport -- match for a specific physical port in the
>  host, any device which is plugged in there gets passed to the
>  guest

A customer is asking what happens if a USB hub is plugged into
the specified port (instead of a device).  Will the hub get
passed through, so that the guest sees all devices plugged into
that hub?

Thanks,
Stuart



Re: [Qemu-devel] [Qemu-ppc] qemu does not support PAPR

2014-06-05 Thread Stuart Yoder


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, June 05, 2014 2:59 AM
> To: sonia verma
> Cc: Paolo Bonzini; abhishek jain; qemu-...@nongnu.org; qemu-devel; Yoder
> Stuart-B08248
> Subject: Re: [Qemu-ppc] qemu does not support PAPR
> 
> 
> 
> > Am 05.06.2014 um 09:45 schrieb sonia verma :
> >
> >
> > So what can be the solution for this?
> 
> Please don't top post.
> 
> You're running on QorIQ which is a Freescale chip. The pseries target
> only works on IBM PPC chips.
> 
> The target you're looking for is ppce500:
> 
>   $ qemu-system-ppc64 -M ppce500 -nographic -kernel uImage ...
> 
> I don't know about the status of libvirt in your sdk version, but I'm
> sure Stuart does, so let's ask him ;).

libvirt is supported in Freescale SDK 1.4 in the Yocto-built rootfs, so I
would expect anything that uses libvirt interfaces to work there.  But, we
haven't specifically tested libguestfs.

However, it looks like you may not be using the SDK Yocto rootfs, but an
Ubuntu rootfs.  In that case there may be an issue with the Ubuntu libvirt.
There are some patches that we apply to libvirt that you will find only in the
Freescale SDK...you can find them if you look at the libvirt Yocto
recipe.

Stuart




Re: [Qemu-devel] [Qemu-ppc] qemu does not support PAPR

2014-06-05 Thread Stuart Yoder
> From: sonia verma [mailto:soniaverma9...@gmail.com] 
> Sent: Thursday, June 05, 2014 12:13 PM
> To: Yoder Stuart-B08248
> Cc: abhishek jain; Alexander Graf; qemu-...@nongnu.org; qemu-devel; Paolo 
> Bonzini
> Subject: RE: [Qemu-ppc] qemu does not support PAPR
> 
> Hi Stuart.
> Thanks for the information.I need to run libguestfs on powerpc ubuntu.How can 
> i do that?
> Are there any patches for the same.

I think you will have to build/install libvirt yourself on ubuntu with the
Freescale patches.

For SDK 1.4, you can see the Yocto recipe used here:
http://git.freescale.com/git/cgit.cgi/ppc/sdk/meta-fsl-networking.git/tree/recipes-append/libvirt

...we use the stock Yocto libvirt recipe and apply 2 patches.

The 2 patches are here:
http://git.freescale.com/git/cgit.cgi/ppc/sdk/meta-fsl-networking.git/tree/recipes-append/libvirt/files

I have not built libvirt from scratch myself, so can't help you there.

Thanks,
Stuart



Re: [Qemu-devel] [Qemu-ppc] qemu does not support PAPR

2014-06-06 Thread Stuart Yoder
Based on your original description of the problem, I thought the error was that 
libvirt was starting a PAPR machine instead of an e500 machine.   So my take is 
that libvirt needed to be rebuilt with the patches to support e500 machines 
(that I mentioned in my last email).

If you are actually able to start e500 machines successfully with libvirt, then 
the problem must lie with how libguestfs interfaces to libvirt or something.   
I have never used libguestfs, and don’t think I can help you with that.

Stuart

From: sonia verma [mailto:soniaverma9...@gmail.com]
Sent: Thursday, June 05, 2014 11:11 PM
To: Yoder Stuart-B08248
Cc: abhishek jain; Alexander Graf; qemu-...@nongnu.org; qemu-devel; Paolo 
Bonzini
Subject: Re: [Qemu-ppc] qemu does not support PAPR

Hi Stuart
Thanks for the information.  I'm able to run libvirt on my system.  I need to run 
libguestfs on the same powerpc ubuntu.
Can you help me regarding that?

On Thu, Jun 5, 2014 at 11:10 PM, Stuart Yoder 
mailto:stuart.yo...@freescale.com>> wrote:
> From: sonia verma 
> [mailto:soniaverma9...@gmail.com<mailto:soniaverma9...@gmail.com>]
> Sent: Thursday, June 05, 2014 12:13 PM
> To: Yoder Stuart-B08248
> Cc: abhishek jain; Alexander Graf; 
> qemu-...@nongnu.org<mailto:qemu-...@nongnu.org>; qemu-devel; Paolo Bonzini
> Subject: RE: [Qemu-ppc] qemu does not support PAPR
>
> Hi Stuart.
> Thanks for the information.I need to run libguestfs on powerpc ubuntu.How can 
> i do that?
> Are there any patches for the same.
I think you will have to build/install libvirt yourself on ubuntu with the
Freescale patches.

For SDK 1.4, you can see the Yocto recipe used here:
http://git.freescale.com/git/cgit.cgi/ppc/sdk/meta-fsl-networking.git/tree/recipes-append/libvirt

...we use the stock Yocto libvirt recipe and apply 2 patches.

The 2 patches are here:
http://git.freescale.com/git/cgit.cgi/ppc/sdk/meta-fsl-networking.git/tree/recipes-append/libvirt/files

I have not built libvirt from scratch myself, so can't help you there.

Thanks,
Stuart



Re: [Qemu-devel] [RFC PATCH v4 1/3] use image_file_reset to reload initrd image

2012-11-15 Thread Stuart Yoder
>  /* read()-like version */
>  ssize_t read_targphys(const char *name,
>int fd, hwaddr dst_addr, size_t nbytes)
> @@ -113,6 +146,12 @@ int load_image_targphys(const char *filename,
>  }
>  if (size > 0) {
>  rom_add_file_fixed(filename, addr, -1);
> +ImageFile *image;
> +image = g_malloc0(sizeof(*image));
> +image->name = g_strdup(filename);
> +image->addr = addr;
> +
> +qemu_register_reset(image_file_reset, image);

You need to remove the call to rom_add_file_fixed(), no?

Stuart



Re: [Qemu-devel] [RFC PATCH v4 2/3] use uimage_reset to reload uimage

2012-11-15 Thread Stuart Yoder
On Wed, Nov 14, 2012 at 2:28 PM, Olivia Yin  wrote:
> Signed-off-by: Olivia Yin 
> ---
>  hw/loader.c |   64 ++
>  1 files changed, 46 insertions(+), 18 deletions(-)
>
> diff --git a/hw/loader.c b/hw/loader.c
> index a8a0a09..1a909d0 100644
> --- a/hw/loader.c
> +++ b/hw/loader.c
> @@ -55,6 +55,8 @@
>  #include 
>
>  static int roms_loaded;
> +static int kernel_loaded;
> +static int *is_linux;

Why do we need these 2 new variables?

>  /* return the size or -1 if error */
>  int get_image_size(const char *filename)
> @@ -472,15 +474,14 @@ static ssize_t gunzip(void *dst, size_t dstlen, uint8_t 
> *src,
>  return dstbytes;
>  }
>
> -/* Load a U-Boot image.  */
> -int load_uimage(const char *filename, hwaddr *ep,
> -hwaddr *loadaddr, int *is_linux)
> +/* write uimage into memory */
> +static int uimage_physical_loader(const char *filename, uint8_t **data,
> +  hwaddr *loadaddr, int *is_linux)
>  {
>  int fd;
>  int size;
>  uboot_image_header_t h;
>  uboot_image_header_t *hdr = &h;
> -uint8_t *data = NULL;
>  int ret = -1;
>
>  fd = open(filename, O_RDONLY | O_BINARY);
> @@ -521,10 +522,9 @@ int load_uimage(const char *filename, hwaddr *ep,
>  *is_linux = 0;
>  }
>
> -*ep = hdr->ih_ep;
> -data = g_malloc(hdr->ih_size);
> +*data = g_malloc(hdr->ih_size);
>
> -if (read(fd, data, hdr->ih_size) != hdr->ih_size) {
> +if (read(fd, *data, hdr->ih_size) != hdr->ih_size) {
>  fprintf(stderr, "Error reading file\n");
>  goto out;
>  }
> @@ -534,11 +534,11 @@ int load_uimage(const char *filename, hwaddr *ep,
>  size_t max_bytes;
>  ssize_t bytes;
>
> -compressed_data = data;
> +compressed_data = *data;
>  max_bytes = UBOOT_MAX_GUNZIP_BYTES;
> -data = g_malloc(max_bytes);
> +*data = g_malloc(max_bytes);
>
> -bytes = gunzip(data, max_bytes, compressed_data, hdr->ih_size);
> +bytes = gunzip(*data, max_bytes, compressed_data, hdr->ih_size);
>  g_free(compressed_data);
>  if (bytes < 0) {
>  fprintf(stderr, "Unable to decompress gzipped image!\n");
> @@ -547,7 +547,9 @@ int load_uimage(const char *filename, hwaddr *ep,
>  hdr->ih_size = bytes;
>  }
>
> -rom_add_blob_fixed(filename, data, hdr->ih_size, hdr->ih_load);
> +if (!kernel_loaded) {
> +rom_add_blob_fixed(filename, *data, hdr->ih_size, hdr->ih_load);
> +}

Why do we need to keep the rom_add_blob_fixed() call?  I thought the
point of this was to remove adding roms.

>  if (loadaddr)
>  *loadaddr = hdr->ih_load;
> @@ -555,12 +557,41 @@ int load_uimage(const char *filename, hwaddr *ep,
>  ret = hdr->ih_size;
>
>  out:
> -if (data)
> -g_free(data);
>  close(fd);
>  return ret;
>  }
>
> +static void uimage_reset(void *opaque)
> +{
> +ImageFile *image = opaque;
> +uint8_t *data = NULL;
> +int size;
> +
> +size = uimage_physical_loader(image->name, &data, &image->addr, 
> is_linux);

Not sure why we need is_linux here.  I think you should just set that
parameter to NULL.
Nothing will look at is_linux.

Stuart



Re: [Qemu-devel] [RFC PATCH v5 4/4] free the memory malloced by load_at()

2012-11-21 Thread Stuart Yoder
On Wed, Nov 21, 2012 at 8:38 AM, Olivia Yin  wrote:
> Signed-off-by: Olivia Yin 
> ---
>  hw/elf_ops.h |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/hw/elf_ops.h b/hw/elf_ops.h
> index b346861..9c76a75 100644
> --- a/hw/elf_ops.h
> +++ b/hw/elf_ops.h
> @@ -178,6 +178,8 @@ static int glue(load_symbols, SZ)(struct elfhdr *ehdr, 
> int fd, int must_swab,
>  s->disas_strtab = str;
>  s->next = syminfos;
>  syminfos = s;
> +g_free(syms);
> +g_free(str);
>  g_free(shdr_table);
>  return 0;
>   fail:

Olivia, as Alex pointed out, there are references to syms and str in
the struct "s", so I don't think you can just free those.

The problem that leaves us with is that on every reset when we call
load_elf() that we re-load and re-malloc space for the symbols.

I think the solution may be to factor out the call to load_symbols()
from load_elf().   It looks like what load_symbols does in the end is
set the variable syminfos to point to the loaded symbol info.

If you factor load_symbols() out then in load_elf_32/64() you would do
something like:
  elf_phy_loader_32/64()
  load_symbols_32/64().

We don't need to be reloading symbols on every reset.

Alex, does that make sense?

Stuart



[Qemu-devel] [PATCH] PPC: e500: advertise 4.2 MPIC only if KVM supports EPR

2013-03-30 Thread Stuart Yoder
From: Stuart Yoder 

Signed-off-by: Stuart Yoder 
---
 hw/ppc/e500plat.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/ppc/e500plat.c b/hw/ppc/e500plat.c
index 25ac4b1..2cd7cad 100644
--- a/hw/ppc/e500plat.c
+++ b/hw/ppc/e500plat.c
@@ -16,6 +16,7 @@
 #include "sysemu/device_tree.h"
 #include "hw/pci/pci.h"
 #include "hw/openpic.h"
+#include "sysemu/kvm.h"
 
 static void e500plat_fixup_devtree(PPCE500Params *params, void *fdt)
 {
@@ -48,6 +49,10 @@ static void e500plat_init(QEMUMachineInitArgs *args)
 .mpic_version = OPENPIC_MODEL_FSL_MPIC_42,
 };
 
+if (kvm_enabled() && !kvm_check_extension(kvm_state, KVM_CAP_PPC_EPR)) {
+params.mpic_version = OPENPIC_MODEL_FSL_MPIC_20;
+}
+
 ppce500_init(¶ms);
 }
 
-- 
1.7.9.7





Re: [Qemu-devel] RFC: vfio API changes needed for powerpc

2013-04-02 Thread Stuart Yoder
On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood  wrote:
> On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote:
>>
>> Alex,
>>
>> We are in the process of implementing vfio-pci support for the Freescale
>> IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite different
>> than x86, and will involve creating a 'type 2' vfio implementation.
>>
>> For each device's DMA mappings, PAMU has an overall aperture and a number
>> of windows.  All sizes and window counts must be power of 2.  To
>> illustrate,
>> below is a mapping for a 256MB guest, including guest memory (backed by
>> 64MB huge pages) and some windows for MSIs:
>>
>> Total aperture: 512MB
>> # of windows: 8
>>
>> win gphys/
>> #   iovaphys  size
>> ---   
>> 0   0x  0xX_XX00  64MB
>> 1   0x0400  0xX_XX00  64MB
>> 2   0x0800  0xX_XX00  64MB
>> 3   0x0C00  0xX_XX00  64MB
>> 4   0x1000  0xf_fe044000  4KB// msi bank 1
>> 5   0x1400  0xf_fe045000  4KB// msi bank 2
>> 6   0x1800  0xf_fe046000  4KB// msi bank 3
>> 7- -  disabled
>>
>> There are a couple of updates needed to the vfio user->kernel interface
>> that we would like your feedback on.
>>
>> 1.  IOMMU geometry
>>
>>The kernel IOMMU driver now has an interface (see domain_set_attr,
>>domain_get_attr) that lets us set the domain geometry using
>>"attributes".
>>
>>We want to expose that to user space, so envision needing a couple
>>of new ioctls to do this:
>> VFIO_IOMMU_SET_ATTR
>> VFIO_IOMMU_GET_ATTR
>
>
> Note that this means attributes need to be updated for user-API
> appropriateness, such as using fixed-size types.
>
>
>> 2.   MSI window mappings
>>
>>The more problematic question is how to deal with MSIs.  We need to
>>create mappings for up to 3 MSI banks that a device may need to target
>>to generate interrupts.  The Linux MSI driver can allocate MSIs from
>>the 3 banks any way it wants, and currently user space has no way of
>>knowing which bank may be used for a given device.
>>
>>There are 3 options we have discussed and would like your direction:
>>
>>A.  Implicit mappings -- with this approach user space would not
>>explicitly map MSIs.  User space would be required to set the
>>geometry so that there are 3 unused windows (the last 3 windows)
>
>
> Where does userspace get the number "3" from?  E.g. on newer chips there are
> 4 MSI banks.  Maybe future chips have even more.

Ok, then make the number 4.  The chance of more MSI banks in future chips
is nil, and if it ever happened user space could adjust.  Also,
practically speaking, since memory is typically allocated in powers of
2, you need to approximately double the window geometry anyway.

>>B.  Explicit mapping using DMA map flags.  The idea is that a new
>>flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
>>a mapping is to be created for the supplied iova.  No vaddr
>>is given though.  So in the above example there would be a
>>a dma map at 0x1000 for 24KB (and no vaddr).
>
>
> A single 24 KiB mapping wouldn't work (and why 24KB? What if only one MSI
> group is involved in this VFIO group?  What if four MSI groups are
> involved?).  You'd need to either have a naturally aligned, power-of-two
> sized mapping that covers exactly the pages you want to map and no more, or
> you'd need to create a separate mapping for each MSI bank, and due to PAMU
> subwindow alignment restrictions these mappings could not be contiguous in
> iova-space.

You're right, a single 24KB mapping wouldn't work--  in the case of 3 MSI banks
perhaps we could just do one 64MB*3 mapping to identify which windows
are used for MSIs.

If only one MSI bank was involved the kernel could get clever and only enable
the banks actually needed.

Stuart



Re: [Qemu-devel] RFC: vfio API changes needed for powerpc

2013-04-02 Thread Stuart Yoder
On Tue, Apr 2, 2013 at 3:32 PM, Alex Williamson
 wrote:
>> 2.   MSI window mappings
>>
>>The more problematic question is how to deal with MSIs.  We need to
>>create mappings for up to 3 MSI banks that a device may need to target
>>to generate interrupts.  The Linux MSI driver can allocate MSIs from
>>the 3 banks any way it wants, and currently user space has no way of
>>knowing which bank may be used for a given device.
>>
>>There are 3 options we have discussed and would like your direction:
>>
>>A.  Implicit mappings -- with this approach user space would not
>>explicitly map MSIs.  User space would be required to set the
>>geometry so that there are 3 unused windows (the last 3 windows)
>>for MSIs, and it would be up to the kernel to create the mappings.
>>This approach requires some specific semantics (leaving 3 windows)
>>and it potentially gets a little weird-- when should the kernel
>>actually create the MSI mappings?  When should they be unmapped?
>>Some convention would need to be established.
>
> VFIO would have control of SET/GET_ATTR, right?  So we could reduce the
> number exposed to userspace on GET and transparently add MSI entries on
> SET.

The number of windows is always a power of 2 (and the max is 256), and to
reduce PAMU cache pressure you want to use the fewest windows you can.
So, I don't see how we could practically steal entries transparently to
add the MSIs.  Either user space knows to leave empty windows for the
MSIs and the kernel knows by convention which windows those are (as in
option #A), or user space explicitly tells the kernel which windows to
use (as in option #B).

> On x86 the interrupt remapper handles this transparently when MSI
> is enabled and userspace never gets direct access to the device MSI
> address/data registers.  What kind of restrictions do you have around
> adding and removing windows while the aperture is enabled?

The windows can be enabled/disabled even while the aperture is
enabled (pretty sure)...

>>B.  Explicit mapping using DMA map flags.  The idea is that a new
>>flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
>>a mapping is to be created for the supplied iova.  No vaddr
>>is given though.  So in the above example there would be a
>>a dma map at 0x1000 for 24KB (and no vaddr).   It's
>>up to the kernel to determine which bank gets mapped where.
>>So, this option puts user space in control of which windows
>>are used for MSIs and when MSIs are mapped/unmapped.   There
>>would need to be some semantics as to how this is used-- it
>>only makes sense
>
> This could also be done as another "type2" ioctl extension.  What's the
> value to userspace in determining which windows are used by which banks?
> It sounds like the case that there are X banks and if userspace wants to
> use MSI it needs to leave X windows available for that.  Is this just
> buying userspace a few more windows to allow them the choice between MSI
> or RAM?

Yes, it would potentially give user space the flexibility of some more windows.
It also makes it more explicit when the MSI mappings are created.  In option
#A the MSI mappings would probably get created at the time of the first
normal DMA map.

So, you're saying with this approach you'd rather see a new type 2
ioctl instead of adding new flags to DMA map, right?

>>C.  Explicit mapping using normal DMA map.  The last idea is that
>>we would introduce a new ioctl to give user-space an fd to
>>the MSI bank, which could be mmapped.  The flow would be
>>something like this:
>>   -for each group user space calls new ioctl VFIO_GROUP_GET_MSI_FD
>>   -user space mmaps the fd, getting a vaddr
>>   -user space does a normal DMA map for desired iova
>>This approach makes everything explicit, but adds a new ioctl
>>applicable most likely only to the PAMU (type2 iommu).
>
> And the DMA_MAP of that mmap then allows userspace to select the window
> used?  This one seems like a lot of overhead, adding a new ioctl, new
> fd, mmap, special mapping path, etc.  It would be less overhead to just
> add an ioctl to enable MSI, maybe letting userspace pick which windows
> get used, but I'm still not sure what the value is to userspace in
> exposing it.  Thanks,

Thanks,
Stuart



Re: [Qemu-devel] RFC: vfio API changes needed for powerpc

2013-04-02 Thread Stuart Yoder
On Tue, Apr 2, 2013 at 3:47 PM, Scott Wood  wrote:
> On 04/02/2013 03:38:42 PM, Stuart Yoder wrote:
>>
>> On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood 
>> wrote:
>> > On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote:
>> >>
>> >> Alex,
>> >>
>> >> We are in the process of implementing vfio-pci support for the
>> >> Freescale
>> >> IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite
>> >> different
>> >> than x86, and will involve creating a 'type 2' vfio implementation.
>> >>
>> >> For each device's DMA mappings, PAMU has an overall aperture and a
>> >> number
>> >> of windows.  All sizes and window counts must be power of 2.  To
>> >> illustrate,
>> >> below is a mapping for a 256MB guest, including guest memory (backed by
>> >> 64MB huge pages) and some windows for MSIs:
>> >>
>> >> Total aperture: 512MB
>> >> # of windows: 8
>> >>
>> >> win gphys/
>> >> #   iovaphys  size
>> >> ---   
>> >> 0   0x  0xX_XX00  64MB
>> >> 1   0x0400  0xX_XX00  64MB
>> >> 2   0x0800  0xX_XX00  64MB
>> >> 3   0x0C00  0xX_XX00  64MB
>> >> 4   0x1000  0xf_fe044000  4KB// msi bank 1
>> >> 5   0x1400  0xf_fe045000  4KB// msi bank 2
>> >> 6   0x1800  0xf_fe046000  4KB// msi bank 3
>> >> 7- -  disabled
>> >>
>> >> There are a couple of updates needed to the vfio user->kernel interface
>> >> that we would like your feedback on.
>> >>
>> >> 1.  IOMMU geometry
>> >>
>> >>The kernel IOMMU driver now has an interface (see domain_set_attr,
>> >>domain_get_attr) that lets us set the domain geometry using
>> >>"attributes".
>> >>
>> >>We want to expose that to user space, so envision needing a couple
>> >>of new ioctls to do this:
>> >> VFIO_IOMMU_SET_ATTR
>> >> VFIO_IOMMU_GET_ATTR
>> >
>> >
>> > Note that this means attributes need to be updated for user-API
>> > appropriateness, such as using fixed-size types.
>> >
>> >
>> >> 2.   MSI window mappings
>> >>
>> >>The more problematic question is how to deal with MSIs.  We need to
>> >>create mappings for up to 3 MSI banks that a device may need to
>> >> target
>> >>to generate interrupts.  The Linux MSI driver can allocate MSIs from
>> >>the 3 banks any way it wants, and currently user space has no way of
>> >>knowing which bank may be used for a given device.
>> >>
>> >>There are 3 options we have discussed and would like your direction:
>> >>
>> >>A.  Implicit mappings -- with this approach user space would not
>> >>explicitly map MSIs.  User space would be required to set the
>> >>geometry so that there are 3 unused windows (the last 3 windows)
>> >
>> >
>> > Where does userspace get the number "3" from?  E.g. on newer chips there
>> > are
>> > 4 MSI banks.  Maybe future chips have even more.
>>
>> Ok, then make the number 4.   The chance of more MSI banks in future chips
>> is nil,
>
>
> What makes you so sure?  Especially since you seem to be presenting this as
> not specifically an MPIC API.
>
>
>> and if it ever happened user space could adjust.
>
>
> What bit of API is going to tell it that it needs to adjust?

Haven't thought through that completely, but I guess we could add an API
to return the number of MSI banks for type 2 iommus.

>> Also, practically speaking, since memory is typically allocated in powers
>> of 2 you need to approximately double the window geometry anyway.
>
>
> Only if your existing mapping needs fit exactly in a power of two.
>
>
>> >>B.  Explicit mapping using DMA map flags.  The idea is that a new
>> >>flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
>> >>a mapping is to be created for the supplied iova.  No vaddr
>> >>is given though.  So in the above example there would be a
>> >>a dma map at 0x1000 for 24KB (and no vaddr).
>> >
>> >
>> > A single 24 KiB mapping wouldn't work (and why 24KB? What if only one
>> > MSI
>> > group is involved in this VFIO group?  What if four MSI groups are
>> > involved?).  You'd need to either have a naturally aligned, power-of-two
>> > sized mapping that covers exactly the pages you want to map and no more,
>> > or
>> > you'd need to create a separate mapping for each MSI bank, and due to
>> > PAMU
>> > subwindow alignment restrictions these mappings could not be contiguous
>> > in
>> > iova-space.
>>
>> You're right, a single 24KB mapping wouldn't work--  in the case of 3 MSI
>> banks
>> perhaps we could just do one 64MB*3 mapping to identify which windows
>> are used for MSIs.
>
>
> Where did the assumption of a 64MiB subwindow size come from?

The example I was using.   User space would need to create a
mapping for window_size * msi_bank_count.

Stuart



Re: [Qemu-devel] RFC: vfio API changes needed for powerpc

2013-04-02 Thread Stuart Yoder
On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood  wrote:
>> This could also be done as another "type2" ioctl extension.
>
>
> Again, what is "type2", specifically?  If someone else is adding their own
> IOMMU that is kind of, sort of like PAMU, how would they know if it's close
> enough?  What assumptions can a user make when they see that they're dealing
> with "type2"?

We will define that as part of the type2 implementation.   Highly unlikely
anything but a PAMU will comply.

>> What's the value to userspace in determining which windows are used by
>> which banks?
>
>
> That depends on who programs the MSI config space address.  What is
> important is userspace controlling which iovas will be dedicated to this, in
> case it wants to put something else there.
>
>
>> It sounds like the case that there are X banks and if userspace wants to
>> use MSI it needs to leave X windows available for that.  Is this just
>> buying userspace a few more windows to allow them the choice between MSI
>> or RAM?
>
>
> Well, there could be that.  But also, userspace will generally have a much
> better idea of the type of mappings it's creating, so it's easier to keep
> everything explicit at the kernel/user interface than require more
> complicated code in the kernel to figure things out automatically (not just
> for MSIs but in general).
>
> If the kernel automatically creates the MSI mappings, when does it assume
> that userspace is done creating its own?  What if userspace doesn't need any
> DMA other than the MSIs?  What if userspace wants to continue dynamically
> modifying its other mappings?
>
>
>> >C.  Explicit mapping using normal DMA map.  The last idea is that
>> >we would introduce a new ioctl to give user-space an fd to
>> >the MSI bank, which could be mmapped.  The flow would be
>> >something like this:
>> >   -for each group user space calls new ioctl
>> > VFIO_GROUP_GET_MSI_FD
>> >   -user space mmaps the fd, getting a vaddr
>> >   -user space does a normal DMA map for desired iova
>> >This approach makes everything explicit, but adds a new ioctl
>> >applicable most likely only to the PAMU (type2 iommu).
>>
>> And the DMA_MAP of that mmap then allows userspace to select the window
>> used?  This one seems like a lot of overhead, adding a new ioctl, new
>> fd, mmap, special mapping path, etc.
>
>
> There's going to be special stuff no matter what.  This would keep it
> separated from the IOMMU map code.
>
> I'm not sure what you mean by "overhead" here... the runtime overhead of
> setting things up is not particularly relevant as long as it's reasonable.
> If you mean development and maintenance effort, keeping things well
> separated should help.

We don't need to change DMA_MAP.  If we can simply add a new "type 2"
ioctl that allows user space to set which windows are MSIs, it seems vastly
less complex than an ioctl to supply a new fd, mmap of it, etc.

So maybe 2 ioctls:
VFIO_IOMMU_GET_MSI_COUNT
VFIO_IOMMU_MAP_MSI(iova, size)

Stuart



Re: [Qemu-devel] RFC: vfio API changes needed for powerpc

2013-04-03 Thread Stuart Yoder
>> >  Type1 is arbitrary.  It might as well be named "brown" and this one
>> > can be
>> > "blue".
>>
>> The difference is that "type1" seems to refer to hardware that can do
>> arbitrary 4K page mappings, possibly constrained by an aperture but
>> nothing else.  More than one IOMMU can reasonably fit that.  The odds
>> that another IOMMU would have exactly the same restrictions as PAMU
>> seem smaller in comparison.
>>
>> In any case, if you had to deal with some Intel-only quirk, would it
>> make sense to call it a "type1 attribute"?  I'm not advocating one way
>> or the other on whether an abstraction is viable here (though Stuart
>> seems to think it's "highly unlikely anything but a PAMU will comply"),
>> just that if it is to be abstracted rather than a hardware-specific
>> interface, we need to document what is and is not part of the
>> abstraction.  Otherwise a non-PAMU-specific user won't know what they
>> can rely on, and someone adding support for a new windowed IOMMU won't
>> know if theirs is close enough, or they need to introduce a "type3".
>
> So Alexey named the SPAPR IOMMU something related to spapr...
> surprisingly enough.  I'm fine with that.  If you think it's unique
> enough, name it something appropriately.  I haven't seen the code and
> don't know the architecture sufficiently to have an opinion.

The only reason I suggested "type 2" is that I thought that was the
convention...we would enumerate different iommus.   I think that
calling it "pamu" is better and more clear.

Stuart



Re: [Qemu-devel] RFC: vfio API changes needed for powerpc

2013-04-03 Thread Stuart Yoder
On Tue, Apr 2, 2013 at 5:50 PM, Scott Wood  wrote:
> On 04/02/2013 04:38:45 PM, Alex Williamson wrote:
>>
>> On Tue, 2013-04-02 at 16:08 -0500, Stuart Yoder wrote:
>> > On Tue, Apr 2, 2013 at 3:57 PM, Scott Wood 
>> > wrote:
>> > >> >C.  Explicit mapping using normal DMA map.  The last idea is
>> > >> > that
>> > >> >we would introduce a new ioctl to give user-space an fd to
>> > >> >the MSI bank, which could be mmapped.  The flow would be
>> > >> >something like this:
>> > >> >   -for each group user space calls new ioctl
>> > >> > VFIO_GROUP_GET_MSI_FD
>> > >> >   -user space mmaps the fd, getting a vaddr
>> > >> >   -user space does a normal DMA map for desired iova
>> > >> >This approach makes everything explicit, but adds a new
>> > >> > ioctl
>> > >> >applicable most likely only to the PAMU (type2 iommu).
>> > >>
>> > >> And the DMA_MAP of that mmap then allows userspace to select the
>> > >> window
>> > >> used?  This one seems like a lot of overhead, adding a new ioctl, new
>> > >> fd, mmap, special mapping path, etc.
>> > >
>> > >
>> > > There's going to be special stuff no matter what.  This would keep it
>> > > separated from the IOMMU map code.
>> > >
>> > > I'm not sure what you mean by "overhead" here... the runtime overhead
>> > > of
>> > > setting things up is not particularly relevant as long as it's
>> > > reasonable.
>> > > If you mean development and maintenance effort, keeping things well
>> > > separated should help.
>> >
>> > We don't need to change DMA_MAP.  If we can simply add a new "type 2"
>> > ioctl that allows user space to set which windows are MSIs, it seems
>> > vastly
>> > less complex than an ioctl to supply a new fd, mmap of it, etc.
>> >
>> > So maybe 2 ioctls:
>> > VFIO_IOMMU_GET_MSI_COUNT
>
>
> Do you mean a count of actual MSIs or a count of MSI banks used by the whole
> VFIO group?

I meant # of MSI banks, so VFIO_IOMMU_GET_MSI_BANK_COUNT would be better.

>> > VFIO_IOMMU_MAP_MSI(iova, size)
>
>
> Not sure how you mean "size" to be used -- for MPIC it would be 4K per bank,
> and you can only map one bank at a time (which bank you're mapping should be
> a parameter, if only so that the kernel doesn't have to keep iteration state
> for you).

The intent was for user space to tell the kernel which windows to use
for MSI.   So I envisioned a total size of window-size * msi-bank-count.

Stuart



Re: [Qemu-devel] RFC: vfio API changes needed for powerpc

2013-04-03 Thread Stuart Yoder
> Would it be possible for userspace to simply leave room for MSI bank
> mapping (how much room could be determined by something like
> VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
> DMA_MAP starting at the 0x0 address of the aperture, growing up, and
> VFIO will map banks on demand at the top of the aperture, growing down?
> Wouldn't that avoid a lot of issues with userspace needing to know
> anything about MSI banks (other than count) and coordinating irq numbers
> and enabling handlers?

This is basically option #A in the original proposals sent.  I like this
approach, in that it is simpler and keeps user space mostly out of this...
which is consistent with how things are done on x86.  User space just needs
to know how many windows to leave at the top of the aperture.  The kernel
then has the flexibility to use those windows how it wants.

But one question is when the kernel should actually map (and unmap)
the MSI banks.   One thing we need to do is enable the aperture... and
current thinking is that this is done on the first DMA_MAP.   Similarly,
when the last mapping is unmapped we could also unmap the MSI banks.

Sequence would be something like:

VFIO_GROUP_SET_CONTAINER              // add groups to the container

VFIO_SET_IOMMU(VFIO_FSL_PAMU)         // set iommu model

cnt = VFIO_IOMMU_GET_MSI_BANK_COUNT   // returns max # of MSI banks

VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)    // set overall aperture

VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)     // set # of windows, including MSI banks

VFIO_IOMMU_MAP_DMA                    // map the guest's memory
   ---> kernel enables aperture and maps needed MSI banks here

VFIO_DEVICE_SET_IRQS
   ---> kernel sets actual MSI addr/data in the physical device here (I think)


Stuart



Re: [Qemu-devel] RFC: vfio API changes needed for powerpc

2013-04-03 Thread Stuart Yoder
On Wed, Apr 3, 2013 at 2:18 PM, Scott Wood  wrote:
> On 04/03/2013 02:09:45 PM, Stuart Yoder wrote:
>>
>> > Would it be possible for userspace to simply leave room for MSI bank
>> > mapping (how much room could be determined by something like
>> > VFIO_IOMMU_GET_MSI_BANK_COUNT) then document the API that userspace can
>> > DMA_MAP starting at the 0x0 address of the aperture, growing up, and
>> > VFIO will map banks on demand at the top of the aperture, growing down?
>> > Wouldn't that avoid a lot of issues with userspace needing to know
>> > anything about MSI banks (other than count) and coordinating irq numbers
>> > and enabling handlers?
>>
>> This is basically option #A in the original proposals sent.  I like this
>> approach, in that it is simpler and keeps user space mostly out of this...
>> which is consistent with how things are done on x86.  User space just needs
>> to know how many windows to leave at the top of the aperture.  The kernel
>> then has the flexibility to use those windows how it wants.
>>
>> But one question is when the kernel should actually map (and unmap)
>> the MSI banks.
>
>
> I think userspace should explicitly request it.  Userspace still wouldn't
> need to know anything but the count:
>
> count = VFIO_IOMMU_GET_MSI_BANK_COUNT
> VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY)
> VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS)
> // do other DMA maps now, or later, or not at all, doesn't matter
> for (i = 0; i < count; i++)
> VFIO_IOMMU_MAP_MSI_BANK(iova, i);
> // The kernel now knows where each bank has been mapped, and can update PCI
> config space appropriately.

And the overall aperture enable/disable would occur on the first
dma/msi map() and last dma/msi unmap()?

Stuart



Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Stuart Yoder
On Mon, Sep 26, 2011 at 2:51 AM, David Gibson
 wrote:
> On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
>> Based on the discussions over the last couple of weeks
>> I have updated the device fd file layout proposal and
>> tried to specify it a bit more formally.
>>
>> ===
>>
>> 1.  Overview
>>
>>   This specification describes the layout of device files
>>   used in the context of vfio, which gives user space
>>   direct access to I/O devices that have been bound to
>>   vfio.
>>
>>   When a device fd is opened and read, offset 0x0 contains
>>   a fixed sized header followed by a number of variable length
>>   records that describe different characteristics
>>   of the device-- addressable regions, interrupts, etc.
>>
>>   0x0  +---------------------------+
>>        |         magic             | u32  // identifies this as a vfio
>>        |                           |      //   device file and identifies
>>        +---------------------------+      //   the type of bus
>>        |         version           | u32  // specifies the version of this
>>        +---------------------------+      //   layout
>>        |         flags             | u32  // encodes any flags
>>        +---------------------------+
>>        |  dev info record 0        |
>>        |    type                   | u32  // type of record
>>        |    rec_len                | u32  // length in bytes of record
>>        |                           |      //   (including record header)
>>        |    flags                  | u32  // type specific flags
>>        |    ...content...          |      // record content, which could
>>        +---------------------------+      //   include sub-records
>>        |  dev info record 1        |
>>        +---------------------------+
>>        |  dev info record N        |
>>        +---------------------------+
>
> I really should have chimed in on this earlier, but I've been very
> busy.
>
> Um, not to put too fine a point on it, this is madness.
>
> Yes, it's very flexible and can thereby cover a very wide range of
> cases.  But it's much, much too complex.  Userspace has to parse a
> complex, multilayered data structure, with variable length elements
> just to get an address at which to do IO.  I can pretty much guarantee
> that if we went with this, most userspace programs using this
> interface would just ignore this metadata and directly map the
> offsets at which they happen to know the kernel will put things for
> the type of device they care about.
>
> _At least_ for PCI, I think the original VFIO layout of each BAR at a
> fixed, well known offset is much better.  Despite its limitations,
> just advertising a "device type" ID which describes one of a few fixed
> layouts would be preferable to this.  I'm still hoping, that we can do
> a bit better than that.  But we should try really hard to at the very
> least force the metadata into a simple array of resources each with a
> fixed size record describing it, even if it means some space wastage
> with occasionally-used fields.  Anything more complex than that and
> userspace is just never going to use it properly.

So, is your issue really the variable length nature of what was
proposed?

I don't think it would be that hard to make the different resources
fixed length.   I think we have 2 types of resources now-- address
regions and interrupts.

The only thing that gets a bit tricky is device tree paths, which
are obviously variable length.

We could put a description of all the resources in an array with
each element being something like 4KB??

Stuart



Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-26 Thread Stuart Yoder
>
> The other obvious possibility is a pure ioctl interface.  To match what
> this proposal is trying to describe, plus the runtime interfaces, we'd
> need something like:
>
> /* :0 - PCI devices, :1 - Device tree path device, 63:2 - reserved */
> #define VFIO_DEVICE_GET_FLAGS                   _IOR(, , u64)
>
>
> /* Return number of mmio/iop/config regions.
>  * For PCI this is always 8 (BAR0-5 + ROM + Config) */
> #define VFIO_DEVICE_GET_NUM_REGIONS             _IOR(, , int)
>
> /* Return length for region index (may be zero) */
> #define VFIO_DEVICE_GET_REGION_LEN              _IOWR(, , u64)
>
> /* Return flags for region index
>  * :0 - mmap'able, :1 - read-only, 63:2 - reserved */
> #define VFIO_DEVICE_GET_REGION_FLAGS            _IOR(, , u64)
>
> /* Return file offset for region index */
> #define VFIO_DEVICE_GET_REGION_OFFSET           _IOWR(, , u64)
>
> /* Return physical address for region index - not implemented for PCI */
> #define VFIO_DEVICE_GET_REGION_PHYS_ADDR        _IOWR(, , u64)
>
>
>
> /* Return number of IRQs (Not including MSI/MSI-X for PCI) */
> #define VFIO_DEVICE_GET_NUM_IRQ                 _IOR(, , int)
>
> /* Set IRQ eventfd for IRQ index, arg[0] = index, arg[1] = fd */
> #define VFIO_DEVICE_SET_IRQ_EVENTFD             _IOW(, , int)
>
> /* Unmask IRQ index */
> #define VFIO_DEVICE_UNMASK_IRQ                  _IOW(, , int)
>
> /* Set unmask eventfd for index, arg[0] = index, arg[1] = fd */
> #define VFIO_DEVICE_SET_UNMASK_IRQ_EVENTFD      _IOW(, , int)
>
>
> /* Return the device tree path for type/index into the user
>  * allocated buffer */
> struct dtpath {
>        u32     type; (0 = region, 1 = IRQ)
>        u32     index;
>        u32     buf_len;
>        char    *buf;
> };
> #define VFIO_DEVICE_GET_DTPATH                  _IOWR(, , struct dtpath)
>
> /* Return the device tree index for type/index */
> struct dtindex {
>        u32     type; (0 = region, 1 = IRQ)
>        u32     index;
>        u32     prop_type;
>        u32     prop_index;
> };
> #define VFIO_DEVICE_GET_DTINDEX                 _IOWR(, , struct dtindex)
>
>
> /* Reset the device */
> #define VFIO_DEVICE_RESET                       _IO(, ,)
>
>
> /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
> #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS        _IOW(, , int)
> #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS       _IOW(, , int)
>
> Hope that covers it.  Something I prefer about this interface is that
> everything can easily be generated on the fly, whereas reading out a
> table from the device means we really need to have that table somewhere
> in kernel memory to easily support reading random offsets.  Thoughts?

I think this could work, but I'm not sure it makes the problem David
had any better--  you substitute the complexity of parsing the
variable length regions with invoking a set of APIs.

Stuart



Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-11-29 Thread Stuart Yoder
>
> BTW, github now has updated trees:
>
> git://github.com/awilliam/linux-vfio.git vfio-next-2029
> git://github.com/awilliam/qemu-vfio.git vfio-ng

Hi Alex,

Have been looking at vfio a bit.   A few observations and things
we'll need to figure out as it relates to the Freescale iommu.

__vfio_dma_map() assumes that mappings are broken into
4KB pages.   That will not be true for us.   We normally will be mapping
much larger physically contiguous chunks for our guests.  Guests will
get hugetlbfs backed memory with very large pages (e.g. 16MB,
64MB) or very large chunks allocated by some proprietary
means.

Also, mappings will have additional Freescale-specific attributes
that need to get passed through to dma_map somehow.   For
example, the iommu can stash directly into a CPU's cache
and we have iommu mapping properties like the cache stash
target id and an operation mapping attribute.

How do you envision handling proprietary attributes
in struct vfio_dma_map?

Stuart



Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-11-30 Thread Stuart Yoder
On Tue, Nov 29, 2011 at 5:44 PM, Alex Williamson
 wrote:
> On Tue, 2011-11-29 at 17:20 -0600, Stuart Yoder wrote:
>> >
>> > BTW, github now has updated trees:
>> >
>> > git://github.com/awilliam/linux-vfio.git vfio-next-2029
>> > git://github.com/awilliam/qemu-vfio.git vfio-ng
>>
>> Hi Alex,
>>
>> Have been looking at vfio a bit.   A few observations and things
>> we'll need to figure out as it relates to the Freescale iommu.
>>
>> __vfio_dma_map() assumes that mappings are broken into
>> 4KB pages.   That will not be true for us.   We normally will be mapping
>> much larger physically contiguous chunks for our guests.  Guests will
>> get hugetlbfs backed memory with very large pages (e.g. 16MB,
>> 64MB) or very large chunks allocated by some proprietary
>> means.
>
> Hi Stuart,
>
> I think practically everyone has commented on the 4k mappings ;)  There
> are a few problems around this.  The first is that iommu drivers don't
> necessarily support sub-region unmapping, so if we map 1GB and later
> want to unmap 4k, we can't do it atomically.  4k gives us the most
> flexibility for supporting fine granularities.  Another problem is that
> we're using get_user_pages to pin memory.  It's been suggested that we
> should use mlock for this, but I can't find anything that prevents a
> user from later munlock'ing the memory and then getting access to memory
> they shouldn't have.  Those kind of limit us, but I don't see it being
> an API problem for VFIO, just implementation.

Ok.

>> Also, mappings will have additional Freescale-specific attributes
>> that need to get passed through to dma_map somehow.   For
>> example, the iommu can stash directly into a CPU's cache
>> and we have iommu mapping properties like the cache stash
>> target id and an operation mapping attribute.
>>
>> How do you envision handling proprietary attributes
>> in struct vfio_dma_map?
>
> Let me turn the question around, how do you plan to support proprietary
> attributes in the IOMMU API?  Is the user level the appropriate place to
> specify them, or are they an intrinsic feature of the domain?  We've
> designed struct vfio_dma_map for extension, so depending on how many
> bits you need, we can make a conduit using the flags directly or setting
> a new flag to indicate presence of an arch specific attributes field.

The attributes are not intrinsic features of the domain.  User space will
need to set them.  But in thinking about it a bit more I think the attributes
are more properties of the domain rather than a per map() operation
characteristic.  I think a separate API might be appropriate.  Define a
new set_domain_attrs() op in the iommu_ops.  In user space, perhaps
a new vfio group API-- VFIO_GROUP_SET_ATTRS, VFIO_GROUP_GET_ATTRS.

Stuart



Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-01 Thread Stuart Yoder
>> The attributes are not intrinsic features of the domain.  User space will
>> need to set them.  But in thinking about it a bit more I think the attributes
>> are more properties of the domain rather than a per map() operation
>> characteristic.  I think a separate API might be appropriate.  Define a
>> new set_domain_attrs() op in the iommu_ops.    In user space, perhaps
>>  a new vfio group API-- VFIO_GROUP_SET_ATTRS,
>> VFIO_GROUP_GET_ATTRS.
>
> In that case, you should definitely be following what Alexey is thinking
> about with an iommu_setup IOMMU API callback.  I think it's shaping up
> to do:
>
> x86:
>  - Report any IOVA range restrictions imposed by hw implementation
> POWER:
>  - Request IOVA window size, report size and base
> powerpc:
>  - Set domain attributes, probably report range as well.

One other mechanism we need as well is the ability to
enable/disable a domain.

For example-- suppose a device is assigned to a VM, the
device is in use when the VM is abruptly terminated.  The
VM terminate would shut off DMA at the IOMMU, but now
the device is in an indeterminate state.   Some devices
have no simple reset bit and getting the device back into
a sane state could be complicated-- something the hypervisor
doesn't want to do.

So now KVM restarts the VM, vfio init happens for the device
and  the IOMMU for that device is re-configured,
etc, but we really can't re-enable DMA until the guest OS tells us
(via an hcall) that it is ready.   The guest needs to get the
assigned device in a sane state before DMA is enabled.

Does this warrant a new domain enable/disable API, or should
we make this part of the setup API we are discussing
here?

Stuart



Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-02 Thread Stuart Yoder
On Thu, Dec 1, 2011 at 3:25 PM, Alex Williamson
 wrote:
> On Thu, 2011-12-01 at 14:58 -0600, Stuart Yoder wrote:
>> One other mechanism we need as well is the ability to
>> enable/disable a domain.
>>
>> For example-- suppose a device is assigned to a VM, the
>> device is in use when the VM is abruptly terminated.  The
>> VM terminate would shut off DMA at the IOMMU, but now
>> the device is in an indeterminate state.   Some devices
>> have no simple reset bit and getting the device back into
>> a sane state could be complicated-- something the hypervisor
>> doesn't want to do.
>>
>> So now KVM restarts the VM, vfio init happens for the device
>> and  the IOMMU for that device is re-configured,
>> etc, but we really can't re-enable DMA until the guest OS tells us
>> (via an hcall) that it is ready.   The guest needs to get the
>> assigned device in a sane state before DMA is enabled.
>
> Giant red flag.  We need to paravirtualize the guest?  Not on x86.

It's the reality we have to deal with, but doing this would obviously
only apply to platforms that need it.

> Some
> devices are better for assignment than others.  PCI devices are moving
> towards supporting standard reset mechanisms.
>
>> Does this warrant a new domain enable/disable API, or should
>> we make this part of the setup API we are discussing
>> here?
>
> What's wrong with simply not adding any DMA mapping entries until you
> think your guest is ready?  Isn't that effectively the same thing?
> Unmap ~= disable.  If the IOMMU API had a mechanism to toggle the iommu
> domain on and off, I wouldn't be opposed to adding an ioctl to do it,
> but it really seems like just a shortcut vs map/unmap.  Thanks,

Yes, we could do something like that I guess.

Stuart



[Qemu-devel] vfio / iommu domain attributes

2011-12-07 Thread Stuart Yoder
In the vfio RFC thread there seemed to be convergence that some new
iommu_ops API is needed to set some platform specific aspects of an
iommu domain.

On Wed, Nov 30, 2011 at 10:58 AM, Alex Williamson
 wrote:
[cut]
> In that case, you should definitely be following what Alexey is thinking
> about with an iommu_setup IOMMU API callback.  I think it's shaping up
> to do:
>
> x86:
>  - Report any IOVA range restrictions imposed by hw implementation
> POWER:
>  - Request IOVA window size, report size and base
> powerpc:
>  - Set domain attributes, probably report range as well.

Alex, Alexey I'm wondering if you've had any new thoughts on this over
the last week.

For Freescale, our iommu domain attributes would look something like:
-domain iova base address
-domain iova window size
-domain enable/disable
-number of subwindows
-operation mapping table index
-stash destination CPU
-stash target (cache: L1, L2, L3)

These are all things that need to be set by the creator of the domain.

Since the domain attributes are going to be so different for each platform does
it make sense to define a new iommu_ops call back that just takes a void pointer
that can be implemented in a platform specific way?   For example:

struct iommu_ops {
[cut]
int (*domain_set_attrs)(struct iommu_domain *domain,
  void *attrs);
int (*domain_get_attrs)(struct iommu_domain *domain,
  void *attrs);
}

Whatever this API winds up looking like it needs to be reflected in
the vfio interface to user space as well.

Thanks,
Stuart



Re: [Qemu-devel] vfio / iommu domain attributes

2011-12-07 Thread Stuart Yoder
On Wed, Dec 7, 2011 at 10:38 AM, Joerg Roedel  wrote:
> On Wed, Dec 07, 2011 at 09:54:39AM -0600, Stuart Yoder wrote:
>> Alex, Alexey I'm wondering if you've had any new thoughts on this over
>> the last week.
>>
>> For Freescale, our iommu domain attributes would look something like:
>>     -domain iova base address
>>     -domain iova window size
>
> I agree with that.
>
>>     -domain enable/disable
>>     -number of subwindows
>>     -operation mapping table index
>>     -stash destination CPU
>>     -stash target (cache: L1, L2, L3)
>
> Why does the user of the IOMMU-API need to have control over these
> things?

Our IOMMU complicates things in that it is used for more than just memory
protection and address translation.  It has a concept of operation
translation as well.  Some devices could do a 'write' transaction that,
when passing through the iommu, gets translated to a 'write-with-stash'.
Stashed transactions get pushed directly into some cache.

It's the entity creating and setting up the domain that will have the
knowledge of what cache is to be stashed to.  Right now software that uses
stashing is pinned to a CPU, but in the future it's possible that we'll
want to work without pinning and may need to update stashing attributes
on the fly.

The overall iova window for the domain can be divided into a configurable
number of subwindows (a power of 2, up to 256), which means we can have
a contiguous iova region backed by up to 256 physically discontiguous
subwindows.  The creator of the iommu domain is in the best position
to know how many subwindows are needed (the fewer the better for
performance reasons).

So, in short, the above list of attributes are the attributes of our iommu
hardware, and the knowledge of how they should be set is with the domain
creator.

>> These are all things that need to be set by the creator of the domain.
>>
>> Since the domain attributes are going to be so different for each platform
>> does it make sense to define a new iommu_ops call back that just takes a
>> void pointer that can be implemented in a platform specific way?   For
>> example:
>>
>>     struct iommu_ops {
>>         [cut]
>>         int (*domain_set_attrs)(struct iommu_domain *domain,
>>                               void *attrs);
>>         int (*domain_get_attrs)(struct iommu_domain *domain,
>>                               void *attrs);
>>     }
>
> A void pointer is certainly the worst choice for an interface. I think
> it is better to have at least a set of common attributes. Somthing like
> this:
>
> iommu_domain_set_attr(struct iommu_domain *domain, enum attr_type, void *data)
> iommu_domain_get_attr(struct iommu_domain *domain, enum attr_type, void *data)

That would be fine.

Stuart