from:"Blue Swirl"

Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-12 Thread Blue Swirl

On 6/12/09, Mark McLoughlin  wrote:
> On Fri, 2009-06-12 at 12:00 -0500, Anthony Liguori wrote:
>  > Mark McLoughlin wrote:
>  > > So, when libvirt creates a guest for the first time, it makes a copy of
>  > > the device tree and continues to use that even if qemu is upgraded.
>  > > That's enough to ensure compat is retained for all built-in devices.
>  > >
>  > > However, in order to retain compat for that SCSI device (e.g. ensuring
>  > > the PCI address doesn't change as other devices are added and removed),
>  > > we're back to the same problem ... either:
>  > >
>  > >   1) Use '-drive file=foo.img,if=scsi,pci_addr=foo'; in order to figure
>  > >  out what address to use, libvirt would need to query qemu for what
>  > >  address was originally allocated to device or it would do all the
>  > >  PCI address allocation itself ... or:
>  > >
>  > >   2) Don't use the command line, instead get a dump of the entire
>  > >  device tree (including the SCSI device) - if the device is to be
>  > >  removed or modified in future, libvirt would need to modify the
>  > >  device tree
>  > >
>  > > The basic problem would be that the command line config would have very
>  > > limited ability to override the device tree config.
>  > >
>  >
>  > After libvirt has done -drive file=foo... it should dump the machine
>  > config and use that from then on.
>
>  Right - libvirt then wouldn't be able to avoid the complexity of merging
>  any future changes into the dumped machine config.
>
>  > To combine into a single thread...
>  > > How do you add a new attribute to the device tree and, when a supplied
>  > > device tree lacks said attribute, distinguish between a device tree
>  > > from an old version of qemu (i.e. use the old default) and a partial
>  > > device tree from the VM manager (i.e. use the new default) ?
>  > >
>  >
>  > Please define "attribute".  I don't follow what you're asking.
>
>  e.g. a per-device "enable MSI support" flag.
>
>  If qemu is supplied with a device tree that lacks that flag, does it
>  enable or disable MSI?
>
>  Enable by default is bad - it could be a device tree dumped from an old
>  version of qemu, so compat would be broken.
>
>  Disable by default is bad - it could be a simple device tree supplied by
>  the user, and the latest features are wanted.
>
>  Maybe we want a per-device "this is a complete device description" flag
>  and if anything is missing from a supposedly complete description, the
>  old defaults would be used. A config dumped from qemu would have this
>  flag set, a config generated by libvirt would not have the flag.

If a device behaves differently or has different properties from the
guest's perspective than the old device, it should get a new device
type, so that the device tree can specify either the old device or the
new one. Flags won't help in the long term.
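
To make the idea concrete, here is a minimal sketch in which guest-visible
behaviour is tied to the device type name rather than to a per-device flag.
The names nic-v1 and nic-v2, the struct and the lookup function are all
hypothetical; none of this is existing qemu code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device type table; names and fields are illustrative only. */
struct device_type {
    const char *name;   /* name used in the device tree */
    int has_msi;        /* guest-visible property fixed by the type */
};

static const struct device_type device_types[] = {
    { "nic-v1", 0 },    /* old device: no MSI, behaviour frozen */
    { "nic-v2", 1 },    /* new device: MSI capable */
};

/* Look up a device type by the name given in the device tree. */
static const struct device_type *find_device_type(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(device_types) / sizeof(device_types[0]); i++) {
        if (strcmp(device_types[i].name, name) == 0) {
            return &device_types[i];
        }
    }
    return NULL;
}
```

With this shape, a device tree dumped from an old version keeps naming
nic-v1 and therefore keeps the old behaviour, while a new tree can ask for
nic-v2 explicitly. A missing attribute never has to be guessed.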
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-15 Thread Blue Swirl

On 6/15/09, Avi Kivity  wrote:
> On 06/15/2009 09:12 PM, Anthony Liguori wrote:
>
> >
> > 2) Whenever the default machine type changes in a guest-visible way,
> > introduce a new machine type
> >
>
>  s/whenever/qemu stable release/
>
>
> >  - Use explicit versions in name: pc-v1, pc-v2
> >
>
>  pc-qemu-0.10?

pc-2009.06? Or given the hardware, should that be pc-1997?

Re: [Qemu-devel] Planning for the 0.11.0 release

2009-06-23 Thread Blue Swirl

On 6/23/09, Anthony Liguori  wrote:
> Hi,
>
>  It's getting to be about the time to start thinking about the 0.11.0
> release.  0.10.0 was released on March 2nd so following with the 6 month
> release cycle, that would put 0.11.0 at September 2nd.
>
>  Based on the experiences with the stable releases, here's what I'd
> recommend:
>
>  o On July 15th, fork master -> stable-0.11
>  o Change version to 0.10.90
>  o Release qemu-0.11.0-rc1
>  o Release additional -rcN releases every 1-2 weeks
>  o Introduce a new maintainer for stable-0.10 (via git pulls)
>  o At least 1 week before release, hopefully we'll have the final -rcN that
> we can then declare 0.11.0.

Sounds OK. I think OpenBIOS releases should follow a similar schedule,
maybe even with matching SVN tags (1.1-rc1 for 0.11.0-rc1 etc).

>  I think we should really try hard to make these dates.  I only have a few
> things that I would like to see happen before forking stable-0.11.  Namely:
>
>  o Setup qemu.org infrastructure (git hosting, wiki)
>  o Setup qemu bug tracker (see next mail)
>  o Include all ROM source code in tree via git submodules.  This is a major
> headache for distributors and I think it's important to resolve before our
> next release.

I think this is great, but OpenBIOS still uses Subversion. Can git
use an SVN repository as a submodule, for example?

Re: [Qemu-devel] [PATCH] virtio_serial: A char device for simple guest <-> host communication

2009-06-23 Thread Blue Swirl

On 6/23/09, Amit Shah  wrote:
> We expose multiple char devices ("ports") for simple communication
>  between the host userspace and guest.
>  +struct virtio_serial_config {
>  +   __u32 nr_ports;
>  +   __u16 status;
>  +} __attribute__((packed));

The structure is still packed. I'd use __u16 for both fields; do you
really need 4 gigs of ports?
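
A small sketch of why two 16-bit fields make the packed attribute
unnecessary. The struct names are made up for the comparison, and the size
checks assume a GCC-compatible compiler on a mainstream ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout from the patch: a 32-bit field followed by a 16-bit one.
 * Without the attribute, a typical ABI would pad this to 8 bytes. */
struct config_packed {
    uint32_t nr_ports;
    uint16_t status;
} __attribute__((packed));

/* Suggested alternative: both fields are naturally aligned, so the
 * compiler inserts no padding and no attribute is needed. */
struct config_u16 {
    uint16_t nr_ports;
    uint16_t status;
};

/* Compile-time checks that the layout is what both sides expect. */
static_assert(sizeof(struct config_packed) == 6, "packed layout is 6 bytes");
static_assert(sizeof(struct config_u16) == 4, "u16 layout has no padding");
static_assert(offsetof(struct config_u16, status) == 2, "status at offset 2");
```

Checks like these on both the guest and the host definition would also
catch the two copies of the structure drifting apart.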

Re: [Qemu-devel] [PATCH] virtio-serial: virtio device for simple host <-> guest communication

2009-06-23 Thread Blue Swirl

On 6/23/09, Amit Shah  wrote:
> This interface presents a char device from which bits can be
>  sent and read.

>  +struct virtio_serial_config
>  +{
>  +uint32_t nr_ports;
>  +uint16_t status;
>  +} __attribute__((packed));

Obviously this has to match the kernel structure if you go for a 16-bit nr_ports.

Re: [Qemu-devel] QEMU bug tracker on Launchpad

2009-06-23 Thread Blue Swirl

On 6/23/09, Anthony Liguori  wrote:
> Dustin Kirkland was kind enough to setup a bug tracker for QEMU on
> Launchpad.  I would like to make this the official QEMU bug tracker unless
> there is significant objection.

The links on the Code tab do not show which tree is ours, and there
are some Ubuntu trees:
https://code.launchpad.net/qemu

Can that be fixed?

>  There are a number of QEMU/KVM bug trackers mostly distribution centric
> today.  Having a common upstream bug tracker will help immensely in
> coordinating bug fixing effort for those people who are interested in that
> sort of thing :-)
>
>  Here are some of the requirements I had for a bug tracker in no particular
> order:
>
>  1) minimal work required for the QEMU maintainers
>  2) ability to link bugs to external bug trackers
>  3) ability to control bug status via mails
>  4) API for scripting
>
>  The biggest issue for me with Launchpad was that it is not open source
> today.  Canonical is actively working to release the source code though and
> have scheduled a date for release this July.  See the link below for more
> information.
>
>  I've already begun using this bug tracker so you can look through it today
> to get a feeling for what it's like.
>
>  https://bugs.launchpad.net/qemu
>  https://bugs.launchpad.net/qemu/+bugs
>  https://dev.launchpad.net/OpenSourcing
>
>  --
>  Regards,
>
>  Anthony Liguori
>
>
>
>

Re: [Qemu-devel] [PATCHv3 1/3] qemu/msi: fix segfault in msix_save

2009-07-05 Thread Blue Swirl

On 7/5/09, Michael S. Tsirkin  wrote:
> This fixes segfault reported by Kevin Wolf,
>  and simplifies the code in msix_save.

>  +if (!dev->cap_present & QEMU_PCI_CAP_MSIX)
>  +return;

Dubious: !x & y. You also forgot the braces.
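
To spell out the precedence problem: unary ! binds tighter than binary &,
so !x & y is parsed as (!x) & y. A minimal sketch follows; the flag values
and function names are invented for the illustration and are not the qemu
definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define CAP_BIT0 0x1u
#define CAP_BIT1 0x2u

/* Buggy form: evaluates (!cap_present) & cap. !cap_present is 0 or 1,
 * so the result can only be non-zero when cap has bit 0 set. */
static int cap_absent_buggy(uint32_t cap_present, uint32_t cap)
{
    return !cap_present & cap;
}

/* Intended form: true exactly when the capability bit is not set. */
static int cap_absent_fixed(uint32_t cap_present, uint32_t cap)
{
    return !(cap_present & cap);
}
```

For example, with cap_present == CAP_BIT1 the buggy test for CAP_BIT0
yields 0 (capability reported as present) while the fixed test yields 1.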

Re: [PATCHv4 4/5] qemu/msi: missing braces

2009-07-05 Thread Blue Swirl

On 7/5/09, Michael S. Tsirkin  wrote:
> MSIX present bit is tested incorrectly, and only happens to work because
>  the bit we are testing is 0x1.  Add braces to fix this.
>
>  Reported-by: Blue Swirl 
>  Signed-off-by: Michael S. Tsirkin 
>  ---
>   hw/msix.c |2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
>  diff --git a/hw/msix.c b/hw/msix.c
>  index 33549f5..db72cc3 100644
>  --- a/hw/msix.c
>  +++ b/hw/msix.c
>  @@ -298,7 +298,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
>   {
>  unsigned n = dev->msix_entries_nr;
>
>  -if (!dev->cap_present & QEMU_PCI_CAP_MSIX)
>  +if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))

With the braces comment I meant that while working on the code, you
should update it to match CODING_STYLE:
if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
return;
}

Re: [PATCHv4 4/5] qemu/msi: missing braces

2009-07-05 Thread Blue Swirl

On 7/5/09, Michael S. Tsirkin  wrote:
> On Sun, Jul 05, 2009 at 02:48:12PM +0300, Blue Swirl wrote:
>  > On 7/5/09, Michael S. Tsirkin  wrote:
>  > > MSIX present bit is tested incorrectly, and only happens to work because
>  > >  the bit we are testing is 0x1.  Add braces to fix this.
>  > >
>  > >  Reported-by: Blue Swirl 
>  > >  Signed-off-by: Michael S. Tsirkin 
>  > >  ---
>  > >   hw/msix.c |2 +-
>  > >   1 files changed, 1 insertions(+), 1 deletions(-)
>  > >
>  > >  diff --git a/hw/msix.c b/hw/msix.c
>  > >  index 33549f5..db72cc3 100644
>  > >  --- a/hw/msix.c
>  > >  +++ b/hw/msix.c
>  > >  @@ -298,7 +298,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
>  > >   {
>  > >  unsigned n = dev->msix_entries_nr;
>  > >
>  > >  -if (!dev->cap_present & QEMU_PCI_CAP_MSIX)
>  > >  +if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
>  >
>  > With the braces comment I meant that while working on the code, you
>  > should update it to match CODING_STYLE:
>  > if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
>  > return;
>  > }
>
>
> Yea ... it's probably better to do this all over the file, not piecewise,
>  though. No?

I think it's better to do it together with other changes:
http://lists.gnu.org/archive/html/qemu-devel/2009-05/msg00925.html

Re: [Qemu-devel] [PATCH] rev5: support colon in filenames

2009-07-15 Thread Blue Swirl

On 7/15/09, Ram Pai  wrote:
> Problem: It is impossible to feed filenames with the character colon because
>  qemu interprets such names as a protocol. For example filename scsi:0, is
>  interpreted as a protocol by name "scsi".

>  --- a/block/raw-posix.c
>  +++ b/block/raw-posix.c
>  +static int qemu_open(const char *filename, int flags, ...)

>  --- a/block/raw-win32.c
>  +++ b/block/raw-win32.c
>  +fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,

I bet this won't compile on win32.

Instead of this (IMHO doomed) escape approach, maybe the filename
parameter could be specified as the next argument, for example:
-hda format=qcow2,blah,blah,filename_is_next_arg -hda "filename with
funky characters like ',' ':' & '!'"
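
A rough sketch of how such parsing could look. The marker
filename_is_next_arg comes from the example above; the helper names are
hypothetical and this is not how qemu parses its command line today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker taken from the example; not a real QEMU option. */
#define NEXT_ARG_MARKER "filename_is_next_arg"

/* Return 1 if the comma-separated option string ends with the marker. */
static int ends_with_marker(const char *opts)
{
    size_t olen, mlen;

    assert(opts != NULL);
    olen = strlen(opts);
    mlen = strlen(NEXT_ARG_MARKER);
    if (olen < mlen) {
        return 0;
    }
    if (strcmp(opts + olen - mlen, NEXT_ARG_MARKER) != 0) {
        return 0;
    }
    /* The marker must be a whole element, not the tail of another one. */
    return olen == mlen || opts[olen - mlen - 1] == ',';
}

/*
 * Expected layout starting at argv[i]:
 *   -hda <options ending in marker> -hda <filename>
 * The filename is returned verbatim, so ',' or ':' in it need no escaping.
 * Returns NULL if the arguments do not follow this layout.
 */
static const char *filename_for(int argc, char **argv, int i)
{
    if (i + 3 < argc &&
        strcmp(argv[i], argv[i + 2]) == 0 &&
        ends_with_marker(argv[i + 1])) {
        return argv[i + 3];
    }
    return NULL;
}
```

Since the filename is a separate argv element, the shell's own quoting is
the only quoting the user has to think about.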

Re: [Qemu-devel] [PATCH] rev5: support colon in filenames

2009-07-15 Thread Blue Swirl

On 7/15/09, Anthony Liguori  wrote:
> Blue Swirl wrote:
>
> > I bet this won't compile on win32.
> >
> > Instead of this (IMHO doomed) escape approach, maybe the filename
> > parameter could be specified as the next argument, for example:
> > -hda format=qcow2,blah,blah,filename_is_next_arg -hda "filename with
> > funky characters like ',' ':' & '!'"
> >
> >
>
>  -drive name=hda,if=ide,cache=off -hda foo.img
>  -drive name=vda,if=virtio,cache=writeback -vda foo.img
>  -drive name=sdb,if=scsi,unit=1 -sdb boo.img
>
>  But Paul has long objected to having -vda or -sda syntaxes.  I do agree
> though that the most sane thing to do is to make the filename an independent
> argument.

Then how about something like:
 -drive name=hda,if=ide,cache=off,file_is_arg -filearg foo.img
 -drive name=vda,if=virtio,cache=writeback,file_comes_next  -patharg  foo.img
 -drive name=sdb,if=scsi,unit=1,fnarg -fnarg boo.img

Re: [Qemu-devel] [PATCH] Fix fallouts from Linux header inclusion

2011-06-26 Thread Blue Swirl

Thanks, applied.

On Thu, Jun 23, 2011 at 11:05 AM, Jan Kiszka  wrote:
> From: Jan Kiszka 
>
> This is an all-in-one fix for the smaller and bigger mistakes of the
> build system changes for accompanied Linux headers:
>  - only enable KVM and vhost on Linux hosts
>  - fix powerpc asm header symlink
>  - do not use Linux headers on non-Linux hosts
>  - fix kvmclock for !CONFIG_KVM
>  - fix s390 build on non-Linux hosts
>
> Signed-off-by: Jan Kiszka 
> ---
>
> Let me know if separate patches are preferred for this.
>
>  Makefile.target          |    8 ++--
>  configure                |   34 +++---
>  hw/kvmclock.h            |   10 ++
>  target-s390x/op_helper.c |    6 +-
>  4 files changed, 40 insertions(+), 18 deletions(-)
>
> diff --git a/Makefile.target b/Makefile.target
> index 03d3646..d3971a6 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -14,7 +14,10 @@ endif
>
>  TARGET_PATH=$(SRC_PATH)/target-$(TARGET_BASE_ARCH)
>  $(call set-vpath, $(SRC_PATH):$(TARGET_PATH):$(SRC_PATH)/hw)
> -QEMU_CFLAGS+= -I.. -I../linux-headers -I$(TARGET_PATH) -DNEED_CPU_H
> +ifdef CONFIG_LINUX
> +QEMU_CFLAGS += -I../linux-headers
> +endif
> +QEMU_CFLAGS += -I.. -I$(TARGET_PATH) -DNEED_CPU_H
>
>  include $(SRC_PATH)/Makefile.objs
>
> @@ -234,7 +237,8 @@ obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmport.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o kvmclock.o
> +obj-i386-y += pc_piix.o
> +obj-i386-$(CONFIG_KVM) += kvmclock.o
>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>
>  # shared objects
> diff --git a/configure b/configure
> index 856b41e..e523976 100755
> --- a/configure
> +++ b/configure
> @@ -113,7 +113,7 @@ curl=""
>  curses=""
>  docs=""
>  fdt=""
> -kvm="yes"
> +kvm=""
>  nptl=""
>  sdl=""
>  vnc="yes"
> @@ -129,7 +129,7 @@ xen=""
>  xen_ctrl_version=""
>  linux_aio=""
>  attr=""
> -vhost_net="yes"
> +vhost_net=""
>  xfs=""
>
>  gprof="no"
> @@ -457,6 +457,8 @@ Haiku)
>   linux="yes"
>   linux_user="yes"
>   usb="linux"
> +  kvm="yes"
> +  vhost_net="yes"
>   if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
>     audio_possible_drivers="$audio_possible_drivers fmod"
>   fi
> @@ -3444,19 +3446,21 @@ if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
>  fi
>
>  # use included Linux headers
> -includes="-I\$(SRC_PATH)/linux-headers $includes"
> -mkdir -p linux-headers
> -case "$cpu" in
> -i386|x86_64)
> -  symlink $source_path/linux-headers/asm-x86 linux-headers/asm
> -  ;;
> -ppcemb|ppc|ppc64)
> -  symlink $source_path/linux-headers/asm-x86 linux-headers/asm
> -  ;;
> -s390x)
> -  symlink $source_path/linux-headers/asm-s390 linux-headers/asm
> -  ;;
> -esac
> +if test "$linux" = "yes" ; then
> +  includes="-I\$(SRC_PATH)/linux-headers $includes"
> +  mkdir -p linux-headers
> +  case "$cpu" in
> +  i386|x86_64)
> +    symlink $source_path/linux-headers/asm-x86 linux-headers/asm
> +    ;;
> +  ppcemb|ppc|ppc64)
> +    symlink $source_path/linux-headers/asm-powerpc linux-headers/asm
> +    ;;
> +  s390x)
> +    symlink $source_path/linux-headers/asm-s390 linux-headers/asm
> +    ;;
> +  esac
> +fi
>
>  echo "LDFLAGS+=$ldflags" >> $config_target_mak
>  echo "QEMU_CFLAGS+=$cflags" >> $config_target_mak
> diff --git a/hw/kvmclock.h b/hw/kvmclock.h
> index 7a83cbe..252ea13 100644
> --- a/hw/kvmclock.h
> +++ b/hw/kvmclock.h
> @@ -11,4 +11,14 @@
>  *
>  */
>
> +#ifdef CONFIG_KVM
> +
>  void kvmclock_create(void);
> +
> +#else /* CONFIG_KVM */
> +
> +static inline void kvmclock_create(void)
> +{
> +}
> +
> +#endif /* !CONFIG_KVM */
> diff --git a/target-s390x/op_helper.c b/target-s390x/op_helper.c
> index 9429698..6a3c1f6 100644
> --- a/target-s390x/op_helper.c
> +++ b/target-s390x/op_helper.c
> @@ -23,8 +23,10 @@
>  #include "helpers.h"
>  #include 
>  #include "kvm.h"
> -#include 
>  #include "qemu-timer.h"
> +#ifdef CONFIG_KVM
> +#include 
> +#endif
>
>  /*/
>  /* Softmmu support */
> @@ -2332,7 +2334,9 @@ static void program_interrupt(CPUState *env, uint32_t code, int ilc)
>     qemu_log("program interrupt at %#" PRIx64 "\n", env->psw.addr);
>
>     if (kvm_enabled()) {
> +#ifdef CONFIG_KVM
>         kvm_s390_interrupt(env, KVM_S390_PROGRAM_INT, code);
> +#endif
>     } else {
>         env->int_pgm_code = code;
>         env->int_pgm_ilc = ilc;
>
>

Re: nested VMX + unrelated qemu bug

2011-07-16 Thread Blue Swirl

On Sat, Jul 16, 2011 at 10:53 AM, Alexander Graf  wrote:
>
> On 14.07.2011, at 17:22, Bernhard M. Wiedemann wrote:
>
>> Hi,
>>
>> I tried nested VMX on Xeon E5630 and it worked really well with the Kernel 
>> from avi's git and 0.14.0
>> (with modprobe kvm-intel nested=1)
>>
>>
>> but in the process I found that qemu built from
>> git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
>> crashed when started with -m 3600 or more
>> while booting into openSUSE-11.3  where 0.14.0 worked well (even with 5GB)
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -drive 
>> file=opensuse-113-64.img,if=virtio,boot=on -m 3600 -serial stdio -vnc :9
>> Could not open option rom 'extboot.bin': No such file or directory
>> doing fast boot
>> Creating device nodes with udev
>> Trying manual resume from /dev/vda1
>> Invoking userspace resume from /dev/vda1
>> resume: libgcrypt version: 1.4.4
>> Trying manual resume from /dev/vda1
>> Invoking in-kernel resume from /dev/vda1
>> Waiting for device /dev/vda2 to appear:  ok
>> fsck from util-linux-ng 2.17.2
>> [/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/vda2
>> /dev/vda2: recovering journal
>> /dev/vda2: clean, 62553/230608 files, 449938/922112 blocks
>> fsck succeeded. Mounting root device read-write.
>> Mounting root /dev/vda2
>> mount -o rw,acl,user_xattr -t ext4 /dev/vda2 /root
>>
>> Bad ram offset 1009cc000
>> Aborted
>
> Ah, the infamous memory map bug.
>
> Anthony, could you please pull the xen-next branch so this one finally gets 
> fixed? The following patch should resolve that issue:
>
> commit f221e5ac378feea71d9857ddaa40f829c511742f
> Author: Stefano Stabellini 
> Date:   Mon Jun 27 18:26:06 2011 +0100
>
>    qemu_ram_ptr_length: take ram_addr_t as arguments
>
>    qemu_ram_ptr_length should take ram_addr_t as argument rather than
>    target_phys_addr_t because is doing comparisons with RAMBlock addresses.
>
>    cpu_physical_memory_map should create a ram_addr_t address to pass to
>    qemu_ram_ptr_length from PhysPageDesc phys_offset.
>
>    Remove code after abort() in qemu_ram_ptr_length.
>
>
> Otherwise, Blue - as you do have commit rights as well - Anthony seems to be 
> rather busy these days. Could you please jump in and commit the outstanding 
> pull requests from maintainers?

The pull would break the build:
  LINK  alpha-softmmu/qemu-system-alpha
../xen_console.o: In function `con_init':
/src/qemu/hw/xen_console.c:208: undefined reference to
`xenstore_store_pv_console_info'

Re: [Qemu-devel] [RFC v4 11/58] memory: add ioeventfd support

2011-07-20 Thread Blue Swirl

On Sun, Jul 17, 2011 at 2:13 PM, Avi Kivity  wrote:
> As with the rest of the memory API, the caller associates an eventfd
> with an address, and the memory API takes care of registering or
> unregistering when the address is made visible or invisible to the
> guest.
>
> Signed-off-by: Avi Kivity 
> ---
>  memory.c |  218 ++
>  memory.h |   20 ++
>  2 files changed, 238 insertions(+), 0 deletions(-)
>
> diff --git a/memory.c b/memory.c
> index e4446a0..cc5a0a4 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -15,6 +15,7 @@
>  #include "exec-memory.h"
>  #include "ioport.h"
>  #include "bitops.h"
> +#include "kvm.h"
>  #include 
>
>  typedef struct AddrRange AddrRange;
> @@ -64,6 +65,38 @@ struct CoalescedMemoryRange {
>     QTAILQ_ENTRY(CoalescedMemoryRange) link;
>  };
>
> +struct MemoryRegionIoeventfd {
> +    AddrRange addr;
> +    bool match_data;
> +    uint64_t data;
> +    int fd;
> +};
> +
> +static bool memory_region_ioeventfd_before(MemoryRegionIoeventfd a,
> +                                           MemoryRegionIoeventfd b)
> +{
> +    if (a.addr.start < b.addr.start) return true;
> +    if (a.addr.start > b.addr.start) return false;
> +    if (a.addr.size < b.addr.size) return true;
> +    if (a.addr.size > b.addr.size) return false;
> +    if (a.match_data < b.match_data) return true;
> +    if (a.match_data > b.match_data) return false;
> +    if (a.match_data) {
> +        if (a.data < b.data) return true;
> +        if (a.data > b.data) return false;
> +    }
> +    if (a.fd < b.fd) return true;
> +    if (a.fd > b.fd) return false;

NACK for CODING_STYLE.
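
For reference, a sketch of how the comparison could be written with braces
on every if, as CODING_STYLE asks. The struct is a simplified stand-in with
the address range flattened into two fields, and the function names are
shortened; the ordering is intended to be the same as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the struct in the patch (AddrRange flattened). */
typedef struct Ioeventfd {
    uint64_t start;
    uint64_t size;
    bool match_data;
    uint64_t data;
    int fd;
} Ioeventfd;

/* Strict ordering: start, size, match_data, data (if matched), fd. */
static bool ioeventfd_before(Ioeventfd a, Ioeventfd b)
{
    if (a.start != b.start) {
        return a.start < b.start;
    }
    if (a.size != b.size) {
        return a.size < b.size;
    }
    if (a.match_data != b.match_data) {
        return a.match_data < b.match_data;
    }
    if (a.match_data && a.data != b.data) {
        return a.data < b.data;
    }
    return a.fd < b.fd;
}

/* Two entries are equal when neither sorts before the other. */
static bool ioeventfd_equal(Ioeventfd a, Ioeventfd b)
{
    return !ioeventfd_before(a, b) && !ioeventfd_before(b, a);
}
```
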

> +    return false;
> +}
> +
> +static bool memory_region_ioeventfd_equal(MemoryRegionIoeventfd a,
> +                                          MemoryRegionIoeventfd b)
> +{
> +    return !memory_region_ioeventfd_before(a, b)
> +        && !memory_region_ioeventfd_before(b, a);
> +}
> +
>  typedef struct FlatRange FlatRange;
>  typedef struct FlatView FlatView;
>
> @@ -92,6 +125,8 @@ struct AddressSpace {
>     const AddressSpaceOps *ops;
>     MemoryRegion *root;
>     FlatView current_map;
> +    int ioeventfd_nb;
> +    MemoryRegionIoeventfd *ioeventfds;
>  };
>
>  struct AddressSpaceOps {
> @@ -99,6 +134,8 @@ struct AddressSpaceOps {
>     void (*range_del)(AddressSpace *as, FlatRange *fr);
>     void (*log_start)(AddressSpace *as, FlatRange *fr);
>     void (*log_stop)(AddressSpace *as, FlatRange *fr);
> +    void (*ioeventfd_add)(AddressSpace *as, MemoryRegionIoeventfd *fd);
> +    void (*ioeventfd_del)(AddressSpace *as, MemoryRegionIoeventfd *fd);
>  };
>
>  #define FOR_EACH_FLAT_RANGE(var, view)          \
> @@ -201,11 +238,37 @@ static void as_memory_log_stop(AddressSpace *as, FlatRange *fr)
>     cpu_physical_log_stop(fr->addr.start, fr->addr.size);
>  }
>
> +static void as_memory_ioeventfd_add(AddressSpace *as, MemoryRegionIoeventfd *fd)
> +{
> +    int r;
> +
> +    if (!fd->match_data || fd->addr.size != 4) {
> +        abort();
> +    }
> +
> +    r = kvm_set_ioeventfd_mmio_long(fd->fd, fd->addr.start, fd->data, true);
> +    if (r < 0) {
> +        abort();
> +    }
> +}
> +
> +static void as_memory_ioeventfd_del(AddressSpace *as, MemoryRegionIoeventfd *fd)
> +{
> +    int r;
> +
> +    r = kvm_set_ioeventfd_mmio_long(fd->fd, fd->addr.start, fd->data, false);
> +    if (r < 0) {
> +        abort();
> +    }
> +}
> +
>  static const AddressSpaceOps address_space_ops_memory = {
>     .range_add = as_memory_range_add,
>     .range_del = as_memory_range_del,
>     .log_start = as_memory_log_start,
>     .log_stop = as_memory_log_stop,
> +    .ioeventfd_add = as_memory_ioeventfd_add,
> +    .ioeventfd_del = as_memory_ioeventfd_del,
>  };
>
>  static AddressSpace address_space_memory = {
> @@ -281,9 +344,35 @@ static void as_io_range_del(AddressSpace *as, FlatRange *fr)
>     isa_unassign_ioport(fr->addr.start, fr->addr.size);
>  }
>
> +static void as_io_ioeventfd_add(AddressSpace *as, MemoryRegionIoeventfd *fd)
> +{
> +    int r;
> +
> +    if (!fd->match_data || fd->addr.size != 2) {
> +        abort();
> +    }
> +
> +    r = kvm_set_ioeventfd_pio_word(fd->fd, fd->addr.start, fd->data, true);
> +    if (r < 0) {
> +        abort();
> +    }
> +}
> +
> +static void as_io_ioeventfd_del(AddressSpace *as, MemoryRegionIoeventfd *fd)
> +{
> +    int r;
> +
> +    r = kvm_set_ioeventfd_pio_word(fd->fd, fd->addr.start, fd->data, false);
> +    if (r < 0) {
> +        abort();
> +    }
> +}
> +
>  static const AddressSpaceOps address_space_ops_io = {
>     .range_add = as_io_range_add,
>     .range_del = as_io_range_del,
> +    .ioeventfd_add = as_io_ioeventfd_add,
> +    .ioeventfd_del = as_io_ioeventfd_del,
>  };
>
>  static AddressSpace address_space_io = {
> @@ -382,6 +471,69 @@ static FlatView generate_memory_topology(MemoryRegion *mr)
>     return view;
>  }
>
> +static void address_space_add_del_ioeventfds(AddressSpace *as,
> +

Re: [Qemu-devel] [RFC v5 12/86] memory: add ioeventfd support

2011-07-21 Thread Blue Swirl

On Wed, Jul 20, 2011 at 7:49 PM, Avi Kivity  wrote:
> As with the rest of the memory API, the caller associates an eventfd
> with an address, and the memory API takes care of registering or
> unregistering when the address is made visible or invisible to the
> guest.
>
> Signed-off-by: Avi Kivity 
> ---
>  memory.c |  218 ++
>  memory.h |   20 ++
>  2 files changed, 238 insertions(+), 0 deletions(-)
>
> diff --git a/memory.c b/memory.c
> index e4446a0..cc5a0a4 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -15,6 +15,7 @@
>  #include "exec-memory.h"
>  #include "ioport.h"
>  #include "bitops.h"
> +#include "kvm.h"
>  #include 
>
>  typedef struct AddrRange AddrRange;
> @@ -64,6 +65,38 @@ struct CoalescedMemoryRange {
>     QTAILQ_ENTRY(CoalescedMemoryRange) link;
>  };
>
> +struct MemoryRegionIoeventfd {
> +    AddrRange addr;
> +    bool match_data;
> +    uint64_t data;
> +    int fd;
> +};
> +
> +static bool memory_region_ioeventfd_before(MemoryRegionIoeventfd a,
> +                                           MemoryRegionIoeventfd b)
> +{
> +    if (a.addr.start < b.addr.start) return true;
> +    if (a.addr.start > b.addr.start) return false;
> +    if (a.addr.size < b.addr.size) return true;
> +    if (a.addr.size > b.addr.size) return false;
> +    if (a.match_data < b.match_data) return true;
> +    if (a.match_data > b.match_data) return false;
> +    if (a.match_data) {
> +        if (a.data < b.data) return true;
> +        if (a.data > b.data) return false;
> +    }
> +    if (a.fd < b.fd) return true;
> +    if (a.fd > b.fd) return false;

NACK for CO.. Wait, is this another trap?

> +    return false;
> +}
> +
> +static bool memory_region_ioeventfd_equal(MemoryRegionIoeventfd a,
> +                                          MemoryRegionIoeventfd b)
> +{
> +    return !memory_region_ioeventfd_before(a, b)
> +        && !memory_region_ioeventfd_before(b, a);
> +}
> +
>  typedef struct FlatRange FlatRange;
>  typedef struct FlatView FlatView;
>
> @@ -92,6 +125,8 @@ struct AddressSpace {
>     const AddressSpaceOps *ops;
>     MemoryRegion *root;
>     FlatView current_map;
> +    int ioeventfd_nb;
> +    MemoryRegionIoeventfd *ioeventfds;
>  };
>
>  struct AddressSpaceOps {
> @@ -99,6 +134,8 @@ struct AddressSpaceOps {
>     void (*range_del)(AddressSpace *as, FlatRange *fr);
>     void (*log_start)(AddressSpace *as, FlatRange *fr);
>     void (*log_stop)(AddressSpace *as, FlatRange *fr);
> +    void (*ioeventfd_add)(AddressSpace *as, MemoryRegionIoeventfd *fd);
> +    void (*ioeventfd_del)(AddressSpace *as, MemoryRegionIoeventfd *fd);
>  };
>
>  #define FOR_EACH_FLAT_RANGE(var, view)          \
> @@ -201,11 +238,37 @@ static void as_memory_log_stop(AddressSpace *as, FlatRange *fr)
>     cpu_physical_log_stop(fr->addr.start, fr->addr.size);
>  }
>
> +static void as_memory_ioeventfd_add(AddressSpace *as, MemoryRegionIoeventfd *fd)
> +{
> +    int r;
> +
> +    if (!fd->match_data || fd->addr.size != 4) {
> +        abort();
> +    }
> +
> +    r = kvm_set_ioeventfd_mmio_long(fd->fd, fd->addr.start, fd->data, true);
> +    if (r < 0) {
> +        abort();
> +    }
> +}
> +
> +static void as_memory_ioeventfd_del(AddressSpace *as, MemoryRegionIoeventfd *fd)
> +{
> +    int r;
> +
> +    r = kvm_set_ioeventfd_mmio_long(fd->fd, fd->addr.start, fd->data, false);
> +    if (r < 0) {
> +        abort();
> +    }
> +}
> +
>  static const AddressSpaceOps address_space_ops_memory = {
>     .range_add = as_memory_range_add,
>     .range_del = as_memory_range_del,
>     .log_start = as_memory_log_start,
>     .log_stop = as_memory_log_stop,
> +    .ioeventfd_add = as_memory_ioeventfd_add,
> +    .ioeventfd_del = as_memory_ioeventfd_del,
>  };
>
>  static AddressSpace address_space_memory = {
> @@ -281,9 +344,35 @@ static void as_io_range_del(AddressSpace *as, FlatRange *fr)
>     isa_unassign_ioport(fr->addr.start, fr->addr.size);
>  }
>
> +static void as_io_ioeventfd_add(AddressSpace *as, MemoryRegionIoeventfd *fd)
> +{
> +    int r;
> +
> +    if (!fd->match_data || fd->addr.size != 2) {
> +        abort();
> +    }
> +
> +    r = kvm_set_ioeventfd_pio_word(fd->fd, fd->addr.start, fd->data, true);
> +    if (r < 0) {
> +        abort();
> +    }
> +}
> +
> +static void as_io_ioeventfd_del(AddressSpace *as, MemoryRegionIoeventfd *fd)
> +{
> +    int r;
> +
> +    r = kvm_set_ioeventfd_pio_word(fd->fd, fd->addr.start, fd->data, false);
> +    if (r < 0) {
> +        abort();
> +    }
> +}
> +
>  static const AddressSpaceOps address_space_ops_io = {
>     .range_add = as_io_range_add,
>     .range_del = as_io_range_del,
> +    .ioeventfd_add = as_io_ioeventfd_add,
> +    .ioeventfd_del = as_io_ioeventfd_del,
>  };
>
>  static AddressSpace address_space_io = {
> @@ -382,6 +471,69 @@ static FlatView generate_memory_topology(MemoryRegion *mr)
>     return view;
>  }
>
> +static void address_space_add_del_ioeventfds(AddressSpace *as,
> +

Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()

2011-07-25 Thread Blue Swirl

On Mon, Jul 25, 2011 at 1:09 PM, Avi Kivity  wrote:
> On 07/25/2011 01:04 PM, Alexander Graf wrote:
>>
>> On 25.07.2011, at 12:02, Avi Kivity wrote:
>>
>> >  On 07/25/2011 12:56 PM, Alexander Graf wrote:
>> >>  >
>> >>  >   That argument can be used to block any change.  You'll get used to
>> >> it in time.  The question is, is the new interface better or not.
>> >>
>> >>  I agree that it keeps you from accidently malloc'ing a struct of
>> >> pointer size. But couldn't we also just add this to checkpatch.pl?
>> >
>> >  Better APIs trump better patch review.
>>
>> Only if you enforce them. The only sensible thing for QEMU_NEW (despite
>> the general rule of upper case macros, I'd actually prefer this one to be
>> lower case though since it's so often used) would be to remove qemu_malloc,
>> declare malloc() as unusable and convert all users of qemu_malloc() to
>> qemu_new().
>
> Some qemu_mallocs() will remain (allocating a byte array or something
> variable sized).
>
> I agree qemu_new() will be nicer, but that will have to wait until Blue is
> several light-days away from Earth.

There is no escape. Don't make me destroy you. You cannot hide forever, Luke.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()

2011-07-25 Thread Blue Swirl

On Mon, Jul 25, 2011 at 3:21 PM, Anthony Liguori  wrote:
> On 07/25/2011 07:18 AM, Avi Kivity wrote:
>>
>> On 07/25/2011 03:11 PM, Anthony Liguori wrote:
>>>
>>> On 07/25/2011 03:51 AM, Avi Kivity wrote:

>>>> qemu_malloc() is type-unsafe as it returns a void pointer. Introduce
>>>> QEMU_NEW() (and QEMU_NEWZ()), which return the correct type.
>>>
>>> Just use g_new() and g_new0()
>>>
>>
>> These bypass qemu_malloc(). Are we okay with that?
>
> Yes.  We can just make qemu_malloc use g_malloc.

It would also be possible to make g_malloc() use qemu_malloc(). That
way we could keep the tracepoints, which would otherwise lose their
value with g_malloc().
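
To make the layering concrete, here is a minimal sketch in plain C of a
typed allocation macro sitting on top of a traced allocator, so that
type safety and the tracepoint are kept together. All names
(traced_malloc, traced_free, TRACED_NEW, trace_alloc_calls) are
hypothetical stand-ins, not QEMU or GLib API, and the "tracepoint" is
just a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tracepoint: counts calls instead of emitting events. */
static size_t trace_alloc_calls;

/* Hypothetical wrapper in the role of qemu_malloc(): every allocation is traced. */
static void *traced_malloc(size_t size)
{
    void *p = malloc(size);

    if (!p) {
        abort();
    }
    trace_alloc_calls++;
    return p;
}

/* Matching release function, so allocation and release stay balanced. */
static void traced_free(void *p)
{
    free(p);
}

/* Typed front end in the spirit of QEMU_NEW(): the size is derived from the
 * type and the result already has the right pointer type. */
#define TRACED_NEW(type) ((type *)traced_malloc(sizeof(type)))

typedef struct Pair {
    int a;
    int b;
} Pair;

/* Allocates a Pair through the typed macro, uses it and releases it. */
static int demo_typed_alloc(void)
{
    Pair *p = TRACED_NEW(Pair);
    int sum;

    assert(p != NULL);
    p->a = 1;
    p->b = 2;
    sum = p->a + p->b;
    traced_free(p);
    return sum;
}
```

With this shape, assigning TRACED_NEW(Pair) to a pointer of another
type draws a compiler diagnostic, and the trace hook still sees every
allocation made through the macro.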

Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()

2011-07-25 Thread Blue Swirl

On Mon, Jul 25, 2011 at 5:51 PM, Paolo Bonzini  wrote:
> On 07/25/2011 04:23 PM, Blue Swirl wrote:
>>
>> >  Yes.  We can just make qemu_malloc use g_malloc.
>>
>> It would also be possible to make g_malloc() use qemu_malloc(). That
>> way we could keep the tracepoints, which would otherwise lose their
>> value with g_malloc().
>
> qemu_malloc uses g_malloc => you keep tracepoints, you just do not trace
> memory allocated by glib

Unless the plan is to replace all qemu_malloc() calls with calls to g_malloc().

> g_malloc uses qemu_malloc => you keep and expand tracepoints, you lose the
> very nicely tuned allocator

It is replaced by libc malloc() which shouldn't be so bad either.

> The former is much less code, however it requires qemu_malloc to be always
> balanced with qemu_free (patches ready and on my github tree, won't be sent
> before KVM Forum though...).

Freeing qemu_malloc() memory with plain free() is a bug.
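
One way to see why the pairing matters: as soon as a wrapper keeps any
bookkeeping of its own, the pointer it hands out is no longer the one
the underlying allocator returned. The sketch below is a hypothetical
allocator (tracked_malloc, tracked_free and AllocHeader are invented for
illustration, this is not how qemu_malloc() is implemented) that
prepends a size header. Passing its result to plain free() would hand
libc an address it never allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every payload; the union keeps the payload
 * suitably aligned for any object type. */
typedef union AllocHeader {
    size_t size;
    max_align_t align;
} AllocHeader;

/* Bytes handed out and not yet released through tracked_free(). */
static size_t bytes_outstanding;

static void *tracked_malloc(size_t size)
{
    AllocHeader *h = malloc(sizeof(*h) + size);

    if (!h) {
        abort();
    }
    h->size = size;
    bytes_outstanding += size;
    return h + 1;               /* payload starts right after the header */
}

static void tracked_free(void *p)
{
    AllocHeader *h;

    if (!p) {
        return;
    }
    h = (AllocHeader *)p - 1;   /* step back to the header */
    assert(bytes_outstanding >= h->size);
    bytes_outstanding -= h->size;
    free(h);                    /* release the address malloc() returned */
}
```

Even without a header, an unbalanced free() would leave any accounting
or tracing in the wrapper permanently out of step.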

Re: [Qemu-devel] [RFC PATCH v2 09/21] pc: Add dimm paravirt SRAT info

2012-07-12 Thread Blue Swirl

On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
 wrote:
> The numa_fw_cfg paravirt interface is extended to include SRAT information for
> all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
> denoting start address, size and node proximity. The new info is appended 
> after
> existing numa info, so that the fw_cfg layout does not break.  This 
> information
> is used by Seabios to build hotplug memory device objects at runtime.
> nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
> to SeaBIOS.
>
> v1->v2:
> Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order 
> not
> to break existing layout
> Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
>
> Signed-off-by: Vasilis Liaskovitis 
> ---
>  docs/specs/fwcfg.txt |   28 ++
>  hw/pc.c              |   53 -
>  vl.c                 |    2 +-
>  3 files changed, 80 insertions(+), 3 deletions(-)
>  create mode 100644 docs/specs/fwcfg.txt
>
> diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
> new file mode 100644
> index 000..e6fcd8f
> --- /dev/null
> +++ b/docs/specs/fwcfg.txt
> @@ -0,0 +1,28 @@
> +QEMU<->BIOS Paravirt Documentation
> +--
> +
> +This document describes paravirt data structures passed from QEMU to BIOS.
> +
> +fw_cfg SRAT paravirt info
> +
> +The SRAT info passed from QEMU to BIOS has the following layout:
> +
> +---
> +#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | 
> ... | nodelast_mem
> +
> +---
> +#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | 
> dimmlast_sz | dimmlast_pxm
> +
> +Entry 0 contains the number of numa nodes (nb_numa_nodes).
> +
> +Entries 1..max_cpus: The next max_cpus entries describe node proximity for 
> each
> +one of the vCPUs in the system.
> +
> +Entries max_cpus+1..max_cpus+nb_numa_nodes+1:  The next nb_numa_nodes entries
> +describe the memory size for each one of the NUMA nodes in the system.
> +
> +Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms 
> (nb_hp_dimms)
> +
> +The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet 
> contains
> +the physical address offset, size (in bytes), and node proximity for the
> +respective dimm.

The size and endianness of the entries are not specified. The code
uses little-endian 64-bit values for each item, so the document should
say so.
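
For illustration, the layout can be written down as index arithmetic.
This is a sketch derived from the allocation and index expressions in
the hw/pc.c hunk of this patch; the helper names are made up, and
store_le64()/load_le64() only show what "LE 64 bit" means byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total number of 64-bit entries, matching the allocation in the patch:
 * 2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms. */
static size_t numa_fw_cfg_entries(size_t max_cpus, size_t nb_nodes,
                                  size_t nb_dimms)
{
    return 2 + max_cpus + nb_nodes + 3 * nb_dimms;
}

/* Index of the entry that holds the number of dimms. */
static size_t dimm_count_index(size_t max_cpus, size_t nb_nodes)
{
    return 1 + max_cpus + nb_nodes;
}

/* Index of the first entry (start address) of dimm number 'dimm';
 * size and proximity follow at +1 and +2. */
static size_t dimm_triplet_index(size_t max_cpus, size_t nb_nodes,
                                 size_t dimm)
{
    return dimm_count_index(max_cpus, nb_nodes) + 1 + 3 * dimm;
}

/* Store a value as 8 little-endian bytes, independent of host byte order. */
static void store_le64(uint8_t *dst, uint64_t value)
{
    int i;

    assert(dst != NULL);
    for (i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Read 8 little-endian bytes back into a host integer. */
static uint64_t load_le64(const uint8_t *src)
{
    uint64_t value = 0;
    int i;

    assert(src != NULL);
    for (i = 0; i < 8; i++) {
        value |= (uint64_t)src[i] << (8 * i);
    }
    return value;
}
```

With max_cpus = 4, two nodes and three dimms this gives 17 entries, the
dimm count at entry 7 and the triplet of dimm 1 starting at entry 11.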

> diff --git a/hw/pc.c b/hw/pc.c
> index ef9901a..cf651d0 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -598,12 +598,15 @@ int e820_add_entry(uint64_t address, uint64_t length, 
> uint32_t type)
>  return index;
>  }
>
> +static void setup_hp_dimms(uint64_t *fw_cfg_slots);
> +
>  static void *bochs_bios_init(void)
>  {
>  void *fw_cfg;
>  uint8_t *smbios_table;
>  size_t smbios_len;
>  uint64_t *numa_fw_cfg;
> +uint64_t *hp_dimms_fw_cfg;
>  int i, j;
>
>  register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
> @@ -638,8 +641,10 @@ static void *bochs_bios_init(void)
>  /* allocate memory for the NUMA channel: one (64bit) word for the number
>   * of nodes, one word for each VCPU->node and one word for each node to
>   * hold the amount of memory.
> + * Finally one word for the number of hotplug memory slots and three 
> words
> + * for each hotplug memory slot (start address, size and node proximity).
>   */
> -numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
> +numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) 
> * 8);
>  numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
>  for (i = 0; i < max_cpus; i++) {
>  for (j = 0; j < nb_numa_nodes; j++) {
> @@ -652,8 +657,15 @@ static void *bochs_bios_init(void)
>  for (i = 0; i < nb_numa_nodes; i++) {
>  numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
>  }
> +
> +numa_fw_cfg[1 + max_cpus + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
> +
> +hp_dimms_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
> +if (nb_hp_dimms)
> +setup_hp_dimms(hp_dimms_fw_cfg);

Braces.

> +
>  fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
> - (1 + max_cpus + nb_numa_nodes) * 8);
> + (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
>
>  return fw_cfg;
>  }
> @@ -1223,3 +1235,40 @@ target_phys_addr_t pc_set_hp_memory_offset(uint64_t 
> size)
>
>  return ret;
>  }
> +
> +static void setup_hp_dimms(uint64_t *fw_cfg_slots)
> +{
> +int i = 0;
> +Error *err = NULL;
> +DeviceState *dev;
> +DimmState *slot;
> +const char *type;
> +BusChild *kid;
> +BusState *bus = sysbus_get_default();
> +
> +QTAILQ_FOREACH(kid, &bus->childr

Re: [Qemu-devel] [RFC PATCH v2 06/21] dimm: Implement memory device abstraction

2012-07-12 Thread Blue Swirl

On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
 wrote:
> Each hotplug-able memory slot is a SysBusDevice. A hot-add operation for a
> particular dimm creates a new MemoryRegion of the given physical address
> offset, size and node proximity, and attaches it to main system memory as a
> sub_region. A hot-remove operation detaches and frees the MemoryRegion from
> system memory.
>
> This prototype still lacks proper qdev integration: a separate
> hotplug side-channel is used and main system bus hotplug capability is
> ignored.
>
> Signed-off-by: Vasilis Liaskovitis 
> ---
>  hw/Makefile.objs |    2 +-
>  hw/dimm.c        |  234 ++
>  hw/dimm.h        |   58 +
>  3 files changed, 293 insertions(+), 1 deletions(-)
>  create mode 100644 hw/dimm.c
>  create mode 100644 hw/dimm.h
>
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 3d77259..e2184bf 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
>  hw-obj-$(CONFIG_PCSPK) += pcspk.o
>  hw-obj-$(CONFIG_PCKBD) += pckbd.o
>  hw-obj-$(CONFIG_FDC) += fdc.o
> -hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
> +hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
>  hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
>  hw-obj-$(CONFIG_DMA) += dma.o
>  hw-obj-$(CONFIG_I82374) += i82374.o
> diff --git a/hw/dimm.c b/hw/dimm.c
> new file mode 100644
> index 000..00c4623
> --- /dev/null
> +++ b/hw/dimm.c
> @@ -0,0 +1,234 @@
> +/*
> + * Dimm device for Memory Hotplug
> + *
> + * Copyright ProfitBricks GmbH 2012
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see 
> 
> + */
> +
> +#include "trace.h"
> +#include "qdev.h"
> +#include "dimm.h"
> +#include 
> +#include "../exec-memory.h"
> +#include "qmp-commands.h"
> +
> +static DeviceState *dimm_hotplug_qdev;
> +static dimm_hotplug_fn dimm_hotplug;
> +static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;

Using global state does not look right. It should always be possible
to pass around structures to avoid it.
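
As a rough sketch of what I mean (plain C, all type and function names
here are hypothetical and not taken from the patch): the list head and
the hotplug callback move into one object, and every function takes
that object as its first argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot type, standing in for DimmState. */
typedef struct Slot {
    const char *id;
    int populated;
    struct Slot *next;
} Slot;

typedef void (*SlotHotplugFn)(void *opaque, Slot *slot, int add);

/* Everything the patch keeps in file-scope statics lives in one object. */
typedef struct SlotBus {
    Slot *head;
    SlotHotplugFn hotplug;
    void *hotplug_opaque;
} SlotBus;

static void slot_bus_init(SlotBus *bus, SlotHotplugFn fn, void *opaque)
{
    assert(bus != NULL);
    bus->head = NULL;
    bus->hotplug = fn;
    bus->hotplug_opaque = opaque;
}

/* Link a slot into this bus only; other SlotBus instances are unaffected. */
static void slot_bus_add(SlotBus *bus, Slot *slot)
{
    slot->next = bus->head;
    bus->head = slot;
}

/* Mark the slot populated and notify the registered backend, if any. */
static void slot_bus_activate(SlotBus *bus, Slot *slot)
{
    slot->populated = 1;
    if (bus->hotplug) {
        bus->hotplug(bus->hotplug_opaque, slot, 1);
    }
}

/* Example backend: counts add/remove events in the int behind 'opaque'. */
static void count_events(void *opaque, Slot *slot, int add)
{
    int *count = opaque;

    (void)slot;
    *count += add ? 1 : -1;
}
```

A board or test can then create as many independent instances as it
needs, and nothing depends on initialization order of hidden statics.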

> +
> +static Property dimm_properties[] = {
> +DEFINE_PROP_END_OF_LIST()
> +};
> +
> +void dimm_populate(DimmState *s)

All functions are global and exported, but there do not seem to be any
users outside this file. Please make static every function that you can.

> +{
> +DeviceState *dev= (DeviceState*)s;
> +MemoryRegion *new = NULL;
> +
> +new = g_malloc(sizeof(MemoryRegion));
> +memory_region_init_ram(new, dev->id, s->size);
> +vmstate_register_ram_global(new);
> +memory_region_add_subregion(get_system_memory(), s->start, new);
> +s->mr = new;
> +s->populated = true;
> +}
> +
> +
> +void dimm_depopulate(DimmState *s)
> +{
> +assert(s);
> +if (s->populated) {
> +vmstate_unregister_ram(s->mr, NULL);
> +memory_region_del_subregion(get_system_memory(), s->mr);
> +memory_region_destroy(s->mr);
> +s->populated = false;
> +s->mr = NULL;
> +}
> +}
> +
> +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
> +dimm_idx, bool populated)
> +{
> +DeviceState *dev;
> +DimmState *mdev;
> +
> +dev = sysbus_create_simple("dimm", -1, NULL);
> +dev->id = id;
> +
> +mdev = DIMM(dev);
> +mdev->idx = dimm_idx;
> +mdev->start = 0;
> +mdev->size = size;
> +mdev->node = node;
> +mdev->populated = populated;
> +QTAILQ_INSERT_TAIL(&dimmlist, mdev, nextdimm);
> +return mdev;
> +}
> +
> +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
> +{
> +dimm_hotplug_qdev = qdev;
> +dimm_hotplug = hotplug;
> +dimm_scan_populated();
> +}
> +
> +void dimm_activate(DimmState *slot)
> +{
> +dimm_populate(slot);
> +if (dimm_hotplug)
> +dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);

Why the cast?

Also braces, please check your patches with checkpatch.pl.
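
To show both points in one place, a small sketch in plain C with
hypothetical types (BusDev in the role of the parent device, MemSlot in
the role of the dimm). QEMU has its own checked cast macros for this;
the sketch only illustrates that a typed accessor keeps working where a
raw pointer cast silently depends on the parent being the first member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parent type. */
typedef struct BusDev {
    int id;
} BusDev;

/* Hypothetical slot type; the parent is deliberately not the first member,
 * so (BusDev *)slot would point at 'index' instead. */
typedef struct MemSlot {
    int index;
    BusDev busdev;
    int populated;
} MemSlot;

typedef void (*HotplugFn)(BusDev *dev, int add);

/* Remembers the device passed to the last notification. */
static BusDev *last_notified;

static void record_hotplug(BusDev *dev, int add)
{
    (void)add;
    last_notified = dev;
}

/* Typed accessor: the compiler checks the member, no cast needed. */
static BusDev *mem_slot_busdev(MemSlot *slot)
{
    assert(slot != NULL);
    return &slot->busdev;
}

static void mem_slot_activate(MemSlot *slot, HotplugFn hotplug)
{
    slot->populated = 1;
    /* Braces even around a single statement, as checkpatch.pl expects. */
    if (hotplug) {
        hotplug(mem_slot_busdev(slot), 1);
    }
}
```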

> +}
> +
> +void dimm_deactivate(DimmState *slot)
> +{
> +if (dimm_hotplug)
> +dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
> +}
> +
> +DimmState *dimm_find_from_name(char *id)

const char *id?
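
A lookup only reads the name, so the parameter can be const and callers
may pass string literals or other read-only strings without a cast. A
minimal sketch with hypothetical names (NamedSlot and slot_find are not
from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slot record; only the name matters for the lookup. */
typedef struct NamedSlot {
    const char *id;
} NamedSlot;

/* Returns the first slot whose id equals 'id', or NULL if none matches.
 * Neither the table nor the name is modified, hence const on both. */
static const NamedSlot *slot_find(const NamedSlot *slots, size_t n,
                                  const char *id)
{
    size_t i;

    assert(id != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(slots[i].id, id) == 0) {
            return &slots[i];
        }
    }
    return NULL;
}
```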

> +{
> +Error *err = NULL;
> +DeviceState *qdev;
> +const char *type;
> +qdev = qdev_find_recursive(sysbus_get_default(), id);
> +if (qdev) {
> +type = object_property_get_str(OBJECT(qdev), "type",

Re: [Qemu-devel] [RFC PATCH v2 00/21] ACPI memory hotplug

2012-07-12 Thread Blue Swirl

On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
 wrote:
> This is v2 of the ACPI memory hotplug prototype for x86_64 target.

I think the concept of DIMMs (what about SIMMs? SODIMMs? I liked
memslot) would be useful for most targets, but hotplugging may be
limited to x86 only. It would be nice to keep these two separate or as
loosely coupled as possible.

>
> Changes v1->v2
>
> - memory map is automatically calculated for hotplug dimms. Dimms are added 
> from
> top-of-memory skipping the pci hole at [PCI_HOLE_START, 4G).
> - Renamed from "-memslot" to "-dimm". Commands changed to "dimm_add", 
> "dimm_del".
> - Seabios ejection array reduced to a byte. Use extraction macros for dimm 
> ssdt.
> - additional SRAT paravirt info does not break previous SRAT fw_cfg layout.
> - Documentation of new acpi_piix4 registers and paravirt data.
> - add ACPI _OST support for _OST enabled guests. This allows qemu to receive
> notification for success / failure of memory hot-add and hot-remove 
> operations.
> Guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321)
> - add monitor info command to report total guest memory (initial + hot-added)
> - add command line options and monitor commands for batch dimm 
> creation/population
>
> Overview:
>
> Dimm devices are modeled with a new qemu command line
>
> "-dimm id=name,size=sz,node=pxm,populated=on|off"
>
> As already mentioned, the starting physical address for all dimms is 
> calculated
> automatically from top of memory, skipping the pci hole at [PCI_HOLE_START, 
> 4G).
> Node is defining numa proximity for this dimm. When not defined it defaults
> to zero.
> "-dimm id=dimm0,size=512M,node=0,populated=off"
> will define a 512M memory slot belonging to numa node 0.
>
> Dimms are added or removed with a new hmp command "dimm_add/dimm_del":
> Hot-add syntax: "dimm_add id"
> Hot-remove syntax: "dimm_del id"
>
> Issues:
>
> - Live migration works as long as populated field is changed to "on" for
> hotplugged dimms at the destination qemu command line (patch 12/21 lifts
> this requirement). The DimmState structure does not yet define a
> VMStateDescription, but i assume this is the preferred way to pass state
> for migration.
>
> - Dimms are abstracted as qdevices attached to the main system bus. However,
> memory hotplugging has its own side channel ignoring main_system_bus's hotplug
> incapability. A cleaner integration is still needed, probably attaching memory
> devices as children-links of an acpi-capable device (in the pc case 
> acpi_piix4)
> instead of the system bus (TBD). Then device_add/device_del instead of new
> commands can hopefully be used.
>
> Comments/review welcome.
>
> series is based on uq/master for qemu-kvm, and master for seabios. Can be 
> found
> also at:
> http://github.com/vliaskov/qemu-kvm/commits/memhp-v2
> http://github.com/vliaskov/seabios/commits/memhp-v2
>
> Vasilis Liaskovitis (14):
>   dimm: Implement memory device abstraction
>   acpi_piix4: Implement memory device hotplug registers
>   pc: calculate dimm physical addresses and adjust memory map
>   pc: Add dimm paravirt SRAT info
>   Implement "-dimm" command line option
>   Implement dimm_add and dimm_del commands for hmp and qmp
>   fix live-migration when "populated=on" is missing
>   Implement memory hotplug notification lists
>   acpi_piix4: _OST dimm support
>   acpi_piix4: Update dimm state on VM reboot
>   acpi_piix4: Update dimm bitmap state on hot-remove fail
>   Implement "info memtotal" and "query-memtotal"
>   Implement -dimms, -dimmspop command line options
>   Implement mem_increase, mem_decrease hmp/qmp commands
>
>  arch_init.c                 |   23 ++-
>  docs/specs/acpi_hotplug.txt |   46 +
>  docs/specs/fwcfg.txt        |   28 +++
>  hmp-commands.hx             |   67 +++
>  hmp.c                       |   24 +++
>  hmp.h                       |    2 +
>  hw/Makefile.objs            |    2 +-
>  hw/acpi_piix4.c             |  131 -
>  hw/dimm.c                   |  449 +++
>  hw/dimm.h                   |   72 +++
>  hw/pc.c                     |   94 +-
>  hw/pc.h                     |    6 +
>  hw/pc_piix.c                |   18 ++-
>  monitor.c                   |   35
>  monitor.h                   |    5 +
>  qapi-schema.json            |   38
>  qemu-config.c               |   70 +++
>  qemu-options.hx             |   15 ++
>  qmp-commands.hx             |  137 +
>  sysemu.h                    |    1 +
>  vl.c                        |  122 -
>  21 files changed, 1368 insertions(+), 17 deletions(-)
>  create mode 100644 docs/specs/acpi_hotplug.txt
>  create mode 100644 docs/specs/fwcfg.txt
>  create mode 100644 hw/dimm.c
>  create mode 100644 hw/dimm.h
>
> Vasilis Liaskovitis (7):
>   Add ACPI_EXTRACT_DEVICE* macros
>   Add SSDT memory device support
>   acpi-dsdt: Implement functions for memory hotplug.
>   acpi: generate hotplug memo

Re: [Qemu-devel] [RFC PATCH v2 00/21] ACPI memory hotplug

2012-07-14 Thread Blue Swirl

On Fri, Jul 13, 2012 at 5:49 PM, Vasilis Liaskovitis
 wrote:
> On Thu, Jul 12, 2012 at 08:04:56PM +0000, Blue Swirl wrote:
>> On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
>>  wrote:
>> > This is v2 of the ACPI memory hotplug prototype for x86_64 target.
>>
>> I think the concept of DIMMs (what about SIMMs? SODIMMs? I liked
>> memslot) would be useful for most targets, but hotplugging may be
>> limited to x86 only. It would be nice to keep these two separate or as
>> loosely coupled as possible.
>
> agreed.
> what specific usecases besides hotplugging are you thinking about?

Most real boards have some kind of RAM module slots. Currently this is
modelled with the -m option, but a generic memory slot model would be
more accurate. The memory layout also needs to be communicated to the
BIOS somehow, unless we want to spend cycles on BIOS memory probes. The
NUMA fw_cfg memory description should be usable for most cases, even
for embedded UP machines.

> Also are there non-acpi hotplug platforms?

Some enterprise-class Sparc and PPC machines support memory hotplug.

>
> I am trying to keep generic dimm manipulation functions (e.g. population /
> depopulation and searching) in hw/dimm[.ch]. Currently the x86-acpi_piix4 
> "backend"
> registers a callback for hot-add / hot-remove. In theory other hotplug 
> backends
> can hook in.
>
> btw I don't mind using "-memslot" (I think someone during v1 mentioned 
> -dimm), we just
> need some consensus on the naming.
>
>>
>> >
>> > Changes v1->v2
>> >
>> > - memory map is automatically calculated for hotplug dimms. Dimms are 
>> > added from
>> > top-of-memory skipping the pci hole at [PCI_HOLE_START, 4G).
>> > - Renamed from "-memslot" to "-dimm". Commands changed to "dimm_add", 
>> > "dimm_del".
>> > - Seabios ejection array reduced to a byte. Use extraction macros for dimm 
>> > ssdt.
>> > - additional SRAT paravirt info does not break previous SRAT fw_cfg layout.
>> > - Documentation of new acpi_piix4 registers and paravirt data.
>> > - add ACPI _OST support for _OST enabled guests. This allows qemu to 
>> > receive
>> > notification for success / failure of memory hot-add and hot-remove 
>> > operations.
>> > Guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321)
>> > - add monitor info command to report total guest memory (initial + 
>> > hot-added)
>> > - add command line options and monitor commands for batch dimm 
>> > creation/population
>> >
>> > Overview:
>> >
>> > Dimm devices are modeled with a new qemu command line
>> >
>> > "-dimm id=name,size=sz,node=pxm,populated=on|off"
>> >
>> > As already mentioned, the starting physical address for all dimms is 
>> > calculated
>> > automatically from top of memory, skipping the pci hole at 
>> > [PCI_HOLE_START, 4G).
>> > Node is defining numa proximity for this dimm. When not defined it defaults
>> > to zero.
>> > "-dimm id=dimm0,size=512M,node=0,populated=off"
>> > will define a 512M memory slot belonging to numa node 0.
>> >
>> > Dimms are added or removed with a new hmp command "dimm_add/dimm_del":
>> > Hot-add syntax: "dimm_add id"
>> > Hot-remove syntax: "dimm_del id"
>> >
>> > Issues:
>> >
>> > - Live migration works as long as populated field is changed to "on" for
>> > hotplugged dimms at the destination qemu command line (patch 12/21 lifts
>> > this requirement). The DimmState structure does not yet define a
>> > VMStateDescription, but i assume this is the preferred way to pass state
>> > for migration.
>> >
>> > - Dimms are abstracted as qdevices attached to the main system bus. 
>> > However,
>> > memory hotplugging has its own side channel ignoring main_system_bus's 
>> > hotplug
>> > incapability. A cleaner integration is still needed, probably attaching 
>> > memory
>> > devices as children-links of an acpi-capable device (in the pc case 
>> > acpi_piix4)
>> > instead of the system bus (TBD). Then device_add/device_del instead of new
>> > commands can hopefully be used.
>> >
>> > Comments/review welcome.
>> >
>> > series is based on uq/master for qemu-kvm, and master for seabios. Can be 
>> > found
>> > also at:
>> > http://github.com/vliaskov/qemu-kvm/commits/memhp-v2

Re: [Qemu-devel] [PATCH 1/5] scsi-disk: removable hard disks support START/STOP

2012-07-23 Thread Blue Swirl

On Mon, Jul 16, 2012 at 2:25 PM, Paolo Bonzini  wrote:
> Support for START/STOP UNIT right now is limited to CD-ROMs.  This is wrong,
> since removable hard disks (in the real world: SD card readers) also support
> it in pretty much the same way.

I vaguely remember configuring a set of large (non-removable) SCSI hard
disks so that they did not all spin up at the same time at power-on
(which could have burned out the PSU) but only on a START UNIT
command. I think Linux, or maybe even the BIOS, started the drives
(nicely in sequence) before accessing them.

>
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/scsi-disk.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
> index bcec66b..42bae3b 100644
> --- a/hw/scsi-disk.c
> +++ b/hw/scsi-disk.c
> @@ -1251,7 +1251,7 @@ static int scsi_disk_emulate_start_stop(SCSIDiskReq *r)
>  bool start = req->cmd.buf[4] & 1;
>  bool loej = req->cmd.buf[4] & 2; /* load on start, eject on !start */
>
> -if (s->qdev.type == TYPE_ROM && loej) {
> +if ((s->features & (1 << SCSI_DISK_F_REMOVABLE)) && loej) {
>  if (!start && !s->tray_open && s->tray_locked) {
>  scsi_check_condition(r,
>   bdrv_is_inserted(s->qdev.conf.bs)
> --
> 1.7.10.4
>
>
>

Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option

2012-08-04 Thread Blue Swirl

Thanks, applied.

On Tue, Jul 17, 2012 at 4:31 AM, Chegu Vinod  wrote:
> Changes since v3:
>    - using bitmap_set() instead of set_bit() in numa_add() routine.
>    - removed call to bitmap_zero() since bitmap_new() also zeroes the bitmap.
>- Rebased to the latest qemu.
>
> Changes since v2:
>- Using "unsigned long *" for the node_cpumask[].
>- Use bitmap_new() instead of g_malloc0() for allocation.
>- Don't rely on "max_cpus" since it may not be initialized
>  before the numa related qemu options are parsed & processed.
>
> Note: Continuing to use a new constant for allocation of
>   the mask (This constant is currently set to 255 since
>   with an 8bit APIC ID VCPUs can range from 0-254 in a
>   guest. The APIC ID 255 (0xFF) is reserved for broadcast).
>
> Changes since v1:
>
>- Use bitmap functions that are already in qemu (instead
>  of cpu_set_t macro's from sched.h)
>- Added a check for endvalue >= max_cpus.
>    - Fix to address the round-robin assignment when
>  cpu's are not explicitly specified.
> ---
>
> v1:
> --
>
> The -numa option to qemu is used to create [fake] numa nodes
> and expose them to the guest OS instance.
>
> There are a couple of issues with the -numa option:
>
> a) Max VCPU's that can be specified for a guest while using
>the qemu's -numa option is 64. Due to a typecasting issue
>when the number of VCPUs is > 32 the VCPUs don't show up
>under the specified [fake] numa nodes.
>
> b) KVM currently has support for 160VCPUs per guest. The
>qemu's -numa option has only support for upto 64VCPUs
>per guest.
> This patch addresses these two issues.
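
As a rough illustration of why a word-array bitmap removes both limits,
here is a plain C sketch. It is not the QEMU bitmap API that the patch
switches to; CpuMask and the cpumask_* helpers are hypothetical, and
only the size of 255 bits is taken from the note in the changelog. No
shift ever exceeds the width of one word, so CPU indexes of 32, 64 and
above land in the right place.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUMASK_BITS 255
#define BITS_PER_WORD    (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS       ((MAX_CPUMASK_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical per-node CPU mask: an array of words instead of one integer. */
typedef struct CpuMask {
    unsigned long words[MASK_WORDS];
} CpuMask;

static void cpumask_set(CpuMask *m, unsigned cpu)
{
    assert(cpu < MAX_CPUMASK_BITS);
    /* Shift count is always smaller than the word width. */
    m->words[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static bool cpumask_test(const CpuMask *m, unsigned cpu)
{
    assert(cpu < MAX_CPUMASK_BITS);
    return (m->words[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL;
}

/* Set every CPU in the inclusive range [first, last]. */
static void cpumask_set_range(CpuMask *m, unsigned first, unsigned last)
{
    unsigned cpu;

    for (cpu = first; cpu <= last; cpu++) {
        cpumask_set(m, cpu);
    }
}
```
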
>
> Below are examples of (a) and (b)
>
> a) >32 VCPUs are specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \
> -vnc :4
>
> ...
> Upstream qemu :
> --
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
>
> With the patch applied :
> ---
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
>
> b) >64 VCPUs specified with -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu 
> Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc
>  :4
>
> ...
>
> Upstream qemu :
> --
>
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 
> 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65536 MB
>
> With the patch applied :
> ---
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 65536 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 65536 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 65536 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 65536 MB
> node 6 cpus: 60 61 62 63 64 65 66 67 68 69
> node 6 size: 65536 MB
> node 7 cpus: 70 71 72 73 74 75 76 77 78 79
>
> Signed-off-by: Chegu Vinod , Jim Hull , 
> Craig Hada 
> ---
>  cpus.c   |    3 ++-
>  hw/pc.c  |    3 ++-
>  sysemu.h |    3 ++-
>  vl.c     |   43 +--
>  4 files changed, 27 insertions(+), 25 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index b182b3d..acccd08 100

Re: [Qemu-devel] [PATCH 3/5] s390: Add new channel I/O based virtio transport.

2012-08-07 Thread Blue Swirl

On Tue, Aug 7, 2012 at 2:52 PM, Cornelia Huck  wrote:
> Add a new virtio transport that uses channel commands to perform
> virtio operations.
>
> Add a new machine type s390-ccw that uses this virtio-ccw transport
> and make it the default machine for s390.
>
> Signed-off-by: Cornelia Huck 
> ---
>  hw/qdev-monitor.c      |   5 +
>  hw/s390-virtio.c       | 268 ++
>  hw/s390x/Makefile.objs |   1 +
>  hw/s390x/virtio-ccw.c  | 962 +
>  hw/s390x/virtio-ccw.h  |  77
>  vl.c                   |   1 +
>  6 files changed, 1243 insertions(+), 71 deletions(-)
>  create mode 100644 hw/s390x/virtio-ccw.c
>  create mode 100644 hw/s390x/virtio-ccw.h
>
> diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c
> index b22a37a..79f7e6b 100644
> --- a/hw/qdev-monitor.c
> +++ b/hw/qdev-monitor.c
> @@ -42,6 +42,11 @@ static const QDevAlias qdev_alias_table[] = {
>  { "virtio-blk-s390", "virtio-blk", QEMU_ARCH_S390X },
>  { "virtio-net-s390", "virtio-net", QEMU_ARCH_S390X },
>  { "virtio-serial-s390", "virtio-serial", QEMU_ARCH_S390X },
> +{ "virtio-blk-ccw", "virtio-blk", QEMU_ARCH_S390X },
> +{ "virtio-net-ccw", "virtio-net", QEMU_ARCH_S390X },
> +{ "virtio-serial-ccw", "virtio-serial", QEMU_ARCH_S390X },
> +{ "virtio-balloon-ccw", "virtio-balloon", QEMU_ARCH_S390X },
> +{ "virtio-scsi-ccw", "virtio-scsi", QEMU_ARCH_S390X },
>  { "lsi53c895a", "lsi" },
>  { "ich9-ahci", "ahci" },
>  { }
> diff --git a/hw/s390-virtio.c b/hw/s390-virtio.c
> index 47eed35..b8bdf80 100644
> --- a/hw/s390-virtio.c
> +++ b/hw/s390-virtio.c
> @@ -30,8 +30,11 @@
>  #include "hw/sysbus.h"
>  #include "kvm.h"
>  #include "exec-memory.h"
> +#include "qemu-thread.h"
>
>  #include "hw/s390-virtio-bus.h"
> +#include "hw/s390x/css.h"
> +#include "hw/s390x/virtio-ccw.h"
>
>  //#define DEBUG_S390
>
> @@ -46,6 +49,7 @@
>  #define KVM_S390_VIRTIO_NOTIFY  0
>  #define KVM_S390_VIRTIO_RESET   1
>  #define KVM_S390_VIRTIO_SET_STATUS  2
> +#define KVM_S390_VIRTIO_CCW_NOTIFY  3
>
>  #define KERN_IMAGE_START0x01UL
>  #define KERN_PARM_AREA  0x010480UL
> @@ -62,6 +66,7 @@
>
>  static VirtIOS390Bus *s390_bus;
>  static S390CPU **ipi_states;
> +VirtioCcwBus *ccw_bus;
>
>  S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
>  {
> @@ -75,15 +80,21 @@ S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
>  int s390_virtio_hypercall(CPUS390XState *env, uint64_t mem, uint64_t 
> hypercall)
>  {
>  int r = 0, i;
> +int cssid, ssid, schid, m;
> +SubchDev *sch;
>
>  dprintf("KVM hypercall: %ld\n", hypercall);
>  switch (hypercall) {
>  case KVM_S390_VIRTIO_NOTIFY:
>  if (mem > ram_size) {
> -VirtIOS390Device *dev = s390_virtio_bus_find_vring(s390_bus,
> -   mem, &i);
> -if (dev) {
> -virtio_queue_notify(dev->vdev, i);
> +if (s390_bus) {
> +VirtIOS390Device *dev = s390_virtio_bus_find_vring(s390_bus,
> +   mem, &i);
> +if (dev) {
> +virtio_queue_notify(dev->vdev, i);
> +} else {
> +r = -EINVAL;
> +}
>  } else {
>  r = -EINVAL;
>  }
> @@ -92,28 +103,49 @@ int s390_virtio_hypercall(CPUS390XState *env, uint64_t 
> mem, uint64_t hypercall)
>  }
>  break;
>  case KVM_S390_VIRTIO_RESET:
> -{
> -VirtIOS390Device *dev;
> -
> -dev = s390_virtio_bus_find_mem(s390_bus, mem);
> -virtio_reset(dev->vdev);
> -stb_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS, 0);
> -s390_virtio_device_sync(dev);
> -s390_virtio_reset_idx(dev);
> +if (s390_bus) {
> +VirtIOS390Device *dev;
> +
> +dev = s390_virtio_bus_find_mem(s390_bus, mem);
> +virtio_reset(dev->vdev);
> +stb_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS, 0);
> +s390_virtio_device_sync(dev);
> +s390_virtio_reset_idx(dev);
> +} else {
> +r = -EINVAL;
> +}
>  break;
> -}
>  case KVM_S390_VIRTIO_SET_STATUS:
> -{
> -VirtIOS390Device *dev;
> +if (s390_bus) {
> +VirtIOS390Device *dev;
>
> -dev = s390_virtio_bus_find_mem(s390_bus, mem);
> -if (dev) {
> -s390_virtio_device_update_status(dev);
> +dev = s390_virtio_bus_find_mem(s390_bus, mem);
> +if (dev) {
> +s390_virtio_device_update_status(dev);
> +} else {
> +r = -EINVAL;
> +}
>  } else {
>  r = -EINVAL;
>  }
>  break;
> -}
> +case KVM_S390_VIRTIO_CCW_NOTIFY:
> +if (ccw_bus) {
> +if (ioinst_disassemble_sch_ide

Re: [Qemu-devel] [PATCH 2/5] s390: Virtual channel subsystem support.

2012-08-07 Thread Blue Swirl

On Tue, Aug 7, 2012 at 2:52 PM, Cornelia Huck  wrote:
> Provide a mechanism for qemu to provide fully virtual subchannels to
> the guest. In the KVM case, this relies on the kernel's css support.
> The !KVM case is not yet supported.
>
> Signed-off-by: Cornelia Huck 
> ---
>  hw/s390x/Makefile.objs     |   1 +
>  hw/s390x/css.c             | 440 +
>  hw/s390x/css.h             |  62 +++
>  target-s390x/Makefile.objs |   2 +-
>  target-s390x/cpu.h         | 108 +++
>  target-s390x/ioinst.c      |  38
>  target-s390x/ioinst.h      | 173 ++
>  target-s390x/kvm.c         | 101 +++
>  8 files changed, 924 insertions(+), 1 deletion(-)
>  create mode 100644 hw/s390x/css.c
>  create mode 100644 hw/s390x/css.h
>  create mode 100644 target-s390x/ioinst.c
>  create mode 100644 target-s390x/ioinst.h
>
> diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
> index dcdcac8..93b41fb 100644
> --- a/hw/s390x/Makefile.objs
> +++ b/hw/s390x/Makefile.objs
> @@ -1,3 +1,4 @@
>  obj-y = s390-virtio-bus.o s390-virtio.o
>
>  obj-y := $(addprefix ../,$(obj-y))
> +obj-y += css.o
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> new file mode 100644
> index 000..7941c44
> --- /dev/null
> +++ b/hw/s390x/css.c
> @@ -0,0 +1,440 @@
> +/*
> + * Channel subsystem base support.
> + *
> + * Copyright 2012 IBM Corp.
> + * Author(s): Cornelia Huck 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#include "qemu-thread.h"
> +#include "qemu-queue.h"
> +#include 
> +#include "kvm.h"
> +#include "cpu.h"
> +#include "ioinst.h"
> +#include "css.h"
> +
> +struct chp_info {

CamelCase, please.

> +uint8_t in_use;
> +uint8_t type;
> +};
> +
> +static struct chp_info chpids[MAX_CSSID + 1][MAX_CHPID + 1];
> +
> +static css_subch_cb_func css_subch_cb;

Probably these can be put into a container structure which can be passed around.
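
A rough sketch of the container idea, not a proposal for the final layout: the file-scope chpids array and the css_subch_cb pointer move into one structure that callers pass around. ChannelSubsys, SketchSubchCb, the SKETCH_* limits and the channel_subsys_* helpers are made-up names for illustration; the real callback type and limits are whatever css.h defines.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder limits; the real MAX_CSSID/MAX_CHPID come from the patch. */
#define SKETCH_MAX_CSSID 3
#define SKETCH_MAX_CHPID 255

typedef struct ChpInfo {
    uint8_t in_use;
    uint8_t type;
} ChpInfo;

/* Stand-in for css_subch_cb_func; the real signature lives in css.h. */
typedef int (*SketchSubchCb)(uint8_t cssid, uint8_t ssid, uint16_t schid);

/* Everything that used to be file-scope state, gathered in one place. */
typedef struct ChannelSubsys {
    ChpInfo chpids[SKETCH_MAX_CSSID + 1][SKETCH_MAX_CHPID + 1];
    SketchSubchCb subch_cb;
} ChannelSubsys;

static void channel_subsys_init(ChannelSubsys *css)
{
    assert(css != NULL);
    memset(css, 0, sizeof(*css));
}

/* Same semantics as css_set_subch_cb(), but on an explicit instance. */
static int channel_subsys_set_subch_cb(ChannelSubsys *css, SketchSubchCb func)
{
    assert(css != NULL);
    if (func && css->subch_cb) {
        return -EBUSY;
    }
    css->subch_cb = func;
    return 0;
}

/* Trivial callback, only here so the sketch can be exercised. */
static int sketch_find_subch(uint8_t cssid, uint8_t ssid, uint16_t schid)
{
    (void)cssid;
    (void)ssid;
    (void)schid;
    return 0;
}
```

With this shape the -EBUSY check works per instance, and a second channel subsystem (or a unit test) does not have to share global state.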

> +
> +int css_set_subch_cb(css_subch_cb_func func)
> +{
> +if (func && css_subch_cb) {
> +return -EBUSY;
> +}
> +css_subch_cb = func;
> +return 0;
> +}
> +
> +static void css_inject_io_interrupt(SubchDev *sch, uint8_t func)
> +{
> +s390_io_interrupt(sch->cssid, sch->ssid, sch->schid, 
> &sch->curr_status.scsw,
> +  &sch->curr_status.pmcw, &sch->sense_data, 0,
> +  sch->curr_status.pmcw.isc, 
> sch->curr_status.pmcw.intparm,
> +  func);
> +}
> +
> +void css_conditional_io_interrupt(SubchDev *sch)
> +{
> +s390_io_interrupt(sch->cssid, sch->ssid, sch->schid, 
> &sch->curr_status.scsw,
> +  &sch->curr_status.pmcw, &sch->sense_data, 1,
> +  sch->curr_status.pmcw.isc, 
> sch->curr_status.pmcw.intparm, 0);
> +}
> +
> +static void sch_handle_clear_func(SubchDev *sch)
> +{
> +struct pmcw *p = &sch->curr_status.pmcw;
> +struct scsw *s = &sch->curr_status.scsw;
> +int path;
> +
> +/* Path management: In our simple css, we always choose the only path. */
> +path = 0x80;
> +
> +/* Reset values prior to 'issueing the clear signal'. */
> +p->lpum = 0;
> +p->pom = 0xff;
> +s->pno = 0;
> +
> +/* We always 'attempt to issue the clear signal', and we always succeed. 
> */
> +sch->orb = NULL;
> +sch->channel_prog = NULL;
> +sch->last_cmd = NULL;
> +s->actl &= ~SCSW_ACTL_CLEAR_PEND;
> +s->stctl |= SCSW_STCTL_STATUS_PEND;
> +
> +s->dstat = 0;
> +s->cstat = 0;
> +p->lpum = path;
> +
> +}
> +
> +static void sch_handle_halt_func(SubchDev *sch)
> +{
> +
> +struct pmcw *p = &sch->curr_status.pmcw;
> +struct scsw *s = &sch->curr_status.scsw;
> +int path;
> +
> +/* Path management: In our simple css, we always choose the only path. */
> +path = 0x80;
> +
> +/* We always 'attempt to issue the halt signal', and we always succeed. 
> */
> +sch->orb = NULL;
> +sch->channel_prog = NULL;
> +sch->last_cmd = NULL;
> +s->actl &= ~SCSW_ACTL_HALT_PEND;
> +s->stctl |= SCSW_STCTL_STATUS_PEND;
> +
> +if ((s->actl & (SCSW_ACTL_SUBCH_ACTIVE | SCSW_ACTL_DEVICE_ACTIVE)) ||
> +!((s->actl & SCSW_ACTL_START_PEND) ||
> +  (s->actl & SCSW_ACTL_SUSP))) {
> +s->dstat = SCSW_DSTAT_DEVICE_END;
> +}
> +s->cstat = 0;
> +p->lpum = path;
> +
> +}
> +
> +static int css_interpret_ccw(SubchDev *sch, struct ccw1 *ccw)
> +{
> +int ret;
> +bool check_len;
> +int len;
> +int i;
> +
> +if (!ccw) {
> +return -EIO;
> +}
> +
> +/* Check for invalid command codes. */
> +if ((ccw->cmd_code & 0x0f) == 0) {
> +return -EINVAL;
> +}
> +if (((ccw->cmd_code & 0x0f) == CCW_CMD_TIC) &&
> +((ccw->cmd_code & 0xf0) != 0)) {
> +return -EINVAL;
> +}
> +
> +if (ccw->flags & CCW_FLAG_SUSPEND) {
> +retu

Re: [Qemu-devel] [PATCH v8 5/6] introduce a new qom device to deal with panicked event

2012-08-08 Thread Blue Swirl

On Wed, Aug 8, 2012 at 2:47 AM, Wen Congyang  wrote:
> If the target is x86/x86_64, the guest's kernel will write 0x01 to the
> port KVM_PV_EVENT_PORT when it is panicked. This patch introduces a new
> qom device kvm_pv_ioport to listen this I/O port, and deal with panicked
> event according to panicked_action's value. The possible actions are:
> 1. emit QEVENT_GUEST_PANICKED only
> 2. emit QEVENT_GUEST_PANICKED and pause the guest
> 3. emit QEVENT_GUEST_PANICKED and poweroff the guest
> 4. emit QEVENT_GUEST_PANICKED and reset the guest
>
> I/O ports does not work for some targets(for example: s390). And you
> can implement another qom device, and include it's code into pv_event.c
> for such target.
>
> Note: if we emit QEVENT_GUEST_PANICKED only, and the management
> application does not receive this event(the management may not
> run when the event is emitted), the management won't know the
> guest is panicked.
>
> Signed-off-by: Wen Congyang 
> ---
>  hw/kvm/Makefile.objs |2 +-
>  hw/kvm/pv_event.c|  109 
> ++
>  hw/kvm/pv_ioport.c   |   93 ++
>  hw/pc_piix.c |9 
>  kvm.h|2 +
>  5 files changed, 214 insertions(+), 1 deletions(-)
>  create mode 100644 hw/kvm/pv_event.c
>  create mode 100644 hw/kvm/pv_ioport.c
>
> diff --git a/hw/kvm/Makefile.objs b/hw/kvm/Makefile.objs
> index 226497a..23e3b30 100644
> --- a/hw/kvm/Makefile.objs
> +++ b/hw/kvm/Makefile.objs
> @@ -1 +1 @@
> -obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o
> +obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o pv_event.o
> diff --git a/hw/kvm/pv_event.c b/hw/kvm/pv_event.c
> new file mode 100644
> index 000..8897237
> --- /dev/null
> +++ b/hw/kvm/pv_event.c
> @@ -0,0 +1,109 @@
> +/*
> + * QEMU KVM support, paravirtual event device
> + *
> + * Copyright Fujitsu, Corp. 2012
> + *
> + * Authors:
> + * Wen Congyang 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Possible values for action parameter. */
> +#define PANICKED_REPORT 1   /* emit QEVENT_GUEST_PANICKED only */
> +#define PANICKED_PAUSE  2   /* emit QEVENT_GUEST_PANICKED and pause VM */
> +#define PANICKED_POWEROFF   3   /* emit QEVENT_GUEST_PANICKED and quit VM */
> +#define PANICKED_RESET  4   /* emit QEVENT_GUEST_PANICKED and reset VM */
> +
> +#define PV_EVENT_DRIVER "kvm_pv_event"
> +
> +struct pv_event_action {

PVEventAction

> +char *panicked_action;
> +int panicked_action_value;
> +};
> +
> +#define DEFINE_PV_EVENT_PROPERTIES(_state, _conf)   \
> +DEFINE_PROP_STRING("panicked_action", _state, _conf.panicked_action)
> +
> +static void panicked_mon_event(const char *action)
> +{
> +QObject *data;
> +
> +data = qobject_from_jsonf("{ 'action': %s }", action);
> +monitor_protocol_event(QEVENT_GUEST_PANICKED, data);
> +qobject_decref(data);
> +}
> +
> +static void panicked_perform_action(uint32_t panicked_action)
> +{
> +switch (panicked_action) {
> +case PANICKED_REPORT:
> +panicked_mon_event("report");
> +break;
> +
> +case PANICKED_PAUSE:
> +panicked_mon_event("pause");
> +vm_stop(RUN_STATE_GUEST_PANICKED);
> +break;
> +
> +case PANICKED_POWEROFF:
> +panicked_mon_event("poweroff");
> +qemu_system_shutdown_request();
> +break;

A blank line is missing here, unlike after the other cases.

> +case PANICKED_RESET:
> +panicked_mon_event("reset");
> +qemu_system_reset_request();
> +break;
> +}
> +}
> +
> +static uint64_t supported_event(void)
> +{
> +return 1 << KVM_PV_FEATURE_PANICKED;
> +}
> +
> +static void handle_event(int event, struct pv_event_action *conf)
> +{
> +if (event == KVM_PV_EVENT_PANICKED) {
> +panicked_perform_action(conf->panicked_action_value);
> +}
> +}
> +
> +static int pv_event_init(struct pv_event_action *conf)
> +{
> +if (!conf->panicked_action) {
> +conf->panicked_action_value = PANICKED_REPORT;
> +} else if (strcasecmp(conf->panicked_action, "none") == 0) {
> +conf->panicked_action_value = PANICKED_REPORT;
> +} else if (strcasecmp(conf->panicked_action, "pause") == 0) {
> +conf->panicked_action_value = PANICKED_PAUSE;
> +} else if (strcasecmp(conf->panicked_action, "poweroff") == 0) {
> +conf->panicked_action_value = PANICKED_POWEROFF;
> +} else if (strcasecmp(conf->panicked_action, "reset") == 0) {
> +conf->panicked_action_value = PANICKED_RESET;
> +} else {
> +return -1;
> +}
> +
> +return 0;
> +}
> +
> +#if defined(KVM_PV_EVENT_PORT)
> +
> +#include "pv_ioport.c"

I'd rather not include any .c files but insert the contents here directly.

> +
> +#else
> +vo

Re: [Qemu-devel] [PATCH 3/5] s390: Add new channel I/O based virtio transport.

2012-08-08 Thread Blue Swirl

On Wed, Aug 8, 2012 at 8:28 AM, Cornelia Huck  wrote:
> On Tue, 7 Aug 2012 20:47:22 +
> Blue Swirl  wrote:
>
>
>> > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
>> > new file mode 100644
>> > index 000..8a90c3a
>> > --- /dev/null
>> > +++ b/hw/s390x/virtio-ccw.c
>> > @@ -0,0 +1,962 @@
>> > +/*
>> > + * virtio ccw target implementation
>> > + *
>> > + * Copyright 2012 IBM Corp.
>> > + * Author(s): Cornelia Huck 
>> > + *
>> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at
>> > + * your option) any later version. See the COPYING file in the top-level
>> > + * directory.
>> > + */
>> > +
>> > +#include 
>> > +#include "block.h"
>> > +#include "blockdev.h"
>> > +#include "sysemu.h"
>> > +#include "net.h"
>> > +#include "monitor.h"
>> > +#include "qemu-thread.h"
>> > +#include "../virtio.h"
>> > +#include "../virtio-serial.h"
>> > +#include "../virtio-net.h"
>> > +#include "../sysbus.h"
>>
>> "hw/virtio..." for the above
>
> OK.
>>
>> > +#include "bitops.h"
>> > +
>> > +#include "ioinst.h"
>> > +#include "css.h"
>> > +#include "virtio-ccw.h"
>> > +
>> > +static const TypeInfo virtio_ccw_bus_info = {
>> > +.name = TYPE_VIRTIO_CCW_BUS,
>> > +.parent = TYPE_BUS,
>> > +.instance_size = sizeof(VirtioCcwBus),
>> > +};
>> > +
>> > +static const VirtIOBindings virtio_ccw_bindings;
>> > +
>> > +typedef struct sch_entry {
>> > +SubchDev *sch;
>> > +QLIST_ENTRY(sch_entry) entry;
>> > +} sch_entry;
>>
>> SubchEntry, see CODING_STYLE. Also other struct and typedef names below.
>>
>> > +
>> > +QLIST_HEAD(subch_list, sch_entry);
>>
>> static, but please put this to a structure that is passed around instead.
>>
>> > +
>> > +typedef struct devno_entry {
>> > +uint16_t devno;
>> > +QLIST_ENTRY(devno_entry) entry;
>> > +} devno_entry;
>> > +
>> > +QLIST_HEAD(devno_list, devno_entry);
>>
>> Ditto
>>
>> > +
>> > +struct subch_set {
>> > +struct subch_list *s_list[256];
>> > +struct devno_list *d_list[256];
>> > +};
>> > +
>> > +struct css_set {
>> > +struct subch_set *set[MAX_SSID + 1];
>> > +};
>> > +
>> > +static struct css_set *channel_subsys[MAX_CSSID + 1];
>
> OK, will try to come up with some kind of structure for this and
> CamelCasify it.
>
>> > +
>> > +VirtIODevice *virtio_ccw_get_vdev(SubchDev *sch)
>> > +{
>> > +VirtIODevice *vdev = NULL;
>> > +
>> > +if (sch->driver_data) {
>> > +vdev = ((VirtioCcwData *)sch->driver_data)->vdev;
>> > +}
>> > +return vdev;
>> > +}
>> > +
>
>> > +VirtioCcwBus *virtio_ccw_bus_init(void)
>> > +{
>> > +VirtioCcwBus *bus;
>> > +BusState *_bus;
>>
>> Please avoid identifiers with leading underscores.
>
> OK.
>
>>
>> > +DeviceState *dev;
>> > +
>> > +css_set_subch_cb(virtio_ccw_find_subch);
>> > +
>> > +/* Create bridge device */
>> > +dev = qdev_create(NULL, "virtio-ccw-bridge");
>> > +qdev_init_nofail(dev);
>> > +
>> > +/* Create bus on bridge device */
>> > +_bus = qbus_create(TYPE_VIRTIO_CCW_BUS, dev, "virtio-ccw");
>> > +bus = DO_UPCAST(VirtioCcwBus, bus, _bus);
>> > +
>> > +/* Enable hotplugging */
>> > +_bus->allow_hotplug = 1;
>> > +
>> > +return bus;
>> > +}
>> > +
>> > +struct vq_info_block {
>> > +uint64_t queue;
>> > +uint16_t num;
>> > +} QEMU_PACKED;
>> > +
>> > +struct vq_config_block {
>> > +uint16_t index;
>> > +uint16_t num;
>> > +} QEMU_PACKED;
>>
>> Aren't these KVM structures? They should be defined in a KVM header
>> file in linux-headers.
>
> Not really, virtio-ccw isn't tied to kvm.
>
> I see this more as command blocks that are specific to the "control
> unit" - like s

Re: [Qemu-devel] [PATCH 2/5] s390: Virtual channel subsystem support.

2012-08-08 Thread Blue Swirl

On Wed, Aug 8, 2012 at 8:17 AM, Cornelia Huck  wrote:
> On Tue, 7 Aug 2012 21:00:59 +
> Blue Swirl  wrote:
>
>
>> > diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>> > new file mode 100644
>> > index 000..7941c44
>> > --- /dev/null
>> > +++ b/hw/s390x/css.c
>> > @@ -0,0 +1,440 @@
>> > +/*
>> > + * Channel subsystem base support.
>> > + *
>> > + * Copyright 2012 IBM Corp.
>> > + * Author(s): Cornelia Huck 
>> > + *
>> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at
>> > + * your option) any later version. See the COPYING file in the top-level
>> > + * directory.
>> > + */
>> > +
>> > +#include "qemu-thread.h"
>> > +#include "qemu-queue.h"
>> > +#include 
>> > +#include "kvm.h"
>> > +#include "cpu.h"
>> > +#include "ioinst.h"
>> > +#include "css.h"
>> > +
>> > +struct chp_info {
>>
>> CamelCase, please.
>
> OK.
>>
>> > +uint8_t in_use;
>> > +uint8_t type;
>> > +};
>> > +
>> > +static struct chp_info chpids[MAX_CSSID + 1][MAX_CHPID + 1];
>> > +
>> > +static css_subch_cb_func css_subch_cb;
>>
>> Probably these can be put to a container structure which can be passed 
>> around.
>
> Still trying to come up with a good model for that.
>
>>
>
>> > +case CCW_CMD_SENSE_ID:
>> > +{
>> > +uint8_t sense_bytes[256];
>> > +
>> > +/* Sense ID information is device specific. */
>> > +memcpy(sense_bytes, &sch->id, sizeof(sense_bytes));
>> > +if (check_len) {
>> > +if (ccw->count != sizeof(sense_bytes)) {
>> > +ret = -EINVAL;
>> > +break;
>> > +}
>> > +}
>> > +len = MIN(ccw->count, sizeof(sense_bytes));
>> > +/*
>> > + * Only indicate 0xff in the first sense byte if we actually
>> > + * have enough place to store at least bytes 0-3.
>> > + */
>> > +if (len >= 4) {
>> > +stb_phys(ccw->cda, 0xff);
>> > +} else {
>> > +stb_phys(ccw->cda, 0);
>> > +}
>> > +i = 1;
>> > +for (i = 1; i < len - 1; i++) {
>> > +stb_phys(ccw->cda + i, sense_bytes[i]);
>> > +}
>>
>> cpu_physical_memory_write()
>
> Hm, what's wrong with storing byte-by-byte?

cpu_physical_memory_write() could be more efficient, for example by
resolving guest addresses only once per page instead of once per byte.
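
To illustrate the difference, here is a self-contained mock. It is not QEMU's implementation: SketchGuestRam, sketch_translate(), sketch_store_byte() and sketch_write() are invented stand-ins, and the page size is an assumption. The mock only counts how often a guest address has to be resolved in each style.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_RAM_SIZE  (4u * SKETCH_PAGE_SIZE)

typedef struct SketchGuestRam {
    uint8_t bytes[SKETCH_RAM_SIZE];
    unsigned translations;      /* how often an address was resolved */
} SketchGuestRam;

/* Mock address resolution: guest address to host pointer. */
static uint8_t *sketch_translate(SketchGuestRam *ram, uint32_t addr)
{
    assert(addr < SKETCH_RAM_SIZE);
    ram->translations++;
    return &ram->bytes[addr];
}

/* Analogue of a stb_phys() style store: one resolution per byte. */
static void sketch_store_byte(SketchGuestRam *ram, uint32_t addr, uint8_t val)
{
    *sketch_translate(ram, addr) = val;
}

/* Analogue of a bulk write: one resolution per page touched. */
static void sketch_write(SketchGuestRam *ram, uint32_t addr,
                         const uint8_t *buf, size_t len)
{
    while (len > 0) {
        size_t in_page = SKETCH_PAGE_SIZE - (addr % SKETCH_PAGE_SIZE);
        size_t chunk = len < in_page ? len : in_page;

        /* The chunk never crosses a page, so one lookup covers it. */
        memcpy(sketch_translate(ram, addr), buf, chunk);
        addr += (uint32_t)chunk;
        buf += chunk;
        len -= chunk;
    }
}
```

For a 256-byte sense buffer the byte-wise loop resolves the address 256 times in this mock, while the bulk write resolves it once, or twice if the buffer straddles a page boundary.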

>
>>
>> > +sch->curr_status.scsw.count = ccw->count - len;
>> > +ret = 0;
>> > +break;
>> > +}
>> > +case CCW_CMD_TIC:
>> > +if (sch->last_cmd->cmd_code == CCW_CMD_TIC) {
>> > +ret = -EINVAL;
>> > +break;
>> > +}
>> > +if (ccw->flags & (CCW_FLAG_CC | CCW_FLAG_DC)) {
>> > +ret = -EINVAL;
>> > +break;
>> > +}
>> > +sch->channel_prog = qemu_get_ram_ptr(ccw->cda);
>> > +ret = sch->channel_prog ? -EAGAIN : -EFAULT;
>> > +break;
>> > +default:
>> > +if (sch->ccw_cb) {
>> > +/* Handle device specific commands. */
>> > +ret = sch->ccw_cb(sch, ccw);
>> > +} else {
>> > +ret = -EOPNOTSUPP;
>> > +}
>> > +break;
>> > +}
>> > +sch->last_cmd = ccw;
>> > +if (ret == 0) {
>> > +if (ccw->flags & CCW_FLAG_CC) {
>> > +sch->channel_prog += 8;
>> > +ret = -EAGAIN;
>> > +}
>> > +}
>> > +
>> > +return ret;
>
>> > diff --git a/hw/s390x/css.h b/hw/s390x/css.h
>> > new file mode 100644
>> > index 000..b8a95cc
>> > --- /dev/null
>> > +++ b/hw/s390x/css.h
>> > @@ -0,0 +1,62 @@
>> > +/*
>> > + * Channel subsystem structures and definitions.
>> > + *
>> > + * Copyright 2012 IBM Corp.
>> > + * Author(s): Cornelia Huck 
>> > + *
>> > + * This work is licensed under the terms of the GNU GPL, version 2 or

Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating

2012-08-08 Thread Blue Swirl

On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan  wrote:
> From: Liu Ping Fan 
>
> Using mem_map_lock to protect among updaters. So we can get the intact
> snapshot of mem topology -- FlatView & radix-tree.
>
> Signed-off-by: Liu Ping Fan 
> ---
>  exec.c   |3 +++
>  memory.c |   22 ++
>  memory.h |2 ++
>  3 files changed, 27 insertions(+), 0 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 8244d54..0e29ef9 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -210,6 +210,8 @@ static unsigned phys_map_nodes_nb, 
> phys_map_nodes_nb_alloc;
> The bottom level has pointers to MemoryRegionSections.  */
>  static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>
> +QemuMutex mem_map_lock;
> +
>  static void io_mem_init(void);
>  static void memory_map_init(void);
>
> @@ -637,6 +639,7 @@ void cpu_exec_init_all(void)
>  #if !defined(CONFIG_USER_ONLY)
>  memory_map_init();
>  io_mem_init();
> +qemu_mutex_init(&mem_map_lock);

I'd move this and the mutex to memory.c since there are no other uses.
The mutex could be static then.

>  #endif
>  }
>
> diff --git a/memory.c b/memory.c
> index aab4a31..5986532 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -761,7 +761,9 @@ void memory_region_transaction_commit(void)
>  assert(memory_region_transaction_depth);
>  --memory_region_transaction_depth;
>  if (!memory_region_transaction_depth && memory_region_update_pending) {
> +qemu_mutex_lock(&mem_map_lock);
>  memory_region_update_topology(NULL);
> +qemu_mutex_unlock(&mem_map_lock);
>  }
>  }
>
> @@ -1069,8 +1071,10 @@ void memory_region_set_log(MemoryRegion *mr, bool log, 
> unsigned client)
>  {
>  uint8_t mask = 1 << client;
>
> +qemu_mutex_lock(&mem_map_lock);
>  mr->dirty_log_mask = (mr->dirty_log_mask & ~mask) | (log * mask);
>  memory_region_update_topology(mr);
> +qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  bool memory_region_get_dirty(MemoryRegion *mr, target_phys_addr_t addr,
> @@ -1103,8 +1107,10 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
>  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
>  {
>  if (mr->readonly != readonly) {
> +qemu_mutex_lock(&mem_map_lock);
>  mr->readonly = readonly;
>  memory_region_update_topology(mr);
> +qemu_mutex_unlock(&mem_map_lock);
>  }
>  }
>
> @@ -1112,7 +1118,9 @@ void memory_region_rom_device_set_readable(MemoryRegion 
> *mr, bool readable)
>  {
>  if (mr->readable != readable) {
>  mr->readable = readable;
> +qemu_mutex_lock(&mem_map_lock);
>  memory_region_update_topology(mr);
> +qemu_mutex_unlock(&mem_map_lock);
>  }
>  }
>
> @@ -1206,6 +1214,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>  };
>  unsigned i;
>
> +qemu_mutex_lock(&mem_map_lock);
>  for (i = 0; i < mr->ioeventfd_nb; ++i) {
>  if (memory_region_ioeventfd_before(mrfd, mr->ioeventfds[i])) {
>  break;
> @@ -1218,6 +1227,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>  sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
>  mr->ioeventfds[i] = mrfd;
>  memory_region_update_topology(mr);
> +qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  void memory_region_del_eventfd(MemoryRegion *mr,
> @@ -1236,6 +1246,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
>  };
>  unsigned i;
>
> +qemu_mutex_lock(&mem_map_lock);
>  for (i = 0; i < mr->ioeventfd_nb; ++i) {
>  if (memory_region_ioeventfd_equal(mrfd, mr->ioeventfds[i])) {
>  break;
> @@ -1248,6 +1259,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
>  mr->ioeventfds = g_realloc(mr->ioeventfds,
>sizeof(*mr->ioeventfds)*mr->ioeventfd_nb + 
> 1);
>  memory_region_update_topology(mr);
> +qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  static void memory_region_add_subregion_common(MemoryRegion *mr,
> @@ -1259,6 +1271,8 @@ static void 
> memory_region_add_subregion_common(MemoryRegion *mr,
>  assert(!subregion->parent);
>  subregion->parent = mr;
>  subregion->addr = offset;
> +
> +qemu_mutex_lock(&mem_map_lock);
>  QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
>  if (subregion->may_overlap || other->may_overlap) {
>  continue;
> @@ -1289,6 +1303,7 @@ static void 
> memory_region_add_subregion_common(MemoryRegion *mr,
>  QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link);
>  done:
>  memory_region_update_topology(mr);
> +qemu_mutex_unlock(&mem_map_lock);
>  }
>
>
> @@ -1316,8 +1331,11 @@ void memory_region_del_subregion(MemoryRegion *mr,
>  {
>  assert(subregion->parent == mr);
>  subregion->parent = NULL;
> +
> +qemu_mutex_lock(&mem_map_lock);
>  QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
>  memory_region_update_topology(mr);
> +qemu_mutex_unlock(&mem_map_lock);
>  }
>
>  void mem

Re: [PATCH 08/15] memory: introduce PhysMap to present snapshot of toploygy

2012-08-08 Thread Blue Swirl

On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan  wrote:
> From: Liu Ping Fan 
>
> PhysMap contain the flatview and radix-tree view, they are snapshot
> of system topology and should be consistent. With PhysMap, we can
> swap the pointer when updating and achieve the atomic.
>
> Signed-off-by: Liu Ping Fan 
> ---
>  exec.c   |8 
>  memory.c |   33 -
>  memory.h |   62 
> --
>  3 files changed, 60 insertions(+), 43 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 0e29ef9..01b91b0 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -156,8 +156,6 @@ typedef struct PageDesc {
>  #endif
>
>  /* Size of the L2 (and L3, etc) page tables.  */

Please copy this comment to the header file.

> -#define L2_BITS 10
> -#define L2_SIZE (1 << L2_BITS)
>
>  #define P_L2_LEVELS \
>  (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)
> @@ -185,7 +183,6 @@ uintptr_t qemu_host_page_mask;
>  static void *l1_map[V_L1_SIZE];
>
>  #if !defined(CONFIG_USER_ONLY)
> -typedef struct PhysPageEntry PhysPageEntry;
>
>  static MemoryRegionSection *phys_sections;
>  static unsigned phys_sections_nb, phys_sections_nb_alloc;
> @@ -194,11 +191,6 @@ static uint16_t phys_section_notdirty;
>  static uint16_t phys_section_rom;
>  static uint16_t phys_section_watch;
>
> -struct PhysPageEntry {
> -uint16_t is_leaf : 1;
> - /* index into phys_sections (is_leaf) or phys_map_nodes (!is_leaf) */
> -uint16_t ptr : 15;
> -};
>
>  /* Simple allocator for PhysPageEntry nodes */
>  static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
> diff --git a/memory.c b/memory.c
> index 2eaa2fc..c7f2cfd 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -31,17 +31,6 @@ static bool global_dirty_log = false;
>  static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners
>  = QTAILQ_HEAD_INITIALIZER(memory_listeners);
>
> -typedef struct AddrRange AddrRange;
> -
> -/*
> - * Note using signed integers limits us to physical addresses at most
> - * 63 bits wide.  They are needed for negative offsetting in aliases
> - * (large MemoryRegion::alias_offset).
> - */
> -struct AddrRange {
> -Int128 start;
> -Int128 size;
> -};
>
>  static AddrRange addrrange_make(Int128 start, Int128 size)
>  {
> @@ -197,28 +186,6 @@ static bool 
> memory_region_ioeventfd_equal(MemoryRegionIoeventfd a,
>  && !memory_region_ioeventfd_before(b, a);
>  }
>
> -typedef struct FlatRange FlatRange;
> -typedef struct FlatView FlatView;
> -
> -/* Range of memory in the global map.  Addresses are absolute. */
> -struct FlatRange {
> -MemoryRegion *mr;
> -target_phys_addr_t offset_in_region;
> -AddrRange addr;
> -uint8_t dirty_log_mask;
> -bool readable;
> -bool readonly;
> -};
> -
> -/* Flattened global view of current active memory hierarchy.  Kept in sorted
> - * order.
> - */
> -struct FlatView {
> -FlatRange *ranges;
> -unsigned nr;
> -unsigned nr_allocated;
> -};
> -
>  typedef struct AddressSpace AddressSpace;
>  typedef struct AddressSpaceOps AddressSpaceOps;
>
> diff --git a/memory.h b/memory.h
> index 740f018..357edd8 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -29,12 +29,72 @@
>  #include "qemu-thread.h"
>  #include "qemu/reclaimer.h"
>
> +typedef struct AddrRange AddrRange;
> +typedef struct FlatRange FlatRange;
> +typedef struct FlatView FlatView;
> +typedef struct PhysPageEntry PhysPageEntry;
> +typedef struct PhysMap PhysMap;
> +typedef struct MemoryRegionSection MemoryRegionSection;
>  typedef struct MemoryRegionOps MemoryRegionOps;
>  typedef struct MemoryRegionLifeOps MemoryRegionLifeOps;
>  typedef struct MemoryRegion MemoryRegion;
>  typedef struct MemoryRegionPortio MemoryRegionPortio;
>  typedef struct MemoryRegionMmio MemoryRegionMmio;
>
> +/*
> + * Note using signed integers limits us to physical addresses at most
> + * 63 bits wide.  They are needed for negative offsetting in aliases
> + * (large MemoryRegion::alias_offset).
> + */
> +struct AddrRange {
> +Int128 start;
> +Int128 size;
> +};
> +
> +/* Range of memory in the global map.  Addresses are absolute. */
> +struct FlatRange {
> +MemoryRegion *mr;
> +target_phys_addr_t offset_in_region;
> +AddrRange addr;
> +uint8_t dirty_log_mask;
> +bool readable;
> +bool readonly;
> +};
> +
> +/* Flattened global view of current active memory hierarchy.  Kept in sorted
> + * order.
> + */
> +struct FlatView {
> +FlatRange *ranges;
> +unsigned nr;
> +unsigned nr_allocated;
> +};
> +
> +struct PhysPageEntry {
> +uint16_t is_leaf:1;
> + /* index into phys_sections (is_leaf) or phys_map_nodes (!is_leaf) */
> +uint16_t ptr:15;
> +};
> +
> +#define L2_BITS 10
> +#define L2_SIZE (1 << L2_BITS)
> +/* This is a multi-level map on the physical address space.
> +   The bottom level has pointers to MemoryRegionSections.  */
> +struct PhysMap {
> +Atomic ref;
> +PhysPageEntry root;
> +PhysPageEntry

Re: [PATCH 09/15] memory: prepare flatview and radix-tree for rcu style access

2012-08-08 Thread Blue Swirl

On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan  wrote:
> From: Liu Ping Fan 
>
> Flatview and radix view are all under the protection of pointer.
> And this make sure the change of them seem to be atomic!
>
> The mr accessed by radix-tree leaf or flatview will be reclaimed
> after the prev PhysMap not in use any longer
>
> Signed-off-by: Liu Ping Fan 
> ---
>  exec.c  |  303 +++---
>  hw/vhost.c  |2 +-
>  hw/xen_pt.c |2 +-
>  kvm-all.c   |2 +-
>  memory.c|   92 ++-
>  memory.h|9 ++-
>  vl.c|1 +
>  xen-all.c   |2 +-
>  8 files changed, 286 insertions(+), 127 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 01b91b0..97addb9 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -24,6 +24,7 @@
>  #include 
>  #endif
>
> +#include "qemu/atomic.h"
>  #include "qemu-common.h"
>  #include "cpu.h"
>  #include "tcg.h"
> @@ -35,6 +36,8 @@
>  #include "qemu-timer.h"
>  #include "memory.h"
>  #include "exec-memory.h"
> +#include "qemu-thread.h"
> +#include "qemu/reclaimer.h"
>  #if defined(CONFIG_USER_ONLY)
>  #include 
>  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> @@ -184,25 +187,17 @@ static void *l1_map[V_L1_SIZE];
>
>  #if !defined(CONFIG_USER_ONLY)
>
> -static MemoryRegionSection *phys_sections;
> -static unsigned phys_sections_nb, phys_sections_nb_alloc;
>  static uint16_t phys_section_unassigned;
>  static uint16_t phys_section_notdirty;
>  static uint16_t phys_section_rom;
>  static uint16_t phys_section_watch;
>
> -
> -/* Simple allocator for PhysPageEntry nodes */
> -static PhysPageEntry (*phys_map_nodes)[L2_SIZE];
> -static unsigned phys_map_nodes_nb, phys_map_nodes_nb_alloc;
> -
>  #define PHYS_MAP_NODE_NIL (((uint16_t)~0) >> 1)
>
> -/* This is a multi-level map on the physical address space.
> -   The bottom level has pointers to MemoryRegionSections.  */
> -static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
> -
> +static QemuMutex cur_map_lock;
> +static PhysMap *cur_map;
>  QemuMutex mem_map_lock;
> +static PhysMap *next_map;
>
>  static void io_mem_init(void);
>  static void memory_map_init(void);
> @@ -383,41 +378,38 @@ static inline PageDesc *page_find(tb_page_addr_t index)
>
>  #if !defined(CONFIG_USER_ONLY)
>
> -static void phys_map_node_reserve(unsigned nodes)
> +static void phys_map_node_reserve(PhysMap *map, unsigned nodes)
>  {
> -if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
> +if (map->phys_map_nodes_nb + nodes > map->phys_map_nodes_nb_alloc) {
>  typedef PhysPageEntry Node[L2_SIZE];
> -phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc * 2, 16);
> -phys_map_nodes_nb_alloc = MAX(phys_map_nodes_nb_alloc,
> -  phys_map_nodes_nb + nodes);
> -phys_map_nodes = g_renew(Node, phys_map_nodes,
> - phys_map_nodes_nb_alloc);
> +map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc * 2,
> +16);
> +map->phys_map_nodes_nb_alloc = MAX(map->phys_map_nodes_nb_alloc,
> +  map->phys_map_nodes_nb + nodes);
> +map->phys_map_nodes = g_renew(Node, map->phys_map_nodes,
> + map->phys_map_nodes_nb_alloc);
>  }
>  }
>
> -static uint16_t phys_map_node_alloc(void)
> +static uint16_t phys_map_node_alloc(PhysMap *map)
>  {
>  unsigned i;
>  uint16_t ret;
>
> -ret = phys_map_nodes_nb++;
> +ret = map->phys_map_nodes_nb++;
>  assert(ret != PHYS_MAP_NODE_NIL);
> -assert(ret != phys_map_nodes_nb_alloc);
> +assert(ret != map->phys_map_nodes_nb_alloc);
>  for (i = 0; i < L2_SIZE; ++i) {
> -phys_map_nodes[ret][i].is_leaf = 0;
> -phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
> +map->phys_map_nodes[ret][i].is_leaf = 0;
> +map->phys_map_nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
>  }
>  return ret;
>  }
>
> -static void phys_map_nodes_reset(void)
> -{
> -phys_map_nodes_nb = 0;
> -}
> -
> -
> -static void phys_page_set_level(PhysPageEntry *lp, target_phys_addr_t *index,
> -target_phys_addr_t *nb, uint16_t leaf,
> +static void phys_page_set_level(PhysMap *map, PhysPageEntry *lp,
> +target_phys_addr_t *index,
> +target_phys_addr_t *nb,
> +uint16_t leaf,
>  int level)
>  {
>  PhysPageEntry *p;
> @@ -425,8 +417,8 @@ static void phys_page_set_level(PhysPageEntry *lp, 
> target_phys_addr_t *index,
>  target_phys_addr_t step = (target_phys_addr_t)1 << (level * L2_BITS);
>
>  if (!lp->is_leaf && lp->ptr == PHYS_MAP_NODE_NIL) {
> -lp->ptr = phys_map_node_alloc();
> -p = phys_map_nodes[lp->ptr];
> +lp->ptr = phys_map_node_alloc(map);
> +

Re: [Qemu-devel] [PATCH 2/5] s390: Virtual channel subsystem support.

2012-08-08 Thread Blue Swirl

On Wed, Aug 8, 2012 at 7:34 PM, Peter Maydell  wrote:
> On 8 August 2012 20:16, Blue Swirl  wrote:
>> On Wed, Aug 8, 2012 at 8:17 AM, Cornelia Huck  
>> wrote:
>>> On Tue, 7 Aug 2012 21:00:59 +
>>> Blue Swirl  wrote:
>>>> Please use more descriptive names instead of acronyms, for example 
>>>> SubChStatus.
>>>
>>> I'd rather leave these at the well-known scsw, pmcw, etc. names. These
>>> have been around for decades, and somebody familiar with channel I/O
>>> will instantly know what a struct scsw is, but will need to look hard
>>> at the code to figure out the meaning of SubChStatus.
>>
>> If they are well-known and have been around for such a long time, are
>> there any suitable header files (with compatible licenses) where they
>> are defined which could be reused?
>>
>> Otherwise, please follow CODING_STYLE.
>
> I think we should follow CODING_STYLE for capitalisation issues
> but generally if the device's documentation has standard abbreviations
> for register names, structures, etc, etc we should use them. Often
> this code has to be maintained later by somebody else who might not
> be familiar with the general operation of the hardware and who is trying
> to match up the code with whatever the data sheet says. Following the
> naming used in the h/w docs makes that job easier.

Yes. typedef struct SCSW {} SCSW; should be OK too.
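
Spelled out, that compromise could look like the sketch below: the hardware abbreviation is kept, and only the capitalisation and the typedef follow CODING_STYLE. The field names are the ones used in the quoted css.c code; their widths and order here, the SKETCH_STCTL_STATUS_PEND value and the helper function are placeholders, not the architected SCSW layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hardware name kept as is; struct tag and typedef share the spelling. */
typedef struct SCSW {
    uint8_t  actl;    /* activity control (placeholder width) */
    uint8_t  stctl;   /* status control (placeholder width) */
    uint8_t  dstat;   /* device status */
    uint8_t  cstat;   /* subchannel status */
    uint16_t count;   /* residual count */
} SCSW;

/* Placeholder bit value, not the real SCSW_STCTL_STATUS_PEND. */
#define SKETCH_STCTL_STATUS_PEND 0x01

/* Callers can now write "SCSW *" instead of "struct scsw *". */
static void sketch_scsw_set_status_pending(SCSW *scsw)
{
    assert(scsw != NULL);
    scsw->stctl |= SKETCH_STCTL_STATUS_PEND;
}
```

Someone matching the code against the channel I/O documentation still finds SCSW by name, which is the point Peter makes above.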

>
> (for instance I took the opportunity of making a bunch of structure
> member names in target-arm line up with the ARM ARM names
> as part of the refactoring that went on a while back.)
>
> -- PMM

Re: [PATCH 04/15] memory: MemoryRegion topology must be stable when updating

2012-08-09 Thread Blue Swirl

On Thu, Aug 9, 2012 at 7:28 AM, liu ping fan  wrote:
> On Thu, Aug 9, 2012 at 3:17 AM, Blue Swirl  wrote:
>> On Wed, Aug 8, 2012 at 6:25 AM, Liu Ping Fan  wrote:
>>> From: Liu Ping Fan 
>>>
>>> Use mem_map_lock to serialize updaters, so that we can get an intact
>>> snapshot of the mem topology -- FlatView & radix-tree.
>>>
>>> Signed-off-by: Liu Ping Fan 
>>> ---
>>>  exec.c   |3 +++
>>>  memory.c |   22 ++
>>>  memory.h |2 ++
>>>  3 files changed, 27 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/exec.c b/exec.c
>>> index 8244d54..0e29ef9 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -210,6 +210,8 @@ static unsigned phys_map_nodes_nb, 
>>> phys_map_nodes_nb_alloc;
>>> The bottom level has pointers to MemoryRegionSections.  */
>>>  static PhysPageEntry phys_map = { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
>>>
>>> +QemuMutex mem_map_lock;
>>> +
>>>  static void io_mem_init(void);
>>>  static void memory_map_init(void);
>>>
>>> @@ -637,6 +639,7 @@ void cpu_exec_init_all(void)
>>>  #if !defined(CONFIG_USER_ONLY)
>>>  memory_map_init();
>>>  io_mem_init();
>>> +qemu_mutex_init(&mem_map_lock);
>>
>> I'd move this and the mutex to memory.c since there are no other uses.
>> The mutex could be static then.
>>
> But the init entry is in exec.c, not memory.c.

The memory subsystem does not have an init function of its own; this can
be the start of one.
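
Roughly something like the sketch below. It uses a plain pthread mutex as
a stand-in for QemuMutex so that it compiles on its own, and the names
memory_init() and memory_update_topology_locked() are hypothetical, they
do not exist in the tree:

```c
#include <pthread.h>

/* File-local lock: nothing outside this file needs to see it. */
static pthread_mutex_t mem_map_lock;

/* Stand-in for the real topology state, just a generation counter. */
static unsigned topology_generation;

/* Hypothetical init entry point for the memory subsystem. */
void memory_init(void)
{
    pthread_mutex_init(&mem_map_lock, NULL);
    topology_generation = 0;
}

/* Updaters take the lock around every topology change. */
unsigned memory_update_topology_locked(void)
{
    unsigned generation;

    pthread_mutex_lock(&mem_map_lock);
    generation = ++topology_generation;
    pthread_mutex_unlock(&mem_map_lock);
    return generation;
}
```

cpu_exec_init_all() would then only need to call the new init function,
and the mutex stays private to memory.c.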

>
> Regards,
> pingfan
>
>>>  #endif
>>>  }
>>>
>>> diff --git a/memory.c b/memory.c
>>> index aab4a31..5986532 100644
>>> --- a/memory.c
>>> +++ b/memory.c
>>> @@ -761,7 +761,9 @@ void memory_region_transaction_commit(void)
>>>  assert(memory_region_transaction_depth);
>>>  --memory_region_transaction_depth;
>>>  if (!memory_region_transaction_depth && memory_region_update_pending) {
>>> +qemu_mutex_lock(&mem_map_lock);
>>>  memory_region_update_topology(NULL);
>>> +qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>  }
>>>
>>> @@ -1069,8 +1071,10 @@ void memory_region_set_log(MemoryRegion *mr, bool 
>>> log, unsigned client)
>>>  {
>>>  uint8_t mask = 1 << client;
>>>
>>> +qemu_mutex_lock(&mem_map_lock);
>>>  mr->dirty_log_mask = (mr->dirty_log_mask & ~mask) | (log * mask);
>>>  memory_region_update_topology(mr);
>>> +qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>
>>>  bool memory_region_get_dirty(MemoryRegion *mr, target_phys_addr_t addr,
>>> @@ -1103,8 +1107,10 @@ void memory_region_sync_dirty_bitmap(MemoryRegion 
>>> *mr)
>>>  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
>>>  {
>>>  if (mr->readonly != readonly) {
>>> +qemu_mutex_lock(&mem_map_lock);
>>>  mr->readonly = readonly;
>>>  memory_region_update_topology(mr);
>>> +qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>  }
>>>
>>> @@ -1112,7 +1118,9 @@ void 
>>> memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
>>>  {
>>>  if (mr->readable != readable) {
>>>  mr->readable = readable;
>>> +qemu_mutex_lock(&mem_map_lock);
>>>  memory_region_update_topology(mr);
>>> +qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>  }
>>>
>>> @@ -1206,6 +1214,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>>>  };
>>>  unsigned i;
>>>
>>> +qemu_mutex_lock(&mem_map_lock);
>>>  for (i = 0; i < mr->ioeventfd_nb; ++i) {
>>>  if (memory_region_ioeventfd_before(mrfd, mr->ioeventfds[i])) {
>>>  break;
>>> @@ -1218,6 +1227,7 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>>>  sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
>>>  mr->ioeventfds[i] = mrfd;
>>>  memory_region_update_topology(mr);
>>> +qemu_mutex_unlock(&mem_map_lock);
>>>  }
>>>
>>>  void memory_region_del_eventfd(MemoryRegion *mr,
>>> @@ -1236,6 +1246,7 @@ void memory_region_del_eventfd(MemoryRegion *mr,
>>>  };
>>>  unsigned i;
>

Re: [Qemu-devel] [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection

2012-08-13 Thread Blue Swirl

On Mon, Aug 13, 2012 at 8:35 AM, Nicholas A. Bellinger
 wrote:
> From: Nicholas Bellinger 
>
> This is required to get past the following assert with:
>
> commit 1523ed9e1d46b0b54540049d491475ccac7e6421
> Author: Jan Kiszka 
> Date:   Thu May 17 10:32:39 2012 -0300
>
> virtio/vhost: Add support for KVM in-kernel MSI injection
>
> Cc: Stefan Hajnoczi 
> Cc: Jan Kiszka 
> Cc: Paolo Bonzini 
> Cc: Anthony Liguori 
> Signed-off-by: Nicholas Bellinger 
> ---
>  hw/msix.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/hw/msix.c b/hw/msix.c
> index 800fc32..c1e6dc3 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
>  {
>  int vector;
>
> +if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier)
> +return;

Missing braces, please read CODING_STYLE.
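
That is, every if body gets braces, even a single return. A
self-contained sketch of the shape, where NotifierPair and
unset_notifiers() are hypothetical stand-ins for the PCIDevice fields
and the function in the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two notifier fields of PCIDevice. */
typedef struct NotifierPair {
    void *use;
    void *release;
} NotifierPair;

/* Returns true if there was anything to unset. */
static bool unset_notifiers(NotifierPair *n)
{
    /* Braces even though the body is a single statement. */
    if (!n->use && !n->release) {
        return false;
    }

    n->use = NULL;
    n->release = NULL;
    return true;
}
```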

> +
>  assert(dev->msix_vector_use_notifier &&
> dev->msix_vector_release_notifier);
>
> --
> 1.7.2.5
>
>

Re: [Qemu-devel] [RFC-v2 3/6] vhost-scsi: add -vhost-scsi host device for use with tcm-vhost

2012-08-13 Thread Blue Swirl

On Mon, Aug 13, 2012 at 8:35 AM, Nicholas A. Bellinger
 wrote:
> From: Stefan Hajnoczi 
>
> This patch adds a new type of host device that drives the vhost_scsi
> device.  The syntax to add vhost-scsi is:
>
>   qemu -vhost-scsi id=vhost-scsi0,wwpn=...,tpgt=123
>
> The virtio-scsi emulated device will make use of vhost-scsi to process
> virtio-scsi requests inside the kernel and hand them to the in-kernel
> SCSI target stack using the tcm_vhost fabric driver.
>
> The tcm_vhost driver was merged into the upstream linux kernel for 3.6-rc2,
> and the commit can be found here:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=057cbf49a1f08297
>
> Changelog v1 -> v2:
>
> - Expose ABI version via VHOST_SCSI_GET_ABI_VERSION + use Rev 0 as
>   starting point for v3.6-rc code (Stefan + ALiguori + nab)
> - Fix upstream qemu conflict in hw/qdev-properties.c
> - Make GET_ABI_VERSION use int (nab + mst)
> - Fix vhost-scsi case labels in configure (reported by paolo)
> - Convert qdev_prop_vhost_scsi to use ->get() + ->set() following
>   qdev_prop_netdev (reported by paolo)
> - Fix typo in qemu-options.hx definition of vhost-scsi (reported by paolo)
>
> Changelog v0 -> v1:
>
> - Add VHOST_SCSI_SET_ENDPOINT call (stefan)
> - Enable vhost notifiers for multiple queues (Zhi)
> - clear vhost-scsi endpoint on stopped (Zhi)
> - Add CONFIG_VHOST_SCSI for QEMU build configure (nab)
> - Rename vhost_vring_target -> vhost_scsi_target (mst + nab)
> - Add support for VHOST_SCSI_GET_ABI_VERSION ioctl (aliguori + nab)
>
> Cc: Stefan Hajnoczi 
> Cc: Zhi Yong Wu 
> Cc: Anthony Liguori 
> Cc: Paolo Bonzini 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Nicholas Bellinger 
> ---
>  configure|   10 +++
>  hw/Makefile.objs |1 +
>  hw/qdev-properties.c |   40 
>  hw/qdev.h|3 +
>  hw/vhost-scsi.c  |  170 
> ++
>  hw/vhost-scsi.h  |   50 +++
>  qemu-common.h|1 +
>  qemu-config.c|   16 +
>  qemu-options.hx  |4 +
>  vl.c |   18 +
>  10 files changed, 313 insertions(+), 0 deletions(-)
>  create mode 100644 hw/vhost-scsi.c
>  create mode 100644 hw/vhost-scsi.h
>
> diff --git a/configure b/configure
> index f0dbc03..1f03202 100755
> --- a/configure
> +++ b/configure
> @@ -168,6 +168,7 @@ libattr=""
>  xfs=""
>
>  vhost_net="no"
> +vhost_scsi="no"
>  kvm="no"
>  gprof="no"
>  debug_tcg="no"
> @@ -513,6 +514,7 @@ Haiku)
>usb="linux"
>kvm="yes"
>vhost_net="yes"
> +  vhost_scsi="yes"
>if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
>  audio_possible_drivers="$audio_possible_drivers fmod"
>fi
> @@ -818,6 +820,10 @@ for opt do
>;;
>--enable-vhost-net) vhost_net="yes"
>;;
> +  --disable-vhost-scsi) vhost_scsi="no"
> +  ;;
> +  --enable-vhost-scsi) vhost_scsi="yes"
> +  ;;
>--disable-opengl) opengl="no"
>;;
>--enable-opengl) opengl="yes"
> @@ -3116,6 +3122,7 @@ echo "posix_madvise $posix_madvise"
>  echo "uuid support  $uuid"
>  echo "libcap-ng support $cap_ng"
>  echo "vhost-net support $vhost_net"
> +echo "vhost-scsi support $vhost_scsi"
>  echo "Trace backend $trace_backend"
>  echo "Trace output file $trace_file-"
>  echo "spice support $spice"
> @@ -3828,6 +3835,9 @@ case "$target_arch2" in
>if test "$vhost_net" = "yes" ; then
>  echo "CONFIG_VHOST_NET=y" >> $config_target_mak
>fi
> +  if test "$vhost_scsi" = "yes" ; then
> +echo "CONFIG_VHOST_SCSI=y" >> $config_target_mak
> +  fi
>  fi
>  esac
>  case "$target_arch2" in
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 3ba5dd0..6ab75ec 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -169,6 +169,7 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio-blk.o 
> virtio-balloon.o virtio-net.o
>  obj-$(CONFIG_VIRTIO) += virtio-serial-bus.o virtio-scsi.o
>  obj-$(CONFIG_SOFTMMU) += vhost_net.o
>  obj-$(CONFIG_VHOST_NET) += vhost.o
> +obj-$(CONFIG_VHOST_SCSI) += vhost-scsi.o
>  obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/
>  obj-$(CONFIG_NO_PCI) += pci-stub.o
>  obj-$(CONFIG_VGA) += vga.o
> diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
> index 8aca0d4..0266266 100644
> --- a/hw/qdev-properties.c
> +++ b/hw/qdev-properties.c
> @@ -4,6 +4,7 @@
>  #include "blockdev.h"
>  #include "hw/block-common.h"
>  #include "net/hub.h"
> +#include "vhost-scsi.h"
>
>  void *qdev_get_prop_ptr(DeviceState *dev, Property *prop)
>  {
> @@ -696,6 +697,45 @@ PropertyInfo qdev_prop_vlan = {
>  .set   = set_vlan,
>  };
>
> +/* --- vhost-scsi --- */
> +
> +static int parse_vhost_scsi_dev(DeviceState *dev, const char *str, void 
> **ptr)
> +{
> +   VHostSCSI *p;
> +
> +   p = find_vhost_scsi(str);
> +   if (p == NULL)
> +   return -ENOENT;

Braces, please.

> +
> +   *ptr = p;
> +   return 0;
> +}
> +
> +static const char *print_vhost_scsi_dev(void *ptr)
> +{
> +VHostSCSI *p = ptr;
> +
> +return (p) ? v

Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Blue Swirl

On Mon, Aug 13, 2012 at 7:33 PM, Anthony Liguori  wrote:
> Alex Williamson  writes:
>
>> On Mon, 2012-08-13 at 08:27 -0500, Anthony Liguori wrote:
>>> Alex Williamson  writes:
>>>
>>> > VFIO kernel support was just merged into Linux, so I'd like to
>>> > formally propose inclusion of the QEMU vfio-pci driver for
>>> > QEMU 1.2.  Included here is support for x86 PCI device assignment.
>>> > PCI INTx is not yet enabled, but devices making use of either MSI
>>> > or MSI-X work.  The level irqfd and eoifd support I've proposed
>>> > for KVM enable an accelerated patch for this through KVM.  I'd
>>> > like to get this base driver in first and enable the remaining
>>> > support in-tree.
>>> >
>>> > I've split this version up a little from the RFC to make it a bit
>>> > easier to review.  Review comments from Blue Swirl and Avi are
>>> > already incorporated, including Avi's requests to simplify both
>>> > the PCI BAR mapping and unmapping paths.
>>>
>>> Hi Alex,
>>>
>>> Thanks for pushing this forward!  Hopefully this will finally kill off
>>> qemu-kvm.git for good.
>>>
>>> I think this series is going to have to wait for 1.3 to open up.  We
>>> have a very short release window for this release and I'd feel a lot
>>> more comfortable having such a significant feature spend some time in
>>> the development cycle getting testing/review.
>>>
>>> I'd like to see a few Reviewed-by's too for this series before it goes
>>> in.  I expect they won't be hard to get but I also expect it will take a
>>> few more revisions of this series to get there.
>>
>> That's disappointing, but I can understand your reluctance.  Blue Swirl
>> reviewed the RFC and could perhaps add a Reviewed-by.  Alexey has been
>> working on the POWER port and I'm sure could provide a Reviewed-by.  We
>> also have a few early adopters that are already making use of this code.
>> Towards accepting it, the driver is entirely self contained, there's
>> really no risk to the rest of qemu.  The only missing functionality is
>> legacy interrupt support.  Perhaps there's a compromise where this
>> driver could be considered a tech preview in 1.2 (x-vfio-pci?).
>> Thanks,
>
> Yeah, if a few people were willing to at least give an Acked-by by
> Wednesday, I'd be okay taking this in a "preview" or something like
> that.

Acked-by: Blue Swirl 

>
> I wouldn't bother renaming it or anything like that.  We can just
> declare in the release notes that it's an experimental feature and may
> eat your lunch while you're not looking.
>
> Regards,
>
> Anthony Liguori
>
>>
>> Alex
>>
>

Re: [Qemu-devel] [PATCH v8 5/6] introduce a new qom device to deal with panicked event

2012-08-25 Thread Blue Swirl

On Wed, Aug 22, 2012 at 7:30 AM, Wen Congyang  wrote:
> At 08/09/2012 03:01 AM, Blue Swirl Wrote:
>> On Wed, Aug 8, 2012 at 2:47 AM, Wen Congyang  wrote:
>>> If the target is x86/x86_64, the guest's kernel will write 0x01 to the
>>> port KVM_PV_EVENT_PORT when it has panicked. This patch introduces a new
>>> qom device kvm_pv_ioport to listen on this I/O port and deal with the
>>> panicked event according to panicked_action's value. The possible actions are:
>>> 1. emit QEVENT_GUEST_PANICKED only
>>> 2. emit QEVENT_GUEST_PANICKED and pause the guest
>>> 3. emit QEVENT_GUEST_PANICKED and poweroff the guest
>>> 4. emit QEVENT_GUEST_PANICKED and reset the guest
>>>
>>> I/O ports do not work for some targets (for example, s390). For such a
>>> target you can implement another qom device and include its code in
>>> pv_event.c.
>>>
>>> Note: if we emit QEVENT_GUEST_PANICKED only, and the management
>>> application does not receive this event(the management may not
>>> run when the event is emitted), the management won't know the
>>> guest is panicked.
>>>
>>> Signed-off-by: Wen Congyang 
>>> ---
>>>  hw/kvm/Makefile.objs |2 +-
>>>  hw/kvm/pv_event.c|  109 
>>> ++
>>>  hw/kvm/pv_ioport.c   |   93 ++
>>>  hw/pc_piix.c |9 
>>>  kvm.h|2 +
>>>  5 files changed, 214 insertions(+), 1 deletions(-)
>>>  create mode 100644 hw/kvm/pv_event.c
>>>  create mode 100644 hw/kvm/pv_ioport.c
>>>
>>> diff --git a/hw/kvm/Makefile.objs b/hw/kvm/Makefile.objs
>>> index 226497a..23e3b30 100644
>>> --- a/hw/kvm/Makefile.objs
>>> +++ b/hw/kvm/Makefile.objs
>>> @@ -1 +1 @@
>>> -obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o
>>> +obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o pv_event.o
>>> diff --git a/hw/kvm/pv_event.c b/hw/kvm/pv_event.c
>>> new file mode 100644
>>> index 000..8897237
>>> --- /dev/null
>>> +++ b/hw/kvm/pv_event.c
>>> @@ -0,0 +1,109 @@
>>> +/*
>>> + * QEMU KVM support, paravirtual event device
>>> + *
>>> + * Copyright Fujitsu, Corp. 2012
>>> + *
>>> + * Authors:
>>> + * Wen Congyang 
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or 
>>> later.
>>> + * See the COPYING file in the top-level directory.
>>> + *
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +/* Possible values for action parameter. */
>>> +#define PANICKED_REPORT 1   /* emit QEVENT_GUEST_PANICKED only */
>>> +#define PANICKED_PAUSE  2   /* emit QEVENT_GUEST_PANICKED and pause VM 
>>> */
>>> +#define PANICKED_POWEROFF   3   /* emit QEVENT_GUEST_PANICKED and quit VM 
>>> */
>>> +#define PANICKED_RESET  4   /* emit QEVENT_GUEST_PANICKED and reset VM 
>>> */
>>> +
>>> +#define PV_EVENT_DRIVER "kvm_pv_event"
>>> +
>>> +struct pv_event_action {
>>
>> PVEventAction
>>
>>> +char *panicked_action;
>>> +int panicked_action_value;
>>> +};
>>> +
>>> +#define DEFINE_PV_EVENT_PROPERTIES(_state, _conf)   \
>>> +DEFINE_PROP_STRING("panicked_action", _state, _conf.panicked_action)
>>> +
>>> +static void panicked_mon_event(const char *action)
>>> +{
>>> +QObject *data;
>>> +
>>> +data = qobject_from_jsonf("{ 'action': %s }", action);
>>> +monitor_protocol_event(QEVENT_GUEST_PANICKED, data);
>>> +qobject_decref(data);
>>> +}
>>> +
>>> +static void panicked_perform_action(uint32_t panicked_action)
>>> +{
>>> +switch (panicked_action) {
>>> +case PANICKED_REPORT:
>>> +panicked_mon_event("report");
>>> +break;
>>> +
>>> +case PANICKED_PAUSE:
>>> +panicked_mon_event("pause");
>>> +vm_stop(RUN_STATE_GUEST_PANICKED);
>>> +break;
>>> +
>>> +case PANICKED_POWEROFF:
>>> +panicked_mon_event("poweroff");
>>> +qemu_system_

Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-27 Thread Blue Swirl

On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin  wrote:
> In preparation for adding PV EOI support, disable PV EOI by default for
> 1.1 and older machine types, to avoid CPUID changing during migration.
>
> PV EOI can still be enabled/disabled by specifying it explicitly.
> Enable for 1.1
> -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
> Disable for 1.2
> -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
>
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/Makefile.objs  |  2 +-
>  hw/cpu_flags.c| 32 
>  hw/cpu_flags.h|  9 +
>  hw/pc_piix.c  |  2 ++
>  target-i386/cpu.c |  8 
>  5 files changed, 52 insertions(+), 1 deletion(-)
>  create mode 100644 hw/cpu_flags.c
>  create mode 100644 hw/cpu_flags.h
>
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 850b87b..3f2532a 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -1,5 +1,5 @@
>  hw-obj-y = usb/ ide/
> -hw-obj-y += loader.o
> +hw-obj-y += loader.o cpu_flags.o
>  hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
>  hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
>  hw-obj-y += fw_cfg.o
> diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
> new file mode 100644
> index 000..2422d20
> --- /dev/null
> +++ b/hw/cpu_flags.c
> @@ -0,0 +1,32 @@
> +/*
> + * CPU compatibility flags.
> + *
> + * Copyright (c) 2012 Red Hat Inc.
> + * Author: Michael S. Tsirkin.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#include "hw/cpu_flags.h"
> +
> +static bool __kvm_pv_eoi_disabled;

Don't use identifiers with leading underscores.

> +
> +void disable_kvm_pv_eoi(void)
> +{
> +   __kvm_pv_eoi_disabled = true;
> +}
> +
> +bool kvm_pv_eoi_disabled(void)
> +{
> +   return __kvm_pv_eoi_disabled;
> +}
> diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
> new file mode 100644
> index 000..05777b6
> --- /dev/null
> +++ b/hw/cpu_flags.h
> @@ -0,0 +1,9 @@
> +#ifndef HW_CPU_FLAGS_H
> +#define HW_CPU_FLAGS_H
> +
> +#include 
> +
> +void disable_kvm_pv_eoi(void);
> +bool kvm_pv_eoi_disabled(void);
> +
> +#endif
> diff --git a/hw/pc_piix.c b/hw/pc_piix.c
> index 008d42f..bdbceda 100644
> --- a/hw/pc_piix.c
> +++ b/hw/pc_piix.c
> @@ -46,6 +46,7 @@
>  #ifdef CONFIG_XEN
>  #  include 
>  #endif
> +#include "cpu_flags.h"
>
>  #define MAX_IDE_BUS 2
>
> @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {
>
>  static void pc_machine_v1_1_compat(void)
>  {
> +disable_kvm_pv_eoi();
>  }
>
>  static void pc_init_pci_v1_1(ram_addr_t ram_size,
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 120a2e3..0d02fd1 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -23,6 +23,7 @@
>
>  #include "cpu.h"
>  #include "kvm.h"
> +#include "asm/kvm_para.h"
>
>  #include "qemu-option.h"
>  #include "qemu-config.h"
> @@ -33,6 +34,7 @@
>  #include "hyperv.h"
>
>  #include "hw/hw.h"
> +#include "hw/cpu_flags.h"
>
>  /* feature flags taken from "Intel Processor Identification and the CPUID
>   * Instruction" and AMD's "CPUID Specification".  In cases of disagreement
> @@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
> const char *cpu_model)
>
>  plus_kvm_features = ~0; /* not supported bits will be filtered out later 
> */
>
> +/* Disable PV EOI for old machine types.
> + * Feature flags can still override. */
> +if (kvm_pv_eoi_disabled()) {
> +plus_kvm_features &= ~(0x1 << KVM_FEATURE_PV_EOI);
> +}
> +
>  add_flagname_to_bitmaps("hypervisor", &plus_features,
>  &plus_ext_features, &plus_ext2_features, &plus_ext3_features,
>  &plus_kvm_features, &plus_svm_features);
> --
> MST
>
>

Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-08-27 Thread Blue Swirl

On Mon, Aug 27, 2012 at 7:01 PM, Michael S. Tsirkin  wrote:
> On Mon, Aug 27, 2012 at 06:56:38PM +0000, Blue Swirl wrote:
>> > +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
>> > +{
>> > +AssignedDevRegion *d = opaque;
>> > +uint8_t *in = d->u.r_virtbase + addr;
>>
>> Don't perform arithmetic with void pointers.
>
> Why not?
> We require gcc and it's a documented extension there.

We don't require GCC; Clang can already be used for some targets,
though it supports this non-standard extension too.

It's a bad idea to introduce dependencies where it's not necessary.

In this case it's not much effort to add the identifier for the struct
and in fact the only benefit ever is that the lazy coder saves a few
key presses.
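
For example, a sketch of one portable way to write such an accessor. The
Region type and region_readb() are hypothetical, made up for this
message, and are not the names from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a region with an untyped base pointer. */
typedef struct Region {
    void *virtbase;
} Region;

static uint8_t region_readb(const Region *r, size_t offset)
{
    /* Convert to a byte pointer first, then do the arithmetic on that. */
    const uint8_t *base = r->virtbase;

    return *(base + offset);
}
```

This is standard C and behaves the same with any compiler, with no
reliance on sizeof(void) being treated as 1.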

Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-27 Thread Blue Swirl

On Mon, Aug 27, 2012 at 7:06 PM, Michael S. Tsirkin  wrote:
> On Mon, Aug 27, 2012 at 06:58:29PM +0000, Blue Swirl wrote:
>> On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin  wrote:
>> > In preparation for adding PV EOI support, disable PV EOI by default for
>> > 1.1 and older machine types, to avoid CPUID changing during migration.
>> >
>> > PV EOI can still be enabled/disabled by specifying it explicitly.
>> > Enable for 1.1
>> > -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
>> > Disable for 1.2
>> > -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
>> >
>> > Signed-off-by: Michael S. Tsirkin 
>> > ---
>> >  hw/Makefile.objs  |  2 +-
>> >  hw/cpu_flags.c| 32 
>> >  hw/cpu_flags.h|  9 +
>> >  hw/pc_piix.c  |  2 ++
>> >  target-i386/cpu.c |  8 
>> >  5 files changed, 52 insertions(+), 1 deletion(-)
>> >  create mode 100644 hw/cpu_flags.c
>> >  create mode 100644 hw/cpu_flags.h
>> >
>> > diff --git a/hw/Makefile.objs b/hw/Makefile.objs
>> > index 850b87b..3f2532a 100644
>> > --- a/hw/Makefile.objs
>> > +++ b/hw/Makefile.objs
>> > @@ -1,5 +1,5 @@
>> >  hw-obj-y = usb/ ide/
>> > -hw-obj-y += loader.o
>> > +hw-obj-y += loader.o cpu_flags.o
>> >  hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
>> >  hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
>> >  hw-obj-y += fw_cfg.o
>> > diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
>> > new file mode 100644
>> > index 000..2422d20
>> > --- /dev/null
>> > +++ b/hw/cpu_flags.c
>> > @@ -0,0 +1,32 @@
>> > +/*
>> > + * CPU compatibility flags.
>> > + *
>> > + * Copyright (c) 2012 Red Hat Inc.
>> > + * Author: Michael S. Tsirkin.
>> > + *
>> > + * This program is free software; you can redistribute it and/or modify
>> > + * it under the terms of the GNU General Public License as published by
>> > + * the Free Software Foundation; either version 2 of the License, or
>> > + * (at your option) any later version.
>> > + *
>> > + * This program is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> > + * GNU General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU General Public License along
>> > + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> > + */
>> > +#include "hw/cpu_flags.h"
>> > +
>> > +static bool __kvm_pv_eoi_disabled;
>>
>> Don't use identifiers with leading underscores.
>
> C99 spec says "
> Any other predefined macro names
> shall begin with a leading underscore followed by an uppercase letter or
> a second underscore.
> "
>
> what are chances of compiler predefining macro __kvm_pv_eoi_disabled?

Why do you even consider that since it's trivially easy to use
something else? If a standard (and HACKING in our case) specifies
something, why do you want to fight it?

>
> But OK, will rename _kvm_pv_eoi_disabled.
> _ + lower case is guaranteed OK.

No, just use kvm_pv_eoi_disabled, the underscore is useless.

>
>
>> > +
>> > +void disable_kvm_pv_eoi(void)
>> > +{
>> > +   __kvm_pv_eoi_disabled = true;
>> > +}
>> > +
>> > +bool kvm_pv_eoi_disabled(void)
>> > +{
>> > +   return __kvm_pv_eoi_disabled;
>> > +}
>> > diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
>> > new file mode 100644
>> > index 000..05777b6
>> > --- /dev/null
>> > +++ b/hw/cpu_flags.h
>> > @@ -0,0 +1,9 @@
>> > +#ifndef HW_CPU_FLAGS_H
>> > +#define HW_CPU_FLAGS_H
>> > +
>> > +#include 
>> > +
>> > +void disable_kvm_pv_eoi(void);
>> > +bool kvm_pv_eoi_disabled(void);
>> > +
>> > +#endif
>> > diff --git a/hw/pc_piix.c b/hw/pc_piix.c
>> > index 008d42f..bdbceda 100644
>> > --- a/hw/pc_piix.c
>> > +++ b/hw/pc_piix.c
>> > @@ -46,6 +46,7 @@
>> >  #ifdef CONFIG_XEN
>> >  #  include 
>> >  #endif
>> > +#include "cpu_flags.h"
>> >
>> >  #define MAX_IDE_BUS 2
>> >
>> > @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {
>> >
>> >  static void pc_machine_v1_1_compat(void)

Re: [Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-27 Thread Blue Swirl

On Mon, Aug 27, 2012 at 7:24 PM, Michael S. Tsirkin  wrote:
> On Mon, Aug 27, 2012 at 07:12:27PM +0000, Blue Swirl wrote:
>> On Mon, Aug 27, 2012 at 7:06 PM, Michael S. Tsirkin  wrote:
>> > On Mon, Aug 27, 2012 at 06:58:29PM +0000, Blue Swirl wrote:
>> >> On Mon, Aug 27, 2012 at 12:20 PM, Michael S. Tsirkin wrote:
>> >> > In preparation for adding PV EOI support, disable PV EOI by default for
>> >> > 1.1 and older machine types, to avoid CPUID changing during migration.
>> >> >
>> >> > PV EOI can still be enabled/disabled by specifying it explicitly.
>> >> > Enable for 1.1
>> >> > -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
>> >> > Disable for 1.2
>> >> > -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
>> >> >
>> >> > Signed-off-by: Michael S. Tsirkin 
>> >> > ---
>> >> >  hw/Makefile.objs  |  2 +-
>> >> >  hw/cpu_flags.c| 32 
>> >> >  hw/cpu_flags.h|  9 +
>> >> >  hw/pc_piix.c  |  2 ++
>> >> >  target-i386/cpu.c |  8 
>> >> >  5 files changed, 52 insertions(+), 1 deletion(-)
>> >> >  create mode 100644 hw/cpu_flags.c
>> >> >  create mode 100644 hw/cpu_flags.h
>> >> >
>> >> > diff --git a/hw/Makefile.objs b/hw/Makefile.objs
>> >> > index 850b87b..3f2532a 100644
>> >> > --- a/hw/Makefile.objs
>> >> > +++ b/hw/Makefile.objs
>> >> > @@ -1,5 +1,5 @@
>> >> >  hw-obj-y = usb/ ide/
>> >> > -hw-obj-y += loader.o
>> >> > +hw-obj-y += loader.o cpu_flags.o
>> >> >  hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
>> >> >  hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
>> >> >  hw-obj-y += fw_cfg.o
>> >> > diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
>> >> > new file mode 100644
>> >> > index 000..2422d20
>> >> > --- /dev/null
>> >> > +++ b/hw/cpu_flags.c
>> >> > @@ -0,0 +1,32 @@
>> >> > +/*
>> >> > + * CPU compatibility flags.
>> >> > + *
>> >> > + * Copyright (c) 2012 Red Hat Inc.
>> >> > + * Author: Michael S. Tsirkin.
>> >> > + *
>> >> > + * This program is free software; you can redistribute it and/or modify
>> >> > + * it under the terms of the GNU General Public License as published by
>> >> > + * the Free Software Foundation; either version 2 of the License, or
>> >> > + * (at your option) any later version.
>> >> > + *
>> >> > + * This program is distributed in the hope that it will be useful,
>> >> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> >> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> >> > + * GNU General Public License for more details.
>> >> > + *
>> >> > + * You should have received a copy of the GNU General Public License 
>> >> > along
>> >> > + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> >> > + */
>> >> > +#include "hw/cpu_flags.h"
>> >> > +
>> >> > +static bool __kvm_pv_eoi_disabled;
>> >>
>> >> Don't use identifiers with leading underscores.
>> >
>> > C99 spec says "
>> > Any other predefined macro names
>> > shall begin with a leading underscore followed by an uppercase letter or
>> > a second underscore.
>> > "
>> >
>> > what are chances of compiler predefining macro __kvm_pv_eoi_disabled?
>>
>> Why do you even consider that since it's trivially easy to use
>> something else? If a standard (and HACKING in our case) specifies
>> something, why do you want to fight it?
>
> I missed this in HACKING, you are right:
>
> 2.4. Reserved namespaces in C and POSIX
> Underscore capital, double underscore, and underscore 't' suffixes
> should be avoided.
>
> so _kvm_pv_eoi_disabled is ok __kvm_pv_eoi_disabled is not.
> Will fix.

No leading underscores. They are not used in QEMU.

>
>> >
>> > But OK, will rename _kvm_pv_eoi_disabled.
>> > _ + lower case is guaranteed OK.
>>
>> No, just use kvm_pv_eoi_disabled, the underscore is useless.
>
> It isn't useless, this avoids conflict with function name.
> _ says it's an internal variable used to implement kvm_pv_eoi_disabled
> in a very clear way.

Sure, but there are an infinite number of ways of making the identifiers
unique. Using leading underscores is a way to risk conflicts with the
compiler, linker, libc, POSIX, etc. Don't do it.

Where's your imagination, can't you invent any other prefix or suffix?
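
For instance, a sketch with a suffix instead. The variable name
kvm_pv_eoi_disabled_flag is just one made-up possibility, pick whatever
reads well:

```c
#include <stdbool.h>

/* File-local state; the suffix keeps it distinct from the accessor. */
static bool kvm_pv_eoi_disabled_flag;

void disable_kvm_pv_eoi(void)
{
    kvm_pv_eoi_disabled_flag = true;
}

bool kvm_pv_eoi_disabled(void)
{
    return kvm_pv_eoi_disabled_flag;
}
```

The variable is static, so it is still obviously internal, and nothing
in it starts with an underscore.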

>
> --
> MST

Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-08-28 Thread Blue Swirl

On Tue, Aug 28, 2012 at 7:35 AM, Michael Tokarev  wrote:
> On 27.08.2012 22:56, Blue Swirl wrote:
> []
>>> +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
>>> +{
>>> +AssignedDevRegion *d = opaque;
>>> +uint8_t *in = d->u.r_virtbase + addr;
>>
>> Don't perform arithmetic with void pointers.
>
> There are a few places in common qemu code that have done this for a very
> long time.  So I guess it is safe now.

It's a non-standard GCC extension.

>
> /mjt

Re: [Qemu-devel] [PATCHv3 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-28 Thread Blue Swirl

On Tue, Aug 28, 2012 at 1:22 PM, Michael S. Tsirkin  wrote:
> In preparation for adding PV EOI support, disable PV EOI by default for
> 1.1 and older machine types, to avoid CPUID changing during migration.
>
> PV EOI can still be enabled/disabled by specifying it explicitly.
> Enable for 1.1
> -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
> Disable for 1.2
> -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
>
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/Makefile.objs  |  2 +-
>  hw/cpu_flags.c| 32 
>  hw/cpu_flags.h|  9 +
>  hw/pc_piix.c  |  2 ++
>  target-i386/cpu.c |  8 
>  5 files changed, 52 insertions(+), 1 deletion(-)
>  create mode 100644 hw/cpu_flags.c
>  create mode 100644 hw/cpu_flags.h
>
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 850b87b..3f2532a 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -1,5 +1,5 @@
>  hw-obj-y = usb/ ide/
> -hw-obj-y += loader.o
> +hw-obj-y += loader.o cpu_flags.o
>  hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
>  hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
>  hw-obj-y += fw_cfg.o
> diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
> new file mode 100644
> index 000..d821d8c
> --- /dev/null
> +++ b/hw/cpu_flags.c
> @@ -0,0 +1,32 @@
> +/*
> + * CPU compatibility flags.
> + *
> + * Copyright (c) 2012 Red Hat Inc.
> + * Author: Michael S. Tsirkin.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see .
> + */
> +#include "hw/cpu_flags.h"
> +
> +static bool _kvm_pv_eoi_disabled;

NACK. I find your lack of compliance disturbing.

> +
> +void disable_kvm_pv_eoi(void)
> +{
> +   _kvm_pv_eoi_disabled = true;
> +}
> +
> +bool kvm_pv_eoi_disabled(void)
> +{
> +   return _kvm_pv_eoi_disabled;
> +}
> diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
> new file mode 100644
> index 000..05777b6
> --- /dev/null
> +++ b/hw/cpu_flags.h
> @@ -0,0 +1,9 @@
> +#ifndef HW_CPU_FLAGS_H
> +#define HW_CPU_FLAGS_H
> +
> +#include 
> +
> +void disable_kvm_pv_eoi(void);
> +bool kvm_pv_eoi_disabled(void);
> +
> +#endif
> diff --git a/hw/pc_piix.c b/hw/pc_piix.c
> index 008d42f..bdbceda 100644
> --- a/hw/pc_piix.c
> +++ b/hw/pc_piix.c
> @@ -46,6 +46,7 @@
>  #ifdef CONFIG_XEN
>  #  include 
>  #endif
> +#include "cpu_flags.h"
>
>  #define MAX_IDE_BUS 2
>
> @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {
>
>  static void pc_machine_v1_1_compat(void)
>  {
> +disable_kvm_pv_eoi();
>  }
>
>  static void pc_init_pci_v1_1(ram_addr_t ram_size,
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 120a2e3..0d02fd1 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -23,6 +23,7 @@
>
>  #include "cpu.h"
>  #include "kvm.h"
> +#include "asm/kvm_para.h"
>
>  #include "qemu-option.h"
>  #include "qemu-config.h"
> @@ -33,6 +34,7 @@
>  #include "hyperv.h"
>
>  #include "hw/hw.h"
> +#include "hw/cpu_flags.h"
>
>  /* feature flags taken from "Intel Processor Identification and the CPUID
>   * Instruction" and AMD's "CPUID Specification".  In cases of disagreement
> @@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
>
>  plus_kvm_features = ~0; /* not supported bits will be filtered out later */
>
> +/* Disable PV EOI for old machine types.
> + * Feature flags can still override. */
> +if (kvm_pv_eoi_disabled()) {
> +plus_kvm_features &= ~(0x1 << KVM_FEATURE_PV_EOI);
> +}
> +
>  add_flagname_to_bitmaps("hypervisor", &plus_features,
>  &plus_ext_features, &plus_ext2_features, &plus_ext3_features,
>  &plus_kvm_features, &plus_svm_features);
> --
> MST
>
>

Re: [Qemu-devel] [PATCHv3 3/4] cpuid: disable pv eoi for 1.1 and older compat types

2012-08-28 Thread Blue Swirl

On Tue, Aug 28, 2012 at 5:22 PM, Michael S. Tsirkin  wrote:
> On Tue, Aug 28, 2012 at 05:05:25PM +0000, Blue Swirl wrote:
>> > +static bool _kvm_pv_eoi_disabled;
>>
>> NACK. I find your lack of compliance disturbing.
>
> Compliance with what? Could you please add some
> motivation for the NACK?

You did not respect my review comments.
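
For context, a sketch of what the objection appears to be about. This
assumes the earlier review comment (not quoted in this message) concerned
the leading underscore in _kvm_pv_eoi_disabled: standard C reserves
identifiers that begin with an underscore at file scope for the
implementation. The flag name below is illustrative only and is not taken
from any later version of the patch.

```c
#include <stdbool.h>

/* File-scope flag without a leading underscore (illustrative name). */
static bool pv_eoi_disabled;

/* Called by compat machine types to turn PV EOI off by default. */
void disable_kvm_pv_eoi(void)
{
    pv_eoi_disabled = true;
}

/* Queried by the CPU model code when building the feature bitmap. */
bool kvm_pv_eoi_disabled(void)
{
    return pv_eoi_disabled;
}
```

The two accessor names are the ones from the quoted patch; only the
static variable is renamed.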

>
> --
> MST

Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-08-28 Thread Blue Swirl

On Tue, Aug 28, 2012 at 5:28 PM, Michael S. Tsirkin  wrote:
> On Tue, Aug 28, 2012 at 05:01:55PM +0000, Blue Swirl wrote:
>> On Tue, Aug 28, 2012 at 7:35 AM, Michael Tokarev  wrote:
>> > On 27.08.2012 22:56, Blue Swirl wrote:
>> > []
>> >>> +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
>> >>> +{
>> >>> +AssignedDevRegion *d = opaque;
>> >>> +uint8_t *in = d->u.r_virtbase + addr;
>> >>
>> >> Don't perform arithmetic with void pointers.
>> >
>> > There are a few places in common qemu code which does this for a very
>> > long time.  So I guess it is safe now.
>>
>> It's a non-standard GCC extension.
>
> So?  We use many other GCC extensions. grep for typeof.

Dependencies should not be introduced trivially. In this case, it's
pretty easy to avoid void pointer arithmetic as Jan's next version shows.
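
A minimal sketch of one way to avoid the extension, not necessarily what
the next version of the patch does: convert the void pointer to a
uint8_t pointer first and index that. The struct and function names
here are hypothetical stand-ins, not the ones from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the region structure in the quoted patch. */
struct region {
    void *virtbase;   /* start of the mapped BAR */
    size_t size;      /* length of the mapping in bytes */
};

/* Read one byte at 'offset' without doing arithmetic on a void pointer. */
static uint8_t region_readb(const struct region *r, size_t offset)
{
    /* The implicit conversion from void * to uint8_t * is standard C. */
    const uint8_t *base = r->virtbase;

    assert(offset < r->size);
    return base[offset];
}
```

All arithmetic then happens on a pointer to a complete type, so the code
builds cleanly with -Wpointer-arith.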

>
> Is there a work in progress to build GCC with visual studio?
> If yes what are the chances KVM device assignment
> will work on windows?

IIRC there was really a project to use KVM on Windows and another
project to build QEMU with MSVC.

>
> Look QEMU codebase is what it is. Unless you rework all existing
> code to confirm to your taste, I do not see why you NACK valid new code
> unless it confirms to same.

Yes, I'd be happy to fix the style with huge patches at once. But our
fearless leader does not agree, so we are stuck with the codebase
being what it is until it is fixed one step at a time.

>
>> >
>> > /mjt

Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-08-28 Thread Blue Swirl

On Tue, Aug 28, 2012 at 7:31 PM, Anthony Liguori  wrote:
> Blue Swirl  writes:
>
>> On Tue, Aug 28, 2012 at 5:28 PM, Michael S. Tsirkin  wrote:
>>> On Tue, Aug 28, 2012 at 05:01:55PM +0000, Blue Swirl wrote:
>>>> On Tue, Aug 28, 2012 at 7:35 AM, Michael Tokarev  wrote:
>>>> > On 27.08.2012 22:56, Blue Swirl wrote:
>>>> > []
>>>> >>> +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
>>>> >>> +{
>>>> >>> +AssignedDevRegion *d = opaque;
>>>> >>> +uint8_t *in = d->u.r_virtbase + addr;
>>>> >>
>>>> >> Don't perform arithmetic with void pointers.
>>>> >
>>>> > There are a few places in common qemu code which does this for a very
>>>> > long time.  So I guess it is safe now.
>>>>
>>>> It's a non-standard GCC extension.
>>>
>>> So?  We use many other GCC extensions. grep for typeof.
>>
>> Dependencies should not be introduced trivially. In this case, it's
>> pretty easy to avoid void pointer arithmetic as Jan's next version
>> shows.
>
> The standard is vague with respect void arithmetic.  Most compilers
> allow it.  A very good analysis of the standard can be found below.
>
> http://stackoverflow.com/questions/3523145/pointer-arithmetic-for-void-pointer-in-c

The analysis would seem to show that arithmetic may be acceptable, but
it doesn't say that void pointers must be treated like char pointers.
In my view, this would make sense:

char *cptr;
void *vptr;

Since
cptr++;
is equivalent to
cptr = (char *)((uintptr_t)cptr + sizeof(*cptr));

therefore

vptr++;
should be equivalent to
vptr = (void *)((uintptr_t)vptr + sizeof(*vptr));

That is, vptr++ should be equivalent to vptr += 0 because sizeof(void)
should be 0 if allowed.
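
The equivalence for complete types can be sketched as below. This
assumes a flat address space where converting a pointer to uintptr_t and
back preserves the address; the standard leaves that conversion
implementation-defined. The helper names are made up for illustration.
There is deliberately no void version, since sizeof(*vptr) is not valid
standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step a char pointer forward by one element via its integer address. */
static char *next_char(char *p)
{
    assert(p != NULL);
    return (char *)((uintptr_t)p + sizeof(*p));
}

/* The same rule for a wider type: the step is sizeof(int), not 1. */
static int *next_int(int *p)
{
    assert(p != NULL);
    return (int *)((uintptr_t)p + sizeof(*p));
}
```

On such a platform next_char(c) compares equal to c + 1 and next_int(i)
to i + 1, which is the pattern the argument above extends to void.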

>
> BTW: can we please stop arguing about C standards.  If we currently are
> using something in QEMU that's supported by clang and GCC, it's fine and
> we ought to continue using it.
>
> The reserved names actually did bite us when porting to a new platform.
> But the only requirement for C extensions ought to be reasonable support
> in GCC and clang.
>
> I don't care at all about supporting proprietary compilers.

We also don't have crowds banging doors with their money bags with a
need for such support.

>
> Regards,
>
> Anthony Liguori
>
>>
>>>
>>> Is there a work in progress to build GCC with visual studio?
>>> If yes what are the chances KVM device assignment
>>> will work on windows?
>>
>> IIRC there was really a project to use KVM on Windows and another
>> project to build QEMU with MSVC.
>>
>>>
>>> Look QEMU codebase is what it is. Unless you rework all existing
>>> code to confirm to your taste, I do not see why you NACK valid new code
>>> unless it confirms to same.
>>
>> Yes, I'd be happy to fix the style with huge patches at once. But our
>> fearless leader does not agree, so we are stuck with the codebase
>> being what it is until it is fixed one step at a time.
>>
>>>
>>>> >
>>>> > /mjt

Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-09-01 Thread Blue Swirl

On Tue, Aug 28, 2012 at 9:51 PM, Anthony Liguori  wrote:
> Blue Swirl  writes:
>
>> On Tue, Aug 28, 2012 at 7:31 PM, Anthony Liguori  wrote:
>>> Blue Swirl  writes:
>>>
>>>> On Tue, Aug 28, 2012 at 5:28 PM, Michael S. Tsirkin  wrote:
>>>>> On Tue, Aug 28, 2012 at 05:01:55PM +0000, Blue Swirl wrote:
>>>>>> On Tue, Aug 28, 2012 at 7:35 AM, Michael Tokarev  wrote:
>>>>>> > On 27.08.2012 22:56, Blue Swirl wrote:
>>>>>> > []
>>>>>> >>> +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
>>>>>> >>> +{
>>>>>> >>> +AssignedDevRegion *d = opaque;
>>>>>> >>> +uint8_t *in = d->u.r_virtbase + addr;
>>>>>> >>
>>>>>> >> Don't perform arithmetic with void pointers.
>>>>>> >
>>>>>> > There are a few places in common qemu code which does this for a very
>>>>>> > long time.  So I guess it is safe now.
>>>>>>
>>>>>> It's a non-standard GCC extension.
>>>>>
>>>>> So?  We use many other GCC extensions. grep for typeof.
>>>>
>>>> Dependencies should not be introduced trivially. In this case, it's
>>>> pretty easy to avoid void pointer arithmetic as Jan's next version
>>>> shows.
>>>
>>> The standard is vague with respect void arithmetic.  Most compilers
>>> allow it.  A very good analysis of the standard can be found below.
>>>
>>> http://stackoverflow.com/questions/3523145/pointer-arithmetic-for-void-pointer-in-c
>>
>> The analysis would seem to show that arithmetic may be acceptable, but
>> it doesn't say that void pointers must be treated like char pointers.
>> In my view, this would make sense:
>>
>> char *cptr;
>> void *vptr;
>>
>> Since
>> cptr++;
>> is equivalent to
>> cptr = (char *)((uintptr_t)cptr + sizeof(*cptr));
>>
>> therefore
>>
>> vptr++;
>> should be equivalent to
>> vptr = (void *)((uintptr_t)vptr + sizeof(*vptr));
>> That is, vptr++ should be equivalent to vptr += 0 because sizeof(void)
>> should be 0 if allowed.
>
> sizeof(void) == 1
>
> With GCC at least.

It's not valid C (0 is just how I think it should be if allowed). Also,
GCC can flag it even with -std=gnu89 (the default, C89 with GNU
extensions) when -pedantic is given:
$ cat void.c
unsigned long x = sizeof(void);
$ gcc -pedantic void.c -c
void.c:1: warning: invalid application of 'sizeof' to a void type

> Regards,
>
> Anthony Liguori
>
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>>
>>>>>
>>>>> Is there a work in progress to build GCC with visual studio?
>>>>> If yes what are the chances KVM device assignment
>>>>> will work on windows?
>>>>
>>>> IIRC there was really a project to use KVM on Windows and another
>>>> project to build QEMU with MSVC.
>>>>
>>>>>
>>>>> Look QEMU codebase is what it is. Unless you rework all existing
>>>>> code to confirm to your taste, I do not see why you NACK valid new code
>>>>> unless it confirms to same.
>>>>
>>>> Yes, I'd be happy to fix the style with huge patches at once. But our
>>>> fearless leader does not agree, so we are stuck with the codebase
>>>> being what it is until it is fixed one step at a time.
>>>>
>>>>>
>>>>>> >
>>>>>> > /mjt

Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-09-03 Thread Blue Swirl

On Mon, Sep 3, 2012 at 4:14 PM, Avi Kivity  wrote:
> On 08/29/2012 11:27 AM, Markus Armbruster wrote:
>>
>> I don't see a point in making contributors avoid non-problems that might
>> conceivably become trivial problems some day.  Especially when there's
>> no automated help with the avoiding.
>
> -Wpointer-arith

+1

>
>
>
> --
> error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment

2012-09-04 Thread Blue Swirl

On Tue, Sep 4, 2012 at 8:32 AM, Avi Kivity  wrote:
> On 09/03/2012 10:32 PM, Blue Swirl wrote:
>> On Mon, Sep 3, 2012 at 4:14 PM, Avi Kivity  wrote:
>>> On 08/29/2012 11:27 AM, Markus Armbruster wrote:
>>>>
>>>> I don't see a point in making contributors avoid non-problems that might
>>>> conceivably become trivial problems some day.  Especially when there's
>>>> no automated help with the avoiding.
>>>
>>> -Wpointer-arith
>>
>> +1
>
> FWIW, I'm not in favour of enabling it, just pointing out that it
> exists.  In general I prefer avoiding unnecessary use of extensions, but
> in this case the extension is trivial and improves readability.

Void pointers are not as type-safe as uint8_t pointers. There's also
little difference in readability between the two, in my opinion.

>
>
> --
> error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] KVM: Add wrapper script around Qemu to test kernels

2011-08-24 Thread Blue Swirl

On Tue, Aug 23, 2011 at 10:16 PM, Alexander Graf  wrote:
> On LinuxCon I had a nice chat with Linus on what he thinks kvm-tool
> would be doing and what he expects from it. Basically he wants a
> small and simple tool he and other developers can run to try out and
> see if the kernel they just built actually works.
>
> Fortunately, Qemu can do that today already! The only piece that was
> missing was the "simple" piece of the equation, so here is a script
> that wraps around Qemu and executes a kernel you just built.
>
> If you do have KVM around and are not cross-compiling, it will use
> KVM. But if you don't, you can still fall back to emulation mode and
> at least check if your kernel still does what you expect. I only
> implemented support for s390x and ppc there, but it's easily extensible
> to more platforms, as Qemu can emulate (and virtualize) pretty much
> any platform out there.
>
> If you don't have qemu installed, please do so before using this script. Your
> distro should provide a package for it (might even call it "kvm"). If not,
> just compile it from source - it's not hard!
>
> To quickly get going, just execute the following as user:
>
>    $ ./Documentation/run-qemu.sh -r / -a init=/bin/bash
>
> This will drop you into a shell on your rootfs.
>
> Happy hacking!
>
> Signed-off-by: Alexander Graf 
> ---
>  Documentation/run-qemu.sh |  284 
> +
>  1 files changed, 284 insertions(+), 0 deletions(-)
>  create mode 100755 Documentation/run-qemu.sh
>
> diff --git a/Documentation/run-qemu.sh b/Documentation/run-qemu.sh
> new file mode 100755
> index 000..0bac924
> --- /dev/null
> +++ b/Documentation/run-qemu.sh
> @@ -0,0 +1,284 @@
> +#!/bin/bash
> +#
> +# QEMU Launcher
> +#
> +# This script enables simple use of the KVM and Qemu tool stack for

QEMU

> +# easy kernel testing. It allows to pass either a host directory to
> +# the guest or a disk image. Example usage:
> +#
> +# Run the host root fs inside a VM:
> +#
> +# $ ./Documentation/run-qemu.sh -r /
> +#
> +# Run the same with SDL:
> +#
> +# $ ./Documentation/run-qemu.sh -r / --sdl
> +#
> +# Or with a PPC build:
> +#
> +# $ ARCH=ppc ./Documentation/run-qemu.sh -r /
> +#
> +#
> +
> +USE_SDL=
> +USE_VNC=
> +KERNEL_BIN=arch/x86/boot/bzImage
> +MON_STDIO=
> +KERNEL_APPEND2=
> +SERIAL=ttyS0
> +SERIAL_KCONFIG=SERIAL_8250
> +
> +function usage() {
> +       echo "
> +Run-Qemu allows you to execute a virtual machine with the Linux kernel

run-qemu.sh or $0

> +that you just built. To only execute a simple VM, you can just run it
> +on your root fs with \"-r / -a init=/bin/bash\"
> +
> +       -a, --append parameters
> +               Append the given parameters to the kernel command line
> +
> +       -d, --disk image
> +               Add the image file as disk into the VM
> +
> +       -r, --root directory
> +               Use the specified directory as root directory inside the 
> guest.
> +
> +       -s, --sdl
> +               Enable SDL graphical output.
> +
> +       -S, --smp cpus
> +               Set number of virtual CPUs
> +
> +       -v, --vnc
> +               Enable VNC graphical output.
> +
> +Examples:
> +
> +       Run the host root fs inside a VM:
> +       $ ./Documentation/run-qemu.sh -r /
> +
> +       Run the same with SDL:
> +       $ ./Documentation/run-qemu.sh -r / --sdl
> +
> +       Or with a PPC build:
> +       $ ARCH=ppc ./Documentation/run-qemu.sh -r /
> +"
> +}
> +
> +function require_config() {
> +       if [ "$(grep CONFIG_$1=y .config)" ]; then
> +               return
> +       fi
> +
> +       echo "You need to enable CONFIG_$1 for run-qemu to work properly"
> +       exit 1
> +}
> +
> +function has_config() {
> +       grep "CONFIG_$1=y" .config
> +}
> +
> +function drive_if() {
> +       if [ "$(has_config VIRTIO_BLK)" ]; then
> +               echo virtio
> +       elif [ "$(has_config ATA_PIIX)" ]; then
> +               echo ide
> +       else
> +               echo "\
> +Your kernel must have either VIRTIO_BLK or ATA_PIIX
> +enabled for block device assignment" >&2
> +               exit 1
> +       fi
> +}
> +
> +GETOPT=`getopt -o a:d:hr:sS:v --long append,disk:,help,root:,sdl,smp:,vnc \
> +       -n "$(basename \"$0\")" -- "$@"`
> +
> +if [ $? != 0 ]; then
> +       echo "Terminating..." >&2
> +       exit 1
> +fi
> +
> +eval set -- "$GETOPT"
> +
> +while true; do
> +       case "$1" in
> +       -a|--append)
> +               KERNEL_APPEND2="$2"
> +               shift 2
> +               ;;
> +       -d|--disk)
> +               QEMU_OPTIONS="$QEMU_OPTIONS -drive \
> +                       file=$2,if=$(drive_if),cache=unsafe"
> +               USE_DISK=1
> +               shift 2
> +               ;;
> +       -h|--help)
> +               usage
> +               exit 0
> +               ;;
> +       -r|--root)
> +               ROOTFS="$2"
> +               shift 2
> +               ;;
> +       -s|--sdl)
> +               USE_SDL=1
> +               sh

Re: [Qemu-devel] [PATCH] KVM: Add wrapper script around QEMU to test kernels

2011-08-25 Thread Blue Swirl

On Wed, Aug 24, 2011 at 9:38 PM, Alexander Graf  wrote:
> On LinuxCon I had a nice chat with Linus on what he thinks kvm-tool
> would be doing and what he expects from it. Basically he wants a
> small and simple tool he and other developers can run to try out and
> see if the kernel they just built actually works.
>
> Fortunately, QEMU can do that today already! The only piece that was
> missing was the "simple" piece of the equation, so here is a script
> that wraps around QEMU and executes a kernel you just built.
>
> If you do have KVM around and are not cross-compiling, it will use
> KVM. But if you don't, you can still fall back to emulation mode and
> at least check if your kernel still does what you expect. I only
> implemented support for s390x and ppc there, but it's easily extensible
> to more platforms, as QEMU can emulate (and virtualize) pretty much
> any platform out there.
>
> If you don't have qemu installed, please do so before using this script. Your
> distro should provide a package for it (might even call it "kvm"). If not,
> just compile it from source - it's not hard!
>
> To quickly get going, just execute the following as user:
>
>    $ ./Documentation/run-qemu.sh -r / -a init=/bin/bash
>
> This will drop you into a shell on your rootfs.
>
> Happy hacking!
>
> Signed-off-by: Alexander Graf 
>
> ---
>
> v1 -> v2:
>
>  - fix naming of QEMU
>  - use grep -q for has_config
>  - support multiple -a args
>  - spawn gdb on execution
>  - pass through qemu options
>  - dont use qemu-system-x86_64 on i386
>  - add funny sentence to startup text
>  - more helpful error messages
> ---
>  scripts/run-qemu.sh |  334 
> +++
>  1 files changed, 334 insertions(+), 0 deletions(-)
>  create mode 100755 scripts/run-qemu.sh
>
> diff --git a/scripts/run-qemu.sh b/scripts/run-qemu.sh
> new file mode 100755
> index 000..5d4e185
> --- /dev/null
> +++ b/scripts/run-qemu.sh
> @@ -0,0 +1,334 @@
> +#!/bin/bash
> +#
> +# QEMU Launcher
> +#
> +# This script enables simple use of the KVM and QEMU tool stack for
> +# easy kernel testing. It allows to pass either a host directory to
> +# the guest or a disk image. Example usage:
> +#
> +# Run the host root fs inside a VM:
> +#
> +# $ ./scripts/run-qemu.sh -r /
> +#
> +# Run the same with SDL:
> +#
> +# $ ./scripts/run-qemu.sh -r / --sdl
> +#
> +# Or with a PPC build:
> +#
> +# $ ARCH=ppc ./scripts/run-qemu.sh -r /
> +#
> +# PPC with a mac99 model by passing options to QEMU:
> +#
> +# $ ARCH=ppc ./scripts/run-qemu.sh -r / -- -M mac99
> +#
> +
> +USE_SDL=
> +USE_VNC=
> +USE_GDB=1
> +KERNEL_BIN=arch/x86/boot/bzImage
> +MON_STDIO=
> +KERNEL_APPEND2=
> +SERIAL=ttyS0
> +SERIAL_KCONFIG=SERIAL_8250
> +BASENAME=$(basename "$0")
> +
> +function usage() {
> +       echo "
> +$BASENAME allows you to execute a virtual machine with the Linux kernel
> +that you just built. To only execute a simple VM, you can just run it
> +on your root fs with \"-r / -a init=/bin/bash\"
> +
> +       -a, --append parameters
> +               Append the given parameters to the kernel command line.
> +
> +       -d, --disk image
> +               Add the image file as disk into the VM.
> +
> +       -D, --no-gdb
> +               Don't run an xterm with gdb attached to the guest.
> +
> +       -r, --root directory
> +               Use the specified directory as root directory inside the 
> guest.
> +
> +       -s, --sdl
> +               Enable SDL graphical output.
> +
> +       -S, --smp cpus
> +               Set number of virtual CPUs.
> +
> +       -v, --vnc
> +               Enable VNC graphical output.
> +
> +Examples:
> +
> +       Run the host root fs inside a VM:
> +       $ ./scripts/run-qemu.sh -r /
> +
> +       Run the same with SDL:
> +       $ ./scripts/run-qemu.sh -r / --sdl
> +
> +       Or with a PPC build:
> +       $ ARCH=ppc ./scripts/run-qemu.sh -r /
> +
> +       PPC with a mac99 model by passing options to QEMU:
> +       $ ARCH=ppc ./scripts/run-qemu.sh -r / -- -M mac99
> +"
> +}
> +
> +function require_config() {
> +       if [ "$(grep CONFIG_$1=y .config)" ]; then
> +               return
> +       fi
> +
> +       echo "You need to enable CONFIG_$1 for run-qemu to work properly"
> +       exit 1
> +}
> +
> +function has_config() {
> +       grep -q "CONFIG_$1=y" .config
> +}
> +
> +function drive_if() {
> +       if has_config VIRTIO_BLK; then
> +               echo virtio
> +       elif has_config ATA_PIIX; then
> +               echo ide
> +       else
> +               echo "\
> +Your kernel must have either VIRTIO_BLK or ATA_PIIX
> +enabled for block device assignment" >&2
> +               exit 1
> +       fi
> +}
> +
> +GETOPT=`getopt -o a:d:Dhr:sS:v --long append,disk:,no-gdb,help,root:,sdl,smp:,vnc \
> +       -n "$(basename \"$0\")" -- "$@"`
> +
> +if [ $? != 0 ]; then
> +       echo "Terminating..." >&2
> +       exit 1
> +fi
> +
> +eval set -- "$GETOPT"
> +
> +while true; do
> +       case "$1" in
> +

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Blue Swirl

On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
 wrote:
> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>
>> So they interact with KVM (need kvm_state), and they interact with the
>> emulated PCI bus.  Could you elaborate on the fundamental difference
>> between the two interactions that makes you choose the (hypothetical)
>> KVM bus over the PCI bus as device parent?
>>
>
> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>
> But if the underlying observation is that the device tree is not really a
> tree, you're 100% correct.  This is part of why a factory interface that
> just takes a parent bus is too simplistic.
>
> I think we ought to introduce a -pci-device option that is specifically for
> creating PCI devices that doesn't require a parent bus argument but provides
> a way to specify stable addressing (for instancing, using a linear index).

I think kvm_state should not be a property of any device or bus. It
should be split into more logical pieces.

Some parts of it could remain in CPUState, because they are associated
with a VCPU.

Also, irqfd, for example, could be considered an object similar to the
char or block devices that QEMU provides to devices. Would it make sense
to introduce new host types for passing parts of kvm_state to devices?

I'd also make the coalesced MMIO stuff part of the memory object. We are
not passing any state references when using cpu_physical_memory_rw(), but
that could be changed.

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Blue Swirl

On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka  wrote:
> On 2011-01-19 20:32, Blue Swirl wrote:
>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>>  wrote:
>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>
>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>> between the two interactions that makes you choose the (hypothetical)
>>>> KVM bus over the PCI bus as device parent?
>>>>
>>>
>>> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>>>
>>> But if the underlying observation is that the device tree is not really a
>>> tree, you're 100% correct.  This is part of why a factory interface that
>>> just takes a parent bus is too simplistic.
>>>
>>> I think we ought to introduce a -pci-device option that is specifically for
>>> creating PCI devices that doesn't require a parent bus argument but provides
>>> a way to specify stable addressing (for instancing, using a linear index).
>>
>> I think kvm_state should not be a property of any device or bus. It
>> should be split to more logical pieces.
>>
>> Some parts of it could remain in CPUState, because they are associated
>> with a VCPU.
>>
>> Also, for example irqfd could be considered to be similar object to
>> char or block devices provided by QEMU to devices. Would it make sense
>> to introduce new host types for passing parts of kvm_state to devices?
>>
>> I'd also make coalesced MMIO stuff part of memory object. We are not
>> passing any state references when using cpu_physical_memory_rw(), but
>> that could be changed.
>
> There are currently no VCPU-specific bits remaining in kvm_state.

I think the fields vcpu_events, robust_singlestep, debugregs,
kvm_sw_breakpoints, xsave and xcrs belong in CPUX86State. They may be the
same for all VCPUs, but they are still CPU properties of a sort. I'm not
sure about the fd field.

> It may
> be a good idea to introduce an arch-specific kvm_state and move related
> bits over.

This should probably contain only irqchip_in_kernel, pit_in_kernel and
many_ioeventfds, maybe fd.

> It may also once be feasible to carve out memory management
> related fields if we have proper abstractions for that, but I'm not
> completely sure here.

I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
migration_log into the memory object.

> Anyway, all these things are secondary. The primary topic here is how to
> deal with kvm_state and its fields that have VM-global scope.

If it is an opaque blob which contains various unrelated stuff, no
clear place will be found.

By the way, we don't have a QEMUState but instead use globals. Perhaps
this should be reorganized as well. For the fd field, maybe even a
global variable could be justified, since it is used for direct access
to the kernel, not unlike a system call.
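
The grouping suggested in this message could be sketched roughly as
follows. The field names come from the discussion; the struct names and
all field types are placeholders chosen for illustration, not the real
declarations. Fields whose shape is not obvious from the thread (slots,
kvm_sw_breakpoints) and the undecided fd field are left out.

```c
#include <stdbool.h>

/* Per-CPU capability bits, candidates for CPUX86State. */
struct cpu_kvm_caps {
    bool vcpu_events;
    bool robust_singlestep;
    bool debugregs;
    bool xsave;
    bool xcrs;
};

/* VM-wide, architecture-specific bits. */
struct arch_kvm_state {
    bool irqchip_in_kernel;
    bool pit_in_kernel;
    bool many_ioeventfds;
};

/* Memory-management bits, candidates for a memory object. */
struct memory_kvm_state {
    int vmfd;                     /* placeholder type */
    bool coalesced_mmio;
    bool broken_set_mem_region;
    bool migration_log;
};
```

With a split like this, each consumer would only see the group it
actually needs instead of one opaque blob.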

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Blue Swirl

On Thu, Jan 20, 2011 at 7:37 PM, Anthony Liguori
 wrote:
> On 01/20/2011 03:33 AM, Jan Kiszka wrote:
>>
>> On 2011-01-19 20:32, Blue Swirl wrote:
>>
>>>
>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>>>   wrote:
>>>
>>>>
>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>
>>>>>
>>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>>> between the two interactions that makes you choose the (hypothetical)
>>>>> KVM bus over the PCI bus as device parent?
>>>>>
>>>>>
>>>>
>>>> It's almost arbitrary, but I would say it's the direction that I/Os
>>>> flow.
>>>>
>>>> But if the underlying observation is that the device tree is not really
>>>> a
>>>> tree, you're 100% correct.  This is part of why a factory interface that
>>>> just takes a parent bus is too simplistic.
>>>>
>>>> I think we ought to introduce a -pci-device option that is specifically
>>>> for
>>>> creating PCI devices that doesn't require a parent bus argument but
>>>> provides
>>>> a way to specify stable addressing (for instancing, using a linear
>>>> index).
>>>>
>>>
>>> I think kvm_state should not be a property of any device or bus. It
>>> should be split to more logical pieces.
>>>
>>> Some parts of it could remain in CPUState, because they are associated
>>> with a VCPU.
>>>
>>> Also, for example irqfd could be considered to be similar object to
>>> char or block devices provided by QEMU to devices. Would it make sense
>>> to introduce new host types for passing parts of kvm_state to devices?
>>>
>>> I'd also make coalesced MMIO stuff part of memory object. We are not
>>> passing any state references when using cpu_physical_memory_rw(), but
>>> that could be changed.
>>>
>>
>> There are currently no VCPU-specific bits remaining in kvm_state. It may
>> be a good idea to introduce an arch-specific kvm_state and move related
>> bits over. It may also once be feasible to carve out memory management
>> related fields if we have proper abstractions for that, but I'm not
>> completely sure here.
>>
>> Anyway, all these things are secondary. The primary topic here is how to
>> deal with kvm_state and its fields that have VM-global scope.
>>
>
> The debate is really:
>
> 1) should we remove all passing of kvm_state and just assume it's static
>
> 2) deal with a couple places in the code where we need to figure out how to
> get at kvm_state
>
> I think we've only identified 1 real instance of (2) and it's resulted in
> some good discussions about how to model KVM devices vs. emulated devices.
>  Honestly, (1) just stinks.  I see absolutely no advantage to it at all.

Fully agree.

> In the very worst case scenario, the thing we need to do is just reference
> an extern variable in a few places.  That completely avoids all of the
> modelling discussions for now (while leaving for placeholder FIXMEs so the
> problem can be tackled later).

I think KVMState was designed to match the KVM ioctl interface: everything
that is needed for talking to KVM, or is received from KVM, is there. But
I think this shouldn't be a design driver.

If the only pieces of kvm_state that are needed by the devices are
irqchip_in_kernel, pit_in_kernel and many_ioeventfds, the problem of
passing kvm_state to devices becomes very different. Each of these is
just a single bit, affecting only a few devices. Perhaps they could be
device properties which the board level sets when KVM is used?
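
As a rough sketch of the device-property idea (all names here, such as
ToyIOAPIC and toy_board_init, are made up for illustration and are not
existing QEMU code): the board looks at the accelerator state once and
hands the device a single bit, so the device never sees kvm_state.

```c
#include <stdbool.h>

/* Hypothetical device state: it carries only the one bit it cares about. */
typedef struct ToyIOAPIC {
    bool irqchip_in_kernel;  /* set by the board, never read from kvm_state */
    int  irq_injections;     /* interrupts emulated in user space */
} ToyIOAPIC;

/* Hypothetical board-level setup: the only place that combines the
 * "KVM is used" and "irqchip is in the kernel" conditions. */
static void toy_board_init(ToyIOAPIC *ioapic, bool kvm_on, bool kernel_irqchip)
{
    ioapic->irqchip_in_kernel = kvm_on && kernel_irqchip;
    ioapic->irq_injections = 0;
}

/* The device adjusts its behavior based on its own property only. */
static void toy_ioapic_raise(ToyIOAPIC *ioapic)
{
    if (ioapic->irqchip_in_kernel) {
        return;  /* the in-kernel model would deliver the interrupt */
    }
    ioapic->irq_injections++;
}
```

With this shape, only the board code needs to know that KVM exists at all.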
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Blue Swirl

On Thu, Jan 20, 2011 at 9:22 PM, Jan Kiszka  wrote:
> On 2011-01-20 20:27, Blue Swirl wrote:
>> On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka  wrote:
>>> On 2011-01-19 20:32, Blue Swirl wrote:
>>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>>>>  wrote:
>>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>>>
>>>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>>>> between the two interactions that makes you choose the (hypothetical)
>>>>>> KVM bus over the PCI bus as device parent?
>>>>>>
>>>>>
>>>>> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>>>>>
>>>>> But if the underlying observation is that the device tree is not really a
>>>>> tree, you're 100% correct.  This is part of why a factory interface that
>>>>> just takes a parent bus is too simplistic.
>>>>>
>>>>> I think we ought to introduce a -pci-device option that is specifically
>>>>> for creating PCI devices that doesn't require a parent bus argument but
>>>>> provides a way to specify stable addressing (for instancing, using a
>>>>> linear index).
>>>>
>>>> I think kvm_state should not be a property of any device or bus. It
>>>> should be split to more logical pieces.
>>>>
>>>> Some parts of it could remain in CPUState, because they are associated
>>>> with a VCPU.
>>>>
>>>> Also, for example irqfd could be considered to be similar object to
>>>> char or block devices provided by QEMU to devices. Would it make sense
>>>> to introduce new host types for passing parts of kvm_state to devices?
>>>>
>>>> I'd also make coalesced MMIO stuff part of memory object. We are not
>>>> passing any state references when using cpu_physical_memory_rw(), but
>>>> that could be changed.
>>>
>>> There are currently no VCPU-specific bits remaining in kvm_state.
>>
>> I think fields vcpu_events, robust_singlestep, debugregs,
>> kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
>> same for all VCPUs but still they are sort of CPU properties. I'm not
>> sure about fd field.
>
> They are all properties of the currently loaded KVM subsystem in the
> host kernel. They can't change while KVM's root fd is opened.
> Replicating this static information into each and every VCPU state would
> be crazy.

Then each CPUX86State could have a pointer to a common structure.

> In fact, services like kvm_has_vcpu_events() already encode this: they
> are static functions without any kvm_state reference that simply return
> the content of those fields. Totally inconsistent to this, we force the
> caller of kvm_check_extension to pass a handle. This is part of my
> problem with the current situation and any halfhearted steps in this
> context. Either we work towards eliminating "static KVMState *kvm_state"
> in kvm-all.c or eliminating KVMState.

If the CPU related fields are accessible through CPUState, the handle
should be available.
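
A minimal sketch of the shared-pointer idea, assuming hypothetical types
(ToyKVMCaps and ToyCPUState are illustrative stand-ins, not the real
KVMState or CPUX86State): the static capability bits live in one
structure, and every VCPU state only references it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block of host-wide capability bits, filled in once. */
typedef struct ToyKVMCaps {
    bool vcpu_events;
    bool robust_singlestep;
    bool xsave;
} ToyKVMCaps;

/* Hypothetical per-VCPU state: nothing is replicated, only a pointer. */
typedef struct ToyCPUState {
    int cpu_index;
    const ToyKVMCaps *caps;
} ToyCPUState;

static void toy_cpu_init(ToyCPUState *cpu, int index, const ToyKVMCaps *caps)
{
    cpu->cpu_index = index;
    cpu->caps = caps;
}

/* The handle comes from the CPU, so no file-scope state is needed. */
static bool toy_cpu_has_vcpu_events(const ToyCPUState *cpu)
{
    return cpu->caps != NULL && cpu->caps->vcpu_events;
}
```

The query takes the CPU as its handle, which is what would make it
consistent with callers that already have a CPUState at hand.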

>>> It may
>>> be a good idea to introduce an arch-specific kvm_state and move related
>>> bits over.
>>
>> This should probably contain only irqchip_in_kernel, pit_in_kernel and
>> many_ioeventfds, maybe fd.
>
> fd is that root file descriptor you need for a few KVM services that are
> not bound to a specific VM - e.g. feature queries. It's not arch
> specific. Arch specific are e.g. robust_singlestep or xsave feature states.

By arch you mean guest CPU architecture? They are not machine features.

>>
>>> It may also once be feasible to carve out memory management
>>> related fields if we have proper abstractions for that, but I'm not
>>> completely sure here.
>>
>> I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
>> migration_log into the memory object.
>
> vmfd is the VM-scope file descriptor you need at machine-level. The rest
> logically belongs to a memory object, but I haven't looked at technical
> details yet.
>
>>
>>> Anyway, all these things are secondary. The primary topic here is how to
>>> deal with kvm_state and its fields that have VM-global scope.
>>
>> If it is an opaque blob which contains various unrelated stuff, no
>> clear place will

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Blue Swirl

On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann  wrote:
>  Hi,
>
>> By the way, we don't have a QEMUState but instead use globals.
>
> /me wants to underline this.
>
> IMO it is absolutely pointless to worry about ways to pass around kvm_state.
>  There never ever will be a serious need for that.
>
> We can stick with the current model of keeping global state in global
> variables.  And just do the same with kvm_state.
>
> Or we can move to have all state in a QEMUState struct which we'll pass
> around basically everywhere.  Then we can simply embed or reference
> kvm_state there.
>
> I'd tend to stick with the global variables as I don't see the point in
> having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
> single qemu process.  YMMV.

Global variables are signs of a poor design. QEMUState would not help
with that; instead, more specific structures should be designed, much
like what I've proposed for KVMState. Some of these new structures
should even be passed around when it makes sense.

But I'd not start kvm_state redesign around global variables or QEMUState.

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Blue Swirl

On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka  wrote:
> On 2011-01-21 17:37, Blue Swirl wrote:
>> On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann  wrote:
>>>  Hi,
>>>
>>>> By the way, we don't have a QEMUState but instead use globals.
>>>
>>> /me wants to underline this.
>>>
>>> IMO it is absolutely pointless to worry about ways to pass around kvm_state.
>>>  There never ever will be a serious need for that.
>>>
>>> We can stick with the current model of keeping global state in global
>>> variables.  And just do the same with kvm_state.
>>>
>>> Or we can move to have all state in a QEMUState struct which we'll pass
>>> around basically everywhere.  Then we can simply embed or reference
>>> kvm_state there.
>>>
>>> I'd tend to stick with the global variables as I don't see the point in
>>> having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
>>> single qemu process.  YMMV.
>>
>> Global variables are signs of a poor design.
>
> s/are/can be/.
>
>> QEMUState would not help
>> that, instead more specific structures should be designed, much like
>> what I've proposed for KVMState. Some of these new structures should
>> be even passed around when it makes sense.
>>
>> But I'd not start kvm_state redesign around global variables or QEMUState.
>
> We do not need to move individual fields yet, but we need to define
> classes of fields and strategies how to deal with them long-term. Then
> we can move forward, and that already in the right direction.

Excellent plan.

> Obvious classes are
>  - static host capabilities and means for the KVM core to query them

OK. There could be other host capabilities here in the future too,
like Xen. I don't think there are any Xen capabilities ATM, though
IIRC some recently sent patches had something like those.

>  - per-VM fields

What is per-VM which is not machine or CPU architecture specific?

>  - fields related to memory management

OK.

I'd add fourth possible class:
 - device, CPU and machine configuration, like nographic,
win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
irqchip_in_kernel could fit here, though it obviously depends on a
host capability too.
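
To make the classes concrete, here is a rough sketch with invented
structure names (ToyHostCaps, ToyMemoryState and ToyMachineConfig are
assumptions for illustration, and the field selection is only an
example). It also shows how irqchip_in_kernel can be configuration while
still being gated by a host capability.

```c
#include <stdbool.h>

/* Class 1: static host capabilities, queried once. */
typedef struct ToyHostCaps {
    bool irqchip;       /* host can provide an in-kernel irqchip */
    bool vcpu_events;
} ToyHostCaps;

/* Class 3: memory management bookkeeping. */
typedef struct ToyMemoryState {
    int  nr_slots;
    bool coalesced_mmio;
} ToyMemoryState;

/* Proposed class 4: device, CPU and machine configuration. */
typedef struct ToyMachineConfig {
    bool nographic;
    bool no_hpet;
    int  smp_cpus;
    bool irqchip_in_kernel;  /* a setting, but limited by ToyHostCaps */
} ToyMachineConfig;

/* Hypothetical setup step: the request is honored only if the host
 * capability is present. */
static void toy_machine_configure(ToyMachineConfig *cfg,
                                  const ToyHostCaps *caps,
                                  bool want_kernel_irqchip,
                                  int smp_cpus)
{
    cfg->nographic = false;
    cfg->no_hpet = false;
    cfg->smp_cpus = smp_cpus;
    cfg->irqchip_in_kernel = want_kernel_irqchip && caps->irqchip;
}
```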

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Blue Swirl

On Fri, Jan 21, 2011 at 6:17 PM, Jan Kiszka  wrote:
> On 2011-01-21 19:04, Blue Swirl wrote:
>> On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka  wrote:
>>> On 2011-01-21 17:37, Blue Swirl wrote:
>>>> On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann  wrote:
>>>>>  Hi,
>>>>>
>>>>>> By the way, we don't have a QEMUState but instead use globals.
>>>>>
>>>>> /me wants to underline this.
>>>>>
>>>>> IMO it is absolutely pointless to worry about ways to pass around
>>>>> kvm_state.  There never ever will be a serious need for that.
>>>>>
>>>>> We can stick with the current model of keeping global state in global
>>>>> variables.  And just do the same with kvm_state.
>>>>>
>>>>> Or we can move to have all state in a QEMUState struct which we'll pass
>>>>> around basically everywhere.  Then we can simply embed or reference
>>>>> kvm_state there.
>>>>>
>>>>> I'd tend to stick with the global variables as I don't see the point in
>>>>> having a QEMUstate.  I doubt we'll ever see two virtual machines driven
>>>>> by a single qemu process.  YMMV.
>>>>
>>>> Global variables are signs of a poor design.
>>>
>>> s/are/can be/.
>>>
>>>> QEMUState would not help
>>>> that, instead more specific structures should be designed, much like
>>>> what I've proposed for KVMState. Some of these new structures should
>>>> be even passed around when it makes sense.
>>>>
>>>> But I'd not start kvm_state redesign around global variables or QEMUState.
>>>
>>> We do not need to move individual fields yet, but we need to define
>>> classes of fields and strategies how to deal with them long-term. Then
>>> we can move forward, and that already in the right direction.
>>
>> Excellent plan.
>>
>>> Obvious classes are
>>>  - static host capabilities and means for the KVM core to query them
>>
>> OK. There could be other host capabilities here in the future too,
>> like Xen. I don't think there are any Xen capabilities ATM though but
>> IIRC some recently sent patches had something like those.
>>
>>>  - per-VM fields
>>
>> What is per-VM which is not machine or CPU architecture specific?
>
> I think it would suffice for a first step to consider all per-VM fields
> as independent of CPU architecture or machine type.

I'm afraid that would not be progress.

>>>  - fields related to memory management
>>
>> OK.
>>
>> I'd add fourth possible class:
>>  - device, CPU and machine configuration, like nographic,
>> win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
>> irqchip_in_kernel could fit here, though it obviously depends on a
>> host capability too.
>
> I would count everything that cannot be assigned to a concrete device
> upfront to the dynamic state of a machine, thus class 2. The point is,
> (potentially) every device of that machine requires access to it, just
> like (indirectly, via the KVM core services) to some KVM VM state bits.

The machine class should not be a catch-all; it would be like
QEMUState or KVMState then. Perhaps each field or variable should be
listed and given more thought.

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-24 Thread Blue Swirl

On Mon, Jan 24, 2011 at 2:08 PM, Jan Kiszka  wrote:
> On 2011-01-21 19:49, Blue Swirl wrote:
>>>> I'd add fourth possible class:
>>>>  - device, CPU and machine configuration, like nographic,
>>>> win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
>>>> irqchip_in_kernel could fit here, though it obviously depends on a
>>>> host capability too.
>>>
>>> I would count everything that cannot be assigned to a concrete device
>>> upfront to the dynamic state of a machine, thus class 2. The point is,
>>> (potentially) every device of that machine requires access to it, just
>>> like (indirectly, via the KVM core services) to some KVM VM state bits.
>>
>> The machine class should not be a catch-all, it would be like
>> QEMUState or KVMState then. Perhaps each field or variable should be
>> listed and given more thought.
>
> Let's start with what is most urgent:
>
>  - vmfd: file descriptor required for any KVM request that has VM scope
>   (in-kernel device creation, device state synchronizations, IRQ
>   routing etc.)

I'd say VM state.

>  - irqchip_in_kernel: VM uses in-kernel irqchip acceleration
>   (some devices will have to adjust their behavior depending on this)

Since the QEMU version is useless here, I peeked at the qemu-kvm version.

There are a lot of lines like:

    if (kvm_enabled() && !kvm_irqchip_in_kernel())
        kvm_just_do_it();

Perhaps these would be cleaner with stub functions.

The device cases are obvious: the devices need a flag, passed to them
by pc.c, which combines kvm_enabled() && kvm_irqchip_in_kernel(). This
gets stored in device state.

But the exec.c case, where kvm_update_interrupt_request() is called, is
more interesting. CPU init could set up a function pointer to either
stub/NULL or kvm_update_interrupt_request().

I didn't look at kvm*.c, qemu-kvm*.c or stuff in kvm/.

So I'd eliminate kvm_irqchip_in_kernel() from outside of KVM and pc.c.
The information could be stored in a MachineState, where pc.c could
grab it for device and CPU setup.
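
A sketch of the function pointer variant, under stated assumptions: the
types and functions below (ToyCPU, toy_cpu_init and so on) are invented
for illustration, and the condition for installing the KVM hook simply
mirrors the kvm_enabled() && !kvm_irqchip_in_kernel() pattern quoted
above.

```c
#include <stdbool.h>

typedef struct ToyCPU ToyCPU;
typedef void (*ToyIrqUpdateFn)(ToyCPU *cpu);

struct ToyCPU {
    ToyIrqUpdateFn update_interrupt_request;  /* chosen once at CPU init */
    int kvm_updates;                          /* calls into the KVM path */
};

/* Stub used when no extra work is needed. */
static void toy_irq_update_stub(ToyCPU *cpu)
{
    (void)cpu;
}

/* Stand-in for kvm_update_interrupt_request(). */
static void toy_kvm_update_interrupt_request(ToyCPU *cpu)
{
    cpu->kvm_updates++;
}

/* Hypothetical CPU init: the only place that looks at the two flags. */
static void toy_cpu_init(ToyCPU *cpu, bool kvm_on, bool irqchip_in_kernel)
{
    cpu->kvm_updates = 0;
    if (kvm_on && !irqchip_in_kernel) {
        cpu->update_interrupt_request = toy_kvm_update_interrupt_request;
    } else {
        cpu->update_interrupt_request = toy_irq_update_stub;
    }
}

/* The generic path calls the hook without any KVM checks. */
static void toy_cpu_interrupt(ToyCPU *cpu)
{
    cpu->update_interrupt_request(cpu);
}
```

Using a stub instead of NULL keeps the caller free of conditionals, at
the cost of one indirect call per interrupt.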

Re: [Qemu-devel] [0.14?][PATCH 3/4] ioapic: Prepare for base address relocation

2011-02-03 Thread Blue Swirl

On Thu, Feb 3, 2011 at 2:55 PM, Jan Kiszka  wrote:
> The registers of real IOAPICs can be relocated during runtime (via
> chipset registers). We don't support this yet, but qemu-kvm carries the
> current base address in its version 2 vmstate.
>
> To align both implementations for migratability, add the proper
> infrastructure to accept initial as well as updated base addresses and
> include the current address in the vmstate. This is done in a way that
> will also allow multiple IOAPICs in the future.

Nack, the addresses should be device properties.

Re: [Qemu-devel] [0.14?][PATCH 3/4] ioapic: Prepare for base address relocation

2011-02-03 Thread Blue Swirl

On Thu, Feb 3, 2011 at 5:18 PM, Jan Kiszka  wrote:
> On 2011-02-03 18:03, Blue Swirl wrote:
>> On Thu, Feb 3, 2011 at 2:55 PM, Jan Kiszka  wrote:
>>> The registers of real IOAPICs can be relocated during runtime (via
>>> chipset registers). We don't support this yet, but qemu-kvm carries the
>>> current base address in its version 2 vmstate.
>>>
>>> To align both implementations for migratability, add the proper
>>> infrastructure to accept initial as well as updated base addresses and
>>> include the current address in the vmstate. This is done in a way that
>>> will also allow multiple IOAPICs in the future.
>>
>> Nack, the addresses should be device properties.
>
> Hmm we could make default_base_address a property. Will change that.
> But current_base_address is just the same as apicbase and can't be a
> property.

Oh, right. What will current_base_address be used for? Why can't the
board just unmap the IOAPIC from the current address and remap it at the
new address? Then the device would not need to know its base address.
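
Roughly what I have in mind, as a toy model only (ToyDevice, ToyMapping
and the helper functions are hypothetical and do not correspond to the
sysbus API; 0xfec00000 is used purely as an example value in the usage
note): the mapping owns the base address, the device only sees offsets.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical device: registers only, no base address field. */
typedef struct ToyDevice {
    uint32_t regs[4];
} ToyDevice;

/* Hypothetical board-owned mapping: this is where the address lives. */
typedef struct ToyMapping {
    uint64_t base;
    uint64_t size;
    ToyDevice *dev;
} ToyMapping;

/* Relocation is an unmap plus a map, which here is just a base update. */
static void toy_board_remap(ToyMapping *m, uint64_t new_base)
{
    m->base = new_base;
}

/* Address decoding: returns the device and the offset inside it, or
 * NULL if the address is outside the mapped window. */
static ToyDevice *toy_board_lookup(const ToyMapping *m, uint64_t addr,
                                   uint64_t *offset)
{
    if (addr < m->base || addr - m->base >= m->size) {
        return NULL;
    }
    *offset = addr - m->base;
    return m->dev;
}
```

For example, a mapping created at 0xfec00000 and then remapped to
0xfec01000 still presents offset 0x10 to the device for an access at
0xfec01010, so nothing in the device state has to change.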

Re: [Qemu-devel] [0.14?][PATCH 3/4] ioapic: Prepare for base address relocation

2011-02-03 Thread Blue Swirl

On Thu, Feb 3, 2011 at 5:43 PM, Jan Kiszka  wrote:
> On 2011-02-03 18:36, Blue Swirl wrote:
>> On Thu, Feb 3, 2011 at 5:18 PM, Jan Kiszka  wrote:
>>> On 2011-02-03 18:03, Blue Swirl wrote:
>>>> On Thu, Feb 3, 2011 at 2:55 PM, Jan Kiszka  wrote:
>>>>> The registers of real IOAPICs can be relocated during runtime (via
>>>>> chipset registers). We don't support this yet, but qemu-kvm carries the
>>>>> current base address in its version 2 vmstate.
>>>>>
>>>>> To align both implementations for migratability, add the proper
>>>>> infrastructure to accept initial as well as updated base addresses and
>>>>> include the current address in the vmstate. This is done in a way that
>>>>> will also allow multiple IOAPICs in the future.
>>>>
>>>> Nack, the addresses should be device properties.
>>>
>>> Hmm we could make default_base_address a property. Will change that.
>>> But current_base_address is just the same as apicbase and can't be a
>>> property.
>>
>> Oh, right. What will current_base_address used for? Why can't board
>> just unmap IOAPIC from current address and remap it at the new
>> address? Then the device would not need to know its base address.
>
> The board could do this. The question is where we put this service, in
> the context if the IOAPIC as ioapic_set_base_address (compare to
> cpu_set_apic_base - which is buggy as it lacks sysbus_mmio_map) or into
> each and every board code. In the latter case, the boards would also be
> responsible for saving/restoring the address.

How is the device relocated? Where are the chipset registers you mention?

Re: [Qemu-devel] [0.14?][PATCH 3/4] ioapic: Prepare for base address relocation

2011-02-03 Thread Blue Swirl

On Thu, Feb 3, 2011 at 6:01 PM, Jan Kiszka  wrote:
> On 2011-02-03 18:54, Blue Swirl wrote:
>> On Thu, Feb 3, 2011 at 5:43 PM, Jan Kiszka  wrote:
>>> On 2011-02-03 18:36, Blue Swirl wrote:
>>>> On Thu, Feb 3, 2011 at 5:18 PM, Jan Kiszka  wrote:
>>>>> On 2011-02-03 18:03, Blue Swirl wrote:
>>>>>> On Thu, Feb 3, 2011 at 2:55 PM, Jan Kiszka  wrote:
>>>>>>> The registers of real IOAPICs can be relocated during runtime (via
>>>>>>> chipset registers). We don't support this yet, but qemu-kvm carries the
>>>>>>> current base address in its version 2 vmstate.
>>>>>>>
>>>>>>> To align both implementations for migratability, add the proper
>>>>>>> infrastructure to accept initial as well as updated base addresses and
>>>>>>> include the current address in the vmstate. This is done in a way that
>>>>>>> will also allow multiple IOAPICs in the future.
>>>>>>
>>>>>> Nack, the addresses should be device properties.
>>>>>
>>>>> Hmm we could make default_base_address a property. Will change that.
>>>>> But current_base_address is just the same as apicbase and can't be a
>>>>> property.
>>>>
>>>> Oh, right. What will current_base_address used for? Why can't board
>>>> just unmap IOAPIC from current address and remap it at the new
>>>> address? Then the device would not need to know its base address.
>>>
>>> The board could do this. The question is where we put this service, in
>>> the context if the IOAPIC as ioapic_set_base_address (compare to
>>> cpu_set_apic_base - which is buggy as it lacks sysbus_mmio_map) or into
>>> each and every board code. In the latter case, the boards would also be
>>> responsible for saving/restoring the address.
>>
>> How is the device relocated? Where are the chipset registers you mention?
>
> Intel's PIIX chipsets contain a register called APICBASE (but it means
> the IOAPIC), and that defines the location. The analogy in the APIC
> world is the MSR_IA32_APICBASE which we maintain via the APIC state.

In ICH10 the register is called OIC (Other Interrupt Control Register)
and the interesting bits are APIC Range Select (ASEL).

So actually PIIX should manage IOAPIC mapping, not board level.

Re: [Qemu-devel] [0.14?][PATCH 3/4] ioapic: Prepare for base address relocation

2011-02-03 Thread Blue Swirl

On Thu, Feb 3, 2011 at 7:06 PM, Jan Kiszka  wrote:
> On 2011-02-03 20:01, Blue Swirl wrote:
>> On Thu, Feb 3, 2011 at 6:01 PM, Jan Kiszka  wrote:
>>> On 2011-02-03 18:54, Blue Swirl wrote:
>>>> On Thu, Feb 3, 2011 at 5:43 PM, Jan Kiszka  wrote:
>>>>> On 2011-02-03 18:36, Blue Swirl wrote:
>>>>>> On Thu, Feb 3, 2011 at 5:18 PM, Jan Kiszka  wrote:
>>>>>>> On 2011-02-03 18:03, Blue Swirl wrote:
>>>>>>>> On Thu, Feb 3, 2011 at 2:55 PM, Jan Kiszka  wrote:
>>>>>>>>> The registers of real IOAPICs can be relocated during runtime (via
>>>>>>>>> chipset registers). We don't support this yet, but qemu-kvm carries
>>>>>>>>> the current base address in its version 2 vmstate.
>>>>>>>>>
>>>>>>>>> To align both implementations for migratability, add the proper
>>>>>>>>> infrastructure to accept initial as well as updated base addresses and
>>>>>>>>> include the current address in the vmstate. This is done in a way that
>>>>>>>>> will also allow multiple IOAPICs in the future.
>>>>>>>>
>>>>>>>> Nack, the addresses should be device properties.
>>>>>>>
>>>>>>> Hmm we could make default_base_address a property. Will change that.
>>>>>>> But current_base_address is just the same as apicbase and can't be a
>>>>>>> property.
>>>>>>
>>>>>> Oh, right. What will current_base_address used for? Why can't board
>>>>>> just unmap IOAPIC from current address and remap it at the new
>>>>>> address? Then the device would not need to know its base address.
>>>>>
>>>>> The board could do this. The question is where we put this service, in
>>>>> the context if the IOAPIC as ioapic_set_base_address (compare to
>>>>> cpu_set_apic_base - which is buggy as it lacks sysbus_mmio_map) or into
>>>>> each and every board code. In the latter case, the boards would also be
>>>>> responsible for saving/restoring the address.
>>>>
>>>> How is the device relocated? Where are the chipset registers you mention?
>>>
>>> Intel's PIIX chipsets contain a register called APICBASE (but it means
>>> the IOAPIC), and that defines the location. The analogy in the APIC
>>> world is the MSR_IA32_APICBASE which we maintain via the APIC state.
>>
>> In ICH10 the register is called OIC—Other Interrupt Control Register
>> and the interesting bits APIC Range Select (ASEL).
>>
>> So actually PIIX should manage IOAPIC mapping, not board level.
>
> The point is we need ioapic_set_base_address logic in multiple places
> (once chipsets start to implement it). Better push it to a central place
> from the beginning. Also the bit keeping. There is no difference to
> apicbase.

In that case, the function should be made an inline function in ioapic.h.

Re: [Qemu-devel] [0.14?][PATCH 3/4] ioapic: Prepare for base address relocation

2011-02-03 Thread Blue Swirl

On Thu, Feb 3, 2011 at 7:25 PM, Jan Kiszka  wrote:
> On 2011-02-03 20:11, Blue Swirl wrote:
>> On Thu, Feb 3, 2011 at 7:06 PM, Jan Kiszka  wrote:
>>> On 2011-02-03 20:01, Blue Swirl wrote:
>>>> On Thu, Feb 3, 2011 at 6:01 PM, Jan Kiszka  wrote:
>>>>> On 2011-02-03 18:54, Blue Swirl wrote:
>>>>>> On Thu, Feb 3, 2011 at 5:43 PM, Jan Kiszka  wrote:
>>>>>>> On 2011-02-03 18:36, Blue Swirl wrote:
>>>>>>>> On Thu, Feb 3, 2011 at 5:18 PM, Jan Kiszka  wrote:
>>>>>>>>> On 2011-02-03 18:03, Blue Swirl wrote:
>>>>>>>>>> On Thu, Feb 3, 2011 at 2:55 PM, Jan Kiszka  wrote:
>>>>>>>>>>> The registers of real IOAPICs can be relocated during runtime (via
>>>>>>>>>>> chipset registers). We don't support this yet, but qemu-kvm carries
>>>>>>>>>>> the current base address in its version 2 vmstate.
>>>>>>>>>>>
>>>>>>>>>>> To align both implementations for migratability, add the proper
>>>>>>>>>>> infrastructure to accept initial as well as updated base addresses
>>>>>>>>>>> and include the current address in the vmstate. This is done in a
>>>>>>>>>>> way that will also allow multiple IOAPICs in the future.
>>>>>>>>>>
>>>>>>>>>> Nack, the addresses should be device properties.
>>>>>>>>>
>>>>>>>>> Hmm we could make default_base_address a property. Will change that.
>>>>>>>>> But current_base_address is just the same as apicbase and can't be a
>>>>>>>>> property.
>>>>>>>>
>>>>>>>> Oh, right. What will current_base_address used for? Why can't board
>>>>>>>> just unmap IOAPIC from current address and remap it at the new
>>>>>>>> address? Then the device would not need to know its base address.
>>>>>>>
>>>>>>> The board could do this. The question is where we put this service, in
>>>>>>> the context if the IOAPIC as ioapic_set_base_address (compare to
>>>>>>> cpu_set_apic_base - which is buggy as it lacks sysbus_mmio_map) or into
>>>>>>> each and every board code. In the latter case, the boards would also be
>>>>>>> responsible for saving/restoring the address.
>>>>>>
>>>>>> How is the device relocated? Where are the chipset registers you mention?
>>>>>
>>>>> Intel's PIIX chipsets contain a register called APICBASE (but it means
>>>>> the IOAPIC), and that defines the location. The analogy in the APIC
>>>>> world is the MSR_IA32_APICBASE which we maintain via the APIC state.
>>>>
>>>> In ICH10 the register is called OIC—Other Interrupt Control Register
>>>> and the interesting bits APIC Range Select (ASEL).
>>>>
>>>> So actually PIIX should manage IOAPIC mapping, not board level.
>>>
>>> The point is we need ioapic_set_base_address logic in multiple places
>>> (once chipsets start to implement it). Better push it to a central place
>>> from the beginning. Also the bit keeping. There is no difference to
>>> apicbase.
>>
>> In that case, the function should be made inline version in ioapic.h.
>
> That still replicates the bit keeping.
>
> I don't see the benefit of moving it over, even less when we want to
> consolidate with a vmstate layout that is already in use.

The benefit is that the device model is improved.

Re: [PATCH 01/13] Generic DMA memory access interface

2011-02-05 Thread Blue Swirl

On Thu, Feb 3, 2011 at 11:32 PM, Eduard - Gabriel Munteanu
 wrote:
> This introduces replacements for memory access functions like
> cpu_physical_memory_read(). The new interface can handle address
> translation and access checking through an IOMMU.
>
> Signed-off-by: Eduard - Gabriel Munteanu 
> ---
>  Makefile.target |    2 +-
>  hw/dma_rw.c     |  124 +++
>  hw/dma_rw.h     |  157 +++
>  3 files changed, 282 insertions(+), 1 deletions(-)
>  create mode 100644 hw/dma_rw.c
>  create mode 100644 hw/dma_rw.h
>
> diff --git a/Makefile.target b/Makefile.target
> index e15b1c4..e5817ab 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o
> +obj-i386-y += pc_piix.o dma_rw.o
>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>
>  # shared objects
> diff --git a/hw/dma_rw.c b/hw/dma_rw.c
> new file mode 100644
> index 000..ef8e7f8
> --- /dev/null
> +++ b/hw/dma_rw.c
> @@ -0,0 +1,124 @@
> +/*
> + * Generic DMA memory access interface.
> + *
> + * Copyright (c) 2011 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "dma_rw.h"
> +#include "range.h"
> +
> +static void dma_register_memory_map(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    dma_addr_t len,
> +                                    target_phys_addr_t paddr,
> +                                    DMAInvalidateMapFunc *invalidate,
> +                                    void *invalidate_opaque)
> +{
> +    DMAMemoryMap *map;
> +
> +    map = qemu_malloc(sizeof(DMAMemoryMap));
> +    map->addr               = addr;
> +    map->len                = len;
> +    map->paddr              = paddr;
> +    map->invalidate         = invalidate;
> +    map->invalidate_opaque  = invalidate_opaque;
> +
> +    QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
> +}
> +
> +static void dma_unregister_memory_map(DMADevice *dev,
> +                                      target_phys_addr_t paddr,
> +                                      dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (map->paddr == paddr && map->len == len) {
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len)
> +{
> +    DMAMemoryMap *map, *next;
> +
> +    /* The _SAFE variant is needed because entries are freed while iterating. */
> +    QLIST_FOREACH_SAFE(map, &dev->mmu->memory_maps, list, next) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            qemu_free(map);
> +        }
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write)
> +{
> +    int err;
> +    target_phys_addr_t paddr, plen;
> +
> +    if (!dev || !dev->mmu) {
> +        return cpu_physical_memory_map(addr, len, is_write);
> +    }
> +
> +    plen = *len;
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +    if (err) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (ple

Re: [PATCH 00/13] AMD IOMMU emulation patchset (reworked cc/to)

2011-02-05 Thread Blue Swirl

On Thu, Feb 3, 2011 at 11:32 PM, Eduard - Gabriel Munteanu
 wrote:
> Hi again,
>
> Sorry for the mess, I forgot to cc Michael and this should go through his tree.
> I'm also cc-ing the SeaBIOS people.
>
> malc already ack-ed the audio bits.

Please use scripts/checkpatch.pl to check for whitespace, brace etc.
issues. The patches (except for 01) look fine to me otherwise.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 15/15] kvm: x86: Introduce kvmclock device to save/restore its state

2011-02-07 Thread Blue Swirl

On Mon, Feb 7, 2011 at 1:19 PM, Jan Kiszka  wrote:
> If kvmclock is used, which implies the kernel supports it, register a
> kvmclock device with the sysbus. Its main purpose is to save and restore
> the kernel state on migration, but this will also allow to visualize it
> one day.
>
> Signed-off-by: Jan Kiszka 
> CC: Glauber Costa 
> ---
>  Makefile.target |    4 +-
>  hw/kvmclock.c   |  125 +++
>  hw/kvmclock.h   |   14 ++
>  hw/pc_piix.c    |   31 +++---
>  4 files changed, 165 insertions(+), 9 deletions(-)
>  create mode 100644 hw/kvmclock.c
>  create mode 100644 hw/kvmclock.h
>
> diff --git a/Makefile.target b/Makefile.target
> index b0ba95f..30232fa 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -37,7 +37,7 @@ ifndef CONFIG_HAIKU
>  LIBS+=-lm
>  endif
>
> -kvm.o kvm-all.o vhost.o vhost_net.o: QEMU_CFLAGS+=$(KVM_CFLAGS)
> +kvm.o kvm-all.o vhost.o vhost_net.o kvmclock.o: QEMU_CFLAGS+=$(KVM_CFLAGS)
>
>  config-target.h: config-target.h-timestamp
>  config-target.h-timestamp: config-target.mak
> @@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o
> +obj-i386-y += pc_piix.o kvmclock.o

Please build kvmclock.o conditionally on CONFIG_something...

>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>
>  # shared objects
> diff --git a/hw/kvmclock.c b/hw/kvmclock.c
> new file mode 100644
> index 000..b6ceddf
> --- /dev/null
> +++ b/hw/kvmclock.c
> @@ -0,0 +1,125 @@
> +/*
> + * QEMU KVM support, paravirtual clock device
> + *
> + * Copyright (C) 2011 Siemens AG
> + *
> + * Authors:
> + *  Jan Kiszka        
> + *
> + * This work is licensed under the terms of the GNU GPL version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "sysemu.h"
> +#include "sysbus.h"
> +#include "kvm.h"
> +#include "kvmclock.h"
> +
> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
> +
> +#include 
> +#include 
> +
> +typedef struct KVMClockState {
> +    SysBusDevice busdev;
> +    uint64_t clock;
> +    bool clock_valid;
> +} KVMClockState;
> +
> +static void kvmclock_pre_save(void *opaque)
> +{
> +    KVMClockState *s = opaque;
> +    struct kvm_clock_data data;
> +    int ret;
> +
> +    if (s->clock_valid) {
> +        return;
> +    }
> +    ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> +    if (ret < 0) {
> +        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(-ret));
> +        data.clock = 0;
> +    }
> +    s->clock = data.clock;
> +    /*
> +     * If the VM is stopped, declare the clock state valid to avoid re-reading
> +     * it on next vmsave (which would return a different value). Will be reset
> +     * when the VM is continued.
> +     */
> +    s->clock_valid = !vm_running;
> +}
> +
> +static int kvmclock_post_load(void *opaque, int version_id)
> +{
> +    KVMClockState *s = opaque;
> +    struct kvm_clock_data data;
> +
> +    data.clock = s->clock;
> +    data.flags = 0;
> +    return kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
> +}
> +
> +static void kvmclock_vm_state_change(void *opaque, int running, int reason)
> +{
> +    KVMClockState *s = opaque;
> +
> +    if (running) {
> +        s->clock_valid = false;
> +    }
> +}
> +
> +static int kvmclock_init(SysBusDevice *dev)
> +{
> +    KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
> +
> +    qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
> +    return 0;
> +}
> +
> +static const VMStateDescription kvmclock_vmsd = {
> +    .name = "kvmclock",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .pre_save = kvmclock_pre_save,
> +    .post_load = kvmclock_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT64(clock, KVMClockState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static SysBusDeviceInfo kvmclock_info = {
> +    .qdev.name = "kvmclock",
> +    .qdev.size = sizeof(KVMClockState),
> +    .qdev.vmsd = &kvmclock_vmsd,
> +    .qdev.no_user = 1,
> +    .init = kvmclock_init,
> +};
> +
> +/* Note: Must be called after VCPU initialization. */
> +void kvmclock_create(void)
> +{
> +    if (kvm_enabled() &&
> +        first_cpu->cpuid_kvm_features & (1ULL << KVM_FEATURE_CLOCKSOURCE)) {
> +        sysbus_create_simple("kvmclock", -1, NULL);
> +    }
> +}

... and with this moved to a header as a static inline function, it
should be possible to use sysbus_try_create() (coming soon) to try to
create the device. Then it's not fatal if the device can't be created;
that just means that the capability was not available at build time.
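
As a rough sketch of what I mean (sysbus_try_create() does not exist yet,
so device_register(), device_try_create() and kvmclock_try_create() below
are hypothetical stand-ins, not QEMU API): creation by name simply fails
when the device was not compiled in, and the caller carries on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVICES 8

/* Hypothetical stand-in for a qdev-style device registry entry. */
typedef struct DeviceInfo {
    const char *name;
    int (*init)(void);          /* returns 0 on success */
} DeviceInfo;

static DeviceInfo registry[MAX_DEVICES];
static size_t n_registered;

/* Called once per device that is actually part of this build. */
static void device_register(const char *name, int (*init)(void))
{
    assert(name != NULL && init != NULL && n_registered < MAX_DEVICES);
    registry[n_registered].name = name;
    registry[n_registered].init = init;
    n_registered++;
}

/* Returns 1 if the device was created, 0 if it is unknown or init failed. */
static int device_try_create(const char *name)
{
    size_t i;

    for (i = 0; i < n_registered; i++) {
        if (strcmp(registry[i].name, name) == 0) {
            return registry[i].init() == 0;
        }
    }
    return 0;                   /* not built in: not fatal */
}

/* Placeholder init function for the sketch. */
static int kvmclock_init_stub(void)
{
    return 0;
}

/* What the header-only helper could look like. */
static inline int kvmclock_try_create(int clocksource_available)
{
    if (!clocksource_available) {
        return 0;
    }
    return device_try_create("kvmclock");
}
```

The board code then needs no #ifdef: it calls the helper unconditionally
and a build without the device just gets 0 back.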

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-09 Thread Blue Swirl

On Wed, Feb 9, 2011 at 12:43 PM, Anthony Liguori  wrote:
> On 02/08/2011 01:30 PM, Aurelien Jarno wrote:
>>
>> On Tue, Feb 08, 2011 at 06:13:53PM +0100, Markus Armbruster wrote:
>>
>>>
>>> Chris Wright  writes:
>>>
>>> [...]
>>>

>>>> - qdev/vmstate both examples of partially completed work that need more
>>>>   attention

>>>
>>> As far as qdev's concerned, I can see two kinds of to-dos:
>>>
>>> * Further develop qdev so that more of the machine init code can become
>>>   qdev declarations.  Specific ideas welcome.  Patches even more, as
>>>   always.
>>>
>>> * Convert the remaining devices.  They are typically used only with
>>>   oddball machines, which makes the conversion hard to test for anyone
>>>   who's not already using them.
>>>
>>>   I've said this before: at some point in time (sooner rather than
>>>   later, if you ask me), we need to shoot the stragglers.  I'm pretty
>>>   optimistic that any victims worth keeping will receive timely
>>>   attention then.
>>>
>>>
>>
>> For those oddball machines, qdev doesn't really bring anything, that's
>> why there is so little interest in converting them, and why I prefer to
>> spend my time on the emulation correctness than converting those
>> remaining to qdev. Of course I agree it's something to do, and with an
>> unlimited amount of free time, I'll do them immediately.
>>
>> Let's take for example the SH4 target. It's nice to be able to create
>> the whole machine from a script, except your kernel won't boot if the
>> machine:
>> - has a different cpu
>> - doesn't have a SM501 chipset
>> - doesn't have the correct memory size
>> - doesn't have 2 serial ports
>>
>
> qdev needs a v2.  The object model is very difficult to work with and it
> offers little value for the scenario you describe.
>
> A SoC should be modelled as a single object with parameters that can be set.
>  That object will then have other objects embedded through it with
> composition or reference.
>
> So for instance, you might have:
>
> class SH4 {
>    SH4CPU cpu[n_vcpus];
>    SM501 chipset;
> };
>
> class SM501 : public PCIHostController {
>     PCIDevice *slots[32];
> };
>
> Having a script where you describe this is wrong.  This ought to be an
> object.  For instance, what we really ought to have on x86 is:
>
> qemu -no-machine -device i440fx,id=root -device
> rtl8139,bus=/root/pci.0,addr=1.0 -device cpu,chipset=/root
>
> Part of the problem with qdev v1 is that it doesn't allow for meaningful
> object composition.  The only relationship between devices is through
> BusState which presents a hierarchical parent/child relationship.

That is actually how hardware is usually designed; multiple active
masters or cyclic graphs would usually be too complex.

> We really need a way to do composition and referencing.  For instance, if
> you notice above, SM501 has 32 references to a PCIDevice as opposed to
> having a linked list of children.  The effect is that a PCIDevice does not
> have the PCIHostController as its "parent" because there's no intrinsic
> parent/child hierarchy.
>
> So really, we're talking about a device graph here instead of a tree.

I think the problem is that there are several semi-overlapping trees
when you also take into consideration IRQ routing, power lines, wakeup
signals etc., and this is not so easy to describe.
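
To illustrate the overlapping trees, here is a minimal sketch (the Dev
structure and its field names are made up for this example, they are not
qdev types): the same nodes carry one parent link per relation, and a
walk along the bus links gives a different shape than a walk along the
IRQ links.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device node with two independent parent relations. */
typedef struct Dev Dev;
struct Dev {
    const char *name;
    Dev *bus_parent;    /* where the device sits for addressing */
    Dev *irq_parent;    /* where its interrupt line is routed */
};

typedef Dev *(*ParentFn)(const Dev *);

static Dev *by_bus(const Dev *d)
{
    return d->bus_parent;
}

static Dev *by_irq(const Dev *d)
{
    return d->irq_parent;
}

/* Number of hops to the root when following one relation only. */
static int depth(const Dev *d, ParentFn parent)
{
    int n = 0;

    assert(d != NULL && parent != NULL);
    while ((d = parent(d)) != NULL) {
        n++;
    }
    return n;
}
```

With a NIC plugged into the host bridge but interrupting through a PIC
that itself sits behind an ISA bridge, the two walks already disagree
about who the NIC's "parent" is.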

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-09 Thread Blue Swirl

On Wed, Feb 9, 2011 at 4:44 PM, Anthony Liguori  wrote:
> On 02/09/2011 06:28 AM, Markus Armbruster wrote:
>>>
>>> Except that construction of a device requires initialization from an
>>> array of variants (which is then type checked).  The way we store the
>>> variants is lossy because we convert back and forth to a string.
>>>
>>
>> Yes, there's overlap, but no, a qdev property isn't yet another variant
>> type scheme.  Exhibit A of the defense: qdev uses QemuOpts for variant
>> types.
>>
>> Let me elaborate.  qdev_device_add() uses QemuOpts as map from name to
>> variant type value, uses the name to look up the property, then uses
>> property methods to stuff the variant value it got from QemuOpts into
>> the (non-variant) struct member described by the property.
>>
>> I figure QemuOpts was adopted for this purpose because it was already in
>> use with command line and human monitor.  With QMP added to the mix,
>> there's friction: QMP uses QDict, not QemuOpts.
>>
>
> I'm going to finish QMP before tackling qdev, but at a high level, here's
> how I think we fix this.
>
> Right now, qdev only really supports construction properties.  In GObject
> parlance, this would be properties with G_PARAM_CONSTRUCT_ONLY.
>
> Instead of the current approach of having the construction properties
> automagically set as part of the object prior to initfn() being invoked, we
> should have an init function that takes the full set of construction only
> properties in the native type.
>
> With a schema description of the device's constructor, we can generate code
> that invokes the native constructor based on a QemuOpts, or based on a
> QDict.
>
> So instead of:
>
> static int serial_isa_initfn(ISADevice *dev);
>
> static ISADeviceInfo serial_isa_info = {
>    .init       = serial_isa_initfn,
>    .qdev.props = (Property[]) {
>        DEFINE_PROP_UINT32("index", ISASerialState, index,   -1),
>        DEFINE_PROP_HEX32("iobase", ISASerialState, iobase,  -1),
>        DEFINE_PROP_UINT32("irq",   ISASerialState, isairq,  -1),
>        DEFINE_PROP_CHR("chardev",  ISASerialState, state.chr),
>        DEFINE_PROP_END_OF_LIST(),
>    },
> };
>
> We'd have:
>
> void isa_serial_init(ISASerialState *obj, uint32_t index, uint32_t iobase,
> uint32_t irq, CharDriverState *chardev, Error **errp);
>
> // isa_serial.json
> [ 'ISASerialState', {'index': 'uint32_t', 'iobase': 'uint32_t', 'irq':
> 'uint32_t', 'chardev': 'CharDriverState *'} ]
>
> From this definition, we can generate code that handles the -device argument
> doing conversion from string to the appropriate types while also doing
> QObject/GVariant conversion to support the qmp_device_add() interface.
>
> Also, having a well typed constructor means that we can do safer composition
> because instead of doing:
>
> DeviceState *dev;
>
> dev = qdev_create(NULL, "isa-serial");
> qdev_prop_set_uint32(dev, "iobase", 0x274);
> qdev_prop_set_uint32(dev, "irq", 0x07);
> qdev_init_nofail(dev);
>
> We can just do:
>
> ISASerialState dev;
>
> isa_serial_init(&dev, 0, 0x274, 0x07, NULL, NULL);

Do you mean that there should be a generic way of doing that, like
sysbus_create_varargs() for qdev, or just add inline functions which
hide qdev property setup?

I still think that FDT should be used in the future. That would
require that the properties can be set up mechanically, and I don't
see how your proposal would help that.
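
For the second option, something like the following sketch is what I have
in mind. MockDevice and the mock_prop_* helpers are invented for the
example and only imitate the shape of qdev property setup; the point is
that callers get a typed function while the properties stay addressable
by name, so they could still be filled in mechanically.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PROPS 4

/* Hypothetical name/value property, stored per device. */
typedef struct Prop {
    const char *name;
    uint32_t value;
} Prop;

typedef struct MockDevice {
    const char *type;
    Prop props[MAX_PROPS];
    int n_props;
    int initialized;
} MockDevice;

/* Properties may only be set before the device is initialized. */
static void mock_prop_set_uint32(MockDevice *dev, const char *name,
                                 uint32_t value)
{
    assert(dev->n_props < MAX_PROPS && !dev->initialized);
    dev->props[dev->n_props].name = name;
    dev->props[dev->n_props].value = value;
    dev->n_props++;
}

/* Returns 1 and stores the value if the property exists, else 0. */
static int mock_prop_get_uint32(const MockDevice *dev, const char *name,
                                uint32_t *out)
{
    int i;

    for (i = 0; i < dev->n_props; i++) {
        if (strcmp(dev->props[i].name, name) == 0) {
            *out = dev->props[i].value;
            return 1;
        }
    }
    return 0;
}

/* Typed wrapper: hides the property calls from the board code. */
static inline void serial_isa_create(MockDevice *dev, uint32_t iobase,
                                     uint32_t irq)
{
    memset(dev, 0, sizeof(*dev));
    dev->type = "isa-serial";
    mock_prop_set_uint32(dev, "iobase", iobase);
    mock_prop_set_uint32(dev, "irq", irq);
    dev->initialized = 1;
}
```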

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-09 Thread Blue Swirl

On Wed, Feb 9, 2011 at 9:59 PM, Anthony Liguori  wrote:
> On 02/09/2011 06:48 PM, Blue Swirl wrote:
>>>
>>> ISASerialState dev;
>>>
>>> isa_serial_init(&dev, 0, 0x274, 0x07, NULL, NULL);
>>>
>>
>> Do you mean that there should be a generic way of doing that, like
>> sysbus_create_varargs() for qdev, or just add inline functions which
>> hide qdev property setup?
>>
>> I still think that FDT should be used in the future. That would
>> require that the properties can be set up mechanically, and I don't
>> see how your proposal would help that.
>>
>
> Yeah, I don't think that is a good idea anymore.  I think this is part of
> why we're having so many problems with qdev.
>
> While (most?) hardware hierarchies can be represented by device tree syntax,
> not all valid device trees correspond to interface and/or useful hardware
> hierarchies.

User creates a non-working machine and so gets to fix the problems?
How is that a problem for us?

> We want to have an interface to create large chunks of hardware (like an
> i440fx) which then results in a significant portion of a device tree.

But how would this affect the interface to devices? I don't see how that
would differ between the current model and the function call model.

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-11 Thread Blue Swirl

On Thu, Feb 10, 2011 at 9:47 AM, Anthony Liguori  wrote:
> On 02/09/2011 09:15 PM, Blue Swirl wrote:
>>
>> On Wed, Feb 9, 2011 at 9:59 PM, Anthony Liguori
>>  wrote:
>>
>>>
>>> On 02/09/2011 06:48 PM, Blue Swirl wrote:
>>>
>>>>>
>>>>> ISASerialState dev;
>>>>>
>>>>> isa_serial_init(&dev, 0, 0x274, 0x07, NULL, NULL);
>>>>>
>>>>>
>>>>
>>>> Do you mean that there should be a generic way of doing that, like
>>>> sysbus_create_varargs() for qdev, or just add inline functions which
>>>> hide qdev property setup?
>>>>
>>>> I still think that FDT should be used in the future. That would
>>>> require that the properties can be set up mechanically, and I don't
>>>> see how your proposal would help that.
>>>>
>>>>
>>>
>>> Yeah, I don't think that is a good idea anymore.  I think this is part of
>>> why we're having so many problems with qdev.
>>>
>>> While (most?) hardware hierarchies can be represented by device tree
>>> syntax,
>>> not all valid device trees correspond to interface and/or useful hardware
>>> hierarchies.
>>>
>>
>> User creates a non-working machine and so gets to fix the problems?
>> How is that a problem for us?
>>
>
> It's not about creating a non-working machine.  It's about what user-level
> abstraction we need to provide.
>
> It's a whole lot easier to implement an i440fx device with a fixed set of
> parameters than it is to make every possible subdevice have a proper factory
> interface along with mechanisms to hook everything together.
>
> Basically, we're making things much harder for ourselves than we should.
>
>>> We want to have an interface to create large chunks of hardware (like an
>>> i440fx) which then results in a significant portion of a device tree.
>>>
>>
>> But how would this affect interface to devices? I don't see how that
>> would be any different with current model and the function call model.
>>
>
> If all composition is done through a factory interface, it doesn't.  But my
> main argument here is that we shouldn't try to make all composition done
> through a factory interface--only where it makes sense.
>
> So very concretely, I'm suggesting we do the following to target-i386:
>
> 1) make the i440fx device have an embedded ide controller, piix3, and usb
> controller that get initialized automatically.  The piix3 embeds the
> PCI-to-ISA bridge along with all of the default ISA devices (rtc, serial,
> etc.).

This makes sense.

> 2) get rid of the entire concept of machines.  Creating a i440fx is
> essentially equivalent to creating a bare machine.

This doesn't make as much sense. There's still memory and the PCI
devices plugged into the PCI bus created by i440fx. The various drives
need to be connected to IDE channels, chardevs to serial ports etc.; the
devices can't just claim them in order of creation. The connections must
be managed at board level.

But I don't disagree completely, some time ago I proposed that
machines should be qdevified and also some of the host functions.

> 3) just use the existing -device infrastructure to support all of this.  A
> very simple device config corresponds to a very complex device tree but
> that's the desired effect.

This depends on the above.

> 4) model the CPUs as devices that take a pointer to a host controller, for
> x86, the normal case would be giving it a pointer to i440fx.

For more precision, each CPU should connect to its cache controller,
which should connect to the local APIC, that to the global bus for the
IOAPIC, that to the northbridge, which connects to both memory and the
southbridge.

Anyway, all of the above points 1 to 4 are orthogonal to qdev and its
API; they can be done either way.

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-11 Thread Blue Swirl

On Thu, Feb 10, 2011 at 6:05 PM, Anthony Liguori  wrote:
> On 02/10/2011 03:20 PM, Gleb Natapov wrote:
>>
>> Judging by how well all previous conversions went we will end up with one
>> more way of creating devices. One legacy, another qdev and your new one.
>> And what is the problem with qdev again (not that I am a big qdev fan)?
>>
>
> We've really been arguing about probably the most minor aspect of the
> problem with qdev.
>
> All I'm really saying is that we shouldn't tie device construction to a
> factory interface as we do with qdev.
>
> That simply means that we should be able to do:
>
> RTC *rtc_create(arg1, arg2, arg2);

I don't see how that would help at all. Throwing qdev away and just
calling various functions directly, with all state exposed, would be
like QEMU 0.9.0.

> And that a separate piece of code decides which devices are exposed through
> -device or device_add.  Which devices are exposed is really a minor detail.
>
> That said, qdev has a number of significant limitations in my mind.  The
> first is that the only relationship between devices is through the BusState
> interface.

There's also qemu_irq for arbitrary signals.

>  I don't think we should even try to have a generic bus model.
>  When you look at how badly broken PCI hotplug is currently in qdev, I think
> this is symptomatic of this.

And how should this be fixed? The API change would not help.

> There's also no way in qdev to really have polymorphism.  Interfaces really
> aren't meaningful in qdev so you have things like PCIDevice where some
> methods are stored in the object instead of the class dispatch table and you
> have overuse of static class members.

QEMU is developed in C, not C++.

> And it's all unrelated to VMState.

Right, but this also has the good side that not all device state is
automatically exported. If other devices were allowed to muck with
a device's internal state freely, bad things could happen.

Device reset could also use standard register definitions, shared with VMState.

> And this is just the basic mechanisms of qdev.  The actual implementation is
> worse.  The use of qemu_irq as gpio in the base class and overuse of
> SystemBus is really quite insane.

Maybe qemu_irq should be renamed to QEMUSignal (and I don't like
typedeffing pointers), otherwise it looks quite sane to me.

Could you point to examples of SystemBus overuse?

> And so far, the use of qdev has been entirely superficial.  Devices still
> don't make use of bus level interfaces to do I/O so we don't have any better
> componentization than we did before qdev.
>
>> The fact that there is not enough interest to convert all devices to it?
>>
>
> I don't think there is any device that has been improved by qdev.  -device
> is a nice feature, but it could have been implemented without qdev.

We have 'info qtree' which can't be implemented easily without a
generic device class. Avi (or who was it) sent patches to expose even
more device state.

With the patches I'm going to apply, if Red Hat wants to disable
building various devices, it can be done without #ifdeffery. This is
not possible without a generic factory interface.
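
A small sketch of why the generic device class matters for something like
'info qtree' (Node, node_add_child() and tree_dump() are invented for
this example and are not the real qdev structures): one walker can print
any machine because every device shares the same base layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical common base that every device would embed. */
typedef struct Node {
    const char *name;
    struct Node *children[MAX_CHILDREN];
    int n_children;
} Node;

static void node_add_child(Node *parent, Node *child)
{
    assert(parent->n_children < MAX_CHILDREN);
    parent->children[parent->n_children++] = child;
}

/*
 * Append an indented listing of the subtree rooted at n to buf, starting
 * at offset used. Returns the new offset. The caller sizes the buffer;
 * the sketch asserts instead of truncating.
 */
static size_t tree_dump(const Node *n, int indent, char *buf, size_t size,
                        size_t used)
{
    int i, w;

    assert(used < size);
    w = snprintf(buf + used, size - used, "%*s%s\n", indent, "", n->name);
    assert(w >= 0 && (size_t)w < size - used);
    used += (size_t)w;

    for (i = 0; i < n->n_children; i++) {
        used = tree_dump(n->children[i], indent + 2, buf, size, used);
    }
    return used;
}
```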

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-13 Thread Blue Swirl

On Sun, Feb 13, 2011 at 5:31 PM, Anthony Liguori  wrote:
> On 02/11/2011 12:14 PM, Blue Swirl wrote:
>>
>> On Thu, Feb 10, 2011 at 6:05 PM, Anthony Liguori
>>  wrote:
>>
>>>
>>> On 02/10/2011 03:20 PM, Gleb Natapov wrote:
>>>
>>>>
>>>> Judging by how well all previous conversions went we will end up with one
>>>> more way of creating devices. One legacy, another qdev and your new one.
>>>> And what is the problem with qdev again (not that I am a big qdev fan)?
>>>>
>>>>
>>>
>>> We've really been arguing about probably the most minor aspect of the
>>> problem with qdev.
>>>
>>> All I'm really saying is that we shouldn't tie device construction to a
>>> factory interface as we do with qdev.
>>>
>>> That simply means that we should be able to do:
>>>
>>> RTC *rtc_create(arg1, arg2, arg2);
>>>
>>
>> I don't see how that would help at all. Throwing qdev away and just
>> calling various functions directly, with all states exposed would be
>> like QEMU 0.9.0.
>>
>
> qdev doesn't expose any state today.  qdev properties are construction-only
> properties that happen to be stored in each device state.
>
> What we really need is a full property framework that includes properties
> with hookable getters and setters along with the ability to mark properties
> as construct-only, read-only, or read-write.
>
> But I think it's reasonable to expose construct-only properties as just an
> initfn argument.

Sounds OK. About read-write properties, what happens if we one day
have extensive threading, and locks are pushed to device level? I can
imagine a deadlock involving one thread running in the IO thread for a
device and another trying to access that device's properties. Maybe
that is not different from the function call version.
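
To make the concern concrete without real threads, here is a sketch
under loud assumptions: LockedDevice, PropAccess and the prop_* helpers
are made up, and the per-device lock is modelled as a plain flag with
try-lock semantics, so the accessor reports -EBUSY at the point where a
blocking mutex would hang.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical access modes for a property. */
typedef enum {
    PROP_CONSTRUCT_ONLY,
    PROP_READ_ONLY,
    PROP_READ_WRITE
} PropAccess;

typedef struct LockedDevice {
    int lock_held;      /* stand-in for a per-device mutex */
    int realized;       /* set once construction has finished */
    uint32_t value;
    PropAccess access;
} LockedDevice;

static int dev_trylock(LockedDevice *d)
{
    if (d->lock_held) {
        return -EBUSY;  /* a blocking lock would wait here */
    }
    d->lock_held = 1;
    return 0;
}

static void dev_unlock(LockedDevice *d)
{
    assert(d->lock_held);
    d->lock_held = 0;
}

/* Setter hook: checks the access mode, then takes the device lock. */
static int prop_set(LockedDevice *d, uint32_t v)
{
    int ret;

    if (d->access == PROP_READ_ONLY ||
        (d->access == PROP_CONSTRUCT_ONLY && d->realized)) {
        return -EPERM;
    }
    ret = dev_trylock(d);
    if (ret < 0) {
        return ret;
    }
    d->value = v;
    dev_unlock(d);
    return 0;
}

/* Getter hook: also needs the device lock to read a consistent value. */
static int prop_get(LockedDevice *d, uint32_t *out)
{
    int ret = dev_trylock(d);

    if (ret < 0) {
        return ret;
    }
    *out = d->value;
    dev_unlock(d);
    return 0;
}
```

If the IO path already holds the device lock and something on that path
reaches prop_get(), the -EBUSY case is exactly the spot that needs a
designed answer.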

>>> And that a separate piece of code decides which devices are exposed
>>> through
>>> -device or device_add.  Which devices are exposed is really a minor
>>> detail.
>>>
>>> That said, qdev has a number of significant limitations in my mind.  The
>>> first is that the only relationship between devices is through the
>>> BusState
>>> interface.
>>>
>>
>> There's also qemu_irq for arbitrary signals.
>>
>
> Yes, but qemu_irq is very restricted as it only models a single bit of
> information and doesn't really have a mechanism to attach/detach in any
> generic way.

Basic signals are already very useful for many purposes, since they
match digital logic signals in real HW. In theory, whole machines
could be constructed with just qemu_irq and a NAND gate emulator. ;-)
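
Half seriously, a sketch of that idea: every gate below is derived from
NAND only, and the one-bit values stand in for signal levels. This is a
plain functional model evaluated directly, not event-driven qemu_irq
callbacks, and all function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The only primitive: a NAND gate on one-bit levels (0 or 1). */
static int nand_gate(int a, int b)
{
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    return !(a && b);
}

static int and_gate(int a, int b)
{
    int n = nand_gate(a, b);

    return nand_gate(n, n);
}

static int or_gate(int a, int b)
{
    return nand_gate(nand_gate(a, a), nand_gate(b, b));
}

static int xor_gate(int a, int b)
{
    int n = nand_gate(a, b);

    return nand_gate(nand_gate(a, n), nand_gate(b, n));
}

/* One-bit full adder: returns the sum bit, stores the carry out. */
static int full_adder(int a, int b, int cin, int *cout)
{
    int s1 = xor_gate(a, b);

    *cout = or_gate(and_gate(a, b), and_gate(s1, cin));
    return xor_gate(s1, cin);
}

/* 32-bit ripple-carry adder wired from 32 full adders. */
static uint32_t add32(uint32_t x, uint32_t y)
{
    uint32_t sum = 0;
    int carry = 0;
    int i;

    for (i = 0; i < 32; i++) {
        int bit = full_adder((int)((x >> i) & 1), (int)((y >> i) & 1),
                             carry, &carry);
        sum |= (uint32_t)bit << i;
    }
    return sum;
}
```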

In the message passing IRQ discussion earlier, it was IIRC decided
that the one bit version would not be changed but a separate message
passing version would be created if ever needed.

>>>  I don't think we should even try to have a generic bus model.
>>>  When you look at how badly broken PCI hotplug is current in qdev, I
>>> think
>>> this is symptomatic of this.
>>>
>>
>> And how should this be fixed? The API change would not help.
>>
>
> Just as we have bus level creation functions, we should have bus level
> hotplug interfaces.
>
>>> There's also no way in qdev to really have polymorphism.  Interfaces
>>> really
>>> aren't meaningful in qdev so you have things like PCIDevice where some
>>> methods are stored in the object instead of the class dispatch table and
>>> you
>>> have overuse of static class members.
>>>
>>
>> QEMU is developed in C, not C++.
>>
>
> But we're trying to do object oriented programming in C so as long as we're
> doing that, we ought to do it right.
>
>>> And it's all unrelated to VMState.
>>>
>>
>> Right, but this has also the good side that not all device state is
>> automatically exported. If other devices would be allowed to muck with
>> a devices internal state freely, bad things could happen.
>>
>> Device reset could also use standard register definitions, shared with
>> VMState.
>>
>
> There's a way to have formally verifiable serialization/deserialization if
> we can satisfy two conditions 1) the devices rely on no global state (i.e.
> static variables) and 2) every field associated with a device is marshalled
> during serialization/deserialization.
>
> When we define a device, right now we say that certain state is writable
> during construction.  It's not a stretch to wan

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-13 Thread Blue Swirl

On Sun, Feb 13, 2011 at 9:57 PM, Anthony Liguori  wrote:
> On 02/13/2011 01:37 PM, Blue Swirl wrote:
>>
>> On Sun, Feb 13, 2011 at 5:31 PM, Anthony Liguori
>>  wrote:
>>
>>>
>>> qdev doesn't expose any state today.  qdev properties are
>>> construction-only
>>> properties that happen to be stored in each device state.
>>>
>>> What we really need is a full property framework that includes properties
>>> with hookable getters and setters along with the ability to mark
>>> properties
>>> as construct-only, read-only, or read-write.
>>>
>>> But I think it's reasonable to expose construct-only properties as just
>>> an
>>> initfn argument.
>>>
>>
>> Sounds OK. About read-write properties, what happens if we one day
>> have extensive threading, and locks are pushed to device level? I can
>> imagine a deadlock involving one thread running in IO thread for a
>> device and another trying to access that device's properties. Maybe
>> that is not different from function call version.
>>
>
> You need hookable setters/getters that can acquire a lock and do the right
> thing.  It shouldn't be able to dead lock if the locking is designed right.
>
>
>>> Yes, but qemu_irq is very restricted as it only models a signal bit of
>>> information and doesn't really have a mechanism to attach/detach in any
>>> generic way.
>>>
>>
>> Basic signals are already very useful for many purposes, since they
>> match digital logic signals in real HW. In theory, whole machines
>> could be constructed with just qemu_irq and NAND gate emulator. ;-)
>>
>
> It's not just in theory.  In the C++ port of QEMU that I wrote, I
> implemented an AND, OR, and XOR gate and implemented a full 32-bit adder by
> just using a device config file.
>
> If done correctly, using referencing can be extremely powerful.  A full
> adder is a good example.  The gates really don't have any concept of bus and
> the relationship between gates is definitely not a tree.
>
>> In the message passing IRQ discussion earlier, it was IIRC decided
>> that the one bit version would not be changed but a separate message
>> passing version would be created if ever needed.
>>
>
> C already has a message passing interface that supports type safety called
> function pointers :-)
>
> An object that implements multiple interfaces where the interface becomes
> the "message passing interface" is exactly what I've been saying we need.
>  It's flexible and the compiler helps us enforce typing.
>
>>>
>>> Any interfaces of a base class should make sense even for derived
>>> classes.
>>>
>>> That means if the base class is going to expose essentially a pin-out
>>> interface, that if I have a PCIDevice and cast it to Device, I should be
>>> able to interact with the GPIO interface to interact with the PCI device.
>>>  Presumably, that means interfacing at the PCI signalling level.  That's
>>> insane to model in QEMU :-)
>>>
>>
>> This would be doable, if we built buses from a bunch of signals, like
>> in VHDL or Verilog. It would simplify aliased MMIO addresses nicely,
>> the undecoded address pins would be ignored. I don't think it would be
>> useful, but a separate interface could be added for connecting to
>> PCIBus with just qemu_irqs.
>>
>
> Yeah, it's possible, but I don't want to spend my time doing this.
>
>>> In reality, GPIO only makes sense for a small class of simple devices
>>> where
>>> modelling the pin-out interface makes sense (like a 7-segment LCD).  That
>>> suggests that GPIO should not be in the DeviceState interface but instead
>>> should be in a SimpleDevice subclass or something like that.
>>>
>>>
>>>>
>>>> Could you point to examples of SystemBus overuse?
>>>>
>>>>
>>>
>>> anthony@titi:~/git/qemu/hw$ grep qdev_create *.c | wc -l
>>> 73
>>> anthony@titi:~/git/qemu/hw$ grep 'qdev_create(NULL' *.c | wc -l
>>> 56
>>>
>>> SystemBus has become a catch-all for shallow qdev conversions.  We've got
>>> Northbridges, RAM, and network devices sitting on the same bus...
>>>
>>
>> On Sparc32 I have not bothered to create a SBus bus. Now it would be
>> useful to get bootindex corrected. Most devices (even on-board IO)
>> should use SBus.
>>
>> The only other bus (MBus) would exist between

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-14 Thread Blue Swirl

On Mon, Feb 14, 2011 at 12:42 AM, Anthony Liguori  wrote:
> On 02/13/2011 03:00 PM, Blue Swirl wrote:
>>
>> On Sun, Feb 13, 2011 at 9:57 PM, Anthony Liguori
>>  wrote:
>>
>>>
>>> On 02/13/2011 01:37 PM, Blue Swirl wrote:
>>>
>>>>
>>>> On Sun, Feb 13, 2011 at 5:31 PM, Anthony Liguori
>>>>  wrote:
>>>>
>>>>
>>>>>
>>>>> qdev doesn't expose any state today.  qdev properties are
>>>>> construction-only
>>>>> properties that happen to be stored in each device state.
>>>>>
>>>>> What we really need is a full property framework that includes
>>>>> properties
>>>>> with hookable getters and setters along with the ability to mark
>>>>> properties
>>>>> as construct-only, read-only, or read-write.
>>>>>
>>>>> But I think it's reasonable to expose construct-only properties as just
>>>>> an
>>>>> initfn argument.
>>>>>
>>>>>
>>>>
>>>> Sounds OK. About read-write properties, what happens if we one day
>>>> have extensive threading, and locks are pushed to device level? I can
>>>> imagine a deadlock involving one thread running in IO thread for a
>>>> device and another trying to access that device's properties. Maybe
>>>> that is not different from function call version.
>>>>
>>>>
>>>
>>> You need hookable setters/getters that can acquire a lock and do the
>>> right
>>> thing.  It shouldn't be able to deadlock if the locking is designed
>>> right.
>>>
>>>
>>>
>>>>>
>>>>> Yes, but qemu_irq is very restricted as it only models a single bit of
>>>>> information and doesn't really have a mechanism to attach/detach in any
>>>>> generic way.
>>>>>
>>>>>
>>>>
>>>> Basic signals are already very useful for many purposes, since they
>>>> match digital logic signals in real HW. In theory, whole machines
>>>> could be constructed with just qemu_irq and NAND gate emulator. ;-)
>>>>
>>>>
>>>
>>> It's not just in theory.  In the C++ port of QEMU that I wrote, I
>>> implemented an AND, OR, and XOR gate and implemented a full 32-bit adder
>>> by
>>> just using a device config file.
>>>
>>> If done correctly, using referencing can be extremely powerful.  A full
>>> adder is a good example.  The gates really don't have any concept of bus
>>> and
>>> the relationship between gates is definitely not a tree.
>>>
>>>
>>>>
>>>> In the message passing IRQ discussion earlier, it was IIRC decided
>>>> that the one bit version would not be changed but a separate message
>>>> passing version would be created if ever needed.
>>>>
>>>>
>>>
>>> C already has a message passing interface that supports type safety
>>> called
>>> function pointers :-)
>>>
>>> An object that implements multiple interfaces where the interface becomes
>>> the "message passing interface" is exactly what I've been saying we need.
>>>  It's flexible and the compiler helps us enforce typing.
>>>
>>>
>>>>>
>>>>> Any interfaces of a base class should make sense even for derived
>>>>> classes.
>>>>>
>>>>> That means if the base class is going to expose essentially a pin-out
>>>>> interface, that if I have a PCIDevice and cast it to Device, I should
>>>>> be
>>>>> able to interact with the GPIO interface to interact with the PCI
>>>>> device.
>>>>>  Presumably, that means interfacing at the PCI signalling level.
>>>>>  That's
>>>>> insane to model in QEMU :-)
>>>>>
>>>>>
>>>>
>>>> This would be doable, if we built buses from a bunch of signals, like
>>>> in VHDL or Verilog. It would simplify aliased MMIO addresses nicely,
>>>> the undecoded address pins would be ignored. I don't think it would be
>>>> useful, but a separate interface could be added for connecting to
>>>> PCIBus with just qemu_irqs.
>>>>
>>>>
>>>
>>> Yeah, it's possible, but I don't want

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-14 Thread Blue Swirl

On Mon, Feb 14, 2011 at 10:53 PM, Anthony Liguori  wrote:
> On 02/14/2011 11:31 AM, Blue Swirl wrote:
>>
>> I don't understand. The caller just does
>> if (isa_serial_init()) {
>>   error();
>> }
>> or
>> if (serial_init()) {
>>   error();
>> }
>>
>> If you mean inside isa_serial_init() vs. serial_init(), that may be
>> true since isa_serial_init has to check for qdev failures, but to
>> the caller both should be identical.
>>
>
> The problem with qdev is there's too much boiler plate code which makes it
> hard to give examples :-)  Here's precisely what I'm talking about:
>
> static int serial_isa_initfn(ISADevice *dev)
> {
>    static int index;
>    ISASerialState *isa = DO_UPCAST(ISASerialState, dev, dev);
>    SerialState *s = &isa->state;
>
>    if (isa->index == -1)
>        isa->index = index;
>    if (isa->index >= MAX_SERIAL_PORTS)
>        return -1;
>    if (isa->iobase == -1)
>        isa->iobase = isa_serial_io[isa->index];
>    if (isa->isairq == -1)
>        isa->isairq = isa_serial_irq[isa->index];
>    index++;
>
>    s->baudbase = 115200;
>    isa_init_irq(dev, &s->irq, isa->isairq);
>    serial_init_core(s);
>    qdev_set_legacy_instance_id(&dev->qdev, isa->iobase, 3);
>
>    register_ioport_write(isa->iobase, 8, 1, serial_ioport_write, s);
>    register_ioport_read(isa->iobase, 8, 1, serial_ioport_read, s);
>    isa_init_ioport_range(dev, isa->iobase, 8);
>    return 0;
> }
>
> SerialState *serial_init(int base, qemu_irq irq, int baudbase,
>                         CharDriverState *chr)
> {
>    SerialState *s;
>
>    s = qemu_mallocz(sizeof(SerialState));
>
>    s->irq = irq;
>    s->baudbase = baudbase;
>    s->chr = chr;
>    serial_init_core(s);
>
>    vmstate_register(NULL, base, &vmstate_serial, s);
>
>    register_ioport_write(base, 8, 1, serial_ioport_write, s);
>    register_ioport_read(base, 8, 1, serial_ioport_read, s);
>    return s;
> }
>
> static ISADeviceInfo serial_isa_info = {
>    .qdev.name  = "isa-serial",
>    .qdev.size  = sizeof(ISASerialState),
>    .qdev.vmsd  = &vmstate_isa_serial,
>    .init       = serial_isa_initfn,
>    .qdev.props = (Property[]) {
>        DEFINE_PROP_UINT32("index", ISASerialState, index,   -1),
>        DEFINE_PROP_HEX32("iobase", ISASerialState, iobase,  -1),
>        DEFINE_PROP_UINT32("irq",   ISASerialState, isairq,  -1),
>        DEFINE_PROP_CHR("chardev",  ISASerialState, state.chr),
>        DEFINE_PROP_END_OF_LIST(),
>    },
> };
>
> static void serial_register_devices(void)
> {
>    isa_qdev_register(&serial_isa_info);
> }
>
> device_init(serial_register_devices)
>
>
> To create a device, I need to do:
>
> {
>     ISADevice *dev;
>
>     dev = isa_create("isa-serial");
>     if (dev == NULL) {
>          return error;
>     }
>     if (qdev_set_uint32(&dev->qdev, "index", index)) {
>          goto err;
>     }
>     if (qdev_set_uint32(&dev->qdev, "iobase", iobase)) {
>          goto err;
>     }
>     if (qdev_set_uint32(&dev->qdev, "irq", irq)) {
>         goto err;
>     }
>     if (qdev_set_chr(&dev->qdev, "chardev", chr)) {
>         goto err;
>     }
>     if (qdev_init(&dev->qdev)) {
>         goto err;
>     }
>     return 0;
> err:
>     qdev_destroy(&dev->qdev);
>     return -1;
> }
>
> This is simply not a reasonable API to use to create devices.

This can be wrapped in a static inline function, with similar
signature to what you propose:

static inline ISADevice *serial_init(int base, qemu_irq irq, int
baudbase, CharDriverState *chr);
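
A minimal sketch of such a wrapper, using stand-in types rather than the
real qdev API. Everything named toy_* below is hypothetical and exists
only to show the shape: the property-by-property creation sequence is
hidden behind one inline function, so callers see a single call with a
single failure path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a qdev-style device with named uint32 properties.
 * None of this is the real QEMU API. */
#define TOY_MAX_PROPS 4

typedef struct ToyDevice {
    const char *name;
    const char *prop_name[TOY_MAX_PROPS];   /* NULL-terminated */
    uint32_t prop_val[TOY_MAX_PROPS];
    int inited;
} ToyDevice;

/* Factory: only knows one device type in this sketch. */
static ToyDevice *toy_create(const char *name)
{
    ToyDevice *dev;

    if (strcmp(name, "isa-serial") != 0) {
        return NULL;
    }
    dev = calloc(1, sizeof(*dev));
    if (dev == NULL) {
        return NULL;
    }
    dev->name = "isa-serial";
    dev->prop_name[0] = "index";
    dev->prop_name[1] = "iobase";
    dev->prop_name[2] = "irq";
    return dev;
}

/* Returns 0 on success, -1 if the property does not exist. */
static int toy_set_uint32(ToyDevice *dev, const char *prop, uint32_t val)
{
    int i;

    for (i = 0; i < TOY_MAX_PROPS && dev->prop_name[i]; i++) {
        if (strcmp(dev->prop_name[i], prop) == 0) {
            dev->prop_val[i] = val;
            return 0;
        }
    }
    return -1;
}

/* Returns 0 and stores the value, or -1 if the property does not exist. */
static int toy_get_uint32(const ToyDevice *dev, const char *prop,
                          uint32_t *val)
{
    int i;

    for (i = 0; i < TOY_MAX_PROPS && dev->prop_name[i]; i++) {
        if (strcmp(dev->prop_name[i], prop) == 0) {
            *val = dev->prop_val[i];
            return 0;
        }
    }
    return -1;
}

static int toy_init(ToyDevice *dev)
{
    dev->inited = 1;
    return 0;
}

/* The wrapper: one call for the caller, the factory underneath. */
static inline ToyDevice *toy_serial_init(uint32_t index, uint32_t iobase,
                                         uint32_t irq)
{
    ToyDevice *dev = toy_create("isa-serial");

    if (dev == NULL) {
        return NULL;
    }
    if (toy_set_uint32(dev, "index", index) ||
        toy_set_uint32(dev, "iobase", iobase) ||
        toy_set_uint32(dev, "irq", irq) ||
        toy_init(dev)) {
        free(dev);
        return NULL;
    }
    return dev;
}
```

A board would then call toy_serial_init(0, 0x3f8, 4) and check for NULL,
while the device itself stays reachable through the factory by name.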

>  There are two
> ways we can make this more manageable.  The first is gobject-style vararg
> constructor coupled with a type safe wrapper.  So...
>
> ISASerialDevice *isa_serial_device_new(uint32_t index, uint32_t iobase,
> uint32_t irq, CharDriverState *chr)
> {
>      return isa_device_create_va("isa-serial", "index", index, "iobase",
> iobase, "irq", irq, "chardev", chr, NULL);
> }
>
> Now this can be used in a reasonable fashion.  However, we can do even
> better if we change the way qdev is done.   Consider the following:
>
> SerialState *serial_init(int base, qemu_irq irq, int baudbase,
>                         CharDriverState *chr)
> {
>    SerialState *s;
>
>    s = qemu_mallocz(sizeof(SerialState));
>
>

Re: [Qemu-devel] KVM call minutes for Feb 8

2011-02-15 Thread Blue Swirl

On Mon, Feb 14, 2011 at 11:47 PM, Anthony Liguori  wrote:
> On 02/14/2011 03:25 PM, Blue Swirl wrote:
>>
>> I'd still like to have the inline wrapper over the factory interface,
>> probably with similar signature to isa_serial_new. Then there would be
>> two functions, one going through qdev and the other bypassing it. I
>> don't see how that would be useful.
>>
>> The callers of the direct interface would force linkage between them
>> and so it would be impossible to build QEMU without that device. We don't
>> need that flexibility for every device though, but I don't see any
>> advantages for using the direct interface either.
>>
>> Why shouldn't we want all devices to be exposed to the user? For
>> example, there are still devices which don't show up in 'info qtree',
>> which is a shame.
>>
>
> Showing up in info qtree is goodness, but I'm talking about allowing a user
> to directly instantiate a device.
>
> Any device we expose to the user through -device needs to maintain a
> compatible interface forever.  For our own sanity, I think we should try to
> expose as little as possible.

Restricting the users from adding arbitrary devices is a different
issue. Dropping qdev support to prevent user from adding the device
seems draconian, what's wrong with no_user flag?
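
To illustrate the idea, here is a rough sketch of a per-device flag that
keeps a device in the registry (and visible to board code) while
refusing user-requested instantiation. The ToyDeviceInfo table and the
toy_* helpers are made up for this example and are not the actual qdev
structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: the device stays registered, the flag
 * only restricts who may instantiate it. */
typedef struct ToyDeviceInfo {
    const char *name;
    int no_user;    /* nonzero: board code only, not for -device */
} ToyDeviceInfo;

static const ToyDeviceInfo toy_devices[] = {
    { "isa-serial",     0 },
    { "i440FX-pcihost", 1 },
};

static const ToyDeviceInfo *toy_find(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(toy_devices) / sizeof(toy_devices[0]); i++) {
        if (strcmp(toy_devices[i].name, name) == 0) {
            return &toy_devices[i];
        }
    }
    return NULL;
}

/* Board code may create anything that is registered. */
static int toy_board_may_create(const char *name)
{
    return toy_find(name) != NULL;
}

/* User requests additionally honour the no_user flag. */
static int toy_user_may_create(const char *name)
{
    const ToyDeviceInfo *info = toy_find(name);

    return info != NULL && !info->no_user;
}
```

With this split the restriction is a policy decision on one path only,
so the device does not have to leave the common framework.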

> A good example of a device that we should model through qdev but not expose
> via -device is actually SerialState.

You wouldn't want users to add any serial ports? What should we do
with serial ports then, always enable a full set of ports? How would
the user use them?

> Today, we have ISASerialState which embeds SerialState.  We can also create
> a MMIO version of SerialState although there's no direct structure that
> wraps that.
>
> Ideally, SerialState would be a proper qdev device that is embedded in both
> ISASerialState and MMIOSerialState (or pick a better name).  info qtree
> should show a has-a relationship for these devices.

I think the devices shown in qtree should always have some
relationship to real devices. If ICH10 contains all possible onboard
devices, including for example HPET, e1000 and SATA, that could use a
has-a relationship to show the composition but otherwise I fear this
would only increase complexity with no gain.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command

2011-04-07 Thread Blue Swirl

On Thu, Apr 7, 2011 at 9:51 PM, Gleb Natapov  wrote:
> On Thu, Apr 07, 2011 at 01:32:50PM -0500, Anthony Liguori wrote:
>> On 04/07/2011 01:10 PM, Peter Maydell wrote:
>> >On 6 April 2011 20:34, Anthony Liguori  wrote:
>> >>http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=/liaai/crashdump/liaaicrashdumpnmiipmi.htm
>> >>
>> >>If an OS is totally hosed (spinning with interrupts disabled), an NMI can
>> >>be used to generate a crash dump.
>> >>
>> >>It's a debug feature and modelling it exactly the way we are probably makes
>> >>sense for other architectures too.  The real semantics are basically force
>> >>guest crash dump.
>> >Ah, right. (There isn't really an equivalent to this on ARM since
>> >we don't have a real NMI equivalent. So any implementation for ARM
>> >qemu would be board dependent since you could wire a watchdog up to
>> >any interrupt.)
>> >
>> >Should we try to pick a command name that says what it's supposed to
>> >do rather than how it happens to be implemented on x86 ?
>>
>> Yup, I was thinking the same thing after I sent the note above.  If
>> we call it 'force-crash-dump', we can implement it as an NMI on
>> target-i386 and potentially as something else on a different target.
>>
> NMI does not have to generate crash dump on every guest we support.
> Actually even for windows guest it does not generate one without
> tweaking registry. For all I know there is a guest that checks mail when
> NMI arrives. Let's give a meaningful name, like inject-nmi, to the NMI
> injection command.

I'd prefer something more generic like these:
raise /apic@fee0:l1int
lower /i44FX-pcihost/e1000@03.0/pinD

The clumsier syntax shouldn't be a problem, since this would be a
system developer tool.

Some kind of IRQ registration would be needed for this to work without
lots of changes.
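
The registration could look roughly like the sketch below: each device
registers its lines under a path string, and a monitor command looks the
path up and drives the line. The toy_irq_* names, the fixed-size table
and the handler signature are assumptions for illustration, not existing
QEMU interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_IRQS 8

/* Handler signature chosen for this sketch only. */
typedef void (*toy_irq_handler)(void *opaque, int level);

typedef struct ToyIRQ {
    const char *path;           /* e.g. "/apic@fee0:l1int" */
    toy_irq_handler handler;
    void *opaque;
} ToyIRQ;

static ToyIRQ toy_irqs[TOY_MAX_IRQS];
static int toy_nb_irqs;

/* Called by a device for each line it wants to expose by name. */
static int toy_irq_register(const char *path, toy_irq_handler handler,
                            void *opaque)
{
    if (toy_nb_irqs >= TOY_MAX_IRQS) {
        return -1;
    }
    toy_irqs[toy_nb_irqs].path = path;
    toy_irqs[toy_nb_irqs].handler = handler;
    toy_irqs[toy_nb_irqs].opaque = opaque;
    toy_nb_irqs++;
    return 0;
}

/* Returns 0 if the line was found and driven, -1 otherwise. */
static int toy_irq_set(const char *path, int level)
{
    int i;

    for (i = 0; i < toy_nb_irqs; i++) {
        if (strcmp(toy_irqs[i].path, path) == 0) {
            toy_irqs[i].handler(toy_irqs[i].opaque, level);
            return 0;
        }
    }
    return -1;
}

/* What "raise <path>" and "lower <path>" would end up calling. */
static int toy_irq_raise(const char *path)
{
    return toy_irq_set(path, 1);
}

static int toy_irq_lower(const char *path)
{
    return toy_irq_set(path, 0);
}

/* Example sink: just records the current level of the line. */
static void toy_line_set(void *opaque, int level)
{
    *(int *)opaque = level;
}
```

An unknown path returns an error instead of silently doing nothing,
which matters for a tool aimed at system developers.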

Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command

2011-04-08 Thread Blue Swirl

On Fri, Apr 8, 2011 at 9:04 AM, Gleb Natapov  wrote:
> On Thu, Apr 07, 2011 at 04:41:03PM -0500, Anthony Liguori wrote:
>> On 04/07/2011 02:17 PM, Gleb Natapov wrote:
>> >On Thu, Apr 07, 2011 at 10:04:00PM +0300, Blue Swirl wrote:
>> >>On Thu, Apr 7, 2011 at 9:51 PM, Gleb Natapov  wrote:
>> >>
>> >>I'd prefer something more generic like these:
>> >>raise /apic@fee0:l1int
>> >>lower /i44FX-pcihost/e1000@03.0/pinD
>> >>
>> >>The clumsier syntax shouldn't be a problem, since this would be a
>> >>system developer tool.
>> >>
>> >>Some kind of IRQ registration would be needed for this to work without
>> >>lots of changes.
>> >True. The ability to trigger any interrupt line is very useful for
>> >debugging. I often re-implement it during debug.
>>
>> And it's a good thing to have, but exposing this as the only API to
>> do something as simple as generating a guest crash dump is not the
>> friendliest thing in the world to do to users.
>>
> Well, this is not intended to be used by regular users directly and
> management can provide nicer interface for issuing NMI. But really,
> my point is that NMI actually generates guest core dump in such rare
> cases (only preconfigured Windows guests) that it doesn't warrant to
> name command as such. Management is in much better position to implement
> functionality with such name since it knows what type of guest it runs
> and can tell agent to configure guest accordingly.

Does the management need to know about each and every debugging
oriented interface? For example, "info regs",  "info mem", "info irq"
and tracepoints?

I think giving IRQs symbolic names could solve some other problems as
well. Maybe it should be possible to connect IRQs in a configuration
file and even with command line:
-device port90,irqid=p90out -device pckbd,irqid=kbdout -device
and,in=p90out,in=kbdout,out=sreset -device system_reset,in=sreset

or

-device and,in=/i44FX-pcihost/PIIX3/i8042/out1,in=/i44FX-pcihost/PIIX3/p90/out1,out=/QEMU/system_reset

Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command

2011-04-08 Thread Blue Swirl

On Fri, Apr 8, 2011 at 10:32 PM, Anthony Liguori  wrote:
> On 04/08/2011 02:17 PM, Blue Swirl wrote:
>>
>> On Fri, Apr 8, 2011 at 9:04 AM, Gleb Natapov  wrote:
>>>
>>> On Thu, Apr 07, 2011 at 04:41:03PM -0500, Anthony Liguori wrote:
>>>>
>>>> On 04/07/2011 02:17 PM, Gleb Natapov wrote:
>>>>>
>>>>> On Thu, Apr 07, 2011 at 10:04:00PM +0300, Blue Swirl wrote:
>>>>>>
>>>>>> On Thu, Apr 7, 2011 at 9:51 PM, Gleb Natapov
>>>>>>  wrote:
>>>>>>
>>>>>> I'd prefer something more generic like these:
>>>>>> raise /apic@fee0:l1int
>>>>>> lower /i44FX-pcihost/e1000@03.0/pinD
>>>>>>
>>>>>> The clumsier syntax shouldn't be a problem, since this would be a
>>>>>> system developer tool.
>>>>>>
>>>>>> Some kind of IRQ registration would be needed for this to work without
>>>>>> lots of changes.
>>>>>
>>>>> True. The ability to trigger any interrupt line is very useful for
>>>>> debugging. I often re-implement it during debug.
>>>>
>>>> And it's a good thing to have, but exposing this as the only API to
>>>> do something as simple as generating a guest crash dump is not the
>>>> friendliest thing in the world to do to users.
>>>>
>>> Well, this is not intended to be used by regular users directly and
>>> management can provide nicer interface for issuing NMI. But really,
>>> my point is that NMI actually generates guest core dump in such rare
>>> cases (only preconfigured Windows guests) that it doesn't warrant to
>>> name command as such. Management is in much better position to implement
>>> functionality with such name since it knows what type of guest it runs
>>> and can tell agent to configure guest accordingly.
>>
>> Does the management need to know about each and every debugging
>> oriented interface? For example, "info regs",  "info mem", "info irq"
>> and tracepoints?
>>
>> I think giving IRQs symbolic names could solve some other problems as
>> well. Maybe it should be possible to connect IRQs in a configuration
>> file and even with command line:
>> -device port90,irqid=p90out -device pckbd,irqid=kbdout -device
>> and,in=p90out,in=kbdout,out=sreset -device system_reset,in=sreset
>
> You really want devices to have properties and for the device properties to
> be discoverable.  For instance:
>
> struct DeviceInfo
> {
>     .name = "and",
>     .properties = {
>          DEFINE_IRQ_IN(AndDevice, in[0]),
>          DEFINE_IRQ_IN(AndDevice, in[1]),
>          DEFINE_IRQ_OUT(AndDevice, out),
>     },
> };
>
> And then you can do:
>
> -device port90,id=port90 -device pckbd,id=pckbd \
> -device and,in[0]=port90.out,in[1]=pckbd.out,id=reset_and \
> -device system_reset,in=reset_and

Exactly. Given a NAND device, we could construct entire machines from
CLI or for example co-simulate SoCs with FPGAs using cells based on
the net lists.
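
As a small illustration of building logic purely from gates and one-bit
signals, the sketch below wires four NAND gates into an XOR. ToyWire is
a simplified stand-in for a qemu_irq-like callback (handler, opaque,
line number); all toy_* names are hypothetical and the wiring is done in
C here rather than from a command line or config file.

```c
#include <assert.h>
#include <string.h>

#define TOY_FANOUT 2

/* One-bit signal sink: handler, opaque pointer and input number. */
typedef void (*toy_sink)(void *opaque, int n, int level);

typedef struct ToyWire {
    toy_sink fn;
    void *opaque;
    int n;
} ToyWire;

typedef struct ToyNand {
    int in[2];
    ToyWire out[TOY_FANOUT];
    int nb_out;
} ToyNand;

/* Recompute the output and push it to every connected sink. */
static void toy_nand_update(ToyNand *g)
{
    int level = !(g->in[0] && g->in[1]);
    int i;

    for (i = 0; i < g->nb_out; i++) {
        g->out[i].fn(g->out[i].opaque, g->out[i].n, level);
    }
}

/* Sink function for a gate input; n selects input 0 or 1. */
static void toy_nand_set_input(void *opaque, int n, int level)
{
    ToyNand *g = opaque;

    g->in[n] = !!level;
    toy_nand_update(g);
}

static void toy_nand_connect(ToyNand *src, toy_sink fn, void *opaque, int n)
{
    assert(src->nb_out < TOY_FANOUT);
    src->out[src->nb_out].fn = fn;
    src->out[src->nb_out].opaque = opaque;
    src->out[src->nb_out].n = n;
    src->nb_out++;
}

/* Final sink: records the level in an int. */
static void toy_latch(void *opaque, int n, int level)
{
    (void)n;
    *(int *)opaque = level;
}

/* XOR from four NANDs: g0 = NAND(a, b), g1 = NAND(a, g0),
 * g2 = NAND(b, g0), g3 = NAND(g1, g2). */
typedef struct ToyXor {
    ToyNand g[4];
    int out;
} ToyXor;

static void toy_xor_init(ToyXor *x)
{
    int i;

    memset(x, 0, sizeof(*x));
    toy_nand_connect(&x->g[0], toy_nand_set_input, &x->g[1], 1);
    toy_nand_connect(&x->g[0], toy_nand_set_input, &x->g[2], 1);
    toy_nand_connect(&x->g[1], toy_nand_set_input, &x->g[3], 0);
    toy_nand_connect(&x->g[2], toy_nand_set_input, &x->g[3], 1);
    toy_nand_connect(&x->g[3], toy_latch, &x->out, 0);
    /* Settle the network for the initial all-zero inputs. */
    for (i = 0; i < 4; i++) {
        toy_nand_update(&x->g[i]);
    }
}

/* Drive both external inputs and return the settled output. */
static int toy_xor_eval(ToyXor *x, int a, int b)
{
    toy_nand_set_input(&x->g[0], 0, a);
    toy_nand_set_input(&x->g[1], 0, a);
    toy_nand_set_input(&x->g[0], 1, b);
    toy_nand_set_input(&x->g[2], 0, b);
    return x->out;
}
```

The gates know nothing about buses or a device tree; the only relation
between them is the list of connections. Note that a ToyXor holds
pointers into itself, so it must not be copied after toy_xor_init().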

Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)

2011-04-09 Thread Blue Swirl

On Sat, Apr 9, 2011 at 2:25 AM, Luiz Capitulino  wrote:
> Hi there,
>
> Summary:
>
>  - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got
>   the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's
>   as fast as qemu-kvm.git)
>
>  - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried
>   with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet)
>
> I tried with qemu.git v0.13.0 in order to check if this was a regression, but
> I got the same problem...
>
> Then I inspected qemu-kvm.git under the assumption that it could have a fix
> that wasn't committed to qemu.git. Found this:
>
>  - commit 0836b77f0f65d56d08bdeffbac25cd6d78267dc9 which is a merge, works
>
>  - commit cc015e9a5dde2f03f123357fa060acbdfcd570a4 does not work (it's slow)
>
> I tried a bisect, but it breaks due to gcc4 vs. gcc3 changes. Then I inspected
> commits manually, and found out that commit 64d7e9a4 doesn't work, which makes
> me think that the fix could be in the conflict resolution of 0836b77f, which
> makes me remember that I'm late for dinner, so my conclusions at this point are
> not reliable :)
>
> Ideas?

What is the test case? I tried PXE booting a 10M file with and without
KVM and the results are pretty much the same with pcnet and e1000.
time qemu -monitor stdio -boot n -net nic,model=e1000 -net
user,tftp=.,bootfile=10M -net dump,file=foo -enable-kvm
time qemu -monitor stdio -boot n -net nic,model=pcnet -net
user,tftp=.,bootfile=10M -net dump,file=foo -enable-kvm
time qemu -monitor stdio -boot n -net nic,model=e1000 -net
user,tftp=.,bootfile=10M -net dump,file=foo
time qemu -monitor stdio -boot n -net nic,model=pcnet -net
user,tftp=.,bootfile=10M -net dump,file=foo

All times are ~10s.

Re: [RFC][PATCH 14/16] kvm: x86: Add user space part for in-kernel i8259

2011-12-04 Thread Blue Swirl

On Sun, Dec 4, 2011 at 16:35, Avi Kivity  wrote:
> On 12/04/2011 05:19 PM, Jan Kiszka wrote:
>> >
>> > In the sense that kernel-apic is just an accelerated apic.  From the
>> > guest point of view, there's no difference, and that should be reflected
>> > in the device model.
>>
>> That was my goal as well: The guest should not notice the difference,
>> but the admin on the host side should still be able to tell both
>> internally fairly different models apart.
>
> This should be some attribute, not the name.
>
>> Plus the code should be
>> clearly split where there are differences and explicitly shared where
>> there aren't.
>
> That's a good goal, yes.

I'd prefer a unified device built from a single source file if
possible. This conflicts with the build-once model though.

>>
>> >
>> > If I'm reading an apic register, either from the guest or via a monitor
>> > debug interface, I shouldn't care whether it's accelerated or not.  The
>> > guest part already holds, of course.
>>
>> Specifically for the debug scenario, I'd prefer the clear
>> differentiation by name as there can always remain subtle differences in
>> the implementation of kernel vs. user space. Someone debugging the guest
>> and/or qemu/kvm should remain aware of this.
>
> Aware, yes, but the name change is too drastic.

It should also be possible to migrate from the non-KVM device to the
KVM version; different names would prevent that forever.

Re: [PATCH v4 12/15] kvm: x86: Add user space part for in-kernel APIC

2011-12-08 Thread Blue Swirl

On Thu, Dec 8, 2011 at 11:52, Jan Kiszka  wrote:
> This introduces the alternative APIC backend which makes use of KVM's
> in-kernel device model. External NMI injection via LINT1 is emulated by
> checking the current state of the in-kernel APIC, only injecting a NMI
> into the VCPU if LINT1 is unmasked and configured to DM_NMI.
>
> MSI is not yet supported, so we disable this when the in-kernel model is
> in use.
>
> CC: Lai Jiangshan 
> Signed-off-by: Jan Kiszka 
> ---
>  Makefile.target   |    2 +-
>  hw/kvm/apic.c     |  154 +
>  hw/pc.c           |   15 --
>  kvm.h             |    3 +
>  target-i386/kvm.c |    8 +++
>  5 files changed, 176 insertions(+), 6 deletions(-)
>  create mode 100644 hw/kvm/apic.c
>
> diff --git a/Makefile.target b/Makefile.target
> index b549988..76de485 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -236,7 +236,7 @@ obj-i386-y += vmport.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
>  obj-i386-y += pc_piix.o
> -obj-i386-$(CONFIG_KVM) += kvm/clock.o
> +obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>
>  # shared objects
> diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
> new file mode 100644
> index 000..3924f9e
> --- /dev/null
> +++ b/hw/kvm/apic.c
> @@ -0,0 +1,154 @@
> +/*
> + * KVM in-kernel APIC support
> + *
> + * Copyright (c) 2011 Siemens AG
> + *
> + * Authors:
> + *  Jan Kiszka          
> + *
> + * This work is licensed under the terms of the GNU GPL version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "hw/apic_internal.h"
> +#include "kvm.h"
> +
> +static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,
> +                                   int reg_id, uint32_t val)
> +{
> +    *((uint32_t *)(kapic->regs + (reg_id << 4))) = val;
> +}
> +
> +static inline uint32_t kvm_apic_get_reg(struct kvm_lapic_state *kapic,
> +                                       int reg_id)
> +{
> +    return *((uint32_t *)(kapic->regs + (reg_id << 4)));
> +}
> +
> +int kvm_put_apic(CPUState *env)
> +{
> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);

Please pass APICState instead of CPUState.

> +    struct kvm_lapic_state kapic;
> +    int i;
> +
> +    if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
> +        memset(&kapic, 0, sizeof(kapic));
> +        kvm_apic_set_reg(&kapic, 0x2, s->id << 24);
> +        kvm_apic_set_reg(&kapic, 0x8, s->tpr);
> +        kvm_apic_set_reg(&kapic, 0xd, s->log_dest << 24);
> +        kvm_apic_set_reg(&kapic, 0xe, s->dest_mode << 28 | 0x0fff);
> +        kvm_apic_set_reg(&kapic, 0xf, s->spurious_vec);
> +        for (i = 0; i < 8; i++) {
> +            kvm_apic_set_reg(&kapic, 0x10 + i, s->isr[i]);
> +            kvm_apic_set_reg(&kapic, 0x18 + i, s->tmr[i]);
> +            kvm_apic_set_reg(&kapic, 0x20 + i, s->irr[i]);
> +        }
> +        kvm_apic_set_reg(&kapic, 0x28, s->esr);
> +        kvm_apic_set_reg(&kapic, 0x30, s->icr[0]);
> +        kvm_apic_set_reg(&kapic, 0x31, s->icr[1]);
> +        for (i = 0; i < APIC_LVT_NB; i++) {
> +            kvm_apic_set_reg(&kapic, 0x32 + i, s->lvt[i]);
> +        }
> +        kvm_apic_set_reg(&kapic, 0x38, s->initial_count);
> +        kvm_apic_set_reg(&kapic, 0x3e, s->divide_conf);
> +
> +        return kvm_vcpu_ioctl(env, KVM_SET_LAPIC, &kapic);
> +    }
> +
> +    return 0;
> +}
> +
> +int kvm_get_apic(CPUState *env)

Same here.

> +{
> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
> +    struct kvm_lapic_state kapic;
> +    int ret, i, v;
> +
> +    if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
> +        ret = kvm_vcpu_ioctl(env, KVM_GET_LAPIC, &kapic);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        s->id = kvm_apic_get_reg(&kapic, 0x2) >> 24;
> +        s->tpr = kvm_apic_get_reg(&kapic, 0x8);
> +        s->arb_id = kvm_apic_get_reg(&kapic, 0x9);
> +        s->log_dest = kvm_apic_get_reg(&kapic, 0xd) >> 24;
> +        s->dest_mode = kvm_apic_get_reg(&kapic, 0xe) >> 28;
> +        s->spurious_vec = kvm_apic_get_reg(&kapic, 0xf);
> +        for (i = 0; i < 8; i++) {
> +            s->isr[i] = kvm_apic_get_reg(&kapic, 0x10 + i);
> +            s->tmr[i] = kvm_apic_get_reg(&kapic, 0x18 + i);
> +            s->irr[i] = kvm_apic_get_reg(&kapic, 0x20 + i);
> +        }
> +        s->esr = kvm_apic_get_reg(&kapic, 0x28);
> +        s->icr[0] = kvm_apic_get_reg(&kapic, 0x30);
> +        s->icr[1] = kvm_apic_get_reg(&kapic, 0x31);
> +        for (i = 0; i < APIC_LVT_NB; i++) {
> +            s->lvt[i] = kvm_apic_get_reg(&kapic, 0x32 + i);
> +        }
> +        s->initial_count = kvm_apic_get_reg(&kapic, 0x38);
> +        s->divide_conf = kvm_apic_get_reg(&kapic, 0x3e);
> +
> +        v = (s->divide_conf & 3) | ((s->divide_conf >> 1) & 4);
> +        s->count_shift = (v + 1) & 7;

Re: [PATCH v4 00/15] uq/master: Introduce basic irqchip support

2011-12-08 Thread Blue Swirl

On Thu, Dec 8, 2011 at 11:52, Jan Kiszka  wrote:
> Changes in v4:
> - rebased on current uq/master
> - fixed stupid bugs that broke bisectability and user space irqchip mode
> - integrated NMI-over-LINT1 injection logic

I had comments to one patch, others look fine.

Overall, string-based subtype selection does not seem like a great
idea, but it could serve as a starting point, to be cleaned up later
when we have proper device composition. APIC and x86 interrupt
handling need more cleanup anyway.

> CC: Lai Jiangshan 
>
> Jan Kiszka (15):
>  msi: Generalize msix_supported to msi_supported
>  kvm: Move kvmclock into hw/kvm folder
>  apic: Stop timer on reset
>  apic: Inject external NMI events via LINT1
>  apic: Introduce backend/frontend infrastructure for KVM reuse
>  apic: Open-code timer save/restore
>  i8259: Introduce backend/frontend infrastructure for KVM reuse
>  ioapic: Introduce backend/frontend infrastructure for KVM reuse
>  memory: Introduce memory_region_init_reservation
>  kvm: Introduce core services for in-kernel irqchip support
>  kvm: x86: Establish IRQ0 override control
>  kvm: x86: Add user space part for in-kernel APIC
>  kvm: x86: Add user space part for in-kernel i8259
>  kvm: x86: Add user space part for in-kernel IOAPIC
>  kvm: Arm in-kernel irqchip support
>
>  Makefile.objs                  |    2 +-
>  Makefile.target                |    6 +-
>  configure                      |    1 +
>  hw/apic.c                      |  309 ---
>  hw/apic.h                      |    1 +
>  hw/apic_common.c               |  312
>  hw/apic_internal.h             |  122 
>  hw/i8259.c                     |  127 ++--
>  hw/i8259_common.c              |  173 ++
>  hw/i8259_internal.h            |   82 +++
>  hw/ioapic.c                    |  130 ++---
>  hw/ioapic_common.c             |  138 ++
>  hw/ioapic_internal.h           |  106 ++
>  hw/kvm/apic.c                  |  154 
>  hw/{kvmclock.c => kvm/clock.c} |    4 +-
>  hw/{kvmclock.h => kvm/clock.h} |    0
>  hw/kvm/i8259.c                 |  126 
>  hw/kvm/ioapic.c                |  101 +
>  hw/msi.c                       |    8 +
>  hw/msi.h                       |    2 +
>  hw/msix.c                      |    9 +-
>  hw/msix.h                      |    2 -
>  hw/pc.c                        |   19 ++-
>  hw/pc.h                        |    1 +
>  hw/pc_piix.c                   |   66 -
>  kvm-all.c                      |  154 
>  kvm-stub.c                     |    5 +
>  kvm.h                          |   13 ++
>  memory.c                       |   36 +
>  memory.h                       |   16 ++
>  monitor.c                      |    6 +-
>  qemu-config.c                  |    4 +
>  qemu-options.hx                |    5 +-
>  sysemu.h                       |    1 -
>  target-i386/kvm.c              |   19 +++
>  trace-events                   |    2 +-
>  vl.c                           |    1 -
>  37 files changed, 1724 insertions(+), 539 deletions(-)
>  create mode 100644 hw/apic_common.c
>  create mode 100644 hw/apic_internal.h
>  create mode 100644 hw/i8259_common.c
>  create mode 100644 hw/i8259_internal.h
>  create mode 100644 hw/ioapic_common.c
>  create mode 100644 hw/ioapic_internal.h
>  create mode 100644 hw/kvm/apic.c
>  rename hw/{kvmclock.c => kvm/clock.c} (98%)
>  rename hw/{kvmclock.h => kvm/clock.h} (100%)
>  create mode 100644 hw/kvm/i8259.c
>  create mode 100644 hw/kvm/ioapic.c
>
> --
> 1.7.3.4
>

Re: [PATCH v4 12/15] kvm: x86: Add user space part for in-kernel APIC

2011-12-10 Thread Blue Swirl

On Fri, Dec 9, 2011 at 07:52, Jan Kiszka  wrote:
> On 2011-12-09 08:45, Jan Kiszka wrote:
>> On 2011-12-08 22:16, Blue Swirl wrote:
>>> On Thu, Dec 8, 2011 at 11:52, Jan Kiszka  wrote:
>>>> This introduces the alternative APIC backend which makes use of KVM's
>>>> in-kernel device model. External NMI injection via LINT1 is emulated by
>>>> checking the current state of the in-kernel APIC, only injecting a NMI
>>>> into the VCPU if LINT1 is unmasked and configured to DM_NMI.
>>>>
>>>> MSI is not yet supported, so we disable this when the in-kernel model is
>>>> in use.
>>>>
>>>> CC: Lai Jiangshan 
>>>> Signed-off-by: Jan Kiszka 
>>>> ---
>>>>  Makefile.target   |    2 +-
>>>>  hw/kvm/apic.c     |  154 +
>>>>  hw/pc.c           |   15 --
>>>>  kvm.h             |    3 +
>>>>  target-i386/kvm.c |    8 +++
>>>>  5 files changed, 176 insertions(+), 6 deletions(-)
>>>>  create mode 100644 hw/kvm/apic.c
>>>>
>>>> diff --git a/Makefile.target b/Makefile.target
>>>> index b549988..76de485 100644
>>>> --- a/Makefile.target
>>>> +++ b/Makefile.target
>>>> @@ -236,7 +236,7 @@ obj-i386-y += vmport.o
>>>>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>>>>  obj-i386-y += debugcon.o multiboot.o
>>>>  obj-i386-y += pc_piix.o
>>>> -obj-i386-$(CONFIG_KVM) += kvm/clock.o
>>>> +obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
>>>>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>>>>
>>>>  # shared objects
>>>> diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
>>>> new file mode 100644
>>>> index 000..3924f9e
>>>> --- /dev/null
>>>> +++ b/hw/kvm/apic.c
>>>> @@ -0,0 +1,154 @@
>>>> +/*
>>>> + * KVM in-kernel APIC support
>>>> + *
>>>> + * Copyright (c) 2011 Siemens AG
>>>> + *
>>>> + * Authors:
>>>> + *  Jan Kiszka          
>>>> + *
>>>> + * This work is licensed under the terms of the GNU GPL version 2.
>>>> + * See the COPYING file in the top-level directory.
>>>> + */
>>>> +#include "hw/apic_internal.h"
>>>> +#include "kvm.h"
>>>> +
>>>> +static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,
>>>> +                                   int reg_id, uint32_t val)
>>>> +{
>>>> +    *((uint32_t *)(kapic->regs + (reg_id << 4))) = val;
>>>> +}
>>>> +
>>>> +static inline uint32_t kvm_apic_get_reg(struct kvm_lapic_state *kapic,
>>>> +                                       int reg_id)
>>>> +{
>>>> +    return *((uint32_t *)(kapic->regs + (reg_id << 4)));
>>>> +}
>>>> +
>>>> +int kvm_put_apic(CPUState *env)
>>>> +{
>>>> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
>>>
>>> Please pass APICState instead of CPUState.
>>
>> DeviceState, I suppose. Yes, makes more sense, update will follow.
>
> On second look: no, I'll keep it as is. All kvm_get/put_* helpers have
> this kind of signature, i.e. are working against env.

There's kvm_get_supported_msrs for example.

> kvm_get/put_apic
> just happens to be implemented outside of target-i386/kvm.c. And they
> require both APIC and CPUState anyway, so it makes no difference.

It does: passing CPUState violates layering. Please split the
functions so that the ioctl calls which need CPUState go to kvm.c. For
example, the functions in kvm/apic.c could just perform copying from
kvm_lapic_state fields to APICState fields and vice versa.

By the way, the KVM interface does not look so clever. Why isn't there
just an array of 32-bit fields so the casts can be avoided? Perhaps
APICState should (later) be changed to match the KVM version so that the
structure can be passed directly without copying.
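
To make the suggested split concrete, here is a rough sketch of copy-only
helpers that never see CPUState and read the register image with memcpy()
instead of pointer casts. Everything prefixed sketch_ is hypothetical: the
structs are simplified stand-ins for kvm_lapic_state and APICState, and the
buffer size is an assumption. Only the 16-byte register spacing and the
ID/TPR/spurious-vector register numbers follow the usual local APIC layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct kvm_lapic_state: a raw byte image of the APIC
 * register page. The size is an assumption made for this sketch. */
#define SKETCH_APIC_REG_SIZE 0x400

struct sketch_lapic_state {
    char regs[SKETCH_APIC_REG_SIZE];
};

/* Hypothetical, trimmed-down device state (not QEMU's real APICState). */
struct sketch_apic_state {
    uint32_t id;
    uint32_t tpr;
    uint32_t spurious_vec;
};

/* Registers are spaced 16 bytes apart, hence the shift by 4. */
static uint32_t sketch_get_reg(const struct sketch_lapic_state *k, int reg_id)
{
    uint32_t val;
    size_t off = (size_t)reg_id << 4;

    assert(off + sizeof(val) <= sizeof(k->regs));
    memcpy(&val, k->regs + off, sizeof(val)); /* no pointer cast needed */
    return val;
}

static void sketch_set_reg(struct sketch_lapic_state *k, int reg_id,
                           uint32_t val)
{
    size_t off = (size_t)reg_id << 4;

    assert(off + sizeof(val) <= sizeof(k->regs));
    memcpy(k->regs + off, &val, sizeof(val));
}

/* Pure copy helpers: no CPUState and no ioctl. The ioctl that transfers
 * the buffer would stay with the code that owns the vcpu. */
static void sketch_apic_to_kernel(const struct sketch_apic_state *s,
                                  struct sketch_lapic_state *k)
{
    memset(k, 0, sizeof(*k));
    sketch_set_reg(k, 0x2, s->id << 24);    /* APIC ID lives in bits 24-31 */
    sketch_set_reg(k, 0x8, s->tpr);
    sketch_set_reg(k, 0xf, s->spurious_vec);
}

static void sketch_kernel_to_apic(const struct sketch_lapic_state *k,
                                  struct sketch_apic_state *s)
{
    s->id = sketch_get_reg(k, 0x2) >> 24;
    s->tpr = sketch_get_reg(k, 0x8);
    s->spurious_vec = sketch_get_reg(k, 0xf);
}
```

With helpers of this shape, the caller in kvm.c would fetch or store the
buffer and hand only the two structures to the device code.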
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] i8254: Rework & fix interaction with HPET in legacy mode

2011-12-10 Thread Blue Swirl

On Sat, Dec 10, 2011 at 12:28, Jan Kiszka  wrote:
> From: Jan Kiszka 
>
> When the HPET enters legacy mode, the IRQ output of the PIT is
> suppressed and replaced by the HPET timer 0. But the current code to
> emulate this was broken in many ways. It reset the PIT state after
> re-enabling, it worked against a stale static PIT structure, and it did
> not properly saved/restored the IRQ output mask in the PIT vmstate.
>
> This patch solves the PIT IRQ control in a different way. On x86, it
> both redirects the PIT IRQ to the HPET, just like the RTC. But it also
> keeps the control line from the HPET to the PIT. This allows to disable
> the PIT QEMU timer when it is not needed. The PIT's view on the control
> line state is now saved in the same format that qemu-kvm is already
> using.
>
> Note that, in contrast to the suppressed RTC IRQ line, we do not need to
> save/restore the PIT line state in the HPET. As we trigger a PIT IRQ
> update via the control line, the line state is reconstructed on mode
> switch.
>
> Signed-off-by: Jan Kiszka 
> ---
>  hw/alpha_dp264.c   |    2 +-
>  hw/hpet.c          |   38 +---
>  hw/hpet_emul.h     |    3 ++
>  hw/i8254.c         |   60 +--
>  hw/mips_fulong2e.c |    2 +-
>  hw/mips_jazz.c     |    2 +-
>  hw/mips_malta.c    |    2 +-
>  hw/mips_r4k.c      |    2 +-
>  hw/pc.c            |   13 --
>  hw/pc.h            |   13 +--
>  hw/ppc_prep.c      |    2 +-
>  11 files changed, 74 insertions(+), 65 deletions(-)
>
> diff --git a/hw/alpha_dp264.c b/hw/alpha_dp264.c
> index fcc20e9..412ccf0 100644
> --- a/hw/alpha_dp264.c
> +++ b/hw/alpha_dp264.c
> @@ -70,7 +70,7 @@ static void clipper_init(ram_addr_t ram_size,
>     pci_bus = typhoon_init(ram_size, &rtc_irq, cpus, clipper_pci_map_irq);
>
>     rtc_init(1980, rtc_irq);
> -    pit_init(0x40, 0);
> +    pit_init(0x40, isa_get_irq(0));
>     isa_create_simple("i8042");
>
>     /* VGA setup.  Don't bother loading the bios.  */
> diff --git a/hw/hpet.c b/hw/hpet.c
> index 1b64e6a..ace0b1d 100644
> --- a/hw/hpet.c
> +++ b/hw/hpet.c
> @@ -64,6 +64,7 @@ typedef struct HPETState {
>     qemu_irq irqs[HPET_NUM_IRQ_ROUTES];
>     uint32_t flags;
>     uint8_t rtc_irq_level;
> +    qemu_irq pit_enabled;
>     uint8_t num_timers;
>     HPETTimer timer[HPET_MAX_TIMERS];
>
> @@ -572,12 +573,15 @@ static void hpet_ram_write(void *opaque, 
> target_phys_addr_t addr,
>                     hpet_del_timer(&s->timer[i]);
>                 }
>             }
> -            /* i8254 and RTC are disabled when HPET is in legacy mode */
> +            /* i8254 and RTC output pins are disabled
> +             * when HPET is in legacy mode */
>             if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
> -                hpet_pit_disable();
> +                qemu_set_irq(s->pit_enabled, 0);
> +                qemu_irq_lower(s->irqs[0]);
>                 qemu_irq_lower(s->irqs[RTC_ISA_IRQ]);
>             } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
> -                hpet_pit_enable();
> +                qemu_irq_lower(s->irqs[0]);
> +                qemu_set_irq(s->pit_enabled, 1);
>                 qemu_set_irq(s->irqs[RTC_ISA_IRQ], s->rtc_irq_level);
>             }
>             break;
> @@ -631,7 +635,6 @@ static void hpet_reset(DeviceState *d)
>  {
>     HPETState *s = FROM_SYSBUS(HPETState, sysbus_from_qdev(d));
>     int i;
> -    static int count = 0;
>
>     for (i = 0; i < s->num_timers; i++) {
>         HPETTimer *timer = &s->timer[i];
> @@ -648,29 +651,27 @@ static void hpet_reset(DeviceState *d)
>         timer->wrap_flag = 0;
>     }
>
> +    qemu_set_irq(s->pit_enabled, 1);
>     s->hpet_counter = 0ULL;
>     s->hpet_offset = 0ULL;
>     s->config = 0ULL;
> -    if (count > 0) {
> -        /* we don't enable pit when hpet_reset is first called (by hpet_init)
> -         * because hpet is taking over for pit here. On subsequent 
> invocations,
> -         * hpet_reset is called due to system reset. At this point control 
> must
> -         * be returned to pit until SW reenables hpet.
> -         */
> -        hpet_pit_enable();
> -    }
>     hpet_cfg.hpet[s->hpet_id].event_timer_block_id = (uint32_t)s->capability;
>     hpet_cfg.hpet[s->hpet_id].address = sysbus_from_qdev(d)->mmio[0].addr;
> -    count = 1;
>  }
>
> -static void hpet_handle_rtc_irq(void *opaque, int n, int level)
> +static void hpet_handle_legacy_irq(void *opaque, int n, int level)
>  {
>     HPETState *s = FROM_SYSBUS(HPETState, opaque);
>
> -    s->rtc_irq_level = level;
> -    if (!hpet_in_legacy_mode(s)) {
> -        qemu_set_irq(s->irqs[RTC_ISA_IRQ], level);
> +    if (n == HPET_LEGACY_PIT_INT) {
> +        if (!hpet_in_legacy_mode(s)) {
> +            qemu_set_irq(s->irqs[0], level);
> +        }
> +    } else {
> +        s->rtc_irq_level = level;
> +        if (!hpet_in_legacy_mode(s)) {
> +            qemu_set_irq(s->irqs[RTC_ISA_IRQ], l

Re: [PATCH 0/2] pit/hpet: Fix legacy mode switching

2011-12-10 Thread Blue Swirl

On Sat, Dec 10, 2011 at 12:28, Jan Kiszka  wrote:
> This is a small preparatory series to allow the introduction of the KVM
> in-kernel PIT. Of course, it is also a fix for the various bugs in the
> related PIT/HPET code. See patches for details.
>
> Jan Kiszka (2):
>  hpet: Save/restore cached RTC IRQ level
>  i8254: Rework & fix interaction with HPET in legacy mode

I had one comment on this patch.

Otherwise, nice cleanups; I think this logic matches real PIT/HPET
routing better.

>  hw/alpha_dp264.c   |    2 +-
>  hw/hpet.c          |   64 +--
>  hw/hpet_emul.h     |    3 ++
>  hw/i8254.c         |   60 +++-
>  hw/mips_fulong2e.c |    2 +-
>  hw/mips_jazz.c     |    2 +-
>  hw/mips_malta.c    |    2 +-
>  hw/mips_r4k.c      |    2 +-
>  hw/pc.c            |   13 --
>  hw/pc.h            |   13 +-
>  hw/ppc_prep.c      |    2 +-
>  11 files changed, 100 insertions(+), 65 deletions(-)
>
> --
> 1.7.3.4
>

Re: [PATCH 2/2] i8254: Rework & fix interaction with HPET in legacy mode

2011-12-10 Thread Blue Swirl

On Sat, Dec 10, 2011 at 15:51, Jan Kiszka  wrote:
> On 2011-12-10 16:49, Blue Swirl wrote:
>>>
>>> +ISADevice *pit_init(int base, qemu_irq irq)
>>
>> Please retain this function in pc.h, or even better, introduce i8254.h.
>
> No concerns about i8254.h, but this function does not qualify for static
> inline.

The function is static inline in a header file not for performance
reasons, but to keep the instantiation separate from device internals.

>>
>>> +{
>>> +    ISADevice *dev;
>>> +
>>> +    dev = isa_create("isa-pit");
>>> +    qdev_prop_set_uint32(&dev->qdev, "iobase", base);
>>> +    qdev_init_nofail(&dev->qdev);
>>> +    qdev_connect_gpio_out(&dev->qdev, 0, irq);
>>> +
>>> +    return dev;
>>> +}
>>> +
>
> Jan
>

Re: [PATCH v4 12/15] kvm: x86: Add user space part for in-kernel APIC

2011-12-10 Thread Blue Swirl

On Sat, Dec 10, 2011 at 15:58, Jan Kiszka  wrote:
> On 2011-12-10 16:40, Blue Swirl wrote:
>> On Fri, Dec 9, 2011 at 07:52, Jan Kiszka  wrote:
>>> On 2011-12-09 08:45, Jan Kiszka wrote:
>>>> On 2011-12-08 22:16, Blue Swirl wrote:
>>>>> On Thu, Dec 8, 2011 at 11:52, Jan Kiszka  wrote:
>>>>>> This introduces the alternative APIC backend which makes use of KVM's
>>>>>> in-kernel device model. External NMI injection via LINT1 is emulated by
>>>>>> checking the current state of the in-kernel APIC, only injecting a NMI
>>>>>> into the VCPU if LINT1 is unmasked and configured to DM_NMI.
>>>>>>
>>>>>> MSI is not yet supported, so we disable this when the in-kernel model is
>>>>>> in use.
>>>>>>
>>>>>> CC: Lai Jiangshan 
>>>>>> Signed-off-by: Jan Kiszka 
>>>>>> ---
>>>>>>  Makefile.target   |    2 +-
>>>>>>  hw/kvm/apic.c     |  154 
>>>>>> +
>>>>>>  hw/pc.c           |   15 --
>>>>>>  kvm.h             |    3 +
>>>>>>  target-i386/kvm.c |    8 +++
>>>>>>  5 files changed, 176 insertions(+), 6 deletions(-)
>>>>>>  create mode 100644 hw/kvm/apic.c
>>>>>>
>>>>>> diff --git a/Makefile.target b/Makefile.target
>>>>>> index b549988..76de485 100644
>>>>>> --- a/Makefile.target
>>>>>> +++ b/Makefile.target
>>>>>> @@ -236,7 +236,7 @@ obj-i386-y += vmport.o
>>>>>>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>>>>>>  obj-i386-y += debugcon.o multiboot.o
>>>>>>  obj-i386-y += pc_piix.o
>>>>>> -obj-i386-$(CONFIG_KVM) += kvm/clock.o
>>>>>> +obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
>>>>>>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>>>>>>
>>>>>>  # shared objects
>>>>>> diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
>>>>>> new file mode 100644
>>>>>> index 000..3924f9e
>>>>>> --- /dev/null
>>>>>> +++ b/hw/kvm/apic.c
>>>>>> @@ -0,0 +1,154 @@
>>>>>> +/*
>>>>>> + * KVM in-kernel APIC support
>>>>>> + *
>>>>>> + * Copyright (c) 2011 Siemens AG
>>>>>> + *
>>>>>> + * Authors:
>>>>>> + *  Jan Kiszka          
>>>>>> + *
>>>>>> + * This work is licensed under the terms of the GNU GPL version 2.
>>>>>> + * See the COPYING file in the top-level directory.
>>>>>> + */
>>>>>> +#include "hw/apic_internal.h"
>>>>>> +#include "kvm.h"
>>>>>> +
>>>>>> +static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,
>>>>>> +                                   int reg_id, uint32_t val)
>>>>>> +{
>>>>>> +    *((uint32_t *)(kapic->regs + (reg_id << 4))) = val;
>>>>>> +}
>>>>>> +
>>>>>> +static inline uint32_t kvm_apic_get_reg(struct kvm_lapic_state *kapic,
>>>>>> +                                       int reg_id)
>>>>>> +{
>>>>>> +    return *((uint32_t *)(kapic->regs + (reg_id << 4)));
>>>>>> +}
>>>>>> +
>>>>>> +int kvm_put_apic(CPUState *env)
>>>>>> +{
>>>>>> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
>>>>>
>>>>> Please pass APICState instead of CPUState.
>>>>
>>>> DeviceState, I suppose. Yes, makes more sense, update will follow.
>>>
>>> On second look: no, I'll keep it as is. All kvm_get/put_* helpers have
>>> this kind of signature, i.e. are working against env.
>>
>> There's kvm_get_supported_msrs for example.
>>
>>> kvm_get/put_apic
>>> just happens to be implemented outside of target-i386/kvm.c. And they
>>> require both APIC and CPUState anyway, so it makes no difference.
>>
>> It does, passing CPUState violates layering. Please split the
>> functions so that the ioctl calls which need CPUState go to kvm.c. For
>> example, the functions in kvm/apic.c could ju

Re: [PATCH 2/2] i8254: Rework & fix interaction with HPET in legacy mode

2011-12-10 Thread Blue Swirl

On Sat, Dec 10, 2011 at 16:03, Jan Kiszka  wrote:
> On 2011-12-10 16:54, Blue Swirl wrote:
>> On Sat, Dec 10, 2011 at 15:51, Jan Kiszka  wrote:
>>> On 2011-12-10 16:49, Blue Swirl wrote:
>>>>>
>>>>> +ISADevice *pit_init(int base, qemu_irq irq)
>>>>
>>>> Please retain this function in pc.h, or even better, introduce i8254.h.
>>>
>>> No concerns about i8254.h, but this function does not qualify for static
>>> inline.
>>
>> The function is static inline in a header file not for performance
>> reasons, but to keep the instantiation separate from device internals.
>
> Not performance, footprint and header dependencies. You need to pull in
> all the stuff the inline function needs for everyone including the
> header that contains this function. That's messy.

There's only ISA and qdev stuff, that's not messy since both are
needed in any case.

> Even if the instantiation helper should not poke into the device model
> internals (and I don't want this to change as well), it belongs to the
> module that implements the device. We do the same with other fabric
> functions.

In this case, the callers have the same needs and there are several of
them. In general this need not be true at all: if, for example, some
part of the instantiation had to be skipped, the functions might need
to be manually inlined at the board level anyway. The instantiation
definitely does not belong to the implementer but to the creator.
Ideally, the file implementing the device contains only static functions
and instantiation is either in a header file or at the board. This is
true, for example, for several Sparc32 devices.
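
As an illustration of the header-only arrangement, here is a rough sketch of
an instantiation helper written as a static inline that uses only a public
create/property/connect API. The sketch_ device framework is a made-up
stand-in for qdev/ISA so that the example is self-contained; only the shape
of sketch_pit_init() mirrors the pit_init() under discussion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ---- Hypothetical minimal framework standing in for qdev/ISA ---- */
typedef struct SketchDevice {
    const char *type_name;
    uint32_t iobase;
    int irq_line;   /* -1 while unconnected */
    int realized;
} SketchDevice;

static SketchDevice sketch_pool[4];
static int sketch_pool_used;

static SketchDevice *sketch_create(const char *type_name)
{
    SketchDevice *dev;

    assert(sketch_pool_used < 4);
    dev = &sketch_pool[sketch_pool_used++];
    memset(dev, 0, sizeof(*dev));
    dev->type_name = type_name;
    dev->irq_line = -1;
    return dev;
}

static void sketch_prop_set_uint32(SketchDevice *dev, const char *name,
                                   uint32_t val)
{
    /* This toy framework knows a single property. */
    assert(strcmp(name, "iobase") == 0);
    dev->iobase = val;
}

static void sketch_init(SketchDevice *dev)
{
    dev->realized = 1;
}

static void sketch_connect_irq(SketchDevice *dev, int irq)
{
    assert(dev->realized);
    dev->irq_line = irq;
}

/* ---- What would live in a header such as the proposed i8254.h ---- */
/* Touches only the public API above, never the device's private state,
 * so a board that needs different properties can open-code the same
 * steps instead of calling this helper. */
static inline SketchDevice *sketch_pit_init(uint32_t base, int irq)
{
    SketchDevice *dev = sketch_create("isa-pit");

    sketch_prop_set_uint32(dev, "iobase", base);
    sketch_init(dev);
    sketch_connect_irq(dev, irq);
    return dev;
}
```

A board would call sketch_pit_init(0x40, irq) much as the patch calls
pit_init(0x40, isa_get_irq(0)).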

Re: [PATCH 2/2] i8254: Rework & fix interaction with HPET in legacy mode

2011-12-10 Thread Blue Swirl

On Sat, Dec 10, 2011 at 16:29, Jan Kiszka  wrote:
> On 2011-12-10 17:26, Blue Swirl wrote:
>> On Sat, Dec 10, 2011 at 16:03, Jan Kiszka  wrote:
>>> On 2011-12-10 16:54, Blue Swirl wrote:
>>>> On Sat, Dec 10, 2011 at 15:51, Jan Kiszka  wrote:
>>>>> On 2011-12-10 16:49, Blue Swirl wrote:
>>>>>>>
>>>>>>> +ISADevice *pit_init(int base, qemu_irq irq)
>>>>>>
>>>>>> Please retain this function in pc.h, or even better, introduce i8254.h.
>>>>>
>>>>> No concerns about i8254.h, but this function does not qualify for static
>>>>> inline.
>>>>
>>>> The function is static inline in a header file not for performance
>>>> reasons, but to keep the instantiation separate from device internals.
>>>
>>> Not performance, footprint and header dependencies. You need to pull in
>>> all the stuff the inline function needs for everyone including the
>>> header that contains this function. That's messy.
>>
>> There's only ISA and qdev stuff, that's not messy since both are
>> needed in any case.
>>
>>> Even if the instantiation helper should not poke into the device model
>>> internals (and I don't want this to change as well), it belongs to the
>>> module that implements the device. We do the same with other fabric
>>> functions.
>>
>> In this case, the callers have the same needs and there are several of
>> them. In general this need not be true at all, if for example some
>> part of instantiation would have to be skipped, the functions may need
>> to be manually inlined to the board level anyway. The instantiation
>> definitely does not belong to the implementer but to the creator.
>> Ideally file implementing the device contains only static functions
>> and instantiation is either in a header file or at the board. This is
>> true for example for several Sparc32 devices.
>
> The helper is wrapping the property base API into a proper function call
> - nothing that is board-specific.

Not in this case, but in general boards could need to pass different
sets of properties or avoid passing something at all.

Re: [PATCH qom-next 00/59] QOM CPUState, part 4: CPU_COMMON

2012-05-23 Thread Blue Swirl

On Wed, May 23, 2012 at 3:07 AM, Andreas Färber  wrote:
> Hello,
>
> This series, based on qom-next and the two pending ARM cleanup patches, starts
> moving fields from CPUArchState (CPU_COMMON) to QOM CPUState. It stops short
> of moving all easily possible fields (i.e., those not depending on 
> target_ulong
> or target_phys_addr_t) since the series got too long already and is expected 
> to
> spark some controversies due to collisions with several other series.
>
> The series is structured as preparatory refactorings interwoven with the 
> actual
> touch-all movement of one field ("cpu: Move ... to CPUState"), optionally
> followed by type signature cleanups, culminating in the movement of two fields
> that are tied together by VMState.
> Thus, unlike part 3, this series cannot randomly be cherry-picked to
> -next trees, only select parts thereof (e.g., use of cpu_s390x_init()).
>
> Please review and test.
>
> The use of cpu_index vs. cpuid_apic_id for x86 cpu[n] still needs some 
> thought.
>
> The question was brought up whether adding the CPUs a child properties
> should be generalized outside the machine scope - I don't think so, since CPU
> hotplug seems highly architecture-specific and not applicable everywhere 
> (SoCs).
>
> Blue will likely have a superb idea how to avoid the cpu_tlb_flush() 
> indirection
> that I needed for VMState, but apart from having been a lot of dumb typing, it
> works fine as interim solution. "Blah." wasn't terribly helpful as a comment.

Unfortunately I don't have superb ideas today (as if I had them any
other day...), only second rate jokes (as if they could be called
jokes...). With 'Blah' I obviously meant that I didn't have a solution
for that particular target_ulong/target_phys_addr_t problem. I'll try
to improve on all these areas, if you know what I mean.

>
> I have checked this to compile on ...
> * openSUSE 12.1 x86_64 w/KVM,
> * openSUSE Factory ppc w/KVM,
> * SLES 11 SP2 s390x w/KVM,
> * mingw32/64 cross-builds,
> * OpenBSD 5.1 amd64 (not for final version though, master doesn't build).
> Untested: Xen.
> Only some targets including i386 were lightly runtime-tested.
>
> Available for testing and cherry-picking (not pulling!) from:
> git://github.com/afaerber/qemu-cpu.git qom-cpu-common.v1
> https://github.com/afaerber/qemu-cpu/commits/qom-cpu-common.v1
>
> Regards,
> Andreas
>
> Cc: Anthony Liguori 
> Cc: Paolo Bonzini 
> Cc: Igor Mammedov 
>
> Cc: Richard Henderson 
> Cc: Peter Maydell 
> Cc: Edgar E. Iglesias 
> Cc: Michael Walle 
> Cc: Aurélien Jarno 
> Cc: Alexander Graf 
> Cc: David Gibson 
> Cc: qemu-ppc 
> Cc: Blue Swirl 
> Cc: Guan Xuetao 
> Cc: Max Filippov 
>
> Cc: Avi Kivity 
> Cc: Marcelo Tosatti 
> Cc: Jan Kiszka 
> Cc: kvm 
>
> Cc: Stefano Stabellini 
> Cc: xen-devel 
>
> Changes from preview in Igor's apic thread:
> * Use g_strdup_printf() for "cpu[x]" to be safe wrt length and nul 
> termination.
> * Clean up removal of x86 version 5 load/save support.
> * Convert use of env->halted in s390x KVM code.
> * Convert some uses of env->halted/interrupt_request in ppc KVM code.
> * Convert some uses of env->halted in Xen code, prepend cpu_x86_init() patch.
> * Avoid using POWERPC_CPU() / SPARC_CPU() macros inside *_set_irq() functions.
>
> Andreas Färber (59):
>  qemu-thread: Let qemu_thread_is_self() return bool
>  cpu: Move CPU_COMMON_THREAD into CPUState
>  cpu: Move thread field into CPUState
>  pc: Add CPU as /machine/cpu[n]
>  apic: Replace cpu_env pointer by X86CPU link
>  pc: Pass X86CPU to cpu_is_bsp()
>  cpu: Move thread_kicked to CPUState
>  Makefile.dis: Add include/ to include path
>  cpus: Pass CPUState to qemu_cpu_is_self()
>  cpus: Pass CPUState to qemu_cpu_kick_thread()
>  cpu: Move created field to CPUState
>  cpu: Move stop field to CPUState
>  ppce500_spin: Store PowerPCCPU in SpinKick
>  cpu: Move stopped field to CPUState
>  cpus: Pass CPUState to cpu_is_stopped()
>  cpus: Pass CPUState to cpu_can_run()
>  cpu: Move halt_cond to CPUState
>  cpus: Pass CPUState to qemu_tcg_cpu_thread_fn
>  cpus: Pass CPUState to qemu_tcg_init_vcpu()
>  ppc: Pass PowerPCCPU to ppc6xx_set_irq()
>  ppc: Pass PowerPCCPU to ppc970_set_irq()
>  ppc: Pass PowerPCCPU to power7_set_irq()
>  ppc: Pass PowerPCCPU to ppc40x_set_irq()
>  ppc: Pass PowerPCCPU to ppce500_set_irq()
>  sun4m: Pass SPARCCPU to cpu_set_irq()
>  sun4m: Pass SPARCCPU to cpu_kick_irq()
>  sun4u: Pass SPARCCPU to {,s,hs}tick_irq() and cpu_timer_create()
>  sun4u: Pass SPARCCPU to cpu_kick_irq()
>  target-ppc: Rename kvm_kick_{env => cpu} and pass PowerPCCPU
>

Re: [Qemu-devel] [PATCH] kvm: align ram_size to page boundary

2012-06-17 Thread Blue Swirl

On Sun, Jun 17, 2012 at 11:51 AM, Avi Kivity  wrote:
> On 06/17/2012 02:47 PM, Jan Kiszka wrote:

>>>> I think this should rather go into generic code.
>>>
>>> To be honest, I put this in kvm-specific code because vl.c doesn't have
>>> TARGET_PAGE_ALIGN.  Maybe we should have machine->page_size or
>>> machine->ram_alignment.
>>>
>>>> What sense does it make
>>>> to have partial pages with TCG?
>>>
>>> Why impose an artificial restriction?
>>
>> Beca...
>>
>>>
>>> (answer: to reduce differences among various accelerators)
>>>
>>
>> Oh, you found the answer. :)
>
> Reducing round-trips across the Internet.
>
>>
>> At least, it should be enforce for the x86 target, independent of the
>> accelerator.
>
> Yeah.  So there's machine->page_size or machine->ram_alignment.  Not
> sure which is best.

The boards should make sure that the amount of RAM is feasible with
the board memory slots. It's not possible to put 256 KB SIMMs into a slot
that expects 1 GB DIMMs. We can allow some flexibility there, though;
I'm not sure the current chipsets would support very much memory if
we followed the docs to the letter.

Maybe strtosz() should just enforce 1 MB granularity.

What about ballooning (memory hotplug?): can that reduce the memory by
a smaller amount than the page size?

>
> --
> error compiling committee.c: too many arguments to function
>
>
>

Re: [Qemu-devel] [PATCH] kvm: align ram_size to page boundary

2012-06-17 Thread Blue Swirl

On Sun, Jun 17, 2012 at 12:54 PM, Avi Kivity  wrote:
> On 06/17/2012 03:43 PM, Blue Swirl wrote:
>> On Sun, Jun 17, 2012 at 11:51 AM, Avi Kivity  wrote:
>>> On 06/17/2012 02:47 PM, Jan Kiszka wrote:
>>>>>>
>>>>>> I think this should rather go into generic code.
>>>>>
>>>>> To be honest, I put this in kvm-specific code because vl.c doesn't have
>>>>> TARGET_PAGE_ALIGN.  Maybe we should have machine->page_size or
>>>>> machine->ram_alignment.
>>>>>
>>>>>> What sense does it make
>>>>>> to have partial pages with TCG?
>>>>>
>>>>> Why impose an artificial restriction?
>>>>
>>>> Beca...
>>>>
>>>>>
>>>>> (answer: to reduce differences among various accelerators)
>>>>>
>>>>
>>>> Oh, you found the answer. :)
>>>
>>> Reducing round-trips across the Internet.
>>>
>>>>
>>>> At least, it should be enforce for the x86 target, independent of the
>>>> accelerator.
>>>
>>> Yeah.  So there's machine->page_size or machine->ram_alignment.  Not
>>> sure which is best.
>>
>> The boards should make sure that the amount of RAM is feasible with
>> the board memory slots. It's not possible to put 256kb SIMMs to a slot
>> that expects 1GB DIMMs. We can allow some flexibility there though,
>> I'm not sure if the current chipsets would support very much memory if
>> we followed the docs to the letter.
>
> Right. And generally memory modules are sized a power of two, creating
> the silly "mega == 1048576" movement.
>
>>
>> Maybe strtosz() should just enforce 1MB granularity.
>
> strtosz() is much too general.  We could do it in vl.c without trouble.
>  However, it takes away our ability to emulate a "640k should be enough
> for everyone" machine.

Then how about the current maximum of the target page sizes, 8k? No
machine should want less than that.
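
For reference, a minimal sketch of the rounding this would imply, using the
same power-of-two trick as the hpagesize alignment in file_ram_alloc(). The
names sketch_align_up and SKETCH_RAM_ALIGN are made up for the example, and
the 8 KiB value is just the figure suggested here, not an existing QEMU
constant.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed granularity: the 8 KiB figure from the discussion. */
#define SKETCH_RAM_ALIGN (8u * 1024u)

/* Round size up to the next multiple of align.
 * align must be a nonzero power of two for the mask trick to work. */
static uint64_t sketch_align_up(uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

With 8 KiB granularity a 640k machine is left untouched, since 640k is
already a multiple of 8k, whereas 1 MB granularity would round it up to a
full megabyte.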

>
>>
>> What about ballooning (memory hotplug?), can that reduce the memory by
>> smaller amount than page size?
>
> Ballooning removes individual pages, that has no effect on the slot size.
>
> --
> error compiling committee.c: too many arguments to function
>
>

Re: [Qemu-devel] plan for device assignment upstream

2012-07-03 Thread Blue Swirl

On Mon, Jul 2, 2012 at 9:43 AM, Avi Kivity  wrote:
> On 07/02/2012 12:30 PM, Jan Kiszka wrote:
>> On 2012-07-02 11:18, Michael S. Tsirkin wrote:
>>> I've been thinking hard about Jan's patches for device
>>> assignment. Basically while I thought it makes sense
>>> to make all devices: assignment and not - behave the
>>> same and use same APIs for injecting irqs, Anthony thinks there is huge
>>> value in making irq propagation hierarchical and device assignment
>>> should be special cased.
>>
>> On the long term, we will need direct injection, ie. caching, to allow
>> making it lock-less. Stepping through all intermediate layers will cause
>> troubles, at least performance-wise, when having to take and drop a lock
>> at each stop.
>
> So we precalculate everything beforehand.  Instead of each qemu_irq
> triggering a callback, calculating the next hop and firing the next
> qemu_irq, configure each qemu_irq array with a function that describes
> how to take the next hop.  Whenever the configuration changes,
> recalculate all routes.

Yes, we had this discussion last year when I proposed the IRQ matrix:
http://lists.nongnu.org/archive/html/qemu-devel/2011-09/msg00474.html

One problem with the matrix is that it only works at the enable/disable
level, not for more complex situations like boolean logic or
multiplexed outputs.

Perhaps the devices should describe the currently valid logic with a
packet-filter-type mechanism? I think that could scale arbitrarily and
it could even be friendlier as a kernel interface.

>
> For device assignment or vhost, we can have a qemu_irq_irqfd() which
> converts a qemu_irq to an eventfd.  If the route calculations determine
> that it can be serviced via a real irqfd, they also configure it as an
> irqfd.  Otherwise qemu configures a poll on this eventfd and calls the
> callback when needed.
>
>
> --
> error compiling committee.c: too many arguments to function
>
>
>

Re: [PATCH 1/6] file_ram_alloc(): coding style fixes

2012-07-03 Thread Blue Swirl

On Mon, Jul 2, 2012 at 6:06 PM, Eduardo Habkost  wrote:
> Cc: Blue Swirl 
> Signed-off-by: Eduardo Habkost 

Acked-by: Blue Swirl 

> ---
>  exec.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 8244d54..c8bfd27 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2392,7 +2392,7 @@ static void *file_ram_alloc(RAMBlock *block,
>  unlink(filename);
>  free(filename);
>
> -memory = (memory+hpagesize-1) & ~(hpagesize-1);
> +memory = (memory + hpagesize - 1) & ~(hpagesize - 1);
>
>  /*
>   * ftruncate is not supported by hugetlbfs in older
> @@ -2400,8 +2400,9 @@ static void *file_ram_alloc(RAMBlock *block,
>   * If anything goes wrong with it under other filesystems,
>   * mmap will fail.
>   */
> -if (ftruncate(fd, memory))
> +if (ftruncate(fd, memory)) {
>  perror("ftruncate");
> +}
>
>  #ifdef MAP_POPULATE
>  /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
> --
> 1.7.10.4
>

Re: [PATCH 2/6] file_ram_alloc(): use g_strdup_printf() instead of asprintf()

2012-07-03 Thread Blue Swirl

On Mon, Jul 2, 2012 at 6:06 PM, Eduardo Habkost  wrote:
> Cc: Blue Swirl 
> Signed-off-by: Eduardo Habkost 

Acked-by: Blue Swirl 

> ---
>  exec.c |   14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index c8bfd27..d856325 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -24,6 +24,9 @@
>  #include 
>  #endif
>
> +#include 
> +#include 
> +
>  #include "qemu-common.h"
>  #include "cpu.h"
>  #include "tcg.h"
> @@ -2357,7 +2360,7 @@ static void *file_ram_alloc(RAMBlock *block,
>  ram_addr_t memory,
>  const char *path)
>  {
> -char *filename;
> +gchar *filename;
>  void *area;
>  int fd;
>  #ifdef MAP_POPULATE
> @@ -2379,18 +2382,15 @@ static void *file_ram_alloc(RAMBlock *block,
>  return NULL;
>  }
>
> -if (asprintf(&filename, "%s/qemu_back_mem.XX", path) == -1) {
> -return NULL;
> -}
> -
> +filename = g_strdup_printf("%s/qemu_back_mem.XX", path);
>  fd = mkstemp(filename);
>  if (fd < 0) {
>  perror("unable to create backing store for hugepages");
> -free(filename);
> +g_free(filename);
>  return NULL;
>  }
>  unlink(filename);
> -free(filename);
> +g_free(filename);
>
>  memory = (memory + hpagesize - 1) & ~(hpagesize - 1);
>
> --
> 1.7.10.4
>

Re: [Qemu-devel] plan for device assignment upstream

2012-07-05 Thread Blue Swirl

On Wed, Jul 4, 2012 at 8:05 AM, Avi Kivity  wrote:
> On 07/03/2012 10:06 PM, Blue Swirl wrote:
>> On Mon, Jul 2, 2012 at 9:43 AM, Avi Kivity  wrote:
>>> On 07/02/2012 12:30 PM, Jan Kiszka wrote:
>>>> On 2012-07-02 11:18, Michael S. Tsirkin wrote:
>>>>> I've been thinking hard about Jan's patches for device
>>>>> assignment. Basically while I thought it makes sense
>>>>> to make all devices: assignment and not - behave the
>>>>> same and use same APIs for injecting irqs, Anthony thinks there is huge
>>>>> value in making irq propagation hierarchical and device assignment
>>>>> should be special cased.
>>>>
>>>> On the long term, we will need direct injection, ie. caching, to allow
>>>> making it lock-less. Stepping through all intermediate layers will cause
>>>> troubles, at least performance-wise, when having to take and drop a lock
>>>> at each stop.
>>>
>>> So we precalculate everything beforehand.  Instead of each qemu_irq
>>> triggering a callback, calculating the next hop and firing the next
>>> qemu_irq, configure each qemu_irq array with a function that describes
>>> how to take the next hop.  Whenever the configuration changes,
>>> recalculate all routes.
>>
>> Yes, we had this discussion last year when I proposed the IRQ matrix:
>> http://lists.nongnu.org/archive/html/qemu-devel/2011-09/msg00474.html
>>
>> One problem with the matrix is that it only works for enable/disable
>> level, not for more complex situations like boolean logic or
>> multiplexed outputs.
>
> I think we do need to support inverters etc.
>
>> Perhaps the devices should describe the currently valid logic with
>> packet filter type mechanism? I think that could scale arbitrarily and
>> it could be more friendly even as a kernel interface?
>
> Interesting idea.  So qemu creates multiple eventfds, gives half to
> devices and half to kvm (as irqfds), and configures bpf programs that
> calculate the irqfd outputs from the vfio inputs.

I wasn't thinking of using fds (I guess that could work too), just
that the interface could be similar to packet filters. So a device
which implements an enable switch and ORs 8 inputs to a global output
could be implemented with:

    context = rule_init();
    context = append_rule(context, R_OR, 8, irq_array);
    context = append_rule(context, R_AND, 1, &irq_enable);
    send_to_kernel_or_master_irq_controller(context);
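
To show how such a rule list might be evaluated, here is a rough,
self-contained sketch. None of these names exist in QEMU or KVM; the sketch_
types, the fixed rule limit, and the chaining semantics (each rule folds its
inputs into the result of the previous rule) are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_op { SKETCH_OR, SKETCH_AND };

struct sketch_rule {
    enum sketch_op op;
    size_t n_inputs;
    const int *inputs;  /* current input levels, zero or nonzero */
};

#define SKETCH_MAX_RULES 8

struct sketch_context {
    size_t n_rules;
    struct sketch_rule rules[SKETCH_MAX_RULES];
};

static void sketch_rule_init(struct sketch_context *ctx)
{
    ctx->n_rules = 0;
}

static void sketch_append_rule(struct sketch_context *ctx, enum sketch_op op,
                               size_t n_inputs, const int *inputs)
{
    struct sketch_rule *rule;

    assert(ctx->n_rules < SKETCH_MAX_RULES);
    rule = &ctx->rules[ctx->n_rules++];
    rule->op = op;
    rule->n_inputs = n_inputs;
    rule->inputs = inputs;
}

/* Compute the output level. The first rule starts from the identity of
 * its operator (0 for OR, 1 for AND); every later rule starts from the
 * running result and folds its own inputs into it. */
static int sketch_evaluate(const struct sketch_context *ctx)
{
    int acc = 0;
    size_t r, i;

    for (r = 0; r < ctx->n_rules; r++) {
        const struct sketch_rule *rule = &ctx->rules[r];
        int val = (r == 0) ? (rule->op == SKETCH_AND) : acc;

        for (i = 0; i < rule->n_inputs; i++) {
            int in = rule->inputs[i] != 0;

            val = (rule->op == SKETCH_OR) ? (val | in) : (val & in);
        }
        acc = val;
    }
    return acc;
}
```

The device from the example above would append one SKETCH_OR rule over its
eight inputs followed by one SKETCH_AND rule over the enable switch; whoever
receives the context (kernel or master IRQ controller) would then re-run
sketch_evaluate() whenever an input changes.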

>
> At least for x86 this is overkill.  I would be okay with
> one-input-one-output cases handled with the current code and everything
> else routed through qemu.

If this is efficient, some of the internal logic inside devices (for
example PCI) could be implemented with the rules. Usually devices have
one or just a few IRQ outputs but several possible internal sources
for these.

>
> --
> error compiling committee.c: too many arguments to function
>
>