date:20130909

Re: [Qemu-devel] [PATCH V12 00/13] Add support for binding guest numa nodes to host numa nodes

2013-09-09 Thread Wanlong Gao

Hi folks,

Any comments? ;-P

Wanlong Gao

> As you know, QEMU can't direct it's memory allocation now, this may cause
> guest cross node access performance regression.
> And, the worse thing is that if PCI-passthrough is used,
> direct-attached-device uses DMA transfer between device and qemu process.
> All pages of the guest will be pinned by get_user_pages().
> 
> KVM_ASSIGN_PCI_DEVICE ioctl
>   kvm_vm_ioctl_assign_device()
> =>kvm_assign_device()
>   => kvm_iommu_map_memslots()
> => kvm_iommu_map_pages()
>=> kvm_pin_pages()
> 
> So, with direct-attached-device, all guest page's page count will be +1 and
> any page migration will not work. AutoNUMA won't too.
> 
> So, we should set the guest nodes memory allocation policy before
> the pages are really mapped.
> 
> According to this patch set, we are able to set guest nodes memory policy
> like following:
> 
>  -numa node,nodeid=0,cpus=0, \
>  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
>  -numa node,nodeid=1,cpus=1 \
>  -numa mem,size=1024M,policy=interleave,host-nodes=1
> 
> This supports 
> "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N" 
> like format.
> 
> 
> Also add "set-mem-policy" QMP and hmp command to set memory policy.
> 
> And add a QMP command "query-numa" to show numa info through
> this API.
> 
> And convert the "info numa" monitor command to use this
> QMP command "query-numa".
> 
> 
> V1->V2:
> change to use QemuOpts in numa options (Paolo)
> handle Error in mpol parser (Paolo)
> change qmp command format to mem-policy=membind,mem-hostnode=0-1 like 
> (Paolo)
> V2->V3:
> also handle Error in cpus parser (5/10)
> split out common parser from cpus and hostnode parser (Bandan 6/10)
> V3-V4:
> rebase to request for comments
> V4->V5:
> use OptVisitor and split -numa option (Paolo)
>  - s/set-mpol/set-mem-policy (Andreas)
>  - s/mem-policy/policy
>  - s/mem-hostnode/host-nodes
> fix hmp command process after error (Luiz)
> add qmp command query-numa and convert info numa to it (Luiz)
> V5->V6:
> remove tabs in json file (Laszlo, Paolo)
> add back "-numa node,mem=xxx" as legacy (Paolo)
> change cpus and host-nodes to array (Laszlo, Eric)
> change "nodeid" to "uint16"
> add NumaMemPolicy enum type (Eric)
> rebased on Laszlo's "OptsVisitor: support / flatten integer ranges for 
> repeating options" patch set, thanks for Laszlo's help
> V6-V7:
> change UInt16 to uint16 (Laszlo)
> fix a typo in adding qmp command set-mem-policy
> V7-V8:
> rebase to current master with Laszlo's V2 of OptsVisitor patch set
> fix an adding white space line error
> V8->V9:
> rebase to current master
> check if total numa memory size is equal to ram_size (Paolo)
> add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo)
> replace the use of numa_num_configured_nodes() (Andrew)
> avoid abusing the fact i==nodeid (Andrew)
> V9->V10:
> rebase to current master
> remove libnuma (Andrew)
> MAX_NODES=64 -> MAX_NODES=128 since libnuma selected 128 (Andrew)
> use MAX_NODES instead of MAX_CPUMASK_BITS for host_mem bitmap (Andrew)
> remove a useless clear_bit() operation (Andrew)
> V10->V11:
> rebase to current master
> fix "maxnode" argument of mbind(2)
> V11->V12:
> rebase to current master
> split patch 02/11 of V11 (Eduardo)
> add some max value check (Eduardo)
> split MAX_NODES change patch (Eduardo)
> 
> 
> *I hope this can catch up the train of 1.7.*
> 
> Thanks,
> Wanlong Gao
> 
> Wanlong Gao (13):
>   NUMA: move numa related code to new file numa.c
>   NUMA: check if the total numa memory size is equal to ram_size
>   NUMA: Add numa_info structure to contain numa nodes info
>   NUMA: convert -numa option to use OptsVisitor
>   NUMA: introduce NumaMemOptions
>   NUMA: add "-numa mem," options
>   NUMA: expand MAX_NODES from 64 to 128
>   NUMA: parse guest numa nodes memory policy
>   NUMA: set guest numa nodes memory policy
>   NUMA: add qmp command set-mem-policy to set memory policy for NUMA
> node
>   NUMA: add hmp command set-mem-policy
>   NUMA: add qmp command query-numa
>   NUMA: convert hmp command info_numa to use qmp command query_numa
> 
>  Makefile.target |   2 +-
>  cpus.c  |  14 --
>  hmp-commands.hx |  16 ++
>  hmp.c   | 119 +
>  hmp.h   |   2 +
>  hw/i386/pc.c|   4 +-
>  include/sysemu/cpus.h   |   1 -
>  include/sysemu/sysemu.h |  18 +-
>  monitor.c   |  21 +--
>  numa.c  | 460 
> 
>  qapi-schema.json| 133 ++
>  qemu-options.hx |   6 +-
>  qmp-commands.hx |  90 ++
>  vl.c| 160 ++---
>  14 files changed, 861 insertions(+), 185 deletions(-)
>  create mode 100644 numa.c
>

Re: [Qemu-devel] [PATCH 1/2] linux-headers: update for s390 floating interrupt controller

2013-09-09 Thread Jens Freimann

On Fri, Sep 06, 2013 at 01:23:32PM +0100, Peter Maydell wrote:
> On 6 September 2013 13:19, Jens Freimann  wrote:
> > Add symbols required for the s390 floating interrupt controller (flic)
> 
> Updates to linux-headers should be the result of a sync
> against a specified mainline kernel revision, please
> (otherwise this should be an RFC patchset).

ok, I understand.  I was not sure about that and only added a remark in 
the cover letter

regards
Jens
 
> thanks
> -- PMM
>

Re: [Qemu-devel] [PATCH 1/2] linux-headers: update for s390 floating interrupt controller

2013-09-09 Thread Jens Freimann

On Fri, Sep 06, 2013 at 01:32:52PM +0100, Peter Maydell wrote:
> On 6 September 2013 13:19, Jens Freimann  wrote:
> > @@ -839,6 +903,7 @@ struct kvm_device_attr {
> >  #define KVM_DEV_TYPE_FSL_MPIC_20   1
> >  #define KVM_DEV_TYPE_FSL_MPIC_42   2
> >  #define KVM_DEV_TYPE_XICS  3
> > +#define KVM_DEV_TYPE_FLIC  4
> 
> Christoffer's patchset switching the ARM VGIC to this
> list also uses 4 as its enumeration value:
> 
> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-August/006822.html
> 
> That patchset isn't in yet, but maybe you should use
> 5 to avoid conflicts?

sure, will do

regards
Jens

> 
> thanks
> -- PMM
>

Re: [Qemu-devel] [PATCH V3 1/2] qemu: Adjust qemu wakeup

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 05:26, Liu, Jinsong ha scritto:
> From 6f40a66521e012170493964a2135fb3b4ae7c9b2 Mon Sep 17 00:00:00 2001
> From: Liu Jinsong 
> Date: Sun, 8 Sep 2013 00:33:19 +0800
> Subject: [PATCH V3 1/2] qemu: Adjust qemu wakeup
> 
> Currently Xen hvm s3 has a bug coming from the difference between
> qemu-traditioanl and qemu-xen. For qemu-traditional, the way to
> resume from hvm s3 is via 'xl trigger' command. However, for
> qemu-xen, the way to resume from hvm s3 inherited from standard
> qemu, i.e. via QMP, and it doesn't work under Xen.
> 
> The root cause is, for qemu-xen, 'xl trigger' command didn't reset
> devices, while QMP didn't unpause hvm domain though they did qemu
> system reset.
> 
> We have two qemu patches and one xl patch to fix Xen hvm s3 bug.
> This patch is the qemu patch 1. It adjusts qemu wakeup so that
> Xen s3 resume logic (which will be implemented at qemu patch 2)
> will be notified after qemu system reset.
> 
> Signed-off-by: Liu Jinsong 
> ---
>  hw/acpi/core.c  |3 ++-
>  include/sysemu/sysemu.h |4 +++-
>  vl.c|   15 +++
>  3 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/acpi/core.c b/hw/acpi/core.c
> index 7467b88..7138139 100644
> --- a/hw/acpi/core.c
> +++ b/hw/acpi/core.c
> @@ -324,12 +324,13 @@ static void acpi_notify_wakeup(Notifier *notifier, void 
> *data)
>  (ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_TIMER_STATUS);
>  break;
>  case QEMU_WAKEUP_REASON_OTHER:
> -default:
>  /* ACPI_BITMASK_WAKE_STATUS should be set on resume.
> Pretend that resume was caused by power button */
>  ar->pm1.evt.sts |=
>  (ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_POWER_BUTTON_STATUS);
>  break;
> +default:
> +break;
>  }
>  }
>  
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index b1aa059..50bc44d 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -39,9 +39,11 @@ int vm_stop(RunState state);
>  int vm_stop_force_state(RunState state);
>  
>  typedef enum WakeupReason {
> -QEMU_WAKEUP_REASON_OTHER = 0,
> +/* Always keep QEMU_WAKEUP_REASON_NONE = 0 */
> +QEMU_WAKEUP_REASON_NONE = 0,
>  QEMU_WAKEUP_REASON_RTC,
>  QEMU_WAKEUP_REASON_PMTIMER,
> +QEMU_WAKEUP_REASON_OTHER,
>  } WakeupReason;
>  
>  void qemu_system_reset_request(void);
> diff --git a/vl.c b/vl.c
> index b4b119a..8c5f113 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1792,14 +1792,14 @@ static pid_t shutdown_pid;
>  static int powerdown_requested;
>  static int debug_requested;
>  static int suspend_requested;
> -static int wakeup_requested;
> +static WakeupReason wakeup_reason;
>  static NotifierList powerdown_notifiers =
>  NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
>  static NotifierList suspend_notifiers =
>  NOTIFIER_LIST_INITIALIZER(suspend_notifiers);
>  static NotifierList wakeup_notifiers =
>  NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
> -static uint32_t wakeup_reason_mask = ~0;
> +static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
>  static RunState vmstop_requested = RUN_STATE_MAX;
>  
>  int qemu_shutdown_requested_get(void)
> @@ -1849,11 +1849,9 @@ static int qemu_suspend_requested(void)
>  return r;
>  }
>  
> -static int qemu_wakeup_requested(void)
> +static WakeupReason qemu_wakeup_requested(void)
>  {
> -int r = wakeup_requested;
> -wakeup_requested = 0;
> -return r;
> +return wakeup_reason;
>  }
>  
>  static int qemu_powerdown_requested(void)
> @@ -1970,8 +1968,7 @@ void qemu_system_wakeup_request(WakeupReason reason)
>  return;
>  }
>  runstate_set(RUN_STATE_RUNNING);
> -notifier_list_notify(&wakeup_notifiers, &reason);
> -wakeup_requested = 1;
> +wakeup_reason = reason;
>  qemu_notify_event();
>  }
>  
> @@ -2063,6 +2060,8 @@ static bool main_loop_should_exit(void)
>  pause_all_vcpus();
>  cpu_synchronize_all_states();
>  qemu_system_reset(VMRESET_SILENT);
> +notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
> +wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>  resume_all_vcpus();
>  monitor_protocol_event(QEVENT_WAKEUP, NULL);
>  }
> 

Reviewed-by: Paolo Bonzini

Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements

2013-09-09 Thread Claudio Fontana

Hello Richard,

On 02.09.2013 19:54, Richard Henderson wrote:
> I'm not sure if I posted v2 or not, but my branch is named -3,
> therefore this is v3.  ;-)
> 
> The jumbo "fixme" patch from v1 has been split up.  This has been
> updated for the changes in the tlb helpers over the past few weeks.
> For the benefit of trivial conflict resolution, it's relative to a
> tree that contains basically all of my patches.
> 
> See git://github.com/rth7680/qemu.git tcg-aarch-3 for the tree, if
> you find yourself missing any of the dependencies.
> 
> 
> r~
> Richard Henderson (29):
>   tcg-aarch64: Set ext based on TCG_OPF_64BIT
>   tcg-aarch64: Change all ext variables to bool
>   tcg-aarch64: Don't handle mov/movi in tcg_out_op
>   tcg-aarch64: Hoist common argument loads in tcg_out_op
>   tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn
>   tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn
>   tcg-aarch64: Introduce tcg_fmt_* functions
>   tcg-aarch64: Introduce tcg_fmt_Rdn_aimm
>   tcg-aarch64: Implement mov with tcg_fmt_* functions
>   tcg-aarch64: Handle constant operands to add, sub, and compare
>   tcg-aarch64: Handle constant operands to and, or, xor
>   tcg-aarch64: Support andc, orc, eqv, not
>   tcg-aarch64: Handle zero as first argument to sub
>   tcg-aarch64: Support movcond
>   tcg-aarch64: Support deposit
>   tcg-aarch64: Support add2, sub2
>   tcg-aarch64: Support muluh, mulsh
>   tcg-aarch64: Support div, rem
>   tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s
>   tcg-aarch64: Improve tcg_out_movi
>   tcg-aarch64: Avoid add with zero in tlb load
>   tcg-aarch64: Use adrp in tcg_out_movi
>   tcg-aarch64: Pass return address to load/store helpers directly.
>   tcg-aarch64: Use tcg_out_call for qemu_ld/st
>   tcg-aarch64: Use symbolic names for branches
>   tcg-aarch64: Implement tcg_register_jit
>   tcg-aarch64: Reuse FP and LR in translated code
>   tcg-aarch64: Introduce tcg_out_ldst_pair
>   tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check
> 
>  include/exec/exec-all.h  |   18 -
>  tcg/aarch64/tcg-target.c | 1276 
> ++
>  tcg/aarch64/tcg-target.h |   76 +--
>  3 files changed, 867 insertions(+), 503 deletions(-)
> 

after carefully reading and testing your patches, this is how I suggest to 
proceed: 

first do the implementation of the new functionality (tcg opcodes, jit) in a 
way that is consistent with the existing code.
No type changes, no refactoring, no beautification.

Once we agree on those, introduce the meaningful restructuring you want to do,
like the new INSN type, the "don't handle mov/movi in tcg_out_op", the 
TCG_OPF_64BIT thing, etc.

Last do the cosmetic stuff if you really want to do it, like the change all ext 
to bool (note that there is no point if the callers still use "1" and "0": 
adapt them as well) etc.

Right now the patchset is difficult to digest, given the regressions it 
introduces coupled with a mixing of functional changes, restructuring and 
cosmetics.

I think this will allow us to proceed quicker towards agreement.

Thanks,

Claudio

[Qemu-devel] [PATCH] ehci: Fix crash with isoc usb packets

2013-09-09 Thread Hans de Goede

The isoc packet path in the ehci code has a bad qobject cast, causing an
abort, this patch fixes this.

Note this problem is backported in 1.6.0 too, and this patch should be
backported to the 1.6.0 stable tree.

Signed-off-by: Hans de Goede 
---
 hw/usb/hcd-ehci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 010a0d0..77c4872 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1486,7 +1486,8 @@ static int ehci_process_itd(EHCIState *ehci,
 return -1;
 }
 
-qemu_sglist_init(&ehci->isgl, DEVICE(ehci), 2, ehci->as);
+qemu_sglist_init(&ehci->isgl, BUS(&ehci->bus)->parent,
+ 2, ehci->as);
 if (off + len > 4096) {
 /* transfer crosses page border */
 uint32_t len2 = off + len - 4096;
-- 
1.8.3.1

[Qemu-devel] [PATCH] ehci: save device pointer in EHCIState

2013-09-09 Thread Gerd Hoffmann

We'll need a pointer to the actual pci/sysbus device,
stick a pointer to it into the EHCIState struct.

https://bugzilla.redhat.com/show_bug.cgi?id=1005495

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-ehci.c | 3 ++-
 hw/usb/hcd-ehci.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 137e200..162680c 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1486,7 +1486,7 @@ static int ehci_process_itd(EHCIState *ehci,
 return -1;
 }
 
-qemu_sglist_init(&ehci->isgl, DEVICE(ehci), 2, ehci->as);
+qemu_sglist_init(&ehci->isgl, ehci->device, 2, ehci->as);
 if (off + len > 4096) {
 /* transfer crosses page border */
 uint32_t len2 = off + len - 4096;
@@ -2529,6 +2529,7 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, 
Error **errp)
 
 s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_frame_timer, s);
 s->async_bh = qemu_bh_new(ehci_frame_timer, s);
+s->device = dev;
 
 qemu_register_reset(ehci_reset, s);
 qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h
index 15a28e8..065c9fa 100644
--- a/hw/usb/hcd-ehci.h
+++ b/hw/usb/hcd-ehci.h
@@ -255,6 +255,7 @@ typedef QTAILQ_HEAD(EHCIQueueHead, EHCIQueue) EHCIQueueHead;
 
 struct EHCIState {
 USBBus bus;
+DeviceState *device;
 qemu_irq irq;
 MemoryRegion mem;
 AddressSpace *as;
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Paolo Bonzini

Il 08/09/2013 13:40, Gleb Natapov ha scritto:
> On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
>> On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
>> and not restore anything.
>>
> XRSTOR restores FP/SSE state to reset state if no bits are set in
> xstate_bv. This is what should happen on reset, no?

Yes. The problem happens on the migration destination when XSAVE data is
not transmitted.  FP/SSE data is transmitted and must be restored, but
xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
state.  The vcpu then loses the values that were set in the migration data.

>> Since FP and SSE data are always valid, set them in xstate_bv at reset
>> time.  In fact, that value is the same that KVM_GET_XSAVE returns on
>> pre-XSAVE hosts.
> It is needed for migration between non xsave host to xsave host.

Yes, and this patch does the same for migration between non-XSAVE QEMU
and XSAVE QEMU.

In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
xstate_bv when XSAVE is not available.  Instead, it should reset the
FXSAVE data to processor-reset values (except for MXCSR which always
comes from XRSTOR data), i.e. to all-zeros except for the x87 control
and tag words.  It should also check reserved bits of MXCSR.

>>
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  target-i386/cpu.c | 1 +
>>  target-i386/cpu.h | 5 +
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
>> index c36345e..ac83106 100644
>> --- a/target-i386/cpu.c
>> +++ b/target-i386/cpu.c
>> @@ -2386,6 +2386,7 @@ static void x86_cpu_reset(CPUState *s)
>>  env->fpuc = 0x37f;
>>  
>>  env->mxcsr = 0x1f80;
>> +env->xstate_bv = XSTATE_FP | XSTATE_SSE;
>>  
>>  env->pat = 0x0007040600070406ULL;
>>  env->msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT;
>> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
>> index 5723eff..a153078 100644
>> --- a/target-i386/cpu.h
>> +++ b/target-i386/cpu.h
>> @@ -380,6 +380,11 @@
>>  
>>  #define MSR_VM_HSAVE_PA 0xc0010117
>>  
>> +#define XSTATE_SUPPORTED(XSTATE_FP|XSTATE_SSE|XSTATE_YMM)
> Supported by whom? By QEMU? We should filer unsupported bits from CPUID.0D 
> then too.

Yes.  QEMU unmarshals information from the XSAVE region and back, so it
cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, though.

Paolo

> 
>> +#define XSTATE_FP   1
>> +#define XSTATE_SSE  2
>> +#define XSTATE_YMM  4
>> +
>>  /* CPUID feature words */
>>  typedef enum FeatureWord {
>>  FEAT_1_EDX, /* CPUID[1].EDX */
>> -- 
>> 1.8.3.1
>>
> 
> --
>   Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: [Qemu-devel] savevm too slow

2013-09-09 Thread Kevin Wolf

Am 09.09.2013 um 03:57 hat xuanmao_001 geschrieben:
> >> the other question: when I change the buffer size #define IO_BUF_SIZE 32768
> >> to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
>  
> > Is this for cache=unsafe as well?
>  
> > Juan, any specific reason for using 32k? I think it would be better to
> > have a multiple of the qcow2 cluster size, otherwise we get COW for the
> > empty part of newly allocated clusters. If we can't make it dynamic,
> > using at least fixed 64k to match the qcow2 default would probably
> > improve things a bit.
>  
> with cache=writeback.  Is there any risk for setting cache=writeback with
> IO_BUF_SIZE 1M ?

No. Using a larger buffer size should be safe.

Kevin

> ━━━
> xuanmao_001
>  
> From: Kevin Wolf
> Date: 2013-09-06 18:38
> To: xuanmao_001
> CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
> Subject: Re: savevm too slow
> Am 06.09.2013 um 03:31 hat xuanmao_001 geschrieben:
> > Hi, qemuers:
> >  
> >
>  I found that the guest disk file cache mode will affect to the time of 
> savevm.
> >  
> > the cache 'writeback' too slow. but the cache 'unsafe' is as fast as it can,
> > less than 10 seconds.
> >  
> > here is the example I use virsh:
> > @cache with writeback:
> > #the first snapshot
> > real0m21.904s
> > user0m0.006s
> > sys 0m0.008s
> >  
> > #the secondary snapshot
> > real2m11.624s
> > user0m0.013s
> > sys 0m0.008s
> >  
> > @cache with unsafe:
> > #the first snapshot
> > real0m0.730s
> > user0m0.006s
> > sys 0m0.005s
> >  
> > #the secondary snapshot
> > real0m1.296s
> > user0m0.002s
> > sys 0m0.008s
>  
> I sent patches that should eliminate the difference between the first
> and second snapshot at least.
>  
> > so, what the difference between them when using different cache.
>  
> cache=unsafe ignores any flush requests. It's possible that there is
> potential for optimisation with cache=writeback, i.e. it sends flush
> requests that aren't necessary in fact. This is something that I haven't
> checked yet.
>  
> > the other question: when I change the buffer size #define IO_BUF_SIZE 32768
> > to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
>  
> Is this for cache=unsafe as well?
>  
> Juan, any specific reason for using 32k? I think it would be better to
> have a multiple of the qcow2 cluster size, otherwise we get COW for the
> empty part of newly allocated clusters. If we can't make it dynamic,
> using at least fixed 64k to match the qcow2 default would probably
> improve things a bit.
>  
> Kevin

Re: [Qemu-devel] savevm too slow

2013-09-09 Thread xuanmao_001

> I sent patches that should eliminate the difference between the first
> and second snapshot at least.

where I can find the patches that can eliminate the difference between the first
and second snapshot ? Does they fit qemu-kvm-1.0,1 ?




xuanmao_001

From: Kevin Wolf
Date: 2013-09-09 16:35
To: xuanmao_001
CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
Subject: Re: Re: savevm too slow
Am 09.09.2013 um 03:57 hat xuanmao_001 geschrieben:
> >> the other question: when I change the buffer size #define IO_BUF_SIZE 32768
> >> to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
>  
> > Is this for cache=unsafe as well?
>  
> > Juan, any specific reason for using 32k? I think it would be better to
> > have a multiple of the qcow2 cluster size, otherwise we get COW for the
> > empty part of newly allocated clusters. If we can't make it dynamic,
> > using at least fixed 64k to match the qcow2 default would probably
> > improve things a bit.
>  
> with cache=writeback.  Is there any risk for setting cache=writeback with
> IO_BUF_SIZE 1M ?

No. Using a larger buffer size should be safe.

Kevin

> ━━━
> xuanmao_001
>  
> From: Kevin Wolf
> Date: 2013-09-06 18:38
> To: xuanmao_001
> CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
> Subject: Re: savevm too slow
> Am 06.09.2013 um 03:31 hat xuanmao_001 geschrieben:
> > Hi, qemuers:
> >  
> >
>  I found that the guest disk file cache mode will affect to the time of 
> savevm.
> >  
> > the cache 'writeback' too slow. but the cache 'unsafe' is as fast as it can,
> > less than 10 seconds.
> >  
> > here is the example I use virsh:
> > @cache with writeback:
> > #the first snapshot
> > real0m21.904s
> > user0m0.006s
> > sys 0m0.008s
> >  
> > #the secondary snapshot
> > real2m11.624s
> > user0m0.013s
> > sys 0m0.008s
> >  
> > @cache with unsafe:
> > #the first snapshot
> > real0m0.730s
> > user0m0.006s
> > sys 0m0.005s
> >  
> > #the secondary snapshot
> > real0m1.296s
> > user0m0.002s
> > sys 0m0.008s
>  
> I sent patches that should eliminate the difference between the first
> and second snapshot at least.
>  
> > so, what the difference between them when using different cache.
>  
> cache=unsafe ignores any flush requests. It's possible that there is
> potential for optimisation with cache=writeback, i.e. it sends flush
> requests that aren't necessary in fact. This is something that I haven't
> checked yet.
>  
> > the other question: when I change the buffer size #define IO_BUF_SIZE 32768
> > to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
>  
> Is this for cache=unsafe as well?
>  
> Juan, any specific reason for using 32k? I think it would be better to
> have a multiple of the qcow2 cluster size, otherwise we get COW for the
> empty part of newly allocated clusters. If we can't make it dynamic,
> using at least fixed 64k to match the qcow2 default would probably
> improve things a bit.
>  
> Kevin

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Paolo Bonzini

Il 08/09/2013 13:52, Gleb Natapov ha scritto:
> On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote:
>> QEMU moves state from CPUArchState to struct kvm_xsave and back when it
>> invokes the KVM_*_XSAVE ioctls.  Because it doesn't treat the XSAVE
>> region as an opaque blob, it might be impossible to set some state on
>> the destination if migrating to an older version.
>>
>> This patch blocks migration if it finds that unsupported bits are set
>> in the XSTATE_BV header field.  To make this work robustly, QEMU should
>> only report in env->xstate_bv those fields that will actually end up
>> in the migration stream.
> 
> We usually handle host cpu differences in cpuid layer, not by trying to
> validate migration data.

Actually we do both.  QEMU for example detects invalid subsections and
blocks migration, and CPU differences also result in subsections that
the destination does not know.

But as far as QEMU is concerned, setting an unknown bit in XSTATE_BV is
not a CPU difference, it is simply invalid migration data.

> i.e CPUID.0D should be configurable and
> management should be able to query QEMU what is supported and prevent
> migration attempt accordingly.

Management is already able to query QEMU of what is supported, because
new XSAVE state is always attached to new CPUID bits in leaves other
than 0Dh (e.g. EAX=07h, ECX=0h returns AVX512 and MPX support in EBX).
QEMU should compute 0Dh data based on those bits indeed.

However, KVM_GET/SET_XSAVE should still return all values supported by
the hypervisor, independent of the supported CPUID bits.

>>
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  target-i386/kvm.c | 3 ++-
>>  target-i386/machine.c | 4 
>>  2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index 749aa09..df08a4b 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -1291,7 +1291,8 @@ static int kvm_get_xsave(X86CPU *cpu)
>>  sizeof env->fpregs);
>>  memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
>>  sizeof env->xmm_regs);
>> -env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
>> +env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] &
>> +XSTATE_SUPPORTED;
> Don't we just drop state here that will not be restored on the
> destination and destination will not be able to tell since we masked
> unsupported bits?

A well-behaved guest should not have modified that state anyway, since:

* the source and destination machines should have the same CPU

* since the destination QEMU does not support the feature, the source
should have masked it as well

* the guest should always probe CPUID before using a feature

There will be only one change for well-behaved guests with this patch
(and the change will be invisible to them since they behave well).
After the patch, KVM_SET_XSAVE will set the extended states to the
processor-reset state instead of all-zeros.  However, all
currently-defined states have a processor-reset state that is equal to
all-zeroes, so this change is theoretical.

In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
and we should hide all features that are not visible in CPUID.  It is
okay, however, to test it in cpu_post_load.

Paolo

>>  memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
>>  sizeof env->ymmh_regs);
>>  return 0;
>> diff --git a/target-i386/machine.c b/target-i386/machine.c
>> index dc81cde..9e2cfcf 100644
>> --- a/target-i386/machine.c
>> +++ b/target-i386/machine.c
>> @@ -278,6 +278,10 @@ static int cpu_post_load(void *opaque, int version_id)
>>  CPUX86State *env = &cpu->env;
>>  int i;
>>  
>> +if (env->xstate_bv & ~XSTATE_SUPPORTED) {
>> +return -EINVAL;
>> +}
>> + 
>>  /*
>>   * Real mode guest segments register DPL should be zero.
>>   * Older KVM version were setting it wrongly.
>> -- 
>> 1.8.3.1
> 
> --
>   Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
> Il 08/09/2013 13:40, Gleb Natapov ha scritto:
> > On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
> >> On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
> >> and not restore anything.
> >>
> > XRSTOR restores FP/SSE state to reset state if no bits are set in
> > xstate_bv. This is what should happen on reset, no?
> 
> Yes. The problem happens on the migration destination when XSAVE data is
> not transmitted.  FP/SSE data is transmitted and must be restored, but
> xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
> state.  The vcpu then loses the values that were set in the migration data.
> 
> >> Since FP and SSE data are always valid, set them in xstate_bv at reset
> >> time.  In fact, that value is the same that KVM_GET_XSAVE returns on
> >> pre-XSAVE hosts.
> > It is needed for migration between non xsave host to xsave host.
> 
> Yes, and this patch does the same for migration between non-XSAVE QEMU
> and XSAVE QEMU.
> 
Can such migration happen? The commit that added xsave support
(f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.

> In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
> xstate_bv when XSAVE is not available.  Instead, it should reset the
> FXSAVE data to processor-reset values (except for MXCSR which always
> comes from XRSTOR data), i.e. to all-zeros except for the x87 control
> and tag words.  It should also check reserved bits of MXCSR.
> 
I do not see why.

> >>
> >> Signed-off-by: Paolo Bonzini 
> >> ---
> >>  target-i386/cpu.c | 1 +
> >>  target-i386/cpu.h | 5 +
> >>  2 files changed, 6 insertions(+)
> >>
> >> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> >> index c36345e..ac83106 100644
> >> --- a/target-i386/cpu.c
> >> +++ b/target-i386/cpu.c
> >> @@ -2386,6 +2386,7 @@ static void x86_cpu_reset(CPUState *s)
> >>  env->fpuc = 0x37f;
> >>  
> >>  env->mxcsr = 0x1f80;
> >> +env->xstate_bv = XSTATE_FP | XSTATE_SSE;
> >>  
> >>  env->pat = 0x0007040600070406ULL;
> >>  env->msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT;
> >> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> >> index 5723eff..a153078 100644
> >> --- a/target-i386/cpu.h
> >> +++ b/target-i386/cpu.h
> >> @@ -380,6 +380,11 @@
> >>  
> >>  #define MSR_VM_HSAVE_PA 0xc0010117
> >>  
> >> +#define XSTATE_SUPPORTED  (XSTATE_FP|XSTATE_SSE|XSTATE_YMM)
> > Supported by whom? By QEMU? We should filer unsupported bits from CPUID.0D 
> > then too.
> 
> Yes.  QEMU unmarshals information from the XSAVE region and back, so it
> cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, though.
> 
IMO this is the main issue here, not separate bug. If we gonna let guest
use CPU state QEMU does not support we gonna have a bad time.

--
Gleb.

Re: [Qemu-devel] [PATCH] ehci: save device pointer in EHCIState

2013-09-09 Thread Hans de Goede


Hi,

On 09/09/2013 10:20 AM, Gerd Hoffmann wrote:

We'll need a pointer to the actual pci/sysbus device,
stick a pointer to it into the EHCIState struct.

https://bugzilla.redhat.com/show_bug.cgi?id=1005495


Looks like we've been working on exactly the same bug at the
same time, but we've come up with slightly different solutions.

If we're going to go this way (which seems the best), you
should also modify the qemu_sglist_init call in
ehci_init_transfer for consistency, and drop the bus and qbus
variable declarations at the top of the functions as those will
be unused then.

Regards,

Hans

Re: [Qemu-devel] savevm too slow

2013-09-09 Thread Kevin Wolf

Am 09.09.2013 um 10:47 hat xuanmao_001 geschrieben:
> > I sent patches that should eliminate the difference between the first
> > and second snapshot at least.
>  
> where I can find the patches that can
> eliminate the difference between the first
> and second snapshot ? Does they fit qemu-kvm-1.0,1 ?

I sent them to you on Friday, the first email has the following subject
line:

[PATCH 0/2] qcow2: Discard VM state in active L1 after creating snapshot

This patch series is for current git master, and chances are that it
would work for qemu 1.6 as well. It will most likely not apply for qemu
1.0, which is almost two years old.

Kevin

> ━━━
> xuanmao_001
>  
> From: Kevin Wolf
> Date: 2013-09-09 16:35
> To: xuanmao_001
> CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
> Subject: Re: Re: savevm too slow
> Am 09.09.2013 um 03:57 hat xuanmao_001 geschrieben:
> > >> the other question: when I change the buffer size #
> define IO_BUF_SIZE 32768
> > >> to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
> >  
> > > Is this for cache=unsafe as well?
> >  
> > > Juan, any specific reason for using 32k? I think it would be better to
> > > have a multiple of the qcow2 cluster size, otherwise we get COW for the
> > > empty part of newly allocated clusters. If we can't make it dynamic,
> > > using at least fixed 64k to match the qcow2 default would probably
> > > improve things a bit.
> >  
> > with cache=writeback.  Is there any risk for setting cache=writeback with
> > IO_BUF_SIZE 1M ?
>  
> No. Using a larger buffer size should be safe.
>  
> Kevin
>  
> >
>  
> ━━━
> > xuanmao_001
> >  
> > From: Kevin Wolf
> > Date: 2013-09-06 18:38
> > To: xuanmao_001
> > CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
> > Subject: Re: savevm too slow
> > Am 06.09.2013 um 03:31 hat xuanmao_001 geschrieben:
> > > Hi, qemuers:
> > >  
> > >
> >
>   I found that the guest disk file cache mode will affect to the time of 
> savevm.
> > >  
> > >
>  the cache 'writeback' too slow. but the cache 'unsafe' is as fast as it can,
> > > less than 10 seconds.
> > >  
> > > here is the example I use virsh:
> > > @cache with writeback:
> > > #the first snapshot
> > > real0m21.904s
> > > user0m0.006s
> > > sys 0m0.008s
> > >  
> > > #the secondary snapshot
> > > real2m11.624s
> > > user0m0.013s
> > > sys 0m0.008s
> > >  
> > > @cache with unsafe:
> > > #the first snapshot
> > > real0m0.730s
> > > user0m0.006s
> > > sys 0m0.005s
> > >  
> > > #the secondary snapshot
> > > real0m1.296s
> > > user0m0.002s
> > > sys 0m0.008s
> >  
> > I sent patches that should eliminate the difference between the first
> > and second snapshot at least.
> >  
> > > so, what the difference between them when using different cache.
> >  
> > cache=unsafe ignores any flush requests. It's possible that there is
> > potential for optimisation with cache=writeback, i.e. it sends flush
> > requests that aren't necessary in fact. This is something that I haven't
> > checked yet.
> >  
> > > the other question: when I change the buffer size #define IO_BUF_SIZE 
> > > 32768
> > > to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
> >  
> > Is this for cache=unsafe as well?
> >  
> > Juan, any specific reason for using 32k? I think it would be better to
> > have a multiple of the qcow2 cluster size, otherwise we get COW for the
> > empty part of newly allocated clusters. If we can't make it dynamic,
> > using at least fixed 64k to match the qcow2 default would probably
> > improve things a bit.
> >  
> > Kevin

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 10:51:58AM +0200, Paolo Bonzini wrote:
> Il 08/09/2013 13:52, Gleb Natapov ha scritto:
> > On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote:
> >> QEMU moves state from CPUArchState to struct kvm_xsave and back when it
> >> invokes the KVM_*_XSAVE ioctls.  Because it doesn't treat the XSAVE
> >> region as an opaque blob, it might be impossible to set some state on
> >> the destination if migrating to an older version.
> >>
> >> This patch blocks migration if it finds that unsupported bits are set
> >> in the XSTATE_BV header field.  To make this work robustly, QEMU should
> >> only report in env->xstate_bv those fields that will actually end up
> >> in the migration stream.
> > 
> > We usually handle host cpu differences in cpuid layer, not by trying to
> > validate migration data.
> 
> Actually we do both.  QEMU for example detects invalid subsections and
> blocks migration, and CPU differences also result in subsections that
> the destination does not know.
> 
That's different from what you do here though. If xstate_bv was in its
separate subsection things would be easier, but it is not.

> But as far as QEMU is concerned, setting an unknown bit in XSTATE_BV is
> not a CPU difference, it is simply invalid migration data.
> 
> > i.e CPUID.0D should be configurable and
> > management should be able to query QEMU what is supported and prevent
> > migration attempt accordingly.
> 
> Management is already able to query QEMU of what is supported, because
> new XSAVE state is always attached to new CPUID bits in leaves other
> than 0Dh (e.g. EAX=07h, ECX=0h returns AVX512 and MPX support in EBX).
> QEMU should compute 0Dh data based on those bits indeed.
If it is computable from other data even better, easier for us.

> 
> However, KVM_GET/SET_XSAVE should still return all values supported by
> the hypervisor, independent of the supported CPUID bits.
> 
Why?

> >>
> >> Signed-off-by: Paolo Bonzini 
> >> ---
> >>  target-i386/kvm.c | 3 ++-
> >>  target-i386/machine.c | 4 
> >>  2 files changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> >> index 749aa09..df08a4b 100644
> >> --- a/target-i386/kvm.c
> >> +++ b/target-i386/kvm.c
> >> @@ -1291,7 +1291,8 @@ static int kvm_get_xsave(X86CPU *cpu)
> >>  sizeof env->fpregs);
> >>  memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
> >>  sizeof env->xmm_regs);
> >> -env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
> >> +env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] &
> >> +XSTATE_SUPPORTED;
> > Don't we just drop state here that will not be restored on the
> > destination and destination will not be able to tell since we masked
> > unsupported bits?
> 
> A well-behaved guest should not have modified that state anyway, since:
> 
> * the source and destination machines should have the same CPU
> 
> * since the destination QEMU does not support the feature, the source
> should have masked it as well
> 
> * the guest should always probe CPUID before using a feature
> 
The I fail to see what is the purpose of the patch. I see two cases:
1. Each extended state has separate CPUID bit (is this guarantied?)
  - In this case, as you say, by matching CPUID on src and dst we guaranty
that migration data is good.
2. There is a state that is advertised in CPUID.0D, but does not have
   any regular "feature" CPUID associated with it.
 - In this case this patch will drop valid state that needs to be
   restored.

> There will be only one change for well-behaved guests with this patch
> (and the change will be invisible to them since they behave well).
> After the patch, KVM_SET_XSAVE will set the extended states to the
> processor-reset state instead of all-zeros.  However, all
> currently-defined states have a processor-reset state that is equal to
> all-zeroes, so this change is theoretical.
> 
> In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
> and we should hide all features that are not visible in CPUID.  It is
> okay, however, to test it in cpu_post_load.
The kernel should not even return state that is not visible in CPUID.

> 
> Paolo
> 
> >>  memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
> >>  sizeof env->ymmh_regs);
> >>  return 0;
> >> diff --git a/target-i386/machine.c b/target-i386/machine.c
> >> index dc81cde..9e2cfcf 100644
> >> --- a/target-i386/machine.c
> >> +++ b/target-i386/machine.c
> >> @@ -278,6 +278,10 @@ static int cpu_post_load(void *opaque, int version_id)
> >>  CPUX86State *env = &cpu->env;
> >>  int i;
> >>  
> >> +if (env->xstate_bv & ~XSTATE_SUPPORTED) {
> >> +return -EINVAL;
> >> +}
> >> + 
> >>  /*
> >>   * Real mode guest segments register DPL should be zero.
> >>   * Older KVM version were setting it wrongly.
> >> -- 
> >> 1.8.3.1
> > 
> > --
> > Gleb.
> > --
> > To un

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Alexander Graf

On 09.09.2013, at 11:29, Benjamin Herrenschmidt wrote:

> On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote:
>> I think it's ok to restrict live migration to machines with the same
>> tb frequency when kvm is enabled. Whether you implement it through a
>> hardcoded 512Mhz or through a timebase value that gets live migrated
>> and then compared is up to you - both ways work for me :).
> 
> The latter might be handy if we want to support migration on 970, though
> we don't have the TBU40 stuff there so adjusting the TB in the host
> kernel would be ... problematic.

Well, we could save/restore TB when we enter/exit the guest, no? But I really 
only want to have something that doesn't block the door for someone who wants 
to enable things.

Alex

Re: [Qemu-devel] [PATCH] raw-win32.c: Fix incorrect handling behaviour of small block files

2013-09-09 Thread Kevin Wolf

Am 09.09.2013 um 11:14 hat Tal Kain geschrieben:
> It is a valid case that the read data's size is smaller than the
> requested size since there could be files that are smaller than
> the minimum block size (For ex. when a VMDK disk descriptor file)
> 
> Signed-off-by: Tal Kain 

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Benjamin Herrenschmidt

On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote:
> I think it's ok to restrict live migration to machines with the same
> tb frequency when kvm is enabled. Whether you implement it through a
> hardcoded 512Mhz or through a timebase value that gets live migrated
> and then compared is up to you - both ways work for me :).

The latter might be handy if we want to support migration on 970, though
we don't have the TBU40 stuff there so adjusting the TB in the host
kernel would be ... problematic.

Cheers,
Ben.

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Benjamin Herrenschmidt

On Mon, 2013-09-09 at 11:32 +0200, Alexander Graf wrote:
> On 09.09.2013, at 11:29, Benjamin Herrenschmidt wrote:
> 
> > On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote:
> >> I think it's ok to restrict live migration to machines with the same
> >> tb frequency when kvm is enabled. Whether you implement it through a
> >> hardcoded 512Mhz or through a timebase value that gets live migrated
> >> and then compared is up to you - both ways work for me :).
> > 
> > The latter might be handy if we want to support migration on 970, though
> > we don't have the TBU40 stuff there so adjusting the TB in the host
> > kernel would be ... problematic.
> 
> Well, we could save/restore TB when we enter/exit the guest, no? 

Hard to do without introducing drift...

> But I really only want to have something that doesn't block the door for 
> someone who wants to enable things.
> 
> 
> Alex

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Alexander Graf


On 09.09.2013, at 11:38, Benjamin Herrenschmidt wrote:

> On Mon, 2013-09-09 at 11:32 +0200, Alexander Graf wrote:
>> On 09.09.2013, at 11:29, Benjamin Herrenschmidt wrote:
>> 
>>> On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote:
 I think it's ok to restrict live migration to machines with the same
 tb frequency when kvm is enabled. Whether you implement it through a
 hardcoded 512Mhz or through a timebase value that gets live migrated
 and then compared is up to you - both ways work for me :).
>>> 
>>> The latter might be handy if we want to support migration on 970, though
>>> we don't have the TBU40 stuff there so adjusting the TB in the host
>>> kernel would be ... problematic.
>> 
>> Well, we could save/restore TB when we enter/exit the guest, no? 
> 
> Hard to do without introducing drift...

We could probably quantify the drift through DEC. But I'm personally not too 
eager to worry about live migration on 970 :).


Alex

Re: [Qemu-devel] vfio for platform devices - 9/5/2012 - minutes

2013-09-09 Thread Sethi Varun-B16395

Hi Will,
Just trying to understand the scope of platform device assignment to guest on 
ARM. So, are the AMBA devices also represented in the device tree?

Regards
Varun

> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-
> ow...@vger.kernel.org] On Behalf Of Will Deacon
> Sent: Monday, September 09, 2013 2:23 PM
> To: Yoder Stuart-B08248
> Cc: Sethi Varun-B16395; Wood Scott-B07421; Bhushan Bharat-R65777; 'Peter
> Maydell'; 'Santosh Shukla'; 'Alex Williamson'; 'Alexander Graf';
> 'Antonios Motakis'; 'Christoffer Dall'; 'kim.phill...@linaro.org';
> kvm...@lists.cs.columbia.edu; kvm-...@vger.kernel.org; qemu-
> de...@nongnu.org
> Subject: Re: vfio for platform devices - 9/5/2012 - minutes
> 
> On Fri, Sep 06, 2013 at 07:20:19PM +0100, Yoder Stuart-B08248 wrote:
> > Adding Will...
> 
> [...]
> 
> > > I have a query about the ARM SMMU driver. In the ARM smmu driver I
> > > see, that bus notifiers are registered for both amba and platform
> > > bus. Amba is the I/O interconnect, right? Why is bus notifier
> > > required for the amba bus?
> 
> Not sure I follow the question, really. If you have a DMA master
> registered as an AMBA device (e.g. PL330) and it wants to use an SMMU,
> then you need to be registered on that bus.
> 
> What would you prefer instead?
> 
> Will
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the
> body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 11:18, Gleb Natapov ha scritto:
> On Mon, Sep 09, 2013 at 10:51:58AM +0200, Paolo Bonzini wrote:
>> Il 08/09/2013 13:52, Gleb Natapov ha scritto:
>>> On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote:
 QEMU moves state from CPUArchState to struct kvm_xsave and back when it
 invokes the KVM_*_XSAVE ioctls.  Because it doesn't treat the XSAVE
 region as an opaque blob, it might be impossible to set some state on
 the destination if migrating to an older version.

 This patch blocks migration if it finds that unsupported bits are set
 in the XSTATE_BV header field.  To make this work robustly, QEMU should
 only report in env->xstate_bv those fields that will actually end up
 in the migration stream.
>>>
>>> We usually handle host cpu differences in cpuid layer, not by trying to
>>> validate migration data.
>>
>> Actually we do both.  QEMU for example detects invalid subsections and
>> blocks migration, and CPU differences also result in subsections that
>> the destination does not know.
>>
> That's different from what you do here though. If xstate_bv was in its
> separate subsection things would be easier, but it is not.

I agree.  And also if YMM was in its separate subsections; future XSAVE
states will likely use subsections (whose presence is keyed off bits in
env->xstate_bv).

>> However, KVM_GET/SET_XSAVE should still return all values supported by
>> the hypervisor, independent of the supported CPUID bits.
>
> Why?

Because this is not talking to the guest, it is talking to userspace.

The VCPU state is more than what is visible to the guest, and returning
all of it seems more consistent with the rest of the KVM API.  For
example, KVM_GET_FPU always returns SSE state even if the CPUID lacks
SSE and/or FXSR.

>> A well-behaved guest should not have modified that state anyway, since:
>>
>> * the source and destination machines should have the same CPU
>>
>> * since the destination QEMU does not support the feature, the source
>> should have masked it as well
>>
>> * the guest should always probe CPUID before using a feature
>>
> The I fail to see what is the purpose of the patch. I see two cases:
> 1. Each extended state has separate CPUID bit (is this guarantied?)

Not guaranteed, but it has always happened so far (AVX, AVX-512, MPX).

>   - In this case, as you say, by matching CPUID on src and dst we guaranty
> that migration data is good.

But we don't match CPUID on src and destination.  This is something that
the user should do, but it's better if we can test it too.  Subsections
do that for us; I am, in some sense, emulating subsections for the XSAVE
states that are not stored in subsections.

>> In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
>> and we should hide all features that are not visible in CPUID.  It is
>> okay, however, to test it in cpu_post_load.
> 
> The kernel should not even return state that is not visible in CPUID.

That's an interesting point of view that I hadn't considered.  But just
like you asked me why it should return state that is not visible in
CPUID, I'm asking you why it should not...

Paolo

>>
>> Paolo
>>
  memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
  sizeof env->ymmh_regs);
  return 0;
 diff --git a/target-i386/machine.c b/target-i386/machine.c
 index dc81cde..9e2cfcf 100644
 --- a/target-i386/machine.c
 +++ b/target-i386/machine.c
 @@ -278,6 +278,10 @@ static int cpu_post_load(void *opaque, int version_id)
  CPUX86State *env = &cpu->env;
  int i;
  
 +if (env->xstate_bv & ~XSTATE_SUPPORTED) {
 +return -EINVAL;
 +}
 + 
  /*
   * Real mode guest segments register DPL should be zero.
   * Older KVM version were setting it wrongly.
 -- 
 1.8.3.1
>>>
>>> --
>>> Gleb.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
> 
> --
>   Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 11:03, Gleb Natapov ha scritto:
> On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
>> Il 08/09/2013 13:40, Gleb Natapov ha scritto:
>>> On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
 On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
 and not restore anything.

>>> XRSTOR restores FP/SSE state to reset state if no bits are set in
>>> xstate_bv. This is what should happen on reset, no?
>>
>> Yes. The problem happens on the migration destination when XSAVE data is
>> not transmitted.  FP/SSE data is transmitted and must be restored, but
>> xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
>> state.  The vcpu then loses the values that were set in the migration data.
>>
 Since FP and SSE data are always valid, set them in xstate_bv at reset
 time.  In fact, that value is the same that KVM_GET_XSAVE returns on
 pre-XSAVE hosts.
>>> It is needed for migration between non xsave host to xsave host.
>>
>> Yes, and this patch does the same for migration between non-XSAVE QEMU
>> and XSAVE QEMU.
>>
> Can such migration happen? The commit that added xsave support
> (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.

Yes, old->new migration can happen.  New->old of course cannot.

>> In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
>> xstate_bv when XSAVE is not available.  Instead, it should reset the
>> FXSAVE data to processor-reset values (except for MXCSR which always
>> comes from XRSTOR data), i.e. to all-zeros except for the x87 control
>> and tag words.  It should also check reserved bits of MXCSR.
>
> I do not see why.

Because otherwise it behaves in a subtly different manner for XSAVE and
non-XSAVE hosts.

>> Yes.  QEMU unmarshals information from the XSAVE region and back, so it
>> cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, though.
>>
> IMO this is the main issue here, not separate bug. If we gonna let guest
> use CPU state QEMU does not support we gonna have a bad time.

We cannot force the guest not to use a feature; all we can do is hide
the CPUID bits so that a well-behaved guest will not use it.  QEMU does
hide CPUID bits for non-supported XSAVE states, except for "-cpu host".
 So this will not be a problem except with "-cpu host".

Paolo

[Qemu-devel] [PATCH v2] ehci: save device pointer in EHCIState

2013-09-09 Thread Gerd Hoffmann

We'll need a pointer to the actual pci/sysbus device,
stick a pointer to it into the EHCIState struct.

https://bugzilla.redhat.com/show_bug.cgi?id=1005495

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-ehci.c | 7 +++
 hw/usb/hcd-ehci.h | 1 +
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 137e200..22bdbf4 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1241,13 +1241,11 @@ static int ehci_init_transfer(EHCIPacket *p)
 {
 uint32_t cpage, offset, bytes, plen;
 dma_addr_t page;
-USBBus *bus = &p->queue->ehci->bus;
-BusState *qbus = BUS(bus);
 
 cpage  = get_field(p->qtd.token, QTD_TOKEN_CPAGE);
 bytes  = get_field(p->qtd.token, QTD_TOKEN_TBYTES);
 offset = p->qtd.bufptr[0] & ~QTD_BUFPTR_MASK;
-qemu_sglist_init(&p->sgl, qbus->parent, 5, p->queue->ehci->as);
+qemu_sglist_init(&p->sgl, p->queue->ehci->device, 5, p->queue->ehci->as);
 
 while (bytes > 0) {
 if (cpage > 4) {
@@ -1486,7 +1484,7 @@ static int ehci_process_itd(EHCIState *ehci,
 return -1;
 }
 
-qemu_sglist_init(&ehci->isgl, DEVICE(ehci), 2, ehci->as);
+qemu_sglist_init(&ehci->isgl, ehci->device, 2, ehci->as);
 if (off + len > 4096) {
 /* transfer crosses page border */
 uint32_t len2 = off + len - 4096;
@@ -2529,6 +2527,7 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, 
Error **errp)
 
 s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_frame_timer, s);
 s->async_bh = qemu_bh_new(ehci_frame_timer, s);
+s->device = dev;
 
 qemu_register_reset(ehci_reset, s);
 qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h
index 15a28e8..065c9fa 100644
--- a/hw/usb/hcd-ehci.h
+++ b/hw/usb/hcd-ehci.h
@@ -255,6 +255,7 @@ typedef QTAILQ_HEAD(EHCIQueueHead, EHCIQueue) EHCIQueueHead;
 
 struct EHCIState {
 USBBus bus;
+DeviceState *device;
 qemu_irq irq;
 MemoryRegion mem;
 AddressSpace *as;
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] ehci: save device pointer in EHCIState

2013-09-09 Thread Gerd Hoffmann

On Mo, 2013-09-09 at 11:04 +0200, Hans de Goede wrote:
> Hi,
> 
> On 09/09/2013 10:20 AM, Gerd Hoffmann wrote:
> > We'll need a pointer to the actual pci/sysbus device,
> > stick a pointer to it into the EHCIState struct.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1005495
> 
> Looks like we've been working on exactly the same bug at the
> same time, but we've come up with slightly different solutions.

Yep, and then "git send-email" exactly the same minute ;)

> If we're going to go this way (which seems the best), you
> should also modify the qemu_sglist_init call in
> ehci_init_transfer for consistency, and drop the bus and qbus
> variable declarations at the top of the functions as those will
> be unused then.

Makes sense indeed.  Done, v2 sent.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] ehci: Fix crash with isoc usb packets

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 10:20, Hans de Goede ha scritto:
> The isoc packet path in the ehci code has a bad qobject cast, causing an
> abort, this patch fixes this.
> 
> Note this problem is backported in 1.6.0 too, and this patch should be
> backported to the 1.6.0 stable tree.
> 
> Signed-off-by: Hans de Goede 
> ---
>  hw/usb/hcd-ehci.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> index 010a0d0..77c4872 100644
> --- a/hw/usb/hcd-ehci.c
> +++ b/hw/usb/hcd-ehci.c
> @@ -1486,7 +1486,8 @@ static int ehci_process_itd(EHCIState *ehci,
>  return -1;
>  }
>  
> -qemu_sglist_init(&ehci->isgl, DEVICE(ehci), 2, ehci->as);
> +qemu_sglist_init(&ehci->isgl, BUS(&ehci->bus)->parent,
> + 2, ehci->as);
>  if (off + len > 4096) {
>  /* transfer crosses page border */
>  uint32_t len2 = off + len - 4096;
> 

... then qemu-stable should be CCed.

Paolo

[Qemu-devel] [PATCH] block/qcow2: Use bdrv_truncate for size amend

2013-09-09 Thread Max Reitz

When amending the size option for a qcow2 image, use bdrv_truncate
instead of qcow2_truncate directly, since the latter will not adjust the
total_sectors count in the BDS structure (whereas the former will).

Signed-off-by: Max Reitz 
---
Depends on (follow-up to):
 - block/qcow2: Image file option amendment (series, v5)
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d29547b..259edfd 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1997,7 +1997,7 @@ static int qcow2_amend_options(BlockDriverState *bs,
 }
 
 if (new_size) {
-ret = qcow2_truncate(bs, new_size);
+ret = bdrv_truncate(bs, new_size);
 if (ret < 0) {
 return ret;
 }
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist

2013-09-09 Thread Paolo Bonzini

Il 06/09/2013 20:41, Eduardo Otubo ha scritto:
> Hello,
> 
> Any chance to get this patch applied?
> 
> Thanks!

Paul, perhaps you can add yourself to MAINTAINERS and send a pull request?

Paolo

> On 09/04/2013 11:11 AM, Paul Moore wrote:
>> On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote:
>>> This was causing Qemu process to hang when using -sandbox on.
>>>
>>> Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175
>>>
>>> Signed-off-by: Eduardo Otubo 
>>
>> Works for me.
>>
>> Tested-by: Paul Moore 
>>
>>> ---
>>>   qemu-seccomp.c |1 +
>>>   1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
>>> index 37d38f8..69cee44 100644
>>> --- a/qemu-seccomp.c
>>> +++ b/qemu-seccomp.c
>>> @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall
>>> seccomp_whitelist[]
>>> = { { SCMP_SYS(getuid), 245 },
>>>   { SCMP_SYS(geteuid), 245 },
>>>   { SCMP_SYS(timer_create), 245 },
>>> +{ SCMP_SYS(times), 245 },
>>>   { SCMP_SYS(exit), 245 },
>>>   { SCMP_SYS(clock_gettime), 245 },
>>>   { SCMP_SYS(time), 245 },
>>
>

Re: [Qemu-devel] [PATCH 0/9] Remove legacy unaligned bswap functions

2013-09-09 Thread Peter Maydell

Ping!

thanks
-- PMM


On 25 August 2013 15:59, Peter Maydell  wrote:
> The bswap.h header includes a set of "legacy unaligned functions"
> that (since commit c732a52d3 at the beginning of this year) are
> just wrappers for underlying {ld,st} functions. The legacy
> functions aren't used in many places, so just replace all their
> uses with uses of the new-style {ld,st} functions; this lets us
> remove the legacy wrappers altogether.
>
> Since we know the {ld,st}* routines are definitely functions,
> we can in the process remove some casts which were left over
> from when the legacy unaligned functions were previously macros.
>
> The patchset is divided up by function being removed, rather
> than by which device/subsystem is being fixed; I think this way
> round is easier to review since you only have to keep one
> substitution in your head when reading a patch.
>
> Peter Maydell (9):
>   bswap.h: Remove cpu_to_le16wu()
>   bswap.h: Remove cpu_to_le32wu()
>   bswap.h: Remove le16_to_cpupu()
>   bswap.h: Remove le32_to_cpupu()
>   bswap.h: Remove be32_to_cpupu()
>   bswap.h: Remove cpu_to_be16wu()
>   bswap.h: Remove cpu_to_be32wu()
>   bswap.h: Remove cpu_to_be64wu()
>   bswap.h: Remove cpu_to_32wu()
>
>  block/qcow2-cluster.c |2 +-
>  hw/acpi/core.c|3 +--
>  hw/block/cdrom.c  |   10 +-
>  hw/display/vga_template.h |   14 --
>  hw/ide/atapi.c|   16 +++
>  hw/net/e1000.c|   22 +
>  hw/net/ne2000.c   |4 ++--
>  hw/pci/pcie_aer.c |4 ++--
>  include/hw/pci/pci.h  |8 
>  include/qemu/bswap.h  |   47 
> -
>  10 files changed, 40 insertions(+), 90 deletions(-)
>
> --
> 1.7.9.5

Re: [Qemu-devel] [RFC PATCH v2 1/6] make.rule: fix $(obj) to a real relative path

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 03:34, Fam Zheng ha scritto:
> On Fri, 09/06 20:19, Lluís Vilanova wrote:
>> Fam Zheng writes:
>> [...]
>>> Because $(obj) here is './block', instead of '../block'. This doesn't
>>> hurt compiling because we basically build all .o from top Makefile,
>>> before entering Makefile.target, but it will affact arriving per-object
>>> libs support.
>>
>> I'm curious. What's the reason to not use recursive make in QEMU?
>>
> I don't know the answer, Paolo?

It predates my involvement by a long time, so I don't know.

But my guess is that whenever directories are not present in the build
tree (e.g. i386-softmmu/hw) we have to create the Makefile in the
configure script.  Thus a heavily declarative Makefile style works better.

Paolo

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 11:50:03AM +0200, Paolo Bonzini wrote:
> Il 09/09/2013 11:18, Gleb Natapov ha scritto:
> > On Mon, Sep 09, 2013 at 10:51:58AM +0200, Paolo Bonzini wrote:
> >> Il 08/09/2013 13:52, Gleb Natapov ha scritto:
> >>> On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote:
>  QEMU moves state from CPUArchState to struct kvm_xsave and back when it
>  invokes the KVM_*_XSAVE ioctls.  Because it doesn't treat the XSAVE
>  region as an opaque blob, it might be impossible to set some state on
>  the destination if migrating to an older version.
> 
>  This patch blocks migration if it finds that unsupported bits are set
>  in the XSTATE_BV header field.  To make this work robustly, QEMU should
>  only report in env->xstate_bv those fields that will actually end up
>  in the migration stream.
> >>>
> >>> We usually handle host cpu differences in cpuid layer, not by trying to
> >>> validate migration data.
> >>
> >> Actually we do both.  QEMU for example detects invalid subsections and
> >> blocks migration, and CPU differences also result in subsections that
> >> the destination does not know.
> >>
> > That's different from what you do here though. If xstate_bv was in its
> > separate subsection things would be easier, but it is not.
> 
> I agree.  And also if YMM was in its separate subsections; future XSAVE
> states will likely use subsections (whose presence is keyed off bits in
> env->xstate_bv).
> 
> >> However, KVM_GET/SET_XSAVE should still return all values supported by
> >> the hypervisor, independent of the supported CPUID bits.
> >
> > Why?
> 
> Because this is not talking to the guest, it is talking to userspace.
> 
> The VCPU state is more than what is visible to the guest, and returning
If a state does not affect guest in any way there is not reason to
migrate it.

> all of it seems more consistent with the rest of the KVM API.  For
> example, KVM_GET_FPU always returns SSE state even if the CPUID lacks
> SSE and/or FXSR.
There are counter examples too :) If APIC is not created we do not return
fake information on GET_IRQCHIP.  I think nobody expected FPU state to
grow indefinitely, so fixed, inflexible API was introduced, but now,
when CPU state has flexible extended state management it make sense to
model it in the API too.

> 
> >> A well-behaved guest should not have modified that state anyway, since:
> >>
> >> * the source and destination machines should have the same CPU
> >>
> >> * since the destination QEMU does not support the feature, the source
> >> should have masked it as well
> >>
> >> * the guest should always probe CPUID before using a feature
> >>
> > The I fail to see what is the purpose of the patch. I see two cases:
> > 1. Each extended state has separate CPUID bit (is this guarantied?)
> 
> Not guaranteed, but it has always happened so far (AVX, AVX-512, MPX).
> 
OK, So for now no need to make 0D configurable, but we need to provide
correct one according to those flags, not to passthrough host values.
 
> >   - In this case, as you say, by matching CPUID on src and dst we guaranty
> > that migration data is good.
> 
> But we don't match CPUID on src and destination.  This is something that
Yes, I was saying that management infrastructure already knows how to handle
it.

> the user should do, but it's better if we can test it too.  Subsections
> do that for us; I am, in some sense, emulating subsections for the XSAVE
> states that are not stored in subsections.
We do not do it for other bits. It is possible currently to migrate to a
slightly different cpu without failure and it may cause guest to crash,
but we are not trying actively to catch those situations. Why XSAVE is
different?

> 
> >> In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
> >> and we should hide all features that are not visible in CPUID.  It is
> >> okay, however, to test it in cpu_post_load.
> > 
> > The kernel should not even return state that is not visible in CPUID.
> 
> That's an interesting point of view that I hadn't considered.  But just
> like you asked me why it should return state that is not visible in
> CPUID, I'm asking you why it should not...
> 
For number of reasons. First because since a sate is not used there is no
point in migrating it. Second to make interface more deterministic for
QEMU. i.e QEMU configures only features it supports and gets
exactly same state from the kernel no matter what host cpu is and what
kernel version is. This patch will not be needed since kernel will do
the job.

--
Gleb.

Re: [Qemu-devel] [RFC PATCH v2 3/6] Makefile: introduce common-obj-m and block-obj-m for DSO

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 04:26, Fam Zheng ha scritto:
> Actually I would like a try to keep $i-{obj,cflags,libs}:
> 
> foo-obj := bar.o biz.o
> foo-libs := $(FOO_LIBS)
> foo-cflags := $(FOO_CFLAGS)
> 
> getting rid of all the $(obj)/'s and expand them in the toplevel unnest magic
> (because we know foo.mo is contained in local $(block-obj-y) so it's all
> trackable).
> 
> If this idea can't work out, I'll use your suggested way.

Yes, either way works (i.e. as long as it's consistent).

Paolo

Re: [Qemu-devel] [RFC PATCH v2 1/6] make.rule: fix $(obj) to a real relative path

2013-09-09 Thread Peter Maydell

On 6 September 2013 18:19, Lluís Vilanova  wrote:
> Fam Zheng writes:
> [...]
>> Because $(obj) here is './block', instead of '../block'. This doesn't
>> hurt compiling because we basically build all .o from top Makefile,
>> before entering Makefile.target, but it will affact arriving per-object
>> libs support.
>
> I'm curious. What's the reason to not use recursive make in QEMU?

The classic paper is "Recursive make considered harmful":
http://aegis.sourceforge.net/auug97.pdf

-- PMM

Re: [Qemu-devel] [PATCH v2] ehci: save device pointer in EHCIState

2013-09-09 Thread Hans de Goede


Hi,

On 09/09/2013 11:57 AM, Gerd Hoffmann wrote:

We'll need a pointer to the actual pci/sysbus device,
stick a pointer to it into the EHCIState struct.

https://bugzilla.redhat.com/show_bug.cgi?id=1005495


Looks good, note you've forgotten to add qemu-stable,
I've done so now.

Acked-by: Hans de Goede 

Regards,

Hans



Signed-off-by: Gerd Hoffmann 
---
  hw/usb/hcd-ehci.c | 7 +++
  hw/usb/hcd-ehci.h | 1 +
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 137e200..22bdbf4 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1241,13 +1241,11 @@ static int ehci_init_transfer(EHCIPacket *p)
  {
  uint32_t cpage, offset, bytes, plen;
  dma_addr_t page;
-USBBus *bus = &p->queue->ehci->bus;
-BusState *qbus = BUS(bus);

  cpage  = get_field(p->qtd.token, QTD_TOKEN_CPAGE);
  bytes  = get_field(p->qtd.token, QTD_TOKEN_TBYTES);
  offset = p->qtd.bufptr[0] & ~QTD_BUFPTR_MASK;
-qemu_sglist_init(&p->sgl, qbus->parent, 5, p->queue->ehci->as);
+qemu_sglist_init(&p->sgl, p->queue->ehci->device, 5, p->queue->ehci->as);

  while (bytes > 0) {
  if (cpage > 4) {
@@ -1486,7 +1484,7 @@ static int ehci_process_itd(EHCIState *ehci,
  return -1;
  }

-qemu_sglist_init(&ehci->isgl, DEVICE(ehci), 2, ehci->as);
+qemu_sglist_init(&ehci->isgl, ehci->device, 2, ehci->as);
  if (off + len > 4096) {
  /* transfer crosses page border */
  uint32_t len2 = off + len - 4096;
@@ -2529,6 +2527,7 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, 
Error **errp)

  s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_frame_timer, s);
  s->async_bh = qemu_bh_new(ehci_frame_timer, s);
+s->device = dev;

  qemu_register_reset(ehci_reset, s);
  qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h
index 15a28e8..065c9fa 100644
--- a/hw/usb/hcd-ehci.h
+++ b/hw/usb/hcd-ehci.h
@@ -255,6 +255,7 @@ typedef QTAILQ_HEAD(EHCIQueueHead, EHCIQueue) EHCIQueueHead;

  struct EHCIState {
  USBBus bus;
+DeviceState *device;
  qemu_irq irq;
  MemoryRegion mem;
  AddressSpace *as;

[Qemu-devel] [PATCH] e1000: NetClientInfo.receive_iov implemented

2013-09-09 Thread Vincenzo Maffione

This patch implements the NetClientInfo.receive_iov method for the
e1000 device emulation. In this way a network backend that uses
qemu_sendv_packet() can deliver the fragmented packet without
requiring an additional copy in the frontend/backend network code
(nc_sendv_compat() function).

The existing method NetClientInfo.receive has been reimplemented
using the new method.

Signed-off-by: Vincenzo Maffione 
---
 hw/net/e1000.c | 70 --
 1 file changed, 58 insertions(+), 12 deletions(-)

I propose this patch also because our research group (University of Pisa,
Department of Computer Engineering) is working on the e1000 device
(optimizations and paravirtual extensions) and we have patches to
support the VALE switch as a network backend (see
http://info.iet.unipi.it/~luigi/vale/).
The VALE backend uses qemu_sendv_packet() to send fragmented packets: For
this reason we think it could be interesting to better support these packets
with e1000.

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index f5ebed4..70ab17f 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -32,6 +32,7 @@
 #include "hw/loader.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/dma.h"
+#include "qemu/iov.h"
 
 #include "e1000_regs.h"
 
@@ -64,6 +65,8 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
 /* this is the size past which hardware will drop packets when setting LPE=1 */
 #define MAXIMUM_ETHERNET_LPE_SIZE 16384
 
+#define MAXIMUM_ETHERNET_HDR_LEN (14+4)
+
 /*
  * HW models:
  *  E1000_DEV_ID_82540EM works with Windows and Linux
@@ -825,7 +828,7 @@ static uint64_t rx_desc_base(E1000State *s)
 }
 
 static ssize_t
-e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 {
 E1000State *s = qemu_get_nic_opaque(nc);
 PCIDevice *d = PCI_DEVICE(s);
@@ -834,11 +837,14 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 unsigned int n, rdt;
 uint32_t rdh_start;
 uint16_t vlan_special = 0;
-uint8_t vlan_status = 0, vlan_offset = 0;
+uint8_t vlan_status = 0;
 uint8_t min_buf[MIN_BUF_SIZE];
 size_t desc_offset;
 size_t desc_size;
 size_t total_size;
+size_t size = iov_size(iov, iovcnt), iov_ofs = 0;
+struct iovec iv;
+uint8_t *filter_buf = iov->iov_base;
 
 if (!(s->mac_reg[STATUS] & E1000_STATUS_LU)) {
 return -1;
@@ -850,10 +856,16 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 
 /* Pad to minimum Ethernet frame length */
 if (size < sizeof(min_buf)) {
-memcpy(min_buf, buf, size);
+iov_to_buf(iov, iovcnt, 0, min_buf, size);
 memset(&min_buf[size], 0, sizeof(min_buf) - size);
-buf = min_buf;
-size = sizeof(min_buf);
+iv.iov_base = filter_buf = min_buf;
+iv.iov_len = size = sizeof(min_buf);
+iovcnt = 1;
+iov = &iv;
+} else if (iov->iov_len < MAXIMUM_ETHERNET_HDR_LEN) {
+/* This is very unlikely, but may happen. */
+iov_to_buf(iov, iovcnt, 0, min_buf, MAXIMUM_ETHERNET_HDR_LEN);
+filter_buf = min_buf;
 }
 
 /* Discard oversized packets if !LPE and !SBP. */
@@ -864,14 +876,24 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 return size;
 }
 
-if (!receive_filter(s, buf, size))
+if (!receive_filter(s, filter_buf, size)) {
 return size;
+}
 
-if (vlan_enabled(s) && is_vlan_packet(s, buf)) {
-vlan_special = cpu_to_le16(be16_to_cpup((uint16_t *)(buf + 14)));
-memmove((uint8_t *)buf + 4, buf, 12);
+if (vlan_enabled(s) && is_vlan_packet(s, filter_buf)) {
+vlan_special = cpu_to_le16(be16_to_cpup((uint16_t *)(filter_buf
++ 14)));
+iov_ofs = 4;
+if (filter_buf == iov->iov_base) {
+memmove(filter_buf + 4, filter_buf, 12);
+} else {
+iov_from_buf(iov, iovcnt, 4, filter_buf, 12);
+while (iov->iov_len <= iov_ofs) {
+iov_ofs -= iov->iov_len;
+iov++;
+}
+}
 vlan_status = E1000_RXD_STAT_VP;
-vlan_offset = 4;
 size -= 4;
 }
 
@@ -893,12 +915,24 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 desc.status |= (vlan_status | E1000_RXD_STAT_DD);
 if (desc.buffer_addr) {
 if (desc_offset < size) {
+size_t iov_copy, copied = 0;
+hwaddr ba = le64_to_cpu(desc.buffer_addr);
 size_t copy_size = size - desc_offset;
 if (copy_size > s->rxbuf_size) {
 copy_size = s->rxbuf_size;
 }
-pci_dma_write(d, le64_to_cpu(desc.buffer_addr),
-  buf + desc_offset + vlan_offset, copy_size);
+do {
+iov_copy =

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 11:53:45AM +0200, Paolo Bonzini wrote:
> Il 09/09/2013 11:03, Gleb Natapov ha scritto:
> > On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
> >> Il 08/09/2013 13:40, Gleb Natapov ha scritto:
> >>> On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
>  On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
>  and not restore anything.
> 
> >>> XRSTOR restores FP/SSE state to reset state if no bits are set in
> >>> xstate_bv. This is what should happen on reset, no?
> >>
> >> Yes. The problem happens on the migration destination when XSAVE data is
> >> not transmitted.  FP/SSE data is transmitted and must be restored, but
> >> xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
> >> state.  The vcpu then loses the values that were set in the migration data.
> >>
>  Since FP and SSE data are always valid, set them in xstate_bv at reset
>  time.  In fact, that value is the same that KVM_GET_XSAVE returns on
>  pre-XSAVE hosts.
> >>> It is needed for migration between non xsave host to xsave host.
> >>
> >> Yes, and this patch does the same for migration between non-XSAVE QEMU
> >> and XSAVE QEMU.
> >>
> > Can such migration happen? The commit that added xsave support
> > (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.
> 
> Yes, old->new migration can happen.  New->old of course cannot.
> 
I see. I am fine with the patch, but please drop defines that are not
used in the patch itself.

> >> In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
> >> xstate_bv when XSAVE is not available.  Instead, it should reset the
> >> FXSAVE data to processor-reset values (except for MXCSR which always
> >> comes from XRSTOR data), i.e. to all-zeros except for the x87 control
> >> and tag words.  It should also check reserved bits of MXCSR.
> >
> > I do not see why.
> 
> Because otherwise it behaves in a subtly different manner for XSAVE and
> non-XSAVE hosts.
I do not see how. Can you elaborate?

> 
> >> Yes.  QEMU unmarshals information from the XSAVE region and back, so it
> >> cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
> >> though.
> >>
> > IMO this is the main issue here, not separate bug. If we gonna let guest
> > use CPU state QEMU does not support we gonna have a bad time.
> 
> We cannot force the guest not to use a feature; all we can do is hide
Of course we can't, this is correct for other features too, but this is
guest's problem.

> the CPUID bits so that a well-behaved guest will not use it.  QEMU does
> hide CPUID bits for non-supported XSAVE states, except for "-cpu host".
>  So this will not be a problem except with "-cpu host".
> 

--
Gleb.

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 01:54:50PM +0300, Gleb Natapov wrote:
> On Mon, Sep 09, 2013 at 11:53:45AM +0200, Paolo Bonzini wrote:
> > Il 09/09/2013 11:03, Gleb Natapov ha scritto:
> > > On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
> > >> Il 08/09/2013 13:40, Gleb Natapov ha scritto:
> > >>> On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
> >  On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
> >  and not restore anything.
> > 
> > >>> XRSTOR restores FP/SSE state to reset state if no bits are set in
> > >>> xstate_bv. This is what should happen on reset, no?
> > >>
> > >> Yes. The problem happens on the migration destination when XSAVE data is
> > >> not transmitted.  FP/SSE data is transmitted and must be restored, but
> > >> xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
> > >> state.  The vcpu then loses the values that were set in the migration 
> > >> data.
> > >>
> >  Since FP and SSE data are always valid, set them in xstate_bv at reset
> >  time.  In fact, that value is the same that KVM_GET_XSAVE returns on
> >  pre-XSAVE hosts.
> > >>> It is needed for migration between non xsave host to xsave host.
> > >>
> > >> Yes, and this patch does the same for migration between non-XSAVE QEMU
> > >> and XSAVE QEMU.
> > >>
> > > Can such migration happen? The commit that added xsave support
> > > (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.
> > 
> > Yes, old->new migration can happen.  New->old of course cannot.
> > 
> I see. I am fine with the patch, but please drop defines that are not
> used in the patch itself.
> 
BTW migration question, will xstate_bv no be zeroed by migration code in
old->new case?
 
--
Gleb.

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 12:54, Gleb Natapov ha scritto:
> On Mon, Sep 09, 2013 at 11:53:45AM +0200, Paolo Bonzini wrote:
>> Il 09/09/2013 11:03, Gleb Natapov ha scritto:
>>> On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
 Il 08/09/2013 13:40, Gleb Natapov ha scritto:
> On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
>> On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
>> and not restore anything.
>>
> XRSTOR restores FP/SSE state to reset state if no bits are set in
> xstate_bv. This is what should happen on reset, no?

 Yes. The problem happens on the migration destination when XSAVE data is
 not transmitted.  FP/SSE data is transmitted and must be restored, but
 xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
 state.  The vcpu then loses the values that were set in the migration data.

>> Since FP and SSE data are always valid, set them in xstate_bv at reset
>> time.  In fact, that value is the same that KVM_GET_XSAVE returns on
>> pre-XSAVE hosts.
> It is needed for migration between non xsave host to xsave host.

 Yes, and this patch does the same for migration between non-XSAVE QEMU
 and XSAVE QEMU.

>>> Can such migration happen? The commit that added xsave support
>>> (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.
>>
>> Yes, old->new migration can happen.  New->old of course cannot.
>>
> I see. I am fine with the patch, but please drop defines that are not
> used in the patch itself.

Ok.

(For the "BTW" question, xstate_bv will not be zeroed, it will remain to
the default value).

 In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
 xstate_bv when XSAVE is not available.  Instead, it should reset the
 FXSAVE data to processor-reset values (except for MXCSR which always
 comes from XRSTOR data), i.e. to all-zeros except for the x87 control
 and tag words.  It should also check reserved bits of MXCSR.
>>>
>>> I do not see why.
>>
>> Because otherwise it behaves in a subtly different manner for XSAVE and
>> non-XSAVE hosts.
> 
> I do not see how. Can you elaborate?

Suppose userspace calls KVM_SET_XSAVE with XSTATE_BV=0.

On an XSAVE host, when the guest FPU state is loaded KVM will do an
XRSTOR.  The XRSTOR will restore the FPU state to default values.

On a non-XSAVE host, when the guest FPU state is loaded KVM will do an
FXRSTR.  The FXRSTR will load the FPU state from the first 512 bytes of
the block that was passed to KVM_SET_XSAVE.

This is not a problem because userspace will usually pass to
KVM_SET_XSAVE only something that it got from KVM_GET_XSAVE, and
KVM_GET_XSAVE will never set XSTATE_BV=0.  However, KVM_SET_XSAVE is
supposed to emulate XSAVE/XRSTOR if it is not available, and it is
failing to emulate this detail.

 Yes.  QEMU unmarshals information from the XSAVE region and back, so it
 cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
 though.

>>> IMO this is the main issue here, not separate bug. If we gonna let guest
>>> use CPU state QEMU does not support we gonna have a bad time.
>>
>> We cannot force the guest not to use a feature; all we can do is hide
> 
> Of course we can't, this is correct for other features too, but this is
> guest's problem.

Ok, then we agree that QEMU doesn't have a problem?  The XSAVE data will
always be "fresh" as long as the guest obeys CPUID bits it receives, and
the CPUID bits that QEMU passes will never enable XSAVE data that QEMU
does not support.

Paolo

Re: [Qemu-devel] Questions about hvm domU default devices with upstream qemu

2013-09-09 Thread Stefano Stabellini

On Thu, 5 Sep 2013, Fabio Fantoni wrote:
> Il 05/09/2013 16:04, Stefano Stabellini ha scritto:
> > On Thu, 5 Sep 2013, Fabio Fantoni wrote:
> > > > > About qemu default empty and unusable floppy and cdrom I have already
> > > > > reply at
> > > > > start of year that qemu traditional does not have them.
> > > > > The tests were with xl only if I remember good, now I checked also on
> > > > > old
> > > > > production server with xend and qemu traditional. I confirm that hvm
> > > > > domUs
> > > > > is
> > > > > without empty floppy and cdrom also in that case.
> > > > > 
> > > > > cd-insert works (tested with xl and upstream qemu). cd-insert does not
> > > > > works
> > > > > only with qemu default empty cdrom because is undefined on xen side.
> > > > So HVM domUs do NOT have a cdrom drive by default with qemu-traditional.
> > > Exact.
> > > 
> > > > Do they have an empty cdrom drive by default with qemu-xen?
> > > Yes
> > > 
> > > > Is the lack of a cdrom drive the reason why cd-insert doesn't work by
> > > > default (unless you specify an empty cdrom drive in the VM config file)?
> > > Yes.
> > > xl cd-insert command works with cdrom devices already present on domU
> > > defined
> > > on xl file configuration.
> > > The cdrom can be empty or not, the only exception is the empty cdrom added
> > > default by qemu that is not usable with cd-insert because not defined on
> > > xl
> > > file configuration.
> > I see. That should be fixed, by either removing the empty cdrom drive or
> > making it work properly with xl cd-insert.
> > I don't have a strong opinion on this. Given that qemu-traditional
> > doesn't have a cdrom drive by default, it might make sense to do the
> > same on qemu-xen.
> 
> I think the best choice is to remove the qemu default empty cdrom, if possible
> without another workaround as for floppy but considering to change the method
> hvm domUs starts qemu.
> I think that is probably better to add on qemu start command only components
> already defined xen side in order to avoid creation of unusable components
> added by qemu itself or mismatches on both sides.
> 
> Another actual problem is that is difficult to know if the qemu binary that
> should be used on xl create has the features/components requested and if they
> are compiled. Actually on xl create we can have only qemu log file after domU
> create failing.
> Why not expose all features/components of a given qemu binary and let xl or
> other toolstack (not only xen) query them before start qemu itself?

You are right, it would be nice to be able to know exactly what the qemu
binary supports. However it's difficult to do because Linux distros are
free to choose any QEMU version they like and any set of compile time
options they want for qemu-xen.
As a consequence we would actually need to start QEMU and issue QMP
commands to figure out what is supported.  It would slow down VM
creation too much, so we would probably need to do this at host boot
time and store the settings somewhere on disk or on xenstore.

[Qemu-devel] [PATCH RFC v2 0/3] pci: complete master abort protocol

2013-09-09 Thread Marcel Apfelbaum

Note: The series is incomplete, for review only

PCI spec requires that a transaction that has not been claimed
by any PCI bus devices will be terminated by the initiator
with "master abort". For read transactions -1() is returned and 
writes are silently dropped.

The series deals also with the other aspect of the master abort scenario:
Upon completion the master has to raise RECEIVED MASTER ABORT BIT in
initiator's STATUS register.

Implementation:
 - Allowed the MemoryRegion priority to be negative so a subregion will be
   visible on all the addresses not covered by the parent MemoryRegion
   or other subregions.
 - Added a memory region with negative priority that extends over all the
   pci address space. This region catches all the accesses
   to the unassigned pci addresses.
 - The MemoryRegion's ops emulates the master abort scenario.

Note:
For the moment the code assumes that all the reads/writes to
pci address space are done by the cpu.

Changes from v1:
 - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not on
various Host Bridges
 - "pci-unassgined-mem" does not have a ".valid.accept" field and
implements read write methods

Marcel Apfelbaum (3):
  memory: allow MemoryRegion's priority field to accept negative values
  hw/pci: add MemoryRegion ops for unassigned pci addresses
  hw/pci-host: catch acesses to unassigned pci addresses

 hw/pci-host/piix.c|  8 
 hw/pci-host/q35.c | 19 ---
 hw/pci/pci.c  | 18 ++
 include/exec/memory.h |  6 +++---
 include/hw/pci-host/q35.h |  1 +
 include/hw/pci/pci.h  |  3 +++
 memory.c  |  2 +-
 7 files changed, 50 insertions(+), 7 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v2 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Marcel Apfelbaum

Priority is used to make visible some subregions by obscuring
the parent MemoryRegion addresses overlapping with the subregion.

By allowing the priority to be negative the opposite can be done:
Allow a subregion to be visible on all the addresses not covered
by the parent MemoryRegion or other subregions.

Signed-off-by: Marcel Apfelbaum 
---
 include/exec/memory.h | 6 +++---
 memory.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index ebe0d24..6995087 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -153,7 +153,7 @@ struct MemoryRegion {
 bool flush_coalesced_mmio;
 MemoryRegion *alias;
 hwaddr alias_offset;
-unsigned priority;
+int priority;
 bool may_overlap;
 QTAILQ_HEAD(subregions, MemoryRegion) subregions;
 QTAILQ_ENTRY(MemoryRegion) subregions_link;
@@ -193,7 +193,7 @@ struct MemoryListener {
 void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection 
*section,
hwaddr addr, hwaddr len);
 /* Lower = earlier (during add), later (during del) */
-unsigned priority;
+int priority;
 AddressSpace *address_space_filter;
 QTAILQ_ENTRY(MemoryListener) link;
 };
@@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
 void memory_region_add_subregion_overlap(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion,
- unsigned priority);
+ int priority);
 
 /**
  * memory_region_get_ram_addr: Get the ram address associated with a memory
diff --git a/memory.c b/memory.c
index 5a10fd0..984a3dc 100644
--- a/memory.c
+++ b/memory.c
@@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
 void memory_region_add_subregion_overlap(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion,
- unsigned priority)
+ int priority)
 {
 subregion->may_overlap = true;
 subregion->priority = priority;
-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

Created a MemoryRegion with negative priority that
spans over all the pci address space.
It "intercepts" the accesses to unassigned pci
address space and will follow the pci spec:
 1. returns -1 on read
 2. does nothing on write
 3. sets the RECEIVED MASTER ABORT bit in the STATUS register
of the device that initiated the transaction

Note: This implementation assumes that all the reads/writes to
the pci address space are done by the cpu.

Signed-off-by: Marcel Apfelbaum 
---
Changes from v1:
 - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not on
various Host Bridges
 - "pci-unassgined-mem" does not have a ".valid.accept" field and
implements read write methods

 hw/pci/pci.c | 46 ++
 include/hw/pci/pci_bus.h |  1 +
 2 files changed, 47 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index d00682e..b6a8026 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
 return rootbus->qbus.name;
 }
 
+static void unassigned_mem_access(PCIBus *bus)
+{
+/* FIXME assumption: memory access to the pci address
+ * space is always initiated by the host bridge
+ * (device 0 on the bus) */
+PCIDevice *d = bus->devices[0];
+if (!d) {
+return;
+}
+
+pci_word_test_and_set_mask(d->config + PCI_STATUS,
+   PCI_STATUS_REC_MASTER_ABORT);
+}
+
+static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+
+return -1ULL;
+}
+
+static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
+unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+}
+
+static const MemoryRegionOps unassigned_mem_ops = {
+.read = unassigned_mem_read,
+.write = unassigned_mem_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+#define UNASSIGNED_MEM_PRIORITY -1
+
 static void pci_bus_init(PCIBus *bus, DeviceState *parent,
  const char *name,
  MemoryRegion *address_space_mem,
@@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent,
 bus->address_space_mem = address_space_mem;
 bus->address_space_io = address_space_io;
 
+
+memory_region_init_io(&bus->unassigned_mem, OBJECT(bus),
+  &unassigned_mem_ops, bus, "pci-unassigned",
+  memory_region_size(bus->address_space_mem));
+memory_region_add_subregion_overlap(bus->address_space_mem,
+bus->address_space_mem->addr,
+&bus->unassigned_mem,
+UNASSIGNED_MEM_PRIORITY);
+
 /* host bridge */
 QLIST_INIT(&bus->child);
 
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 9df1788..4cc25a3 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -23,6 +23,7 @@ struct PCIBus {
 PCIDevice *parent_dev;
 MemoryRegion *address_space_mem;
 MemoryRegion *address_space_io;
+MemoryRegion unassigned_mem;
 
 QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
 QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v3 0/2] pci: complete master abort protocol

2013-09-09 Thread Marcel Apfelbaum

Note: The series is incomplete, for review only

PCI spec requires that a transaction that has not been claimed
by any PCI bus devices will be terminated by the initiator
with "master abort". For read transactions -1() is returned and 
writes are silently dropped.

The series deals also with the other aspect of the master abort scenario:
Upon completion the master has to raise RECEIVED MASTER ABORT BIT in
initiator's STATUS register.

Implementation:
 - Allowed the MemoryRegion priority to be negative so a subregion will be
   visible on all the addresses not covered by the parent MemoryRegion
   or other subregions.
 - Added a memory region with negative priority that extends over all the
   pci address space. This region catches all the accesses
   to the unassigned pci addresses.
 - The MemoryRegion's ops emulates the master abort scenario.

Note:
For the moment the code assumes that all the reads/writes to
pci address space are done by the cpu.

Changes from v2:
 - minor: changed nr of patches int the title
 - minor: modified series list

Changes from v1:
 - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not on
various Host Bridges
 - "pci-unassgined-mem" does not have a ".valid.accept" field and
implements read write methods


Marcel Apfelbaum (2):
  memory: allow MemoryRegion's priority field to accept negative values
  hw/pci: handle unassigned pci addresses

 hw/pci/pci.c | 46 ++
 include/exec/memory.h|  6 +++---
 include/hw/pci/pci_bus.h |  1 +
 memory.c |  2 +-
 4 files changed, 51 insertions(+), 4 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Marcel Apfelbaum

Priority is used to make visible some subregions by obscuring
the parent MemoryRegion addresses overlapping with the subregion.

By allowing the priority to be negative the opposite can be done:
Allow a subregion to be visible on all the addresses not covered
by the parent MemoryRegion or other subregions.

Signed-off-by: Marcel Apfelbaum 
---
 include/exec/memory.h | 6 +++---
 memory.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index ebe0d24..6995087 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -153,7 +153,7 @@ struct MemoryRegion {
 bool flush_coalesced_mmio;
 MemoryRegion *alias;
 hwaddr alias_offset;
-unsigned priority;
+int priority;
 bool may_overlap;
 QTAILQ_HEAD(subregions, MemoryRegion) subregions;
 QTAILQ_ENTRY(MemoryRegion) subregions_link;
@@ -193,7 +193,7 @@ struct MemoryListener {
 void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection 
*section,
hwaddr addr, hwaddr len);
 /* Lower = earlier (during add), later (during del) */
-unsigned priority;
+int priority;
 AddressSpace *address_space_filter;
 QTAILQ_ENTRY(MemoryListener) link;
 };
@@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
 void memory_region_add_subregion_overlap(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion,
- unsigned priority);
+ int priority);
 
 /**
  * memory_region_get_ram_addr: Get the ram address associated with a memory
diff --git a/memory.c b/memory.c
index 5a10fd0..984a3dc 100644
--- a/memory.c
+++ b/memory.c
@@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
 void memory_region_add_subregion_overlap(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion,
- unsigned priority)
+ int priority)
 {
 subregion->may_overlap = true;
 subregion->priority = priority;
-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v3 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

Created a MemoryRegion with negative priority that
spans over all the pci address space.
It "intercepts" the accesses to unassigned pci
address space and will follow the pci spec:
 1. returns -1 on read
 2. does nothing on write
 3. sets the RECEIVED MASTER ABORT bit in the STATUS register
of the device that initiated the transaction

Note: This implementation assumes that all the reads/writes to
the pci address space are done by the cpu.

Signed-off-by: Marcel Apfelbaum 
---
Changes from v1:
 - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not on
various Host Bridges
 - "pci-unassgined-mem" does not have a ".valid.accept" field and
implements read write methods

 hw/pci/pci.c | 46 ++
 include/hw/pci/pci_bus.h |  1 +
 2 files changed, 47 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index d00682e..b6a8026 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
 return rootbus->qbus.name;
 }
 
+static void unassigned_mem_access(PCIBus *bus)
+{
+/* FIXME assumption: memory access to the pci address
+ * space is always initiated by the host bridge
+ * (device 0 on the bus) */
+PCIDevice *d = bus->devices[0];
+if (!d) {
+return;
+}
+
+pci_word_test_and_set_mask(d->config + PCI_STATUS,
+   PCI_STATUS_REC_MASTER_ABORT);
+}
+
+static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+
+return -1ULL;
+}
+
+static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
+unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+}
+
+static const MemoryRegionOps unassigned_mem_ops = {
+.read = unassigned_mem_read,
+.write = unassigned_mem_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+#define UNASSIGNED_MEM_PRIORITY -1
+
 static void pci_bus_init(PCIBus *bus, DeviceState *parent,
  const char *name,
  MemoryRegion *address_space_mem,
@@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent,
 bus->address_space_mem = address_space_mem;
 bus->address_space_io = address_space_io;
 
+
+memory_region_init_io(&bus->unassigned_mem, OBJECT(bus),
+  &unassigned_mem_ops, bus, "pci-unassigned",
+  memory_region_size(bus->address_space_mem));
+memory_region_add_subregion_overlap(bus->address_space_mem,
+bus->address_space_mem->addr,
+&bus->unassigned_mem,
+UNASSIGNED_MEM_PRIORITY);
+
 /* host bridge */
 QLIST_INIT(&bus->child);
 
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 9df1788..4cc25a3 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -23,6 +23,7 @@ struct PCIBus {
 PCIDevice *parent_dev;
 MemoryRegion *address_space_mem;
 MemoryRegion *address_space_io;
+MemoryRegion unassigned_mem;
 
 QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
 QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 01:07:37PM +0200, Paolo Bonzini wrote:
>  In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
>  xstate_bv when XSAVE is not available.  Instead, it should reset the
>  FXSAVE data to processor-reset values (except for MXCSR which always
>  comes from XRSTOR data), i.e. to all-zeros except for the x87 control
>  and tag words.  It should also check reserved bits of MXCSR.
> >>>
> >>> I do not see why.
> >>
> >> Because otherwise it behaves in a subtly different manner for XSAVE and
> >> non-XSAVE hosts.
> > 
> > I do not see how. Can you elaborate?
> 
> Suppose userspace calls KVM_SET_XSAVE with XSTATE_BV=0.
> 
> On an XSAVE host, when the guest FPU state is loaded KVM will do an
> XRSTOR.  The XRSTOR will restore the FPU state to default values.
> 
> On a non-XSAVE host, when the guest FPU state is loaded KVM will do an
> FXRSTR.  The FXRSTR will load the FPU state from the first 512 bytes of
> the block that was passed to KVM_SET_XSAVE.
> 
> This is not a problem because userspace will usually pass to
> KVM_SET_XSAVE only something that it got from KVM_GET_XSAVE, and
> KVM_GET_XSAVE will never set XSTATE_BV=0.  However, KVM_SET_XSAVE is
> supposed to emulate XSAVE/XRSTOR if it is not available, and it is
> failing to emulate this detail.
> 
You are trying to be bug to bug compatible :) XSTATE_BV can be zero only
if FPU state is reset one, otherwise the guest will not survive. KVM_SET_XSAVE
is not suppose to emulate XSAVE/XRSTOR, it is not emulator function. It
is better to outlaw zero value for XSTATE_BV at all, but we cannot do it
because current QEMU uses it.

>  Yes.  QEMU unmarshals information from the XSAVE region and back, so it
>  cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
>  though.
> 
> >>> IMO this is the main issue here, not separate bug. If we gonna let guest
> >>> use CPU state QEMU does not support we gonna have a bad time.
> >>
> >> We cannot force the guest not to use a feature; all we can do is hide
> > 
> > Of course we can't, this is correct for other features too, but this is
> > guest's problem.
> 
> Ok, then we agree that QEMU doesn't have a problem?  The XSAVE data will
Which problem exactly. The problems I see is that 1. We do not support
MPX and AVX-512 (but this is probably not the problem you meant :)) 2. 0D
data is not consistent with features. Guest may not expect it and do stupid
things.

> always be "fresh" as long as the guest obeys CPUID bits it receives, and
> the CPUID bits that QEMU passes will never enable XSAVE data that QEMU
> does not support.
> 

--
Gleb.

Re: [Qemu-devel] [PATCH RFC v2 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Peter Maydell

On 9 September 2013 12:11, Marcel Apfelbaum  wrote:
> Priority is used to make visible some subregions by obscuring
> the parent MemoryRegion addresses overlapping with the subregion.
>
> By allowing the priority to be negative the opposite can be done:
> Allow a subregion to be visible on all the addresses not covered
> by the parent MemoryRegion or other subregions.
>
> Signed-off-by: Marcel Apfelbaum 
> ---
>  include/exec/memory.h | 6 +++---
>  memory.c  | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index ebe0d24..6995087 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -153,7 +153,7 @@ struct MemoryRegion {
>  bool flush_coalesced_mmio;
>  MemoryRegion *alias;
>  hwaddr alias_offset;
> -unsigned priority;
> +int priority;
>  bool may_overlap;
>  QTAILQ_HEAD(subregions, MemoryRegion) subregions;
>  QTAILQ_ENTRY(MemoryRegion) subregions_link;
> @@ -193,7 +193,7 @@ struct MemoryListener {
>  void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection 
> *section,
> hwaddr addr, hwaddr len);
>  /* Lower = earlier (during add), later (during del) */
> -unsigned priority;
> +int priority;

You haven't addressed any of the points I made reviewing
the first version of this. Please don't just ignore
code review, or people will stop reviewing your code.

I think it would also be nice to update docs/memory.txt
to say explicitly that priority is signed and why this
is useful, something like this:

begin
Priority values are signed, and the default value is
zero. This means that you can use
memory_region_add_subregion_overlap() both to specify
a region that must sit 'above' any others (with a
positive priority) and also a background region that
sits 'below' others (with a negative priority).
endit

(in the 'Overlapping regions and priority' section
of that document).

thanks
-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
> Created a MemoryRegion with negative priority that
> spans over all the pci address space.
> It "intercepts" the accesses to unassigned pci
> address space and will follow the pci spec:
>  1. returns -1 on read
>  2. does nothing on write
>  3. sets the RECEIVED MASTER ABORT bit in the STATUS register
> of the device that initiated the transaction
> 
> Note: This implementation assumes that all the reads/writes to
> the pci address space are done by the cpu.
> 
> Signed-off-by: Marcel Apfelbaum 
> ---
> Changes from v1:
>  - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not on
> various Host Bridges
>  - "pci-unassgined-mem" does not have a ".valid.accept" field and
> implements read write methods
> 
>  hw/pci/pci.c | 46 ++
>  include/hw/pci/pci_bus.h |  1 +
>  2 files changed, 47 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index d00682e..b6a8026 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
>  return rootbus->qbus.name;
>  }
>  
> +static void unassigned_mem_access(PCIBus *bus)
> +{
> +/* FIXME assumption: memory access to the pci address
> + * space is always initiated by the host bridge

/* Always
 * like
 * this
 */

/* Not
 * like this */

> + * (device 0 on the bus) */

Can't we pass the master device in?
We are still left with the assumption that
there's a single master, but at least
we get rid of the assumption that it's
always device 0 function 0.


> +PCIDevice *d = bus->devices[0];
> +if (!d) {
> +return;
> +}
> +
> +pci_word_test_and_set_mask(d->config + PCI_STATUS,
> +   PCI_STATUS_REC_MASTER_ABORT);

Probably should check and set secondary status for
a bridge device.

> +}
> +
> +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +PCIBus *bus = opaque;
> +unassigned_mem_access(bus);
> +
> +return -1ULL;
> +}
> +
> +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
> +unsigned size)
> +{
> +PCIBus *bus = opaque;
> +unassigned_mem_access(bus);
> +}
> +
> +static const MemoryRegionOps unassigned_mem_ops = {
> +.read = unassigned_mem_read,
> +.write = unassigned_mem_write,
> +.endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
> +#define UNASSIGNED_MEM_PRIORITY -1
> +
>  static void pci_bus_init(PCIBus *bus, DeviceState *parent,
>   const char *name,
>   MemoryRegion *address_space_mem,


I would rename "unassigned" to "pci_master_Abort_" everywhere.

Call things by what they do not how they are used.

> @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState 
> *parent,
>  bus->address_space_mem = address_space_mem;
>  bus->address_space_io = address_space_io;
>  
> +
> +memory_region_init_io(&bus->unassigned_mem, OBJECT(bus),
> +  &unassigned_mem_ops, bus, "pci-unassigned",
> +  memory_region_size(bus->address_space_mem));
> +memory_region_add_subregion_overlap(bus->address_space_mem,
> +bus->address_space_mem->addr,
> +&bus->unassigned_mem,
> +UNASSIGNED_MEM_PRIORITY);
> +
>  /* host bridge */
>  QLIST_INIT(&bus->child);
>

It seems safer to add an API for enabling this functionality.
Something like

pci_set_master_for_master_abort(PCIBus *, PCIDevice *);

  
> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
> index 9df1788..4cc25a3 100644
> --- a/include/hw/pci/pci_bus.h
> +++ b/include/hw/pci/pci_bus.h
> @@ -23,6 +23,7 @@ struct PCIBus {
>  PCIDevice *parent_dev;
>  MemoryRegion *address_space_mem;
>  MemoryRegion *address_space_io;
> +MemoryRegion unassigned_mem;
>  
>  QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
>  QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
> -- 
> 1.8.3.1

Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote:
> Priority is used to make visible some subregions by obscuring
> the parent MemoryRegion addresses overlapping with the subregion.
> 
> By allowing the priority to be negative the opposite can be done:
> Allow a subregion to be visible on all the addresses not covered
> by the parent MemoryRegion or other subregions.
> 
> Signed-off-by: Marcel Apfelbaum 

Seems harmless enough.

Reviewed-by: Michael S. Tsirkin 


> ---
>  include/exec/memory.h | 6 +++---
>  memory.c  | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index ebe0d24..6995087 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -153,7 +153,7 @@ struct MemoryRegion {
>  bool flush_coalesced_mmio;
>  MemoryRegion *alias;
>  hwaddr alias_offset;
> -unsigned priority;
> +int priority;
>  bool may_overlap;
>  QTAILQ_HEAD(subregions, MemoryRegion) subregions;
>  QTAILQ_ENTRY(MemoryRegion) subregions_link;
> @@ -193,7 +193,7 @@ struct MemoryListener {
>  void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection 
> *section,
> hwaddr addr, hwaddr len);
>  /* Lower = earlier (during add), later (during del) */
> -unsigned priority;
> +int priority;
>  AddressSpace *address_space_filter;
>  QTAILQ_ENTRY(MemoryListener) link;
>  };
> @@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
>  void memory_region_add_subregion_overlap(MemoryRegion *mr,
>   hwaddr offset,
>   MemoryRegion *subregion,
> - unsigned priority);
> + int priority);
>  
>  /**
>   * memory_region_get_ram_addr: Get the ram address associated with a memory
> diff --git a/memory.c b/memory.c
> index 5a10fd0..984a3dc 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
>  void memory_region_add_subregion_overlap(MemoryRegion *mr,
>   hwaddr offset,
>   MemoryRegion *subregion,
> - unsigned priority)
> + int priority)
>  {
>  subregion->may_overlap = true;
>  subregion->priority = priority;
> -- 
> 1.8.3.1

Re: [Qemu-devel] virtio with Windows 8.

2013-09-09 Thread Stefan Hajnoczi

On Mon, Sep 09, 2013 at 12:02:57AM -0500, Yaodong Yang wrote:
> 1. I create a raw image named as win8.img, using the following command:
> /usr/local/kvm/bin/qemu-img create -f raw win8.img 20G
> 
> 2. I try to install win8 with the following command, but I failed several 
> times.
> 
> sudo /usr/local/kvm/bin/qemu-system-x86_64 -enable-kvm -drive 
> file=./win8.img,if=virtio,cache=none -cdrom ./win8.iso -boot d -m 2048.
> 
> 3. I try the same command but with if=ide, it works. 
> 
> Could someone tell me the reason for this failure? 

The Windows installer will not detect virtio-blk disks since Windows
does not come with virtio drivers by default.

You must provide the virtio-win drivers ISO during the installation:

https://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/

Watch for the screen where Windows prompts you for a driver disk and put
in the virtio-win ISO.

Stefan

Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Peter Maydell

On 9 September 2013 12:41, Michael S. Tsirkin  wrote:
> On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote:
>> Priority is used to make visible some subregions by obscuring
>> the parent MemoryRegion addresses overlapping with the subregion.
>>
>> By allowing the priority to be negative the opposite can be done:
>> Allow a subregion to be visible on all the addresses not covered
>> by the parent MemoryRegion or other subregions.
>>
>> Signed-off-by: Marcel Apfelbaum 
>
> Seems harmless enough.
>
> Reviewed-by: Michael S. Tsirkin 

No, the idea is good but this version is just broken.
See the comments I made on the previous version which
Marcel ignored :-(

-- PMM

Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 12:45:12PM +0100, Peter Maydell wrote:
> On 9 September 2013 12:41, Michael S. Tsirkin  wrote:
> > On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote:
> >> Priority is used to make visible some subregions by obscuring
> >> the parent MemoryRegion addresses overlapping with the subregion.
> >>
> >> By allowing the priority to be negative the opposite can be done:
> >> Allow a subregion to be visible on all the addresses not covered
> >> by the parent MemoryRegion or other subregions.
> >>
> >> Signed-off-by: Marcel Apfelbaum 
> >
> > Seems harmless enough.
> >
> > Reviewed-by: Michael S. Tsirkin 
> 
> No, the idea is good but this version is just broken.
> See the comments I made on the previous version which
> Marcel ignored :-(
> 
> -- PMM

You are right, I missed the bugs.
Good catch, thanks.

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 13:28, Gleb Natapov ha scritto:
>> On an XSAVE host, when the guest FPU state is loaded KVM will do an
>> XRSTOR.  The XRSTOR will restore the FPU state to default values.
>>
>> On a non-XSAVE host, when the guest FPU state is loaded KVM will do an
>> FXRSTR.  The FXRSTR will load the FPU state from the first 512 bytes of
>> the block that was passed to KVM_SET_XSAVE.
>>
>> This is not a problem because userspace will usually pass to
>> KVM_SET_XSAVE only something that it got from KVM_GET_XSAVE, and
>> KVM_GET_XSAVE will never set XSTATE_BV=0.  However, KVM_SET_XSAVE is
>> supposed to emulate XSAVE/XRSTOR if it is not available, and it is
>> failing to emulate this detail.
>>
> You are trying to be bug to bug compatible :) XSTATE_BV can be zero only
> if FPU state is reset one, otherwise the guest will not survive.

Yes.

> KVM_SET_XSAVE
> is not suppose to emulate XSAVE/XRSTOR, it is not emulator function. It
> is better to outlaw zero value for XSTATE_BV at all, but we cannot do it
> because current QEMU uses it.

I agree it'd be better to forbid it.  If the mismatch in semantics does
not bother you, I won't fix it.  It slightly bothers me. :)

>> Yes.  QEMU unmarshals information from the XSAVE region and back, so it
>> cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
>> though.
>>
> IMO this is the main issue here, not separate bug. If we gonna let guest
> use CPU state QEMU does not support we gonna have a bad time.

 We cannot force the guest not to use a feature; all we can do is hide
>>>
>>> Of course we can't, this is correct for other features too, but this is
>>> guest's problem.
>>
>> Ok, then we agree that QEMU doesn't have a problem?  The XSAVE data will
> 
> Which problem exactly. The problems I see is that 1. We do not support
> MPX and AVX-512 (but this is probably not the problem you meant :)) 2. 0D
> data is not consistent with features. Guest may not expect it and do stupid
> things.

It is not a problem to unmarshal information out of KVM_GET_XSAVE data
(and back).  If the guest does stupid things, it's a bug in an
ill-behaving guest.

On the other hand, I agree that passthrough of host 0xD data is bad and
will fix it.

Paolo

>> always be "fresh" as long as the guest obeys CPUID bits it receives, and
>> the CPUID bits that QEMU passes will never enable XSAVE data that QEMU
>> does not support.
>>
> 
> --
>   Gleb.
>

Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 02:48:55PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 09, 2013 at 12:45:12PM +0100, Peter Maydell wrote:
> > On 9 September 2013 12:41, Michael S. Tsirkin  wrote:
> > > On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote:
> > >> Priority is used to make visible some subregions by obscuring
> > >> the parent MemoryRegion addresses overlapping with the subregion.
> > >>
> > >> By allowing the priority to be negative the opposite can be done:
> > >> Allow a subregion to be visible on all the addresses not covered
> > >> by the parent MemoryRegion or other subregions.
> > >>
> > >> Signed-off-by: Marcel Apfelbaum 
> > >
> > > Seems harmless enough.
> > >
> > > Reviewed-by: Michael S. Tsirkin 
> > 
> > No, the idea is good but this version is just broken.
> > See the comments I made on the previous version which
> > Marcel ignored :-(
> > 
> > -- PMM
> 
> You are right, I missed the bugs.
> Good catch, thanks.
> 

So I guess it's only an Acked-by: Michael S. Tsirkin 
at this point :)

Re: [Qemu-devel] [PATCH 0/9] Remove legacy unaligned bswap functions

2013-09-09 Thread Michael S. Tsirkin

On Sun, Aug 25, 2013 at 03:59:28PM +0100, Peter Maydell wrote:
> The bswap.h header includes a set of "legacy unaligned functions"
> that (since commit c732a52d3 at the beginning of this year) are
> just wrappers for underlying {ld,st} functions. The legacy
> functions aren't used in many places, so just replace all their
> uses with uses of the new-style {ld,st} functions; this lets us
> remove the legacy wrappers altogether.

I have a question on this: what happens if an unaligned
address is supplied?
In particular, if the address is supplied by the guest?

Esp the pci wrappers have many callers - were they all
audited?
I tried checking but there are many callers, will take
a while.

> Since we know the {ld,st}* routines are definitely functions,
> we can in the process remove some casts which were left over
> from when the legacy unaligned functions were previously macros.
> 
> The patchset is divided up by function being removed, rather
> than by which device/subsystem is being fixed; I think this way
> round is easier to review since you only have to keep one
> substitution in your head when reading a patch.
> 
> Peter Maydell (9):
>   bswap.h: Remove cpu_to_le16wu()
>   bswap.h: Remove cpu_to_le32wu()
>   bswap.h: Remove le16_to_cpupu()
>   bswap.h: Remove le32_to_cpupu()
>   bswap.h: Remove be32_to_cpupu()
>   bswap.h: Remove cpu_to_be16wu()
>   bswap.h: Remove cpu_to_be32wu()
>   bswap.h: Remove cpu_to_be64wu()
>   bswap.h: Remove cpu_to_32wu()
> 
>  block/qcow2-cluster.c |2 +-
>  hw/acpi/core.c|3 +--
>  hw/block/cdrom.c  |   10 +-
>  hw/display/vga_template.h |   14 --
>  hw/ide/atapi.c|   16 +++
>  hw/net/e1000.c|   22 +
>  hw/net/ne2000.c   |4 ++--
>  hw/pci/pcie_aer.c |4 ++--
>  include/hw/pci/pci.h  |8 
>  include/qemu/bswap.h  |   47 
> -
>  10 files changed, 40 insertions(+), 90 deletions(-)
> 
> -- 
> 1.7.9.5

Re: [Qemu-devel] [PATCH V4 0/3] qemu-iotests: add test for fd passing via SCM rights

2013-09-09 Thread Stefan Hajnoczi

On Fri, Sep 06, 2013 at 11:24:31AM +0800, Wenchao Xia wrote:
> This series add test case for fd passing with unix socket at runtime. Since
> getfd and closefd interface will interact with monitor's data, so it will
> help to do regression test for monitor patches. Since python2 do not support
> sendmsg(), so a C helper program is added to do the job.
> 
> v2:
>   1: add missing $ in the makefile rule.
> 
> v3:
>   Address Eric's comments:
>   1: typo fix, remove "." in the end of error message, strick
> check argc as "!=", use EXIT_SUCCESS and EXIT_FAILURE as exit
> values, strict error check for strtol() call.
>   Address Luiz's comments:
>   1: change the helper program parameter as "bin < socket-fd > < file-path >",
> the program open the file itself now, data parameter is removed and blank
> is always used as iov data, better usage tip message, folder the string 
> parsing
> code into a function.
>   2: related change for helper program parameter change.
>   3: related change for helper program parameter change.
>   Other:
>   1: remove "LINK" rule in makefile, remove fd checking code inside send_fd()
> since it is already checked before calling, add '' around %s for path and
> number string in error message.
>   2: renamed fd_bin to bin in send_fd_scm() to tip better, add '' around %s
> for path in error message.
> v4:
>   Address Stefan's comments:
>   2: add space after # for comments, refined the comment's grammar.
>   3: add space after # for comments, refined the comment's grammar, add two
> test cases for error path.
> 
> Wenchao Xia (3):
>   1 qemu-iotests: add unix socket help program
>   2 qemu-iotests: add infrastructure of fd passing via SCM
>   3 qemu-iotests: add tests for runtime fd passing via SCM rights
> 
>  QMP/qmp.py |6 ++
>  configure  |2 +-
>  tests/Makefile |3 +-
>  tests/qemu-iotests/045 |   51 -
>  tests/qemu-iotests/045.out |4 +-
>  tests/qemu-iotests/check   |1 +
>  tests/qemu-iotests/iotests.py  |   23 ++
>  tests/qemu-iotests/socket_scm_helper.c |  135 
> 
>  8 files changed, 220 insertions(+), 5 deletions(-)
>  create mode 100644 tests/qemu-iotests/socket_scm_helper.c

Reviewed-by: Stefan Hajnoczi

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 12:41, Gleb Natapov ha scritto:
 In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
 and we should hide all features that are not visible in CPUID.  It is
 okay, however, to test it in cpu_post_load.
>>>
>>> The kernel should not even return state that is not visible in CPUID.
>>
>> That's an interesting point of view that I hadn't considered.  But just
>> like you asked me why it should return state that is not visible in
>> CPUID, I'm asking you why it should not...
>>
> For number of reasons. First because since a sate is not used there is no
> point in migrating it. Second to make interface more deterministic for
> QEMU. i.e QEMU configures only features it supports and gets
> exactly same state from the kernel no matter what host cpu is and what
> kernel version is. This patch will not be needed since kernel will do
> the job.

Good reasons, thanks.  Let's do it in the kernel then and avoid this
patch altogether.

Paolo

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 01:46:49PM +0200, Paolo Bonzini wrote:
> >> Yes.  QEMU unmarshals information from the XSAVE region and back, so it
> >> cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
> >> though.
> >>
> > IMO this is the main issue here, not separate bug. If we gonna let guest
> > use CPU state QEMU does not support we gonna have a bad time.
> 
>  We cannot force the guest not to use a feature; all we can do is hide
> >>>
> >>> Of course we can't, this is correct for other features too, but this is
> >>> guest's problem.
> >>
> >> Ok, then we agree that QEMU doesn't have a problem?  The XSAVE data will
> > 
> > Which problem exactly. The problems I see is that 1. We do not support
> > MPX and AVX-512 (but this is probably not the problem you meant :)) 2. 0D
> > data is not consistent with features. Guest may not expect it and do stupid
> > things.
> 
> It is not a problem to unmarshal information out of KVM_GET_XSAVE data
> (and back).  If the guest does stupid things, it's a bug in an
> ill-behaving guest.
> 
You know I am first in line to blame guest for everything :) (who needs
guests anyway) but in this case I didn't mean that guest does something
illegal. If we advertise support for some XSAVE state in 0D leaf guest
is in his right to make conclusions we may not expect from that. It may
check corespondent feature bit and crash if it is not present for
instance.

> On the other hand, I agree that passthrough of host 0xD data is bad and
> will fix it.
> 
Thanks!

--
Gleb.

Re: [Qemu-devel] [PATCH v2 1/5] util: add socket_set_fast_reuse function which will replace setting SO_REUSEADDR

2013-09-09 Thread Stefan Hajnoczi

On Wed, Sep 04, 2013 at 07:08:51PM +0200, Sebastian Ottlik wrote:
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 3dc8b1b..f071793 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -159,6 +159,20 @@ void qemu_set_nonblock(int fd)
>  fcntl(fd, F_SETFL, f | O_NONBLOCK);
>  }
>  
> +int socket_set_fast_reuse(int fd)
> +{
> +int val=1, ret;
> +
> +ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
> + (const char *)&val, sizeof(val));
> +
> +if (ret < 0) {
> +  perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
> +}
> +
> +return ret;
> +}

Please run scripts/checkpatch.pl before submitting patches to check
coding style (whitespace, indentation, etc).

Re: [Qemu-devel] [PATCH v5 0/6] block/qcow2: Image file option amendment

2013-09-09 Thread Kevin Wolf

Am 03.09.2013 um 10:09 hat Max Reitz geschrieben:
> 
> This series adds support to qemu-img, block and qcow2 for amending image
> options on existing image files.
> 
> Depends on:
>  - option: Add assigned flag to QEMUOptionParameter
>  - qcow2-refcount: Snapshot update for zero clusters (series, v3)
>  - Add metadata overlap checks (series, v5)

Thanks, applied to the block branch with the following changes:

- Updated patch 5 with
  [PATCH] block/qcow2: Use bdrv_truncate for size amend

- Resolved merge conflict in patch 1 (bdrv_delete -> bdrv_unref)

Kevin

Re: [Qemu-devel] [PATCH RFC v2 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 12:28 +0100, Peter Maydell wrote:
> On 9 September 2013 12:11, Marcel Apfelbaum  wrote:
> > Priority is used to make visible some subregions by obscuring
> > the parent MemoryRegion addresses overlapping with the subregion.
> >
> > By allowing the priority to be negative the opposite can be done:
> > Allow a subregion to be visible on all the addresses not covered
> > by the parent MemoryRegion or other subregions.
> >
> > Signed-off-by: Marcel Apfelbaum 
> > ---
> >  include/exec/memory.h | 6 +++---
> >  memory.c  | 2 +-
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index ebe0d24..6995087 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -153,7 +153,7 @@ struct MemoryRegion {
> >  bool flush_coalesced_mmio;
> >  MemoryRegion *alias;
> >  hwaddr alias_offset;
> > -unsigned priority;
> > +int priority;
> >  bool may_overlap;
> >  QTAILQ_HEAD(subregions, MemoryRegion) subregions;
> >  QTAILQ_ENTRY(MemoryRegion) subregions_link;
> > @@ -193,7 +193,7 @@ struct MemoryListener {
> >  void (*coalesced_mmio_del)(MemoryListener *listener, 
> > MemoryRegionSection *section,
> > hwaddr addr, hwaddr len);
> >  /* Lower = earlier (during add), later (during del) */
> > -unsigned priority;
> > +int priority;
> 
> You haven't addressed any of the points I made reviewing
> the first version of this. Please don't just ignore
> code review, or people will stop reviewing your code.
Hi Peter,
I really value your comments and I did acted upon them.
Basically all the changes of this version are based on your comments, thanks!

I did answer to your comment and I was going to remove it,
but I missed it again :(. Sorry for that.
I will remove it of course in the following version.

> 
> I think it would also be nice to update docs/memory.txt
> to say explicitly that priority is signed and why this
> is useful, something like this:
Of course I will, thanks!

> 
> begin
> Priority values are signed, and the default value is
> zero. This means that you can use
> memory_region_add_subregion_overlap() both to specify
> a region that must sit 'above' any others (with a
> positive priority) and also a background region that
> sits 'below' others (with a negative priority).
> endit
> 
> (in the 'Overlapping regions and priority' section
> of that document).
> 
> thanks
> -- PMM
>

Re: [Qemu-devel] [PATCH v2 0/5] Do not set SO_REUSEADDR on Windows

2013-09-09 Thread Stefan Hajnoczi

On Thu, Sep 05, 2013 at 03:48:16PM +0200, Sebastian Ottlik wrote:
> On 04.09.2013 19:08, Sebastian Ottlik wrote:
> >This patchset disabels all use of SO_REUSEADDR on Windows. On Windows systems
> >the default behaviour is equivalent to SO_REUSEADDR on other operating
> >systems. SO_REUSEADDR can still be set but results in undesired behaviour
> >instead. It may even lead to situations were system behaviour is
> >unspecified. More information on this can be found at:
> >http://msdn.microsoft.com/en-us/library/windows/desktop/ms740621.aspx
> >
> >I originally encountered this issue when accidentally launching two QEMU
> >instances with identical GDB ports at the same time. In which case QEMU won't
> >fail as one might expect.
> >
> >v2 Changes:
> >
> >- Introduce a function with os specific implementation instead of using 
> >#ifdef
> >   I named it socket_set_fast_reuse instead of the suggested 
> > qemu_set_reuseaddr
> >   so the name better reflects what the function actually does.
> >
> >  gdbstub.c  |6 ++
> >  include/qemu/sockets.h |1 +
> >  net/socket.c   |   19 +++
> >  slirp/misc.c   |3 +--
> >  slirp/socket.c |4 +---
> >  slirp/tcp_subr.c   |6 ++
> >  slirp/udp.c|4 ++--
> >  util/oslib-posix.c |   14 ++
> >  util/oslib-win32.c |   10 ++
> >  util/qemu-sockets.c|6 +++---
> >  10 files changed, 43 insertions(+), 30 deletions(-)
> >
> >
> >util: add socket_set_fast_reuse function
> >gdbstub: call socket_set_fast_reuse instead of setting SO_REUSEADDR
> >net: call socket_set_fast_reuse instead of setting SO_REUSEADDR
> >slirp: call socket_set_fast_reuse instead of setting SO_REUSEADDR
> >util: call socket_set_fast_reuse instead of setting SO_REUSEADDR
> >
> 
> Pinging this patch, as I think it is still an appropriate approach
> to the issue:
> 
> I did some research and apparently there is a valid use case for
> SO_REUSEADDR
> on windows when multiple clients need to listen to the same port for
> the same
> multicast group. IMHO making qemu_setsockopt ignore SO_REUSEADDR on windows
> might be confusing for some use cases. Actually net_socket_mcast_create in
> net/socket.c should probably set SO_REUSEADDR on windows. This is
> also an issue
> with patch 3 I supplied that I will address in a new version of this
> patch set if there is
> an agreement on a general approach.

Sounds like a good idea.  The patch series overall looks good.

Stefan

Re: [Qemu-devel] [PATCH v2] kvm: fix traces to use %x instead of %d

2013-09-09 Thread Stefan Hajnoczi

On Wed, Sep 04, 2013 at 08:26:25PM +1000, Alexey Kardashevskiy wrote:
> KVM request types are normally defined using hex constants but QEMU traces
> print decimal values instead, which is not very convenient.
> 
> This changes the request type format from %d to %x.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: Andreas Färber 
> Reviewed-by: Paolo Bonzini 
> ---
> Changes:
> v2:
> * added "0x" to format strings so the change won't come unnoticed
> * fixed the commit message
> ---
>  trace-events | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Thanks, applied to my tracing tree:
https://github.com/stefanha/qemu/commits/tracing

Stefan

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
> > Created a MemoryRegion with negative priority that
> > spans over all the pci address space.
> > It "intercepts" the accesses to unassigned pci
> > address space and will follow the pci spec:
> >  1. returns -1 on read
> >  2. does nothing on write
> >  3. sets the RECEIVED MASTER ABORT bit in the STATUS register
> > of the device that initiated the transaction
> > 
> > Note: This implementation assumes that all the reads/writes to
> > the pci address space are done by the cpu.
> > 
> > Signed-off-by: Marcel Apfelbaum 
> > ---
> > Changes from v1:
> >  - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not on
> > various Host Bridges
> >  - "pci-unassgined-mem" does not have a ".valid.accept" field and
> > implements read write methods
> > 
> >  hw/pci/pci.c | 46 
> > ++
> >  include/hw/pci/pci_bus.h |  1 +
> >  2 files changed, 47 insertions(+)
> > 
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index d00682e..b6a8026 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
> >  return rootbus->qbus.name;
> >  }
> >  
> > +static void unassigned_mem_access(PCIBus *bus)
> > +{
> > +/* FIXME assumption: memory access to the pci address
> > + * space is always initiated by the host bridge
> 
> /* Always
>  * like
>  * this
>  */
> 
> /* Not
>  * like this */
OK

> 
> > + * (device 0 on the bus) */
> 
> Can't we pass the master device in?
> We are still left with the assumption that
> there's a single master, but at least
> we get rid of the assumption that it's
> always device 0 function 0.
> 
> 
> > +PCIDevice *d = bus->devices[0];
> > +if (!d) {
> > +return;
> > +}
> > +
> > +pci_word_test_and_set_mask(d->config + PCI_STATUS,
> > +   PCI_STATUS_REC_MASTER_ABORT);
> 
> Probably should check and set secondary status for
> a bridge device.
I wanted to add the support for bridges later,
I am struggling with some issues with PCI bridges.
 
> 
> > +}
> > +
> > +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned 
> > size)
> > +{
> > +PCIBus *bus = opaque;
> > +unassigned_mem_access(bus);
> > +
> > +return -1ULL;
> > +}
> > +
> > +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
> > +unsigned size)
> > +{
> > +PCIBus *bus = opaque;
> > +unassigned_mem_access(bus);
> > +}
> > +
> > +static const MemoryRegionOps unassigned_mem_ops = {
> > +.read = unassigned_mem_read,
> > +.write = unassigned_mem_write,
> > +.endianness = DEVICE_NATIVE_ENDIAN,
> > +};
> > +
> > +#define UNASSIGNED_MEM_PRIORITY -1
> > +
> >  static void pci_bus_init(PCIBus *bus, DeviceState *parent,
> >   const char *name,
> >   MemoryRegion *address_space_mem,
> 
> 
> I would rename "unassigned" to "pci_master_Abort_" everywhere.
Are you sure about the capital "A"?

For the memory ops I suppose is OK, but the MemoryRegion name
I think it should remain "unassigned_mem"; it will be strange
to name it pci_master_Abort_mem.
> 
> Call things by what they do not how they are used.
> 
> > @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState 
> > *parent,
> >  bus->address_space_mem = address_space_mem;
> >  bus->address_space_io = address_space_io;
> >  
> > +
> > +memory_region_init_io(&bus->unassigned_mem, OBJECT(bus),
> > +  &unassigned_mem_ops, bus, "pci-unassigned",
> > +  memory_region_size(bus->address_space_mem));
> > +memory_region_add_subregion_overlap(bus->address_space_mem,
> > +bus->address_space_mem->addr,
> > +&bus->unassigned_mem,
> > +UNASSIGNED_MEM_PRIORITY);
> > +
> >  /* host bridge */
> >  QLIST_INIT(&bus->child);
> >
> 
> It seems safer to add an API for enabling this functionality.
> Something like
> 
> pci_set_master_for_master_abort(PCIBus *, PCIDevice *);
I started to implement that way, but it would be strange (I think) 
to see in the code the above method because almost all the pci devices
can be master devices.

I selected an "implicit" implementation so for this reason exactly.
We do not really select a master, but a "master for writes/reads on
pci address space". Selecting device 0 on the bus automatically for
this makes more sense to me.
What do you think?
Marcel 

> 
>   
> > diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
> > index 9df1788..4cc25a3 100644
> > --- a/include/hw/pci/pci_bus.h
> > +++ b/include/hw/pci/pci_bus.h
> > @@ -23,6 +23,7 @@ struct PCIBus {
> >  PCIDevice *parent_dev;
> >

Re: [Qemu-devel] [PATCH v2 0/5] Do not set SO_REUSEADDR on Windows

2013-09-09 Thread Sebastian Ottlik


On 09.09.2013 14:05, Stefan Hajnoczi wrote:

On Thu, Sep 05, 2013 at 03:48:16PM +0200, Sebastian Ottlik wrote:

On 04.09.2013 19:08, Sebastian Ottlik wrote:

This patchset disabels all use of SO_REUSEADDR on Windows. On Windows systems
the default behaviour is equivalent to SO_REUSEADDR on other operating
systems. SO_REUSEADDR can still be set but results in undesired behaviour
instead. It may even lead to situations were system behaviour is
unspecified. More information on this can be found at:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms740621.aspx

I originally encountered this issue when accidentally launching two QEMU
instances with identical GDB ports at the same time. In which case QEMU won't
fail as one might expect.

v2 Changes:

- Introduce a function with os specific implementation instead of using #ifdef
   I named it socket_set_fast_reuse instead of the suggested qemu_set_reuseaddr
   so the name better reflects what the function actually does.

  gdbstub.c  |6 ++
  include/qemu/sockets.h |1 +
  net/socket.c   |   19 +++
  slirp/misc.c   |3 +--
  slirp/socket.c |4 +---
  slirp/tcp_subr.c   |6 ++
  slirp/udp.c|4 ++--
  util/oslib-posix.c |   14 ++
  util/oslib-win32.c |   10 ++
  util/qemu-sockets.c|6 +++---
  10 files changed, 43 insertions(+), 30 deletions(-)


util: add socket_set_fast_reuse function
gdbstub: call socket_set_fast_reuse instead of setting SO_REUSEADDR
net: call socket_set_fast_reuse instead of setting SO_REUSEADDR
slirp: call socket_set_fast_reuse instead of setting SO_REUSEADDR
util: call socket_set_fast_reuse instead of setting SO_REUSEADDR


Pinging this patch, as I think it is still an appropriate approach
to the issue:

I did some research and apparently there is a valid use case for
SO_REUSEADDR
on windows when multiple clients need to listen to the same port for
the same
multicast group. IMHO making qemu_setsockopt ignore SO_REUSEADDR on windows
might be confusing for some use cases. Actually net_socket_mcast_create in
net/socket.c should probably set SO_REUSEADDR on windows. This is
also an issue
with patch 3 I supplied that I will address in a new version of this
patch set if there is
an agreement on a general approach.

Sounds like a good idea.  The patch series overall looks good.

Stefan
Thanks for the feedback. I will resubmit the patch series including the 
change for net_socket_mcast_create and fixes for the style issues you 
pointed out soon.


When I submitted this new version of the patch set I think I was a 
little early as there was still some discussion in the thread of the 
original version. In general, what is a good period to wait before 
submitting a new version?


Best Regards,
  Sebastian

Re: [Qemu-devel] [PATCH RFC 1/3] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-02 at 15:38 +0100, Peter Maydell wrote:
> On 2 September 2013 15:13, Marcel Apfelbaum  wrote:
> > Priority is used to make visible some subregions by obscuring
> > the parent MemoryRegion addresses overlapping with the subregion.
> >
> > By allowing the priority to be negative the opposite can be done:
> > Allow a subregion to be visible on all the addresses not covered
> > by the parent MemoryRegion or other subregions.
> 
> This comment is not exactly accurate. Allowing priority to
> be signed is just a convenience: you can achieve exactly
> the same effect by specifying some positive priority for
> everything you map into the region and having the background
> region be priority zero. (If you care at all about priorities
> then everything being mapped into the region should be
> happening under the control of your code anyway.)
> 
> 
> > Signed-off-by: Marcel Apfelbaum 
> > ---
> >  include/exec/memory.h | 6 +++---
> >  memory.c  | 2 +-
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index ebe0d24..6995087 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -153,7 +153,7 @@ struct MemoryRegion {
> >  bool flush_coalesced_mmio;
> >  MemoryRegion *alias;
> >  hwaddr alias_offset;
> > -unsigned priority;
> > +int priority;
> >  bool may_overlap;
> >  QTAILQ_HEAD(subregions, MemoryRegion) subregions;
> >  QTAILQ_ENTRY(MemoryRegion) subregions_link;
> > @@ -193,7 +193,7 @@ struct MemoryListener {
> >  void (*coalesced_mmio_del)(MemoryListener *listener, 
> > MemoryRegionSection *section,
> > hwaddr addr, hwaddr len);
> >  /* Lower = earlier (during add), later (during del) */
> > -unsigned priority;
> > +int priority;
> 
> This is unrelated to MemoryRegion priorities -- it controls the
> order in which listener callbacks are called. Don't try to change
> both at once (and you only need the MR priorities anyway.)
> 
> >  AddressSpace *address_space_filter;
> >  QTAILQ_ENTRY(MemoryListener) link;
> >  };
> > @@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
> >  void memory_region_add_subregion_overlap(MemoryRegion *mr,
> >   hwaddr offset,
> >   MemoryRegion *subregion,
> > - unsigned priority);
> > + int priority);
> >
> >  /**
> >   * memory_region_get_ram_addr: Get the ram address associated with a memory
> > diff --git a/memory.c b/memory.c
> > index 886f838..dfb3ae6 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
> >  void memory_region_add_subregion_overlap(MemoryRegion *mr,
> >   hwaddr offset,
> >   MemoryRegion *subregion,
> > - unsigned priority)
> > + int priority)
> >  {
> >  subregion->may_overlap = true;
> >  subregion->priority = priority;
> 
> This isn't a complete set of changes. For instance
> memory_region_set_address() has a local variable 'priority'
> which should be signed now.
> 
> sysbus_mmio_map_common() and sysbus_mmio_map_overlap()
> have priority arguments which need to change to match.
Missed that :(
Making necessary changes and send a new version
Thanks,
Marcel

> 
> thanks
> -- PMM
>

Re: [Qemu-devel] [PATCH 0/9] Remove legacy unaligned bswap functions

2013-09-09 Thread Peter Maydell

On 9 September 2013 12:54, Michael S. Tsirkin  wrote:
> On Sun, Aug 25, 2013 at 03:59:28PM +0100, Peter Maydell wrote:
>> The bswap.h header includes a set of "legacy unaligned functions"
>> that (since commit c732a52d3 at the beginning of this year) are
>> just wrappers for underlying {ld,st} functions. The legacy
>> functions aren't used in many places, so just replace all their
>> uses with uses of the new-style {ld,st} functions; this lets us
>> remove the legacy wrappers altogether.
>
> I have a question on this: what happens if an unaligned
> address is supplied?
> In particular, if the address is supplied by the guest?
>
> Esp the pci wrappers have many callers - were they all
> audited?
> I tried checking but there are many callers, will take
> a while.

This patchset has zero behavioural changes, because
literally all it is doing is taking trivial wrapper
functions like

static inline void cpu_to_le16wu(uint16_t *p, uint16_t v)
{
stw_le_p(p, v);
}

and replacing the calls to the wrapper with direct
calls to the underlying function. There's no need
to audit calls to the callsites because there is
no semantic change and not even any implementation
change here.

There are no problems with unaligned accesses, because
the stw_le_p &c functions are designed to work for
unaligned accesses: they boil down to calls to
ldl_p/stw_p and friends, which use memcpy() to
ensure they work OK with unaligned accesses.

You can check all this by looking at bswap.h.

-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote:
> On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
> > On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
> > > Created a MemoryRegion with negative priority that
> > > spans over all the pci address space.
> > > It "intercepts" the accesses to unassigned pci
> > > address space and will follow the pci spec:
> > >  1. returns -1 on read
> > >  2. does nothing on write
> > >  3. sets the RECEIVED MASTER ABORT bit in the STATUS register
> > > of the device that initiated the transaction
> > > 
> > > Note: This implementation assumes that all the reads/writes to
> > > the pci address space are done by the cpu.
> > > 
> > > Signed-off-by: Marcel Apfelbaum 
> > > ---
> > > Changes from v1:
> > >  - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not on
> > > various Host Bridges
> > >  - "pci-unassgined-mem" does not have a ".valid.accept" field and
> > > implements read write methods
> > > 
> > >  hw/pci/pci.c | 46 
> > > ++
> > >  include/hw/pci/pci_bus.h |  1 +
> > >  2 files changed, 47 insertions(+)
> > > 
> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > index d00682e..b6a8026 100644
> > > --- a/hw/pci/pci.c
> > > +++ b/hw/pci/pci.c
> > > @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
> > >  return rootbus->qbus.name;
> > >  }
> > >  
> > > +static void unassigned_mem_access(PCIBus *bus)
> > > +{
> > > +/* FIXME assumption: memory access to the pci address
> > > + * space is always initiated by the host bridge
> > 
> > /* Always
> >  * like
> >  * this
> >  */
> > 
> > /* Not
> >  * like this */
> OK
> 
> > 
> > > + * (device 0 on the bus) */
> > 
> > Can't we pass the master device in?
> > We are still left with the assumption that
> > there's a single master, but at least
> > we get rid of the assumption that it's
> > always device 0 function 0.
> > 
> > 
> > > +PCIDevice *d = bus->devices[0];
> > > +if (!d) {
> > > +return;
> > > +}
> > > +
> > > +pci_word_test_and_set_mask(d->config + PCI_STATUS,
> > > +   PCI_STATUS_REC_MASTER_ABORT);
> > 
> > Probably should check and set secondary status for
> > a bridge device.
> I wanted to add the support for bridges later,
> I am struggling with some issues with PCI bridges.

Shouldn't be very different.

> > 
> > > +}
> > > +
> > > +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned 
> > > size)
> > > +{
> > > +PCIBus *bus = opaque;
> > > +unassigned_mem_access(bus);
> > > +
> > > +return -1ULL;
> > > +}
> > > +
> > > +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
> > > +unsigned size)
> > > +{
> > > +PCIBus *bus = opaque;
> > > +unassigned_mem_access(bus);
> > > +}
> > > +
> > > +static const MemoryRegionOps unassigned_mem_ops = {
> > > +.read = unassigned_mem_read,
> > > +.write = unassigned_mem_write,
> > > +.endianness = DEVICE_NATIVE_ENDIAN,
> > > +};
> > > +
> > > +#define UNASSIGNED_MEM_PRIORITY -1
> > > +
> > >  static void pci_bus_init(PCIBus *bus, DeviceState *parent,
> > >   const char *name,
> > >   MemoryRegion *address_space_mem,
> > 
> > 
> > I would rename "unassigned" to "pci_master_Abort_" everywhere.
> Are you sure about the capital "A"?

Ugh no, that's a typo.

> 
> For the memory ops I suppose is OK, but the MemoryRegion name
> I think it should remain "unassigned_mem"; it will be strange
> to name it pci_master_Abort_mem.

Why would it be strange?

> > 
> > Call things by what they do not how they are used.
> > 
> > > @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState 
> > > *parent,
> > >  bus->address_space_mem = address_space_mem;
> > >  bus->address_space_io = address_space_io;
> > >  
> > > +
> > > +memory_region_init_io(&bus->unassigned_mem, OBJECT(bus),
> > > +  &unassigned_mem_ops, bus, "pci-unassigned",
> > > +  memory_region_size(bus->address_space_mem));
> > > +memory_region_add_subregion_overlap(bus->address_space_mem,
> > > +bus->address_space_mem->addr,
> > > +&bus->unassigned_mem,
> > > +UNASSIGNED_MEM_PRIORITY);
> > > +
> > >  /* host bridge */
> > >  QLIST_INIT(&bus->child);
> > >
> > 
> > It seems safer to add an API for enabling this functionality.
> > Something like
> > 
> > pci_set_master_for_master_abort(PCIBus *, PCIDevice *);
> I started to implement that way, but it would be strange (I think) 
> to see in the code the above method because almost all the pci devices
> can be master devices.

In theory yes in practice no one programs devices
with addresses that trigger master abort.

Address

Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist

2013-09-09 Thread Paul Moore

On Monday, September 09, 2013 12:38:12 PM Paolo Bonzini wrote:
> Il 06/09/2013 20:41, Eduardo Otubo ha scritto:
> > Hello,
> > 
> > Any chance to get this patch applied?
> > 
> > Thanks!
> 
> Paul, perhaps you can add yourself to MAINTAINERS and send a pull request?
> 
> Paolo

Out of respect for the work that Eduardo has done, and is continuing to do, 
with the QEMU seccomp filtering, I think Eduardo should be the one to take on 
this role.  If Eduardo declines I'll do ahead and submit a patch adding myself 
to the MAINTAINERS file.

> > On 09/04/2013 11:11 AM, Paul Moore wrote:
> >> On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote:
> >>> This was causing Qemu process to hang when using -sandbox on.
> >>> 
> >>> Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175
> >>> 
> >>> Signed-off-by: Eduardo Otubo 
> >> 
> >> Works for me.
> >> 
> >> Tested-by: Paul Moore 
> >> 
> >>> ---
> >>> 
> >>>   qemu-seccomp.c |1 +
> >>>   1 files changed, 1 insertions(+), 0 deletions(-)
> >>> 
> >>> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> >>> index 37d38f8..69cee44 100644
> >>> --- a/qemu-seccomp.c
> >>> +++ b/qemu-seccomp.c
> >>> @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall
> >>> seccomp_whitelist[]
> >>> = { { SCMP_SYS(getuid), 245 },
> >>> 
> >>>   { SCMP_SYS(geteuid), 245 },
> >>>   { SCMP_SYS(timer_create), 245 },
> >>> 
> >>> +{ SCMP_SYS(times), 245 },
> >>> 
> >>>   { SCMP_SYS(exit), 245 },
> >>>   { SCMP_SYS(clock_gettime), 245 },
> >>>   { SCMP_SYS(time), 245 },

-- 
paul moore
security and virtualization @ redhat

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 15:23 +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote:
> > On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
> > > On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
> > > > Created a MemoryRegion with negative priority that
> > > > spans over all the pci address space.
> > > > It "intercepts" the accesses to unassigned pci
> > > > address space and will follow the pci spec:
> > > >  1. returns -1 on read
> > > >  2. does nothing on write
> > > >  3. sets the RECEIVED MASTER ABORT bit in the STATUS register
> > > > of the device that initiated the transaction
> > > > 
> > > > Note: This implementation assumes that all the reads/writes to
> > > > the pci address space are done by the cpu.
> > > > 
> > > > Signed-off-by: Marcel Apfelbaum 
> > > > ---
> > > > Changes from v1:
> > > >  - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not on
> > > > various Host Bridges
> > > >  - "pci-unassgined-mem" does not have a ".valid.accept" field and
> > > > implements read write methods
> > > > 
> > > >  hw/pci/pci.c | 46 
> > > > ++
> > > >  include/hw/pci/pci_bus.h |  1 +
> > > >  2 files changed, 47 insertions(+)
> > > > 
> > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > index d00682e..b6a8026 100644
> > > > --- a/hw/pci/pci.c
> > > > +++ b/hw/pci/pci.c
> > > > @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
> > > >  return rootbus->qbus.name;
> > > >  }
> > > >  
> > > > +static void unassigned_mem_access(PCIBus *bus)
> > > > +{
> > > > +/* FIXME assumption: memory access to the pci address
> > > > + * space is always initiated by the host bridge
> > > 
> > > /* Always
> > >  * like
> > >  * this
> > >  */
> > > 
> > > /* Not
> > >  * like this */
> > OK
> > 
> > > 
> > > > + * (device 0 on the bus) */
> > > 
> > > Can't we pass the master device in?
> > > We are still left with the assumption that
> > > there's a single master, but at least
> > > we get rid of the assumption that it's
> > > always device 0 function 0.
> > > 
> > > 
> > > > +PCIDevice *d = bus->devices[0];
> > > > +if (!d) {
> > > > +return;
> > > > +}
> > > > +
> > > > +pci_word_test_and_set_mask(d->config + PCI_STATUS,
> > > > +   PCI_STATUS_REC_MASTER_ABORT);
> > > 
> > > Probably should check and set secondary status for
> > > a bridge device.
> > I wanted to add the support for bridges later,
> > I am struggling with some issues with PCI bridges.
> 
> Shouldn't be very different.
The current implementation uses only one MemoryRegion that spans
over all pci address space. It catches all the accesses both to
primary bus addresses (bus 0), or to the regions covered by the bridges.
The callback ALWAYS receive a pointer to the primary bus.

What would be a better approach?
1. Remain with a single MemoryRegion and check if the address
   belongs to a bridge address range and recursively travel
   the bridges tree and update the registers
2. Model a MemoryRegion for each bridge that represents
   the address range behind the bridge (not existing today).
   Add a MemoryRegion child to catch accesses to unassigned
   addresses behind the bridge, then recursively travel the
   bridges to top and update the registers. 

> 
> > > 
> > > > +}
> > > > +
> > > > +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, 
> > > > unsigned size)
> > > > +{
> > > > +PCIBus *bus = opaque;
> > > > +unassigned_mem_access(bus);
> > > > +
> > > > +return -1ULL;
> > > > +}
> > > > +
> > > > +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t 
> > > > val,
> > > > +unsigned size)
> > > > +{
> > > > +PCIBus *bus = opaque;
> > > > +unassigned_mem_access(bus);
> > > > +}
> > > > +
> > > > +static const MemoryRegionOps unassigned_mem_ops = {
> > > > +.read = unassigned_mem_read,
> > > > +.write = unassigned_mem_write,
> > > > +.endianness = DEVICE_NATIVE_ENDIAN,
> > > > +};
> > > > +
> > > > +#define UNASSIGNED_MEM_PRIORITY -1
> > > > +
> > > >  static void pci_bus_init(PCIBus *bus, DeviceState *parent,
> > > >   const char *name,
> > > >   MemoryRegion *address_space_mem,
> > > 
> > > 
> > > I would rename "unassigned" to "pci_master_Abort_" everywhere.
> > Are you sure about the capital "A"?
> 
> Ugh no, that's a typo.
> 
> > 
> > For the memory ops I suppose is OK, but the MemoryRegion name
> > I think it should remain "unassigned_mem"; it will be strange
> > to name it pci_master_Abort_mem.
> 
> Why would it be strange?
1. Because it states what happens when there is a an access to
   unassigned memory and not that IS an unassigned memory.
2. This name is already used for unassigned IO ports and
   maybe is better to follow this convension
> 
> > > 
> > > Call thin

[Qemu-devel] [PATCH] hw/9pfs/virtio_9p_device: use virtio wrappers to access headers.

2013-09-09 Thread Greg Kurz

Follow-up to Rusty's virtio endianness serie: enough to get a working
virtfs mount.

Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/virtio-9p-device.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index f0ffbe8..bc2d0da 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -19,6 +19,7 @@
 #include "fsdev/qemu-fsdev.h"
 #include "virtio-9p-xattr.h"
 #include "virtio-9p-coth.h"
+#include "hw/virtio/virtio-access.h"
 
 static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
 {
@@ -34,7 +35,7 @@ static void virtio_9p_get_config(VirtIODevice *vdev, uint8_t 
*config)
 
 len = strlen(s->tag);
 cfg = g_malloc0(sizeof(struct virtio_9p_config) + len);
-stw_raw(&cfg->tag_len, len);
+virtio_stl_p(&cfg->tag_len, len);
 /* We don't copy the terminating null to config space */
 memcpy(cfg->tag, s->tag, len);
 memcpy(config, cfg, s->config_size);

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 13:43, Marcel Apfelbaum  wrote:
> The scenario is covered only for the primary bus and not for buses
> behind the PCI bridge (the later being handled differently.)
> In this case, isn't the Host Bridge always device 00.0?

No. For instance the host controller may pass a nonzero
value of devfn_min to pci_bus_new/pci_register_bus (in
which case the host bridge will end up there; example
hw/pci-host/versatile.c) or it might just pass a nonzero
devfn to pci_create_simple when it creates the host controller
PCI device on the bus (I don't think we have anything that
does it that way, though).

> If not, can we find a way to scan all bus devices and find
> the host bridge so we will not have to manually add it to each
> host bridge?

It would be conceptually nicer not to treat host bridges as
a special case but instead to just report the abort back
to whatever the PCI master was (which might be a device
doing DMA). That might be a lot of effort though.

-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote:
> On 9 September 2013 13:43, Marcel Apfelbaum  wrote:
> > The scenario is covered only for the primary bus and not for buses
> > behind the PCI bridge (the later being handled differently.)
> > In this case, isn't the Host Bridge always device 00.0?
> 
> No. For instance the host controller may pass a nonzero
> value of devfn_min to pci_bus_new/pci_register_bus (in
> which case the host bridge will end up there; example
> hw/pci-host/versatile.c) or it might just pass a nonzero
> devfn to pci_create_simple when it creates the host controller
> PCI device on the bus (I don't think we have anything that
> does it that way, though).
> 
> > If not, can we find a way to scan all bus devices and find
> > the host bridge so we will not have to manually add it to each
> > host bridge?
> 
> It would be conceptually nicer not to treat host bridges as
> a special case but instead to just report the abort back
> to whatever the PCI master was (which might be a device
> doing DMA). That might be a lot of effort though.
> 
> -- PMM

Yes. As a shortcut, what I suggest is registering the
device that wants to be notified of master aborts with
the bus.

-- 
MST

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 13:59, Michael S. Tsirkin  wrote:
> On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote:
>> It would be conceptually nicer not to treat host bridges as
>> a special case but instead to just report the abort back
>> to whatever the PCI master was (which might be a device
>> doing DMA). That might be a lot of effort though.

> Yes. As a shortcut, what I suggest is registering the
> device that wants to be notified of master aborts with
> the bus.

Can you just pick the device which is (a subclass of)
TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
aren't using that class?

-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 13:52 +0100, Peter Maydell wrote:
> On 9 September 2013 13:43, Marcel Apfelbaum  wrote:
> > The scenario is covered only for the primary bus and not for buses
> > behind the PCI bridge (the later being handled differently.)
> > In this case, isn't the Host Bridge always device 00.0?
> 
> No. For instance the host controller may pass a nonzero
> value of devfn_min to pci_bus_new/pci_register_bus (in
> which case the host bridge will end up there; example
> hw/pci-host/versatile.c) or it might just pass a nonzero
> devfn to pci_create_simple when it creates the host controller
> PCI device on the bus (I don't think we have anything that
> does it that way, though).
If my assumption is not generally true, I will not use it
Thanks!

> 
> > If not, can we find a way to scan all bus devices and find
> > the host bridge so we will not have to manually add it to each
> > host bridge?
> 
> It would be conceptually nicer not to treat host bridges as
> a special case but instead to just report the abort back
> to whatever the PCI master was (which might be a device
> doing DMA). That might be a lot of effort though.
This is exactly my point. ALL device on the bus can be masters
of a DMA transaction. So adding an interface as suggested by
Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *)
for the general case (a device doing DMA) it is too far from reality.

However, if we don't want the master device far all accesses
(including DMA) but only for accesses to pci address space,
in this specific case we can appoint the HostBridge
(explicitly/implicitly) as the master device because most
devices do not access pci address space of other devices.

As a conclusion, should we call the API
pci_set_master_for_master_abort_on_pci_space ?
Marcel




> 
> -- PMM

Re: [Qemu-devel] [PATCH] qxl: fix local renderer

2013-09-09 Thread Cole Robinson

On 09/06/2013 01:20 AM, Gerd Hoffmann wrote:
> The local spice renderer assumes the primary surface is located at the
> start of the "ram" bar.  This used to be a requirement in qxl hardware
> revision 1.  In revision 2+ this is relaxed.  Nevertheless guest drivers
> continued to use the traditional location, for historical and backward
> compatibility reasons.  The qxl kms driver doesn't though as it depends
> on qxl revision 4+ anyway.
> 
> Result is that local rendering is hosed for recent linux guests, you'll
> get pixel garbage with non-spice ui (gtk, sdl, vnc) and when doing
> screendumps.  Fix that by doing a proper mapping of the guest-specified
> memory location.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=948717
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/display/qxl-render.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c
> index 269b1a7..d34b0c4 100644
> --- a/hw/display/qxl-render.c
> +++ b/hw/display/qxl-render.c
> @@ -31,10 +31,6 @@ static void qxl_blit(PCIQXLDevice *qxl, QXLRect *rect)
>  if (is_buffer_shared(surface)) {
>  return;
>  }
> -if (!qxl->guest_primary.data) {
> -trace_qxl_render_blit_guest_primary_initialized();
> -qxl->guest_primary.data = memory_region_get_ram_ptr(&qxl->vga.vram);
> -}
>  trace_qxl_render_blit(qxl->guest_primary.qxl_stride,
>  rect->left, rect->right, rect->top, rect->bottom);
>  src = qxl->guest_primary.data;
> @@ -104,7 +100,12 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice 
> *qxl)
>  
>  if (qxl->guest_primary.resized) {
>  qxl->guest_primary.resized = 0;
> -qxl->guest_primary.data = memory_region_get_ram_ptr(&qxl->vga.vram);
> +qxl->guest_primary.data = qxl_phys2virt(qxl,
> +
> qxl->guest_primary.surface.mem,
> +MEMSLOT_GROUP_GUEST);
> +if (!qxl->guest_primary.data) {
> +return;
> +}
>  qxl_set_rect_to_surface(qxl, &qxl->dirty[0]);
>  qxl->num_dirty_rects = 1;
>  trace_qxl_render_guest_primary_resized(
> @@ -128,6 +129,10 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice 
> *qxl)
>  }
>  dpy_gfx_replace_surface(vga->con, surface);
>  }
> +
> +if (!qxl->guest_primary.data) {
> +return;
> +}
>  for (i = 0; i < qxl->num_dirty_rects; i++) {
>  if (qemu_spice_rect_is_empty(qxl->dirty+i)) {
>  break;
> 

Tested-by: Cole Robinson 

And cc-ing qemu-stable

- Cole

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
> On 9 September 2013 13:59, Michael S. Tsirkin  wrote:
> > On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote:
> >> It would be conceptually nicer not to treat host bridges as
> >> a special case but instead to just report the abort back
> >> to whatever the PCI master was (which might be a device
> >> doing DMA). That might be a lot of effort though.
> 
> > Yes. As a shortcut, what I suggest is registering the
> > device that wants to be notified of master aborts with
> > the bus.
> 
> Can you just pick the device which is (a subclass of)
> TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
> aren't using that class?
This is what I would really want to do, but some HOST Bridge devices
inherit directly from PCI_DEVICE.

TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
is a not a PCI device and does not help us here (not a PCI_DEVICE
on the bus)

Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
and it hold as composition a PCIDevice that will be part of
the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
inherits from PCI_DEVICE.

It seems to me as a dead end, or we need to re-factor the current
hierarchy.

Marcel
> 
> -- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 14:07, Marcel Apfelbaum  wrote:
> This is exactly my point. ALL device on the bus can be masters
> of a DMA transaction. So adding an interface as suggested by
> Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *)
> for the general case (a device doing DMA) it is too far from reality.

Actually I don't think it would be too painful.
At the moment in do_pci_register_device() we do this to
create the memory region used by a device for its bus
master transactions:

memory_region_init_alias(&pci_dev->bus_master_enable_region,
 OBJECT(pci_dev), "bus master",
 dma_as->root, 0,
 memory_region_size(dma_as->root));

If instead of using this alias directly as the
bus_master_enable region you instead:
 * create a container region
 * create a 'background' region at negative priority
   (ie one per device, and you can make the 'opaque' pointer
   point to the device, not the bus)
 * put the alias and the background region into the container
 * use the container as the bus_master_enable region

then you will get in your callback a pointer to the
device which caused the abort. You can then have your
callback call a method defined on PCIDevice which we
implement:
 * as do-nothing in the PCI device base class
 * as set-the-master-abort bit in the PCI host bridge
   class
(and anybody who wants to get fancy about handling aborts
can override it in their own device implementation)

That seems achievable without really requiring new
infrastructure. Have I missed something that wouldn't
work if we did this?

thanks
-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 14:15, Marcel Apfelbaum  wrote:
> On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
>> Can you just pick the device which is (a subclass of)
>> TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
>> aren't using that class?
> This is what I would really want to do, but some HOST Bridge devices
> inherit directly from PCI_DEVICE.

> TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
> is a not a PCI device and does not help us here (not a PCI_DEVICE
> on the bus)

Oops, yes, I get those two the wrong way round a lot. Anyway,
if we need to make all host bridges have a common subclass
we could certainly refactor them accordingly.

> Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
> and it hold as composition a PCIDevice that will be part of
> the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
> inherits from PCI_DEVICE.

This may just be wrong choice of name rather than actually
wrong hierarchy.

-- PMM

Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist

2013-09-09 Thread Eduardo Otubo




On 09/09/2013 09:36 AM, Paul Moore wrote:

On Monday, September 09, 2013 12:38:12 PM Paolo Bonzini wrote:

Il 06/09/2013 20:41, Eduardo Otubo ha scritto:

Hello,

 Any chance to get this patch applied?

Thanks!


Paul, perhaps you can add yourself to MAINTAINERS and send a pull request?

Paolo


Out of respect for the work that Eduardo has done, and is continuing to do,
with the QEMU seccomp filtering, I think Eduardo should be the one to take on
this role.  If Eduardo declines I'll do ahead and submit a patch adding myself
to the MAINTAINERS file.


If this is ok for everyone, I would be really glad to take this role to 
myself. Paul, thanks for this vote of confidence. Paolo, should I send a 
patch for MAINTAINERS right away?


Regards,




On 09/04/2013 11:11 AM, Paul Moore wrote:

On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote:

This was causing Qemu process to hang when using -sandbox on.

Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175

Signed-off-by: Eduardo Otubo 


Works for me.

Tested-by: Paul Moore 


---

   qemu-seccomp.c |1 +
   1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index 37d38f8..69cee44 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall
seccomp_whitelist[]
= { { SCMP_SYS(getuid), 245 },

   { SCMP_SYS(geteuid), 245 },
   { SCMP_SYS(timer_create), 245 },

+{ SCMP_SYS(times), 245 },

   { SCMP_SYS(exit), 245 },
   { SCMP_SYS(clock_gettime), 245 },
   { SCMP_SYS(time), 245 },




--
Eduardo Otubo
IBM Linux Technology Center

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:19 +0100, Peter Maydell wrote:
> On 9 September 2013 14:15, Marcel Apfelbaum  wrote:
> > On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
> >> Can you just pick the device which is (a subclass of)
> >> TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
> >> aren't using that class?
> > This is what I would really want to do, but some HOST Bridge devices
> > inherit directly from PCI_DEVICE.
> 
> > TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
> > is a not a PCI device and does not help us here (not a PCI_DEVICE
> > on the bus)
> 
> Oops, yes, I get those two the wrong way round a lot. Anyway,
> if we need to make all host bridges have a common subclass
> we could certainly refactor them accordingly.
> 
> > Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
> > and it hold as composition a PCIDevice that will be part of
> > the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
> > inherits from PCI_DEVICE.
> 
> This may just be wrong choice of name rather than actually
> wrong hierarchy.
I try not to "judge" the naming convention, so let's leave it
aside for now.
My issue is that we have at least 2 ways to model the bridges:
1. TYPE_PCI_HOST_BRIDGE
   * derives from TYPE_SYS_BUS_DEVICE
   * has a bus
   * one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
derives from TYPE_PCI_DEVICE
2. TYPE_PCIE_HOST_BRIDGE
   * derives from TYPE_PCI_HOST_BRIDGE which derives
 from TYPE_SYS_BUS_DEVICE
   * has a PciDevice and register it to the bus in order
 to work as (1)

I would like to implement an hierarchy that will allow
all the host bridge devices to have a common ancestor
In this was, we can scan the PCI bus to look for
master...
 

> 
> -- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 14:29, Marcel Apfelbaum  wrote:
> My issue is that we have at least 2 ways to model the bridges:
> 1. TYPE_PCI_HOST_BRIDGE
>* derives from TYPE_SYS_BUS_DEVICE
>* has a bus
>* one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
> derives from TYPE_PCI_DEVICE
> 2. TYPE_PCIE_HOST_BRIDGE
>* derives from TYPE_PCI_HOST_BRIDGE which derives
>  from TYPE_SYS_BUS_DEVICE
>* has a PciDevice and register it to the bus in order
>  to work as (1)

So I think there are two different (and slightly confusing)
things here:
 (1) the model of the host's PCI controller; this is
 typically derived from TYPE_SYS_BUS_DEVICE somehow
 but I guess in theory need not be; often it's a
 TYPE_PCI_HOST_BRIDGE or TYPE_PCIE_HOST_BRIDGE.
 (2) the PCI device which sits on the PCI bus and is
 visible to the guest, usually with a PCI class ID
 of PCI_CLASS_BRIDGE_HOST (but not necessarily; versatile
 is different, for instance). Currently we generally
 make these direct subclasses of TYPE_PCI_DEVICE.

[The naming convention confusion arises because
different controllers are inconsistent about how
they name these classes and which type name ends up
with 'host' in it.]

What you're going to get in the callback is (2)...

> I would like to implement an hierarchy that will allow
> all the host bridge devices to have a common ancestor
> In this was, we can scan the PCI bus to look for
> master...

...and yes, we could insert an extra class and make
all those bridge hosts derive from it rather than
directly from TYPE_PCI_DEVICE if we needed to.

-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:16 +0100, Peter Maydell wrote:
> On 9 September 2013 14:07, Marcel Apfelbaum  wrote:
> > This is exactly my point. ALL device on the bus can be masters
> > of a DMA transaction. So adding an interface as suggested by
> > Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *)
> > for the general case (a device doing DMA) it is too far from reality.
> 
> Actually I don't think it would be too painful.
> At the moment in do_pci_register_device() we do this to
> create the memory region used by a device for its bus
> master transactions:
> 
> memory_region_init_alias(&pci_dev->bus_master_enable_region,
>  OBJECT(pci_dev), "bus master",
>  dma_as->root, 0,
>  memory_region_size(dma_as->root));
> 
> If instead of using this alias directly as the
> bus_master_enable region you instead:
>  * create a container region
>  * create a 'background' region at negative priority
>(ie one per device, and you can make the 'opaque' pointer
>point to the device, not the bus)
>  * put the alias and the background region into the container
>  * use the container as the bus_master_enable region
> 
> then you will get in your callback a pointer to the
> device which caused the abort. You can then have your
> callback call a method defined on PCIDevice which we
> implement:
>  * as do-nothing in the PCI device base class
>  * as set-the-master-abort bit in the PCI host bridge
>class
The Received Master Abort bit must be set for the initiator.
In the case described here this bit mast be set in the
device register rather that in host bridge.

> (and anybody who wants to get fancy about handling aborts
> can override it in their own device implementation)
> 
> That seems achievable without really requiring new
> infrastructure. Have I missed something that wouldn't
> work if we did this?
The idea seems correct (and cool!) to me (I'll look deeper),
but it covers only one way: upstream.
We need to unify this with the downstream: The cpu accesses to
unassigned memory should result in the callback to the host bridge.

All we need is that all the host bridges to have a common class and
following the same idea we get the downstream (The host bridge inititates
all transactions on the bus on behalf of the cpu)

If this works, we don't need no work around!
Marcel


> 
> thanks
> -- PMM

[Qemu-devel] seccomp submaintainer? (was Re: [PATCH] seccomp: adding times() to the whitelist)

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 15:20, Eduardo Otubo ha scritto:
>> Out of respect for the work that Eduardo has done, and is
>> continuing to do, with the QEMU seccomp filtering, I think Eduardo
>> should be the one to take on this role. If Eduardo declines I'll do
>> ahead and submit a patch adding myself to the MAINTAINERS file.
>
> If this is ok for everyone, I would be really glad to take this role to
> myself. Paul, thanks for this vote of confidence. Paolo, should I send a
> patch for MAINTAINERS right away?

Ok, I was suggesting Paul because he was the one doing reviews.

Eduardo, that is also okay for me.  However, even as a maintainer please
do wait for Paul's reviews.  Many areas of QEMU have maintainers that do
not send their own patches without a review, so this wouldn't be a new
rule. :)

Please wait for Anthony's ack.  I changed the subject and CCed him to
grab his attention.

Paolo

[Qemu-devel] [RFC] qapi/error: Optional error propagation backtrace

2013-09-09 Thread Max Reitz

Add a configure switch which enables an error propagation backtrace.
This results in the error_set function prepending every message by the
source file name, function and line in which it was called, as well as
error_propagate appending this information to the propagated message,
resulting in a backtrace.

Signed-off-by: Max Reitz 
---
Since this obviously breaks existing tests (and cannot really be used for
new tests since it includes a line number, which is a rather volatile
output) and is generally not very useful to normal users, the switch is
disabled by default. This functionality is intended for developers
tracking down error paths.

Since I do not know whether I am the only one considering it useful
in the first place, this is just an RFC for now.

Furthermore, I am not sure whether a configure switch is really the right
way to implement this.
---
 configure| 12 +++
 include/qapi/error.h | 40 +++-
 util/error.c | 57 +++-
 3 files changed, 90 insertions(+), 19 deletions(-)

diff --git a/configure b/configure
index e989609..4e43f7b 100755
--- a/configure
+++ b/configure
@@ -243,6 +243,7 @@ gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+error_backtrace="no"
 
 # parse CC options first
 for opt do
@@ -949,6 +950,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --enable-error-bt) error_backtrace="yes"
+  ;;
+  --disable-error-bt) error_backtrace="no"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1168,6 +1173,8 @@ echo "  --gcov=GCOV  use specified gcov 
[$gcov_tool]"
 echo "  --enable-tpm enable TPM support"
 echo "  --disable-libssh2disable ssh block device support"
 echo "  --enable-libssh2 enable ssh block device support"
+echo "  --disable-error-bt   disable backtraces on internal errors"
+echo "  --enable-error-btenable backtraces on internal errors"
 echo ""
 echo "NOTE: The object files are built at the place where configure is 
launched"
 exit 1
@@ -3650,6 +3657,7 @@ echo "gcov  $gcov_tool"
 echo "gcov enabled  $gcov"
 echo "TPM support   $tpm"
 echo "libssh2 support   $libssh2"
+echo "error backtraces  $error_backtrace"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging $qom_cast_debug"
 
@@ -4044,6 +4052,10 @@ if test "$libssh2" = "yes" ; then
   echo "CONFIG_LIBSSH2=y" >> $config_host_mak
 fi
 
+if test "$error_backtrace" = "yes" ; then
+  echo "CONFIG_ERROR_BACKTRACE=y" >> $config_host_mak
+fi
+
 if test "$virtio_blk_data_plane" = "yes" ; then
   echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
 fi
diff --git a/include/qapi/error.h b/include/qapi/error.h
index ffd1cea..d3cfe35 100644
--- a/include/qapi/error.h
+++ b/include/qapi/error.h
@@ -25,16 +25,28 @@ typedef struct Error Error;
 /**
  * Set an indirect pointer to an error given a ErrorClass value and a
  * printf-style human message.  This function is not meant to be used outside
- * of QEMU.
+ * of QEMU.  The message will be prepended by the file/function/line 
information
+ * if CONFIG_ERROR_BACKTRACE is defined.
  */
-void error_set(Error **err, ErrorClass err_class, const char *fmt, ...) 
GCC_FMT_ATTR(3, 4);
+void error_set_bt(const char *file, const char *func, int line,
+  Error **err, ErrorClass err_class, const char *fmt, ...)
+GCC_FMT_ATTR(6, 7);
+
+#define error_set(...) \
+error_set_bt(__FILE__, __func__, __LINE__, __VA_ARGS__)
 
 /**
  * Set an indirect pointer to an error given a ErrorClass value and a
  * printf-style human message, followed by a strerror() string if
- * @os_error is not zero.
+ * @os_error is not zero.  The message will be prepended by the
+ * file/function/line information if CONFIG_ERROR_BACKTRACE is defined.
  */
-void error_set_errno(Error **err, int os_error, ErrorClass err_class, const 
char *fmt, ...) GCC_FMT_ATTR(4, 5);
+void error_set_errno_bt(const char *file, const char *func, int line,
+Error **err, int os_error, ErrorClass err_class,
+const char *fmt, ...) GCC_FMT_ATTR(7, 8);
+
+#define error_set_errno(...) \
+error_set_errno_bt(__FILE__, __func__, __LINE__, __VA_ARGS__)
 
 /**
  * Same as error_set(), but sets a generic error
@@ -47,7 +59,11 @@ void error_set_errno(Error **err, int os_error, ErrorClass 
err_class, const char
 /**
  * Helper for open() errors
  */
-void error_setg_file_open(Error **errp, int os_errno, const char *filename);
+void error_setg_file_open_bt(const char *file, const char *func, int line,
+ Error **errp, int os_errno, const char *filename);
+
+#define error_setg_file_open(...) \
+error_setg_file_open_bt(__FILE__, __func__, __LINE__, __VA_ARGS__)
 
 /**
  * Returns true if an indirect pointer to an error is pointing to a valid
@@ -71,11 +87,17 @@ Error *error_copy(const Error *err);
 const

Re: [Qemu-devel] [PATCH V3] xl: HVM domain S3 bugfix

2013-09-09 Thread Ian Campbell

On Mon, 2013-09-09 at 03:29 +, Liu, Jinsong wrote:
> From 18344216b432648605726b137b348f28ef64a4ef Mon Sep 17 00:00:00 2001
> From: Liu Jinsong 
> Date: Fri, 23 Aug 2013 23:30:23 +0800
> Subject: [PATCH V3] xl: HVM domain S3 bugfix
> 
> Currently Xen hvm s3 has a bug coming from the difference between
> qemu-traditioanl and qemu-xen. For qemu-traditional, the way to

"traditional"

> resume from hvm s3 is via 'xl trigger' command. However, for
> qemu-xen, the way to resume from hvm s3 inherited from standard
> qemu, i.e. via QMP, and it doesn't work under Xen.
> 
> The root cause is, for qemu-xen, 'xl trigger' command didn't reset
> devices, while QMP didn't unpause hvm domain though they did qemu
> system reset.
> 
> We have two qemu patches one xl patch to fix the HVM S3 bug:
> This patch is the xl patch. It invokes QMP system_wakeup so that
> qemu logic for hvm s3 could be triggerred.

"triggered"

> 
> Signed-off-by: Liu Jinsong 

Acked-by: Ian Campbell 

I'll hold off on applying until someone givers me a heads up that the
qemu side is in place. I'll try and remember to fix the commit message
typos as I commit, so no need to resend for those.

> ---
>  tools/libxl/libxl.c  |   31 +--
>  tools/libxl/libxl_internal.h |2 ++
>  tools/libxl/libxl_qmp.c  |5 +
>  3 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 81785df..267bc25 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -4641,10 +4641,37 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, 
> uint32_t domid,
>  return ret;
>  }
>  
> +static int libxl__domain_s3_resume(libxl__gc *gc, int domid)
> +{
> +int rc = 0;
> +
> +switch (libxl__domain_type(gc, domid)) {
> +case LIBXL_DOMAIN_TYPE_HVM:
> +switch (libxl__device_model_version_running(gc, domid)) {
> +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> +rc = xc_set_hvm_param(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, 
> 0);
> +break;
> +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> +rc = libxl__qmp_system_wakeup(gc, domid);
> +break;
> +default:
> +rc = ERROR_INVAL;
> +break;
> +}
> +break;
> +default:
> +rc = ERROR_INVAL;
> +break;
> +}
> +
> +return rc;
> +}
> +
>  int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid,
> libxl_trigger trigger, uint32_t vcpuid)
>  {
>  int rc;
> +GC_INIT(ctx);
>  
>  switch (trigger) {
>  case LIBXL_TRIGGER_POWER:
> @@ -4668,8 +4695,7 @@ int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid,
>  XEN_DOMCTL_SENDTRIGGER_RESET, vcpuid);
>  break;
>  case LIBXL_TRIGGER_S3RESUME:
> -xc_set_hvm_param(ctx->xch, domid, HVM_PARAM_ACPI_S_STATE, 0);
> -rc = 0;
> +rc = libxl__domain_s3_resume(gc, domid);
>  break;
>  default:
>  rc = -1;
> @@ -4684,6 +4710,7 @@ int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid,
>  rc = ERROR_FAIL;
>  }
>  
> +GC_FREE;
>  return rc;
>  }
>  
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index f051d91..0ef0a3a 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1408,6 +1408,8 @@ _hidden int libxl__qmp_query_serial(libxl__qmp_handler 
> *qmp);
>  _hidden int libxl__qmp_pci_add(libxl__gc *gc, int d, libxl_device_pci 
> *pcidev);
>  _hidden int libxl__qmp_pci_del(libxl__gc *gc, int domid,
> libxl_device_pci *pcidev);
> +/* Resume hvm domain */
> +_hidden int libxl__qmp_system_wakeup(libxl__gc *gc, int domid);
>  /* Suspend QEMU. */
>  _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
>  /* Resume QEMU. */
> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
> index f681f3a..3c3b301 100644
> --- a/tools/libxl/libxl_qmp.c
> +++ b/tools/libxl/libxl_qmp.c
> @@ -878,6 +878,11 @@ int libxl__qmp_pci_del(libxl__gc *gc, int domid, 
> libxl_device_pci *pcidev)
>  return qmp_device_del(gc, domid, id);
>  }
>  
> +int libxl__qmp_system_wakeup(libxl__gc *gc, int domid)
> +{
> +return qmp_run_command(gc, domid, "system_wakeup", NULL, NULL, NULL);
> +}
> +
>  int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
>  {
>  libxl__json_object *args = NULL;

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 03:43:49PM +0300, Marcel Apfelbaum wrote:
> On Mon, 2013-09-09 at 15:23 +0300, Michael S. Tsirkin wrote:
> > On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote:
> > > On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
> > > > > Created a MemoryRegion with negative priority that
> > > > > spans over all the pci address space.
> > > > > It "intercepts" the accesses to unassigned pci
> > > > > address space and will follow the pci spec:
> > > > >  1. returns -1 on read
> > > > >  2. does nothing on write
> > > > >  3. sets the RECEIVED MASTER ABORT bit in the STATUS register
> > > > > of the device that initiated the transaction
> > > > > 
> > > > > Note: This implementation assumes that all the reads/writes to
> > > > > the pci address space are done by the cpu.
> > > > > 
> > > > > Signed-off-by: Marcel Apfelbaum 
> > > > > ---
> > > > > Changes from v1:
> > > > >  - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not on
> > > > > various Host Bridges
> > > > >  - "pci-unassgined-mem" does not have a ".valid.accept" field and
> > > > > implements read write methods
> > > > > 
> > > > >  hw/pci/pci.c | 46 
> > > > > ++
> > > > >  include/hw/pci/pci_bus.h |  1 +
> > > > >  2 files changed, 47 insertions(+)
> > > > > 
> > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > > index d00682e..b6a8026 100644
> > > > > --- a/hw/pci/pci.c
> > > > > +++ b/hw/pci/pci.c
> > > > > @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
> > > > >  return rootbus->qbus.name;
> > > > >  }
> > > > >  
> > > > > +static void unassigned_mem_access(PCIBus *bus)
> > > > > +{
> > > > > +/* FIXME assumption: memory access to the pci address
> > > > > + * space is always initiated by the host bridge
> > > > 
> > > > /* Always
> > > >  * like
> > > >  * this
> > > >  */
> > > > 
> > > > /* Not
> > > >  * like this */
> > > OK
> > > 
> > > > 
> > > > > + * (device 0 on the bus) */
> > > > 
> > > > Can't we pass the master device in?
> > > > We are still left with the assumption that
> > > > there's a single master, but at least
> > > > we get rid of the assumption that it's
> > > > always device 0 function 0.
> > > > 
> > > > 
> > > > > +PCIDevice *d = bus->devices[0];
> > > > > +if (!d) {
> > > > > +return;
> > > > > +}
> > > > > +
> > > > > +pci_word_test_and_set_mask(d->config + PCI_STATUS,
> > > > > +   PCI_STATUS_REC_MASTER_ABORT);
> > > > 
> > > > Probably should check and set secondary status for
> > > > a bridge device.
> > > I wanted to add the support for bridges later,
> > > I am struggling with some issues with PCI bridges.
> > 
> > Shouldn't be very different.
> The current implementation uses only one MemoryRegion that spans
> over all pci address space. It catches all the accesses both to
> primary bus addresses (bus 0), or to the regions covered by the bridges.

That's not what I see in code.

pci_bridge_initfn creates address spaces for IO and memory.
Bridge windows are aliases created by pci_bridge_init_alias.
All they do is forward transactions to the secondary bus,
so these address spaces represent secondary bus addressing.

What did I miss?

> The callback ALWAYS receive a pointer to the primary bus.
> 
> What would be a better approach?
> 1. Remain with a single MemoryRegion and check if the address
>belongs to a bridge address range and recursively travel
>the bridges tree and update the registers
> 2. Model a MemoryRegion for each bridge that represents
>the address range behind the bridge (not existing today).

I think this is exactly what address_space_XXX represent.

>Add a MemoryRegion child to catch accesses to unassigned
>addresses behind the bridge, then recursively travel the
>bridges to top and update the registers. 

This bit sounds right.

> 
> > 
> > > > 
> > > > > +}
> > > > > +
> > > > > +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, 
> > > > > unsigned size)
> > > > > +{
> > > > > +PCIBus *bus = opaque;
> > > > > +unassigned_mem_access(bus);
> > > > > +
> > > > > +return -1ULL;
> > > > > +}
> > > > > +
> > > > > +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t 
> > > > > val,
> > > > > +unsigned size)
> > > > > +{
> > > > > +PCIBus *bus = opaque;
> > > > > +unassigned_mem_access(bus);
> > > > > +}
> > > > > +
> > > > > +static const MemoryRegionOps unassigned_mem_ops = {
> > > > > +.read = unassigned_mem_read,
> > > > > +.write = unassigned_mem_write,
> > > > > +.endianness = DEVICE_NATIVE_ENDIAN,
> > > > > +};
> > > > > +
> > > > > +#define UNASSIGNED_MEM_PRIORITY -1
> > > > > +
> > > > >  static void pci_bus_init(PCIBus *bus, DeviceState *parent,
> > > > >   const ch

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 02:02:47PM +0100, Peter Maydell wrote:
> On 9 September 2013 13:59, Michael S. Tsirkin  wrote:
> > On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote:
> >> It would be conceptually nicer not to treat host bridges as
> >> a special case but instead to just report the abort back
> >> to whatever the PCI master was (which might be a device
> >> doing DMA). That might be a lot of effort though.
> 
> > Yes. As a shortcut, what I suggest is registering the
> > device that wants to be notified of master aborts with
> > the bus.
> 
> Can you just pick the device which is (a subclass of)
> TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
> aren't using that class?
> 
> -- PMM

Not anymore I think. So yes, I think this will work.

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 04:07:53PM +0300, Marcel Apfelbaum wrote:
> On Mon, 2013-09-09 at 13:52 +0100, Peter Maydell wrote:
> > On 9 September 2013 13:43, Marcel Apfelbaum  wrote:
> > > The scenario is covered only for the primary bus and not for buses
> > > behind the PCI bridge (the later being handled differently.)
> > > In this case, isn't the Host Bridge always device 00.0?
> > 
> > No. For instance the host controller may pass a nonzero
> > value of devfn_min to pci_bus_new/pci_register_bus (in
> > which case the host bridge will end up there; example
> > hw/pci-host/versatile.c) or it might just pass a nonzero
> > devfn to pci_create_simple when it creates the host controller
> > PCI device on the bus (I don't think we have anything that
> > does it that way, though).
> If my assumption is not generally true, I will not use it
> Thanks!
> 
> > 
> > > If not, can we find a way to scan all bus devices and find
> > > the host bridge so we will not have to manually add it to each
> > > host bridge?
> > 
> > It would be conceptually nicer not to treat host bridges as
> > a special case but instead to just report the abort back
> > to whatever the PCI master was (which might be a device
> > doing DMA). That might be a lot of effort though.
> This is exactly my point. ALL device on the bus can be masters
> of a DMA transaction. So adding an interface as suggested by
> Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *)
> for the general case (a device doing DMA) it is too far from reality.
> 
> However, if we don't want the master device far all accesses
> (including DMA) but only for accesses to pci address space,
> in this specific case we can appoint the HostBridge
> (explicitly/implicitly) as the master device because most
> devices do not access pci address space of other devices.
> 
> As a conclusion, should we call the API
> pci_set_master_for_master_abort_on_pci_space ?
> Marcel

I like Peter's idea of detecting a host bridge
implictly using device type.
With a big FIXME explaining that we shouldn't
need to do this.


> 
> 
> 
> > 
> > -- PMM
>

Re: [Qemu-devel] [PATCH 1/2] alpha-linux-user: Fix umount syscall numbers

2013-09-09 Thread Riku Voipio

Hi,

On Mon, Aug 26, 2013 at 01:26:11PM -0700, Richard Henderson wrote:
> Ping.

Sorry for the delay, adding it to the next pull request.

Riku

> On 08/16/2013 11:24 PM, Richard Henderson wrote:
> > Ping.
> > 
> > r~
> > 
> > 
> > On 07/24/2013 12:50 PM, Richard Henderson wrote:
> >> It has been pointed out on LKML that the alpha umount syscall numbers
> >> are named wrong, and a patch to rectify that has been posted for 3.11.
> >>
> >> Glibc works around this by treating NR_umount as NR_umount2 if
> >> NR_oldumount exists.  That's more complicated than we need in QEMU,
> >> given that we control linux-user/*/syscall_nr.h.
> >>
> >> This is the last instance of TARGET_NR_oldumount, so delete that from
> >> the strace.list.
> >>
> >> Signed-off-by: Richard Henderson 
> >> ---
> >>  linux-user/alpha/syscall_nr.h | 4 ++--
> >>  linux-user/strace.list| 3 ---
> >>  linux-user/syscall.c  | 2 +-
> >>  3 files changed, 3 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/linux-user/alpha/syscall_nr.h b/linux-user/alpha/syscall_nr.h
> >> index ac2b6e2..d52d76e 100644
> >> --- a/linux-user/alpha/syscall_nr.h
> >> +++ b/linux-user/alpha/syscall_nr.h
> >> @@ -20,7 +20,7 @@
> >>  #define TARGET_NR_lseek19
> >>  #define TARGET_NR_getxpid  20
> >>  #define TARGET_NR_osf_mount21
> >> -#define TARGET_NR_umount   22
> >> +#define TARGET_NR_umount2  22
> >>  #define TARGET_NR_setuid   23
> >>  #define TARGET_NR_getxuid  24
> >>  #define TARGET_NR_exec_with_loader 25 /* not implemented */
> >> @@ -255,7 +255,7 @@
> >>  #define TARGET_NR_sysinfo 318
> >>  #define TARGET_NR__sysctl 319
> >>  /* 320 was sys_idle.  */
> >> -#define TARGET_NR_oldumount   321
> >> +#define TARGET_NR_umount  321
> >>  #define TARGET_NR_swapon  322
> >>  #define TARGET_NR_times   323
> >>  #define TARGET_NR_personality 324
> >> diff --git a/linux-user/strace.list b/linux-user/strace.list
> >> index 08f115d..4f9c364 100644
> >> --- a/linux-user/strace.list
> >> +++ b/linux-user/strace.list
> >> @@ -612,9 +612,6 @@
> >>  #ifdef TARGET_NR_oldstat
> >>  { TARGET_NR_oldstat, "oldstat" , NULL, NULL, NULL },
> >>  #endif
> >> -#ifdef TARGET_NR_oldumount
> >> -{ TARGET_NR_oldumount, "oldumount" , NULL, NULL, NULL },
> >> -#endif
> >>  #ifdef TARGET_NR_olduname
> >>  { TARGET_NR_olduname, "olduname" , NULL, NULL, NULL },
> >>  #endif
> >> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> >> index 00a0390..e42c20e 100644
> >> --- a/linux-user/syscall.c
> >> +++ b/linux-user/syscall.c
> >> @@ -5719,7 +5719,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
> >> arg1,
> >>  unlock_user(p, arg1, 0);
> >>  }
> >>  break;
> >> -#ifdef TARGET_NR_umount2 /* not on alpha */
> >> +#ifdef TARGET_NR_umount2
> >>  case TARGET_NR_umount2:
> >>  if (!(p = lock_user_string(arg1)))
> >>  goto efault;
> >>
> >

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:39 +0100, Peter Maydell wrote:
> On 9 September 2013 14:29, Marcel Apfelbaum  wrote:
> > My issue is that we have at least 2 ways to model the bridges:
> > 1. TYPE_PCI_HOST_BRIDGE
> >* derives from TYPE_SYS_BUS_DEVICE
> >* has a bus
> >* one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
> > derives from TYPE_PCI_DEVICE
> > 2. TYPE_PCIE_HOST_BRIDGE
> >* derives from TYPE_PCI_HOST_BRIDGE which derives
> >  from TYPE_SYS_BUS_DEVICE
> >* has a PciDevice and register it to the bus in order
> >  to work as (1)
> 
> So I think there are two different (and slightly confusing)
> things here:
>  (1) the model of the host's PCI controller; this is
>  typically derived from TYPE_SYS_BUS_DEVICE somehow
>  but I guess in theory need not be; often it's a
>  TYPE_PCI_HOST_BRIDGE or TYPE_PCIE_HOST_BRIDGE.
>  (2) the PCI device which sits on the PCI bus and is
>  visible to the guest, usually with a PCI class ID
>  of PCI_CLASS_BRIDGE_HOST (but not necessarily; versatile
>  is different, for instance). Currently we generally
>  make these direct subclasses of TYPE_PCI_DEVICE.
> 
> [The naming convention confusion arises because
> different controllers are inconsistent about how
> they name these classes and which type name ends up
> with 'host' in it.]
> 
> What you're going to get in the callback is (2)...
This I come to understand, thanks!
> 
> > I would like to implement an hierarchy that will allow
> > all the host bridge devices to have a common ancestor
> > In this was, we can scan the PCI bus to look for
> > master...
> 
> ...and yes, we could insert an extra class and make
> all those bridge hosts derive from it rather than
> directly from TYPE_PCI_DEVICE if we needed to.
And we solve the main issue - no need for an API for master abort
- for upstream: use your suggestion (let's hope it works)
- for downstream: look on the bus for a device deriving
  from this class and *this device* will handle the
  master abort
I'll find a way how to handle the PCI bridges.

By the way, I am not sure that the upstream transactions (DMA)
can actually end with a master abort. Master abort would happen
if a transaction will not be claimed by any device on the bus.
But in DMA transactions, maybe the host bridge will always claim it
and return with error. Does anybody know something about this?

Thanks for all the help!
Marcel




> 
> -- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 04:29:04PM +0300, Marcel Apfelbaum wrote:
> On Mon, 2013-09-09 at 14:19 +0100, Peter Maydell wrote:
> > On 9 September 2013 14:15, Marcel Apfelbaum  wrote:
> > > On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
> > >> Can you just pick the device which is (a subclass of)
> > >> TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
> > >> aren't using that class?
> > > This is what I would really want to do, but some HOST Bridge devices
> > > inherit directly from PCI_DEVICE.
> > 
> > > TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
> > > is a not a PCI device and does not help us here (not a PCI_DEVICE
> > > on the bus)
> > 
> > Oops, yes, I get those two the wrong way round a lot. Anyway,
> > if we need to make all host bridges have a common subclass
> > we could certainly refactor them accordingly.
> > 
> > > Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
> > > and it hold as composition a PCIDevice that will be part of
> > > the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
> > > inherits from PCI_DEVICE.
> > 
> > This may just be wrong choice of name rather than actually
> > wrong hierarchy.
> I try not to "judge" the naming convention, so let's leave it
> aside for now.
> My issue is that we have at least 2 ways to model the bridges:
> 1. TYPE_PCI_HOST_BRIDGE
>* derives from TYPE_SYS_BUS_DEVICE
>* has a bus
>* one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
> derives from TYPE_PCI_DEVICE
> 2. TYPE_PCIE_HOST_BRIDGE
>* derives from TYPE_PCI_HOST_BRIDGE which derives
>  from TYPE_SYS_BUS_DEVICE
>* has a PciDevice and register it to the bus in order
>  to work as (1)
> 
> I would like to implement an hierarchy that will allow
> all the host bridge devices to have a common ancestor
> In this was, we can scan the PCI bus to look for
> master...

I wouldn't object to is_host is stuct PCIDeviceClass.
That's probably easier that trying to create
a common class for devices that share no common code.


> 
> > 
> > -- PMM
>

Re: [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization

2013-09-09 Thread Igor Mammedov

On Fri, 02 Aug 2013 22:33:24 +0200
Andreas Färber  wrote:

> Am 23.07.2013 18:22, schrieb Igor Mammedov:
> > Signed-off-by: Igor Mammedov 
> > ---
> >  vl.c |7 +--
> >  1 files changed, 1 insertions(+), 6 deletions(-)
> > 
> > diff --git a/vl.c b/vl.c
> > index 8190504..bf0c658 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -2947,7 +2947,7 @@ int main(int argc, char **argv, char **envp)
> >  module_call_init(MODULE_INIT_MACHINE);
> >  machine = find_default_machine();
> >  cpu_model = NULL;
> > -ram_size = 0;
> > +ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
> >  snapshot = 0;
> >  cyls = heads = secs = 0;
> >  translation = BIOS_ATA_TRANSLATION_AUTO;
> > @@ -4064,11 +4064,6 @@ int main(int argc, char **argv, char **envp)
> >  exit(1);
> >  }
> >  
> > -/* init the memory */
> > -if (ram_size == 0) {
> > -ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
> > -}
> > -
> >  if (qemu_opts_foreach(qemu_find_opts("device"), device_help_func, 
> > NULL, 0)
> >  != 0) {
> >  exit(0);
> 
> Commit message doesn't give any explanation why?
it was intended as cleanup

> 
> What happens with -m 0? My guess is the old code translates that to the
> default size, where by intializing the default earlier it would stay.
patch is broken in this aspect. It aborts on start up with -m 0

The question is if -m 0 is correct value, perhaps QEMU should exit with
error message in this case, instead of silent fallback to default?

> 
> Andreas
>

Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements

2013-09-09 Thread Richard Henderson

On 09/09/2013 01:13 AM, Claudio Fontana wrote:
> after carefully reading and testing your patches, this is how I suggest to 
> proceed: 
> 
> first do the implementation of the new functionality (tcg opcodes, jit) in a 
> way that is consistent with the existing code.
> No type changes, no refactoring, no beautification.
> 
> Once we agree on those, introduce the meaningful restructuring you want to do,
> like the new INSN type, the "don't handle mov/movi in tcg_out_op", the 
> TCG_OPF_64BIT thing, etc.
> 
> Last do the cosmetic stuff if you really want to do it, like the change all 
> ext to bool (note that there is no point if the callers still use "1" and 
> "0": adapt them as well) etc.

No, I don't agree.  Especially with respect to the insn type.

I'd much rather do all the "cosmetic stuff", as you put it, first.  It makes
all of the "real" changes much easier to understand.


r~

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 16:58 +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 09, 2013 at 03:43:49PM +0300, Marcel Apfelbaum wrote:
> > On Mon, 2013-09-09 at 15:23 +0300, Michael S. Tsirkin wrote:
> > > On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote:
> > > > On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
> > > > > > Created a MemoryRegion with negative priority that
> > > > > > spans over all the pci address space.
> > > > > > It "intercepts" the accesses to unassigned pci
> > > > > > address space and will follow the pci spec:
> > > > > >  1. returns -1 on read
> > > > > >  2. does nothing on write
> > > > > >  3. sets the RECEIVED MASTER ABORT bit in the STATUS register
> > > > > > of the device that initiated the transaction
> > > > > > 
> > > > > > Note: This implementation assumes that all the reads/writes to
> > > > > > the pci address space are done by the cpu.
> > > > > > 
> > > > > > Signed-off-by: Marcel Apfelbaum 
> > > > > > ---
> > > > > > Changes from v1:
> > > > > >  - "pci-unassigned-mem" MemoryRegion resides now in PCIBus and not 
> > > > > > on
> > > > > > various Host Bridges
> > > > > >  - "pci-unassgined-mem" does not have a ".valid.accept" field and
> > > > > > implements read write methods
> > > > > > 
> > > > > >  hw/pci/pci.c | 46 
> > > > > > ++
> > > > > >  include/hw/pci/pci_bus.h |  1 +
> > > > > >  2 files changed, 47 insertions(+)
> > > > > > 
> > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > > > index d00682e..b6a8026 100644
> > > > > > --- a/hw/pci/pci.c
> > > > > > +++ b/hw/pci/pci.c
> > > > > > @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
> > > > > >  return rootbus->qbus.name;
> > > > > >  }
> > > > > >  
> > > > > > +static void unassigned_mem_access(PCIBus *bus)
> > > > > > +{
> > > > > > +/* FIXME assumption: memory access to the pci address
> > > > > > + * space is always initiated by the host bridge
> > > > > 
> > > > > /* Always
> > > > >  * like
> > > > >  * this
> > > > >  */
> > > > > 
> > > > > /* Not
> > > > >  * like this */
> > > > OK
> > > > 
> > > > > 
> > > > > > + * (device 0 on the bus) */
> > > > > 
> > > > > Can't we pass the master device in?
> > > > > We are still left with the assumption that
> > > > > there's a single master, but at least
> > > > > we get rid of the assumption that it's
> > > > > always device 0 function 0.
> > > > > 
> > > > > 
> > > > > > +PCIDevice *d = bus->devices[0];
> > > > > > +if (!d) {
> > > > > > +return;
> > > > > > +}
> > > > > > +
> > > > > > +pci_word_test_and_set_mask(d->config + PCI_STATUS,
> > > > > > +   PCI_STATUS_REC_MASTER_ABORT);
> > > > > 
> > > > > Probably should check and set secondary status for
> > > > > a bridge device.
> > > > I wanted to add the support for bridges later,
> > > > I am struggling with some issues with PCI bridges.
> > > 
> > > Shouldn't be very different.
> > The current implementation uses only one MemoryRegion that spans
> > over all pci address space. It catches all the accesses both to
> > primary bus addresses (bus 0), or to the regions covered by the bridges.
> 
> That's not what I see in code.
> 
> pci_bridge_initfn creates address spaces for IO and memory.
> Bridge windows are aliases created by pci_bridge_init_alias.
> All they do is forward transactions to the secondary bus,
> so these address spaces represent secondary bus addressing.
> 
> What did I miss?
I was referring to *my* implementation of "unassigned mem" region,
not to pci bridge's regions.
> 
> > The callback ALWAYS receive a pointer to the primary bus.
> > 
> > What would be a better approach?
> > 1. Remain with a single MemoryRegion and check if the address
> >belongs to a bridge address range and recursively travel
> >the bridges tree and update the registers
> > 2. Model a MemoryRegion for each bridge that represents
> >the address range behind the bridge (not existing today).
> 
> I think this is exactly what address_space_XXX represent.
Today I see (using info mtree) under the bridge:
 1. A  container memory region for all the space 0 - 2^64
 2. Regions for the devices under the bridge
I do not see a region corresponding with the bridge "memory region"
that would be forwarded to the secondary bus.

Thanks,
Marcel

>  
> >Add a MemoryRegion child to catch accesses to unassigned
> >addresses behind the bridge, then recursively travel the
> >bridges to top and update the registers. 
> 
> This bit sounds right.
> 
> > 
> > > 
> > > > > 
> > > > > > +}
> > > > > > +
> > > > > > +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, 
> > > > > > unsigned size)
> > > > > > +{
> > > > > > +PCIBus *bus = opaque;
> > > > > > +unassigned_mem_access(bus);
> > > > > > +
> > > > > > +return -1ULL;
> > > > >

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 17:04 +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 09, 2013 at 04:29:04PM +0300, Marcel Apfelbaum wrote:
> > On Mon, 2013-09-09 at 14:19 +0100, Peter Maydell wrote:
> > > On 9 September 2013 14:15, Marcel Apfelbaum  wrote:
> > > > On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
> > > >> Can you just pick the device which is (a subclass of)
> > > >> TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
> > > >> aren't using that class?
> > > > This is what I would really want to do, but some HOST Bridge devices
> > > > inherit directly from PCI_DEVICE.
> > > 
> > > > TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
> > > > is a not a PCI device and does not help us here (not a PCI_DEVICE
> > > > on the bus)
> > > 
> > > Oops, yes, I get those two the wrong way round a lot. Anyway,
> > > if we need to make all host bridges have a common subclass
> > > we could certainly refactor them accordingly.
> > > 
> > > > Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
> > > > and it hold as composition a PCIDevice that will be part of
> > > > the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
> > > > inherits from PCI_DEVICE.
> > > 
> > > This may just be wrong choice of name rather than actually
> > > wrong hierarchy.
> > I try not to "judge" the naming convention, so let's leave it
> > aside for now.
> > My issue is that we have at least 2 ways to model the bridges:
> > 1. TYPE_PCI_HOST_BRIDGE
> >* derives from TYPE_SYS_BUS_DEVICE
> >* has a bus
> >* one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
> > derives from TYPE_PCI_DEVICE
> > 2. TYPE_PCIE_HOST_BRIDGE
> >* derives from TYPE_PCI_HOST_BRIDGE which derives
> >  from TYPE_SYS_BUS_DEVICE
> >* has a PciDevice and register it to the bus in order
> >  to work as (1)
> > 
> > I would like to implement an hierarchy that will allow
> > all the host bridge devices to have a common ancestor
> > In this was, we can scan the PCI bus to look for
> > master...
> 
> I wouldn't object to is_host is stuct PCIDeviceClass.
> That's probably easier that trying to create
> a common class for devices that share no common code.
This will work too, and less changes to the code.
However Peter suggested that we can unify both upstream
and downstream handling of the master abort by putting
it all in this class.
But if we have "is_host" flag, we can differentiate
regular devices from host devices and handle them different.

If there are no objections, I will chose this path.
Thanks!
Marcel




> 
> 
> > 
> > > 
> > > -- PMM
> >

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 15:04, Marcel Apfelbaum  wrote:
> By the way, I am not sure that the upstream transactions (DMA)
> can actually end with a master abort. Master abort would happen
> if a transaction will not be claimed by any device on the bus.
> But in DMA transactions, maybe the host bridge will always claim it
> and return with error. Does anybody know something about this?

No, it's perfectly possible for a bus master transaction
to abort. The PC's host controller happens to be set up so
that bus master DMA covers the whole of the PCI memory space
and so it's probably not possible to get an abort on that
platform, but this isn't necessarily the case. For instance
the versatilePB's PCI controller only responds to accesses
within its programmed MMIO BAR ranges, so if the device
or the controller have been misconfigured you can get an
abort when the device tries to do DMA. (This usually causes
the device to decide something has gone seriously wrong.
For instance an EHCI USB controller device will stop
and set the STS_FATAL bit in its status register:
http://lxr.free-electrons.com/source/include/linux/usb/ehci_def.h#L96

If we wanted to implement this correctly we would need
to be able to return an "access succeeded/failed" indicator
from dma_memory_write(): check hw/usb/hcd-ehci.c:get_dwords()
(which already has the logic for "whoops, DMA failed"
for the extremely simple case of "no DMA address space").

-- PMM

Re: [Qemu-devel] [PATCH 0/2] qcow2: Discard VM state in active L1 after creating snapshot

2013-09-09 Thread Max Reitz


On 2013-09-06 12:32, Kevin Wolf wrote:

This series fixes that 'savevm' becomes much slower after the first snapshot.
The reason is unnecessary COW for the VM state, which is now avoided.

Kevin Wolf (2):
   qcow2: Pass discard type to qcow2_discard_clusters()
   qcow2: Discard VM state in active L1 after creating snapshot

  block/qcow2-cluster.c  | 8 
  block/qcow2-snapshot.c | 7 +++
  block/qcow2.c  | 7 +--
  block/qcow2.h  | 7 ++-
  4 files changed, 18 insertions(+), 11 deletions(-)



Reviewed-by: Max Reitz

Re: [Qemu-devel] [Bug 1180777] Re: Windows 7 VM freeze on Ubuntu 12.04 KVM

2013-09-09 Thread f3a97

Hi,

Eventually I've been able to reproduce the hang.

Just to recap the experiment: I have recorded network traffic during RDP
connection hangs. The result is that the VM has do not experience total
network loss: network traffic continues to flow (tcpdump from the host on
ports other than 3389).




On 5 September 2013 15:22, Stefano Doni  wrote:

> Hi Hector,
>
>
> My network configuration is different than yours - it is not bridged.
> Perhaps this might help to diagnose the issue?
>
>
> I'm now trying to sniff network traffic generated by the VM and see what
> happens during RDP hangs: does it still poll outside servers (i.e. google
> drive) or will become disconnected?
>
>
> I'll report the results, stay tuned.
>
>
> On 19 August 2013 00:42, Hector Perez  wrote:
>
>> Hi,
>>
>> I reported the #1212051 (https://bugs.launchpad.net/ubuntu/+source/qemu-
>> kvm/+bug/1212051)
>> bug with Windows XP, but reading this case i think it
>> could be the same issue.
>>
>> I connect via RDP to my windows XP VM and after a while it seems to
>> freeze, then, I connect via VNC an without doing anything the
>> communications are restored between the VM and the outside world. So i
>> can connect again with RDP.
>>
>> Even more, if i leave an open connection with VNC (minimized because is
>> more slow than RDP), connecting via RDP has no trouble and no problem of
>> freeze (lost of communications) occurs.
>>
>> f3a97, Could you do this test and tell if you have the same experience
>> than i?,  If so, my bug is the same of this.
>> Thanks.
>>
>> --
>> You received this bug notification because you are subscribed to the bug
>> report.
>> https://bugs.launchpad.net/bugs/1180777
>>
>> Title:
>>   Windows 7 VM freeze on Ubuntu 12.04 KVM
>>
>> Status in QEMU:
>>   New
>> Status in “qemu-kvm” package in Ubuntu:
>>   Confirmed
>>
>> Bug description:
>>   Hi,
>>
>>   I have recently setup a Windows 7 VM on KVM and started using it
>>   through remote desktop.
>>
>>   What happens is that, after some hours of usage, the remote desktop
>>   connection freezes. I thought it was a remmina bug, as the it was
>>   enough to kill and restart it to successfully connect again to the VM.
>>
>>   However, today I've switched to a different RDP client (2X Client
>>   chromium app) and the freeze just happened again!
>>
>>   Some information:
>>   - the host and the VM are completely idle when the freeze occurs
>>   - I've tried sniffing the network packets toward the RDP port during
>> the freeze and found that the client is sending packets but no packet is
>> sent back
>>
>>   Could this be a KVM issue? How can I further debug this one (I expect
>>   the freeze to happen again...)?
>>
>>   ProblemType: Bug
>>   DistroRelease: Ubuntu 12.04
>>   Package: kvm 1:84+dfsg-0ubuntu16+1.0+noroms+0ubuntu14.8
>>   ProcVersionSignature: Ubuntu 3.2.0-41.66-generic 3.2.42
>>   Uname: Linux 3.2.0-41-generic x86_64
>>   ApportVersion: 2.0.1-0ubuntu17.2
>>   Architecture: amd64
>>   Date: Thu May 16 14:12:40 2013
>>   MachineType: Hewlett-Packard HP ProBook 4520s
>>   MarkForUpload: True
>>   ProcEnviron:
>>TERM=xterm
>>PATH=(custom, no user)
>>LANG=en_US.UTF-8
>>SHELL=/bin/bash
>>   ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-41-generic
>> root=UUID=D2E20BC3E20BAAB5 loop=/hostname/disks/root.disk ro quiet splash
>> vt.handoff=7
>>   SourcePackage: qemu-kvm
>>   UpgradeStatus: No upgrade log present (probably fresh install)
>>   dmi.bios.date: 08/26/2010
>>   dmi.bios.vendor: Hewlett-Packard
>>   dmi.bios.version: 68AZZ Ver. F.0A
>>   dmi.board.name: 1411
>>   dmi.board.vendor: Hewlett-Packard
>>   dmi.board.version: KBC Version 57.30
>>   dmi.chassis.type: 10
>>   dmi.chassis.vendor: Hewlett-Packard
>>   dmi.modalias:
>> dmi:bvnHewlett-Packard:bvr68AZZVer.F.0A:bd08/26/2010:svnHewlett-Packard:pnHPProBook4520s:pvr:rvnHewlett-Packard:rn1411:rvrKBCVersion57.30:cvnHewlett-Packard:ct10:cvr:
>>   dmi.product.name: HP ProBook 4520s
>>   dmi.sys.vendor: Hewlett-Packard
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/qemu/+bug/1180777/+subscriptions
>>
>
>

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1180777

Title:
  Windows 7 VM freeze on Ubuntu 12.04 KVM

Status in QEMU:
  New
Status in “qemu-kvm” package in Ubuntu:
  Confirmed

Bug description:
  Hi,

  I have recently setup a Windows 7 VM on KVM and started using it
  through remote desktop.

  What happens is that, after some hours of usage, the remote desktop
  connection freezes. I thought it was a remmina bug, as the it was
  enough to kill and restart it to successfully connect again to the VM.

  However, today I've switched to a different RDP client (2X Client
  chromium app) and the freeze just happened again!

  Some information:
  - the host and the VM are completely idle when the freez

Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements

2013-09-09 Thread Peter Maydell

On 9 September 2013 16:02, Claudio Fontana  wrote:
> I guess we are stuck then. With the cosmetic and restructuring
> stuff coming before, I cannot cherry pick the good parts later.

...what do you need to cherry pick it into?

-- PMM

1 2 3 >

1 - 100 of 215 matches

Mail list logo