Re: [Qemu-devel] [PATCH 02/10] qdev: export qdev_reset() for later use.

2010-06-17 Thread Markus Armbruster
Isaku Yamahata  writes:

> export qdev_reset() for later use.
>
> Signed-off-by: Isaku Yamahata 
> ---
>  hw/qdev.c |   13 +
>  hw/qdev.h |1 +
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/hw/qdev.c b/hw/qdev.c
> index 61f999c..378f842 100644
> --- a/hw/qdev.c
> +++ b/hw/qdev.c
> @@ -256,13 +256,18 @@ DeviceState *qdev_device_add(QemuOpts *opts)
>  return qdev;
>  }
>  
> -static void qdev_reset(void *opaque)
> +void qdev_reset(DeviceState *dev)
>  {
> -DeviceState *dev = opaque;
>  if (dev->info->reset)
>  dev->info->reset(dev);
>  }
>  
> +static void qdev_reset_fn(void *opaque)
> +{
> +DeviceState *dev = opaque;
> +qdev_reset(dev);
> +}
> +

Nitpick: why the local variable?
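What the nitpick suggests, as a minimal sketch (same wrapper as in the patch,
just without the temporary):

static void qdev_reset_fn(void *opaque)
{
    /* the opaque pointer converts implicitly to DeviceState *,
       so no local variable is needed */
    qdev_reset(opaque);
}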

[...]



[Qemu-devel] Re: [PATCH] SeaBIOS: Fix bvprintf() to respect padding for hex printing.

2010-06-17 Thread Jes Sorensen
On 06/17/10 04:42, Kevin O'Connor wrote:
> On Mon, Jun 14, 2010 at 02:04:31PM +0200, jes.soren...@redhat.com wrote:
>> From: Jes Sorensen 
>>
>> Fix bvprintf to respect space padding when printing hex numbers
>> and the caller specifies alignment without zero padding, eg. %2x
>> as opposed to %02x
> 
> I thought your patch would increase stack space in 16bit mode, but
> oddly it seems to actually reduce stack space (at least on gcc4.4.4).
> 
> So, the patch looks good, but I think you missed the case where the
> length given is smaller than the actual number, and %p needs to use
> zero padding.  How about the below instead.

Hi Kevin,

DOH, you're right! Your patch looks good to me so
Signed-off-by: Jes Sorensen 

Thanks for catching this.

Cheers,
Jes
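For reference, a rough sketch of the padding behaviour under discussion
(illustrative only, not SeaBIOS's actual bvprintf code): pad up to the
requested width with spaces or zeros, and never truncate when the number
needs more digits than the width allows.

#include <stdint.h>

static void put_hex(char **out, uint32_t val, int width, int zeropad)
{
    char digits[8];
    int n = 0, pad;

    do {
        digits[n++] = "0123456789abcdef"[val & 0xf];
        val >>= 4;
    } while (val);
    for (pad = width - n; pad > 0; pad--)
        *(*out)++ = zeropad ? '0' : ' ';   /* %2x pads with ' ', %02x with '0' */
    while (n)
        *(*out)++ = digits[--n];           /* digits are emitted even if n > width */
}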



Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Wed, Jun 16, 2010 at 06:00:56PM +0200, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Wed, Jun 16, 2010 at 12:35:16PM +0300, Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 11:33:13AM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
>> On Wed, Jun 16, 2010 at 09:57:35AM +0200, Jan Kiszka wrote:
>>> Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 09:51:14AM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
>> On Wed, Jun 16, 2010 at 09:03:01AM +0200, Jan Kiszka wrote:
>>> Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 12:40:28AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka 
>
> There is no need starting with the special value for 
> hpet_cfg.count.
> Either Seabios is aware of the new firmware interface and properly
> interprets the counter or it simply ignores it anyway.
>
 I want seabios to be able to distinguish between old qemu and new 
 one.
>>> I see now. But isn't it a good chance to introduce a proper generic
>>> interface for exploring supported fw-cfg keys?
>>>
>> Having such interface would be nice. Pity we haven't introduced it 
>> from
>> the start. If we do it now seabios will have to find out somehow that
>> qemu support such interface. Chicken and egg ;)
> That is easy: Add a key the describes the highest supported key value
> (looks like this is monotonously increasing). Older qemu versions will
> return 0.
>
 That will not support holes in key space, and our key space is already
 sparse.
>>> Then add a service to obtain a bitmap of supported keys. If that bitmap
>>> is empty...
>>>
>> Bitmap will be 2k long. We can add read capability to control port. To
>> check if key is present you select it (write its value to control port)
>> and then read control port back. If values is non-zero the key is valid.
>> But how to detect qemu that does not support that?
> Isn't there some key that was always there and will always be?
>
 FW_CFG_SIGNATURE

>>> So any ideas? Or did I misunderstood your hint? ;)
>> I thought you found the answer yourself:
>>
>> Seabios could select FW_CFG_SIGNATURE and then perform a read-back on
>> the control register. Older QEMUs will return -1, versions that support
>> the read-back 0. Problem solved, no?
>>
> AFAIK QEMU returns 0 if io read was done from non-used port or mmio
> address, but can we rely on this? If we can then problem solved, if
> we can't then no.

It works for IO-based fw-cfg, but not for MMIO-based. So the firmware
should probably pick a non-zero key for this check, e.g. FW_CFG_ID.

Jan





[Qemu-devel] [PATCH 2/4] qemu: Enable XSAVE related CPUID

2010-06-17 Thread Sheng Yang
We can support it in KVM now. The 0xd leaf is queried from KVM.

Signed-off-by: Sheng Yang 
---
 target-i386/cpuid.c |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 99d1f44..ab6536b 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -1067,6 +1067,27 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *ecx = 0;
 *edx = 0;
 break;
+case 0xD:
+/* Processor Extended State */
+if (!(env->cpuid_ext_features & CPUID_EXT_XSAVE)) {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+break;
+}
+if (kvm_enabled()) {
+*eax = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EAX);
+*ebx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EBX);
+*ecx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_ECX);
+*edx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EDX);
+} else {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+}
+break;
 case 0x8000:
 *eax = env->cpuid_xlevel;
 *ebx = env->cpuid_vendor1;
-- 
1.7.0.1




[Qemu-devel] [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang

Signed-off-by: Sheng Yang 
---
 target-i386/cpu.h |5 ++
 target-i386/kvm.c |  134 +
 target-i386/machine.c |   20 +++
 3 files changed, 159 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 548ab80..75070d3 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -718,6 +718,11 @@ typedef struct CPUX86State {
 uint16_t fpus_vmstate;
 uint16_t fptag_vmstate;
 uint16_t fpregs_format_vmstate;
+
+uint64_t xstate_bv;
+XMMReg ymmh_regs[CPU_NB_REGS];
+
+uint64_t xcr0;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index bb6a12f..90ff323 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
 } else {
 env->mp_state = KVM_MP_STATE_RUNNABLE;
 }
+/* Legal xcr0 for loading */
+env->xcr0 = 1;
 }
 
 static int kvm_has_msr_star(CPUState *env)
@@ -504,6 +506,57 @@ static int kvm_put_fpu(CPUState *env)
 return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
+#ifdef KVM_CAP_XSAVE
+
+#define XSAVE_CWD_RIP 2
+#define XSAVE_CWD_RDP 4
+#define XSAVE_MXCSR   6
+#define XSAVE_ST_SPACE8
+#define XSAVE_XMM_SPACE   40
+#define XSAVE_XSTATE_BV   128
+#define XSAVE_YMMH_SPACE  144
+
+static int kvm_put_xsave(CPUState *env)
+{
+int i;
+struct kvm_xsave* xsave;
+uint16_t cwd, swd, twd, fop;
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+memset(xsave, 0, sizeof(struct kvm_xsave));
+cwd = swd = twd = fop = 0;
+swd = env->fpus & ~(7 << 11);
+swd |= (env->fpstt & 7) << 11;
+cwd = env->fpuc;
+for (i = 0; i < 8; ++i)
+twd |= (!env->fptags[i]) << i;
+xsave->region[0] = (uint32_t)(swd << 16) + cwd;
+xsave->region[1] = (uint32_t)(fop << 16) + twd;
+memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
+sizeof env->fpregs);
+memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
+sizeof env->xmm_regs);
+xsave->region[XSAVE_MXCSR] = env->mxcsr;
+*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
+memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
+sizeof env->ymmh_regs);
+return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
+}
+#endif
+
+#ifdef KVM_CAP_XCRS
+static int kvm_put_xcrs(CPUState *env)
+{
+struct kvm_xcrs xcrs;
+
+xcrs.nr_xcrs = 1;
+xcrs.flags = 0;
+xcrs.xcrs[0].xcr = 0;
+xcrs.xcrs[0].value = env->xcr0;
+return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
+}
+#endif
+
 static int kvm_put_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -621,6 +674,59 @@ static int kvm_get_fpu(CPUState *env)
 return 0;
 }
 
+#ifdef KVM_CAP_XSAVE
+static int kvm_get_xsave(CPUState *env)
+{
+struct kvm_xsave* xsave;
+int ret, i;
+uint16_t cwd, swd, twd, fop;
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
+if (ret < 0)
+return ret;
+
+cwd = (uint16_t)xsave->region[0];
+swd = (uint16_t)(xsave->region[0] >> 16);
+twd = (uint16_t)xsave->region[1];
+fop = (uint16_t)(xsave->region[1] >> 16);
+env->fpstt = (swd >> 11) & 7;
+env->fpus = swd;
+env->fpuc = cwd;
+for (i = 0; i < 8; ++i)
+env->fptags[i] = !((twd >> i) & 1);
+env->mxcsr = xsave->region[XSAVE_MXCSR];
+memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
+sizeof env->fpregs);
+memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
+sizeof env->xmm_regs);
+env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
+memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
+sizeof env->ymmh_regs);
+return 0;
+}
+#endif
+
+#ifdef KVM_CAP_XCRS
+static int kvm_get_xcrs(CPUState *env)
+{
+int i, ret;
+struct kvm_xcrs xcrs;
+
+ret = kvm_vcpu_ioctl(env, KVM_GET_XCRS, &xcrs);
+if (ret < 0)
+return ret;
+
+for (i = 0; i < xcrs.nr_xcrs; i++)
+/* Only support xcr0 now */
+if (xcrs.xcrs[0].xcr == 0) {
+env->xcr0 = xcrs.xcrs[0].value;
+break;
+}
+return 0;
+}
+#endif
+
 static int kvm_get_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -965,9 +1071,23 @@ int kvm_arch_put_registers(CPUState *env, int level)
 if (ret < 0)
 return ret;
 
+#ifdef KVM_CAP_XSAVE
+if (kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
+ret = kvm_put_xsave(env);
+else
+ret = kvm_put_fpu(env);
+#else
 ret = kvm_put_fpu(env);
+#endif
+if (ret < 0)
+return ret;
+
+#ifdef KVM_CAP_XCRS
+if (kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
+ret = kvm_put_xcrs(env);
 if (ret < 0)
 return ret;
+#endif
 
 ret = kvm_put_sregs(env);
 if (ret < 0)
@@ -1009,9 +1129,23 @@ int kvm_arch_get_registers(CPUState *env)
 if (ret 

[Qemu-devel] [PATCH 1/4] qemu: kvm: Extend kvm_arch_get_supported_cpuid() to support index

2010-06-17 Thread Sheng Yang
We will use it later for the XSAVE-related CPUID leaf.

Signed-off-by: Sheng Yang 
---
 kvm.h |2 +-
 target-i386/kvm.c |   19 +++
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/kvm.h b/kvm.h
index a28e7aa..7975e87 100644
--- a/kvm.h
+++ b/kvm.h
@@ -145,7 +145,7 @@ bool kvm_arch_stop_on_emulation_error(CPUState *env);
 int kvm_check_extension(KVMState *s, unsigned int extension);
 
 uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
-  int reg);
+  uint32_t index, int reg);
 void kvm_cpu_synchronize_state(CPUState *env);
 void kvm_cpu_synchronize_post_reset(CPUState *env);
 void kvm_cpu_synchronize_post_init(CPUState *env);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5a088a7..bb6a12f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -72,7 +72,8 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
 return cpuid;
 }
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int 
reg)
+uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
+  uint32_t index, int reg)
 {
 struct kvm_cpuid2 *cpuid;
 int i, max;
@@ -89,7 +90,8 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t 
function, int reg)
 }
 
 for (i = 0; i < cpuid->nent; ++i) {
-if (cpuid->entries[i].function == function) {
+if (cpuid->entries[i].function == function &&
+cpuid->entries[i].index == index) {
 switch (reg) {
 case R_EAX:
 ret = cpuid->entries[i].eax;
@@ -111,7 +113,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function, int reg)
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
  */
-cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, R_EDX);
+cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, 0, 
R_EDX);
 ret |= cpuid_1_edx & 0x183f7ff;
 break;
 }
@@ -127,7 +129,8 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function, int reg)
 
 #else
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int 
reg)
+uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
+  uint32_t index, int reg)
 {
 return -1U;
 }
@@ -179,16 +182,16 @@ int kvm_arch_init_vcpu(CPUState *env)
 
 env->mp_state = KVM_MP_STATE_RUNNABLE;
 
-env->cpuid_features &= kvm_arch_get_supported_cpuid(env, 1, R_EDX);
+env->cpuid_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
 
 i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
-env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(env, 1, R_ECX);
+env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_ECX);
 env->cpuid_ext_features |= i;
 
 env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(env, 0x8001,
- R_EDX);
+ 0, R_EDX);
 env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(env, 0x8001,
- R_ECX);
+ 0, R_ECX);
 
 cpuid_i = 0;
 
-- 
1.7.0.1




[Qemu-devel] [PATCH v4 0/4] XSAVE enabling in QEmu

2010-06-17 Thread Sheng Yang
Note that the first three patches apply to the uq/master branch of qemu-kvm,
and the last one applies to the qemu-kvm master branch. The last one should
only be applied after the first three have been merged into the master branch.



[Qemu-devel] [PATCH 4/4] qemu-kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang
Based on upstream xsave related fields.

Signed-off-by: Sheng Yang 
---
 qemu-kvm-x86.c |   95 +++-
 qemu-kvm.c |   24 ++
 qemu-kvm.h |   28 
 3 files changed, 146 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 3c33e64..dcef8b5 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -772,10 +772,26 @@ static void get_seg(SegmentCache *lhs, const struct 
kvm_segment *rhs)
| (rhs->avl * DESC_AVL_MASK);
 }
 
+#ifdef KVM_CAP_XSAVE
+#define XSAVE_CWD_RIP 2
+#define XSAVE_CWD_RDP 4
+#define XSAVE_MXCSR   6
+#define XSAVE_ST_SPACE8
+#define XSAVE_XMM_SPACE   40
+#define XSAVE_XSTATE_BV   128
+#define XSAVE_YMMH_SPACE  144
+#endif
+
 void kvm_arch_load_regs(CPUState *env, int level)
 {
 struct kvm_regs regs;
 struct kvm_fpu fpu;
+#ifdef KVM_CAP_XSAVE
+struct kvm_xsave* xsave;
+#endif
+#ifdef KVM_CAP_XCRS
+struct kvm_xcrs xcrs;
+#endif
 struct kvm_sregs sregs;
 struct kvm_msr_entry msrs[100];
 int rc, n, i;
@@ -806,16 +822,53 @@ void kvm_arch_load_regs(CPUState *env, int level)
 
    kvm_set_regs(env, &regs);
 
+#ifdef KVM_CAP_XSAVE
+if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
+uint16_t cwd, swd, twd, fop;
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+memset(xsave, 0, sizeof(struct kvm_xsave));
+cwd = swd = twd = fop = 0;
+swd = env->fpus & ~(7 << 11);
+swd |= (env->fpstt & 7) << 11;
+cwd = env->fpuc;
+for (i = 0; i < 8; ++i)
+twd |= (!env->fptags[i]) << i;
+xsave->region[0] = (uint32_t)(swd << 16) + cwd;
+xsave->region[1] = (uint32_t)(fop << 16) + twd;
+memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
+sizeof env->fpregs);
+memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
+sizeof env->xmm_regs);
+xsave->region[XSAVE_MXCSR] = env->mxcsr;
+*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
+memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
+sizeof env->ymmh_regs);
+kvm_set_xsave(env, xsave);
+#ifdef KVM_CAP_XCRS
+if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
+xcrs.nr_xcrs = 1;
+xcrs.flags = 0;
+xcrs.xcrs[0].xcr = 0;
+xcrs.xcrs[0].value = env->xcr0;
+kvm_set_xcrs(env, &xcrs);
+}
+#endif /* KVM_CAP_XCRS */
+} else {
+#endif /* KVM_CAP_XSAVE */
 memset(&fpu, 0, sizeof fpu);
 fpu.fsw = env->fpus & ~(7 << 11);
 fpu.fsw |= (env->fpstt & 7) << 11;
 fpu.fcw = env->fpuc;
 for (i = 0; i < 8; ++i)
-   fpu.ftwx |= (!env->fptags[i]) << i;
+fpu.ftwx |= (!env->fptags[i]) << i;
 memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
 memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
 fpu.mxcsr = env->mxcsr;
 kvm_set_fpu(env, &fpu);
+#ifdef KVM_CAP_XSAVE
+}
+#endif
 
 memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
 if (env->interrupt_injected >= 0) {
@@ -934,6 +987,12 @@ void kvm_arch_save_regs(CPUState *env)
 {
 struct kvm_regs regs;
 struct kvm_fpu fpu;
+#ifdef KVM_CAP_XSAVE
+struct kvm_xsave* xsave;
+#endif
+#ifdef KVM_CAP_XCRS
+struct kvm_xcrs xcrs;
+#endif
 struct kvm_sregs sregs;
 struct kvm_msr_entry msrs[100];
 uint32_t hflags;
@@ -965,6 +1024,37 @@ void kvm_arch_save_regs(CPUState *env)
 env->eflags = regs.rflags;
 env->eip = regs.rip;
 
+#ifdef KVM_CAP_XSAVE
+if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
+uint16_t cwd, swd, twd, fop;
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+kvm_get_xsave(env, xsave);
+cwd = (uint16_t)xsave->region[0];
+swd = (uint16_t)(xsave->region[0] >> 16);
+twd = (uint16_t)xsave->region[1];
+fop = (uint16_t)(xsave->region[1] >> 16);
+env->fpstt = (swd >> 11) & 7;
+env->fpus = swd;
+env->fpuc = cwd;
+for (i = 0; i < 8; ++i)
+env->fptags[i] = !((twd >> i) & 1);
+env->mxcsr = xsave->region[XSAVE_MXCSR];
+memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
+sizeof env->fpregs);
+memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
+sizeof env->xmm_regs);
+env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
+memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
+sizeof env->ymmh_regs);
+#ifdef KVM_CAP_XCRS
+if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
+kvm_get_xcrs(env, &xcrs);
+if (xcrs.xcrs[0].xcr == 0)
+env->xcr0 = xcrs.xcrs[0].value;
+}
+#endif
+} else {
+#endif
 kvm_get_fpu(env, &fpu);
 env->fpstt = (fpu.fsw >> 11) & 7;
 env->fpus = fpu.fsw;
@@ -974,6 +1064,9 @@ void kvm_arch_save_regs(CPUState *en

[Qemu-devel] Re: [PATCH v3] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka
Sheng Yang wrote:
> On Thursday 17 June 2010 00:05:44 Marcelo Tosatti wrote:
>> On Wed, Jun 16, 2010 at 05:48:46PM +0200, Jan Kiszka wrote:
>>> Marcelo Tosatti wrote:
 On Fri, Jun 11, 2010 at 12:36:49PM +0800, Sheng Yang wrote:
> Signed-off-by: Sheng Yang 
> ---
>
>  qemu-kvm-x86.c|  109
>  - qemu-kvm.c
> |   24 +++
>  qemu-kvm.h|   28 +
>  target-i386/cpu.h |5 ++
>  target-i386/kvm.c |2 +
>  target-i386/machine.c |   20 +
>  6 files changed, 169 insertions(+), 19 deletions(-)
 Applied, thanks.
>>> Oops, late remark: Why introducing this feature against qemu-kvm instead
>>> of upstream? Doesn't this just generate additional conversion work and
>>> the risk of divergence to upstream in the migration protocol?
> 
> Hi Jan
> 
> You're late... Hope you could raise the comment earlier next time so we can 
> work 
> together more efficient.

This case is "lost"; it probably already was when you posted the first
time. But I hope we can raise awareness of the issue this way again.

>> Thats true. Sheng, can you add save/restore support to uq/master to
>> avoid these problems?
> 
> Yes, there is divergence risk, would send an upstream version as well.
> 
> But I think as long as qemu-kvm and qemu upstream use different LM path, the 
> duplicate code/work can't be avoid. 

Probably. The vision is that one day you can write a KVM feature and
apply it to qemu-kvm as a staging tree for later unmodified merge into
qemu upstream. qemu-kvm[-arch].[ch] is still in our way, but it already
uses many bits from upstream. So I would recommend designing new
features against upstream first and then provide the few bits to also
make use of it in qemu-kvm once the latter has merged the required bits
(which may actually happen before upstream, but that doesn't matter).

Jan





[Qemu-devel] Re: [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka
Sheng Yang wrote:
> Signed-off-by: Sheng Yang 
> ---
>  target-i386/cpu.h |5 ++
>  target-i386/kvm.c |  134 
> +
>  target-i386/machine.c |   20 +++
>  3 files changed, 159 insertions(+), 0 deletions(-)
> 
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 548ab80..75070d3 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -718,6 +718,11 @@ typedef struct CPUX86State {
>  uint16_t fpus_vmstate;
>  uint16_t fptag_vmstate;
>  uint16_t fpregs_format_vmstate;
> +
> +uint64_t xstate_bv;
> +XMMReg ymmh_regs[CPU_NB_REGS];
> +
> +uint64_t xcr0;
>  } CPUX86State;
>  
>  CPUX86State *cpu_x86_init(const char *cpu_model);
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index bb6a12f..90ff323 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
>  } else {
>  env->mp_state = KVM_MP_STATE_RUNNABLE;
>  }
> +/* Legal xcr0 for loading */
> +env->xcr0 = 1;
>  }
>  
>  static int kvm_has_msr_star(CPUState *env)
> @@ -504,6 +506,57 @@ static int kvm_put_fpu(CPUState *env)
>  return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
>  }
>  
> +#ifdef KVM_CAP_XSAVE
> +
> +#define XSAVE_CWD_RIP 2
> +#define XSAVE_CWD_RDP 4
> +#define XSAVE_MXCSR   6
> +#define XSAVE_ST_SPACE8
> +#define XSAVE_XMM_SPACE   40
> +#define XSAVE_XSTATE_BV   128
> +#define XSAVE_YMMH_SPACE  144
> +
> +static int kvm_put_xsave(CPUState *env)
> +{
> +int i;
> +struct kvm_xsave* xsave;
> +uint16_t cwd, swd, twd, fop;
> +
> +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
> +memset(xsave, 0, sizeof(struct kvm_xsave));
> +cwd = swd = twd = fop = 0;
> +swd = env->fpus & ~(7 << 11);
> +swd |= (env->fpstt & 7) << 11;
> +cwd = env->fpuc;
> +for (i = 0; i < 8; ++i)
> +twd |= (!env->fptags[i]) << i;
> +xsave->region[0] = (uint32_t)(swd << 16) + cwd;
> +xsave->region[1] = (uint32_t)(fop << 16) + twd;
> +memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
> +sizeof env->fpregs);
> +memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
> +sizeof env->xmm_regs);
> +xsave->region[XSAVE_MXCSR] = env->mxcsr;
> +*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
> +memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
> +sizeof env->ymmh_regs);
> +return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
> +}
> +#endif
> +
> +#ifdef KVM_CAP_XCRS
> +static int kvm_put_xcrs(CPUState *env)
> +{
> +struct kvm_xcrs xcrs;
> +
> +xcrs.nr_xcrs = 1;
> +xcrs.flags = 0;
> +xcrs.xcrs[0].xcr = 0;
> +xcrs.xcrs[0].value = env->xcr0;
> +return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
> +}
> +#endif
> +
>  static int kvm_put_sregs(CPUState *env)
>  {
>  struct kvm_sregs sregs;
> @@ -621,6 +674,59 @@ static int kvm_get_fpu(CPUState *env)
>  return 0;
>  }
>  
> +#ifdef KVM_CAP_XSAVE
> +static int kvm_get_xsave(CPUState *env)
> +{
> +struct kvm_xsave* xsave;
> +int ret, i;
> +uint16_t cwd, swd, twd, fop;
> +
> +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
> +ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
> +if (ret < 0)
> +return ret;
> +
> +cwd = (uint16_t)xsave->region[0];
> +swd = (uint16_t)(xsave->region[0] >> 16);
> +twd = (uint16_t)xsave->region[1];
> +fop = (uint16_t)(xsave->region[1] >> 16);
> +env->fpstt = (swd >> 11) & 7;
> +env->fpus = swd;
> +env->fpuc = cwd;
> +for (i = 0; i < 8; ++i)
> +env->fptags[i] = !((twd >> i) & 1);
> +env->mxcsr = xsave->region[XSAVE_MXCSR];
> +memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
> +sizeof env->fpregs);
> +memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
> +sizeof env->xmm_regs);
> +env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
> +memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
> +sizeof env->ymmh_regs);
> +return 0;
> +}
> +#endif
> +
> +#ifdef KVM_CAP_XCRS
> +static int kvm_get_xcrs(CPUState *env)
> +{
> +int i, ret;
> +struct kvm_xcrs xcrs;
> +
> +ret = kvm_vcpu_ioctl(env, KVM_GET_XCRS, &xcrs);
> +if (ret < 0)
> +return ret;
> +
> +for (i = 0; i < xcrs.nr_xcrs; i++)
> +/* Only support xcr0 now */
> +if (xcrs.xcrs[0].xcr == 0) {
> +env->xcr0 = xcrs.xcrs[0].value;
> +break;
> +}
> +return 0;
> +}
> +#endif
> +
>  static int kvm_get_sregs(CPUState *env)
>  {
>  struct kvm_sregs sregs;
> @@ -965,9 +1071,23 @@ int kvm_arch_put_registers(CPUState *env, int level)
>  if (ret < 0)
>  return ret;
>  
> +#ifdef KVM_CAP_XSAVE
> +if (kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
> +ret = kvm_put_xsave(env);
> +else
> +ret = kvm_put_fpu(env);
> +#else
>  ret = kvm_put_

[Qemu-devel] Re: [PATCH 4/4] qemu-kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka
Sheng Yang wrote:
> Based on upstream xsave related fields.
> 
> Signed-off-by: Sheng Yang 
> ---
>  qemu-kvm-x86.c |   95 
> +++-
>  qemu-kvm.c |   24 ++
>  qemu-kvm.h |   28 
>  3 files changed, 146 insertions(+), 1 deletions(-)
> 
> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> index 3c33e64..dcef8b5 100644
> --- a/qemu-kvm-x86.c
> +++ b/qemu-kvm-x86.c
> @@ -772,10 +772,26 @@ static void get_seg(SegmentCache *lhs, const struct 
> kvm_segment *rhs)
>   | (rhs->avl * DESC_AVL_MASK);
>  }
>  
> +#ifdef KVM_CAP_XSAVE
> +#define XSAVE_CWD_RIP 2
> +#define XSAVE_CWD_RDP 4
> +#define XSAVE_MXCSR   6
> +#define XSAVE_ST_SPACE8
> +#define XSAVE_XMM_SPACE   40
> +#define XSAVE_XSTATE_BV   128
> +#define XSAVE_YMMH_SPACE  144
> +#endif
> +
>  void kvm_arch_load_regs(CPUState *env, int level)
>  {
>  struct kvm_regs regs;
>  struct kvm_fpu fpu;
> +#ifdef KVM_CAP_XSAVE
> +struct kvm_xsave* xsave;
> +#endif
> +#ifdef KVM_CAP_XCRS
> +struct kvm_xcrs xcrs;
> +#endif
>  struct kvm_sregs sregs;
>  struct kvm_msr_entry msrs[100];
>  int rc, n, i;
> @@ -806,16 +822,53 @@ void kvm_arch_load_regs(CPUState *env, int level)
>  
>  kvm_set_regs(env, &regs);
>  
> +#ifdef KVM_CAP_XSAVE
> +if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
> +uint16_t cwd, swd, twd, fop;
> +
> +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
> +memset(xsave, 0, sizeof(struct kvm_xsave));
> +cwd = swd = twd = fop = 0;
> +swd = env->fpus & ~(7 << 11);
> +swd |= (env->fpstt & 7) << 11;
> +cwd = env->fpuc;
> +for (i = 0; i < 8; ++i)
> +twd |= (!env->fptags[i]) << i;
> +xsave->region[0] = (uint32_t)(swd << 16) + cwd;
> +xsave->region[1] = (uint32_t)(fop << 16) + twd;
> +memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
> +sizeof env->fpregs);
> +memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
> +sizeof env->xmm_regs);
> +xsave->region[XSAVE_MXCSR] = env->mxcsr;
> +*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
> +memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
> +sizeof env->ymmh_regs);
> +kvm_set_xsave(env, xsave);
> +#ifdef KVM_CAP_XCRS
> +if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
> +xcrs.nr_xcrs = 1;
> +xcrs.flags = 0;
> +xcrs.xcrs[0].xcr = 0;
> +xcrs.xcrs[0].value = env->xcr0;
> +kvm_set_xcrs(env, &xcrs);
> +}
> +#endif /* KVM_CAP_XCRS */
> +} else {
> +#endif /* KVM_CAP_XSAVE */

Why not reusing kvm_put/get_xsave as defined for upstream? There should
be enough examples for that pattern. The result will be a tiny qemu-kvm
patch.

Jan

>  memset(&fpu, 0, sizeof fpu);
>  fpu.fsw = env->fpus & ~(7 << 11);
>  fpu.fsw |= (env->fpstt & 7) << 11;
>  fpu.fcw = env->fpuc;
>  for (i = 0; i < 8; ++i)
> - fpu.ftwx |= (!env->fptags[i]) << i;
> +fpu.ftwx |= (!env->fptags[i]) << i;
>  memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
>  memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
>  fpu.mxcsr = env->mxcsr;
>  kvm_set_fpu(env, &fpu);
> +#ifdef KVM_CAP_XSAVE
> +}
> +#endif
>  
>  memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
>  if (env->interrupt_injected >= 0) {
> @@ -934,6 +987,12 @@ void kvm_arch_save_regs(CPUState *env)
>  {
>  struct kvm_regs regs;
>  struct kvm_fpu fpu;
> +#ifdef KVM_CAP_XSAVE
> +struct kvm_xsave* xsave;
> +#endif
> +#ifdef KVM_CAP_XCRS
> +struct kvm_xcrs xcrs;
> +#endif
>  struct kvm_sregs sregs;
>  struct kvm_msr_entry msrs[100];
>  uint32_t hflags;
> @@ -965,6 +1024,37 @@ void kvm_arch_save_regs(CPUState *env)
>  env->eflags = regs.rflags;
>  env->eip = regs.rip;
>  
> +#ifdef KVM_CAP_XSAVE
> +if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
> +uint16_t cwd, swd, twd, fop;
> +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
> +kvm_get_xsave(env, xsave);
> +cwd = (uint16_t)xsave->region[0];
> +swd = (uint16_t)(xsave->region[0] >> 16);
> +twd = (uint16_t)xsave->region[1];
> +fop = (uint16_t)(xsave->region[1] >> 16);
> +env->fpstt = (swd >> 11) & 7;
> +env->fpus = swd;
> +env->fpuc = cwd;
> +for (i = 0; i < 8; ++i)
> +env->fptags[i] = !((twd >> i) & 1);
> +env->mxcsr = xsave->region[XSAVE_MXCSR];
> +memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
> +sizeof env->fpregs);
> +memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
> +sizeof env->xmm_regs);
> +env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
> +memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],

Re: [Qemu-devel] Q35 qemu repository?

2010-06-17 Thread Isaku Yamahata
Thanks for the patch.
Does Vista eventually boot with the patch applied?

On Wed, Jun 16, 2010 at 06:33:15PM +0100, Matthew Garrett wrote:
> On Wed, Jun 16, 2010 at 04:42:10PM +0100, Matthew Garrett wrote:
> 
> > Thanks for this - however, Vista gives me an ACPI error on boot (stop 
> > 0x00a5, 0x000d, which indicates that there's a malformed or 
> > undefined ACPI device). I don't suppose you have any idea what the 
> > problem here may be? Linux boots without complaint.
> 
> Fixed with the following patch. Any devices with duplicate _HIDs require 
> _UIDs.
> 
> diff --git a/src/q35-acpi-dsdt.dsl b/src/q35-acpi-dsdt.dsl
> index ad05c7a..4697527 100644
> --- a/src/q35-acpi-dsdt.dsl
> +++ b/src/q35-acpi-dsdt.dsl
> @@ -45,6 +45,7 @@ DefinitionBlock (
>  Device (DBG0)
>  {
>  Name(_HID, EISAID("PNP0C02"))
> +Name(_UID, 0)
>  Name(_CRS, ResourceTemplate() {
>  IO (Decode16, 0xb080, 0xb080, 0x00, 0x04)
>  })
> @@ -71,6 +72,7 @@ DefinitionBlock (
>  Device(HP0)
>  {
>  Name(_HID, EISAID("PNP0C02"))
> +Name(_UID, 0x01)
>  Name(_CRS, ResourceTemplate() {
>  IO (Decode16, 0xae00, 0xae00, 0x00, 0x0C)
>  IO (Decode16, 0xae0c, 0xae0c, 0x00, 0x01)
> 
> -- 
> Matthew Garrett | mj...@srcf.ucam.org
> 

-- 
yamahata



Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Gleb Natapov
On Thu, Jun 17, 2010 at 09:17:51AM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Jun 16, 2010 at 06:00:56PM +0200, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Wed, Jun 16, 2010 at 12:35:16PM +0300, Gleb Natapov wrote:
>  On Wed, Jun 16, 2010 at 11:33:13AM +0200, Jan Kiszka wrote:
> > Gleb Natapov wrote:
> >> On Wed, Jun 16, 2010 at 09:57:35AM +0200, Jan Kiszka wrote:
> >>> Gleb Natapov wrote:
>  On Wed, Jun 16, 2010 at 09:51:14AM +0200, Jan Kiszka wrote:
> > Gleb Natapov wrote:
> >> On Wed, Jun 16, 2010 at 09:03:01AM +0200, Jan Kiszka wrote:
> >>> Gleb Natapov wrote:
>  On Wed, Jun 16, 2010 at 12:40:28AM +0200, Jan Kiszka wrote:
> > From: Jan Kiszka 
> >
> > There is no need starting with the special value for 
> > hpet_cfg.count.
> > Either Seabios is aware of the new firmware interface and 
> > properly
> > interprets the counter or it simply ignores it anyway.
> >
>  I want seabios to be able to distinguish between old qemu and 
>  new one.
> >>> I see now. But isn't it a good chance to introduce a proper 
> >>> generic
> >>> interface for exploring supported fw-cfg keys?
> >>>
> >> Having such interface would be nice. Pity we haven't introduced it 
> >> from
> >> the start. If we do it now seabios will have to find out somehow 
> >> that
> >> qemu support such interface. Chicken and egg ;)
> > That is easy: Add a key the describes the highest supported key 
> > value
> > (looks like this is monotonously increasing). Older qemu versions 
> > will
> > return 0.
> >
>  That will not support holes in key space, and our key space is 
>  already
>  sparse.
> >>> Then add a service to obtain a bitmap of supported keys. If that 
> >>> bitmap
> >>> is empty...
> >>>
> >> Bitmap will be 2k long. We can add read capability to control port. To
> >> check if key is present you select it (write its value to control port)
> >> and then read control port back. If values is non-zero the key is 
> >> valid.
> >> But how to detect qemu that does not support that?
> > Isn't there some key that was always there and will always be?
> >
>  FW_CFG_SIGNATURE
> 
> >>> So any ideas? Or did I misunderstood your hint? ;)
> >> I thought you found the answer yourself:
> >>
> >> Seabios could select FW_CFG_SIGNATURE and then perform a read-back on
> >> the control register. Older QEMUs will return -1, versions that support
> >> the read-back 0. Problem solved, no?
> >>
> > AFAIK QEMU returns 0 if io read was done from non-used port or mmio
> > address, but can we rely on this? If we can then problem solved, if
> > we can't then no.
> 
> It works for IO-based fw-cfg, but not for MMIO-based. So the firmware
> should probably pick a non-zero key for this check, e.g. FW_CFG_ID.
> 
Sorry, I lost you here. What "works for IO-based fw-cfg, but not for
MMIO-based"? Can you write the pseudo logic of how you think it
all should work?

--
Gleb.



Re: [Qemu-devel] Re: [PATCH 1/2] qemu-io: retry fgets() when errno is EINTR

2010-06-17 Thread Kevin Wolf
On 16.06.2010 18:52, MORITA Kazutaka wrote:
> At Wed, 16 Jun 2010 13:04:47 +0200,
> Kevin Wolf wrote:
>>
>> On 15.06.2010 19:53, MORITA Kazutaka wrote:
>>> posix-aio-compat sends a signal in aio operations, so we should
>>> consider that fgets() could be interrupted here.
>>>
>>> Signed-off-by: MORITA Kazutaka 
>>> ---
>>>  cmd.c |3 +++
>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/cmd.c b/cmd.c
>>> index 2336334..460df92 100644
>>> --- a/cmd.c
>>> +++ b/cmd.c
>>> @@ -272,7 +272,10 @@ fetchline(void)
>>> return NULL;
>>> printf("%s", get_prompt());
>>> fflush(stdout);
>>> +again:
>>> if (!fgets(line, MAXREADLINESZ, stdin)) {
>>> +   if (errno == EINTR)
>>> +   goto again;
>>> free(line);
>>> return NULL;
>>> }
>>
>> This looks like a loop replaced by goto (and braces are missing). What
>> about this instead?
>>
>> do {
>> ret = fgets(...)
>> } while (ret == NULL && errno == EINTR)
>>
>> if (ret == NULL) {
>>fail
>> }
>>
> 
> I agree.
> 
> However, it seems that my second patch have already solved the
> problem.  We register this readline routines as an aio handler now, so
> fgets() does not block and cannot return with EINTR.
> 
> This patch looks no longer needed, sorry.

Good point. Thanks for having a look.

Kevin



[Qemu-devel] [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang

Signed-off-by: Sheng Yang 
---
 target-i386/cpu.h |7 ++-
 target-i386/kvm.c |  139 -
 target-i386/machine.c |   20 +++
 3 files changed, 163 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 548ab80..680eed1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -718,6 +718,11 @@ typedef struct CPUX86State {
 uint16_t fpus_vmstate;
 uint16_t fptag_vmstate;
 uint16_t fpregs_format_vmstate;
+
+uint64_t xstate_bv;
+XMMReg ymmh_regs[CPU_NB_REGS];
+
+uint64_t xcr0;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
@@ -895,7 +900,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define cpu_list_id x86_cpu_list
 #define cpudef_setup   x86_cpudef_setup
 
-#define CPU_SAVE_VERSION 11
+#define CPU_SAVE_VERSION 12
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index bb6a12f..e490c0a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
 } else {
 env->mp_state = KVM_MP_STATE_RUNNABLE;
 }
+/* Legal xcr0 for loading */
+env->xcr0 = 1;
 }
 
 static int kvm_has_msr_star(CPUState *env)
@@ -504,6 +506,68 @@ static int kvm_put_fpu(CPUState *env)
 return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
+#ifdef KVM_CAP_XSAVE
+#define XSAVE_CWD_RIP 2
+#define XSAVE_CWD_RDP 4
+#define XSAVE_MXCSR   6
+#define XSAVE_ST_SPACE8
+#define XSAVE_XMM_SPACE   40
+#define XSAVE_XSTATE_BV   128
+#define XSAVE_YMMH_SPACE  144
+#endif
+
+static int kvm_put_xsave(CPUState *env)
+{
+#ifdef KVM_CAP_XSAVE
+int i;
+struct kvm_xsave* xsave;
+uint16_t cwd, swd, twd, fop;
+
+if (kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
+return kvm_put_fpu(env);
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+memset(xsave, 0, sizeof(struct kvm_xsave));
+cwd = swd = twd = fop = 0;
+swd = env->fpus & ~(7 << 11);
+swd |= (env->fpstt & 7) << 11;
+cwd = env->fpuc;
+for (i = 0; i < 8; ++i)
+twd |= (!env->fptags[i]) << i;
+xsave->region[0] = (uint32_t)(swd << 16) + cwd;
+xsave->region[1] = (uint32_t)(fop << 16) + twd;
+memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
+sizeof env->fpregs);
+memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
+sizeof env->xmm_regs);
+xsave->region[XSAVE_MXCSR] = env->mxcsr;
+*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
+memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
+sizeof env->ymmh_regs);
+return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
+#else
+return kvm_put_fpu(env);
+#endif
+}
+
+static int kvm_put_xcrs(CPUState *env)
+{
+#ifdef KVM_CAP_XCRS
+struct kvm_xcrs xcrs;
+
+if (!kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
+return 0;
+
+xcrs.nr_xcrs = 1;
+xcrs.flags = 0;
+xcrs.xcrs[0].xcr = 0;
+xcrs.xcrs[0].value = env->xcr0;
+return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
+#else
+return 0;
+#endif
+}
+
 static int kvm_put_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -621,6 +685,69 @@ static int kvm_get_fpu(CPUState *env)
 return 0;
 }
 
+static int kvm_get_xsave(CPUState *env)
+{
+#ifdef KVM_CAP_XSAVE
+struct kvm_xsave* xsave;
+int ret, i;
+uint16_t cwd, swd, twd, fop;
+
+if (!kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
+return kvm_get_fpu(env);
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
+if (ret < 0)
+return ret;
+
+cwd = (uint16_t)xsave->region[0];
+swd = (uint16_t)(xsave->region[0] >> 16);
+twd = (uint16_t)xsave->region[1];
+fop = (uint16_t)(xsave->region[1] >> 16);
+env->fpstt = (swd >> 11) & 7;
+env->fpus = swd;
+env->fpuc = cwd;
+for (i = 0; i < 8; ++i)
+env->fptags[i] = !((twd >> i) & 1);
+env->mxcsr = xsave->region[XSAVE_MXCSR];
+memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
+sizeof env->fpregs);
+memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
+sizeof env->xmm_regs);
+env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
+memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
+sizeof env->ymmh_regs);
+return 0;
+#else
+return kvm_get_fpu(env);
+#endif
+}
+
+static int kvm_get_xcrs(CPUState *env)
+{
+#ifdef KVM_CAP_XCRS
+int i, ret;
+struct kvm_xcrs xcrs;
+
+if (!kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
+return 0;
+
+ret = kvm_vcpu_ioctl(env, KVM_GET_XCRS, &xcrs);
+if (ret < 0)
+return ret;
+
+for (i = 0; i < xcrs.nr_xcrs; i++)
+/* Only support xcr0 now */
+if (xcrs.xcrs[0].xcr == 0) {
+env->xcr0 = xcrs.xcrs[0].value;
+break;
+}
+retur

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Thu, Jun 17, 2010 at 09:17:51AM +0200, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Wed, Jun 16, 2010 at 06:00:56PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
> On Wed, Jun 16, 2010 at 12:35:16PM +0300, Gleb Natapov wrote:
>> On Wed, Jun 16, 2010 at 11:33:13AM +0200, Jan Kiszka wrote:
>>> Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 09:57:35AM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
>> On Wed, Jun 16, 2010 at 09:51:14AM +0200, Jan Kiszka wrote:
>>> Gleb Natapov wrote:
 On Wed, Jun 16, 2010 at 09:03:01AM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
>> On Wed, Jun 16, 2010 at 12:40:28AM +0200, Jan Kiszka wrote:
>>> From: Jan Kiszka 
>>>
>>> There is no need starting with the special value for 
>>> hpet_cfg.count.
>>> Either Seabios is aware of the new firmware interface and 
>>> properly
>>> interprets the counter or it simply ignores it anyway.
>>>
>> I want seabios to be able to distinguish between old qemu and 
>> new one.
> I see now. But isn't it a good chance to introduce a proper 
> generic
> interface for exploring supported fw-cfg keys?
>
 Having such interface would be nice. Pity we haven't introduced it 
 from
 the start. If we do it now seabios will have to find out somehow 
 that
 qemu support such interface. Chicken and egg ;)
>>> That is easy: Add a key the describes the highest supported key 
>>> value
>>> (looks like this is monotonously increasing). Older qemu versions 
>>> will
>>> return 0.
>>>
>> That will not support holes in key space, and our key space is 
>> already
>> sparse.
> Then add a service to obtain a bitmap of supported keys. If that 
> bitmap
> is empty...
>
 Bitmap will be 2k long. We can add read capability to control port. To
 check if key is present you select it (write its value to control port)
 and then read control port back. If values is non-zero the key is 
 valid.
 But how to detect qemu that does not support that?
>>> Isn't there some key that was always there and will always be?
>>>
>> FW_CFG_SIGNATURE
>>
> So any ideas? Or did I misunderstood your hint? ;)
 I thought you found the answer yourself:

 Seabios could select FW_CFG_SIGNATURE and then perform a read-back on
 the control register. Older QEMUs will return -1, versions that support
 the read-back 0. Problem solved, no?

>>> AFAIK QEMU returns 0 if io read was done from non-used port or mmio
>>> address, but can we rely on this? If we can then problem solved, if
>>> we can't then no.
>> It works for IO-based fw-cfg, but not for MMIO-based. So the firmware
>> should probably pick a non-zero key for this check, e.g. FW_CFG_ID.
>>
> Sorry, I lost you here. What "works for IO-based fw-cfg, but not for
> MMIO-based".

Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
you need to select a key that is different from both.

> Can you write pseudo logic of how you think it
> all should work?

The firmware should do this:

write(CTL_BASE, FW_CFG_ID);
if (read(CTL_BASE) != FW_CFG_ID)
deal_with_old_qemu();
else
check_for_supported_keys();
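A slightly fleshed-out sketch of that probe (the port number, key value and
the inw/outw helpers below are assumptions in SeaBIOS style, not taken from
the actual code):

#define PORT_QEMU_CFG_CTL  0x510   /* assumed x86 fw-cfg control port */
#define QEMU_CFG_ID        0x01    /* a key that has always existed and is non-zero */

static int qemu_cfg_has_readback(void)
{
    outw(QEMU_CFG_ID, PORT_QEMU_CFG_CTL);           /* select the key */
    /* old QEMU: -1 (undefined IO) or 0 (undefined MMIO), never the key */
    return inw(PORT_QEMU_CFG_CTL) == QEMU_CFG_ID;
}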

Jan





Re: [Qemu-devel] Re: RFC v2: blockdev_add & friends, brief rationale, QMP docs

2010-06-17 Thread Kevin Wolf
On 16.06.2010 20:07, Anthony Liguori wrote:
>>   But it requires that
>> everything that -blockdev provides is accessible with -drive, too (or
>> that we're okay with users hating us).
>>
> 
> I'm happy for -drive to die.  I think we should support -hda and 
> -blockdev. 

-hda is not sufficient for most users. It doesn't provide any options.
It doesn't even support virtio. If -drive is going to die (and we all seem
to agree on that), then -blockdev needs to be usable for users (and
so far you're the only one who disagrees).

> -blockdev should be optimized for config files, not single 
> argument input.  IOW:
> 
> [blockdev "blk2"]
>   format = "raw"
>   file = "/path/to/base.img"
>   cache = "writeback"
> 
> [blockdev "blk1"]
>format = "qcow2"
>file = "/path/to/leaf.img"
>cache="off"
>backing_dev = "blk2"
> 
> [device "disk1"]
>driver = "ide-drive"
>blockdev = "blk1"
>bus = "0"
>unit = "0"

You don't specify the backing file of an image on the command line (or
in the configuration file). It's saved as part of the image. It's more
like this (for a simple raw image file):

[blockdev-protocol "proto1"]
   protocol = "file"
   file = "/path/to/image.img"

[blockdev "blk1"]
   format = "raw"
   cache="off"
   protocol = "proto1"

[device "disk1"]
   driver = "ide-drive"
   blockdev = "blk1"
   bus = "0"
   unit = "0"

(This would be Markus' option 3, I think)

> Or:
> 
> qemu -blockdev id=blk2,format=raw,file=/path/to/base.img,cache=writeback \
>-blockdev 
> id=blk1,format=qcow2,file=/path/to/leaf.img,backing_dev=blk2 \
>-device ide-disk,blockdev=blk1,bus=0,unit=0
> 
> Or:
> 
> qemu -hda /path/to/leaf.img
> 
> And if a user really feels they need to modify the defaults, they can do:
> 
> qemu -hda /path/to/leaf.img -writeconfig myconf.cfg
> 
> And edit from there.
> 
>>> But honestly, I'm thoroughly confused about the distinction between
>>> protocol and format.  I had thought that protocols were a type of format
>>> and I'm not sure why we're making a distinction.
>>>  
>> Technically, they are mostly the same. Logically, they are not. You have
>> one image format driver (raw, qcow2, ...) that accesses its image data
>> through one or more stacked protocols (file, host_device, nbd, http, ...).
>>
>> In the past we've had quite some trouble because there was no clear
>> distinction. raw and file was the same. If you had an image on a block
>> device, you were asking for trouble.
>>
> 
> As Christoph mentions, we really don't have stacked protocols and I'm 
> not sure they make sense.

Right, if we go for Christoph's suggestion, we don't need stacked
protocols. We'll have stacked formats instead. I'm not sure if you like
this any better. ;-)

We do have stacking today. -hda blkdebug:test.cfg:foo.qcow2 is qcow2 on
blkdebug on file. We need to be able to represent this.

 I sure prefer the latter.  The brackets look like noise.  You need to
 understand protocol stacking for them to make any sense.

 Regarding confusion caused by mixing format and protocol options: yes,
 the brackets force you to distinguish between protocol options and
 other options.  But I doubt that'll reduce confusion here.  Either you
 understand protocols.  Then I doubt you need brackets to unconfuse
 you.  Or you don't understand protocols.  Then whether to put an
 option inside or outside the brackets is voodoo.


>>> If the above is necessary just to create a raw image, then we're doing
>>> something wrong in the block layer.  If should be possible to just say:
>>>
>>> -blockdev id=blk1,format=raw,file=fedora.img
>>>  
>> I think we all agree on this (although it contradicts what you said
>> above, because file is a property of the protocol). The question is how
>> to specify protocols explicitly.
> 
> I think raw doesn't make very much sense then.  What's the point of it 
> if it's just a thin wrapper around a protocol?

That it can be wrapped around any protocol. It's just about separating
code for handling the content of an image and code for accessing the image.

Ever tried something like "qemu-img create -f raw /dev/something 10G"?
You need the host_device protocol there, not the file protocol. When we
had raw == file this completely failed. And it's definitely reasonable
to expect that it works because the image format _is_ raw, it's just not
saved in a file.

Or the famous qcow2 images on block devices. Why did qemu guess the
format correctly when qcow2 was saved in a file, but not on a host
device? This was just inconsistent.

I've had more than one bug report about things like this which are
magically fixed when you do the layering right.

Kevin



[Qemu-devel] Re: [PATCH 4/4] qemu-kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang
On Thursday 17 June 2010 15:41:43 Jan Kiszka wrote:
> Sheng Yang wrote:
> > Based on upstream xsave related fields.
> > 
> > Signed-off-by: Sheng Yang 
> > ---
> > 
> >  qemu-kvm-x86.c |   95
> >  +++- qemu-kvm.c
> >  |   24 ++
> >  qemu-kvm.h |   28 
> >  3 files changed, 146 insertions(+), 1 deletions(-)
> > 
> > diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> > index 3c33e64..dcef8b5 100644
> > --- a/qemu-kvm-x86.c
> > +++ b/qemu-kvm-x86.c
> > @@ -772,10 +772,26 @@ static void get_seg(SegmentCache *lhs, const struct
> > kvm_segment *rhs)
> > 
> > | (rhs->avl * DESC_AVL_MASK);
> >  
> >  }
> > 
> > +#ifdef KVM_CAP_XSAVE
> > +#define XSAVE_CWD_RIP 2
> > +#define XSAVE_CWD_RDP 4
> > +#define XSAVE_MXCSR   6
> > +#define XSAVE_ST_SPACE8
> > +#define XSAVE_XMM_SPACE   40
> > +#define XSAVE_XSTATE_BV   128
> > +#define XSAVE_YMMH_SPACE  144
> > +#endif
> > +
> > 
> >  void kvm_arch_load_regs(CPUState *env, int level)
> >  {
> >  
> >  struct kvm_regs regs;
> >  struct kvm_fpu fpu;
> > 
> > +#ifdef KVM_CAP_XSAVE
> > +struct kvm_xsave* xsave;
> > +#endif
> > +#ifdef KVM_CAP_XCRS
> > +struct kvm_xcrs xcrs;
> > +#endif
> > 
> >  struct kvm_sregs sregs;
> >  struct kvm_msr_entry msrs[100];
> >  int rc, n, i;
> > 
> > @@ -806,16 +822,53 @@ void kvm_arch_load_regs(CPUState *env, int level)
> > 
> >  kvm_set_regs(env, ®s);
> > 
> > +#ifdef KVM_CAP_XSAVE
> > +if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
> > +uint16_t cwd, swd, twd, fop;
> > +
> > +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
> > +memset(xsave, 0, sizeof(struct kvm_xsave));
> > +cwd = swd = twd = fop = 0;
> > +swd = env->fpus & ~(7 << 11);
> > +swd |= (env->fpstt & 7) << 11;
> > +cwd = env->fpuc;
> > +for (i = 0; i < 8; ++i)
> > +twd |= (!env->fptags[i]) << i;
> > +xsave->region[0] = (uint32_t)(swd << 16) + cwd;
> > +xsave->region[1] = (uint32_t)(fop << 16) + twd;
> > +memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
> > +sizeof env->fpregs);
> > +memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
> > +sizeof env->xmm_regs);
> > +xsave->region[XSAVE_MXCSR] = env->mxcsr;
> > +*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
> > +memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
> > +sizeof env->ymmh_regs);
> > +kvm_set_xsave(env, xsave);
> > +#ifdef KVM_CAP_XCRS
> > +if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
> > +xcrs.nr_xcrs = 1;
> > +xcrs.flags = 0;
> > +xcrs.xcrs[0].xcr = 0;
> > +xcrs.xcrs[0].value = env->xcr0;
> > +kvm_set_xcrs(env, &xcrs);
> > +}
> > +#endif /* KVM_CAP_XCRS */
> > +} else {
> > +#endif /* KVM_CAP_XSAVE */
> 
> Why not reusing kvm_put/get_xsave as defined for upstream? There should
> be enough examples for that pattern. The result will be a tiny qemu-kvm
> patch.

Lots of code in kvm_arch_load/save_regs() still duplicates what is in kvm.c,
e.g. kvm_get/put_sregs and kvm_get/put_msrs. So I would like to wait for the merge.

--
regards
Yang, Sheng

> 
> Jan
> 
> >  memset(&fpu, 0, sizeof fpu);
> >  fpu.fsw = env->fpus & ~(7 << 11);
> >  fpu.fsw |= (env->fpstt & 7) << 11;
> >  fpu.fcw = env->fpuc;
> >  for (i = 0; i < 8; ++i)
> > 
> > -   fpu.ftwx |= (!env->fptags[i]) << i;
> > +fpu.ftwx |= (!env->fptags[i]) << i;
> > 
> >  memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
> >  memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
> >  fpu.mxcsr = env->mxcsr;
> >  kvm_set_fpu(env, &fpu);
> > 
> > +#ifdef KVM_CAP_XSAVE
> > +}
> > +#endif
> > 
> >  memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
> >  if (env->interrupt_injected >= 0) {
> > 
> > @@ -934,6 +987,12 @@ void kvm_arch_save_regs(CPUState *env)
> > 
> >  {
> >  
> >  struct kvm_regs regs;
> >  struct kvm_fpu fpu;
> > 
> > +#ifdef KVM_CAP_XSAVE
> > +struct kvm_xsave* xsave;
> > +#endif
> > +#ifdef KVM_CAP_XCRS
> > +struct kvm_xcrs xcrs;
> > +#endif
> > 
> >  struct kvm_sregs sregs;
> >  struct kvm_msr_entry msrs[100];
> >  uint32_t hflags;
> > 
> > @@ -965,6 +1024,37 @@ void kvm_arch_save_regs(CPUState *env)
> > 
> >  env->eflags = regs.rflags;
> >  env->eip = regs.rip;
> > 
> > +#ifdef KVM_CAP_XSAVE
> > +if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
> > +uint16_t cwd, swd, twd, fop;
> > +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
> > +kvm_get_xsave(env, xsave);
> > +cwd = (uint16_t)xsave->region[0];
> > +swd = (uint16_t)(xsave->region[0] >> 16);
> > +twd = (uint16_t)xsave->region[1];
> > +fop = (uint16_t)(xsave->region[1] >> 16);
> > +   

Re: [Qemu-devel] Re: [PATCH] monitor: Really show snapshot information about all devices

2010-06-17 Thread Kevin Wolf
On 16.06.2010 17:57, Chris Lalancette wrote:
> On 06/16/10 - 05:32:58PM, Kevin Wolf wrote:
>> On 16.06.2010 17:22, Chris Lalancette wrote:
>>> On 06/16/10 - 03:15:11PM, Kevin Wolf wrote:
 On 16.06.2010 14:59, Miguel Di Ciurcio Filho wrote:
> On Wed, Jun 16, 2010 at 9:40 AM, Kevin Wolf  wrote:
>>
>> If the human monitor was exactly what its name says, I'd happily apply
>> this one (though I think it should be made clear from which image the VM
>> state would be loaded). However, it isn't and I'm not sure if this
>> wouldn't break libvirt. Dan, can you help?
>>
>
> I didn't mention in the commit, but I've looked at libvirt's source
> and it is not using 'info snapshots' AFAIK.

 Anthony, Dan, are you okay with the change then?
>>>
>>> Right, exactly as Miguel said, libvirt doesn't use "info snapshots" at all
>>> at the moment.  One of the reasons we don't use it at present is precisely
>>> because it doesn't give us information about all disks in-use.
>>>
>>> The other reason that we can't use "info snapshots" is that we need to know
>>> parent information about snapshots. That is, if you take a sequence of
>>> snapshots:
>>>
>>> A -> B -> C
>>>
>>> And then you delete B, the disk changes from B will be merged automatically
>>> into C to keep C a valid snapshot.  However, there is currently no way to
>>> discover this parent/child relationship, so we can't use "info snapshots"
>>> for that reason as well.
>>
>> Well, there is no parent/child relation in qcow2, so exposing this is
>> going to be really hard. We also don't really need it anywhere in qemu.
>> What would libvirt use this information for?
> 
> I keep being told this, and I don't really understand how this is.  I know
> when I was heavily playing with this, the scenario above worked; that is, the
> deletion of snapshot B maintained a valid C snapshot.  If nothing is tracking
> the parent/child relationship, how does this work?

Clusters are refcounted. When you save a snapshot, the refcount of every
cluster in the current state is increased. When you delete a snapshot, the
refcounts are decreased, and a cluster is freed only once its refcount reaches zero.

> As for how libvirt uses it, it's mostly to provide the ability for the user
> to keep track of a "tree" of snapshots.  So the user could do something like
> install their base OS, and take a snapshot S1.  Then they could install one 
> set
> of applications, and take a snapshot S2.  Now they can go back to the base
> image, install a different set of applications, and take a snapshot S3.
> Now both S2 and S3 are children of S1, and libvirt wants to be able to
> represent this relationship.

qemu doesn't even remember which snapshot you have loaded. Basically you
have one L1 table for active cluster allocations and you have another
one for each snapshot. When you load a snapshot, it just copies the L1
table to the active one (and adjusts refcounts).

So technically the concept of a snapshot tree doesn't exist with
internal snapshots. It's something that management introduces.

Kevin



Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Gleb Natapov
On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
> > Sorry, I lost you here. What "works for IO-based fw-cfg, but not for
> > MMIO-based".
> 
> Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
> you need to select a key that is different from both.
> 
But can we rely on it? Is this defined somewhere, or does it just happen
to be the case in current qemu for the x86 arch?

> > Can you write pseudo logic of how you think it
> > all should work?
> 
> The firmware should do this:
> 
> write(CTL_BASE, FW_CFG_ID);
> if (read(CTL_BASE) != FW_CFG_ID)
>   deal_with_old_qemu();
> else
>   check_for_supported_keys();
> 
Ah, I thought of read() returning 0/1, not the key itself, so any key that
has always existed would do.

--
Gleb.



Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
>>> Sorry, I lost you here. What "works for IO-based fw-cfg, but not for
>>> MMIO-based".
>> Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
>> you need to select a key that is different from both.
>>
> But can we rely on it? Is this defined somewhere or if it happens to be
> the case in current qemu for x86 arch.

For x86 with its port-based access, we are on the safe side as (pre-pnp)
device probing used to work this way. Can't tell for the other archs
that support fw-cfg.

> 
>>> Can you write pseudo logic of how you think it
>>> all should work?
>> The firmware should do this:
>>
>> write(CTL_BASE, FW_CFG_ID);
>> if (read(CTL_BASE) != FW_CFG_ID)
>>  deal_with_old_qemu();
>> else
>>  check_for_supported_keys();
>>
> Ah, I thought about read() returning 0/1, not key itself, so any key that
> always existed would do.

Yes, read-back would mean returning FWCfgState::cur_entry. And that will
be -1 when an invalid one is selected.
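(A sketch of what that read-back could look like on the QEMU side; the
structure below is a minimal stand-in, not the real FWCfgState:)

#include <stdint.h>

typedef struct FWCfgState {
    uint16_t cur_entry;    /* currently selected key, 0xffff after a bad select */
} FWCfgState;

static uint16_t fw_cfg_ctl_read(FWCfgState *s)
{
    return s->cur_entry;   /* old QEMU simply has no read handler here at all */
}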

Jan





[Qemu-devel] Re: [PATCH 4/4] qemu-kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka
Sheng Yang wrote:
> On Thursday 17 June 2010 15:41:43 Jan Kiszka wrote:
>> Sheng Yang wrote:
>>> Based on upstream xsave related fields.
>>>
>>> Signed-off-by: Sheng Yang 
>>> ---
>>>
>>>  qemu-kvm-x86.c |   95
>>>  +++- qemu-kvm.c
>>>  |   24 ++
>>>  qemu-kvm.h |   28 
>>>  3 files changed, 146 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>>> index 3c33e64..dcef8b5 100644
>>> --- a/qemu-kvm-x86.c
>>> +++ b/qemu-kvm-x86.c
>>> @@ -772,10 +772,26 @@ static void get_seg(SegmentCache *lhs, const struct
>>> kvm_segment *rhs)
>>>
>>> | (rhs->avl * DESC_AVL_MASK);
>>>  
>>>  }
>>>
>>> +#ifdef KVM_CAP_XSAVE
>>> +#define XSAVE_CWD_RIP 2
>>> +#define XSAVE_CWD_RDP 4
>>> +#define XSAVE_MXCSR   6
>>> +#define XSAVE_ST_SPACE8
>>> +#define XSAVE_XMM_SPACE   40
>>> +#define XSAVE_XSTATE_BV   128
>>> +#define XSAVE_YMMH_SPACE  144
>>> +#endif
>>> +
>>>
>>>  void kvm_arch_load_regs(CPUState *env, int level)
>>>  {
>>>  
>>>  struct kvm_regs regs;
>>>  struct kvm_fpu fpu;
>>>
>>> +#ifdef KVM_CAP_XSAVE
>>> +struct kvm_xsave* xsave;
>>> +#endif
>>> +#ifdef KVM_CAP_XCRS
>>> +struct kvm_xcrs xcrs;
>>> +#endif
>>>
>>>  struct kvm_sregs sregs;
>>>  struct kvm_msr_entry msrs[100];
>>>  int rc, n, i;
>>>
>>> @@ -806,16 +822,53 @@ void kvm_arch_load_regs(CPUState *env, int level)
>>>
>>>  kvm_set_regs(env, ®s);
>>>
>>> +#ifdef KVM_CAP_XSAVE
>>> +if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
>>> +uint16_t cwd, swd, twd, fop;
>>> +
>>> +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
>>> +memset(xsave, 0, sizeof(struct kvm_xsave));
>>> +cwd = swd = twd = fop = 0;
>>> +swd = env->fpus & ~(7 << 11);
>>> +swd |= (env->fpstt & 7) << 11;
>>> +cwd = env->fpuc;
>>> +for (i = 0; i < 8; ++i)
>>> +twd |= (!env->fptags[i]) << i;
>>> +xsave->region[0] = (uint32_t)(swd << 16) + cwd;
>>> +xsave->region[1] = (uint32_t)(fop << 16) + twd;
>>> +memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
>>> +sizeof env->fpregs);
>>> +memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
>>> +sizeof env->xmm_regs);
>>> +xsave->region[XSAVE_MXCSR] = env->mxcsr;
>>> +*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
>>> +memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
>>> +sizeof env->ymmh_regs);
>>> +kvm_set_xsave(env, xsave);
>>> +#ifdef KVM_CAP_XCRS
>>> +if (kvm_check_extension(kvm_state, KVM_CAP_XCRS)) {
>>> +xcrs.nr_xcrs = 1;
>>> +xcrs.flags = 0;
>>> +xcrs.xcrs[0].xcr = 0;
>>> +xcrs.xcrs[0].value = env->xcr0;
>>> +kvm_set_xcrs(env, &xcrs);
>>> +}
>>> +#endif /* KVM_CAP_XCRS */
>>> +} else {
>>> +#endif /* KVM_CAP_XSAVE */
>> Why not reusing kvm_put/get_xsave as defined for upstream? There should
>> be enough examples for that pattern. The result will be a tiny qemu-kvm
>> patch.
> 
> Still lots of code in kvm_arch_load/save_regs() duplicates what is in
> kvm.c, e.g. kvm_get/put_sregs and kvm_get/put_msrs. So I would like to
> wait for the merge.

That we still have some legacy here is no good reason to increase it.
Just check how debugregs were introduced.

Jan



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] Re: [PATCH V4 2/3] qemu: Generic task offloading framework: threadlets

2010-06-17 Thread Gautham R Shenoy
On Wed, Jun 16, 2010 at 02:34:16PM +0200, Paolo Bonzini wrote:
>> +block-obj-y += qemu-thread.o
>> +block-obj-y += async-work.o
>
> These should be (at least for now) block-obj-$(CONFIG_POSIX).

Right. Will fix that.
>
>> +while (QTAILQ_EMPTY(&(queue->request_list))&&
>> +   (ret != ETIMEDOUT)) {
>> +ret = qemu_cond_timedwait(&(queue->cond),
>> +&(queue->lock), 10*10);
>> +}
>
> Using qemu_cond_timedwait is a hack for not properly broadcasting the 
> condvar in flush_threadlet_queue.

I think Anthony answered this one.
>
>> +if (QTAILQ_EMPTY(&(queue->request_list)))
>> +goto check_exit;
>
> What's the reason for the goto?  {...} works just as well.

Yes {...} works.

Besides, this two step condition checking is broken and can
cause the threads to exit even in the presence of unprocessed
queued ThreadletWork items.
Will fix this in the v5 (hopefully there will be one :-))

>
>> +/**
>> + * flush_threadlet_queue: Wait till completion of all the submitted tasks
>> + * @queue: Queue containing the tasks we're waiting on.
>> + */
>> +void flush_threadlet_queue(ThreadletQueue *queue)
>> +{
>> +qemu_mutex_lock(&queue->lock);
>> +queue->exit = 1;
>> +
>> +qemu_barrier_init(&queue->barr, queue->cur_threads + 1);
>> +qemu_mutex_unlock(&queue->lock);
>> +
>> +qemu_barrier_wait(&queue->barr);
>
> Can be implemented just as well with queue->cond and a loop waiting for 
> queue->cur_threads == 0.  This would remove the need to implement barriers 
> in qemu-threads (especially for Win32).  Anyway whoever will contribute 
> Win32 qemu-threads can do it, since it's not hard.

That was the other option I had considered before going for barriers,
for no particular reason. Now, considering that barriers are not
welcome, I will implement this method.
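
Something along these lines, assuming the workers signal queue->cond and
decrement cur_threads under queue->lock on their way out (untested
sketch):

void flush_threadlet_queue(ThreadletQueue *queue)
{
    qemu_mutex_lock(&queue->lock);
    queue->exit = 1;
    /* wake up idle workers so that they notice exit and wind down */
    qemu_cond_broadcast(&queue->cond);
    while (queue->cur_threads > 0) {
        qemu_cond_wait(&queue->cond, &queue->lock);
    }
    qemu_mutex_unlock(&queue->lock);
}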

>
>> +int cancel_threadlet_common(ThreadletWork *work)
>> +{
>> +return cancel_threadlet(&globalqueue, work);
>> +}
>
> I would prefer *_threadlet to be the globalqueue function (and 
> flush_threadlets) and queue_*_threadlet to be the special-queue function. I 
> should have spoken earlier probably, but please consider this if there will 
> be a v5.

Sure, will do that.
>
>> + * Generalization based on posix-aio emulation code.
>
> No need to specify these as long as the original authors are attributed 
> properly.

Ok!
>
>> +static inline void threadlet_queue_init(ThreadletQueue *queue,
>> +int max_threads, int min_threads)
>> +{
>> +queue->cur_threads  = 0;
>> +queue->idle_threads = 0;
>> +queue->exit = 0;
>> +queue->max_threads  = max_threads;
>> +queue->min_threads  = min_threads;
>> +QTAILQ_INIT(&(queue->request_list));
>> +QTAILQ_INIT(&(queue->threadlet_work_pool));
>> +qemu_mutex_init(&(queue->lock));
>> +qemu_cond_init(&(queue->cond));
>> +}
>
> No need to make this inline.

Will fix this.
>
>> +extern void threadlet_submit(ThreadletQueue *queue,
>> +  ThreadletWork *work);
>> +
>> +extern void threadlet_submit_common(ThreadletWork *work);
>> +
>> +extern int cancel_threadlet(ThreadletQueue *queue, ThreadletWork *work);
>> +extern int cancel_threadlet_common(ThreadletWork *work);
>> +
>> +
>> +extern void flush_threadlet_queue(ThreadletQueue *queue);
>> +extern void flush_common_threadlet_queue(void);
>
> Please make the position of the verb consistent (e.g. "submit_threadlet").

Overlooked threadlet_submit() in the rename process. It has to be
submit_threadlet(). Will fix.

Thanks for the detailed review.
Regards
gautham.
>
> Paolo



[Qemu-devel] Re: [SeaBIOS] [PATCHv2] load hpet info for HPET ACPI table from qemu

2010-06-17 Thread Gleb Natapov
On Wed, Jun 16, 2010 at 09:22:09PM -0400, Kevin O'Connor wrote:
> On Tue, Jun 15, 2010 at 09:37:07AM +0300, Gleb Natapov wrote:
> > On Mon, Jun 14, 2010 at 04:12:32PM -0400, Kevin O'Connor wrote:
> > > But.. in order to move to a newer ACPI spec, there would be qemu
> > > changes anyway.  (If nothing else, so that qemu can tell seabios if
> > > it's okay to use the new rev.)  At that point we're stuck changing
> > > both repos anyway - nothing gained, nothing lost.
> > I don't see why qemu should care what ACPI rev Seabios uses.
> 
> A change in ACPI rev would likely break old OSs.  Only the user would
In that case a new ACPI revision would never be adopted. No HW
manufacturer would risk not being able to run Windows XP on their HW. A
real BIOS may have a config option to choose which ACPI version to use,
though. We can add that too.

> know this, and so a method of propagating that info from qemu to
> seabios would be needed.  (However, it's much more likely that a new
> ACPI rev would require more data which qemu would then also need to
> pass to seabios.)
Why do you think so? Anyway, my position is that we need to pass as much
information as possible from qemu to the firmware. On real HW the BIOS
knows everything about the underlying hardware.


> 
> > > I still think there is an opportunity to reduce the load on the bulk
> > > of acpi changes - most of these changes have no dependence on seabios
> > > at all.
> > That depends on how you view seabios project. If you consider it to be
> > legacy bios functionality provider only then I agree and we should move
> > to coreboot model. If you consider it to be legacy bios + qemu firmware
> > (like old BOCHS bios was) then by definition it's seabios job to
> > describe underlying HW to an OS.
> 
> I don't think this is that "cut and dry".  A real machine just ships
> with these acpi tables compiled in.  This is what BOCHS bios did and
> it is what seabios did up until about 8 months ago.
That was because qemu was a stale project for a couple of years. Now the
pace of qemu development is very fast, so the same is required from the
firmware too. When qemu development started to accelerate, BOCHS bios was
essentially forked to allow for faster development.

> 
> However, qemu isn't a simple machine emulator - it can emulate a whole
> class of x86 machines.  It's not practical to compile a seabios.bin
> file for every permutation of x86 machine that qemu can emulate.  So,
> we pass info from qemu to seabios so that it can support all the
> classes of hardware.  This isn't what real machines do, and it's not
> what bochs bios did.
BOCHS bios didn't do it because, when qemu development accelerated, we
switched to seabios. Otherwise I agree with the paragraph above. We just
don't agree on the form the information should be passed in. You think we
should pass the HPET ACPI table (my guess is just because we already have
a way to pass ACPI tables to seabios), and I think this is an abuse of
the ACPI spec. The fw_cfg interface was designed to be extensible, so why
oppose adding things to it? It is not as if building the HPET table in
qemu means we will not have to patch seabios and coordinate changes.
Seabios creates the HPET table unconditionally now, and we will have to
fix it not to do that if the HPET table is passed from qemu (and for that
seabios will have to examine all the tables it receives over the fw_cfg
interface, something it doesn't do now), and it will have to detect old
qemu somehow and create the HPET table unconditionally to preserve the
old behaviour on old qemus. In the end the change to seabios will be
bigger than the proposed patch.

> 
> I do view SeaBIOS as primarily a legacy bios interface and a boot
> loader.  
This is a worrying statement for the qemu project.

>  I also think it makes sense to handle qemu and kvm firmware
> needs - 
Good, but qemu's needs are growing at the pace of qemu development, and
that pace is fast these days.

>  some initialization wants to be done from the guest and
> seabios is a good place to do that.
> 
HW does not initialize itself. So the only sane place to do _all_ the
initialization is from the guest.

> This hpet thing is really rather minor, but it has me puzzled.  The
> guest OS wants the info in ACPI form, and only qemu has the info.  I
> don't understand why there is a desire to pass the info in this new
> arbitrary form instead of passing it in the form that the OS wants it
> in.
Because the OS does not talk directly to qemu. It has a mediator in the
form of seabios. We have a spec that defines the interface between
seabios and an OS (the ACPI spec), and we define the interface between
seabios and qemu ourselves. Why intentionally blur this separation?
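
For what it's worth, the HPET payload we are talking about is tiny;
roughly something like the following (field names here are only
illustrative, not necessarily the exact struct from the patch):

#include <stdint.h>

struct hpet_fw_entry {
    uint32_t event_timer_block_id;
    uint64_t address;
    uint16_t min_tick;
    uint8_t  page_prot;
} __attribute__((packed));

struct hpet_fw_config {
    uint8_t count;                  /* number of HPET blocks present */
    struct hpet_fw_entry hpet[8];
} __attribute__((packed));

Seabios would read this over fw_cfg and build the HPET ACPI table from
it, just like it builds the other tables.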

> 
> A couple of emails back you stated you considered using the existing
> qemu_cfg_acpi_additional_tables() format but dismissed the idea.
> Maybe you could explain why you dismissed it and/or what the
> deficiencies of this mechanism are?
> 
I dismissed it (very quickly) on the premise that this is a layering
violation. I saw that I needed to specify a value that qemu should have
nothing to do with to b

Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Gleb Natapov
On Thu, Jun 17, 2010 at 10:42:34AM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
> >>> Sorry, I lost you here. What "works for IO-based fw-cfg, but not for
> >>> MMIO-based".
> >> Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
> >> you need to select a key that is different from both.
> >>
> > But can we rely on it? Is this defined somewhere or if it happens to be
> > the case in current qemu for x86 arch.
> 
> For x86 with its port-based access, we are on the safe side as (pre-pnp)
> device probing used to work this way. Can't tell for the other archs
> that support fw-cfg.
> 
> > 
> >>> Can you write pseudo logic of how you think it
> >>> all should work?
> >> The firmware should do this:
> >>
> >> write(CTL_BASE, FW_CFG_ID);
> >> if (read(CTL_BASE) != FW_CFG_ID)
> >>deal_with_old_qemu();
> >> else
> >>check_for_supported_keys();
> >>
> > Ah, I thought about read() returning 0/1, not key itself, so any key that
> > always existed would do.
> 
> Yes, read-back would mean returning FWCfgState::cur_entry. And that will
> be -1 when selected an invalid one.
> 
Heh, actually I have better idea. Why not advance FW_CFG_ID to version 2.

--
Gleb.



Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Jan Kiszka
Gleb Natapov wrote:
> On Thu, Jun 17, 2010 at 10:42:34AM +0200, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
> Sorry, I lost you here. What "works for IO-based fw-cfg, but not for
> MMIO-based".
 Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
 you need to select a key that is different from both.

>>> But can we rely on it? Is this defined somewhere or if it happens to be
>>> the case in current qemu for x86 arch.
>> For x86 with its port-based access, we are on the safe side as (pre-pnp)
>> device probing used to work this way. Can't tell for the other archs
>> that support fw-cfg.
>>
> Can you write pseudo logic of how you think it
> all should work?
 The firmware should do this:

 write(CTL_BASE, FW_CFG_ID);
 if (read(CTL_BASE) != FW_CFG_ID)
deal_with_old_qemu();
 else
check_for_supported_keys();

>>> Ah, I thought about read() returning 0/1, not key itself, so any key that
>>> always existed would do.
>> Yes, read-back would mean returning FWCfgState::cur_entry. And that will
>> be -1 when selected an invalid one.
>>
> Heh, actually I have better idea. Why not advance FW_CFG_ID to version 2.

If that is supposed to be a version number - yeah, good idea.

Jan



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] Re: [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka
Sheng Yang wrote:
> Signed-off-by: Sheng Yang 
> ---
>  target-i386/cpu.h |7 ++-
>  target-i386/kvm.c |  139 
> -
>  target-i386/machine.c |   20 +++
>  3 files changed, 163 insertions(+), 3 deletions(-)
> 
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 548ab80..680eed1 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -718,6 +718,11 @@ typedef struct CPUX86State {
>  uint16_t fpus_vmstate;
>  uint16_t fptag_vmstate;
>  uint16_t fpregs_format_vmstate;
> +
> +uint64_t xstate_bv;
> +XMMReg ymmh_regs[CPU_NB_REGS];
> +
> +uint64_t xcr0;
>  } CPUX86State;
>  
>  CPUX86State *cpu_x86_init(const char *cpu_model);
> @@ -895,7 +900,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
>  #define cpu_list_id x86_cpu_list
>  #define cpudef_setup x86_cpudef_setup
>  
> -#define CPU_SAVE_VERSION 11
> +#define CPU_SAVE_VERSION 12
>  
>  /* MMU modes definitions */
>  #define MMU_MODE0_SUFFIX _kernel
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index bb6a12f..e490c0a 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
>  } else {
>  env->mp_state = KVM_MP_STATE_RUNNABLE;
>  }
> +/* Legal xcr0 for loading */
> +env->xcr0 = 1;
>  }
>  
>  static int kvm_has_msr_star(CPUState *env)
> @@ -504,6 +506,68 @@ static int kvm_put_fpu(CPUState *env)
>  return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
>  }
>  
> +#ifdef KVM_CAP_XSAVE
> +#define XSAVE_CWD_RIP 2
> +#define XSAVE_CWD_RDP 4
> +#define XSAVE_MXCSR   6
> +#define XSAVE_ST_SPACE8
> +#define XSAVE_XMM_SPACE   40
> +#define XSAVE_XSTATE_BV   128
> +#define XSAVE_YMMH_SPACE  144
> +#endif
> +
> +static int kvm_put_xsave(CPUState *env)
> +{
> +#ifdef KVM_CAP_XSAVE
> +int i;
> +struct kvm_xsave* xsave;
> +uint16_t cwd, swd, twd, fop;
> +
> +if (kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))

That's still one syscall too much for this path (which will be a
fast-path for Kemari). Get that value during arch_init.

> +return kvm_put_fpu(env);
> +
> +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
> +memset(xsave, 0, sizeof(struct kvm_xsave));
> +cwd = swd = twd = fop = 0;
> +swd = env->fpus & ~(7 << 11);
> +swd |= (env->fpstt & 7) << 11;
> +cwd = env->fpuc;
> +for (i = 0; i < 8; ++i)
> +twd |= (!env->fptags[i]) << i;
> +xsave->region[0] = (uint32_t)(swd << 16) + cwd;
> +xsave->region[1] = (uint32_t)(fop << 16) + twd;
> +memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
> +sizeof env->fpregs);
> +memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
> +sizeof env->xmm_regs);
> +xsave->region[XSAVE_MXCSR] = env->mxcsr;
> +*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
> +memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
> +sizeof env->ymmh_regs);
> +return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
> +#else
> +return kvm_put_fpu(env);
> +#endif
> +}
> +
> +static int kvm_put_xcrs(CPUState *env)
> +{
> +#ifdef KVM_CAP_XCRS
> +struct kvm_xcrs xcrs;
> +
> +if (!kvm_check_extension(env->kvm_state, KVM_CAP_XCRS))
> +return 0;
> +
> +xcrs.nr_xcrs = 1;
> +xcrs.flags = 0;
> +xcrs.xcrs[0].xcr = 0;
> +xcrs.xcrs[0].value = env->xcr0;
> +return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
> +#else
> +return 0;
> +#endif
> +}
> +
>  static int kvm_put_sregs(CPUState *env)
>  {
>  struct kvm_sregs sregs;
> @@ -621,6 +685,69 @@ static int kvm_get_fpu(CPUState *env)
>  return 0;
>  }
>  
> +static int kvm_get_xsave(CPUState *env)
> +{
> +#ifdef KVM_CAP_XSAVE
> +struct kvm_xsave* xsave;
> +int ret, i;
> +uint16_t cwd, swd, twd, fop;
> +
> +if (!kvm_check_extension(env->kvm_state, KVM_CAP_XSAVE))
> +return kvm_get_fpu(env);
> +
> +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
> +ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
> +if (ret < 0)
> +return ret;
> +
> +cwd = (uint16_t)xsave->region[0];
> +swd = (uint16_t)(xsave->region[0] >> 16);
> +twd = (uint16_t)xsave->region[1];
> +fop = (uint16_t)(xsave->region[1] >> 16);
> +env->fpstt = (swd >> 11) & 7;
> +env->fpus = swd;
> +env->fpuc = cwd;
> +for (i = 0; i < 8; ++i)
> +env->fptags[i] = !((twd >> i) & 1);
> +env->mxcsr = xsave->region[XSAVE_MXCSR];
> +memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
> +sizeof env->fpregs);
> +memcpy(env->xmm_regs, &xsave->region[XSAVE_XMM_SPACE],
> +sizeof env->xmm_regs);
> +env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
> +memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
> +sizeof env->ymmh_regs);
> +return 0;
> +#else
> +return kvm_get_fpu(env);
> +#endif
> +}
> +
> +static int kvm_get_xcrs(CPUStat

Re: [Qemu-devel] Re: [PATCH V4 2/3] qemu: Generic task offloading framework: threadlets

2010-06-17 Thread Gautham R Shenoy
On Wed, Jun 16, 2010 at 10:20:36AM -0500, Anthony Liguori wrote:
> On 06/16/2010 09:52 AM, Paolo Bonzini wrote:
>> BTW it's obviously okay with signaling the condition when a threadlet is 
>> submitted.  But when something affects all queue's workers 
>> (flush_threadlet_queue) you want a broadcast and using expiration as a 
>> substitute is fishy.
>
> IMHO, there shouldn't be a need for flush_threadlet_queue.  It doesn't look 
> used in the aio conversion and if virtio-9p needs it, I suspect something 
> is wrong.

virtio-9p doesn't need it.

The API has been added for the vnc-server case, where a subsystem wants
to wait on the threads of its private queue to finish executing the
already queued tasks. It's the responsibility of the subsystem to make
sure that new tasks are not submitted during this interval.
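
For concreteness, the intended vnc-side pattern would be something like
this, using the private-queue API from the patch (sketch only, the vnc
function names are made up):

static ThreadletQueue vnc_encode_queue;

void vnc_threads_init(void)
{
    /* one dedicated worker is enough for the encode jobs */
    threadlet_queue_init(&vnc_encode_queue, 1, 1);
}

void vnc_wait_for_encoders(void)
{
    /* the caller guarantees no new jobs are submitted from here on */
    flush_threadlet_queue(&vnc_encode_queue);
}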

I sought clarification regarding this earlier,
http://lists.gnu.org/archive/html/qemu-devel/2010-06/msg01382.html

But now I am beginning to doubt I understood the use-case correctly.
>
> Regards,
>
> Anthony Liguori
-- 
Thanks and Regards
gautham



Re: [Qemu-devel] Re: [PATCH V4 2/3] qemu: Generic task offloading framework: threadlets

2010-06-17 Thread Gautham R Shenoy
On Wed, Jun 16, 2010 at 06:06:35PM +0200, Corentin Chary wrote:
> On Wed, Jun 16, 2010 at 5:52 PM, Anthony Liguori
>  wrote:
> > On 06/16/2010 10:47 AM, Corentin Chary wrote:
> >>
> >> I would need something like flush_threadlet_queue for the vnc server.
> >> I need it in
> >> vnc_disconnect(), vnc_dpy_resize() and vnc_dpy_cpy() so wait (and/or
> >> abort) current
> >> encoding jobs.
> >>
> >
> > I'm not sure threadlets are the right thing for the VNC server.  The VNC
> > server wants one dedicated thread.  Threadlets are a thread pool.  You could
> > potentially use one thread per client but I doubt it would be worth it.
> >
> > At any rate, flushing the full queue is overkill.  You want to wait for your
> > specific thread to terminate and you want to block execution until that
> > happens.  IOW, you want to join the thread.
> >
> 
> Oh right, I should have read the changelog more carefully, it's a
> global queue now ...

Well, the APIs that allow the subsystems to create their own private
queues are still retained. But having read what Anthony mentioned, I
doubt you would want to do that for a single helper thread :-)

> 
> Thanks,
> -- 
> Corentin Chary
> http://xf.iksaif.net

-- 
Thanks and Regards
gautham



Re: [Qemu-devel] Re: [PATCH] hpet: Clean up initial hpet counter

2010-06-17 Thread Gleb Natapov
On Thu, Jun 17, 2010 at 10:59:01AM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Thu, Jun 17, 2010 at 10:42:34AM +0200, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Thu, Jun 17, 2010 at 10:30:15AM +0200, Jan Kiszka wrote:
> > Sorry, I lost you here. What "works for IO-based fw-cfg, but not for
> > MMIO-based".
>  Undefined IO ports return -1, undefined (/wrt read access) MMIO 0. So
>  you need to select a key that is different from both.
> 
> >>> But can we rely on it? Is this defined somewhere or if it happens to be
> >>> the case in current qemu for x86 arch.
> >> For x86 with its port-based access, we are on the safe side as (pre-pnp)
> >> device probing used to work this way. Can't tell for the other archs
> >> that support fw-cfg.
> >>
> > Can you write pseudo logic of how you think it
> > all should work?
>  The firmware should do this:
> 
>  write(CTL_BASE, FW_CFG_ID);
>  if (read(CTL_BASE) != FW_CFG_ID)
>   deal_with_old_qemu();
>  else
>   check_for_supported_keys();
> 
> >>> Ah, I thought about read() returning 0/1, not key itself, so any key that
> >>> always existed would do.
> >> Yes, read-back would mean returning FWCfgState::cur_entry. And that will
> >> be -1 when selected an invalid one.
> >>
> > Heh, actually I have better idea. Why not advance FW_CFG_ID to version 2.
> 
> If that is supposed to be a version number - yeah, good idea.
> 
That was the idea behind it. I just forgot it exists.

--
Gleb.



[Qemu-devel] Re: [PATCH 09/10] pci: set PCI multi-function bit appropriately.

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 03:15:51PM +0900, Isaku Yamahata wrote:
> set PCI multi-function bit appropriately.
> 
> Signed-off-by: Isaku Yamahata 
> 
> ---
> changes v1 -> v2:
> don't set header type register in configuration space.
> ---
>  hw/pci.c |   25 +
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 5316aa5..ee391dc 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -607,6 +607,30 @@ static void pci_init_wmask_bridge(PCIDevice *d)
>  pci_set_word(d->wmask + PCI_BRIDGE_CONTROL, 0x);
>  }
>  
> +static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev)
> +{
> +uint8_t slot = PCI_SLOT(dev->devfn);
> +uint8_t func_max = 8;

enum or define would be better

> +uint8_t func;

If I understand correctly what this does, it goes over
other functions of the same device, and sets the MULTI_FUNCTION bit
for them if there's more than one function.
Instead, why don't we just set PCI_HEADER_TYPE_MULTI_FUNCTION
in relevant devices?
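
I.e. something like this in each such device's initfn, instead of
patching the neighbours from generic code (hypothetical device, just to
illustrate):

static int foo_initfn(PCIDevice *dev)
{
    /* a device model that provides several functions marks itself */
    dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
    return 0;
}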

> +
> +for (func = 0; func < func_max; ++func) {
> +if (bus->devices[PCI_DEVFN(slot, func)]) {
> +break;
> +}
> +}
> +if (func == func_max) {
> +return;
> +}
> +

The above only works if the function is called before the
device is added to the bus.

> +for (func = 0; func < func_max; ++func) {
> +if (bus->devices[PCI_DEVFN(slot, func)]) {
> +bus->devices[PCI_DEVFN(slot, func)]->config[PCI_HEADER_TYPE] |=
> +PCI_HEADER_TYPE_MULTI_FUNCTION;
> +}
> +}
> +dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;

Isn't the bit set above already?

> +}
> +
>  static void pci_config_alloc(PCIDevice *pci_dev)
>  {
>  int config_size = pci_config_size(pci_dev);
> @@ -660,6 +684,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> *pci_dev, PCIBus *bus,
>  if (is_bridge) {
>  pci_init_wmask_bridge(pci_dev);
>  }
> +pci_init_multifunction(bus, pci_dev);
>  
>  if (!config_read)
>  config_read = pci_default_read_config;
> -- 
> 1.6.6.1



[Qemu-devel] Re: [PATCH 10/10] pci: don't overwrite multi function bit in pci header type.

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 03:15:52PM +0900, Isaku Yamahata wrote:
> Don't overwrite pci header type.
> Otherwise, multi function bit which pci_init_header_type() sets
> appropriately is lost.
> Anyway PCI_HEADER_TYPE_NORMAL is zero, so it is unnecessary to zero
> which is already zero cleared.
> 
> Signed-off-by: Isaku Yamahata 

All this churn will need quite a bit of testing.
Please record what was tested in the commit message.
If we are doing it, let's also clean up other code that sets
registers to their default values?

> ---
> changes v1 -> v2:
> - set header type of bridge type in pci_bridge_initfn().
> - dropped ugly hunk in apb_pci.c.
> ---
>  hw/ac97.c |1 -
>  hw/acpi_piix4.c   |1 -
>  hw/apb_pci.c  |2 --
>  hw/grackle_pci.c  |1 -
>  hw/ide/cmd646.c   |1 -
>  hw/ide/piix.c |1 -
>  hw/macio.c|1 -
>  hw/ne2000.c   |1 -
>  hw/openpic.c  |1 -
>  hw/pcnet.c|1 -
>  hw/piix4.c|3 +--
>  hw/piix_pci.c |4 +---
>  hw/prep_pci.c |1 -
>  hw/rtl8139.c  |1 -
>  hw/sun4u.c|1 -
>  hw/unin_pci.c |4 
>  hw/usb-uhci.c |1 -
>  hw/vga-pci.c  |1 -
>  hw/virtio-pci.c   |1 -
>  hw/vmware_vga.c   |1 -
>  hw/wdt_i6300esb.c |1 -
>  21 files changed, 2 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/ac97.c b/hw/ac97.c
> index 4319bc8..d71072d 100644
> --- a/hw/ac97.c
> +++ b/hw/ac97.c
> @@ -1295,7 +1295,6 @@ static int ac97_initfn (PCIDevice *dev)
>  c[PCI_REVISION_ID] = 0x01;  /* rid revision ro */
>  c[PCI_CLASS_PROG] = 0x00;  /* pi programming interface ro */
>  pci_config_set_class (c, PCI_CLASS_MULTIMEDIA_AUDIO); /* ro */
> -c[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; /* headtyp header type ro */
>  
>  /* TODO set when bar is registered. no need to override. */
>  /* nabmar native audio mixer base address rw */
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index 8d1a628..bfa1d9a 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -369,7 +369,6 @@ static int piix4_pm_initfn(PCIDevice *dev)
>  pci_conf[0x08] = 0x03; // revision number
>  pci_conf[0x09] = 0x00;
>  pci_config_set_class(pci_conf, PCI_CLASS_BRIDGE_OTHER);
> -pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
>  pci_conf[0x3d] = 0x01; // interrupt pin 1
>  
>  pci_conf[0x40] = 0x01; /* PM io base read only bit */
> diff --git a/hw/apb_pci.c b/hw/apb_pci.c
> index a1c17b9..3b8eda3 100644
> --- a/hw/apb_pci.c
> +++ b/hw/apb_pci.c
> @@ -431,8 +431,6 @@ static int pbm_pci_host_init(PCIDevice *d)
>   PCI_STATUS_FAST_BACK | PCI_STATUS_66MHZ |
>   PCI_STATUS_DEVSEL_MEDIUM);
>  pci_config_set_class(d->config, PCI_CLASS_BRIDGE_HOST);
> -pci_set_byte(d->config + PCI_HEADER_TYPE,
> - PCI_HEADER_TYPE_NORMAL);
>  return 0;
>  }
>  
> diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c
> index aa0c51b..b3a5f54 100644
> --- a/hw/grackle_pci.c
> +++ b/hw/grackle_pci.c
> @@ -126,7 +126,6 @@ static int grackle_pci_host_init(PCIDevice *d)
>  d->config[0x08] = 0x00; // revision
>  d->config[0x09] = 0x01;
>  pci_config_set_class(d->config, PCI_CLASS_BRIDGE_HOST);
> -d->config[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
>  return 0;
>  }
>  
> diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
> index 559147f..756ee81 100644
> --- a/hw/ide/cmd646.c
> +++ b/hw/ide/cmd646.c
> @@ -240,7 +240,6 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
>  pci_conf[PCI_CLASS_PROG] = 0x8f;
>  
>  pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_IDE);
> -pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
>  
>  pci_conf[0x51] = 0x04; // enable IDE0
>  if (d->secondary) {
> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index dad6e86..8817915 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -122,7 +122,6 @@ static int pci_piix_ide_initfn(PCIIDEState *d)
>  
>  pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
>  pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_IDE);
> -pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
>  
>  qemu_register_reset(piix3_reset, d);
>  
> diff --git a/hw/macio.c b/hw/macio.c
> index e92e82a..789ca55 100644
> --- a/hw/macio.c
> +++ b/hw/macio.c
> @@ -110,7 +110,6 @@ void macio_init (PCIBus *bus, int device_id, int 
> is_oldworld, int pic_mem_index,
>  pci_config_set_vendor_id(d->config, PCI_VENDOR_ID_APPLE);
>  pci_config_set_device_id(d->config, device_id);
>  pci_config_set_class(d->config, PCI_CLASS_OTHERS << 8);
> -d->config[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
>  
>  d->config[0x3d] = 0x01; // interrupt on pin 1
>  
> diff --git a/hw/ne2000.c b/hw/ne2000.c
> index 78fe14f..126e7cf 100644
> --- a/hw/ne2000.c
> +++ b/hw/ne2000.c
> @@ -723,7 +723,6 @@ static int pci_ne2000_init(PCIDevice *pci_dev)
>  pci_config_set_ven

[Qemu-devel] [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Sheng Yang

Signed-off-by: Sheng Yang 
---
 kvm-all.c |   21 +++
 kvm.h |2 +
 target-i386/cpu.h |7 ++-
 target-i386/kvm.c |  139 -
 target-i386/machine.c |   20 +++
 5 files changed, 186 insertions(+), 3 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 43704b8..343c06e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -71,6 +71,7 @@ struct KVMState
 #endif
 int irqchip_in_kernel;
 int pit_in_kernel;
+int xsave, xcrs;
 };
 
 static KVMState *kvm_state;
@@ -685,6 +686,16 @@ int kvm_init(int smp_cpus)
 s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
 #endif
 
+s->xsave = 0;
+#ifdef KVM_CAP_XSAVE
+s->xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
+#endif
+
+s->xcrs = 0;
+#ifdef KVM_CAP_XCRS
+s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
+#endif
+
 ret = kvm_arch_init(s, smp_cpus);
 if (ret < 0)
 goto err;
@@ -1013,6 +1024,16 @@ int kvm_has_debugregs(void)
 return kvm_state->debugregs;
 }
 
+int kvm_has_xsave(void)
+{
+return kvm_state->xsave;
+}
+
+int kvm_has_xcrs(void)
+{
+return kvm_state->xcrs;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 if (!kvm_has_sync_mmu()) {
diff --git a/kvm.h b/kvm.h
index 7975e87..50c4192 100644
--- a/kvm.h
+++ b/kvm.h
@@ -41,6 +41,8 @@ int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
 int kvm_has_debugregs(void);
+int kvm_has_xsave(void);
+int kvm_has_xcrs(void);
 
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 548ab80..680eed1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -718,6 +718,11 @@ typedef struct CPUX86State {
 uint16_t fpus_vmstate;
 uint16_t fptag_vmstate;
 uint16_t fpregs_format_vmstate;
+
+uint64_t xstate_bv;
+XMMReg ymmh_regs[CPU_NB_REGS];
+
+uint64_t xcr0;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
@@ -895,7 +900,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define cpu_list_id x86_cpu_list
 #define cpudef_setup   x86_cpudef_setup
 
-#define CPU_SAVE_VERSION 11
+#define CPU_SAVE_VERSION 12
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index bb6a12f..db1f21d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
 } else {
 env->mp_state = KVM_MP_STATE_RUNNABLE;
 }
+/* Legal xcr0 for loading */
+env->xcr0 = 1;
 }
 
 static int kvm_has_msr_star(CPUState *env)
@@ -504,6 +506,68 @@ static int kvm_put_fpu(CPUState *env)
 return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
+#ifdef KVM_CAP_XSAVE
+#define XSAVE_CWD_RIP 2
+#define XSAVE_CWD_RDP 4
+#define XSAVE_MXCSR   6
+#define XSAVE_ST_SPACE8
+#define XSAVE_XMM_SPACE   40
+#define XSAVE_XSTATE_BV   128
+#define XSAVE_YMMH_SPACE  144
+#endif
+
+static int kvm_put_xsave(CPUState *env)
+{
+#ifdef KVM_CAP_XSAVE
+int i;
+struct kvm_xsave* xsave;
+uint16_t cwd, swd, twd, fop;
+
+if (!kvm_has_xsave())
+return kvm_put_fpu(env);
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+memset(xsave, 0, sizeof(struct kvm_xsave));
+cwd = swd = twd = fop = 0;
+swd = env->fpus & ~(7 << 11);
+swd |= (env->fpstt & 7) << 11;
+cwd = env->fpuc;
+for (i = 0; i < 8; ++i)
+twd |= (!env->fptags[i]) << i;
+xsave->region[0] = (uint32_t)(swd << 16) + cwd;
+xsave->region[1] = (uint32_t)(fop << 16) + twd;
+memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
+sizeof env->fpregs);
+memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
+sizeof env->xmm_regs);
+xsave->region[XSAVE_MXCSR] = env->mxcsr;
+*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
+memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
+sizeof env->ymmh_regs);
+return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
+#else
+return kvm_put_fpu(env);
+#endif
+}
+
+static int kvm_put_xcrs(CPUState *env)
+{
+#ifdef KVM_CAP_XCRS
+struct kvm_xcrs xcrs;
+
+if (!kvm_has_xcrs())
+return 0;
+
+xcrs.nr_xcrs = 1;
+xcrs.flags = 0;
+xcrs.xcrs[0].xcr = 0;
+xcrs.xcrs[0].value = env->xcr0;
+return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
+#else
+return 0;
+#endif
+}
+
 static int kvm_put_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -621,6 +685,69 @@ static int kvm_get_fpu(CPUState *env)
 return 0;
 }
 
+static int kvm_get_xsave(CPUState *env)
+{
+#ifdef KVM_CAP_XSAVE
+struct kvm_xsave* xsave;
+int ret, i;
+uint16_t cwd, swd, twd, fop;
+
+if (!kvm_has_xsave())
+return kvm_get_fpu(env);
+
+xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
+ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
+if (ret < 0)
+return ret;
+
+cwd = (uint16_t

[Qemu-devel] Re: [PATCH 04/10] pci_bridge: introduce pci bridge layer.

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 03:15:46PM +0900, Isaku Yamahata wrote:
> introduce pci bridge layer.
> export pci_bridge_write_config() for generic use.
> support device reset and bus reset of bridge control.
> convert apb bridge and dec p2p bridge to use new pci bridge layer.
> save/restore is supported as a side effect.
> 
> This might be a bit over engineering, but this is also preparation
> for pci express root/upstream/downstream port.
> 
> Signed-off-by: Isaku Yamahata 

Well, preparations are easier to judge with patches that use them.

> ---
>  hw/apb_pci.c|   38 +-
>  hw/dec_pci.c|   28 +++---
>  hw/pci_bridge.c |  146 
> +--
>  hw/pci_bridge.h |   35 -
>  qemu-common.h   |1 +
>  5 files changed, 177 insertions(+), 71 deletions(-)
> 
> diff --git a/hw/apb_pci.c b/hw/apb_pci.c
> index c11d9b5..cb9051b 100644
> --- a/hw/apb_pci.c
> +++ b/hw/apb_pci.c
> @@ -31,6 +31,7 @@
>  #include "pci_host.h"
>  #include "pci_bridge.h"
>  #include "rwhandler.h"
> +#include "pci_bridge.h"
>  #include "apb_pci.h"
>  #include "sysemu.h"
>  
> @@ -294,9 +295,12 @@ static void pci_apb_set_irq(void *opaque, int irq_num, 
> int level)
>  }
>  }
>  
> -static void apb_pci_bridge_init(PCIBus *b)
> +static int apb_pci_bridge_init(PCIBridge *br)
>  {
> -PCIDevice *dev = pci_bridge_get_device(b);
> +PCIDevice *dev = &br->dev;
> +
> +pci_config_set_vendor_id(dev->config, PCI_VENDOR_ID_SUN);
> +pci_config_set_device_id(dev->config, PCI_DEVICE_ID_SUN_SIMBA);
>  
>  /*
>   * command register:
> @@ -316,6 +320,8 @@ static void apb_pci_bridge_init(PCIBus *b)
>  pci_set_byte(dev->config + PCI_HEADER_TYPE,
>   pci_get_byte(dev->config + PCI_HEADER_TYPE) |
>   PCI_HEADER_TYPE_MULTI_FUNCTION);
> +
> +return 0;
>  }
>  
>  PCIBus *pci_apb_init(target_phys_addr_t special_base,
> @@ -326,6 +332,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
>  SysBusDevice *s;
>  APBState *d;
>  unsigned int i;
> +PCIBridge *br;
>  
>  /* Ultrasparc PBM main bus */
>  dev = qdev_create(NULL, "pbm");
> @@ -351,17 +358,13 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
>  pci_create_simple(d->bus, 0, "pbm");
>  
>  /* APB secondary busses */
> -*bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0),
> -PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
> -pci_apb_map_irq,
> -"Advanced PCI Bus secondary bridge 1");
> -apb_pci_bridge_init(*bus2);
> -
> -*bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1),
> -PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
> -pci_apb_map_irq,
> -"Advanced PCI Bus secondary bridge 2");
> -apb_pci_bridge_init(*bus3);
> +br = pci_bridge_create_simple(d->bus, PCI_DEVFN(1, 0), "pbm-bridge",
> +  "Advanced PCI Bus secondary bridge 1");
> +*bus2 = pci_bridge_get_sec_bus(br);
> +
> +br = pci_bridge_create_simple(d->bus, PCI_DEVFN(1, 1), "pbm-bridge",
> +  "Advanced PCI Bus secondary bridge 2");
> +*bus3 = pci_bridge_get_sec_bus(br);
>  
>  return d->bus;
>  }
> @@ -446,10 +449,19 @@ static SysBusDeviceInfo pbm_host_info = {
>  .qdev.reset = pci_pbm_reset,
>  .init = pci_pbm_init_device,
>  };
> +
> +static PCIBridgeInfo pbm_pci_bridge_info = {
> +.pci.qdev.name = "pbm-bridge",
> +.pci.qdev.vmsd = &vmstate_pci_device,
> +.init = apb_pci_bridge_init,
> +.map_irq = pci_apb_map_irq,
> +};
> +
>  static void pbm_register_devices(void)
>  {
>  sysbus_register_withprop(&pbm_host_info);
>  pci_qdev_register(&pbm_pci_host_info);
> +pci_bridge_qdev_register(&pbm_pci_bridge_info);
>  }
>  
>  device_init(pbm_register_devices)
> diff --git a/hw/dec_pci.c b/hw/dec_pci.c
> index b2759dd..45b5c28 100644
> --- a/hw/dec_pci.c
> +++ b/hw/dec_pci.c
> @@ -49,18 +49,27 @@ static int dec_map_irq(PCIDevice *pci_dev, int irq_num)
>  return irq_num;
>  }
>  
> -PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn)
> +static int dec_21154_initfn(PCIBridge *br)
>  {
> -DeviceState *dev;
> -PCIBus *ret;
> +pci_config_set_vendor_id(br->dev.config, PCI_VENDOR_ID_DEC);
> +pci_config_set_device_id(br->dev.config, PCI_DEVICE_ID_DEC_21154);
> +return 0;
> +}
>  
> -dev = qdev_create(NULL, "dec-21154");
> -qdev_init_nofail(dev);
> -ret = pci_bridge_init(parent_bus, devfn,
> -  PCI_VENDOR_ID_DEC, PCI_DEVICE_ID_DEC_21154,
> -  dec_map_irq, "DEC 21154 PCI-PCI bridge");
> +static PCIBridgeInfo dec_21154_pci_bridge_info = {
> +.pci.qdev.name = "dec-21154-p2p-bridge",
> +.pci.qdev.desc = "DEC 21154 PCI-PCI bridge",
> +.pci.qdev.vmsd = &vmstate_pci_device,
> +.init = dec_21154_initfn,
> +.ma

[Qemu-devel] [PATCH] qemu-kvm: Replace kvm_set/get_fpu() with upstream version.

2010-06-17 Thread Sheng Yang

Signed-off-by: Sheng Yang 
---

Would send out XSAVE patch after the upstream ones have been merged, since the
patch would be affected by the merge.

 qemu-kvm-x86.c|   23 ++-
 qemu-kvm.c|   10 --
 qemu-kvm.h|   30 --
 target-i386/kvm.c |5 -
 4 files changed, 6 insertions(+), 62 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 3c33e64..49218ae 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -775,7 +775,6 @@ static void get_seg(SegmentCache *lhs, const struct 
kvm_segment *rhs)
 void kvm_arch_load_regs(CPUState *env, int level)
 {
 struct kvm_regs regs;
-struct kvm_fpu fpu;
 struct kvm_sregs sregs;
 struct kvm_msr_entry msrs[100];
 int rc, n, i;
@@ -806,16 +805,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
 
 kvm_set_regs(env, ®s);
 
-memset(&fpu, 0, sizeof fpu);
-fpu.fsw = env->fpus & ~(7 << 11);
-fpu.fsw |= (env->fpstt & 7) << 11;
-fpu.fcw = env->fpuc;
-for (i = 0; i < 8; ++i)
-   fpu.ftwx |= (!env->fptags[i]) << i;
-memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
-memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
-fpu.mxcsr = env->mxcsr;
-kvm_set_fpu(env, &fpu);
+kvm_put_fpu(env);
 
 memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
 if (env->interrupt_injected >= 0) {
@@ -933,7 +923,6 @@ void kvm_arch_load_regs(CPUState *env, int level)
 void kvm_arch_save_regs(CPUState *env)
 {
 struct kvm_regs regs;
-struct kvm_fpu fpu;
 struct kvm_sregs sregs;
 struct kvm_msr_entry msrs[100];
 uint32_t hflags;
@@ -965,15 +954,7 @@ void kvm_arch_save_regs(CPUState *env)
 env->eflags = regs.rflags;
 env->eip = regs.rip;
 
-kvm_get_fpu(env, &fpu);
-env->fpstt = (fpu.fsw >> 11) & 7;
-env->fpus = fpu.fsw;
-env->fpuc = fpu.fcw;
-for (i = 0; i < 8; ++i)
-   env->fptags[i] = !((fpu.ftwx >> i) & 1);
-memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
-memcpy(env->xmm_regs, fpu.xmm, sizeof env->xmm_regs);
-env->mxcsr = fpu.mxcsr;
+kvm_get_fpu(env);
 
 kvm_get_sregs(env, &sregs);
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 96d458c..114cb5e 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -461,16 +461,6 @@ int kvm_set_regs(CPUState *env, struct kvm_regs *regs)
 return kvm_vcpu_ioctl(env, KVM_SET_REGS, regs);
 }
 
-int kvm_get_fpu(CPUState *env, struct kvm_fpu *fpu)
-{
-return kvm_vcpu_ioctl(env, KVM_GET_FPU, fpu);
-}
-
-int kvm_set_fpu(CPUState *env, struct kvm_fpu *fpu)
-{
-return kvm_vcpu_ioctl(env, KVM_SET_FPU, fpu);
-}
-
 int kvm_get_sregs(CPUState *env, struct kvm_sregs *sregs)
 {
 return kvm_vcpu_ioctl(env, KVM_GET_SREGS, sregs);
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6f6c6d8..ebe7893 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -222,36 +222,6 @@ int kvm_get_regs(CPUState *env, struct kvm_regs *regs);
  * \return 0 on success
  */
 int kvm_set_regs(CPUState *env, struct kvm_regs *regs);
-/*!
- * \brief Read VCPU fpu registers
- *
- * This gets the FPU registers from the VCPU and outputs them
- * into a kvm_fpu structure
- *
- * \note This function returns a \b copy of the VCPUs registers.\n
- * If you wish to modify the VCPU FPU registers, you should call kvm_set_fpu()
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which virtual CPU should get dumped
- * \param fpu Pointer to a kvm_fpu which will be populated with the VCPUs
- * fpu registers values
- * \return 0 on success
- */
-int kvm_get_fpu(CPUState *env, struct kvm_fpu *fpu);
-
-/*!
- * \brief Write VCPU fpu registers
- *
- * This sets the FPU registers on the VCPU from a kvm_fpu structure
- *
- * \note When this function returns, the fpu pointer and the data it points to
- * can be discarded
- * \param kvm Pointer to the current kvm_context
- * \param vcpu Which virtual CPU should get dumped
- * \param fpu Pointer to a kvm_fpu which holds the new vcpu fpu state
- * \return 0 on success
- */
-int kvm_set_fpu(CPUState *env, struct kvm_fpu *fpu);
 
 /*!
  * \brief Read VCPU system registers
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9cb9cf4..9c13f62 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -488,6 +488,7 @@ static int kvm_getput_regs(CPUState *env, int set)
 
 return ret;
 }
+#endif /* KVM_UPSTREAM */
 
 static int kvm_put_fpu(CPUState *env)
 {
@@ -507,6 +508,7 @@ static int kvm_put_fpu(CPUState *env)
 return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
+#ifdef KVM_UPSTREAM
 static int kvm_put_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;
@@ -605,7 +607,7 @@ static int kvm_put_msrs(CPUState *env, int level)
 return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
 
 }
-
+#endif /* KVM_UPSTREAM */
 
 static int kvm_get_fpu(CPUState *env)
 {
@@ -628,6 +630,7 @@ static int kvm_get_fpu(CPUState *env)
 return 0;
 }
 
+#ifdef KVM_UPSTREAM
 static int kvm_get_sregs(CPUState *env)
 {
 struct kvm_sregs sregs;

[Qemu-devel] Re: [PATCH 00/10] pci: pci to pci bridge clean up and enhancement

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 03:15:42PM +0900, Isaku Yamahata wrote:
> This patch series cleans up pci to pci bridge layer by introducing
> pci bridge layer. and some bug fixes.
> Although pci bridge implementation would belong to pci.c,
> I split it out into pci_bridge.c because pci.c is already big enough.
> 
> This might seem over engineering, but it's also a preparation for
> pci express root/upstream/downstream port emulators.
> Those express ports are similar, but different from each other.
> So new pci bridge layer helps here.
> Once this patch series is merged, the express ports patch will follow.

A separate patchset with just bugfixes and cleanups would be easier to
merge.


For example, forcing all devices to call pci_reset_default
in their reset routines does not look like a good cleanup:
the less boilerplate, the better IMO. New APIs seem undocumented
and it is not obvious what they do. For example, what does
qdev_reset do?  Adding more callbacks
also does not make me very happy; they are hard to follow
and to debug. Maybe it would be better to look at the bridge layer when
we see how it helps pci express. But it would be even better IMO to avoid
layers, and just export some common functions that can be reused,
without forcing all devices to go through them.

> Isaku Yamahata (10):
>   pci_bridge: split out pci bridge code into pci_bridge.c from pci.c
>   qdev: export qdev_reset() for later use.
>   pci: fix pci_bus_reset() with 64bit BAR and several clean ups.
>   pci_bridge: introduce pci bridge layer.
>   pci bridge: add helper function for ssvid capability.
>   pci: eliminate work around in pci_device_reset().
>   pci: fix pci domain registering.
>   pci: remove PCIDeviceInfo::header_type
>   pci: set PCI multi-function bit appropriately.
>   pci: don't overwrite multi function bit in pci header type.
> 
>  Makefile.objs |2 +-
>  hw/ac97.c |1 -
>  hw/acpi_piix4.c   |1 -
>  hw/apb_pci.c  |   43 -
>  hw/dec_pci.c  |   31 ++---
>  hw/e1000.c|1 +
>  hw/grackle_pci.c  |1 -
>  hw/ide/cmd646.c   |1 -
>  hw/ide/piix.c |1 -
>  hw/lsi53c895a.c   |2 +
>  hw/macio.c|1 -
>  hw/ne2000.c   |1 -
>  hw/openpic.c  |1 -
>  hw/pci.c  |  194 
> +++--
>  hw/pci.h  |   22 +-
>  hw/pci_bridge.c   |  188 +++
>  hw/pci_bridge.h   |   71 +++
>  hw/pcnet.c|2 +-
>  hw/piix4.c|3 +-
>  hw/piix_pci.c |5 +-
>  hw/prep_pci.c |1 -
>  hw/qdev.c |   13 +++-
>  hw/qdev.h |1 +
>  hw/rtl8139.c  |3 +-
>  hw/sun4u.c|1 -
>  hw/unin_pci.c |4 -
>  hw/usb-uhci.c |1 -
>  hw/vga-pci.c  |1 -
>  hw/virtio-pci.c   |2 +-
>  hw/vmware_vga.c   |1 -
>  hw/wdt_i6300esb.c |1 -
>  qemu-common.h |1 +
>  32 files changed, 430 insertions(+), 172 deletions(-)
>  create mode 100644 hw/pci_bridge.c
>  create mode 100644 hw/pci_bridge.h



[Qemu-devel] Re: [PATCH V4 2/3] qemu: Generic task offloading framework: threadlets

2010-06-17 Thread Paolo Bonzini

+while (QTAILQ_EMPTY(&(queue->request_list))&&
+   (ret != ETIMEDOUT)) {
+ret = qemu_cond_timedwait(&(queue->cond),
+   &(queue->lock), 10*10);
+}


Using qemu_cond_timedwait is a hack for not properly broadcasting the
condvar in flush_threadlet_queue.


I think Anthony answered this one.


I think he said that the code has been changed so I am right? :)


+/**
+ * flush_threadlet_queue: Wait till completion of all the submitted tasks
+ * @queue: Queue containing the tasks we're waiting on.
+ */
+void flush_threadlet_queue(ThreadletQueue *queue)
+{
+qemu_mutex_lock(&queue->lock);
+queue->exit = 1;
+
+qemu_barrier_init(&queue->barr, queue->cur_threads + 1);
+qemu_mutex_unlock(&queue->lock);
+
+qemu_barrier_wait(&queue->barr);


Can be implemented just as well with queue->cond and a loop waiting for
queue->cur_threads == 0.  This would remove the need to implement barriers
in qemu-threads (especially for Win32).  Anyway whoever will contribute
Win32 qemu-threads can do it, since it's not hard.


That was the other option I had considered before going for barriers,
for no particular reason. Now, considering that barriers are not
welcome, I will implement this method.


I guess we decided flush isn't really useful at all.  Might as well 
leave it out of v5 and implement it later, so the barrier and 
complicated exit condition are now unnecessary.


Thanks,

Paolo



[Qemu-devel] Re: [PATCH 02/10] qdev: export qdev_reset() for later use.

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 03:15:44PM +0900, Isaku Yamahata wrote:
> export qdev_reset() for later use.
> 
> Signed-off-by: Isaku Yamahata 
> ---
>  hw/qdev.c |   13 +
>  hw/qdev.h |1 +
>  2 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/qdev.c b/hw/qdev.c
> index 61f999c..378f842 100644
> --- a/hw/qdev.c
> +++ b/hw/qdev.c
> @@ -256,13 +256,18 @@ DeviceState *qdev_device_add(QemuOpts *opts)
>  return qdev;
>  }
>  
> -static void qdev_reset(void *opaque)
> +void qdev_reset(DeviceState *dev)

What does this API do? Yes, I see that it invokes
the reset callback internally. But what does it do
that the caller wants? After all, the callback
gets invoked on reset directly.


>  {
> -DeviceState *dev = opaque;
>  if (dev->info->reset)
>  dev->info->reset(dev);
>  }
>  
> +static void qdev_reset_fn(void *opaque)
> +{
> +DeviceState *dev = opaque;
> +qdev_reset(dev);
> +}
> +
>  /* Initialize a device.  Device properties should be set before calling
> this function.  IRQs and MMIO regions should be connected/mapped after
> calling this function.
> @@ -278,7 +283,7 @@ int qdev_init(DeviceState *dev)
>  qdev_free(dev);
>  return rc;
>  }
> -qemu_register_reset(qdev_reset, dev);
> +qemu_register_reset(qdev_reset_fn, dev);
>  if (dev->info->vmsd) {
>  vmstate_register_with_alias_id(-1, dev->info->vmsd, dev,
> dev->instance_id_alias,
> @@ -348,7 +353,7 @@ void qdev_free(DeviceState *dev)
>  if (dev->opts)
>  qemu_opts_del(dev->opts);
>  }
> -qemu_unregister_reset(qdev_reset, dev);
> +qemu_unregister_reset(qdev_reset_fn, dev);
>  QLIST_REMOVE(dev, sibling);
>  for (prop = dev->info->props; prop && prop->name; prop++) {
>  if (prop->info->free) {
> diff --git a/hw/qdev.h b/hw/qdev.h
> index be5ad67..5fbdebf 100644
> --- a/hw/qdev.h
> +++ b/hw/qdev.h
> @@ -113,6 +113,7 @@ typedef struct GlobalProperty {
>  DeviceState *qdev_create(BusState *bus, const char *name);
>  int qdev_device_help(QemuOpts *opts);
>  DeviceState *qdev_device_add(QemuOpts *opts);
> +void qdev_reset(DeviceState *dev);
>  int qdev_init(DeviceState *dev) QEMU_WARN_UNUSED_RESULT;
>  void qdev_init_nofail(DeviceState *dev);
>  void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
> -- 
> 1.6.6.1



[Qemu-devel] Re: [PATCH 05/10] pci bridge: add helper function for ssvid capability.

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 03:15:47PM +0900, Isaku Yamahata wrote:
> helper function to add ssvid capability.
> 
> Signed-off-by: Isaku Yamahata 

But .. this is unused?

> ---
>  hw/pci_bridge.c |   20 
>  hw/pci_bridge.h |3 +++
>  2 files changed, 23 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
> index 43c21d4..1397a11 100644
> --- a/hw/pci_bridge.c
> +++ b/hw/pci_bridge.c
> @@ -29,6 +29,26 @@
>  
>  #include "pci_bridge.h"
>  
> +/* PCI bridge subsystem vendor ID helper functions */
> +#define PCI_SSVID_SIZEOF8
> +#define PCI_SSVID_SVID  4
> +#define PCI_SSVID_SSID  6
> +
> +int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
> +  uint16_t svid, uint16_t ssid)
> +{
> +int pos;
> +pos = pci_add_capability_at_offset(dev, PCI_CAP_ID_SSVID,
> +   offset, PCI_SSVID_SIZEOF);
> +if (pos < 0) {
> +return pos;
> +}
> +
> +pci_set_word(dev->config + pos + PCI_SSVID_SVID, svid);
> +pci_set_word(dev->config + pos + PCI_SSVID_SSID, ssid);
> +return pos;
> +}
> +
>  void pci_bridge_write_config(PCIDevice *d,
>   uint32_t address, uint32_t val, int len)
>  {
> diff --git a/hw/pci_bridge.h b/hw/pci_bridge.h
> index 2747e7f..a1f160b 100644
> --- a/hw/pci_bridge.h
> +++ b/hw/pci_bridge.h
> @@ -23,6 +23,9 @@
>  
>  #include "pci.h"
>  
> +int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
> +  uint16_t svid, uint16_t ssid);
> +
>  struct PCIBridge {
>  PCIDevice dev;
>  
> -- 
> 1.6.6.1



Re: [Qemu-devel] RFC v3: blockdev_add & friends, brief rationale, QMP docs

2010-06-17 Thread Stefan Hajnoczi
On Wed, Jun 16, 2010 at 6:27 PM, Markus Armbruster  wrote:
> blockdev_add
> 
>
> Add host block device.
>
> Arguments:
>
> - "id": the host block device's ID, must be unique (json-string)
> - "format": image format (json-string, optional)
>    - Possible values: "raw", "qcow2", ...

What is the default when unset?  (I expect we'll auto-detect the
format but this should be documented.)

> - "protocol": image access protocol (json-string, optional)
>    - Possible values: "auto", "file", "nbd", ...

The semantics of "auto" are not documented here.

> Notes:
>
> (1) If argument "protocol" is missing, all other optional arguments must
>    be missing as well.  This defines a block device with no media
>    inserted.

Perhaps this is what "auto" means?

> (2) It's possible to list supported disk formats and protocols by
>    running QEMU with arguments "-blockdev_add \?".

Is there a query-block-driver command or something in QMP to
enumerate supported formats and protocols?  Not sure how useful this
would be to the management stack - blockdev_add will probably return
an error if an attempt is made to open an unsupported file.

> blockdev_del
> 
>
> Remove a host block device.
>
> Arguments:
>
> - "id": the host block device's ID (json-string)
>
> Example:
>
> -> { "execute": "blockdev_del", "arguments": { "id": "blk1" } }
> <- { "return": {} }

What about an attached guest device?  Will this fail if the virtio-blk
PCI device is still present?  For SCSI I imagine we can usually just
remove the host block device.  For IDE there isn't hotplug support
AFAIK, so what happens?

Stefan



[Qemu-devel] Re: [PATCH 5/5] linux fbdev display driver.

2010-06-17 Thread Gerd Hoffmann

  Hi,


+static void fbdev_free_displaysurface(DisplaySurface *surface)
+{
+if (surface == NULL)
+return;
+
+if (surface->flags&  QEMU_ALLOCATED_FLAG) {
+qemu_free(surface->data);
+}
+
+surface->data = NULL;


This is pretty pointless ...


+qemu_free(surface);


... as you free surface anyway ;)


@@ -910,7 +959,17 @@ void fbdev_display_init(DisplayState *ds, const char 
*device)
  dcl->dpy_update  = fbdev_update;
  dcl->dpy_resize  = fbdev_resize;
  dcl->dpy_refresh = fbdev_refresh;
+dcl->dpy_setdata = fbdev_setdata;
  register_displaychangelistener(ds, dcl);
+
+da = qemu_mallocz(sizeof (DisplayAllocator));
+da->create_displaysurface = fbdev_create_displaysurface;
+da->resize_displaysurface = fbdev_resize_displaysurface;
+da->free_displaysurface = fbdev_free_displaysurface;
+
+if (register_displayallocator(ds, da) == da) {
+dpy_resize(ds);
+}


You register the display allocator, but don't unregister it in
fbdev_display_uninit().


You are just lucky that fbdev_cleanup() forgets to unmap the framebuffer.

Apply the attached fix, start qemu with vnc, then do "change fbdev on" 
and "change fbdev off" in the monitor and watch qemu segfault.


Also after "change fbdev on" the guest screen isn't rendered correctly.

cheers,
  Gerd

From 685849ae48eaef7927b90e012fb6afb4494052d0 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann 
Date: Thu, 17 Jun 2010 12:32:53 +0200
Subject: [PATCH] fbdev: unmap framebuffer on cleanup

---
 fbdev.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fbdev.c b/fbdev.c
index 6623a4f..1a95ede 100644
--- a/fbdev.c
+++ b/fbdev.c
@@ -518,6 +518,10 @@ static void fbdev_cleanup(void)
 fprintf(stderr, "%s\n", __FUNCTION__);
 
 /* restore console */
+if (fb_mem != NULL) {
+munmap(fb_mem, fb_fix.smem_len+fb_mem_offset);
+fb_mem = NULL;
+}
 if (fb != -1) {
 if (ioctl(fb,FBIOPUT_VSCREENINFO, &fb_ovar) < 0)
 perror("ioctl FBIOPUT_VSCREENINFO");
-- 
1.6.5.2



[Qemu-devel] [Bug 595438] Re: KVM segmentation fault, using SCSI+writeback and linux 2.4 guest

2010-06-17 Thread Коренберг Марк
** Summary changed:

- segmentation  scsi writeback
+ KVM segmentation fault, using SCSI+writeback and linux 2.4 guest

-- 
KVM segmentation fault, using SCSI+writeback and linux 2.4 guest
https://bugs.launchpad.net/bugs/595438
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New






[Qemu-devel] [Bug 595438] Re: segmentation scsi writeback

2010-06-17 Thread Коренберг Марк
The bug is 100% reproducible (on this and on another machine with a
different processor).

core dump (bzip2) attached


** Attachment added: "core dump"
   http://launchpadlibrarian.net/50482028/core.bz2

-- 
segmentation  scsi writeback
https://bugs.launchpad.net/bugs/595438
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New






[Qemu-devel] [Bug 595438] [NEW] segmentation scsi writeback

2010-06-17 Thread Коренберг Марк
Public bug reported:

I use Ubuntu 32-bit 10.04 with the standard kvm.
I have an E7600 @ 3.06GHz processor with VMX.

On this system I run:
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -M pc-0.12 -enable-kvm -m 256 -smp 1 -name 
spamsender -uuid b9cacd5e-08f7-41fd-78c8-89cec59af881 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/spamsender.monitor,server,nowait 
-monitor chardev:monitor -boot d -drive 
file=/mnt/megadiff/cdiso_400_130.iso,if=ide,media=cdrom,index=2 -drive 
file=/home/mmarkk/spamsender2.img,if=scsi,index=0,format=qcow2,cache=writeback 
-net nic,macaddr=00:00:00:00:00:00,vlan=0,name=nic.0 -net tap,vlan=0,name=tap.0 
-chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -vnc 
127.0.0.1:0 -vga cirrus

The .iso image contains a custom distro of a 2.4-Linux-kernel-based system.
During the install process (when a .tar.gz is being actively unpacked), kvm
dies with a segmentation fault.

And ONLY with an scsi virtual disk and writeback together; writeback+ide
and writethrough+scsi work OK.

I use qcow2. It seems that qcow does not have this problem.

The virtual machine goes down at a random place during the file copy,
seemingly when the qcow2 file size needs to be expanded.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
segmentation  scsi writeback
https://bugs.launchpad.net/bugs/595438
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New






[Qemu-devel] [Bug 595438] Re: segmentation scsi writeback

2010-06-17 Thread Коренберг Марк
Please don't read anything into the 'spamsender' machine name. I never send
spam; it's just our mail server :)

-- 
segmentation  scsi writeback
https://bugs.launchpad.net/bugs/595438
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New






[Qemu-devel] [PATCH 0/8] seabios: pci: multi pci bus support

2010-06-17 Thread Isaku Yamahata
This patch set allows seabios to initialize multiple pci buses and 64bit BARs.

Currently seabios is able to initialize only the pci root bus.
However, multi pci bus support is wanted because
  - more pci buses are wanted in qemu for more slots
  - pci express support is coming in qemu, which requires multiple pci buses.
The corresponding patches on the qemu side are still under way, though.

Isaku Yamahata (8):
  seabios: pci: introduce foreachpci_in_bus() helper macro.
  seabios: pciinit: factor out pci bar region allocation logic.
  seabios: pciinit: make pci memory space assignment 64bit aware.
  seabios: pciinit: make pci bar assigner prefetchable memory aware.
  seabios: pciinit: factor out bar offset calculation.
  seabios: pciinit: make bar offset calculation pci bridge aware.
  seabios: pciinit: pci bridge bus initialization.
  seabios: pciinit: initialize pci bridge filtering registers.

 src/pci.c |   30 ++
 src/pci.h |   11 ++
 src/pciinit.c |  310 
 3 files changed, 306 insertions(+), 45 deletions(-)




[Qemu-devel] [PATCH 7/8] seabios: pciinit: pci bridge bus initialization.

2010-06-17 Thread Isaku Yamahata
pci bridge bus initialization.

Signed-off-by: Isaku Yamahata 
---
 src/pciinit.c |   70 +
 1 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 1c2c8c6..fe6848a 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -220,6 +220,74 @@ static void pci_bios_init_device(u16 bdf)
 }
 }
 
+static void
+pci_bios_init_bus_rec(int bus, u8 *pci_bus)
+{
+int devfn, bdf;
+u16 class;
+
+dprintf(1, "PCI: %s bus = 0x%x\n", __func__, bus);
+
+/* prevent accidental access to unintended devices */
+foreachpci_in_bus(bus, devfn, bdf) {
+class = pci_config_readw(bdf, PCI_CLASS_DEVICE);
+if (class == PCI_CLASS_BRIDGE_PCI) {
+pci_config_writeb(bdf, PCI_SECONDARY_BUS, 255);
+pci_config_writeb(bdf, PCI_SUBORDINATE_BUS, 0);
+}
+}
+
+foreachpci_in_bus(bus, devfn, bdf) {
+class = pci_config_readw(bdf, PCI_CLASS_DEVICE);
+if (class != PCI_CLASS_BRIDGE_PCI) {
+continue;
+}
+dprintf(1, "PCI: %s bdf = 0x%x\n", __func__, bdf);
+
+u8 pribus = pci_config_readb(bdf, PCI_PRIMARY_BUS);
+if (pribus != bus) {
+dprintf(1, "PCI: primary bus = 0x%x -> 0x%x\n", pribus, bus);
+pci_config_writeb(bdf, PCI_PRIMARY_BUS, bus);
+} else {
+dprintf(1, "PCI: primary bus = 0x%x\n", pribus);
+}
+
+u8 secbus = pci_config_readb(bdf, PCI_SECONDARY_BUS);
+(*pci_bus)++;
+if (*pci_bus != secbus) {
+dprintf(1, "PCI: secondary bus = 0x%x -> 0x%x\n",
+secbus, *pci_bus);
+secbus = *pci_bus;
+pci_config_writeb(bdf, PCI_SECONDARY_BUS, secbus);
+} else {
+dprintf(1, "PCI: secondary bus = 0x%x\n", secbus);
+}
+
+/* set to max for access to all subordinate buses.
+   later set it to accurate value */
+u8 subbus = pci_config_readb(bdf, PCI_SUBORDINATE_BUS);
+pci_config_writeb(bdf, PCI_SUBORDINATE_BUS, 255);
+
+pci_bios_init_bus_rec(secbus, pci_bus);
+
+if (subbus != *pci_bus) {
+dprintf(1, "PCI: subordinate bus = 0x%x -> 0x%x\n",
+subbus, *pci_bus);
+subbus = *pci_bus;
+} else {
+dprintf(1, "PCI: subordinate bus = 0x%x\n", subbus);
+}
+pci_config_writeb(bdf, PCI_SUBORDINATE_BUS, subbus);
+}
+}
+
+static void
+pci_bios_init_bus(void)
+{
+u8 pci_bus = 0;
+pci_bios_init_bus_rec(0 /* host bus */, &pci_bus);
+}
+
 void
 pci_setup(void)
 {
@@ -235,6 +303,8 @@ pci_setup(void)
 /* pci_bios_mem_addr +  */
 pci_bios_prefmem_addr = pci_bios_mem_addr + 0x0800;
 
+pci_bios_init_bus();
+
 int bdf, max;
 foreachpci(bdf, max) {
 pci_bios_init_bridges(bdf);
-- 
1.6.6.1




[Qemu-devel] [PATCH 2/8] seabios: pciinit: factor out pci bar region allocation logic.

2010-06-17 Thread Isaku Yamahata
factor out pci bar region allocation logic.

Signed-off-by: Isaku Yamahata 
---
 src/pciinit.c |   84 -
 1 files changed, 47 insertions(+), 37 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 0556ee2..488c77b 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -37,6 +37,50 @@ static void pci_set_io_region_addr(u16 bdf, int region_num, 
u32 addr)
 dprintf(1, "region %d: 0x%08x\n", region_num, addr);
 }
 
+static void pci_bios_allocate_region(u16 bdf, int region_num)
+{
+u32 *paddr;
+int ofs;
+if (region_num == PCI_ROM_SLOT)
+ofs = PCI_ROM_ADDRESS;
+else
+ofs = PCI_BASE_ADDRESS_0 + region_num * 4;
+
+u32 old = pci_config_readl(bdf, ofs);
+u32 mask;
+if (region_num == PCI_ROM_SLOT) {
+mask = PCI_ROM_ADDRESS_MASK;
+pci_config_writel(bdf, ofs, mask);
+} else {
+if (old & PCI_BASE_ADDRESS_SPACE_IO)
+mask = PCI_BASE_ADDRESS_IO_MASK;
+else
+mask = PCI_BASE_ADDRESS_MEM_MASK;
+pci_config_writel(bdf, ofs, ~0);
+}
+u32 val = pci_config_readl(bdf, ofs);
+pci_config_writel(bdf, ofs, old);
+
+if (val != 0) {
+u32 size = (~(val & mask)) + 1;
+if (val & PCI_BASE_ADDRESS_SPACE_IO)
+paddr = &pci_bios_io_addr;
+else
+paddr = &pci_bios_mem_addr;
+*paddr = ALIGN(*paddr, size);
+pci_set_io_region_addr(bdf, region_num, *paddr);
+*paddr += size;
+}
+}
+
+static void pci_bios_allocate_regions(u16 bdf)
+{
+int i;
+for (i = 0; i < PCI_NUM_REGIONS; i++) {
+pci_bios_allocate_region(bdf, i);
+}
+}
+
 /* return the global irq number corresponding to a given device irq
pin. We could also use the bus number to have a more precise
mapping. */
@@ -78,8 +122,7 @@ static void pci_bios_init_bridges(u16 bdf)
 static void pci_bios_init_device(u16 bdf)
 {
 int class;
-u32 *paddr;
-int i, pin, pic_irq, vendor_id, device_id;
+int pin, pic_irq, vendor_id, device_id;
 
 class = pci_config_readw(bdf, PCI_CLASS_DEVICE);
 vendor_id = pci_config_readw(bdf, PCI_VENDOR_ID);
@@ -94,7 +137,7 @@ static void pci_bios_init_device(u16 bdf)
 /* PIIX3/PIIX4 IDE */
 pci_config_writew(bdf, 0x40, 0x8000); // enable IDE0
 pci_config_writew(bdf, 0x42, 0x8000); // enable IDE1
-goto default_map;
+pci_bios_allocate_regions(bdf);
 } else {
 /* IDE: we map it as in ISA mode */
 pci_set_io_region_addr(bdf, 0, PORT_ATA1_CMD_BASE);
@@ -121,41 +164,8 @@ static void pci_bios_init_device(u16 bdf)
 }
 break;
 default:
-default_map:
 /* default memory mappings */
-for (i = 0; i < PCI_NUM_REGIONS; i++) {
-int ofs;
-if (i == PCI_ROM_SLOT)
-ofs = PCI_ROM_ADDRESS;
-else
-ofs = PCI_BASE_ADDRESS_0 + i * 4;
-
-u32 old = pci_config_readl(bdf, ofs);
-u32 mask;
-if (i == PCI_ROM_SLOT) {
-mask = PCI_ROM_ADDRESS_MASK;
-pci_config_writel(bdf, ofs, mask);
-} else {
-if (old & PCI_BASE_ADDRESS_SPACE_IO)
-mask = PCI_BASE_ADDRESS_IO_MASK;
-else
-mask = PCI_BASE_ADDRESS_MEM_MASK;
-pci_config_writel(bdf, ofs, ~0);
-}
-u32 val = pci_config_readl(bdf, ofs);
-pci_config_writel(bdf, ofs, old);
-
-if (val != 0) {
-u32 size = (~(val & mask)) + 1;
-if (val & PCI_BASE_ADDRESS_SPACE_IO)
-paddr = &pci_bios_io_addr;
-else
-paddr = &pci_bios_mem_addr;
-*paddr = ALIGN(*paddr, size);
-pci_set_io_region_addr(bdf, i, *paddr);
-*paddr += size;
-}
-}
+pci_bios_allocate_regions(bdf);
 break;
 }
 
-- 
1.6.6.1




[Qemu-devel] Re: [PATCH 03/10] pci: fix pci_bus_reset() with 64bit BAR and several clean ups.

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 03:15:45PM +0900, Isaku Yamahata wrote:
> fix pci_device_reset() with 64bit BAR.
> export pci_bus_reset(), pci_device_reset() and two helper functions
> for later use. And several clean ups.
> 
> Signed-off-by: Isaku Yamahata 
> ---
>  hw/pci.c |   44 
>  hw/pci.h |5 +
>  2 files changed, 41 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 9ba62eb..87f5e6c 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -144,28 +144,50 @@ static void pci_update_irq_status(PCIDevice *dev)
>  }
>  }
>  
> -static void pci_device_reset(PCIDevice *dev)
> +void pci_device_reset_default(PCIDevice *dev)
>  {
>  int r;
>  
>  dev->irq_state = 0;
>  pci_update_irq_status(dev);
> -dev->config[PCI_COMMAND] &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
> -  PCI_COMMAND_MASTER);
> +pci_set_word(dev->config + PCI_COMMAND,
> + pci_get_word(dev->config + PCI_COMMAND) &
> + ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | 
> PCI_COMMAND_MASTER));
>  dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
>  dev->config[PCI_INTERRUPT_LINE] = 0x0;
>  for (r = 0; r < PCI_NUM_REGIONS; ++r) {
> -if (!dev->io_regions[r].size) {
> +PCIIORegion *region = &dev->io_regions[r];
> +if (!region->size) {
>  continue;
>  }
> -pci_set_long(dev->config + pci_bar(dev, r), dev->io_regions[r].type);
> +
> +if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) &&
> +region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> +pci_set_quad(dev->config + pci_bar(dev, r), region->type);
> +} else {
> +pci_set_long(dev->config + pci_bar(dev, r), region->type);
> +}
>  }
>  pci_update_mappings(dev);
>  }
>  

I applied the first hunk. Looking at it
made me notice that we don't clear the interrupt disable
bit on reset, and we really should, as it is read/write.
Rather than duplicating code, we should just use wmask.

I ended up with this:

commit b82d3876099c4f1fd009082f052e3bac7e3062e7
Author: Isaku Yamahata 
Date:   Thu Jun 17 15:15:45 2010 +0900

pci: fix pci_device_reset

Clear interrupt disable bit on reset, according to PCI spec.
Fix pci_device_reset() with 64bit BAR.

Signed-off-by: Isaku Yamahata 
Signed-off-by: Michael S. Tsirkin 

diff --git a/hw/pci.c b/hw/pci.c
index 7787005..de33745 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -150,15 +150,24 @@ static void pci_device_reset(PCIDevice *dev)
 
 dev->irq_state = 0;
 pci_update_irq_status(dev);
-dev->config[PCI_COMMAND] &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
-  PCI_COMMAND_MASTER);
+/* Clear all writeable bits */
+pci_set_word(dev->config + PCI_COMMAND,
+ pci_get_word(dev->config + PCI_COMMAND) &
+ ~pci_get_word(dev->wmask + PCI_COMMAND));
 dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
 dev->config[PCI_INTERRUPT_LINE] = 0x0;
 for (r = 0; r < PCI_NUM_REGIONS; ++r) {
-if (!dev->io_regions[r].size) {
+PCIIORegion *region = &dev->io_regions[r];
+if (!region->size) {
 continue;
 }
-pci_set_long(dev->config + pci_bar(dev, r), dev->io_regions[r].type);
+
+if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) &&
+region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+pci_set_quad(dev->config + pci_bar(dev, r), region->type);
+} else {
+pci_set_long(dev->config + pci_bar(dev, r), region->type);
+}
 }
 pci_update_mappings(dev);
 }



[Qemu-devel] [PATCH 1/8] seabios: pci: introduce foreachpci_in_bus() helper macro.

2010-06-17 Thread Isaku Yamahata
This patch introduces the foreachpci_in_bus() helper macro for
depth-first recursion; foreachpci() walks devices breadth first.
The macro will be used later to initialize pci bridges, which
requires depth-first recursion.
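
As a quick illustration (not part of this patch), a caller that walks one
bus with the new macro could look roughly like this; dprintf() and
pci_config_readw() are the existing SeaBIOS helpers:

static void dump_bus(int bus)
{
    int devfn, bdf;
    foreachpci_in_bus(bus, devfn, bdf) {
        // Print each device found on 'bus' (illustrative only).
        dprintf(1, "PCI: bus %d devfn 0x%02x vendor 0x%04x device 0x%04x\n",
                bus, devfn,
                pci_config_readw(bdf, PCI_VENDOR_ID),
                pci_config_readw(bdf, PCI_DEVICE_ID));
    }
}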

Signed-off-by: Isaku Yamahata 
---
 src/pci.c |   30 ++
 src/pci.h |   11 +++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/src/pci.c b/src/pci.c
index 1ab3c2c..d418b4b 100644
--- a/src/pci.c
+++ b/src/pci.c
@@ -157,6 +157,36 @@ pci_find_vga(void)
 }
 }
 
+// Helper function for foreachpci_in_bus() macro - return next devfn
+int
+pci_next_in_bus(int bus, int devfn)
+{
+int bdf = pci_bus_devfn_to_bdf(bus, devfn);
+if (pci_bdf_to_fn(bdf) == 1
+&& (pci_config_readb(bdf-1, PCI_HEADER_TYPE) & 0x80) == 0)
+// Last found device wasn't a multi-function device - skip to
+// the next device.
+devfn += 7;
+
+for (;;) {
+if (devfn >= 0x100)
+return -1;
+
+bdf = pci_bus_devfn_to_bdf(bus, devfn);
+u16 v = pci_config_readw(bdf, PCI_VENDOR_ID);
+if (v != 0x0000 && v != 0xffff)
+// Device is present.
+break;
+
+if (pci_bdf_to_fn(bdf) == 0)
+devfn += 8;
+else
+devfn += 1;
+}
+
+return devfn;
+}
+
 // Search for a device with the specified vendor and device ids.
 int
 pci_find_device(u16 vendid, u16 devid)
diff --git a/src/pci.h b/src/pci.h
index 8a21c06..26bfd40 100644
--- a/src/pci.h
+++ b/src/pci.h
@@ -21,6 +21,9 @@ static inline u8 pci_bdf_to_fn(u16 bdf) {
 static inline u16 pci_to_bdf(int bus, int dev, int fn) {
 return (bus<<8) | (dev<<3) | fn;
 }
+static inline u16 pci_bus_devfn_to_bdf(int bus, u16 devfn) {
+return (bus << 8) | devfn;
+}
 
 static inline u32 pci_vd(u16 vendor, u16 device) {
 return (device << 16) | vendor;
@@ -50,6 +53,14 @@ int pci_next(int bdf, int *pmax);
  ; BDF >= 0 \
  ; BDF=pci_next(BDF+1, &MAX))
 
+int pci_next_in_bus(int bus, int devfn);
+#define foreachpci_in_bus(BUS, DEVFN, BDF)  \
+for (DEVFN = pci_next_in_bus(BUS, 0),   \
+ BDF = pci_bus_devfn_to_bdf(BUS, DEVFN) \
+ ; DEVFN >= 0   \
+ ; DEVFN = pci_next_in_bus(BUS, DEVFN + 1), \
+   BDF = pci_bus_devfn_to_bdf(BUS, DEVFN))
+
 // pirtable.c
 void create_pirtable(void);
 
-- 
1.6.6.1




[Qemu-devel] [PATCH 4/8] seabios: pciinit: make pci bar assigner prefetchable memory aware.

2010-06-17 Thread Isaku Yamahata
Make the pci bar assigner prefetchable-memory aware.
This is needed for PCI bridge support because memory space and
prefetchable memory space are filtered independently, based on the
memory base/limit and prefetchable memory base/limit registers of
the pci bridge.
On bus 0 such a distinction isn't necessary, so keep the existing
behavior by checking for bus 0.
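
For readers not familiar with bridge decoding, a rough sketch of the rule
this prepares for (illustrative only, not code from this series): a bridge
forwards a memory transaction downstream when the address falls into either
its memory window or its prefetchable memory window, and the two windows
are programmed independently.

/* Illustrative only: the decode rule a PCI-to-PCI bridge applies.  The
 * base/limit values are assumed to be already decoded from the bridge's
 * config registers. */
static int bridge_claims_mem(u64 addr,
                             u64 mem_base, u64 mem_limit,
                             u64 pref_base, u64 pref_limit)
{
    return (addr >= mem_base  && addr <= mem_limit) ||
           (addr >= pref_base && addr <= pref_limit);
}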

Signed-off-by: Isaku Yamahata 
---
 src/pciinit.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index b635e44..b6ab157 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -16,6 +16,7 @@
 
 static u32 pci_bios_io_addr;
 static u32 pci_bios_mem_addr;
+static u32 pci_bios_prefmem_addr;
 /* host irqs corresponding to PCI irqs A-D */
 static u8 pci_irqs[4] = {
 10, 10, 11, 11
@@ -70,6 +71,12 @@ static int pci_bios_allocate_region(u16 bdf, int region_num)
 u32 size = (~(val & mask)) + 1;
 if (val & PCI_BASE_ADDRESS_SPACE_IO)
 paddr = &pci_bios_io_addr;
+else if ((val & PCI_BASE_ADDRESS_MEM_PREFETCH) &&
+ /* keep behaviour on bus = 0 */
+ pci_bdf_to_bus(bdf) != 0 &&
+ /* If pci_bios_prefmem_addr == 0, keep old behaviour */
+ pci_bios_prefmem_addr != 0)
+paddr = &pci_bios_prefmem_addr;
 else
 paddr = &pci_bios_mem_addr;
 *paddr = ALIGN(*paddr, size);
@@ -221,6 +228,9 @@ pci_setup(void)
 pci_bios_io_addr = 0xc000;
 pci_bios_mem_addr = BUILD_PCIMEM_START;
 
+/* pci_bios_mem_addr +  */
+pci_bios_prefmem_addr = pci_bios_mem_addr + 0x0800;
+
 int bdf, max;
 foreachpci(bdf, max) {
 pci_bios_init_bridges(bdf);
-- 
1.6.6.1




[Qemu-devel] [PATCH 5/8] seabios: pciinit: factor out bar offset calculation.

2010-06-17 Thread Isaku Yamahata
This patch factors out bar offset calculation.
Later the calculation logic will be enhanced.

Signed-off-by: Isaku Yamahata 
---
 src/pciinit.c |   20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index b6ab157..6ba51f2 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -22,15 +22,19 @@ static u8 pci_irqs[4] = {
 10, 10, 11, 11
 };
 
+static u32 pci_bar(u16 bdf, int region_num)
+{
+if (region_num != PCI_ROM_SLOT) {
+return PCI_BASE_ADDRESS_0 + region_num * 4;
+}
+return PCI_ROM_ADDRESS;
+}
+
 static void pci_set_io_region_addr(u16 bdf, int region_num, u32 addr)
 {
 u32 ofs, old_addr;
 
-if (region_num == PCI_ROM_SLOT) {
-ofs = PCI_ROM_ADDRESS;
-} else {
-ofs = PCI_BASE_ADDRESS_0 + region_num * 4;
-}
+ofs = pci_bar(bdf, region_num);
 
 old_addr = pci_config_readl(bdf, ofs);
 
@@ -46,11 +50,7 @@ static void pci_set_io_region_addr(u16 bdf, int region_num, 
u32 addr)
 static int pci_bios_allocate_region(u16 bdf, int region_num)
 {
 u32 *paddr;
-int ofs;
-if (region_num == PCI_ROM_SLOT)
-ofs = PCI_ROM_ADDRESS;
-else
-ofs = PCI_BASE_ADDRESS_0 + region_num * 4;
+u32 ofs = pci_bar(bdf, region_num);
 
 u32 old = pci_config_readl(bdf, ofs);
 u32 mask;
-- 
1.6.6.1




[Qemu-devel] [PATCH 8/8] seabios: pciinit: initialize pci bridge filtering registers.

2010-06-17 Thread Isaku Yamahata
initialize pci bridge filtering registers.

Signed-off-by: Isaku Yamahata 
---
 src/pciinit.c |  117 +++-
 1 files changed, 114 insertions(+), 3 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index fe6848a..f68a690 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -14,6 +14,8 @@
 #define PCI_ROM_SLOT 6
 #define PCI_NUM_REGIONS 7
 
+static void pci_bios_init_device_in_bus(int bus);
+
 static u32 pci_bios_io_addr;
 static u32 pci_bios_mem_addr;
 static u32 pci_bios_prefmem_addr;
@@ -145,6 +147,106 @@ static void pci_bios_init_bridges(u16 bdf)
 }
 }
 
+#define PCI_IO_ALIGN            4096
+#define PCI_IO_SHIFT            8
+#define PCI_MEMORY_ALIGN        (1UL << 20)
+#define PCI_MEMORY_SHIFT        16
+#define PCI_PREF_MEMORY_ALIGN   (1UL << 20)
+#define PCI_PREF_MEMORY_SHIFT   16
+
+static void pci_bios_init_device_bridge(u16 bdf)
+{
+u32 io_old;
+u32 mem_old;
+u32 prefmem_old;
+
+u32 io_base;
+u32 io_end;
+u32 mem_base;
+u32 mem_end;
+u32 prefmem_base;
+u32 prefmem_end;
+
+pci_bios_allocate_region(bdf, 0);
+pci_bios_allocate_region(bdf, 1);
+pci_bios_allocate_region(bdf, PCI_ROM_SLOT);
+
+io_old = pci_bios_io_addr;
+mem_old = pci_bios_mem_addr;
+prefmem_old = pci_bios_prefmem_addr;
+
+/* IO BASE is assumed to be 16 bit */
+pci_bios_io_addr = ALIGN(pci_bios_io_addr, PCI_IO_ALIGN);
+pci_bios_mem_addr = ALIGN(pci_bios_mem_addr, PCI_MEMORY_ALIGN);
+pci_bios_prefmem_addr =
+ALIGN(pci_bios_prefmem_addr, PCI_PREF_MEMORY_ALIGN);
+
+io_base = pci_bios_io_addr;
+mem_base = pci_bios_mem_addr;
+prefmem_base = pci_bios_prefmem_addr;
+
+u8 secbus = pci_config_readb(bdf, PCI_SECONDARY_BUS);
+if (secbus > 0) {
+pci_bios_init_device_in_bus(secbus);
+}
+
+pci_bios_io_addr = ALIGN(pci_bios_io_addr, PCI_IO_ALIGN);
+pci_bios_mem_addr = ALIGN(pci_bios_mem_addr, PCI_MEMORY_ALIGN);
+pci_bios_prefmem_addr =
+ALIGN(pci_bios_prefmem_addr, PCI_PREF_MEMORY_ALIGN);
+
+io_end = pci_bios_io_addr;
+if (io_end == io_base) {
+pci_bios_io_addr = io_old;
+io_base = 0x;
+io_end = 1;
+}
+pci_config_writeb(bdf, PCI_IO_BASE, io_base >> PCI_IO_SHIFT);
+pci_config_writew(bdf, PCI_IO_BASE_UPPER16, 0);
+pci_config_writeb(bdf, PCI_IO_LIMIT, (io_end - 1) >> PCI_IO_SHIFT);
+pci_config_writew(bdf, PCI_IO_LIMIT_UPPER16, 0);
+
+mem_end = pci_bios_mem_addr;
+if (mem_end == mem_base) {
+pci_bios_mem_addr = mem_old;
+mem_base = 0x;
+mem_end = 1;
+}
+pci_config_writew(bdf, PCI_MEMORY_BASE, mem_base >> PCI_MEMORY_SHIFT);
+pci_config_writew(bdf, PCI_MEMORY_LIMIT, (mem_end -1) >> PCI_MEMORY_SHIFT);
+
+prefmem_end = pci_bios_prefmem_addr;
+if (prefmem_end == prefmem_base) {
+pci_bios_prefmem_addr = prefmem_old;
+prefmem_base = 0x;
+prefmem_end = 1;
+}
+pci_config_writew(bdf, PCI_PREF_MEMORY_BASE,
+  prefmem_base >> PCI_PREF_MEMORY_SHIFT);
+pci_config_writew(bdf, PCI_PREF_MEMORY_LIMIT,
+  (prefmem_end - 1) >> PCI_PREF_MEMORY_SHIFT);
+pci_config_writel(bdf, PCI_PREF_BASE_UPPER32, 0);
+pci_config_writel(bdf, PCI_PREF_LIMIT_UPPER32, 0);
+
+dprintf(1, "PCI: br io   = [0x%x, 0x%x)\n", io_base, io_end);
+dprintf(1, "PCI: br mem  = [0x%x, 0x%x)\n", mem_base, mem_end);
+dprintf(1, "PCI: br pref = [0x%x, 0x%x)\n", prefmem_base, prefmem_end);
+
+u16 cmd = pci_config_readw(bdf, PCI_COMMAND);
+cmd &= ~PCI_COMMAND_IO;
+if (io_end > io_base) {
+cmd |= PCI_COMMAND_IO;
+}
+cmd &= ~PCI_COMMAND_MEMORY;
+if (mem_end > mem_base || prefmem_end > prefmem_base) {
+cmd |= PCI_COMMAND_MEMORY;
+}
+cmd |= PCI_COMMAND_MASTER;
+pci_config_writew(bdf, PCI_COMMAND, cmd);
+
+pci_config_maskw(bdf, PCI_BRIDGE_CONTROL, 0, PCI_BRIDGE_CTL_SERR);
+}
+
 static void pci_bios_init_device(u16 bdf)
 {
 int class;
@@ -189,6 +291,9 @@ static void pci_bios_init_device(u16 bdf)
 pci_set_io_region_addr(bdf, 0, 0x8080);
 }
 break;
+case PCI_CLASS_BRIDGE_PCI:
+pci_bios_init_device_bridge(bdf);
+break;
 default:
 /* default memory mappings */
 pci_bios_allocate_regions(bdf);
@@ -220,6 +325,14 @@ static void pci_bios_init_device(u16 bdf)
 }
 }
 
+static void pci_bios_init_device_in_bus(int bus)
+{
+int devfn, bdf;
+foreachpci_in_bus(bus, devfn, bdf) {
+pci_bios_init_device(bdf);
+}
+}
+
 static void
 pci_bios_init_bus_rec(int bus, u8 *pci_bus)
 {
@@ -309,7 +422,5 @@ pci_setup(void)
 foreachpci(bdf, max) {
 pci_bios_init_bridges(bdf);
 }
-foreachpci(bdf, max) {
-pci_bios_init_device(bdf);
-}
+pci_bios_init_device_in_bus(0 /* host bus */);
 }
-- 
1.6.6.1




[Qemu-devel] [PATCH 6/8] seabios: pciinit: make bar offset calculation pci bridge aware.

2010-06-17 Thread Isaku Yamahata
This patch makes the pci bar offset calculation pci bridge aware.
The expansion ROM BAR of a pci bridge sits at a different config
space offset than on a normal device.
Signed-off-by: Isaku Yamahata 
---
 src/pciinit.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 6ba51f2..1c2c8c6 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -27,7 +27,11 @@ static u32 pci_bar(u16 bdf, int region_num)
 if (region_num != PCI_ROM_SLOT) {
 return PCI_BASE_ADDRESS_0 + region_num * 4;
 }
-return PCI_ROM_ADDRESS;
+
+#define PCI_HEADER_TYPE_MULTI_FUNCTION 0x80
+u8 type = pci_config_readb(bdf, PCI_HEADER_TYPE);
+type &= ~PCI_HEADER_TYPE_MULTI_FUNCTION;
+return type == PCI_HEADER_TYPE_BRIDGE ? PCI_ROM_ADDRESS1 : PCI_ROM_ADDRESS;
 }
 
 static void pci_set_io_region_addr(u16 bdf, int region_num, u32 addr)
-- 
1.6.6.1




[Qemu-devel] [PATCH 3/8] seabios: pciinit: make pci memory space assignment 64bit aware.

2010-06-17 Thread Isaku Yamahata
Make pci memory space assignment 64bit aware.
If a 64bit memory BAR is found while assigning pci memory space,
clear its upper 32 bits and skip over the BAR slot used by the
upper half.

This patch is preparation for q35 chipset initialization, which
has a 64bit BAR.
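
As background, a sketch (my own illustration, not code from this patch) of
how a 64bit memory BAR is sized by probing both halves:

/* Illustrative sketch: size a 64bit memory BAR by writing all-ones to both
 * dwords and reading back.  'ofs' is the BAR's config space offset; the low
 * 4 bits of the readback are the BAR type bits and must be masked off. */
static u64 probe_bar_size64(u16 bdf, u32 ofs)
{
    u32 old_lo = pci_config_readl(bdf, ofs);
    u32 old_hi = pci_config_readl(bdf, ofs + 4);
    pci_config_writel(bdf, ofs, ~0);
    pci_config_writel(bdf, ofs + 4, ~0);
    u64 val = ((u64)pci_config_readl(bdf, ofs + 4) << 32)
              | pci_config_readl(bdf, ofs);
    pci_config_writel(bdf, ofs, old_lo);
    pci_config_writel(bdf, ofs + 4, old_hi);
    return ~(val & ~(u64)0xf) + 1;
}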

Signed-off-by: Isaku Yamahata 
---
 src/pciinit.c |   19 +--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 488c77b..b635e44 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -37,7 +37,12 @@ static void pci_set_io_region_addr(u16 bdf, int region_num, 
u32 addr)
 dprintf(1, "region %d: 0x%08x\n", region_num, addr);
 }
 
-static void pci_bios_allocate_region(u16 bdf, int region_num)
+/*
+ * return value
+ *  0: 32bit BAR
+ *  non 0: 64bit BAR
+ */
+static int pci_bios_allocate_region(u16 bdf, int region_num)
 {
 u32 *paddr;
 int ofs;
@@ -71,13 +76,23 @@ static void pci_bios_allocate_region(u16 bdf, int 
region_num)
 pci_set_io_region_addr(bdf, region_num, *paddr);
 *paddr += size;
 }
+
+int is_64bit = !(val & PCI_BASE_ADDRESS_SPACE_IO) &&
+(val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
+if (is_64bit) {
+pci_config_writel(bdf, ofs + 4, 0);
+}
+return is_64bit;
 }
 
 static void pci_bios_allocate_regions(u16 bdf)
 {
 int i;
 for (i = 0; i < PCI_NUM_REGIONS; i++) {
-pci_bios_allocate_region(bdf, i);
+int is_64bit = pci_bios_allocate_region(bdf, i);
+if (is_64bit){
+i++;
+}
 }
 }
 
-- 
1.6.6.1




[Qemu-devel] [PATCH] vmware_vga: fix reset value for command register

2010-06-17 Thread Michael S. Tsirkin
Make the init value for this register match the spec.
The BAR address is 0 at init, so enabling it
only works by chance.

Signed-off-by: Michael S. Tsirkin 
---

This patch is untested. Could someone who has vmware
guests please look at it?
Thanks!

 hw/vmware_vga.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index bf2a699..41c959b 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -1240,9 +1240,6 @@ static int pci_vmsvga_initfn(PCIDevice *dev)
 
 pci_config_set_vendor_id(s->card.config, PCI_VENDOR_ID_VMWARE);
 pci_config_set_device_id(s->card.config, SVGA_PCI_DEVICE_ID);
-s->card.config[PCI_COMMAND]= PCI_COMMAND_IO |
-  PCI_COMMAND_MEMORY |
-  PCI_COMMAND_MASTER; /* I/O + Memory */
 pci_config_set_class(s->card.config, PCI_CLASS_DISPLAY_VGA);
 s->card.config[PCI_CACHE_LINE_SIZE]= 0x08; /* Cache line 
size */
 s->card.config[PCI_LATENCY_TIMER] = 0x40;  /* Latency timer */
-- 
1.7.1.12.g42b7f



[Qemu-devel] [PATCH] pcnet: address TODOs

2010-06-17 Thread Michael S. Tsirkin
pcnet enables memory/io decoding on init, which
does not make sense as the BAR values are wrong at that point.
Fix this by disabling the BARs, as the PCI spec requires.
Address other minor TODOs.
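
(For reference, the reset state this relies on: after RST# the PCI spec wants
the COMMAND register's decode bits clear until the BARs are programmed, so a
purely illustrative sanity check at the end of init would be:)

/* Illustrative only: I/O, memory and bus-master decoding must all be
 * disabled until firmware or the OS has programmed the BARs. */
assert((pci_get_word(pci_conf + PCI_COMMAND) &
        (PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER)) == 0);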

Signed-off-by: Michael S. Tsirkin 
---

The following untested patch brings pcnet into
compliance with the spec.
Could someone who's interested in pcnet look
at this patch please?


 hw/pcnet.c |   17 ++---
 1 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/hw/pcnet.c b/hw/pcnet.c
index 5e63eb5..b52935a 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -1981,26 +1981,14 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
 
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_AMD);
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_AMD_LANCE);
-/* TODO: value should be 0 at RST# */
-pci_set_word(pci_conf + PCI_COMMAND,
- PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
 pci_set_word(pci_conf + PCI_STATUS,
  PCI_STATUS_FAST_BACK | PCI_STATUS_DEVSEL_MEDIUM);
 pci_conf[PCI_REVISION_ID] = 0x10;
-/* TODO: 0 is the default anyway, no need to set it. */
-pci_conf[PCI_CLASS_PROG] = 0x00;
 pci_config_set_class(pci_conf, PCI_CLASS_NETWORK_ETHERNET);
-pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
-
-/* TODO: not necessary, is set when BAR is registered. */
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_0, PCI_BASE_ADDRESS_SPACE_IO);
-pci_set_long(pci_conf + PCI_BASE_ADDRESS_0 + 4,
- PCI_BASE_ADDRESS_SPACE_MEMORY);
 
 pci_set_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID, 0x0);
 pci_set_word(pci_conf + PCI_SUBSYSTEM_ID, 0x0);
 
-/* TODO: value must be 0 at RST# */
 pci_conf[PCI_INTERRUPT_PIN] = 1; // interrupt pin 0
 pci_conf[PCI_MIN_GNT] = 0x06;
 pci_conf[PCI_MAX_LAT] = 0xff;
@@ -2009,11 +1997,10 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
 s->mmio_index =
   cpu_register_io_memory(pcnet_mmio_read, pcnet_mmio_write, &d->state);
 
-/* TODO: use pci_dev, avoid cast below. */
-pci_register_bar((PCIDevice *)d, 0, PCNET_IOPORT_SIZE,
+pci_register_bar(pci_dev, 0, PCNET_IOPORT_SIZE,
PCI_BASE_ADDRESS_SPACE_IO, pcnet_ioport_map);
 
-pci_register_bar((PCIDevice *)d, 1, PCNET_PNPMMIO_SIZE,
+pci_register_bar(pci_dev, 1, PCNET_PNPMMIO_SIZE,
PCI_BASE_ADDRESS_SPACE_MEMORY, pcnet_mmio_map);
 
 s->irq = pci_dev->irq[0];
-- 
1.7.1.12.g42b7f



[Qemu-devel] Re: [PATCH 0/8] seabios: pci: multi pci bus support

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 08:03:08PM +0900, Isaku Yamahata wrote:
> This patch set allows seabios to initialize multiple pci buses and 64bit BARs.
> 
> Currently seabios is able to initialize only the pci root bus.
> However, multi pci bus support is wanted because
>   - more pci buses are wanted in qemu for more slots
>   - pci express support is coming in qemu, which requires multiple pci buses.
> The corresponding patches on the qemu side are still under way, though.

Not that I object, but - does it really require multi bus? Why?

> Isaku Yamahata (8):
>   seabios: pci: introduce foreachpci_in_bus() helper macro.
>   seabios: pciinit: factor out pci bar region allocation logic.
>   seabios: pciinit: make pci memory space assignment 64bit aware.
>   seabios: pciinit: make pci bar assigner prefetchable memory aware.
>   seabios: pciinit: factor out bar offset calculation.
>   seabios: pciinit: make bar offset calculation pci bridge aware.
>   seabios: pciinit: pci bridge bus initialization.
>   seabios: pciinit: initialize pci bridge filtering registers.
> 
>  src/pci.c |   30 ++
>  src/pci.h |   11 ++
>  src/pciinit.c |  310 
>  3 files changed, 306 insertions(+), 45 deletions(-)



[Qemu-devel] Re: [PATCH 00/10] pci: pci to pci bridge clean up and enhancement

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 01:02:43PM +0300, Michael S. Tsirkin wrote:
> For example, forcing all devices to call pci_reset_default
> in their reset routines does not look like a good cleanup:
> the less boilerplate, the better IMO.

One thing that we need to address is devices
which need to enable memory+master on init.
They should probably also enable this on reset.

One approach that was discussed several times
would be to call cleanup and then init again.
I expect this would be enough to get rid of reset
callbacks in most devices.
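
A minimal sketch of that idea, with made-up names (MyDevState and the
mydev_* helpers do not exist anywhere; this only illustrates the shape of it):

/* Illustrative only - not existing qdev code.  The reset callback simply
 * tears the device state down and re-runs the same code used at init. */
static void mydev_reset(DeviceState *dev)
{
    MyDevState *s = DO_UPCAST(MyDevState, qdev, dev);

    mydev_cleanup(s);       /* hypothetical per-device teardown */
    mydev_init_state(s);    /* hypothetical per-device (re)initialization */
}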

-- 
MST



[Qemu-devel] [RFC][PATCH 0/2] block: Add flush after metadata writes

2010-06-17 Thread Kevin Wolf
This addresses the data integrity problems described at
http://wiki.qemu.org/Features/Qcow2DataIntegrity#Metadata_update_ordering.2C_Part_2

These problems are the same for all image formats (except raw, which doesn't
have any metadata), so I'm going to add more patches for the other formats for
the real patch submission.

Kevin Wolf (2):
  block: Add bdrv_(p)write_sync
  qcow2: Use bdrv_(p)write_sync for metadata writes

 block.c|   37 +
 block.h|4 
 block/qcow2-cluster.c  |   16 
 block/qcow2-refcount.c |   18 +-
 block/qcow2-snapshot.c |   14 +++---
 block/qcow2.c  |   10 +-
 6 files changed, 70 insertions(+), 29 deletions(-)




[Qemu-devel] [RFC][PATCH 1/2] block: Add bdrv_(p)write_sync

2010-06-17 Thread Kevin Wolf
Add new functions that write and flush the written data to disk immediately.
This is what needs to be used for image format metadata to maintain integrity
for cache=... modes that don't use O_DSYNC. (Actually, we only need barriers,
and the functions are defined as such, but flushes are what this patch
implements - we can try to change that later.)
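
A hypothetical caller, just to illustrate the intended use for metadata
updates (update_header_field() is made up; bs->file and cpu_to_be32() are
used the same way as in the existing format drivers):

/* Hypothetical example: a format driver updating a 32 bit header field
 * through the new barrier-enforcing helper instead of plain bdrv_pwrite(). */
static int update_header_field(BlockDriverState *bs, int64_t offset,
                               uint32_t value)
{
    uint32_t be_value = cpu_to_be32(value);
    return bdrv_pwrite_sync(bs->file, offset, &be_value, sizeof(be_value));
}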

Signed-off-by: Kevin Wolf 
---
 block.c |   37 +
 block.h |4 
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 0765fbc..7b64c2d 100644
--- a/block.c
+++ b/block.c
@@ -1010,6 +1010,43 @@ int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
 return count1;
 }
 
+/*
+ * Writes to the file and ensures that no writes are reordered across this
+ * request (acts as a barrier)
+ *
+ * Returns 0 on success, -errno in error cases.
+ */
+int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
+const void *buf, int count)
+{
+int ret;
+
+ret = bdrv_pwrite(bs, offset, buf, count);
+if (ret < 0) {
+return ret;
+}
+
+/* No flush needed for cache=writethrough, it uses O_DSYNC */
+if ((bs->open_flags & BDRV_O_CACHE_MASK) != 0) {
+bdrv_flush(bs);
+}
+
+return 0;
+}
+
+/*
+ * Writes to the file and ensures that no writes are reordered across this
+ * request (acts as a barrier)
+ *
+ * Returns 0 on success, -errno in error cases.
+ */
+int bdrv_write_sync(BlockDriverState *bs, int64_t sector_num,
+const uint8_t *buf, int nb_sectors)
+{
+return bdrv_pwrite_sync(bs, BDRV_SECTOR_SIZE * sector_num,
+buf, BDRV_SECTOR_SIZE * nb_sectors);
+}
+
 /**
  * Truncate file to 'offset' bytes (needed only for file protocols)
  */
diff --git a/block.h b/block.h
index 9df9b38..6a157f4 100644
--- a/block.h
+++ b/block.h
@@ -80,6 +80,10 @@ int bdrv_pread(BlockDriverState *bs, int64_t offset,
void *buf, int count);
 int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
 const void *buf, int count);
+int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
+const void *buf, int count);
+int bdrv_write_sync(BlockDriverState *bs, int64_t sector_num,
+const uint8_t *buf, int nb_sectors);
 int bdrv_truncate(BlockDriverState *bs, int64_t offset);
 int64_t bdrv_getlength(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
-- 
1.6.6.1




[Qemu-devel] [RFC][PATCH 2/2] qcow2: Use bdrv_(p)write_sync for metadata writes

2010-06-17 Thread Kevin Wolf
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-cluster.c  |   16 
 block/qcow2-refcount.c |   18 +-
 block/qcow2-snapshot.c |   14 +++---
 block/qcow2.c  |   10 +-
 4 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5760ad6..05cf6c2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -64,7 +64,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
 BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
 for(i = 0; i < s->l1_size; i++)
 new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
-ret = bdrv_pwrite(bs->file, new_l1_table_offset, new_l1_table, 
new_l1_size2);
+ret = bdrv_pwrite_sync(bs->file, new_l1_table_offset, new_l1_table, 
new_l1_size2);
 if (ret != new_l1_size2)
 goto fail;
 for(i = 0; i < s->l1_size; i++)
@@ -74,7 +74,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
 BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ACTIVATE_TABLE);
 cpu_to_be32w((uint32_t*)data, new_l1_size);
 cpu_to_be64w((uint64_t*)(data + 4), new_l1_table_offset);
-ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, l1_size), 
data,sizeof(data));
+ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size), 
data,sizeof(data));
 if (ret != sizeof(data)) {
 goto fail;
 }
@@ -207,7 +207,7 @@ static int write_l1_entry(BlockDriverState *bs, int 
l1_index)
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
-ret = bdrv_pwrite(bs->file, s->l1_table_offset + 8 * l1_start_index,
+ret = bdrv_pwrite_sync(bs->file, s->l1_table_offset + 8 * l1_start_index,
 buf, sizeof(buf));
 if (ret < 0) {
 return ret;
@@ -263,7 +263,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, 
uint64_t **table)
 }
 /* write the l2 table to the file */
 BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
-ret = bdrv_pwrite(bs->file, l2_offset, l2_table,
+ret = bdrv_pwrite_sync(bs->file, l2_offset, l2_table,
 s->l2_size * sizeof(uint64_t));
 if (ret < 0) {
 goto fail;
@@ -413,8 +413,8 @@ static int copy_sectors(BlockDriverState *bs, uint64_t 
start_sect,
 &s->aes_encrypt_key);
 }
 BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
-ret = bdrv_write(bs->file, (cluster_offset >> 9) + n_start,
- s->cluster_data, n);
+ret = bdrv_write_sync(bs->file, (cluster_offset >> 9) + n_start,
+s->cluster_data, n);
 if (ret < 0)
 return ret;
 return 0;
@@ -631,7 +631,7 @@ uint64_t 
qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 l2_table[l2_index] = cpu_to_be64(cluster_offset);
-if (bdrv_pwrite(bs->file,
+if (bdrv_pwrite_sync(bs->file,
 l2_offset + l2_index * sizeof(uint64_t),
 l2_table + l2_index,
 sizeof(uint64_t)) != sizeof(uint64_t))
@@ -655,7 +655,7 @@ static int write_l2_entries(BlockDriverState *bs, uint64_t 
*l2_table,
 int ret;
 
 BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
-ret = bdrv_pwrite(bs->file, l2_offset + start_offset,
+ret = bdrv_pwrite_sync(bs->file, l2_offset + start_offset,
 &l2_table[l2_start_index], len);
 if (ret < 0) {
 return ret;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 41e1da9..540bf49 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -44,7 +44,7 @@ static int write_refcount_block(BlockDriverState *bs)
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE);
-if (bdrv_pwrite(bs->file, s->refcount_block_cache_offset,
+if (bdrv_pwrite_sync(bs->file, s->refcount_block_cache_offset,
 s->refcount_block_cache, size) != size)
 {
 return -EIO;
@@ -269,7 +269,7 @@ static int64_t alloc_refcount_block(BlockDriverState *bs, 
int64_t cluster_index)
 
 /* Now the new refcount block needs to be written to disk */
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
-ret = bdrv_pwrite(bs->file, new_block, s->refcount_block_cache,
+ret = bdrv_pwrite_sync(bs->file, new_block, s->refcount_block_cache,
 s->cluster_size);
 if (ret < 0) {
 goto fail_block;
@@ -279,7 +279,7 @@ static int64_t alloc_refcount_block(BlockDriverState *bs, 
int64_t cluster_index)
 if (refcount_table_index < s->refcount_table_size) {
 uint64_t data64 = cpu_to_be64(new_block);
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_HOOKUP);
-ret = bdrv_pwrite(bs->file,
+ret = bdrv_pwrite_sync(bs->file,
 s->refcount_table_offset + refcount_table_index * sizeof(uint64_t),
 &data64, sizeof(data64));
 if (ret < 0) {
@@ -359,7 +359,7 @@ static int64_t alloc_refcount_block(BlockDriverState *bs, 
int64_t cluster_index

Re: [Qemu-devel] RFC v3: blockdev_add & friends, brief rationale, QMP docs

2010-06-17 Thread Markus Armbruster
Stefan Hajnoczi  writes:

> On Wed, Jun 16, 2010 at 6:27 PM, Markus Armbruster  wrote:
>> blockdev_add
>> 
>>
>> Add host block device.
>>
>> Arguments:
>>
>> - "id": the host block device's ID, must be unique (json-string)
>> - "format": image format (json-string, optional)
>>    - Possible values: "raw", "qcow2", ...
>
> What is the default when unset?  (I expect we'll auto-detect the
> format but this should be documented.)

For command line and human monitor, we definitely want a sensible
default.  I sketched one in section "Command line syntax".  I'll quote
it for your convenience a few lines down.

>> - "protocol": image access protocol (json-string, optional)
>>    - Possible values: "auto", "file", "nbd", ...
>
> The semantics of "auto" are not documented here.

Uh, that slipped in here.  It means "guess protocol from image file
type".

Again, for command line and human monitor, we definitely want a sensible
default, and I sketched one in section "Command line syntax".

We may want the same defaults in QMP, although more for consistency than
for usability.  But I didn't want to complicate the QMP section with all
that defaults business, so I moved discussion of defaults down to the
command line section.  Hope I didn't cause even more confusion that way.

Anyway, here's what I wrote on default format:

   * The default format is derived from the image file name: if it ends
 with .F, where F is a format name, that format is the default, else
 "raw".

To let users ask for this explicitly, we could have pseudo-format
"auto".

We also need a pseudo-format "probe", which guesses the format from the
image contents.  Can't be made the default, because it's insecure.
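
If it helps, the suffix rule could look roughly like this (a sketch only;
default_format() is made up, bdrv_find_format() is the existing lookup):

/* Sketch: take the format from the file name suffix if a block driver of
 * that name exists, else fall back to raw. */
static const char *default_format(const char *filename)
{
    const char *dot = strrchr(filename, '.');

    if (dot && bdrv_find_format(dot + 1)) {
        return dot + 1;
    }
    return "raw";
}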

On protocol "auto":

   * The default protocol depends on the image file type: if it is a
 special file, it defaults to the protocol appropriate for that special
 file ("host_cdrom" for CD-ROM, ...).  Else it defaults to "file".

   This permits shortening the first two examples:

   -blockdev id=blk1,file=fedora.img

   -blockdev id=blk2,blkdebug=test.blkdebug,file=test.qcow2

And for completeness, let me quote the unshortened examples, too:

   -blockdev id=blk1,format=raw,protocol=file,file=fedora.img

   -blockdev id=blk2,format=qcow2,blkdebug=test.blkdebug,\
   protocol=file,file=test.qcow2

>> Notes:
>>
>> (1) If argument "protocol" is missing, all other optional arguments must
>>    be missing as well.  This defines a block device with no media
>>    inserted.
>
> Perhaps this is what "auto" means?
>
>> (2) It's possible to list supported disk formats and protocols by
>>    running QEMU with arguments "-blockdev_add \?".
>
> Is there an query-block-driver command or something in QMP to
> enumerate supported formats and protocols?  Not sure how useful this
> would be to the management stack - blockdev_add will probably return
> an error if an attempt is made to open an unsupported file.

QMP should be "self-documenting": a client should be able to list
commands, their arguments, and possible argument values.  Listing
supported formats then becomes "list possible values of command
blockdev_add's argument format".

>> blockdev_del
>> 
>>
>> Remove a host block device.
>>
>> Arguments:
>>
>> - "id": the host block device's ID (json-string)
>>
>> Example:
>>
>> -> { "execute": "blockdev_del", "arguments": { "id": "blk1" } }
>> <- { "return": {} }
>
> What about an attached guest device?  Will this fail if the virtio-blk
> PCI device is still present?  For SCSI I imagine we can usually just
> remove the host block device.  For IDE there isn't hotplug support
> AFAIK, what happens?

Command fails.  You have to device_del the device first.  Which is only
possible if its bus supports hot-plug.

Thanks!



[Qemu-devel] Re: [Bug 595117] Re: qemu-nbd slow and missing "writeback" cache option

2010-06-17 Thread Stephane Chazelas
2010-06-16 20:36:00 -, Dustin Kirkland:
[...]
> Could you please send that patch to the qemu-devel@ mailing list?
> Thanks!
[...]

Hi Dustin, it looks like qemu-devel is subscribed to bugs in
there, so the bug report is on the list already.

Note that I still consider it a bug because:
  - slow performance for no good reason
  - the --nocache option is misleading
  - no fsync on "-d", which to my mind is a bug.

Cheers,
Stephane

-- 
qemu-nbd slow and missing "writeback" cache option
https://bugs.launchpad.net/bugs/595117
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: Invalid
Status in “qemu-kvm” package in Ubuntu: Incomplete

Bug description:
Binary package hint: qemu-kvm

dpkg -l | grep qemu
ii  kvm  
1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9dummy transitional 
pacakge from kvm to qemu-
ii  qemu 0.12.3+noroms-0ubuntu9 
   dummy transitional pacakge from qemu to qemu
ii  qemu-common  0.12.3+noroms-0ubuntu9 
   qemu common functionality (bios, documentati
ii  qemu-kvm 0.12.3+noroms-0ubuntu9 
   Full virtualization on i386 and amd64 hardwa
ii  qemu-kvm-extras  0.12.3+noroms-0ubuntu9 
   fast processor emulator binaries for non-x86
ii  qemu-launcher1.7.4-1ubuntu2 
   GTK+ front-end to QEMU computer emulator
ii  qemuctl  0.2-2  
   controlling GUI for qemu

lucid amd64.

qemu-nbd is a lot slower when writing to disk than, say, nbd-server.

It appears this is because, by default, the disk image it serves is opened with
O_SYNC. The --nocache option, unintuitively, makes matters a bit better because
it causes the image to be opened with O_DIRECT instead of O_SYNC.

The qemu code allows an image to be opened without either of those flags, but
unfortunately qemu-nbd doesn't have an option to do that (qemu doesn't allow the
image to be opened with both O_SYNC and O_DIRECT, though).

The default of qemu-nbd (using O_SYNC) is not very sensible because the client
(the kernel) uses write-back caching anyway (and "qemu-nbd -d" doesn't flush
those caches, by the way). So if, for instance, qemu-nbd is killed, the data in
the image will not be consistent regardless of whether qemu-nbd uses O_SYNC,
O_DIRECT or neither, unless "syncs" are done by the client (like fsync on the
nbd device or the sync mount option), and with qemu-nbd's O_SYNC mode those
"sync"s are extremely slow.

Attached is a patch that adds a --cache={off,none,writethrough,writeback} 
option to qemu-nbd.

--cache=off is the same as --nocache (that is, O_DIRECT); writethrough uses
O_SYNC and is still the default, so this patch doesn't change the existing
behaviour. writeback uses none of those flags and is the addition of this
patch. The patch also does an fsync upon "qemu-nbd -d" to make sure data is
flushed to the image before removing the nbd.
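
The mapping described above boils down to something like this (an illustrative
sketch, not the attached patch itself; O_DIRECT/O_SYNC come from <fcntl.h>):

/* Sketch: cache mode -> extra open(2) flags for the exported image. */
static int cache_mode_to_flags(const char *mode)
{
    if (!strcmp(mode, "off") || !strcmp(mode, "none"))
        return O_DIRECT;    /* bypass the host page cache */
    if (!strcmp(mode, "writethrough"))
        return O_SYNC;      /* the current qemu-nbd default */
    return 0;               /* "writeback": no O_SYNC, no O_DIRECT */
}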

Consider this test scenario:

dd bs=1M count=100 of=a < /dev/null
qemu-nbd --cache= -c /dev/nbd0 a
cp /dev/zero /dev/nbd0
time perl -MIO::Handle -e 'STDOUT->sync or die$!' 1<> /dev/nbd0

With cache=writethrough (the default), it takes over 10 minutes to write those
100MB worth of zeroes. Running strace, we see the recvfrom and sendto calls
delayed by the 1kB write(2)s to disk (10 to 30 ms per write).

With cache=off, it takes about 30 seconds.

With cache=writeback, it takes about 3 seconds, which is similar to the 
performance you get with nbd-server

Note that the cp command runs instantly as the data is buffered by the client 
(the kernel), and not sent to qemu-nbd until the fsync(2) is called.





[Qemu-devel] [PATCH v2] monitor: Really show snapshot information about all devices

2010-06-17 Thread Miguel Di Ciurcio Filho
The 'info snapshots' monitor command does not show snapshot information from all
available block devices.

Usage example:
$ qemu -hda disk1.qcow2 -hdb disk2.qcow2

(qemu) info snapshots
Snapshot devices: ide0-hd0
Snapshot list (from ide0-hd0):
ID   TAG          VM SIZE   DATE                  VM CLOCK
1                    1.5M   2010-05-26 21:51:02   00:00:03.263
2                    1.5M   2010-05-26 21:51:09   00:00:08.844
3                    1.5M   2010-05-26 21:51:24   00:00:23.274
4                    1.5M   2010-05-26 21:53:17   00:00:03.595

In the above case, disk2.qcow2 has snapshot information, but it is not being
shown. Only the first device is always shown.

This patch updates the do_info_snapshots() function to correctly show snapshot
information about all available block devices.

New output:
(qemu) info snapshots
Snapshot list from ide0-hd0 (VM state image):
ID   TAG          VM SIZE   DATE                  VM CLOCK
1                    1.5M   2010-05-26 21:51:02   00:00:03.263
2                    1.5M   2010-05-26 21:51:09   00:00:08.844
3                    1.5M   2010-05-26 21:51:24   00:00:23.274
4                    1.5M   2010-05-26 21:53:17   00:00:03.595

Snapshot list from ide0-hd1:
ID   TAG          VM SIZE   DATE                  VM CLOCK
1                       0   2010-05-26 21:51:02   00:00:03.263
2                       0   2010-05-26 21:51:09   00:00:08.844
3                       0   2010-05-26 21:51:24   00:00:23.274
4                       0   2010-05-26 21:53:17   00:00:03.595

changelog
-
v1 -> v2
- Added support to identify the device elected to save the VM's state.

Signed-off-by: Miguel Di Ciurcio Filho 
---
 savevm.c |   57 -
 1 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/savevm.c b/savevm.c
index 20354a8..5bc5fcd 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1858,37 +1858,44 @@ void do_delvm(Monitor *mon, const QDict *qdict)
 
 void do_info_snapshots(Monitor *mon)
 {
-BlockDriverState *bs, *bs1;
-QEMUSnapshotInfo *sn_tab, *sn;
+BlockDriverState *bs, *bs_vm_state;
+QEMUSnapshotInfo *sn_tab;
 int nb_sns, i;
 char buf[256];
 
-bs = get_bs_snapshots();
-if (!bs) {
+bs_vm_state = get_bs_snapshots();
+if (!bs_vm_state) {
 monitor_printf(mon, "No available block device supports snapshots\n");
 return;
 }
-monitor_printf(mon, "Snapshot devices:");
-bs1 = NULL;
-while ((bs1 = bdrv_next(bs1))) {
-if (bdrv_can_snapshot(bs1)) {
-if (bs == bs1)
-monitor_printf(mon, " %s", bdrv_get_device_name(bs1));
-}
-}
-monitor_printf(mon, "\n");
 
-nb_sns = bdrv_snapshot_list(bs, &sn_tab);
-if (nb_sns < 0) {
-monitor_printf(mon, "bdrv_snapshot_list: error %d\n", nb_sns);
-return;
-}
-monitor_printf(mon, "Snapshot list (from %s):\n",
-   bdrv_get_device_name(bs));
-monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), NULL));
-for(i = 0; i < nb_sns; i++) {
-sn = &sn_tab[i];
-monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), sn));
+bs = NULL;
+while ((bs = bdrv_next(bs))) {
+if (bdrv_can_snapshot(bs)) {
+monitor_printf(mon, "Snapshot list from %s",
+   bdrv_get_device_name(bs));
+
+if (bs == bs_vm_state) {
+monitor_printf(mon, " (VM state image):\n");
+} else {
+monitor_printf(mon, ":\n");
+}
+
+monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, sizeof(buf), 
NULL));
+
+nb_sns = bdrv_snapshot_list(bs, &sn_tab);
+if (nb_sns < 0) {
+monitor_printf(mon, "bdrv_snapshot_list: error %d\n", nb_sns);
+continue;
+}
+
+for (i = 0; i < nb_sns; i++) {
+monitor_printf(mon, "%s\n", bdrv_snapshot_dump(buf, 
sizeof(buf),
+&sn_tab[i]));
+}
+
+qemu_free(sn_tab);
+monitor_printf(mon, "\n");
+}
 }
-qemu_free(sn_tab);
 }
-- 
1.7.1




Re: [Qemu-devel] Re: RFC v2: blockdev_add & friends, brief rationale, QMP docs

2010-06-17 Thread Anthony Liguori

On 06/17/2010 03:20 AM, Kevin Wolf wrote:

On 16.06.2010 20:07, Anthony Liguori wrote:
   

   But it requires that
everything that -blockdev provides is accessible with -drive, too (or
that we're okay with users hating us).

   

I'm happy for -drive to die.  I think we should support -hda and
-blockdev.
 

-hda is not sufficient for most users. It doesn't provide any options.
It doesn't even support virtio. If -drive is going to die (and we all
seem to agree on that), then -blockdev needs to be usable for users
(and so far it's only you who disagrees).
   


I've always thought we should have a -vda argument and an -sda argument 
specifically for specifying virtio and scsi disks.



-blockdev should be optimized for config files, not single
argument input.  IOW:

[blockdev "blk2"]
   format = "raw"
   file = "/path/to/base.img"
   cache = "writeback"

[blockdev "blk1"]
format = "qcow2"
file = "/path/to/leaf.img"
cache="off"
backing_dev = "blk2"

[device "disk1"]
driver = "ide-drive"
blockdev = "blk1"
bus = "0"
unit = "0"
 

You don't specify the backing file of an image on the command line (or
in the configuration file).


But we really ought to allow it.  Backing files are implemented as part 
of the core block layer, not the actual block formats.  Today the block 
layer queries the block format for the name of the backing file but gets 
no additional options from the block format.  File isn't necessarily 
enough information to successfully open the backing device so why treat 
it specially?


I think we should keep the current ability to query the block format for 
a backing file name but we should also support hooking up the backing 
device without querying the block format at all.  It makes the model 
much more elegant IMHO because then we're just creating block devices 
and hooking them up.  All block devices are created equal more or less.



  It's saved as part of the image. It's more
like this (for a simple raw image file):

[blockdev-protocol "proto1"]
protocol = "file"
file = "/path/to/image.img"

[blockdev "blk1"]
format = "raw"
cache="off"
protocol = "proto1"

[device "disk1"]
driver = "ide-drive"
blockdev = "blk1"
bus = "0"
unit = "0"

(This would be Markus' option 3, I think)
   


I don't understand why we need two layers of abstraction here.  Why not 
just:


[blockdev "proto1"]
  protocol = "file"
  cache = "off"
  file = "/path/to/image.img"

Why does the cache option belong with raw and not with file, and why
can't we just use file directly?

As Christoph mentions, we really don't have stacked protocols and I'm
not sure they make sense.
 

Right, if we go for Christoph's suggestion, we don't need stacked
protocols. We'll have stacked formats instead. I'm not sure if you like
this any better. ;-)

We do have stacking today. -hda blkdebug:test.cfg:foo.qcow2 is qcow2 on
blkdebug on file. We need to be able to represent this.
   


I think we need to do stacking in a device specific way.  When you look 
at something like vmdk, it should actually support multiple leafs since 
the format does support such a thing.  So what I'd suggest is:


[blockdev "part1"]
  format = "raw"
  file = "image000.vmdk"

[blockdev "part2"]
  format = "raw"
  file = "image001.vmdk"

[blockdev "image"]
  format = "vmdk"
  section0 = "part1"
  section1 = "part2"

Note, we'll need to support this sort of model in order to support a 
disk that creates an automatic partition table (which would be a pretty 
useful feature).  For blkdebug, it would look like:


[blockdev "disk"]
  format = "qcow2"
  file = "foo.qcow2"

[blockdev "debug"]
  format = "blkdebug"
  blockdev = "disk"


I think raw doesn't make very much sense then.  What's the point of it
if it's just a thin wrapper around a protocol?
 

That it can be wrapped around any protocol. It's just about separating
code for handling the content of an image and code for accessing the image.

Ever tried something like "qemu-img create -f raw /dev/something 10G"?
You need the host_device protocol there, not the file protocol. When we
had raw == file this completely failed. And it's definitely reasonable
to expect that it works because the image format _is_ raw, it's just not
saved in a file.
   


No, I don't actually thing it's reasonable.  There's nothing meaningful 
that command can do.  Also, I've never understand creating qcow2 images 
on a physical device.  qcow2 needs to grow dynamically and physical 
devices can't.


I understand that we need to support the later use case but I don't 
think creating this layer of user-visible abstraction is the right thing 
to do.  This is an obscure use case and it shouldn't be the model that 
we force upon our users.



Or the famous qcow2 images on block devices. Why did qemu guess the
format correctly when qcow2 was saved in a file, but not on a host
device? This was just inconsistent.

I've had more than one bug report about th

[Qemu-devel] Re: [PATCH] pcnet: address TODOs

2010-06-17 Thread Jan Kiszka
Michael S. Tsirkin wrote:
> pcnet enables memory/io on init, which
> does not make sense as BAR values are wrong.
> Fix this, disabling BARs according to PCI spec.
> Address other minor TODOs.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> The following untested patch brings pcnet in
> compliance with the spec.
> Could someone who's interested in pcnet look
> at this patch please?
> 

At least our special guest still works with your patch applied.

Tested-by: Jan Kiszka 

Jan

> 
>  hw/pcnet.c |   17 ++---
>  1 files changed, 2 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/pcnet.c b/hw/pcnet.c
> index 5e63eb5..b52935a 100644
> --- a/hw/pcnet.c
> +++ b/hw/pcnet.c
> @@ -1981,26 +1981,14 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
>  
>  pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_AMD);
>  pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_AMD_LANCE);
> -/* TODO: value should be 0 at RST# */
> -pci_set_word(pci_conf + PCI_COMMAND,
> - PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
>  pci_set_word(pci_conf + PCI_STATUS,
>   PCI_STATUS_FAST_BACK | PCI_STATUS_DEVSEL_MEDIUM);
>  pci_conf[PCI_REVISION_ID] = 0x10;
> -/* TODO: 0 is the default anyway, no need to set it. */
> -pci_conf[PCI_CLASS_PROG] = 0x00;
>  pci_config_set_class(pci_conf, PCI_CLASS_NETWORK_ETHERNET);
> -pci_conf[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL; // header_type
> -
> -/* TODO: not necessary, is set when BAR is registered. */
> -pci_set_long(pci_conf + PCI_BASE_ADDRESS_0, PCI_BASE_ADDRESS_SPACE_IO);
> -pci_set_long(pci_conf + PCI_BASE_ADDRESS_0 + 4,
> - PCI_BASE_ADDRESS_SPACE_MEMORY);
>  
>  pci_set_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID, 0x0);
>  pci_set_word(pci_conf + PCI_SUBSYSTEM_ID, 0x0);
>  
> -/* TODO: value must be 0 at RST# */
>  pci_conf[PCI_INTERRUPT_PIN] = 1; // interrupt pin 0
>  pci_conf[PCI_MIN_GNT] = 0x06;
>  pci_conf[PCI_MAX_LAT] = 0xff;
> @@ -2009,11 +1997,10 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
>  s->mmio_index =
>cpu_register_io_memory(pcnet_mmio_read, pcnet_mmio_write, &d->state);
>  
> -/* TODO: use pci_dev, avoid cast below. */
> -pci_register_bar((PCIDevice *)d, 0, PCNET_IOPORT_SIZE,
> +pci_register_bar(pci_dev, 0, PCNET_IOPORT_SIZE,
> PCI_BASE_ADDRESS_SPACE_IO, pcnet_ioport_map);
>  
> -pci_register_bar((PCIDevice *)d, 1, PCNET_PNPMMIO_SIZE,
> +pci_register_bar(pci_dev, 1, PCNET_PNPMMIO_SIZE,
> PCI_BASE_ADDRESS_SPACE_MEMORY, pcnet_mmio_map);
>  
>  s->irq = pci_dev->irq[0];

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



[Qemu-devel] Re: [PATCH 3/4] qemu: kvm: Enable XSAVE live migration support

2010-06-17 Thread Jan Kiszka
Sheng Yang wrote:
> Signed-off-by: Sheng Yang 
> ---
>  kvm-all.c |   21 +++
>  kvm.h |2 +
>  target-i386/cpu.h |7 ++-
>  target-i386/kvm.c |  139 
> -
>  target-i386/machine.c |   20 +++
>  5 files changed, 186 insertions(+), 3 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 43704b8..343c06e 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -71,6 +71,7 @@ struct KVMState
>  #endif
>  int irqchip_in_kernel;
>  int pit_in_kernel;
> +int xsave, xcrs;
>  };
>  
>  static KVMState *kvm_state;
> @@ -685,6 +686,16 @@ int kvm_init(int smp_cpus)
>  s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
>  #endif
>  
> +s->xsave = 0;
> +#ifdef KVM_CAP_XSAVE
> +s->xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
> +#endif
> +
> +s->xcrs = 0;
> +#ifdef KVM_CAP_XCRS
> +s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
> +#endif
> +
>  ret = kvm_arch_init(s, smp_cpus);
>  if (ret < 0)
>  goto err;
> @@ -1013,6 +1024,16 @@ int kvm_has_debugregs(void)
>  return kvm_state->debugregs;
>  }
>  
> +int kvm_has_xsave(void)
> +{
> +return kvm_state->xsave;
> +}
> +
> +int kvm_has_xcrs(void)
> +{
> +return kvm_state->xcrs;
> +}
> +
>  void kvm_setup_guest_memory(void *start, size_t size)
>  {
>  if (!kvm_has_sync_mmu()) {
> diff --git a/kvm.h b/kvm.h
> index 7975e87..50c4192 100644
> --- a/kvm.h
> +++ b/kvm.h
> @@ -41,6 +41,8 @@ int kvm_has_sync_mmu(void);
>  int kvm_has_vcpu_events(void);
>  int kvm_has_robust_singlestep(void);
>  int kvm_has_debugregs(void);
> +int kvm_has_xsave(void);
> +int kvm_has_xcrs(void);
>  
>  #ifdef NEED_CPU_H
>  int kvm_init_vcpu(CPUState *env);
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 548ab80..680eed1 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -718,6 +718,11 @@ typedef struct CPUX86State {
>  uint16_t fpus_vmstate;
>  uint16_t fptag_vmstate;
>  uint16_t fpregs_format_vmstate;
> +
> +uint64_t xstate_bv;
> +XMMReg ymmh_regs[CPU_NB_REGS];
> +
> +uint64_t xcr0;
>  } CPUX86State;
>  
>  CPUX86State *cpu_x86_init(const char *cpu_model);
> @@ -895,7 +900,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
>  #define cpu_list_id x86_cpu_list
>  #define cpudef_setup x86_cpudef_setup
>  
> -#define CPU_SAVE_VERSION 11
> +#define CPU_SAVE_VERSION 12
>  
>  /* MMU modes definitions */
>  #define MMU_MODE0_SUFFIX _kernel
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index bb6a12f..db1f21d 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -289,6 +289,8 @@ void kvm_arch_reset_vcpu(CPUState *env)
>  } else {
>  env->mp_state = KVM_MP_STATE_RUNNABLE;
>  }
> +/* Legal xcr0 for loading */
> +env->xcr0 = 1;
>  }
>  
>  static int kvm_has_msr_star(CPUState *env)
> @@ -504,6 +506,68 @@ static int kvm_put_fpu(CPUState *env)
>  return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
>  }
>  
> +#ifdef KVM_CAP_XSAVE
> +#define XSAVE_CWD_RIP 2
> +#define XSAVE_CWD_RDP 4
> +#define XSAVE_MXCSR   6
> +#define XSAVE_ST_SPACE8
> +#define XSAVE_XMM_SPACE   40
> +#define XSAVE_XSTATE_BV   128
> +#define XSAVE_YMMH_SPACE  144
> +#endif
> +
> +static int kvm_put_xsave(CPUState *env)
> +{
> +#ifdef KVM_CAP_XSAVE
> +int i;
> +struct kvm_xsave* xsave;
> +uint16_t cwd, swd, twd, fop;
> +
> +if (!kvm_has_xsave())
> +return kvm_put_fpu(env);
> +
> +xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
> +memset(xsave, 0, sizeof(struct kvm_xsave));
> +cwd = swd = twd = fop = 0;
> +swd = env->fpus & ~(7 << 11);
> +swd |= (env->fpstt & 7) << 11;
> +cwd = env->fpuc;
> +for (i = 0; i < 8; ++i)
> +twd |= (!env->fptags[i]) << i;
> +xsave->region[0] = (uint32_t)(swd << 16) + cwd;
> +xsave->region[1] = (uint32_t)(fop << 16) + twd;
> +memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
> +sizeof env->fpregs);
> +memcpy(&xsave->region[XSAVE_XMM_SPACE], env->xmm_regs,
> +sizeof env->xmm_regs);
> +xsave->region[XSAVE_MXCSR] = env->mxcsr;
> +*(uint64_t *)&xsave->region[XSAVE_XSTATE_BV] = env->xstate_bv;
> +memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
> +sizeof env->ymmh_regs);
> +return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
> +#else
> +return kvm_put_fpu(env);
> +#endif
> +}
> +
> +static int kvm_put_xcrs(CPUState *env)
> +{
> +#ifdef KVM_CAP_XCRS
> +struct kvm_xcrs xcrs;
> +
> +if (!kvm_has_xcrs())
> +return 0;
> +
> +xcrs.nr_xcrs = 1;
> +xcrs.flags = 0;
> +xcrs.xcrs[0].xcr = 0;
> +xcrs.xcrs[0].value = env->xcr0;
> +return kvm_vcpu_ioctl(env, KVM_SET_XCRS, &xcrs);
> +#else
> +return 0;
> +#endif
> +}
> +
>  static int kvm_put_sregs(CPUState *env)
>  {
>  struct kvm_sregs sregs;
> @@ -621,6 +685,69 @@ static int kvm_get_fpu(CPUState *env)
>  return 0;
>  }
>  
> +static i

[Qemu-devel] Re: [PATCH] qemu-kvm: Replace kvm_set/get_fpu() with upstream version.

2010-06-17 Thread Jan Kiszka
Sheng Yang wrote:
> Signed-off-by: Sheng Yang 
> ---
> 
> Would send out XSAVE patch after the upstream ones have been merged, since the
> patch would be affected by the merge.
> 
>  qemu-kvm-x86.c|   23 ++-
>  qemu-kvm.c|   10 --
>  qemu-kvm.h|   30 --
>  target-i386/kvm.c |5 -
>  4 files changed, 6 insertions(+), 62 deletions(-)
> 
> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> index 3c33e64..49218ae 100644
> --- a/qemu-kvm-x86.c
> +++ b/qemu-kvm-x86.c
> @@ -775,7 +775,6 @@ static void get_seg(SegmentCache *lhs, const struct 
> kvm_segment *rhs)
>  void kvm_arch_load_regs(CPUState *env, int level)
>  {
>  struct kvm_regs regs;
> -struct kvm_fpu fpu;
>  struct kvm_sregs sregs;
>  struct kvm_msr_entry msrs[100];
>  int rc, n, i;
> @@ -806,16 +805,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
>  
>  kvm_set_regs(env, ®s);
>  
> -memset(&fpu, 0, sizeof fpu);
> -fpu.fsw = env->fpus & ~(7 << 11);
> -fpu.fsw |= (env->fpstt & 7) << 11;
> -fpu.fcw = env->fpuc;
> -for (i = 0; i < 8; ++i)
> - fpu.ftwx |= (!env->fptags[i]) << i;
> -memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
> -memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
> -fpu.mxcsr = env->mxcsr;
> -kvm_set_fpu(env, &fpu);
> +kvm_put_fpu(env);
>  
>  memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
>  if (env->interrupt_injected >= 0) {
> @@ -933,7 +923,6 @@ void kvm_arch_load_regs(CPUState *env, int level)
>  void kvm_arch_save_regs(CPUState *env)
>  {
>  struct kvm_regs regs;
> -struct kvm_fpu fpu;
>  struct kvm_sregs sregs;
>  struct kvm_msr_entry msrs[100];
>  uint32_t hflags;
> @@ -965,15 +954,7 @@ void kvm_arch_save_regs(CPUState *env)
>  env->eflags = regs.rflags;
>  env->eip = regs.rip;
>  
> -kvm_get_fpu(env, &fpu);
> -env->fpstt = (fpu.fsw >> 11) & 7;
> -env->fpus = fpu.fsw;
> -env->fpuc = fpu.fcw;
> -for (i = 0; i < 8; ++i)
> - env->fptags[i] = !((fpu.ftwx >> i) & 1);
> -memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
> -memcpy(env->xmm_regs, fpu.xmm, sizeof env->xmm_regs);
> -env->mxcsr = fpu.mxcsr;
> +kvm_get_fpu(env);
>  
>  kvm_get_sregs(env, &sregs);
>  
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index 96d458c..114cb5e 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -461,16 +461,6 @@ int kvm_set_regs(CPUState *env, struct kvm_regs *regs)
>  return kvm_vcpu_ioctl(env, KVM_SET_REGS, regs);
>  }
>  
> -int kvm_get_fpu(CPUState *env, struct kvm_fpu *fpu)
> -{
> -return kvm_vcpu_ioctl(env, KVM_GET_FPU, fpu);
> -}
> -
> -int kvm_set_fpu(CPUState *env, struct kvm_fpu *fpu)
> -{
> -return kvm_vcpu_ioctl(env, KVM_SET_FPU, fpu);
> -}
> -
>  int kvm_get_sregs(CPUState *env, struct kvm_sregs *sregs)
>  {
>  return kvm_vcpu_ioctl(env, KVM_GET_SREGS, sregs);
> diff --git a/qemu-kvm.h b/qemu-kvm.h
> index 6f6c6d8..ebe7893 100644
> --- a/qemu-kvm.h
> +++ b/qemu-kvm.h
> @@ -222,36 +222,6 @@ int kvm_get_regs(CPUState *env, struct kvm_regs *regs);
>   * \return 0 on success
>   */
>  int kvm_set_regs(CPUState *env, struct kvm_regs *regs);
> -/*!
> - * \brief Read VCPU fpu registers
> - *
> - * This gets the FPU registers from the VCPU and outputs them
> - * into a kvm_fpu structure
> - *
> - * \note This function returns a \b copy of the VCPUs registers.\n
> - * If you wish to modify the VCPU FPU registers, you should call 
> kvm_set_fpu()
> - *
> - * \param kvm Pointer to the current kvm_context
> - * \param vcpu Which virtual CPU should get dumped
> - * \param fpu Pointer to a kvm_fpu which will be populated with the VCPUs
> - * fpu registers values
> - * \return 0 on success
> - */
> -int kvm_get_fpu(CPUState *env, struct kvm_fpu *fpu);
> -
> -/*!
> - * \brief Write VCPU fpu registers
> - *
> - * This sets the FPU registers on the VCPU from a kvm_fpu structure
> - *
> - * \note When this function returns, the fpu pointer and the data it points 
> to
> - * can be discarded
> - * \param kvm Pointer to the current kvm_context
> - * \param vcpu Which virtual CPU should get dumped
> - * \param fpu Pointer to a kvm_fpu which holds the new vcpu fpu state
> - * \return 0 on success
> - */
> -int kvm_set_fpu(CPUState *env, struct kvm_fpu *fpu);
>  
>  /*!
>   * \brief Read VCPU system registers
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 9cb9cf4..9c13f62 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -488,6 +488,7 @@ static int kvm_getput_regs(CPUState *env, int set)
>  
>  return ret;
>  }
> +#endif /* KVM_UPSTREAM */
>  
>  static int kvm_put_fpu(CPUState *env)
>  {
> @@ -507,6 +508,7 @@ static int kvm_put_fpu(CPUState *env)
>  return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
>  }
>  
> +#ifdef KVM_UPSTREAM
>  static int kvm_put_sregs(CPUState *env)
>  {
>  struct kvm_sregs sregs;
> @@ -605,7 +607,7 @@ static int kvm_put_msrs(CPUState *env

Re: [Qemu-devel] Q35 qemu repository?

2010-06-17 Thread Matthew Garrett
On Thu, Jun 17, 2010 at 04:48:09PM +0900, Isaku Yamahata wrote:
> Thanks for the patch.
> Does vista boot with the patch eventually?

Vista boots, but is unable to allocate resources for the pcie root 
ports. I'm looking into that.

-- 
Matthew Garrett | mj...@srcf.ucam.org



Re: [Qemu-devel] Re: RFC v2: blockdev_add & friends, brief rationale, QMP docs

2010-06-17 Thread Kevin Wolf
Am 17.06.2010 15:01, schrieb Anthony Liguori:
> On 06/17/2010 03:20 AM, Kevin Wolf wrote:
>> Am 16.06.2010 20:07, schrieb Anthony Liguori:
>>
>>>> But it requires that
>>>> everything that -blockdev provides is accessible with -drive, too (or
>>>> that we're okay with users hating us).

>>> I'm happy for -drive to die.  I think we should support -hda and
>>> -blockdev.
>>>  
>> -hda is not sufficient for most users. It doesn't provide any options.
>> It doesn't even support virtio. If -drive is going to die (and we seem
>> to agree all on that), then -blockdev needs to be usable for users (and
>> it's only you who contradicts so far).
>>
> 
> I've always thought we should have a -vda argument and an -sda argument 
> specifically for specifying virtio and scsi disks.

It would at least fix the most obvious problem. However, it still
doesn't allow passing options.

>>> -blockdev should be optimized for config files, not single
>>> argument input.  IOW:
>>>
>>> [blockdev "blk2"]
>>>format = "raw"
>>>file = "/path/to/base.img"
>>>cache = "writeback"
>>>
>>> [blockdev "blk1"]
>>> format = "qcow2"
>>> file = "/path/to/leaf.img"
>>> cache="off"
>>> backing_dev = "blk2"
>>>
>>> [device "disk1"]
>>> driver = "ide-drive"
>>> blockdev = "blk1"
>>> bus = "0"
>>> unit = "0"
>>>  
>> You don't specify the backing file of an image on the command line (or
>> in the configuration file).
> 
> But we really ought to allow it.  Backing files are implemented as part 
> of the core block layer, not the actual block formats.  

The generic block layer knows the name of the backing file, so it can be
displayed in tools, but that's about it. Calling this the
"implementation" of backing files is daring.

I see no use case for specifying it on the command line. The only thing
you can achieve better with it is corrupting your image because you
specify the wrong/no backing file next time.

> Today the block 
> layer queries the block format for the name of the backing file but gets 
> no additional options from the block format.  File isn't necessarily 
> enough information to successfully open the backing device so why treat 
> it specially?
> 
> I think we should keep the current ability to query the block format for 
> a backing file name but we should also support hooking up the backing 
> device without querying the block format at all.  It makes the model 
> much more elegant IMHO because then we're just creating block devices 
> and hooking them up.  All block devices are created equal more or less.
> 
>>   It's saved as part of the image. It's more
>> like this (for a simple raw image file):
>>
>> [blockdev-protocol "proto1"]
>> protocol = "file"
>> file = "/path/to/image.img"
>>
>> [blockdev "blk1"]
>> format = "raw"
>> cache="off"
>> protocol = "proto1"
>>
>> [device "disk1"]
>> driver = "ide-drive"
>> blockdev = "blk1"
>> bus = "0"
>> unit = "0"
>>
>> (This would be Markus' option 3, I think)
>>
> 
> I don't understand why we need two layers of abstraction here.  Why not 
> just:
> 
> [blockdev "proto1"]
>protocol = "file"
>cache = "off"
>file = "/path/to/image.img"
> 
> Why does the cache option belong with raw and not with file and why 
> can't we just use file directly?

The cache option is shared along the chain, so it probably fits best in
the blockdev.

And we don't use file directly because it's wrong. Users say that their
image is in raw format, and they don't get why they should have to make
a difference between a raw image stored on a block device and one stored
in a file.

> As Christoph mentions, we really don't 
> have stacked protocols and I'm

The only question is if we call them stacked formats or stacked
protocols. One of them exists.

>>> not sure they make sense.
>>>  
>> Right, if we go for Christoph's suggestion, we don't need stacked
>> protocols. We'll have stacked formats instead. I'm not sure if you like
>> this any better. ;-)
>>
>> We do have stacking today. -hda blkdebug:test.cfg:foo.qcow2 is qcow2 on
>> blkdebug on file. We need to be able to represent this.
>>
> 
> I think we need to do stacking in a device specific way.  When you look 
> at something like vmdk, it should actually support multiple leafs since 
> the format does support such a thing.  So what I'd suggest is:
> 
> [blockdev "part1"]
>format = "raw"
>file = "image000.vmdk"
> 
> [blockdev "part2"]
>format = "raw"
>file = "image001.vmdk"
> 
> [blockdev "image"]
>format = "vmdk"
>section0 = "part1"
>section1 = "part2"

Actually, I'd prefer to read that information from the VMDK file instead
of requiring the user to configure this manually...

> Note, we'll need to support this sort of model in order to support a 
> disk that creates an automatic partition table (which would be a pretty 
> useful feature). 

Sounds like a good example of a useful protocol.

Markus, I'm 

[Qemu-devel] Re: [RFC][PATCH 2/2] qcow2: Use bdrv_(p)write_sync for metadata writes

2010-06-17 Thread Stefan Hajnoczi
On Thu, Jun 17, 2010 at 1:03 PM, Kevin Wolf  wrote:
> Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Any performance numbers?  This change is necessary for correctness but
I wonder what the performance impact is for users.

Stefan



[Qemu-devel] Re: [PATCH v3 0/5] Add QMP migration events

2010-06-17 Thread Luiz Capitulino
On Wed, 16 Jun 2010 21:10:04 +0200
Juan Quintela  wrote:

> Luiz Capitulino  wrote:
> > On Tue, 15 Jun 2010 17:24:59 +0200
> > Juan Quintela  wrote:
> 
> >> >
> >> >  I still don't see the need for MIGRATION_STARTED, it could be useful in
> >> > the target but I'd like to understand the use case in more detail.
> >> 
> >> At this point, if you are doing migration with tcp, and you are putting
> >> the wrong port on source (no path or any other error), you get no info
> >> at all of what is happening.
> >
> >  Shouldn't the migrate command just the return the expected error?
> 
> No.  Think you are "having troubles".  You try to find what happens.
> launch things by hand.  And there is no way to know if anybody has
> conected to the destination machine.  Some notification that migration
> has started is _very_ useful.  expecially when there are
> networks/firewalls/... in the middle.

 [...]

> That is it.  But you continue telling that going to the old house and
> doing a info migrate is a good interface.

 I'm sorry? When did I ever claimed such a thing?

 First point: all you describe is MIGRATION_CONNECTED, at the end of the day
it would do exactly what you want for MIGRATION_STARTED.

 The second, and most important point, is that we're trying not to make
things worse. Adding a number of events to circumvent a bad designed
command and having the wrong expectations (ie. help developer debugging)
is a clear recipe for disaster.

 Anyway, I think it doesn't matter anymore, as QMP is not going to be declared
stable for 0.13. In this case we'll have enough time to design the proper
interface.

> To add insult to injury, the problem is that libvirt people are not
> collaborative, and expect things that can't be done, are uncooperative,

 Again, I've never claimed that and I think you're taking this thread to
the wrong direction.

> 
> 
> Libvirt folks "also" do lots of things wrong, they are not perfect.  But
> it in this case, who is being completely unreasonable is qemu land.
> 
> Later, Juan.
> 




[Qemu-devel] Re: [PATCH 5/5] linux fbdev display driver.

2010-06-17 Thread Julian Pidancet
On 06/17/2010 11:43 AM, Gerd Hoffmann wrote:
>Hi,
> 
> You register the display allocator, but don't unregister in 
> fbdev_display_uninit().
> 
> You are just lucky that fbdev_cleanup() forgets to unmap the framebuffer.
> 
> Apply the attached fix, start qemu with vnc, then do "change fbdev on" 
> and "change fbdev off" in the monitor and watch qemu segfault.
> 
> Also after "change fbdev on" the guest screen isn't rendered correctly.
> 
> cheers,
>Gerd
> 

Hi,

Thanks for spotting these errors. Here is a respin of my patch to address your 
concerns.
(The munmap call is included).

Cheers,

Julian

diff --git a/console.c b/console.c
index 698bc10..12ce215 100644
--- a/console.c
+++ b/console.c
@@ -1376,6 +1376,16 @@ DisplayAllocator *register_displayallocator(DisplayState 
*ds, DisplayAllocator *
 return ds->allocator;
 }
 
+void unregister_displayallocator(DisplayState *ds)
+{
+if (ds->allocator != &default_allocator) {
+ds->allocator->free_displaysurface(ds->surface);
+ds->surface = defaultallocator_create_displaysurface(ds_get_width(ds),
+ 
ds_get_height(ds));
+ds->allocator = &default_allocator;
+}
+}
+
 DisplayState *graphic_console_init(vga_hw_update_ptr update,
vga_hw_invalidate_ptr invalidate,
vga_hw_screen_dump_ptr screen_dump,
diff --git a/console.h b/console.h
index 124a22b..40bd927 100644
--- a/console.h
+++ b/console.h
@@ -192,6 +192,7 @@ PixelFormat qemu_different_endianness_pixelformat(int bpp);
 PixelFormat qemu_default_pixelformat(int bpp);
 
 DisplayAllocator *register_displayallocator(DisplayState *ds, DisplayAllocator 
*da);
+void unregister_displayallocator(DisplayState *ds);
 
 static inline DisplaySurface* qemu_create_displaysurface(DisplayState *ds, int 
width, int height)
 {
@@ -371,7 +372,7 @@ void sdl_display_init(DisplayState *ds, int full_screen, 
int no_frame);
 
 /* fbdev.c */
 void fbdev_display_init(DisplayState *ds, const char *device);
-void fbdev_display_uninit(void);
+void fbdev_display_uninit(DisplayState *ds);
 
 /* cocoa.m */
 void cocoa_display_init(DisplayState *ds, int full_screen);
diff --git a/fbdev.c b/fbdev.c
index 54f2381..8ea1838 100644
--- a/fbdev.c
+++ b/fbdev.c
@@ -67,13 +67,13 @@ static int fb_switch_state = FB_ACTIVE;
 
 /* qdev windup */
 static DisplayChangeListener  *dcl;
+static DisplayAllocator   *da;
 static QemuPfConv *conv;
 static PixelFormatfbpf;
-static intresize_screen;
-static intredraw_screen;
 static intcx, cy, cw, ch;
 static intdebug = 0;
 static Notifier   exit_notifier;
+uint8_t   *guest_surface;
 
 /* fwd decls */
 static int fbdev_activate_vt(int tty, int vtno, bool wait);
@@ -519,6 +519,10 @@ static void fbdev_cleanup(void)
 fprintf(stderr, "%s\n", __FUNCTION__);
 
 /* restore console */
+if (fb_mem != NULL) {
+munmap(fb_mem, fb_fix.smem_len + fb_mem_offset);
+fb_mem = NULL;
+}
 if (fb != -1) {
 if (ioctl(fb,FBIOPUT_VSCREENINFO, &fb_ovar) < 0)
 perror("ioctl FBIOPUT_VSCREENINFO");
@@ -786,10 +790,10 @@ static void fbdev_render(DisplayState *ds, int x, int y, 
int w, int h)
 uint8_t *src;
 int line;
 
-if (!conv)
+if (!conv || !guest_surface)
 return;
 
-src = ds_get_data(ds) + y * ds_get_linesize(ds)
+src = guest_surface + y * ds_get_linesize(ds)
 + x * ds_get_bytes_per_pixel(ds);
 dst = fb_mem + y * fb_fix.line_length
 + x * fbpf.bytes_per_pixel;
@@ -819,46 +823,50 @@ static void fbdev_update(DisplayState *ds, int x, int y, 
int w, int h)
 if (fb_switch_state != FB_ACTIVE)
 return;
 
-if (resize_screen) {
-if (debug)
-fprintf(stderr, "%s: handle resize\n", __FUNCTION__);
-resize_screen = 0;
-cx = 0; cy = 0;
-cw = ds_get_width(ds);
-ch = ds_get_height(ds);
-if (ds_get_width(ds) < fb_var.xres) {
-cx = (fb_var.xres - ds_get_width(ds)) / 2;
-}
-if (ds_get_height(ds) < fb_var.yres) {
-cy = (fb_var.yres - ds_get_height(ds)) / 2;
-}
+if (guest_surface != NULL) {
+fbdev_render(ds, x, y, w, h);
+}
+}
 
-if (conv) {
-qemu_pf_conv_put(conv);
-}
-conv = qemu_pf_conv_get(&fbpf, &ds->surface->pf);
-if (conv == NULL) {
-fprintf(stderr, "fbdev: unsupported PixelFormat conversion\n");
-}
+static void fbdev_setdata(DisplayState *ds)
+{
+if (conv) {
+qemu_pf_conv_put(conv);
 }
 
-if (redraw_screen) {
-if (debug)
-fprintf(stderr, "%s: handle redraw\n", __FUNCTION__);
-redraw_screen = 0;
-fbdev_cls();
-x = 0; y = 0; w = ds_get_wi

[Qemu-devel] Re: [PATCH v2] monitor: Really show snapshot information about all devices

2010-06-17 Thread Luiz Capitulino
On Thu, 17 Jun 2010 09:58:37 -0300
Miguel Di Ciurcio Filho  wrote:

> The 'info snapshots' monitor command does not show snapshot information from 
> all
> available block devices.
> 
> Usage example:
> $ qemu -hda disk1.qcow2 -hdb disk2.qcow2
> 
> (qemu) info snapshots
> Snapshot devices: ide0-hd0
> Snapshot list (from ide0-hd0):
> IDTAG VM SIZEDATE   VM CLOCK
> 11.5M 2010-05-26 21:51:02   00:00:03.263
> 21.5M 2010-05-26 21:51:09   00:00:08.844
> 31.5M 2010-05-26 21:51:24   00:00:23.274
> 41.5M 2010-05-26 21:53:17   00:00:03.595
> 
> In the above case, disk2.qcow2 has snapshot information, but it is not being
> shown. Only the first device is always shown.
> 
> This patch updates the do_info_snapshots() function to correctly show snapshot
> information about all available block devices.
> 
> New output:
> (qemu) info snapshots
> Snapshot list from ide0-hd0 (VM state image):
> IDTAG VM SIZEDATE   VM CLOCK
> 11.5M 2010-05-26 21:51:02   00:00:03.263
> 21.5M 2010-05-26 21:51:09   00:00:08.844
> 31.5M 2010-05-26 21:51:24   00:00:23.274
> 41.5M 2010-05-26 21:53:17   00:00:03.595
> 
> Snapshot list from ide0-hd1:
> IDTAG VM SIZEDATE   VM CLOCK
> 1   0 2010-05-26 21:51:02   00:00:03.263
> 2   0 2010-05-26 21:51:09   00:00:08.844
> 3   0 2010-05-26 21:51:24   00:00:23.274
> 4   0 2010-05-26 21:53:17   00:00:03.595

 I agree we need this info somewhere, but I'm wondering if this output won't
get users confused.

 Perhaps it would be perfect to have 'info snapshots -a', but the user Monitor
doesn't support passing options to info commands.

 Suggestions?



[Qemu-devel] Re: [RFC][PATCH 2/2] qcow2: Use bdrv_(p)write_sync for metadata writes

2010-06-17 Thread Kevin Wolf
Am 17.06.2010 16:19, schrieb Stefan Hajnoczi:
> On Thu, Jun 17, 2010 at 1:03 PM, Kevin Wolf  wrote:
>> Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.
> 
> Any performance numbers?  This change is necessary for correctness but
> I wonder what the performance impact is for users.

No numbers yet, but as you say we need to do it anyway. It should
definitely be better than any other option that I can think of
(cache=writethrough or some O_DIRECT|O_DSYNC mode) in that it only hurts
performance when metadata is actually changed. As long as we only write
guest data, there is no difference.
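
(For readers who have not looked at the helper: conceptually bdrv_pwrite_sync()
is just the ordinary write followed by a flush, roughly as in the sketch below
-- see the actual patch for the real block.c code.)

/* Rough sketch only: flush after a successful metadata write, so plain
 * guest-data writes that go through bdrv_pwrite() are unaffected. */
static int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
                            const void *buf, int count)
{
    int ret = bdrv_pwrite(bs, offset, buf, count);
    if (ret >= 0) {
        bdrv_flush(bs);
    }
    return ret;
}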

Making it a barrier instead of a flush would probably be better, have
you already had a look at this since we talked about it?

Kevin



[Qemu-devel] Re: [CFR 0/10] QMP specification review

2010-06-17 Thread Luiz Capitulino
On Tue, 15 Jun 2010 11:30:20 -0500
Anthony Liguori  wrote:

> This is the first set of commands as part of the QMP specification review.
> Please comment on the individual commands specifications and Stefan and I will
> try to fold the comments back into the command documentation.

 Very nice!

 A few comments regarding the process in general:

  1. How are the issues going to be addressed? I mean, are you or Stefan going
 to send fixes or should we create a TODO page in the wiki so that we can
 work on the feedback later?

  2. I think we should slow down, so that we give more time to reviewers and
 there's no reason to hurry IMHO, as we won't go stable in 0.13

  3. Avi and Daniel, please join the effort



[Qemu-devel] Re: [PATCH 2/3] Monitor command 'info trace'

2010-06-17 Thread Stefan Hajnoczi
On Wed, Jun 16, 2010 at 06:12:06PM +0530, Prerna Saxena wrote:
> diff --git a/simpletrace.c b/simpletrace.c
> index 2fec4d3..239ae3f 100644
> --- a/simpletrace.c
> +++ b/simpletrace.c
> @@ -62,3 +62,16 @@ void trace4(TraceEvent event, unsigned long x1, unsigned 
> long x2, unsigned long
>  void trace5(TraceEvent event, unsigned long x1, unsigned long x2, unsigned 
> long x3, unsigned long x4, unsigned long x5) {
>  trace(event, x1, x2, x3, x4, x5);
>  }
> +
> +void do_info_trace(Monitor *mon)
> +{
> +unsigned int i, max_idx;
> +
> +max_idx = trace_idx ? trace_idx : TRACE_BUF_LEN;

trace_idx is always in the range [0, TRACE_BUF_LEN).  There is no need
to perform this test.

> +
> +for (i=0; i<max_idx; i++) {
> +monitor_printf(mon, "Event %ld : %ld %ld %ld %ld %ld\n",
> +  trace_buf[i].event, trace_buf[i].x1, trace_buf[i].x2,
> +trace_buf[i].x3, trace_buf[i].x4, trace_buf[i].x5);

Getting only numeric output is the limitation of a binary trace.  It
would probably be possible to pretty-print without much additional code
by using the format strings from the trace-events file.

I think the numeric dump is good for now though.  Hex is more compact
than decimal and would make pointers easier to spot.  Want to change
this?

> +}
> +}
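
(Something along these lines is presumably what is meant -- a sketch against
the patch above, reusing its trace_buf, TRACE_BUF_LEN and monitor_printf names,
and dumping the whole ring for brevity:)

/* Same dump, but in hex: output is shorter and pointer values stand out. */
static void do_info_trace(Monitor *mon)
{
    unsigned int i;

    for (i = 0; i < TRACE_BUF_LEN; i++) {
        monitor_printf(mon, "Event 0x%lx : 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx\n",
                       trace_buf[i].event, trace_buf[i].x1, trace_buf[i].x2,
                       trace_buf[i].x3, trace_buf[i].x4, trace_buf[i].x5);
    }
}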
> diff --git a/tracetool b/tracetool
> index 9ea9c08..2c73bab 100755
> --- a/tracetool
> +++ b/tracetool
> @@ -130,6 +130,7 @@ void trace2(TraceEvent event, unsigned long x1, unsigned 
> long x2);
>  void trace3(TraceEvent event, unsigned long x1, unsigned long x2, unsigned 
> long x3);
>  void trace4(TraceEvent event, unsigned long x1, unsigned long x2, unsigned 
> long x3, unsigned long x4);
>  void trace5(TraceEvent event, unsigned long x1, unsigned long x2, unsigned 
> long x3, unsigned long x4, unsigned long x5);
> +void do_info_trace(Monitor *mon);
>  EOF
> 
>  simple_event_num=0
> @@ -289,6 +290,7 @@ tracetoh()
>  #define TRACE_H
> 
>  #include "qemu-common.h"
> +#include "monitor.h"

qemu-common.h forward-declares Monitor, I don't think you need
monitor.h.

Stefan



RE: [Qemu-devel] VLIW?

2010-06-17 Thread Gibbons, Scott
Yes, as a guest.

Thanks for the helpful suggestions.  We have a closed pipeline and code errors 
are caught by the assembler.  Delaying writeback is most likely what I'll be 
doing.

Another question I have is how to handle this multithreaded architecture.  This 
seems to be extraordinarily difficult as a dynamic translation problem and I'll 
probably defer it to later.  But, if anyone has any suggestions, I'd be glad to 
hear them.

Thanks,
--Scott

---
Qualcomm Inc. / Hexagon Tools
Austin, TX




-Original Message-
From: Richard Henderson [mailto:rth7...@gmail.com] On Behalf Of Richard 
Henderson
Sent: Wednesday, June 16, 2010 12:41 PM
To: Gibbons, Scott
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] VLIW?

On 06/15/2010 08:53 AM, Gibbons, Scott wrote:
> Has anyone done a port of QEMU to a VLIW architecture?  I'm interested
> in seeing what was done.

Do you mean as guest or host?  I presume guest.

There's not such a port in the main repository; I don't know
what might have been done privately.

It'll be a more difficult job if you have an open pipeline, but
even then I should think it could be done.  It really depends on
the exact specification of your cpu.

For instance, with a closed pipeline, I think all you would need
to track during translation are the output temporaries.  You would
translate each member instruction sequentially, but delay writeback
to the architectural register until the end of the vliw packet.
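
Purely to illustrate the delayed-writeback idea in TCG terms (all of MAX_SLOTS,
pkt, translate_slot() and cpu_regs[] below are invented names, not taken from
any real target):

/* Translate one VLIW packet: each slot computes into a TCG temporary, so
 * every slot still reads the pre-packet register values; the architectural
 * registers are written back only once the whole packet has been handled. */
TCGv tmp[MAX_SLOTS];
int i;

for (i = 0; i < pkt->num_slots; i++) {
    tmp[i] = tcg_temp_new();
    translate_slot(ctx, &pkt->slot[i], tmp[i]);   /* no writeback yet */
}
for (i = 0; i < pkt->num_slots; i++) {
    tcg_gen_mov_tl(cpu_regs[pkt->slot[i].rd], tmp[i]);  /* commit at packet end */
    tcg_temp_free(tmp[i]);
}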

With an open pipeline, I imagine that you would model each exposed
architectural feature.  For instance, if a load insn places its
result onto a bus in the cycle following the issue of the load,
then you could model the bus with a TCG register and have the
translator be responsible for issuing moves between the TCG 
registers during appropriate cycles.

I imagine the difficulty increases (but not intractably) if you
want the translator to catch and signal user coding errors in the
vliw assembly.  Though usually that's a job that can be performed
statically by the assembler...


r~



[Qemu-devel] [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Alex Williamson
The comment suggests we're checking for the driver in the ready
state and bus master disabled, but the code is checking that it's
not in the ready state.

Signed-off-by: Alex Williamson 
Found-by: Amit Shah 
---

 hw/virtio-pci.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index e101fa0..7a86a81 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -155,7 +155,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile 
*f)
 
 /* Try to find out if the guest has bus master disabled, but is
in ready state. Then we have a buggy guest OS. */
-if (!(proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
 proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
 }




[Qemu-devel] Re: [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Michael S. Tsirkin
On Thu, Jun 17, 2010 at 09:15:02AM -0600, Alex Williamson wrote:
> The comment suggests we're checking for the driver in the ready
> state and bus master disabled, but the code is checking that it's
> not in the ready state.
> 
> Signed-off-by: Alex Williamson 
> Found-by: Amit Shah 

Acked-by: Michael S. Tsirkin 

> ---
> 
>  hw/virtio-pci.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index e101fa0..7a86a81 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -155,7 +155,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile 
> *f)
>  
>  /* Try to find out if the guest has bus master disabled, but is
> in ready state. Then we have a buggy guest OS. */
> -if (!(proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>  !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
>  proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
>  }



[Qemu-devel] Re: [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Amit Shah
On (Thu) Jun 17 2010 [09:15:02], Alex Williamson wrote:
> The comment suggests we're checking for the driver in the ready
> state and bus master disabled, but the code is checking that it's
> not in the ready state.
> 
> Signed-off-by: Alex Williamson 
> Found-by: Amit Shah 
> ---

Acked-by: Amit Shah 


Amit



Re: [Qemu-devel] VLIW?

2010-06-17 Thread Richard Henderson
On 06/17/2010 08:12 AM, Gibbons, Scott wrote:
> Another question I have is how to handle this multithreaded
> architecture.  This seems to be extraordinarily difficult as a
> dynamic translation problem and I'll probably defer it to later.
> But, if anyone has any suggestions, I'd be glad to hear them.

How is your threading different from other SMP systems?

In system mode, QEMU TCG is single-threaded and models SMP via
cooperative switching in between TCG translation blocks.  It's
not ideal, but it does solve quite a number of problems and is
at least functional.


r~



[Qemu-devel] Re: [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Alexander Graf
Alex Williamson wrote:
> The comment suggests we're checking for the driver in the ready
> state and bus master disabled, but the code is checking that it's
> not in the ready state.
>
> Signed-off-by: Alex Williamson 
> Found-by: Amit Shah 
> ---
>
>  hw/virtio-pci.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index e101fa0..7a86a81 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -155,7 +155,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile 
> *f)
>  
>  /* Try to find out if the guest has bus master disabled, but is
> in ready state. Then we have a buggy guest OS. */
> -if (!(proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>   

Phew - that's an evil one. Thanks for the catch!

Acked-by: Alexander Graf 

Alex




[Qemu-devel] Re: [Bug 595117] Re: qemu-nbd slow and missing "writeback" cache option

2010-06-17 Thread Dustin Kirkland
Stephane-

I understand your plight.  However, according to the rules and
policies of the QEMU project, you must submit the patch on the
qemu-devel@ mailing list, in addition to (or instead of) in the bug
tracker.  It's not my project, not my policy.  I'm just trying to make
sure you get your patch in front of the right audience such that it
can be discussed and accepted.

-- 
qemu-nbd slow and missing "writeback" cache option
https://bugs.launchpad.net/bugs/595117
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: Invalid
Status in “qemu-kvm” package in Ubuntu: Incomplete

Bug description:
Binary package hint: qemu-kvm

dpkg -l | grep qemu
ii  kvm  
1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9dummy transitional 
pacakge from kvm to qemu-
ii  qemu 0.12.3+noroms-0ubuntu9 
   dummy transitional pacakge from qemu to qemu
ii  qemu-common  0.12.3+noroms-0ubuntu9 
   qemu common functionality (bios, documentati
ii  qemu-kvm 0.12.3+noroms-0ubuntu9 
   Full virtualization on i386 and amd64 hardwa
ii  qemu-kvm-extras  0.12.3+noroms-0ubuntu9 
   fast processor emulator binaries for non-x86
ii  qemu-launcher1.7.4-1ubuntu2 
   GTK+ front-end to QEMU computer emulator
ii  qemuctl  0.2-2  
   controlling GUI for qemu

lucid amd64.

qemu-nbd is a lot slower when writing to disk than say nbd-server.

It appears it is because by default the disk image it serves is open with 
O_SYNC. The --nocache option, unintuitively, makes matters a bit better because 
it causes the image to be open with O_DIRECT instead of O_SYNC.

The qemu code allows an image to be open without any of those flags, but 
unfortunately qemu-nbd doesn't have the option to do that (qemu doesn't allow 
the image to be open with both O_SYNC and O_DIRECT though).

The default of qemu-img (of using O_SYNC) is not very sensible because anyway, 
the client (the kernel) uses caches (write-back), (and "qemu-nbd -d" doesn't 
flush those by the way). So if for instance qemu-nbd is killed, regardless of 
whether qemu-nbd uses O_SYNC, O_DIRECT or not, the data in the image will not 
be consistent anyway, unless "syncs" are done by the client (like fsync on the 
nbd device or sync mount option), and with qemu-nbd's O_SYNC mode, those 
"sync"s will be extremely slow.

Attached is a patch that adds a --cache={off,none,writethrough,writeback} 
option to qemu-nbd.

--cache=off is the same as --nocache (that is use O_DIRECT), writethrough is 
using O_SYNC and is still the default so this patch doesn't change the 
functionality. writeback is none of those flags, so is the addition of this 
patch. The patch also does an fsync upon "qemu-nbd -d" to make sure data is 
flushed to the image before removing the nbd.
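
(Illustration only: the flag mapping described above boils down to something
like the snippet below; the function name is made up, the real change is in
the attached patch.)

#include <fcntl.h>
#include <string.h>

/* --cache=off/none -> O_DIRECT, writethrough -> O_SYNC (today's default),
 * writeback -> neither flag. */
static int nbd_cache_flags(const char *mode)
{
    if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
        return O_DIRECT;
    }
    if (!strcmp(mode, "writethrough")) {
        return O_SYNC;
    }
    if (!strcmp(mode, "writeback")) {
        return 0;
    }
    return -1; /* unknown mode */
}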

Consider this test scenario:

dd bs=1M count=100 of=a < /dev/null
qemu-nbd --cache=<mode> -c /dev/nbd0 a
cp /dev/zero /dev/nbd0
time perl -MIO::Handle -e 'STDOUT->sync or die$!' 1<> /dev/nbd0

With cache=writethrough (the default), it takes over 10 minutes to write those 
100MB worth of zeroes. Running a strace, we see the recvfrom and sentos delayed 
by each 1kb write(2)s to disk (10 to 30 ms per write).

With cache=off, it takes about 30 seconds.

With cache=writeback, it takes about 3 seconds, which is similar to the 
performance you get with nbd-server

Note that the cp command runs instantly as the data is buffered by the client 
(the kernel), and not sent to qemu-nbd until the fsync(2) is called.





[Qemu-devel] Re: [PATCH 3/3] Toggle tracepoint state

2010-06-17 Thread Stefan Hajnoczi
On Wed, Jun 16, 2010 at 06:14:35PM +0530, Prerna Saxena wrote:
> This patch adds support for dynamically enabling/disabling of tracepoints.
> This is done by internally maintaining each tracepoint's state, and 
> permitting logging of data from a tracepoint only if it is in an 
> 'active' state.
> 
> Monitor commands added :
> 1) info tracepoints   : to view all available tracepoints and 
> their state.
> 2) tracepoint NAME on|off : to enable/disable data logging from a 
> given tracepoint.
> Eg, tracepoint paio_submit off 
>   disables logging of data when 
>   paio_submit is hit.
> 
> Signed-off-by: Prerna Saxena 
> ---
> 
>  monitor.c   |   16 ++
>  qemu-monitor.hx |   18 
>  simpletrace.c   |   63 
> +++
>  tracetool   |   30 +++---
>  vl.c|6 +
>  5 files changed, 129 insertions(+), 4 deletions(-)
> 
> 
> diff --git a/monitor.c b/monitor.c
> index 8b60830..238bdf0 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -548,6 +548,15 @@ static void do_commit(Monitor *mon, const QDict *qdict)
>  }
>  }
> 
> +#ifdef CONFIG_SIMPLE_TRACE
> +static void do_change_tracepoint_state(Monitor *mon, const QDict *qdict)
> +{
> +const char *tp_name = qdict_get_str(qdict, "name");
> +bool new_state = qdict_get_bool(qdict, "option");
> +change_tracepoint_state(tp_name, new_state);
> +}
> +#endif
> +
>  static void user_monitor_complete(void *opaque, QObject *ret_data)
>  {
>  MonitorCompletionData *data = (MonitorCompletionData *)opaque; 
> @@ -2791,6 +2800,13 @@ static const mon_cmd_t info_cmds[] = {
>  .help   = "show current contents of trace buffer",
>  .mhandler.info = do_info_trace,
>  },
> +{
> +.name   = "tracepoints",
> +.args_type  = "",
> +.params = "",
> +.help   = "show available tracepoints & their state",
> +.mhandler.info = do_info_all_tracepoints,
> +},
>  #endif
>  {
>  .name   = NULL,
> diff --git a/qemu-monitor.hx b/qemu-monitor.hx
> index 766c30f..8540b8f 100644
> --- a/qemu-monitor.hx
> +++ b/qemu-monitor.hx
> @@ -117,6 +117,8 @@ show device tree
>  #ifdef CONFIG_SIMPLE_TRACE
>  @item info trace
>  show contents of trace buffer
> +...@item info tracepoints
> +show available tracepoints and their state
>  #endif
>  @end table
>  ETEXI
> @@ -225,6 +227,22 @@ STEXI
>  @item logfile @var{filename}
>  @findex logfile
>  Output logs to @var{filename}.
> +#ifdef CONFIG_SIMPLE_TRACE
> +ETEXI
> +
> +{
> +.name   = "tracepoint",
> +.args_type  = "name:s,option:b",
> +.params = "name on|off",
> +.help   = "changes status of a specific tracepoint",
> +.mhandler.cmd = do_change_tracepoint_state,
> +},
> +
> +STEXI
> +...@item tracepoint
> +...@findex tracepoint
> +changes status of a tracepoint
> +#endif
>  ETEXI
> 
>  {
> diff --git a/simpletrace.c b/simpletrace.c
> index 239ae3f..4221a8f 100644
> --- a/simpletrace.c
> +++ b/simpletrace.c
> @@ -3,6 +3,12 @@
>  #include "trace.h"
> 
>  typedef struct {
> +char *tp_name;
> +bool state;
> +unsigned int hash;
> +} Tracepoint;

The tracing infrastructure avoids using the name 'tracepoint'.  It calls
them trace events.  I didn't deliberately choose that name, but was
unaware at the time that Linux tracing calls them tracepoints.  Given
that 'trace event' is currently used, it would be nice to remain
consistent/reduce confusion.

How about:
typedef struct {
const char *name;
bool enabled;
unsigned int hash;
} TraceEventState;

Or a nicer overall change might be to rename enum TraceEvent to
TraceEventID and Tracepoint to TraceEvent.

> +
> +typedef struct {
>  unsigned long event;
>  unsigned long x1;
>  unsigned long x2;
> @@ -18,11 +24,29 @@ enum {
>  static TraceRecord trace_buf[TRACE_BUF_LEN];
>  static unsigned int trace_idx;
>  static FILE *trace_fp;
> +static Tracepoint trace_list[NR_TRACEPOINTS];
> +
> +void init_tracepoint(const char *tname, TraceEvent tevent)
> +{
> +if (!tname || tevent > NR_TRACEPOINTS) {
> +return;
> +}

I'd drop this check because only trace.c should use init_tracepoint()
and you have ensured it uses correct arguments.  Just a coding style
suggestion; having redundant checks makes the code more verbose, may
lead the reader to assume that this function really is called with junk
arguments, and silently returning will not help make the issue visible.

> +trace_list[tevent].tp_name = (char*)qemu_malloc(strlen(tname)+1);
> +strncpy(trace_list[tevent].tp_name, tname, strlen(tname));

Or use qemu_strdup() but we don't really need to allocate memory at all
here.  Just hold the const char* to a string litera

Re: [Qemu-devel] Re: [PATCH 5/5] linux fbdev display driver.

2010-06-17 Thread Julian Pidancet
On 06/17/2010 03:29 PM, Julian Pidancet wrote:
> 
> Hi,
> 
> Thanks for spotting these errors. Here is a respin of my patch to address you 
> concerns.
> (The munmap call is included).
> 
> Cheers,
> 
> Julian
> 

Oh, I actually tested the last patch only with the -nographic switch. There's 
still a segfault when starting qemu with vnc.
You can fix it by adding a call to dpy_resize(ds) after the dcl = NULL; line in 
fbdev_display_uninit().

For some reason, the display is extremely slow when using vnc and fbdev at the 
same time.

Julian



[Qemu-devel] [Bug 595117] Re: qemu-nbd slow and missing "writeback" cache option

2010-06-17 Thread Brian Murray
** Tags added: patch

-- 
qemu-nbd slow and missing "writeback" cache option
https://bugs.launchpad.net/bugs/595117
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.




[Qemu-devel] Re: [PATCH] virtio-pci: fix bus master bug setting on load

2010-06-17 Thread Juan Quintela
Alex Williamson  wrote:
> The comment suggests we're checking for the driver in the ready
> state and bus master disabled, but the code is checking that it's
> not in the ready state.
>
> Signed-off-by: Alex Williamson 
> Found-by: Amit Shah 

Acked-by: Juan Quintela 

> ---
>
>  hw/virtio-pci.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index e101fa0..7a86a81 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -155,7 +155,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile 
> *f)
>  
>  /* Try to find out if the guest has bus master disabled, but is
> in ready state. Then we have a buggy guest OS. */
> -if (!(proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>  !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
>  proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
>  }



[Qemu-devel] Re: [PATCH v3 0/5] Add QMP migration events

2010-06-17 Thread Luiz Capitulino
On Thu, 17 Jun 2010 18:34:00 +0200
Juan Quintela  wrote:

> Luiz Capitulino  wrote:
> > On Wed, 16 Jun 2010 21:10:04 +0200
> > Juan Quintela  wrote:
> >
> >> Luiz Capitulino  wrote:
> >> > On Tue, 15 Jun 2010 17:24:59 +0200
> >> > Juan Quintela  wrote:
> >> 
> >> >> >
> >> >> >  I still don't see the need for MIGRATION_STARTED, it could be useful 
> >> >> > in
> >> >> > the target but I'd like to understand the use case in more detail.
> >> >> 
> >> >> At this point, if you are doing migration with tcp, and you are putting
> >> >> the wrong port on source (no path or any other error), you get no info
> >> >> at all of what is happening.
> >> >
> >> >  Shouldn't the migrate command just the return the expected error?
> >> 
> >> No.  Think you are "having troubles".  You try to find what happens.
> >> launch things by hand.  And there is no way to know if anybody has
> >> conected to the destination machine.  Some notification that migration
> >> has started is _very_ useful.  expecially when there are
> >> networks/firewalls/... in the middle.
> >
> >  [...]
> >
> >> That is it.  But you continue telling that going to the old house and
> >> doing a info migrate is a good interface.
> >
> >  I'm sorry? When did I ever claimed such a thing?
> 
> polling is enough.  polling has to be done in source machine.

 Enough for the meantime, until we have something better to offer. The problem
here is that adding not so good stuff to a protocol is that we will have to
maintain it for a quite long time, possibly forever.

 That's why I'm being so opposed to a large set of events, a reduced set is a 
lot
more attractive.

> >  First point: all you describe is MIGRATION_CONNECTED, at the end of the day
> > it would do exactly what you want for MIGRATION_STARTED.
> >
> >  The second, and most important point, is that we're trying not to make
> > things worse. Adding a number of events to circumvent a bad designed
> > command and having the wrong expectations (ie. help developer debugging)
> > is a clear recipe for disaster.
> >
> >  Anyway, I think it doesn't matter anymore, as QMP is not going to be 
> > declared
> > stable for 0.13. In this case we'll have enough time to design the proper
> > interface.
> >
> >> To add insult to injury, the problem is that libvirt people are not
> >> collaborative, and expect things that can't be done, are uncooperative,
> >
> >  Again, I've never claimed that and I think you're taking this thread to
> > the wrong direction.
> 
> Ok, I stop then.

 I'm not asking you to stop arguing, just to avoid taking the non-technical
route in a bad way.

 Now, we have the following situation: MIGRATION_CONNECTED and MIGRATION_DONE
would have possibly been a good fit for 0.13 if QMP was going to be stable.

 However, that's not going to happen so the question is: is it interesting
to have those events for an unstable QMP? Do we expect any client to need it? Or
can we wait until 0.14?



Re: [Qemu-devel] Re: [PATCH 1/2] qemu-io: retry fgets() when errno is EINTRg

2010-06-17 Thread Jamie Lokier
Kevin Wolf wrote:
> Am 16.06.2010 18:52, schrieb MORITA Kazutaka:
> > At Wed, 16 Jun 2010 13:04:47 +0200,
> > Kevin Wolf wrote:
> >>
> >> Am 15.06.2010 19:53, schrieb MORITA Kazutaka:
> >>> posix-aio-compat sends a signal in aio operations, so we should
> >>> consider that fgets() could be interrupted here.
> >>>
> >>> Signed-off-by: MORITA Kazutaka 
> >>> ---
> >>>  cmd.c |3 +++
> >>>  1 files changed, 3 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/cmd.c b/cmd.c
> >>> index 2336334..460df92 100644
> >>> --- a/cmd.c
> >>> +++ b/cmd.c
> >>> @@ -272,7 +272,10 @@ fetchline(void)
> >>>   return NULL;
> >>>   printf("%s", get_prompt());
> >>>   fflush(stdout);
> >>> +again:
> >>>   if (!fgets(line, MAXREADLINESZ, stdin)) {
> >>> + if (errno == EINTR)
> >>> + goto again;
> >>>   free(line);
> >>>   return NULL;
> >>>   }
> >>
> >> This looks like a loop replaced by goto (and braces are missing). What
> >> about this instead?
> >>
> >> do {
> >> ret = fgets(...)
> >> } while (ret == NULL && errno == EINTR)
> >>
> >> if (ret == NULL) {
> >>fail
> >> }
> >>
> > 
> > I agree.
> > 
> > However, it seems that my second patch have already solved the
> > problem.  We register this readline routines as an aio handler now, so
> > fgets() does not block and cannot return with EINTR.
> > 
> > This patch looks no longer needed, sorry.
> 
> Good point. Thanks for having a look.

Anyway, are you sure stdio functions can be interrupted with EINTR?
Linus reminds us that some stdio functions have to retry internally
anyway:

http://comments.gmane.org/gmane.comp.version-control.git/18285

-- Jamie



[Qemu-devel] Re: [PATCH v3 0/5] Add QMP migration events

2010-06-17 Thread Juan Quintela
Luiz Capitulino  wrote:
> On Wed, 16 Jun 2010 21:10:04 +0200
> Juan Quintela  wrote:
>
>> Luiz Capitulino  wrote:
>> > On Tue, 15 Jun 2010 17:24:59 +0200
>> > Juan Quintela  wrote:
>> 
>> >> >
>> >> >  I still don't see the need for MIGRATION_STARTED, it could be useful in
>> >> > the target but I'd like to understand the use case in more detail.
>> >> 
>> >> At this point, if you are doing migration with tcp, and you are putting
>> >> the wrong port on source (no path or any other error), you get no info
>> >> at all of what is happening.
>> >
>> >  Shouldn't the migrate command just the return the expected error?
>> 
>> No.  Think you are "having troubles".  You try to find what happens.
>> launch things by hand.  And there is no way to know if anybody has
>> conected to the destination machine.  Some notification that migration
>> has started is _very_ useful.  expecially when there are
>> networks/firewalls/... in the middle.
>
>  [...]
>
>> That is it.  But you continue telling that going to the old house and
>> doing a info migrate is a good interface.
>
>  I'm sorry? When did I ever claimed such a thing?

polling is enough.  polling has to be done in source machine.

>  First point: all you describe is MIGRATION_CONNECTED, at the end of the day
> it would do exactly what you want for MIGRATION_STARTED.
>
>  The second, and most important point, is that we're trying not to make
> things worse. Adding a number of events to circumvent a bad designed
> command and having the wrong expectations (ie. help developer debugging)
> is a clear recipe for disaster.
>
>  Anyway, I think it doesn't matter anymore, as QMP is not going to be declared
> stable for 0.13. In this case we'll have enough time to design the proper
> interface.
>
>> To add insult to injury, the problem is that libvirt people are not
>> collaborative, and expect things that can't be done, are uncooperative,
>
>  Again, I've never claimed that and I think you're taking this thread to
> the wrong direction.

Ok, I stop then.

Later, Juan.




[Qemu-devel] Re: [PATCH v3 0/5] Add QMP migration events

2010-06-17 Thread Anthony Liguori

On 06/17/2010 11:45 AM, Luiz Capitulino wrote:
> On Thu, 17 Jun 2010 18:34:00 +0200
> Juan Quintela  wrote:
>> Luiz Capitulino  wrote:
>>> On Wed, 16 Jun 2010 21:10:04 +0200
>>> Juan Quintela  wrote:
>>>> Luiz Capitulino  wrote:
>>>>> On Tue, 15 Jun 2010 17:24:59 +0200
>>>>> Juan Quintela  wrote:
>>>>>>> I still don't see the need for MIGRATION_STARTED, it could be useful in
>>>>>>> the target but I'd like to understand the use case in more detail.
>>>>>> At this point, if you are doing migration with tcp, and you are putting
>>>>>> the wrong port on source (no path or any other error), you get no info
>>>>>> at all of what is happening.
>>>>> Shouldn't the migrate command just return the expected error?
>>>> No.  Think you are "having troubles".  You try to find what happens.
>>>> launch things by hand.  And there is no way to know if anybody has
>>>> connected to the destination machine.  Some notification that migration
>>>> has started is _very_ useful, especially when there are
>>>> networks/firewalls/... in the middle.
>>> [...]
>>>> That is it.  But you continue telling that going to the old house and
>>>> doing a info migrate is a good interface.
>>> I'm sorry? When did I ever claim such a thing?
>> polling is enough.  polling has to be done in source machine.
>
> Enough for the meantime, until we have something better to offer. The problem
> here is that adding not so good stuff to a protocol means we will have to
> maintain it for quite a long time, possibly forever.
>
> That's why I'm being so opposed to a large set of events; a reduced set is a
> lot more attractive.
>
>>> First point: all you describe is MIGRATION_CONNECTED, at the end of the day
>>> it would do exactly what you want for MIGRATION_STARTED.
>>>
>>> The second, and most important point, is that we're trying not to make
>>> things worse. Adding a number of events to circumvent a badly designed
>>> command and having the wrong expectations (i.e. helping developers debug)
>>> is a clear recipe for disaster.
>>>
>>> Anyway, I think it doesn't matter anymore, as QMP is not going to be declared
>>> stable for 0.13. In this case we'll have enough time to design the proper
>>> interface.
>>>
>>>> To add insult to injury, the problem is that libvirt people are not
>>>> collaborative, and expect things that can't be done, are uncooperative,
>>> Again, I've never claimed that and I think you're taking this thread to
>>> the wrong direction.
>> Ok, I stop then.
>
> I'm not asking you to stop arguing, just to avoid taking the non-technical
> route in a bad way.
>
> Now, we have the following situation: MIGRATION_CONNECTED and MIGRATION_DONE
> would have possibly been a good fit for 0.13 if QMP was going to be stable.
>
> However, that's not going to happen so the question is: is it interesting
> to have those events for an unstable QMP? Do we expect any client to need it?
> Or can we wait until 0.14?

We need MIGRATION_CONNECTED post 0.13.  We won't need MIGRATION_DONE so
there's probably no point in introducing it.


Regards,

Anthony Liguori




Re: [Qemu-devel] Re: [PATCH V4 2/3] qemu: Generic task offloading framework: threadlets

2010-06-17 Thread Anthony Liguori

On 06/17/2010 05:09 AM, Paolo Bonzini wrote:
>>>> +while (QTAILQ_EMPTY(&(queue->request_list))&&
>>>> +   (ret != ETIMEDOUT)) {
>>>> +ret = qemu_cond_timedwait(&(queue->cond),
>>>> + &(queue->lock), 10*10);
>>>> +}
>>>
>>> Using qemu_cond_timedwait is a hack for not properly broadcasting the
>>> condvar in flush_threadlet_queue.
>>
>> I think Anthony answered this one.
>
> I think he said that the code has been changed so I am right? :)

You're right about the condition we check in the exit path but the
timedwait is needed to expire an idle thread.
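[A minimal sketch of the pattern Anthony describes: a worker waits on the
queue's condvar with a timeout and exits once it has been idle for the whole
interval.  The struct layout, names and the 10 second timeout are
illustrative, not the actual threadlets code:]

#include <errno.h>
#include <pthread.h>
#include <time.h>

struct work_queue {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int pending;                    /* number of queued work items */
};

static void *worker(void *opaque)
{
    struct work_queue *q = opaque;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        struct timespec ts;
        int ret = 0;

        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_sec += 10;            /* stay around for at most 10s without work */

        while (q->pending == 0 && ret != ETIMEDOUT) {
            ret = pthread_cond_timedwait(&q->cond, &q->lock, &ts);
        }
        if (q->pending == 0) {
            break;                  /* timed out while idle: let the thread exit */
        }
        q->pending--;
        /* ... dequeue and run one work item, dropping the lock while it runs ... */
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}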


Regards,

Anthony Liguori



RE: [Qemu-devel] VLIW?

2010-06-17 Thread Gibbons, Scott
My architecture is an Interleaved Multithreading VLIW architecture.  One bundle 
(packet) executes per processor cycle, rotating between threads (i.e., thread 0 
executes at time 0, thread 1 executes at time 1, then thread 0 executes at time 
2, etc.).  Each thread has its own context (including a program counter).  I'm 
not sure what kind of performance I would get in translating a single bundle at 
a time (or maybe I'm misunderstanding).

I think I'll get basic single-thread operation working first, then attempt 
multithreading when I have a spare month or so.

Thanks,
--Scott

---
Qualcomm Inc. / Hexagon Tools
Austin, TX
sgibb...@qualcomm.com
Office: 512-623-3831
Cell: 469-450-8390




-Original Message-
From: Richard Henderson [mailto:rth7...@gmail.com] On Behalf Of Richard 
Henderson
Sent: Thursday, June 17, 2010 11:03 AM
To: Gibbons, Scott
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] VLIW?

On 06/17/2010 08:12 AM, Gibbons, Scott wrote:
> Another question I have is how to handle this multithreaded
> architecture.  This seems to be extraordinarily difficult as a
> dynamic translation problem and I'll probably defer it to later.
> But, if anyone has any suggestions, I'd be glad to hear them.

How is your threading different from other SMP systems?

In system mode, QEMU TCG is single-threaded and models SMP via
cooperative switching in between TCG translation blocks.  It's
not ideal, but it does solve quite a number of problems and is
at least functional.


r~
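[To make the cooperative-rotation idea concrete, a toy loop in which a single
host thread rotates between per-guest-thread contexts, executing a bounded
chunk of work for each in turn.  Purely illustrative -- this is not QEMU's
actual cpu_exec loop:]

#include <stdbool.h>
#include <stdint.h>

#define NUM_THREADS 2

typedef struct {
    uint64_t pc;        /* toy per-thread context: just a program counter */
    bool halted;
} ThreadContext;

/* Execute a bounded chunk of work for one context; returns false once
 * the thread has nothing left to run.  Stubbed out for illustration. */
static bool exec_some_insns(ThreadContext *ctx)
{
    ctx->pc += 4;                   /* pretend we executed an instruction */
    return ctx->pc < 64;
}

int main(void)
{
    ThreadContext threads[NUM_THREADS] = { { 0, false }, { 0, false } };
    int running = NUM_THREADS;

    /* Cooperative switching: run a chunk for thread 0, then thread 1,
     * then thread 0 again, all on one host thread. */
    for (int cur = 0; running > 0; cur = (cur + 1) % NUM_THREADS) {
        if (threads[cur].halted) {
            continue;
        }
        if (!exec_some_insns(&threads[cur])) {
            threads[cur].halted = true;
            running--;
        }
    }
    return 0;
}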



Re: [Qemu-devel] VLIW?

2010-06-17 Thread Richard Henderson
On 06/17/2010 11:05 AM, Gibbons, Scott wrote:
> My architecture is an Interleaved Multithreading VLIW architecture.
> One bundle (packet) executes per processor cycle, rotating between
> threads (i.e., thread 0 executes at time 0, thread 1 executes at time
> 1, then thread 0 executes at time 2, etc.).  Each thread has its own
> context (including a program counter).

Ah, I see.  And presumably this knowledge of synchronicity is something
that can be leveraged by the programmer for specific tasks?  Or does 
the closed pipeline mean that you have unpredictable delays that can
stall the pipeline, which can then throw off the thread lock-step?

> I'm not sure what kind of
> performance I would get in translating a single bundle at a time (or
> maybe I'm misunderstanding).

"Poor" might be a word to describe it.

> I think I'll get basic single-thread operation working first, then
> attempt multithreading when I have a spare month or so.

Good plan.


r~



Re: [Qemu-devel] [PATCH] monitor: Add force option support to pci_del command

2010-06-17 Thread Luiz Capitulino
On Tue, 15 Jun 2010 11:03:13 +0200
Markus Armbruster  wrote:

> Anthony Liguori  writes:
> 
> > On 06/09/2010 09:27 AM, Gerd Hoffmann wrote:
> >>   Hi,
> >>
> >>> This make sense when you mistakenly add a pci device on a -s -S
> >>> scenario, like the scenario described on the following bug:
> >>> https://bugs.launchpad.net/qemu/+bug/544367.
> >>
> >> It doesn't IMHO.
> >>
> >>> When ACPI-based hotplug support is present on the guest and we run
> >>> pci_del with the force option, the hotplug events will still be
> >>> generated to the guest and the guest still will trigger the EJx event,
> >>> which will end by calling pciej_write() on qemu side. This function will
> >>> do nothing on a -f and pci hotplug support scenario, as the pci device
> >>> was previously removed by pci_del.
> >>
> >> And in case the guest wants to do anything (like flushing dirty
> >> buffers) before triggering the EJx event it will horribly fail.
> >>
> >> If the guest is stopped while unplugging the device the unplug
> >> should happen as soon as the guest is unpaused.
> >
> > This is a case where the fundamental problem is that the pci_del
> > command should block until the guest has actually responded to the
> > request.
> >
> > pci_del returning with no error and yet not having the operation
> > complete is certainly a usability issue.
> 
> s/pci_del/device_del/g  :)
> 
> What should device_del do?  Wait until ACPI reports that the guest has
> processed the unplug event?  What if the guest doesn't?  Hanging
> indefinitely is not an option.  Can we reliably detect this case?

 This is a general question for all commands that can take way too long
or never return.

 For QMP the question is whether we should handle this in QEMU or in the
client. Ie, if the guest doesn't respond the client could detect that
and cancel the async command.

 For HMP we could just live with that or suspend the shell and allow the
user to cancel the operation (eg. ctrl+c) and the obvious alternative is to
have timeouts, allowing the user to set them.



Re: [Qemu-devel] [PATCH] monitor: Add force option support to pci_del command

2010-06-17 Thread Anthony Liguori

On 06/17/2010 01:15 PM, Luiz Capitulino wrote:
> This is a general question for all commands that can take way too long
> or never return.
>
> For QMP the question is whether we should handle this in QEMU or in the
> client. Ie, if the guest doesn't respond the client could detect that
> and cancel the async command.

Exactly.  It's no different than a migration that takes too long.

> For HMP we could just live with that or suspend the shell and allow the
> user to cancel the operation (eg. ctrl+c) and the obvious alternative is to
> have timeouts, allowing the user to set them.

Yeah, ctrl+c to cancel would be a very nice feature.

Regards,

Anthony Liguori





Re: [Qemu-devel] VLIW?

2010-06-17 Thread Jamie Lokier
Gibbons, Scott wrote:
> My architecture is an Interleaved Multithreading VLIW architecture.  One 
> bundle (packet) executes per processor cycle, rotating between threads (i.e., 
> thread 0 executes at time 0, thread 1 executes at time 1, then thread 0 
> executes at time 2, etc.).  Each thread has its own context (including a 
> program counter).  I'm not sure what kind of performance I would get in 
> translating a single bundle at a time (or maybe I'm misunderstanding).
> 
> I think I'll get basic single-thread operation working first, then attempt 
> multithreading when I have a spare month or so.

I know of another CPU architecture that has fine-grained hardware
threads and has working qemu emulation at a useful performance for
debugging kernels, but it's not public as far as I know, and I don't
know if it's ok to name it.  I don't think it's VLIW, only that it has
lots of hardware threads and a working qemu model.

-- Jamie



[Qemu-devel] Re: [PATCH] ceph/rbd block driver for qemu-kvm (v3)

2010-06-17 Thread Christian Brunner
Hi Simone,

sorry for the late reply. I've been on vacation for a week.

Thanks for sending the patch. At first sight your patch looks good.
I'll do some testing by the weekend.

Kevin also sent me a note about the missing aio support, but I didn't
have the time to implement it yet. Now it seems that I don't have to
do it, since you were quicker... :)

Regarding locking: There were some problems with the thread handling
when I started writing the driver. But Yehuda removed the use of
SIGUSRx and Sage modified librados, so that the Ceph Thread class is
masking signals on any new thread it creates. (see
http://ceph.newdream.net/git/?p=ceph.git;a=commit;h=cf4414684dd2ca5f2a565449be4686849695f62f
and 
http://ceph.newdream.net/git/?p=ceph.git;a=commit;h=e4e775b60f117ba2d07da9e0e438714b409447b6).
I think that this is also sufficient for the aio callbacks.
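[The general technique being described -- keeping signals such as the
SIGUSR1/SIGUSR2 used by posix-aio-compat away from library worker threads --
is usually done by blocking signals around thread creation, so the new thread
inherits a fully blocked mask.  A generic sketch, not the actual Ceph Thread
code:]

#include <pthread.h>
#include <signal.h>

static int spawn_masked_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    sigset_t all, old;
    int ret;

    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, &old);     /* block everything briefly */
    ret = pthread_create(tid, NULL, fn, arg);   /* child starts fully masked */
    pthread_sigmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    return ret;
}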

Regards

Christian

2010/6/11 Simone Gotti :
> Hi Christian,
>
> thanks for your patch. I tried it a little and it worked quite well but
> during some live migration tests I noticed a problem.
>
>
> The problem is related to live migration with high I/O using the AIO
> calls (I triggered it with a simple "dd").
>
> If you launch a live migration and the guest is stopped and started on
> the new qemu process while some AIO was in flight the guest on the new
> qemu will wait indefinitely for data that will never come. With ata
> emulation an ata reset is sent after some seconds but with virtio this
> won't happen.
>
> I'm not a qemu expert but from what I understand qemu in
> savevm.c:do_savevm calls qemu_aio_flush to wait until all the asynchronous
> aio has returned (the callback is called). But the rbd block driver doesn't
> use the qemu aio model but the rados one so that function will never
> know of the rados aio.
>
> So a solution will be to glue the block driver with the qemu aio model.
> I tried to do this to test if this will work in the attached patch. I
> only tested with one rbd block device but the live migration tests
> worked (in the patch I removed all the debug prints I adedd to see if
> all AIO requets really returned.
>
> This is an RFC just to know what you think about this possible solution.
> As qemu's aio model is event based and it needs a file descriptor for
> event communication, I used eventfd to do this.
> Let me know if you need a detailed description of the patch!
>
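[The eventfd(2) behaviour this glue relies on, shown in isolation: every
write() adds to a 64-bit counter, and a single read() returns the accumulated
total and resets it, which is why one read can retire several completions at
once.  A standalone, Linux-only sketch, not part of the patch:]

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
    int efd = eventfd(0, 0);
    uint64_t one = 1;
    uint64_t done;

    /* Completion side: each finished request adds 1 to the counter. */
    write(efd, &one, sizeof(one));
    write(efd, &one, sizeof(one));

    /* Handler side: a single read() collects everything accumulated so far. */
    read(efd, &done, sizeof(done));
    printf("%llu requests completed\n", (unsigned long long)done);  /* prints 2 */

    close(efd);
    return 0;
}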
>
> I've also got a question: as librados is multithreaded the callbacks are
> called in another thread. Is there the need to protect some critical
> sections with a lock (for example in rbd_aio_rw_vector and in
> rbd_finish_aiocb)?
>
>
> Thanks!
>
> Bye!
>
>
> From: Simone Gotti 
> Date: Fri, 11 Jun 2010 21:19:39 +0200
> Subject: [PATCH] block/rbd: Added glue to qemu aio model to fix live
> migration with outstanding aio
>
> Signed-off-by: Simone Gotti 
>
>
> ---
>  block/rbd.c |   63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 files changed, 57 insertions(+), 6 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 4d22069..83b7898 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -25,6 +25,8 @@
>
>  #include 
>
> +#include <sys/eventfd.h>
> +
>  /*
>  * When specifying the image filename use:
>  *
> @@ -47,6 +49,15 @@
>
>  #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
>
> +typedef struct BDRVRBDState {
> +    int efd;
> +    rados_pool_t pool;
> +    char name[RBD_MAX_OBJ_NAME_SIZE];
> +    uint64_t size;
> +    uint64_t objsize;
> +    int qemu_aio_count;
> +} BDRVRBDState;
> +
>  typedef struct RBDAIOCB {
>     BlockDriverAIOCB common;
>     QEMUBH *bh;
> @@ -57,6 +68,7 @@ typedef struct RBDAIOCB {
>     int64_t sector_num;
>     int aiocnt;
>     int error;
> +    BDRVRBDState *s;
>  } RBDAIOCB;
>
>  typedef struct RADOSCB {
> @@ -67,12 +79,6 @@ typedef struct RADOSCB {
>     char *buf;
>  } RADOSCB;
>
> -typedef struct BDRVRBDState {
> -    rados_pool_t pool;
> -    char name[RBD_MAX_OBJ_NAME_SIZE];
> -    uint64_t size;
> -    uint64_t objsize;
> -} BDRVRBDState;
>
>  typedef struct rbd_obj_header_ondisk RbdHeader1;
>
> @@ -255,6 +261,31 @@ done:
>     return ret;
>  }
>
> +static void rbd_aio_completion_cb(void *opaque)
> +{
> +    BDRVRBDState *s = opaque;
> +
> +    uint64_t val;
> +    ssize_t ret;
> +
> +    do {
> +        if ((ret = read(s->efd, &val, sizeof(val))) > 0) {
> +            s->qemu_aio_count -= val;
> +       }
> +    } while (ret == -1 && errno == EINTR);
> +
> +    return;
> +}
> +
> +static int rbd_aio_flush_cb(void *opaque)
> +{
> +    BDRVRBDState *s = opaque;
> +
> +    return (s->qemu_aio_count > 0) ? 1 : 0;
> +}
> +
> +
> +
>  static int rbd_open(BlockDriverState *bs, const char *filename, int flags)
>  {
>     BDRVRBDState *s = bs->opaque;
> @@ -303,6 +334,15 @@ static int rbd_open(BlockDriverState *bs, const
> char *filename, int flags)
>     s->size = header->image_size;
>     s->objsize = 1 << header->options.order;
>
> +    s->efd = eventfd(0, 0);
> +    if (s->efd == -1) {
> +        error_report("error openi
