Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-15 Thread Avi Kivity
On 02/14/2012 09:43 PM, Marcelo Tosatti wrote:
> Also it should not be necessary for these flushes to be inside mmu_lock
> in the EPT/NPT case (since there is no write protection there).

We do write protect with TDP, if nested virt is active.  The question is
whether we have indirect pages or not, not whether TDP is active or not
(even without TDP, if you don't enable paging in the guest, you don't
have to write protect).

> But it would
> be awkward to differentiate the unlock position based on EPT/NPT.
>

I would really like to move the IPI back out of the lock.

How about something like a sequence lock:


spin_lock(mmu_lock)
need_flush = write_protect_stuff();
atomic_add(kvm->want_flush_counter, need_flush);
spin_unlock(mmu_lock);

while ((done = atomic_read(kvm->done_flush_counter)) < (want =
atomic_read(kvm->want_flush_counter))) {
  kvm_make_request(flush)
  atomic_cmpxchg(kvm->done_flush_counter, done, want)
}

This (or maybe a corrected and optimized version) ensures that any
need_flush cannot pass the while () barrier, no matter which thread
encounters it first.  However it violates the "do not invent new locking
techniques" commandment.  Can we map it to some existing method?

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qemu-kvm-1.0 regression with usb tablet after live migration

2012-02-15 Thread Peter Lieven
Anyone?

Peter Lieven wrote:
> Hi,
>
> I recently started updating our VMs to qemu-kvm 1.0. Since then I see
> that the USB tablet device (used as a pointer device for accurate
> mouse positioning) becomes unavailable after live migration.
> If I migrate a Windows 7 VM a few times, it reliably stops using
> the USB tablet and falls back to the PS/2 mouse.
> If I do the same with qemu-kvm-0.12.5 with the very same VM, it works
> fine.
>
> Can anyone imagine what introduced this flaw?
>
> Thanks,
> Peter
>
>
>
>




[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #30 from Avi Kivity   2012-02-15 09:28:12 ---
Disable ksm, and build with debug information so we get useful information
instead of hex addresses.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


Re: AESNI and guest hosts

2012-02-15 Thread Avi Kivity
On 02/14/2012 08:18 PM, Brian Jackson wrote:
> On Tuesday, February 14, 2012 03:31:10 AM Ryan Brown wrote:
> > Sorry for being a noob here. Any clues with this, anyone?
> > 
> > On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown  wrote:
> > > Host/KVM server is running linux 3.2.4 (Debian wheezy), and guest
> > > kernel is running 3.2.5. The cpu is an E3-1230, but for some reason
> > > its not able to supply the guest with aesni. Is there a config option
> > > or is there something we're missing?
>
>
>
> I don't think it's supported to pass that functionality to the guest.
>

Why not?  Perhaps a new libvirt or qemu is needed.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-15 Thread Avi Kivity
On 02/15/2012 11:18 AM, Avi Kivity wrote:
> On 02/14/2012 09:43 PM, Marcelo Tosatti wrote:
> > Also it should not be necessary for these flushes to be inside mmu_lock
> > in the EPT/NPT case (since there is no write protection there).
>
> We do write protect with TDP, if nested virt is active.  The question is
> whether we have indirect pages or not, not whether TDP is active or not
> (even without TDP, if you don't enable paging in the guest, you don't
> have to write protect).
>
> > But it would
> > be awkward to differentiate the unlock position based on EPT/NPT.
> >
>
> I would really like to move the IPI back out of the lock.
>
> How about something like a sequence lock:
>
>
> spin_lock(mmu_lock)
> need_flush = write_protect_stuff();
> atomic_add(kvm->want_flush_counter, need_flush);
> spin_unlock(mmu_lock);
>
> while ((done = atomic_read(kvm->done_flush_counter)) < (want =
> atomic_read(kvm->want_flush_counter))) {
>   kvm_make_request(flush)
>   atomic_cmpxchg(kvm->done_flush_counter, done, want)
> }
>
> This (or maybe a corrected and optimized version) ensures that any
> need_flush cannot pass the while () barrier, no matter which thread
> encounters it first.  However it violates the "do not invent new locking
> techniques" commandment.  Can we map it to some existing method?

There is no need to advance 'want' in the loop.  So we could do

/* must call with mmu_lock held */
void kvm_mmu_defer_remote_flush(kvm, need_flush)
{
  if (need_flush)
++kvm->flush_counter.want;
}

/* may call without mmu_lock */
void kvm_mmu_commit_remote_flush(kvm)
{
  want = ACCESS_ONCE(kvm->flush_counter.want);
  while ((done = atomic_read(kvm->flush_counter.done)) < want) {
kvm_make_request(flush)
atomic_cmpxchg(kvm->flush_counter.done, done, want)
  }
}




-- 
error compiling committee.c: too many arguments to function



Re: The way of mapping BIOS into the guest's address space

2012-02-15 Thread Cyrill Gorcunov
On Tue, Feb 14, 2012 at 11:07:08PM -0500, Kevin O'Connor wrote:
...
> > hardware. Maybe we could poke someone from KVM camp for a hint?
> 
> SeaBIOS has two ways to be deployed - first is to copy the image to
> the top of the first 1MB (eg, 0xe-0xf) and jump to
> 0xf000:0xfff0 in 16bit mode.  The second way is to use the SeaBIOS elf
> and deploy into memory (according to the elf memory map) and jump to
> SeaBIOS in 32bit mode (according to the elf entry point).
> 
> SeaBIOS doesn't really need to be in the top 4G of ram.  SeaBIOS does
> expect to have normal PC hardware devices (eg, a PIC), though many
> hardware devices can be compiled out via its kconfig interface.  The
> more interesting challenge will likely be in communicating critical
> pieces of information (eg, total memory size) into SeaBIOS.
> 
> The SeaBIOS mailing list (seab...@seabios.org) is probably a better
> location for technical seabios questions.
> 

Hi Kevin, thanks for the pointer. Yes, providing info back to SeaBIOS
to set up MTRRs and such (so SeaBIOS would recognize them) is the
most challenging part, I think.

Cyrill


Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)

2012-02-15 Thread Avi Kivity
On 02/13/2012 05:52 PM, Marcelo Tosatti wrote:
> > >  {
> > >+  x86_platform.restore_sched_clock_state();
> > Isn't it too early? It is scary to tell the hypervisor to write to some
> > memory location and then completely replace page tables and half of
> > the cpu state in __restore_processor_state. Wouldn't that have the
> > potential of writing into a place that is not the restored hv_clock,
> > while the restored hv_clock might still be stale?
>
> No, memory is copied in swsusp_arch_resume(), which happens
> before restore_processor_state. restore_processor_state() is only
> setting up registers and MTRR.
>

In addition, kvmclock uses physical addresses, so page table changes
don't matter.

Note we could have done this in
__save_processor_state()/__restore_processor_state() (it's just reading
and writing an MSR, like we do for MSR_IA32_MISC_ENABLE), but I think
your patch is the right way.  I'd like an ack from the x86 maintainers
though.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] BUG in pv_clock when overflow condition is detected

2012-02-15 Thread Avi Kivity
On 02/13/2012 08:20 PM, Igor Mammedov wrote:
> BUG when overflow occurs at pvclock.c:pvclock_get_nsec_offset
>
> u64 delta = native_read_tsc() - shadow->tsc_timestamp;
>
> this might happen on an attempt to read a not-yet-initialized clock.
> It won't prevent stalls and hangs but at least it won't do it silently.
>
> Signed-off-by: Igor Mammedov 
> ---
>  arch/x86/kernel/pvclock.c |5 -
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 42eb330..35a6190 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -43,7 +43,10 @@ void pvclock_set_flags(u8 flags)
>  
>  static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
>  {
> - u64 delta = native_read_tsc() - shadow->tsc_timestamp;
> + u64 delta;
> + u64 tsc = native_read_tsc();
> + BUG_ON(tsc < shadow->tsc_timestamp);
> + delta = tsc - shadow->tsc_timestamp;
>   return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
>  shadow->tsc_shift);

Maybe a WARN_ON_ONCE()?  Otherwise a relatively minor hypervisor bug can
kill the guest.

-- 
error compiling committee.c: too many arguments to function



Re: Q: Does linux kvm native tool support loading BIOS as the default loader now?

2012-02-15 Thread Avi Kivity
On 02/13/2012 03:35 PM, Asias He wrote:
> On 02/13/2012 12:38 PM, Pekka Enberg wrote:
> > On Mon, Feb 13, 2012 at 08:14:22PM +0800, Yang Bai wrote:
> >>> As far as I know, the native tool does not support loading a BIOS, so
> >>> it does not support Windows. Is this supported now?
> >>> If not, I may try to implement it.
>
> You're welcome to do so ;-). This would open the door for non-linux OS
> support in kvm tool.

Also, to loading the kernel from /boot, and so allowing the normal
distro kernel update mechanism to work.

-- 
error compiling committee.c: too many arguments to function



Re: vsyscall=emulate regression

2012-02-15 Thread Amit Shah
On (Tue) 14 Feb 2012 [08:26:22], Andy Lutomirski wrote:
> On Tue, Feb 14, 2012 at 4:22 AM, Amit Shah  wrote:
> > On (Fri) 03 Feb 2012 [13:57:48], Amit Shah wrote:
> >> Hello,
> >>
> >> I'm booting some latest kernels on a Fedora 11 (released June 2009)
> >> guest.  After the recent change of default to vsyscall=emulate, the
> >> guest fails to boot (init segfaults).
> >>
> >> I also tried vsyscall=none, as suggested by hpa, and that fails as
> >> well.  Only vsyscall=native works fine.
> >>
> >> The commit that introduced the kernel parameter,
> >>
> >> 3ae36655b97a03fa1decf72f04078ef945647c1a
> >>
> >> is bad too.
> >
> > I suggest we revert 2e57ae0515124af45dd889bfbd4840fd40fcc07d till we
> > track down and fix the vsyscall=emulate case.
> 
> Hi-
> 
> Sorry, I lost track of this one.  I can't reproduce it, although I
> doubt I've set up the right test environment.  But this is fishy:
> 
> init[1]: segfault at ff600400 ip ff600400 sp
> 7fff9c8ba098 error 5
> 
> Error 5, if I'm decoding it correctly, is a userspace read (i.e. not
> execute) fault.  The vsyscall emulation changes shouldn't have had any
> effect on reads there.
> 
> Can you try booting the initramfs here:
> http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
> with your kernel image (i.e. qemu-kvm -kernel  -initrd
> vsyscall_initramfs.img -whatever_else) and seeing what happens?  It
> works for me.

This too results in a similar error.

> I'm also curious what happens if you run without kvm (i.e. straight
> qemu)

Interesting; without kvm, this does work fine.

> and what your .config on the guest kernel is.  It sounds like
> something's wrong with your fixmap, which makes me wonder if your
> qemu/kernel combo is capable of booting even a modern distro
> (up-to-date F16, say) -- the vvar page uses identical fixmap flags as
> the vsyscall page in vsyscall=emulate and vsyscall=none mode.

I didn't try a modern distro, but looks like this is enough evidence
for now to check the kvm emulator code.  I tried the same guests on a
newer kernel (Fedora 16's 3.2), and things worked fine except for
vsyscall=none, panic message below.

> What host cpu are you on and what qemu flags do you use?

$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 Duo CPU E6550  @ 2.33GHz
stepping: 11
cpu MHz : 2000.000
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor 
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi 
flexpriority
bogomips: 4654.73
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

>  Maybe
> something is wrong with your emulator.

Yes, looks like it.  Thanks!

This is what I get with vsyscall=none, where emulate and native work
fine on the 3.2 kernel on different host hardware, the guest stays the
same:


[2.874661] debug: unmapping init memory 8167f000..818dc000
[2.876778] Write protecting the kernel read-only data: 6144k
[2.879111] debug: unmapping init memory 880001318000..88000140
[2.881242] debug: unmapping init memory 8800015a..88000160
[2.884637] init[1] vsyscall attempted with vsyscall=none 
ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0
[2.888078] init[1]: segfault at ff600400 ip ff600400 sp 
7fff2f48fe18 error 15
[2.888193] Refined TSC clocksource calibration: 2691.293 MHz.
[2.892748] 
[2.895219] Kernel panic - not syncing: Attempted to kill init!


Amit


Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 04:39 PM, Alexander Graf wrote:
> > 
> > Syscalls are orthogonal to that - they're to avoid the fget_light() and to 
> > tighten the vcpu/thread and vm/process relationship.
>
> How about keeping the ioctl interface but moving vcpu_run to a syscall then?

I dislike half-and-half interfaces even more.  And it's not like the
fget_light() is really painful - it's just that I see it occasionally in
perf top so it annoys me.

>  That should really be the only thing that belongs into the fast path, right? 
> Every time we do a register sync in user space, we do something wrong. 
> Instead, user space should either
>
>   a) have wrappers around register accesses, so it can directly ask for 
> specific registers that it needs
> or
>   b) keep everything that would be requested by the register synchronization 
> in shared memory

Always-synced shared memory is a liability, since newer hardware might
introduce on-chip caches for that state, making synchronization
expensive.  Or we may choose to keep some of the registers loaded, if we
have a way to trap on their use from userspace - for example we can
return to userspace with the guest fpu loaded, and trap if userspace
tries to use it.

Is an extra syscall for copying TLB entries to user space prohibitively
expensive?

> > 
> >> , keep the rest in user space.
> >> >
> >> >
> >> >  When a device is fully in the kernel, we have a good specification of 
> >> > the ABI: it just implements the spec, and the ABI provides the interface 
> >> > from the device to the rest of the world.  Partially accelerated devices 
> >> > means a much greater effort in specifying exactly what it does.  It's 
> >> > also vulnerable to changes in how the guest uses the device.
> >> 
> >> Why? For the HPET timer register for example, we could have a simple MMIO 
> >> hook that says
> >> 
> >>   on_read:
> >> return read_current_time() - shared_page.offset;
> >>   on_write:
> >> handle_in_user_space();
> > 
> > It works for the really simple cases, yes, but if the guest wants to set up 
> > one-shot timers, it fails.  
>
> I don't understand. Why would anything fail here? 

It fails to provide a benefit, I didn't mean it causes guest failures.

You also have to make sure the kernel part and the user part use exactly
the same time bases.

> Once the logic that's implemented by the kernel accelerator doesn't fit 
> anymore, unregister it.

Yeah.

>
> > Also look at the PIT which latches on read.
> > 
> >> 
> >> For IDE, it would be as simple as
> >> 
> >>   register_pio_hook_ptr_r(PIO_IDE, SIZE_BYTE,&s->cmd[0]);
> >>   for (i = 1; i<  7; i++) {
> >> register_pio_hook_ptr_r(PIO_IDE + i, SIZE_BYTE,&s->cmd[i]);
> >> register_pio_hook_ptr_w(PIO_IDE + i, SIZE_BYTE,&s->cmd[i]);
> >>   }
> >> 
> >> and we should have reduced overhead of IDE by quite a bit already. All the 
> >> other 2k LOC in hw/ide/core.c don't matter for us really.
> > 
> > 
> > Just use virtio.
>
> Just use xenbus. Seriously, this is not an answer.

Why not?  We invested effort in making it as fast as possible, and in
writing the drivers.  IDE will never, ever, get anything close to virtio
performance, even if we put all of it in the kernel.

However, after these examples, I'm more open to partial acceleration
now.  I won't ever like it though.

> >> >
> >> >>- VGA
> >> >>- IDE
> >> >
> >> >  Why?  There are perfectly good replacements for these (qxl, virtio-blk, 
> >> > virtio-scsi).
> >> 
> >> Because not every guest supports them. Virtio-blk needs 3rd party drivers. 
> >> AHCI needs 3rd party drivers on w2k3 and wxp. 

3rd party drivers are a way of life for Windows users; and the
incremental benefits of IDE acceleration are still far behind virtio.

> I'm pretty sure non-Linux non-Windows systems won't get QXL drivers. 

Cirrus or vesa should be okay for them, I don't see what we could do for
them in the kernel, or why.

> Same for virtio.
> >> 
> >> Please don't do the Xen mistake again of claiming that all we care about 
> >> is Linux as a guest.
> > 
> > Rest easy, there's no chance of that.  But if a guest is important enough, 
> > virtio drivers will get written.  IDE has no chance in hell of approaching 
> > virtio-blk performance, no matter how much effort we put into it.
>
> Ever used VMware? They basically get virtio-blk performance out of ordinary 
> IDE for linear workloads.

For linear loads, so should we, perhaps with greater cpu utilization.

If we DMA 64 kB at a time, then 128 MB/sec (to keep the numbers simple)
means 0.5 msec/transaction.  Spending 30 usec on some heavyweight exits
shouldn't matter.

> > 
> >> KVM's strength has always been its close resemblance to hardware.
> > 
> > This will remain.  But we can't optimize everything.
>
> That's my point. Let's optimize the hot paths and be good. As long as we 
> default to IDE for disk, we should have that be fast, no?

We should make sure that we don't default to IDE.  Qemu has no knowledge
of the guest, so it can't defa

Re: [PATCH] BUG in pv_clock when overflow condition is detected

2012-02-15 Thread Igor Mammedov

On 02/15/2012 11:49 AM, Avi Kivity wrote:

On 02/13/2012 08:20 PM, Igor Mammedov wrote:

BUG when overflow occurs at pvclock.c:pvclock_get_nsec_offset

 u64 delta = native_read_tsc() - shadow->tsc_timestamp;

this might happen on an attempt to read a not-yet-initialized clock.
It won't prevent stalls and hangs but at least it won't do it silently.

Signed-off-by: Igor Mammedov
---
  arch/x86/kernel/pvclock.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..35a6190 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -43,7 +43,10 @@ void pvclock_set_flags(u8 flags)

  static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
  {
-   u64 delta = native_read_tsc() - shadow->tsc_timestamp;
+   u64 delta;
+   u64 tsc = native_read_tsc();
+   BUG_ON(tsc<  shadow->tsc_timestamp);
+   delta = tsc - shadow->tsc_timestamp;
return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
   shadow->tsc_shift);


Maybe a WARN_ON_ONCE()?  Otherwise a relatively minor hypervisor bug can
kill the guest.


An attempt to print from this place is not ideal, since it often leads
to a recursive call into this very function, and it hangs there anyway.
But if you insist I'll re-post it with WARN_ON_ONCE. It won't make much
difference, because the guest will hang/stall due to the overflow anyway.

If there is an intention to keep guest functional after the event then
maybe this patch is a way to go
  http://www.spinics.net/lists/kvm/msg68463.html
this way the clock will be resilient to this kind of error, like the
bare-metal one is.

--
Thanks,
 Igor


Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-15 Thread Xiao Guangrong
On 02/15/2012 05:47 PM, Avi Kivity wrote:

> On 02/15/2012 11:18 AM, Avi Kivity wrote:
>> On 02/14/2012 09:43 PM, Marcelo Tosatti wrote:
>>> Also it should not be necessary for these flushes to be inside mmu_lock
>>> in the EPT/NPT case (since there is no write protection there).
>>
>> We do write protect with TDP, if nested virt is active.  The question is
>> whether we have indirect pages or not, not whether TDP is active or not
>> (even without TDP, if you don't enable paging in the guest, you don't
>> have to write protect).
>>
>>> But it would
>>> be awkward to differentiate the unlock position based on EPT/NPT.
>>>
>>
>> I would really like to move the IPI back out of the lock.
>>
>> How about something like a sequence lock:
>>
>>
>> spin_lock(mmu_lock)
>> need_flush = write_protect_stuff();
>> atomic_add(kvm->want_flush_counter, need_flush);
>> spin_unlock(mmu_lock);
>>
>> while ((done = atomic_read(kvm->done_flush_counter)) < (want =
>> atomic_read(kvm->want_flush_counter))) {
>>   kvm_make_request(flush)
>>   atomic_cmpxchg(kvm->done_flush_counter, done, want)
>> }
>>
>> This (or maybe a corrected and optimized version) ensures that any
>> need_flush cannot pass the while () barrier, no matter which thread
>> encounters it first.  However it violates the "do not invent new locking
>> techniques" commandment.  Can we map it to some existing method?
> 
> There is no need to advance 'want' in the loop.  So we could do
> 
> /* must call with mmu_lock held */
> void kvm_mmu_defer_remote_flush(kvm, need_flush)
> {
>   if (need_flush)
> ++kvm->flush_counter.want;
> }
> 
> /* may call without mmu_lock */
> void kvm_mmu_commit_remote_flush(kvm)
> {
>   want = ACCESS_ONCE(kvm->flush_counter.want);
>   while ((done = atomic_read(kvm->flush_counter.done)) < want) {
> kvm_make_request(flush)
> atomic_cmpxchg(kvm->flush_counter.done, done, want)
>   }
> }
> 


Hmm, we already have kvm->tlbs_dirty, so we can do it like this:

#define SPTE_INVALID_UNCLEAN (1ULL << 63)

In the invalidate-page path:

lock mmu_lock
if (spte is invalid)
        kvm->tlbs_dirty |= SPTE_INVALID_UNCLEAN;
need_tlb_flush = kvm->tlbs_dirty;
unlock mmu_lock
if (need_tlb_flush)
        kvm_flush_remote_tlbs()

And in the page write-protect path:

lock mmu_lock
if (a spte was changed to readonly ||
    kvm->tlbs_dirty & SPTE_INVALID_UNCLEAN)
        kvm_flush_remote_tlbs()
unlock mmu_lock

How about this?



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Alexander Graf

On 15.02.2012, at 12:18, Avi Kivity wrote:

> On 02/07/2012 04:39 PM, Alexander Graf wrote:
>>> 
>>> Syscalls are orthogonal to that - they're to avoid the fget_light() and to 
>>> tighten the vcpu/thread and vm/process relationship.
>> 
>> How about keeping the ioctl interface but moving vcpu_run to a syscall then?
> 
> I dislike half-and-half interfaces even more.  And it's not like the
> fget_light() is really painful - it's just that I see it occasionally in
> perf top so it annoys me.
> 
>> That should really be the only thing that belongs into the fast path, right? 
>> Every time we do a register sync in user space, we do something wrong. 
>> Instead, user space should either
>> 
>>  a) have wrappers around register accesses, so it can directly ask for 
>> specific registers that it needs
>> or
>>  b) keep everything that would be requested by the register synchronization 
>> in shared memory
> 
> Always-synced shared memory is a liability, since newer hardware might
> introduce on-chip caches for that state, making synchronization
> expensive.  Or we may choose to keep some of the registers loaded, if we
> have a way to trap on their use from userspace - for example we can
> return to userspace with the guest fpu loaded, and trap if userspace
> tries to use it.
> 
> Is an extra syscall for copying TLB entries to user space prohibitively
> expensive?

The copying can be very expensive, yes. We want to have the possibility of 
exposing a very large TLB to the guest, on the order of several thousand 
entries. Every entry is a struct of 24 bytes.

> 
>>> 
 , keep the rest in user space.
> 
> 
> When a device is fully in the kernel, we have a good specification of the 
> ABI: it just implements the spec, and the ABI provides the interface from 
> the device to the rest of the world.  Partially accelerated devices means 
> a much greater effort in specifying exactly what it does.  It's also 
> vulnerable to changes in how the guest uses the device.
 
 Why? For the HPET timer register for example, we could have a simple MMIO 
 hook that says
 
  on_read:
return read_current_time() - shared_page.offset;
  on_write:
handle_in_user_space();
>>> 
>>> It works for the really simple cases, yes, but if the guest wants to set up 
>>> one-shot timers, it fails.  
>> 
>> I don't understand. Why would anything fail here? 
> 
> It fails to provide a benefit, I didn't mean it causes guest failures.
> 
> You also have to make sure the kernel part and the user part use exactly
> the same time bases.

Right. It's an optional performance accelerator. If anything doesn't align, 
don't use it. But if you happen to have a system where everything's cool, 
you're faster. Sounds like a good deal to me ;).

> 
>> Once the logic that's implemented by the kernel accelerator doesn't fit 
>> anymore, unregister it.
> 
> Yeah.
> 
>> 
>>> Also look at the PIT which latches on read.
>>> 
 
 For IDE, it would be as simple as
 
  register_pio_hook_ptr_r(PIO_IDE, SIZE_BYTE,&s->cmd[0]);
  for (i = 1; i<  7; i++) {
register_pio_hook_ptr_r(PIO_IDE + i, SIZE_BYTE,&s->cmd[i]);
register_pio_hook_ptr_w(PIO_IDE + i, SIZE_BYTE,&s->cmd[i]);
  }
 
 and we should have reduced overhead of IDE by quite a bit already. All the 
 other 2k LOC in hw/ide/core.c don't matter for us really.
>>> 
>>> 
>>> Just use virtio.
>> 
>> Just use xenbus. Seriously, this is not an answer.
> 
> Why not?  We invested effort in making it as fast as possible, and in
> writing the drivers.  IDE will never, ever, get anything close to virtio
> performance, even if we put all of it in the kernel.
> 
> However, after these examples, I'm more open to partial acceleration
> now.  I won't ever like it though.
> 
> 
>>   - VGA
>>   - IDE
> 
> Why?  There are perfectly good replacements for these (qxl, virtio-blk, 
> virtio-scsi).
 
 Because not every guest supports them. Virtio-blk needs 3rd party drivers. 
 AHCI needs 3rd party drivers on w2k3 and wxp. 
> 
> 3rd party drivers are a way of life for Windows users; and the
> incremental benefits of IDE acceleration are still far behind virtio.

The typical way of life for Windows users is all-included drivers. Which is 
the case for AHCI, where we're getting awesome performance for Vista and above 
guests. The IDE thing was just an idea for legacy ones.

It'd be great to simply try and see how fast we could get by handling a few 
special registers in kernel space vs heavyweight exiting to QEMU. If it's only 
10%, I wouldn't even bother with creating an interface for it. I'd bet the 
benefits are a lot bigger though.

And the main point was that specific partial device emulation buys us more than 
pseudo-generic accelerators like coalesced mmio, which are also only used by 1 
or 2 devices.

> 
>> I'm pretty sure non-Linux non-Windows systems won't get QXL drivers. 
> 
> Cirrus or ves

Re: AESNI and guest hosts

2012-02-15 Thread Ryan Brown
>>
>> I don't think it's supported to pass that functionality to the guest.
>>
>
> Why not?  Perhaps a new libvirt or qemu is needed.
>

Should we then add one of the following?


or..


something like that?

Host is using linux kernel 3.2.4 (Debian Wheezy) libvirt (0.9.8-2),
qemu (1.0+dfsg-2), Guest is on linux kernel Ubuntu/3.2.5


Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/15/2012 01:57 PM, Alexander Graf wrote:
> > 
> > Is an extra syscall for copying TLB entries to user space prohibitively
> > expensive?
>
> The copying can be very expensive, yes. We want to have the possibility of 
> exposing a very large TLB to the guest, on the order of several thousand 
> entries. Every entry is a struct of 24 bytes.

You don't need to copy the entire TLB, just the way that maps the
address you're interested in.

btw, why are you interested in virtual addresses in userspace at all?

> >>> 
> >>> It works for the really simple cases, yes, but if the guest wants to set 
> >>> up one-shot timers, it fails.  
> >> 
> >> I don't understand. Why would anything fail here? 
> > 
> > It fails to provide a benefit, I didn't mean it causes guest failures.
> > 
> > You also have to make sure the kernel part and the user part use exactly
> > the same time bases.
>
> Right. It's an optional performance accelerator. If anything doesn't align, 
> don't use it. But if you happen to have a system where everything's cool, 
> you're faster. Sounds like a good deal to me ;).

Depends on how much the alignment relies on guest knowledge.  I guess
with a simple device like HPET, it's simple, but with a complex device,
different guests (or different versions of the same guest) could drive
it very differently.

>  
>  Because not every guest supports them. Virtio-blk needs 3rd party 
>  drivers. AHCI needs 3rd party drivers on w2k3 and wxp. 
> > 
> > 3rd party drivers are a way of life for Windows users; and the
> > incremental benefits of IDE acceleration are still far behind virtio.
>
> The typical way of life for Windows users are all-included drivers. Which is 
> the case for AHCI, where we're getting awesome performance for Vista and 
> above guests. The IDE thing was just an idea for legacy ones.
>
> It'd be great to simply try and see how fast we could get by handling a few 
> special registers in kernel space vs heavyweight exiting to QEMU. If it's 
> only 10%, I wouldn't even bother with creating an interface for it. I'd bet 
> the benefits are a lot bigger though.
>
> And the main point was that specific partial device emulation buys us more 
> than pseudo-generic accelerators like coalesced mmio, which are also only 
> used by 1 or 2 devices.

Ok.

> > 
> >> I'm pretty sure non-Linux non-Windows systems won't get QXL drivers. 
> > 
> > Cirrus or vesa should be okay for them, I don't see what we could do for
> > them in the kernel, or why.
>
> That's my point. You need fast emulation of standard devices to get a good 
> baseline. Do PV on top, but keep the baseline as fast as is reasonable.
>
> > 
> >> Same for virtio.
>  
>  Please don't do the Xen mistake again of claiming that all we care about 
>  is Linux as a guest.
> >>> 
> >>> Rest easy, there's no chance of that.  But if a guest is important 
> >>> enough, virtio drivers will get written.  IDE has no chance in hell of 
> >>> approaching virtio-blk performance, no matter how much effort we put into 
> >>> it.
> >> 
> >> Ever used VMware? They basically get virtio-blk performance out of 
> >> ordinary IDE for linear workloads.
> > 
> > For linear loads, so should we, perhaps with greater CPU utilization.
> > 
> > If we DMA 64 kB at a time, then 128 MB/sec (to keep the numbers simple)
> > means 0.5 msec/transaction.  Spending 30 usec on some heavyweight exits
> > shouldn't matter.
>
> *shrug* last time I checked we were a lot slower. But maybe there's more 
> stuff making things slow than the exit path ;).

One thing that's different is that virtio offloads itself to a thread
very quickly, while IDE does a lot of work in vcpu thread context.
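The arithmetic quoted above can be sanity-checked quickly; a throwaway sketch using the thread's own round numbers (64 kB per DMA transaction, 128 MB/sec, 30 usec per heavyweight exit):

```c
#include <assert.h>

/* Microseconds per DMA transaction of `bytes` at `bytes_per_sec`.
 * 64 kB at 128 MB/sec works out to roughly 488 usec (~0.5 msec), so
 * a 30 usec heavyweight exit is about 6% overhead per transaction. */
static double txn_usec(double bytes, double bytes_per_sec)
{
        return bytes / bytes_per_sec * 1e6;
}
```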

> > 
> >>> 
>  KVM's strength has always been its close resemblance to hardware.
> >>> 
> >>> This will remain.  But we can't optimize everything.
> >> 
> >> That's my point. Let's optimize the hot paths and be good. As long as we 
> >> default to IDE for disk, we should have that be fast, no?
> > 
> > We should make sure that we don't default to IDE.  Qemu has no knowledge
> > of the guest, so it can't default to virtio, but higher level tools can
> > and should.
>
> You can only default to virtio on recent Linux. Windows, BSD, etc don't 
> include drivers, so you can't assume it working. You can default to AHCI for 
> basically any recent guest, but that still won't work for XP and the likes :(.

The all-knowing management tool can provide a virtio driver disk, or
even slip-stream the driver into the installation CD.


>  
> >> Ah, because you're on NPT and you can have MMIO hints in the nested page 
> >> table. Nifty. Yeah, we don't have that luxury :).
> > 
> > Well the real reason is we have an extra bit reported by page faults
> > that we can control.  Can't you set up a hashed pte that is configured
> > in a way that it will fault, no matter what type of access the guest
> > does, and see it in your page fault handler?
>
> I might be able to synthesize a PTE that is !readab

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/12/2012 09:10 AM, Takuya Yoshikawa wrote:
> Avi Kivity  wrote:
>
> > > >  Slot searching is quite fast since there's a small number of slots, 
> > > > and we sort the larger ones to be in the front, so positive lookups are 
> > > > fast.  We cache negative lookups in the shadow page tables (an spte can 
> > > > be either "not mapped", "mapped to RAM", or "not mapped and known to be 
> > > > mmio") so we rarely need to walk the entire list.
> > >
> > > Well, we don't always have shadow page tables. Having hints for unmapped 
> > > guest memory like this is pretty tricky.
> > > We're currently running into issues with device assignment though, where 
> > > we get a lot of small slots mapped to real hardware. I'm sure that will 
> > > hit us on x86 sooner or later too.
> > 
> > For x86 that's not a problem, since once you map a page, it stays mapped 
> > (on modern hardware).
> > 
>
> I was once thinking about how to search a slot reasonably fast for every case,
> even when we do not have mmio-spte cache.
>
> One possible way I thought up was to sort slots according to their base_gfn.
> Then the problem would become:  "find the first slot whose base_gfn + npages
> is greater than this gfn."
>
> Since we can do binary search, the search cost is O(log(# of slots)).
>
> But I guess that most of the time was wasted on reading many memslots just to
> know their base_gfn and npages.
>
> So the most practically effective thing is to make a separate array which 
> holds
> just their base_gfn.  This will make the task a simple, and cache friendly,
> search on an integer array:  probably faster than using *-tree data structure.

This assumes that there is equal probability for matching any slot.  But
that's not true, even if you have hundreds of slots, the probability is
much greater for the two main memory slots, or if you're playing with
the framebuffer, the framebuffer slot.  Everything else is loaded
quickly into shadow and forgotten.

> If needed, we should make cmp_memslot() architecture specific in the end?

We could, but why is it needed?  This logic holds for all architectures.
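As a concrete illustration of the search Takuya describes, a sketch in plain C — the slot layout here is hypothetical and far simpler than the real kvm_memory_slot:

```c
#include <assert.h>

/* Hypothetical, simplified slot: in the proposal the base_gfn values
 * would live in a separate sorted array for cache-friendly search. */
struct slot {
        unsigned long base_gfn;
        unsigned long npages;
};

/* Find the first slot (sorted by base_gfn) whose base_gfn + npages is
 * greater than gfn, then check it actually contains gfn.  Returns the
 * slot index, or -1 if gfn is unmapped.  O(log n) binary search. */
static int find_slot(const struct slot *slots, int n, unsigned long gfn)
{
        int lo = 0, hi = n;     /* search the half-open range [lo, hi) */

        while (lo < hi) {
                int mid = lo + (hi - lo) / 2;

                if (slots[mid].base_gfn + slots[mid].npages <= gfn)
                        lo = mid + 1;   /* slot ends at or before gfn */
                else
                        hi = mid;
        }
        if (lo == n || slots[lo].base_gfn > gfn)
                return -1;              /* in a hole or past the end */
        return lo;
}
```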

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 05:23 PM, Anthony Liguori wrote:
> On 02/07/2012 07:40 AM, Alexander Graf wrote:
>>
>> Why? For the HPET timer register for example, we could have a simple
>> MMIO hook that says
>>
>>on_read:
>>  return read_current_time() - shared_page.offset;
>>on_write:
>>  handle_in_user_space();
>>
>> For IDE, it would be as simple as
>>
>>register_pio_hook_ptr_r(PIO_IDE, SIZE_BYTE,&s->cmd[0]);
>>for (i = 1; i<  7; i++) {
>>  register_pio_hook_ptr_r(PIO_IDE + i, SIZE_BYTE,&s->cmd[i]);
>>  register_pio_hook_ptr_w(PIO_IDE + i, SIZE_BYTE,&s->cmd[i]);
>>}
>
> You can't easily serialize updates to that address with the kernel
> since two threads are likely going to be accessing it at the same
> time.  That either means an expensive sync operation or a reliance on
> atomic instructions.
>
> But not all architectures offer non-word sized atomic instructions so
> it gets fairly nasty in practice.
>

I doubt that any guest accesses IDE registers from two threads in
parallel.  The guest will have some lock, so we could have a lock as
well and be assured that there will never be contention.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Alexander Graf

On 15.02.2012, at 14:29, Avi Kivity wrote:

> On 02/15/2012 01:57 PM, Alexander Graf wrote:
>>> 
>>> Is an extra syscall for copying TLB entries to user space prohibitively
>>> expensive?
>> 
>> The copying can be very expensive, yes. We want to have the possibility of 
>> exposing a very large TLB to the guest, in the order of multiple kentries. 
>> Every entry is a struct of 24 bytes.
> 
> You don't need to copy the entire TLB, just the way that maps the
> address you're interested in.

Yeah, unless we do migration in which case we need to introduce another special 
case to fetch the whole thing :(.

> btw, why are you interested in virtual addresses in userspace at all?

We need them for gdb and monitor introspection.

> 
> 
> It works for the really simple cases, yes, but if the guest wants to set 
> up one-shot timers, it fails.  
 
 I don't understand. Why would anything fail here? 
>>> 
>>> It fails to provide a benefit, I didn't mean it causes guest failures.
>>> 
>>> You also have to make sure the kernel part and the user part use exactly
>>> the same time bases.
>> 
>> Right. It's an optional performance accelerator. If anything doesn't align, 
>> don't use it. But if you happen to have a system where everything's cool, 
>> you're faster. Sounds like a good deal to me ;).
> 
> Depends on how much the alignment relies on guest knowledge.  I guess
> with a simple device like HPET, it's simple, but with a complex device,
> different guests (or different versions of the same guest) could drive
> it very differently.

Right. But accelerating simple devices > not accelerating any devices. No? :)

> 
>> 
>> Because not every guest supports them. Virtio-blk needs 3rd party 
>> drivers. AHCI needs 3rd party drivers on w2k3 and wxp. 
>>> 
>>> 3rd party drivers are a way of life for Windows users; and the
>>> incremental benefits of IDE acceleration are still far behind virtio.
>> 
>> The typical way of life for Windows users are all-included drivers. Which is 
>> the case for AHCI, where we're getting awesome performance for Vista and 
> > above guests. The IDE thing was just an idea for legacy ones.
>> 
>> It'd be great to simply try and see how fast we could get by handling a few 
>> special registers in kernel space vs heavyweight exiting to QEMU. If it's 
>> only 10%, I wouldn't even bother with creating an interface for it. I'd bet 
>> the benefits are a lot bigger though.
>> 
>> And the main point was that specific partial device emulation buys us more 
>> than pseudo-generic accelerators like coalesced mmio, which are also only 
>> used by 1 or 2 devices.
> 
> Ok.
> 
>>> 
 I'm pretty sure non-Linux non-Windows systems won't get QXL drivers. 
>>> 
>>> Cirrus or vesa should be okay for them, I don't see what we could do for
>>> them in the kernel, or why.
>> 
>> That's my point. You need fast emulation of standard devices to get a good 
>> baseline. Do PV on top, but keep the baseline as fast as is reasonable.
>> 
>>> 
 Same for virtio.
>> 
>> Please don't do the Xen mistake again of claiming that all we care about 
>> is Linux as a guest.
> 
> Rest easy, there's no chance of that.  But if a guest is important 
> enough, virtio drivers will get written.  IDE has no chance in hell of 
> approaching virtio-blk performance, no matter how much effort we put into 
> it.
 
 Ever used VMware? They basically get virtio-blk performance out of 
 ordinary IDE for linear workloads.
>>> 
>>> For linear loads, so should we, perhaps with greater CPU utilization.
>>> 
>>> If we DMA 64 kB at a time, then 128 MB/sec (to keep the numbers simple)
>>> means 0.5 msec/transaction.  Spending 30 usec on some heavyweight exits
>>> shouldn't matter.
>> 
>> *shrug* last time I checked we were a lot slower. But maybe there's more 
>> stuff making things slow than the exit path ;).
> 
> One thing that's different is that virtio offloads itself to a thread
> very quickly, while IDE does a lot of work in vcpu thread context.

So it's all about latencies again, which could be reduced at least a fair bit 
with the scheme I described above. But really, this needs to be prototyped and 
benchmarked to actually give us data on how fast it would get us.

> 
>>> 
> 
>> KVM's strength has always been its close resemblance to hardware.
> 
> This will remain.  But we can't optimize everything.
 
 That's my point. Let's optimize the hot paths and be good. As long as we 
 default to IDE for disk, we should have that be fast, no?
>>> 
>>> We should make sure that we don't default to IDE.  Qemu has no knowledge
>>> of the guest, so it can't default to virtio, but higher level tools can
>>> and should.
>> 
>> You can only default to virtio on recent Linux. Windows, BSD, etc don't 
>> include drivers, so you can't assume it working. You can default to AHCI for 
>> basically any recent guest, but that still won't work for XP and the likes 

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 08:12 PM, Rusty Russell wrote:
> > I would really love to have this, but the problem is that we'd need a
> > general purpose bytecode VM with binding to some kernel APIs.  The
> > bytecode VM, if made general enough to host more complicated devices,
> > would likely be much larger than the actual code we have in the kernel now.
>
> We have the ability to upload bytecode into the kernel already.  It's in
> a great bytecode interpreted by the CPU itself.

Unfortunately it's inflexible (has to come with the kernel) and open to
security vulnerabilities.

> If every user were emulating different machines, LPF this would make
> sense.  Are they?  

They aren't.

> Or should we write those helpers once, in C, and
> provide that for them.

There are many of them: PIT/PIC/IOAPIC/MSIX tables/HPET/kvmclock/Hyper-V
stuff/vhost-net/DMA remapping/IO remapping (just for x86), and some of
them are quite complicated.  However implementing them in bytecode
amounts to exposing a stable kernel ABI, since they use such a vast
range of kernel services.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 06:29 PM, Jan Kiszka wrote:
> >>>
> >>
> >> Isn't there another level in between just scheduling and full syscall
> >> return if the user return notifier has some real work to do?
> > 
> > Depends on whether you're scheduling a kthread or a userspace process, no?  
> > If 
>
> Kthreads can't return, of course. User space threads /may/ do so. And
> then there needs to be a differences between host and guest in the
> tracked MSRs. 

Right.  Until we randomize kernel virtual addresses (what happened to
that?) and then there will always be a difference, even if you run the
same kernel in the host and guest.

> I think to recall it's a question of another few hundred
> cycles.

Right.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 06:19 PM, Anthony Liguori wrote:
>> Ah. But then ioeventfd has that as well, unless the other end is in
>> the kernel too.
>
>
> Yes, that was my point exactly :-)
>
> ioeventfd/mmio-over-socketpair to adifferent thread is not faster than
> a synchronous KVM_RUN + writing to an eventfd in userspace modulo a
> couple of cheap syscalls.
>
> The exception is when the other end is in the kernel and there is
> magic optimizations (like there is today with ioeventfd).

vhost seems to schedule a workqueue item unconditionally.

irqfd does have magic optimizations to avoid an extra schedule.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/15/2012 03:37 PM, Alexander Graf wrote:
> On 15.02.2012, at 14:29, Avi Kivity wrote:
>
> > On 02/15/2012 01:57 PM, Alexander Graf wrote:
> >>> 
> >>> Is an extra syscall for copying TLB entries to user space prohibitively
> >>> expensive?
> >> 
> >> The copying can be very expensive, yes. We want to have the possibility of 
> >> exposing a very large TLB to the guest, in the order of multiple kentries. 
> >> Every entry is a struct of 24 bytes.
> > 
> > You don't need to copy the entire TLB, just the way that maps the
> > address you're interested in.
>
> Yeah, unless we do migration in which case we need to introduce another 
> special case to fetch the whole thing :(.

Well, the scatter/gather registers I proposed will give you just one
register or all of them.

> > btw, why are you interested in virtual addresses in userspace at all?
>
> We need them for gdb and monitor introspection.

Hardly fast paths that justify shared memory.  I should be much harder
on you.

> >> 
> >> Right. It's an optional performance accelerator. If anything doesn't 
> >> align, don't use it. But if you happen to have a system where everything's 
> >> cool, you're faster. Sounds like a good deal to me ;).
> > 
> > Depends on how much the alignment relies on guest knowledge.  I guess
> > with a simple device like HPET, it's simple, but with a complex device,
> > different guests (or different versions of the same guest) could drive
> > it very differently.
>
> Right. But accelerating simple devices > not accelerating any devices. No? :)

Yes.  But introducing bugs and vulns < not introducing them.  It's a
tradeoff.  Even an unexploited vulnerability can be a lot more pain,
just because you need to update your entire cluster, than a simple
device that is accelerated for a guest which has maybe 3% utilization. 
Performance is just one parameter we optimize for.  It's easy to overdo
it because it's an easily measurable and sexy parameter, but it's a mistake.

> > 
> > One thing that's different is that virtio offloads itself to a thread
> > very quickly, while IDE does a lot of work in vcpu thread context.
>
> So it's all about latencies again, which could be reduced at least a fair bit 
> with the scheme I described above. But really, this needs to be prototyped 
> and benchmarked to actually give us data on how fast it would get us.

Simply making qemu issue the request from a thread would be way better. 
Something like socketpair mmio, configured for not waiting for the
writes to be seen (posted writes) will also help by buffering writes in
the socket buffer.

> > 
> > The all-knowing management tool can provide a virtio driver disk, or
> > even slip-stream the driver into the installation CD.
>
> One management tool might do that, another one might not. We can't assume 
> that all management tools are all-knowing. Some times you also want to run 
> guest OSs that the management tool doesn't know (yet).

That is true, but we have to leave some work for the management guys.

>  
> >> So for MMIO reads, I can assume that this is an MMIO because I would never 
> >> write a non-readable entry. For writes, I'm overloading the bit that also 
> >> means "guest entry is not readable" so there I'd have to walk the guest 
> >> PTEs/TLBs and check if I find a read-only entry. Right now I can just 
> >> forward write faults to the guest. Since COW is probably a hotter path for 
> >> the guest than MMIO, this might end up being ineffective.
> > 
> > COWs usually happen from guest userspace, while mmio is usually from the
> > guest kernel, so you can switch on that, maybe.
>
> Hrm, nice idea. That might fall apart with user space drivers that we might 
> eventually have once vfio turns out to work well, but for the time being it's 
> a nice hack :).

Or nested virt...



-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] BUG in pv_clock when overflow condition is detected

2012-02-15 Thread Avi Kivity
On 02/15/2012 01:23 PM, Igor Mammedov wrote:
>>>   static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time
>>> *shadow)
>>>   {
>>> -u64 delta = native_read_tsc() - shadow->tsc_timestamp;
>>> +u64 delta;
>>> +u64 tsc = native_read_tsc();
>>> +BUG_ON(tsc<  shadow->tsc_timestamp);
>>> +delta = tsc - shadow->tsc_timestamp;
>>>   return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
>>>  shadow->tsc_shift);
>>
>> Maybe a WARN_ON_ONCE()?  Otherwise a relatively minor hypervisor bug can
>> kill the guest.
>
>
> An attempt to print from this place is not ideal, since it often leads
> to recursive calls into this very function and hangs there anyway.
> But if you insist I'll re-post it with WARN_ON_ONCE;
> it won't make much difference, because the guest will hang/stall due to
> overflow anyway.

Won't a BUG_ON() also result in a printk?

>
> If there is an intention to keep guest functional after the event then
> maybe this patch is a way to go
>   http://www.spinics.net/lists/kvm/msg68463.html
> this way clock will be re-silent to this kind of errors, like bare-metal
> one is.

It's the same patch... do you mean something that detects the overflow
and uses the last value?
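The "use the last value" idea amounts to never letting the delta go backwards. A minimal userspace sketch of just the clamping step (names are illustrative, not the actual pvclock code):

```c
#include <assert.h>
#include <stdint.h>

/* If the TSC reads behind the hypervisor's last update timestamp,
 * return 0 instead of letting the unsigned subtraction wrap around
 * into a huge bogus delta that would stall the guest clock. */
static uint64_t safe_tsc_delta(uint64_t tsc, uint64_t timestamp)
{
        return tsc >= timestamp ? tsc - timestamp : 0;
}
```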

-- 
error compiling committee.c: too many arguments to function



Re: AESNI and guest hosts

2012-02-15 Thread Avi Kivity
On 02/15/2012 02:02 PM, Ryan Brown wrote:
> >>
> >> I don't think it's supported to pass that functionality to the guest.
> >>
> >
> > Why not?  Perhaps a new libvirt or qemu is needed.
> >
>
> Should it be the case to add one of the following?
>
> 
> or..
> 
>
> something like that?

The qemu name is aes.  Don't know about libvirt, suggest you start with
bare qemu first.



-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-15 Thread Avi Kivity
On 02/15/2012 01:37 PM, Xiao Guangrong wrote:
> >>
> >> I would really like to move the IPI back out of the lock.
> >>
> >> How about something like a sequence lock:
> >>
> >>
> >> spin_lock(mmu_lock)
> >> need_flush = write_protect_stuff();
> >> atomic_add(kvm->want_flush_counter, need_flush);
> >> spin_unlock(mmu_lock);
> >>
> >> while ((done = atomic_read(kvm->done_flush_counter)) < (want =
> >> atomic_read(kvm->want_flush_counter)) {
> >>   kvm_make_request(flush)
> >>   atomic_cmpxchg(kvm->done_flush_counter, done, want)
> >> }
> >>
> >> This (or maybe a corrected and optimized version) ensures that any
> >> need_flush cannot pass the while () barrier, no matter which thread
> >> encounters it first.  However it violates the "do not invent new locking
> >> techniques" commandment.  Can we map it to some existing method?
> > 
> > There is no need to advance 'want' in the loop.  So we could do
> > 
> > /* must call with mmu_lock held */
> > void kvm_mmu_defer_remote_flush(kvm, need_flush)
> > {
> >   if (need_flush)
> > ++kvm->flush_counter.want;
> > }
> > 
> > /* may call without mmu_lock */
> > void kvm_mmu_commit_remote_flush(kvm)
> > {
> >   want = ACCESS_ONCE(kvm->flush_counter.want)
> >   while ((done = atomic_read(kvm->flush_counter.done) < want) {
> > kvm_make_request(flush)
> > atomic_cmpxchg(kvm->flush_counter.done, done, want)
> >   }
> > }
> > 
>
>
> Hmm, we already have kvm->tlbs_dirty, so, we can do it like this:
>
> #define SPTE_INVALID_UNCLEAN (1 << 63 )
>
> in invalid page path:
> lock mmu_lock
> if (spte is invalid)
>   kvm->tlbs_dirty |= SPTE_INVALID_UNCLEAN;
> need_tlb_flush = kvm->tlbs_dirty;
> unlock mmu_lock
> if (need_tlb_flush)
>   kvm_flush_remote_tlbs()
>
> And in page write-protected path:
> lock mmu_lock
>   if (it has spte change to readonly |
> kvm->tlbs_dirty & SPTE_INVALID_UNCLEAN)
>   kvm_flush_remote_tlbs()
> unlock mmu_lock
>
> How about this?

Well, it still has flushes inside the lock.  And it seems to be more
complicated, but maybe that's because I thought of my idea and didn't
fully grok yours yet.
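For comparison, the defer/commit counter scheme quoted above can be modeled in userspace with C11 atomics. This is a sketch only: make_flush_request() is a stub for kvm_make_request(), which in the real code would kick vcpus out of guest mode, and all names here are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct flush_ctr {
        atomic_uint want;       /* flushes requested under mmu_lock */
        atomic_uint done;       /* flushes known to have completed  */
};

/* Stand-in for kvm_make_request(KVM_REQ_TLB_FLUSH, ...). */
static void make_flush_request(struct flush_ctr *c)
{
        (void)c;
}

/* must be called with mmu_lock held */
static void defer_remote_flush(struct flush_ctr *c, bool need_flush)
{
        if (need_flush)
                atomic_fetch_add(&c->want, 1);
}

/* may be called without mmu_lock; no need_flush marker can pass this
 * barrier, whichever thread reaches it first */
static void commit_remote_flush(struct flush_ctr *c)
{
        unsigned int want = atomic_load(&c->want);
        unsigned int done;

        while ((done = atomic_load(&c->done)) < want) {
                make_flush_request(c);
                /* one committer advances done; losers just re-check */
                atomic_compare_exchange_weak(&c->done, &done, want);
        }
}
```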

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Alexander Graf

On 15.02.2012, at 14:57, Avi Kivity wrote:

> On 02/15/2012 03:37 PM, Alexander Graf wrote:
>> On 15.02.2012, at 14:29, Avi Kivity wrote:
>> 
>>> On 02/15/2012 01:57 PM, Alexander Graf wrote:
> 
> Is an extra syscall for copying TLB entries to user space prohibitively
> expensive?
 
 The copying can be very expensive, yes. We want to have the possibility of 
 exposing a very large TLB to the guest, in the order of multiple kentries. 
 Every entry is a struct of 24 bytes.
>>> 
>>> You don't need to copy the entire TLB, just the way that maps the
>>> address you're interested in.
>> 
>> Yeah, unless we do migration in which case we need to introduce another 
>> special case to fetch the whole thing :(.
> 
> Well, the scatter/gather registers I proposed will give you just one
> register or all of them.

One register is hardly any use. We either need all ways of a respective address 
to do a full-fledged lookup or all of them. By sharing the same data structures 
between qemu and kvm, we actually managed to reuse all of the tcg code for 
lookups, just like you do for x86. On x86 you also have shared memory for page 
tables, it's just guest visible, hence in guest memory. The concept is the same.

> 
>>> btw, why are you interested in virtual addresses in userspace at all?
>> 
>> We need them for gdb and monitor introspection.
> 
> Hardly fast paths that justify shared memory.  I should be much harder
> on you.

It was a tradeoff on speed and complexity. This way we have the least amount of 
complexity IMHO. All KVM code paths just magically fit in with the TCG code. 
There are essentially no if(kvm_enabled)'s in our MMU walking code, because the 
tables are just there. Makes everything a lot easier (without dragging down 
performance).

> 
 
 Right. It's an optional performance accelerator. If anything doesn't 
 align, don't use it. But if you happen to have a system where everything's 
 cool, you're faster. Sounds like a good deal to me ;).
>>> 
>>> Depends on how much the alignment relies on guest knowledge.  I guess
>>> with a simple device like HPET, it's simple, but with a complex device,
>>> different guests (or different versions of the same guest) could drive
>>> it very differently.
>> 
>> Right. But accelerating simple devices > not accelerating any devices. No? :)
> 
> Yes.  But introducing bugs and vulns < not introducing them.  It's a
> tradeoff.  Even an unexploited vulnerability can be a lot more pain,
> just because you need to update your entire cluster, than a simple
> device that is accelerated for a guest which has maybe 3% utilization. 
> Performance is just one parameter we optimize for.  It's easy to overdo
> it because it's an easily measurable and sexy parameter, but it's a mistake.

Yeah, I agree. That's why I was trying to get AHCI to the default storage 
adapter for a while, because I think the same. However, Anthony believes that 
XP/w2k3 is still a major chunk of the guests running on QEMU, so we can't do 
that :(.

I'm mostly trying to think of ways to accelerate the obvious low hanging 
fruits, without overengineering any interfaces.

> 
>>> 
>>> One thing that's different is that virtio offloads itself to a thread
>>> very quickly, while IDE does a lot of work in vcpu thread context.
>> 
>> So it's all about latencies again, which could be reduced at least a fair 
>> bit with the scheme I described above. But really, this needs to be 
>> prototyped and benchmarked to actually give us data on how fast it would get 
>> us.
> 
> Simply making qemu issue the request from a thread would be way better. 
> Something like socketpair mmio, configured for not waiting for the
> writes to be seen (posted writes) will also help by buffering writes in
> the socket buffer.

Yup, nice idea. That only works when all parts of a device are actually 
implemented through the same socket though. Otherwise you could run out of 
order. So if you have a PCI device with a PIO and an MMIO BAR region, they 
would both have to be handled through the same socket.

> 
>>> 
>>> The all-knowing management tool can provide a virtio driver disk, or
>>> even slip-stream the driver into the installation CD.
>> 
>> One management tool might do that, another one might not. We can't assume 
>> that all management tools are all-knowing. Some times you also want to run 
>> guest OSs that the management tool doesn't know (yet).
> 
> That is true, but we have to leave some work for the management guys.

The easier the management stack is, the happier I am ;).

> 
>> 
 So for MMIO reads, I can assume that this is an MMIO because I would never 
 write a non-readable entry. For writes, I'm overloading the bit that also 
 means "guest entry is not readable" so there I'd have to walk the guest 
 PTEs/TLBs and check if I find a read-only entry. Right now I can just 
 forward write faults to the guest. Since COW is probably a hotter path for 
 the guest than M

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-15 Thread Jamal Hadi Salim
On Tue, 2012-02-14 at 10:57 -0800, John Fastabend wrote:

> Roopa was likely on the right track here,
> 
> http://patchwork.ozlabs.org/patch/123064/

Doesn't seem related to the bridging stuff - the modeling looks
reasonable however.

> But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX
> netlink messages. And if possible drive this without extending ndo_ops.
> 
> An ideal user space interaction IMHO would look like,
> 
> [root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
> [root@jf-dev1-dcblab iproute2]# ./br/br fdb
> port    mac addr            flags
> veth2   36:a6:35:9b:96:c4   local
> veth4   aa:54:b0:7b:42:ef   local
> veth0   2a:e8:5c:95:6c:1b   local
> veth6   6e:26:d5:43:a3:36   local
> veth0   f2:c1:39:76:6a:fb
> veth8   4e:35:16:af:87:13   local
> veth10  52:e5:62:7b:57:88   static
> veth10  aa:a9:35:21:15:c4   local

Looks nice; where is the targeted bridge (e.g. br0) in that syntax?

> Using Stephen's br tool. First command adds FDB entry to SW bridge and
> if the same tool could be used to add entries to embedded bridge I think
> that would be the best case. 

That would be nice (although it adds a dependency on the presence of the
s/ware bridge). It would be nicer to have a knob in the kernel to
say "synchronize with h/w bridge foo" which can be turned off.  

> So no RTNETLINK error on the second cmd. Then
> embedded FDB entries could be dumped this way also so I get a complete view
> of my FDB setup across multiple sw bridges and embedded bridges.

So if you had multiple h/ware bridges - which one is tied to br0? 


> Yes. The hardware has a bit to support this which is currently not exposed
> to user space. That's a case where we have 'yet another knob' that needs
> a clean solution. This causes real bugs today when users try to use the
> macvlan devices in VEPA mode on top of SR-IOV. By the way these modes are
> all part of the 802.1Qbg spec which people actually want to use with Linux
> so a good clean solution is probably needed.


I think the knobs to "flood" and "learn" are important. The hardware
seems to have the "flood" but not the "learn/discover". I think the
s/ware bridge needs to have both. At the moment - as pointed out in that
*NEIGH* notification, s/w bridge assumes a policy that could be
considered a security flaw in some circles - just because you are my
neighbor does not mean i trust you to come into my house; i may trust
you partially and allow you only to come through the front door. Even in
Canada with a default policy of not locking your door we sometimes lock
our doors ;->


> I have no problem with drawing the line here and trying to implement something
> over PF_BRIDGE:RTM_xxx nlmsgs. 


My comment/concern was in regard to the bridge built-in policy of
reading from the neighbor updates (refer to above comments)

cheers,
jamal


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Correct location for bug report: KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread madengineer10
I'm not sure if this bug is located in userspace or in the kernel.
Could you let me know where to file it?

Bug:
Attempting to boot a 32 bit Debian guest with a Xenomai kernel inside
KVM causes it to hang and spin (using 1 full CPU core) after loading
the initrd, as determined by serial console output. The only error
message is "KVM internal error. Suberror: 1"/"emulation failure".
Booting a regular Debian kernel succeeds, as does running the Xenomai
kernel with software emulation (-no-kvm).

Info:
CPU: Intel Core i7-2670QM
Emulator: qemu-kvm 0.14.1
Host kernel: 3.0.0-15 (Ubuntu build), x86_64
Guest OS: Debian Squeeze, kernel.org 2.6.37 kernel with Xenomai 2.6.0
(config attached)
Qemu command: kvm -M pc-0.14 -enable-kvm -m 1024 -drive
file=/var/lib/libvirt/images/eve.img,if=none,id=drive-ide0-0-0,format=raw
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0
-netdev tap,fd=21,id=hostnet0 -device
e1000,netdev=hostnet0,id=net0,mac=52:54:00:b5:f4:00,bus=pci.0,addr=0x3
-chardev stdio,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -usb -device
usb-tablet,id=input0 -vga cirrus
Effects of flags: Adding one or both of --no-kvm-irqchip or
--no-kvm-pit has no apparent effect. Adding --no-kvm appears to
correct the problem.

Trace will be attached to the final bug submission.

Thanks,
    --Doug Brunner


Re: Correct location for bug report: KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread Avi Kivity
On 02/15/2012 06:40 PM, madengineer10 wrote:
> I'm not sure if this bug is located in userspace or in the kernel.
> Could you let me know where to file it?
>
> Bug:
> Attempting to boot a 32 bit Debian guest with a Xenomai kernel inside
> KVM causes it to hang and spin (using 1 full CPU core) after loading
> the initrd, as determined by serial console output. The only error
> message is "KVM internal error. Suberror: 1"/"emulation failure".
> Booting a regular Debian kernel succeeds, as does running the Xenomai
> kernel with software emulation (-no-kvm).
>
>

Please issue the following commands on the qemu monitor:

  (qemu) info registers
  (qemu) x/30i $eip

and report.

-- 
error compiling committee.c: too many arguments to function



Re: Correct location for bug report: KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread Avi Kivity
On 02/15/2012 06:46 PM, Avi Kivity wrote:
> On 02/15/2012 06:40 PM, madengineer10 wrote:
> > I'm not sure if this bug is located in userspace or in the kernel.
> > Could you let me know where to file it?
> >
> > Bug:
> > Attempting to boot a 32 bit Debian guest with a Xenomai kernel inside
> > KVM causes it to hang and spin (using 1 full CPU core) after loading
> > the initrd, as determined by serial console output. The only error
> > message is "KVM internal error. Suberror: 1"/"emulation failure".
> > Booting a regular Debian kernel succeeds, as does running the Xenomai
> > kernel with software emulation (-no-kvm).
> >
> >
>
> Please issue the following commands on the qemu monitor:
>
>   (qemu) info registers
>   (qemu) x/30i $eip
>
> and report.
>

Oh, and wrt your original question, it's likely a kvm bug, please report
in bugzilla.kernel.org.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] BUG in pv_clock when overflow condition is detected

2012-02-15 Thread Igor Mammedov


- Original Message -
> From: "Avi Kivity" 
> To: "Igor Mammedov" 
> Cc: linux-ker...@vger.kernel.org, kvm@vger.kernel.org, t...@linutronix.de, 
> mi...@redhat.com, h...@zytor.com,
> r...@redhat.com, "amit shah" , mtosa...@redhat.com
> Sent: Wednesday, February 15, 2012 3:02:04 PM
> Subject: Re: [PATCH] BUG in pv_clock when overflow condition is detected
> 
> On 02/15/2012 01:23 PM, Igor Mammedov wrote:
> >>>   static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time
> >>> *shadow)
> >>>   {
> >>> -u64 delta = native_read_tsc() - shadow->tsc_timestamp;
> >>> +u64 delta;
> >>> +u64 tsc = native_read_tsc();
> >>> +BUG_ON(tsc < shadow->tsc_timestamp);
> >>> +delta = tsc - shadow->tsc_timestamp;
> >>>   return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
> >>>  shadow->tsc_shift);
> >>
> >> Maybe a WARN_ON_ONCE()?  Otherwise a relatively minor hypervisor
> >> bug can
> >> kill the guest.
> >
> >
> > An attempt to print from this place is not perfect since it often
> > leads
> > to recursive calling to this very function and it hang there
> > anyway.
> > But if you insist I'll re-post it with WARN_ON_ONCE,
> > It won't make much difference because guest will hang/stall due
> > overflow
> > anyway.
> 
> Won't a BUG_ON() also result in a printk?
Yes, it will. But the stack will still record the failure point, and
poking at the core with crash/gdb will always show where it BUGged.

In case it manages to print a dump somehow (I saw that a couple of
times out of ~30 test cycles), logs from the console or from the
kernel message buffer (again, poking with gdb) will show where it was
called from.

If WARN* is used, it will still totally screw up the clock and the
"last value", and the system will become unusable, requiring a look at
the core with gdb/crash anyway.

So I've just used a more stable failure point that leaves a trace
everywhere it can (maybe in the console log, but for sure on the
stack). With WARN it might leave a trace on the console or not, and
probably won't reflect the failure point in the stack either, leaving
only the kernel message buffer for a clue.

> 
> >
> > If there is an intention to keep guest functional after the event
> > then
> > maybe this patch is a way to go
> >   http://www.spinics.net/lists/kvm/msg68463.html
> > this way clock will be re-silent to this kind of errors, like
> > bare-metal
> > one is.
> 
> It's the same patch... do you mean something that detects the
> overflow
> and uses the last value?
I'm sorry, I pasted the wrong link. Here is the right one:
  "pvclock: Make pv_clock more robust and fixup it if overflow happens"
  http://www.spinics.net/lists/kvm/msg68440.html
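For context, the clamping approach that linked patch takes (use a zero delta instead of letting the subtraction wrap) can be sketched as a small userspace model. The function names and the reduced scale routine below are illustrative, not the kernel's exact code:

```c
#include <assert.h>
#include <stdint.h>

/* Reduced model of pvclock's scaling: (delta << shift) * mul_frac >> 32,
 * where a negative shift means shift right. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}

/* Sketch of the "robust" variant: if the TSC appears to have gone
 * backwards relative to the hypervisor's snapshot, treat the delta as
 * zero rather than letting the unsigned subtraction wrap to a huge
 * value (which is what stalls the guest clock). */
static uint64_t get_nsec_offset(uint64_t tsc, uint64_t tsc_timestamp,
                                uint32_t mul_frac, int shift)
{
    uint64_t delta = (tsc >= tsc_timestamp) ? tsc - tsc_timestamp : 0;
    return scale_delta(delta, mul_frac, shift);
}
```

The trade-off discussed above is exactly here: BUG_ON/WARN_ON would fire at the comparison, while the fixup variant silently clamps and keeps the guest running.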

> 
> --
> error compiling committee.c: too many arguments to function
> 


Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-15 Thread Andrea Arcangeli
On Wed, Feb 15, 2012 at 04:07:49PM +0200, Avi Kivity wrote:
> Well, it still has flushes inside the lock.  And it seems to be more
> complicated, but maybe that's because I thought of my idea and didn't
> fully grok yours yet.

If we go more complicated I prefer Avi's suggestion to move them all
outside the lock.

Yesterday I was also thinking at the regular pagetables and how we do
not have similar issues there. On the regular pagetables we just do
unconditional flush in fork when we make it readonly and KSM (the
other place that ptes stuff readonly that later can cow) uses
ptep_clear_flush which does an unconditional flush and furthermore it
does it inside the PT lock, so generally we don't optimize for those
things on the regular pagetables. But then these events don't happen
as frequently as they can on KVM without EPT/NPT.
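The want/done counter scheme Avi proposed earlier in the thread can be modelled in userspace with C11 atomics. Everything below is an illustrative stand-in (single-threaded sketch; the real counters would hang off struct kvm, and the request would be kvm_make_request plus a kick):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for the two counters Avi proposes adding to struct kvm. */
static atomic_ulong want_flush_counter;
static atomic_ulong done_flush_counter;

/* Counts simulated kvm_make_request(KVM_REQ_TLB_FLUSH) calls. */
static unsigned long flush_requests;

/* Called under mmu_lock: record that a flush is owed, but only if any
 * sptes were actually write-protected. */
static void note_flush_needed(bool need_flush)
{
    atomic_fetch_add(&want_flush_counter, need_flush ? 1UL : 0UL);
}

/* Called after dropping mmu_lock: loop until every flush that was
 * wanted when we looked has been published as done.  The cmpxchg may
 * fail if another thread advanced 'done' first; we just re-check, so
 * no need_flush can sneak past this barrier. */
static void wait_for_flush(void)
{
    unsigned long done, want;

    while ((done = atomic_load(&done_flush_counter)) <
           (want = atomic_load(&want_flush_counter))) {
        flush_requests++;   /* stands in for kvm_make_request(flush) */
        atomic_compare_exchange_strong(&done_flush_counter, &done, want);
    }
}
```

In the single-threaded case the first cmpxchg succeeds and one request covers all pending flushes; under contention the loop simply retries with fresh counter values.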


Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Scott Wood
On 02/15/2012 05:57 AM, Alexander Graf wrote:
> 
> On 15.02.2012, at 12:18, Avi Kivity wrote:
> 
>> Well the real reason is we have an extra bit reported by page faults
>> that we can control.  Can't you set up a hashed pte that is configured
>> in a way that it will fault, no matter what type of access the guest
>> does, and see it in your page fault handler?
> 
> I might be able to synthesize a PTE that is !readable and might throw
> a permission exception instead of a miss exception. I might be able
> to synthesize something similar for booke. I don't however get any
> indication on why things failed.

On booke with ISA 2.06 hypervisor extensions, there's MAS8[VF] that will
trigger a DSI that gets sent to the hypervisor even if normal DSIs go
directly to the guest.  You'll still need to zero out the execute
permission bits.

For other booke, you could use one of the user bits in MAS3 (along with
zeroing out all the permission bits), which you could get to by doing a
tlbsx.

-Scott



Re: [RFC PATCH 14/16] KVM: PPC: booke: category E.HV (GS-mode) support

2012-02-15 Thread Alexander Graf

On 10.01.2012, at 01:51, Scott Wood wrote:

> On 01/09/2012 11:46 AM, Alexander Graf wrote:
>> 
>> On 21.12.2011, at 02:34, Scott Wood wrote:
> 

[...]

>>> Current issues include:
>>> - Machine checks from guest state are not routed to the host handler.
>>> - The guest can cause a host oops by executing an emulated instruction
>>>  in a page that lacks read permission.  Existing e500/4xx support has
>>>  the same problem.
>> 
>> We solve that in book3s pr by doing
>> 
>>  LAST_INST = ;
>>  PACA->kvm_mode = ;
>>  lwz(guest pc);
>>  do_more_stuff();
>> 
>> That way when an exception occurs at lwz(), the DO_KVM handler checks that 
>> we're in kvm mode and, if so, jumps to "recover", which basically does srr0 += 4; rfi;.
> 
> I was thinking we'd check ESR[EPID] or SRR1[IS] as appropriate, and
> treat it as a kernel fault (search exception table) -- but this works
> too and is a bit cleaner (could be other uses of external pid), at the
> expense of a couple extra instructions in the emulation path (but
> probably a slightly faster host TLB handler).
> 
> The check wouldn't go in DO_KVM, though, since on bookehv that only
> deals with diverting flow when xSRR1[GS] is set, which wouldn't be the
> case here.

Thinking about it a bit more, how is this different from a failed get_user()? 
We can just use the same fixup mechanism as there, right?

Alex



[KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-15 Thread Andy Lutomirski
Hi, kvm people-

Here's a strange failure.  It could be a bug in something
RHEL6-specific, but it could be a generic issue that only triggers
with a paravirt guest with old userspace on a non-ept host.  There was
a bug like this on Xen, and I'm wondering if something's wrong on kvm as
well.

For background, a change in 3.1 (IIRC) means that, when
vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is
NX.  It seems like Amit's machine is marking the physical PTE present
but unreadable.  So I could have messed up, or there could be a subtle
bug somewhere.  Any ideas?

I'll try to reproduce on a non-ept host later on, but that will
involve finding one.

On Wed, Feb 15, 2012 at 3:01 AM, Amit Shah  wrote:
> On (Tue) 14 Feb 2012 [08:26:22], Andy Lutomirski wrote:
>> On Tue, Feb 14, 2012 at 4:22 AM, Amit Shah  wrote:
>> Can you try booting the initramfs here:
>> http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
>> with your kernel image (i.e. qemu-kvm -kernel  -initrd
>> vsyscall_initramfs.img -whatever_else) and seeing what happens?  It
>> works for me.
>
> This too results in a similar error.

Can you post the exact error?  I'm interested in how far it gets
before it fails.

> I didn't try a modern distro, but looks like this is enough evidence
> for now to check the kvm emulator code.  I tried the same guests on a
> newer kernel (Fedora 16's 3.2), and things worked fine except for
> vsyscall=none, panic message below.

vsyscall=none isn't supposed to work unless you're running a very
modern distro *and* you have no legacy static binaries *and* you
aren't using anything written in Go (sigh).  It will probably either
never become the default or will take 5-10 years.


> model name      : Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz
> flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov 
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
> constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor 
> ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi 
> flexpriority

Hmm.  You don't have ept.  If your guest kernel supports paravirt,
then you might use the hypercall interface instead of programming the
fixmap directly.

>
> This is what I get with vsyscall=none, where emulate and native work
> fine on the 3.2 kernel on different host hardware, the guest stays the
> same:
>
>
> [    2.874661] debug: unmapping init memory 8167f000..818dc000
> [    2.876778] Write protecting the kernel read-only data: 6144k
> [    2.879111] debug: unmapping init memory 880001318000..88000140
> [    2.881242] debug: unmapping init memory 8800015a..88000160
> [    2.884637] init[1] vsyscall attempted with vsyscall=none 
> ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0

This line ("vsyscall attempted") means that the emulation worked
correctly.  Your other traces didn't have it or anything like it,
which mostly rules out do_emulate_vsyscall issues.

--Andy


Re: [RFC PATCH 14/16] KVM: PPC: booke: category E.HV (GS-mode) support

2012-02-15 Thread Scott Wood
On 02/15/2012 01:36 PM, Alexander Graf wrote:
> 
> On 10.01.2012, at 01:51, Scott Wood wrote:
>> I was thinking we'd check ESR[EPID] or SRR1[IS] as appropriate, and
>> treat it as a kernel fault (search exception table) -- but this works
>> too and is a bit cleaner (could be other uses of external pid), at the
>> expense of a couple extra instructions in the emulation path (but
>> probably a slightly faster host TLB handler).
>>
>> The check wouldn't go in DO_KVM, though, since on bookehv that only
>> deals with diverting flow when xSRR1[GS] is set, which wouldn't be the
>> case here.
> 
> Thinking about it a bit more, how is this different from a failed get_user()? 
> We can just use the same fixup mechanism as there, right?

The fixup mechanism can be the same (we'd like to know whether it failed
due to TLB miss or DSI, so we know which to reflect -- but if necessary
I think we can figure that out with a tlbsx).  What's different is that
the page fault handler needs to know that any external pid (or AS1)
fault is bad, same as if the address were in the kernel area, and it
should go directly to searching the exception tables instead of trying
to page something in.

-Scott
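The exception-table mechanism both messages lean on boils down to a sorted table pairing each instruction address that is allowed to fault with a fixup address, searched on an otherwise-fatal kernel fault. A userspace model of that search (names are illustrative; the real entries live in the __ex_table section and are sorted at build time):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each entry maps a faulting instruction to its fixup code. */
struct exception_table_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

static const struct exception_table_entry demo_table[] = {
    { 0x1000, 0x2000 },
    { 0x1010, 0x2010 },
    { 0x1020, 0x2020 },
};

/* Binary search over the sorted table, as the kernel's
 * search_exception_tables() does.  Returns 0 if the faulting address
 * has no fixup -- in the kernel that means a real bug, i.e. oops. */
static uintptr_t search_exception_table(const struct exception_table_entry *tbl,
                                        size_t n, uintptr_t faulting_insn)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tbl[mid].insn == faulting_insn)
            return tbl[mid].fixup;   /* resume here instead of oopsing */
        if (tbl[mid].insn < faulting_insn)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

The point Scott makes above is about the step before this search: the page fault handler must classify an external-pid/AS1 fault as "go straight to the table" rather than attempting to page anything in.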



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Anthony Liguori

On 02/15/2012 07:39 AM, Avi Kivity wrote:

On 02/07/2012 08:12 PM, Rusty Russell wrote:

I would really love to have this, but the problem is that we'd need a
general purpose bytecode VM with binding to some kernel APIs.  The
bytecode VM, if made general enough to host more complicated devices,
would likely be much larger than the actual code we have in the kernel now.


We have the ability to upload bytecode into the kernel already.  It's in
a great bytecode interpreted by the CPU itself.


Unfortunately it's inflexible (has to come with the kernel) and open to
security vulnerabilities.


I wonder if there's any reasonable way to run device emulation within the 
context of the guest.  Could we effectively do something like SMM?


For a given set of traps, reflect back into the guest quickly changing the 
visibility of the VGA region. It may require installing a new CR3 but maybe that 
wouldn't be so bad with VPIDs.


Then you could implement the PIT as guest firmware using kvmclock as the time 
base.

Once you're back in the guest, you could install the old CR3.  Perhaps just hide 
a portion of the physical address space with the e820.


Regards,

Anthony Liguori


If every user were emulating different machines, LPF this would make
sense.  Are they?


They aren't.


Or should we write those helpers once, in C, and
provide that for them.


There are many of them: PIT/PIC/IOAPIC/MSIX tables/HPET/kvmclock/Hyper-V
stuff/vhost-net/DMA remapping/IO remapping (just for x86), and some of
them are quite complicated.  However implementing them in bytecode
amounts to exposing a stable kernel ABI, since they use such a vast
range of kernel services.





Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Arnd Bergmann
On Tuesday 07 February 2012, Alexander Graf wrote:
> On 07.02.2012, at 07:58, Michael Ellerman wrote:
> 
> > On Mon, 2012-02-06 at 13:46 -0600, Scott Wood wrote:
> >> You're exposing a large, complex kernel subsystem that does very
> >> low-level things with the hardware.  It's a potential source of exploits
> >> (from bugs in KVM or in hardware).  I can see people wanting to be
> >> selective with access because of that.
> > 
> > Exactly.
> > 
> > In a perfect world I'd agree with Anthony, but in reality I think
> > sysadmins are quite happy that they can prevent some users from using
> > KVM.
> > 
> > You could presumably achieve something similar with capabilities or
> > whatever, but a node in /dev is much simpler.
> 
> Well, you could still keep the /dev/kvm node and then have syscalls operate 
> on the fd.
> 
> But again, I don't see the problem with the ioctl interface. It's nice, 
> extensible and works great for us.
> 

ioctl is good for hardware devices and stuff that you want to enumerate
and/or control permissions on. For something like KVM that is really a
core kernel service, a syscall makes much more sense.

I would certainly never mix the two concepts: If you use a chardev to get
a file descriptor, use ioctl to do operations on it, and if you use a 
syscall to get the file descriptor then use other syscalls to do operations
on it.

I don't really have a good recommendation whether or not to change from an
ioctl based interface to syscall for KVM now. On the one hand I believe it
would be significantly cleaner, on the other hand we cannot remove the
chardev interface any more since there are many existing users.

Arnd


Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Arnd Bergmann
On Tuesday 07 February 2012, Alexander Graf wrote:
> >> 
> >> Not sure we'll ever get there. For PPC, it will probably take another 1-2 
> >> years until we get the 32-bit targets stabilized. By then we will have new 
> >> 64-bit support though. And then the next gen will come out giving us even 
> >> more new constraints.
> > 
> > I would expect that newer archs have less constraints, not more.
> 
> Heh. I doubt it :). The 64-bit booke stuff is pretty similar to what we have 
> today on 32-bit, but extends a
> bunch of registers to 64-bit. So what if we laid out stuff wrong before?
> 
> I don't even want to imagine what v7 arm vs v8 arm looks like. It's a 
> completely new architecture.
> 

I have not seen the source but I'm pretty sure that v7 and v8 look very
similar regarding virtualization support, because they were designed together,
including the concept that on v8 you can run either a v7 compatible 32 bit
hypervisor with 32 bit guests or a 64 bit hypervisor with a combination of
32 and 64 bit guests. Also, the page table layout in v7-LPAE is identical
to the v8 one. The main difference is the instruction set, but then ARMv7
already has four of these (ARM, Thumb, Thumb2, ThumbEE).

Arnd



Re: [RFC PATCH 14/16] KVM: PPC: booke: category E.HV (GS-mode) support

2012-02-15 Thread Alexander Graf

On 15.02.2012, at 20:40, Scott Wood wrote:

> On 02/15/2012 01:36 PM, Alexander Graf wrote:
>> 
>> On 10.01.2012, at 01:51, Scott Wood wrote:
>>> I was thinking we'd check ESR[EPID] or SRR1[IS] as appropriate, and
>>> treat it as a kernel fault (search exception table) -- but this works
>>> too and is a bit cleaner (could be other uses of external pid), at the
>>> expense of a couple extra instructions in the emulation path (but
>>> probably a slightly faster host TLB handler).
>>> 
>>> The check wouldn't go in DO_KVM, though, since on bookehv that only
>>> deals with diverting flow when xSRR1[GS] is set, which wouldn't be the
>>> case here.
>> 
>> Thinking about it a bit more, how is this different from a failed 
>> get_user()? We can just use the same fixup mechanism as there, right?
> 
> The fixup mechanism can be the same (we'd like to know whether it failed
> due to TLB miss or DSI, so we know which to reflect

No, we only want to know "fast path failed". The reason is a different pair of 
shoes and should be evaluated in the slow path. We shouldn't ever fault here 
during normal operation btw. We already executed a guest instruction, so 
there's almost no reason it can't be read.

> -- but if necessary
> I think we can figure that out with a tlbsx).  What's different is that
> the page fault handler needs to know that any external pid (or AS1)
> fault is bad, same as if the address were in the kernel area, and it
> should go directly to searching the exception tables instead of trying
> to page something in.

Yes and no. We need to force it to search the exception tables. We don't care 
if the page fault handlers knows anything about external pids.

Either way, we discussed the further stuff on IRC and came to a working 
solution :). Stay tuned.


Alex



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Michael Ellerman
On Wed, 2012-02-15 at 22:21 +, Arnd Bergmann wrote:
> On Tuesday 07 February 2012, Alexander Graf wrote:
> > On 07.02.2012, at 07:58, Michael Ellerman wrote:
> > 
> > > On Mon, 2012-02-06 at 13:46 -0600, Scott Wood wrote:
> > >> You're exposing a large, complex kernel subsystem that does very
> > >> low-level things with the hardware.  It's a potential source of exploits
> > >> (from bugs in KVM or in hardware).  I can see people wanting to be
> > >> selective with access because of that.
> > > 
> > > Exactly.
> > > 
> > > In a perfect world I'd agree with Anthony, but in reality I think
> > > sysadmins are quite happy that they can prevent some users from using
> > > KVM.
> > > 
> > > You could presumably achieve something similar with capabilities or
> > > whatever, but a node in /dev is much simpler.
> > 
> > Well, you could still keep the /dev/kvm node and then have syscalls operate 
> > on the fd.
> > 
> > But again, I don't see the problem with the ioctl interface. It's nice, 
> > extensible and works great for us.
> > 
> 
> ioctl is good for hardware devices and stuff that you want to enumerate
> and/or control permissions on. For something like KVM that is really a
> core kernel service, a syscall makes much more sense.

Yeah maybe. That distinction is at least in part just historical.

The first problem I see with using a syscall is that you don't need one
syscall for KVM, you need ~90. OK so you wouldn't do that, you'd use a
multiplexed syscall like epoll_ctl() - or probably several
(vm/vcpu/etc).

Secondly you still need a handle/context for those syscalls, and I think
the most sane thing to use for that is an fd.

At that point you've basically reinvented ioctl :)

I also think it is an advantage that you have a node in /dev for
permissions. I know other "core kernel" interfaces don't use a /dev
node, but arguably that is their loss.

> I would certainly never mix the two concepts: If you use a chardev to get
> a file descriptor, use ioctl to do operations on it, and if you use a 
> syscall to get the file descriptor then use other syscalls to do operations
> on it.

Sure, we use a syscall to get the fd (open) and then other syscalls to
do operations on it, ioctl and kvm_vcpu_run. ;)

But seriously, I guess that makes sense. Though it's a bit of a pity
because if you want a syscall for any of it, eg. vcpu_run(), then you
have to basically reinvent ioctl for all the other little operations.

cheers




Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-15 Thread John Fastabend
On 2/15/2012 6:10 AM, Jamal Hadi Salim wrote:
> On Tue, 2012-02-14 at 10:57 -0800, John Fastabend wrote:
> 
>> Roopa was likely on the right track here,
>>
>> http://patchwork.ozlabs.org/patch/123064/
> 
> Doesnt seem related to the bridging stuff - the modeling looks
> reasonable however.
> 

The operations are really the same: ADD/DEL/GET additional MAC
addresses on a port, in this case a macvlan-type port. The
difference is that the macvlan port type drops any packet with an
address not in the FDB, whereas the bridge type floods such packets.

>> But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX
>> netlink messages. And if possible drive this without extending ndo_ops.
>>
>> An ideal user space interaction IMHO would look like,
>>
>> [root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
>> [root@jf-dev1-dcblab iproute2]# ./br/br fdb
>> portmac addrflags
>> veth2   36:a6:35:9b:96:c4   local
>> veth4   aa:54:b0:7b:42:ef   local
>> veth0   2a:e8:5c:95:6c:1b   local
>> veth6   6e:26:d5:43:a3:36   local
>> veth0   f2:c1:39:76:6a:fb
>> veth8   4e:35:16:af:87:13   local
>> veth10  52:e5:62:7b:57:88   static
>> veth10  aa:a9:35:21:15:c4   local
> 
> Looks nice, where is the targeted bridge (e.g. br0) in that syntax?

[root@jf-dev1-dcblab src]# br fdb help
Usage: br fdb { add | del | replace } ADDR dev DEV
   br fdb {show} [ dev DEV ]

In my example I just dumped all bridge devices,

#br fdb show dev bridge0
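On the wire, such an "fdb add" is an ordinary PF_BRIDGE RTM_NEWNEIGH netlink request. A rough sketch of how the message is assembled (buffer construction only; nothing is sent, and the specific flag/state choices here are illustrative rather than exactly what Stephen's tool emits):

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

/* Build (but don't send) the RTM_NEWNEIGH request behind
 * "br fdb add <mac> dev <dev>": a PF_BRIDGE ndmsg plus an NDA_LLADDR
 * attribute carrying the MAC.  Returns the total message length,
 * or -1 if the buffer is too small. */
static int build_fdb_add(char *buf, size_t len,
                         const unsigned char mac[6], int ifindex)
{
    struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
    struct ndmsg *ndm;
    struct rtattr *rta;

    if (len < NLMSG_ALIGN(NLMSG_LENGTH(sizeof(*ndm))) + RTA_ALIGN(RTA_LENGTH(6)))
        return -1;

    memset(buf, 0, len);
    nlh->nlmsg_len   = NLMSG_LENGTH(sizeof(*ndm));
    nlh->nlmsg_type  = RTM_NEWNEIGH;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;

    ndm = NLMSG_DATA(nlh);
    ndm->ndm_family  = PF_BRIDGE;
    ndm->ndm_ifindex = ifindex;
    ndm->ndm_state   = NUD_PERMANENT;   /* shows up as "static" above */

    /* Append the MAC address as an NDA_LLADDR attribute. */
    rta = (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
    rta->rta_type = NDA_LLADDR;
    rta->rta_len  = RTA_LENGTH(6);
    memcpy(RTA_DATA(rta), mac, 6);

    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
    return nlh->nlmsg_len;
}
```

The appeal of keeping this message family is exactly what's argued below: the same request could be steered either at the software bridge or at an embedded/hardware FDB without inventing a new interface.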

> 
>> Using Stephen's br tool. First command adds FDB entry to SW bridge and
>> if the same tool could be used to add entries to embedded bridge I think
>> that would be the best case. 
> 
> That would be nice (although adds dependency on the presence of the
> s/ware bridge). Would be nicer to have either a knob in the kernel to
> say "synchronize with h/w bridge foo" which can be turned off.  
> 

Seems we need both a synchronize and a { add | del | replace } option.

>> So no RTNETLINK error on the second cmd. Then
>> embedded FDB entries could be dumped this way also so I get a complete view
>> of my FDB setup across multiple sw bridges and embedded bridges.
> 
> So if you had multiple h/ware bridges - which one is tied to br0? 
> 

Not sure I follow but does the additional dev parameter above answer this?

> 
>> Yes. The hardware has a bit to support this which is currently not exposed
>> to user space. That's a case where we have 'yet another knob' that needs
>> a clean solution. This causes real bugs today when users try to use the
>> macvlan devices in VEPA mode on top of SR-IOV. By the way these modes are
>> all part of the 802.1Qbg spec which people actually want to use with Linux
>> so a good clean solution is probably needed.
> 
> 
> I think the knobs to "flood" and "learn" are important. The hardware
> seems to have the "flood" but not the "learn/discover". I think the
> s/ware bridge needs to have both. At the moment - as pointed out in that
> *NEIGH* notification, s/w bridge assumes a policy that could be
> considered a security flaw in some circles - just because you are my
> neighbor does not mean i trust you to come into my house; i may trust
> you partially and allow you only to come through the front door. Even in
> Canada with a default policy of not locking your door we sometimes lock
> our doors ;->
> 
> 
>> I have no problem with drawing the line here and trying to implement 
>> something
>> over PF_BRIDGE:RTM_xxx nlmsgs. 
> 
> 
> My comment/concern was in regard to the bridge built-in policy of
> reading from the neighbor updates (refer to above comments)
> 

So I think what you're saying is a per-port bit to disable learning...

hmm, but if you start tweaking it too much it looks less and less like an
802.1D bridge and more like something you would want to build with tc or
openvswitch, or tc+bridge, or tc+macvlan.

.John

> cheers,
> jamal
> 
> 



Re: [Qemu-devel] [PATCH 0/3] [PULL] qemu-kvm.git uq/master queue

2012-02-15 Thread Anthony Liguori

On 02/08/2012 02:01 PM, Marcelo Tosatti wrote:

The following changes since commit cf4dc461a4cfc3e056ee24edb26154f4d34a6278:

   Restore consistent formatting (2012-02-07 22:11:04 +0400)

are available in the git repository at:
   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master


Pulled.  Thanks.

Regards,

Anthony Liguori



Jan Kiszka (3):
   kvm: Allow to set shadow MMU size
   kvm: Implement kvm_irqchip_in_kernel like kvm_enabled
   apic: Fix legacy vmstate loading for KVM

  hw/apic_common.c  |7 ++-
  hw/pc.c   |4 ++--
  hw/pc_piix.c  |6 +++---
  kvm-all.c |   13 -
  kvm-stub.c|5 -
  kvm.h |8 +---
  qemu-config.c |4 
  qemu-options.hx   |5 -
  target-i386/kvm.c |   17 +++--
  9 files changed, 43 insertions(+), 26 deletions(-)






Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Rusty Russell
On Wed, 15 Feb 2012 15:39:41 +0200, Avi Kivity  wrote:
> On 02/07/2012 08:12 PM, Rusty Russell wrote:
> > > I would really love to have this, but the problem is that we'd need a
> > > general purpose bytecode VM with binding to some kernel APIs.  The
> > > bytecode VM, if made general enough to host more complicated devices,
> > > would likely be much larger than the actual code we have in the kernel 
> > > now.
> >
> > We have the ability to upload bytecode into the kernel already.  It's in
> > a great bytecode interpreted by the CPU itself.
> 
> Unfortunately it's inflexible (has to come with the kernel) and open to
> security vulnerabilities.

It doesn't have to come with the kernel, but it does require privs.  And
while the bytecode itself might be invulnerable, the services it will call
will be, so it's not clear it'll be a win, given the reduced auditability.

The grass is not really greener, and getting there involves many fences.

> > If every user were emulating different machines, LPF this would make
> > sense.  Are they?  
> 
> They aren't.
> 
> > Or should we write those helpers once, in C, and
> > provide that for them.
> 
> There are many of them: PIT/PIC/IOAPIC/MSIX tables/HPET/kvmclock/Hyper-V
> stuff/vhost-net/DMA remapping/IO remapping (just for x86), and some of
> them are quite complicated.  However implementing them in bytecode
> amounts to exposing a stable kernel ABI, since they use such a vast
> range of kernel services.

We could think about regularizing and enumerating the various in-kernel
helpers, and give userspace a generic mechanism for wiring them up.
That would surely be the first step towards bytecode anyway.

But the current device assignment ioctls make me think that this
wouldn't be simple or neat.

Cheers,
Rusty.


Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-15 Thread Ben Hutchings
[I'm just catching up with this after getting my own driver changes into
shape.]

On Fri, 2012-02-10 at 10:18 -0500, jamal wrote:
> Hi John,
> 
> I went backwards to summarize at the top after going through your email.
> 
> TL;DR version 0.1: 
> you provide a good use case where it makes sense to do things in the
> kernel. IMO, you could make the same arguement if your embedded switch
> could do ACLs, IPv4 forwarding etc. And the kernel bloats.
> I am always bigoted to move all policy control to user space instead of
> bloating in the kernel.
[...]
> > Now here is the potential issue,
> > 
> > (G) The frame transmitted from ethx.y with the destination address of
> > veth0 but the embedded switch is not a learning switch. If the FDB
> > update is done in user space its possible (likely?) that the FDB
> > entry for veth0 has not been added to the embedded switch yet. 
> 
> Ok, got it - so the catch here is the switch is not capable of learning.
> I think this depends on where learning is done. Your intent is to
> use the S/W bridge as something that does the learning for you i.e in
> the kernel. This makes the s/w bridge part of MUST-have-for-this-to-run.
> And that maybe the case for your use case.
[...]

Well, in addition, there are SR-IOV network adapters that don't have any
bridge.  For these, the software bridge is necessary to handle
multicast, broadcast and forwarding between local ports, not only to do
learning.

Solarflare's implementation of accelerated guest networking (which
Shradha and I are gradually sending upstream) builds on libvirt's
existing support for software bridges and assigns VFs to guests as a
means to offload some of the forwarding.

If and when we implement a hardware bridge, we would probably still want
to keep the software bridge as a fallback.  If a guest is dependent on a
VF that's connected to a hardware bridge, it becomes impossible or at
least very disruptive to migrate it to another host that doesn't have a
compatible VF available.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-15 Thread Xiao Guangrong
On 02/15/2012 10:07 PM, Avi Kivity wrote:

> On 02/15/2012 01:37 PM, Xiao Guangrong wrote:

 I would really like to move the IPI back out of the lock.

 How about something like a sequence lock:


 spin_lock(mmu_lock)
 need_flush = write_protect_stuff();
 atomic_add(kvm->want_flush_counter, need_flush);
 spin_unlock(mmu_lock);

 while ((done = atomic_read(kvm->done_flush_counter)) < (want =
 atomic_read(kvm->want_flush_counter)) {
   kvm_make_request(flush)
   atomic_cmpxchg(kvm->done_flush_counter, done, want)
 }

 This (or maybe a corrected and optimized version) ensures that any
 need_flush cannot pass the while () barrier, no matter which thread
 encounters it first.  However it violates the "do not invent new locking
 techniques" commandment.  Can we map it to some existing method?
>>>
>>> There is no need to advance 'want' in the loop.  So we could do
>>>
>>> /* must call with mmu_lock held */
>>> void kvm_mmu_defer_remote_flush(kvm, need_flush)
>>> {
>>>   if (need_flush)
>>> ++kvm->flush_counter.want;
>>> }
>>>
>>> /* may call without mmu_lock */
>>> void kvm_mmu_commit_remote_flush(kvm)
>>> {
>>>   want = ACCESS_ONCE(kvm->flush_counter.want)
>>>   while ((done = atomic_read(kvm->flush_counter.done) < want) {
>>> kvm_make_request(flush)
>>> atomic_cmpxchg(kvm->flush_counter.done, done, want)
>>>   }
>>> }
>>>
>>
>>
>> Hmm, we already have kvm->tlbs_dirty, so, we can do it like this:
>>
>> #define SPTE_INVALID_UNCLEAN (1 << 63 )
>>
>> in invalid page path:
>> lock mmu_lock
>> if (spte is invalid)
>>  kvm->tlbs_dirty |= SPTE_INVALID_UNCLEAN;
>> need_tlb_flush = kvm->tlbs_dirty;
>> unlock mmu_lock
>> if (need_tlb_flush)
>>  kvm_flush_remote_tlbs()
>>
>> And in page write-protected path:
>> lock mmu_lock
>>  if (it has spte change to readonly |
>>kvm->tlbs_dirty & SPTE_INVALID_UNCLEAN)
>>  kvm_flush_remote_tlbs()
>> unlock mmu_lock
>>
>> How about this?
> 
> Well, it still has flushes inside the lock.  And it seems to be more
> complicated, but maybe that's because I thought of my idea and didn't
> fully grok yours yet.
> 


Oh, I was not thinking of flushing all TLBs outside of mmu_lock, just the
invalid page path.

But there are still some paths that need to flush TLBs inside mmu_lock (like
sync_children, get_page).

In your code:

>>> /* must call with mmu_lock held */
>>> void kvm_mmu_defer_remote_flush(kvm, need_flush)
>>> {
>>>   if (need_flush)
>>> ++kvm->flush_counter.want;
>>> }
>>>
>>> /* may call without mmu_lock */
>>> void kvm_mmu_commit_remote_flush(kvm)
>>> {
>>>   want = ACCESS_ONCE(kvm->flush_counter.want)
>>>   while ((done = atomic_read(kvm->flush_counter.done) < want) {
>>> kvm_make_request(flush)
>>> atomic_cmpxchg(kvm->flush_counter.done, done, want)
>>>   }
>>> }

I think we do not need to handle all TLB-flush requests here, since all of
these requests can be delayed to the point where mmu_lock is released; we can
simply do it like this:

void kvm_mmu_defer_remote_flush(kvm, need_flush)
{
if (need_flush)
++kvm->tlbs_dirty;
}

void kvm_mmu_commit_remote_flush(struct kvm *kvm)
{
int dirty_count = kvm->tlbs_dirty;

smp_mb();

if (!dirty_count)
return;

if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm->stat.remote_tlb_flush;
cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
}

If this is OK, we only need a small change in the current code, since
kvm_mmu_commit_remote_flush() is very similar to kvm_flush_remote_tlbs().



Re: [PATCH 3/3] KVM: perf: kvm events analysis tool

2012-02-15 Thread Xiao Guangrong
On 02/13/2012 11:52 PM, David Ahern wrote:


>> The first patch is only needed for code compilation, after kvm-events is
>> compiled, you can analyse any kernels. :)
> 
> understood.
> 
> Now that I recall perf's way of handling out of tree builds, a couple of
> comments:
> 
> 1. you need to add the following to tools/perf/MANIFEST
> arch/x86/include/asm/svm.h
> arch/x86/include/asm/vmx.h
> arch/x86/include/asm/kvm_host.h
> 


Right.

> 2.scripts/checkpatch.pl is an unhappy camper.
> 


It seems checkpatch always complains about TRACE_EVENT and about the many
more-than-80-character lines in perf tools.

> I'll take a look at the code and try out the command when I get some time.
> 


Okay, I will post the next version after collecting your new comments!

Thanks for your time, David! :)



Re: [PATCH 3/3] KVM: perf: kvm events analysis tool

2012-02-15 Thread David Ahern

On 2/15/12 9:59 PM, Xiao Guangrong wrote:



Okay, i will post the next version after collecting your new comments!

Thanks for your time, David! :)



I had more comments, but got sidetracked and forgot to come back to 
this. I still haven't looked at the code yet, but some comments from 
testing:


1. The error message:
  Warning: Error: expected type 5 but read 4
  Warning: Error: expected type 5 but read 0
  Warning: unknown op '}'

is fixed by this patch which has not yet made its way into perf:
https://lkml.org/lkml/2011/9/4/41

The most recent request:
https://lkml.org/lkml/2012/2/8/479

Arnaldo: the patch still applies cleanly (but with an offset of -2 lines).


2. negative testing:

perf kvm-events record -e kvm:* -p 2603 -- sleep 10

  Warning: Error: expected type 4 but read 7
  Warning: Error: expected type 5 but read 0
  Warning: failed to read event print fmt for kvm_apic
  Warning: Error: expected type 4 but read 7
  Warning: Error: expected type 5 but read 0
  Warning: failed to read event print fmt for kvm_inj_exception
  Fatal: bad op token {

If other kvm events are specified on the record line, they appear to be 
silently ignored in the report, in which case why allow the -e option to 
record?



3. What is happening for multiple VMs?

a. perf kvm-events report
data is collected for all VMs. What is displayed in the report? An
average for all VMs?

b. perf kvm-events report --vcpu 1
Does this given an average of all vcpu 1's?

Perhaps a -p option for the report to pull out events related to a 
single VM. Really this could be a generic option (to perf-report and 
perf-script as well) to only show/analyze events for the specified pid. 
ie., data is recorded for all VMs (or system wide for the regular 
perf-record) and you want to only consider events for a specific pid. 
e.g., in process_sample_event() skip event if event->ip.pid != 
report_pid (works for perf code because PERF_SAMPLE_TID attribute is 
always set).


David


Re: [PATCH 3/3] KVM: perf: kvm events analysis tool

2012-02-15 Thread Xiao Guangrong
On 02/16/2012 01:05 PM, David Ahern wrote:

> On 2/15/12 9:59 PM, Xiao Guangrong wrote:
>>
>>
>> Okay, i will post the next version after collecting your new comments!
>>
>> Thanks for your time, David! :)
>>
> 
> I had more comments, but got sidetracked and forgot to come back to this. I 
> still haven't looked at the code yet, but some comments from testing:
> 
> 1. The error message:
>   Warning: Error: expected type 5 but read 4
>   Warning: Error: expected type 5 but read 0
>   Warning: unknown op '}'
> 
> is fixed by this patch which has not yet made its way into perf:
> https://lkml.org/lkml/2011/9/4/41
> 
> The most recent request:
> https://lkml.org/lkml/2012/2/8/479
> 
> Arnaldo: the patch still applies cleanly (but with an offset of -2 lines).
> 


Great, it is a good fix.

But, it does not hurt the development of kvm-events.

> 
> 2. negative testing:
> 
> perf kvm-events record -e kvm:* -p 2603 -- sleep 10
> 
>   Warning: Error: expected type 4 but read 7
>   Warning: Error: expected type 5 but read 0
>   Warning: failed to read event print fmt for kvm_apic
>   Warning: Error: expected type 4 but read 7
>   Warning: Error: expected type 5 but read 0
>   Warning: failed to read event print fmt for kvm_inj_exception
>   Fatal: bad op token {
> 
> If other kvm events are specified in the record line they appear to be 
> silently ignored in the report in which case why allow the -e option to 
> record?
> 


Yes, kvm-events does not analyse the events specified by the -e option, since
these events are not needed for vmexit/ioport/mmio analysis.

And after kvm-events record, you can see these events with perf script.

> 
> 3. What is happening for multiple VMs?
> 
> a. perf kvm-events report
> data is collected for all VMs. What is displayed in the report? An
> average for all VMs?
> 


Yes

> b. perf kvm-events report --vcpu 1
> Does this given an average of all vcpu 1's?
> 


Yes

> Perhaps a -p option for the report to pull out events related to a single VM. 
> Really this could be a generic option (to perf-report and perf-script as 
> well) to only show/analyze events for the specified pid. ie., data is 
> recorded for all VMs (or system wide for the regular perf-record) and you 
> want to only consider events for a specific pid. e.g., in 
> process_sample_event() skip event if event->ip.pid != report_pid (works for 
> perf code because PERF_SAMPLE_TID attribute is always set).

Per-VM analysis is a good idea, but please allow me to put it into my TODO
list. :)



[Bug 42779] New: KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779

   Summary: KVM domain hangs after loading initrd with Xenomai
kernel
   Product: Virtualization
   Version: unspecified
Kernel Version: 3.0.0-15
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: madenginee...@gmail.com
Regression: No


Attempting to boot a 32 bit Debian guest with a Xenomai kernel inside KVM
causes it to hang and spin (using 1 full CPU core) after loading the initrd, as
determined by serial console output. The only error message is "KVM internal
error. Suberror: 1"/"emulation failure". Booting a regular Debian kernel
succeeds, as does running the Xenomai kernel with software emulation (-no-kvm).

Info:
CPU: Intel Core i7-2670QM
Emulator: qemu-kvm 0.14.1
Host kernel: 3.0.0-15 (Ubuntu build), x86_64
Guest OS: Debian Squeeze, kernel.org 2.6.37 kernel with Xenomai 2.6.0 (config
attached)
Qemu command: kvm -M pc-0.14 -enable-kvm -m 1024 -drive
file=/var/lib/libvirt/images/eve.img,if=none,id=drive-ide0-0-0,format=raw
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
tap,fd=21,id=hostnet0 -device
e1000,netdev=hostnet0,id=net0,mac=52:54:00:b5:f4:00,bus=pci.0,addr=0x3 -chardev
stdio,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb
-device usb-tablet,id=input0 -vga cirrus
Effects of flags: Adding one or both of --no-kvm-irqchip or --no-kvm-pit has no
apparent effect. Adding --no-kvm appears to correct the problem, at the cost of
performance due to using the software emulator.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #1 from madenginee...@gmail.com  2012-02-16 05:46:07 ---
Created an attachment (id=72393)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=72393)
Configuration of the guest kernel



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #2 from madenginee...@gmail.com  2012-02-16 05:47:18 ---
Created an attachment (id=72394)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=72394)
Result of 'registers info' and 'x/30i $eip' after fault



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779


madenginee...@gmail.com changed:

   What|Removed |Added

  Attachment #72393|application/octet-stream|text/plain
  mime type||






[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779


madenginee...@gmail.com changed:

   What|Removed |Added

  Attachment #72394|application/octet-stream|text/plain
  mime type||






[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #3 from madenginee...@gmail.com  2012-02-16 05:57:25 ---
Couldn't attach the trace I recorded of the fault occurring since it's 3 MB
compressed with xz, bigger still with other formats. I can email it if it will
be useful.



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #4 from madenginee...@gmail.com  2012-02-16 06:32:48 ---
Same problem occurs with qemu-kvm 1.0 from
https://launchpad.net/~bderzhavets/+archive/lib-usbredir39:

$ sudo kvm -M pc-1.0 -enable-kvm -m 1024 -drive
file=/var/lib/libvirt/images/eve.img,if=none,id=drive-ide0-0-0,format=raw
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
tap,fd=21,id=hostnet0 -device
e1000,netdev=hostnet0,id=net0,mac=52:54:00:b5:f4:00,bus=pci.0,addr=0x3 -chardev
vc,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb
-device usb-tablet,id=input0 -vga cirrus
kvm: -netdev tap,fd=21,id=hostnet0: TUNGETIFF ioctl() failed: Bad file
descriptor
TUNSETOFFLOAD ioctl() failed: Bad file descriptor
kvm: -device
e1000,netdev=hostnet0,id=net0,mac=52:54:00:b5:f4:00,bus=pci.0,addr=0x3:
pci_add_option_rom: failed to find romfile "pxe-e1000.rom"
KVM internal error. Suberror: 1
emulation failure
EAX=f681 EBX=003e ECX=003e EDX=c00b8000
ESI=c00b8000 EDI=c15b EBP=c15b1f74 ESP=c15b1f58
EIP=c1228905 EFL=00010206 [-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b   00c0f300 DPL=3 DS   [-WA]
CS =0060   00c09b00 DPL=0 CS32 [-RA]
SS =0068   00c09300 DPL=0 DS   [-WA]
DS =007b   00c0f300 DPL=3 DS   [-WA]
FS =   
GS =   
LDT=   
TR =0080 c15b6300 206b 8b00 DPL=0 TSS32-busy
GDT= c15b3000 00ff
IDT= c15b2000 07ff
CR0=80050033 CR2=ffee4000 CR3=01663000 CR4=0690
DR0= DR1= DR2=
DR3= 
DR6=0ff0 DR7=0400
EFER=
Code=8e 2b 01 00 00 8b 4d f0 89 f2 8b 45 ec 0f 0d 82 40 01 00 00 <0f> 6f 02 0f
6f 4a 08 0f 6f 52 10 0f 6f 5a 18 0f 7f 00 0f 7f 48 08 0f 7f 50 10 0f 7f 58 18



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779


Gleb  changed:

   What|Removed |Added

 CC||g...@redhat.com




--- Comment #5 from Gleb   2012-02-16 07:15:49 ---
(In reply to comment #3)
> Couldn't attach the trace I recorded of the fault occurring since it's 3 MB
> compressed with xz, bigger still with other formats. I can email it if it will
> be useful.

Can you do "tail -1" on it and attach it here?



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #6 from madenginee...@gmail.com  2012-02-16 07:22:41 ---
Created an attachment (id=72395)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=72395)
Last 10k lines of a trace showing the fault

Per Gleb's request



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #7 from Gleb   2012-02-16 07:43:11 ---
Have you installed trace-cmd before capturing the trace? It failed to parse the
kvm events. qemu hasn't paused the guest after the emulation error (looks like a
bug), so the 'x/30i $eip' output is not useful either. Can you do 'x/30i 0xXXX',
where XXX is the address in EIP from the register dump you see after the
instruction emulation failure message (c1228905 in your output from comment #4)?



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #8 from madenginee...@gmail.com  2012-02-16 07:56:03 ---
Not sure what you mean by installing trace-cmd before capturing the trace--I
did do that, otherwise I wouldn't have had a trace-cmd to run. The package
version is trace-cmd 1.0.3-0ubuntu1 if that helps. I tried running it again
against qemu 1.0 (the last one was for qemu 0.14), still contained a bunch of
[FAILED TO PARSE] messages.

Attaching the output you requested separately.



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #9 from madenginee...@gmail.com  2012-02-16 07:57:35 ---
Created an attachment (id=72397)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=72397)
Register state and code disassembly at failure point with qemu 1.0
