Re: slow guest performance with build load, looking for ideas
Avi Kivity wrote:
> On 06/15/2009 06:25 PM, Michael Tokarev wrote:
>> Avi Kivity wrote:
>>> On 06/15/2009 05:15 PM, Erik Jacobson wrote:
>>>> [] So if I understand what you're saying: best not to use kvm
>>>> guests for build servers with pre-Nehalem processors.
>>>
>>> pre-Nehalem / pre-Barcelona, > 4 vcpus, yes.
>>
>> How about 2 vcpus, and how about AMD processors?
>
> 2 vcpus (or 4) should be fine. AMD processors (Barcelona+) would be
> good for any number of vcpus.

Hmm.. that's sorta good (not so good for owners of most Intel CPUs -- Nehalem just started its life). But still confusing. Namely, 2..4 vcpus per GUEST or HOST -- for the pre-Nehalem ones? :)

Thanks!

/mjt
[KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
The shutdown test logs into a VM and sends a shutdown command. It serves two purposes:

- Test KVM's ability to shut down.
- Clean up after the other tests: Currently VMs of the last test remain alive when Autotest finishes running.

When one guest finishes testing and another begins, the VM is automatically shut down by the preprocessor because the QEMU command required for the next guest differs from that of the guest that just finished. In the case of the final guest this doesn't happen because no guest follows it, so the preprocessor must be explicitly instructed to kill the VM. However, we have no easy way of knowing which test runs last because the user usually selects a subset of the tests/guests.

The addition of a shutdown test can be a decent solution to this small problem: by convention the shutdown test will always be the last to run, and if users wish to clean up after the tests, they must select the shutdown test.

Note: it is beneficial to allow users to leave the VMs of the last test running because it saves time when developing and testing tests. A test writer can run the test once on a VM, and when the test exits, make some modifications to its code and re-run it on the same living VM, and repeat this procedure without having to shut down/boot the VM every time.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm.py       |    1 +
 client/tests/kvm/kvm_tests.py |   37 +
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py
index 9428162..aa727da 100644
--- a/client/tests/kvm/kvm.py
+++ b/client/tests/kvm/kvm.py
@@ -48,6 +48,7 @@ class kvm(test.test):
                 "steps":        test_routine("kvm_guest_wizard", "run_steps"),
                 "stepmaker":    test_routine("stepmaker", "run_stepmaker"),
                 "boot":         test_routine("kvm_tests", "run_boot"),
+                "shutdown":     test_routine("kvm_tests", "run_shutdown"),
                 "migration":    test_routine("kvm_tests", "run_migration"),
                 "yum_update":   test_routine("kvm_tests", "run_yum_update"),
                 "autotest":     test_routine("kvm_tests", "run_autotest"),
diff --git a/client/tests/kvm/kvm_tests.py b/client/tests/kvm/kvm_tests.py
index ffe9116..4c9653f 100644
--- a/client/tests/kvm/kvm_tests.py
+++ b/client/tests/kvm/kvm_tests.py
@@ -57,6 +57,43 @@ def run_boot(test, params, env):
     session.close()
 
 
+def run_shutdown(test, params, env):
+    """
+    KVM shutdown test:
+    1) Log into a guest
+    2) Send a shutdown command to the guest
+    3) Wait until it's down
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment
+    """
+    vm = kvm_utils.env_get_vm(env, params.get("main_vm"))
+    if not vm:
+        raise error.TestError("VM object not found in environment")
+    if not vm.is_alive():
+        raise error.TestError("VM seems to be dead; Test requires a living VM")
+
+    logging.info("Waiting for guest to be up...")
+
+    session = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2)
+    if not session:
+        raise error.TestFail("Could not log into guest")
+
+    logging.info("Logged in")
+
+    # Send the VM's shutdown command
+    session.sendline(vm.get_params().get("cmd_shutdown"))
+    session.close()
+
+    logging.info("Shutdown command sent; waiting for guest to go down...")
+
+    if not kvm_utils.wait_for(vm.is_dead, 120, 0, 1):
+        raise error.TestFail("Guest refuses to go down")
+
+    logging.info("Guest is down")
+
+
 def run_migration(test, params, env):
     """
     KVM migration test:
--
1.5.4.1
[KVM-AUTOTEST PATCH 2/2] KVM test: include shutdown test in kvm_tests.cfg.sample
The shutdown test should always be last -- new tests will be added above it. If the user chooses to run all tests on a guest, the guest will be shut down by the final shutdown test. If the user selects a specific subset of the tests, the guest will shut down if the shutdown test is included in this subset.

Note that this only applies to the last guest of the job; since currently all guests use the same VM object (vm1), the VM is shut down automatically when it's passed from one guest to the next.

In short, standard test sets (weekly, daily, per-release) should include the shutdown test. In particular, tests running from the server should include it. Custom test sets run in client mode, intended for testing/developing a specific test or a set of tests, may run without the shutdown test.

The shutdown test is defined with 'kill_vm = yes', so if the test fails to shut the guest down nicely, the postprocessor will take care of it.

Also added 'kill_vm_on_error = yes' to 'setup' tests, because if a setup test fails, shutdown will not run, so the setup test must take care of shutting the VM down. Install tests already have this defined.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm_tests.cfg.sample |    9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index 54c9975..412c4b0 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -41,6 +41,7 @@ variants:
         fail_if_stuck_for = 300
         stuck_detection_history = 2
         keep_screendump_history = yes
+        kill_vm_on_error = yes
 
     - boot:         install setup
         type = boot
@@ -83,6 +84,12 @@ variants:
     - linux_s3:     install setup
         type = linux_s3
 
+    - shutdown:     install setup
+        type = shutdown
+        kill_vm = yes
+        kill_vm_gracefully = no
+
+
 # NICs
 variants:
     - @rtl8139:
@@ -544,7 +551,7 @@ variants:
         only default
         only up
         only Fedora.8.32
-        only install setup boot
+        only install setup boot shutdown
         only rtl8139
     - @sample1:
         only qcow2
--
1.5.4.1
Re: slow guest performance with build load, looking for ideas
On 06/16/2009 10:03 AM, Michael Tokarev wrote:
>>>>> So if I understand what you're saying: best not to use kvm guests
>>>>> for build servers with pre-Nehalem processors.
>>>>
>>>> pre-Nehalem / pre-Barcelona, > 4 vcpus, yes.
>>>
>>> How about 2 vcpus, and how about AMD processors?
>>
>> 2 vcpus (or 4) should be fine. AMD processors (Barcelona+) would be
>> good for any number of vcpus.
>
> []
>
> Hmm.. that's sorta good (not so good for owners of most Intel CPUs --
> Nehalem just started its life). But still confusing. Namely, 2..4
> vcpus per GUEST or HOST -- for the pre-Nehalem ones? :)

4 vcpus per guest would be fine (even more should work, depending on workload). Host will scale with any number of cpus.
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 09:44 PM, Blue Swirl wrote:
> pc-qemu-0.10? pc-2009.06? Or given the hardware, should that be
> pc-1997?

pc-qemu-0.10 has the obvious benefit of allowing people to immediately know what's the oldest version of qemu that supports it.
Re: Live migration broken when under heavy IO
On 06/15/2009 11:33 PM, Anthony Liguori wrote:
> The basic issue is that:
>
> migrate_fd_put_ready():
>     bdrv_flush_all();
>
> Does:
>
> block.c:
>     foreach block driver:
>         drv->flush(bs);
>
> Which in the case of raw, is just fsync(s->fd). Any submitted request
> is not queued or flushed, which will lead to the request being dropped
> after the live migration.
>
> Is anyone working on fixing this?

Not to my knowledge.

> Does anyone have a clever idea how to fix this without just waiting
> for all IO requests to complete?

What's wrong with waiting for requests to complete? It should take a few tens of milliseconds. We could start throttling requests late in the live stage, but I don't really see the point.

Isn't virtio migration currently broken due to the qdev changes?
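For reference, a minimal sketch of the "just wait" approach Avi describes, assuming qemu's qemu_aio_flush() of this era blocks until all outstanding AIO requests have completed. This is an illustrative sketch, not the fix that was actually merged:

/* Sketch: drain in-flight AIO before the final flush on migration.
 * qemu_aio_flush() loops over the AIO completion handlers until no
 * request remains in flight; only after that does bdrv_flush_all()
 * (fsync per block driver, for raw) guarantee that the data is on
 * stable storage instead of being dropped mid-request. */
static void migrate_drain_and_flush(void)
{
    qemu_aio_flush();   /* wait for all submitted requests to complete */
    bdrv_flush_all();   /* then flush each block driver */
}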
Re: Live migration broken when under heavy IO
On 06/16/2009 12:10 PM, Avi Kivity wrote:
>> Does anyone have a clever idea how to fix this without just waiting
>> for all IO requests to complete?
>
> What's wrong with waiting for requests to complete? It should take a
> few tens of milliseconds. We could start throttling requests late in
> the live stage, but I don't really see the point.

Well, we can introduce a new live stage, where we migrate RAM and complete block requests, but the vm is otherwise stopped. This allows the flush to overlap with sending memory dirtied by the flush, reducing some downtime. Since block requests can dirty large amounts of memory, this may be significant.

We can even keep the vcpu alive, only blocking new block requests, but that may cause dirty RAM divergence.
Re: [patch] move _PR to SSDT v2
On Fri, Jun 12, 2009 at 03:56:04PM +0200, Jes Sorensen wrote:
> Hi,
>
> I figured out why Windows wasn't booting with the previous version
> of the patch. It seems it didn't like the forward declaration of the
> _PR.PRSC method.
>
> Instead I found that it does seem to work if I add the calling method
> _GRE.L02 in the SSDT, while the rest of the _GRE is declared in the
> DSDT.
>
> It boots, it's perfect!

Works for me with Windows XP/7, but can you add a comment in the DSDT, where _GRE.L02 would normally be, explaining why it is missing and where to find it?

--
Gleb.
Re: [patch] move _PR to SSDT v2
On 06/16/2009 11:50 AM, Gleb Natapov wrote:
> Works for me with Windows XP/7, but can you add a comment in the DSDT,
> where _GRE.L02 would normally be, explaining why it is missing and
> where to find it?

Sure, here's an updated version with the comment. Thanks for testing it.

Cheers,
Jes

Move the _PR block from the DSDT to a new SSDT in the KVM BIOS. As AML, or at least Windows, doesn't allow forward declarations of Methods, it is necessary to declare the _GPE._L02 Method in the SSDT as well.

This will make it possible to plug in different SSDTs with different processor counts in the future.

Signed-off-by: Jes Sorensen
---
 kvm/bios/Makefile      |   12 +++-
 kvm/bios/acpi-dsdt.dsl |  110 +-
 kvm/bios/acpi-ssdt.dsl |  140 +
 kvm/bios/rombios32.c   |   16 -
 4 files changed, 166 insertions(+), 112 deletions(-)

Index: qemu-kvm/kvm/bios/Makefile
===
--- qemu-kvm.orig/kvm/bios/Makefile
+++ qemu-kvm/kvm/bios/Makefile
@@ -71,7 +71,7 @@ bios: biossums BIOS-bochs-latest BIOS-bo
 clean:
 	rm -f *.o *.a *.s _rombios*_.c rombios*.txt rombios*.sym
 	rm -f usage biossums rombios16.bin
-	rm -f rombios32.bin rombios32.out acpi-dsdt.hex
+	rm -f rombios32.bin rombios32.out acpi-dsdt.hex acpi-ssdt.hex
 
 dist-clean: clean
 	rm -f Makefile
@@ -108,13 +108,19 @@ rombios32.bin: rombios32.out rombios.h
 rombios32.out: rombios32start.o rombios32.o vapic.o rombios32.ld
 	ld -o $@ -T rombios32.ld rombios32start.o vapic.o rombios32.o
 
-rombios32.o: rombios32.c acpi-dsdt.hex
+rombios32.o: rombios32.c acpi-dsdt.hex acpi-ssdt.hex
 	$(GCC) -m32 -O2 -Wall -c -o $@ $<
 
 acpi-dsdt.hex: acpi-dsdt.dsl
 	cpp -P $< $<.i
 	iasl -tc -p $@ $<.i
-	sed -i -e's/^unsigned/const unsigned/' $@
+	sed -i -e's/^unsigned char AmlCode/const unsigned char DSDTCode/' $@
+	rm $<.i
+
+acpi-ssdt.hex: acpi-ssdt.dsl
+	cpp -P $< $<.i
+	iasl -tc -p $@ $<.i
+	sed -i -e's/^unsigned char AmlCode/const unsigned char SSDTCode/' $@
 	rm $<.i
 
 rombios32start.o: rombios32start.S

Index: qemu-kvm/kvm/bios/acpi-dsdt.dsl
===
--- qemu-kvm.orig/kvm/bios/acpi-dsdt.dsl
+++ qemu-kvm/kvm/bios/acpi-dsdt.dsl
@@ -25,108 +25,6 @@ DefinitionBlock (
         0x1         // OEM Revision
         )
 {
-    Scope (\_PR)
-    {
-        /* pointer to first element of MADT APIC structures */
-        OperationRegion(ATPR, SystemMemory, 0x0514, 4)
-        Field (ATPR, DwordAcc, NoLock, Preserve)
-        {
-            ATP, 32
-        }
-
-#define madt_addr(nr) Add (ATP, Multiply(nr, 8))
-
-#define gen_processor(nr, name) \
-        Processor (CPU##name, nr, 0xb010, 0x06) { \
-            OperationRegion (MATR, SystemMemory, madt_addr(nr), 8) \
-            Field (MATR, ByteAcc, NoLock, Preserve) \
-            { \
-                MAT, 64 \
-            } \
-            Field (MATR, ByteAcc, NoLock, Preserve) \
-            { \
-                Offset(4), \
-                FLG, 1 \
-            } \
-            Method(_MAT, 0) { \
-                Return(MAT) \
-            } \
-            Method (_STA) { \
-                If (FLG) { Return(0xF) } Else { Return(0x9) } \
-            } \
-        } \
-
-
-        gen_processor(0, 0)
-        gen_processor(1, 1)
-        gen_processor(2, 2)
-        gen_processor(3, 3)
-        gen_processor(4, 4)
-        gen_processor(5, 5)
-        gen_processor(6, 6)
-        gen_processor(7, 7)
-        gen_processor(8, 8)
-        gen_processor(9, 9)
-        gen_processor(10, A)
-        gen_processor(11, B)
-        gen_processor(12, C)
-        gen_processor(13, D)
-        gen_processor(14, E)
-
-        Method (NTFY, 2) {
-#define gen_ntfy(nr) \
-            If (LEqual(Arg0, 0x##nr)) { \
-                If (LNotEqual(Arg1, \_PR.CPU##nr.FLG)) { \
-                    Store (Arg1, \_PR.CPU##nr.FLG) \
-                    If (LEqual(Arg1, 1)) { \
-                        Notify(CPU##nr, 1) \
-                    } Else { \
-                        Notify(CPU##nr, 3) \
-                    } \
-                } \
-            }
-            gen_ntfy(0)
-            gen_ntfy(1)
-            gen_ntfy(2)
-            gen_ntfy(3)
-            gen_ntfy(4)
-            gen_ntfy(5)
-            gen_ntfy(6)
-            gen_ntfy(7)
-            gen_ntfy(8)
-            gen_ntfy(9)
-            gen_ntfy(A)
-            gen_ntfy(B)
-            gen_ntfy(C
Re: [KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
Michael, I don't fully understand why the shutdown test is needed; shutdown is tested during reboot, where the guest OS must de-init itself.
[KVM-AUTOTEST PATCH 1/2] KVM test: optionally convert PPM files to PNG format after test
This is intended to save disk space. Requires ImageMagick (uses mogrify).

To enable: convert_ppm_files_to_png = yes
To enable only for failed tests: convert_ppm_files_to_png_on_error = yes

Reminder: by default PPM files are removed after the test (and after the conversion, if requested).
To keep them specify: keep_ppm_files = yes
To keep them only for failed tests: keep_ppm_files_on_error = yes

A reasonable choice would be to keep PNG files for failed tests, and never keep PPM files. To do this specify convert_ppm_files_to_png_on_error = yes without specifying 'keep_ppm_files = yes' or 'keep_ppm_files_on_error = yes' (or explicitly set to 'no', if it was set to 'yes' on a higher level in the config hierarchy).

This patch also makes small cosmetic changes to the 'keep_ppm_files' feature.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm_preprocessing.py |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py
index 7fac46a..26e5210 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -223,11 +223,19 @@ def postprocess(test, params, env):
     """
     process(test, params, env, postprocess_image, postprocess_vm)
 
-    # See if we should get rid of all PPM files
-    if not params.get("keep_ppm_files") == "yes":
-        # Remove them all
+    # Should we convert PPM files to PNG format?
+    if params.get("convert_ppm_files_to_png") == "yes":
+        logging.debug("'convert_ppm_files_to_png' specified; converting PPM"
+                      " files to PNG format...")
+        mogrify_cmd = ("mogrify -format png %s" %
+                       os.path.join(test.debugdir, "*.ppm"))
+        kvm_subprocess.run_fg(mogrify_cmd, logging.debug, "(mogrify) ",
+                              timeout=30.0)
+
+    # Should we keep the PPM files?
+    if params.get("keep_ppm_files") != "yes":
         logging.debug("'keep_ppm_files' not specified; removing all PPM files"
-                      " from results dir...")
+                      " from debug dir...")
         rm_cmd = "rm -vf %s" % os.path.join(test.debugdir, "*.ppm")
         kvm_subprocess.run_fg(rm_cmd, logging.debug, "(rm) ", timeout=5.0)
--
1.5.4.1
[KVM-AUTOTEST PATCH 2/2] KVM test: kvm_tests.cfg.sample: convert PPM files to PNG by default
By default, always remove PPM files, and keep PNG files only for failed tests. This shouldn't do much harm, because while PPMs can be incorporated directly into step files, PNGs can be converted back to PPMs easily, and take less disk space. (PNG is a lossless compression format.)

The 'keep_ppm_files' and 'keep_ppm_files_on_error' settings are commented out so the user can easily enable them if desired.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm_tests.cfg.sample |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index 412c4b0..2564e16 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -8,8 +8,9 @@ main_vm = vm1
 
 # Some preprocessor/postprocessor params
 start_vm = yes
-keep_ppm_files = no
-keep_ppm_files_on_error = yes
+convert_ppm_files_to_png_on_error = yes
+#keep_ppm_files = yes
+#keep_ppm_files_on_error = yes
 kill_vm = no
 kill_vm_gracefully = yes
--
1.5.4.1
Re: [KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
- "Alexey Eromenko" wrote: > Michael I don't fully understand why shutdown test is needed; Shutdown > is tested during reboot, where GuestOS must de-init itself. The main motivation for a shutdown test is that it allows us to choose whether VMs should be kept alive after the last test. Including this test in a job takes no more time than running the job without it, because normally the preprocessor automatically shuts down guests (except for the last one). There's also a small difference between a complete shutdown and a reboot -- a shutdown is supposed to close the qemu process. I'm not sure this difference justifies the test, but the reason mentioned in the previous paragraph does, in my opinion. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM: x86: verify MTRR/PAT validity
Do not allow invalid MTRR/PAT values in set_msr_mtrr.

Please review carefully.

Signed-off-by: Marcelo Tosatti

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -722,11 +722,53 @@ static bool msr_mtrr_valid(unsigned msr)
 	return false;
 }
 
+static unsigned mtrr_types[] = {0, 1, 4, 5, 6};
+static unsigned pat_types[] = {0, 1, 4, 5, 6, 7};
+
+static bool valid_mt(unsigned type, int len, unsigned array[len])
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		if (type == array[i])
+			return true;
+
+	return false;
+}
+
+#define valid_pat_type(a) valid_mt(a, ARRAY_SIZE(pat_types), pat_types)
+#define valid_mtrr_type(a) valid_mt(a, ARRAY_SIZE(mtrr_types), mtrr_types)
+
+static bool mtrr_valid(u32 msr, u64 data)
+{
+	int i;
+
+	if (!msr_mtrr_valid(msr))
+		return false;
+
+	if (msr == MSR_IA32_CR_PAT) {
+		for (i = 0; i < 8; i++)
+			if (!valid_pat_type((data >> (i * 8)) & 0xff))
+				return false;
+		return true;
+	} else if (msr == MSR_MTRRdefType) {
+		return valid_mtrr_type(data & 0xff);
+	} else if (msr >= MSR_MTRRfix64K_0 && msr <= MSR_MTRRfix4K_F8000) {
+		for (i = 0; i < 8 ; i++)
+			if (!valid_mtrr_type((data >> (i * 8)) & 0xff))
+				return false;
+		return true;
+	}
+
+	/* variable MTRRs, physmaskn have bits 0-10 reserved */
+	return valid_mtrr_type(data & 0xff);
+}
+
 static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
 	u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
 
-	if (!msr_mtrr_valid(msr))
+	if (!mtrr_valid(msr, data))
 		return 1;
 
 	if (msr == MSR_MTRRdefType) {
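As a worked illustration of the PAT check above: the power-on default of IA32_PAT is 0x0007040600070406, whose bytes are all in {0, 4, 6, 7}, so it passes, while any byte equal to a reserved type (2 or 3) would be rejected. A standalone userspace sketch mirroring the patch's loop (not kernel code):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

static const unsigned pat_types[] = {0, 1, 4, 5, 6, 7};

static bool valid_pat_type(unsigned type)
{
	unsigned i;

	for (i = 0; i < sizeof(pat_types) / sizeof(pat_types[0]); i++)
		if (type == pat_types[i])
			return true;
	return false;
}

int main(void)
{
	uint64_t data = 0x0007040600070406ULL;  /* power-on default PAT */
	int i;

	/* same byte-by-byte walk as mtrr_valid() does for MSR_IA32_CR_PAT */
	for (i = 0; i < 8; i++)
		if (!valid_pat_type((data >> (i * 8)) & 0xff)) {
			printf("invalid PAT value\n");
			return 1;
		}
	printf("valid PAT value\n");
	return 0;
}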
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, 2009-06-15 at 13:12 -0500, Anthony Liguori wrote:
> Mark McLoughlin wrote:
> > So long as the restrictions would be known to the management app via
> > some "what slots are available" mechanism in qemu, that sounds fine.
>
> I'm not sure a "what slots are available" mechanism is as straight
> forward as has been claimed.

If qemu can't provide that information, then the management app does not have sufficient information to do the slot allocation itself. In which case, it must leave it up to qemu to do it.

> It doesn't matter though because it's orthogonal to the current
> proposal.

It is not orthogonal to solving the actual problem at hand, though - i.e. how to allow management apps to provide stable PCI addresses.

> >>> I'm not at all arguing against pci_addr. I'm arguing about how
> >>> libvirt should use it with respect to the "genesis" use-case where
> >>> libvirt has no specific reason to choose one PCI slot over another.
> >>> In that case, I'm merely advocating that we want to let QEMU make
> >>> the decision.
> >>
> >> However this may end up, isn't it offtopic? Whatever we do we have
> >> to support both pci_addr= and default placement, so we can push this
> >> discussion to libvirt-devel and bid them godspeed.
> >
> > Presumably you're not proposing that qemu-devel completely ignore the
> > typical requirements of management apps?
>
> This is a happy case where the current proposals allow both usages to
> occur. Which one libvirt chooses is up to it.
>
> To summarize, I think we have:
>
> 1) Introduce addressing to all host device configurations
>  - Either in the canonical form "pci_addr=bus:dev.fn or target=3,lun=1"
>    or in flattened form "addr=bus:dev.fn or addr=target.lun". I prefer
>    the latter form but I think either would be acceptable.

That helps, but it's not enough on its own. The management app needs to figure out what addresses to pass either by:

  a) Initially allowing qemu to do the address allocation, and
     thereafter using those addresses - this requires some way to query
     the addresses of devices

or

  b) Doing the initial address allocation itself - this requires some
     way to query what slots are available.

> 2) Whenever the default machine type changes in a guest-visible way,
>    introduce a new machine type
>  - Use explicit versions in name: pc-v1, pc-v2 or use more descriptive
>    names pc-with-usb
>  - Easily transitions to device config files

To be clear - you're not proposing this as a solution to the "stable PCI addresses" problem, are you? The main requirement is for the addresses to stay stable even if the user adds/removes other devices.

This is a fine solution to the "stable guest ABI" problem ... assuming there's some way of querying the current default machine type.

Cheers,
Mark.
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/16/2009 03:14 PM, Mark McLoughlin wrote:
> On Mon, 2009-06-15 at 13:12 -0500, Anthony Liguori wrote:
>> Mark McLoughlin wrote:
>>> So long as the restrictions would be known to the management app via
>>> some "what slots are available" mechanism in qemu, that sounds fine.
>>
>> I'm not sure a "what slots are available" mechanism is as straight
>> forward as has been claimed.
>
> If qemu can't provide that information, then the management app does
> not have sufficient information to do the slot allocation itself. In
> which case, it must leave it up to qemu to do it.

A given -M machine will have well-known open slots (since it's an ABI), same as it has rtl8139 and ne2000 cards. Worst case we hardcode those numbers (gasp, faint).

>> It doesn't matter though because it's orthogonal to the current
>> proposal.
>
> It is not orthogonal to solving the actual problem at hand, though -
> i.e. how to allow management apps to provide stable PCI addresses.

It's part of the solution, but hardly the most difficult part.

> This is a fine solution to the "stable guest ABI" problem ... assuming
> there's some way of querying the current default machine type.

$ qemu -print-default-machine

or maybe

$ qemu -show default-machine
$ qemu -show pci-bus
$ qemu -show me a way out
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Tue, 2009-06-16 at 15:28 +0300, Avi Kivity wrote:
> On 06/16/2009 03:14 PM, Mark McLoughlin wrote:
> > On Mon, 2009-06-15 at 13:12 -0500, Anthony Liguori wrote:
> >
> >> Mark McLoughlin wrote:
> >>
> >>> So long as the restrictions would be known to the management app
> >>> via some "what slots are available" mechanism in qemu, that sounds
> >>> fine.
> >>
> >> I'm not sure a "what slots are available" mechanism is as straight
> >> forward as has been claimed.
> >
> > If qemu can't provide that information, then the management app does
> > not have sufficient information to do the slot allocation itself. In
> > which case, it must leave it up to qemu to do it.
>
> A given -M machine will have well-known open slots (since it's an ABI),
> same as it has rtl8139 and ne2000 cards.

If they're so obviously well-known, I don't see how the query mechanism would not be straightforward, which is the comment I was replying to.

> Worst case we hardcode those numbers (gasp, faint).

Maybe we can just add the open slots to the -help output. That'd be nice and clean.

> >> It doesn't matter though because it's orthogonal to the current
> >> proposal.
> >
> > It is not orthogonal to solving the actual problem at hand, though -
> > i.e. how to allow management apps to provide stable PCI addresses.
>
> It's part of the solution, but hardly the most difficult part.

Agree.

> > This is a fine solution to the "stable guest ABI" problem ...
> > assuming there's some way of querying the current default machine
> > type.
>
> $ qemu -print-default-machine

Or:

$ readlink /usr/share/qemu/machine-types/pc.dt

Cheers,
Mark.
Re: [patch 1/2] KVM: convert custom marker based tracing to event traces
On Mon, Jun 15, 2009 at 04:03:10PM -0400, Steven Rostedt wrote:
>
> On Mon, 15 Jun 2009, Marcelo Tosatti wrote:
>
> > This allows use of the powerful ftrace infrastructure.
> >
> > See Documentation/trace/ for usage information.
> >
> > Signed-off-by: Marcelo Tosatti
> >
> > Index: kvm/arch/x86/kvm/svm.c
> > ===
> > --- kvm.orig/arch/x86/kvm/svm.c
> > +++ kvm/arch/x86/kvm/svm.c
> > @@ -29,6 +29,9 @@
> >  #include
> >
> >  #include
> > +#include "svm-trace.h"
> > +#define CREATE_TRACE_POINTS
> > +#include
> >
> >  #define __ex(x) __kvm_handle_fault_on_reboot(x)
> >
> > Index: kvm/arch/x86/kvm/vmx.c
> > ===
> > --- kvm.orig/arch/x86/kvm/vmx.c
> > +++ kvm/arch/x86/kvm/vmx.c
> > @@ -34,6 +34,10 @@
> >  #include
> >  #include
> >
> > +#include "vmx-trace.h"
> > +#define CREATE_TRACE_POINTS
> > +#include
> > +
> >  #define __ex(x) __kvm_handle_fault_on_reboot(x)
> >
> >  MODULE_AUTHOR("Qumranet");
> >
> > Index: kvm/arch/x86/kvm/x86.c
> > ===
> > --- kvm.orig/arch/x86/kvm/x86.c
> > +++ kvm/arch/x86/kvm/x86.c
> > @@ -37,6 +37,8 @@
> >  #include
> >  #include
> >  #include
> > +#define CREATE_TRACE_POINTS
> > +#include
> >
> >  #include
> >  #include
> > @@ -347,9 +349,6 @@ EXPORT_SYMBOL_GPL(kvm_set_cr0);
> >
> > Index: kvm/include/trace/events/kvm/x86-arch.h
> > ===
> > --- /dev/null
> > +++ kvm/include/trace/events/kvm/x86-arch.h
>
> One suggestion. Instead of putting in arch specific trace points into
> generic code, you can put these into arch/x86/kvm/trace.h?
>
> Then you can in the Makefile add:
>
> CFLAGS_x86.o := -I.
> CFLAGS_svm.o := -I.
> CFLAGS_vmx.o := -I.
>
> Or better yet, have a single file called trace.c:
>
> in the Makefile:
> CFLAGS_trace.o := -I.
> obj-$(EVENT_TRACING) += trace.o
>
> in trace.c:
>
> #define CREATE_TRACE_POINTS
> #include "trace.h"
> #include "trace-arch.h"
>
> Then have the kvm/x86.h moved to trace.h
> and the kvm/arch-x86.h moved to trace-arch.h
>
> Just change the "TRACE_INCLUDE_FILE" to include the proper name.
>
> -- Steve

Similar to http://patchwork.kernel.org/patch/23829/, but moving the x86 tracepoint definitions to arch/x86/kvm/ ?

I thought the point of include/trace/ was to have all tracepoint definitions in a single directory?
Re: Live migration broken when under heavy IO
Avi Kivity wrote:
>> Does anyone have a clever idea how to fix this without just waiting
>> for all IO requests to complete?
>
> What's wrong with waiting for requests to complete? It should take a
> few tens of milliseconds.

An alternative would be to attempt to cancel the requests. This incurs no non-deterministic latency. The tricky bit is that this has to happen at the device layer because the opaques cannot be saved in a meaningful way.

> We could start throttling requests late in the live stage, but I don't
> really see the point.
>
> Isn't virtio migration currently broken due to the qdev changes?

--
Regards,
Anthony Liguori
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/16/2009 03:39 PM, Mark McLoughlin wrote:
>> Worst case we hardcode those numbers (gasp, faint).
>
> Maybe we can just add the open slots to the -help output. That'd be
> nice and clean.

Yeah, there's precedent too.

> Or:
>
> $ readlink /usr/share/qemu/machine-types/pc.dt

That works if you have exactly one qemu installed. It's best if qemu itself is the entry point (qemu -print-device-tree). Though I wouldn't want to inflict it upon the management application writers.
Re: Live migration broken when under heavy IO
On 06/16/2009 03:50 PM, Anthony Liguori wrote:
> Avi Kivity wrote:
>>> Does anyone have a clever idea how to fix this without just waiting
>>> for all IO requests to complete?
>>
>> What's wrong with waiting for requests to complete? It should take a
>> few tens of milliseconds.
>
> An alternative would be to attempt to cancel the requests. This incurs
> no non-deterministic latency.

Yes, that's even better (though without linux-aio, it's equivalent).

> The tricky bit is that this has to happen at the device layer because
> the opaques cannot be saved in a meaningful way.

Do you mean the device has to record all cancelled requests and replay them? I think we can do it at the block layer (though we have to avoid it for nested requests).
Re: Live migration broken when under heavy IO
Avi Kivity wrote:
> Yes, that's even better (though without linux-aio, it's equivalent).

Not absolutely equivalent. There may be queued requests that haven't yet been dispatched to the thread pool, but yeah, I understand what you mean.

>> The tricky bit is that this has to happen at the device layer because
>> the opaques cannot be saved in a meaningful way.
>
> Do you mean the device has to record all cancelled requests and replay
> them? I think we can do it at the block layer (though we have to avoid
> it for nested requests).

In order to complete the requests, you have to call a callback and pass an opaque with the results. The callback/opaque cannot be saved in the block layer in a meaningful way.

--
Regards,
Anthony Liguori
Re: Live migration broken when under heavy IO
On 06/16/2009 03:57 PM, Anthony Liguori wrote:
>>> The tricky bit is that this has to happen at the device layer because
>>> the opaques cannot be saved in a meaningful way.
>>
>> Do you mean the device has to record all cancelled requests and replay
>> them? I think we can do it at the block layer (though we have to avoid
>> it for nested requests).
>
> In order to complete the requests, you have to call a callback and pass
> an opaque with the results. The callback/opaque cannot be saved in the
> block layer in a meaningful way.

You're right, of course. I guess we'll have to cancel any near term cancellation plans.

We could change the opaque to be something pre-registered (e.g. the device state object, which we don't need to save/restore) and pass in addition an integer request tag. These would be migratable. The device would be responsible for saving tags and their associated information (perhaps through a common API).
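A rough sketch of the tag-based scheme Avi outlines; every name below (the callback type, the structure, its fields) is hypothetical and invented for illustration -- the real block API of the time passes a (callback, opaque) pair that cannot be migrated:

/* Hypothetical: replace the un-migratable opaque with a device state
 * object registered once at setup plus an integer tag per request. */
typedef void BlockTagCompletionFunc(void *dev_state, int tag, int ret);

typedef struct TaggedAIOCB {
    void *dev_state;            /* pre-registered; not saved/restored */
    int tag;                    /* migratable request identifier */
    BlockTagCompletionFunc *cb; /* re-resolved after migration */
} TaggedAIOCB;

The device would then record outstanding tags (e.g. in a bitmap) as part of its save state, and on the target either replay or complete them by tag, perhaps through the common API Avi mentions.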
[PATCH] add sysenter/syscall emulation for 32bit compat mode
sysenter/sysexit are not supported on AMD's 32bit compat mode, whereas syscall is not supported on Intel's 32bit compat mode. To allow cross vendor migration we emulate the missing instructions by setting up the processor state accordingly.

The sysenter code was originally sketched by Amit Shah; it was completed, debugged, syscall added and made-to-work by Christoph Egger and polished up by Andre Przywara.

Please note that sysret does not need to be emulated, because it will be executed in 64bit mode and returning to 32bit compat mode works on Intel.

Signed-off-by: Amit Shah
Signed-off-by: Christoph Egger
Signed-off-by: Andre Przywara
---
 arch/x86/kvm/x86.c         |   37 ++-
 arch/x86/kvm/x86_emulate.c |  228 +++-
 2 files changed, 259 insertions(+), 6 deletions(-)

This is a reworked version of my earlier patch. I moved the former three "case:" parts into one separate function for better readability and understanding. Please apply!

Thanks,
Andre.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6025e5b..9066fb3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2635,11 +2635,38 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		/* Reject the instructions other than VMCALL/VMMCALL when
 		 * try to emulate invalid opcode */
 		c = &vcpu->arch.emulate_ctxt.decode;
-		if ((emulation_type & EMULTYPE_TRAP_UD) &&
-		    (!(c->twobyte && c->b == 0x01 &&
-		      (c->modrm_reg == 0 || c->modrm_reg == 3) &&
-		       c->modrm_mod == 3 && c->modrm_rm == 1)))
-			return EMULATE_FAIL;
+
+		if (emulation_type & EMULTYPE_TRAP_UD) {
+			if (!c->twobyte)
+				return EMULATE_FAIL;
+			switch (c->b) {
+			case 0x01: /* VMMCALL */
+				if (c->modrm_mod != 3)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 1)
+					return EMULATE_FAIL;
+				break;
+			case 0x34: /* sysenter */
+			case 0x35: /* sysexit */
+				if (c->modrm_mod != 0)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 0)
+					return EMULATE_FAIL;
+				break;
+			case 0x05: /* syscall */
+				r = 0;
+				if (c->modrm_mod != 0)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 0)
+					return EMULATE_FAIL;
+				break;
+			default:
+				return EMULATE_FAIL;
+			}
+
+			if (!(c->modrm_reg == 0 || c->modrm_reg == 3))
+				return EMULATE_FAIL;
+		}
 		++vcpu->stat.insn_emulation;
 		if (r)  {
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 22c765d..6c65462 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -32,6 +32,8 @@
 #include
 #include
 
+#include "mmu.h"
+
 /*
  * Opcode effective-address decode tables.
  * Note that we only emulate instructions that have at least one memory
@@ -217,7 +219,9 @@ static u32 twobyte_table[256] = {
 	ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x30 - 0x3F */
-	ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	ImplicitOps, 0, ImplicitOps, 0,
+	ImplicitOps, ImplicitOps, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x40 - 0x47 */
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -320,8 +324,11 @@ static u32 group2_table[] = {
 };
 
 /* EFLAGS bit definitions. */
+#define EFLG_VM (1<<17)
+#define EFLG_RF (1<<16)
 #define EFLG_OF (1<<11)
 #define EFLG_DF (1<<10)
+#define EFLG_IF (1<<9)
 #define EFLG_SF (1<<7)
 #define EFLG_ZF (1<<6)
 #define EFLG_AF (1<<4)
@@ -1390,6 +1397,207 @@ void toggle_interruptibility(struct x86_emulate_ctxt *ctxt, u32 mask)
 	ctxt->interruptibility = mask;
 }
 
+#define EMUL_SYSENTER 1
+#define EMUL_SYSEXIT  2
+#define EMUL_SYSCALL  3
+
+static int
+emulate_syscalls(struct x86_emulate_ctxt *ctxt, int ins)
+{
+	struct decode_cache *c = &ctxt->decode;
+	struct kvm_segment cs, ss;
+	unsigned long cr0;
+	u64 msr_data;
+	int usermode;
+
+	/* inject #UD if LOCK prefix is used */
+	if (c->lock_prefix)
+		return -1;
+
+	/* inject
[PATCH] remove ppc functions from callbacks
handle_powerpc_dcr_read() and handle_powerpc_dcr_write() are two powerpc specific functions that are called via libkvm callbacks. However, grepping the source code finds simply no use of them.

This is probably due to the fact that powerpc now relies on a totally qemu upstream implementation of kvm, and does not need them anymore.

Anyway, I'm providing this patch separately, so that if it breaks for whatever reason, we can identify a bisection point easily.

Signed-off-by: Glauber Costa
CC: Hollis Blanchard
---
 libkvm-all.h |    4 
 qemu-kvm.c   |    4 
 qemu-kvm.h   |    5 -
 3 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/libkvm-all.h b/libkvm-all.h
index 4f7b9a3..47855be 100644
--- a/libkvm-all.h
+++ b/libkvm-all.h
@@ -156,10 +156,6 @@ struct kvm_callbacks {
     void (*post_kvm_run)(void *opaque, void *env);
     int (*pre_kvm_run)(void *opaque, void *env);
     int (*tpr_access)(void *opaque, kvm_vcpu_context_t vcpu, uint64_t rip, int is_write);
-#if defined(__powerpc__)
-    int (*powerpc_dcr_read)(kvm_vcpu_context_t vcpu, uint32_t dcrn, uint32_t *data);
-    int (*powerpc_dcr_write)(kvm_vcpu_context_t vcpu, uint32_t dcrn, uint32_t data);
-#endif
 #if defined(__s390__)
     int (*s390_handle_intercept)(kvm_context_t context, kvm_vcpu_context_t vcpu,
         struct kvm_run *run);
diff --git a/qemu-kvm.c b/qemu-kvm.c
index eefca61..61f5a19 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2278,10 +2278,6 @@ static struct kvm_callbacks qemu_kvm_ops = {
 #ifdef TARGET_I386
     .tpr_access = handle_tpr_access,
 #endif
-#ifdef TARGET_PPC
-    .powerpc_dcr_read = handle_powerpc_dcr_read,
-    .powerpc_dcr_write = handle_powerpc_dcr_write,
-#endif
     .unhandled = handle_unhandled,
 };
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 126b8f3..57e8a2f 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -134,11 +134,6 @@ void kvm_remove_ioperm_data(unsigned long start_port, unsigned long num);
 void kvm_arch_do_ioperm(void *_data);
 #endif
 
-#ifdef TARGET_PPC
-int handle_powerpc_dcr_read(kvm_vcpu_context_t vcpu, uint32_t dcrn, uint32_t *data);
-int handle_powerpc_dcr_write(kvm_vcpu_context_t vcpu, uint32_t dcrn, uint32_t data);
-#endif
-
 #define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
 #define BITMAP_SIZE(m) (ALIGN(((m)>>TARGET_PAGE_BITS), HOST_LONG_BITS) / 8)
--
1.6.2.2
Re: [patch 1/2] KVM: convert custom marker based tracing to event traces
On Tue, 16 Jun 2009, Marcelo Tosatti wrote:
> >
> > One suggestion. Instead of putting in arch specific trace points into
> > generic code, you can put these into arch/x86/kvm/trace.h?
> >
> > Then you can in the Makefile add:
> >
> > CFLAGS_x86.o := -I.
> > CFLAGS_svm.o := -I.
> > CFLAGS_vmx.o := -I.
> >
> > Or better yet, have a single file called trace.c:
> >
> > in the Makefile:
> > CFLAGS_trace.o := -I.
> > obj-$(EVENT_TRACING) += trace.o
> >
> > in trace.c:
> >
> > #define CREATE_TRACE_POINTS
> > #include "trace.h"
> > #include "trace-arch.h"
> >
> > Then have the kvm/x86.h moved to trace.h
> > and the kvm/arch-x86.h moved to trace-arch.h
> >
> > Just change the "TRACE_INCLUDE_FILE" to include the proper name.
> >
> > -- Steve
>
> Similar to http://patchwork.kernel.org/patch/23829/, but moving the
> x86 tracepoint definitions to arch/x86/kvm/ ?

Yes, similar to what Christoph suggested.

> I thought the point of include/trace/ was to have all tracepoint
> definitions in a single directory?

That's not what I heard ;-)

It should only be for core kernel code. If you have trace points that are the same on most archs, then sure. But if it is specific to one arch, then we are going to end up with arch code scattered throughout the kernel again.

I thought the idea of moving include/asm-x86 into arch/x86/include/asm was to consolidate arch code to one location? Thus, if a trace point is made for one arch, then it should be located in that arch.

-- Steve
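To make the layout Steve describes concrete, here is a sketch of what an arch-local arch/x86/kvm/trace.h would look like with the standard ftrace boilerplate. The event shown (kvm_dummy_exit) is a placeholder, not one of Marcelo's actual tracepoints:

#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvm

#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_KVM_H

#include <linux/tracepoint.h>

TRACE_EVENT(kvm_dummy_exit,
	TP_PROTO(unsigned int exit_reason),
	TP_ARGS(exit_reason),
	TP_STRUCT__entry(__field(unsigned int, exit_reason)),
	TP_fast_assign(__entry->exit_reason = exit_reason;),
	TP_printk("reason %u", __entry->exit_reason)
);

#endif /* _TRACE_KVM_H */

/* This part must be outside the include guard */
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_FILE trace
#include <trace/define_trace.h>

The -I. in CFLAGS is what lets define_trace.h find trace.h via TRACE_INCLUDE_PATH when the header lives outside include/trace/.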
[KVM PATCH v7 0/2] iosignalfd
(Applies to kvm.git/master:c27b64a0)

This is v7 of the series. For more details, please see the header to patch 2/2.

This series has been tested against the kvm-eventfd unit test, and appears to be functioning properly. You can download this test here:

ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2

Please consider for inclusion to kvm.git

[ Changelog:

v7:
    *) Implemented a resource limit (CONFIG_KVM_MAX_IOSIGNALFD_ITEMS) to
       limit malicious/broken userspace from consuming arbitrary kernel
       memory.
    *) Rebased to kvm.git/master:c27b64a0, which already includes
       Marcelo's irq-lock rework.

v6:
    *) Removed "FIXME?" comment on choice over RCU vs SRCU after
       discussion/numbers from Paul. I think RCU is fine to use for now
       based on the conversation. We can always convert it later if
       need be.
    *) Fixed the "group" free path to eliminate an RCU related race
    *) Fixed a memory/eventfd leak on shutdown for any iosignalfd's
       which were still active at the time the guest shuts down.
    *) Beefed up comments
    *) Rebased to kvm.git/master:0281e88f + irq locking rework and
       verified that kvm-eventfd unit test still passes.

v5:
    *) Removed "cookie" field, which was a misunderstanding on my part
       on what Avi wanted for a data-match feature
    *) Added a new "trigger" data-match feature which I think is much
       closer to what we need.
    *) We retain the dev_count field in the io_bus infrastructure and
       instead back-fill the array on removal.
    *) Various minor cleanups
    *) Rebased to kvm.git/master:25deed73

v4:
    *) Fixed a bug in the original 2/4 where the PIT failure case would
       potentially leave the io_bus components registered.
    *) Condensed the v3 2/4 and 3/4 into one patch (2/2) since the
       patches became interdependent with the fix described above
    *) Rebased to kvm.git/master:74dfca0a

v3:
    *) fixed patch 2/4 to handle error cases instead of BUG_ON
    *) implemented same HAVE_EVENTFD protection mechanism as irqfd to
       prevent compilation errors on unsupported arches
    *) completed testing
    *) rebased to kvm.git/master:7391a6d5

v2:
    *) added optional data-matching capability (via cookie field)
    *) changed name from iofd to iosignalfd
    *) added io_bus unregister function
    *) implemented deassign feature

v1:
    *) original release (integrated into irqfd v7 series as "iofd")
]

---

Gregory Haskins (2):
      KVM: add iosignalfd support
      KVM: make io_bus interface more robust

 arch/x86/kvm/Kconfig      |    8 +
 arch/x86/kvm/i8254.c      |   22 ++
 arch/x86/kvm/i8259.c      |    9 +
 arch/x86/kvm/x86.c        |    3 
 include/linux/kvm.h       |   15 ++
 include/linux/kvm_host.h  |   16 +-
 virt/kvm/Kconfig          |    2 
 virt/kvm/coalesced_mmio.c |    8 +
 virt/kvm/eventfd.c        |  402 +
 virt/kvm/ioapic.c         |    9 +
 virt/kvm/kvm_main.c       |   41 -
 11 files changed, 520 insertions(+), 15 deletions(-)
[KVM PATCH v7 1/2] KVM: make io_bus interface more robust
Today kvm_io_bus_register_dev() returns void and will internally BUG_ON if it fails. We want to create dynamic MMIO/PIO entries driven from userspace later in the series, so we need to enhance the code to be more robust with the following changes:

1) Add a return value to the registration function
2) Fix up all the callsites to check the return code, handle any
   failures, and percolate the error up to the caller.
3) Add an unregister function that collapses holes in the array

Signed-off-by: Gregory Haskins
---
 arch/x86/kvm/i8254.c      |   22 --
 arch/x86/kvm/i8259.c      |    9 -
 include/linux/kvm_host.h  |    6 --
 virt/kvm/coalesced_mmio.c |    8 ++--
 virt/kvm/ioapic.c         |    9 +++--
 virt/kvm/kvm_main.c       |   30 --
 6 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 331705f..1c41715 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -586,6 +586,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 {
 	struct kvm_pit *pit;
 	struct kvm_kpit_state *pit_state;
+	int ret;
 
 	pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL);
 	if (!pit)
@@ -620,14 +621,31 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
 
 	kvm_iodevice_init(&pit->dev, &pit_dev_ops);
-	kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+	ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+	if (ret < 0)
+		goto fail;
 
 	if (flags & KVM_PIT_SPEAKER_DUMMY) {
 		kvm_iodevice_init(&pit->speaker_dev, &speaker_dev_ops);
-		kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
+		ret = kvm_io_bus_register_dev(&kvm->pio_bus,
+					      &pit->speaker_dev);
+		if (ret < 0)
+			goto fail;
 	}
 
 	return pit;
+
+fail:
+	if (flags & KVM_PIT_SPEAKER_DUMMY)
+		kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->speaker_dev);
+
+	kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->dev);
+
+	if (pit->irq_source_id >= 0)
+		kvm_free_irq_source_id(kvm, pit->irq_source_id);
+
+	kfree(pit);
+	return NULL;
 }
 
 void kvm_free_pit(struct kvm *kvm)
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 148c52a..66e37c2 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -532,6 +532,8 @@ static const struct kvm_io_device_ops picdev_ops = {
 struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 {
 	struct kvm_pic *s;
+	int ret;
+
 	s = kzalloc(sizeof(struct kvm_pic), GFP_KERNEL);
 	if (!s)
 		return NULL;
@@ -548,6 +550,11 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 	 * Initialize PIO device
 	 */
 	kvm_iodevice_init(&s->dev, &picdev_ops);
-	kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+	ret = kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+	if (ret < 0) {
+		kfree(s);
+		return NULL;
+	}
+
 	return s;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e4e78db..eafa2b3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -61,8 +61,10 @@ void kvm_io_bus_init(struct kvm_io_bus *bus);
 void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
 					  gpa_t addr, int len, int is_write);
-void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
-			     struct kvm_io_device *dev);
+int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+			    struct kvm_io_device *dev);
+void kvm_io_bus_unregister_dev(struct kvm_io_bus *bus,
+			       struct kvm_io_device *dev);
 
 struct kvm_vcpu {
 	struct kvm *kvm;
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 397f419..3e89db8 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -94,6 +94,7 @@ static const struct kvm_io_device_ops coalesced_mmio_ops = {
 int kvm_coalesced_mmio_init(struct kvm *kvm)
 {
 	struct kvm_coalesced_mmio_dev *dev;
+	int ret;
 
 	dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
 	if (!dev)
@@ -102,9 +103,12 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
 	kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
 	dev->kvm = kvm;
 	kvm->coalesced_mmio_dev = dev;
-	kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
 
-	return 0;
+	ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
+	if (ret < 0)
+		kfree(dev);
+
+	return ret;
 }
 
 int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
diff --git a/virt/
[KVM PATCH v7 2/2] KVM: add iosignalfd support
iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asynchronously and return as quickly as possible. All the synchronous infrastructure to ensure proper data-dependencies are met in the normal IO case is just unnecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and peripheral components.

To test this theory, we built a test-harness called "doorbell". This module has a function called "doorbell_ring()" which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via iosignalfd.

You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:       110011 iops, 9.09us rtt
iosignalfd-mmio: 200100 iops, 5.00us rtt
iosignalfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs (+2.56us for MMIO, and -350ns for HC), and we get:

qemu-pio:      153139 iops, 6.53us rtt
iosignalfd-hc: 412585 iops, 2.37us rtt

These are just for fun, for now, until I can gather more data.

Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace hop.

Signed-off-by: Gregory Haskins
---
 arch/x86/kvm/Kconfig     |    8 +
 arch/x86/kvm/x86.c       |    3 
 include/linux/kvm.h      |   15 ++
 include/linux/kvm_host.h |   10 +
 virt/kvm/Kconfig         |    2 
 virt/kvm/eventfd.c       |  402 ++
 virt/kvm/kvm_main.c      |   11 +
 7 files changed, 447 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 7fbedfd..61b3b24 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -74,6 +74,14 @@ config KVM_TRACE
 	  relayfs. Note the ABI is not considered stable and will be
 	  modified in future updates.
 
+config KVM_MAX_IOSIGNALFD_ITEMS
+	int "Maximum IOSIGNALFD items per address"
+	depends on KVM
+	default "32"
+	---help---
+	  This option influences the maximum number of fd's per PIO/MMIO
+	  address that are allowed to register
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/lguest/Kconfig
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b91ea7..c5fd38b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1121,6 +1121,9 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_MCE:
 		r = KVM_MAX_MCE_BANKS;
 		break;
+	case KVM_CAP_IOSIGNALFD:
+		r = CONFIG_KVM_MAX_IOSIGNALFD_ITEMS;
+		break;
 	default:
 		r = 0;
 		break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 38ff31e..9de6486 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -307,6 +307,19 @@ struct kvm_guest_debug {
 	struct kvm_guest_debug_arch arch;
 };
 
+#define KVM_IOSIGNALFD_FLAG_TRIGGER  (1 << 0)
+#define KVM_IOSIGNALFD_FLAG_PIO      (1 << 1)
+#define KVM_IOSIGNALFD_FLAG_DEASSIGN (1 << 2)
+
+struct kvm_iosignalfd {
+	__u64 trigger;
+	__u64 addr;
+	__u32 len;
+	__u32 fd;
+	__u32 flags;
+	__
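From userspace, registration would presumably look something like the sketch below. The vm-ioctl name KVM_IOSIGNALFD and the exact field usage are assumptions extrapolated from the capability and struct shown above; the part of the patch defining the ioctl is not reproduced here:

#include <sys/ioctl.h>
#include <sys/eventfd.h>
#include <linux/kvm.h>

/* Hypothetical usage sketch: register an eventfd that fires on any
 * 4-byte guest write to an example MMIO address. */
static int register_doorbell(int vm_fd)
{
	int efd = eventfd(0, 0);       /* host-side signal endpoint */
	struct kvm_iosignalfd args = {
		.trigger = 0,          /* unused without FLAG_TRIGGER data-match */
		.addr    = 0xc2000,    /* example address, made up */
		.len     = 4,
		.fd      = efd,
		.flags   = 0,          /* MMIO; set KVM_IOSIGNALFD_FLAG_PIO for PIO */
	};

	return ioctl(vm_fd, KVM_IOSIGNALFD, &args);
}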
Re: [PATCH 0/4] Add rudimentary Hyper-V guest support v3
On 06/15/2009 04:21 PM, Alexander Graf wrote:
> Now that we have nested SVM in place, let's make use of it and
> virtualize something non-kvm. The first interesting target that came
> to my mind here was Hyper-V.
>
> This patchset makes Windows Server 2008 boot with Hyper-V, which runs
> the "dom0" in virtualized mode already. It hangs somewhere in IDE code
> when booted, so I haven't been able to run a second VM within for now
> yet.

Applied all, thanks.

> Please keep in mind that Hyper-V won't work unless you apply the
> userspace patches too

Please rebase/repost those.

> and the PAT bit patch

That's fd2e987d5 unless I'm confusing this with another issue.
Re: [PATCH] Activate Virtualization On Demand v2
On 06/15/2009 02:30 PM, Alexander Graf wrote:
> X86 CPUs need to have some magic happening to enable the virtualization
> extensions on them. This magic can result in unpleasant results for
> users, like blocking other VMMs from working (vmx) or using invalid TLB
> entries (svm).
>
> Currently KVM activates virtualization when the respective kernel
> module is loaded. This blocks us from autoloading KVM modules without
> breaking other VMMs.
>
> To circumvent this problem at least a bit, this patch introduces on
> demand activation of virtualization. This means, that instead
> virtualization is enabled on creation of the first virtual machine
> and disabled on destruction of the last one.
>
> So using this, KVM can be easily autoloaded, while keeping other
> hypervisors usable.
>
> +static int hardware_enable_all(void)
> +{
> +	int r = 0;
> +
> +	spin_lock(&kvm_lock);
> +
> +	kvm_usage_count++;
> +	if (kvm_usage_count == 1) {
> +		atomic_set(&hardware_enable_failed, 1);
> +		on_each_cpu(hardware_enable, NULL, 1);
> +
> +		if (!atomic_dec_and_test(&hardware_enable_failed))
> +			r = -EBUSY;
> +	}

That's a little obfuscated. I suggest atomic_set(..., p) and atomic_read(...).

>  static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
>  			   void *v)
>  {
>  	int cpu = (long)v;
>
> +	if (!kvm_usage_count)
> +		return NOTIFY_OK;
> +
>  	val &= ~CPU_TASKS_FROZEN;
>  	switch (val) {
>  	case CPU_DYING:
> @@ -2513,13 +2571,15 @@ static void kvm_exit_debug(void)
>  static int kvm_suspend(struct sys_device *dev, pm_message_t state)
>  {
> -	hardware_disable(NULL);
> +	if (kvm_usage_count)
> +		hardware_disable(NULL);
>  	return 0;
>  }
>
>  static int kvm_resume(struct sys_device *dev)
>  {
> -	hardware_enable(NULL);
> +	if (kvm_usage_count)
> +		hardware_enable(NULL);
>  	return 0;
>  }

Please tell me you tested suspend/resume with/without VMs and cpu hotunplug/hotplug.
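For comparison, a sketch of the plainer variant Avi is presumably suggesting (set the flag to 0, then read it), assuming hardware_enable() raises hardware_enable_failed (e.g. via atomic_inc()) when it fails on a CPU -- the flag-as-countdown trick becomes a simple set-then-read:

	kvm_usage_count++;
	if (kvm_usage_count == 1) {
		/* cleared here; set by hardware_enable() on any failure */
		atomic_set(&hardware_enable_failed, 0);
		on_each_cpu(hardware_enable, NULL, 1);

		if (atomic_read(&hardware_enable_failed))
			r = -EBUSY;
	}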
Re: [PATCH] Activate Virtualization On Demand v2
On 06/15/2009 03:17 PM, Christoph Hellwig wrote: On Mon, Jun 15, 2009 at 01:30:05PM +0200, Alexander Graf wrote: X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can have unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. That will only become interesting if we ever have such a thing in mainline. So NACK, lots of complication for no good reason. If it were truly lots of complication, I might agree. But it isn't, and we keep getting reports from users about it. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > irqfd and its underlying implementation, eventfd, currently utilize > the embedded wait-queue in eventfd for signal notification. The nice thing > about this design decision is that it re-uses the existing > eventfd/wait-queue code and it generally works wellwith several > limitations. > > One of the limitations is that notification callbacks are always called > inside a spin_lock_irqsave critical section. Another limitation is > that it is very difficult to build a system that can recieve release > notification without being racy. > > Therefore, we introduce a new registration interface that is SRCU based > instead of wait-queue based, and implement the internal wait-queue > infrastructure in terms of this new interface. We then convert irqfd > to use this new interface instead of the existing wait-queue code. > > The end result is that we now have the opportunity to run the interrupt > injection code serially to the callback (when the signal is raised from > process-context, at least) instead of always deferring the injection to a > work-queue. > > Signed-off-by: Gregory Haskins > CC: Paul E. McKenney > CC: Davide Libenzi > --- > > fs/eventfd.c| 115 > +++ > include/linux/eventfd.h | 30 > virt/kvm/eventfd.c | 114 > +-- > 3 files changed, 188 insertions(+), 71 deletions(-) > > diff --git a/fs/eventfd.c b/fs/eventfd.c > index 72f5f8d..505d5de 100644 > --- a/fs/eventfd.c > +++ b/fs/eventfd.c > @@ -30,8 +30,47 @@ struct eventfd_ctx { >*/ > __u64 count; > unsigned int flags; > + struct srcu_struct srcu; > + struct list_head nh; > + struct eventfd_notifier notifier; > }; > > +static void _eventfd_wqh_notify(struct eventfd_notifier *en) > +{ > + struct eventfd_ctx *ctx = container_of(en, > +struct eventfd_ctx, > +notifier); > + > + if (waitqueue_active(&ctx->wqh)) > + wake_up_poll(&ctx->wqh, POLLIN); > +} > + > +static void _eventfd_notify(struct eventfd_ctx *ctx) > +{ > + struct eventfd_notifier *en; > + int idx; > + > + idx = srcu_read_lock(&ctx->srcu); > + > + /* > + * The goal here is to allow the notification to be preemptible > + * as often as possible. We cannot achieve this with the basic > + * wqh mechanism because it requires the wqh->lock. Therefore > + * we have an internal srcu list mechanism of which the wqh is > + * a client. > + * > + * Not all paths will invoke this function in process context. > + * Callers should check for suitable state before assuming they > + * can sleep (such as with preemptible()). Paul McKenney assures > + * me that srcu_read_lock is compatible with in-atomic, as long as > + * the code within the critical section is also compatible. > + */ > + list_for_each_entry_rcu(en, &ctx->nh, list) > + en->ops->signal(en); > + > + srcu_read_unlock(&ctx->srcu, idx); > +} > + > /* > * Adds "n" to the eventfd counter "count". Returns "n" in case of > * success, or a value lower then "n" in case of coutner overflow. This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. Further, to do useful things it might not be enough that you can sleep: with iofd you also want to access current task with e.g. copy from user. Here's an idea: let's pass a flag to ->signal, along the lines of signal_is_task, that tells us that it is safe to use current, and add eventfd_signal_task() which is the same as eventfd_signal but lets everyone know that it's safe to both sleep and use current->mm. Makes sense? 
-- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
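To make the new interface concrete, a sketch of a hypothetical client. struct eventfd_notifier and the ->signal() op come from the patch above; the ops struct name and the registration step are assumptions inferred from the en->ops->signal() call:

/* Hypothetical client of the RFC's SRCU-based notifier interface. */
struct my_consumer {
	struct eventfd_notifier notifier;
};

static void my_signal(struct eventfd_notifier *en)
{
	struct my_consumer *c = container_of(en, struct my_consumer, notifier);

	/* Runs under srcu_read_lock(); whether it may sleep depends on
	 * the signaler's context -- the open question in this thread. */
	(void)c;
}

static const struct eventfd_notifier_ops my_ops = {
	.signal = my_signal,	/* ops struct name assumed from en->ops->signal() */
};

A registration call would then hook my_consumer.notifier into ctx->nh, just as the patch wires the internal wqh client into the same list.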
Re: [PATCH] Activate Virtualization On Demand v2
Avi Kivity wrote: > On 06/15/2009 02:30 PM, Alexander Graf wrote: >> X86 CPUs need to have some magic happening to enable the virtualization >> extensions on them. This magic can result in unpleasant results for >> users, like blocking other VMMs from working (vmx) or using invalid TLB >> entries (svm). >> >> Currently KVM activates virtualization when the respective kernel module >> is loaded. This blocks us from autoloading KVM modules without breaking >> other VMMs. >> >> To circumvent this problem at least a bit, this patch introduces on >> demand activation of virtualization. This means, that instead >> virtualization is enabled on creation of the first virtual machine >> and disabled on destruction of the last one. >> >> So using this, KVM can be easily autoloaded, while keeping other >> hypervisors usable. >> >> +static int hardware_enable_all(void) >> +{ >> +int r = 0; >> + >> +spin_lock(&kvm_lock); >> + >> +kvm_usage_count++; >> +if (kvm_usage_count == 1) { >> +atomic_set(&hardware_enable_failed, 1); >> +on_each_cpu(hardware_enable, NULL, 1); >> + >> +if (!atomic_dec_and_test(&hardware_enable_failed)) >> +r = -EBUSY; >> +} >> > > That's a little obfuscated. I suggest atomic_set(..., p) and > atomic_read(...). Ah, I was more searching for an atomic_test :-). >> static int kvm_cpu_hotplug(struct notifier_block *notifier, >> unsigned long val, >> void *v) >> { >> int cpu = (long)v; >> >> +if (!kvm_usage_count) >> +return NOTIFY_OK; >> + >> val&= ~CPU_TASKS_FROZEN; >> switch (val) { >> case CPU_DYING: >> @@ -2513,13 +2571,15 @@ static void kvm_exit_debug(void) >> >> static int kvm_suspend(struct sys_device *dev, pm_message_t state) >> { >> -hardware_disable(NULL); >> +if (kvm_usage_count) >> +hardware_disable(NULL); >> return 0; >> } >> >> static int kvm_resume(struct sys_device *dev) >> { >> -hardware_enable(NULL); >> +if (kvm_usage_count) >> +hardware_enable(NULL); >> return 0; >> } >> >> > + > > Please tell me you tested suspend/resume with/without VMs and cpu > hotunplug/hotplug. I tested cpu hotplugging. On the last round I tested suspend/resume, but this time I couldn't because my machine can't do suspend :-(. So I'll try hard and find a machine I can test it on for the next round. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > >> irqfd and its underlying implementation, eventfd, currently utilize >> the embedded wait-queue in eventfd for signal notification. The nice thing >> about this design decision is that it re-uses the existing >> eventfd/wait-queue code and it generally works wellwith several >> limitations. >> >> One of the limitations is that notification callbacks are always called >> inside a spin_lock_irqsave critical section. Another limitation is >> that it is very difficult to build a system that can recieve release >> notification without being racy. >> >> Therefore, we introduce a new registration interface that is SRCU based >> instead of wait-queue based, and implement the internal wait-queue >> infrastructure in terms of this new interface. We then convert irqfd >> to use this new interface instead of the existing wait-queue code. >> >> The end result is that we now have the opportunity to run the interrupt >> injection code serially to the callback (when the signal is raised from >> process-context, at least) instead of always deferring the injection to a >> work-queue. >> >> Signed-off-by: Gregory Haskins >> CC: Paul E. McKenney >> CC: Davide Libenzi >> --- >> >> fs/eventfd.c| 115 >> +++ >> include/linux/eventfd.h | 30 >> virt/kvm/eventfd.c | 114 >> +-- >> 3 files changed, 188 insertions(+), 71 deletions(-) >> >> diff --git a/fs/eventfd.c b/fs/eventfd.c >> index 72f5f8d..505d5de 100644 >> --- a/fs/eventfd.c >> +++ b/fs/eventfd.c >> @@ -30,8 +30,47 @@ struct eventfd_ctx { >> */ >> __u64 count; >> unsigned int flags; >> +struct srcu_struct srcu; >> +struct list_head nh; >> +struct eventfd_notifier notifier; >> }; >> >> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) >> +{ >> +struct eventfd_ctx *ctx = container_of(en, >> + struct eventfd_ctx, >> + notifier); >> + >> +if (waitqueue_active(&ctx->wqh)) >> +wake_up_poll(&ctx->wqh, POLLIN); >> +} >> + >> +static void _eventfd_notify(struct eventfd_ctx *ctx) >> +{ >> +struct eventfd_notifier *en; >> +int idx; >> + >> +idx = srcu_read_lock(&ctx->srcu); >> + >> +/* >> + * The goal here is to allow the notification to be preemptible >> + * as often as possible. We cannot achieve this with the basic >> + * wqh mechanism because it requires the wqh->lock. Therefore >> + * we have an internal srcu list mechanism of which the wqh is >> + * a client. >> + * >> + * Not all paths will invoke this function in process context. >> + * Callers should check for suitable state before assuming they >> + * can sleep (such as with preemptible()). Paul McKenney assures >> + * me that srcu_read_lock is compatible with in-atomic, as long as >> + * the code within the critical section is also compatible. >> + */ >> +list_for_each_entry_rcu(en, &ctx->nh, list) >> +en->ops->signal(en); >> + >> +srcu_read_unlock(&ctx->srcu, idx); >> +} >> + >> /* >> * Adds "n" to the eventfd counter "count". Returns "n" in case of >> * success, or a value lower then "n" in case of coutner overflow. >> > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. > > Further, to do useful things it might not be enough that you can sleep: > with iofd you also want to access current task with e.g. copy from user. 
> > Here's an idea: let's pass a flag to ->signal, along the lines of > signal_is_task, that tells us that it is safe to use current, and add > eventfd_signal_task() which is the same as eventfd_signal but lets everyone > know that it's safe to both sleep and use current->mm. > > Makes sense? > It does make sense, yes. What I am not clear on is how eventfd would detect this state so as to populate such flags, and why the ->signal() CB can't do the same. Thanks Michael, -Greg
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > >> irqfd and its underlying implementation, eventfd, currently utilize >> the embedded wait-queue in eventfd for signal notification. The nice thing >> about this design decision is that it re-uses the existing >> eventfd/wait-queue code and it generally works wellwith several >> limitations. >> >> One of the limitations is that notification callbacks are always called >> inside a spin_lock_irqsave critical section. Another limitation is >> that it is very difficult to build a system that can recieve release >> notification without being racy. >> >> Therefore, we introduce a new registration interface that is SRCU based >> instead of wait-queue based, and implement the internal wait-queue >> infrastructure in terms of this new interface. We then convert irqfd >> to use this new interface instead of the existing wait-queue code. >> >> The end result is that we now have the opportunity to run the interrupt >> injection code serially to the callback (when the signal is raised from >> process-context, at least) instead of always deferring the injection to a >> work-queue. >> >> Signed-off-by: Gregory Haskins >> CC: Paul E. McKenney >> CC: Davide Libenzi >> --- >> >> fs/eventfd.c| 115 >> +++ >> include/linux/eventfd.h | 30 >> virt/kvm/eventfd.c | 114 >> +-- >> 3 files changed, 188 insertions(+), 71 deletions(-) >> >> diff --git a/fs/eventfd.c b/fs/eventfd.c >> index 72f5f8d..505d5de 100644 >> --- a/fs/eventfd.c >> +++ b/fs/eventfd.c >> @@ -30,8 +30,47 @@ struct eventfd_ctx { >> */ >> __u64 count; >> unsigned int flags; >> +struct srcu_struct srcu; >> +struct list_head nh; >> +struct eventfd_notifier notifier; >> }; >> >> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) >> +{ >> +struct eventfd_ctx *ctx = container_of(en, >> + struct eventfd_ctx, >> + notifier); >> + >> +if (waitqueue_active(&ctx->wqh)) >> +wake_up_poll(&ctx->wqh, POLLIN); >> +} >> + >> +static void _eventfd_notify(struct eventfd_ctx *ctx) >> +{ >> +struct eventfd_notifier *en; >> +int idx; >> + >> +idx = srcu_read_lock(&ctx->srcu); >> + >> +/* >> + * The goal here is to allow the notification to be preemptible >> + * as often as possible. We cannot achieve this with the basic >> + * wqh mechanism because it requires the wqh->lock. Therefore >> + * we have an internal srcu list mechanism of which the wqh is >> + * a client. >> + * >> + * Not all paths will invoke this function in process context. >> + * Callers should check for suitable state before assuming they >> + * can sleep (such as with preemptible()). Paul McKenney assures >> + * me that srcu_read_lock is compatible with in-atomic, as long as >> + * the code within the critical section is also compatible. >> + */ >> +list_for_each_entry_rcu(en, &ctx->nh, list) >> +en->ops->signal(en); >> + >> +srcu_read_unlock(&ctx->srcu, idx); >> +} >> + >> /* >> * Adds "n" to the eventfd counter "count". Returns "n" in case of >> * success, or a value lower then "n" in case of coutner overflow. >> > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. > As an aside, this is something I would like to address. I keep running into this pattern where I could do something in-line if I had a "can_sleep()" context. Otherwise, I have to punt to something like a workqueue which adds latency. The closest thing I have to "can_sleep()" is preemptible(), which, as you correctly pointed out is limited to only working with CONFIG_PREEMPT=y. 
It's been a while since I looked into it, but one of the barriers that would need to be overcome is the fact that the preempt_count stuff gets compiled away with CONFIG_PREEMPT=n. It is conceivable that we could make the preempt_count logic an independent config variable from CONFIG_PREEMPT to provide a can_sleep() macro without requiring full-blown preemption to be enabled. So my questions would be as follows: a) Would the community be receptive to such an idea? b) Are there other things to consider/fix besides the lack of preempt_count in order to implement such a beast? -Greg
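For concreteness, a rough sketch of the can_sleep() macro being discussed — purely hypothetical, since with CONFIG_PREEMPT=n the preempt_count bookkeeping it relies on is compiled away today, which is exactly the barrier described above:

/* Hypothetical: only meaningful if preempt_count were maintained
 * even with CONFIG_PREEMPT=n, as proposed above. */
#define can_sleep()	(!in_atomic() && !irqs_disabled())

A caller such as irqfd could then inject inline whenever can_sleep() holds and fall back to deferred (workqueue) execution otherwise.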
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
[Adding Ingo] Gregory Haskins wrote: > Michael S. Tsirkin wrote: > >> On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: >> >> >>> irqfd and its underlying implementation, eventfd, currently utilize >>> the embedded wait-queue in eventfd for signal notification. The nice thing >>> about this design decision is that it re-uses the existing >>> eventfd/wait-queue code and it generally works wellwith several >>> limitations. >>> >>> One of the limitations is that notification callbacks are always called >>> inside a spin_lock_irqsave critical section. Another limitation is >>> that it is very difficult to build a system that can recieve release >>> notification without being racy. >>> >>> Therefore, we introduce a new registration interface that is SRCU based >>> instead of wait-queue based, and implement the internal wait-queue >>> infrastructure in terms of this new interface. We then convert irqfd >>> to use this new interface instead of the existing wait-queue code. >>> >>> The end result is that we now have the opportunity to run the interrupt >>> injection code serially to the callback (when the signal is raised from >>> process-context, at least) instead of always deferring the injection to a >>> work-queue. >>> >>> Signed-off-by: Gregory Haskins >>> CC: Paul E. McKenney >>> CC: Davide Libenzi >>> --- >>> >>> fs/eventfd.c| 115 >>> +++ >>> include/linux/eventfd.h | 30 >>> virt/kvm/eventfd.c | 114 >>> +-- >>> 3 files changed, 188 insertions(+), 71 deletions(-) >>> >>> diff --git a/fs/eventfd.c b/fs/eventfd.c >>> index 72f5f8d..505d5de 100644 >>> --- a/fs/eventfd.c >>> +++ b/fs/eventfd.c >>> @@ -30,8 +30,47 @@ struct eventfd_ctx { >>> */ >>> __u64 count; >>> unsigned int flags; >>> + struct srcu_struct srcu; >>> + struct list_head nh; >>> + struct eventfd_notifier notifier; >>> }; >>> >>> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) >>> +{ >>> + struct eventfd_ctx *ctx = container_of(en, >>> + struct eventfd_ctx, >>> + notifier); >>> + >>> + if (waitqueue_active(&ctx->wqh)) >>> + wake_up_poll(&ctx->wqh, POLLIN); >>> +} >>> + >>> +static void _eventfd_notify(struct eventfd_ctx *ctx) >>> +{ >>> + struct eventfd_notifier *en; >>> + int idx; >>> + >>> + idx = srcu_read_lock(&ctx->srcu); >>> + >>> + /* >>> +* The goal here is to allow the notification to be preemptible >>> +* as often as possible. We cannot achieve this with the basic >>> +* wqh mechanism because it requires the wqh->lock. Therefore >>> +* we have an internal srcu list mechanism of which the wqh is >>> +* a client. >>> +* >>> +* Not all paths will invoke this function in process context. >>> +* Callers should check for suitable state before assuming they >>> +* can sleep (such as with preemptible()). Paul McKenney assures >>> +* me that srcu_read_lock is compatible with in-atomic, as long as >>> +* the code within the critical section is also compatible. >>> +*/ >>> + list_for_each_entry_rcu(en, &ctx->nh, list) >>> + en->ops->signal(en); >>> + >>> + srcu_read_unlock(&ctx->srcu, idx); >>> +} >>> + >>> /* >>> * Adds "n" to the eventfd counter "count". Returns "n" in case of >>> * success, or a value lower then "n" in case of coutner overflow. >>> >>> >> This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. >> >> > > As an aside, this is something I would like to address. I keep running > into this pattern where I could do something in-line if I had a > "can_sleep()" context. Otherwise, I have to punt to something like a > workqueue which adds latency. 
The closest thing I have to "can_sleep()" > is preemptible(), which, as you correctly pointed out is limited to only > working with CONFIG_PREEMPT=y. > > Its been a while since I looked into it, but one of the barriers that > would need to be overcome is the fact that the preempt_count stuff gets > compiled away with CONFIG_PREEMPT=n. It is conceivable that we could > make the preempt_count logic an independent config variable from > CONFIG_PREEMPT to provide a can_sleep() macro without requiring > full-blow preemption to be enabled. So my questions would be as follows: > > a) Is the community conducive to such an idea? > b) Are there other things to consider/fix besides the lack of > preempt_count in order to implement such a beast? > > -Greg > > > Hi Ingo, Any thoughts here? -Greg
Re: strange guest slowness after some time
Felix Leimbach wrote: It's exactly the same CPU I have. Interesting: for the past two months I've been running on 2 Shanghai quad-cores instead, and the problem is definitely gone. The rest of the hardware as well as the whole software stack remained unchanged. That should confirm what we assumed already. For me, it turned out that the KVM I was running (shipped with Proxmox VE) had a "fairsched" patch (OpenVZ-related) which caused this broken behaviour. -- Tomasz Chmielewski http://wpkg.org -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM: init bsp_vcpu before kvm_arch_vcpu_init
On x86 mp_state is initialized by kvm_arch_vcpu_init. Right now kvm_vcpu_is_bsp returns false because kvm->bsp_vcpu has not been initialized, so vcpu_id == 0 ends up with mp_state == KVM_MP_STATE_UNINITIALIZED. Gleb, do you see a better way to fix this?

Signed-off-by: Marcelo Tosatti

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8939ffa..7225064 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -773,6 +773,13 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	struct page *page;
 	int r;
 
+	mutex_lock(&kvm->lock);
+#ifdef CONFIG_KVM_APIC_ARCHITECTURE
+	if (kvm->bsp_vcpu_id == id)
+		kvm->bsp_vcpu = vcpu;
+#endif
+	mutex_unlock(&kvm->lock);
+
 	mutex_init(&vcpu->mutex);
 	vcpu->cpu = -1;
 	vcpu->kvm = kvm;
@@ -1760,14 +1767,12 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	smp_wmb();
 	atomic_inc(&kvm->online_vcpus);
 
-#ifdef CONFIG_KVM_APIC_ARCHITECTURE
-	if (kvm->bsp_vcpu_id == id)
-		kvm->bsp_vcpu = vcpu;
-#endif
 	mutex_unlock(&kvm->lock);
 	return r;
 
 vcpu_destroy:
+	if (kvm->bsp_vcpu_id == id)
+		kvm->bsp_vcpu = NULL;
 	mutex_unlock(&kvm->lock);
 	kvm_arch_vcpu_destroy(vcpu);
 	return r;

-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
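For illustration, the ordering problem in miniature; this assumes, as the changelog implies, that kvm_vcpu_is_bsp() tests kvm->bsp_vcpu, and the helper below is a reconstruction rather than the exact tree:

static inline int kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
{
	/* Reconstruction for illustration; bsp_vcpu is NULL until assigned. */
	return vcpu->kvm->bsp_vcpu == vcpu;
}

Before the patch, kvm_arch_vcpu_init() ran while kvm->bsp_vcpu was still NULL, so even vcpu_id 0 failed the BSP check and was left in KVM_MP_STATE_UNINITIALIZED; moving the assignment into kvm_vcpu_init() makes the check succeed in time.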
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 10:11:08AM -0400, Gregory Haskins wrote: > Michael S. Tsirkin wrote: > > On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > > > >> irqfd and its underlying implementation, eventfd, currently utilize > >> the embedded wait-queue in eventfd for signal notification. The nice thing > >> about this design decision is that it re-uses the existing > >> eventfd/wait-queue code and it generally works wellwith several > >> limitations. > >> > >> One of the limitations is that notification callbacks are always called > >> inside a spin_lock_irqsave critical section. Another limitation is > >> that it is very difficult to build a system that can recieve release > >> notification without being racy. > >> > >> Therefore, we introduce a new registration interface that is SRCU based > >> instead of wait-queue based, and implement the internal wait-queue > >> infrastructure in terms of this new interface. We then convert irqfd > >> to use this new interface instead of the existing wait-queue code. > >> > >> The end result is that we now have the opportunity to run the interrupt > >> injection code serially to the callback (when the signal is raised from > >> process-context, at least) instead of always deferring the injection to a > >> work-queue. > >> > >> Signed-off-by: Gregory Haskins > >> CC: Paul E. McKenney > >> CC: Davide Libenzi > >> --- > >> > >> fs/eventfd.c| 115 > >> +++ > >> include/linux/eventfd.h | 30 > >> virt/kvm/eventfd.c | 114 > >> +-- > >> 3 files changed, 188 insertions(+), 71 deletions(-) > >> > >> diff --git a/fs/eventfd.c b/fs/eventfd.c > >> index 72f5f8d..505d5de 100644 > >> --- a/fs/eventfd.c > >> +++ b/fs/eventfd.c > >> @@ -30,8 +30,47 @@ struct eventfd_ctx { > >> */ > >>__u64 count; > >>unsigned int flags; > >> + struct srcu_struct srcu; > >> + struct list_head nh; > >> + struct eventfd_notifier notifier; > >> }; > >> > >> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) > >> +{ > >> + struct eventfd_ctx *ctx = container_of(en, > >> + struct eventfd_ctx, > >> + notifier); > >> + > >> + if (waitqueue_active(&ctx->wqh)) > >> + wake_up_poll(&ctx->wqh, POLLIN); > >> +} > >> + > >> +static void _eventfd_notify(struct eventfd_ctx *ctx) > >> +{ > >> + struct eventfd_notifier *en; > >> + int idx; > >> + > >> + idx = srcu_read_lock(&ctx->srcu); > >> + > >> + /* > >> + * The goal here is to allow the notification to be preemptible > >> + * as often as possible. We cannot achieve this with the basic > >> + * wqh mechanism because it requires the wqh->lock. Therefore > >> + * we have an internal srcu list mechanism of which the wqh is > >> + * a client. > >> + * > >> + * Not all paths will invoke this function in process context. > >> + * Callers should check for suitable state before assuming they > >> + * can sleep (such as with preemptible()). Paul McKenney assures > >> + * me that srcu_read_lock is compatible with in-atomic, as long as > >> + * the code within the critical section is also compatible. > >> + */ > >> + list_for_each_entry_rcu(en, &ctx->nh, list) > >> + en->ops->signal(en); > >> + > >> + srcu_read_unlock(&ctx->srcu, idx); > >> +} > >> + > >> /* > >> * Adds "n" to the eventfd counter "count". Returns "n" in case of > >> * success, or a value lower then "n" in case of coutner overflow. > >> > > > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > > false. > > > > Further, to do useful things it might not be enough that you can sleep: > > with iofd you also want to access current task with e.g. 
copy from user. > > > > Here's an idea: let's pass a flag to ->signal, along the lines of > > signal_is_task, that tells us that it is safe to use current, and add > > eventfd_signal_task() which is the same as eventfd_signal but lets everyone > > know that it's safe to both sleep and use current->mm. > > > > Makes sense? > > > > It does make sense, yes. What I am not clear on is how would eventfd > detect this state such as to populate such flags, and why cant the > ->signal() CB do the same? > > Thanks Michael, > -Greg > eventfd can't detect this state. But the callers know in what context they are. So the *caller* of eventfd_signal_task makes sure of this: if you are in a task, you can call eventfd_signal_task() if not, you must call eventfd_signal. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
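Sketched out, the proposed split might look like the following; eventfd_signal() exists today, while __eventfd_signal() and eventfd_signal_task() are hypothetical names for the interface Michael describes, with approximate signatures:

/* Hypothetical plumbing; __eventfd_signal() is an invented helper. */
int __eventfd_signal(struct eventfd_ctx *ctx, int n, bool signal_is_task);

/* Callable from any context; consumers must not sleep or touch current. */
static inline int eventfd_signal(struct eventfd_ctx *ctx, int n)
{
	return __eventfd_signal(ctx, n, false);
}

/* Only from task (syscall/vmexit) context; consumers may sleep and may
 * use current->mm, e.g. for copy_from_user(). */
static inline int eventfd_signal_task(struct eventfd_ctx *ctx, int n)
{
	return __eventfd_signal(ctx, n, true);
}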
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > >> irqfd and its underlying implementation, eventfd, currently utilize >> the embedded wait-queue in eventfd for signal notification. The nice thing >> about this design decision is that it re-uses the existing >> eventfd/wait-queue code and it generally works wellwith several >> limitations. >> >> One of the limitations is that notification callbacks are always called >> inside a spin_lock_irqsave critical section. Another limitation is >> that it is very difficult to build a system that can recieve release >> notification without being racy. >> >> Therefore, we introduce a new registration interface that is SRCU based >> instead of wait-queue based, and implement the internal wait-queue >> infrastructure in terms of this new interface. We then convert irqfd >> to use this new interface instead of the existing wait-queue code. >> >> The end result is that we now have the opportunity to run the interrupt >> injection code serially to the callback (when the signal is raised from >> process-context, at least) instead of always deferring the injection to a >> work-queue. >> >> Signed-off-by: Gregory Haskins >> CC: Paul E. McKenney >> CC: Davide Libenzi >> --- >> >> fs/eventfd.c| 115 >> +++ >> include/linux/eventfd.h | 30 >> virt/kvm/eventfd.c | 114 >> +-- >> 3 files changed, 188 insertions(+), 71 deletions(-) >> >> diff --git a/fs/eventfd.c b/fs/eventfd.c >> index 72f5f8d..505d5de 100644 >> --- a/fs/eventfd.c >> +++ b/fs/eventfd.c >> @@ -30,8 +30,47 @@ struct eventfd_ctx { >> */ >> __u64 count; >> unsigned int flags; >> +struct srcu_struct srcu; >> +struct list_head nh; >> +struct eventfd_notifier notifier; >> }; >> >> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) >> +{ >> +struct eventfd_ctx *ctx = container_of(en, >> + struct eventfd_ctx, >> + notifier); >> + >> +if (waitqueue_active(&ctx->wqh)) >> +wake_up_poll(&ctx->wqh, POLLIN); >> +} >> + >> +static void _eventfd_notify(struct eventfd_ctx *ctx) >> +{ >> +struct eventfd_notifier *en; >> +int idx; >> + >> +idx = srcu_read_lock(&ctx->srcu); >> + >> +/* >> + * The goal here is to allow the notification to be preemptible >> + * as often as possible. We cannot achieve this with the basic >> + * wqh mechanism because it requires the wqh->lock. Therefore >> + * we have an internal srcu list mechanism of which the wqh is >> + * a client. >> + * >> + * Not all paths will invoke this function in process context. >> + * Callers should check for suitable state before assuming they >> + * can sleep (such as with preemptible()). Paul McKenney assures >> + * me that srcu_read_lock is compatible with in-atomic, as long as >> + * the code within the critical section is also compatible. >> + */ >> +list_for_each_entry_rcu(en, &ctx->nh, list) >> +en->ops->signal(en); >> + >> +srcu_read_unlock(&ctx->srcu, idx); >> +} >> + >> /* >> * Adds "n" to the eventfd counter "count". Returns "n" in case of >> * success, or a value lower then "n" in case of coutner overflow. >> > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. > > Further, to do useful things it might not be enough that you can sleep: > with iofd you also want to access current task with e.g. copy from user. > Something else to consider: For iosignalfd, we can assume we will always be called from vcpu process context so we might not really need official affirmation from the system. 
For irqfd, we cannot predict who may be injecting the interrupt (for instance, it might be a PCI-passthrough hard-irq context). I am not sure if this knowledge actually helps to simplify the problem space, but I thought I should mention it. -Greg
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 10:40:55AM -0400, Gregory Haskins wrote: > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > > false. > > > > Further, to do useful things it might not be enough that you can sleep: > > with iofd you also want to access current task with e.g. copy from user. > > > > Something else to consider: For iosignalfd, we can assume we will > always be called from vcpu process context so we might not really need > official affirmation from the system. For irqfd, we cannot predict who > may be injecting the interrupt (for instance, it might be a > PCI-passthrough hard-irq context). I am not sure if this knowledge > actually helps to simplify the problem space, but I thought I should > mention it. > > -Greg > >

The way this is addressed with the eventfd_signal_task proposal is:
- user calls eventfd_signal_task: we look at current->mm and figure out whether this is the right context or we need a context switch
- everyone else calls eventfd_signal: we know that we need a context switch

-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Tue, Jun 16, 2009 at 10:11:08AM -0400, Gregory Haskins wrote: > >> Michael S. Tsirkin wrote: >> >>> On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: >>> >>> irqfd and its underlying implementation, eventfd, currently utilize the embedded wait-queue in eventfd for signal notification. The nice thing about this design decision is that it re-uses the existing eventfd/wait-queue code and it generally works wellwith several limitations. One of the limitations is that notification callbacks are always called inside a spin_lock_irqsave critical section. Another limitation is that it is very difficult to build a system that can recieve release notification without being racy. Therefore, we introduce a new registration interface that is SRCU based instead of wait-queue based, and implement the internal wait-queue infrastructure in terms of this new interface. We then convert irqfd to use this new interface instead of the existing wait-queue code. The end result is that we now have the opportunity to run the interrupt injection code serially to the callback (when the signal is raised from process-context, at least) instead of always deferring the injection to a work-queue. Signed-off-by: Gregory Haskins CC: Paul E. McKenney CC: Davide Libenzi --- fs/eventfd.c| 115 +++ include/linux/eventfd.h | 30 virt/kvm/eventfd.c | 114 +-- 3 files changed, 188 insertions(+), 71 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index 72f5f8d..505d5de 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -30,8 +30,47 @@ struct eventfd_ctx { */ __u64 count; unsigned int flags; + struct srcu_struct srcu; + struct list_head nh; + struct eventfd_notifier notifier; }; +static void _eventfd_wqh_notify(struct eventfd_notifier *en) +{ + struct eventfd_ctx *ctx = container_of(en, + struct eventfd_ctx, + notifier); + + if (waitqueue_active(&ctx->wqh)) + wake_up_poll(&ctx->wqh, POLLIN); +} + +static void _eventfd_notify(struct eventfd_ctx *ctx) +{ + struct eventfd_notifier *en; + int idx; + + idx = srcu_read_lock(&ctx->srcu); + + /* + * The goal here is to allow the notification to be preemptible + * as often as possible. We cannot achieve this with the basic + * wqh mechanism because it requires the wqh->lock. Therefore + * we have an internal srcu list mechanism of which the wqh is + * a client. + * + * Not all paths will invoke this function in process context. + * Callers should check for suitable state before assuming they + * can sleep (such as with preemptible()). Paul McKenney assures + * me that srcu_read_lock is compatible with in-atomic, as long as + * the code within the critical section is also compatible. + */ + list_for_each_entry_rcu(en, &ctx->nh, list) + en->ops->signal(en); + + srcu_read_unlock(&ctx->srcu, idx); +} + /* * Adds "n" to the eventfd counter "count". Returns "n" in case of * success, or a value lower then "n" in case of coutner overflow. >>> This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always >>> false. >>> >>> Further, to do useful things it might not be enough that you can sleep: >>> with iofd you also want to access current task with e.g. copy from user. >>> >>> Here's an idea: let's pass a flag to ->signal, along the lines of >>> signal_is_task, that tells us that it is safe to use current, and add >>> eventfd_signal_task() which is the same as eventfd_signal but lets everyone >>> know that it's safe to both sleep and use current->mm. >>> >>> Makes sense? >>> >>> >> It does make sense, yes. 
What I am not clear on is how would eventfd >> detect this state such as to populate such flags, and why cant the >> ->signal() CB do the same? >> >> Thanks Michael, >> -Greg >> >> > > eventfd can't detect this state. But the callers know in what context they > are. > So the *caller* of eventfd_signal_task makes sure of this: if you are in a > task, > you can call eventfd_signal_task() if not, you must call eventfd_signal. > > > Hmm, this is an interesting idea, but I think it would be problematic in real-world applications for the long-term. For instance, the -rt tree and irq-threads .config option in the process of merging into mainline changes context types for established code. Therefore,
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Gregory Haskins wrote: > Michael S. Tsirkin wrote: > >> On Tue, Jun 16, 2009 at 10:11:08AM -0400, Gregory Haskins wrote: >> >> >>> Michael S. Tsirkin wrote: >>> >>> On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > irqfd and its underlying implementation, eventfd, currently utilize > the embedded wait-queue in eventfd for signal notification. The nice > thing > about this design decision is that it re-uses the existing > eventfd/wait-queue code and it generally works wellwith several > limitations. > > One of the limitations is that notification callbacks are always called > inside a spin_lock_irqsave critical section. Another limitation is > that it is very difficult to build a system that can recieve release > notification without being racy. > > Therefore, we introduce a new registration interface that is SRCU based > instead of wait-queue based, and implement the internal wait-queue > infrastructure in terms of this new interface. We then convert irqfd > to use this new interface instead of the existing wait-queue code. > > The end result is that we now have the opportunity to run the interrupt > injection code serially to the callback (when the signal is raised from > process-context, at least) instead of always deferring the injection to a > work-queue. > > Signed-off-by: Gregory Haskins > CC: Paul E. McKenney > CC: Davide Libenzi > --- > > fs/eventfd.c| 115 > +++ > include/linux/eventfd.h | 30 > virt/kvm/eventfd.c | 114 > +-- > 3 files changed, 188 insertions(+), 71 deletions(-) > > diff --git a/fs/eventfd.c b/fs/eventfd.c > index 72f5f8d..505d5de 100644 > --- a/fs/eventfd.c > +++ b/fs/eventfd.c > @@ -30,8 +30,47 @@ struct eventfd_ctx { >*/ > __u64 count; > unsigned int flags; > + struct srcu_struct srcu; > + struct list_head nh; > + struct eventfd_notifier notifier; > }; > > +static void _eventfd_wqh_notify(struct eventfd_notifier *en) > +{ > + struct eventfd_ctx *ctx = container_of(en, > +struct eventfd_ctx, > +notifier); > + > + if (waitqueue_active(&ctx->wqh)) > + wake_up_poll(&ctx->wqh, POLLIN); > +} > + > +static void _eventfd_notify(struct eventfd_ctx *ctx) > +{ > + struct eventfd_notifier *en; > + int idx; > + > + idx = srcu_read_lock(&ctx->srcu); > + > + /* > + * The goal here is to allow the notification to be preemptible > + * as often as possible. We cannot achieve this with the basic > + * wqh mechanism because it requires the wqh->lock. Therefore > + * we have an internal srcu list mechanism of which the wqh is > + * a client. > + * > + * Not all paths will invoke this function in process context. > + * Callers should check for suitable state before assuming they > + * can sleep (such as with preemptible()). Paul McKenney assures > + * me that srcu_read_lock is compatible with in-atomic, as long as > + * the code within the critical section is also compatible. > + */ > + list_for_each_entry_rcu(en, &ctx->nh, list) > + en->ops->signal(en); > + > + srcu_read_unlock(&ctx->srcu, idx); > +} > + > /* > * Adds "n" to the eventfd counter "count". Returns "n" in case of > * success, or a value lower then "n" in case of coutner overflow. > > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. Further, to do useful things it might not be enough that you can sleep: with iofd you also want to access current task with e.g. copy from user. 
Here's an idea: let's pass a flag to ->signal, along the lines of signal_is_task, that tells us that it is safe to use current, and add eventfd_signal_task() which is the same as eventfd_signal but lets everyone know that it's safe to both sleep and use current->mm. Makes sense? >>> It does make sense, yes. What I am not clear on is how would eventfd >>> detect this state such as to populate such flags, and why cant the >>> ->signal() CB do the same? >>> >>> Thanks Michael, >>> -Greg >>> >>> >>> >> eventfd can't detect this state. But the callers know in what context they >> are. >> So the *caller* of eventfd_signal_task makes sure of this: if you are in a >> task, >> you can call eventfd_signal_task() if not, you must call eventfd_signal. >> >> >> >> > Hmm, this is an interesting idea, but I think it would be
Re: [PATCH] remove ppc functions from callbacks
On Tue, 2009-06-16 at 09:34 -0400, Glauber Costa wrote: > handle_powerpc_dcr_read() and handle_powerpc_dcr_write() are two > powerpc specific functions that are called via libkvm callbacks. > However, grepping the source code finds simply no use of them. This > is probably due to the fact that powerpc now relies on a totally > upstream qemu implementation of kvm, and does not need them anymore. > > Anyway, I'm providing this patch separately, so that if it breaks > for whatever reason, we can identify a bisection point easily. > > Signed-off-by: Glauber Costa > CC: Hollis Blanchard Yup, this path is handled via kvm_arch_handle_exit() now. Thanks. Acked-by: Hollis Blanchard -- Hollis Blanchard IBM Linux Technology Center -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 10:48:27AM -0400, Gregory Haskins wrote: > +static void _eventfd_notify(struct eventfd_ctx *ctx) > +{ > +struct eventfd_notifier *en; > +int idx; > + > +idx = srcu_read_lock(&ctx->srcu); > + > +/* > + * The goal here is to allow the notification to be preemptible > + * as often as possible. We cannot achieve this with the basic > + * wqh mechanism because it requires the wqh->lock. Therefore > + * we have an internal srcu list mechanism of which the wqh is > + * a client. > + * > + * Not all paths will invoke this function in process context. > + * Callers should check for suitable state before assuming they > + * can sleep (such as with preemptible()). Paul McKenney > assures > + * me that srcu_read_lock is compatible with in-atomic, as long > as > + * the code within the critical section is also compatible. > + */ > +list_for_each_entry_rcu(en, &ctx->nh, list) > +en->ops->signal(en); > + > +srcu_read_unlock(&ctx->srcu, idx); > +} > + > /* > * Adds "n" to the eventfd counter "count". Returns "n" in case of > * success, or a value lower then "n" in case of coutner overflow. > > > >>> This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > >>> false. > >>> > >>> Further, to do useful things it might not be enough that you can sleep: > >>> with iofd you also want to access current task with e.g. copy from user. > >>> > >>> Here's an idea: let's pass a flag to ->signal, along the lines of > >>> signal_is_task, that tells us that it is safe to use current, and add > >>> eventfd_signal_task() which is the same as eventfd_signal but lets > >>> everyone > >>> know that it's safe to both sleep and use current->mm. > >>> > >>> Makes sense? > >>> > >>> > >> It does make sense, yes. What I am not clear on is how would eventfd > >> detect this state such as to populate such flags, and why cant the > >> ->signal() CB do the same? > >> > >> Thanks Michael, > >> -Greg > >> > >> > > > > eventfd can't detect this state. But the callers know in what context they > > are. > > So the *caller* of eventfd_signal_task makes sure of this: if you are in a > > task, > > you can call eventfd_signal_task() if not, you must call eventfd_signal. > > > > > > > Hmm, this is an interesting idea, but I think it would be problematic in > real-world applications for the long-term. For instance, the -rt tree > and irq-threads .config option in the process of merging into mainline > changes context types for established code. Therefore, what might be > "hardirq/softirq" logic today may execute in a kthread tomorrow. That's OK, it's always safe to call eventfd_signal: eventfd_signal_task is just an optimization. I think everyone not in the context of a system call or vmexit can just call eventfd_signal_task. > I > think its dangerous to try to solve the problem with caller provided > info: the caller may be ignorant of its true state. I assume this wasn't clear enough: the idea is that you only calls eventfd_signal_task if you know you are on a systemcall path. If you are ignorant of the state, call eventfd_signal. > IMO, the ideal > solution needs to be something we can detect at run-time. > > Thanks Michael, > -Greg > -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-AUTOTEST PATCH 2/4] kvm_guest_wizard: pass 'params' directly to barrier_2()
On 06/15/2009 10:45 PM, Michael Goldish wrote: Currently parameters for barrier_2() are extracted from 'params' in the main run_steps() test routine, and then passed to barrier_2(). Instead, let barrier_2() extract parameters from 'params' as it sees fit. This will make adding new parameters slightly easier and cleaner.

Signed-off-by: Michael Goldish

--- client/tests/kvm/kvm_guest_wizard.py | 37 - 1 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/client/tests/kvm/kvm_guest_wizard.py b/client/tests/kvm/kvm_guest_wizard.py
index 143e61e..eb0e2d5 100644
--- a/client/tests/kvm/kvm_guest_wizard.py
+++ b/client/tests/kvm/kvm_guest_wizard.py
@@ -41,6 +41,18 @@ def barrier_2(vm, words, fail_if_stuck_for, stuck_detection_history,
                                              "cropped_scrdump_expected.ppm")
     comparison_filename = os.path.join(debug_dir, "comparison.ppm")
 
+    fail_if_stuck_for = params.get("fail_if_stuck_for")
+    if fail_if_stuck_for:
+        fail_if_stuck_for = float(fail_if_stuck_for)
+    else:
+        fail_if_stuck_for = 1e308
+
+    stuck_detection_history = params.get("stuck_detection_history")
+    if stuck_detection_history:
+        stuck_detection_history = int(stuck_detection_history)
+    else:
+        stuck_detection_history = 2
+

Could be simplified by (preferably in a separate patch):

    fail_if_stuck_for = params.get("fail_if_stuck_for", 1e308)
    fail_if_stuck_for = float(fail_if_stuck_for)

    stuck_detection_history = params.get("stuck_detection_history", 2)
    stuck_detection_history = int(stuck_detection_history)

or even with a single line for each param:

    var = cast(params.get(var_name, default_value))

Regards, Uri -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Activate Virtualization On Demand v2
On 06/16/2009 05:08 PM, Alexander Graf wrote: Please tell me you tested suspend/resume with/without VMs and cpu hotunplug/hotplug. I tested cpu hotplugging. On the last round I tested suspend/resume, but this time I couldn't because my machine can't do suspend :-(. So I'll try hard and find a machine I can test it on for the next round. I can test suspend/resume for you if you don't have a friendly machine. I have a personal interest in keeping it working :) -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 10:54:28AM -0400, Gregory Haskins wrote: > > +static void _eventfd_notify(struct eventfd_ctx *ctx) > > +{ > > + struct eventfd_notifier *en; > > + int idx; > > + > > + idx = srcu_read_lock(&ctx->srcu); > > + > > + /* > > +* The goal here is to allow the notification to be preemptible > > +* as often as possible. We cannot achieve this with the basic > > +* wqh mechanism because it requires the wqh->lock. Therefore > > +* we have an internal srcu list mechanism of which the wqh is > > +* a client. > > +* > > +* Not all paths will invoke this function in process context. > > +* Callers should check for suitable state before assuming they > > +* can sleep (such as with preemptible()). Paul McKenney > > assures > > +* me that srcu_read_lock is compatible with in-atomic, as long > > as > > +* the code within the critical section is also compatible. > > +*/ > > + list_for_each_entry_rcu(en, &ctx->nh, list) > > + en->ops->signal(en); > > + > > + srcu_read_unlock(&ctx->srcu, idx); > > +} > > + > > /* > > * Adds "n" to the eventfd counter "count". Returns "n" in case of > > * success, or a value lower then "n" in case of coutner overflow. > > > > > > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > false. > > Further, to do useful things it might not be enough that you can sleep: > with iofd you also want to access current task with e.g. copy from user. > > Here's an idea: let's pass a flag to ->signal, along the lines of > signal_is_task, that tells us that it is safe to use current, and add > eventfd_signal_task() which is the same as eventfd_signal but lets > everyone > know that it's safe to both sleep and use current->mm. > > Makes sense? > > > > >>> It does make sense, yes. What I am not clear on is how would eventfd > >>> detect this state such as to populate such flags, and why cant the > >>> ->signal() CB do the same? > >>> > >>> Thanks Michael, > >>> -Greg > >>> > >>> > >>> > >> eventfd can't detect this state. But the callers know in what context they > >> are. > >> So the *caller* of eventfd_signal_task makes sure of this: if you are in a > >> task, > >> you can call eventfd_signal_task() if not, you must call eventfd_signal. > >> > >> > >> > >> > > Hmm, this is an interesting idea, but I think it would be problematic in > > real-world applications for the long-term. For instance, the -rt tree > > and irq-threads .config option in the process of merging into mainline > > changes context types for established code. Therefore, what might be > > "hardirq/softirq" logic today may execute in a kthread tomorrow. I > > think its dangerous to try to solve the problem with caller provided > > info: the caller may be ignorant of its true state. > > Also, we need to consider that a process context can still be in-atomic > if the user did something like disabled interrupts, preemption, used a > spinlock, etc, before calling the eventfd_signal_task() function. > Perhaps we can put a stake in the ground that says you must not call > this from atomic context, That's the ticket. > but I still prefer just being able to detect > this from our state. > > -Greg > > -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Tue, Jun 16, 2009 at 10:48:27AM -0400, Gregory Haskins wrote: > >> +static void _eventfd_notify(struct eventfd_ctx *ctx) >> +{ >> +struct eventfd_notifier *en; >> +int idx; >> + >> +idx = srcu_read_lock(&ctx->srcu); >> + >> +/* >> + * The goal here is to allow the notification to be preemptible >> + * as often as possible. We cannot achieve this with the basic >> + * wqh mechanism because it requires the wqh->lock. Therefore >> + * we have an internal srcu list mechanism of which the wqh is >> + * a client. >> + * >> + * Not all paths will invoke this function in process context. >> + * Callers should check for suitable state before assuming they >> + * can sleep (such as with preemptible()). Paul McKenney >> assures >> + * me that srcu_read_lock is compatible with in-atomic, as long >> as >> + * the code within the critical section is also compatible. >> + */ >> +list_for_each_entry_rcu(en, &ctx->nh, list) >> +en->ops->signal(en); >> + >> +srcu_read_unlock(&ctx->srcu, idx); >> +} >> + >> /* >> * Adds "n" to the eventfd counter "count". Returns "n" in case of >> * success, or a value lower then "n" in case of coutner overflow. >> >> >> > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > false. > > Further, to do useful things it might not be enough that you can sleep: > with iofd you also want to access current task with e.g. copy from user. > > Here's an idea: let's pass a flag to ->signal, along the lines of > signal_is_task, that tells us that it is safe to use current, and add > eventfd_signal_task() which is the same as eventfd_signal but lets > everyone > know that it's safe to both sleep and use current->mm. > > Makes sense? > > > It does make sense, yes. What I am not clear on is how would eventfd detect this state such as to populate such flags, and why cant the ->signal() CB do the same? Thanks Michael, -Greg >>> eventfd can't detect this state. But the callers know in what context they >>> are. >>> So the *caller* of eventfd_signal_task makes sure of this: if you are in a >>> task, >>> you can call eventfd_signal_task() if not, you must call eventfd_signal. >>> >>> >>> >>> >> Hmm, this is an interesting idea, but I think it would be problematic in >> real-world applications for the long-term. For instance, the -rt tree >> and irq-threads .config option in the process of merging into mainline >> changes context types for established code. Therefore, what might be >> "hardirq/softirq" logic today may execute in a kthread tomorrow. >> > > That's OK, it's always safe to call eventfd_signal: eventfd_signal_task is > just > an optimization. I think everyone not in the context of a system call or > vmexit > can just call eventfd_signal_task. > I assume you meant s/eventfd_signal_task/eventfd_signal there? > >> I >> think its dangerous to try to solve the problem with caller provided >> info: the caller may be ignorant of its true state. >> > > I assume this wasn't clear enough: the idea is that you only > calls eventfd_signal_task if you know you are on a systemcall path. > If you are ignorant of the state, call eventfd_signal. > Well, its not a matter of correctness. Its more for optimal performance. If I have PCI pass-though injecting interrupts from hardirq in mainline, clearly eventfd_signal() is proper. In -rt, the hardirq is transparently converted to a kthread, so technically eventfd_signal_task() would work (at least for the can_sleep() part, not for current->mm per se). 
But in this case, the PCI logic would not know it was converted to a kthread. It all happens transparently in the low-level code and the pci code is unmodified. In this case, your proposal would have the passthrough path invoking irqfd with eventfd_signal(). It would therefore still shunt to a workqueue to inject the interrupt, even though it would have been perfectly fine to just inject it directly because taking mutex_lock(&kvm->irq_lock) is legal. Perhaps I am over-optimizing, but this is the scenario I am concerned about and what I was trying to address with preemptible()/can_sleep(). I think your idea is a good one to address the current->mm portion. It would only ever be safe to access the MM context from syscall/vmexit context, as you point out. Therefore, I see no problem with implementing something like iosignalfd with eventfd_signal_task(). But accessing current->mm is only a subset of
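Schematically, the fast path being argued for would dispatch on context, injecting inline when sleeping is legal and deferring otherwise; can_sleep() is the hypothetical helper from earlier in the thread, and the work item mirrors irqfd's existing deferred-injection structure:

/* Schematic only; can_sleep() is not an existing kernel API. */
static void irqfd_signal_path(struct _irqfd *irqfd)
{
	if (can_sleep())
		/* Task or threaded-irq context: taking the kvm->irq_lock
		 * mutex is legal, so inject directly. */
		irqfd_inject(&irqfd->inject);
	else
		/* Hardirq/atomic context: defer to process context. */
		schedule_work(&irqfd->inject);
}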
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 11:20:18AM -0400, Gregory Haskins wrote: > >>> eventfd can't detect this state. But the callers know in what context > >>> they are. > >>> So the *caller* of eventfd_signal_task makes sure of this: if you are in > >>> a task, > >>> you can call eventfd_signal_task() if not, you must call eventfd_signal. > >>> > >>> > >>> > >>> > >> Hmm, this is an interesting idea, but I think it would be problematic in > >> real-world applications for the long-term. For instance, the -rt tree > >> and irq-threads .config option in the process of merging into mainline > >> changes context types for established code. Therefore, what might be > >> "hardirq/softirq" logic today may execute in a kthread tomorrow. > >> > > > > That's OK, it's always safe to call eventfd_signal: eventfd_signal_task is > > just > > an optimization. I think everyone not in the context of a system call or > > vmexit > > can just call eventfd_signal_task. > > > > > I assume you meant s/eventfd_signal_task/eventfd_signal there? Yea. Sorry. > > > >> I > >> think its dangerous to try to solve the problem with caller provided > >> info: the caller may be ignorant of its true state. > >> > > > > I assume this wasn't clear enough: the idea is that you only > > calls eventfd_signal_task if you know you are on a systemcall path. > > If you are ignorant of the state, call eventfd_signal. > > > > Well, its not a matter of correctness. Its more for optimal > performance. If I have PCI pass-though injecting interrupts from > hardirq in mainline, clearly eventfd_signal() is proper. In -rt, the > hardirq is transparently converted to a kthread, so technically > eventfd_signal_task() would work I think it's wrong to sleep for a long time while handling interrupts even if you technically can with threaded interrupts. For that matter, I think you can sleep while within a spinlock if preempt is on, but that does not mean you should - and I think you will get warnings in the log if you do. No? > (at least for the can_sleep() part, not > for current->mm per se). But in this case, the PCI logic would not know > it was converted to a kthread. It all happens transparently in the > low-level code and the pci code is unmodified. > > In this case, your proposal would have the passthrough path invoking > irqfd with eventfd_signal(). It would therefore still shunt to a > workqueue to inject the interrupt, even though it would have been > perfectly fine to just inject it directly because taking > mutex_lock(&kvm->irq_lock) is legal. This specific issue should just be addressed by making it possible to inject the interrupt from an atomic context. > Perhaps I am over-optimizing, but > this is the scenario I am concerned about and what I was trying to > address with preemptible()/can_sleep(). What, a path that is never invoked without threaded interrupts? Yes, I would say it currently looks like an over-optimization. > I think your idea is a good one to address the current->mm portion. It > would only ever be safe to access the MM context from syscall/vmexit > context, as you point out. Therefore, I see no problem with > implementing something like iosignalfd with eventfd_signal_task(). > > But accessing current->mm is only a subset of the use-cases. The other > use-cases would include the ability to sleep, and possibly the ability > to address other->mm. For these latter cases, I really only need the > "can_sleep()" behavior, not the full blown "can_access_current_mm()". 
> Additionally, the eventfd_signal_task() data, at least for iosignalfd, is
> superfluous: I already know that I can access current->mm by virtue of
> the design.

Maybe I misunderstand what iosignalfd is - isn't it true that you get an
eventfd and kvm will signal it when the guest performs an I/O write to a
specific address, and then drivers can get the eventfd and poll it to detect
these writes? If yes, you have no way to know that the other end of the
eventfd is connected to kvm, so you don't know you can access current->mm.

> So since I cannot use it accurately for the hardirq/threaded-irq type
> case, and I don't actually need it for the iosignalfd case, I am not
> sure it's the right direction (at least for now). I do think it might
> have merit for syscall/vmexit uses outside of iosignalfd, but I do not
> see a current use-case for it, so perhaps it can wait until one arises.
>
> -Greg

But it addresses the CONFIG_PREEMPT off case, which your design doesn't
seem to.

> >> IMO, the ideal solution needs to be something we can detect at run-time.
> >>
> >> Thanks Michael,
> >> -Greg
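[Editorial note: to make the convention under debate concrete, here is a
minimal sketch of the caller-side dispatch Michael describes. It assumes the
RFC's eventfd_signal_task(), which was never merged - mainline only has
eventfd_signal() - and the signatures are schematic for this era of the code:]

    /*
     * Sketch only: eventfd_signal_task() is the RFC's proposal, not a
     * mainline API; eventfd_signal() is the always-safe entry point.
     */
    static void notify_guest(struct eventfd_ctx *ctx,
                             bool on_syscall_or_vmexit_path)
    {
            if (on_syscall_or_vmexit_path)
                    /* caller vouches: we run in the signalling task */
                    eventfd_signal_task(ctx, 1);
            else
                    /* always safe, works from any context */
                    eventfd_signal(ctx, 1);
    }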
Re: [KVM-AUTOTEST PATCH 2/4] kvm_guest_wizard: pass 'params' directly to barrier_2()
- "Uri Lublin" wrote: > On 06/15/2009 10:45 PM, Michael Goldish wrote: > > Currently parameters for barrier_2() are extracted from 'params' in > the main > > run_steps() test routine, and then passed to barrier_2(). > > Instead, let barrier_2() extract parameters from 'params' as it sees > fit. > > This will make adding new parameters slightly easier and cleaner. > > > > Signed-off-by: Michael Goldish > > --- > > client/tests/kvm/kvm_guest_wizard.py | 37 > - > > 1 files changed, 18 insertions(+), 19 deletions(-) > > > > diff --git a/client/tests/kvm/kvm_guest_wizard.py > b/client/tests/kvm/kvm_guest_wizard.py > > index 143e61e..eb0e2d5 100644 > > --- a/client/tests/kvm/kvm_guest_wizard.py > > +++ b/client/tests/kvm/kvm_guest_wizard.py > > > > @@ -41,6 +41,18 @@ def barrier_2(vm, words, fail_if_stuck_for, > stuck_detection_history, > > > "cropped_scrdump_expected.ppm") > > comparison_filename = os.path.join(debug_dir, > "comparison.ppm") > > > > +fail_if_stuck_for = params.get("fail_if_stuck_for") > > +if fail_if_stuck_for: > > +fail_if_stuck_for = float(fail_if_stuck_for) > > +else: > > +fail_if_stuck_for = 1e308 > > + > > +stuck_detection_history = > params.get("stuck_detection_history") > > +if stuck_detection_history: > > +stuck_detection_history = int(stuck_detection_history) > > +else: > > +stuck_detection_history = 2 > > + > > Could be simplified by (preferably in a separate patch): > fail_if_stuck_for = params.get("fail_if_stuck_for", 1e308) > fail_if_stuck_for = float(fail_if_stuck_for) > > stuck_detection_history = > params.get("stuck_detection_history", 2) > stuck_detection_history = int(stuck_detection_history) > > or even with a single line for each param: > var = cast(params.get(var_name, default_value)) But what if the user wants to disable the feature or use the default value by setting fail_if_stuck_for = ''? This is required, for example, when the param is set to 300 for all tests, and the user wants to disable it only for Fedora tests. In that case params.get("fail_if_stuck_for", 1e308) would return '' and the following cast would raise an exception. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote:
> On Tue, Jun 16, 2009 at 11:20:18AM -0400, Gregory Haskins wrote:
> [...]
>> I assume you meant s/eventfd_signal_task/eventfd_signal there?
>
> Yea. Sorry.

np. I knew what you meant ;)

> [...]
>> Well, it's not a matter of correctness. It's more about optimal
>> performance. If I have PCI pass-through injecting interrupts from
>> hardirq in mainline, clearly eventfd_signal() is proper. In -rt, the
>> hardirq is transparently converted to a kthread, so technically
>> eventfd_signal_task() would work
>
> I think it's wrong to sleep for a long time while handling interrupts even
> if you technically can with threaded interrupts.

Well, this is somewhat of an orthogonal issue so I don't want to open this
can of worms per se. But one of the long-term goals of the threaded-irq
construct is to eliminate the need for the traditional "bh" contexts to do
work. The idea, of course, is that the threaded-irq context can do all of
the work traditionally shunted to tasklets/softirqs/workqueues directly, so
why bother switching contexts? So the short answer is that it's not
necessarily wrong to sleep or to do significant processing from a
threaded-irq.

In any case, threaded-irqs are just one example. I will try to highlight
others below.

> For that matter, I think you can sleep while within a spinlock if preempt
> is on

Yes, in fact the spinlocks are mutexes in this mode, so the locks themselves
can sleep.

> , but that does not mean you should - and I think you will get warnings in
> the log if you do. No?

Nope, sleeping is fine (voluntary or involuntary). The idea is it's all
governed by priority, and priority-inheritance, and a scheduler that is free
to make fine-grained decisions (due to the broadly preemptible kernel). But
this is getting off-topic so I will stop.

>> (at least for the can_sleep() part, not for current->mm per se). But in
>> this case, the PCI logic would not know it was converted to a kthread.
>> It all happens transparently in the low-level code and the pci code is
>> unmodified.
>>
>> In this case, your proposal would have the passthrough path invoking
>> irqfd with eventfd_signal(). It would therefore still shunt to a
>> workqueue to inject the interrupt, even though it would have been
>> perfectly fine to just inject it directly, because taking
>> mutex_lock(&kvm->irq_lock) is legal.
> This specific issue should just be addressed by making it possible
> to inject the interrupt from an atomic context.

I assume you mean s/mutex_lock(&kvm->irq_lock)/spin_lock(&kvm->irq_lock)?
If so, I agree this is a good idea.

TBH: I am more concerned about the iosignalfd path w.r.t. the preemptibility
of the callback. We can optimize the virtio-net interface, for instance,
once we make this ->signal() callback preemptible. Having irqfd ->signal()
preemptible is just a bonus, but we could work around it by fixing irq_lock
as well, I agree.

>> Perhaps I am over-optimizing, but this is the scenario I am concerned
>> about and what I was trying to address with preemptible()/can_sleep().
>
> What, a path that is never invoked without threaded interrupts?
> Yes, I would say it currently looks like an over-optimization.

You are only seeing part of the picture though. This is a cascading pattern.

>> I think your idea is a good one to address the current->mm portion. It
>> would only ever be safe to access the MM context from syscall/vmexit
>> context, as you point out. Therefore, I see no problem with
>> implementing something like iosignalfd with eventfd_signal_task().
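[Editorial note: to make the disputed primitive concrete, a minimal sketch of
the run-time test Gregory keeps referring to. can_sleep() is his hypothetical
helper, not an existing kernel symbol; the sketch also shows Michael's
objection, since preemptible() is hardwired to 0 when CONFIG_PREEMPT is off:]

    #include <linux/preempt.h>
    #include <linux/irqflags.h>

    /*
     * Hypothetical helper from the discussion. With CONFIG_PREEMPT=y,
     * preemptible() is (preempt_count() == 0 && !irqs_disabled()), so it
     * detects hardirq, softirq and spinlocked sections at run time.
     * Without CONFIG_PREEMPT it evaluates to a constant 0, and every
     * signal would be (needlessly) deferred - the hole pointed out above.
     */
    static inline bool can_sleep(void)
    {
            return preemptible();
    }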
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, 16 Jun 2009, Gregory Haskins wrote:
> Does this all make sense?

This conversation has been *really* long, and I haven't had time to look at
the patch yet. But looking at the amount of changes, and the amount of even
more changes discussed in this thread, there's a very slim chance that I'll
ACK the eventfd code. You may want to consider a solution that does not
litter the eventfd code that much.

- Davide
virtio-blk-pci chokes on some PCI slots
I'm working on making PCI slots configurable in QEMU. While testing the
feature for virtio-blk-pci, I ran into a task hang with Fedora kernel
2.6.27.5-117.fc10. Marcelo tested it with a newer kernel, and will follow up
with details.

Instead of the full QEMU patch I'm working on, the appended little hack
suffices to reproduce the problem.

$ QEMU_VIRTIO_BLK=9 qemu foo.qcow2 -drive file=bar.img,if=virtio -serial stdio
[...]
Loading virtio_pci module
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
virtio-pci :00:04.0: PCI INT A -> Link[LNKD] -> GSI 11 (level, low) -> IRQ 11
virtio-pci :00:09.0: can't derive routing for PCI INT A
virtio-pci :00:09.0: PCI INT A: no GSI - using IRQ 11
Loading virtio_blk module
vda:
[tick-tock-tick-tock...]
<3>INFO: task modprobe:491 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D c04303f1 0 491 1
 c6baeae8 0082 c6baea80 c04303f1 c6bae000 c087667c c0879c00 c0879c00
 c0879c00 c6b659b0 c6b65c24 c1108c00 c1108c00 83a43d86 0002
 c781aaa8 c6baead0 c04418d0 c6b65c24 4d88 c781aaa8 4d88 c6baeb3c
Call Trace:
 [] ? irq_exit+0x5f/0x83
 [] ? getnstimeofday+0x3c/0xc9
 [] ? ktime_get_ts+0x4a/0x4e
 [] io_schedule+0x52/0x8a
 [] sync_page+0x46/0x4c
 [] __wait_on_bit_lock+0x34/0x5e
 [] ? sync_page+0x0/0x4c
 [] __lock_page+0x78/0x81
 [] ? wake_bit_function+0x0/0x43
 [] lock_page+0x2c/0x2f
 [] read_cache_page_async+0xa4/0xfb
 [] ? blkdev_readpage+0x0/0x11
 [] read_cache_page+0xc/0x3c
 [] read_dev_sector+0x34/0x6a
 [] ? efi_partition+0x0/0x676
 [] read_lba+0x69/0xc6
 [] ? efi_partition+0x84/0x676
 [] ? efi_partition+0x0/0x676
 [] efi_partition+0xa3/0x676
 [] ? vprintk+0x2c8/0x2f3
 [] ? iget5_locked+0x33/0x114
 [] ? snprintf+0x15/0x17
 [] ? efi_partition+0x0/0x676
 [] rescan_partitions+0x106/0x242
 [] ? need_resched+0x18/0x22
 [] ? _cond_resched+0x8/0x32
 [] do_open+0x1ff/0x290
 [] __blkdev_get+0x71/0x7c
 [] blkdev_get+0xd/0xf
 [] register_disk+0xf0/0x141
 [] add_disk+0x34/0x89
 [] ? exact_match+0x0/0xb
 [] ? exact_lock+0x0/0x11
 [] virtblk_probe+0x275/0x2a8 [virtio_blk]
 [] virtio_dev_probe+0x89/0xb3
 [] driver_probe_device+0xa0/0x136
 [] __driver_attach+0x3a/0x59
 [] bus_for_each_dev+0x3b/0x63
 [] driver_attach+0x14/0x16
 [] ? __driver_attach+0x0/0x59
 [] bus_add_driver+0x9d/0x1ba
 [] driver_register+0x81/0xe1
 [] ? register_blkdev+0xc8/0xdc
 [] register_virtio_driver+0x1f/0x21
 [] init+0x22/0x24 [virtio_blk]
 [] _stext+0x3d/0x115
 [] ? init+0x0/0x24 [virtio_blk]
 [] sys_init_module+0x87/0x178
 [] syscall_call+0x7/0xb
 ===

Some slots work, e.g. 5, some don't, e.g. 6.

What's going on here?

diff --git a/hw/pc.c b/hw/pc.c
index 143b697..15c44ac 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1146,7 +1146,8 @@ static void pc_init1(ram_addr_t ram_size,
     int unit_id = 0;

     while ((index = drive_get_index(IF_VIRTIO, 0, unit_id)) != -1) {
-        pci_create_simple(pci_bus, -1, "virtio-blk-pci");
+        char *devfn = getenv("QEMU_VIRTIO_BLK");
+        pci_create_simple(pci_bus, devfn ? atoi(devfn) * 8 : -1, "virtio-blk-pci");
         unit_id++;
     }
 }
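[Editorial note: for context on the "* 8" in the hack, a PCI devfn packs the
slot (device) number in the upper five bits and the function number in the
lower three, so slot N, function 0 is N * 8. This mirrors the PCI_DEVFN
macros from Linux's <linux/pci.h>; the environment variable is the hack's own:]

    /* devfn layout: bits 7:3 = slot (device), bits 2:0 = function. */
    #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
    #define PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
    #define PCI_FUNC(devfn)         ((devfn) & 0x07)

    /* So QEMU_VIRTIO_BLK=9 asks for devfn PCI_DEVFN(9, 0) == 72. */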
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Davide Libenzi wrote:
> On Tue, 16 Jun 2009, Gregory Haskins wrote:
>> Does this all make sense?
>
> This conversation has been *really* long, and I haven't had time to look
> at the patch yet. But looking at the amount of changes, and the amount of
> even more changes discussed in this thread, there's a very slim chance
> that I'll ACK the eventfd code. You may want to consider a solution that
> does not litter the eventfd code that much.
>
> - Davide

Hi Davide,

I understand your position and value your time/insight into looking at these
things. Despite the current ongoing discussion, I still stand by the current
patch as my proposed solution (though I have yet to convince Michael). But in
any case, if you have the time, please look it over, because I still think
it's the right direction to head in.

The general solution is that we use an srcu list instead of the wait-queue,
and that's really it. If we can't eliminate that spinlock held over the
notification, it has usability implications at least for irqfd/iosignalfd.
The only way I can think of to solve the problem without modifying eventfd is
to not use eventfd at all. :(

Since eventfd captures the concept we are going for here really well, reusing
it has a ton of advantages, including interface compatibility and, of course,
code reuse of a tested/debugged code base. Heck, we may hopefully even
improve eventfd for other users in doing this work. It would therefore be a
shame to walk away from it if it can be avoided.

So if what I proposed is not acceptable but you are willing to work with me
to find a solution that is, that would be ideal from my perspective.
Otherwise, I apologize for all the noise. You have been quite the patient and
helpful gentleman with me to date, and I appreciate that.

Kind Regards,
-Greg
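[Editorial note: for readers who haven't seen the patch, the kernel already
ships an SRCU-flavored notifier chain whose callbacks run outside any
spinlock and may therefore sleep - the property the RFC wants for eventfd
callbacks. A rough sketch of that existing API, not the RFC code itself:]

    #include <linux/notifier.h>

    static struct srcu_notifier_head chain;

    static int my_cb(struct notifier_block *nb, unsigned long action, void *data)
    {
            /* Unlike a waitqueue callback invoked under wqh->lock,
             * this callback is allowed to sleep. */
            return NOTIFY_OK;
    }

    static struct notifier_block my_nb = { .notifier_call = my_cb };

    static void example(void)
    {
            srcu_init_notifier_head(&chain);
            srcu_notifier_chain_register(&chain, &my_nb);
            /* caller and callbacks may sleep here: */
            srcu_notifier_call_chain(&chain, 0, NULL);
            srcu_notifier_chain_unregister(&chain, &my_nb);
    }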
Re: virtio-blk-pci chokes on some PCI slots
On Tue, Jun 16, 2009 at 06:47:45PM +0200, Markus Armbruster wrote:
> I'm working on making PCI slots configurable in QEMU. While testing the
> feature for virtio-blk-pci, I ran into a task hang with Fedora kernel
> 2.6.27.5-117.fc10. Marcelo tested it with a newer kernel, and will
> follow up with details.
>
> [...]
>
> Some slots work, e.g. 5, some don't, e.g. 6.
>
> What's going on here?

no softlockup with 2.6.29-06638-g17db0a0, but not very friendly either (note
the backing device has no partitions in my case, perhaps that makes a
difference).
virtio-pci :00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, low) -> IRQ 10
virtio-pci :00:09.0: can't derive routing for PCI INT A
virtio-pci :00:09.0: PCI INT A: no GSI - using IRQ 11
Starting udev: Wait timeout. Will continue in the background.[FAILED]

[r...@guest ~]# dmesg | grep vda
vda:<6>EXT3 FS on dm-0, internal journal
[r...@guest ~]# mount /dev/vda /mnt

Hangs there forever.
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 12:17:22PM -0400, Gregory Haskins wrote:
> > Maybe I misunderstand what iosignalfd is - isn't it true that you get an
> > eventfd and kvm will signal it when the guest performs an I/O write to a
> > specific address, and then drivers can get the eventfd and poll it to
> > detect these writes?
>
> Correct.
>
> > If yes, you have no way to know that the other end of the eventfd
> > is connected to kvm, so you don't know you can access current->mm.
>
> Well, my intended use for them *does* know it's connected to KVM.

How does it know? You get a file descriptor and you even figure out it's an
eventfd. Now what? Any process can write there.

> Perhaps there will be others that do not in the future, but like I said
> it could be addressed as those actually arise.
>
> >> So since I cannot use it accurately for the hardirq/threaded-irq type
> >> case, and I don't actually need it for the iosignalfd case, I am not
> >> sure it's the right direction (at least for now). I do think it might
> >> have merit for syscall/vmexit uses outside of iosignalfd, but I do not
> >> see a current use-case for it, so perhaps it can wait until one arises.
> >>
> >> -Greg
> >
> > But it addresses the CONFIG_PREEMPT off case, which your design doesn't
> > seem to.
>
> Well, the problem is that it only addresses it partially, in a limited
> set of circumstances, and doesn't address the broader problem. I'll
> give you an example:
>
> (First off, let's assume that we are not going to have
> eventfd_signal_task(), but rather eventfd_signal() with two option
> flags: EVENTFD_SIGNAL_CANSLEEP and EVENTFD_SIGNAL_CURRENTVALID.)
>
> Today vbus-enet has an rx-thread and a tx-thread, at least partially
> because I need process context to do the copy_to_user(other->mm) type
> stuff (and we will need similar for our virtio-net backend as well). I
> also utilize irqfd to do interrupt injection. Now, since I know that I
> have kthread context, I could modify vbus-enet to inject interrupts with
> EVENTFD_SIGNAL_CANSLEEP set, and this is fine. I know that is safe.
>
> However, the problem is above that! I would like to optimize out the
> tx-thread to begin with, with a similar "can_sleep()" pattern, whenever
> possible.
>
> For instance, if the netif stack calls me to do a transmit and it
> happens to be in a sleepable context, it would be nice to just skip
> waking up the tx-thread. I would therefore just do the
> copy_to_user(other->mm) + irqfd directly in the netif callback, thereby
> skipping the context switch.

What do you mean by copy_to_user(other->mm) here? If you are going to switch
to another mm, then I think current->mm must be valid (I think it's not
enough that you can sleep). So preemptible() might not be enough.

--
MST
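[Editorial note: spelling out the variant Gregory parenthesizes above. The
flags and eventfd_signal_flags() are discussion-only and were never merged;
the signature is hypothetical:]

    /* Hypothetical, from the discussion - not a mainline API. */
    #define EVENTFD_SIGNAL_CANSLEEP     (1 << 0) /* caller may block in callbacks */
    #define EVENTFD_SIGNAL_CURRENTVALID (1 << 1) /* "current" is the signalling task */

    int eventfd_signal_flags(struct eventfd_ctx *ctx, int n, unsigned int flags);

    static void vbus_enet_tx_complete(struct eventfd_ctx *irqfd_ctx)
    {
            /* Per the example: the tx kthread may sleep, but "current"
             * is the kthread, not the guest's process, so
             * EVENTFD_SIGNAL_CURRENTVALID must stay clear. */
            eventfd_signal_flags(irqfd_ctx, 1, EVENTFD_SIGNAL_CANSLEEP);
    }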
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote:
> On Tue, Jun 16, 2009 at 12:17:22PM -0400, Gregory Haskins wrote:
>>> Maybe I misunderstand what iosignalfd is - isn't it true that you get an
>>> eventfd and kvm will signal it when the guest performs an I/O write to a
>>> specific address, and then drivers can get the eventfd and poll it to
>>> detect these writes?
>>
>> Correct.
>>
>>> If yes, you have no way to know that the other end of the eventfd
>>> is connected to kvm, so you don't know you can access current->mm.
>>
>> Well, my intended use for them *does* know it's connected to KVM.
>
> How does it know? You get a file descriptor and you even figure out it's
> an eventfd. Now what? Any process can write there.

Well, I suppose that is true. Unlike my current hypercall coupling deployed
in vbus v3, anyone can signal the event for the iosignalfd. However, it
wouldn't matter other than looking like an errant signal, as I do not use
"current" for anything. (E.g. all the virtio pointers are stored with the
iosignalfd, independently of the signaller.)

> [...]
>
> What do you mean by copy_to_user(other->mm) here? If you are going to
> switch to another mm, then I think current->mm must be valid (I think
> it's not enough that you can sleep). So preemptible() might not be enough.

I don't currently use switch_mm, if that is what you mean. I save the
task_struct into the appropriate context so current->mm doesn't matter to me.
I never use it. All I need (afaik) is to acquire the proper mutex first. I am
not an MM expert, so perhaps I have this wrong, but it does appear to work
properly even from kthread context.

-Greg
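[Editorial note: a schematic of the "stash the task at setup time" pattern
Gregory describes, using stock mm APIs of that era. Names and structure are
illustrative, not the actual vbus code:]

    #include <linux/mm.h>
    #include <linux/sched.h>

    struct io_ctx {
            struct task_struct *task;  /* the userspace task, captured at setup */
            struct mm_struct   *mm;    /* pinned with get_task_mm() */
            struct mutex        lock;
    };

    static int io_ctx_setup(struct io_ctx *ctx)
    {
            ctx->task = current;
            ctx->mm   = get_task_mm(current); /* takes a ref; mmput() on teardown */
            mutex_init(&ctx->lock);
            return ctx->mm ? 0 : -EINVAL;
    }

    /* Later, from any context that may sleep (e.g. a kthread): */
    static int pin_user_page(struct io_ctx *ctx, unsigned long uaddr,
                             struct page **page)
    {
            int ret;

            mutex_lock(&ctx->lock);            /* why can_sleep() matters */
            down_read(&ctx->mm->mmap_sem);
            ret = get_user_pages(ctx->task, ctx->mm, uaddr, 1, 1, 0,
                                 page, NULL);  /* 2.6.30-era signature */
            up_read(&ctx->mm->mmap_sem);
            mutex_unlock(&ctx->lock);
            return ret;
    }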
Debug message flooding system log
Hi,

Since I installed the latest version of qemu-kvm (0.10.5), the following 3
lines are flooding my system log (dmesg). I have no idea where this is
coming from.

[433019.047800] arch/x86/kvm/x86.c kvm_write_guest_time
[433019.047802] Write (vcpu->hv_clock.system_time): 433019047801180
[433019.047804] FYI (vcpu->hv_clock.tsc_timestamp): 1298967607382796

I tried to look in the mentioned file (x86.c) in the kernel source, but
didn't find anything interesting. I probably activated a debug option
somewhere, but I really don't see where.

As an alternate option, I tried to set "dmesg -n1", which should hide all
messages except panics, but the 3 lines are still appearing at a very fast
rate.

Any idea where this comes from and how I can disable this output?

Thanks

Simon Marchi
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote:
> Another issue is enumeration. Guests will present their devices in the
> order they find them on the pci bus (of course enumeration is guest
> specific). So if I have 2 virtio controllers the only way I can
> distinguish between them is using their pci slots.

virtio controllers really should have a user-supplied string or UUID to
identify them to the guest. Don't they?

-- Jamie
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Mark McLoughlin wrote:
> > After libvirt has done -drive file=foo... it should dump the machine
> > config and use that from then on.
>
> Right - libvirt then wouldn't be able to avoid the complexity of merging
> any future changes into the dumped machine config.

As long as qemu can accept a machine config _and_ -drive file=foo (and
monitor commands to add/remove devices), libvirt could merge by simply
calling qemu with whatever additional command line options or monitor
commands modify the config, then dump the new config.

That way, libvirt would not have to deal with that complexity. It would be
written in one place: qemu. Or better, a utility: qemu-machine-config.

-- Jamie
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Mark McLoughlin wrote:
> > Worst case we hardcode those numbers (gasp, faint).
>
> Maybe we can just add the open slots to the -help output. That'd be nice
> and clean.

Make them part of the machine configuration. After all, they are part of the
machine configuration, and ACPI, BIOS etc. need to know about all the machine
slots anyway.

Having said that, I prefer the idea that slot allocation is handled either in
Qemu, or in a separate utility called qemu-machine-config (for working with
machine configs), or in a library libqemu-machine-config.so.

I particularly don't like the idea of arcane machine-dependent slot
allocation knowledge living in libvirt, because it needs to be in Qemu anyway
for non-libvirt users. No point in having two implementations of something
tricky and likely to have machine quirks, if one will do.

-- Jamie
Re: [libvirt] qemu-kvm-0.10.5, kvm-kmod-2.6.30, libvirt-0.6.4 -save/restore still unreliable
Hello Charles,

> Otherwise -- at the monitor, migrate "exec:cat >somefile.out"

I tried that, saving seems to work OK, but if I restore the domain, kvm
always seems to be stuck; I'm not able to switch to the monitor to do
anything. Running strace on the kvm process shows that the following is
repeating:

clock_gettime(CLOCK_MONOTONIC, {453254, 437209647}) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437244463}) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437295938}) = 0
select(20, [5 10 17 19], [], [], {1, 0}) = 1 (in [5], left {0, 98})
read(5, "\0", 512) = 1
read(5, 0x7fff60506840, 512) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {453254, 437458602}) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437492440}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 3000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437605596}) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437639489}) = 0
select(20, [5 10 17 19], [], [], {1, 0}) = 1 (in [19], left {0, 970104})
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigaction(SIGALRM, NULL, {0x4083b0, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7f25576e94c0}, 8) = 0
write(6, "\0", 1) = 1
read(19, 0x7fff605069c0, 128) = -1 EAGAIN (Resource temporarily unavailable)

I guess I maybe need to unfreeze the domain somehow after loading it?

> Replace cat with gzip -c or other compression tool (and reverse on
> migrate in) for more efficient on-disk storage.
>
> -Original Message-
> From: Nikola Ciprich [mailto:extmaill...@linuxbox.cz]
> Sent: Saturday, June 13, 2009 1:32 AM
> To: CHARLESDUFFY
> Cc: kvm@vger.kernel.org; nikola.cipr...@linuxbox.cz
> Subject: Re: [libvirt] qemu-kvm-0.10.5, kvm-kmod-2.6.30, libvirt-0.6.4 -save/restore still unreliable
>
> Hmm,
> I just noticed that the dump file saved using virsh contains some libvirt
> related headers, so I guess I can't just feed it directly to qemu-kvm.
> So I guess the best way to test save/restore without libvirt is to save
> the vm state using the monitor and then restore it again.
> But I can't find any clear documentation on how to do this.
> Could anyone give me a hint?
> thanks a lot!
> nik
>
> On Fri, Jun 12, 2009 at 03:17:18PM -0500, Charles Duffy wrote:
> > That may well be a guest issue rather than a qemu migration problem --
> > I've seen similar difficulties for guests being resumed via incoming
> > migration when using guest kernels with TSC as their clocksource.
> >
> > You might try interrogating the monitor to see your guest's state.

--
-
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
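[Editorial note: for anyone reproducing this outside libvirt, the bare
qemu-kvm flow Charles describes looks roughly like the following on a
0.10.x-era monitor; the file path is illustrative. If the restored guest only
seems stuck, it is worth checking "info status" and issuing "cont" before
digging deeper:]

    # save, from the running guest's monitor:
    (qemu) stop
    (qemu) migrate "exec:gzip -c > /tmp/vmstate.gz"
    (qemu) quit

    # restore: identical command line plus -incoming
    $ qemu-kvm <original options> -incoming "exec:gzip -dc /tmp/vmstate.gz"
    (qemu) info status
    (qemu) cont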
Re: [PATCHv4 01/13] qemu: make default_write_config use mask table
Michael S. Tsirkin wrote:
> Change much of hw/pci to use symbolic constants and a table-driven design:
> add a mask table with writable bits set and readonly bits unset. Detect
> change by comparing original and new registers. This makes it easy to
> support capabilities where read-only/writeable bit layout differs between
> devices, depending on capabilities present. As a result, writing a single
> byte in BAR registers now works as it should. Writing to upper limit
> registers in the bridge also works as it should. Code is also shorter.
>
> Signed-off-by: Michael S. Tsirkin

This series introduces a warning (virtio_load decl/def does not match).

--
Regards,

Anthony Liguori
[ANNOUNCE] lio-utils.git v3.0 configfs HOWTO for v2.6.30 released
Greetings all,

The Linux-iSCSI.org Target v3.0 HOWTO, using the new configfs-enabled generic
target subsystem python CLI, has been released and is up and running on
v2.6.30 of the Linux Kernel on x86_64 KVM (Debian), x86_64 bare-metal (SLES)
and PPC64 (Fedora PS3) hosts!

The HOWTO entry can be found at:

http://linux-iscsi.org/index.php/Howto

The v3.0 howto is a WIP and currently expects that the reader is familiar
with building and installing their own LIO target modules within a linux
kernel source. There is currently work being done to integrate this into
virtual machine images to make the v3.0 LIO CLI more accessible for end
users. We are expecting the v3.0 work to be available in a couple of
different forms soon. Until then, interested folks should have a look at the
v3.0 howto and the lio-utils.git source tree for usage information and toward
building their own copy from source.

So far, using the lio-utils.git tools for the Generic Target Core v3.0 with
LIO-Target configfs-enabled symlinks, the code has been able to scale to 1
unique Virtual HBA+FILEIO LUNs with SCSI 3 emulation <-> 1 unique LIO-Target
v3.0 Endpoints inside of a 4GB memory x86_64 KVM guest on recent x86_64
hardware.

Each of these HBA+LUNs and iSCSI Target Endpoints are designed (using the
TCM+LIO configfs kernel infrastructure) to be configured completely
independently of one another, so that creating/changing one does not affect
the real-time configfs ops of another. The configfs CLI commands to reproduce
the entire running setup can be dumped at any time using tcm_dump.py and
lio_dump.py.

The lio-utils.git source repository is for the v3.0 target_core_mod +
iscsi_target_mod configfs-enabled code in lio-core-2.6.git:

http://git.kernel.org/?p=linux/storage/lio/lio-utils.git;a=summary

Additional information can be found here:

http://linux-iscsi.org/index.php/Lio-utils

Questions can be asked via LIO-Devel or LIO-Users:

http://groups.google.com/group/linux-iscsi-target-dev
http://groups.google.com/group/linux-iscsi-target-users

The former is where most folks today (devels+users) are doing most of their
work, and thanks to the many people who have been helping improve the code! I
would like to encourage new end users to start posting in
linux-iscsi-targets-users as they get their v3.0 LIO-Target with tcm_node.py
+ lio_node.py setups up and running, with general questions and comments.

Also, a special thanks to the kernel.org folks (warthog9 and hpa) for their
assistance with getting the lio-utils.git source repository up and running;
it would not be possible without their help.

Thank you!

--nab
[KVM-AUTOTEST PATCH] Adding iperf test
Adding iperf network performance test. Basically it tests networking functionality, stability and performance of guest OSes. This test is cross-platform -- i.e. it works on both Linux and Windows VMs. Signed-off-by: Alexey Eromenko --- client/tests/kvm/kvm.py |1 + client/tests/kvm/kvm_iperf.py | 105 + client/tests/kvm/kvm_tests.cfg.sample |8 +++ 3 files changed, 114 insertions(+), 0 deletions(-) create mode 100644 client/tests/kvm/kvm_iperf.py diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py index 9428162..e1a6e27 100644 --- a/client/tests/kvm/kvm.py +++ b/client/tests/kvm/kvm.py @@ -53,6 +53,7 @@ class kvm(test.test): "autotest": test_routine("kvm_tests", "run_autotest"), "kvm_install": test_routine("kvm_install", "run_kvm_install"), "linux_s3": test_routine("kvm_tests", "run_linux_s3"), +"iperf":test_routine("kvm_iperf", "run_iperf"), } # Make it possible to import modules from the test's bindir diff --git a/client/tests/kvm/kvm_iperf.py b/client/tests/kvm/kvm_iperf.py new file mode 100644 index 000..927c9e5 --- /dev/null +++ b/client/tests/kvm/kvm_iperf.py @@ -0,0 +1,105 @@ +import time, os, logging +from autotest_lib.client.common_lib import utils, error +import kvm_utils + +def run_iperf(test, params, env): +""" +Runs iperf on the guest system and brings back the result. + +@see: http://sourceforge.net/projects/iperf +@param test: kvm test object +@param params: Dictionary with test parameters +@param env: Test environment +""" +vm = kvm_utils.env_get_vm(env, params.get("main_vm")) +if not vm: +message = "VM object not found in environment" +logging.error(message) +raise error.TestError, message +if not vm.is_alive(): +message = "VM seems to be dead; Test requires a living VM" +logging.error(message) +raise error.TestError(message) + +logging.info("Waiting for guest to be up...") + +session = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2) +if not session: +message = "Could not log into guest" +logging.error(message) +raise error.TestFail, message + +logging.info("Logged in") + +# Checking for GuestOS-compatible iPerf binary existence on host. +iperf_binary = params.get("iperf_binary", "misc/iperf") +iperf_duration = params.get("iperf_duration", 5) +iperf_parallel_threads = params.get("iperf_parallel_threads", 1) +iperf_dest_ip = params.get("iperf_dest_ip", "10.0.2.2") +iperf_binary = os.path.join(test.bindir, iperf_binary) +if not os.path.exists(iperf_binary): +message = "iPerf binary: %s was not found on host" % iperf_binary +logging.error(message) +raise error.TestError, message +else: +logging.info("iPerf binary: %s was found on host" % iperf_binary) + +# Starting HostOS-compatible iPerf Server on host +logging.info('VM is up ... \n starting iPerf Server on host') +kvm_utils.run_bg("iperf -s", timeout=5) + +# Detecting GuestOS +if iperf_binary.__contains__("exe"): +vm_type="win32" +else: +vm_type="linux32" + +# Copying GuestOS-compatible iPerf binary to guest. +# Starting iPerf Client on guest, plus connect to host. 
+if vm_type == "win32": +win_dir = "/cygdrive/c/" +logging.info('starting copying %s to Windows VM to %s' % (iperf_binary, + win_dir)) +if not vm.scp_to_remote(iperf_binary, win_dir): +message = "Could not copy Win32 iPerf to guest" +logging.error(message) +raise error.TestError(message) +logging.debug("Enabling file permissions of iPerf.exe on Windows VM...") +session.sendline('cacls C:\iperf.exe /P Administrator:F') +session.sendline('y') +session.sendline('') +time.sleep(2) +session.sendline('') +logging.info("starting iPerf client on Windows VM, connecting to host") +session.sendline('C:\iperf -t %s -c %s -P %s' % (int(iperf_duration), + iperf_dest_ip, + int(iperf_parallel_threads))) +else: +logging.info('starting copying %s to Linux VM ' % iperf_binary) +if not vm.scp_to_remote(iperf_binary, "/usr/local/bin"): +message = "Could not copy Linux iPerf to guest" +logging.error(message) +raise error.TestError, message +print "starting iPerf client on VM, connecting to host" +session.sendline('iperf -t %s -c %s -P %s' % (int(iperf_duration), + iperf_dest_ip, + int(iperf_parallel_threads)))
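[Editorial note: the hunk for kvm_tests.cfg.sample is cut off above. For
orientation, the parameters read by the code imply a stanza along these
lines, following the sample file's existing syntax; the values shown are the
code's defaults, and the stanza itself is illustrative, not the patch's exact
hunk:]

    - iperf: install setup
        type = iperf
        iperf_binary = misc/iperf
        iperf_duration = 5
        iperf_parallel_threads = 1
        iperf_dest_ip = 10.0.2.2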
Re: [KVM-AUTOTEST PATCH] Adding iperf test
On Tue, 2009-06-16 at 18:29 -0300, Lucas Meneghel Rodrigues wrote:
> Adding iperf network performance test. Basically it tests
> networking functionality, stability and performance of guest OSes.
> This test is cross-platform -- i.e. it works on both Linux and
> Windows VMs.

Ok, now that I have rebased the test, I have a few comments:

* I don't like the idea of shipping binaries inside the test. Fair enough
that we don't support other archs than x86 and x86_64 (also there's the
problem that Windows usually doesn't ship a working toolchain), but I would
like to think a bit more about it (see the sketch after the quoted patch).

* Autotest already has an iperf test that could be used on Linux guests.
Sure, there's the problem of matching what's being executed on Windows, but
it's worth a look.

* The Autotest iperf test usually runs with 2 machines, one in the 'server'
role and the other in 'client' mode. I would like to pursue the same model:
2 VMs, one running as a server and another as a client.

Alexey and Yaniv, I'd like to hear your opinions on this.

Thanks,

> Signed-off-by: Alexey Eromenko
> ---
> client/tests/kvm/kvm.py               |    1 +
> client/tests/kvm/kvm_iperf.py         |  105 +
> client/tests/kvm/kvm_tests.cfg.sample |    8 +++
> 3 files changed, 114 insertions(+), 0 deletions(-)
> [...]
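[Editorial note: one way to act on the first review point without sniffing
the binary's file extension might be an explicit test parameter. The
parameter name and values are illustrative, not part of the posted patch:]

    # Illustrative only: decide the guest flavor from an explicit param
    # instead of iperf_binary.__contains__("exe").
    guest_os_type = params.get("guest_os_type", "linux")
    if guest_os_type == "windows":
        deploy_dir = "/cygdrive/c/"
    else:
        deploy_dir = "/usr/local/bin"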
[patch 0/2] replace kvmtrace with event traces v2
--
[patch 2/2] KVM: remove old KVMTRACE support code
Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls. Signed-off-by: Marcelo Tosatti Index: kvm/virt/kvm/kvm_main.c === --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -2367,7 +2367,7 @@ static long kvm_dev_ioctl(struct file *f case KVM_TRACE_ENABLE: case KVM_TRACE_PAUSE: case KVM_TRACE_DISABLE: - r = kvm_trace_ioctl(ioctl, arg); + r = -EOPNOTSUPP; break; default: return kvm_arch_dev_ioctl(filp, ioctl, arg); @@ -2717,7 +2717,6 @@ EXPORT_SYMBOL_GPL(kvm_init); void kvm_exit(void) { - kvm_trace_cleanup(); tracepoint_synchronize_unregister(); misc_deregister(&kvm_dev); kmem_cache_destroy(kvm_vcpu_cache); Index: kvm/include/linux/kvm_host.h === --- kvm.orig/include/linux/kvm_host.h +++ kvm/include/linux/kvm_host.h @@ -482,37 +482,6 @@ struct kvm_stats_debugfs_item { extern struct kvm_stats_debugfs_item debugfs_entries[]; extern struct dentry *kvm_debugfs_dir; -#define KVMTRACE_5D(evt, vcpu, d1, d2, d3, d4, d5, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 5, d1, d2, d3, d4, d5) -#define KVMTRACE_4D(evt, vcpu, d1, d2, d3, d4, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 4, d1, d2, d3, d4, 0) -#define KVMTRACE_3D(evt, vcpu, d1, d2, d3, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 3, d1, d2, d3, 0, 0) -#define KVMTRACE_2D(evt, vcpu, d1, d2, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 2, d1, d2, 0, 0, 0) -#define KVMTRACE_1D(evt, vcpu, d1, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 1, d1, 0, 0, 0, 0) -#define KVMTRACE_0D(evt, vcpu, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 0, 0, 0, 0, 0, 0) - -#ifdef CONFIG_KVM_TRACE -int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg); -void kvm_trace_cleanup(void); -#else -static inline -int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg) -{ - return -EINVAL; -} -#define kvm_trace_cleanup() ((void)0) -#endif - #ifdef KVM_ARCH_WANT_MMU_NOTIFIER static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq) { Index: kvm/virt/kvm/kvm_trace.c === --- kvm.orig/virt/kvm/kvm_trace.c +++ /dev/null @@ -1,285 +0,0 @@ -/* - * kvm trace - * - * It is designed to allow debugging traces of kvm to be generated - * on UP / SMP machines. Each trace entry can be timestamped so that - * it's possible to reconstruct a chronological record of trace events. - * The implementation refers to blktrace kernel support. - * - * Copyright (c) 2008 Intel Corporation - * Copyright (C) 2006 Jens Axboe - * - * Authors: Feng(Eric) Liu, eric.e@intel.com - * - * Date:Feb 2008 - */ - -#include -#include -#include -#include - -#include - -#define KVM_TRACE_STATE_RUNNING(1 << 0) -#define KVM_TRACE_STATE_PAUSE (1 << 1) -#define KVM_TRACE_STATE_CLEARUP(1 << 2) - -struct kvm_trace { - int trace_state; - struct rchan *rchan; - struct dentry *lost_file; - atomic_t lost_records; -}; -static struct kvm_trace *kvm_trace; - -struct kvm_trace_probe { - const char *name; - const char *format; - u32 timestamp_in; - marker_probe_func *probe_func; -}; - -static inline int calc_rec_size(int timestamp, int extra) -{ - int rec_size = KVM_TRC_HEAD_SIZE; - - rec_size += extra; - return timestamp ? 
rec_size += KVM_TRC_CYCLE_SIZE : rec_size; -} - -static void kvm_add_trace(void *probe_private, void *call_data, - const char *format, va_list *args) -{ - struct kvm_trace_probe *p = probe_private; - struct kvm_trace *kt = kvm_trace; - struct kvm_trace_rec rec; - struct kvm_vcpu *vcpu; - inti, size; - u32extra; - - if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING)) - return; - - rec.rec_val = TRACE_REC_EVENT_ID(va_arg(*args, u32)); - vcpu= va_arg(*args, struct kvm_vcpu *); - rec.pid = current->tgid; - rec.vcpu_id = vcpu->vcpu_id; - - extra = va_arg(*args, u32); - WARN_ON(!(extra <= KVM_TRC_EXTRA_MAX)); - extra = min_t(u32, extra, KVM_TRC_EXTRA_MAX); - - rec.rec_val |= TRACE_REC_TCS(p->
[patch 1/2] KVM: convert custom marker based tracing to event traces
This allows use of the powerful ftrace infrastructure. See Documentation/trace/ for usage information. Signed-off-by: Marcelo Tosatti Index: kvm/arch/x86/kvm/lapic.c === --- kvm.orig/arch/x86/kvm/lapic.c +++ kvm/arch/x86/kvm/lapic.c @@ -34,6 +34,7 @@ #include #include "kvm_cache_regs.h" #include "irq.h" +#include "trace.h" #ifndef CONFIG_X86_64 #define mod_64(x, y) ((x) - (y) * div64_u64(x, y)) @@ -515,8 +516,6 @@ static u32 __apic_read(struct kvm_lapic { u32 val = 0; - KVMTRACE_1D(APIC_ACCESS, apic->vcpu, (u32)offset, handler); - if (offset >= LAPIC_MMIO_LENGTH) return 0; @@ -562,6 +561,8 @@ static void apic_mmio_read(struct kvm_io } result = __apic_read(apic, offset & ~0xf); + trace_kvm_apic_read(offset, result); + switch (len) { case 1: case 2: @@ -657,7 +658,7 @@ static void apic_mmio_write(struct kvm_i offset &= 0xff0; - KVMTRACE_1D(APIC_ACCESS, apic->vcpu, (u32)offset, handler); + trace_kvm_apic_write(offset, val); switch (offset) { case APIC_ID: /* Local APIC ID */ Index: kvm/arch/x86/kvm/svm-trace.h === --- /dev/null +++ kvm/arch/x86/kvm/svm-trace.h @@ -0,0 +1,51 @@ +#define exit_reasons svm_exit_reasons +#define svm_exit_reasons \ + {SVM_EXIT_READ_CR0, "read_cr0"},\ + {SVM_EXIT_READ_CR3, "read_cr3"},\ + {SVM_EXIT_READ_CR4, "read_cr4"},\ + {SVM_EXIT_READ_CR8, "read_cr8"},\ + {SVM_EXIT_WRITE_CR0,"write_cr0"}, \ + {SVM_EXIT_WRITE_CR3,"write_cr3"}, \ + {SVM_EXIT_WRITE_CR4,"write_cr4"}, \ + {SVM_EXIT_WRITE_CR8,"write_cr8"}, \ + {SVM_EXIT_READ_DR0, "read_dr0"},\ + {SVM_EXIT_READ_DR1, "read_dr1"},\ + {SVM_EXIT_READ_DR2, "read_dr2"},\ + {SVM_EXIT_READ_DR3, "read_dr3"},\ + {SVM_EXIT_WRITE_DR0,"write_dr0"}, \ + {SVM_EXIT_WRITE_DR1,"write_dr1"}, \ + {SVM_EXIT_WRITE_DR2,"write_dr2"}, \ + {SVM_EXIT_WRITE_DR3,"write_dr3"}, \ + {SVM_EXIT_WRITE_DR5,"write_dr5"}, \ + {SVM_EXIT_WRITE_DR7,"write_dr7"}, \ + {SVM_EXIT_EXCP_BASE + DB_VECTOR,"DB excp"}, \ + {SVM_EXIT_EXCP_BASE + BP_VECTOR,"BP excp"}, \ + {SVM_EXIT_EXCP_BASE + UD_VECTOR,"UD excp"}, \ + {SVM_EXIT_EXCP_BASE + PF_VECTOR,"PF excp"}, \ + {SVM_EXIT_EXCP_BASE + NM_VECTOR,"NM excp"}, \ + {SVM_EXIT_EXCP_BASE + MC_VECTOR,"MC excp"}, \ + {SVM_EXIT_INTR, "interrupt"}, \ + {SVM_EXIT_NMI, "nmi"}, \ + {SVM_EXIT_SMI, "smi"}, \ + {SVM_EXIT_INIT, "init"},\ + {SVM_EXIT_VINTR,"vintr"}, \ + {SVM_EXIT_CPUID,"cpuid"}, \ + {SVM_EXIT_INVD, "invd"},\ + {SVM_EXIT_HLT, "hlt"}, \ + {SVM_EXIT_INVLPG, "invlpg"}, \ + {SVM_EXIT_INVLPGA, "invlpga"}, \ + {SVM_EXIT_IOIO, "io"}, \ + {SVM_EXIT_MSR, "msr"}, \ + {SVM_EXIT_TASK_SWITCH, "task_switch"}, \ + {SVM_EXIT_SHUTDOWN, "shutdown"},\ + {SVM_EXIT_VMRUN,"vmrun"}, \ + {SVM_EXIT_VMMCALL, "hypercall"}, \ + {SVM_EXIT_VMLOAD, "vmload"}, \ + {SVM_EXIT_VMSAVE, "vmsave"}, \ + {SVM_EXIT_STGI, "stgi"},\ + {SVM_EXIT_CLGI, "clgi"},\ + {SVM_EXIT_SKINIT, "skinit"}, \ + {SVM_EXIT_WBINVD, "wbinvd"}, \ + {SVM_EXIT_MONITOR, "monitor"}, \ + {SVM_EXIT_MWAIT,
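[Editorial note: for readers new to the machinery, each of the old
KVMTRACE_nD markers maps onto a tracepoint declared with the TRACE_EVENT()
macro, which generates the trace_kvm_apic_write()-style hooks used above. A
simplified, illustrative declaration - not the patch's exact trace.h:]

    #include <linux/tracepoint.h>

    TRACE_EVENT(kvm_apic_write,
            TP_PROTO(unsigned int reg, unsigned int val),
            TP_ARGS(reg, val),

            TP_STRUCT__entry(
                    __field(unsigned int, reg)
                    __field(unsigned int, val)
            ),

            TP_fast_assign(
                    __entry->reg = reg;
                    __entry->val = val;
            ),

            TP_printk("apic_write reg 0x%x = 0x%x", __entry->reg, __entry->val)
    );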
Re: Debug message flooding system log
On Tue, Jun 16, 2009 at 2:22 PM, Simon Marchi wrote:
> Hi,
>
> Since I installed the latest version of qemu-kvm (0.10.5), the
> following 3 lines are flooding my system log (dmesg). I have no idea
> where this is coming from.
>
> [433019.047800] arch/x86/kvm/x86.c kvm_write_guest_time
> [433019.047802] Write (vcpu->hv_clock.system_time): 433019047801180
> [433019.047804] FYI (vcpu->hv_clock.tsc_timestamp): 1298967607382796
>
> [...]
>
> Any idea where this comes from and how I can disable this output?

Oh my god... Forget that... All day I was booted on the wrong kernel... Now I
understand why my modifications in the kvm module had no effect! This notice
was something I added a few weeks ago to find the root of another problem...

It feels better now.

Thanks

Simon Marchi
Re: [KVM-AUTOTEST PATCH] Adding iperf test
On Tue, 2009-06-16 at 18:40 -0300, Lucas Meneghel Rodrigues wrote:
> * The Autotest iperf test usually runs with 2 machines, one in the
> 'server' role and the other in 'client' mode. I would like to pursue the
> same model: 2 VMs, one running as a server and another as a client.

Never mind this comment, I missed the part of the test that starts the
server on the host Linux system.

--
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies
[KVM-AUTOTEST PATCH] kvm test: Adding remote migration support
Make the current migration test handle remote migration. In order to use
remote migration, the following parameters should be added to the existing
migration test:

remote = dst
hostip =
remoteip =
remuser = root
rempassword =

The field remote=dst indicates the VM "dst" should be created on the remote
machine. For example:

- migrate: install setup
    type = migration
    vms += " dst"
    migration_test_command = help
    kill_vm_on_error = yes
    remote = dst
    hostip = 192.168.1.2
    remoteip = 192.168.1.3
    remuser = root
    rempassword = 123456
    variants:

Signed-off-by: Yogananth Subramanian
---
client/tests/kvm/kvm_tests.cfg.sample |    6 +++
client/tests/kvm/kvm_tests.py         |    2 +-
client/tests/kvm/kvm_vm.py            |   61 +
3 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index 931f748..ca7f1d0 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -54,6 +54,12 @@ variants:
vms += " dst"
migration_test_command = help
kill_vm_on_error = yes
+remote = dst
+hostip = 192.168.1.2
+remoteip = 192.168.1.3
+remuser = root
+rempassword = 123456
+kill_vm_on_error = yes
variants:
- 1:
start_vm_for_migration_dst = yes
diff --git a/client/tests/kvm/kvm_tests.py b/client/tests/kvm/kvm_tests.py
index 4270cae..32b1ba4 100644
--- a/client/tests/kvm/kvm_tests.py
+++ b/client/tests/kvm/kvm_tests.py
@@ -113,7 +113,7 @@ def run_migration(test, params, env):
session.close()
# Define the migration command
-cmd = "migrate -d tcp:localhost:%d" % dest_vm.migration_port
+cmd = "migrate -d tcp:%s:%d" % (dest_vm.hostip, dest_vm.migration_port)
logging.debug("Migration command: %s" % cmd)
# Migrate
diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 5028161..c15d0a1 100644
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -1,5 +1,5 @@
#!/usr/bin/python
-import time, socket, os, logging, fcntl
+import time, socket, os, logging, fcntl, re
import kvm_utils
"""
@@ -113,6 +113,7 @@ class VM:
self.qemu_path = qemu_path
self.image_dir = image_dir
self.iso_dir = iso_dir
+self.remote = False
# Find available monitor filename
@@ -170,8 +171,6 @@ class VM:
file.close()
if not self.qemu_path in cmdline:
return False
-if not self.monitor_file_name in cmdline:
-return False
return True
@@ -234,8 +233,6 @@ class VM:
qemu_cmd += qemu_path
# Add the VM's name
qemu_cmd += " -name '%s'" % name
-# Add the monitor socket parameter
-qemu_cmd += " -monitor unix:%s,server,nowait" % self.monitor_file_name
for image_name in kvm_utils.get_sub_dict_names(params, "images"):
image_params = kvm_utils.get_sub_dict(params, image_name)
@@ -320,6 +317,18 @@ class VM:
qemu_path = self.qemu_path
image_dir = self.image_dir
iso_dir = self.iso_dir
+# If VM is remote, set hostip to ip of the remote machine
+# If VM is local set hostip to localhost or hostip param
+if params.get("remote") == "yes":
+self.remote = True
+self.hostip = params.get("remoteip")
+self.qemu_path = params.get("qemu_path",qemu_path)
+qemu_path = self.qemu_path
+self.image_dir = params.get("image_dir",image_dir)
+image_dir = self.image_dir
+else:
+self.remote = False
+self.hostip = params.get("hostip","localhost")
# Verify the md5sum of the ISO image
iso = params.get("cdrom")
@@ -377,9 +386,32 @@ class VM:
# Add -incoming option to the qemu command
qemu_command += " -incoming tcp:0:%d" % self.migration_port
-logging.debug("Running qemu command:\n%s", qemu_command)
-(status, pid, output) = kvm_utils.run_bg(qemu_command, None,
- logging.debug, "(qemu) ")
+self.monitor_port = kvm_utils.find_free_port(5400, 6000) +qemu_command += (" -monitor tcp:0:%d,server,nowait" % + self.monitor_port) + +# If the VM is remote, get the username and password of remote host +# and lanch qemu command on the remote machine. +if self.remote: +remuser = params.get("remuser") +rempassword = params.get("rempassword") +sub = kvm_utils.ssh(self.hostip, 22, remuser, rempassword, +self.params.get("ssh_prompt", "[\#\$]")) +qemu
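The hunk above is truncated in the archive right where the remote branch
begins. As a rough sketch only -- the session interface (sendline()) and the
error handling below are assumptions for illustration, not the actual patch
code -- the remote launch presumably continues along these lines:

    # Hypothetical continuation of the truncated hunk. Assumes
    # kvm_utils.ssh() returns an interactive session object with a
    # sendline()-style interface, as the quoted call suggests.
    if not sub:
        logging.error("Could not ssh into %s" % self.hostip)
        return False
    # Launch qemu on the remote host; the monitor remains reachable
    # from the test machine because it is exposed over TCP
    # (-monitor tcp:0:PORT,server,nowait) rather than a local unix socket.
    sub.sendline(qemu_command)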
Re: [KVM-AUTOTEST PATCH] kvm test: Adding remote migration support
On Tue, 2009-06-16 at 21:46 -0300, Lucas Meneghel Rodrigues wrote:
> Make the current migration test handle remote migration. In order to
> use remote migration, the following parameters should be added to the
> existing migration test:

I went through the discussion about this patch and I also agree we should
implement this test as a server side test. However, I thought it would be
good to have this patch rebased to the new kvm test code, just in case we
need it in the future. This is part of the patch queue work.

>     remote = dst
>     hostip =
>     remoteip =
>     remuser = root
>     rempassword =
>
> The field remote=dst indicates that the VM "dst" should be created on
> the remote machine. For example:
>
>     - migrate: install setup
>         type = migration
>         vms += " dst"
>         migration_test_command = help
>         kill_vm_on_error = yes
>         remote = dst
>         hostip = 192.168.1.2
>         remoteip = 192.168.1.3
>         remuser = root
>         rempassword = 123456
>         variants:
>
> Signed-off-by: Yogananth Subramanian
> [...]
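Since the rebased patch exposes the monitor over TCP rather than a unix
socket, the migration monitor can be driven from any host that can reach the
source machine. A minimal sketch of talking to such a monitor by hand -- the
function and the port numbers are illustrative, and it assumes the plain-text
human monitor protocol behind "-monitor tcp:0:PORT,server,nowait":

    import socket

    def send_monitor_cmd(host, port, cmd, timeout=10.0):
        # Connect to a QEMU human monitor exposed over TCP, send one
        # command, and return whatever the monitor prints back.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        s.connect((host, port))
        s.sendall(cmd + "\n")
        output = s.recv(4096)  # banner/prompt plus any command output
        s.close()
        return output

    # For example, to start a background migration to the destination:
    # send_monitor_cmd("192.168.1.2", 5400,
    #                  "migrate -d tcp:192.168.1.3:5200")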
Re: [KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
Hello Michael:

The shutdown case is useful, but the patch does really similar work to
vm.destroy(). Maybe we could simply let the preprocessor shut down all of
the VMs, like this:

    - shutdown: install setup
        vms = ''

Michael Goldish wrote:
> The shutdown test logs into a VM and sends a shutdown command.
> It serves two purposes:
> - Test KVM's ability to shut down.
> - Clean up after the other tests:
> [...]
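The vm.destroy() overlap Jason points to is a "nice first, then force"
teardown. A simplified sketch of that flow -- not the literal kvm_vm.py code:
send_monitor_cmd() as a VM method is an assumption here, while ssh_login(),
is_dead and cmd_shutdown all appear in the quoted test:

    # Simplified sketch of a graceful-with-fallback VM teardown.
    def destroy(self, gracefully=True):
        if gracefully:
            session = self.ssh_login()
            if session:
                # Ask the guest to shut itself down, as run_shutdown() does.
                session.sendline(self.params.get("cmd_shutdown"))
                session.close()
                if kvm_utils.wait_for(self.is_dead, 60, 0, 1):
                    return
        # Fall back to terminating qemu through the monitor.
        self.send_monitor_cmd("quit")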
Re: [PATCH] add sysenter/syscall emulation for 32bit compat mode
Hi Andre,

On (Tue) Jun 16 2009 [15:25:13], Andre Przywara wrote:
> sysenter/sysexit are not supported on AMD's 32bit compat mode, whereas
> syscall is not supported on Intel's 32bit compat mode. To allow cross
> vendor migration we emulate the missing instructions by setting up the
> processor state accordingly.
> The sysenter code was originally sketched by Amit Shah; it was completed,
> debugged, syscall added and made to work by Christoph Egger, and polished
> up by Andre Przywara.
> Please note that sysret does not need to be emulated, because it will be
> executed in 64bit mode, and returning to 32bit compat mode works on Intel.

Thanks for picking this up. You also had a testcase for this, right?

		Amit
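For readers unfamiliar with the instruction pair, "setting up the processor
state accordingly" means reproducing the architectural effect of sysenter
from the SYSENTER MSRs. A Python-flavored pseudocode sketch -- the vcpu
fields and helper names are illustrative, and the actual patch is C code in
KVM's x86 emulator:

    # Pseudocode: the state transition emulated when a 32bit compat-mode
    # guest executes sysenter on a CPU that lacks the instruction.
    def emulate_sysenter(vcpu):
        cs_sel = vcpu.msr[MSR_IA32_SYSENTER_CS]
        vcpu.cs = make_flat_code_segment(cs_sel)      # ring 0 code segment
        vcpu.ss = make_flat_data_segment(cs_sel + 8)  # stack segment = CS + 8
        vcpu.eip = vcpu.msr[MSR_IA32_SYSENTER_EIP]
        vcpu.esp = vcpu.msr[MSR_IA32_SYSENTER_ESP]
        vcpu.eflags &= ~(EFLAGS_VM | EFLAGS_IF)       # clear v8086 and IF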
KVM autotest patch queue - All patches worked out
All the patches on the first patch queue e-mail were examined and worked out.
This is a report of the patches that weren't applied yet. Some of them might
not make it into the kvm test, but all of them got follow-ups. Here's the
latest report of the patches available:

Patch queue:

David Huff (dh...@redhat.com)

1) Move kvm functional tests to a 'test' directory
   http://kerneltrap.org/mailarchive/linux-kvm/2009/5/26/5812453

   I have reworked this patch and made my own version. I need to split it
   into smaller patches, though.

   Reworked, needs review.

Yogi (anant...@linux.vnet.ibm.com)

1) Support for remote migration
   http://kerneltrap.org/mailarchive/linux-kvm/2009/4/30/5607344/thread

   I have rebased the patch. However, the patch review and discussion shows
   that we want to implement remote migration as a server side test instead
   of a client side one. The rebase is just in case we need the patch in
   the future.

   Probably not going to apply; look at a different way to implement the
   test.

Alexey Eromenko (aerom...@redhat.com)

1) New test module: iperf
   http://kerneltrap.org/mailarchive/linux-kvm/2009/5/31/5840973/thread

   Rebase made, made some comments to it:
   * The idea of shipping binaries doesn't look very appealing to me.
   * Autotest already has an iperf test; it might be worth a look for
     linux guests.

   Pending some issues raised during review.

Jason Wang (jasow...@redhat.com)

1) TAP network support in kvm-autotest

   Michael Goldish has a similar patch to this one, so we need to decide
   which one is going to be used. Michael, when possible, please send your
   patch to support TAP network on the kvm test.

   Need to get a patch from Michael in order to decide which one to apply
   (I believe he can merge the good aspects of both into a single patch
   series).

Thank you very much,

-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies
Re: [KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
- "jason wang" wrote:
> Hello Michael:
>
> The shutdown case is useful, but the patch does really similar work to
> vm.destroy(). Maybe we could simply let the preprocessor shut down all
> of the VMs, like this:
>
> - shutdown: install setup
>     vms = ''

That will shut them all down, but it won't report the results, so the test
will always pass. I thought it wouldn't hurt to have an actual test that
shuts the guest down. If the test fails to shut down nicely, the
postprocessor will take over and shut the VM down with a 'quit'.

> Michael Goldish wrote:
> > The shutdown test logs into a VM and sends a shutdown command.
> > It serves two purposes:
> > - Test KVM's ability to shut down.
> > - Clean up after the other tests:
> > [...]
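For users who want the cleanup behavior discussed in this thread, the
shutdown test would be selected last by convention. A hypothetical
kvm_tests.cfg entry in the "- name: dependencies" style used elsewhere in
this thread -- every parameter here except type is illustrative, not checked
against the actual sample file:

    - shutdown: install setup
        type = shutdown
        # Illustrative knob: if the guest ignores the shutdown command,
        # let the postprocessor kill the VM outright.
        kill_vm = yes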
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/16/2009 09:32 PM, Jamie Lokier wrote:
> Avi Kivity wrote:
>> Another issue is enumeration. Guests will present their devices in the
>> order they find them on the pci bus (of course enumeration is guest
>> specific). So if I have 2 virtio controllers the only way I can
>> distinguish between them is using their pci slots.
>
> virtio controllers really should have a user-suppliable string or UUID
> to identify them to the guest. Don't they?

virtio controllers don't exist. When they do, they may have a UUID or not,
but in either case guest infrastructure is in place for reporting the PCI
slot, not the UUID.

virtio disks do have a UUID. I don't think older versions of Windows will
use it though, so if you reorder your slots you'll see your drive letters
change. Same with Linux if you don't use udev by-uuid rules.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.
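The Linux half of that last remark is the udev-maintained /dev/disk/by-uuid
tree: mounting by UUID makes slot order irrelevant. A small guest-side sketch
of the mapping involved, assuming a guest where udev populates that directory
(the function name is illustrative):

    import os

    def disks_by_uuid(base="/dev/disk/by-uuid"):
        # Map filesystem UUIDs to the slot-dependent device nodes the
        # udev symlinks currently resolve to, e.g. {"6f3c...": "/dev/vda1"}.
        mapping = {}
        for uuid in os.listdir(base):
            mapping[uuid] = os.path.realpath(os.path.join(base, uuid))
        return mapping

    # Mounting by UUID (e.g. UUID=... lines in /etc/fstab) survives PCI
    # slot reordering; mounting /dev/vdb1 by name does not.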