Re: slow guest performance with build load, looking for ideas
Avi Kivity wrote:
> On 06/15/2009 06:25 PM, Michael Tokarev wrote:
>> Avi Kivity wrote:
>>> On 06/15/2009 05:15 PM, Erik Jacobson wrote:
>>>> [] So if I understand what you're saying: best not to use kvm
>>>> guests for build servers with pre-Nehalem processors.
>>>
>>> pre-Nehalem / pre-Barcelona, > 4 vcpus, yes.
>>
>> How about 2 vcpus, and how about AMD processors?
>
> 2 vcpus (or 4) should be fine. AMD processors (Barcelona+) would be
> good for any number of vcpus.

Hmm.. that's sorta good (not so good for owners of most Intel CPUs -- Nehalem just started its life). But still confusing. Namely, 2..4 vcpus per GUEST or HOST -- for the pre-Nehalem ones? :)

Thanks!

/mjt
[KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
The shutdown test logs into a VM and sends a shutdown command. It serves two purposes:

- Test KVM's ability to shut down.
- Clean up after the other tests: Currently VMs of the last test remain alive when Autotest finishes running.

When one guest finishes testing and another begins, the VM is automatically shut down by the preprocessor because the QEMU command required for the next guest differs from that of the guest that just finished. In the case of the final guest this doesn't happen because no guest follows it, so the preprocessor must be explicitly instructed to kill the VM. However, we have no easy way of knowing which test runs last because the user usually selects a subset of the tests/guests.

The addition of a shutdown test can be a decent solution to this small problem: by convention the shutdown test will always be the last to run, and if users wish to clean up after the tests, they must select the shutdown test.

Note: it is beneficial to allow users to leave the VMs of the last test running because it saves time when developing and testing tests. A test writer can run the test once on a VM, and when the test exits, make some modifications to its code and re-run it on the same living VM, and repeat this procedure without having to shut down/boot the VM every time.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm.py       |    1 +
 client/tests/kvm/kvm_tests.py |   37 +
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py
index 9428162..aa727da 100644
--- a/client/tests/kvm/kvm.py
+++ b/client/tests/kvm/kvm.py
@@ -48,6 +48,7 @@ class kvm(test.test):
                 "steps":        test_routine("kvm_guest_wizard", "run_steps"),
                 "stepmaker":    test_routine("stepmaker", "run_stepmaker"),
                 "boot":         test_routine("kvm_tests", "run_boot"),
+                "shutdown":     test_routine("kvm_tests", "run_shutdown"),
                 "migration":    test_routine("kvm_tests", "run_migration"),
                 "yum_update":   test_routine("kvm_tests", "run_yum_update"),
                 "autotest":     test_routine("kvm_tests", "run_autotest"),
diff --git a/client/tests/kvm/kvm_tests.py b/client/tests/kvm/kvm_tests.py
index ffe9116..4c9653f 100644
--- a/client/tests/kvm/kvm_tests.py
+++ b/client/tests/kvm/kvm_tests.py
@@ -57,6 +57,43 @@ def run_boot(test, params, env):
     session.close()
 
 
+def run_shutdown(test, params, env):
+    """
+    KVM shutdown test:
+    1) Log into a guest
+    2) Send a shutdown command to the guest
+    3) Wait until it's down
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment
+    """
+    vm = kvm_utils.env_get_vm(env, params.get("main_vm"))
+    if not vm:
+        raise error.TestError("VM object not found in environment")
+    if not vm.is_alive():
+        raise error.TestError("VM seems to be dead; Test requires a living VM")
+
+    logging.info("Waiting for guest to be up...")
+
+    session = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2)
+    if not session:
+        raise error.TestFail("Could not log into guest")
+
+    logging.info("Logged in")
+
+    # Send the VM's shutdown command
+    session.sendline(vm.get_params().get("cmd_shutdown"))
+    session.close()
+
+    logging.info("Shutdown command sent; waiting for guest to go down...")
+
+    if not kvm_utils.wait_for(vm.is_dead, 120, 0, 1):
+        raise error.TestFail("Guest refuses to go down")
+
+    logging.info("Guest is down")
+
+
 def run_migration(test, params, env):
     """
     KVM migration test:
--
1.5.4.1
[KVM-AUTOTEST PATCH 2/2] KVM test: include shutdown test in kvm_tests.cfg.sample
The shutdown test should always be last -- new tests will be added above it. If the user chooses to run all tests on a guest, the guest will be shut down by the final shutdown test. If the user selects a specific subset of the tests, the guest will shut down if the shutdown test is included in this subset.

Note that this only applies to the last guest of the job; since currently all guests use the same VM object (vm1), the VM is shut down automatically when it's passed from one guest to the next.

In short, standard test sets (weekly, daily, per-release) should include the shutdown test. In particular, tests running from the server should include it. Custom test sets run in client mode, intended for testing/developing a specific test or a set of tests, may run without the shutdown test.

The shutdown test is defined with 'kill_vm = yes', so if the test fails to shut the guest down nicely, the postprocessor will take care of it.

Also added 'kill_vm_on_error = yes' to 'setup' tests, because if a setup test fails, shutdown will not run, so the setup test must take care of shutting the VM down. Install tests already have this defined.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm_tests.cfg.sample |    9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index 54c9975..412c4b0 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -41,6 +41,7 @@ variants:
         fail_if_stuck_for = 300
         stuck_detection_history = 2
         keep_screendump_history = yes
+        kill_vm_on_error = yes
 
     - boot:         install setup
         type = boot
@@ -83,6 +84,12 @@ variants:
     - linux_s3:     install setup
         type = linux_s3
 
+    - shutdown:     install setup
+        type = shutdown
+        kill_vm = yes
+        kill_vm_gracefully = no
+
+
 # NICs
 variants:
     - @rtl8139:
@@ -544,7 +551,7 @@ variants:
         only default
         only up
         only Fedora.8.32
-        only install setup boot
+        only install setup boot shutdown
         only rtl8139
     - @sample1:
         only qcow2
--
1.5.4.1
Re: slow guest performance with build load, looking for ideas
On 06/16/2009 10:03 AM, Michael Tokarev wrote:
>>>>> So if I understand what you're saying: best not to use kvm guests
>>>>> for build servers with pre-Nehalem processors.
>>>>
>>>> pre-Nehalem / pre-Barcelona, > 4 vcpus, yes.
>>>
>>> How about 2 vcpus, and how about AMD processors?
>>
>> 2 vcpus (or 4) should be fine. AMD processors (Barcelona+) would be
>> good for any number of vcpus.
>
> []
>
> Hmm.. that's sorta good (not so good for owners of most Intel CPUs --
> Nehalem just started its life). But still confusing. Namely, 2..4
> vcpus per GUEST or HOST -- for the pre-Nehalem ones? :)

4 vcpus per guest would be fine (even more should work, depending on workload). Host will scale with any number of cpus.
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 09:44 PM, Blue Swirl wrote:
> pc-qemu-0.10? pc-2009.06? Or given the hardware, should that be
> pc-1997?

pc-qemu-0.10 has the obvious benefit of allowing people to immediately know what's the oldest version of qemu that supports it.
Re: Live migration broken when under heavy IO
On 06/15/2009 11:33 PM, Anthony Liguori wrote:
> The basic issue is that:
>
> migrate_fd_put_ready():
>     bdrv_flush_all();
>
> Does:
>
> block.c:
>     foreach block driver:
>         drv->flush(bs);
>
> Which in the case of raw, is just fsync(s->fd). Any submitted request
> is not queued or flushed, which will lead to the request being dropped
> after the live migration.
>
> Is anyone working on fixing this?

Not to my knowledge.

> Does anyone have a clever idea how to fix this without just waiting
> for all IO requests to complete?

What's wrong with waiting for requests to complete? It should take a few tens of milliseconds. We could start throttling requests late in the live stage, but I don't really see the point.

Isn't virtio migration currently broken due to the qdev changes?
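For reference, a minimal sketch of the "just wait" approach Avi describes, assuming qemu's qemu_aio_flush() of this era blocks until all outstanding AIO requests have completed. This is an illustrative sketch, not the fix that was actually merged:

/* Sketch: drain in-flight AIO before the final flush on migration.
 * qemu_aio_flush() loops over the AIO completion handlers until no
 * request remains in flight; only after that does bdrv_flush_all()
 * (fsync per block driver, for raw) guarantee that the data is on
 * stable storage instead of being dropped mid-request. */
static void migrate_drain_and_flush(void)
{
    qemu_aio_flush();   /* wait for all submitted requests to complete */
    bdrv_flush_all();   /* then flush each block driver */
}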
Re: Live migration broken when under heavy IO
On 06/16/2009 12:10 PM, Avi Kivity wrote:
>> Does anyone have a clever idea how to fix this without just waiting
>> for all IO requests to complete?
>
> What's wrong with waiting for requests to complete? It should take a
> few tens of milliseconds. We could start throttling requests late in
> the live stage, but I don't really see the point.

Well, we can introduce a new live stage, where we migrate RAM and complete block requests, but the vm is otherwise stopped. This allows the flush to overlap with sending memory dirtied by the flush, reducing some downtime. Since block requests can dirty large amounts of memory, this may be significant.

We can even keep the vcpu alive, only blocking new block requests, but that may cause dirty RAM divergence.
Re: [patch] move _PR to SSDT v2
On Fri, Jun 12, 2009 at 03:56:04PM +0200, Jes Sorensen wrote:
> Hi,
>
> I figured out why Windows wasn't booting with the previous version
> of the patch. It seems it didn't like the forward declaration of the
> _PR.PRSC method.
>
> Instead I found that it does seem to work if I add the calling method
> _GRE.L02 in the SSDT, while the rest of the _GRE is declared in the
> DSDT.
>
> It boots, it's perfect!

Works for me with Windows XP/7, but can you add a comment in the DSDT, where _GRE.L02 would normally be, explaining why it is missing and where to find it?

--
Gleb.
Re: [patch] move _PR to SSDT v2
On 06/16/2009 11:50 AM, Gleb Natapov wrote:
> Works for me with Windows XP/7, but can you add a comment in the DSDT,
> where _GRE.L02 would normally be, explaining why it is missing and
> where to find it?

Sure, here's an updated version with the comment. Thanks for testing it.

Cheers,
Jes

Move the _PR block from the DSDT to a new SSDT in the KVM BIOS. As AML, or at least Windows, doesn't allow forward declarations of Methods, it is necessary to declare the _GPE._L02 Method in the SSDT as well.

This will make it possible to plug in different SSDTs with different processor counts in the future.

Signed-off-by: Jes Sorensen
---
 kvm/bios/Makefile      |   12 +++-
 kvm/bios/acpi-dsdt.dsl |  110 +-
 kvm/bios/acpi-ssdt.dsl |  140 +
 kvm/bios/rombios32.c   |   16 -
 4 files changed, 166 insertions(+), 112 deletions(-)

Index: qemu-kvm/kvm/bios/Makefile
===
--- qemu-kvm.orig/kvm/bios/Makefile
+++ qemu-kvm/kvm/bios/Makefile
@@ -71,7 +71,7 @@ bios: biossums BIOS-bochs-latest BIOS-bo
 clean:
 	rm -f *.o *.a *.s _rombios*_.c rombios*.txt rombios*.sym
 	rm -f usage biossums rombios16.bin
-	rm -f rombios32.bin rombios32.out acpi-dsdt.hex
+	rm -f rombios32.bin rombios32.out acpi-dsdt.hex acpi-ssdt.hex
 
 dist-clean: clean
 	rm -f Makefile
@@ -108,13 +108,19 @@ rombios32.bin: rombios32.out rombios.h
 rombios32.out: rombios32start.o rombios32.o vapic.o rombios32.ld
 	ld -o $@ -T rombios32.ld rombios32start.o vapic.o rombios32.o
 
-rombios32.o: rombios32.c acpi-dsdt.hex
+rombios32.o: rombios32.c acpi-dsdt.hex acpi-ssdt.hex
 	$(GCC) -m32 -O2 -Wall -c -o $@ $<
 
 acpi-dsdt.hex: acpi-dsdt.dsl
 	cpp -P $< $<.i
 	iasl -tc -p $@ $<.i
-	sed -i -e's/^unsigned/const unsigned/' $@
+	sed -i -e's/^unsigned char AmlCode/const unsigned char DSDTCode/' $@
+	rm $<.i
+
+acpi-ssdt.hex: acpi-ssdt.dsl
+	cpp -P $< $<.i
+	iasl -tc -p $@ $<.i
+	sed -i -e's/^unsigned char AmlCode/const unsigned char SSDTCode/' $@
 	rm $<.i
 
 rombios32start.o: rombios32start.S

Index: qemu-kvm/kvm/bios/acpi-dsdt.dsl
===
--- qemu-kvm.orig/kvm/bios/acpi-dsdt.dsl
+++ qemu-kvm/kvm/bios/acpi-dsdt.dsl
@@ -25,108 +25,6 @@ DefinitionBlock (
         0x1         // OEM Revision
         )
 {
-    Scope (\_PR)
-    {
-        /* pointer to first element of MADT APIC structures */
-        OperationRegion(ATPR, SystemMemory, 0x0514, 4)
-        Field (ATPR, DwordAcc, NoLock, Preserve)
-        {
-            ATP, 32
-        }
-
-#define madt_addr(nr) Add (ATP, Multiply(nr, 8))
-
-#define gen_processor(nr, name) \
-        Processor (CPU##name, nr, 0xb010, 0x06) { \
-            OperationRegion (MATR, SystemMemory, madt_addr(nr), 8) \
-            Field (MATR, ByteAcc, NoLock, Preserve) \
-            { \
-                MAT, 64 \
-            } \
-            Field (MATR, ByteAcc, NoLock, Preserve) \
-            { \
-                Offset(4), \
-                FLG, 1 \
-            } \
-            Method(_MAT, 0) { \
-                Return(MAT) \
-            } \
-            Method (_STA) { \
-                If (FLG) { Return(0xF) } Else { Return(0x9) } \
-            } \
-        } \
-
-
-        gen_processor(0, 0)
-        gen_processor(1, 1)
-        gen_processor(2, 2)
-        gen_processor(3, 3)
-        gen_processor(4, 4)
-        gen_processor(5, 5)
-        gen_processor(6, 6)
-        gen_processor(7, 7)
-        gen_processor(8, 8)
-        gen_processor(9, 9)
-        gen_processor(10, A)
-        gen_processor(11, B)
-        gen_processor(12, C)
-        gen_processor(13, D)
-        gen_processor(14, E)
-
-        Method (NTFY, 2) {
-#define gen_ntfy(nr) \
-            If (LEqual(Arg0, 0x##nr)) { \
-                If (LNotEqual(Arg1, \_PR.CPU##nr.FLG)) { \
-                    Store (Arg1, \_PR.CPU##nr.FLG) \
-                    If (LEqual(Arg1, 1)) { \
-                        Notify(CPU##nr, 1) \
-                    } Else { \
-                        Notify(CPU##nr, 3) \
-                    } \
-                } \
-            }
-            gen_ntfy(0)
-            gen_ntfy(1)
-            gen_ntfy(2)
-            gen_ntfy(3)
-            gen_ntfy(4)
-            gen_ntfy(5)
-            gen_ntfy(6)
-            gen_ntfy(7)
-            gen_ntfy(8)
-            gen_ntfy(9)
-            gen_ntfy(A)
-            gen_ntfy(B)
-            gen_ntfy(C
Re: [KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
Michael, I don't fully understand why the shutdown test is needed; shutdown is tested during reboot, where the guest OS must de-init itself.
[KVM-AUTOTEST PATCH 1/2] KVM test: optionally convert PPM files to PNG format after test
This is intended to save disk space. Requires ImageMagick (uses mogrify).

To enable: convert_ppm_files_to_png = yes
To enable only for failed tests: convert_ppm_files_to_png_on_error = yes

Reminder: by default PPM files are removed after the test (and after the conversion, if requested).
To keep them specify: keep_ppm_files = yes
To keep them only for failed tests: keep_ppm_files_on_error = yes

A reasonable choice would be to keep PNG files for failed tests, and never keep PPM files. To do this specify convert_ppm_files_to_png_on_error = yes without specifying 'keep_ppm_files = yes' or 'keep_ppm_files_on_error = yes' (or explicitly set to 'no', if it was set to 'yes' on a higher level in the config hierarchy).

This patch also makes small cosmetic changes to the 'keep_ppm_files' feature.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm_preprocessing.py |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py
index 7fac46a..26e5210 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -223,11 +223,19 @@ def postprocess(test, params, env):
     """
     process(test, params, env, postprocess_image, postprocess_vm)
 
-    # See if we should get rid of all PPM files
-    if not params.get("keep_ppm_files") == "yes":
-        # Remove them all
+    # Should we convert PPM files to PNG format?
+    if params.get("convert_ppm_files_to_png") == "yes":
+        logging.debug("'convert_ppm_files_to_png' specified; converting PPM"
+                      " files to PNG format...")
+        mogrify_cmd = ("mogrify -format png %s" %
+                       os.path.join(test.debugdir, "*.ppm"))
+        kvm_subprocess.run_fg(mogrify_cmd, logging.debug, "(mogrify) ",
+                              timeout=30.0)
+
+    # Should we keep the PPM files?
+    if params.get("keep_ppm_files") != "yes":
         logging.debug("'keep_ppm_files' not specified; removing all PPM files"
-                      " from results dir...")
+                      " from debug dir...")
         rm_cmd = "rm -vf %s" % os.path.join(test.debugdir, "*.ppm")
         kvm_subprocess.run_fg(rm_cmd, logging.debug, "(rm) ", timeout=5.0)
--
1.5.4.1
[KVM-AUTOTEST PATCH 2/2] KVM test: kvm_tests.cfg.sample: convert PPM files to PNG by default
By default, always remove PPM files, and keep PNG files only for failed tests. This shouldn't do much harm, because while PPMs can be incorporated directly into step files, PNGs can be converted back to PPMs easily, and take less disk space. (PNG is a lossless compression format.)

The 'keep_ppm_files' and 'keep_ppm_files_on_error' settings are commented out so the user can easily enable them if desired.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm_tests.cfg.sample |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index 412c4b0..2564e16 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -8,8 +8,9 @@ main_vm = vm1
 
 # Some preprocessor/postprocessor params
 start_vm = yes
-keep_ppm_files = no
-keep_ppm_files_on_error = yes
+convert_ppm_files_to_png_on_error = yes
+#keep_ppm_files = yes
+#keep_ppm_files_on_error = yes
 kill_vm = no
 kill_vm_gracefully = yes
--
1.5.4.1
Re: [KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
- "Alexey Eromenko" wrote: > Michael I don't fully understand why shutdown test is needed; Shutdown > is tested during reboot, where GuestOS must de-init itself. The main motivation for a shutdown test is that it allows us to choose whether VMs should be kept alive after the last test. Including this test in a job takes no more time than running the job without it, because normally the preprocessor automatically shuts down guests (except for the last one). There's also a small difference between a complete shutdown and a reboot -- a shutdown is supposed to close the qemu process. I'm not sure this difference justifies the test, but the reason mentioned in the previous paragraph does, in my opinion. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM: x86: verify MTRR/PAT validity
Do not allow invalid MTRR/PAT values in set_msr_mtrr.

Please review carefully.

Signed-off-by: Marcelo Tosatti

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -722,11 +722,53 @@ static bool msr_mtrr_valid(unsigned msr)
 	return false;
 }
 
+static unsigned mtrr_types[] = {0, 1, 4, 5, 6};
+static unsigned pat_types[] = {0, 1, 4, 5, 6, 7};
+
+static bool valid_mt(unsigned type, int len, unsigned array[len])
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		if (type == array[i])
+			return true;
+
+	return false;
+}
+
+#define valid_pat_type(a) valid_mt(a, ARRAY_SIZE(pat_types), pat_types)
+#define valid_mtrr_type(a) valid_mt(a, ARRAY_SIZE(mtrr_types), mtrr_types)
+
+static bool mtrr_valid(u32 msr, u64 data)
+{
+	int i;
+
+	if (!msr_mtrr_valid(msr))
+		return false;
+
+	if (msr == MSR_IA32_CR_PAT) {
+		for (i = 0; i < 8; i++)
+			if (!valid_pat_type((data >> (i * 8)) & 0xff))
+				return false;
+		return true;
+	} else if (msr == MSR_MTRRdefType) {
+		return valid_mtrr_type(data & 0xff);
+	} else if (msr >= MSR_MTRRfix64K_0 && msr <= MSR_MTRRfix4K_F8000) {
+		for (i = 0; i < 8 ; i++)
+			if (!valid_mtrr_type((data >> (i * 8)) & 0xff))
+				return false;
+		return true;
+	}
+
+	/* variable MTRRs, physmaskn have bits 0-10 reserved */
+	return valid_mtrr_type(data & 0xff);
+}
+
 static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
 	u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
 
-	if (!msr_mtrr_valid(msr))
+	if (!mtrr_valid(msr, data))
 		return 1;
 
 	if (msr == MSR_MTRRdefType) {
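As a worked illustration of the PAT check above: the power-on default of IA32_PAT is 0x0007040600070406, whose bytes are all in {0, 4, 6, 7}, so it passes, while any byte equal to a reserved type (2 or 3) would be rejected. A standalone userspace sketch mirroring the patch's loop (not kernel code):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

static const unsigned pat_types[] = {0, 1, 4, 5, 6, 7};

static bool valid_pat_type(unsigned type)
{
	unsigned i;

	for (i = 0; i < sizeof(pat_types) / sizeof(pat_types[0]); i++)
		if (type == pat_types[i])
			return true;
	return false;
}

int main(void)
{
	uint64_t data = 0x0007040600070406ULL;  /* power-on default PAT */
	int i;

	/* same byte-by-byte walk as mtrr_valid() does for MSR_IA32_CR_PAT */
	for (i = 0; i < 8; i++)
		if (!valid_pat_type((data >> (i * 8)) & 0xff)) {
			printf("invalid PAT value\n");
			return 1;
		}
	printf("valid PAT value\n");
	return 0;
}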
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, 2009-06-15 at 13:12 -0500, Anthony Liguori wrote:
> Mark McLoughlin wrote:
> > So long as the restrictions would be known to the management app via
> > some "what slots are available" mechanism in qemu, that sounds fine.
>
> I'm not sure a "what slots are available" mechanism is as straight
> forward as has been claimed.

If qemu can't provide that information, then the management app does not have sufficient information to do the slot allocation itself. In which case, it must leave it up to qemu to do it.

> It doesn't matter though because it's orthogonal to the current
> proposal.

It is not orthogonal to solving the actual problem at hand, though - i.e. how to allow management apps to provide stable PCI addresses.

> >>> I'm not at all arguing against pci_addr. I'm arguing about how
> >>> libvirt should use it with respect to the "genesis" use-case where
> >>> libvirt has no specific reason to choose one PCI slot over another.
> >>> In that case, I'm merely advocating that we want to let QEMU make
> >>> the decision.
> >>
> >> However this may end up, isn't it offtopic? Whatever we do we have
> >> to support both pci_addr= and default placement, so we can push this
> >> discussion to libvirt-devel and bid them godspeed.
> >
> > Presumably you're not proposing that qemu-devel completely ignore the
> > typical requirements of management apps?
>
> This is a happy case where the current proposals allow both usages to
> occur. Which one libvirt chooses is up to it.
>
> To summarize, I think we have:
>
> 1) Introduce addressing to all host device configurations
>  - Either in the canonical form "pci_addr=bus:dev.fn or target=3,lun=1"
>    or in flattened form "addr=bus:dev.fn or addr=target.lun". I prefer
>    the latter form but I think either would be acceptable.

That helps, but it's not enough on its own. The management app needs to figure out what addresses to pass either by:

  a) Initially allowing qemu to do the address allocation, and
     thereafter using those addresses - this requires some way to query
     the addresses of devices

or

  b) Doing the initial address allocation itself - this requires some
     way to query what slots are available.

> 2) Whenever the default machine type changes in a guest-visible way,
>    introduce a new machine type
>  - Use explicit versions in name: pc-v1, pc-v2 or use more descriptive
>    names pc-with-usb
>  - Easily transitions to device config files

To be clear - you're not proposing this as a solution to the "stable PCI addresses" problem, are you? The main requirement is for the addresses to stay stable even if the user adds/removes other devices.

This is a fine solution to the "stable guest ABI" problem ... assuming there's some way of querying the current default machine type.

Cheers,
Mark.
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/16/2009 03:14 PM, Mark McLoughlin wrote:
> On Mon, 2009-06-15 at 13:12 -0500, Anthony Liguori wrote:
>> Mark McLoughlin wrote:
>>> So long as the restrictions would be known to the management app via
>>> some "what slots are available" mechanism in qemu, that sounds fine.
>>
>> I'm not sure a "what slots are available" mechanism is as straight
>> forward as has been claimed.
>
> If qemu can't provide that information, then the management app does
> not have sufficient information to do the slot allocation itself. In
> which case, it must leave it up to qemu to do it.

A given -M machine will have well-known open slots (since it's an ABI), same as it has rtl8139 and ne2000 cards. Worst case we hardcode those numbers (gasp, faint).

>> It doesn't matter though because it's orthogonal to the current
>> proposal.
>
> It is not orthogonal to solving the actual problem at hand, though -
> i.e. how to allow management apps to provide stable PCI addresses.

It's part of the solution, but hardly the most difficult part.

> This is a fine solution to the "stable guest ABI" problem ... assuming
> there's some way of querying the current default machine type.

$ qemu -print-default-machine

or maybe

$ qemu -show default-machine
$ qemu -show pci-bus
$ qemu -show me a way out
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Tue, 2009-06-16 at 15:28 +0300, Avi Kivity wrote:
> On 06/16/2009 03:14 PM, Mark McLoughlin wrote:
> > On Mon, 2009-06-15 at 13:12 -0500, Anthony Liguori wrote:
> >
> >> Mark McLoughlin wrote:
> >>
> >>> So long as the restrictions would be known to the management app
> >>> via some "what slots are available" mechanism in qemu, that sounds
> >>> fine.
> >>
> >> I'm not sure a "what slots are available" mechanism is as straight
> >> forward as has been claimed.
> >
> > If qemu can't provide that information, then the management app does
> > not have sufficient information to do the slot allocation itself. In
> > which case, it must leave it up to qemu to do it.
>
> A given -M machine will have well-known open slots (since it's an ABI),
> same as it has rtl8139 and ne2000 cards.

If they're so obviously well-known, I don't see how the query mechanism would not be straightforward, which is the comment I was replying to.

> Worst case we hardcode those numbers (gasp, faint).

Maybe we can just add the open slots to the -help output. That'd be nice and clean.

> >> It doesn't matter though because it's orthogonal to the current
> >> proposal.
> >
> > It is not orthogonal to solving the actual problem at hand, though -
> > i.e. how to allow management apps to provide stable PCI addresses.
>
> It's part of the solution, but hardly the most difficult part.

Agree.

> > This is a fine solution to the "stable guest ABI" problem ...
> > assuming there's some way of querying the current default machine
> > type.
>
> $ qemu -print-default-machine

Or:

$ readlink /usr/share/qemu/machine-types/pc.dt

Cheers,
Mark.
Re: [patch 1/2] KVM: convert custom marker based tracing to event traces
On Mon, Jun 15, 2009 at 04:03:10PM -0400, Steven Rostedt wrote:
>
> On Mon, 15 Jun 2009, Marcelo Tosatti wrote:
>
> > This allows use of the powerful ftrace infrastructure.
> >
> > See Documentation/trace/ for usage information.
> >
> > Signed-off-by: Marcelo Tosatti
> >
> > Index: kvm/arch/x86/kvm/svm.c
> > ===
> > --- kvm.orig/arch/x86/kvm/svm.c
> > +++ kvm/arch/x86/kvm/svm.c
> > @@ -29,6 +29,9 @@
> >  #include
> >
> >  #include
> > +#include "svm-trace.h"
> > +#define CREATE_TRACE_POINTS
> > +#include
> >
> >  #define __ex(x) __kvm_handle_fault_on_reboot(x)
> >
> > Index: kvm/arch/x86/kvm/vmx.c
> > ===
> > --- kvm.orig/arch/x86/kvm/vmx.c
> > +++ kvm/arch/x86/kvm/vmx.c
> > @@ -34,6 +34,10 @@
> >  #include
> >  #include
> >
> > +#include "vmx-trace.h"
> > +#define CREATE_TRACE_POINTS
> > +#include
> > +
> >  #define __ex(x) __kvm_handle_fault_on_reboot(x)
> >
> >  MODULE_AUTHOR("Qumranet");
> >
> > Index: kvm/arch/x86/kvm/x86.c
> > ===
> > --- kvm.orig/arch/x86/kvm/x86.c
> > +++ kvm/arch/x86/kvm/x86.c
> > @@ -37,6 +37,8 @@
> >  #include
> >  #include
> >  #include
> > +#define CREATE_TRACE_POINTS
> > +#include
> >
> >  #include
> >  #include
> > @@ -347,9 +349,6 @@ EXPORT_SYMBOL_GPL(kvm_set_cr0);
> >
> > Index: kvm/include/trace/events/kvm/x86-arch.h
> > ===
> > --- /dev/null
> > +++ kvm/include/trace/events/kvm/x86-arch.h
>
> One suggestion. Instead of putting in arch specific trace points into
> generic code, you can put these into arch/x86/kvm/trace.h?
>
> Then you can in the Makefile add:
>
> CFLAGS_x86.o := -I.
> CFLAGS_svm.o := -I.
> CFLAGS_vmx.o := -I.
>
> Or better yet, have a single file called trace.c:
>
> in the Makefile:
> CFLAGS_trace.o := -I.
> obj-$(EVENT_TRACING) += trace.o
>
> in trace.c:
>
> #define CREATE_TRACE_POINTS
> #include "trace.h"
> #include "trace-arch.h"
>
> Then have the kvm/x86.h moved to trace.h
> and the kvm/arch-x86.h moved to trace-arch.h
>
> Just change the "TRACE_INCLUDE_FILE" to include the proper name.
>
> -- Steve

Similar to http://patchwork.kernel.org/patch/23829/, but moving the x86 tracepoint definitions to arch/x86/kvm/ ?

I thought the point of include/trace/ was to have all tracepoint definitions in a single directory?
Re: Live migration broken when under heavy IO
Avi Kivity wrote:
>> Does anyone have a clever idea how to fix this without just waiting
>> for all IO requests to complete?
>
> What's wrong with waiting for requests to complete? It should take a
> few tens of milliseconds.

An alternative would be to attempt to cancel the requests. This incurs no non-deterministic latency. The tricky bit is that this has to happen at the device layer because the opaques cannot be saved in a meaningful way.

> We could start throttling requests late in the live stage, but I don't
> really see the point.
>
> Isn't virtio migration currently broken due to the qdev changes?

--
Regards,
Anthony Liguori
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/16/2009 03:39 PM, Mark McLoughlin wrote:
>> Worst case we hardcode those numbers (gasp, faint).
>
> Maybe we can just add the open slots to the -help output. That'd be
> nice and clean.

Yeah, there's precedent too.

> Or:
>
> $ readlink /usr/share/qemu/machine-types/pc.dt

That works if you have exactly one qemu installed. It's best if qemu itself is the entry point (qemu -print-device-tree). Though I wouldn't want to inflict it upon the management application writers.
Re: Live migration broken when under heavy IO
On 06/16/2009 03:50 PM, Anthony Liguori wrote:
> Avi Kivity wrote:
>>> Does anyone have a clever idea how to fix this without just waiting
>>> for all IO requests to complete?
>>
>> What's wrong with waiting for requests to complete? It should take a
>> few tens of milliseconds.
>
> An alternative would be to attempt to cancel the requests. This incurs
> no non-deterministic latency.

Yes, that's even better (though without linux-aio, it's equivalent).

> The tricky bit is that this has to happen at the device layer because
> the opaques cannot be saved in a meaningful way.

Do you mean the device has to record all cancelled requests and replay them? I think we can do it at the block layer (though we have to avoid it for nested requests).
Re: Live migration broken when under heavy IO
Avi Kivity wrote:
> Yes, that's even better (though without linux-aio, it's equivalent).

Not absolutely equivalent. There may be queued requests that haven't yet been dispatched to the thread pool, but yeah, I understand what you mean.

>> The tricky bit is that this has to happen at the device layer because
>> the opaques cannot be saved in a meaningful way.
>
> Do you mean the device has to record all cancelled requests and replay
> them? I think we can do it at the block layer (though we have to avoid
> it for nested requests).

In order to complete the requests, you have to call a callback and pass an opaque with the results. The callback/opaque cannot be saved in the block layer in a meaningful way.

--
Regards,
Anthony Liguori
Re: Live migration broken when under heavy IO
On 06/16/2009 03:57 PM, Anthony Liguori wrote:
>>> The tricky bit is that this has to happen at the device layer because
>>> the opaques cannot be saved in a meaningful way.
>>
>> Do you mean the device has to record all cancelled requests and replay
>> them? I think we can do it at the block layer (though we have to avoid
>> it for nested requests).
>
> In order to complete the requests, you have to call a callback and pass
> an opaque with the results. The callback/opaque cannot be saved in the
> block layer in a meaningful way.

You're right, of course. I guess we'll have to cancel any near term cancellation plans.

We could change the opaque to be something pre-registered (e.g. the device state object, which we don't need to save/restore) and pass in addition an integer request tag. These would be migratable. The device would be responsible for saving tags and their associated information (perhaps through a common API).
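A rough sketch of the tag-based scheme Avi outlines; every name below (the callback type, the structure, its fields) is hypothetical and invented for illustration -- the real block API of the time passes a (callback, opaque) pair that cannot be migrated:

/* Hypothetical: replace the un-migratable opaque with a device state
 * object registered once at setup plus an integer tag per request. */
typedef void BlockTagCompletionFunc(void *dev_state, int tag, int ret);

typedef struct TaggedAIOCB {
    void *dev_state;            /* pre-registered; not saved/restored */
    int tag;                    /* migratable request identifier */
    BlockTagCompletionFunc *cb; /* re-resolved after migration */
} TaggedAIOCB;

The device would then record outstanding tags (e.g. in a bitmap) as part of its save state, and on the target either replay or complete them by tag, perhaps through the common API Avi mentions.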
[PATCH] add sysenter/syscall emulation for 32bit compat mode
sysenter/sysexit are not supported on AMD's 32bit compat mode, whereas syscall is not supported on Intel's 32bit compat mode. To allow cross vendor migration we emulate the missing instructions by setting up the processor state accordingly.

The sysenter code was originally sketched by Amit Shah; it was completed, debugged, syscall added and made-to-work by Christoph Egger and polished up by Andre Przywara.

Please note that sysret does not need to be emulated, because it will be executed in 64bit mode and returning to 32bit compat mode works on Intel.

Signed-off-by: Amit Shah
Signed-off-by: Christoph Egger
Signed-off-by: Andre Przywara
---
 arch/x86/kvm/x86.c         |   37 ++-
 arch/x86/kvm/x86_emulate.c |  228 +++-
 2 files changed, 259 insertions(+), 6 deletions(-)

This is a reworked version of my earlier patch. I moved the former three "case:" parts into one separate function for better readability and understanding. Please apply!

Thanks,
Andre.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6025e5b..9066fb3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2635,11 +2635,38 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		/* Reject the instructions other than VMCALL/VMMCALL when
 		 * try to emulate invalid opcode */
 		c = &vcpu->arch.emulate_ctxt.decode;
-		if ((emulation_type & EMULTYPE_TRAP_UD) &&
-		    (!(c->twobyte && c->b == 0x01 &&
-		      (c->modrm_reg == 0 || c->modrm_reg == 3) &&
-		       c->modrm_mod == 3 && c->modrm_rm == 1)))
-			return EMULATE_FAIL;
+
+		if (emulation_type & EMULTYPE_TRAP_UD) {
+			if (!c->twobyte)
+				return EMULATE_FAIL;
+			switch (c->b) {
+			case 0x01: /* VMMCALL */
+				if (c->modrm_mod != 3)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 1)
+					return EMULATE_FAIL;
+				break;
+			case 0x34: /* sysenter */
+			case 0x35: /* sysexit */
+				if (c->modrm_mod != 0)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 0)
+					return EMULATE_FAIL;
+				break;
+			case 0x05: /* syscall */
+				r = 0;
+				if (c->modrm_mod != 0)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 0)
+					return EMULATE_FAIL;
+				break;
+			default:
+				return EMULATE_FAIL;
+			}
+
+			if (!(c->modrm_reg == 0 || c->modrm_reg == 3))
+				return EMULATE_FAIL;
+		}
 		++vcpu->stat.insn_emulation;
 		if (r)  {
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 22c765d..6c65462 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -32,6 +32,8 @@
 #include
 #include
 
+#include "mmu.h"
+
 /*
  * Opcode effective-address decode tables.
  * Note that we only emulate instructions that have at least one memory
@@ -217,7 +219,9 @@ static u32 twobyte_table[256] = {
 	ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x30 - 0x3F */
-	ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	ImplicitOps, 0, ImplicitOps, 0,
+	ImplicitOps, ImplicitOps, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x40 - 0x47 */
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -320,8 +324,11 @@ static u32 group2_table[] = {
 };
 
 /* EFLAGS bit definitions. */
+#define EFLG_VM (1<<17)
+#define EFLG_RF (1<<16)
 #define EFLG_OF (1<<11)
 #define EFLG_DF (1<<10)
+#define EFLG_IF (1<<9)
 #define EFLG_SF (1<<7)
 #define EFLG_ZF (1<<6)
 #define EFLG_AF (1<<4)
@@ -1390,6 +1397,207 @@ void toggle_interruptibility(struct x86_emulate_ctxt *ctxt, u32 mask)
 	ctxt->interruptibility = mask;
 }
 
+#define EMUL_SYSENTER 1
+#define EMUL_SYSEXIT  2
+#define EMUL_SYSCALL  3
+
+static int
+emulate_syscalls(struct x86_emulate_ctxt *ctxt, int ins)
+{
+	struct decode_cache *c = &ctxt->decode;
+	struct kvm_segment cs, ss;
+	unsigned long cr0;
+	u64 msr_data;
+	int usermode;
+
+	/* inject #UD if LOCK prefix is used */
+	if (c->lock_prefix)
+		return -1;
+
+	/* inject
[PATCH] remove ppc functions from callbacks
handle_powerpc_dcr_read() and handle_powerpc_dcr_write() are two powerpc specific functions that are called via libkvm callbacks. However, grepping the source code finds simply no use of them.

This is probably due to the fact that powerpc now relies on a totally qemu upstream implementation of kvm, and does not need them anymore.

Anyway, I'm providing this patch separately, so that if it breaks for whatever reason, we can identify a bisection point easily.

Signed-off-by: Glauber Costa
CC: Hollis Blanchard
---
 libkvm-all.h |    4 
 qemu-kvm.c   |    4 
 qemu-kvm.h   |    5 -
 3 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/libkvm-all.h b/libkvm-all.h
index 4f7b9a3..47855be 100644
--- a/libkvm-all.h
+++ b/libkvm-all.h
@@ -156,10 +156,6 @@ struct kvm_callbacks {
     void (*post_kvm_run)(void *opaque, void *env);
     int (*pre_kvm_run)(void *opaque, void *env);
     int (*tpr_access)(void *opaque, kvm_vcpu_context_t vcpu, uint64_t rip, int is_write);
-#if defined(__powerpc__)
-    int (*powerpc_dcr_read)(kvm_vcpu_context_t vcpu, uint32_t dcrn, uint32_t *data);
-    int (*powerpc_dcr_write)(kvm_vcpu_context_t vcpu, uint32_t dcrn, uint32_t data);
-#endif
 #if defined(__s390__)
     int (*s390_handle_intercept)(kvm_context_t context, kvm_vcpu_context_t vcpu,
         struct kvm_run *run);
diff --git a/qemu-kvm.c b/qemu-kvm.c
index eefca61..61f5a19 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2278,10 +2278,6 @@ static struct kvm_callbacks qemu_kvm_ops = {
 #ifdef TARGET_I386
     .tpr_access = handle_tpr_access,
 #endif
-#ifdef TARGET_PPC
-    .powerpc_dcr_read = handle_powerpc_dcr_read,
-    .powerpc_dcr_write = handle_powerpc_dcr_write,
-#endif
     .unhandled = handle_unhandled,
 };
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 126b8f3..57e8a2f 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -134,11 +134,6 @@ void kvm_remove_ioperm_data(unsigned long start_port, unsigned long num);
 void kvm_arch_do_ioperm(void *_data);
 #endif
 
-#ifdef TARGET_PPC
-int handle_powerpc_dcr_read(kvm_vcpu_context_t vcpu, uint32_t dcrn, uint32_t *data);
-int handle_powerpc_dcr_write(kvm_vcpu_context_t vcpu, uint32_t dcrn, uint32_t data);
-#endif
-
 #define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
 #define BITMAP_SIZE(m) (ALIGN(((m)>>TARGET_PAGE_BITS), HOST_LONG_BITS) / 8)
--
1.6.2.2
Re: [patch 1/2] KVM: convert custom marker based tracing to event traces
On Tue, 16 Jun 2009, Marcelo Tosatti wrote:
> >
> > One suggestion. Instead of putting in arch specific trace points into
> > generic code, you can put these into arch/x86/kvm/trace.h?
> >
> > Then you can in the Makefile add:
> >
> > CFLAGS_x86.o := -I.
> > CFLAGS_svm.o := -I.
> > CFLAGS_vmx.o := -I.
> >
> > Or better yet, have a single file called trace.c:
> >
> > in the Makefile:
> > CFLAGS_trace.o := -I.
> > obj-$(EVENT_TRACING) += trace.o
> >
> > in trace.c:
> >
> > #define CREATE_TRACE_POINTS
> > #include "trace.h"
> > #include "trace-arch.h"
> >
> > Then have the kvm/x86.h moved to trace.h
> > and the kvm/arch-x86.h moved to trace-arch.h
> >
> > Just change the "TRACE_INCLUDE_FILE" to include the proper name.
> >
> > -- Steve
>
> Similar to http://patchwork.kernel.org/patch/23829/, but moving the
> x86 tracepoint definitions to arch/x86/kvm/ ?

Yes, similar to what Christoph suggested.

> I thought the point of include/trace/ was to have all tracepoint
> definitions in a single directory?

That's not what I heard ;-)

It should only be for core kernel code. If you have trace points that are the same on most archs, then sure. But if it is specific to one arch, then we are going to end up with arch code scattered throughout the kernel again.

I thought the idea of moving include/asm-x86 into arch/x86/include/asm was to consolidate arch code to one location? Thus, if a trace point is made for one arch, then it should be located in that arch.

-- Steve
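To make the layout Steve describes concrete, here is a sketch of what an arch-local arch/x86/kvm/trace.h would look like with the standard ftrace boilerplate. The event shown (kvm_dummy_exit) is a placeholder, not one of Marcelo's actual tracepoints:

#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvm

#if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_KVM_H

#include <linux/tracepoint.h>

TRACE_EVENT(kvm_dummy_exit,
	TP_PROTO(unsigned int exit_reason),
	TP_ARGS(exit_reason),
	TP_STRUCT__entry(__field(unsigned int, exit_reason)),
	TP_fast_assign(__entry->exit_reason = exit_reason;),
	TP_printk("reason %u", __entry->exit_reason)
);

#endif /* _TRACE_KVM_H */

/* This part must be outside the include guard */
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_FILE trace
#include <trace/define_trace.h>

The -I. in CFLAGS is what lets define_trace.h find trace.h via TRACE_INCLUDE_PATH when the header lives outside include/trace/.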
[KVM PATCH v7 0/2] iosignalfd
(Applies to kvm.git/master:c27b64a0)

This is v7 of the series. For more details, please see the header to patch 2/2.

This series has been tested against the kvm-eventfd unit test, and appears to be functioning properly. You can download this test here:

ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2

Please consider for inclusion to kvm.git

[ Changelog:

v7:
    *) Implemented a resource limit (CONFIG_KVM_MAX_IOSIGNALFD_ITEMS) to
       limit malicious/broken userspace from consuming arbitrary kernel
       memory.
    *) Rebased to kvm.git/master:c27b64a0, which already includes
       Marcelo's irq-lock rework.

v6:
    *) Removed "FIXME?" comment on choice over RCU vs SRCU after
       discussion/numbers from Paul. I think RCU is fine to use for now
       based on the conversation. We can always convert it later if
       need be.
    *) Fixed the "group" free path to eliminate an RCU related race
    *) Fixed a memory/eventfd leak on shutdown for any iosignalfd's
       which were still active at the time the guest shuts down.
    *) Beefed up comments
    *) Rebased to kvm.git/master:0281e88f + irq locking rework and
       verified that kvm-eventfd unit test still passes.

v5:
    *) Removed "cookie" field, which was a misunderstanding on my part
       on what Avi wanted for a data-match feature
    *) Added a new "trigger" data-match feature which I think is much
       closer to what we need.
    *) We retain the dev_count field in the io_bus infrastructure and
       instead back-fill the array on removal.
    *) Various minor cleanups
    *) Rebased to kvm.git/master:25deed73

v4:
    *) Fixed a bug in the original 2/4 where the PIT failure case would
       potentially leave the io_bus components registered.
    *) Condensed the v3 2/4 and 3/4 into one patch (2/2) since the
       patches became interdependent with the fix described above
    *) Rebased to kvm.git/master:74dfca0a

v3:
    *) fixed patch 2/4 to handle error cases instead of BUG_ON
    *) implemented same HAVE_EVENTFD protection mechanism as irqfd to
       prevent compilation errors on unsupported arches
    *) completed testing
    *) rebased to kvm.git/master:7391a6d5

v2:
    *) added optional data-matching capability (via cookie field)
    *) changed name from iofd to iosignalfd
    *) added io_bus unregister function
    *) implemented deassign feature

v1:
    *) original release (integrated into irqfd v7 series as "iofd")
]

---

Gregory Haskins (2):
      KVM: add iosignalfd support
      KVM: make io_bus interface more robust

 arch/x86/kvm/Kconfig      |    8 +
 arch/x86/kvm/i8254.c      |   22 ++
 arch/x86/kvm/i8259.c      |    9 +
 arch/x86/kvm/x86.c        |    3 
 include/linux/kvm.h       |   15 ++
 include/linux/kvm_host.h  |   16 +-
 virt/kvm/Kconfig          |    2 
 virt/kvm/coalesced_mmio.c |    8 +
 virt/kvm/eventfd.c        |  402 +
 virt/kvm/ioapic.c         |    9 +
 virt/kvm/kvm_main.c       |   41 -
 11 files changed, 520 insertions(+), 15 deletions(-)
[KVM PATCH v7 1/2] KVM: make io_bus interface more robust
Today kvm_io_bus_register_dev() returns void and will internally BUG_ON if it fails. We want to create dynamic MMIO/PIO entries driven from userspace later in the series, so we need to enhance the code to be more robust with the following changes:

1) Add a return value to the registration function
2) Fix up all the callsites to check the return code, handle any
   failures, and percolate the error up to the caller.
3) Add an unregister function that collapses holes in the array

Signed-off-by: Gregory Haskins
---
 arch/x86/kvm/i8254.c      |   22 --
 arch/x86/kvm/i8259.c      |    9 -
 include/linux/kvm_host.h  |    6 --
 virt/kvm/coalesced_mmio.c |    8 ++--
 virt/kvm/ioapic.c         |    9 +++--
 virt/kvm/kvm_main.c       |   30 --
 6 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 331705f..1c41715 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -586,6 +586,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 {
 	struct kvm_pit *pit;
 	struct kvm_kpit_state *pit_state;
+	int ret;
 
 	pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL);
 	if (!pit)
@@ -620,14 +621,31 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
 
 	kvm_iodevice_init(&pit->dev, &pit_dev_ops);
-	kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+	ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+	if (ret < 0)
+		goto fail;
 
 	if (flags & KVM_PIT_SPEAKER_DUMMY) {
 		kvm_iodevice_init(&pit->speaker_dev, &speaker_dev_ops);
-		kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
+		ret = kvm_io_bus_register_dev(&kvm->pio_bus,
+					      &pit->speaker_dev);
+		if (ret < 0)
+			goto fail;
 	}
 
 	return pit;
+
+fail:
+	if (flags & KVM_PIT_SPEAKER_DUMMY)
+		kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->speaker_dev);
+
+	kvm_io_bus_unregister_dev(&kvm->pio_bus, &pit->dev);
+
+	if (pit->irq_source_id >= 0)
+		kvm_free_irq_source_id(kvm, pit->irq_source_id);
+
+	kfree(pit);
+	return NULL;
 }
 
 void kvm_free_pit(struct kvm *kvm)
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 148c52a..66e37c2 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -532,6 +532,8 @@ static const struct kvm_io_device_ops picdev_ops = {
 struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 {
 	struct kvm_pic *s;
+	int ret;
+
 	s = kzalloc(sizeof(struct kvm_pic), GFP_KERNEL);
 	if (!s)
 		return NULL;
@@ -548,6 +550,11 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 	 * Initialize PIO device
 	 */
 	kvm_iodevice_init(&s->dev, &picdev_ops);
-	kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+	ret = kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+	if (ret < 0) {
+		kfree(s);
+		return NULL;
+	}
+
 	return s;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e4e78db..eafa2b3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -61,8 +61,10 @@ void kvm_io_bus_init(struct kvm_io_bus *bus);
 void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
 					  gpa_t addr, int len, int is_write);
-void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
-			     struct kvm_io_device *dev);
+int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+			    struct kvm_io_device *dev);
+void kvm_io_bus_unregister_dev(struct kvm_io_bus *bus,
+			       struct kvm_io_device *dev);
 
 struct kvm_vcpu {
 	struct kvm *kvm;
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 397f419..3e89db8 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -94,6 +94,7 @@ static const struct kvm_io_device_ops coalesced_mmio_ops = {
 int kvm_coalesced_mmio_init(struct kvm *kvm)
 {
 	struct kvm_coalesced_mmio_dev *dev;
+	int ret;
 
 	dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
 	if (!dev)
@@ -102,9 +103,12 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
 	kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
 	dev->kvm = kvm;
 	kvm->coalesced_mmio_dev = dev;
-	kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
 
-	return 0;
+	ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
+	if (ret < 0)
+		kfree(dev);
+
+	return ret;
 }
 
 int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
diff --git a/virt/
[KVM PATCH v7 2/2] KVM: add iosignalfd support
iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asynchronously and return as quickly as possible. All the synchronous infrastructure to ensure proper data-dependencies are met in the normal IO case is just unnecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and peripheral components.

To test this theory, we built a test-harness called "doorbell". This module has a function called "doorbell_ring()" which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via iosignalfd.

You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:       110011 iops, 9.09us rtt
iosignalfd-mmio: 200100 iops, 5.00us rtt
iosignalfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs (+2.56us for MMIO, and -350ns for HC), and we get:

qemu-pio:      153139 iops, 6.53us rtt
iosignalfd-hc: 412585 iops, 2.37us rtt

These are just for fun, for now, until I can gather more data.

Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace hop.

Signed-off-by: Gregory Haskins
---
 arch/x86/kvm/Kconfig     |    8 +
 arch/x86/kvm/x86.c       |    3 
 include/linux/kvm.h      |   15 ++
 include/linux/kvm_host.h |   10 +
 virt/kvm/Kconfig         |    2 
 virt/kvm/eventfd.c       |  402 ++
 virt/kvm/kvm_main.c      |   11 +
 7 files changed, 447 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 7fbedfd..61b3b24 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -74,6 +74,14 @@ config KVM_TRACE
 	  relayfs. Note the ABI is not considered stable and will be
 	  modified in future updates.
 
+config KVM_MAX_IOSIGNALFD_ITEMS
+	int "Maximum IOSIGNALFD items per address"
+	depends on KVM
+	default "32"
+	---help---
+	  This option influences the maximum number of fd's per PIO/MMIO
+	  address that are allowed to register
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/lguest/Kconfig
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b91ea7..c5fd38b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1121,6 +1121,9 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_MCE:
 		r = KVM_MAX_MCE_BANKS;
 		break;
+	case KVM_CAP_IOSIGNALFD:
+		r = CONFIG_KVM_MAX_IOSIGNALFD_ITEMS;
+		break;
 	default:
 		r = 0;
 		break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 38ff31e..9de6486 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -307,6 +307,19 @@ struct kvm_guest_debug {
 	struct kvm_guest_debug_arch arch;
 };
 
+#define KVM_IOSIGNALFD_FLAG_TRIGGER  (1 << 0)
+#define KVM_IOSIGNALFD_FLAG_PIO      (1 << 1)
+#define KVM_IOSIGNALFD_FLAG_DEASSIGN (1 << 2)
+
+struct kvm_iosignalfd {
+	__u64 trigger;
+	__u64 addr;
+	__u32 len;
+	__u32 fd;
+	__u32 flags;
+	__
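From userspace, registration would presumably look something like the sketch below. The vm-ioctl name KVM_IOSIGNALFD and the exact field usage are assumptions extrapolated from the capability and struct shown above; the part of the patch defining the ioctl is not reproduced here:

#include <sys/ioctl.h>
#include <sys/eventfd.h>
#include <linux/kvm.h>

/* Hypothetical usage sketch: register an eventfd that fires on any
 * 4-byte guest write to an example MMIO address. */
static int register_doorbell(int vm_fd)
{
	int efd = eventfd(0, 0);       /* host-side signal endpoint */
	struct kvm_iosignalfd args = {
		.trigger = 0,          /* unused without FLAG_TRIGGER data-match */
		.addr    = 0xc2000,    /* example address, made up */
		.len     = 4,
		.fd      = efd,
		.flags   = 0,          /* MMIO; set KVM_IOSIGNALFD_FLAG_PIO for PIO */
	};

	return ioctl(vm_fd, KVM_IOSIGNALFD, &args);
}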
Re: [PATCH 0/4] Add rudimentary Hyper-V guest support v3
On 06/15/2009 04:21 PM, Alexander Graf wrote:
> Now that we have nested SVM in place, let's make use of it and
> virtualize something non-kvm. The first interesting target that came
> to my mind here was Hyper-V.
>
> This patchset makes Windows Server 2008 boot with Hyper-V, which runs
> the "dom0" in virtualized mode already. It hangs somewhere in IDE code
> when booted, so I haven't been able to run a second VM within for now
> yet.

Applied all, thanks.

> Please keep in mind that Hyper-V won't work unless you apply the
> userspace patches too

Please rebase/repost those.

> and the PAT bit patch

That's fd2e987d5 unless I'm confusing this with another issue.
Re: [PATCH] Activate Virtualization On Demand v2
On 06/15/2009 02:30 PM, Alexander Graf wrote:
> X86 CPUs need to have some magic happening to enable the virtualization
> extensions on them. This magic can result in unpleasant results for
> users, like blocking other VMMs from working (vmx) or using invalid TLB
> entries (svm).
>
> Currently KVM activates virtualization when the respective kernel
> module is loaded. This blocks us from autoloading KVM modules without
> breaking other VMMs.
>
> To circumvent this problem at least a bit, this patch introduces on
> demand activation of virtualization. This means, that instead
> virtualization is enabled on creation of the first virtual machine
> and disabled on destruction of the last one.
>
> So using this, KVM can be easily autoloaded, while keeping other
> hypervisors usable.
>
> +static int hardware_enable_all(void)
> +{
> +	int r = 0;
> +
> +	spin_lock(&kvm_lock);
> +
> +	kvm_usage_count++;
> +	if (kvm_usage_count == 1) {
> +		atomic_set(&hardware_enable_failed, 1);
> +		on_each_cpu(hardware_enable, NULL, 1);
> +
> +		if (!atomic_dec_and_test(&hardware_enable_failed))
> +			r = -EBUSY;
> +	}

That's a little obfuscated. I suggest atomic_set(..., p) and atomic_read(...).

>  static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
>  			   void *v)
>  {
>  	int cpu = (long)v;
>
> +	if (!kvm_usage_count)
> +		return NOTIFY_OK;
> +
>  	val &= ~CPU_TASKS_FROZEN;
>  	switch (val) {
>  	case CPU_DYING:
> @@ -2513,13 +2571,15 @@ static void kvm_exit_debug(void)
>  static int kvm_suspend(struct sys_device *dev, pm_message_t state)
>  {
> -	hardware_disable(NULL);
> +	if (kvm_usage_count)
> +		hardware_disable(NULL);
>  	return 0;
>  }
>
>  static int kvm_resume(struct sys_device *dev)
>  {
> -	hardware_enable(NULL);
> +	if (kvm_usage_count)
> +		hardware_enable(NULL);
>  	return 0;
>  }

Please tell me you tested suspend/resume with/without VMs and cpu hotunplug/hotplug.
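For comparison, a sketch of the plainer variant Avi is presumably suggesting (set the flag to 0, then read it), assuming hardware_enable() raises hardware_enable_failed (e.g. via atomic_inc()) when it fails on a CPU -- the flag-as-countdown trick becomes a simple set-then-read:

	kvm_usage_count++;
	if (kvm_usage_count == 1) {
		/* cleared here; set by hardware_enable() on any failure */
		atomic_set(&hardware_enable_failed, 0);
		on_each_cpu(hardware_enable, NULL, 1);

		if (atomic_read(&hardware_enable_failed))
			r = -EBUSY;
	}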
Re: [PATCH] Activate Virtualization On Demand v2
On 06/15/2009 03:17 PM, Christoph Hellwig wrote: On Mon, Jun 15, 2009 at 01:30:05PM +0200, Alexander Graf wrote: X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can have unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. That will only become interesting if we ever have such a thing in mainline. So NACK, lots of complication for no good reason. If it were truly lots of complication, I might agree. But it isn't, and we keep getting reports from users about it. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > irqfd and its underlying implementation, eventfd, currently utilize > the embedded wait-queue in eventfd for signal notification. The nice thing > about this design decision is that it re-uses the existing > eventfd/wait-queue code and it generally works wellwith several > limitations. > > One of the limitations is that notification callbacks are always called > inside a spin_lock_irqsave critical section. Another limitation is > that it is very difficult to build a system that can recieve release > notification without being racy. > > Therefore, we introduce a new registration interface that is SRCU based > instead of wait-queue based, and implement the internal wait-queue > infrastructure in terms of this new interface. We then convert irqfd > to use this new interface instead of the existing wait-queue code. > > The end result is that we now have the opportunity to run the interrupt > injection code serially to the callback (when the signal is raised from > process-context, at least) instead of always deferring the injection to a > work-queue. > > Signed-off-by: Gregory Haskins > CC: Paul E. McKenney > CC: Davide Libenzi > --- > > fs/eventfd.c| 115 > +++ > include/linux/eventfd.h | 30 > virt/kvm/eventfd.c | 114 > +-- > 3 files changed, 188 insertions(+), 71 deletions(-) > > diff --git a/fs/eventfd.c b/fs/eventfd.c > index 72f5f8d..505d5de 100644 > --- a/fs/eventfd.c > +++ b/fs/eventfd.c > @@ -30,8 +30,47 @@ struct eventfd_ctx { >*/ > __u64 count; > unsigned int flags; > + struct srcu_struct srcu; > + struct list_head nh; > + struct eventfd_notifier notifier; > }; > > +static void _eventfd_wqh_notify(struct eventfd_notifier *en) > +{ > + struct eventfd_ctx *ctx = container_of(en, > +struct eventfd_ctx, > +notifier); > + > + if (waitqueue_active(&ctx->wqh)) > + wake_up_poll(&ctx->wqh, POLLIN); > +} > + > +static void _eventfd_notify(struct eventfd_ctx *ctx) > +{ > + struct eventfd_notifier *en; > + int idx; > + > + idx = srcu_read_lock(&ctx->srcu); > + > + /* > + * The goal here is to allow the notification to be preemptible > + * as often as possible. We cannot achieve this with the basic > + * wqh mechanism because it requires the wqh->lock. Therefore > + * we have an internal srcu list mechanism of which the wqh is > + * a client. > + * > + * Not all paths will invoke this function in process context. > + * Callers should check for suitable state before assuming they > + * can sleep (such as with preemptible()). Paul McKenney assures > + * me that srcu_read_lock is compatible with in-atomic, as long as > + * the code within the critical section is also compatible. > + */ > + list_for_each_entry_rcu(en, &ctx->nh, list) > + en->ops->signal(en); > + > + srcu_read_unlock(&ctx->srcu, idx); > +} > + > /* > * Adds "n" to the eventfd counter "count". Returns "n" in case of > * success, or a value lower then "n" in case of coutner overflow. This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. Further, to do useful things it might not be enough that you can sleep: with iofd you also want to access current task with e.g. copy from user. Here's an idea: let's pass a flag to ->signal, along the lines of signal_is_task, that tells us that it is safe to use current, and add eventfd_signal_task() which is the same as eventfd_signal but lets everyone know that it's safe to both sleep and use current->mm. Makes sense? 
-- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
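To make the new interface concrete, a sketch of a hypothetical client. struct eventfd_notifier and the ->signal() op come from the patch above; the ops struct name and the registration step are assumptions inferred from the en->ops->signal() call:

/* Hypothetical client of the RFC's SRCU-based notifier interface. */
struct my_consumer {
	struct eventfd_notifier notifier;
};

static void my_signal(struct eventfd_notifier *en)
{
	struct my_consumer *c = container_of(en, struct my_consumer, notifier);

	/* Runs under srcu_read_lock(); whether it may sleep depends on
	 * the signaler's context -- the open question in this thread. */
	(void)c;
}

static const struct eventfd_notifier_ops my_ops = {
	.signal = my_signal,	/* ops struct name assumed from en->ops->signal() */
};

A registration call would then hook my_consumer.notifier into ctx->nh, just as the patch wires the internal wqh client into the same list.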
Re: [PATCH] Activate Virtualization On Demand v2
Avi Kivity wrote: > On 06/15/2009 02:30 PM, Alexander Graf wrote: >> X86 CPUs need to have some magic happening to enable the virtualization >> extensions on them. This magic can result in unpleasant results for >> users, like blocking other VMMs from working (vmx) or using invalid TLB >> entries (svm). >> >> Currently KVM activates virtualization when the respective kernel module >> is loaded. This blocks us from autoloading KVM modules without breaking >> other VMMs. >> >> To circumvent this problem at least a bit, this patch introduces on >> demand activation of virtualization. This means, that instead >> virtualization is enabled on creation of the first virtual machine >> and disabled on destruction of the last one. >> >> So using this, KVM can be easily autoloaded, while keeping other >> hypervisors usable. >> >> +static int hardware_enable_all(void) >> +{ >> +int r = 0; >> + >> +spin_lock(&kvm_lock); >> + >> +kvm_usage_count++; >> +if (kvm_usage_count == 1) { >> +atomic_set(&hardware_enable_failed, 1); >> +on_each_cpu(hardware_enable, NULL, 1); >> + >> +if (!atomic_dec_and_test(&hardware_enable_failed)) >> +r = -EBUSY; >> +} >> > > That's a little obfuscated. I suggest atomic_set(..., p) and > atomic_read(...). Ah, I was more searching for an atomic_test :-). >> static int kvm_cpu_hotplug(struct notifier_block *notifier, >> unsigned long val, >> void *v) >> { >> int cpu = (long)v; >> >> +if (!kvm_usage_count) >> +return NOTIFY_OK; >> + >> val&= ~CPU_TASKS_FROZEN; >> switch (val) { >> case CPU_DYING: >> @@ -2513,13 +2571,15 @@ static void kvm_exit_debug(void) >> >> static int kvm_suspend(struct sys_device *dev, pm_message_t state) >> { >> -hardware_disable(NULL); >> +if (kvm_usage_count) >> +hardware_disable(NULL); >> return 0; >> } >> >> static int kvm_resume(struct sys_device *dev) >> { >> -hardware_enable(NULL); >> +if (kvm_usage_count) >> +hardware_enable(NULL); >> return 0; >> } >> >> > + > > Please tell me you tested suspend/resume with/without VMs and cpu > hotunplug/hotplug. I tested cpu hotplugging. On the last round I tested suspend/resume, but this time I couldn't because my machine can't do suspend :-(. So I'll try hard and find a machine I can test it on for the next round. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > >> irqfd and its underlying implementation, eventfd, currently utilize >> the embedded wait-queue in eventfd for signal notification. The nice thing >> about this design decision is that it re-uses the existing >> eventfd/wait-queue code and it generally works wellwith several >> limitations. >> >> One of the limitations is that notification callbacks are always called >> inside a spin_lock_irqsave critical section. Another limitation is >> that it is very difficult to build a system that can recieve release >> notification without being racy. >> >> Therefore, we introduce a new registration interface that is SRCU based >> instead of wait-queue based, and implement the internal wait-queue >> infrastructure in terms of this new interface. We then convert irqfd >> to use this new interface instead of the existing wait-queue code. >> >> The end result is that we now have the opportunity to run the interrupt >> injection code serially to the callback (when the signal is raised from >> process-context, at least) instead of always deferring the injection to a >> work-queue. >> >> Signed-off-by: Gregory Haskins >> CC: Paul E. McKenney >> CC: Davide Libenzi >> --- >> >> fs/eventfd.c| 115 >> +++ >> include/linux/eventfd.h | 30 >> virt/kvm/eventfd.c | 114 >> +-- >> 3 files changed, 188 insertions(+), 71 deletions(-) >> >> diff --git a/fs/eventfd.c b/fs/eventfd.c >> index 72f5f8d..505d5de 100644 >> --- a/fs/eventfd.c >> +++ b/fs/eventfd.c >> @@ -30,8 +30,47 @@ struct eventfd_ctx { >> */ >> __u64 count; >> unsigned int flags; >> +struct srcu_struct srcu; >> +struct list_head nh; >> +struct eventfd_notifier notifier; >> }; >> >> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) >> +{ >> +struct eventfd_ctx *ctx = container_of(en, >> + struct eventfd_ctx, >> + notifier); >> + >> +if (waitqueue_active(&ctx->wqh)) >> +wake_up_poll(&ctx->wqh, POLLIN); >> +} >> + >> +static void _eventfd_notify(struct eventfd_ctx *ctx) >> +{ >> +struct eventfd_notifier *en; >> +int idx; >> + >> +idx = srcu_read_lock(&ctx->srcu); >> + >> +/* >> + * The goal here is to allow the notification to be preemptible >> + * as often as possible. We cannot achieve this with the basic >> + * wqh mechanism because it requires the wqh->lock. Therefore >> + * we have an internal srcu list mechanism of which the wqh is >> + * a client. >> + * >> + * Not all paths will invoke this function in process context. >> + * Callers should check for suitable state before assuming they >> + * can sleep (such as with preemptible()). Paul McKenney assures >> + * me that srcu_read_lock is compatible with in-atomic, as long as >> + * the code within the critical section is also compatible. >> + */ >> +list_for_each_entry_rcu(en, &ctx->nh, list) >> +en->ops->signal(en); >> + >> +srcu_read_unlock(&ctx->srcu, idx); >> +} >> + >> /* >> * Adds "n" to the eventfd counter "count". Returns "n" in case of >> * success, or a value lower then "n" in case of coutner overflow. >> > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. > > Further, to do useful things it might not be enough that you can sleep: > with iofd you also want to access current task with e.g. copy from user. 
> > Here's an idea: let's pass a flag to ->signal, along the lines of > signal_is_task, that tells us that it is safe to use current, and add > eventfd_signal_task() which is the same as eventfd_signal but lets everyone > know that it's safe to both sleep and use current->mm. > > Makes sense? > It does make sense, yes. What I am not clear on is how eventfd would detect this state so as to populate such flags, and why the ->signal() CB can't do the same. Thanks Michael, -Greg
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > >> irqfd and its underlying implementation, eventfd, currently utilize >> the embedded wait-queue in eventfd for signal notification. The nice thing >> about this design decision is that it re-uses the existing >> eventfd/wait-queue code and it generally works wellwith several >> limitations. >> >> One of the limitations is that notification callbacks are always called >> inside a spin_lock_irqsave critical section. Another limitation is >> that it is very difficult to build a system that can recieve release >> notification without being racy. >> >> Therefore, we introduce a new registration interface that is SRCU based >> instead of wait-queue based, and implement the internal wait-queue >> infrastructure in terms of this new interface. We then convert irqfd >> to use this new interface instead of the existing wait-queue code. >> >> The end result is that we now have the opportunity to run the interrupt >> injection code serially to the callback (when the signal is raised from >> process-context, at least) instead of always deferring the injection to a >> work-queue. >> >> Signed-off-by: Gregory Haskins >> CC: Paul E. McKenney >> CC: Davide Libenzi >> --- >> >> fs/eventfd.c| 115 >> +++ >> include/linux/eventfd.h | 30 >> virt/kvm/eventfd.c | 114 >> +-- >> 3 files changed, 188 insertions(+), 71 deletions(-) >> >> diff --git a/fs/eventfd.c b/fs/eventfd.c >> index 72f5f8d..505d5de 100644 >> --- a/fs/eventfd.c >> +++ b/fs/eventfd.c >> @@ -30,8 +30,47 @@ struct eventfd_ctx { >> */ >> __u64 count; >> unsigned int flags; >> +struct srcu_struct srcu; >> +struct list_head nh; >> +struct eventfd_notifier notifier; >> }; >> >> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) >> +{ >> +struct eventfd_ctx *ctx = container_of(en, >> + struct eventfd_ctx, >> + notifier); >> + >> +if (waitqueue_active(&ctx->wqh)) >> +wake_up_poll(&ctx->wqh, POLLIN); >> +} >> + >> +static void _eventfd_notify(struct eventfd_ctx *ctx) >> +{ >> +struct eventfd_notifier *en; >> +int idx; >> + >> +idx = srcu_read_lock(&ctx->srcu); >> + >> +/* >> + * The goal here is to allow the notification to be preemptible >> + * as often as possible. We cannot achieve this with the basic >> + * wqh mechanism because it requires the wqh->lock. Therefore >> + * we have an internal srcu list mechanism of which the wqh is >> + * a client. >> + * >> + * Not all paths will invoke this function in process context. >> + * Callers should check for suitable state before assuming they >> + * can sleep (such as with preemptible()). Paul McKenney assures >> + * me that srcu_read_lock is compatible with in-atomic, as long as >> + * the code within the critical section is also compatible. >> + */ >> +list_for_each_entry_rcu(en, &ctx->nh, list) >> +en->ops->signal(en); >> + >> +srcu_read_unlock(&ctx->srcu, idx); >> +} >> + >> /* >> * Adds "n" to the eventfd counter "count". Returns "n" in case of >> * success, or a value lower then "n" in case of coutner overflow. >> > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. > As an aside, this is something I would like to address. I keep running into this pattern where I could do something in-line if I had a "can_sleep()" context. Otherwise, I have to punt to something like a workqueue which adds latency. The closest thing I have to "can_sleep()" is preemptible(), which, as you correctly pointed out is limited to only working with CONFIG_PREEMPT=y. 
It's been a while since I looked into it, but one of the barriers that would need to be overcome is the fact that the preempt_count stuff gets compiled away with CONFIG_PREEMPT=n. It is conceivable that we could make the preempt_count logic an independent config variable from CONFIG_PREEMPT to provide a can_sleep() macro without requiring full-blown preemption to be enabled. So my questions would be as follows: a) Would the community be receptive to such an idea? b) Are there other things to consider/fix besides the lack of preempt_count in order to implement such a beast? -Greg
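For concreteness, a rough sketch of the can_sleep() macro being discussed — purely hypothetical, since with CONFIG_PREEMPT=n the preempt_count bookkeeping it relies on is compiled away today, which is exactly the barrier described above:

/* Hypothetical: only meaningful if preempt_count were maintained
 * even with CONFIG_PREEMPT=n, as proposed above. */
#define can_sleep()	(!in_atomic() && !irqs_disabled())

A caller such as irqfd could then inject inline whenever can_sleep() holds and fall back to deferred (workqueue) execution otherwise.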
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
[Adding Ingo] Gregory Haskins wrote: > Michael S. Tsirkin wrote: > >> On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: >> >> >>> irqfd and its underlying implementation, eventfd, currently utilize >>> the embedded wait-queue in eventfd for signal notification. The nice thing >>> about this design decision is that it re-uses the existing >>> eventfd/wait-queue code and it generally works wellwith several >>> limitations. >>> >>> One of the limitations is that notification callbacks are always called >>> inside a spin_lock_irqsave critical section. Another limitation is >>> that it is very difficult to build a system that can recieve release >>> notification without being racy. >>> >>> Therefore, we introduce a new registration interface that is SRCU based >>> instead of wait-queue based, and implement the internal wait-queue >>> infrastructure in terms of this new interface. We then convert irqfd >>> to use this new interface instead of the existing wait-queue code. >>> >>> The end result is that we now have the opportunity to run the interrupt >>> injection code serially to the callback (when the signal is raised from >>> process-context, at least) instead of always deferring the injection to a >>> work-queue. >>> >>> Signed-off-by: Gregory Haskins >>> CC: Paul E. McKenney >>> CC: Davide Libenzi >>> --- >>> >>> fs/eventfd.c| 115 >>> +++ >>> include/linux/eventfd.h | 30 >>> virt/kvm/eventfd.c | 114 >>> +-- >>> 3 files changed, 188 insertions(+), 71 deletions(-) >>> >>> diff --git a/fs/eventfd.c b/fs/eventfd.c >>> index 72f5f8d..505d5de 100644 >>> --- a/fs/eventfd.c >>> +++ b/fs/eventfd.c >>> @@ -30,8 +30,47 @@ struct eventfd_ctx { >>> */ >>> __u64 count; >>> unsigned int flags; >>> + struct srcu_struct srcu; >>> + struct list_head nh; >>> + struct eventfd_notifier notifier; >>> }; >>> >>> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) >>> +{ >>> + struct eventfd_ctx *ctx = container_of(en, >>> + struct eventfd_ctx, >>> + notifier); >>> + >>> + if (waitqueue_active(&ctx->wqh)) >>> + wake_up_poll(&ctx->wqh, POLLIN); >>> +} >>> + >>> +static void _eventfd_notify(struct eventfd_ctx *ctx) >>> +{ >>> + struct eventfd_notifier *en; >>> + int idx; >>> + >>> + idx = srcu_read_lock(&ctx->srcu); >>> + >>> + /* >>> +* The goal here is to allow the notification to be preemptible >>> +* as often as possible. We cannot achieve this with the basic >>> +* wqh mechanism because it requires the wqh->lock. Therefore >>> +* we have an internal srcu list mechanism of which the wqh is >>> +* a client. >>> +* >>> +* Not all paths will invoke this function in process context. >>> +* Callers should check for suitable state before assuming they >>> +* can sleep (such as with preemptible()). Paul McKenney assures >>> +* me that srcu_read_lock is compatible with in-atomic, as long as >>> +* the code within the critical section is also compatible. >>> +*/ >>> + list_for_each_entry_rcu(en, &ctx->nh, list) >>> + en->ops->signal(en); >>> + >>> + srcu_read_unlock(&ctx->srcu, idx); >>> +} >>> + >>> /* >>> * Adds "n" to the eventfd counter "count". Returns "n" in case of >>> * success, or a value lower then "n" in case of coutner overflow. >>> >>> >> This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. >> >> > > As an aside, this is something I would like to address. I keep running > into this pattern where I could do something in-line if I had a > "can_sleep()" context. Otherwise, I have to punt to something like a > workqueue which adds latency. 
The closest thing I have to "can_sleep()" > is preemptible(), which, as you correctly pointed out is limited to only > working with CONFIG_PREEMPT=y. > > Its been a while since I looked into it, but one of the barriers that > would need to be overcome is the fact that the preempt_count stuff gets > compiled away with CONFIG_PREEMPT=n. It is conceivable that we could > make the preempt_count logic an independent config variable from > CONFIG_PREEMPT to provide a can_sleep() macro without requiring > full-blow preemption to be enabled. So my questions would be as follows: > > a) Is the community conducive to such an idea? > b) Are there other things to consider/fix besides the lack of > preempt_count in order to implement such a beast? > > -Greg > > > Hi Ingo, Any thoughts here? -Greg
Re: strange guest slowness after some time
Felix Leimbach wrote: It's exactly the same CPU I have. Interesting: for the past two months I've been running on 2 Shanghai quad-cores instead, and the problem is definitely gone. The rest of the hardware as well as the whole software stack remained unchanged. That should confirm what we assumed already. For me, it turned out that the KVM I was running (shipped with Proxmox VE) had a "fairsched" patch (OpenVZ-related) which caused this broken behaviour. -- Tomasz Chmielewski http://wpkg.org -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM: init bsp_vcpu before kvm_arch_vcpu_init
On x86 mp_state is initialized by kvm_arch_vcpu_init. Right now kvm_vcpu_is_bsp returns false because kvm->bsp_vcpu has not been initialized, so vcpu_id == 0 ends up with mp_state == KVM_MP_STATE_UNINITIALIZED. Gleb, do you see a better way to fix this?

Signed-off-by: Marcelo Tosatti

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8939ffa..7225064 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -773,6 +773,13 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	struct page *page;
 	int r;
 
+	mutex_lock(&kvm->lock);
+#ifdef CONFIG_KVM_APIC_ARCHITECTURE
+	if (kvm->bsp_vcpu_id == id)
+		kvm->bsp_vcpu = vcpu;
+#endif
+	mutex_unlock(&kvm->lock);
+
 	mutex_init(&vcpu->mutex);
 	vcpu->cpu = -1;
 	vcpu->kvm = kvm;
@@ -1760,14 +1767,12 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	smp_wmb();
 	atomic_inc(&kvm->online_vcpus);
 
-#ifdef CONFIG_KVM_APIC_ARCHITECTURE
-	if (kvm->bsp_vcpu_id == id)
-		kvm->bsp_vcpu = vcpu;
-#endif
 	mutex_unlock(&kvm->lock);
 	return r;
 
 vcpu_destroy:
+	if (kvm->bsp_vcpu_id == id)
+		kvm->bsp_vcpu = NULL;
 	mutex_unlock(&kvm->lock);
 	kvm_arch_vcpu_destroy(vcpu);
 	return r;

-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
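For illustration, the ordering problem in miniature; this assumes, as the changelog implies, that kvm_vcpu_is_bsp() tests kvm->bsp_vcpu, and the helper below is a reconstruction rather than the exact tree:

static inline int kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
{
	/* Reconstruction for illustration; bsp_vcpu is NULL until assigned. */
	return vcpu->kvm->bsp_vcpu == vcpu;
}

Before the patch, kvm_arch_vcpu_init() ran while kvm->bsp_vcpu was still NULL, so even vcpu_id 0 failed the BSP check and was left in KVM_MP_STATE_UNINITIALIZED; moving the assignment into kvm_vcpu_init() makes the check succeed in time.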
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 10:11:08AM -0400, Gregory Haskins wrote: > Michael S. Tsirkin wrote: > > On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > > > >> irqfd and its underlying implementation, eventfd, currently utilize > >> the embedded wait-queue in eventfd for signal notification. The nice thing > >> about this design decision is that it re-uses the existing > >> eventfd/wait-queue code and it generally works wellwith several > >> limitations. > >> > >> One of the limitations is that notification callbacks are always called > >> inside a spin_lock_irqsave critical section. Another limitation is > >> that it is very difficult to build a system that can recieve release > >> notification without being racy. > >> > >> Therefore, we introduce a new registration interface that is SRCU based > >> instead of wait-queue based, and implement the internal wait-queue > >> infrastructure in terms of this new interface. We then convert irqfd > >> to use this new interface instead of the existing wait-queue code. > >> > >> The end result is that we now have the opportunity to run the interrupt > >> injection code serially to the callback (when the signal is raised from > >> process-context, at least) instead of always deferring the injection to a > >> work-queue. > >> > >> Signed-off-by: Gregory Haskins > >> CC: Paul E. McKenney > >> CC: Davide Libenzi > >> --- > >> > >> fs/eventfd.c| 115 > >> +++ > >> include/linux/eventfd.h | 30 > >> virt/kvm/eventfd.c | 114 > >> +-- > >> 3 files changed, 188 insertions(+), 71 deletions(-) > >> > >> diff --git a/fs/eventfd.c b/fs/eventfd.c > >> index 72f5f8d..505d5de 100644 > >> --- a/fs/eventfd.c > >> +++ b/fs/eventfd.c > >> @@ -30,8 +30,47 @@ struct eventfd_ctx { > >> */ > >>__u64 count; > >>unsigned int flags; > >> + struct srcu_struct srcu; > >> + struct list_head nh; > >> + struct eventfd_notifier notifier; > >> }; > >> > >> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) > >> +{ > >> + struct eventfd_ctx *ctx = container_of(en, > >> + struct eventfd_ctx, > >> + notifier); > >> + > >> + if (waitqueue_active(&ctx->wqh)) > >> + wake_up_poll(&ctx->wqh, POLLIN); > >> +} > >> + > >> +static void _eventfd_notify(struct eventfd_ctx *ctx) > >> +{ > >> + struct eventfd_notifier *en; > >> + int idx; > >> + > >> + idx = srcu_read_lock(&ctx->srcu); > >> + > >> + /* > >> + * The goal here is to allow the notification to be preemptible > >> + * as often as possible. We cannot achieve this with the basic > >> + * wqh mechanism because it requires the wqh->lock. Therefore > >> + * we have an internal srcu list mechanism of which the wqh is > >> + * a client. > >> + * > >> + * Not all paths will invoke this function in process context. > >> + * Callers should check for suitable state before assuming they > >> + * can sleep (such as with preemptible()). Paul McKenney assures > >> + * me that srcu_read_lock is compatible with in-atomic, as long as > >> + * the code within the critical section is also compatible. > >> + */ > >> + list_for_each_entry_rcu(en, &ctx->nh, list) > >> + en->ops->signal(en); > >> + > >> + srcu_read_unlock(&ctx->srcu, idx); > >> +} > >> + > >> /* > >> * Adds "n" to the eventfd counter "count". Returns "n" in case of > >> * success, or a value lower then "n" in case of coutner overflow. > >> > > > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > > false. > > > > Further, to do useful things it might not be enough that you can sleep: > > with iofd you also want to access current task with e.g. 
copy from user. > > > > Here's an idea: let's pass a flag to ->signal, along the lines of > > signal_is_task, that tells us that it is safe to use current, and add > > eventfd_signal_task() which is the same as eventfd_signal but lets everyone > > know that it's safe to both sleep and use current->mm. > > > > Makes sense? > > > > It does make sense, yes. What I am not clear on is how would eventfd > detect this state such as to populate such flags, and why cant the > ->signal() CB do the same? > > Thanks Michael, > -Greg > eventfd can't detect this state. But the callers know in what context they are. So the *caller* of eventfd_signal_task makes sure of this: if you are in a task, you can call eventfd_signal_task() if not, you must call eventfd_signal. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
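Sketched out, the proposed split might look like the following; eventfd_signal() exists today, while __eventfd_signal() and eventfd_signal_task() are hypothetical names for the interface Michael describes, with approximate signatures:

/* Hypothetical plumbing; __eventfd_signal() is an invented helper. */
int __eventfd_signal(struct eventfd_ctx *ctx, int n, bool signal_is_task);

/* Callable from any context; consumers must not sleep or touch current. */
static inline int eventfd_signal(struct eventfd_ctx *ctx, int n)
{
	return __eventfd_signal(ctx, n, false);
}

/* Only from task (syscall/vmexit) context; consumers may sleep and may
 * use current->mm, e.g. for copy_from_user(). */
static inline int eventfd_signal_task(struct eventfd_ctx *ctx, int n)
{
	return __eventfd_signal(ctx, n, true);
}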
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > >> irqfd and its underlying implementation, eventfd, currently utilize >> the embedded wait-queue in eventfd for signal notification. The nice thing >> about this design decision is that it re-uses the existing >> eventfd/wait-queue code and it generally works wellwith several >> limitations. >> >> One of the limitations is that notification callbacks are always called >> inside a spin_lock_irqsave critical section. Another limitation is >> that it is very difficult to build a system that can recieve release >> notification without being racy. >> >> Therefore, we introduce a new registration interface that is SRCU based >> instead of wait-queue based, and implement the internal wait-queue >> infrastructure in terms of this new interface. We then convert irqfd >> to use this new interface instead of the existing wait-queue code. >> >> The end result is that we now have the opportunity to run the interrupt >> injection code serially to the callback (when the signal is raised from >> process-context, at least) instead of always deferring the injection to a >> work-queue. >> >> Signed-off-by: Gregory Haskins >> CC: Paul E. McKenney >> CC: Davide Libenzi >> --- >> >> fs/eventfd.c| 115 >> +++ >> include/linux/eventfd.h | 30 >> virt/kvm/eventfd.c | 114 >> +-- >> 3 files changed, 188 insertions(+), 71 deletions(-) >> >> diff --git a/fs/eventfd.c b/fs/eventfd.c >> index 72f5f8d..505d5de 100644 >> --- a/fs/eventfd.c >> +++ b/fs/eventfd.c >> @@ -30,8 +30,47 @@ struct eventfd_ctx { >> */ >> __u64 count; >> unsigned int flags; >> +struct srcu_struct srcu; >> +struct list_head nh; >> +struct eventfd_notifier notifier; >> }; >> >> +static void _eventfd_wqh_notify(struct eventfd_notifier *en) >> +{ >> +struct eventfd_ctx *ctx = container_of(en, >> + struct eventfd_ctx, >> + notifier); >> + >> +if (waitqueue_active(&ctx->wqh)) >> +wake_up_poll(&ctx->wqh, POLLIN); >> +} >> + >> +static void _eventfd_notify(struct eventfd_ctx *ctx) >> +{ >> +struct eventfd_notifier *en; >> +int idx; >> + >> +idx = srcu_read_lock(&ctx->srcu); >> + >> +/* >> + * The goal here is to allow the notification to be preemptible >> + * as often as possible. We cannot achieve this with the basic >> + * wqh mechanism because it requires the wqh->lock. Therefore >> + * we have an internal srcu list mechanism of which the wqh is >> + * a client. >> + * >> + * Not all paths will invoke this function in process context. >> + * Callers should check for suitable state before assuming they >> + * can sleep (such as with preemptible()). Paul McKenney assures >> + * me that srcu_read_lock is compatible with in-atomic, as long as >> + * the code within the critical section is also compatible. >> + */ >> +list_for_each_entry_rcu(en, &ctx->nh, list) >> +en->ops->signal(en); >> + >> +srcu_read_unlock(&ctx->srcu, idx); >> +} >> + >> /* >> * Adds "n" to the eventfd counter "count". Returns "n" in case of >> * success, or a value lower then "n" in case of coutner overflow. >> > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. > > Further, to do useful things it might not be enough that you can sleep: > with iofd you also want to access current task with e.g. copy from user. > Something else to consider: For iosignalfd, we can assume we will always be called from vcpu process context so we might not really need official affirmation from the system. 
For irqfd, we cannot predict who may be injecting the interrupt (for instance, it might be a PCI-passthrough hard-irq context). I am not sure if this knowledge actually helps to simplify the problem space, but I thought I should mention it. -Greg
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 10:40:55AM -0400, Gregory Haskins wrote: > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > > false. > > > > Further, to do useful things it might not be enough that you can sleep: > > with iofd you also want to access current task with e.g. copy from user. > > > > Something else to consider: For iosignalfd, we can assume we will > always be called from vcpu process context so we might not really need > official affirmation from the system. For irqfd, we cannot predict who > may be injecting the interrupt (for instance, it might be a > PCI-passthrough hard-irq context). I am not sure if this knowledge > actually helps to simplify the problem space, but I thought I should > mention it. > > -Greg > >

The way this is addressed with the eventfd_signal_task proposal is:
- user calls eventfd_signal_task: we look at current->mm and figure out whether this is the right context or we need a context switch
- everyone else calls eventfd_signal: we know that we need a context switch

-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Tue, Jun 16, 2009 at 10:11:08AM -0400, Gregory Haskins wrote: > >> Michael S. Tsirkin wrote: >> >>> On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: >>> >>> irqfd and its underlying implementation, eventfd, currently utilize the embedded wait-queue in eventfd for signal notification. The nice thing about this design decision is that it re-uses the existing eventfd/wait-queue code and it generally works wellwith several limitations. One of the limitations is that notification callbacks are always called inside a spin_lock_irqsave critical section. Another limitation is that it is very difficult to build a system that can recieve release notification without being racy. Therefore, we introduce a new registration interface that is SRCU based instead of wait-queue based, and implement the internal wait-queue infrastructure in terms of this new interface. We then convert irqfd to use this new interface instead of the existing wait-queue code. The end result is that we now have the opportunity to run the interrupt injection code serially to the callback (when the signal is raised from process-context, at least) instead of always deferring the injection to a work-queue. Signed-off-by: Gregory Haskins CC: Paul E. McKenney CC: Davide Libenzi --- fs/eventfd.c| 115 +++ include/linux/eventfd.h | 30 virt/kvm/eventfd.c | 114 +-- 3 files changed, 188 insertions(+), 71 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index 72f5f8d..505d5de 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -30,8 +30,47 @@ struct eventfd_ctx { */ __u64 count; unsigned int flags; + struct srcu_struct srcu; + struct list_head nh; + struct eventfd_notifier notifier; }; +static void _eventfd_wqh_notify(struct eventfd_notifier *en) +{ + struct eventfd_ctx *ctx = container_of(en, + struct eventfd_ctx, + notifier); + + if (waitqueue_active(&ctx->wqh)) + wake_up_poll(&ctx->wqh, POLLIN); +} + +static void _eventfd_notify(struct eventfd_ctx *ctx) +{ + struct eventfd_notifier *en; + int idx; + + idx = srcu_read_lock(&ctx->srcu); + + /* + * The goal here is to allow the notification to be preemptible + * as often as possible. We cannot achieve this with the basic + * wqh mechanism because it requires the wqh->lock. Therefore + * we have an internal srcu list mechanism of which the wqh is + * a client. + * + * Not all paths will invoke this function in process context. + * Callers should check for suitable state before assuming they + * can sleep (such as with preemptible()). Paul McKenney assures + * me that srcu_read_lock is compatible with in-atomic, as long as + * the code within the critical section is also compatible. + */ + list_for_each_entry_rcu(en, &ctx->nh, list) + en->ops->signal(en); + + srcu_read_unlock(&ctx->srcu, idx); +} + /* * Adds "n" to the eventfd counter "count". Returns "n" in case of * success, or a value lower then "n" in case of coutner overflow. >>> This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always >>> false. >>> >>> Further, to do useful things it might not be enough that you can sleep: >>> with iofd you also want to access current task with e.g. copy from user. >>> >>> Here's an idea: let's pass a flag to ->signal, along the lines of >>> signal_is_task, that tells us that it is safe to use current, and add >>> eventfd_signal_task() which is the same as eventfd_signal but lets everyone >>> know that it's safe to both sleep and use current->mm. >>> >>> Makes sense? >>> >>> >> It does make sense, yes. 
What I am not clear on is how would eventfd >> detect this state such as to populate such flags, and why cant the >> ->signal() CB do the same? >> >> Thanks Michael, >> -Greg >> >> > > eventfd can't detect this state. But the callers know in what context they > are. > So the *caller* of eventfd_signal_task makes sure of this: if you are in a > task, > you can call eventfd_signal_task() if not, you must call eventfd_signal. > > > Hmm, this is an interesting idea, but I think it would be problematic in real-world applications for the long-term. For instance, the -rt tree and irq-threads .config option in the process of merging into mainline changes context types for established code. Therefore,
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Gregory Haskins wrote: > Michael S. Tsirkin wrote: > >> On Tue, Jun 16, 2009 at 10:11:08AM -0400, Gregory Haskins wrote: >> >> >>> Michael S. Tsirkin wrote: >>> >>> On Mon, Jun 15, 2009 at 10:29:56PM -0400, Gregory Haskins wrote: > irqfd and its underlying implementation, eventfd, currently utilize > the embedded wait-queue in eventfd for signal notification. The nice > thing > about this design decision is that it re-uses the existing > eventfd/wait-queue code and it generally works wellwith several > limitations. > > One of the limitations is that notification callbacks are always called > inside a spin_lock_irqsave critical section. Another limitation is > that it is very difficult to build a system that can recieve release > notification without being racy. > > Therefore, we introduce a new registration interface that is SRCU based > instead of wait-queue based, and implement the internal wait-queue > infrastructure in terms of this new interface. We then convert irqfd > to use this new interface instead of the existing wait-queue code. > > The end result is that we now have the opportunity to run the interrupt > injection code serially to the callback (when the signal is raised from > process-context, at least) instead of always deferring the injection to a > work-queue. > > Signed-off-by: Gregory Haskins > CC: Paul E. McKenney > CC: Davide Libenzi > --- > > fs/eventfd.c| 115 > +++ > include/linux/eventfd.h | 30 > virt/kvm/eventfd.c | 114 > +-- > 3 files changed, 188 insertions(+), 71 deletions(-) > > diff --git a/fs/eventfd.c b/fs/eventfd.c > index 72f5f8d..505d5de 100644 > --- a/fs/eventfd.c > +++ b/fs/eventfd.c > @@ -30,8 +30,47 @@ struct eventfd_ctx { >*/ > __u64 count; > unsigned int flags; > + struct srcu_struct srcu; > + struct list_head nh; > + struct eventfd_notifier notifier; > }; > > +static void _eventfd_wqh_notify(struct eventfd_notifier *en) > +{ > + struct eventfd_ctx *ctx = container_of(en, > +struct eventfd_ctx, > +notifier); > + > + if (waitqueue_active(&ctx->wqh)) > + wake_up_poll(&ctx->wqh, POLLIN); > +} > + > +static void _eventfd_notify(struct eventfd_ctx *ctx) > +{ > + struct eventfd_notifier *en; > + int idx; > + > + idx = srcu_read_lock(&ctx->srcu); > + > + /* > + * The goal here is to allow the notification to be preemptible > + * as often as possible. We cannot achieve this with the basic > + * wqh mechanism because it requires the wqh->lock. Therefore > + * we have an internal srcu list mechanism of which the wqh is > + * a client. > + * > + * Not all paths will invoke this function in process context. > + * Callers should check for suitable state before assuming they > + * can sleep (such as with preemptible()). Paul McKenney assures > + * me that srcu_read_lock is compatible with in-atomic, as long as > + * the code within the critical section is also compatible. > + */ > + list_for_each_entry_rcu(en, &ctx->nh, list) > + en->ops->signal(en); > + > + srcu_read_unlock(&ctx->srcu, idx); > +} > + > /* > * Adds "n" to the eventfd counter "count". Returns "n" in case of > * success, or a value lower then "n" in case of coutner overflow. > > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always false. Further, to do useful things it might not be enough that you can sleep: with iofd you also want to access current task with e.g. copy from user. 
Here's an idea: let's pass a flag to ->signal, along the lines of signal_is_task, that tells us that it is safe to use current, and add eventfd_signal_task() which is the same as eventfd_signal but lets everyone know that it's safe to both sleep and use current->mm. Makes sense? >>> It does make sense, yes. What I am not clear on is how would eventfd >>> detect this state such as to populate such flags, and why cant the >>> ->signal() CB do the same? >>> >>> Thanks Michael, >>> -Greg >>> >>> >>> >> eventfd can't detect this state. But the callers know in what context they >> are. >> So the *caller* of eventfd_signal_task makes sure of this: if you are in a >> task, >> you can call eventfd_signal_task() if not, you must call eventfd_signal. >> >> >> >> > Hmm, this is an interesting idea, but I think it would be
Re: [PATCH] remove ppc functions from callbacks
On Tue, 2009-06-16 at 09:34 -0400, Glauber Costa wrote: > handle_powerpc_dcr_read() and handle_powerpc_dcr_write() are two > powerpc specific functions that are called via libkvm callbacks. > However, grepping the source code finds simply no use of them. This > is probably due to the fact that powerpc now relies on a totally > upstream qemu implementation of kvm, and does not need them anymore. > > Anyway, I'm providing this patch separately, so that if it breaks > for whatever reason, we can identify a bisection point easily. > > Signed-off-by: Glauber Costa > CC: Hollis Blanchard Yup, this path is handled via kvm_arch_handle_exit() now. Thanks. Acked-by: Hollis Blanchard -- Hollis Blanchard IBM Linux Technology Center -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 10:48:27AM -0400, Gregory Haskins wrote: > +static void _eventfd_notify(struct eventfd_ctx *ctx) > +{ > +struct eventfd_notifier *en; > +int idx; > + > +idx = srcu_read_lock(&ctx->srcu); > + > +/* > + * The goal here is to allow the notification to be preemptible > + * as often as possible. We cannot achieve this with the basic > + * wqh mechanism because it requires the wqh->lock. Therefore > + * we have an internal srcu list mechanism of which the wqh is > + * a client. > + * > + * Not all paths will invoke this function in process context. > + * Callers should check for suitable state before assuming they > + * can sleep (such as with preemptible()). Paul McKenney > assures > + * me that srcu_read_lock is compatible with in-atomic, as long > as > + * the code within the critical section is also compatible. > + */ > +list_for_each_entry_rcu(en, &ctx->nh, list) > +en->ops->signal(en); > + > +srcu_read_unlock(&ctx->srcu, idx); > +} > + > /* > * Adds "n" to the eventfd counter "count". Returns "n" in case of > * success, or a value lower then "n" in case of coutner overflow. > > > >>> This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > >>> false. > >>> > >>> Further, to do useful things it might not be enough that you can sleep: > >>> with iofd you also want to access current task with e.g. copy from user. > >>> > >>> Here's an idea: let's pass a flag to ->signal, along the lines of > >>> signal_is_task, that tells us that it is safe to use current, and add > >>> eventfd_signal_task() which is the same as eventfd_signal but lets > >>> everyone > >>> know that it's safe to both sleep and use current->mm. > >>> > >>> Makes sense? > >>> > >>> > >> It does make sense, yes. What I am not clear on is how would eventfd > >> detect this state such as to populate such flags, and why cant the > >> ->signal() CB do the same? > >> > >> Thanks Michael, > >> -Greg > >> > >> > > > > eventfd can't detect this state. But the callers know in what context they > > are. > > So the *caller* of eventfd_signal_task makes sure of this: if you are in a > > task, > > you can call eventfd_signal_task() if not, you must call eventfd_signal. > > > > > > > Hmm, this is an interesting idea, but I think it would be problematic in > real-world applications for the long-term. For instance, the -rt tree > and irq-threads .config option in the process of merging into mainline > changes context types for established code. Therefore, what might be > "hardirq/softirq" logic today may execute in a kthread tomorrow. That's OK, it's always safe to call eventfd_signal: eventfd_signal_task is just an optimization. I think everyone not in the context of a system call or vmexit can just call eventfd_signal_task. > I > think its dangerous to try to solve the problem with caller provided > info: the caller may be ignorant of its true state. I assume this wasn't clear enough: the idea is that you only calls eventfd_signal_task if you know you are on a systemcall path. If you are ignorant of the state, call eventfd_signal. > IMO, the ideal > solution needs to be something we can detect at run-time. > > Thanks Michael, > -Greg > -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-AUTOTEST PATCH 2/4] kvm_guest_wizard: pass 'params' directly to barrier_2()
On 06/15/2009 10:45 PM, Michael Goldish wrote: Currently parameters for barrier_2() are extracted from 'params' in the main run_steps() test routine, and then passed to barrier_2(). Instead, let barrier_2() extract parameters from 'params' as it sees fit. This will make adding new parameters slightly easier and cleaner.

Signed-off-by: Michael Goldish

--- client/tests/kvm/kvm_guest_wizard.py | 37 - 1 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/client/tests/kvm/kvm_guest_wizard.py b/client/tests/kvm/kvm_guest_wizard.py
index 143e61e..eb0e2d5 100644
--- a/client/tests/kvm/kvm_guest_wizard.py
+++ b/client/tests/kvm/kvm_guest_wizard.py
@@ -41,6 +41,18 @@ def barrier_2(vm, words, fail_if_stuck_for, stuck_detection_history,
                                              "cropped_scrdump_expected.ppm")
     comparison_filename = os.path.join(debug_dir, "comparison.ppm")
 
+    fail_if_stuck_for = params.get("fail_if_stuck_for")
+    if fail_if_stuck_for:
+        fail_if_stuck_for = float(fail_if_stuck_for)
+    else:
+        fail_if_stuck_for = 1e308
+
+    stuck_detection_history = params.get("stuck_detection_history")
+    if stuck_detection_history:
+        stuck_detection_history = int(stuck_detection_history)
+    else:
+        stuck_detection_history = 2
+

Could be simplified by (preferably in a separate patch):

    fail_if_stuck_for = params.get("fail_if_stuck_for", 1e308)
    fail_if_stuck_for = float(fail_if_stuck_for)

    stuck_detection_history = params.get("stuck_detection_history", 2)
    stuck_detection_history = int(stuck_detection_history)

or even with a single line for each param:

    var = cast(params.get(var_name, default_value))

Regards, Uri -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Activate Virtualization On Demand v2
On 06/16/2009 05:08 PM, Alexander Graf wrote: Please tell me you tested suspend/resume with/without VMs and cpu hotunplug/hotplug. I tested cpu hotplugging. On the last round I tested suspend/resume, but this time I couldn't because my machine can't do suspend :-(. So I'll try hard and find a machine I can test it on for the next round. I can test suspend/resume for you if you don't have a friendly machine. I have a personal interest in keeping it working :) -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 10:54:28AM -0400, Gregory Haskins wrote: > > +static void _eventfd_notify(struct eventfd_ctx *ctx) > > +{ > > + struct eventfd_notifier *en; > > + int idx; > > + > > + idx = srcu_read_lock(&ctx->srcu); > > + > > + /* > > +* The goal here is to allow the notification to be preemptible > > +* as often as possible. We cannot achieve this with the basic > > +* wqh mechanism because it requires the wqh->lock. Therefore > > +* we have an internal srcu list mechanism of which the wqh is > > +* a client. > > +* > > +* Not all paths will invoke this function in process context. > > +* Callers should check for suitable state before assuming they > > +* can sleep (such as with preemptible()). Paul McKenney > > assures > > +* me that srcu_read_lock is compatible with in-atomic, as long > > as > > +* the code within the critical section is also compatible. > > +*/ > > + list_for_each_entry_rcu(en, &ctx->nh, list) > > + en->ops->signal(en); > > + > > + srcu_read_unlock(&ctx->srcu, idx); > > +} > > + > > /* > > * Adds "n" to the eventfd counter "count". Returns "n" in case of > > * success, or a value lower then "n" in case of coutner overflow. > > > > > > > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > false. > > Further, to do useful things it might not be enough that you can sleep: > with iofd you also want to access current task with e.g. copy from user. > > Here's an idea: let's pass a flag to ->signal, along the lines of > signal_is_task, that tells us that it is safe to use current, and add > eventfd_signal_task() which is the same as eventfd_signal but lets > everyone > know that it's safe to both sleep and use current->mm. > > Makes sense? > > > > >>> It does make sense, yes. What I am not clear on is how would eventfd > >>> detect this state such as to populate such flags, and why cant the > >>> ->signal() CB do the same? > >>> > >>> Thanks Michael, > >>> -Greg > >>> > >>> > >>> > >> eventfd can't detect this state. But the callers know in what context they > >> are. > >> So the *caller* of eventfd_signal_task makes sure of this: if you are in a > >> task, > >> you can call eventfd_signal_task() if not, you must call eventfd_signal. > >> > >> > >> > >> > > Hmm, this is an interesting idea, but I think it would be problematic in > > real-world applications for the long-term. For instance, the -rt tree > > and irq-threads .config option in the process of merging into mainline > > changes context types for established code. Therefore, what might be > > "hardirq/softirq" logic today may execute in a kthread tomorrow. I > > think its dangerous to try to solve the problem with caller provided > > info: the caller may be ignorant of its true state. > > Also, we need to consider that a process context can still be in-atomic > if the user did something like disabled interrupts, preemption, used a > spinlock, etc, before calling the eventfd_signal_task() function. > Perhaps we can put a stake in the ground that says you must not call > this from atomic context, That's the ticket. > but I still prefer just being able to detect > this from our state. > > -Greg > > -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote: > On Tue, Jun 16, 2009 at 10:48:27AM -0400, Gregory Haskins wrote: > >> +static void _eventfd_notify(struct eventfd_ctx *ctx) >> +{ >> +struct eventfd_notifier *en; >> +int idx; >> + >> +idx = srcu_read_lock(&ctx->srcu); >> + >> +/* >> + * The goal here is to allow the notification to be preemptible >> + * as often as possible. We cannot achieve this with the basic >> + * wqh mechanism because it requires the wqh->lock. Therefore >> + * we have an internal srcu list mechanism of which the wqh is >> + * a client. >> + * >> + * Not all paths will invoke this function in process context. >> + * Callers should check for suitable state before assuming they >> + * can sleep (such as with preemptible()). Paul McKenney >> assures >> + * me that srcu_read_lock is compatible with in-atomic, as long >> as >> + * the code within the critical section is also compatible. >> + */ >> +list_for_each_entry_rcu(en, &ctx->nh, list) >> +en->ops->signal(en); >> + >> +srcu_read_unlock(&ctx->srcu, idx); >> +} >> + >> /* >> * Adds "n" to the eventfd counter "count". Returns "n" in case of >> * success, or a value lower then "n" in case of coutner overflow. >> >> >> > This is ugly, isn't it? With CONFIG_PREEMPT=no preemptible() is always > false. > > Further, to do useful things it might not be enough that you can sleep: > with iofd you also want to access current task with e.g. copy from user. > > Here's an idea: let's pass a flag to ->signal, along the lines of > signal_is_task, that tells us that it is safe to use current, and add > eventfd_signal_task() which is the same as eventfd_signal but lets > everyone > know that it's safe to both sleep and use current->mm. > > Makes sense? > > > It does make sense, yes. What I am not clear on is how would eventfd detect this state such as to populate such flags, and why cant the ->signal() CB do the same? Thanks Michael, -Greg >>> eventfd can't detect this state. But the callers know in what context they >>> are. >>> So the *caller* of eventfd_signal_task makes sure of this: if you are in a >>> task, >>> you can call eventfd_signal_task() if not, you must call eventfd_signal. >>> >>> >>> >>> >> Hmm, this is an interesting idea, but I think it would be problematic in >> real-world applications for the long-term. For instance, the -rt tree >> and irq-threads .config option in the process of merging into mainline >> changes context types for established code. Therefore, what might be >> "hardirq/softirq" logic today may execute in a kthread tomorrow. >> > > That's OK, it's always safe to call eventfd_signal: eventfd_signal_task is > just > an optimization. I think everyone not in the context of a system call or > vmexit > can just call eventfd_signal_task. > I assume you meant s/eventfd_signal_task/eventfd_signal there? > >> I >> think its dangerous to try to solve the problem with caller provided >> info: the caller may be ignorant of its true state. >> > > I assume this wasn't clear enough: the idea is that you only > calls eventfd_signal_task if you know you are on a systemcall path. > If you are ignorant of the state, call eventfd_signal. > Well, its not a matter of correctness. Its more for optimal performance. If I have PCI pass-though injecting interrupts from hardirq in mainline, clearly eventfd_signal() is proper. In -rt, the hardirq is transparently converted to a kthread, so technically eventfd_signal_task() would work (at least for the can_sleep() part, not for current->mm per se). 
But in this case, the PCI logic would not know it was converted to a kthread. It all happens transparently in the low-level code and the pci code is unmodified. In this case, your proposal would have the passthrough path invoking irqfd with eventfd_signal(). It would therefore still shunt to a workqueue to inject the interrupt, even though it would have been perfectly fine to just inject it directly because taking mutex_lock(&kvm->irq_lock) is legal. Perhaps I am over-optimizing, but this is the scenario I am concerned about and what I was trying to address with preemptible()/can_sleep(). I think your idea is a good one to address the current->mm portion. It would only ever be safe to access the MM context from syscall/vmexit context, as you point out. Therefore, I see no problem with implementing something like iosignalfd with eventfd_signal_task(). But accessing current->mm is only a subset of
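Schematically, the fast path being argued for would dispatch on context, injecting inline when sleeping is legal and deferring otherwise; can_sleep() is the hypothetical helper from earlier in the thread, and the work item mirrors irqfd's existing deferred-injection structure:

/* Schematic only; can_sleep() is not an existing kernel API. */
static void irqfd_signal_path(struct _irqfd *irqfd)
{
	if (can_sleep())
		/* Task or threaded-irq context: taking the kvm->irq_lock
		 * mutex is legal, so inject directly. */
		irqfd_inject(&irqfd->inject);
	else
		/* Hardirq/atomic context: defer to process context. */
		schedule_work(&irqfd->inject);
}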
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 11:20:18AM -0400, Gregory Haskins wrote: > >>> eventfd can't detect this state. But the callers know in what context > >>> they are. > >>> So the *caller* of eventfd_signal_task makes sure of this: if you are in > >>> a task, > >>> you can call eventfd_signal_task() if not, you must call eventfd_signal. > >>> > >>> > >>> > >>> > >> Hmm, this is an interesting idea, but I think it would be problematic in > >> real-world applications for the long-term. For instance, the -rt tree > >> and irq-threads .config option in the process of merging into mainline > >> changes context types for established code. Therefore, what might be > >> "hardirq/softirq" logic today may execute in a kthread tomorrow. > >> > > > > That's OK, it's always safe to call eventfd_signal: eventfd_signal_task is > > just > > an optimization. I think everyone not in the context of a system call or > > vmexit > > can just call eventfd_signal_task. > > > > > I assume you meant s/eventfd_signal_task/eventfd_signal there? Yea. Sorry. > > > >> I > >> think its dangerous to try to solve the problem with caller provided > >> info: the caller may be ignorant of its true state. > >> > > > > I assume this wasn't clear enough: the idea is that you only > > calls eventfd_signal_task if you know you are on a systemcall path. > > If you are ignorant of the state, call eventfd_signal. > > > > Well, its not a matter of correctness. Its more for optimal > performance. If I have PCI pass-though injecting interrupts from > hardirq in mainline, clearly eventfd_signal() is proper. In -rt, the > hardirq is transparently converted to a kthread, so technically > eventfd_signal_task() would work I think it's wrong to sleep for a long time while handling interrupts even if you technically can with threaded interrupts. For that matter, I think you can sleep while within a spinlock if preempt is on, but that does not mean you should - and I think you will get warnings in the log if you do. No? > (at least for the can_sleep() part, not > for current->mm per se). But in this case, the PCI logic would not know > it was converted to a kthread. It all happens transparently in the > low-level code and the pci code is unmodified. > > In this case, your proposal would have the passthrough path invoking > irqfd with eventfd_signal(). It would therefore still shunt to a > workqueue to inject the interrupt, even though it would have been > perfectly fine to just inject it directly because taking > mutex_lock(&kvm->irq_lock) is legal. This specific issue should just be addressed by making it possible to inject the interrupt from an atomic context. > Perhaps I am over-optimizing, but > this is the scenario I am concerned about and what I was trying to > address with preemptible()/can_sleep(). What, a path that is never invoked without threaded interrupts? Yes, I would say it currently looks like an over-optimization. > I think your idea is a good one to address the current->mm portion. It > would only ever be safe to access the MM context from syscall/vmexit > context, as you point out. Therefore, I see no problem with > implementing something like iosignalfd with eventfd_signal_task(). > > But accessing current->mm is only a subset of the use-cases. The other > use-cases would include the ability to sleep, and possibly the ability > to address other->mm. For these latter cases, I really only need the > "can_sleep()" behavior, not the full blown "can_access_current_mm()". 
> Additionally, the eventfd_signal_task() data, at least for iosignalfd, is
> superfluous: I already know that I can access current->mm by virtue of
> the design.

Maybe I misunderstand what iosignalfd is - isn't it true that you get an
eventfd and kvm will signal it when the guest performs an I/O write to a
specific address, and then drivers can get the eventfd and poll it to detect
these writes? If yes, you have no way to know that the other end of the
eventfd is connected to kvm, so you don't know you can access current->mm.

> So since I cannot use it accurately for the hardirq/threaded-irq type
> case, and I don't actually need it for the iosignalfd case, I am not
> sure it's the right direction (at least for now). I do think it might
> have merit for syscall/vmexit uses outside of iosignalfd, but I do not
> see a current use-case for it, so perhaps it can wait until one arises.
>
> -Greg

But it addresses the CONFIG_PREEMPT off case, which your design doesn't
seem to.

> >> IMO, the ideal solution needs to be something we can detect at run-time.
> >>
> >> Thanks Michael,
> >> -Greg
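[Editorial note: to make the convention under debate concrete, here is a
minimal sketch of the caller-side dispatch Michael describes. It assumes the
RFC's eventfd_signal_task(), which was never merged - mainline only has
eventfd_signal() - and the signatures are schematic for this era of the code:]

    /*
     * Sketch only: eventfd_signal_task() is the RFC's proposal, not a
     * mainline API; eventfd_signal() is the always-safe entry point.
     */
    static void notify_guest(struct eventfd_ctx *ctx,
                             bool on_syscall_or_vmexit_path)
    {
            if (on_syscall_or_vmexit_path)
                    /* caller vouches: we run in the signalling task */
                    eventfd_signal_task(ctx, 1);
            else
                    /* always safe, works from any context */
                    eventfd_signal(ctx, 1);
    }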
Re: [KVM-AUTOTEST PATCH 2/4] kvm_guest_wizard: pass 'params' directly to barrier_2()
- "Uri Lublin" wrote: > On 06/15/2009 10:45 PM, Michael Goldish wrote: > > Currently parameters for barrier_2() are extracted from 'params' in > the main > > run_steps() test routine, and then passed to barrier_2(). > > Instead, let barrier_2() extract parameters from 'params' as it sees > fit. > > This will make adding new parameters slightly easier and cleaner. > > > > Signed-off-by: Michael Goldish > > --- > > client/tests/kvm/kvm_guest_wizard.py | 37 > - > > 1 files changed, 18 insertions(+), 19 deletions(-) > > > > diff --git a/client/tests/kvm/kvm_guest_wizard.py > b/client/tests/kvm/kvm_guest_wizard.py > > index 143e61e..eb0e2d5 100644 > > --- a/client/tests/kvm/kvm_guest_wizard.py > > +++ b/client/tests/kvm/kvm_guest_wizard.py > > > > @@ -41,6 +41,18 @@ def barrier_2(vm, words, fail_if_stuck_for, > stuck_detection_history, > > > "cropped_scrdump_expected.ppm") > > comparison_filename = os.path.join(debug_dir, > "comparison.ppm") > > > > +fail_if_stuck_for = params.get("fail_if_stuck_for") > > +if fail_if_stuck_for: > > +fail_if_stuck_for = float(fail_if_stuck_for) > > +else: > > +fail_if_stuck_for = 1e308 > > + > > +stuck_detection_history = > params.get("stuck_detection_history") > > +if stuck_detection_history: > > +stuck_detection_history = int(stuck_detection_history) > > +else: > > +stuck_detection_history = 2 > > + > > Could be simplified by (preferably in a separate patch): > fail_if_stuck_for = params.get("fail_if_stuck_for", 1e308) > fail_if_stuck_for = float(fail_if_stuck_for) > > stuck_detection_history = > params.get("stuck_detection_history", 2) > stuck_detection_history = int(stuck_detection_history) > > or even with a single line for each param: > var = cast(params.get(var_name, default_value)) But what if the user wants to disable the feature or use the default value by setting fail_if_stuck_for = ''? This is required, for example, when the param is set to 300 for all tests, and the user wants to disable it only for Fedora tests. In that case params.get("fail_if_stuck_for", 1e308) would return '' and the following cast would raise an exception. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote:
> On Tue, Jun 16, 2009 at 11:20:18AM -0400, Gregory Haskins wrote:
> [...]
>> I assume you meant s/eventfd_signal_task/eventfd_signal there?
>
> Yea. Sorry.

np. I knew what you meant ;)

> [...]
>> Well, it's not a matter of correctness. It's more about optimal
>> performance. If I have PCI pass-through injecting interrupts from
>> hardirq in mainline, clearly eventfd_signal() is proper. In -rt, the
>> hardirq is transparently converted to a kthread, so technically
>> eventfd_signal_task() would work
>
> I think it's wrong to sleep for a long time while handling interrupts even
> if you technically can with threaded interrupts.

Well, this is somewhat of an orthogonal issue so I don't want to open this
can of worms per se. But one of the long-term goals of the threaded-irq
construct is to eliminate the need for the traditional "bh" contexts to do
work. The idea, of course, is that the threaded-irq context can do all of
the work traditionally shunted to tasklets/softirqs/workqueues directly, so
why bother switching contexts? So the short answer is that it's not
necessarily wrong to sleep or to do significant processing from a
threaded-irq.

In any case, threaded-irqs are just one example. I will try to highlight
others below.

> For that matter, I think you can sleep while within a spinlock if preempt
> is on

Yes, in fact the spinlocks are mutexes in this mode, so the locks themselves
can sleep.

> , but that does not mean you should - and I think you will get warnings in
> the log if you do. No?

Nope, sleeping is fine (voluntary or involuntary). The idea is it's all
governed by priority, and priority-inheritance, and a scheduler that is free
to make fine-grained decisions (due to the broadly preemptible kernel). But
this is getting off-topic so I will stop.

>> (at least for the can_sleep() part, not for current->mm per se). But in
>> this case, the PCI logic would not know it was converted to a kthread.
>> It all happens transparently in the low-level code and the pci code is
>> unmodified.
>>
>> In this case, your proposal would have the passthrough path invoking
>> irqfd with eventfd_signal(). It would therefore still shunt to a
>> workqueue to inject the interrupt, even though it would have been
>> perfectly fine to just inject it directly, because taking
>> mutex_lock(&kvm->irq_lock) is legal.
> This specific issue should just be addressed by making it possible
> to inject the interrupt from an atomic context.

I assume you mean s/mutex_lock(&kvm->irq_lock)/spin_lock(&kvm->irq_lock)?
If so, I agree this is a good idea.

TBH: I am more concerned about the iosignalfd path w.r.t. the preemptibility
of the callback. We can optimize the virtio-net interface, for instance,
once we make this ->signal() callback preemptible. Having irqfd ->signal()
preemptible is just a bonus, but we could work around it by fixing irq_lock
as well, I agree.

>> Perhaps I am over-optimizing, but this is the scenario I am concerned
>> about and what I was trying to address with preemptible()/can_sleep().
>
> What, a path that is never invoked without threaded interrupts?
> Yes, I would say it currently looks like an over-optimization.

You are only seeing part of the picture though. This is a cascading pattern.

>> I think your idea is a good one to address the current->mm portion. It
>> would only ever be safe to access the MM context from syscall/vmexit
>> context, as you point out. Therefore, I see no problem with
>> implementing something like iosignalfd with eventfd_signal_task().
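[Editorial note: to make the disputed primitive concrete, a minimal sketch of
the run-time test Gregory keeps referring to. can_sleep() is his hypothetical
helper, not an existing kernel symbol; the sketch also shows Michael's
objection, since preemptible() is hardwired to 0 when CONFIG_PREEMPT is off:]

    #include <linux/preempt.h>
    #include <linux/irqflags.h>

    /*
     * Hypothetical helper from the discussion. With CONFIG_PREEMPT=y,
     * preemptible() is (preempt_count() == 0 && !irqs_disabled()), so it
     * detects hardirq, softirq and spinlocked sections at run time.
     * Without CONFIG_PREEMPT it evaluates to a constant 0, and every
     * signal would be (needlessly) deferred - the hole pointed out above.
     */
    static inline bool can_sleep(void)
    {
            return preemptible();
    }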
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, 16 Jun 2009, Gregory Haskins wrote:
> Does this all make sense?

This conversation has been *really* long, and I haven't had time to look at
the patch yet. But looking at the amount of changes, and the amount of even
more changes discussed in this thread, there's a very slim chance that I'll
ACK the eventfd code. You may want to consider a solution that does not
litter the eventfd code that much.

- Davide
virtio-blk-pci chokes on some PCI slots
I'm working on making PCI slots configurable in QEMU. While testing the
feature for virtio-blk-pci, I ran into a task hang with Fedora kernel
2.6.27.5-117.fc10. Marcelo tested it with a newer kernel, and will follow up
with details.

Instead of the full QEMU patch I'm working on, the appended little hack
suffices to reproduce the problem.

$ QEMU_VIRTIO_BLK=9 qemu foo.qcow2 -drive file=bar.img,if=virtio -serial stdio
[...]
Loading virtio_pci module
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
virtio-pci :00:04.0: PCI INT A -> Link[LNKD] -> GSI 11 (level, low) -> IRQ 11
virtio-pci :00:09.0: can't derive routing for PCI INT A
virtio-pci :00:09.0: PCI INT A: no GSI - using IRQ 11
Loading virtio_blk module
vda:
[tick-tock-tick-tock...]
<3>INFO: task modprobe:491 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D c04303f1 0 491 1
 c6baeae8 0082 c6baea80 c04303f1 c6bae000 c087667c c0879c00 c0879c00
 c0879c00 c6b659b0 c6b65c24 c1108c00 c1108c00 83a43d86 0002
 c781aaa8 c6baead0 c04418d0 c6b65c24 4d88 c781aaa8 4d88 c6baeb3c
Call Trace:
 [] ? irq_exit+0x5f/0x83
 [] ? getnstimeofday+0x3c/0xc9
 [] ? ktime_get_ts+0x4a/0x4e
 [] io_schedule+0x52/0x8a
 [] sync_page+0x46/0x4c
 [] __wait_on_bit_lock+0x34/0x5e
 [] ? sync_page+0x0/0x4c
 [] __lock_page+0x78/0x81
 [] ? wake_bit_function+0x0/0x43
 [] lock_page+0x2c/0x2f
 [] read_cache_page_async+0xa4/0xfb
 [] ? blkdev_readpage+0x0/0x11
 [] read_cache_page+0xc/0x3c
 [] read_dev_sector+0x34/0x6a
 [] ? efi_partition+0x0/0x676
 [] read_lba+0x69/0xc6
 [] ? efi_partition+0x84/0x676
 [] ? efi_partition+0x0/0x676
 [] efi_partition+0xa3/0x676
 [] ? vprintk+0x2c8/0x2f3
 [] ? iget5_locked+0x33/0x114
 [] ? snprintf+0x15/0x17
 [] ? efi_partition+0x0/0x676
 [] rescan_partitions+0x106/0x242
 [] ? need_resched+0x18/0x22
 [] ? _cond_resched+0x8/0x32
 [] do_open+0x1ff/0x290
 [] __blkdev_get+0x71/0x7c
 [] blkdev_get+0xd/0xf
 [] register_disk+0xf0/0x141
 [] add_disk+0x34/0x89
 [] ? exact_match+0x0/0xb
 [] ? exact_lock+0x0/0x11
 [] virtblk_probe+0x275/0x2a8 [virtio_blk]
 [] virtio_dev_probe+0x89/0xb3
 [] driver_probe_device+0xa0/0x136
 [] __driver_attach+0x3a/0x59
 [] bus_for_each_dev+0x3b/0x63
 [] driver_attach+0x14/0x16
 [] ? __driver_attach+0x0/0x59
 [] bus_add_driver+0x9d/0x1ba
 [] driver_register+0x81/0xe1
 [] ? register_blkdev+0xc8/0xdc
 [] register_virtio_driver+0x1f/0x21
 [] init+0x22/0x24 [virtio_blk]
 [] _stext+0x3d/0x115
 [] ? init+0x0/0x24 [virtio_blk]
 [] sys_init_module+0x87/0x178
 [] syscall_call+0x7/0xb
 ===

Some slots work, e.g. 5, some don't, e.g. 6.

What's going on here?

diff --git a/hw/pc.c b/hw/pc.c
index 143b697..15c44ac 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1146,7 +1146,8 @@ static void pc_init1(ram_addr_t ram_size,
     int unit_id = 0;

     while ((index = drive_get_index(IF_VIRTIO, 0, unit_id)) != -1) {
-        pci_create_simple(pci_bus, -1, "virtio-blk-pci");
+        char *devfn = getenv("QEMU_VIRTIO_BLK");
+        pci_create_simple(pci_bus, devfn ? atoi(devfn) * 8 : -1, "virtio-blk-pci");
         unit_id++;
     }
 }
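[Editorial note: for context on the "* 8" in the hack, a PCI devfn packs the
slot (device) number in the upper five bits and the function number in the
lower three, so slot N, function 0 is N * 8. This mirrors the PCI_DEVFN
macros from Linux's <linux/pci.h>; the environment variable is the hack's own:]

    /* devfn layout: bits 7:3 = slot (device), bits 2:0 = function. */
    #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
    #define PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
    #define PCI_FUNC(devfn)         ((devfn) & 0x07)

    /* So QEMU_VIRTIO_BLK=9 asks for devfn PCI_DEVFN(9, 0) == 72. */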
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Davide Libenzi wrote:
> On Tue, 16 Jun 2009, Gregory Haskins wrote:
>> Does this all make sense?
>
> This conversation has been *really* long, and I haven't had time to look
> at the patch yet. But looking at the amount of changes, and the amount of
> even more changes discussed in this thread, there's a very slim chance
> that I'll ACK the eventfd code. You may want to consider a solution that
> does not litter the eventfd code that much.
>
> - Davide

Hi Davide,

I understand your position and value your time/insight into looking at these
things. Despite the current ongoing discussion, I still stand by the current
patch as my proposed solution (though I have yet to convince Michael). But in
any case, if you have the time, please look it over, because I still think
it's the right direction to head in.

The general solution is that we use an srcu list instead of the wait-queue,
and that's really it. If we can't eliminate that spinlock held over the
notification, it has usability implications at least for irqfd/iosignalfd.
The only way I can think of to solve the problem without modifying eventfd is
to not use eventfd at all. :(

Since eventfd captures the concept we are going for here really well, reusing
it has a ton of advantages, including interface compatibility and, of course,
code reuse of a tested/debugged code base. Heck, we may hopefully even
improve eventfd for other users in doing this work. It would therefore be a
shame to walk away from it if it can be avoided.

So if what I proposed is not acceptable but you are willing to work with me
to find a solution that is, that would be ideal from my perspective.
Otherwise, I apologize for all the noise. You have been quite the patient and
helpful gentleman with me to date, and I appreciate that.

Kind Regards,
-Greg
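[Editorial note: for readers who haven't seen the patch, the kernel already
ships an SRCU-flavored notifier chain whose callbacks run outside any
spinlock and may therefore sleep - the property the RFC wants for eventfd
callbacks. A rough sketch of that existing API, not the RFC code itself:]

    #include <linux/notifier.h>

    static struct srcu_notifier_head chain;

    static int my_cb(struct notifier_block *nb, unsigned long action, void *data)
    {
            /* Unlike a waitqueue callback invoked under wqh->lock,
             * this callback is allowed to sleep. */
            return NOTIFY_OK;
    }

    static struct notifier_block my_nb = { .notifier_call = my_cb };

    static void example(void)
    {
            srcu_init_notifier_head(&chain);
            srcu_notifier_chain_register(&chain, &my_nb);
            /* caller and callbacks may sleep here: */
            srcu_notifier_call_chain(&chain, 0, NULL);
            srcu_notifier_chain_unregister(&chain, &my_nb);
    }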
Re: virtio-blk-pci chokes on some PCI slots
On Tue, Jun 16, 2009 at 06:47:45PM +0200, Markus Armbruster wrote:
> I'm working on making PCI slots configurable in QEMU. While testing the
> feature for virtio-blk-pci, I ran into a task hang with Fedora kernel
> 2.6.27.5-117.fc10. Marcelo tested it with a newer kernel, and will
> follow up with details.
>
> [...]
>
> Some slots work, e.g. 5, some don't, e.g. 6.
>
> What's going on here?

no softlockup with 2.6.29-06638-g17db0a0, but not very friendly either (note
the backing device has no partitions in my case, perhaps that makes a
difference).
virtio-pci :00:04.0: PCI INT A -> Link[LNKD] -> GSI 10 (level, low) -> IRQ 10
virtio-pci :00:09.0: can't derive routing for PCI INT A
virtio-pci :00:09.0: PCI INT A: no GSI - using IRQ 11
Starting udev: Wait timeout. Will continue in the background.[FAILED]

[r...@guest ~]# dmesg | grep vda
vda:<6>EXT3 FS on dm-0, internal journal
[r...@guest ~]# mount /dev/vda /mnt

Hangs there forever.
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
On Tue, Jun 16, 2009 at 12:17:22PM -0400, Gregory Haskins wrote:
> > Maybe I misunderstand what iosignalfd is - isn't it true that you get an
> > eventfd and kvm will signal it when the guest performs an I/O write to a
> > specific address, and then drivers can get the eventfd and poll it to
> > detect these writes?
>
> Correct.
>
> > If yes, you have no way to know that the other end of the eventfd
> > is connected to kvm, so you don't know you can access current->mm.
>
> Well, my intended use for them *does* know it's connected to KVM.

How does it know? You get a file descriptor and you even figure out it's an
eventfd. Now what? Any process can write there.

> Perhaps there will be others that do not in the future, but like I said
> it could be addressed as those actually arise.
>
> >> So since I cannot use it accurately for the hardirq/threaded-irq type
> >> case, and I don't actually need it for the iosignalfd case, I am not
> >> sure it's the right direction (at least for now). I do think it might
> >> have merit for syscall/vmexit uses outside of iosignalfd, but I do not
> >> see a current use-case for it, so perhaps it can wait until one arises.
> >>
> >> -Greg
> >
> > But it addresses the CONFIG_PREEMPT off case, which your design doesn't
> > seem to.
>
> Well, the problem is that it only addresses it partially, in a limited
> set of circumstances, and doesn't address the broader problem. I'll
> give you an example:
>
> (First off, let's assume that we are not going to have
> eventfd_signal_task(), but rather eventfd_signal() with two option
> flags: EVENTFD_SIGNAL_CANSLEEP and EVENTFD_SIGNAL_CURRENTVALID.)
>
> Today vbus-enet has an rx-thread and a tx-thread, at least partially
> because I need process context to do the copy_to_user(other->mm) type
> stuff (and we will need similar for our virtio-net backend as well). I
> also utilize irqfd to do interrupt injection. Now, since I know that I
> have kthread context, I could modify vbus-enet to inject interrupts with
> EVENTFD_SIGNAL_CANSLEEP set, and this is fine. I know that is safe.
>
> However, the problem is above that! I would like to optimize out the
> tx-thread to begin with, with a similar "can_sleep()" pattern, whenever
> possible.
>
> For instance, if the netif stack calls me to do a transmit and it
> happens to be in a sleepable context, it would be nice to just skip
> waking up the tx-thread. I would therefore just do the
> copy_to_user(other->mm) + irqfd directly in the netif callback, thereby
> skipping the context switch.

What do you mean by copy_to_user(other->mm) here? If you are going to switch
to another mm, then I think current->mm must be valid (I think it's not
enough that you can sleep). So preemptible() might not be enough.

--
MST
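[Editorial note: spelling out the variant Gregory parenthesizes above. The
flags and eventfd_signal_flags() are discussion-only and were never merged;
the signature is hypothetical:]

    /* Hypothetical, from the discussion - not a mainline API. */
    #define EVENTFD_SIGNAL_CANSLEEP     (1 << 0) /* caller may block in callbacks */
    #define EVENTFD_SIGNAL_CURRENTVALID (1 << 1) /* "current" is the signalling task */

    int eventfd_signal_flags(struct eventfd_ctx *ctx, int n, unsigned int flags);

    static void vbus_enet_tx_complete(struct eventfd_ctx *irqfd_ctx)
    {
            /* Per the example: the tx kthread may sleep, but "current"
             * is the kthread, not the guest's process, so
             * EVENTFD_SIGNAL_CURRENTVALID must stay clear. */
            eventfd_signal_flags(irqfd_ctx, 1, EVENTFD_SIGNAL_CANSLEEP);
    }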
Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifier interface
Michael S. Tsirkin wrote:
> On Tue, Jun 16, 2009 at 12:17:22PM -0400, Gregory Haskins wrote:
>>> Maybe I misunderstand what iosignalfd is - isn't it true that you get an
>>> eventfd and kvm will signal it when the guest performs an I/O write to a
>>> specific address, and then drivers can get the eventfd and poll it to
>>> detect these writes?
>>
>> Correct.
>>
>>> If yes, you have no way to know that the other end of the eventfd
>>> is connected to kvm, so you don't know you can access current->mm.
>>
>> Well, my intended use for them *does* know it's connected to KVM.
>
> How does it know? You get a file descriptor and you even figure out it's
> an eventfd. Now what? Any process can write there.

Well, I suppose that is true. Unlike my current hypercall coupling deployed
in vbus v3, anyone can signal the event for the iosignalfd. However, it
wouldn't matter other than looking like an errant signal, as I do not use
"current" for anything. (E.g. all the virtio pointers are stored with the
iosignalfd, independently of the signaller.)

> [...]
>
> What do you mean by copy_to_user(other->mm) here? If you are going to
> switch to another mm, then I think current->mm must be valid (I think
> it's not enough that you can sleep). So preemptible() might not be enough.

I don't currently use switch_mm, if that is what you mean. I save the
task_struct into the appropriate context so current->mm doesn't matter to me.
I never use it. All I need (afaik) is to acquire the proper mutex first. I am
not an MM expert, so perhaps I have this wrong, but it does appear to work
properly even from kthread context.

-Greg
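[Editorial note: a schematic of the "stash the task at setup time" pattern
Gregory describes, using stock mm APIs of that era. Names and structure are
illustrative, not the actual vbus code:]

    #include <linux/mm.h>
    #include <linux/sched.h>

    struct io_ctx {
            struct task_struct *task;  /* the userspace task, captured at setup */
            struct mm_struct   *mm;    /* pinned with get_task_mm() */
            struct mutex        lock;
    };

    static int io_ctx_setup(struct io_ctx *ctx)
    {
            ctx->task = current;
            ctx->mm   = get_task_mm(current); /* takes a ref; mmput() on teardown */
            mutex_init(&ctx->lock);
            return ctx->mm ? 0 : -EINVAL;
    }

    /* Later, from any context that may sleep (e.g. a kthread): */
    static int pin_user_page(struct io_ctx *ctx, unsigned long uaddr,
                             struct page **page)
    {
            int ret;

            mutex_lock(&ctx->lock);            /* why can_sleep() matters */
            down_read(&ctx->mm->mmap_sem);
            ret = get_user_pages(ctx->task, ctx->mm, uaddr, 1, 1, 0,
                                 page, NULL);  /* 2.6.30-era signature */
            up_read(&ctx->mm->mmap_sem);
            mutex_unlock(&ctx->lock);
            return ret;
    }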
Debug message flooding system log
Hi,

Since I installed the latest version of qemu-kvm (0.10.5), the following 3
lines are flooding my system log (dmesg). I have no idea where this is
coming from.

[433019.047800] arch/x86/kvm/x86.c kvm_write_guest_time
[433019.047802] Write (vcpu->hv_clock.system_time): 433019047801180
[433019.047804] FYI (vcpu->hv_clock.tsc_timestamp): 1298967607382796

I tried to look in the mentioned file (x86.c) in the kernel source, but
didn't find anything interesting. I probably activated a debug option
somewhere, but I really don't see where.

As an alternate option, I tried to set "dmesg -n1", which should hide all
messages except panics, but the 3 lines are still appearing at a very fast
rate.

Any idea where this comes from and how I can disable this output?

Thanks

Simon Marchi
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote:
> Another issue is enumeration. Guests will present their devices in the
> order they find them on the pci bus (of course enumeration is guest
> specific). So if I have 2 virtio controllers the only way I can
> distinguish between them is using their pci slots.

virtio controllers really should have a user-supplied string or UUID to
identify them to the guest. Don't they?

-- Jamie
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Mark McLoughlin wrote:
> > After libvirt has done -drive file=foo... it should dump the machine
> > config and use that from then on.
>
> Right - libvirt then wouldn't be able to avoid the complexity of merging
> any future changes into the dumped machine config.

As long as qemu can accept a machine config _and_ -drive file=foo (and
monitor commands to add/remove devices), libvirt could merge by simply
calling qemu with whatever additional command line options or monitor
commands modify the config, then dump the new config.

That way, libvirt would not have to deal with that complexity. It would be
written in one place: qemu. Or better, a utility: qemu-machine-config.

-- Jamie
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Mark McLoughlin wrote:
> > Worst case we hardcode those numbers (gasp, faint).
>
> Maybe we can just add the open slots to the -help output. That'd be nice
> and clean.

Make them part of the machine configuration. After all, they are part of the
machine configuration, and ACPI, BIOS etc. need to know about all the machine
slots anyway.

Having said that, I prefer the idea that slot allocation is handled either in
Qemu, or in a separate utility called qemu-machine-config (for working with
machine configs), or in a library libqemu-machine-config.so.

I particularly don't like the idea of arcane machine-dependent slot
allocation knowledge living in libvirt, because it needs to be in Qemu anyway
for non-libvirt users. No point in having two implementations of something
tricky and likely to have machine quirks, if one will do.

-- Jamie
Re: [libvirt] qemu-kvm-0.10.5, kvm-kmod-2.6.30, libvirt-0.6.4 -save/restore still unreliable
Hello Charles,

> Otherwise -- at the monitor, migrate "exec:cat >somefile.out"

I tried that, saving seems to work OK, but if I restore the domain, kvm
always seems to be stuck; I'm not able to switch to the monitor to do
anything. Running strace on the kvm process shows that the following is
repeating:

clock_gettime(CLOCK_MONOTONIC, {453254, 437209647}) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437244463}) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437295938}) = 0
select(20, [5 10 17 19], [], [], {1, 0}) = 1 (in [5], left {0, 98})
read(5, "\0", 512) = 1
read(5, 0x7fff60506840, 512) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {453254, 437458602}) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437492440}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 3000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437605596}) = 0
clock_gettime(CLOCK_MONOTONIC, {453254, 437639489}) = 0
select(20, [5 10 17 19], [], [], {1, 0}) = 1 (in [19], left {0, 970104})
read(19, "\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
rt_sigaction(SIGALRM, NULL, {0x4083b0, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7f25576e94c0}, 8) = 0
write(6, "\0", 1) = 1
read(19, 0x7fff605069c0, 128) = -1 EAGAIN (Resource temporarily unavailable)

I guess I maybe need to unfreeze the domain somehow after loading it?

> Replace cat with gzip -c or other compression tool (and reverse on
> migrate in) for more efficient on-disk storage.
>
> -Original Message-
> From: Nikola Ciprich [mailto:extmaill...@linuxbox.cz]
> Sent: Saturday, June 13, 2009 1:32 AM
> To: CHARLESDUFFY
> Cc: kvm@vger.kernel.org; nikola.cipr...@linuxbox.cz
> Subject: Re: [libvirt] qemu-kvm-0.10.5, kvm-kmod-2.6.30, libvirt-0.6.4 -save/restore still unreliable
>
> Hmm,
> I just noticed that the dump file saved using virsh contains some libvirt
> related headers, so I guess I can't just feed it directly to qemu-kvm.
> So I guess the best way to test save/restore without libvirt is to save
> the vm state using the monitor and then restore it again.
> But I can't find any clear documentation on how to do this.
> Could anyone give me a hint?
> thanks a lot!
> nik
>
> On Fri, Jun 12, 2009 at 03:17:18PM -0500, Charles Duffy wrote:
> > That may well be a guest issue rather than a qemu migration problem --
> > I've seen similar difficulties for guests being resumed via incoming
> > migration when using guest kernels with TSC as their clocksource.
> >
> > You might try interrogating the monitor to see your guest's state.

--
-
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
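[Editorial note: for anyone reproducing this outside libvirt, the bare
qemu-kvm flow Charles describes looks roughly like the following on a
0.10.x-era monitor; the file path is illustrative. If the restored guest only
seems stuck, it is worth checking "info status" and issuing "cont" before
digging deeper:]

    # save, from the running guest's monitor:
    (qemu) stop
    (qemu) migrate "exec:gzip -c > /tmp/vmstate.gz"
    (qemu) quit

    # restore: identical command line plus -incoming
    $ qemu-kvm <original options> -incoming "exec:gzip -dc /tmp/vmstate.gz"
    (qemu) info status
    (qemu) cont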
Re: [PATCHv4 01/13] qemu: make default_write_config use mask table
Michael S. Tsirkin wrote:
> Change much of hw/pci to use symbolic constants and a table-driven design:
> add a mask table with writable bits set and readonly bits unset. Detect
> change by comparing original and new registers. This makes it easy to
> support capabilities where read-only/writeable bit layout differs between
> devices, depending on capabilities present. As a result, writing a single
> byte in BAR registers now works as it should. Writing to upper limit
> registers in the bridge also works as it should. Code is also shorter.
>
> Signed-off-by: Michael S. Tsirkin

This series introduces a warning (virtio_load decl/def does not match).

--
Regards,

Anthony Liguori
[ANNOUNCE] lio-utils.git v3.0 configfs HOWTO for v2.6.30 released
Greetings all,

The Linux-iSCSI.org Target v3.0 HOWTO, using the new configfs-enabled generic
target subsystem python CLI, has been released and is up and running on
v2.6.30 of the Linux Kernel on x86_64 KVM (Debian), x86_64 bare-metal (SLES)
and PPC64 (Fedora PS3) hosts!

The HOWTO entry can be found at:

http://linux-iscsi.org/index.php/Howto

The v3.0 howto is a WIP and currently expects that the reader is familiar
with building and installing their own LIO target modules within a linux
kernel source. There is currently work being done to integrate this into
virtual machine images to make the v3.0 LIO CLI more accessible for end
users. We are expecting the v3.0 work to be available in a couple of
different forms soon. Until then, interested folks should have a look at the
v3.0 howto and the lio-utils.git source tree for usage information and toward
building their own copy from source.

So far, using the lio-utils.git tools for the Generic Target Core v3.0 with
LIO-Target configfs-enabled symlinks, the code has been able to scale to 1
unique Virtual HBA+FILEIO LUNs with SCSI 3 emulation <-> 1 unique LIO-Target
v3.0 Endpoints inside of a 4GB memory x86_64 KVM guest on recent x86_64
hardware.

Each of these HBA+LUNs and iSCSI Target Endpoints are designed (using the
TCM+LIO configfs kernel infrastructure) to be configured completely
independently of one another, so that creating/changing one does not affect
the real-time configfs ops of another. The configfs CLI commands to reproduce
the entire running setup can be dumped at any time using tcm_dump.py and
lio_dump.py.

The lio-utils.git source repository is for the v3.0 target_core_mod +
iscsi_target_mod configfs-enabled code in lio-core-2.6.git:

http://git.kernel.org/?p=linux/storage/lio/lio-utils.git;a=summary

Additional information can be found here:

http://linux-iscsi.org/index.php/Lio-utils

Questions can be asked via LIO-Devel or LIO-Users:

http://groups.google.com/group/linux-iscsi-target-dev
http://groups.google.com/group/linux-iscsi-target-users

The former is where most folks today (devels+users) are doing most of their
work, and thanks to the many people who have been helping improve the code! I
would like to encourage new end users to start posting in
linux-iscsi-targets-users as they get their v3.0 LIO-Target with tcm_node.py
+ lio_node.py setups up and running, with general questions and comments.

Also, a special thanks to the kernel.org folks (warthog9 and hpa) for their
assistance with getting the lio-utils.git source repository up and running;
it would not be possible without their help.

Thank you!

--nab
[KVM-AUTOTEST PATCH] Adding iperf test
Adding iperf network performance test. Basically it tests networking functionality, stability and performance of guest OSes. This test is cross-platform -- i.e. it works on both Linux and Windows VMs. Signed-off-by: Alexey Eromenko --- client/tests/kvm/kvm.py |1 + client/tests/kvm/kvm_iperf.py | 105 + client/tests/kvm/kvm_tests.cfg.sample |8 +++ 3 files changed, 114 insertions(+), 0 deletions(-) create mode 100644 client/tests/kvm/kvm_iperf.py diff --git a/client/tests/kvm/kvm.py b/client/tests/kvm/kvm.py index 9428162..e1a6e27 100644 --- a/client/tests/kvm/kvm.py +++ b/client/tests/kvm/kvm.py @@ -53,6 +53,7 @@ class kvm(test.test): "autotest": test_routine("kvm_tests", "run_autotest"), "kvm_install": test_routine("kvm_install", "run_kvm_install"), "linux_s3": test_routine("kvm_tests", "run_linux_s3"), +"iperf":test_routine("kvm_iperf", "run_iperf"), } # Make it possible to import modules from the test's bindir diff --git a/client/tests/kvm/kvm_iperf.py b/client/tests/kvm/kvm_iperf.py new file mode 100644 index 000..927c9e5 --- /dev/null +++ b/client/tests/kvm/kvm_iperf.py @@ -0,0 +1,105 @@ +import time, os, logging +from autotest_lib.client.common_lib import utils, error +import kvm_utils + +def run_iperf(test, params, env): +""" +Runs iperf on the guest system and brings back the result. + +@see: http://sourceforge.net/projects/iperf +@param test: kvm test object +@param params: Dictionary with test parameters +@param env: Test environment +""" +vm = kvm_utils.env_get_vm(env, params.get("main_vm")) +if not vm: +message = "VM object not found in environment" +logging.error(message) +raise error.TestError, message +if not vm.is_alive(): +message = "VM seems to be dead; Test requires a living VM" +logging.error(message) +raise error.TestError(message) + +logging.info("Waiting for guest to be up...") + +session = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2) +if not session: +message = "Could not log into guest" +logging.error(message) +raise error.TestFail, message + +logging.info("Logged in") + +# Checking for GuestOS-compatible iPerf binary existence on host. +iperf_binary = params.get("iperf_binary", "misc/iperf") +iperf_duration = params.get("iperf_duration", 5) +iperf_parallel_threads = params.get("iperf_parallel_threads", 1) +iperf_dest_ip = params.get("iperf_dest_ip", "10.0.2.2") +iperf_binary = os.path.join(test.bindir, iperf_binary) +if not os.path.exists(iperf_binary): +message = "iPerf binary: %s was not found on host" % iperf_binary +logging.error(message) +raise error.TestError, message +else: +logging.info("iPerf binary: %s was found on host" % iperf_binary) + +# Starting HostOS-compatible iPerf Server on host +logging.info('VM is up ... \n starting iPerf Server on host') +kvm_utils.run_bg("iperf -s", timeout=5) + +# Detecting GuestOS +if iperf_binary.__contains__("exe"): +vm_type="win32" +else: +vm_type="linux32" + +# Copying GuestOS-compatible iPerf binary to guest. +# Starting iPerf Client on guest, plus connect to host. 
+if vm_type == "win32": +win_dir = "/cygdrive/c/" +logging.info('starting copying %s to Windows VM to %s' % (iperf_binary, + win_dir)) +if not vm.scp_to_remote(iperf_binary, win_dir): +message = "Could not copy Win32 iPerf to guest" +logging.error(message) +raise error.TestError(message) +logging.debug("Enabling file permissions of iPerf.exe on Windows VM...") +session.sendline('cacls C:\iperf.exe /P Administrator:F') +session.sendline('y') +session.sendline('') +time.sleep(2) +session.sendline('') +logging.info("starting iPerf client on Windows VM, connecting to host") +session.sendline('C:\iperf -t %s -c %s -P %s' % (int(iperf_duration), + iperf_dest_ip, + int(iperf_parallel_threads))) +else: +logging.info('starting copying %s to Linux VM ' % iperf_binary) +if not vm.scp_to_remote(iperf_binary, "/usr/local/bin"): +message = "Could not copy Linux iPerf to guest" +logging.error(message) +raise error.TestError, message +print "starting iPerf client on VM, connecting to host" +session.sendline('iperf -t %s -c %s -P %s' % (int(iperf_duration), + iperf_dest_ip, + int(iperf_parallel_threads)))
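[Editorial note: the hunk for kvm_tests.cfg.sample is cut off above. For
orientation, the parameters read by the code imply a stanza along these
lines, following the sample file's existing syntax; the values shown are the
code's defaults, and the stanza itself is illustrative, not the patch's exact
hunk:]

    - iperf: install setup
        type = iperf
        iperf_binary = misc/iperf
        iperf_duration = 5
        iperf_parallel_threads = 1
        iperf_dest_ip = 10.0.2.2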
Re: [KVM-AUTOTEST PATCH] Adding iperf test
On Tue, 2009-06-16 at 18:29 -0300, Lucas Meneghel Rodrigues wrote:
> Adding iperf network performance test. Basically it tests
> networking functionality, stability and performance of guest OSes.
> This test is cross-platform -- i.e. it works on both Linux and
> Windows VMs.

Ok, now that I have rebased the test, I have a few comments:

* I don't like the idea of shipping binaries inside the test. Fair enough
that we don't support other archs than x86 and x86_64 (also there's the
problem that Windows usually doesn't ship a working toolchain), but I would
like to think a bit more about it (see the sketch after the quoted patch).

* Autotest already has an iperf test that could be used on Linux guests.
Sure, there's the problem of matching what's being executed on Windows, but
it's worth a look.

* The Autotest iperf test usually runs with 2 machines, one in the 'server'
role and the other in 'client' mode. I would like to pursue the same model:
2 VMs, one running as a server and another as a client.

Alexey and Yaniv, I'd like to hear your opinions on this.

Thanks,

> Signed-off-by: Alexey Eromenko
> ---
> client/tests/kvm/kvm.py               |    1 +
> client/tests/kvm/kvm_iperf.py         |  105 +
> client/tests/kvm/kvm_tests.cfg.sample |    8 +++
> 3 files changed, 114 insertions(+), 0 deletions(-)
> [...]
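[Editorial note: one way to act on the first review point without sniffing
the binary's file extension might be an explicit test parameter. The
parameter name and values are illustrative, not part of the posted patch:]

    # Illustrative only: decide the guest flavor from an explicit param
    # instead of iperf_binary.__contains__("exe").
    guest_os_type = params.get("guest_os_type", "linux")
    if guest_os_type == "windows":
        deploy_dir = "/cygdrive/c/"
    else:
        deploy_dir = "/usr/local/bin"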
[patch 0/2] replace kvmtrace with event traces v2
--
[patch 2/2] KVM: remove old KVMTRACE support code
Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls. Signed-off-by: Marcelo Tosatti Index: kvm/virt/kvm/kvm_main.c === --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -2367,7 +2367,7 @@ static long kvm_dev_ioctl(struct file *f case KVM_TRACE_ENABLE: case KVM_TRACE_PAUSE: case KVM_TRACE_DISABLE: - r = kvm_trace_ioctl(ioctl, arg); + r = -EOPNOTSUPP; break; default: return kvm_arch_dev_ioctl(filp, ioctl, arg); @@ -2717,7 +2717,6 @@ EXPORT_SYMBOL_GPL(kvm_init); void kvm_exit(void) { - kvm_trace_cleanup(); tracepoint_synchronize_unregister(); misc_deregister(&kvm_dev); kmem_cache_destroy(kvm_vcpu_cache); Index: kvm/include/linux/kvm_host.h === --- kvm.orig/include/linux/kvm_host.h +++ kvm/include/linux/kvm_host.h @@ -482,37 +482,6 @@ struct kvm_stats_debugfs_item { extern struct kvm_stats_debugfs_item debugfs_entries[]; extern struct dentry *kvm_debugfs_dir; -#define KVMTRACE_5D(evt, vcpu, d1, d2, d3, d4, d5, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 5, d1, d2, d3, d4, d5) -#define KVMTRACE_4D(evt, vcpu, d1, d2, d3, d4, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 4, d1, d2, d3, d4, 0) -#define KVMTRACE_3D(evt, vcpu, d1, d2, d3, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 3, d1, d2, d3, 0, 0) -#define KVMTRACE_2D(evt, vcpu, d1, d2, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 2, d1, d2, 0, 0, 0) -#define KVMTRACE_1D(evt, vcpu, d1, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 1, d1, 0, 0, 0, 0) -#define KVMTRACE_0D(evt, vcpu, name) \ - trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \ - vcpu, 0, 0, 0, 0, 0, 0) - -#ifdef CONFIG_KVM_TRACE -int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg); -void kvm_trace_cleanup(void); -#else -static inline -int kvm_trace_ioctl(unsigned int ioctl, unsigned long arg) -{ - return -EINVAL; -} -#define kvm_trace_cleanup() ((void)0) -#endif - #ifdef KVM_ARCH_WANT_MMU_NOTIFIER static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq) { Index: kvm/virt/kvm/kvm_trace.c === --- kvm.orig/virt/kvm/kvm_trace.c +++ /dev/null @@ -1,285 +0,0 @@ -/* - * kvm trace - * - * It is designed to allow debugging traces of kvm to be generated - * on UP / SMP machines. Each trace entry can be timestamped so that - * it's possible to reconstruct a chronological record of trace events. - * The implementation refers to blktrace kernel support. - * - * Copyright (c) 2008 Intel Corporation - * Copyright (C) 2006 Jens Axboe - * - * Authors: Feng(Eric) Liu, eric.e@intel.com - * - * Date:Feb 2008 - */ - -#include -#include -#include -#include - -#include - -#define KVM_TRACE_STATE_RUNNING(1 << 0) -#define KVM_TRACE_STATE_PAUSE (1 << 1) -#define KVM_TRACE_STATE_CLEARUP(1 << 2) - -struct kvm_trace { - int trace_state; - struct rchan *rchan; - struct dentry *lost_file; - atomic_t lost_records; -}; -static struct kvm_trace *kvm_trace; - -struct kvm_trace_probe { - const char *name; - const char *format; - u32 timestamp_in; - marker_probe_func *probe_func; -}; - -static inline int calc_rec_size(int timestamp, int extra) -{ - int rec_size = KVM_TRC_HEAD_SIZE; - - rec_size += extra; - return timestamp ? 
rec_size += KVM_TRC_CYCLE_SIZE : rec_size; -} - -static void kvm_add_trace(void *probe_private, void *call_data, - const char *format, va_list *args) -{ - struct kvm_trace_probe *p = probe_private; - struct kvm_trace *kt = kvm_trace; - struct kvm_trace_rec rec; - struct kvm_vcpu *vcpu; - inti, size; - u32extra; - - if (unlikely(kt->trace_state != KVM_TRACE_STATE_RUNNING)) - return; - - rec.rec_val = TRACE_REC_EVENT_ID(va_arg(*args, u32)); - vcpu= va_arg(*args, struct kvm_vcpu *); - rec.pid = current->tgid; - rec.vcpu_id = vcpu->vcpu_id; - - extra = va_arg(*args, u32); - WARN_ON(!(extra <= KVM_TRC_EXTRA_MAX)); - extra = min_t(u32, extra, KVM_TRC_EXTRA_MAX); - - rec.rec_val |= TRACE_REC_TCS(p->
[patch 1/2] KVM: convert custom marker based tracing to event traces
This allows use of the powerful ftrace infrastructure. See Documentation/trace/ for usage information. Signed-off-by: Marcelo Tosatti Index: kvm/arch/x86/kvm/lapic.c === --- kvm.orig/arch/x86/kvm/lapic.c +++ kvm/arch/x86/kvm/lapic.c @@ -34,6 +34,7 @@ #include #include "kvm_cache_regs.h" #include "irq.h" +#include "trace.h" #ifndef CONFIG_X86_64 #define mod_64(x, y) ((x) - (y) * div64_u64(x, y)) @@ -515,8 +516,6 @@ static u32 __apic_read(struct kvm_lapic { u32 val = 0; - KVMTRACE_1D(APIC_ACCESS, apic->vcpu, (u32)offset, handler); - if (offset >= LAPIC_MMIO_LENGTH) return 0; @@ -562,6 +561,8 @@ static void apic_mmio_read(struct kvm_io } result = __apic_read(apic, offset & ~0xf); + trace_kvm_apic_read(offset, result); + switch (len) { case 1: case 2: @@ -657,7 +658,7 @@ static void apic_mmio_write(struct kvm_i offset &= 0xff0; - KVMTRACE_1D(APIC_ACCESS, apic->vcpu, (u32)offset, handler); + trace_kvm_apic_write(offset, val); switch (offset) { case APIC_ID: /* Local APIC ID */ Index: kvm/arch/x86/kvm/svm-trace.h === --- /dev/null +++ kvm/arch/x86/kvm/svm-trace.h @@ -0,0 +1,51 @@ +#define exit_reasons svm_exit_reasons +#define svm_exit_reasons \ + {SVM_EXIT_READ_CR0, "read_cr0"},\ + {SVM_EXIT_READ_CR3, "read_cr3"},\ + {SVM_EXIT_READ_CR4, "read_cr4"},\ + {SVM_EXIT_READ_CR8, "read_cr8"},\ + {SVM_EXIT_WRITE_CR0,"write_cr0"}, \ + {SVM_EXIT_WRITE_CR3,"write_cr3"}, \ + {SVM_EXIT_WRITE_CR4,"write_cr4"}, \ + {SVM_EXIT_WRITE_CR8,"write_cr8"}, \ + {SVM_EXIT_READ_DR0, "read_dr0"},\ + {SVM_EXIT_READ_DR1, "read_dr1"},\ + {SVM_EXIT_READ_DR2, "read_dr2"},\ + {SVM_EXIT_READ_DR3, "read_dr3"},\ + {SVM_EXIT_WRITE_DR0,"write_dr0"}, \ + {SVM_EXIT_WRITE_DR1,"write_dr1"}, \ + {SVM_EXIT_WRITE_DR2,"write_dr2"}, \ + {SVM_EXIT_WRITE_DR3,"write_dr3"}, \ + {SVM_EXIT_WRITE_DR5,"write_dr5"}, \ + {SVM_EXIT_WRITE_DR7,"write_dr7"}, \ + {SVM_EXIT_EXCP_BASE + DB_VECTOR,"DB excp"}, \ + {SVM_EXIT_EXCP_BASE + BP_VECTOR,"BP excp"}, \ + {SVM_EXIT_EXCP_BASE + UD_VECTOR,"UD excp"}, \ + {SVM_EXIT_EXCP_BASE + PF_VECTOR,"PF excp"}, \ + {SVM_EXIT_EXCP_BASE + NM_VECTOR,"NM excp"}, \ + {SVM_EXIT_EXCP_BASE + MC_VECTOR,"MC excp"}, \ + {SVM_EXIT_INTR, "interrupt"}, \ + {SVM_EXIT_NMI, "nmi"}, \ + {SVM_EXIT_SMI, "smi"}, \ + {SVM_EXIT_INIT, "init"},\ + {SVM_EXIT_VINTR,"vintr"}, \ + {SVM_EXIT_CPUID,"cpuid"}, \ + {SVM_EXIT_INVD, "invd"},\ + {SVM_EXIT_HLT, "hlt"}, \ + {SVM_EXIT_INVLPG, "invlpg"}, \ + {SVM_EXIT_INVLPGA, "invlpga"}, \ + {SVM_EXIT_IOIO, "io"}, \ + {SVM_EXIT_MSR, "msr"}, \ + {SVM_EXIT_TASK_SWITCH, "task_switch"}, \ + {SVM_EXIT_SHUTDOWN, "shutdown"},\ + {SVM_EXIT_VMRUN,"vmrun"}, \ + {SVM_EXIT_VMMCALL, "hypercall"}, \ + {SVM_EXIT_VMLOAD, "vmload"}, \ + {SVM_EXIT_VMSAVE, "vmsave"}, \ + {SVM_EXIT_STGI, "stgi"},\ + {SVM_EXIT_CLGI, "clgi"},\ + {SVM_EXIT_SKINIT, "skinit"}, \ + {SVM_EXIT_WBINVD, "wbinvd"}, \ + {SVM_EXIT_MONITOR, "monitor"}, \ + {SVM_EXIT_MWAIT,
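[Editorial note: for readers new to the machinery, each of the old
KVMTRACE_nD markers maps onto a tracepoint declared with the TRACE_EVENT()
macro, which generates the trace_kvm_apic_write()-style hooks used above. A
simplified, illustrative declaration - not the patch's exact trace.h:]

    #include <linux/tracepoint.h>

    TRACE_EVENT(kvm_apic_write,
            TP_PROTO(unsigned int reg, unsigned int val),
            TP_ARGS(reg, val),

            TP_STRUCT__entry(
                    __field(unsigned int, reg)
                    __field(unsigned int, val)
            ),

            TP_fast_assign(
                    __entry->reg = reg;
                    __entry->val = val;
            ),

            TP_printk("apic_write reg 0x%x = 0x%x", __entry->reg, __entry->val)
    );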
Re: Debug message flooding system log
On Tue, Jun 16, 2009 at 2:22 PM, Simon Marchi wrote:
> Hi,
>
> Since I installed the latest version of qemu-kvm (0.10.5), the
> following 3 lines are flooding my system log (dmesg). I have no idea
> where this is coming from.
>
> [433019.047800] arch/x86/kvm/x86.c kvm_write_guest_time
> [433019.047802] Write (vcpu->hv_clock.system_time): 433019047801180
> [433019.047804] FYI (vcpu->hv_clock.tsc_timestamp): 1298967607382796
>
> [...]
>
> Any idea where this comes from and how I can disable this output?

Oh my god... Forget that... All day I was booted on the wrong kernel... Now I
understand why my modifications in the kvm module had no effect! This notice
was something I added a few weeks ago to find the root of another problem...

It feels better now.

Thanks

Simon Marchi
Re: [KVM-AUTOTEST PATCH] Adding iperf test
On Tue, 2009-06-16 at 18:40 -0300, Lucas Meneghel Rodrigues wrote:
> * The Autotest iperf test usually runs with 2 machines, one in the
> 'server' role and the other in 'client' mode. I would like to pursue the
> same model: 2 VMs, one running as a server and another as a client.

Never mind this comment, I missed the part of the test that starts the
server on the host Linux system.

--
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies
[KVM-AUTOTEST PATCH] kvm test: Adding remote migration support
Make the current migration test handle remote migration. In order to use
remote migration, the following parameters should be added to the existing
migration test:

remote = dst
hostip =
remoteip =
remuser = root
rempassword =

The field remote=dst indicates the VM "dst" should be created on the remote
machine. For example:

- migrate: install setup
    type = migration
    vms += " dst"
    migration_test_command = help
    kill_vm_on_error = yes
    remote = dst
    hostip = 192.168.1.2
    remoteip = 192.168.1.3
    remuser = root
    rempassword = 123456
    variants:

Signed-off-by: Yogananth Subramanian
---
client/tests/kvm/kvm_tests.cfg.sample |    6 +++
client/tests/kvm/kvm_tests.py         |    2 +-
client/tests/kvm/kvm_vm.py            |   61 +
3 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index 931f748..ca7f1d0 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -54,6 +54,12 @@ variants:
vms += " dst"
migration_test_command = help
kill_vm_on_error = yes
+remote = dst
+hostip = 192.168.1.2
+remoteip = 192.168.1.3
+remuser = root
+rempassword = 123456
+kill_vm_on_error = yes
variants:
- 1:
start_vm_for_migration_dst = yes
diff --git a/client/tests/kvm/kvm_tests.py b/client/tests/kvm/kvm_tests.py
index 4270cae..32b1ba4 100644
--- a/client/tests/kvm/kvm_tests.py
+++ b/client/tests/kvm/kvm_tests.py
@@ -113,7 +113,7 @@ def run_migration(test, params, env):
session.close()
# Define the migration command
-cmd = "migrate -d tcp:localhost:%d" % dest_vm.migration_port
+cmd = "migrate -d tcp:%s:%d" % (dest_vm.hostip, dest_vm.migration_port)
logging.debug("Migration command: %s" % cmd)
# Migrate
diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 5028161..c15d0a1 100644
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -1,5 +1,5 @@
#!/usr/bin/python
-import time, socket, os, logging, fcntl
+import time, socket, os, logging, fcntl, re
import kvm_utils
"""
@@ -113,6 +113,7 @@ class VM:
self.qemu_path = qemu_path
self.image_dir = image_dir
self.iso_dir = iso_dir
+self.remote = False
# Find available monitor filename
@@ -170,8 +171,6 @@ class VM:
file.close()
if not self.qemu_path in cmdline:
return False
-if not self.monitor_file_name in cmdline:
-return False
return True
@@ -234,8 +233,6 @@ class VM:
qemu_cmd += qemu_path
# Add the VM's name
qemu_cmd += " -name '%s'" % name
-# Add the monitor socket parameter
-qemu_cmd += " -monitor unix:%s,server,nowait" % self.monitor_file_name
for image_name in kvm_utils.get_sub_dict_names(params, "images"):
image_params = kvm_utils.get_sub_dict(params, image_name)
@@ -320,6 +317,18 @@ class VM:
qemu_path = self.qemu_path
image_dir = self.image_dir
iso_dir = self.iso_dir
+# If VM is remote, set hostip to ip of the remote machine
+# If VM is local set hostip to localhost or hostip param
+if params.get("remote") == "yes":
+self.remote = True
+self.hostip = params.get("remoteip")
+self.qemu_path = params.get("qemu_path",qemu_path)
+qemu_path = self.qemu_path
+self.image_dir = params.get("image_dir",image_dir)
+image_dir = self.image_dir
+else:
+self.remote = False
+self.hostip = params.get("hostip","localhost")
# Verify the md5sum of the ISO image
iso = params.get("cdrom")
@@ -377,9 +386,32 @@ class VM:
# Add -incoming option to the qemu command
qemu_command += " -incoming tcp:0:%d" % self.migration_port
-logging.debug("Running qemu command:\n%s", qemu_command)
-(status, pid, output) = kvm_utils.run_bg(qemu_command, None,
- logging.debug, "(qemu) ")
+self.monitor_port = kvm_utils.find_free_port(5400, 6000) +qemu_command += (" -monitor tcp:0:%d,server,nowait" % + self.monitor_port) + +# If the VM is remote, get the username and password of remote host +# and lanch qemu command on the remote machine. +if self.remote: +remuser = params.get("remuser") +rempassword = params.get("rempassword") +sub = kvm_utils.ssh(self.hostip, 22, remuser, rempassword, +self.params.get("ssh_prompt", "[\#\$]")) +qemu
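The hunk above is truncated in the archive right where the remote branch
begins. As a rough sketch only -- the session interface (sendline()) and the
error handling below are assumptions for illustration, not the actual patch
code -- the remote launch presumably continues along these lines:

    # Hypothetical continuation of the truncated hunk. Assumes
    # kvm_utils.ssh() returns an interactive session object with a
    # sendline()-style interface, as the quoted call suggests.
    if not sub:
        logging.error("Could not ssh into %s" % self.hostip)
        return False
    # Launch qemu on the remote host; the monitor remains reachable
    # from the test machine because it is exposed over TCP
    # (-monitor tcp:0:PORT,server,nowait) rather than a local unix socket.
    sub.sendline(qemu_command)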
Re: [KVM-AUTOTEST PATCH] kvm test: Adding remote migration support
On Tue, 2009-06-16 at 21:46 -0300, Lucas Meneghel Rodrigues wrote:
> Make the current migration test handle remote migration. In order to
> use remote migration, the following parameters should be added to the
> existing migration test:

I went through the discussion about this patch and I also agree we should
implement this test as a server side test. However, I thought it would be
good to have this patch rebased to the new kvm test code, just in case we
need it in the future. This is part of the patch queue work.

>     remote = dst
>     hostip =
>     remoteip =
>     remuser = root
>     rempassword =
>
> The field remote=dst indicates that the VM "dst" should be created on
> the remote machine. For example:
>
>     - migrate: install setup
>         type = migration
>         vms += " dst"
>         migration_test_command = help
>         kill_vm_on_error = yes
>         remote = dst
>         hostip = 192.168.1.2
>         remoteip = 192.168.1.3
>         remuser = root
>         rempassword = 123456
>         variants:
>
> Signed-off-by: Yogananth Subramanian
> [...]
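Since the rebased patch exposes the monitor over TCP rather than a unix
socket, the migration monitor can be driven from any host that can reach the
source machine. A minimal sketch of talking to such a monitor by hand -- the
function and the port numbers are illustrative, and it assumes the plain-text
human monitor protocol behind "-monitor tcp:0:PORT,server,nowait":

    import socket

    def send_monitor_cmd(host, port, cmd, timeout=10.0):
        # Connect to a QEMU human monitor exposed over TCP, send one
        # command, and return whatever the monitor prints back.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        s.connect((host, port))
        s.sendall(cmd + "\n")
        output = s.recv(4096)  # banner/prompt plus any command output
        s.close()
        return output

    # For example, to start a background migration to the destination:
    # send_monitor_cmd("192.168.1.2", 5400,
    #                  "migrate -d tcp:192.168.1.3:5200")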
Re: [KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
Hello Michael:

The shutdown case is useful, but the patch does really similar work to
vm.destroy(). Maybe we could simply let the preprocessor shut down all of
the VMs, like this:

    - shutdown: install setup
        vms = ''

Michael Goldish wrote:
> The shutdown test logs into a VM and sends a shutdown command.
> It serves two purposes:
> - Test KVM's ability to shut down.
> - Clean up after the other tests:
> [...]
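The vm.destroy() overlap Jason points to is a "nice first, then force"
teardown. A simplified sketch of that flow -- not the literal kvm_vm.py code:
send_monitor_cmd() as a VM method is an assumption here, while ssh_login(),
is_dead and cmd_shutdown all appear in the quoted test:

    # Simplified sketch of a graceful-with-fallback VM teardown.
    def destroy(self, gracefully=True):
        if gracefully:
            session = self.ssh_login()
            if session:
                # Ask the guest to shut itself down, as run_shutdown() does.
                session.sendline(self.params.get("cmd_shutdown"))
                session.close()
                if kvm_utils.wait_for(self.is_dead, 60, 0, 1):
                    return
        # Fall back to terminating qemu through the monitor.
        self.send_monitor_cmd("quit")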
Re: [PATCH] add sysenter/syscall emulation for 32bit compat mode
Hi Andre,

On (Tue) Jun 16 2009 [15:25:13], Andre Przywara wrote:
> sysenter/sysexit are not supported on AMD's 32bit compat mode, whereas
> syscall is not supported on Intel's 32bit compat mode. To allow cross
> vendor migration we emulate the missing instructions by setting up the
> processor state accordingly.
> The sysenter code was originally sketched by Amit Shah; it was completed,
> debugged, syscall added and made to work by Christoph Egger, and polished
> up by Andre Przywara.
> Please note that sysret does not need to be emulated, because it will be
> executed in 64bit mode, and returning to 32bit compat mode works on Intel.

Thanks for picking this up. You also had a testcase for this, right?

		Amit
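For readers unfamiliar with the instruction pair, "setting up the processor
state accordingly" means reproducing the architectural effect of sysenter
from the SYSENTER MSRs. A Python-flavored pseudocode sketch -- the vcpu
fields and helper names are illustrative, and the actual patch is C code in
KVM's x86 emulator:

    # Pseudocode: the state transition emulated when a 32bit compat-mode
    # guest executes sysenter on a CPU that lacks the instruction.
    def emulate_sysenter(vcpu):
        cs_sel = vcpu.msr[MSR_IA32_SYSENTER_CS]
        vcpu.cs = make_flat_code_segment(cs_sel)      # ring 0 code segment
        vcpu.ss = make_flat_data_segment(cs_sel + 8)  # stack segment = CS + 8
        vcpu.eip = vcpu.msr[MSR_IA32_SYSENTER_EIP]
        vcpu.esp = vcpu.msr[MSR_IA32_SYSENTER_ESP]
        vcpu.eflags &= ~(EFLAGS_VM | EFLAGS_IF)       # clear v8086 and IF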
KVM autotest patch queue - All patches worked out
All the patches on the first patch queue e-mail were examined and worked out.
This is a report of the patches that weren't applied yet. Some of them might
not make it into the kvm test, but all of them got follow-ups. Here's the
latest report of the patches available:

Patch queue:

David Huff (dh...@redhat.com)

1) Move kvm functional tests to a 'test' directory
   http://kerneltrap.org/mailarchive/linux-kvm/2009/5/26/5812453

   I have reworked this patch and made my own version. I need to split it
   into smaller patches, though.

   Reworked, needs review.

Yogi (anant...@linux.vnet.ibm.com)

1) Support for remote migration
   http://kerneltrap.org/mailarchive/linux-kvm/2009/4/30/5607344/thread

   I have rebased the patch. However, the patch review and discussion shows
   that we want to implement remote migration as a server side test instead
   of a client side one. The rebase is just in case we need the patch in
   the future.

   Probably not going to apply; look at a different way to implement the
   test.

Alexey Eromenko (aerom...@redhat.com)

1) New test module: iperf
   http://kerneltrap.org/mailarchive/linux-kvm/2009/5/31/5840973/thread

   Rebase made, made some comments to it:
   * The idea of shipping binaries doesn't look very appealing to me.
   * Autotest already has an iperf test; it might be worth a look for
     linux guests.

   Pending some issues raised during review.

Jason Wang (jasow...@redhat.com)

1) TAP network support in kvm-autotest

   Michael Goldish has a similar patch to this one, so we need to decide
   which one is going to be used. Michael, when possible, please send your
   patch to support TAP network on the kvm test.

   Need to get a patch from Michael in order to decide which one to apply
   (I believe he can merge the good aspects of both into a single patch
   series).

Thank you very much,

-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies
Re: [KVM-AUTOTEST PATCH 1/2] KVM test: add shutdown test
- "jason wang" wrote:
> Hello Michael:
>
> The shutdown case is useful, but the patch does really similar work to
> vm.destroy(). Maybe we could simply let the preprocessor shut down all
> of the VMs, like this:
>
> - shutdown: install setup
>     vms = ''

That will shut them all down, but it won't report the results, so the test
will always pass. I thought it wouldn't hurt to have an actual test that
shuts the guest down. If the test fails to shut down nicely, the
postprocessor will take over and shut the VM down with a 'quit'.

> Michael Goldish wrote:
> > The shutdown test logs into a VM and sends a shutdown command.
> > It serves two purposes:
> > - Test KVM's ability to shut down.
> > - Clean up after the other tests:
> > [...]
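For users who want the cleanup behavior discussed in this thread, the
shutdown test would be selected last by convention. A hypothetical
kvm_tests.cfg entry in the "- name: dependencies" style used elsewhere in
this thread -- every parameter here except type is illustrative, not checked
against the actual sample file:

    - shutdown: install setup
        type = shutdown
        # Illustrative knob: if the guest ignores the shutdown command,
        # let the postprocessor kill the VM outright.
        kill_vm = yes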
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/16/2009 09:32 PM, Jamie Lokier wrote:
> Avi Kivity wrote:
>> Another issue is enumeration. Guests will present their devices in the
>> order they find them on the pci bus (of course enumeration is guest
>> specific). So if I have 2 virtio controllers the only way I can
>> distinguish between them is using their pci slots.
>
> virtio controllers really should have a user-suppliable string or UUID
> to identify them to the guest. Don't they?

virtio controllers don't exist. When they do, they may have a UUID or not,
but in either case guest infrastructure is in place for reporting the PCI
slot, not the UUID.

virtio disks do have a UUID. I don't think older versions of Windows will
use it though, so if you reorder your slots you'll see your drive letters
change. Same with Linux if you don't use udev by-uuid rules.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick
to panic.
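The Linux half of that last remark is the udev-maintained /dev/disk/by-uuid
tree: mounting by UUID makes slot order irrelevant. A small guest-side sketch
of the mapping involved, assuming a guest where udev populates that directory
(the function name is illustrative):

    import os

    def disks_by_uuid(base="/dev/disk/by-uuid"):
        # Map filesystem UUIDs to the slot-dependent device nodes the
        # udev symlinks currently resolve to, e.g. {"6f3c...": "/dev/vda1"}.
        mapping = {}
        for uuid in os.listdir(base):
            mapping[uuid] = os.path.realpath(os.path.join(base, uuid))
        return mapping

    # Mounting by UUID (e.g. UUID=... lines in /etc/fstab) survives PCI
    # slot reordering; mounting /dev/vdb1 by name does not.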