[Qemu-devel] Re: [[RfC PATCH]] linux fbdev display driver prototype.
This looks very promissing. I just got a couple of observations: - Your patch does not work on my machine with the vesafb driver. It reports "can't handle 8 bpp frame buffers". It turns out that the vesafb driver seems to initialize the framebuffer in PSEUDOCOLOR mode. Depends on the video mode you ask for via vga=$nr, there are also 32bpp modes. I think we should add a piece of code which tries reinitialize the framebuffer with the suitable parametters (32bpp/TRUECOLOR). With vesafb it wouldn't work anyway, you can't switch these parameters at runtime. I think the *drmfb fbdev interface is quite limited too in what it allows to change. - You should register a Display Allocator and override the create_displaysurface() method like I did in the DirectFB driver. This way you save qemu a data copy. fbdev_render_32() should only be used when the guest framebuffer is not compatible with the physical framebuffer (guest_bpp != physical_bbp || guest_linesize != physical_linesize). Isn't a trivial move though. If the console is switched you must stop drawing on the framebuffer. Right now this is easy: just stop copying. Likewise restoring the screen when switching back is easy: just copy everything. If we give out pointers to the framebuffer to other qemu code which doesn't know anything about console switching we have to be quite careful get things right ... - A cool feature would be to be able to stretch the guest display in fullscreen. My DirectFB driver implements a fullscreen toggle command by pressing the Ctrl-Alt-Return keys. I think Stefano added a SDL zoom feature a while ago which we could reuse for this. The actual stretching is done by SDL I think. For that kind of stuff a rendering library is actually helpful ... - I'm not very familiar with the scancode stuff, but I think that if you set your VT fd in the K_RAW keyboard mode, you'll be able to get true keyboard scancodes that you can directly give to the guest using the kbd_put_keycode() function. I'm not sure this is really portable. What do you get in K_RAW mode on !x86 platforms? K_MEDIUMRAW gives you linux input layer key codes no matter what. Also the translation to keysyms (for text consoles) is easier with mediumraw. cheers, Gerd
[Qemu-devel] Jämställdhet
Om du har problem med att läsa detta e-postmeddelande, klicka här (http://www.anp.se/newsletter/706025/444059437941455D4B7142445C43) för en webb-version. Vårt nyhetsbrev skickas automatiskt till våra kunder och intressenter. Vill du inte ha detta nyhetsbrev framöver, klicka här för att avprenumerera (http://www.anp.se/oa/706025/444059437941455D4B7142445C43). Nyhetsbrev 21/2010Detta nyhetsbrev är skickat till: qemu-devel@nongnu.org (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se&from=172191075&prefix=dm) (http://www.anp.se/taf/706025/444059437941455D4B7142445C43) (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/out_newsletters.asp&from=172191075&prefix=dm) (http://www.anp.se/newsletter.asp?sqid=706025&sid=444059437941455D4B7142445C43&print=true) Jämställdhetsarbete Jämställdhet råder när kvinnor och män har samma rättigheter, möjligheter och skyldigheter inom alla områden. Trots att vi går mot ett mer och mer jämställt samhälle, så har vi ändå en lång bit kvar innan vi nått målet. Ett steg på vägen till att bli ett mer jämställt företag är att upprätta en jämställdhetsplan. I verksamheter med 25 anställda eller fler skall idag finnas en jämställdhetsplan som revideras vart tredje år. De åtgärder som sätts upp i jämställdhetsplanen måste också genomföras och följas upp. Ett annat verktyg för att främja jämställdhetsarbetet inom företaget är en jämställdhetspoliy. Genom att upprätta en sådan tydliggör man, för de anställda, företagets syn på jämställdhet och jämställdhetsarbete samt vad företaget i sin tur förväntar sig av sina medarbetare i denna fråga. Genom DokuMeras Företagspaket (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/visa-kategorier.asp?id=1321&from=172191075&prefix=dm) får du tillgång till mallar, checklistor, policys, expertsvar och mycket mer som underlättar och juridiskt säkerställer arbetet i ditt företag. Veckans dokument Checklista jämställdhetsarbete (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/checklista_jamstalldhetsarbete_712_dd.html&from=172191075&prefix=dm) >> Jämställdhetspolicy (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/jamstalldhetspolicy_4298_dd.html&from=172191075&prefix=dm) >> Jämställdhetsplan (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/jamstalldhetsplan_711_dd.html&from=172191075&prefix=dm) >> Policy mot trakasserier (http://www.dokumera.se/newsletter_redirect.asp?tohttp://www.dokumera.se/policy_mot_trakasserier_4294_dd.html&from=172191075&prefix=dm) >> Ord från kund Jan Kirkhoff, sälj- och marknadschef IKAROS AB "Det bästa med Företagspaketet är mångfalden och att många på vårt företag på olika befattningar kan använda det." (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/foretagspaketet_1321_dc.html&from=172191075&prefix=dm) (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/out_atq_ppdviewer.asp&from=172191075&prefix=dm) (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/styckvisa_dokumentmallar_1330_dc.html&from=172191075&prefix=dm) (http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/out_contactusmessage.asp&from=172191075&prefix=dm) För en kostnadsfri exklusiv presentation av hur DokuMera kan spara tiotusentals kronor åt just mitt företag. Givetvis är du varmt välkommen att ringa oss på 08-664 04 50. Innehållet i nyhetsbrev ska inte tolkas som ett åtagande från DokuMeras sida. Informationen sänds ut i befintligt skick, utan garantier och digitala signaturer.
Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization
On 05/24/2010 11:22 PM, Anthony Liguori wrote: This converts the entire qdev tree into an undocumented stable protocol (the qdev paths were already in this state I believe). This really worries me. N.B. the association with qdev is only in identifying the device. The contents of the device's state are not part of qdev but rather part of vmstate. vmstate is something that we already guarantee to be stable since that's required for live migration compatibility. That removes out ability to deprecate older vmstate as time passes. Not a blocker but something to consider. I don't think that qdev device names and paths are something we have to worry much about changing over time since they reflect logical bus layout. They should remain static provided the devices remain static. Modulo mistakes. We already saw one (lack of pci domains). To reduce the possibility of mistakes, we need reviewable documentation. Note sysfs had similar assumptions and problems. The qdev properties are a different matter entirely. A command like 'info qdm' would be potentially difficult to support as part of QMP but the proposed command's output is actually already part of a backward compatible interface (vmstate). That's all good. But documentation is critical for this. Not only to improve quality, but also so that tool authors would have something to code against instead of trial and error (which invariably misses some corner cases). -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host
On 05/25/2010 01:10 AM, Anthony Liguori wrote: On 05/21/2010 02:50 AM, Andre Przywara wrote: -cpu host currently only propagates the CPU's family/model/stepping, the brand name and the feature bits. Add a whitelist of safe CPUID leafs to let the guest see the actual CPU's cache details and other things. Signed-off-by: Andre Przywara The problem I can see is that this greatly increases the chances of problems with live migration since we don't migrate the cpuid state. -cpu host is already problematic for live migration. Are you talking about the state maintained by the cpuid instruction? Yes, we need to migrate those bits. What's the benefit of exposing this information to the guest? Some algorithms adjust themselves based on the cache size. If you have several passes over a large data set, it's often better to run each set of passes on a subset of the dataset that fits in cache, then stitch the subsets together. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v2 1/3] add some tests for invalid JSON
On 05/24/2010 10:17 PM, Anthony Liguori wrote: On 05/24/2010 02:39 AM, Paolo Bonzini wrote: Signed-off-by: Paolo Bonzini I think this series conflicts a bit with Luiz's series which I just pushed. Could you rebase against the latest? You didn't apply this one yet, at least I don't see it on qemu.git commit e546343ee0f3f904529d32c1a9a60f5baa181852 Author: Luiz Capitulino Date: Wed May 19 18:15:32 2010 -0300 json-lexer: Drop 'buf' QString supports adding a single char, 'buf' is unneeded. Signed-off-by: Luiz Capitulino I based my series on top of Luiz's, so it should apply. The above is the only commit that is actually required. I can ping the series once Luiz's patches are applied, so you can disregard it in the meanwhile. Paolo
[Qemu-devel] [Test] Question
Hi, everyone. I tried to test the qemu, but I found only qemu-i386 is tested. But is there any test about other command like qemu-system-arm or qemu-arm to make sure the function still work after modification? Best Regards, robert
Re: [Qemu-devel] [PATCH 1/5] trace: Add trace-events file for declaring trace events
On Mon, May 24, 2010 at 11:20 PM, Anthony Liguori wrote: >> +# check if trace backend exists >> + >> +sh tracetool "--$trace_backend" --check-backend> /dev/null 2> /dev/null >> > > This will fail if objdir != srcdir. You have to qualify tracetool with the > path to srcdir. Thanks Anthony, fixed on my branch. I'll resend a v2 together with other fixes. Stefan
[Qemu-devel] Re: Windows guest debugging on KVM/Qemu
On 05/24/2010 11:07 PM, Neo Jia wrote: hi, I am using KVM/Qemu to debug my Windows guest according to KVM wiki page (http://www.linux-kvm.org/page/WindowsGuestDrivers/GuestDebugging). It works for me and also I can only use one Windows guest and bind its serial port to a TCP port and run "Virtual Serial Ports Emulator" on my Windows dev machine. The problem is that these kind of connection is really slow. Is there any known issue with KVM serial port driver? There is a good discussion about the same issue one year ago. Not sure if there is any improvement or not after that. How slow? Can you measure it (without a debugger, just guest-to-guest file transfer)? slirp used to be ridiculously slow but some recent change made it fairly fast. Probably a missing wakeup, perhaps serial has the same problem. In any case I recommend testing with qemu-kvm.git master. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Hi. Regarding QEMU's GDB server and MMU
Hi all. I am very new to dev for QEMU, so I have some very basic questions. 1) I understand that QEMU has a built-in GDB server that is somewhat a simulation of a JTAG device on dev boards, connected directly to the CPU. Is that a correct analogy? 2) How can the GDB server handle a MMU? Would it "see" physical or virtual addresses? Do I need a special client that can handle this? Thanks! :-) -- Use the source, Luke!
[Qemu-devel] Re: [PATCH 0/6] Make hpet a compile time option
On 05/24/2010 07:54 PM, Juan Quintela wrote: But for the other call, what do you propose? My best try was to hide the availability of hpet inside hpet_emul.h with: #ifdef CONFIG_HPET uint32_t hpet_in_legacy_mode(void); else uint32_t hpet_in_legacy_mode(void) { return 0;} #endif Change this to a global variable rtc_disable_interrupts in hw/mc146818rtc.c? (You didn't say it would need to be particularly pretty...). Not tested beyond compilation. Paolo diff --git a/hw/hpet.c b/hw/hpet.c index 8729fb2..c2615c1 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -29,6 +29,7 @@ #include "console.h" #include "qemu-timer.h" #include "hpet_emul.h" +#include "mc146818rtc.h" //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -39,14 +40,6 @@ static HPETState *hpet_statep; -uint32_t hpet_in_legacy_mode(void) -{ -if (hpet_statep) -return hpet_statep->config & HPET_CFG_LEGACY; -else -return 0; -} - static uint32_t timer_int_route(struct HPETTimer *timer) { uint32_t route; @@ -139,7 +132,7 @@ static void update_irq(struct HPETTimer *timer) qemu_irq irq; int route; -if (timer->tn <= 1 && hpet_in_legacy_mode()) { +if (timer->tn <= 1 && (timer->state->config & HPET_CFG_LEGACY)) { /* if LegacyReplacementRoute bit is set, HPET specification requires * timer0 be routed to IRQ0 in NON-APIC or IRQ2 in the I/O APIC, * timer1 be routed to IRQ8 in NON-APIC or IRQ8 in the I/O APIC. @@ -474,8 +467,10 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, /* i8254 and RTC are disabled when HPET is in legacy mode */ if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) { hpet_pit_disable(); + rtc_disable_interrupts = 1; } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) { hpet_pit_enable(); + rtc_disable_interrupts = 0; } break; case HPET_CFG + 4: diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c index 571c593..61d5980 100644 --- a/hw/mc146818rtc.c +++ b/hw/mc146818rtc.c @@ -94,6 +94,9 @@ typedef struct RTCState { QEMUTimer *second_timer2; } RTCState; + +int rtc_disable_interrupts = 0; + static void rtc_irq_raise(qemu_irq irq) { /* When HPET is operating in legacy mode, RTC interrupts are disabled @@ -101,9 +104,7 @@ static void rtc_irq_raise(qemu_irq irq) * mode is established while interrupt is raised. We want it to * be lowered in any case */ -#if defined TARGET_I386 -if (!hpet_in_legacy_mode()) -#endif +if (!rtc_disable_interrupts) qemu_irq_raise(irq); } @@ -148,14 +149,10 @@ static void rtc_timer_update(RTCState *s, int64_t current_time) int enable_pie; period_code = s->cmos_data[RTC_REG_A] & 0x0f; -#if defined TARGET_I386 /* disable periodic timer if hpet is in legacy mode, since interrupts are * disabled anyway. */ -enable_pie = !hpet_in_legacy_mode(); -#else -enable_pie = 1; -#endif +enable_pie = !rtc_disable_interrupts; if (period_code != 0 && (((s->cmos_data[RTC_REG_B] & REG_B_PIE) && enable_pie) || ((s->cmos_data[RTC_REG_B] & REG_B_SQWE) && s->sqw_irq))) { diff --git a/hw/mc146818rtc.h b/hw/mc146818rtc.h index 6f46a68..ff4bcda 100644 --- a/hw/mc146818rtc.h +++ b/hw/mc146818rtc.h @@ -3,6 +3,7 @@ #include "isa.h" +extern int rtc_disable_interrupts; ISADevice *rtc_init(int base_year); void rtc_set_memory(ISADevice *dev, int addr, int val); void rtc_set_date(ISADevice *dev, const struct tm *tm);
Re: [Qemu-devel] Re: [RFC PATCH] AMD IOMMU emulation
On Mon, May 24, 2010 at 08:10:16PM +, Blue Swirl wrote: > On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel wrote: > >> + > >> +#define MMIO_SIZE 0x2028 > > > > This size should be a power-of-two value. In this case probably 0x4000. > > Not really, the devices can reserve regions of any size. There were > some implementation deficiencies in earlier versions of QEMU, where > the whole page would be reserved anyway, but this limitation has been > removed long time ago. The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux driver maps the MMIO region with this size. So the emulation should reserve this amount of MMIO space too. Joerg
[Qemu-devel] [RFC PATCH 21/23] virtio-blk: Modify save/load handler to handle inuse varialble.
Modify inuse type to uint16_t, let save/load to handle, and revert last_avail_idx with inuse if there are outstanding emulation. Signed-off-by: Yoshiaki Tamura --- hw/virtio.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/hw/virtio.c b/hw/virtio.c index 7c020a3..502929c 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -70,7 +70,7 @@ struct VirtQueue VRing vring; target_phys_addr_t pa; uint16_t last_avail_idx; -int inuse; +uint16_t inuse; uint16_t vector; void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq); }; @@ -641,6 +641,7 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f) qemu_put_be32(f, vdev->vq[i].vring.num); qemu_put_be64(f, vdev->vq[i].pa); qemu_put_be16s(f, &vdev->vq[i].last_avail_idx); +qemu_put_be16s(f, &vdev->vq[i].inuse); if (vdev->binding->save_queue) vdev->binding->save_queue(vdev->binding_opaque, i, f); } @@ -678,6 +679,11 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f) vdev->vq[i].vring.num = qemu_get_be32(f); vdev->vq[i].pa = qemu_get_be64(f); qemu_get_be16s(f, &vdev->vq[i].last_avail_idx); +qemu_get_be16s(f, &vdev->vq[i].inuse); + +/* revert last_avail_idx if there are outstanding emulation. */ +vdev->vq[i].last_avail_idx -= vdev->vq[i].inuse; +vdev->vq[i].inuse = 0; if (vdev->vq[i].pa) { virtqueue_init(&vdev->vq[i]); -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 14/23] Call init handler of event-tap at main().
Signed-off-by: Yoshiaki Tamura --- vl.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/vl.c b/vl.c index 70a8aed..56d12c7 100644 --- a/vl.c +++ b/vl.c @@ -169,6 +169,8 @@ int main(int argc, char **argv) #include "qemu-queue.h" +#include "event-tap.h" + //#define DEBUG_NET //#define DEBUG_SLIRP @@ -5949,6 +5951,8 @@ int main(int argc, char **argv, char **envp) blk_mig_init(); +event_tap_init(); + if (default_cdrom) { /* we always create the cdrom drive, even if no disk is there */ drive_add(NULL, CDROM_ALIAS); -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 22/23] Introduce -k option to enable FT migration mode (Kemari).
When -k option is set to migrate command, it will turn on ft_mode to start FT migration mode (Kemari). Signed-off-by: Yoshiaki Tamura --- migration.c |3 +++ qemu-monitor.hx |7 --- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/migration.c b/migration.c index 5b90d37..3334650 100644 --- a/migration.c +++ b/migration.c @@ -71,6 +71,9 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data) return -1; } +if (qdict_get_int(qdict, "ft")) +ft_mode = FT_INIT; + if (strstart(uri, "tcp:", &p)) { s = tcp_start_outgoing_migration(mon, p, max_throttle, detach, (int)qdict_get_int(qdict, "blk"), diff --git a/qemu-monitor.hx b/qemu-monitor.hx index 16c45b7..22b72d9 100644 --- a/qemu-monitor.hx +++ b/qemu-monitor.hx @@ -765,13 +765,14 @@ ETEXI { .name = "migrate", -.args_type = "detach:-d,blk:-b,inc:-i,uri:s", -.params = "[-d] [-b] [-i] uri", +.args_type = "detach:-d,blk:-b,inc:-i,ft:-k,uri:s", +.params = "[-d] [-b] [-i] [-k] uri", .help = "migrate to URI (using -d to not wait for completion)" "\n\t\t\t -b for migration without shared storage with" " full copy of disk\n\t\t\t -i for migration without " "shared storage with incremental copy of disk " - "(base image shared between src and destination)", + "(base image shared between src and destination)" + "\n\t\t\t -k for FT migration mode (Kemari)", .user_print = monitor_user_noop, .mhandler.cmd_new = do_migrate, }, -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 16/23] Insert event_tap_mmio() to cpu_physical_memory_rw().
Record mmio write event to replay it upon failover. Signed-off-by: Yoshiaki Tamura --- exec.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/exec.c b/exec.c index d5c2a05..e9ed477 100644 --- a/exec.c +++ b/exec.c @@ -44,6 +44,7 @@ #include "hw/hw.h" #include "osdep.h" #include "kvm.h" +#include "event-tap.h" #if defined(CONFIG_USER_ONLY) #include #include @@ -3373,6 +3374,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf, io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1); if (p) addr1 = (addr & ~TARGET_PAGE_MASK) + p->region_offset; + +event_tap_mmio(addr, buf, len); + /* XXX: could force cpu_single_env to NULL to avoid potential bugs */ if (l >= 4 && ((addr1 & 3) == 0)) { -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 04/23] Use cpu_physical_memory_get_dirty_range() to check multiple dirty pages.
Modifies ram_save_block() and ram_save_remaining() to use cpu_physical_memory_get_dirty_range() to check multiple dirty and non-dirty pages at once. Signed-off-by: Yoshiaki Tamura Signed-off-by: OHMURA Kei --- vl.c | 52 +--- 1 files changed, 33 insertions(+), 19 deletions(-) diff --git a/vl.c b/vl.c index 729c955..70a8aed 100644 --- a/vl.c +++ b/vl.c @@ -2779,7 +2779,8 @@ static int ram_save_block(QEMUFile *f) static ram_addr_t current_addr = 0; ram_addr_t saved_addr = current_addr; ram_addr_t addr = 0; -int found = 0; +ram_addr_t dirty_rams[HOST_LONG_BITS]; +int i, found = 0; while (addr < last_ram_offset) { if (kvm_enabled() && current_addr == 0) { @@ -2791,28 +2792,33 @@ static int ram_save_block(QEMUFile *f) return 0; } } -if (cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) { +if ((found = cpu_physical_memory_get_dirty_range( + current_addr, last_ram_offset, dirty_rams, HOST_LONG_BITS, + MIGRATION_DIRTY_FLAG))) { uint8_t *p; -cpu_physical_memory_reset_dirty(current_addr, -current_addr + TARGET_PAGE_SIZE, -MIGRATION_DIRTY_FLAG); +for (i = 0; i < found; i++) { +ram_addr_t page_addr = dirty_rams[i]; +cpu_physical_memory_reset_dirty(page_addr, +page_addr + TARGET_PAGE_SIZE, +MIGRATION_DIRTY_FLAG); -p = qemu_get_ram_ptr(current_addr); +p = qemu_get_ram_ptr(page_addr); -if (is_dup_page(p, *p)) { -qemu_put_be64(f, current_addr | RAM_SAVE_FLAG_COMPRESS); -qemu_put_byte(f, *p); -} else { -qemu_put_be64(f, current_addr | RAM_SAVE_FLAG_PAGE); -qemu_put_buffer(f, p, TARGET_PAGE_SIZE); +if (is_dup_page(p, *p)) { +qemu_put_be64(f, page_addr | RAM_SAVE_FLAG_COMPRESS); +qemu_put_byte(f, *p); +} else { +qemu_put_be64(f, page_addr | RAM_SAVE_FLAG_PAGE); +qemu_put_buffer(f, p, TARGET_PAGE_SIZE); +} } -found = 1; break; +} else { +addr += dirty_rams[0]; +current_addr = (saved_addr + addr) % last_ram_offset; } -addr += TARGET_PAGE_SIZE; -current_addr = (saved_addr + addr) % last_ram_offset; } return found; @@ -2822,12 +2828,20 @@ static uint64_t bytes_transferred; static ram_addr_t ram_save_remaining(void) { -ram_addr_t addr; +ram_addr_t addr = 0; ram_addr_t count = 0; +ram_addr_t dirty_rams[HOST_LONG_BITS]; +int found = 0; -for (addr = 0; addr < last_ram_offset; addr += TARGET_PAGE_SIZE) { -if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG)) -count++; +while (addr < last_ram_offset) { +if ((found = cpu_physical_memory_get_dirty_range( + addr, last_ram_offset, dirty_rams, HOST_LONG_BITS, + MIGRATION_DIRTY_FLAG))) { +count += found; +addr = dirty_rams[found - 1] + TARGET_PAGE_SIZE; +} else { +addr += dirty_rams[0]; +} } return count; -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 12/23] Insent event-tap callbacks to net/block layer.
Introduce event-tap callbacks to functions which actually fire outputs at net/block layer. By synchronizing VMs before outputs are fired, we can failover to the receiver upon failure. Signed-off-by: Yoshiaki Tamura --- block.c | 22 ++ block.h |4 net/queue.c | 18 ++ net/queue.h |3 +++ 4 files changed, 47 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 31d1ba4..cf73c47 100644 --- a/block.c +++ b/block.c @@ -59,6 +59,8 @@ BlockDriverState *bdrv_first; static BlockDriver *first_drv; +static int (*bdrv_event_tap)(void); + /* If non-zero, use only whitelisted block drivers */ static int use_bdrv_whitelist; @@ -787,6 +789,10 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num, set_dirty_bitmap(bs, sector_num, nb_sectors, 1); } +if (bdrv_event_tap != NULL) { +bdrv_event_tap(); +} + return drv->bdrv_write(bs, sector_num, buf, nb_sectors); } @@ -1851,6 +1857,10 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs) MultiwriteCB *mcb; int i; +if (bdrv_event_tap != NULL) { +bdrv_event_tap(); +} + if (num_reqs == 0) { return 0; } @@ -2277,3 +2287,15 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs) { return bs->dirty_count; } + +void bdrv_event_tap_register(int (*cb)(void)) +{ +if (bdrv_event_tap == NULL) { +bdrv_event_tap = cb; +} +} + +void bdrv_event_tap_unregister(void) +{ +bdrv_event_tap = NULL; +} diff --git a/block.h b/block.h index edf5704..b5139db 100644 --- a/block.h +++ b/block.h @@ -207,4 +207,8 @@ int bdrv_get_dirty(BlockDriverState *bs, int64_t sector); void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors); int64_t bdrv_get_dirty_count(BlockDriverState *bs); + +void bdrv_event_tap_register(int (*cb)(void)); +void bdrv_event_tap_unregister(void); + #endif diff --git a/net/queue.c b/net/queue.c index 2ea6cd0..a542efe 100644 --- a/net/queue.c +++ b/net/queue.c @@ -57,6 +57,8 @@ struct NetQueue { unsigned delivering : 1; }; +static int (*net_event_tap)(void); + NetQueue *qemu_new_net_queue(NetPacketDeliver *deliver, NetPacketDeliverIOV *deliver_iov, void *opaque) @@ -151,6 +153,8 @@ static ssize_t qemu_net_queue_deliver(NetQueue *queue, ssize_t ret = -1; queue->delivering = 1; +if (net_event_tap) +net_event_tap(); ret = queue->deliver(sender, flags, data, size, queue->opaque); queue->delivering = 0; @@ -166,6 +170,8 @@ static ssize_t qemu_net_queue_deliver_iov(NetQueue *queue, ssize_t ret = -1; queue->delivering = 1; +if (net_event_tap) +net_event_tap(); ret = queue->deliver_iov(sender, flags, iov, iovcnt, queue->opaque); queue->delivering = 0; @@ -258,3 +264,15 @@ void qemu_net_queue_flush(NetQueue *queue) qemu_free(packet); } } + +void qemu_net_event_tap_register(int (*cb)(void)) +{ +if (net_event_tap == NULL) { +net_event_tap = cb; +} +} + +void qemu_net_event_tap_unregister(void) +{ +net_event_tap = NULL; +} diff --git a/net/queue.h b/net/queue.h index a31958e..5b031c1 100644 --- a/net/queue.h +++ b/net/queue.h @@ -68,4 +68,7 @@ ssize_t qemu_net_queue_send_iov(NetQueue *queue, void qemu_net_queue_purge(NetQueue *queue, VLANClientState *from); void qemu_net_queue_flush(NetQueue *queue); +void qemu_net_event_tap_register(int (*cb)(void)); +void qemu_net_event_tap_unregister(void); + #endif /* QEMU_NET_QUEUE_H */ -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 15/23] Insert event_tap_ioport() to ioport_write().
Record ioport event to replay it upon failover. Signed-off-by: Yoshiaki Tamura --- ioport.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/ioport.c b/ioport.c index 53dd87a..ad7a017 100644 --- a/ioport.c +++ b/ioport.c @@ -26,6 +26,7 @@ */ #include "ioport.h" +#include "event-tap.h" /***/ /* IO Port */ @@ -75,6 +76,7 @@ static void ioport_write(int index, uint32_t address, uint32_t data) default_ioport_writel }; IOPortWriteFunc *func = ioport_write_table[index][address]; +event_tap_ioport(index, address, data); if (!func) func = default_func[index]; func(ioport_opaque[address], address, data); -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 10/23] Introduce util functions to control ft_transaction from savevm layer.
To utilize ft_transaction function, savevm needs interfaces to be exported. Signed-off-by: Yoshiaki Tamura --- hw/hw.h |5 + savevm.c | 41 + 2 files changed, 46 insertions(+), 0 deletions(-) diff --git a/hw/hw.h b/hw/hw.h index fc9ed29..5a48a91 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -54,6 +54,8 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer, QEMUFile *qemu_fopen(const char *filename, const char *mode); QEMUFile *qemu_fdopen(int fd, const char *mode); QEMUFile *qemu_fopen_socket(int fd); +QEMUFile *qemu_fopen_transaction(int fd); +QEMUFile *qemu_fopen_tranx_sender(void *opaque); QEMUFile *qemu_popen(FILE *popen_file, const char *mode); QEMUFile *qemu_popen_cmd(const char *command, const char *mode); int qemu_stdio_fd(QEMUFile *f); @@ -63,6 +65,9 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size); void qemu_put_byte(QEMUFile *f, int v); void *qemu_realloc_buffer(QEMUFile *f, int size); void qemu_clear_buffer(QEMUFile *f); +int qemu_transaction_begin(QEMUFile *f); +int qemu_transaction_commit(QEMUFile *f); +int qemu_transaction_cancel(QEMUFile *f); static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v) { diff --git a/savevm.c b/savevm.c index 2ab883b..81cb711 100644 --- a/savevm.c +++ b/savevm.c @@ -82,6 +82,7 @@ #include "migration.h" #include "qemu_socket.h" #include "qemu-queue.h" +#include "ft_transaction.h" /* point to the block driver where the snapshots are managed */ static BlockDriverState *bs_snapshots; @@ -207,6 +208,21 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) return len; } +static ssize_t socket_put_buffer(void *opaque, const void *buf, size_t size) +{ +QEMUFileSocket *s = opaque; +ssize_t len; + +do { +len = send(s->fd, (void *)buf, size, 0); +} while (len == -1 && socket_error() == EINTR); + +if (len == -1) +len = -socket_error(); + +return len; +} + static int socket_close(void *opaque) { QEMUFileSocket *s = opaque; @@ -335,6 +351,16 @@ QEMUFile *qemu_fopen_socket(int fd) return s->file; } +QEMUFile *qemu_fopen_transaction(int fd) +{ +QEMUFileSocket *s = qemu_mallocz(sizeof(QEMUFileSocket)); + +s->fd = fd; +s->file = qemu_fopen_ops_ft_tranx(s, socket_put_buffer, socket_get_buffer, + socket_close, 0); +return s->file; +} + static int file_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size) { @@ -472,6 +498,21 @@ void qemu_clear_buffer(QEMUFile *f) memset(f->buf, 0, f->buf_max_size); } +int qemu_transaction_begin(QEMUFile *f) +{ +return qemu_ft_tranx_begin(f->opaque); +} + +int qemu_transaction_commit(QEMUFile *f) +{ +return qemu_ft_tranx_commit(f->opaque); +} + +int qemu_transaction_cancel(QEMUFile *f) +{ +return qemu_ft_tranx_cancel(f->opaque); +} + static void qemu_fill_buffer(QEMUFile *f) { int len; -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 23/23] Add a parser to accept FT migration incoming mode.
The option looks like, -incoming ::,ft_mode Signed-off-by: Yoshiaki Tamura --- migration.c | 14 +- 1 files changed, 13 insertions(+), 1 deletions(-) diff --git a/migration.c b/migration.c index 3334650..a4850f9 100644 --- a/migration.c +++ b/migration.c @@ -42,7 +42,19 @@ static MigrationState *current_migration; void qemu_start_incoming_migration(const char *uri) { -const char *p; +const char *p = uri; + +/* check ft_mode option */ +while (*p != '\0') { +if (*p == ',') { +p++; +if (!strcmp(p, "ft_mode")) { +ft_mode = FT_INIT; +break; +} +} +p++; +} if (strstart(uri, "tcp:", &p)) tcp_start_incoming_migration(p); -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 03/23] Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty.
Modifies kvm_get_dirty_pages_log_range to use cpu_physical_memory_set_dirty_range() to update the row of the bit-based phys_ram_dirty bitmap at once. Signed-off-by: Yoshiaki Tamura Signed-off-by: OHMURA Kei --- qemu-kvm.c | 19 +++ 1 files changed, 7 insertions(+), 12 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 29365a9..1414f49 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -2323,8 +2323,8 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr, unsigned long offset, unsigned long mem_size) { -unsigned int i, j; -unsigned long page_number, addr, addr1, c; +unsigned int i; +unsigned long page_number, addr, addr1; ram_addr_t ram_addr; unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) / HOST_LONG_BITS; @@ -2335,16 +2335,11 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr, */ for (i = 0; i < len; i++) { if (bitmap[i] != 0) { -c = leul_to_cpu(bitmap[i]); -do { -j = ffsl(c) - 1; -c &= ~(1ul << j); -page_number = i * HOST_LONG_BITS + j; -addr1 = page_number * TARGET_PAGE_SIZE; -addr = offset + addr1; -ram_addr = cpu_get_physical_page_desc(addr); -cpu_physical_memory_set_dirty(ram_addr); -} while (c != 0); +page_number = i * HOST_LONG_BITS; +addr1 = page_number * TARGET_PAGE_SIZE; +addr = offset + addr1; +ram_addr = cpu_get_physical_page_desc(addr); +cpu_physical_memory_set_dirty_range(ram_addr, leul_to_cpu(bitmap[i])); } } return 0; -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 19/23] Introduce ft_tranx_ready(), and modify migrate_fd_put_ready() when ft_mode is on.
Introduce ft_tranx_ready() which kicks the FT transaction cycle. When ft_mode is on, migrate_fd_put_ready() would open ft_transaction file and turn on event_tap. To end or cancel ft_transaction, ft_mode and event_tap is turned off. Signed-off-by: Yoshiaki Tamura --- migration.c | 78 -- 1 files changed, 75 insertions(+), 3 deletions(-) diff --git a/migration.c b/migration.c index 2adf7ad..5b90d37 100644 --- a/migration.c +++ b/migration.c @@ -21,6 +21,7 @@ #include "qemu_socket.h" #include "block-migration.h" #include "qemu-objects.h" +#include "event-tap.h" //#define DEBUG_MIGRATION @@ -375,6 +376,49 @@ void migrate_fd_connect(FdMigrationState *s) migrate_fd_put_ready(s); } +static int ft_tranx_ready(void) +{ +FdMigrationState *s = migrate_to_fms(current_migration); +int ret = -1; + +if (ft_mode != FT_TRANSACTION && ft_mode != FT_INIT) { +return ret; +} + +if (qemu_transaction_begin(s->file) < 0) { +fprintf(stderr, "tranx_begin failed\n"); +goto error_out; +} + +/* make the VM state consistent by flushing outstanding requests. */ +vm_stop(0); +qemu_aio_flush(); +bdrv_flush_all(); + +if (qemu_savevm_state_all(s->mon, s->file) < 0) { +fprintf(stderr, "savevm_state_all failed\n"); +goto error_out; +} + +if (qemu_transaction_commit(s->file) < 0) { +fprintf(stderr, "tranx_commit failed\n"); +goto error_out; +} + +ret = 0; +goto unpause_out; + +error_out: +ft_mode = FT_OFF; +qemu_savevm_state_cancel(s->mon, s->file); +migrate_fd_cleanup(s); +event_tap_unregister(); + +unpause_out: +vm_start(); +return ret; +} + void migrate_fd_put_ready(void *opaque) { FdMigrationState *s = opaque; @@ -402,8 +446,30 @@ void migrate_fd_put_ready(void *opaque) } else { state = MIG_STATE_COMPLETED; } -migrate_fd_cleanup(s); -s->state = state; + +if (ft_mode && state == MIG_STATE_COMPLETED) { +/* close buffered_file and open ft_transaction. + * Note: file discriptor won't get closed, + * but reused by ft_transaction. */ +socket_set_block(s->fd); +socket_set_nodelay(s->fd); +qemu_fclose(s->file); +s->file = qemu_fopen_ops_ft_tranx(s, + migrate_fd_put_buffer, + migrate_fd_get_buffer, + migrate_fd_close, + 1); + +/* events are tapped from now. */ +event_tap_register(ft_tranx_ready); + +if (old_vm_running) { +vm_start(); +} +} else { +migrate_fd_cleanup(s); +s->state = state; +} } } @@ -423,8 +489,14 @@ void migrate_fd_cancel(MigrationState *mig_state) DPRINTF("cancelling migration\n"); s->state = MIG_STATE_CANCELLED; -qemu_savevm_state_cancel(s->mon, s->file); +if (ft_mode == FT_TRANSACTION) { +qemu_transaction_cancel(s->file); +ft_mode = FT_OFF; +event_tap_unregister(); +} + +qemu_savevm_state_cancel(s->mon, s->file); migrate_fd_cleanup(s); } -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 07/23] Introduce skip_header parameter to qemu_loadvm_state().
Introduce skip_header parameter to qemu_loadvm_state() so that it can be called iteratively without reading the header. Signed-off-by: Yoshiaki Tamura --- migration-exec.c |2 +- migration-fd.c |2 +- migration-tcp.c |2 +- migration-unix.c |2 +- savevm.c | 24 +--- sysemu.h |2 +- 6 files changed, 18 insertions(+), 16 deletions(-) diff --git a/migration-exec.c b/migration-exec.c index 3edc026..5839a6d 100644 --- a/migration-exec.c +++ b/migration-exec.c @@ -113,7 +113,7 @@ static void exec_accept_incoming_migration(void *opaque) QEMUFile *f = opaque; int ret; -ret = qemu_loadvm_state(f); +ret = qemu_loadvm_state(f, 0); if (ret < 0) { fprintf(stderr, "load of migration failed\n"); goto err; diff --git a/migration-fd.c b/migration-fd.c index 0cc74ad..0e97ed0 100644 --- a/migration-fd.c +++ b/migration-fd.c @@ -106,7 +106,7 @@ static void fd_accept_incoming_migration(void *opaque) QEMUFile *f = opaque; int ret; -ret = qemu_loadvm_state(f); +ret = qemu_loadvm_state(f, 0); if (ret < 0) { fprintf(stderr, "load of migration failed\n"); goto err; diff --git a/migration-tcp.c b/migration-tcp.c index cffc4df..767a2f1 100644 --- a/migration-tcp.c +++ b/migration-tcp.c @@ -176,7 +176,7 @@ static void tcp_accept_incoming_migration(void *opaque) goto out; } -ret = qemu_loadvm_state(f); +ret = qemu_loadvm_state(f, 0); if (ret < 0) { fprintf(stderr, "load of migration failed\n"); goto out_fopen; diff --git a/migration-unix.c b/migration-unix.c index b7aab38..dd99a73 100644 --- a/migration-unix.c +++ b/migration-unix.c @@ -168,7 +168,7 @@ static void unix_accept_incoming_migration(void *opaque) goto out; } -ret = qemu_loadvm_state(f); +ret = qemu_loadvm_state(f, 0); if (ret < 0) { fprintf(stderr, "load of migration failed\n"); goto out_fopen; diff --git a/savevm.c b/savevm.c index b9bb9f4..2ab883b 100644 --- a/savevm.c +++ b/savevm.c @@ -1489,7 +1489,7 @@ typedef struct LoadStateEntry { int version_id; } LoadStateEntry; -int qemu_loadvm_state(QEMUFile *f) +int qemu_loadvm_state(QEMUFile *f, int skip_header) { QLIST_HEAD(, LoadStateEntry) loadvm_handlers = QLIST_HEAD_INITIALIZER(loadvm_handlers); @@ -1498,17 +1498,19 @@ int qemu_loadvm_state(QEMUFile *f) unsigned int v; int ret; -v = qemu_get_be32(f); -if (v != QEMU_VM_FILE_MAGIC) -return -EINVAL; +if (!skip_header) { +v = qemu_get_be32(f); +if (v != QEMU_VM_FILE_MAGIC) +return -EINVAL; -v = qemu_get_be32(f); -if (v == QEMU_VM_FILE_VERSION_COMPAT) { -fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n"); -return -ENOTSUP; +v = qemu_get_be32(f); +if (v == QEMU_VM_FILE_VERSION_COMPAT) { +fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n"); +return -ENOTSUP; +} +if (v != QEMU_VM_FILE_VERSION) +return -ENOTSUP; } -if (v != QEMU_VM_FILE_VERSION) -return -ENOTSUP; while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) { uint32_t instance_id, version_id, section_id; @@ -1833,7 +1835,7 @@ int load_vmstate(Monitor *mon, const char *name) monitor_printf(mon, "Could not open VM state file\n"); return -EINVAL; } -ret = qemu_loadvm_state(f); +ret = qemu_loadvm_state(f, 0); qemu_fclose(f); if (ret < 0) { monitor_printf(mon, "Error %d while loading VM state\n", ret); diff --git a/sysemu.h b/sysemu.h index 647a468..6c1441f 100644 --- a/sysemu.h +++ b/sysemu.h @@ -68,7 +68,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable, int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f); int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f); void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f); -int qemu_loadvm_state(QEMUFile *f); +int qemu_loadvm_state(QEMUFile *f, int skip_header); void qemu_errors_to_file(FILE *fp); void qemu_errors_to_mon(Monitor *mon); -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 06/23] Introduce read() to FdMigrationState.
Currently FdMigrationState doesn't support read(), and this patch introduces it to get response from the other side. Signed-off-by: Yoshiaki Tamura --- migration-tcp.c | 14 ++ migration.c | 12 migration.h |3 +++ 3 files changed, 29 insertions(+), 0 deletions(-) diff --git a/migration-tcp.c b/migration-tcp.c index e7f307c..cffc4df 100644 --- a/migration-tcp.c +++ b/migration-tcp.c @@ -39,6 +39,19 @@ static int socket_write(FdMigrationState *s, const void * buf, size_t size) return send(s->fd, buf, size, 0); } +static int socket_read(FdMigrationState *s, const void * buf, size_t size) +{ +ssize_t len; + +do { +len = recv(s->fd, (void *)buf, size, 0); +} while (len == -1 && socket_error() == EINTR); +if (len == -1) +len = -socket_error(); + +return len; +} + static int tcp_close(FdMigrationState *s) { DPRINTF("tcp_close\n"); @@ -94,6 +107,7 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon, s->get_error = socket_errno; s->write = socket_write; +s->read = socket_read; s->close = tcp_close; s->mig_state.cancel = migrate_fd_cancel; s->mig_state.get_status = migrate_fd_get_status; diff --git a/migration.c b/migration.c index 05f6cc5..a2ca6ef 100644 --- a/migration.c +++ b/migration.c @@ -337,6 +337,18 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size) return ret; } +int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, int size) +{ +FdMigrationState *s = opaque; +ssize_t ret; +ret = s->read(s, data, size); + +if (ret == -1) +ret = -(s->get_error(s)); + +return ret; +} + void migrate_fd_connect(FdMigrationState *s) { int ret; diff --git a/migration.h b/migration.h index 385423f..6f8af97 100644 --- a/migration.h +++ b/migration.h @@ -47,6 +47,7 @@ struct FdMigrationState int (*get_error)(struct FdMigrationState*); int (*close)(struct FdMigrationState*); int (*write)(struct FdMigrationState*, const void *, size_t); +int (*read)(struct FdMigrationState *, const void *, size_t); void *opaque; }; @@ -113,6 +114,8 @@ void migrate_fd_put_notify(void *opaque); ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size); +int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, int size); + void migrate_fd_connect(FdMigrationState *s); void migrate_fd_put_ready(void *opaque); -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 08/23] Introduce some socket util functions.
Signed-off-by: Yoshiaki Tamura --- osdep.c | 13 + qemu-char.c | 25 - qemu_socket.h |4 3 files changed, 41 insertions(+), 1 deletions(-) diff --git a/osdep.c b/osdep.c index 3bab79a..63444e7 100644 --- a/osdep.c +++ b/osdep.c @@ -201,6 +201,12 @@ void socket_set_nonblock(int fd) ioctlsocket(fd, FIONBIO, &opt); } +void socket_set_block(int fd) +{ +unsigned long opt = 0; +ioctlsocket(fd, FIONBIO, &opt); +} + int inet_aton(const char *cp, struct in_addr *ia) { uint32_t addr = inet_addr(cp); @@ -223,6 +229,13 @@ void socket_set_nonblock(int fd) fcntl(fd, F_SETFL, f | O_NONBLOCK); } +void socket_set_block(int fd) +{ +int f; +f = fcntl(fd, F_GETFL); +fcntl(fd, F_SETFL, f & ~O_NONBLOCK); +} + void qemu_set_cloexec(int fd) { int f; diff --git a/qemu-char.c b/qemu-char.c index 4169492..ccdf394 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2092,12 +2092,35 @@ static void tcp_chr_telnet_init(int fd) send(fd, (char *)buf, 3, 0); } -static void socket_set_nodelay(int fd) +void socket_set_delay(int fd) +{ +int val = 0; +setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val)); +} + +void socket_set_nodelay(int fd) { int val = 1; setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val)); } +void socket_set_timeout(int fd, int s) +{ +struct timeval tv = { +.tv_sec = s, +.tv_usec = 0 +}; +/* Set socket_timeout */ +if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, + &tv, sizeof(tv)) < 0) { +fprintf(stderr, "failed to set SO_RCVTIMEO\n"); +} +if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, + &tv, sizeof(tv)) < 0) { +fprintf(stderr, "fialed to set SO_SNDTIMEO\n"); +} +} + static void tcp_chr_accept(void *opaque) { CharDriverState *chr = opaque; diff --git a/qemu_socket.h b/qemu_socket.h index 7ee46ac..8eae465 100644 --- a/qemu_socket.h +++ b/qemu_socket.h @@ -35,6 +35,10 @@ int inet_aton(const char *cp, struct in_addr *ia); int qemu_socket(int domain, int type, int protocol); int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen); void socket_set_nonblock(int fd); +void socket_set_block(int fd); +void socket_set_nodelay(int fd); +void socket_set_delay(int fd); +void socket_set_timeout(int fd, int s); int send_all(int fd, const void *buf, int len1); /* New, ipv6-ready socket helper functions, see qemu-sockets.c */ -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 18/23] Call event_tap_replay() at vm_start().
Call event_tap_replay() at vm_start() to replay the last ioport/mmio event upon failover. Signed-off-by: Yoshiaki Tamura --- vl.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/vl.c b/vl.c index 56d12c7..762440d 100644 --- a/vl.c +++ b/vl.c @@ -3094,6 +3094,7 @@ void vm_start(void) vm_state_notify(1, 0); qemu_rearm_alarm_timer(alarm_timer); resume_all_vcpus(); +event_tap_replay(); } } -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 20/23] Modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled.
When ft_mode is set in the header, tcp_accept_incoming_migration() receives ft_transaction iteratively. We also need a hack no to close fd before moving to ft_transaction mode, so that we can reuse the fd for it. Signed-off-by: Yoshiaki Tamura --- migration-tcp.c | 36 +++- 1 files changed, 35 insertions(+), 1 deletions(-) diff --git a/migration-tcp.c b/migration-tcp.c index 767a2f1..a5d9b6d 100644 --- a/migration-tcp.c +++ b/migration-tcp.c @@ -18,6 +18,7 @@ #include "sysemu.h" #include "buffered_file.h" #include "block.h" +#include "ft_transaction.h" //#define DEBUG_MIGRATION_TCP @@ -55,7 +56,8 @@ static int socket_read(FdMigrationState *s, const void * buf, size_t size) static int tcp_close(FdMigrationState *s) { DPRINTF("tcp_close\n"); -if (s->fd != -1) { +/* FIX ME: accessing ft_mode here isn't clean */ +if (s->fd != -1 && ft_mode != FT_INIT) { close(s->fd); s->fd = -1; } @@ -181,6 +183,38 @@ static void tcp_accept_incoming_migration(void *opaque) fprintf(stderr, "load of migration failed\n"); goto out_fopen; } + +/* ft_mode is set by qemu_loadvm_state(). */ +if (ft_mode == FT_INIT) { +/* close normal QEMUFile first before reusing connection. */ +qemu_fclose(f); +socket_set_nodelay(c); +socket_set_timeout(c, 5); +/* don't autostart to avoid split brain. */ +autostart = 0; + +f = qemu_fopen_transaction(c); +if (f == NULL) { +fprintf(stderr, "could not qemu_fopen transaction\n"); +goto out; +} + +/* need to wait sender to setup. */ +if (qemu_transaction_begin(f) < 0) { +goto out_fopen; +} + +/* loop until transaction breaks */ +while ((ft_mode != FT_OFF) && (ret == 0)) { +ret = qemu_loadvm_state(f, 1); +} + +/* if migrate_cancel was called at the sender */ +if (ft_mode == FT_OFF) { +goto out_fopen; +} +} + qemu_announce_self(); DPRINTF("successfully loaded vm state\n"); -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 13/23] Introduce event-tap.
event-tap controls when to start ft transaction, and inserts callbacks to the net/block. Signed-off-by: Yoshiaki Tamura --- Makefile.target |1 + event-tap.c | 184 +++ event-tap.h | 32 ++ 3 files changed, 217 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h diff --git a/Makefile.target b/Makefile.target index 82caf20..a49b21f 100644 --- a/Makefile.target +++ b/Makefile.target @@ -188,6 +188,7 @@ obj-$(CONFIG_KVM) += kvm.o kvm-all.o # MSI-X depends on kvm for interrupt injection, # so moved it from Makefile.objs to Makefile.target for now obj-y += msix.o +obj-y += event-tap.o obj-$(CONFIG_ISA_MMIO) += isa_mmio.o LIBS+=-lz diff --git a/event-tap.c b/event-tap.c new file mode 100644 index 000..5d3a338 --- /dev/null +++ b/event-tap.c @@ -0,0 +1,184 @@ +/* + * Event Tap functions for QEMU + * + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include "block.h" +#include "ioport.h" +#include "osdep.h" +#include "hw/hw.h" +#include "net/queue.h" +#include "event-tap.h" + +static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF; + +typedef struct EventTapIOport { +uint32_t address; +uint32_t data; +int index; +} EventTapIOport; + +#define MMIO_BUF_SIZE 8 + +typedef struct EventTapMMIO { +uint64_t address; +uint8_t buf[MMIO_BUF_SIZE]; +int len; +} EventTapMMIO; + +#define EVENT_TAP_IOPORT 1 +#define EVENT_TAP_MMIO 2 + +typedef struct EventTapLog { +int mode; +union { +EventTapIOport ioport ; +EventTapMMIO mmio; +}; +} EventTapLog; + +static EventTapLog last_event_tap; + +int event_tap_register(int (*cb)(void)) +{ +if (cb == NULL || event_tap_state != EVENT_TAP_OFF) +return -1; + +bdrv_event_tap_register(cb); +qemu_net_event_tap_register(cb); +event_tap_state = EVENT_TAP_ON; + +return 0; +} + +int event_tap_unregister(void) +{ +if (event_tap_state == EVENT_TAP_OFF) +return -1; + +bdrv_event_tap_unregister(); +qemu_net_event_tap_unregister(); +event_tap_state = EVENT_TAP_OFF; + +return 0; +} + +void event_tap_suspend(void) +{ +if (event_tap_state == EVENT_TAP_ON) +event_tap_state = EVENT_TAP_SUSPEND; +} + +void event_tap_resume(void) +{ +if (event_tap_state == EVENT_TAP_SUSPEND) +event_tap_state = EVENT_TAP_ON; +} + +int event_tap_get_state(void) +{ +return event_tap_state; +} + +void event_tap_ioport(int index, uint32_t address, uint32_t data) +{ +if (event_tap_state != EVENT_TAP_ON) { +return; +} + +last_event_tap.mode = EVENT_TAP_IOPORT; +last_event_tap.ioport.index = index; +last_event_tap.ioport.address = address; +last_event_tap.ioport.data = data; +} + +void event_tap_mmio(uint64_t address, uint8_t *buf, int len) +{ +if (event_tap_state != EVENT_TAP_ON || len > MMIO_BUF_SIZE) { +return; +} + +last_event_tap.mode = EVENT_TAP_MMIO; +last_event_tap.mmio.address = address; +last_event_tap.mmio.len = len; +memcpy(last_event_tap.mmio.buf, buf, len); +} + +static void event_tap_reset(void) +{ +memset(&last_event_tap, 0, sizeof(last_event_tap)); +} + +void event_tap_replay(void) +{ +if (event_tap_state != EVENT_TAP_REPLAY) { +return; +} + +switch (last_event_tap.mode) { +case EVENT_TAP_IOPORT: +switch (last_event_tap.ioport.index) { +case 0: +cpu_outb(last_event_tap.ioport.address, last_event_tap.ioport.data); +break; +case 1: +cpu_outw(last_event_tap.ioport.address, last_event_tap.ioport.data); +break; +case 2: +cpu_outl(last_event_tap.ioport.address, last_event_tap.ioport.data); +break; +} +event_tap_reset(); +break; +case EVENT_TAP_MMIO: +cpu_physical_memory_rw(last_event_tap.mmio.address, + last_event_tap.mmio.buf, + last_event_tap.mmio.len, 1); +event_tap_reset(); +break; +} +} + +static void event_tap_save(QEMUFile *f, void *opaque) +{ +qemu_put_byte(f, last_event_tap.mode); + +if (last_event_tap.mode == EVENT_TAP_IOPORT) { +qemu_put_be32(f, last_event_tap.ioport.index); +qemu_put_be32(f, last_event_tap.ioport.address); +qemu_put_byte(f, last_event_tap.ioport.data); +} else { +qemu_put_be64(f, last_event_tap.mmio.address); +qemu_put_byte(f, last_event_tap.mmio.len); +qemu_put_buffer(f, last_event_tap.mmio.buf, last_event_tap.mmio.len); +} +} + +static int event_tap_load(QEMUFile *f, void *opaque, int version_id) +{ +last_event_tap.mode = qemu_get_byte(f); + +if (last_event_tap.mode == EVENT_TAP_IOPORT
[Qemu-devel] [RFC PATCH 11/23] Introduce qemu_savevm_state_all().
Introduce qemu_savevm_state_all() to send the memory and device info together, while avoiding cancelling memory state tracking. Signed-off-by: Yoshiaki Tamura --- savevm.c | 60 sysemu.h |1 + 2 files changed, 61 insertions(+), 0 deletions(-) diff --git a/savevm.c b/savevm.c index 81cb711..25ccbb8 100644 --- a/savevm.c +++ b/savevm.c @@ -1468,6 +1468,66 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f) return 0; } +int qemu_savevm_state_all(Monitor *mon, QEMUFile *f) +{ +SaveStateEntry *se; + +QTAILQ_FOREACH(se, &savevm_handlers, entry) { +int len; + +if (se->save_live_state == NULL) +continue; + +/* Section type */ +qemu_put_byte(f, QEMU_VM_SECTION_START); +qemu_put_be32(f, se->section_id); + +/* ID string */ +len = strlen(se->idstr); +qemu_put_byte(f, len); +qemu_put_buffer(f, (uint8_t *)se->idstr, len); + +qemu_put_be32(f, se->instance_id); +qemu_put_be32(f, se->version_id); +if (ft_mode == FT_INIT) { +/* This is workaround. */ +se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque); +} else { +se->save_live_state(mon, f, QEMU_VM_SECTION_PART, se->opaque); +} +} + +ft_mode = FT_TRANSACTION; +QTAILQ_FOREACH(se, &savevm_handlers, entry) { +int len; + + if (se->save_state == NULL && se->vmsd == NULL) + continue; + +/* Section type */ +qemu_put_byte(f, QEMU_VM_SECTION_FULL); +qemu_put_be32(f, se->section_id); + +/* ID string */ +len = strlen(se->idstr); +qemu_put_byte(f, len); +qemu_put_buffer(f, (uint8_t *)se->idstr, len); + +qemu_put_be32(f, se->instance_id); +qemu_put_be32(f, se->version_id); + +vmstate_save(f, se); +} + +qemu_put_byte(f, QEMU_VM_EOF); + +if (qemu_file_has_error(f)) +return -EIO; + +return 0; +} + + void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f) { SaveStateEntry *se; diff --git a/sysemu.h b/sysemu.h index 6c1441f..df314bb 100644 --- a/sysemu.h +++ b/sysemu.h @@ -67,6 +67,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable, int shared); int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f); int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f); +int qemu_savevm_state_all(Monitor *mon, QEMUFile *f); void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f); int qemu_loadvm_state(QEMUFile *f, int skip_header); -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 17/23] Skip assert() when event_tap_state weren't EVENT_TAP_OFF.
Skip assert(!cpu_single_env) in resume_all_threads() when event_tap_state weren't EVENT_TAP_OFF. Signed-off-by: Yoshiaki Tamura --- qemu-kvm.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 1414f49..e28bf59 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -18,6 +18,7 @@ #include "compatfd.h" #include "gdbstub.h" #include "monitor.h" +#include "event-tap.h" #include "qemu-kvm.h" #include "libkvm.h" @@ -1770,7 +1771,8 @@ static void resume_all_threads(void) { CPUState *penv = first_cpu; -assert(!cpu_single_env); +if (event_tap_get_state() == EVENT_TAP_OFF) +assert(!cpu_single_env); while (penv) { penv->stop = 0; -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 00/23] Kemari for KVM v0.1.1
Hi, This patch series is a revised version of Kemari for KVM, which applied comments for the previous post. The current code is based on qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27. On the contrary to the previous version, this series doesn't require any modifications to KVM. The I/O events are caputured in net/block layer instead of device emulation layer. The transmission/transaction protocol, and most of the control logic is implemented in QEMU. We prepared a demonstration video again. This time the guest is Windows XP without virtio drivers. The demonstration scenario is, 1. Play with a guest VM (This guest has e1000 and ide) # The guest image should be a NFS/SAN. 2. Start incoming side with, -incoming ::,ft_mode 3. Start Kemari to synchronize the VM by running the following command in QEMU. Just add "-k" option to usual migrate command. migrate -d -k tcp:192.168.0.20: 3. Check the status by calling info migrate. 4. Go back to the VM to play the pinball. 5. Kill the the VM. (VNC client also disappears) 6. Press "c" to continue the VM on the other host. 7. Bring up the VNC client (Sorry, it pops outside of video capture.) 8. Confirm that the pinball works, then shutdown. http://www.osrg.net/kemari/download/kemari-kvm-winxp.mov The repository contains all patches we're sending with this message. For those who want to try, pull the following repository. git://kemari.git.sourceforge.net/gitroot/kemari/kemari The changes from v0.1 -> v0.1.1 are: - events are tapped in net/block layer instead of device emulation layer. - Introduce a new option for -incoming to accept FT transaction. - Removed writev() support to QEMUFile and FdMigrationState for now. I would post this work in a different series. - Modified virtio-blk save/load handler to send inuse variable to correctly replay. - Removed configure --enable-ft-mode. - Removed unnecessary check for qemu_realloc(). I hope people like this approach, and looking forward to suggestions/comments. Thanks, Yoshi Yoshiaki Tamura (23): Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of bit-based phys_ram_dirty. Introduce cpu_physical_memory_get_dirty_range(). Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty. Use cpu_physical_memory_get_dirty_range() to check multiple dirty pages. Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer(). Introduce read() to FdMigrationState. Introduce skip_header parameter to qemu_loadvm_state(). Introduce some socket util functions. Introduce fault tolerant VM transaction QEMUFile and ft_mode. Introduce util functions to control ft_transaction from savevm layer. Introduce qemu_savevm_state_all(). Insent event-tap callbacks to net/block layer. Introduce event-tap. Call init handler of event-tap at main(). Insert event_tap_ioport() to ioport_write(). Insert event_tap_mmio() to cpu_physical_memory_rw(). Skip assert() when event_tap_state weren't EVENT_TAP_OFF. Call event_tap_replay() at vm_start(). Introduce ft_tranx_ready(), and modify migrate_fd_put_ready() when ft_mode is on. Modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled. virtio-blk: Modify save/load handler to handle inuse varialble. Introduce -k option to enable FT migration mode (Kemari). Add a parser to accept FT migration incoming mode. Makefile.objs|1 + Makefile.target |1 + block.c | 22 +++ block.h |4 + cpu-all.h| 134 - event-tap.c | 184 event-tap.h | 32 exec.c | 131 + ft_transaction.c | 418 ++ ft_transaction.h | 54 +++ hw/hw.h |7 + hw/virtio.c |8 +- ioport.c |2 + migration-exec.c |2 +- migration-fd.c |2 +- migration-tcp.c | 52 +++- migration-unix.c |2 +- migration.c | 110 ++- migration.h |3 + net/queue.c | 18 +++ net/queue.h |3 + osdep.c | 13 ++ qemu-char.c | 25 +++- qemu-kvm.c | 23 ++-- qemu-monitor.hx |7 +- qemu_socket.h|4 + savevm.c | 146 +-- sysemu.h |3 +- vl.c | 57 +--- 29 files changed, 1371 insertions(+), 97 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h create mode 100644 ft_transaction.c create mode 100644 ft_transaction.h
[Qemu-devel] [RFC PATCH 01/23] Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of bit-based phys_ram_dirty.
Replaces byte-based phys_ram_dirty bitmap with four (MASTER, VGA, CODE, MIGRATION) bit-based phys_ram_dirty bitmap. On allocation, it sets all bits in the bitmap. It uses ffs() to convert DIRTY_FLAG to DIRTY_IDX. Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap. MASTER works as a buffer, and upon get_diry() or get_dirty_flags(), it calls cpu_physical_memory_sync_master() to update VGA and MIGRATION. Replaces direct phys_ram_dirty access with wrapper functions to prevent direct access to the phys_ram_dirty bitmap. Signed-off-by: Yoshiaki Tamura Signed-off-by: OHMURA Kei --- cpu-all.h | 130 + exec.c| 60 ++-- 2 files changed, 152 insertions(+), 38 deletions(-) diff --git a/cpu-all.h b/cpu-all.h index 51effc0..3f8762d 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -37,6 +37,9 @@ #include "softfloat.h" +/* to use ffs in flag_to_idx() */ +#include + #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN) #define BSWAP_NEEDED #endif @@ -846,7 +849,6 @@ int cpu_str_to_log_mask(const char *str); /* memory API */ extern int phys_ram_fd; -extern uint8_t *phys_ram_dirty; extern ram_addr_t ram_size; extern ram_addr_t last_ram_offset; extern uint8_t *bios_mem; @@ -869,28 +871,140 @@ extern uint8_t *bios_mem; /* Set if TLB entry is an IO callback. */ #define TLB_MMIO(1 << 5) +/* Use DIRTY_IDX as indexes of bit-based phys_ram_dirty. */ +#define MASTER_DIRTY_IDX0 +#define VGA_DIRTY_IDX 1 +#define CODE_DIRTY_IDX 2 +#define MIGRATION_DIRTY_IDX 3 +#define NUM_DIRTY_IDX 4 + +#define MASTER_DIRTY_FLAG(1 << MASTER_DIRTY_IDX) +#define VGA_DIRTY_FLAG (1 << VGA_DIRTY_IDX) +#define CODE_DIRTY_FLAG (1 << CODE_DIRTY_IDX) +#define MIGRATION_DIRTY_FLAG (1 << MIGRATION_DIRTY_IDX) + +extern unsigned long *phys_ram_dirty[NUM_DIRTY_IDX]; + +static inline int dirty_flag_to_idx(int flag) +{ +return ffs(flag) - 1; +} + +static inline int dirty_idx_to_flag(int idx) +{ +return 1 << idx; +} + int cpu_memory_rw_debug(CPUState *env, target_ulong addr, uint8_t *buf, int len, int is_write); -#define VGA_DIRTY_FLAG 0x01 -#define CODE_DIRTY_FLAG 0x02 -#define MIGRATION_DIRTY_FLAG 0x08 - /* read dirty bit (return 0 or 1) */ static inline int cpu_physical_memory_is_dirty(ram_addr_t addr) { -return phys_ram_dirty[addr >> TARGET_PAGE_BITS] == 0xff; +unsigned long mask; +ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS; +int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1); + +mask = 1UL << offset; +return (phys_ram_dirty[MASTER_DIRTY_IDX][index] & mask) == mask; +} + +static inline void cpu_physical_memory_sync_master(ram_addr_t index) +{ +if (phys_ram_dirty[MASTER_DIRTY_IDX][index]) { +phys_ram_dirty[VGA_DIRTY_IDX][index] +|= phys_ram_dirty[MASTER_DIRTY_IDX][index]; +phys_ram_dirty[MIGRATION_DIRTY_IDX][index] +|= phys_ram_dirty[MASTER_DIRTY_IDX][index]; +phys_ram_dirty[MASTER_DIRTY_IDX][index] = 0UL; +} +} + +static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr) +{ +unsigned long mask; +ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS; +int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1); +int ret = 0, i; + +mask = 1UL << offset; +cpu_physical_memory_sync_master(index); + +for (i = VGA_DIRTY_IDX; i <= MIGRATION_DIRTY_IDX; i++) { +if (phys_ram_dirty[i][index] & mask) { +ret |= dirty_idx_to_flag(i); +} +} + +return ret; +} + +static inline int cpu_physical_memory_get_dirty_idx(ram_addr_t addr, +int dirty_idx) +{ +unsigned long mask; +ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS; +int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1); + +mask = 1UL << offset; +cpu_physical_memory_sync_master(index); +return (phys_ram_dirty[dirty_idx][index] & mask) == mask; } static inline int cpu_physical_memory_get_dirty(ram_addr_t addr, int dirty_flags) { -return phys_ram_dirty[addr >> TARGET_PAGE_BITS] & dirty_flags; +return cpu_physical_memory_get_dirty_idx(addr, + dirty_flag_to_idx(dirty_flags)); } static inline void cpu_physical_memory_set_dirty(ram_addr_t addr) { -phys_ram_dirty[addr >> TARGET_PAGE_BITS] = 0xff; +unsigned long mask; +ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS; +int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1); + +mask = 1UL << offset; +phys_ram_dirty[MASTER_DIRTY_IDX][index] |= mask; +} + +static inline void cpu_physical_memory_set_dirty_range(ram_addr_t addr, +
[Qemu-devel] [RFC PATCH 05/23] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().
Currently buf size is fixed at 32KB. It would be useful if it could be flexible. Signed-off-by: Yoshiaki Tamura --- hw/hw.h |2 ++ savevm.c | 21 - 2 files changed, 22 insertions(+), 1 deletions(-) diff --git a/hw/hw.h b/hw/hw.h index 05131a0..fc9ed29 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -61,6 +61,8 @@ void qemu_fflush(QEMUFile *f); int qemu_fclose(QEMUFile *f); void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size); void qemu_put_byte(QEMUFile *f, int v); +void *qemu_realloc_buffer(QEMUFile *f, int size); +void qemu_clear_buffer(QEMUFile *f); static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v) { diff --git a/savevm.c b/savevm.c index 2fd3de6..b9bb9f4 100644 --- a/savevm.c +++ b/savevm.c @@ -174,7 +174,8 @@ struct QEMUFile { when reading */ int buf_index; int buf_size; /* 0 when writing */ -uint8_t buf[IO_BUF_SIZE]; +int buf_max_size; +uint8_t *buf; int has_error; }; @@ -424,6 +425,9 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer, f->get_rate_limit = get_rate_limit; f->is_write = 0; +f->buf_max_size = IO_BUF_SIZE; +f->buf = qemu_mallocz(sizeof(uint8_t) * f->buf_max_size); + return f; } @@ -454,6 +458,20 @@ void qemu_fflush(QEMUFile *f) } } +void *qemu_realloc_buffer(QEMUFile *f, int size) +{ +f->buf_max_size = size; +f->buf = qemu_realloc(f->buf, f->buf_max_size); + +return f->buf; +} + +void qemu_clear_buffer(QEMUFile *f) +{ +f->buf_size = f->buf_index = f->buf_offset = 0; +memset(f->buf, 0, f->buf_max_size); +} + static void qemu_fill_buffer(QEMUFile *f) { int len; @@ -479,6 +497,7 @@ int qemu_fclose(QEMUFile *f) qemu_fflush(f); if (f->close) ret = f->close(f->opaque); +qemu_free(f->buf); qemu_free(f); return ret; } -- 1.7.0.31.g1df487
[Qemu-devel] [RFC PATCH 00/23] Kemari for KVM v0.1.1
Hi, This patch series is a revised version of Kemari for KVM, which applied comments for the previous post. The current code is based on qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27. On the contrary to the previous version, this series doesn't require any modifications to KVM. The I/O events are caputured in net/block layer instead of device emulation layer. The transmission/transaction protocol, and most of the control logic is implemented in QEMU. We prepared a demonstration video again. This time the guest is Windows XP without virtio drivers. The demonstration scenario is, 1. Play with a guest VM (This guest has e1000 and ide) # The guest image should be a NFS/SAN. 2. Start incoming side with, -incoming ::,ft_mode 3. Start Kemari to synchronize the VM by running the following command in QEMU. Just add "-k" option to usual migrate command. migrate -d -k tcp:192.168.0.20: 3. Check the status by calling info migrate. 4. Go back to the VM to play the pinball. 5. Kill the the VM. (VNC client also disappears) 6. Press "c" to continue the VM on the other host. 7. Bring up the VNC client (Sorry, it pops outside of video capture.) 8. Confirm that the pinball works, then shutdown. http://www.osrg.net/kemari/download/kemari-kvm-winxp.mov The repository contains all patches we're sending with this message. For those who want to try, please pull the following repository. git://kemari.git.sourceforge.net/gitroot/kemari/kemari The changes from v0.1 -> v0.1.1 are: - events are tapped in net/block layer instead of device emulation layer. - Introduce a new option for -incoming to accept FT transaction. - Removed writev() support to QEMUFile and FdMigrationState for now. I would post this work in a different series. - Modified virtio-blk save/load handler to send inuse variable to correctly replay. - Removed configure --enable-ft-mode. - Removed unnecessary check for qemu_realloc(). I hope people like this approach, and looking forward to suggestions/comments. Thanks, Yoshi Yoshiaki Tamura (23): Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of bit-based phys_ram_dirty. Introduce cpu_physical_memory_get_dirty_range(). Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty. Use cpu_physical_memory_get_dirty_range() to check multiple dirty pages. Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer(). Introduce read() to FdMigrationState. Introduce skip_header parameter to qemu_loadvm_state(). Introduce some socket util functions. Introduce fault tolerant VM transaction QEMUFile and ft_mode. Introduce util functions to control ft_transaction from savevm layer. Introduce qemu_savevm_state_all(). Insent event-tap callbacks to net/block layer. Introduce event-tap. Call init handler of event-tap at main(). Insert event_tap_ioport() to ioport_write(). Insert event_tap_mmio() to cpu_physical_memory_rw(). Skip assert() when event_tap_state weren't EVENT_TAP_OFF. Call event_tap_replay() at vm_start(). Introduce ft_tranx_ready(), and modify migrate_fd_put_ready() when ft_mode is on. Modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled. virtio-blk: Modify save/load handler to handle inuse varialble. Introduce -k option to enable FT migration mode (Kemari). Add a parser to accept FT migration incoming mode. Makefile.objs|1 + Makefile.target |1 + block.c | 22 +++ block.h |4 + cpu-all.h| 134 - event-tap.c | 184 event-tap.h | 32 exec.c | 131 + ft_transaction.c | 418 ++ ft_transaction.h | 54 +++ hw/hw.h |7 + hw/virtio.c |8 +- ioport.c |2 + migration-exec.c |2 +- migration-fd.c |2 +- migration-tcp.c | 52 +++- migration-unix.c |2 +- migration.c | 110 ++- migration.h |3 + net/queue.c | 18 +++ net/queue.h |3 + osdep.c | 13 ++ qemu-char.c | 25 +++- qemu-kvm.c | 23 ++-- qemu-monitor.hx |7 +- qemu_socket.h|4 + savevm.c | 146 +-- sysemu.h |3 +- vl.c | 57 +--- 29 files changed, 1371 insertions(+), 97 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h create mode 100644 ft_transaction.c create mode 100644 ft_transaction.h
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/24/2010 10:19 PM, Anthony Liguori wrote: On 05/24/2010 06:03 AM, Avi Kivity wrote: On 05/24/2010 11:27 AM, Stefan Hajnoczi wrote: On Sun, May 23, 2010 at 1:01 PM, Avi Kivity wrote: On 05/21/2010 12:29 AM, Anthony Liguori wrote: I'd be more interested in enabling people to build these types of storage systems without touching qemu. Both sheepdog and ceph ultimately transmit I/O over a socket to a central daemon, right? That incurs an extra copy. Besides a shared memory approach, I wonder if the splice() family of syscalls could be used to send/receive data through a storage daemon without the daemon looking at or copying the data? Excellent idea. splice() eventually requires a copy. You cannot splice() to linux-aio so you'd have to splice() to a temporary buffer and then call into linux-aio. With shared memory, you can avoid ever bringing the data into memory via O_DIRECT and linux-aio. If the final destination is a socket, then you end up queuing guest memory as an skbuff. In theory we could do an aio splice to block devices but I don't think that's realistic given our experience with aio changes. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/24/2010 10:16 PM, Anthony Liguori wrote: On 05/24/2010 06:56 AM, Avi Kivity wrote: On 05/24/2010 02:42 PM, MORITA Kazutaka wrote: The server would be local and talk over a unix domain socket, perhaps anonymous. nbd has other issues though, such as requiring a copy and no support for metadata operations such as snapshot and file size extension. Sorry, my explanation was unclear. I'm not sure how running servers on localhost can solve the problem. The local server can convert from the local (nbd) protocol to the remote (sheepdog, ceph) protocol. What I wanted to say was that we cannot specify the image of VM. With nbd protocol, command line arguments are as follows: $ qemu nbd:hostname:port As this syntax shows, with nbd protocol the client cannot pass the VM image name to the server. We would extend it to allow it to connect to a unix domain socket: qemu nbd:unix:/path/to/socket nbd is a no-go because it only supports a single, synchronous I/O operation at a time and has no mechanism for extensibility. If we go this route, I think two options are worth considering. The first would be a purely socket based approach where we just accepted the extra copy. The other potential approach would be shared memory based. We export all guest ram as shared memory along with a small bounce buffer pool. We would then use a ring queue (potentially even using virtio-blk) and an eventfd for notification. We can't actually export guest memory unless we allocate it as a shared memory object, which has many disadvantages. The only way to export anonymous memory now is vmsplice(), which is fairly limited. The server at the other end would associate the socket with a filename and forward it to the server using the remote protocol. However, I don't think nbd would be a good protocol. My preference would be for a plugin API, or for a new local protocol that uses splice() to avoid copies. I think a good shared memory implementation would be preferable to plugins. I think it's worth attempting to do a plugin interface for the block layer but I strongly suspect it would not be sufficient. I would not want to see plugins that interacted with BlockDriverState directly, for instance. We change it far too often. Our main loop functions are also not terribly stable so I'm not sure how we would handle that (unless we forced all block plugins to be in a separate thread). If we manage to make a good long-term stable plugin API, it would be a good candidate for the block layer itself. Some OSes manage to have a stable block driver ABI, so it should be possible, if difficult. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [PATCH] Release usb devices on shutdown and usb_del command
On 05/21/10 19:55, Shahar Havivi wrote: Remove usb_host_device_release and using usb_host_close to handle usb_del command. Gerd, What do you think about the usb_cleanup()? We need a mechanism to handle this for sure. I don't like that usb-specific approach very much though. I think we should either do that at qdev level, then at exit walk the whole device tree and call cleanup functions (if present). So every device has the chance to do cleanups when needed. Or we could have a exit notifier, which can be used for device (and also other) cleanup work. I tend to think that a exit notifier will be better. We probably have only a few devices which actually have to do some cleanup work (usb passthrough, maybe pci passthrough too), so building qdev infrastructure for that feels a bit like overkill. And exit notifiers are more generic, i.e. it will also work for non-device stuff. cheers, Gerd
Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET
> for (i = 0; i < 24; i++) { > sysbus_connect_irq(sysbus_from_qdev(hpet), i, isa_irq[i]); > } > +rtc_irq = qemu_allocate_feedback_irqs(hpet_handle_rtc_irq, > + sysbus_from_qdev(hpet), 1); > } This is wrong. The hpet device should expose this as an IO pin. Paul
Re: [Qemu-devel] [PATCH 1/5] trace: Add trace-events file for declaring trace events
On 05/25/2010 01:07 AM, Anthony Liguori wrote: Interesting approach as it lets us defer the tracing backend decision. Also, it's compatible with the multiplatform nature of qemu. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/24/2010 10:38 PM, Anthony Liguori wrote: - Building a plugin API seems a bit simpler to me, although I'm to sure if I'd get the idea correctly: The block layer has already some kind of api (.bdrv_file_open, .bdrv_read). We could simply compile the block-drivers as shared objects and create a method for loading the necessary modules at runtime. That approach would be a recipe for disaster. We would have to introduce a new, reduced functionality block API that was supported for plugins. Otherwise, the only way a plugin could keep up with our API changes would be if it was in tree which defeats the purpose of having plugins. We could guarantee API/ABI stability in a stable branch but not across releases. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFT][PATCH 05/15] hpet: Convert to qdev
> +static SysBusDeviceInfo hpet_device_info = { > +.qdev.name= "hpet", > +.qdev.size= sizeof(HPETState), > +.qdev.no_user = 1, Why shouldn't the user create HPET devices? I thought you'd removed all the global state. Paul
[Qemu-devel] [RFC PATCH 02/23] Introduce cpu_physical_memory_get_dirty_range().
It checks the first row and puts dirty addr in the array. If the first row is empty, it skips to the first non-dirty row or the end addr, and put the length in the first entry of the array. Signed-off-by: Yoshiaki Tamura Signed-off-by: OHMURA Kei --- cpu-all.h |4 +++ exec.c| 67 + 2 files changed, 71 insertions(+), 0 deletions(-) diff --git a/cpu-all.h b/cpu-all.h index 3f8762d..27187d4 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -1007,6 +1007,10 @@ static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start, } } +int cpu_physical_memory_get_dirty_range(ram_addr_t start, ram_addr_t end, +ram_addr_t *dirty_rams, int length, +int dirty_flags); + void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end, int dirty_flags); void cpu_tlb_update_dirty(CPUState *env); diff --git a/exec.c b/exec.c index bf8d703..d5c2a05 100644 --- a/exec.c +++ b/exec.c @@ -1962,6 +1962,73 @@ static inline void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, } } +/* It checks the first row and puts dirty addrs in the array. + If the first row is empty, it skips to the first non-dirty row + or the end addr, and put the length in the first entry of the array. */ +int cpu_physical_memory_get_dirty_range(ram_addr_t start, ram_addr_t end, +ram_addr_t *dirty_rams, int length, +int dirty_flag) +{ +unsigned long p = 0, page_number; +ram_addr_t addr; +ram_addr_t s_idx = (start >> TARGET_PAGE_BITS) / HOST_LONG_BITS; +ram_addr_t e_idx = (end >> TARGET_PAGE_BITS) / HOST_LONG_BITS; +int i, j, offset, dirty_idx = dirty_flag_to_idx(dirty_flag); + +/* mask bits before the start addr */ +offset = (start >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1); +cpu_physical_memory_sync_master(s_idx); +p |= phys_ram_dirty[dirty_idx][s_idx] & ~((1UL << offset) - 1); + +if (s_idx == e_idx) { +/* mask bits after the end addr */ +offset = (end >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1); +p &= (1UL << offset) - 1; +} + +if (p == 0) { +/* when the row is empty */ +ram_addr_t skip; +if (s_idx == e_idx) { +skip = end; + } else { +/* skip empty rows */ +while (s_idx < e_idx) { +s_idx++; +cpu_physical_memory_sync_master(s_idx); + +if (phys_ram_dirty[dirty_idx][s_idx] != 0) { +break; +} +} +skip = (s_idx * HOST_LONG_BITS * TARGET_PAGE_SIZE); +} +dirty_rams[0] = skip - start; +i = 0; + +} else if (p == ~0UL) { +/* when the row is fully dirtied */ +addr = start; +for (i = 0; i < length; i++) { +dirty_rams[i] = addr; +addr += TARGET_PAGE_SIZE; +} +} else { +/* when the row is partially dirtied */ +i = 0; +do { +j = ffsl(p) - 1; +p &= ~(1UL << j); +page_number = s_idx * HOST_LONG_BITS + j; +addr = page_number * TARGET_PAGE_SIZE; +dirty_rams[i] = addr; +i++; +} while (p != 0 && i < length); +} + +return i; +} + /* Note: start and end must be within the same ram block. */ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end, int dirty_flags) -- 1.7.0.31.g1df487
[Qemu-devel] Re: [[RfC PATCH]] linux fbdev display driver prototype.
On Tue, 25 May 2010, Gerd Hoffmann wrote: > The actual stretching is done by SDL I think. For that kind of stuff a > rendering library is actually helpful ... > not really, the sdl_zoom* stuff is completely generic
Re: [Qemu-devel] [PATCH 1/2] ioport: add function to check whenever a port is assigned or not
On 05/24/10 14:32, Paul Brook wrote: +int is_ioport_assigned(pio_addr_t addr) Shouldn't we move this into register_ioport_{read,write}, and have that fail if the port has already been assigned? It already checks and fails with hw_error(). Problem with that is that this kills qemu in case you try to pci hot-plug a vga card. So I've added a way to check before-hand, so we can fail gracefully in the few places where we need it (see second patch of the series). cheers, Gerd
Re: [Qemu-devel] linux-user mmap bug
On Mon, May 24, 2010 at 08:45:31AM -0700, Richard Henderson wrote: > On 05/24/2010 07:57 AM, Edgar E. Iglesias wrote: > > I took a look at the code again and I dont really understand how the > > particular case when we get a high address from the kernel while > > mmap_min_addr is busy case is supposed to work :/ > > In fact, for CRIS it never works on my host. > > Indeed, there are many cases for which it doesn't work for the Alpha > target either. Ye, what puzzled me was that if I am not completely senile, CRIS apps used to emulate on my x86_64 host not so long ago :) > > I changed it locally to keep scanning after a wrap until we succeed to > > allocate a chunk or rewrap (SLOW) but at least I can run dynamically > > linked CRIS programs again. > > Yep. My hack had been similar, except that I used the PageDesc tree > to help speed things up. But PageDesc is hardly an ideal data structure > in which to search, since it quickly devolves into a linear search of > the address space. > > Probably the easiest real fix is to re-read /proc/self/maps each time > the mmap_next_start guess fails and the kernel's returned address is > out of range. > > Another is using the MMAP_32BIT flag on x86-64 host whenever a 31-bit > address is appropriate for the guest. E.g. mips32, where architecturally > the high half of the address space is reserved for kernel mode. MAP_32BIT sounds good as long as guest_base is not used. When used I guess we'd need to fallback to something else anyway.. Maybe these issues are something too look more at during the bug day? :) In the meantime, I've patched the cris git to use the MAP_32BIT and to fallback to a super ugly and slow linear scan.. Thanks again for the help, Cheers > See > http://www.mail-archive.com/qemu-devel@nongnu.org/msg28924.html > for more ideas on the subject. > > > > r~
[Qemu-devel] [RFC PATCH 09/23] Introduce fault tolerant VM transaction QEMUFile and ft_mode.
This code implements VM transaction protocol. Like buffered_file, it sits between savevm and migration layer. With this architecture, VM transaction protocol is implemented mostly independent from other existing code. Signed-off-by: Yoshiaki Tamura Signed-off-by: OHMURA Kei --- Makefile.objs|1 + ft_transaction.c | 418 ++ ft_transaction.h | 54 +++ migration.c |3 + 4 files changed, 476 insertions(+), 0 deletions(-) create mode 100644 ft_transaction.c create mode 100644 ft_transaction.h diff --git a/Makefile.objs b/Makefile.objs index b73e2cb..4388fb3 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -78,6 +78,7 @@ common-obj-y += qemu-char.o savevm.o #aio.o common-obj-y += msmouse.o ps2.o common-obj-y += qdev.o qdev-properties.o common-obj-y += qemu-config.o block-migration.o +common-obj-y += ft_transaction.o common-obj-$(CONFIG_BRLAPI) += baum.o common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o diff --git a/ft_transaction.c b/ft_transaction.c new file mode 100644 index 000..92dc681 --- /dev/null +++ b/ft_transaction.c @@ -0,0 +1,418 @@ +/* + * Fault tolerant VM transaction QEMUFile + * + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * This source code is based on buffered_file.c. + * Copyright IBM, Corp. 2008 + * Authors: + * Anthony Liguori + */ + +#include "qemu-common.h" +#include "hw/hw.h" +#include "qemu-timer.h" +#include "sysemu.h" +#include "qemu-char.h" +#include "ft_transaction.h" + +// #define DEBUG_FT_TRANSACTION + +typedef struct QEMUFileFtTranx +{ +FtTranxPutBufferFunc *put_buffer; +FtTranxGetBufferFunc *get_buffer; +FtTranxCloseFunc *close; +void *opaque; +QEMUFile *file; +int has_error; +int is_sender; +int buf_max_size; +enum QEMU_VM_TRANSACTION_STATE tranx_state; +uint16_t tranx_id; +uint32_t seq; +} QEMUFileFtTranx; + +#define IO_BUF_SIZE 32768 + +#ifdef DEBUG_FT_TRANSACTION +#define dprintf(fmt, ...) \ +do { printf("ft_transaction: " fmt, ## __VA_ARGS__); } while (0) +#else +#define dprintf(fmt, ...) \ +do { } while (0) +#endif + +static ssize_t ft_tranx_flush_buffer(void *opaque, void *buf, int size) +{ +QEMUFileFtTranx *s = opaque; +size_t offset = 0; +ssize_t len; + +while (offset < size) { +len = s->put_buffer(s->opaque, (uint8_t *)buf + offset, size - offset); + +if (len <= 0) { +fprintf(stderr, "ft transaction flush buffer failed \n"); +s->has_error = 1; +offset = -EINVAL; +break; +} + +offset += len; +} + +return offset; +} + +static int ft_tranx_send_header(QEMUFileFtTranx *s) +{ +int ret = -1; + +dprintf("send header %d\n", s->tranx_state); + +ret = ft_tranx_flush_buffer(s, &s->tranx_state, sizeof(uint16_t)); +if (ret < 0) { +goto out; +} +ret = ft_tranx_flush_buffer(s, &s->tranx_id, sizeof(uint16_t)); + +out: +return ret; +} + +static int ft_tranx_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size) +{ +QEMUFileFtTranx *s = opaque; +ssize_t ret = -1; + +if (s->has_error) { +fprintf(stderr, "flush when error, bailing\n"); +return -EINVAL; +} + +ret = ft_tranx_send_header(s); +if (ret < 0) { +goto out; +} + +ret = ft_tranx_flush_buffer(s, &s->seq, sizeof(s->seq)); +if (ret < 0) { +goto out; +} +s->seq++; + +ret = ft_tranx_flush_buffer(s, &size, sizeof(uint32_t)); +if (ret < 0) { +goto out; +} + +ret = ft_tranx_flush_buffer(s, (uint8_t *)buf, size); + +out: +return ret; +} + +#if 0 +static int ft_tranx_put_vector(void *opaque, struct iovec *vector, int64_t pos, int count) +{ +QEMUFileFtTranx *s = opaque; +ssize_t ret = -1; +int i; +uint32_t size = 0; + +dprintf("putting %d vectors at %" PRId64 "\n", count, pos); + +if (s->has_error) { +dprintf("put vector when error, bailing\n"); +return -EINVAL; +} + +ret = ft_tranx_send_header(s); +if (ret < 0) { +return ret; +} + +ret = ft_tranx_flush_buffer(s, &s->seq, sizeof(s->seq)); +if (ret < 0) { +return ret; +} +s->seq++; + +for (i = 0; i < count; i++) +size += vector[i].iov_len; + +ret = ft_tranx_flush_buffer(s, &size, sizeof(uint32_t)); +if (ret < 0) { +return ret; +} + +while (count > 0) { +/* + * It will continue calling put_vector even if count > IOV_MAX. + */ +ret = s->put_vector(s->opaque, vector, +((count>IOV_MAX)?IOV_MAX:count)); + +if (ret <= 0) { +fprintf(stderr, "ft transaction putting vector\n"); +s->has_error = 1; +ret
[Qemu-devel] [Bug 494486] Re: cirrus_vga display is buggy after migration
Fixed by ae6b2c4ed956c17456e70efefe13ad0ab7db31de ** Changed in: qemu Status: New => Fix Committed -- cirrus_vga display is buggy after migration https://bugs.launchpad.net/bugs/494486 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: Fix Committed Bug description: [ Bug also reported on qemu-devel on November 24, 2009 ] After migrating a VM (running Debian Lenny 32-bit) using text consoles (default Debian configuration, no framebuffer I think), the VM is responsive to keyboard input but doesn't display new characters: only the cursor moves. Otherwise the machine seems to run fine: I can log in and see the cursor moving at the prompt position for example. Reverting the hw/cirrus_vga.c part of 7e72abc382b700a72549e8147bdea413534eeedc resolves the problem. Origin host is Debian Lenny 32-bits, destination host is Fedora 12 32-bit.
[Qemu-devel] Re: KVM call agenda for May 18
On 05/19/2010 11:20 AM, Christoph Hellwig wrote: It's time we get a proper bugzilla.qemu.org for both qemu and qemu-kvm that can be used sanely. If you ask nicely you might even get a virtual instance of bugzilla.kernel.org which works quite nicely. That would be my preference too but there's a limit to how much we can juggle the bug database around. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [PATCH 0/6] Make hpet a compile time option
Paolo Bonzini wrote: > On 05/24/2010 07:54 PM, Juan Quintela wrote: >> But for the other call, what do you propose? >> >> My best try was to hide the availability of hpet inside hpet_emul.h >> with: >> >> #ifdef CONFIG_HPET >> uint32_t hpet_in_legacy_mode(void); >> else >> uint32_t hpet_in_legacy_mode(void) { return 0;} >> #endif > > Change this to a global variable rtc_disable_interrupts in > hw/mc146818rtc.c? (You didn't say it would need to be particularly > pretty...). > > Not tested beyond compilation. > > Paolo > Please don't waste your time: http://permalink.gmane.org/gmane.comp.emulators.qemu/71377 Jan signature.asc Description: OpenPGP digital signature
[Qemu-devel] Re: [PATCH 0/6] Make hpet a compile time option
On 05/25/2010 11:05 AM, Jan Kiszka wrote: Please don't waste your time: http://permalink.gmane.org/gmane.comp.emulators.qemu/71377 I wasn't going to. :-) I had seen the series---very nice work! Paolo
Re: [Qemu-devel] [RFT][PATCH 05/15] hpet: Convert to qdev
Paul Brook wrote: >> +static SysBusDeviceInfo hpet_device_info = { >> +.qdev.name= "hpet", >> +.qdev.size= sizeof(HPETState), >> +.qdev.no_user = 1, > > Why shouldn't the user create HPET devices? I thought you'd removed all the > global state. Long-term, there is no reason to deny this. But the code is not yet ready for this: we statically instantiate it during PC setup to establish the routings and respect -no-hpat. Also, the BIOS isn't prepared for > 1 HPET. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET
Paul Brook wrote: >> for (i = 0; i < 24; i++) { >> sysbus_connect_irq(sysbus_from_qdev(hpet), i, isa_irq[i]); >> } >> +rtc_irq = qemu_allocate_feedback_irqs(hpet_handle_rtc_irq, >> + sysbus_from_qdev(hpet), 1); >> } > > This is wrong. The hpet device should expose this as an IO pin. Will look into this. BTW, I just realized that the GPIO handling is apparently lacking support for attaching an output to multiple inputs. Or am I missing something? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH 2/7] trace: Support disabled events in trace-events
Sometimes it is useful to disable a trace event. Removing the event from trace-events is not enough since source code will call the trace_*() function for the event. This patch makes it easy to build without specific trace events by marking them disabled in trace-events: disable multiwrite_cb(void *mcb, int ret) "mcb %p ret %d" This builds without the multiwrite_cb trace event. Signed-off-by: Stefan Hajnoczi --- v2: * This patch is new in v2 trace-events |4 +++- tracetool| 10 -- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/trace-events b/trace-events index a37d3cc..5efaa86 100644 --- a/trace-events +++ b/trace-events @@ -12,10 +12,12 @@ # # Format of a trace event: # -# ( [, ] ...) "" +# [disable] ( [, ] ...) "" # # Example: qemu_malloc(size_t size) "size %zu" # +# The "disable" keyword will build without the trace event. +# # The must be a valid as a C function name. # # Types should be standard C types. Use void * for pointers because the trace diff --git a/tracetool b/tracetool index 766a9ba..53d3612 100755 --- a/tracetool +++ b/tracetool @@ -110,7 +110,7 @@ linetoc_end_nop() # Process stdin by calling begin, line, and end functions for the backend convert() { -local begin process_line end +local begin process_line end str disable begin="lineto$1_begin_$backend" process_line="lineto$1_$backend" end="lineto$1_end_$backend" @@ -123,8 +123,14 @@ convert() str=${str%%#*} test -z "$str" && continue +# Process the line. The nop backend handles disabled lines. +disable=${str%%disable*} echo -"$process_line" "$str" +if test -z "$disable"; then +"lineto$1_nop" "${str##disable}" +else +"$process_line" "$str" +fi done echo -- 1.7.1
[Qemu-devel] [PATCH 6/7] trace: Trace virtio-blk, multiwrite, and paio_submit
This patch adds trace events that make it possible to observe virtio-blk. Signed-off-by: Stefan Hajnoczi --- block.c|7 +++ hw/virtio-blk.c|7 +++ posix-aio-compat.c |2 ++ trace-events | 14 ++ 4 files changed, 30 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index 0b0966c..56db112 100644 --- a/block.c +++ b/block.c @@ -23,6 +23,7 @@ */ #include "config-host.h" #include "qemu-common.h" +#include "trace.h" #include "monitor.h" #include "block_int.h" #include "module.h" @@ -1922,6 +1923,8 @@ static void multiwrite_cb(void *opaque, int ret) { MultiwriteCB *mcb = opaque; +trace_multiwrite_cb(mcb, ret); + if (ret < 0 && !mcb->error) { mcb->error = ret; multiwrite_user_cb(mcb); @@ -2065,6 +2068,8 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs) // Check for mergable requests num_reqs = multiwrite_merge(bs, reqs, num_reqs, mcb); +trace_bdrv_aio_multiwrite(mcb, mcb->num_callbacks, num_reqs); + // Run the aio requests for (i = 0; i < num_reqs; i++) { acb = bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov, @@ -2075,9 +2080,11 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs) // submitted yet. Otherwise we'll wait for the submitted AIOs to // complete and report the error in the callback. if (mcb->num_requests == 0) { +trace_bdrv_aio_multiwrite_earlyfail(mcb); reqs[i].error = -EIO; goto fail; } else { +trace_bdrv_aio_multiwrite_latefail(mcb, i); mcb->num_requests++; multiwrite_cb(mcb, -EIO); break; diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c index 5d7f1a2..706f109 100644 --- a/hw/virtio-blk.c +++ b/hw/virtio-blk.c @@ -13,6 +13,7 @@ #include #include +#include "trace.h" #include "virtio-blk.h" #include "block_int.h" #ifdef __linux__ @@ -50,6 +51,8 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int status) { VirtIOBlock *s = req->dev; +trace_virtio_blk_req_complete(req, status); + req->in->status = status; virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in)); virtio_notify(&s->vdev, s->vq); @@ -87,6 +90,8 @@ static void virtio_blk_rw_complete(void *opaque, int ret) { VirtIOBlockReq *req = opaque; +trace_virtio_blk_rw_complete(req, ret); + if (ret) { int is_read = !(req->out->type & VIRTIO_BLK_T_OUT); if (virtio_blk_handle_rw_error(req, -ret, is_read)) @@ -263,6 +268,8 @@ static void virtio_blk_handle_flush(BlockRequest *blkreq, int *num_writes, static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes, VirtIOBlockReq *req, BlockDriverState **old_bs) { +trace_virtio_blk_handle_write(req, req->out->sector, req->qiov.size / 512); + if (req->out->sector & req->dev->sector_mask) { virtio_blk_rw_complete(req, -EIO); return; diff --git a/posix-aio-compat.c b/posix-aio-compat.c index b43c531..c2200fe 100644 --- a/posix-aio-compat.c +++ b/posix-aio-compat.c @@ -25,6 +25,7 @@ #include "qemu-queue.h" #include "osdep.h" #include "qemu-common.h" +#include "trace.h" #include "block_int.h" #include "block/raw-posix-aio.h" @@ -583,6 +584,7 @@ BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd, acb->next = posix_aio_state->first_aio; posix_aio_state->first_aio = acb; +trace_paio_submit(acb, opaque, sector_num, nb_sectors, type); qemu_paio_submit(acb); return &acb->common; } diff --git a/trace-events b/trace-events index 3fde0c6..48415f8 100644 --- a/trace-events +++ b/trace-events @@ -34,3 +34,17 @@ qemu_free(void *ptr) "ptr %p" qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p" qemu_valloc(size_t size, void *ptr) "size %zu ptr %p" qemu_vfree(void *ptr) "ptr %p" + +# block.c +multiwrite_cb(void *mcb, int ret) "mcb %p ret %d" +bdrv_aio_multiwrite(void *mcb, int num_callbacks, int num_reqs) "mcb %p num_callbacks %d num_reqs %d" +bdrv_aio_multiwrite_earlyfail(void *mcb) "mcb %p" +bdrv_aio_multiwrite_latefail(void *mcb, int i) "mcb %p i %d" + +# hw/virtio-blk.c +virtio_blk_req_complete(void *req, int status) "req %p status %d" +virtio_blk_rw_complete(void *req, int ret) "req %p ret %d" +virtio_blk_handle_write(void *req, unsigned long sector, unsigned long nsectors) "req %p sector %lu nsectors %lu" + +# posix-aio-compat.c +paio_submit(void *acb, void *opaque, unsigned long sector_num, unsigned long nb_sectors, unsigned long type) "acb %p opaque %p sector_num %lu nb_sectors %lu type %lu" -- 1.7.1
[Qemu-devel] [PATCH 7/7] trace: Trace virtqueue operations
This patch adds trace events for virtqueue operations including adding/removing buffers, notifying the guest, and receiving a notify from the guest. Signed-off-by: Stefan Hajnoczi --- v2: * This patch is new in v2 hw/virtio.c |8 trace-events |8 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/hw/virtio.c b/hw/virtio.c index 4475bb3..a5741ae 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -13,6 +13,7 @@ #include +#include "trace.h" #include "virtio.h" #include "sysemu.h" @@ -205,6 +206,8 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, unsigned int offset; int i; +trace_virtqueue_fill(vq, elem, len, idx); + offset = 0; for (i = 0; i < elem->in_num; i++) { size_t size = MIN(len - offset, elem->in_sg[i].iov_len); @@ -232,6 +235,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) { /* Make sure buffer is written before we update index. */ wmb(); +trace_virtqueue_flush(vq, count); vring_used_idx_increment(vq, count); vq->inuse -= count; } @@ -422,6 +426,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) vq->inuse++; +trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num); return elem->in_num + elem->out_num; } @@ -560,6 +565,7 @@ int virtio_queue_get_num(VirtIODevice *vdev, int n) void virtio_queue_notify(VirtIODevice *vdev, int n) { if (n < VIRTIO_PCI_QUEUE_MAX && vdev->vq[n].vring.desc) { +trace_virtio_queue_notify(vdev, n, &vdev->vq[n]); vdev->vq[n].handle_output(vdev, &vdev->vq[n]); } } @@ -597,6 +603,7 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size, void virtio_irq(VirtQueue *vq) { +trace_virtio_irq(vq); vq->vdev->isr |= 0x01; virtio_notify_vector(vq->vdev, vq->vector); } @@ -609,6 +616,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq) (vq->inuse || vring_avail_idx(vq) != vq->last_avail_idx))) return; +trace_virtio_notify(vdev, vq); vdev->isr |= 0x01; virtio_notify_vector(vdev, vq->vector); } diff --git a/trace-events b/trace-events index 48415f8..a533414 100644 --- a/trace-events +++ b/trace-events @@ -35,6 +35,14 @@ qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu qemu_valloc(size_t size, void *ptr) "size %zu ptr %p" qemu_vfree(void *ptr) "ptr %p" +# hw/virtio.c +virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u" +virtqueue_flush(void *vq, unsigned int count) "vq %p count %u" +virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "vq %p elem %p in_num %u out_num %u" +virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p" +virtio_irq(void *vq) "vq %p" +virtio_notify(void *vdev, void *vq) "vdev %p vq %p" + # block.c multiwrite_cb(void *mcb, int ret) "mcb %p ret %d" bdrv_aio_multiwrite(void *mcb, int num_callbacks, int num_reqs) "mcb %p num_callbacks %d num_reqs %d" -- 1.7.1
[Qemu-devel] [PATCH v2 0/7] Tracing backends
After the RFC discussion, updated patches which I propose for review and merge: The following patches against qemu.git allow static trace events to be declared in QEMU. Trace events use a lightweight syntax and are independent of the backend tracing system (e.g. LTTng UST). Supported backends are: * my trivial tracer ("simple") * LTTng Userspace Tracer ("ust") * no tracer ("nop", the default) The ./configure option to choose a backend is --trace-backend=. Main point of this patchset: adding new trace events is easy and we can switch between backends without modifying the code. These patches are also available at: http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/tracing v2: [PATCH 1/7] trace: Add trace-events file for declaring trace events * Use "$source_path/tracetool" in ./configure * Include qemu-common.h in trace.h so common types are available [PATCH 2/7] trace: Support disabled events in trace-events * New in v2: makes it easy to build only a subset of trace events [PATCH 3/7] trace: Add simple built-in tracing backend * Make simpletrace.py parse trace-events instead of generating Python [PATCH 4/7] trace: Add LTTng Userspace Tracer backend [PATCH 5/7] trace: Trace qemu_malloc() and qemu_vmalloc() * Record pointer result from allocation functions [PATCH 6/7] trace: Trace virtio-blk, multiwrite, and paio_submit [PATCH 7/7] trace: Trace virtqueue operations * New in v2: observe virtqueue buffer add/remove and notifies
[Qemu-devel] [PATCH 1/7] trace: Add trace-events file for declaring trace events
This patch introduces the trace-events file where trace events can be declared like so: qemu_malloc(size_t size) "size %zu" qemu_free(void *ptr) "ptr %p" These trace event declarations are processed by a new tool called tracetool to generate code for the trace events. Trace event declarations are independent of the backend tracing system (LTTng User Space Tracing, ftrace markers, DTrace). The default "nop" backend generates empty trace event functions. Therefore trace events are disabled by default. The trace-events file serves two purposes: 1. Adding trace events is easy. It is not necessary to understand the details of a backend tracing system. The trace-events file is a single location where trace events can be declared without code duplication. 2. QEMU is not tightly coupled to one particular backend tracing system. In order to support tracing across QEMU host platforms and to anticipate new backend tracing systems that are currently maturing, it is important to be flexible and not tied to one system. Signed-off-by: Stefan Hajnoczi --- v2: * Use "$source_path/tracetool" in ./configure * Include qemu-common.h in trace.h so common types are available .gitignore |2 + Makefile| 17 - Makefile.objs |5 ++ Makefile.target |1 + configure | 19 ++ trace-events| 24 tracetool | 165 +++ 7 files changed, 229 insertions(+), 4 deletions(-) create mode 100644 trace-events create mode 100755 tracetool diff --git a/.gitignore b/.gitignore index fdfe2f0..4644557 100644 --- a/.gitignore +++ b/.gitignore @@ -2,6 +2,8 @@ config-devices.* config-all-devices.* config-host.* config-target.* +trace.h +trace.c *-softmmu *-darwin-user *-linux-user diff --git a/Makefile b/Makefile index 7986bf6..a9f79a9 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ # Makefile for QEMU. -GENERATED_HEADERS = config-host.h +GENERATED_HEADERS = config-host.h trace.h ifneq ($(wildcard config-host.mak),) # Put the all: rule here so that config-host.mak can contain dependencies. @@ -130,16 +130,24 @@ bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS) iov.o: iov.c iov.h +trace.h: trace-events + $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -h < $< > $@," GEN $@") + +trace.c: trace-events + $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -c < $< > $@," GEN $@") + +trace.o: trace.c + ## qemu-img.o: qemu-img-cmds.h qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o: $(GENERATED_HEADERS) -qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(block-obj-y) $(qobject-obj-y) +qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) -qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(block-obj-y) $(qobject-obj-y) +qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) -qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(block-obj-y) $(qobject-obj-y) +qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(call quiet-command,sh $(SRC_PATH)/hxtool -h < $< > $@," GEN $@") @@ -157,6 +165,7 @@ clean: rm -f *.o *.d *.a $(TOOLS) TAGS cscope.* *.pod *~ */*~ rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d rm -f qemu-img-cmds.h + rm -f trace.c trace.h $(MAKE) -C tests clean for d in $(ALL_SUBDIRS) libhw32 libhw64 libuser libdis libdis-user; do \ if test -d $$d; then $(MAKE) -C $$d $@ || exit 1; fi; \ diff --git a/Makefile.objs b/Makefile.objs index 1a942e5..20e709e 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -251,6 +251,11 @@ libdis-$(CONFIG_S390_DIS) += s390-dis.o libdis-$(CONFIG_SH4_DIS) += sh4-dis.o libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o +## +# trace + +trace-obj-y = trace.o + vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS) vl.o: QEMU_CFLAGS+=$(SDL_CFLAGS) diff --git a/Makefile.target b/Makefile.target index fda5bf3..8f7b564 100644 --- a/Makefile.target +++ b/Makefile.target @@ -293,6 +293,7 @@ $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y): $(GENERATED_HEADERS) obj-y += $(addprefix ../, $(common-obj-y)) obj-y += $(addprefix ../libdis/, $(libdis-y)) +obj-y += $(addprefix ../, $(trace-obj-y)) obj-y += $(libobj-y) obj-y += $(addprefix $(HWDIR)/, $(hw-obj-y)) diff --git a/configure b/configure index 3cd2c5f..e94e113 100755 --- a/configure +++ b/configure @@ -299,6 +299,7 @@ pkgversion="" check_utests="no" user_pie="no" zero_malloc="" +trace_backend="nop" # OS specific if check_define __linux__ ; then @@ -494,6 +495,8 @@ for opt do ;; --target-list=*) target_list="$optarg" ;; + -
[Qemu-devel] [PATCH 5/7] trace: Trace qemu_malloc() and qemu_vmalloc()
It is often useful to instrument memory management functions in order to find leaks or performance problems. This patch adds trace events for the memory allocation primitives. Signed-off-by: Stefan Hajnoczi --- v2: * Record pointer result from allocation functions osdep.c | 24 ++-- qemu-malloc.c | 12 ++-- trace-events | 10 ++ 3 files changed, 38 insertions(+), 8 deletions(-) diff --git a/osdep.c b/osdep.c index abbc8a2..a6b7726 100644 --- a/osdep.c +++ b/osdep.c @@ -50,6 +50,7 @@ #endif #include "qemu-common.h" +#include "trace.h" #include "sysemu.h" #include "qemu_socket.h" @@ -71,25 +72,34 @@ static void *oom_check(void *ptr) #if defined(_WIN32) void *qemu_memalign(size_t alignment, size_t size) { +void *ptr; + if (!size) { abort(); } -return oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE)); +ptr = oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE)); +trace_qemu_memalign(alignment, size, ptr); +return ptr; } void *qemu_vmalloc(size_t size) { +void *ptr; + /* FIXME: this is not exactly optimal solution since VirtualAlloc has 64Kb granularity, but at least it guarantees us that the memory is page aligned. */ if (!size) { abort(); } -return oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE)); +ptr = oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE)); +trace_qemu_vmalloc(size, ptr); +return ptr; } void qemu_vfree(void *ptr) { +trace_qemu_vfree(ptr); VirtualFree(ptr, 0, MEM_RELEASE); } @@ -97,21 +107,22 @@ void qemu_vfree(void *ptr) void *qemu_memalign(size_t alignment, size_t size) { +void *ptr; #if defined(_POSIX_C_SOURCE) && !defined(__sun__) int ret; -void *ptr; ret = posix_memalign(&ptr, alignment, size); if (ret != 0) { fprintf(stderr, "Failed to allocate %zu B: %s\n", size, strerror(ret)); abort(); } -return ptr; #elif defined(CONFIG_BSD) -return oom_check(valloc(size)); +ptr = oom_check(valloc(size)); #else -return oom_check(memalign(alignment, size)); +ptr = oom_check(memalign(alignment, size)); #endif +trace_qemu_memalign(alignment, size, ptr); +return ptr; } /* alloc shared memory pages */ @@ -122,6 +133,7 @@ void *qemu_vmalloc(size_t size) void qemu_vfree(void *ptr) { +trace_qemu_vfree(ptr); free(ptr); } diff --git a/qemu-malloc.c b/qemu-malloc.c index 6cdc5de..72de60a 100644 --- a/qemu-malloc.c +++ b/qemu-malloc.c @@ -22,6 +22,7 @@ * THE SOFTWARE. */ #include "qemu-common.h" +#include "trace.h" #include static void *oom_check(void *ptr) @@ -39,6 +40,7 @@ void *get_mmap_addr(unsigned long size) void qemu_free(void *ptr) { +trace_qemu_free(ptr); free(ptr); } @@ -53,18 +55,24 @@ static int allow_zero_malloc(void) void *qemu_malloc(size_t size) { +void *ptr; if (!size && !allow_zero_malloc()) { abort(); } -return oom_check(malloc(size ? size : 1)); +ptr = oom_check(malloc(size ? size : 1)); +trace_qemu_malloc(size, ptr); +return ptr; } void *qemu_realloc(void *ptr, size_t size) { +void *newptr; if (!size && !allow_zero_malloc()) { abort(); } -return oom_check(realloc(ptr, size ? size : 1)); +newptr = oom_check(realloc(ptr, size ? size : 1)); +trace_qemu_realloc(ptr, size, newptr); +return newptr; } void *qemu_mallocz(size_t size) diff --git a/trace-events b/trace-events index 5efaa86..3fde0c6 100644 --- a/trace-events +++ b/trace-events @@ -24,3 +24,13 @@ # system may not have the necessary headers included. # # The should be a sprintf()-compatible format string. + +# qemu-malloc.c +qemu_malloc(size_t size, void *ptr) "size %zu ptr %p" +qemu_realloc(void *ptr, size_t size, void *newptr) "ptr %p size %zu newptr %p" +qemu_free(void *ptr) "ptr %p" + +# osdep.c +qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p" +qemu_valloc(size_t size, void *ptr) "size %zu ptr %p" +qemu_vfree(void *ptr) "ptr %p" -- 1.7.1
[Qemu-devel] [PATCH 3/7] trace: Add simple built-in tracing backend
This patch adds a simple tracer which produces binary trace files and is built into QEMU. The main purpose of this patch is to show how new tracing backends can be added to tracetool. To try out the simple backend: ./configure --trace-backend=simple make After running QEMU you can pretty-print the trace: ./simpletrace.py trace-events /tmp/trace.log Signed-off-by: Stefan Hajnoczi --- I intend for this tracing backend to be replaced by something based on Prerna's work. For now it is useful for basic tracing. v2: * Make simpletrace.py parse trace-events instead of generating Python .gitignore |1 + Makefile.objs |3 ++ configure |2 +- simpletrace.c | 64 ++ simpletrace.py | 53 ++ tracetool | 78 +-- 6 files changed, 197 insertions(+), 4 deletions(-) create mode 100644 simpletrace.c create mode 100755 simpletrace.py diff --git a/.gitignore b/.gitignore index 4644557..5128452 100644 --- a/.gitignore +++ b/.gitignore @@ -39,6 +39,7 @@ qemu-monitor.texi *.log *.pdf *.pg +*.pyc *.toc *.tp *.vr diff --git a/Makefile.objs b/Makefile.objs index 20e709e..7cb40ac 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -255,6 +255,9 @@ libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o # trace trace-obj-y = trace.o +ifeq ($(TRACE_BACKEND),simple) +trace-obj-y += simpletrace.o +endif vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS) diff --git a/configure b/configure index e94e113..7d2c69b 100755 --- a/configure +++ b/configure @@ -829,7 +829,7 @@ echo " --enable-docsenable documentation build" echo " --disable-docs disable documentation build" echo " --disable-vhost-net disable vhost-net acceleration support" echo " --enable-vhost-net enable vhost-net acceleration support" -echo " --trace-backend=BTrace backend nop" +echo " --trace-backend=BTrace backend nop simple" echo "" echo "NOTE: The object files are built at the place where configure is launched" exit 1 diff --git a/simpletrace.c b/simpletrace.c new file mode 100644 index 000..2fec4d3 --- /dev/null +++ b/simpletrace.c @@ -0,0 +1,64 @@ +#include +#include +#include "trace.h" + +typedef struct { +unsigned long event; +unsigned long x1; +unsigned long x2; +unsigned long x3; +unsigned long x4; +unsigned long x5; +} TraceRecord; + +enum { +TRACE_BUF_LEN = 64 * 1024 / sizeof(TraceRecord), +}; + +static TraceRecord trace_buf[TRACE_BUF_LEN]; +static unsigned int trace_idx; +static FILE *trace_fp; + +static void trace(TraceEvent event, unsigned long x1, + unsigned long x2, unsigned long x3, + unsigned long x4, unsigned long x5) { +TraceRecord *rec = &trace_buf[trace_idx]; +rec->event = event; +rec->x1 = x1; +rec->x2 = x2; +rec->x3 = x3; +rec->x4 = x4; +rec->x5 = x5; + +if (++trace_idx == TRACE_BUF_LEN) { +trace_idx = 0; + +if (!trace_fp) { +trace_fp = fopen("/tmp/trace.log", "w"); +} +if (trace_fp) { +size_t result = fwrite(trace_buf, sizeof trace_buf, 1, trace_fp); +result = result; +} +} +} + +void trace1(TraceEvent event, unsigned long x1) { +trace(event, x1, 0, 0, 0, 0); +} + +void trace2(TraceEvent event, unsigned long x1, unsigned long x2) { +trace(event, x1, x2, 0, 0, 0); +} + +void trace3(TraceEvent event, unsigned long x1, unsigned long x2, unsigned long x3) { +trace(event, x1, x2, x3, 0, 0); +} + +void trace4(TraceEvent event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4) { +trace(event, x1, x2, x3, x4, 0); +} + +void trace5(TraceEvent event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4, unsigned long x5) { +trace(event, x1, x2, x3, x4, x5); +} diff --git a/simpletrace.py b/simpletrace.py new file mode 100755 index 000..d6631ba --- /dev/null +++ b/simpletrace.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python +import sys +import struct +import re + +trace_fmt = 'LL' +trace_len = struct.calcsize(trace_fmt) +event_re = re.compile(r'(disable\s+)?([a-zA-Z0-9_]+)\(([^)]*)\)\s+"([^"]*)"') + +def parse_events(fobj): +def get_argnames(args): +return tuple(arg.split()[-1].lstrip('*') for arg in args.split(',')) + +events = {} +event_num = 0 +for line in fobj: +m = event_re.match(line.strip()) +if m is None: +continue + +disable, name, args, fmt = m.groups() +if disable: +continue + +events[event_num] = (name,) + get_argnames(args) +event_num += 1 +return events + +def read_record(fobj): +s = fobj.read(trace_len) +if len(s) != trace_len: +return None +return struct.unpack(trace_fmt, s) + +def format_record(events, rec): +event = events[rec[0]] +fields = [event[0]] +
[Qemu-devel] [PATCH 4/7] trace: Add LTTng Userspace Tracer backend
This patch adds LTTng Userspace Tracer (UST) backend support. The UST system requires no kernel support but libust and liburcu must be installed. $ ./configure --trace-backend ust $ make Start the UST daemon: $ ustd & List available tracepoints and enable some: $ ustctl --list-markers $(pgrep qemu) [...] {PID: 5458, channel/marker: ust/paio_submit, state: 0, fmt: "acb %p opaque %p sector_num %lu nb_sectors %lu type %lu" 0x4b32ba} $ ustctl --enable-marker "ust/paio_submit" $(pgrep qemu) Run the trace: $ ustctl --create-trace $(pgrep qemu) $ ustctl --start-trace $(pgrep qemu) [...] $ ustctl --stop-trace $(pgrep qemu) $ ustctl --destroy-trace $(pgrep qemu) Trace results can be viewed using lttv-gui. More information about UST: http://lttng.org/ust Signed-off-by: Stefan Hajnoczi --- configure |5 +++- tracetool | 77 +++- 2 files changed, 79 insertions(+), 3 deletions(-) diff --git a/configure b/configure index 7d2c69b..675d0fc 100755 --- a/configure +++ b/configure @@ -829,7 +829,7 @@ echo " --enable-docsenable documentation build" echo " --disable-docs disable documentation build" echo " --disable-vhost-net disable vhost-net acceleration support" echo " --enable-vhost-net enable vhost-net acceleration support" -echo " --trace-backend=BTrace backend nop simple" +echo " --trace-backend=BTrace backend nop simple ust" echo "" echo "NOTE: The object files are built at the place where configure is launched" exit 1 @@ -2302,6 +2302,9 @@ bsd) esac echo "TRACE_BACKEND=$trace_backend" >> $config_host_mak +if test "$trace_backend" = "ust"; then + LIBS="-lust $LIBS" +fi tools= if test `expr "$target_list" : ".*softmmu.*"` != 0 ; then diff --git a/tracetool b/tracetool index f094ddc..9ea9c08 100755 --- a/tracetool +++ b/tracetool @@ -3,12 +3,13 @@ usage() { cat >&2 <" +} + +linetoh_ust() +{ +local name args argnames +name=$(get_name "$1") +args=$(get_args "$1") +argnames=$(get_argnames "$1") + +cat < +#include "trace.h" +EOF +} + +linetoc_ust() +{ +local name args argnames fmt +name=$(get_name "$1") +args=$(get_args "$1") +argnames=$(get_argnames "$1") +fmt=$(get_fmt "$1") + +cat <
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
Am 23.05.2010 14:01, schrieb Avi Kivity: > On 05/21/2010 12:29 AM, Anthony Liguori wrote: >> >> I'd be more interested in enabling people to build these types of >> storage systems without touching qemu. >> >> Both sheepdog and ceph ultimately transmit I/O over a socket to a >> central daemon, right? > > That incurs an extra copy. > >> So could we not standardize a protocol for this that both sheepdog and >> ceph could implement? > > The protocol already exists, nbd. It doesn't support snapshotting etc. > but we could extend it. > > But IMO what's needed is a plugin API for the block layer. What would it buy us, apart from more downstreams and having to maintain a stable API and ABI? Hiding block drivers somewhere else doesn't make them stop existing, they just might not be properly integrated, but rather hacked in to fit that limited stable API. Kevin
Re: [Qemu-devel] Re: irq problems after live migration with 0.12.4
Michael Tokarev wrote: 23.05.2010 13:55, Peter Lieven wrote: Hi, after live migrating ubuntu 9.10 server (2.6.31-14-server) and suse linux 10.1 (2.6.16.13-4-smp) it happens sometimes that the guest runs into irq problems. i mention these 2 guest oss since i have seen the error there. there are likely others around with the same problem. on the host i run 2.6.33.3 (kernel+mod) and qemu-kvm 0.12.4. i started a vm with: /usr/bin/qemu-kvm-0.12.4 -net tap,vlan=141,script=no,downscript=no,ifname=tap0 -net nic,vlan=141,model=e1000,macaddr=52:54:00:ff:00:72 -drive file=/dev/sdb,if=ide,boot=on,cache=none,aio=native -m 1024 -cpu qemu64,model_id='Intel(R) Xeon(R) CPU E5430 @ 2.66GHz' -monitor tcp:0:4001,server,nowait -vnc :1 -name 'migration-test-9-10' -boot order=dc,menu=on -k de -incoming tcp:172.21.55.22:5001 -pidfile /var/run/qemu/vm-155.pid -mem-path /hugepages -mem-prealloc -rtc base=utc,clock=host -usb -usbdevice tablet for testing i have a clean ubuntu 9.10 server 64-bit install and created a small script with fetches a dvd iso from a local server and checking md5sum in an endless loop. the download performance is approx. 50MB/s on that vm. to trigger the error i did several migrations of the vm throughout the last days. finally I ended up in the following oops in the guest: [64442.298521] irq 10: nobody cared (try booting with the "irqpoll" option) [64442.299175] Pid: 0, comm: swapper Not tainted 2.6.31-14-server #48-Ubuntu [64442.299179] Call Trace: [64442.299185] [] __report_bad_irq+0x26/0xa0 [64442.299227] [] note_interrupt+0x18c/0x1d0 [64442.299232] [] handle_fasteoi_irq+0xd5/0x100 [64442.299244] [] handle_irq+0x1d/0x30 [64442.299246] [] do_IRQ+0x67/0xe0 [64442.299249] [] ret_from_intr+0x0/0x11 [64442.299266] [] ? handle_IRQ_event+0x24/0x160 [64442.299269] [] ? handle_edge_irq+0xcf/0x170 [64442.299271] [] ? handle_irq+0x1d/0x30 [64442.299273] [] ? do_IRQ+0x67/0xe0 [64442.299275] [] ? ret_from_intr+0x0/0x11 [64442.299290] [] ? _spin_unlock_irqrestore+0x14/0x20 [64442.299302] [] ? scsi_dispatch_cmd+0x16c/0x2d0 [64442.299307] [] ? scsi_request_fn+0x3aa/0x500 [64442.299322] [] ? __blk_run_queue+0x6c/0x150 [64442.299324] [] ? blk_run_queue+0x2b/0x50 [64442.299327] [] ? scsi_run_queue+0xcf/0x2a0 [64442.299336] [] ? scsi_next_command+0x3d/0x60 [64442.299338] [] ? scsi_end_request+0xab/0xb0 [64442.299340] [] ? scsi_io_completion+0x9e/0x4d0 [64442.299348] [] ? default_spin_lock_flags+0x9/0x10 [64442.299351] [] ? scsi_finish_command+0xbd/0x130 [64442.299353] [] ? scsi_softirq_done+0x145/0x170 [64442.299356] [] ? blk_done_softirq+0x7d/0x90 [64442.299368] [] ? __do_softirq+0xbd/0x200 [64442.299370] [] ? call_softirq+0x1c/0x30 [64442.299372] [] ? do_softirq+0x55/0x90 [64442.299374] [] ? irq_exit+0x85/0x90 [64442.299376] [] ? do_IRQ+0x70/0xe0 [64442.299379] [] ? ret_from_intr+0x0/0x11 [64442.299380] [] ? native_safe_halt+0x6/0x10 [64442.299390] [] ? default_idle+0x4c/0xe0 [64442.299395] [] ? atomic_notifier_call_chain+0x15/0x20 [64442.299398] [] ? cpu_idle+0xb2/0x100 [64442.299406] [] ? rest_init+0x66/0x70 [64442.299424] [] ? start_kernel+0x352/0x35b [64442.299427] [] ? x86_64_start_reservations+0x125/0x129 [64442.299429] [] ? x86_64_start_kernel+0xfa/0x109 [64442.299433] handlers: [64442.299840] [] (e1000_intr+0x0/0x190 [e1000]) [64442.300046] Disabling IRQ #10 See also LP bug #584131 (https://bugs.launchpad.net/bugs/584131) and original Debian bug#580649 (http://bugs.debian.org/580649) Not sure if they're related... /mjt michael, do you have any ideas what i got do to debug whats happening? looking at launchpad and debian bug tracker i found other bugs also with a maybe related problem. so this issue might be greater... thanks peter
Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET
> > This is wrong. The hpet device should expose this as an IO pin. > > Will look into this. > > BTW, I just realized that the GPIO handling is apparently lacking > support for attaching an output to multiple inputs. Or am I missing > something? Use an explicit mux. Incidentally I suspect your handling of the ISA IRQs is broken. You may never have more than one source connected to a sink. Shared IRQ lines must be done explicitly. Paul
Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET
Paul Brook wrote: >>> This is wrong. The hpet device should expose this as an IO pin. >> Will look into this. >> >> BTW, I just realized that the GPIO handling is apparently lacking >> support for attaching an output to multiple inputs. Or am I missing >> something? > > Use an explicit mux. > > Incidentally I suspect your handling of the ISA IRQs is broken. You may never > have more than one source connected to a sink. Shared IRQ lines must be done > explicitly. No, the other way around: one source (RTC) multiple sinks (HPET, ACPI). Will probably draft a generic irq/gpio mux. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET
> Paul Brook wrote: > >>> This is wrong. The hpet device should expose this as an IO pin. > >> > >> Will look into this. > >> > >> BTW, I just realized that the GPIO handling is apparently lacking > >> support for attaching an output to multiple inputs. Or am I missing > >> something? > > > > Use an explicit mux. > > > > Incidentally I suspect your handling of the ISA IRQs is broken. You may > > never have more than one source connected to a sink. Shared IRQ lines > > must be done explicitly. > > No, the other way around: one source (RTC) multiple sinks (HPET, ACPI). > Will probably draft a generic irq/gpio mux. I realise that. However I'd expect things to break if the guest OS devices to share an IRQ line between the HPET and some other device. Paul
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 02:02 PM, Kevin Wolf wrote: So could we not standardize a protocol for this that both sheepdog and ceph could implement? The protocol already exists, nbd. It doesn't support snapshotting etc. but we could extend it. But IMO what's needed is a plugin API for the block layer. What would it buy us, apart from more downstreams and having to maintain a stable API and ABI? Currently if someone wants to add a new block format, they have to upstream it and wait for a new qemu to be released. With a plugin API, they can add a new block format to an existing, supported qemu. Hiding block drivers somewhere else doesn't make them stop existing, they just might not be properly integrated, but rather hacked in to fit that limited stable API. They would hack it to fit the current API, and hack the API in qemu.git to fit their requirements for the next release. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [RFC PATCH] AMD IOMMU emulation
On Tue, May 25, 2010 at 10:39:22AM +0200, Joerg Roedel wrote: > On Mon, May 24, 2010 at 08:10:16PM +, Blue Swirl wrote: > > On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel wrote: > > >> + > > >> +#define MMIO_SIZE ? ? ? ? ? ? ? 0x2028 > > > > > > This size should be a power-of-two value. In this case probably 0x4000. > > > > Not really, the devices can reserve regions of any size. There were > > some implementation deficiencies in earlier versions of QEMU, where > > the whole page would be reserved anyway, but this limitation has been > > removed long time ago. > > The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux > driver maps the MMIO region with this size. So the emulation should > reserve this amount of MMIO space too. > > Joerg Yeah, I'll change that, since I already reserve 0x4000 bytes in SeaBIOS for it (I did that to deal with the 16 KiB alignment requirement). Eduard
Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET
Paul Brook wrote: >> Paul Brook wrote: > This is wrong. The hpet device should expose this as an IO pin. Will look into this. BTW, I just realized that the GPIO handling is apparently lacking support for attaching an output to multiple inputs. Or am I missing something? >>> Use an explicit mux. >>> >>> Incidentally I suspect your handling of the ISA IRQs is broken. You may >>> never have more than one source connected to a sink. Shared IRQ lines >>> must be done explicitly. >> No, the other way around: one source (RTC) multiple sinks (HPET, ACPI). >> Will probably draft a generic irq/gpio mux. > > I realise that. However I'd expect things to break if the guest OS devices to > share an IRQ line between the HPET and some other device. The guest would share IRQ8, not the RTC output. So there would be no difference to the current situation. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH] sparc64: clean up pci bridge map
From: Igor V. Kovalenko - remove unused host state and store pci bus pointer only - do not map host state access into unused 1fe.1000 range - reorder pci region registration - assign pci i/o region to isa_mem_base - rename default machine (it's Ultrasparc IIi now) Signed-off-by: Igor V. Kovalenko --- hw/apb_pci.c | 49 ++--- hw/sun4u.c |6 +++--- 2 files changed, 29 insertions(+), 26 deletions(-) diff --git a/hw/apb_pci.c b/hw/apb_pci.c index 65d8ba6..b53e3c3 100644 --- a/hw/apb_pci.c +++ b/hw/apb_pci.c @@ -65,7 +65,7 @@ do { printf("APB: " fmt , ## __VA_ARGS__); } while (0) typedef struct APBState { SysBusDevice busdev; -PCIHostState host_state; +PCIBus *bus; ReadWriteHandler pci_config_handler; uint32_t iommu[4]; uint32_t pci_control[16]; @@ -191,7 +191,7 @@ static void apb_pci_config_write(ReadWriteHandler *h, pcibus_t addr, val = qemu_bswap_len(val, size); APB_DPRINTF("%s: addr " TARGET_FMT_lx " val %x\n", __func__, addr, val); -pci_data_write(s->host_state.bus, addr, val, size); +pci_data_write(s->bus, addr, val, size); } static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr, @@ -200,7 +200,7 @@ static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr, uint32_t ret; APBState *s = container_of(h, APBState, pci_config_handler); -ret = pci_data_read(s->host_state.bus, addr, size); +ret = pci_data_read(s->bus, addr, size); ret = qemu_bswap_len(ret, size); APB_DPRINTF("%s: addr " TARGET_FMT_lx " -> %x\n", __func__, addr, ret); return ret; @@ -331,37 +331,37 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base, s = sysbus_from_qdev(dev); /* apb_config */ sysbus_mmio_map(s, 0, special_base); +/* PCI configuration space */ +sysbus_mmio_map(s, 1, special_base + 0x100ULL); /* pci_ioport */ -sysbus_mmio_map(s, 1, special_base + 0x200ULL); -/* pci_config */ -sysbus_mmio_map(s, 2, special_base + 0x100ULL); -/* mem_data */ -sysbus_mmio_map(s, 3, mem_base); +sysbus_mmio_map(s, 2, special_base + 0x200ULL); d = FROM_SYSBUS(APBState, s); -d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci", + +d->bus = pci_register_bus(&d->busdev.qdev, "pci", pci_apb_set_irq, pci_pbm_map_irq, d, 0, 32); -pci_bus_set_mem_base(d->host_state.bus, mem_base); +pci_bus_set_mem_base(d->bus, mem_base); for (i = 0; i < 32; i++) { sysbus_connect_irq(s, i, pic[i]); } -pci_create_simple(d->host_state.bus, 0, "pbm"); +pci_create_simple(d->bus, 0, "pbm"); + /* APB secondary busses */ -*bus2 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 0), +*bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0), PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA, pci_apb_map_irq, "Advanced PCI Bus secondary bridge 1"); apb_pci_bridge_init(*bus2); -*bus3 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 1), +*bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1), PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA, pci_apb_map_irq, "Advanced PCI Bus secondary bridge 2"); apb_pci_bridge_init(*bus3); -return d->host_state.bus; +return d->bus; } static void pci_pbm_reset(DeviceState *d) @@ -382,7 +382,7 @@ static void pci_pbm_reset(DeviceState *d) static int pci_pbm_init_device(SysBusDevice *dev) { APBState *s; -int pci_mem_data, apb_config, pci_ioport, pci_config; +int pci_config, apb_config, pci_ioport; unsigned int i; s = FROM_SYSBUS(APBState, dev); @@ -396,20 +396,23 @@ static int pci_pbm_init_device(SysBusDevice *dev) /* apb_config */ apb_config = cpu_register_io_memory(apb_config_read, apb_config_write, s); +/* at region 0 */ sysbus_init_mmio(dev, 0x1ULL, apb_config); -/* pci_ioport */ -pci_ioport = cpu_register_io_memory(pci_apb_ioread, - pci_apb_iowrite, s); -sysbus_init_mmio(dev, 0x1ULL, pci_ioport); -/* pci_config */ + +/* PCI configuration space */ s->pci_config_handler.read = apb_pci_config_read; s->pci_config_handler.write = apb_pci_config_write; pci_config = cpu_register_io_memory_simple(&s->pci_config_handler); assert(pci_config >= 0); +/* at region 1 */ sysbus_init_mmio(dev, 0x100ULL, pci_config); -/* mem_data */ -pci_mem_data = pci_host_data_register_mmio(&s->host_state, 1); -sysbus_init_mmio(dev, 0x1000ULL, pci_mem_data); + +/* pci_ioport */ +pci_ioport = cpu_register_io_memory(pci_apb_ioread, +pci_apb_iowrite, s); +
Re: [Qemu-devel] [PATCH 3/3] Samples to add a tracepoint.
@@ -87,6 +91,8 @@ static void virtio_blk_rw_complete(void { VirtIOBlockReq *req = opaque; +trace_virtio_blk_rw_complete(req, ret); + if (ret) { int is_read = !(req->out->type & VIRTIO_BLK_T_OUT); if (virtio_blk_handle_rw_error(req, -ret, is_read)) What happens when CONFIG_QEMU_TRACE is not defined? Linker error for missing symbol trace_virtio_blk_rw_complete()? This is handled by the tracing backends patchset I posted. When tracing is disabled the nop backend will make tracepoints empty inline functions. The compiler makes them disappear but the tracepoint invocation is still parsed and type checked by the compiler. It shouldn't be hard to add your tracer as a backend. Stefan
Re: [Qemu-devel] [PATCH 2/3] Tracepoint, buffer & monitor framework
+#define DEFINE_TRACE(name, tproto, tassign, tprint)\ + void trace_##name(tproto) \ + { \ + unsigned int hash;\ + char tpname[] = __stringify(name);\ + struct tracepoint *tp;\ + struct __trace_struct_##name var, *entry; \ + \ + hash = tdb_hash(tpname); \ + tp = find_tracepoint_by_name(tpname); \ + if (tp == NULL || !tp->state) \ + return; \ + \ + entry = &var; \ + tassign \ + \ + write_trace_to_buffer(&qemu_buf, hash, tp->trace_id, \ + entry, sizeof(struct __trace_struct_##name)); \ + } \ I think this is too much work. Let each tracepoint have its own global struct tracepoint so it can directly reference it using tracepoint_##name - no hash lookup needed. Add the QLIST_ENTRY directly to struct tracepoint so the tracepoint register/unregister code can assign ids and look up tracepoints by name. No critical path code needs to do name lookups and the hash table can disappear. +#define DECLARE_TRACE(name, tproto, tstruct) \ + struct __trace_struct_##name { \ + tstruct \ + }; \ Should this struct be packed so more fields can fit? +trace_queue->trace_buffer[tmp].metadata.write_complete = 0; This is not guaranteed to work without memory barriers. There is no way for the trace consumer to block until there is more data available. The synchronization needs to consider writing traces to a file, which has different constraints than dumping the current contents of the trace buffer. We're missing a way to trace to a file. That could be done in binary or text. It would be easier in text because we already have the format strings and don't need a unique ID mapping in an external binary parsing tool. Making data available after crash is also useful. The easiest way is to dump the trace buffer from the core dump using gdb. However, we'd need some way of making sense of the bytes. That could be done by reading the tracepoint_lib structures from the core dump. (The way I do trace recovery from a core dump in my simple tracer is to binary dump the trace buffer from the core dump. Since the trace buffer contents are normally written out to file unchanged anyway, the simpletrace.py script can read the dumped trace buffer like a normal trace file.) Nitpicks: Some added lines of code use tabs for indentation, 4 space indentation should be used. +{ +.name = "tracepoint", +.args_type = "", +.params = "", +.help = "show contents of trace buffer", Copy-pasted, .help not updated. @@ -145,6 +147,10 @@ struct Monitor { #ifdef CONFIG_DEBUG_MONITOR int print_calls_nr; #endif +#ifdef CONFIG_QEMU_TRACE +struct DebugBuffer *qemu_buf_ptr; +#endif + QError *error; QLIST_HEAD(,mon_fd_t) fds; QLIST_ENTRY(Monitor) entry; Would TraceBuffer be a more appropriate name for DebugBuffer? qemu_buf_ptr is vague, perhaps trace_buf is more clear? I'm not sure I understand the reason for qemu_buf_ptr. There is already a global qemu_buf and qemu_buf_ptr is a pointer to that? +if(!strncmp(tp_state, "on", 3)) [...] + if(!strncmp(tp_state, "off", 4)) "on" with 3 and "off" with 4 are equivalent to strcmp(). "on" with 2 and "off" with 3 would allow for any suffix after the matched string. +#else /* CONFIG_QEMU_TRACE */ +static void do_tracepoint_status(Monitor *mon, const QDict *qdict) +{ +monitor_printf(mon, "Internal tracing not compiled\n"); +} +#endif "tracepoint" has this !CONFIG_QEMU_TRACE function but "trace" doesn't. +#define INCREMENT_INDEX(HEAD,IDX) (HEAD->IDX++) % HEAD->buf_size [...] +if ((trace_queue->last + 1) % trace_queue->buf_size +== trace_queue->first) + trace_queue->first = INCREMENT_INDEX(trace_queue, first); +trace_queue->last = INCREMENT_INDEX(trace_queue, last); Slightly safer macro: #define NEXT_INDEX(HEAD,IDX) (((HEAD)->IDX + 1) % (HEAD)->buf_size) [...] if (NEXT_INDEX(trace_queue, last) == trace_queue->first) trace_queue->first = NEXT_INDEX(trace_queue, first); trace_queue->last = NEXT_INDEX(trace_queue, last); +tmp = trace_queue->last; Instead of using tmp: D
Re: [Qemu-devel] [RFC 0/3] Tracing framework for QEMU
Interesting to see your patches, tracepoint definitions/declarations look similar to in-kernel tracepoints :). Please post future patches inline to the email so reviewing and replying is easy (e.g. use git-send-email to send patches). Stefan
Re: [Qemu-devel] [PATCH] sparc64: clean up pci bridge map
2010/5/25 Igor V. Kovalenko : > From: Igor V. Kovalenko > > - remove unused host state and store pci bus pointer only > - do not map host state access into unused 1fe.1000 range > - reorder pci region registration > - assign pci i/o region to isa_mem_base > - rename default machine (it's Ultrasparc IIi now) Just rename the machine or use another CPU too? While you are at it maybe split these two? > > Signed-off-by: Igor V. Kovalenko > --- > hw/apb_pci.c | 49 ++--- > hw/sun4u.c | 6 +++--- > 2 files changed, 29 insertions(+), 26 deletions(-) > > diff --git a/hw/apb_pci.c b/hw/apb_pci.c > index 65d8ba6..b53e3c3 100644 > --- a/hw/apb_pci.c > +++ b/hw/apb_pci.c > @@ -65,7 +65,7 @@ do { printf("APB: " fmt , ## __VA_ARGS__); } while (0) > > typedef struct APBState { > SysBusDevice busdev; > - PCIHostState host_state; > + PCIBus *bus; > ReadWriteHandler pci_config_handler; > uint32_t iommu[4]; > uint32_t pci_control[16]; > @@ -191,7 +191,7 @@ static void apb_pci_config_write(ReadWriteHandler *h, > pcibus_t addr, > > val = qemu_bswap_len(val, size); > APB_DPRINTF("%s: addr " TARGET_FMT_lx " val %x\n", __func__, addr, val); > - pci_data_write(s->host_state.bus, addr, val, size); > + pci_data_write(s->bus, addr, val, size); > } > > static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr, > @@ -200,7 +200,7 @@ static uint32_t apb_pci_config_read(ReadWriteHandler *h, > pcibus_t addr, > uint32_t ret; > APBState *s = container_of(h, APBState, pci_config_handler); > > - ret = pci_data_read(s->host_state.bus, addr, size); > + ret = pci_data_read(s->bus, addr, size); > ret = qemu_bswap_len(ret, size); > APB_DPRINTF("%s: addr " TARGET_FMT_lx " -> %x\n", __func__, addr, ret); > return ret; > @@ -331,37 +331,37 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base, > s = sysbus_from_qdev(dev); > /* apb_config */ > sysbus_mmio_map(s, 0, special_base); > + /* PCI configuration space */ > + sysbus_mmio_map(s, 1, special_base + 0x100ULL); > /* pci_ioport */ > - sysbus_mmio_map(s, 1, special_base + 0x200ULL); > - /* pci_config */ > - sysbus_mmio_map(s, 2, special_base + 0x100ULL); > - /* mem_data */ > - sysbus_mmio_map(s, 3, mem_base); > + sysbus_mmio_map(s, 2, special_base + 0x200ULL); > d = FROM_SYSBUS(APBState, s); > - d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci", > + > + d->bus = pci_register_bus(&d->busdev.qdev, "pci", > pci_apb_set_irq, pci_pbm_map_irq, d, > 0, 32); > - pci_bus_set_mem_base(d->host_state.bus, mem_base); > + pci_bus_set_mem_base(d->bus, mem_base); > > for (i = 0; i < 32; i++) { > sysbus_connect_irq(s, i, pic[i]); > } > > - pci_create_simple(d->host_state.bus, 0, "pbm"); > + pci_create_simple(d->bus, 0, "pbm"); > + > /* APB secondary busses */ > - *bus2 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 0), > + *bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0), > PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA, > pci_apb_map_irq, > "Advanced PCI Bus secondary bridge 1"); > apb_pci_bridge_init(*bus2); > > - *bus3 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 1), > + *bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1), > PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA, > pci_apb_map_irq, > "Advanced PCI Bus secondary bridge 2"); > apb_pci_bridge_init(*bus3); > > - return d->host_state.bus; > + return d->bus; > } > > static void pci_pbm_reset(DeviceState *d) > @@ -382,7 +382,7 @@ static void pci_pbm_reset(DeviceState *d) > static int pci_pbm_init_device(SysBusDevice *dev) > { > APBState *s; > - int pci_mem_data, apb_config, pci_ioport, pci_config; > + int pci_config, apb_config, pci_ioport; > unsigned int i; > > s = FROM_SYSBUS(APBState, dev); > @@ -396,20 +396,23 @@ static int pci_pbm_init_device(SysBusDevice *dev) > /* apb_config */ > apb_config = cpu_register_io_memory(apb_config_read, > apb_config_write, s); > + /* at region 0 */ > sysbus_init_mmio(dev, 0x1ULL, apb_config); > - /* pci_ioport */ > - pci_ioport = cpu_register_io_memory(pci_apb_ioread, > - pci_apb_iowrite, s); > - sysbus_init_mmio(dev, 0x1ULL, pci_ioport); > - /* pci_config */ > + > + /* PCI configuration space */ > s->pci_config_handler.read = apb_pci_config_read; > s->pci_config_handler.write = apb_pci_config_write; > pci_config = cpu_register_io_memory_simple(&s->pci_config_handler); > assert(pci_config >= 0); > + /* at region 1 */ > sysbus_init_mmio(dev, 0x100U
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On Tue, May 25, 2010 at 02:25:53PM +0300, Avi Kivity wrote: > Currently if someone wants to add a new block format, they have to > upstream it and wait for a new qemu to be released. With a plugin API, > they can add a new block format to an existing, supported qemu. So? Unless we want a stable driver ABI which I fundamentally oppose as it would make block driver development hell they'd have to wait for a new release of the block layer. It's really just going to be a lot of pain for no major gain. qemu releases are frequent enough, and if users care enough they can also easily patch qemu.
[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations
On 05/25/2010 01:24 PM, Stefan Hajnoczi wrote: This patch adds trace events for virtqueue operations including adding/removing buffers, notifying the guest, and receiving a notify from the guest. diff --git a/trace-events b/trace-events index 48415f8..a533414 100644 --- a/trace-events +++ b/trace-events @@ -35,6 +35,14 @@ qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu qemu_valloc(size_t size, void *ptr) "size %zu ptr %p" qemu_vfree(void *ptr) "ptr %p" +# hw/virtio.c +virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u" +virtqueue_flush(void *vq, unsigned int count) "vq %p count %u" +virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "vq %p elem %p in_num %u out_num %u" +virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p" +virtio_irq(void *vq) "vq %p" +virtio_notify(void *vdev, void *vq) "vdev %p vq %p" + Those %ps are more or less useless. We need better ways of identifying them. Linux uses %pTYPE to pretty print arbitrary types. We could do something similar (not the same since we don't want our own printf implementation). -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET
> > I realise that. However I'd expect things to break if the guest OS > > devices to share an IRQ line between the HPET and some other device. > > The guest would share IRQ8, not the RTC output. So there would be no > difference to the current situation. The difference is that you've removed the check that prevented overlap between the PIC and annother device. You should be using isa_reserve_irq/isa_init_irq before you use an ISA IRQ line. Any uses of isa_bus_irqs (including teh existing HPET code) are probably broken. Paul
[Qemu-devel] [PATCH 0/2] sparc64: cleanups
- rename sun4u cpu to Ultrasparc IIi - cleanup pci bridge map (requires openbios change) v0->v1: split out rename of sun4u cpu to separate patch --- Igor V. Kovalenko (2): sparc64: rename sun4u cpu to Ultrasparc IIi sparc64: clean up pci bridge map hw/apb_pci.c | 49 ++--- hw/sun4u.c |6 +++--- 2 files changed, 29 insertions(+), 26 deletions(-) --
[Qemu-devel] [PATCH 1/2] sparc64: rename sun4u cpu to Ultrasparc IIi
From: Igor V. Kovalenko Signed-off-by: Igor V. Kovalenko --- hw/sun4u.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/sun4u.c b/hw/sun4u.c index e9a1e23..1e92900 100644 --- a/hw/sun4u.c +++ b/hw/sun4u.c @@ -859,7 +859,7 @@ enum { static const struct hwdef hwdefs[] = { /* Sun4u generic PC-like machine */ { -.default_cpu_model = "TI UltraSparc II", +.default_cpu_model = "TI UltraSparc IIi", .machine_id = sun4u_id, .prom_addr = 0x1fff000ULL, .console_serial_base = 0,
[Qemu-devel] [PATCH 2/2] sparc64: clean up pci bridge map
From: Igor V. Kovalenko - remove unused host state and store pci bus pointer only - do not map host state access into unused 1fe.1000 range - reorder pci region registration - assign pci i/o region to isa_mem_base Signed-off-by: Igor V. Kovalenko --- hw/apb_pci.c | 49 ++--- hw/sun4u.c |4 ++-- 2 files changed, 28 insertions(+), 25 deletions(-) diff --git a/hw/apb_pci.c b/hw/apb_pci.c index 65d8ba6..b53e3c3 100644 --- a/hw/apb_pci.c +++ b/hw/apb_pci.c @@ -65,7 +65,7 @@ do { printf("APB: " fmt , ## __VA_ARGS__); } while (0) typedef struct APBState { SysBusDevice busdev; -PCIHostState host_state; +PCIBus *bus; ReadWriteHandler pci_config_handler; uint32_t iommu[4]; uint32_t pci_control[16]; @@ -191,7 +191,7 @@ static void apb_pci_config_write(ReadWriteHandler *h, pcibus_t addr, val = qemu_bswap_len(val, size); APB_DPRINTF("%s: addr " TARGET_FMT_lx " val %x\n", __func__, addr, val); -pci_data_write(s->host_state.bus, addr, val, size); +pci_data_write(s->bus, addr, val, size); } static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr, @@ -200,7 +200,7 @@ static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr, uint32_t ret; APBState *s = container_of(h, APBState, pci_config_handler); -ret = pci_data_read(s->host_state.bus, addr, size); +ret = pci_data_read(s->bus, addr, size); ret = qemu_bswap_len(ret, size); APB_DPRINTF("%s: addr " TARGET_FMT_lx " -> %x\n", __func__, addr, ret); return ret; @@ -331,37 +331,37 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base, s = sysbus_from_qdev(dev); /* apb_config */ sysbus_mmio_map(s, 0, special_base); +/* PCI configuration space */ +sysbus_mmio_map(s, 1, special_base + 0x100ULL); /* pci_ioport */ -sysbus_mmio_map(s, 1, special_base + 0x200ULL); -/* pci_config */ -sysbus_mmio_map(s, 2, special_base + 0x100ULL); -/* mem_data */ -sysbus_mmio_map(s, 3, mem_base); +sysbus_mmio_map(s, 2, special_base + 0x200ULL); d = FROM_SYSBUS(APBState, s); -d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci", + +d->bus = pci_register_bus(&d->busdev.qdev, "pci", pci_apb_set_irq, pci_pbm_map_irq, d, 0, 32); -pci_bus_set_mem_base(d->host_state.bus, mem_base); +pci_bus_set_mem_base(d->bus, mem_base); for (i = 0; i < 32; i++) { sysbus_connect_irq(s, i, pic[i]); } -pci_create_simple(d->host_state.bus, 0, "pbm"); +pci_create_simple(d->bus, 0, "pbm"); + /* APB secondary busses */ -*bus2 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 0), +*bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0), PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA, pci_apb_map_irq, "Advanced PCI Bus secondary bridge 1"); apb_pci_bridge_init(*bus2); -*bus3 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 1), +*bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1), PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA, pci_apb_map_irq, "Advanced PCI Bus secondary bridge 2"); apb_pci_bridge_init(*bus3); -return d->host_state.bus; +return d->bus; } static void pci_pbm_reset(DeviceState *d) @@ -382,7 +382,7 @@ static void pci_pbm_reset(DeviceState *d) static int pci_pbm_init_device(SysBusDevice *dev) { APBState *s; -int pci_mem_data, apb_config, pci_ioport, pci_config; +int pci_config, apb_config, pci_ioport; unsigned int i; s = FROM_SYSBUS(APBState, dev); @@ -396,20 +396,23 @@ static int pci_pbm_init_device(SysBusDevice *dev) /* apb_config */ apb_config = cpu_register_io_memory(apb_config_read, apb_config_write, s); +/* at region 0 */ sysbus_init_mmio(dev, 0x1ULL, apb_config); -/* pci_ioport */ -pci_ioport = cpu_register_io_memory(pci_apb_ioread, - pci_apb_iowrite, s); -sysbus_init_mmio(dev, 0x1ULL, pci_ioport); -/* pci_config */ + +/* PCI configuration space */ s->pci_config_handler.read = apb_pci_config_read; s->pci_config_handler.write = apb_pci_config_write; pci_config = cpu_register_io_memory_simple(&s->pci_config_handler); assert(pci_config >= 0); +/* at region 1 */ sysbus_init_mmio(dev, 0x100ULL, pci_config); -/* mem_data */ -pci_mem_data = pci_host_data_register_mmio(&s->host_state, 1); -sysbus_init_mmio(dev, 0x1000ULL, pci_mem_data); + +/* pci_ioport */ +pci_ioport = cpu_register_io_memory(pci_apb_ioread, +pci_apb_iowrite, s); +/* at region 2 */ +sysbus_init_mmio(dev, 0x10
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 03:03 PM, Christoph Hellwig wrote: On Tue, May 25, 2010 at 02:25:53PM +0300, Avi Kivity wrote: Currently if someone wants to add a new block format, they have to upstream it and wait for a new qemu to be released. With a plugin API, they can add a new block format to an existing, supported qemu. So? Unless we want a stable driver ABI which I fundamentally oppose as it would make block driver development hell We'd only freeze it for a major release. they'd have to wait for a new release of the block layer. It's really just going to be a lot of pain for no major gain. qemu releases are frequent enough, and if users care enough they can also easily patch qemu. May not be so easy for them, they lose binary updates from their distro and have to keep repatching. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] sparc64: clean up pci bridge map
On Tue, May 25, 2010 at 3:56 PM, Artyom Tarasenko wrote: > 2010/5/25 Igor V. Kovalenko : >> From: Igor V. Kovalenko >> >> - remove unused host state and store pci bus pointer only >> - do not map host state access into unused 1fe.1000 range >> - reorder pci region registration >> - assign pci i/o region to isa_mem_base >> - rename default machine (it's Ultrasparc IIi now) > > Just rename the machine or use another CPU too? While you are at it > maybe split these two? Let's rename the cpu only since at the moment the rest of sun4u is more Ultrasparc IIi than anything else anyway. I posted updated set with separated rename bit. -- Kind regards, Igor V. Kovalenko
Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET
Paul Brook wrote: >>> I realise that. However I'd expect things to break if the guest OS >>> devices to share an IRQ line between the HPET and some other device. >> The guest would share IRQ8, not the RTC output. So there would be no >> difference to the current situation. > > The difference is that you've removed the check that prevented overlap > between > the PIC and annother device. You should be using > isa_reserve_irq/isa_init_irq > before you use an ISA IRQ line. Any uses of isa_bus_irqs (including teh > existing HPET code) are probably broken. ...at least fragile. OK, will address this as well. Thanks, Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization
On 05/25/2010 02:23 AM, Avi Kivity wrote: On 05/24/2010 11:22 PM, Anthony Liguori wrote: This converts the entire qdev tree into an undocumented stable protocol (the qdev paths were already in this state I believe). This really worries me. N.B. the association with qdev is only in identifying the device. The contents of the device's state are not part of qdev but rather part of vmstate. vmstate is something that we already guarantee to be stable since that's required for live migration compatibility. That removes out ability to deprecate older vmstate as time passes. Not a blocker but something to consider. I don't think that qdev device names and paths are something we have to worry much about changing over time since they reflect logical bus layout. They should remain static provided the devices remain static. Modulo mistakes. We already saw one (lack of pci domains). To reduce the possibility of mistakes, we need reviewable documentation. pci domains was only a mistake as a nice-to-have. We can add pci domains in a backwards compatible way. The arguments you're making about the importance of backwards compatibility and what's needed to strongly guarantee it are equally applicable to the live migration protocol. We really do need to formally document the live migration protocol in such a way that it's reviewable if we hope to truly make it compatible across versions. Regards, Anthony Liguori Note sysfs had similar assumptions and problems. The qdev properties are a different matter entirely. A command like 'info qdm' would be potentially difficult to support as part of QMP but the proposed command's output is actually already part of a backward compatible interface (vmstate). That's all good. But documentation is critical for this. Not only to improve quality, but also so that tool authors would have something to code against instead of trial and error (which invariably misses some corner cases).
Re: [Qemu-devel] [PATCH v2 1/3] add some tests for invalid JSON
On 05/25/2010 02:28 AM, Paolo Bonzini wrote: On 05/24/2010 10:17 PM, Anthony Liguori wrote: On 05/24/2010 02:39 AM, Paolo Bonzini wrote: Signed-off-by: Paolo Bonzini I think this series conflicts a bit with Luiz's series which I just pushed. Could you rebase against the latest? You didn't apply this one yet, at least I don't see it on qemu.git commit e546343ee0f3f904529d32c1a9a60f5baa181852 Author: Luiz Capitulino Date: Wed May 19 18:15:32 2010 -0300 json-lexer: Drop 'buf' QString supports adding a single char, 'buf' is unneeded. Signed-off-by: Luiz Capitulino I based my series on top of Luiz's, so it should apply. Yeah, I confused myself into thinking that Luiz's series was more contentious than it is. Nevermind, your patches are fine on top of his. Regards, Anthony Liguori The above is the only commit that is actually required. I can ping the series once Luiz's patches are applied, so you can disregard it in the meanwhile. Paolo
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:14 AM, Avi Kivity wrote: On 05/24/2010 10:38 PM, Anthony Liguori wrote: - Building a plugin API seems a bit simpler to me, although I'm to sure if I'd get the idea correctly: The block layer has already some kind of api (.bdrv_file_open, .bdrv_read). We could simply compile the block-drivers as shared objects and create a method for loading the necessary modules at runtime. That approach would be a recipe for disaster. We would have to introduce a new, reduced functionality block API that was supported for plugins. Otherwise, the only way a plugin could keep up with our API changes would be if it was in tree which defeats the purpose of having plugins. We could guarantee API/ABI stability in a stable branch but not across releases. We have releases every six months. There would be tons of block plugins that didn't work for random sets of releases. That creates a lot of user confusion and unhappiness. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization
On 05/25/2010 04:03 PM, Anthony Liguori wrote: I don't think that qdev device names and paths are something we have to worry much about changing over time since they reflect logical bus layout. They should remain static provided the devices remain static. Modulo mistakes. We already saw one (lack of pci domains). To reduce the possibility of mistakes, we need reviewable documentation. pci domains was only a mistake as a nice-to-have. We can add pci domains in a backwards compatible way. It adds a new level to the qdev tree. Of course we can hide the new level for older clients, and newer clients can drop the level for older qemus, but it will be oh-so-painful. The arguments you're making about the importance of backwards compatibility and what's needed to strongly guarantee it are equally applicable to the live migration protocol. We really do need to formally document the live migration protocol in such a way that it's reviewable if we hope to truly make it compatible across versions. Mostly agreed. I think live migration has a faster/easier deprecation schedule (easier not to support migration from 0.n-k to 0.n than to remove qmp support for a feature introduced in 0.n-k when releasing 0.n). But that's a minor concern, improving our externally visible interface documentation is a good thing and badly needed. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host
Anthony Liguori wrote: On 05/21/2010 02:50 AM, Andre Przywara wrote: -cpu host currently only propagates the CPU's family/model/stepping, the brand name and the feature bits. Add a whitelist of safe CPUID leafs to let the guest see the actual CPU's cache details and other things. Signed-off-by: Andre Przywara The problem I can see is that this greatly increases the chances of problems with live migration since we don't migrate the cpuid state. I think that should be fixed. Although -cpu host is not a wise choice for migration, even without these additional leaves the feature bits probably don't match between source and target. What's the benefit of exposing this information to the guest? That is mostly to propagate the cache size and organization parameters to the guest: >> +/* safe CPUID leafs to propagate to guest if -cpu host is specified >> + * Intel defined leafs: >> + * Cache descriptors (0x02) >> + * Deterministic cache parameters (0x04) >> + * Monitor/MWAIT parameters (0x05) >> + * >> + * AMD defined leafs: >> + * L1 Cache and TLB (0x05) >> + * L2+L3 TLB (0x06) >> + * LongMode address size (0x08) >> + * 1GB page TLB (0x19) >> + * Performance optimization (0x1A) >> + */ Since at least L1 and L2 caches are mostly private to vCPUs, I see no reason to disguise them. Regards, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 06:25 AM, Avi Kivity wrote: On 05/25/2010 02:02 PM, Kevin Wolf wrote: So could we not standardize a protocol for this that both sheepdog and ceph could implement? The protocol already exists, nbd. It doesn't support snapshotting etc. but we could extend it. But IMO what's needed is a plugin API for the block layer. What would it buy us, apart from more downstreams and having to maintain a stable API and ABI? Currently if someone wants to add a new block format, they have to upstream it and wait for a new qemu to be released. With a plugin API, they can add a new block format to an existing, supported qemu. Whether we have a plugin or protocol based mechanism to implement block formats really ends up being just an implementation detail. In order to implement either, we need to take a subset of block functionality that we feel we can support long term and expose that. Right now, that's basically just querying characteristics (like size and geometry) and asynchronous reads and writes. A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. Plugins that just expose chunks of QEMU internal state directly (like BlockDriver) are a really bad idea IMHO. Regards, Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:17 PM, Anthony Liguori wrote: On 05/25/2010 04:14 AM, Avi Kivity wrote: On 05/24/2010 10:38 PM, Anthony Liguori wrote: - Building a plugin API seems a bit simpler to me, although I'm to sure if I'd get the idea correctly: The block layer has already some kind of api (.bdrv_file_open, .bdrv_read). We could simply compile the block-drivers as shared objects and create a method for loading the necessary modules at runtime. That approach would be a recipe for disaster. We would have to introduce a new, reduced functionality block API that was supported for plugins. Otherwise, the only way a plugin could keep up with our API changes would be if it was in tree which defeats the purpose of having plugins. We could guarantee API/ABI stability in a stable branch but not across releases. We have releases every six months. There would be tons of block plugins that didn't work for random sets of releases. That creates a lot of user confusion and unhappiness. The current situation is that those block format drivers only exist in qemu.git or as patches. Surely that's even more unhappiness. Confusion could be mitigated: $ qemu -module my-fancy-block-format-driver.so my-fancy-block-format-driver.so does not support this version of qemu (0.19.2). Please contact my-fancy-block-format-driver-de...@example.org. The question is how many such block format drivers we expect. We now have two in the pipeline (ceph, sheepdog), it's reasonable to assume we'll want an lvm2 driver and btrfs driver. This is an area with a lot of activity and a relatively simply interface. -- error compiling committee.c: too many arguments to function
[Qemu-devel] RFC: ehci -> uhci handoff suggestions
USB 2.0 leverages companion UHCI or OHCI host controllers for full and low speed devices. I do not see an appropriate means for doing that bus transition and could use some suggestions. I've read through the code for the "legacy" path in handling USB devices (-usbdevice CLI arg and usb_add monitor command), and I am now working on the new path (now that I know about it). As I understand the code at this point it is a top down setup: device added, bus found, device attached. | Qemu USB admin | - adding/removing devices | interface | - showing device list | | USB controller | | | USB device model | - emulated devices (e.g., hw/usb-serial) |(or driver )| - host devices ie., key point is the expectation that the bus to which the device is assigned is known early in the code path. For USB devices the bus to attach it to should be determined automatically when the device is attached. Something along the lines of: | Qemu USB admin | | interface | | | EHCI controller |--->|UHCI / OHCI | | | | USB device model || USB device model | |(or driver )||(or driver )| high speed full / low speed To know which bus to attach it to the device needs to be queried/probed for basic information - something the current architecture does not have. Suggestions? David P.S. I skimmed the USB 3.0 spec and it has the same design: super speed devices are attached to the new 3.0 controller, high speed to ehci and low/full to uhci/ohci.
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:29 PM, Anthony Liguori wrote: The current situation is that those block format drivers only exist in qemu.git or as patches. Surely that's even more unhappiness. Confusion could be mitigated: $ qemu -module my-fancy-block-format-driver.so my-fancy-block-format-driver.so does not support this version of qemu (0.19.2). Please contact my-fancy-block-format-driver-de...@example.org. The question is how many such block format drivers we expect. We now have two in the pipeline (ceph, sheepdog), it's reasonable to assume we'll want an lvm2 driver and btrfs driver. This is an area with a lot of activity and a relatively simply interface. If we expose a simple interface, I'm all for it. But BlockDriver is not simple and things like the snapshoting API need love. Of course, there's certainly a question of why we're solving this in qemu at all. Wouldn't it be more appropriate to either (1) implement a kernel module for ceph/sheepdog if performance matters We'd need a kernel-level generic snapshot API for this eventually. or (2) implement BUSE to complement FUSE and CUSE to enable proper userspace block devices. Likely slow due do lots of copying. Also needs a snapshot API. (ABUSE was proposed a while ago by Zach). If you want to use a block device within qemu, you almost certainly want to be able to manipulate it on the host using standard tools (like mount and parted) so it stands to reason that addressing this in the kernel makes more sense. qemu-nbd also allows this. This reasoning also applies to qcow2, btw. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 08:31 AM, Avi Kivity wrote: A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. If someone did a series to add plugins, I would expect a very strong argument as to why a shared memory mechanism was not possible or at least plausible. I'm not sure I understand why shared memory is such a bad thing wrt KVM. Can you elaborate? Is it simply a matter of fork()? Plugins that just expose chunks of QEMU internal state directly (like BlockDriver) are a really bad idea IMHO. Also, we don't want to expose all of the qemu API. We should default the visibility attribute to "hidden" and expose only select functions, perhaps under their own interface. And no inlines. Yeah, if we did plugins, this would be a key requirement. Regards, Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:35 PM, Anthony Liguori wrote: On 05/25/2010 08:31 AM, Avi Kivity wrote: A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. If someone did a series to add plugins, I would expect a very strong argument as to why a shared memory mechanism was not possible or at least plausible. I'm not sure I understand why shared memory is such a bad thing wrt KVM. Can you elaborate? Is it simply a matter of fork()? fork() doesn't work in the with of memory hotplug. What else is there? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:25 PM, Anthony Liguori wrote: Currently if someone wants to add a new block format, they have to upstream it and wait for a new qemu to be released. With a plugin API, they can add a new block format to an existing, supported qemu. Whether we have a plugin or protocol based mechanism to implement block formats really ends up being just an implementation detail. True. In order to implement either, we need to take a subset of block functionality that we feel we can support long term and expose that. Right now, that's basically just querying characteristics (like size and geometry) and asynchronous reads and writes. Unfortunately, you're right. A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. Plugins that just expose chunks of QEMU internal state directly (like BlockDriver) are a really bad idea IMHO. Also, we don't want to expose all of the qemu API. We should default the visibility attribute to "hidden" and expose only select functions, perhaps under their own interface. And no inlines. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host
On 05/25/2010 04:26 PM, Anthony Liguori wrote: On 05/25/2010 08:21 AM, Andre Przywara wrote: What's the benefit of exposing this information to the guest? That is mostly to propagate the cache size and organization parameters to the guest: >> +/* safe CPUID leafs to propagate to guest if -cpu host is specified >> + * Intel defined leafs: >> + * Cache descriptors (0x02) >> + * Deterministic cache parameters (0x04) >> + * Monitor/MWAIT parameters (0x05) >> + * >> + * AMD defined leafs: >> + * L1 Cache and TLB (0x05) >> + * L2+L3 TLB (0x06) >> + * LongMode address size (0x08) >> + * 1GB page TLB (0x19) >> + * Performance optimization (0x1A) >> + */ Since at least L1 and L2 caches are mostly private to vCPUs, I see no reason to disguise them. But in practice, what is it useful for? See my other mail. Just because we can expose it doesn't mean we should. What's the point of -cpu host then? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
Am 25.05.2010 15:25, schrieb Anthony Liguori: > On 05/25/2010 06:25 AM, Avi Kivity wrote: >> On 05/25/2010 02:02 PM, Kevin Wolf wrote: >>> > So could we not standardize a protocol for this that both sheepdog and > ceph could implement? The protocol already exists, nbd. It doesn't support snapshotting etc. but we could extend it. But IMO what's needed is a plugin API for the block layer. >>> What would it buy us, apart from more downstreams and having to maintain >>> a stable API and ABI? >> >> Currently if someone wants to add a new block format, they have to >> upstream it and wait for a new qemu to be released. With a plugin >> API, they can add a new block format to an existing, supported qemu. > > Whether we have a plugin or protocol based mechanism to implement block > formats really ends up being just an implementation detail. > > In order to implement either, we need to take a subset of block > functionality that we feel we can support long term and expose that. > Right now, that's basically just querying characteristics (like size and > geometry) and asynchronous reads and writes. > > A protocol based mechanism has the advantage of being more robust in the > face of poorly written block backends so if it's possible to make it > perform as well as a plugin, it's a preferable approach. > > Plugins that just expose chunks of QEMU internal state directly (like > BlockDriver) are a really bad idea IMHO. I'm still not convinced that we need either. I share Christoph's concern that we would make our life harder for almost no gain. It's probably a very small group of users (if it exists at all) that wants to add new block drivers themselves, but at the same time can't run upstream qemu. But if we were to decide that there's no way around it, I agree with you that directly exposing the internal API isn't going to work. Kevin
Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization
On 05/25/2010 08:19 AM, Avi Kivity wrote: On 05/25/2010 04:03 PM, Anthony Liguori wrote: I don't think that qdev device names and paths are something we have to worry much about changing over time since they reflect logical bus layout. They should remain static provided the devices remain static. Modulo mistakes. We already saw one (lack of pci domains). To reduce the possibility of mistakes, we need reviewable documentation. pci domains was only a mistake as a nice-to-have. We can add pci domains in a backwards compatible way. It adds a new level to the qdev tree. The tree is not organized like that today. IOW, the PCI hierarchy is not reflected in the qdev hierarchy. All PCI devices (regardless of whether they're a function or a full slot) simply sit below the PCI bus. The arguments you're making about the importance of backwards compatibility and what's needed to strongly guarantee it are equally applicable to the live migration protocol. We really do need to formally document the live migration protocol in such a way that it's reviewable if we hope to truly make it compatible across versions. Mostly agreed. I think live migration has a faster/easier deprecation schedule (easier not to support migration from 0.n-k to 0.n than to remove qmp support for a feature introduced in 0.n-k when releasing 0.n). But that's a minor concern, improving our externally visible interface documentation is a good thing and badly needed. Regards, Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:53 PM, Kevin Wolf wrote: I'm still not convinced that we need either. I share Christoph's concern that we would make our life harder for almost no gain. It's probably a very small group of users (if it exists at all) that wants to add new block drivers themselves, but at the same time can't run upstream qemu. The first part of your argument may be true, but the second isn't. No user can run upstream qemu.git. It's not tested or supported, and has no backwards compatibility guarantees. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [PATCH] add support for protocol driver create_options
Am 24.05.2010 08:34, schrieb MORITA Kazutaka: > At Fri, 21 May 2010 18:57:36 +0200, > Kevin Wolf wrote: >> >> Am 20.05.2010 07:36, schrieb MORITA Kazutaka: >>> + >>> +/* >>> + * Append an option list (list) to an option list (dest). >>> + * >>> + * If dest is NULL, a new copy of list is created. >>> + * >>> + * Returns a pointer to the first element of dest (or the newly allocated >>> copy) >>> + */ >>> +QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest, >>> +QEMUOptionParameter *list) >>> +{ >>> +size_t num_options, num_dest_options; >>> + >>> +num_options = count_option_parameters(dest); >>> +num_dest_options = num_options; >>> + >>> +num_options += count_option_parameters(list); >>> + >>> +dest = qemu_realloc(dest, (num_options + 1) * >>> sizeof(QEMUOptionParameter)); >>> + >>> +while (list && list->name) { >>> +if (get_option_parameter(dest, list->name) == NULL) { >>> +dest[num_dest_options++] = *list; >> >> You need to add a dest[num_dest_options].name = NULL; here. Otherwise >> the next loop iteration works on uninitialized memory and possibly an >> unterminated list. I got a segfault for that reason. >> > > I forgot to add it, sorry. > Fixed version is below. > > Thanks, > > Kazutaka > > == > This patch enables protocol drivers to use their create options which > are not supported by the format. For example, protcol drivers can use > a backing_file option with raw format. > > Signed-off-by: MORITA Kazutaka $ ./qemu-img create -f qcow2 -o cluster_size=4k /tmp/test.qcow2 4G Unknown option 'cluster_size' qemu-img: Invalid options for file format 'qcow2'. I think you added another num_dest_options++ which shouldn't be there. Kevin
[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations
On Tue, May 25, 2010 at 1:04 PM, Avi Kivity wrote: > Those %ps are more or less useless. We need better ways of identifying > them. You're right, the vq pointer is useless in isolation. We don't know which virtio device or which virtqueue number. With the full context of a trace it would be possible to correlate the vq pointer if we had trace events for vdev and vq setup. Adding custom formatters is could be tricky since the format string is passed only to tracing backends that use it, like UST. And UST uses its own sprintf implementation which we don't have direct control over. I think we just need to guarantee that any pointer can be correlated with previous trace entries that give context for that pointer. Stefan