date:20100525

[Qemu-devel] Re: [[RfC PATCH]] linux fbdev display driver prototype.

2010-05-25 Thread Gerd Hoffmann


This looks very promising.

I just have a couple of observations:

- Your patch does not work on my machine with the vesafb driver. It
reports "can't handle 8 bpp frame buffers". It turns out that the
vesafb driver seems to initialize the framebuffer in PSEUDOCOLOR
mode.


Depends on the video mode you ask for via vga=$nr, there are also 32bpp 
modes.



I think we should add a piece of code which tries to reinitialize
the framebuffer with suitable parameters (32bpp/TRUECOLOR).


With vesafb it wouldn't work anyway, you can't switch these parameters 
at runtime.  I think the *drmfb fbdev interface is quite limited too in 
what it allows to change.



- You should register a Display Allocator and override the
create_displaysurface() method like I did in the DirectFB driver.
This way you save qemu a data copy. fbdev_render_32() should only be
used when the guest framebuffer is not compatible with the physical
framebuffer (guest_bpp != physical_bpp || guest_linesize !=
physical_linesize).


Isn't a trivial move though.  If the console is switched you must stop 
drawing on the framebuffer.


Right now this is easy: just stop copying.  Likewise restoring the 
screen when switching back is easy: just copy everything.


If we give out pointers to the framebuffer to other qemu code which 
doesn't know anything about console switching we have to be quite 
careful to get things right ...
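
The trade-off discussed above can be sketched as a single predicate: sharing the physical framebuffer with the guest surface is only safe when the layouts match and the console is currently active. This is a minimal sketch, not QEMU code; `struct fb_layout` and `can_share_framebuffer()` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a framebuffer layout; not a QEMU type. */
struct fb_layout {
    int bpp;          /* bits per pixel */
    size_t linesize;  /* bytes per scanline */
};

/* Returns true when the guest surface could live directly in the physical
 * framebuffer, so no per-update copy or conversion is needed. */
static bool can_share_framebuffer(const struct fb_layout *guest,
                                  const struct fb_layout *phys,
                                  bool console_active)
{
    if (!console_active) {
        /* VT switched away: nobody may draw on the hardware. */
        return false;
    }
    return guest->bpp == phys->bpp && guest->linesize == phys->linesize;
}
```

When the predicate is false, the driver would fall back to the copying path (fbdev_render_32() in the patch under discussion), which is also the path that makes console switching simple.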



- A cool feature would be to be able to stretch the guest display in
fullscreen. My DirectFB driver implements a fullscreen toggle command
by pressing the Ctrl-Alt-Return keys. I think Stefano added a SDL
zoom feature a while ago which we could reuse for this.


The actual stretching is done by SDL I think.  For that kind of stuff a 
rendering library is actually helpful ...



- I'm not very familiar with the scancode stuff, but I think that if
you set your VT fd in the K_RAW keyboard mode, you'll be able to get
true keyboard scancodes that you can directly give to the guest using
the kbd_put_keycode() function.


I'm not sure this is really portable.  What do you get in K_RAW mode on 
!x86 platforms?  K_MEDIUMRAW gives you linux input layer key codes no 
matter what.  Also the translation to keysyms (for text consoles) is 
easier with mediumraw.
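
For reference, a minimal sketch of decoding a K_MEDIUMRAW byte stream, assuming the encoding used by the Linux VT keyboard driver: key codes below 128 arrive as one byte with the top bit set on release, and larger key codes arrive as a three-byte sequence. `struct key_event` and `mediumraw_decode()` are hypothetical names, and the encoding should be checked against the kernel sources before relying on it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct key_event {
    unsigned keycode;  /* Linux input-layer key code */
    bool released;     /* true for key release, false for key press */
};

/* Decode one event from a K_MEDIUMRAW byte stream.
 * Returns the number of bytes consumed, or 0 if more bytes are needed. */
static size_t mediumraw_decode(const uint8_t *buf, size_t len,
                               struct key_event *ev)
{
    if (len < 1) {
        return 0;
    }
    ev->released = (buf[0] & 0x80) != 0;
    if ((buf[0] & 0x7f) != 0) {
        /* Short form: 7-bit key code in the low bits. */
        ev->keycode = buf[0] & 0x7f;
        return 1;
    }
    if (len < 3) {
        return 0;
    }
    /* Long form: two following bytes carry 7 bits each, high part first. */
    ev->keycode = ((unsigned)(buf[1] & 0x7f) << 7) | (buf[2] & 0x7f);
    return 3;
}
```

The decoded key code is independent of the host architecture, which is the portability argument made above for medium-raw mode.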


cheers,
  Gerd

[Qemu-devel] Jämställdhet (Gender equality)

2010-05-25 Thread DokuMera Nyhetsbrev


If you have trouble reading this email, click here 
(http://www.anp.se/newsletter/706025/444059437941455D4B7142445C43) for a 
web version.

Our newsletter is sent automatically to our customers and interested parties. 
If you do not want to receive this newsletter in future, click here to unsubscribe 
(http://www.anp.se/oa/706025/444059437941455D4B7142445C43).

Newsletter 21/2010. This newsletter was sent to: qemu-devel@nongnu.org

 


Gender equality work

Gender equality exists when women and men have the same rights, opportunities 
and obligations in all areas. Although we are moving towards an increasingly 
equal society, we still have a long way to go before we reach that goal.

One step towards becoming a more equal company is to draw up a gender 
equality plan. Organizations with 25 or more employees must today have a 
gender equality plan that is revised every three years. The measures set out 
in the plan must also be carried out and followed up.

Another tool for promoting gender equality work within the company is a 
gender equality policy. By drawing one up, the company makes clear to its 
employees its view on gender equality and gender equality work, and what it 
in turn expects of its staff on this issue.

Through DokuMera's Företagspaket 
(http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/visa-kategorier.asp?id=1321&from=172191075&prefix=dm)
 you get access to templates, checklists, policies, expert answers and much 
more that simplifies and legally secures the work in your company.


Documents of the week 

Checklist: gender equality work 
(http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/checklista_jamstalldhetsarbete_712_dd.html&from=172191075&prefix=dm)
 >> 

Gender equality policy 
(http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/jamstalldhetspolicy_4298_dd.html&from=172191075&prefix=dm)
 >>

Gender equality plan 
(http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/jamstalldhetsplan_711_dd.html&from=172191075&prefix=dm)
 >> 

Policy against harassment 
(http://www.dokumera.se/newsletter_redirect.asp?tohttp://www.dokumera.se/policy_mot_trakasserier_4294_dd.html&from=172191075&prefix=dm)
 >>


A word from a customer

Jan Kirkhoff, 
sales and marketing manager

IKAROS AB

"The best thing about Företagspaketet is the variety, and that many people in 
different positions at our company can use it."


 

 
(http://www.dokumera.se/newsletter_redirect.asp?to=http://www.dokumera.se/out_contactusmessage.asp&from=172191075&prefix=dm)

For a free, exclusive presentation of how DokuMera can save tens of thousands 
of kronor for my own company.
You are of course very welcome to call us on 08-664 04 50.

The content of the newsletter should not be construed as a commitment on 
DokuMera's part. The information is sent as is, without warranties or digital 
signatures.

Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization

2010-05-25 Thread Avi Kivity


On 05/24/2010 11:22 PM, Anthony Liguori wrote:
This converts the entire qdev tree into an undocumented stable 
protocol (the qdev paths were already in this state I believe).  This 
really worries me.



N.B. the association with qdev is only in identifying the device.  The 
contents of the device's state are not part of qdev but rather part of 
vmstate.  vmstate is something that we already guarantee to be stable 
since that's required for live migration compatibility.


That removes our ability to deprecate older vmstate as time passes.  Not 
a blocker but something to consider.


I don't think that qdev device names and paths are something we have 
to worry much about changing over time since they reflect logical bus 
layout.  They should remain static provided the devices remain static.


Modulo mistakes.  We already saw one (lack of pci domains).  To reduce 
the possibility of mistakes, we need reviewable documentation.


Note sysfs had similar assumptions and problems.

The qdev properties are a different matter entirely.  A command like 
'info qdm' would be potentially difficult to support as part of QMP 
but the proposed command's output is actually already part of a 
backward compatible interface (vmstate).


That's all good.  But documentation is critical for this.  Not only to 
improve quality, but also so that tool authors would have something to 
code against instead of trial and error (which invariably misses some 
corner cases).


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host

2010-05-25 Thread Avi Kivity


On 05/25/2010 01:10 AM, Anthony Liguori wrote:

On 05/21/2010 02:50 AM, Andre Przywara wrote:

-cpu host currently only propagates the CPU's family/model/stepping,
the brand name and the feature bits.
Add a whitelist of safe CPUID leafs to let the guest see the actual
CPU's cache details and other things.

Signed-off-by: Andre Przywara


The problem I can see is that this greatly increases the chances of 
problems with live migration since we don't migrate the cpuid state.


-cpu host is already problematic for live migration.

Are you talking about the state maintained by the cpuid instruction?  
Yes, we need to migrate those bits.




What's the benefit of exposing this information to the guest?



Some algorithms adjust themselves based on the cache size.  If you have 
several passes over a large data set, it's often better to run each set 
of passes on a subset of the dataset that fits in cache, then stitch the 
subsets together.
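
A minimal sketch of the cache-blocking idea described above, assuming element-wise passes so that no stitching step is needed (a real algorithm with cross-element dependencies would need one). `pass()`, `run_blocked()` and the `cache_bytes` parameter are hypothetical; in a guest, `cache_bytes` is the value the program would derive from the CPUID cache leaves.

```c
#include <stddef.h>

/* One illustrative pass: any element-wise transform would do here. */
static void pass(double *data, size_t n, int k)
{
    for (size_t i = 0; i < n; i++) {
        data[i] = data[i] * 0.5 + (double)k;
    }
}

/* Run `passes` passes over `data` one cache-sized block at a time, instead
 * of streaming the whole array through the cache on every pass. */
static void run_blocked(double *data, size_t n, size_t cache_bytes, int passes)
{
    size_t block = cache_bytes / sizeof(double);
    if (block == 0) {
        block = 1;
    }
    for (size_t start = 0; start < n; start += block) {
        size_t len = n - start < block ? n - start : block;
        for (int k = 0; k < passes; k++) {
            pass(data + start, len, k);
        }
    }
}
```

If the guest sees a wrong cache size, the result is still correct but the block size is mistuned, which is why exposing the real cache details can matter for performance.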


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH v2 1/3] add some tests for invalid JSON

2010-05-25 Thread Paolo Bonzini


On 05/24/2010 10:17 PM, Anthony Liguori wrote:

On 05/24/2010 02:39 AM, Paolo Bonzini wrote:

Signed-off-by: Paolo Bonzini


I think this series conflicts a bit with Luiz's series which I just
pushed. Could you rebase against the latest?


You didn't apply this one yet, at least I don't see it on qemu.git

commit e546343ee0f3f904529d32c1a9a60f5baa181852
Author: Luiz Capitulino 
Date:   Wed May 19 18:15:32 2010 -0300

json-lexer: Drop 'buf'

QString supports adding a single char, 'buf' is unneeded.

Signed-off-by: Luiz Capitulino 

I based my series on top of Luiz's, so it should apply.  The above is 
the only commit that is actually required.  I can ping the series once 
Luiz's patches are applied, so you can disregard it in the meanwhile.


Paolo

[Qemu-devel] [Test] Question

2010-05-25 Thread robert song

Hi, everyone.
I tried running the QEMU tests, but I found that only qemu-i386 is tested.
Are there any tests for other commands, such as qemu-system-arm or
qemu-arm, to make sure they still work after a modification?

Best Regards,
  robert

Re: [Qemu-devel] [PATCH 1/5] trace: Add trace-events file for declaring trace events

2010-05-25 Thread Stefan Hajnoczi

On Mon, May 24, 2010 at 11:20 PM, Anthony Liguori
 wrote:
>> +# check if trace backend exists
>> +
>> +sh tracetool "--$trace_backend" --check-backend>  /dev/null 2>  /dev/null
>>
>
> This will fail if objdir != srcdir.  You have to qualify tracetool with the
> path to srcdir.

Thanks Anthony, fixed on my branch.  I'll resend a v2 together with other fixes.

Stefan

[Qemu-devel] Re: Windows guest debugging on KVM/Qemu

2010-05-25 Thread Avi Kivity


On 05/24/2010 11:07 PM, Neo Jia wrote:

hi,

I am using KVM/Qemu to debug my Windows guest according to KVM wiki
page (http://www.linux-kvm.org/page/WindowsGuestDrivers/GuestDebugging).
It works for me and also I can only use one Windows guest and bind its
serial port to a TCP port and run "Virtual Serial Ports Emulator" on
my Windows dev machine.

The problem is that this kind of connection is really slow. Is there
any known issue with the KVM serial port driver? There was a good
discussion about the same issue a year ago. I am not sure whether there
has been any improvement since then.
   


How slow?  Can you measure it (without a debugger, just guest-to-guest 
file transfer)?


slirp used to be ridiculously slow but some recent change made it fairly 
fast.  Probably a missing wakeup, perhaps serial has the same problem.  
In any case I recommend testing with qemu-kvm.git master.


--
error compiling committee.c: too many arguments to function

[Qemu-devel] Hi. Regarding QEMU's GDB server and MMU

2010-05-25 Thread Ari Yoskovitz

Hi all.

I am very new to dev for QEMU, so I have some very basic questions.

1) I understand that QEMU has a built-in GDB server that is somewhat a
simulation of a JTAG device on dev boards, connected directly to the CPU. Is
that a correct analogy?
2) How can the GDB server handle a MMU? Would it "see" physical or virtual
addresses? Do I need a special client that can handle this?


Thanks! :-)

-- 
Use the source, Luke!

[Qemu-devel] Re: [PATCH 0/6] Make hpet a compile time option

2010-05-25 Thread Paolo Bonzini


On 05/24/2010 07:54 PM, Juan Quintela wrote:

But for the other call, what do you propose?

My best try was to hide the availability of hpet inside hpet_emul.h
with:

#ifdef CONFIG_HPET
uint32_t hpet_in_legacy_mode(void);
#else
static inline uint32_t hpet_in_legacy_mode(void) { return 0; }
#endif


Change this to a global variable rtc_disable_interrupts in 
hw/mc146818rtc.c?  (You didn't say it would need to be particularly 
pretty...).


Not tested beyond compilation.

Paolo
diff --git a/hw/hpet.c b/hw/hpet.c
index 8729fb2..c2615c1 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -29,6 +29,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "mc146818rtc.h"
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -39,14 +40,6 @@
 
 static HPETState *hpet_statep;
 
-uint32_t hpet_in_legacy_mode(void)
-{
-if (hpet_statep)
-return hpet_statep->config & HPET_CFG_LEGACY;
-else
-return 0;
-}
-
 static uint32_t timer_int_route(struct HPETTimer *timer)
 {
 uint32_t route;
@@ -139,7 +132,7 @@ static void update_irq(struct HPETTimer *timer)
 qemu_irq irq;
 int route;
 
-if (timer->tn <= 1 && hpet_in_legacy_mode()) {
+if (timer->tn <= 1 && (timer->state->config & HPET_CFG_LEGACY)) {
 /* if LegacyReplacementRoute bit is set, HPET specification requires
  * timer0 be routed to IRQ0 in NON-APIC or IRQ2 in the I/O APIC,
  * timer1 be routed to IRQ8 in NON-APIC or IRQ8 in the I/O APIC.
@@ -474,8 +467,10 @@ static void hpet_ram_writel(void *opaque, 
target_phys_addr_t addr,
 /* i8254 and RTC are disabled when HPET is in legacy mode */
 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
 hpet_pit_disable();
+   rtc_disable_interrupts = 1;
 } else if (deactivating_bit(old_val, new_val, 
HPET_CFG_LEGACY)) {
 hpet_pit_enable();
+   rtc_disable_interrupts = 0;
 }
 break;
 case HPET_CFG + 4:
diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 571c593..61d5980 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -94,6 +94,9 @@ typedef struct RTCState {
 QEMUTimer *second_timer2;
 } RTCState;
 
+
+int rtc_disable_interrupts = 0;
+
 static void rtc_irq_raise(qemu_irq irq)
 {
 /* When HPET is operating in legacy mode, RTC interrupts are disabled
@@ -101,9 +104,7 @@ static void rtc_irq_raise(qemu_irq irq)
  * mode is established while interrupt is raised. We want it to
  * be lowered in any case
  */
-#if defined TARGET_I386
-if (!hpet_in_legacy_mode())
-#endif
+if (!rtc_disable_interrupts)
 qemu_irq_raise(irq);
 }
 
@@ -148,14 +149,10 @@ static void rtc_timer_update(RTCState *s, int64_t 
current_time)
 int enable_pie;
 
 period_code = s->cmos_data[RTC_REG_A] & 0x0f;
-#if defined TARGET_I386
 /* disable periodic timer if hpet is in legacy mode, since interrupts are
  * disabled anyway.
  */
-enable_pie = !hpet_in_legacy_mode();
-#else
-enable_pie = 1;
-#endif
+enable_pie = !rtc_disable_interrupts;
 if (period_code != 0
 && (((s->cmos_data[RTC_REG_B] & REG_B_PIE) && enable_pie)
 || ((s->cmos_data[RTC_REG_B] & REG_B_SQWE) && s->sqw_irq))) {
diff --git a/hw/mc146818rtc.h b/hw/mc146818rtc.h
index 6f46a68..ff4bcda 100644
--- a/hw/mc146818rtc.h
+++ b/hw/mc146818rtc.h
@@ -3,6 +3,7 @@
 
 #include "isa.h"
 
+extern int rtc_disable_interrupts;
 ISADevice *rtc_init(int base_year);
 void rtc_set_memory(ISADevice *dev, int addr, int val);
 void rtc_set_date(ISADevice *dev, const struct tm *tm);

Re: [Qemu-devel] Re: [RFC PATCH] AMD IOMMU emulation

2010-05-25 Thread Joerg Roedel

On Mon, May 24, 2010 at 08:10:16PM +, Blue Swirl wrote:
> On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel  wrote:
> >> +
> >> +#define MMIO_SIZE               0x2028
> >
> > This size should be a power-of-two value. In this case probably 0x4000.
> 
> Not really, the devices can reserve regions of any size. There were
> some implementation deficiencies in earlier versions of QEMU, where
> the whole page would be reserved anyway, but this limitation has been
> removed long time ago.

The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux
driver maps the MMIO region with this size. So the emulation should
reserve this amount of MMIO space too.
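
As a quick check of the arithmetic only: 0x4000 is the next power of two at or above the 0x2028 used in the patch. The helper below is a hypothetical sketch, not a QEMU function, and says nothing about what the hardware specification requires.

```c
#include <stdint.h>

/* Round `size` up to the next power of two.
 * Returns 0 if the result would not fit in 32 bits. */
static uint32_t round_up_pow2(uint32_t size)
{
    uint32_t p = 1;

    if (size > 0x80000000u) {
        return 0;
    }
    while (p < size) {
        p <<= 1;
    }
    return p;
}
```

With this helper, round_up_pow2(0x2028) gives 0x4000, matching the size that the Linux driver is said to map.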

Joerg

[Qemu-devel] [RFC PATCH 21/23] virtio-blk: Modify save/load handler to handle inuse variable.

2010-05-25 Thread Yoshiaki Tamura

Change the type of inuse to uint16_t, have the save/load handlers transfer
it, and rewind last_avail_idx by inuse on load if requests are still
outstanding.

Signed-off-by: Yoshiaki Tamura 
---
 hw/virtio.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 7c020a3..502929c 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -70,7 +70,7 @@ struct VirtQueue
 VRing vring;
 target_phys_addr_t pa;
 uint16_t last_avail_idx;
-int inuse;
+uint16_t inuse;
 uint16_t vector;
 void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
 };
@@ -641,6 +641,7 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 qemu_put_be32(f, vdev->vq[i].vring.num);
 qemu_put_be64(f, vdev->vq[i].pa);
 qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
+qemu_put_be16s(f, &vdev->vq[i].inuse);
 if (vdev->binding->save_queue)
 vdev->binding->save_queue(vdev->binding_opaque, i, f);
 }
@@ -678,6 +679,11 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f)
 vdev->vq[i].vring.num = qemu_get_be32(f);
 vdev->vq[i].pa = qemu_get_be64(f);
 qemu_get_be16s(f, &vdev->vq[i].last_avail_idx);
+qemu_get_be16s(f, &vdev->vq[i].inuse);
+
+/* revert last_avail_idx if there are outstanding emulation. */
+vdev->vq[i].last_avail_idx -= vdev->vq[i].inuse;
+vdev->vq[i].inuse = 0;
 
 if (vdev->vq[i].pa) {
 virtqueue_init(&vdev->vq[i]);
-- 
1.7.0.31.g1df487
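
The rewind in the load path above relies on unsigned 16-bit wraparound, which is presumably why inuse becomes a uint16_t alongside last_avail_idx. A minimal sketch of that arithmetic, with `rewind_avail_idx()` as a hypothetical helper name:

```c
#include <stdint.h>

/* Step the ring index back by the number of in-flight requests.
 * The cast keeps the result modulo 65536, like the ring index itself. */
static uint16_t rewind_avail_idx(uint16_t last_avail_idx, uint16_t inuse)
{
    return (uint16_t)(last_avail_idx - inuse);
}
```

For example, rewinding index 2 by 5 in-flight requests yields 65533, which is the correct position on a free-running 16-bit index.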

[Qemu-devel] [RFC PATCH 14/23] Call init handler of event-tap at main().

2010-05-25 Thread Yoshiaki Tamura

Signed-off-by: Yoshiaki Tamura 
---
 vl.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 70a8aed..56d12c7 100644
--- a/vl.c
+++ b/vl.c
@@ -169,6 +169,8 @@ int main(int argc, char **argv)
 
 #include "qemu-queue.h"
 
+#include "event-tap.h"
+
 //#define DEBUG_NET
 //#define DEBUG_SLIRP
 
@@ -5949,6 +5951,8 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 
+event_tap_init();
+
 if (default_cdrom) {
 /* we always create the cdrom drive, even if no disk is there */
 drive_add(NULL, CDROM_ALIAS);
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 22/23] Introduce -k option to enable FT migration mode (Kemari).

2010-05-25 Thread Yoshiaki Tamura

When the -k option is passed to the migrate command, ft_mode is turned on
to start FT migration mode (Kemari).

Signed-off-by: Yoshiaki Tamura 
---
 migration.c |3 +++
 qemu-monitor.hx |7 ---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/migration.c b/migration.c
index 5b90d37..3334650 100644
--- a/migration.c
+++ b/migration.c
@@ -71,6 +71,9 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 return -1;
 }
 
+if (qdict_get_int(qdict, "ft"))
+ft_mode = FT_INIT;
+
 if (strstart(uri, "tcp:", &p)) {
 s = tcp_start_outgoing_migration(mon, p, max_throttle, detach,
  (int)qdict_get_int(qdict, "blk"), 
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 16c45b7..22b72d9 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -765,13 +765,14 @@ ETEXI
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-.params = "[-d] [-b] [-i] uri",
+.args_type  = "detach:-d,blk:-b,inc:-i,ft:-k,uri:s",
+.params = "[-d] [-b] [-i] [-k] uri",
 .help   = "migrate to URI (using -d to not wait for completion)"
  "\n\t\t\t -b for migration without shared storage with"
  " full copy of disk\n\t\t\t -i for migration without "
  "shared storage with incremental copy of disk "
- "(base image shared between src and destination)",
+ "(base image shared between src and destination)"
+ "\n\t\t\t -k for FT migration mode (Kemari)",
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 16/23] Insert event_tap_mmio() to cpu_physical_memory_rw().

2010-05-25 Thread Yoshiaki Tamura

Record mmio write event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura 
---
 exec.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index d5c2a05..e9ed477 100644
--- a/exec.c
+++ b/exec.c
@@ -44,6 +44,7 @@
 #include "hw/hw.h"
 #include "osdep.h"
 #include "kvm.h"
+#include "event-tap.h"
 #if defined(CONFIG_USER_ONLY)
 #include 
 #include 
@@ -3373,6 +3374,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, 
uint8_t *buf,
 io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
 if (p)
 addr1 = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
+
+event_tap_mmio(addr, buf, len);
+
 /* XXX: could force cpu_single_env to NULL to avoid
potential bugs */
 if (l >= 4 && ((addr1 & 3) == 0)) {
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 04/23] Use cpu_physical_memory_get_dirty_range() to check multiple dirty pages.

2010-05-25 Thread Yoshiaki Tamura

Modifies ram_save_block() and ram_save_remaining() to use
cpu_physical_memory_get_dirty_range() to check multiple dirty and non-dirty
pages at once.

Signed-off-by: Yoshiaki Tamura 
Signed-off-by: OHMURA Kei 
---
 vl.c |   52 +---
 1 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/vl.c b/vl.c
index 729c955..70a8aed 100644
--- a/vl.c
+++ b/vl.c
@@ -2779,7 +2779,8 @@ static int ram_save_block(QEMUFile *f)
 static ram_addr_t current_addr = 0;
 ram_addr_t saved_addr = current_addr;
 ram_addr_t addr = 0;
-int found = 0;
+ram_addr_t dirty_rams[HOST_LONG_BITS];
+int i, found = 0;
 
 while (addr < last_ram_offset) {
 if (kvm_enabled() && current_addr == 0) {
@@ -2791,28 +2792,33 @@ static int ram_save_block(QEMUFile *f)
 return 0;
 }
 }
-if (cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) 
{
+if ((found = cpu_physical_memory_get_dirty_range(
+ current_addr, last_ram_offset, dirty_rams, HOST_LONG_BITS,
+ MIGRATION_DIRTY_FLAG))) {
 uint8_t *p;
 
-cpu_physical_memory_reset_dirty(current_addr,
-current_addr + TARGET_PAGE_SIZE,
-MIGRATION_DIRTY_FLAG);
+for (i = 0; i < found; i++) {
+ram_addr_t page_addr = dirty_rams[i];
+cpu_physical_memory_reset_dirty(page_addr,
+page_addr + TARGET_PAGE_SIZE,
+MIGRATION_DIRTY_FLAG);
 
-p = qemu_get_ram_ptr(current_addr);
+p = qemu_get_ram_ptr(page_addr);
 
-if (is_dup_page(p, *p)) {
-qemu_put_be64(f, current_addr | RAM_SAVE_FLAG_COMPRESS);
-qemu_put_byte(f, *p);
-} else {
-qemu_put_be64(f, current_addr | RAM_SAVE_FLAG_PAGE);
-qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+if (is_dup_page(p, *p)) {
+qemu_put_be64(f, page_addr | RAM_SAVE_FLAG_COMPRESS);
+qemu_put_byte(f, *p);
+} else {
+qemu_put_be64(f, page_addr | RAM_SAVE_FLAG_PAGE);
+qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+}
 }
 
-found = 1;
 break;
+} else {
+addr += dirty_rams[0];
+current_addr = (saved_addr + addr) % last_ram_offset;
 }
-addr += TARGET_PAGE_SIZE;
-current_addr = (saved_addr + addr) % last_ram_offset;
 }
 
 return found;
@@ -2822,12 +2828,20 @@ static uint64_t bytes_transferred;
 
 static ram_addr_t ram_save_remaining(void)
 {
-ram_addr_t addr;
+ram_addr_t addr = 0;
 ram_addr_t count = 0;
+ram_addr_t dirty_rams[HOST_LONG_BITS];
+int found = 0;
 
-for (addr = 0; addr < last_ram_offset; addr += TARGET_PAGE_SIZE) {
-if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG))
-count++;
+while (addr < last_ram_offset) {
+if ((found = cpu_physical_memory_get_dirty_range(
+ addr, last_ram_offset, dirty_rams, HOST_LONG_BITS,
+ MIGRATION_DIRTY_FLAG))) {
+count += found;
+addr = dirty_rams[found - 1] + TARGET_PAGE_SIZE;
+} else {
+addr += dirty_rams[0];
+}
 }
 
 return count;
-- 
1.7.0.31.g1df487
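
The idea behind checking several pages per call can be sketched on a plain bit-per-page bitmap. This is a simplified model, not the signature of cpu_physical_memory_get_dirty_range() from this series: `get_dirty_range()`, `PAGE_SIZE_SKETCH` and the `next` out-parameter are assumptions made for the example.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u
#define LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Collect up to `max` dirty page addresses in [start, end).
 * Returns the number found and stores the address to resume from in *next.
 * The bitmap must cover every page below `end`. */
static int get_dirty_range(const unsigned long *bitmap, uint64_t start,
                           uint64_t end, uint64_t *out, int max,
                           uint64_t *next)
{
    int found = 0;
    uint64_t page = start / PAGE_SIZE_SKETCH;
    uint64_t last = end / PAGE_SIZE_SKETCH;

    while (page < last && found < max) {
        unsigned long word = bitmap[page / LONG_BITS];
        if (word == 0) {
            /* Whole word clean: skip to the first page of the next word. */
            page = (page / LONG_BITS + 1) * LONG_BITS;
            continue;
        }
        if (word & (1ul << (page % LONG_BITS))) {
            out[found++] = page * PAGE_SIZE_SKETCH;
        }
        page++;
    }
    *next = (page < last ? page : last) * PAGE_SIZE_SKETCH;
    return found;
}
```

The gain over a per-page query comes from the clean-word skip: a run of LONG_BITS clean pages costs one comparison instead of LONG_BITS function calls.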

[Qemu-devel] [RFC PATCH 12/23] Insert event-tap callbacks to net/block layer.

2010-05-25 Thread Yoshiaki Tamura

Introduce event-tap callbacks to functions which actually fire outputs
at net/block layer.  By synchronizing VMs before outputs are fired, we
can failover to the receiver upon failure.

Signed-off-by: Yoshiaki Tamura 
---
 block.c |   22 ++
 block.h |4 
 net/queue.c |   18 ++
 net/queue.h |3 +++
 4 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 31d1ba4..cf73c47 100644
--- a/block.c
+++ b/block.c
@@ -59,6 +59,8 @@ BlockDriverState *bdrv_first;
 
 static BlockDriver *first_drv;
 
+static int (*bdrv_event_tap)(void);
+
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -787,6 +789,10 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
 set_dirty_bitmap(bs, sector_num, nb_sectors, 1);
 }
 
+if (bdrv_event_tap != NULL) {
+bdrv_event_tap();
+}
+
 return drv->bdrv_write(bs, sector_num, buf, nb_sectors);
 }
 
@@ -1851,6 +1857,10 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
 MultiwriteCB *mcb;
 int i;
 
+if (bdrv_event_tap != NULL) {
+bdrv_event_tap();
+}
+
 if (num_reqs == 0) {
 return 0;
 }
@@ -2277,3 +2287,15 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs)
 {
 return bs->dirty_count;
 }
+
+void bdrv_event_tap_register(int (*cb)(void))
+{
+if (bdrv_event_tap == NULL) {
+bdrv_event_tap = cb;
+}
+}
+
+void bdrv_event_tap_unregister(void)
+{
+bdrv_event_tap = NULL;
+}
diff --git a/block.h b/block.h
index edf5704..b5139db 100644
--- a/block.h
+++ b/block.h
@@ -207,4 +207,8 @@ int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
   int nr_sectors);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs);
+
+void bdrv_event_tap_register(int (*cb)(void));
+void bdrv_event_tap_unregister(void);
+
 #endif
diff --git a/net/queue.c b/net/queue.c
index 2ea6cd0..a542efe 100644
--- a/net/queue.c
+++ b/net/queue.c
@@ -57,6 +57,8 @@ struct NetQueue {
 unsigned delivering : 1;
 };
 
+static int (*net_event_tap)(void);
+
 NetQueue *qemu_new_net_queue(NetPacketDeliver *deliver,
  NetPacketDeliverIOV *deliver_iov,
  void *opaque)
@@ -151,6 +153,8 @@ static ssize_t qemu_net_queue_deliver(NetQueue *queue,
 ssize_t ret = -1;
 
 queue->delivering = 1;
+if (net_event_tap)
+net_event_tap();
 ret = queue->deliver(sender, flags, data, size, queue->opaque);
 queue->delivering = 0;
 
@@ -166,6 +170,8 @@ static ssize_t qemu_net_queue_deliver_iov(NetQueue *queue,
 ssize_t ret = -1;
 
 queue->delivering = 1;
+if (net_event_tap)
+net_event_tap();
 ret = queue->deliver_iov(sender, flags, iov, iovcnt, queue->opaque);
 queue->delivering = 0;
 
@@ -258,3 +264,15 @@ void qemu_net_queue_flush(NetQueue *queue)
 qemu_free(packet);
 }
 }
+
+void qemu_net_event_tap_register(int (*cb)(void))
+{
+if (net_event_tap == NULL) {
+net_event_tap = cb;
+}
+}
+
+void qemu_net_event_tap_unregister(void)
+{
+net_event_tap = NULL;
+}
diff --git a/net/queue.h b/net/queue.h
index a31958e..5b031c1 100644
--- a/net/queue.h
+++ b/net/queue.h
@@ -68,4 +68,7 @@ ssize_t qemu_net_queue_send_iov(NetQueue *queue,
 void qemu_net_queue_purge(NetQueue *queue, VLANClientState *from);
 void qemu_net_queue_flush(NetQueue *queue);
 
+void qemu_net_event_tap_register(int (*cb)(void));
+void qemu_net_event_tap_unregister(void);
+
 #endif /* QEMU_NET_QUEUE_H */
-- 
1.7.0.31.g1df487
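
The registration pattern used in both block.c and net/queue.c above can be reduced to a single-slot callback that is consulted before every output. The sketch below is standalone and uses hypothetical names (`tap_register()`, `fire_output()`, the two counters); it mirrors the "first registration wins" behaviour of the patch.

```c
#include <stddef.h>

typedef int (*event_tap_cb)(void);

static event_tap_cb tap;   /* NULL while no tap is installed */
static int output_calls;   /* how many outputs were fired */
static int taps_seen;      /* how many times the tap ran first */

static void tap_register(event_tap_cb cb)
{
    if (tap == NULL) {     /* first registration wins, later ones are ignored */
        tap = cb;
    }
}

static void tap_unregister(void)
{
    tap = NULL;
}

/* Example callback: a real one would synchronize the VMs here. */
static int count_tap(void)
{
    taps_seen++;
    return 0;
}

/* Stand-in for bdrv_write() or qemu_net_queue_deliver(). */
static void fire_output(void)
{
    if (tap != NULL) {
        tap();             /* runs before the output is actually emitted */
    }
    output_calls++;
}
```

Note that, as in the patch, the callback's return value is ignored at the call site, so a failed synchronization would not stop the output.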

[Qemu-devel] [RFC PATCH 15/23] Insert event_tap_ioport() to ioport_write().

2010-05-25 Thread Yoshiaki Tamura

Record ioport event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura 
---
 ioport.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ioport.c b/ioport.c
index 53dd87a..ad7a017 100644
--- a/ioport.c
+++ b/ioport.c
@@ -26,6 +26,7 @@
  */
 
 #include "ioport.h"
+#include "event-tap.h"
 
 /***/
 /* IO Port */
@@ -75,6 +76,7 @@ static void ioport_write(int index, uint32_t address, 
uint32_t data)
 default_ioport_writel
 };
 IOPortWriteFunc *func = ioport_write_table[index][address];
+event_tap_ioport(index, address, data);
 if (!func)
 func = default_func[index];
 func(ioport_opaque[address], address, data);
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 10/23] Introduce util functions to control ft_transaction from savevm layer.

2010-05-25 Thread Yoshiaki Tamura

To utilize ft_transaction function, savevm needs interfaces to be
exported.

Signed-off-by: Yoshiaki Tamura 
---
 hw/hw.h  |5 +
 savevm.c |   41 +
 2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index fc9ed29..5a48a91 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -54,6 +54,8 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc 
*put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_transaction(int fd);
+QEMUFile *qemu_fopen_tranx_sender(void *opaque);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
@@ -63,6 +65,9 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int 
size);
 void qemu_put_byte(QEMUFile *f, int v);
 void *qemu_realloc_buffer(QEMUFile *f, int size);
 void qemu_clear_buffer(QEMUFile *f);
+int qemu_transaction_begin(QEMUFile *f);
+int qemu_transaction_commit(QEMUFile *f);
+int qemu_transaction_cancel(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index 2ab883b..81cb711 100644
--- a/savevm.c
+++ b/savevm.c
@@ -82,6 +82,7 @@
 #include "migration.h"
 #include "qemu_socket.h"
 #include "qemu-queue.h"
+#include "ft_transaction.h"
 
 /* point to the block driver where the snapshots are managed */
 static BlockDriverState *bs_snapshots;
@@ -207,6 +208,21 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, 
int64_t pos, int size)
 return len;
 }
 
+static ssize_t socket_put_buffer(void *opaque, const void *buf, size_t size)
+{
+QEMUFileSocket *s = opaque;
+ssize_t len;
+
+do {
+len = send(s->fd, (void *)buf, size, 0);
+} while (len == -1 && socket_error() == EINTR);
+
+if (len == -1)
+len = -socket_error();
+
+return len;
+}
+
 static int socket_close(void *opaque)
 {
 QEMUFileSocket *s = opaque;
@@ -335,6 +351,16 @@ QEMUFile *qemu_fopen_socket(int fd)
 return s->file;
 }
 
+QEMUFile *qemu_fopen_transaction(int fd)
+{
+QEMUFileSocket *s = qemu_mallocz(sizeof(QEMUFileSocket));
+
+s->fd = fd;
+s->file = qemu_fopen_ops_ft_tranx(s, socket_put_buffer, socket_get_buffer,
+ socket_close, 0);
+return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
 int64_t pos, int size)
 {
@@ -472,6 +498,21 @@ void qemu_clear_buffer(QEMUFile *f)
 memset(f->buf, 0, f->buf_max_size);
 }
 
+int qemu_transaction_begin(QEMUFile *f)
+{
+return qemu_ft_tranx_begin(f->opaque);
+}
+
+int qemu_transaction_commit(QEMUFile *f)
+{
+return qemu_ft_tranx_commit(f->opaque);
+}
+
+int qemu_transaction_cancel(QEMUFile *f)
+{
+return qemu_ft_tranx_cancel(f->opaque);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 23/23] Add a parser to accept FT migration incoming mode.

2010-05-25 Thread Yoshiaki Tamura

The option looks like, -incoming ::,ft_mode

Signed-off-by: Yoshiaki Tamura 
---
 migration.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 3334650..a4850f9 100644
--- a/migration.c
+++ b/migration.c
@@ -42,7 +42,19 @@ static MigrationState *current_migration;
 
 void qemu_start_incoming_migration(const char *uri)
 {
-const char *p;
+const char *p = uri;
+
+/* check ft_mode option  */
+while (*p != '\0') {
+if (*p == ',') {
+p++;
+if (!strcmp(p, "ft_mode")) {
+ft_mode = FT_INIT;
+break;
+}
+}
+p++;
+}
 
 if (strstart(uri, "tcp:", &p))
 tcp_start_incoming_migration(p);
-- 
1.7.0.31.g1df487
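
A standalone sketch of the option check above, written with strchr() so that the scan never moves past the string terminator. As far as I can tell, the loop in the patch would step beyond the end of the string when the URI ends in a comma, because it advances the pointer twice after seeing one. `uri_has_ft_mode()` is a hypothetical helper; like the patch, it only matches when ft_mode is the final option.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the URI ends with a ",ft_mode" option. */
static bool uri_has_ft_mode(const char *uri)
{
    const char *p = strchr(uri, ',');

    while (p != NULL) {
        p++;                               /* skip the comma itself */
        if (strcmp(p, "ft_mode") == 0) {
            return true;
        }
        p = strchr(p, ',');                /* look for the next option */
    }
    return false;
}
```

The caller would then set ft_mode = FT_INIT when the helper returns true, as the patch does inline.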

[Qemu-devel] [RFC PATCH 03/23] Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty.

2010-05-25 Thread Yoshiaki Tamura

Modifies kvm_get_dirty_pages_log_range to use
cpu_physical_memory_set_dirty_range() to update the row of the
bit-based phys_ram_dirty bitmap at once.

Signed-off-by: Yoshiaki Tamura 
Signed-off-by: OHMURA Kei 
---
 qemu-kvm.c |   19 +++
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 29365a9..1414f49 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2323,8 +2323,8 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
  unsigned long offset,
  unsigned long mem_size)
 {
-unsigned int i, j;
-unsigned long page_number, addr, addr1, c;
+unsigned int i;
+unsigned long page_number, addr, addr1;
 ram_addr_t ram_addr;
 unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
 HOST_LONG_BITS;
@@ -2335,16 +2335,11 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
  */
 for (i = 0; i < len; i++) {
 if (bitmap[i] != 0) {
-c = leul_to_cpu(bitmap[i]);
-do {
-j = ffsl(c) - 1;
-c &= ~(1ul << j);
-page_number = i * HOST_LONG_BITS + j;
-addr1 = page_number * TARGET_PAGE_SIZE;
-addr = offset + addr1;
-ram_addr = cpu_get_physical_page_desc(addr);
-cpu_physical_memory_set_dirty(ram_addr);
-} while (c != 0);
+page_number = i * HOST_LONG_BITS;
+addr1 = page_number * TARGET_PAGE_SIZE;
+addr = offset + addr1;
+ram_addr = cpu_get_physical_page_desc(addr);
+cpu_physical_memory_set_dirty_range(ram_addr, leul_to_cpu(bitmap[i]));
 }
 }
 return 0;
-- 
1.7.0.31.g1df487
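
The change above replaces a per-bit loop with one call per bitmap word. The body of cpu_physical_memory_set_dirty_range() is not shown in this message, so the following is only a sketch under the assumption that the helper ORs the whole word into the bit-based dirty bitmap and that the start address is aligned to a word of pages. The names mark_dirty_per_bit() and mark_dirty_word() are made up for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Old style: visit every set bit of one bitmap word and mark that page. */
static void mark_dirty_per_bit(unsigned long *dirty, size_t word_idx,
                               unsigned long word)
{
    size_t j;

    assert(dirty != NULL);
    for (j = 0; j < BITS_PER_LONG; j++) {
        if (word & (1UL << j)) {
            dirty[word_idx] |= 1UL << j;
        }
    }
}

/* New style: merge the whole word with a single OR (assumed behaviour). */
static void mark_dirty_word(unsigned long *dirty, size_t word_idx,
                            unsigned long word)
{
    assert(dirty != NULL);
    dirty[word_idx] |= word;
}
```

Both functions leave the bitmap in the same state; the second does it without a loop over the set bits.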

[Qemu-devel] [RFC PATCH 19/23] Introduce ft_tranx_ready(), and modify migrate_fd_put_ready() when ft_mode is on.

2010-05-25 Thread Yoshiaki Tamura

Introduce ft_tranx_ready(), which kicks off the FT transaction cycle.  When
ft_mode is on, migrate_fd_put_ready() opens the ft_transaction file
and turns on event_tap.  To end or cancel ft_transaction, ft_mode and
event_tap are turned off.

Signed-off-by: Yoshiaki Tamura 
---
 migration.c |   78 --
 1 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/migration.c b/migration.c
index 2adf7ad..5b90d37 100644
--- a/migration.c
+++ b/migration.c
@@ -21,6 +21,7 @@
 #include "qemu_socket.h"
 #include "block-migration.h"
 #include "qemu-objects.h"
+#include "event-tap.h"
 
 //#define DEBUG_MIGRATION
 
@@ -375,6 +376,49 @@ void migrate_fd_connect(FdMigrationState *s)
 migrate_fd_put_ready(s);
 }
 
+static int ft_tranx_ready(void)
+{
+FdMigrationState *s = migrate_to_fms(current_migration);
+int ret = -1;
+
+if (ft_mode != FT_TRANSACTION && ft_mode != FT_INIT) {
+return ret;
+}
+
+if (qemu_transaction_begin(s->file) < 0) {
+fprintf(stderr, "tranx_begin failed\n");
+goto error_out;
+}
+
+/* make the VM state consistent by flushing outstanding requests. */
+vm_stop(0);
+qemu_aio_flush();
+bdrv_flush_all();
+
+if (qemu_savevm_state_all(s->mon, s->file) < 0) {
+fprintf(stderr, "savevm_state_all failed\n");
+goto error_out;
+}
+
+if (qemu_transaction_commit(s->file) < 0) {
+fprintf(stderr, "tranx_commit failed\n");
+goto error_out;
+}
+
+ret = 0;
+goto unpause_out;
+
+error_out:
+ft_mode = FT_OFF;
+qemu_savevm_state_cancel(s->mon, s->file);
+migrate_fd_cleanup(s);
+event_tap_unregister();
+
+unpause_out:
+vm_start();
+return ret;
+}
+
 void migrate_fd_put_ready(void *opaque)
 {
 FdMigrationState *s = opaque;
@@ -402,8 +446,30 @@ void migrate_fd_put_ready(void *opaque)
 } else {
 state = MIG_STATE_COMPLETED;
 }
-migrate_fd_cleanup(s);
-s->state = state;
+
+if (ft_mode && state == MIG_STATE_COMPLETED) {
+/* close buffered_file and open ft_transaction.
+ * Note: file descriptor won't get closed,
+ * but reused by ft_transaction. */
+socket_set_block(s->fd);
+socket_set_nodelay(s->fd);
+qemu_fclose(s->file);
+s->file = qemu_fopen_ops_ft_tranx(s,
+  migrate_fd_put_buffer,
+  migrate_fd_get_buffer,
+  migrate_fd_close,
+  1);
+
+/* events are tapped from now. */
+event_tap_register(ft_tranx_ready);
+
+if (old_vm_running) {
+vm_start();
+}
+} else {
+migrate_fd_cleanup(s);
+s->state = state;
+}
 }
 }
 
@@ -423,8 +489,14 @@ void migrate_fd_cancel(MigrationState *mig_state)
 DPRINTF("cancelling migration\n");
 
 s->state = MIG_STATE_CANCELLED;
-qemu_savevm_state_cancel(s->mon, s->file);
 
+if (ft_mode == FT_TRANSACTION) {
+qemu_transaction_cancel(s->file);
+ft_mode = FT_OFF;
+event_tap_unregister();
+}
+
+qemu_savevm_state_cancel(s->mon, s->file);
 migrate_fd_cleanup(s);
 }
 
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 07/23] Introduce skip_header parameter to qemu_loadvm_state().

2010-05-25 Thread Yoshiaki Tamura

Introduce skip_header parameter to qemu_loadvm_state() so that it can
be called iteratively without reading the header.

Signed-off-by: Yoshiaki Tamura 
---
 migration-exec.c |2 +-
 migration-fd.c   |2 +-
 migration-tcp.c  |2 +-
 migration-unix.c |2 +-
 savevm.c |   24 +---
 sysemu.h |2 +-
 6 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 3edc026..5839a6d 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -113,7 +113,7 @@ static void exec_accept_incoming_migration(void *opaque)
 QEMUFile *f = opaque;
 int ret;
 
-ret = qemu_loadvm_state(f);
+ret = qemu_loadvm_state(f, 0);
 if (ret < 0) {
 fprintf(stderr, "load of migration failed\n");
 goto err;
diff --git a/migration-fd.c b/migration-fd.c
index 0cc74ad..0e97ed0 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -106,7 +106,7 @@ static void fd_accept_incoming_migration(void *opaque)
 QEMUFile *f = opaque;
 int ret;
 
-ret = qemu_loadvm_state(f);
+ret = qemu_loadvm_state(f, 0);
 if (ret < 0) {
 fprintf(stderr, "load of migration failed\n");
 goto err;
diff --git a/migration-tcp.c b/migration-tcp.c
index cffc4df..767a2f1 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -176,7 +176,7 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }
 
-ret = qemu_loadvm_state(f);
+ret = qemu_loadvm_state(f, 0);
 if (ret < 0) {
 fprintf(stderr, "load of migration failed\n");
 goto out_fopen;
diff --git a/migration-unix.c b/migration-unix.c
index b7aab38..dd99a73 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -168,7 +168,7 @@ static void unix_accept_incoming_migration(void *opaque)
 goto out;
 }
 
-ret = qemu_loadvm_state(f);
+ret = qemu_loadvm_state(f, 0);
 if (ret < 0) {
 fprintf(stderr, "load of migration failed\n");
 goto out_fopen;
diff --git a/savevm.c b/savevm.c
index b9bb9f4..2ab883b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1489,7 +1489,7 @@ typedef struct LoadStateEntry {
 int version_id;
 } LoadStateEntry;
 
-int qemu_loadvm_state(QEMUFile *f)
+int qemu_loadvm_state(QEMUFile *f, int skip_header)
 {
 QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
 QLIST_HEAD_INITIALIZER(loadvm_handlers);
@@ -1498,17 +1498,19 @@ int qemu_loadvm_state(QEMUFile *f)
 unsigned int v;
 int ret;
 
-v = qemu_get_be32(f);
-if (v != QEMU_VM_FILE_MAGIC)
-return -EINVAL;
+if (!skip_header) {
+v = qemu_get_be32(f);
+if (v != QEMU_VM_FILE_MAGIC)
+return -EINVAL;
 
-v = qemu_get_be32(f);
-if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
-return -ENOTSUP;
+v = qemu_get_be32(f);
+if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+return -ENOTSUP;
+}
+if (v != QEMU_VM_FILE_VERSION)
+return -ENOTSUP;
 }
-if (v != QEMU_VM_FILE_VERSION)
-return -ENOTSUP;
 
 while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 uint32_t instance_id, version_id, section_id;
@@ -1833,7 +1835,7 @@ int load_vmstate(Monitor *mon, const char *name)
 monitor_printf(mon, "Could not open VM state file\n");
 return -EINVAL;
 }
-ret = qemu_loadvm_state(f);
+ret = qemu_loadvm_state(f, 0);
 qemu_fclose(f);
 if (ret < 0) {
 monitor_printf(mon, "Error %d while loading VM state\n", ret);
diff --git a/sysemu.h b/sysemu.h
index 647a468..6c1441f 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -68,7 +68,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
-int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state(QEMUFile *f, int skip_header);
 
 void qemu_errors_to_file(FILE *fp);
 void qemu_errors_to_mon(Monitor *mon);
-- 
1.7.0.31.g1df487
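
To illustrate the intent of skip_header: the first state image on a stream carries the magic/version header, while later images on the same stream start directly with section data. A minimal sketch, assuming a simplified stream where each section is a single nonzero byte and 0x00 ends an image. The struct reader, load_state() and the DEMO_* values are illustrative only and not QEMU code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAGIC   0x5145564dU  /* illustrative value */
#define DEMO_VERSION 3U           /* illustrative value */

struct reader {
    const uint8_t *buf;
    size_t len;
    size_t pos;
};

/* Read a big-endian 32-bit value; returns 0 on success, -1 if short. */
static int get_be32(struct reader *r, uint32_t *out)
{
    if (r->len - r->pos < 4) {
        return -1;
    }
    *out = ((uint32_t)r->buf[r->pos] << 24) |
           ((uint32_t)r->buf[r->pos + 1] << 16) |
           ((uint32_t)r->buf[r->pos + 2] << 8) |
           (uint32_t)r->buf[r->pos + 3];
    r->pos += 4;
    return 0;
}

/* Consume one image.  Returns the number of sections seen, or a
 * negative value on a bad header or a stream without the end marker. */
static int load_state(struct reader *r, int skip_header)
{
    uint32_t v;
    int sections = 0;

    assert(r != NULL);

    if (!skip_header) {
        if (get_be32(r, &v) < 0 || v != DEMO_MAGIC) {
            return -1;
        }
        if (get_be32(r, &v) < 0 || v != DEMO_VERSION) {
            return -2;
        }
    }
    while (r->pos < r->len) {
        uint8_t b = r->buf[r->pos++];
        if (b == 0x00) {
            return sections;      /* end-of-image marker */
        }
        sections++;
    }
    return -3;                    /* ran out of data before the marker */
}
```

The first call passes skip_header = 0 and every following call on the same stream passes 1, which mirrors how the FT receive loop later in this series calls qemu_loadvm_state(f, 1).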

[Qemu-devel] [RFC PATCH 06/23] Introduce read() to FdMigrationState.

2010-05-25 Thread Yoshiaki Tamura

Currently FdMigrationState doesn't support read(); this patch
introduces it so the sender can get a response from the other side.

Signed-off-by: Yoshiaki Tamura 
---
 migration-tcp.c |   14 ++
 migration.c |   12 
 migration.h |3 +++
 3 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index e7f307c..cffc4df 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -39,6 +39,19 @@ static int socket_write(FdMigrationState *s, const void * buf, size_t size)
 return send(s->fd, buf, size, 0);
 }
 
+static int socket_read(FdMigrationState *s, const void * buf, size_t size)
+{
+ssize_t len;
+
+do { 
+len = recv(s->fd, (void *)buf, size, 0);
+} while (len == -1 && socket_error() == EINTR);
+if (len == -1)
+len = -socket_error();
+
+return len;
+}
+
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF("tcp_close\n");
@@ -94,6 +107,7 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 
 s->get_error = socket_errno;
 s->write = socket_write;
+s->read = socket_read;
 s->close = tcp_close;
 s->mig_state.cancel = migrate_fd_cancel;
 s->mig_state.get_status = migrate_fd_get_status;
diff --git a/migration.c b/migration.c
index 05f6cc5..a2ca6ef 100644
--- a/migration.c
+++ b/migration.c
@@ -337,6 +337,18 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 return ret;
 }
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, int size)
+{
+FdMigrationState *s = opaque;
+ssize_t ret;
+ret = s->read(s, data, size);
+
+if (ret == -1)
+ret = -(s->get_error(s));
+
+return ret;
+}
+
 void migrate_fd_connect(FdMigrationState *s)
 {
 int ret;
diff --git a/migration.h b/migration.h
index 385423f..6f8af97 100644
--- a/migration.h
+++ b/migration.h
@@ -47,6 +47,7 @@ struct FdMigrationState
 int (*get_error)(struct FdMigrationState*);
 int (*close)(struct FdMigrationState*);
 int (*write)(struct FdMigrationState*, const void *, size_t);
+int (*read)(struct FdMigrationState *, const void *, size_t);
 void *opaque;
 };
 
@@ -113,6 +114,8 @@ void migrate_fd_put_notify(void *opaque);
 
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size);
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, int size);
+
 void migrate_fd_connect(FdMigrationState *s);
 
 void migrate_fd_put_ready(void *opaque);
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 08/23] Introduce some socket util functions.

2010-05-25 Thread Yoshiaki Tamura

Signed-off-by: Yoshiaki Tamura 
---
 osdep.c   |   13 +
 qemu-char.c   |   25 -
 qemu_socket.h |4 
 3 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/osdep.c b/osdep.c
index 3bab79a..63444e7 100644
--- a/osdep.c
+++ b/osdep.c
@@ -201,6 +201,12 @@ void socket_set_nonblock(int fd)
 ioctlsocket(fd, FIONBIO, &opt);
 }
 
+void socket_set_block(int fd)
+{
+unsigned long opt = 0;
+ioctlsocket(fd, FIONBIO, &opt);
+}
+
 int inet_aton(const char *cp, struct in_addr *ia)
 {
 uint32_t addr = inet_addr(cp);
@@ -223,6 +229,13 @@ void socket_set_nonblock(int fd)
 fcntl(fd, F_SETFL, f | O_NONBLOCK);
 }
 
+void socket_set_block(int fd)
+{
+int f;
+f = fcntl(fd, F_GETFL);
+fcntl(fd, F_SETFL, f & ~O_NONBLOCK);
+}
+
 void qemu_set_cloexec(int fd)
 {
 int f;
diff --git a/qemu-char.c b/qemu-char.c
index 4169492..ccdf394 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2092,12 +2092,35 @@ static void tcp_chr_telnet_init(int fd)
 send(fd, (char *)buf, 3, 0);
 }
 
-static void socket_set_nodelay(int fd)
+void socket_set_delay(int fd)
+{
+int val = 0;
+setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val));
+}
+
+void socket_set_nodelay(int fd)
 {
 int val = 1;
 setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val));
 }
 
+void socket_set_timeout(int fd, int s)
+{
+struct timeval tv = {
+.tv_sec = s,
+.tv_usec = 0
+};
+/* Set socket_timeout */
+if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO,
+   &tv, sizeof(tv)) < 0) {
+fprintf(stderr, "failed to set SO_RCVTIMEO\n");
+}
+if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO,
+   &tv, sizeof(tv)) < 0) {
+fprintf(stderr, "failed to set SO_SNDTIMEO\n");
+}
+}
+
 static void tcp_chr_accept(void *opaque)
 {
 CharDriverState *chr = opaque;
diff --git a/qemu_socket.h b/qemu_socket.h
index 7ee46ac..8eae465 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -35,6 +35,10 @@ int inet_aton(const char *cp, struct in_addr *ia);
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 void socket_set_nonblock(int fd);
+void socket_set_block(int fd);
+void socket_set_nodelay(int fd);
+void socket_set_delay(int fd);
+void socket_set_timeout(int fd, int s);
 int send_all(int fd, const void *buf, int len1);
 
 /* New, ipv6-ready socket helper functions, see qemu-sockets.c */
-- 
1.7.0.31.g1df487
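
The POSIX variants of socket_set_block() and socket_set_nonblock() ignore the return values of fcntl(). A minimal sketch of the same flag toggle with error reporting, assuming a POSIX host. The name fd_set_blocking() is hypothetical and not part of the patch:

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical variant of the helpers above: clears or sets O_NONBLOCK
 * and reports failure instead of ignoring it.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
static int fd_set_blocking(int fd, int blocking)
{
    int flags;

    assert(blocking == 0 || blocking == 1);

    flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -1;
    }
    if (blocking) {
        flags &= ~O_NONBLOCK;
    } else {
        flags |= O_NONBLOCK;
    }
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}
```

A caller such as the FT setup path could then refuse to switch to transaction mode when the descriptor cannot be made blocking.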

[Qemu-devel] [RFC PATCH 18/23] Call event_tap_replay() at vm_start().

2010-05-25 Thread Yoshiaki Tamura

Call event_tap_replay() at vm_start() to replay the last ioport/mmio
event upon failover.

Signed-off-by: Yoshiaki Tamura 
---
 vl.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 56d12c7..762440d 100644
--- a/vl.c
+++ b/vl.c
@@ -3094,6 +3094,7 @@ void vm_start(void)
 vm_state_notify(1, 0);
 qemu_rearm_alarm_timer(alarm_timer);
 resume_all_vcpus();
+event_tap_replay();
 }
 }
 
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 20/23] Modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled.

2010-05-25 Thread Yoshiaki Tamura

When ft_mode is set in the header, tcp_accept_incoming_migration()
receives ft_transaction iteratively.  We also need a hack not to close
the fd before moving to ft_transaction mode, so that we can reuse the fd
for it.

Signed-off-by: Yoshiaki Tamura 
---
 migration-tcp.c |   36 +++-
 1 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 767a2f1..a5d9b6d 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -18,6 +18,7 @@
 #include "sysemu.h"
 #include "buffered_file.h"
 #include "block.h"
+#include "ft_transaction.h"
 
 //#define DEBUG_MIGRATION_TCP
 
@@ -55,7 +56,8 @@ static int socket_read(FdMigrationState *s, const void * buf, size_t size)
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF("tcp_close\n");
-if (s->fd != -1) {
+/* FIX ME: accessing ft_mode here isn't clean */
+if (s->fd != -1 && ft_mode != FT_INIT) {
 close(s->fd);
 s->fd = -1;
 }
@@ -181,6 +183,38 @@ static void tcp_accept_incoming_migration(void *opaque)
 fprintf(stderr, "load of migration failed\n");
 goto out_fopen;
 }
+
+/* ft_mode is set by qemu_loadvm_state(). */
+if (ft_mode == FT_INIT) {
+/* close normal QEMUFile first before reusing connection. */
+qemu_fclose(f);
+socket_set_nodelay(c);
+socket_set_timeout(c, 5);
+/* don't autostart to avoid split brain. */
+autostart = 0;
+
+f = qemu_fopen_transaction(c);
+if (f == NULL) {
+fprintf(stderr, "could not qemu_fopen transaction\n");
+goto out;
+}
+
+/* need to wait sender to setup. */
+if (qemu_transaction_begin(f) < 0) {
+goto out_fopen;
+}
+
+/* loop until transaction breaks */
+while ((ft_mode != FT_OFF) && (ret == 0)) {
+ret = qemu_loadvm_state(f, 1);
+}
+
+/* if migrate_cancel was called at the sender  */
+if (ft_mode == FT_OFF) {
+goto out_fopen;
+}
+}
+
 qemu_announce_self();
 DPRINTF("successfully loaded vm state\n");
 
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 13/23] Introduce event-tap.

2010-05-25 Thread Yoshiaki Tamura

event-tap controls when to start an FT transaction, and inserts callbacks
into the net/block layers.

Signed-off-by: Yoshiaki Tamura 
---
 Makefile.target |1 +
 event-tap.c |  184 +++
 event-tap.h |   32 ++
 3 files changed, 217 insertions(+), 0 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h

diff --git a/Makefile.target b/Makefile.target
index 82caf20..a49b21f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -188,6 +188,7 @@ obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 # MSI-X depends on kvm for interrupt injection,
 # so moved it from Makefile.objs to Makefile.target for now
 obj-y += msix.o
+obj-y += event-tap.o
 
 obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 LIBS+=-lz
diff --git a/event-tap.c b/event-tap.c
new file mode 100644
index 000..5d3a338
--- /dev/null
+++ b/event-tap.c
@@ -0,0 +1,184 @@
+/*
+ * Event Tap functions for QEMU
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation. 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "block.h"
+#include "ioport.h"
+#include "osdep.h"
+#include "hw/hw.h"
+#include "net/queue.h"
+#include "event-tap.h"
+
+static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
+
+typedef struct EventTapIOport {
+uint32_t address;
+uint32_t data;
+int index;
+} EventTapIOport;
+
+#define MMIO_BUF_SIZE 8
+
+typedef struct EventTapMMIO {
+uint64_t address;
+uint8_t buf[MMIO_BUF_SIZE];
+int len;
+} EventTapMMIO;
+
+#define EVENT_TAP_IOPORT 1
+#define EVENT_TAP_MMIO   2
+
+typedef struct EventTapLog {
+int mode;
+union {
+EventTapIOport ioport ;
+EventTapMMIO mmio;
+};
+} EventTapLog;
+
+static EventTapLog last_event_tap;
+
+int event_tap_register(int (*cb)(void))
+{
+if (cb == NULL || event_tap_state != EVENT_TAP_OFF)
+return -1;
+
+bdrv_event_tap_register(cb);
+qemu_net_event_tap_register(cb);
+event_tap_state = EVENT_TAP_ON;
+
+return 0;
+}
+
+int event_tap_unregister(void)
+{
+if (event_tap_state == EVENT_TAP_OFF)
+return -1;
+
+bdrv_event_tap_unregister();
+qemu_net_event_tap_unregister();
+event_tap_state = EVENT_TAP_OFF;
+
+return 0;
+}
+
+void event_tap_suspend(void)
+{
+if (event_tap_state == EVENT_TAP_ON)
+event_tap_state = EVENT_TAP_SUSPEND;
+}
+
+void event_tap_resume(void)
+{
+if (event_tap_state == EVENT_TAP_SUSPEND)
+event_tap_state = EVENT_TAP_ON;
+}
+
+int event_tap_get_state(void)
+{
+return event_tap_state;
+}
+
+void event_tap_ioport(int index, uint32_t address, uint32_t data)
+{
+if (event_tap_state != EVENT_TAP_ON) {
+return;
+}
+
+last_event_tap.mode = EVENT_TAP_IOPORT;
+last_event_tap.ioport.index = index;
+last_event_tap.ioport.address = address;
+last_event_tap.ioport.data = data;
+}
+
+void event_tap_mmio(uint64_t address, uint8_t *buf, int len)
+{
+if (event_tap_state != EVENT_TAP_ON || len > MMIO_BUF_SIZE) {
+return;
+}
+
+last_event_tap.mode = EVENT_TAP_MMIO;
+last_event_tap.mmio.address = address;
+last_event_tap.mmio.len = len;
+memcpy(last_event_tap.mmio.buf, buf, len);
+}
+
+static void event_tap_reset(void)
+{
+memset(&last_event_tap, 0, sizeof(last_event_tap));
+}
+
+void event_tap_replay(void)
+{
+if (event_tap_state != EVENT_TAP_REPLAY) {
+return;
+}
+
+switch (last_event_tap.mode) {
+case EVENT_TAP_IOPORT:
+switch (last_event_tap.ioport.index) {
+case 0:
+cpu_outb(last_event_tap.ioport.address, last_event_tap.ioport.data);
+break;
+case 1:
+cpu_outw(last_event_tap.ioport.address, last_event_tap.ioport.data);
+break;
+case 2:
+cpu_outl(last_event_tap.ioport.address, last_event_tap.ioport.data);
+break;
+}
+event_tap_reset();
+break;
+case EVENT_TAP_MMIO:
+cpu_physical_memory_rw(last_event_tap.mmio.address,
+   last_event_tap.mmio.buf,
+   last_event_tap.mmio.len, 1);
+event_tap_reset();
+break;
+}
+}
+
+static void event_tap_save(QEMUFile *f, void *opaque)
+{
+qemu_put_byte(f, last_event_tap.mode);
+
+if (last_event_tap.mode == EVENT_TAP_IOPORT) {
+qemu_put_be32(f, last_event_tap.ioport.index);
+qemu_put_be32(f, last_event_tap.ioport.address);
+qemu_put_byte(f, last_event_tap.ioport.data);
+} else {
+qemu_put_be64(f, last_event_tap.mmio.address);
+qemu_put_byte(f, last_event_tap.mmio.len);
+qemu_put_buffer(f, last_event_tap.mmio.buf, last_event_tap.mmio.len);
+}
+}
+
+static int event_tap_load(QEMUFile *f, void *opaque, int version_id)
+{
+last_event_tap.mode = qemu_get_byte(f);
+
+if (last_event_tap.mode == EVENT_TAP_IOPORT

[Qemu-devel] [RFC PATCH 11/23] Introduce qemu_savevm_state_all().

2010-05-25 Thread Yoshiaki Tamura

Introduce qemu_savevm_state_all() to send the memory and device info
together, while avoiding cancelling memory state tracking.

Signed-off-by: Yoshiaki Tamura 
---
 savevm.c |   60 
 sysemu.h |1 +
 2 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 81cb711..25ccbb8 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1468,6 +1468,66 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 return 0;
 }
 
+int qemu_savevm_state_all(Monitor *mon, QEMUFile *f)
+{
+SaveStateEntry *se;
+
+QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+int len;
+
+if (se->save_live_state == NULL)
+continue;
+
+/* Section type */
+qemu_put_byte(f, QEMU_VM_SECTION_START);
+qemu_put_be32(f, se->section_id);
+
+/* ID string */
+len = strlen(se->idstr);
+qemu_put_byte(f, len);
+qemu_put_buffer(f, (uint8_t *)se->idstr, len);
+
+qemu_put_be32(f, se->instance_id);
+qemu_put_be32(f, se->version_id);
+if (ft_mode == FT_INIT) {
+/* This is workaround. */
+se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
+} else {
+se->save_live_state(mon, f, QEMU_VM_SECTION_PART, se->opaque);
+}
+}
+
+ft_mode = FT_TRANSACTION;
+QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+int len;
+
+   if (se->save_state == NULL && se->vmsd == NULL)
+   continue;
+
+/* Section type */
+qemu_put_byte(f, QEMU_VM_SECTION_FULL);
+qemu_put_be32(f, se->section_id);
+
+/* ID string */
+len = strlen(se->idstr);
+qemu_put_byte(f, len);
+qemu_put_buffer(f, (uint8_t *)se->idstr, len);
+
+qemu_put_be32(f, se->instance_id);
+qemu_put_be32(f, se->version_id);
+
+vmstate_save(f, se);
+}
+
+qemu_put_byte(f, QEMU_VM_EOF);
+
+if (qemu_file_has_error(f))
+return -EIO;
+
+return 0;
+}
+
+
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f)
 {
 SaveStateEntry *se;
diff --git a/sysemu.h b/sysemu.h
index 6c1441f..df314bb 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -67,6 +67,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
 int shared);
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
+int qemu_savevm_state_all(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f, int skip_header);
 
-- 
1.7.0.31.g1df487
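
Both loops in qemu_savevm_state_all() emit the same per-section framing: a type byte, the section id, a length-prefixed id string, then instance and version ids. A minimal sketch of that layout written into a plain byte buffer. put_section_header() is a hypothetical helper, and the one-byte length prefix is why the sketch rejects id strings longer than 255 bytes, a check the patch itself does not make:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a big-endian 32-bit value at *pos. */
static void put_be32(uint8_t *buf, size_t *pos, uint32_t v)
{
    buf[(*pos)++] = (uint8_t)(v >> 24);
    buf[(*pos)++] = (uint8_t)(v >> 16);
    buf[(*pos)++] = (uint8_t)(v >> 8);
    buf[(*pos)++] = (uint8_t)v;
}

/* Hypothetical encoder for the section header layout described above.
 * Returns the number of bytes written, or 0 if the header does not fit
 * or the id string is too long for its one-byte length prefix. */
static size_t put_section_header(uint8_t *buf, size_t cap, uint8_t type,
                                 uint32_t section_id, const char *idstr,
                                 uint32_t instance_id, uint32_t version_id)
{
    size_t len, need, pos = 0;

    assert(buf != NULL && idstr != NULL);

    len = strlen(idstr);
    need = 1 + 4 + 1 + len + 4 + 4;
    if (len > 255 || need > cap) {
        return 0;
    }
    buf[pos++] = type;                 /* section type */
    put_be32(buf, &pos, section_id);
    buf[pos++] = (uint8_t)len;         /* id string length */
    memcpy(buf + pos, idstr, len);     /* id string, no terminator */
    pos += len;
    put_be32(buf, &pos, instance_id);
    put_be32(buf, &pos, version_id);
    return pos;
}
```

The section payload written by save_live_state() or vmstate_save() would follow immediately after this header.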

[Qemu-devel] [RFC PATCH 17/23] Skip assert() when event_tap_state weren't EVENT_TAP_OFF.

2010-05-25 Thread Yoshiaki Tamura

Skip assert(!cpu_single_env) in resume_all_threads() when
event_tap_state isn't EVENT_TAP_OFF.

Signed-off-by: Yoshiaki Tamura 
---
 qemu-kvm.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 1414f49..e28bf59 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -18,6 +18,7 @@
 #include "compatfd.h"
 #include "gdbstub.h"
 #include "monitor.h"
+#include "event-tap.h"
 
 #include "qemu-kvm.h"
 #include "libkvm.h"
@@ -1770,7 +1771,8 @@ static void resume_all_threads(void)
 {
 CPUState *penv = first_cpu;
 
-assert(!cpu_single_env);
+if (event_tap_get_state() == EVENT_TAP_OFF)
+assert(!cpu_single_env);
 
 while (penv) {
 penv->stop = 0;
-- 
1.7.0.31.g1df487

[Qemu-devel] [RFC PATCH 00/23] Kemari for KVM v0.1.1

2010-05-25 Thread Yoshiaki Tamura

Hi,

This patch series is a revised version of Kemari for KVM, which incorporates
the comments on the previous post.  The current code is based on qemu-kvm.git
2b644fd0e737407133c88054ba498e772ce01f27.

In contrast to the previous version, this series doesn't require any
modifications to KVM.  The I/O events are captured in the net/block layer
instead of the device emulation layer.  The transmission/transaction protocol
and most of the control logic are implemented in QEMU.

We prepared a demonstration video again.  This time the guest is Windows XP
without virtio drivers.  The demonstration scenario is,

1. Play with a guest VM (This guest has e1000 and ide)
# The guest image should be on NFS/SAN.
2. Start the incoming side with -incoming ::,ft_mode
3. Start Kemari to synchronize the VM by running the following command in QEMU.
Just add the "-k" option to the usual migrate command.
migrate -d -k tcp:192.168.0.20:
4. Check the status by calling info migrate.
5. Go back to the VM to play the pinball.
6. Kill the VM. (VNC client also disappears)
7. Press "c" to continue the VM on the other host.
8. Bring up the VNC client (Sorry, it pops up outside of the video capture.)
9. Confirm that the pinball works, then shut down.

http://www.osrg.net/kemari/download/kemari-kvm-winxp.mov

The repository contains all patches we're sending with this message.  For those
who want to try, pull the following repository.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari

The changes from v0.1 -> v0.1.1 are:

- events are tapped in net/block layer instead of device emulation layer. 
- Introduce a new option for -incoming to accept FT transaction.
- Removed writev() support to QEMUFile and FdMigrationState for now.  I would
  post this work in a different series.
- Modified virtio-blk save/load handler to send inuse variable to
  correctly replay.
- Removed configure --enable-ft-mode.
- Removed unnecessary check for qemu_realloc().

I hope people like this approach, and I look forward to suggestions/comments.

Thanks,

Yoshi

Yoshiaki Tamura (23):
  Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of
bit-based phys_ram_dirty.
  Introduce cpu_physical_memory_get_dirty_range().
  Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty.
  Use cpu_physical_memory_get_dirty_range() to check multiple dirty
pages.
  Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and
qemu_clear_buffer().
  Introduce read() to FdMigrationState.
  Introduce skip_header parameter to qemu_loadvm_state().
  Introduce some socket util functions.
  Introduce fault tolerant VM transaction QEMUFile and ft_mode.
  Introduce util functions to control ft_transaction from savevm layer.
  Introduce qemu_savevm_state_all().
  Insert event-tap callbacks to net/block layer.
  Introduce event-tap.
  Call init handler of event-tap at main().
  Insert event_tap_ioport() to ioport_write().
  Insert event_tap_mmio() to cpu_physical_memory_rw().
  Skip assert() when event_tap_state weren't EVENT_TAP_OFF.
  Call event_tap_replay() at vm_start().
  Introduce ft_tranx_ready(), and modify migrate_fd_put_ready() when
ft_mode is on.
  Modify tcp_accept_incoming_migration() to handle ft_mode, and add a
hack not to close fd when ft_mode is enabled.
  virtio-blk: Modify save/load handler to handle inuse variable.
  Introduce -k option to enable FT migration mode (Kemari).
  Add a parser to accept FT migration incoming mode.

 Makefile.objs|1 +
 Makefile.target  |1 +
 block.c  |   22 +++
 block.h  |4 +
 cpu-all.h|  134 -
 event-tap.c  |  184 
 event-tap.h  |   32 
 exec.c   |  131 +
 ft_transaction.c |  418 ++
 ft_transaction.h |   54 +++
 hw/hw.h  |7 +
 hw/virtio.c  |8 +-
 ioport.c |2 +
 migration-exec.c |2 +-
 migration-fd.c   |2 +-
 migration-tcp.c  |   52 +++-
 migration-unix.c |2 +-
 migration.c  |  110 ++-
 migration.h  |3 +
 net/queue.c  |   18 +++
 net/queue.h  |3 +
 osdep.c  |   13 ++
 qemu-char.c  |   25 +++-
 qemu-kvm.c   |   23 ++--
 qemu-monitor.hx  |7 +-
 qemu_socket.h|4 +
 savevm.c |  146 +--
 sysemu.h |3 +-
 vl.c |   57 +---
 29 files changed, 1371 insertions(+), 97 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h
 create mode 100644 ft_transaction.c
 create mode 100644 ft_transaction.h

[Qemu-devel] [RFC PATCH 01/23] Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of bit-based phys_ram_dirty.

2010-05-25 Thread Yoshiaki Tamura

Replaces the byte-based phys_ram_dirty bitmap with four (MASTER, VGA,
CODE, MIGRATION) bit-based phys_ram_dirty bitmaps.  On allocation, it
sets all bits in the bitmaps.  It uses ffs() to convert DIRTY_FLAG to
DIRTY_IDX.

Modifies the wrapper functions for the byte-based phys_ram_dirty bitmap to
use the bit-based phys_ram_dirty bitmaps.  MASTER works as a buffer, and upon
get_dirty() or get_dirty_flags(), it calls
cpu_physical_memory_sync_master() to update VGA and MIGRATION.

Replaces direct phys_ram_dirty access with wrapper functions to
prevent direct access to the phys_ram_dirty bitmap.

Signed-off-by: Yoshiaki Tamura 
Signed-off-by: OHMURA Kei 
---
 cpu-all.h |  130 +
 exec.c|   60 ++--
 2 files changed, 152 insertions(+), 38 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 51effc0..3f8762d 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -37,6 +37,9 @@
 
 #include "softfloat.h"
 
+/* to use ffs in flag_to_idx() */
+#include 
+
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN)
 #define BSWAP_NEEDED
 #endif
@@ -846,7 +849,6 @@ int cpu_str_to_log_mask(const char *str);
 /* memory API */
 
 extern int phys_ram_fd;
-extern uint8_t *phys_ram_dirty;
 extern ram_addr_t ram_size;
 extern ram_addr_t last_ram_offset;
 extern uint8_t *bios_mem;
@@ -869,28 +871,140 @@ extern uint8_t *bios_mem;
 /* Set if TLB entry is an IO callback.  */
 #define TLB_MMIO    (1 << 5)
 
+/* Use DIRTY_IDX as indexes of bit-based phys_ram_dirty. */
+#define MASTER_DIRTY_IDX    0
+#define VGA_DIRTY_IDX   1
+#define CODE_DIRTY_IDX  2
+#define MIGRATION_DIRTY_IDX 3
+#define NUM_DIRTY_IDX   4
+
+#define MASTER_DIRTY_FLAG    (1 << MASTER_DIRTY_IDX)
+#define VGA_DIRTY_FLAG   (1 << VGA_DIRTY_IDX)
+#define CODE_DIRTY_FLAG  (1 << CODE_DIRTY_IDX)
+#define MIGRATION_DIRTY_FLAG (1 << MIGRATION_DIRTY_IDX)
+
+extern unsigned long *phys_ram_dirty[NUM_DIRTY_IDX];
+
+static inline int dirty_flag_to_idx(int flag)
+{
+return ffs(flag) - 1;
+}
+
+static inline int dirty_idx_to_flag(int idx)
+{
+return 1 << idx;
+}
+
 int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
 uint8_t *buf, int len, int is_write);
 
-#define VGA_DIRTY_FLAG   0x01
-#define CODE_DIRTY_FLAG  0x02
-#define MIGRATION_DIRTY_FLAG 0x08
-
 /* read dirty bit (return 0 or 1) */
 static inline int cpu_physical_memory_is_dirty(ram_addr_t addr)
 {
-return phys_ram_dirty[addr >> TARGET_PAGE_BITS] == 0xff;
+unsigned long mask;
+ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+ 
+mask = 1UL << offset;
+return (phys_ram_dirty[MASTER_DIRTY_IDX][index] & mask) == mask;
+}
+
+static inline void cpu_physical_memory_sync_master(ram_addr_t index)
+{
+if (phys_ram_dirty[MASTER_DIRTY_IDX][index]) {
+phys_ram_dirty[VGA_DIRTY_IDX][index]
+|=  phys_ram_dirty[MASTER_DIRTY_IDX][index];
+phys_ram_dirty[MIGRATION_DIRTY_IDX][index]
+|=  phys_ram_dirty[MASTER_DIRTY_IDX][index];
+phys_ram_dirty[MASTER_DIRTY_IDX][index] = 0UL;
+}
+}
+
+static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr)
+{
+unsigned long mask;
+ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+int ret = 0, i;
+ 
+mask = 1UL << offset;
+cpu_physical_memory_sync_master(index);
+
+for (i = VGA_DIRTY_IDX; i <= MIGRATION_DIRTY_IDX; i++) {
+if (phys_ram_dirty[i][index] & mask) {
+ret |= dirty_idx_to_flag(i);
+}
+}
+ 
+return ret;
+}
+
+static inline int cpu_physical_memory_get_dirty_idx(ram_addr_t addr,
+int dirty_idx)
+{
+unsigned long mask;
+ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+
+mask = 1UL << offset;
+cpu_physical_memory_sync_master(index);
+return (phys_ram_dirty[dirty_idx][index] & mask) == mask;
 }
 
 static inline int cpu_physical_memory_get_dirty(ram_addr_t addr,
 int dirty_flags)
 {
-return phys_ram_dirty[addr >> TARGET_PAGE_BITS] & dirty_flags;
+return cpu_physical_memory_get_dirty_idx(addr,
+ dirty_flag_to_idx(dirty_flags));
 }
 
 static inline void cpu_physical_memory_set_dirty(ram_addr_t addr)
 {
-phys_ram_dirty[addr >> TARGET_PAGE_BITS] = 0xff;
+unsigned long mask;
+ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+
+mask = 1UL << offset;
+phys_ram_dirty[MASTER_DIRTY_IDX][index] |= mask;
+}
+
+static inline void cpu_physical_memory_set_dirty_range(ram_addr_t addr,
+
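As a rough illustration of the bit-based layout in the hunk above, here is a minimal standalone sketch. It is not the patch's code: the demo_ names, the 4 KiB page size and the 256-page memory size are assumptions chosen for the example. Writers only touch a master bitmap; readers fold the master word into each client bitmap before testing a bit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_PAGE_BITS 12   /* assumed 4 KiB pages */
#define DEMO_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NUM_PAGES 256  /* assumed tiny guest, for illustration only */
#define DEMO_NUM_WORDS (DEMO_NUM_PAGES / DEMO_LONG_BITS)

enum { DEMO_MASTER, DEMO_VGA, DEMO_MIGRATION, DEMO_NUM_BITMAPS };

/* One bit per page, one bitmap per client, plus the master bitmap. */
static unsigned long demo_dirty[DEMO_NUM_BITMAPS][DEMO_NUM_WORDS];

/* Index of the bitmap word that holds the page of addr. */
static size_t demo_word(unsigned long addr)
{
    unsigned long page = addr >> DEMO_PAGE_BITS;

    assert(page < DEMO_NUM_PAGES);
    return page / DEMO_LONG_BITS;
}

/* Mask selecting the page's bit inside its word. */
static unsigned long demo_mask(unsigned long addr)
{
    return 1UL << ((addr >> DEMO_PAGE_BITS) % DEMO_LONG_BITS);
}

/* Writers only touch the master bitmap. */
static void demo_set_dirty(unsigned long addr)
{
    demo_dirty[DEMO_MASTER][demo_word(addr)] |= demo_mask(addr);
}

/* Fold pending master bits into every client bitmap, then clear them. */
static void demo_sync_master(size_t index)
{
    unsigned long pending = demo_dirty[DEMO_MASTER][index];

    if (pending) {
        demo_dirty[DEMO_VGA][index] |= pending;
        demo_dirty[DEMO_MIGRATION][index] |= pending;
        demo_dirty[DEMO_MASTER][index] = 0UL;
    }
}

/* Readers sync first, so they never miss a bit set in the master. */
static int demo_get_dirty(unsigned long addr, int which)
{
    size_t index = demo_word(addr);

    demo_sync_master(index);
    return (demo_dirty[which][index] & demo_mask(addr)) != 0;
}

/* Clearing one client's bit leaves the other clients untouched. */
static void demo_clear_dirty(unsigned long addr, int which)
{
    size_t index = demo_word(addr);

    demo_sync_master(index);
    demo_dirty[which][index] &= ~demo_mask(addr);
}
```

The point of the master bitmap in this sketch is that marking a page dirty is a single OR, while each client (VGA, migration) can clear its own view independently.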

[Qemu-devel] [RFC PATCH 05/23] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().

2010-05-25 Thread Yoshiaki Tamura

Currently buf size is fixed at 32KB.  It would be useful if it could
be flexible.

Signed-off-by: Yoshiaki Tamura 
---
 hw/hw.h  |2 ++
 savevm.c |   21 -
 2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 05131a0..fc9ed29 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -61,6 +61,8 @@ void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+void *qemu_realloc_buffer(QEMUFile *f, int size);
+void qemu_clear_buffer(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index 2fd3de6..b9bb9f4 100644
--- a/savevm.c
+++ b/savevm.c
@@ -174,7 +174,8 @@ struct QEMUFile {
when reading */
 int buf_index;
 int buf_size; /* 0 when writing */
-uint8_t buf[IO_BUF_SIZE];
+int buf_max_size;
+uint8_t *buf;
 
 int has_error;
 };
@@ -424,6 +425,9 @@ QEMUFile *qemu_fopen_ops(void *opaque, 
QEMUFilePutBufferFunc *put_buffer,
 f->get_rate_limit = get_rate_limit;
 f->is_write = 0;
 
+f->buf_max_size = IO_BUF_SIZE;
+f->buf = qemu_mallocz(sizeof(uint8_t) * f->buf_max_size);
+
 return f;
 }
 
@@ -454,6 +458,20 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void *qemu_realloc_buffer(QEMUFile *f, int size)
+{
+f->buf_max_size = size;
+f->buf = qemu_realloc(f->buf, f->buf_max_size);
+
+return f->buf;
+}
+
+void qemu_clear_buffer(QEMUFile *f)
+{
+f->buf_size = f->buf_index = f->buf_offset = 0;
+memset(f->buf, 0, f->buf_max_size);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
@@ -479,6 +497,7 @@ int qemu_fclose(QEMUFile *f)
 qemu_fflush(f);
 if (f->close)
 ret = f->close(f->opaque);
+qemu_free(f->buf);
 qemu_free(f);
 return ret;
 }
-- 
1.7.0.31.g1df487
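For readers who want to play with the idea outside of QEMU, here is a minimal standalone sketch of a buffer that starts at a fixed size and can be grown or cleared later. It is only an illustration under assumptions: DemoBuf and the demo_buf_* helpers are hypothetical names, and it uses plain calloc()/realloc(), which can return NULL, so the sketch checks for failure and keeps the old buffer in that case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoBuf {
    uint8_t *data;
    size_t max_size;   /* allocated capacity */
    size_t used;       /* bytes currently stored */
} DemoBuf;

static int demo_buf_init(DemoBuf *b, size_t size)
{
    assert(size > 0);
    b->data = calloc(size, 1);
    if (!b->data) {
        return -1;
    }
    b->max_size = size;
    b->used = 0;
    return 0;
}

/* Resize the buffer; on failure the old buffer stays valid. */
static int demo_buf_resize(DemoBuf *b, size_t size)
{
    uint8_t *p;

    assert(size > 0);
    p = realloc(b->data, size);
    if (!p) {
        return -1;
    }
    b->data = p;
    b->max_size = size;
    if (b->used > size) {
        b->used = size;
    }
    return 0;
}

/* Append len bytes, doubling the capacity until they fit. */
static int demo_buf_append(DemoBuf *b, const uint8_t *src, size_t len)
{
    size_t need = b->used + len;

    if (need > b->max_size) {
        size_t new_size = b->max_size;

        while (new_size < need) {
            new_size *= 2;
        }
        if (demo_buf_resize(b, new_size) < 0) {
            return -1;
        }
    }
    memcpy(b->data + b->used, src, len);
    b->used = need;
    return 0;
}

/* Forget the contents but keep the allocation. */
static void demo_buf_clear(DemoBuf *b)
{
    b->used = 0;
    memset(b->data, 0, b->max_size);
}

static void demo_buf_free(DemoBuf *b)
{
    free(b->data);
    b->data = NULL;
    b->max_size = b->used = 0;
}
```

Doubling on growth is a design choice of the sketch, not something the patch does; the patch leaves the new size entirely to the caller.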

[Qemu-devel] [RFC PATCH 00/23] Kemari for KVM v0.1.1

2010-05-25 Thread Yoshiaki Tamura

Hi,

This patch series is a revised version of Kemari for KVM that incorporates the
comments on the previous post.  The current code is based on qemu-kvm.git
2b644fd0e737407133c88054ba498e772ce01f27.

In contrast to the previous version, this series doesn't require any
modifications to KVM.  The I/O events are captured in the net/block layer instead
of the device emulation layer.  The transmission/transaction protocol and most of
the control logic are implemented in QEMU.

We prepared a demonstration video again.  This time the guest is Windows XP
without virtio drivers.  The demonstration scenario is:

1. Play with a guest VM (This guest has e1000 and ide)
# The guest image should be on NFS/SAN.
2. Start the incoming side with -incoming ::,ft_mode
3. Start Kemari to synchronize the VM by running the following command in QEMU.
Just add the "-k" option to the usual migrate command.
migrate -d -k tcp:192.168.0.20:
4. Check the status by calling info migrate.
5. Go back to the VM to play the pinball.
6. Kill the VM. (VNC client also disappears)
7. Press "c" to continue the VM on the other host.
8. Bring up the VNC client (Sorry, it pops outside of video capture.)
9. Confirm that the pinball works, then shutdown.

http://www.osrg.net/kemari/download/kemari-kvm-winxp.mov

The repository contains all patches we're sending with this message.  For those
who want to try, please pull the following repository.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari

The changes from v0.1 -> v0.1.1 are:

- events are tapped in net/block layer instead of device emulation layer. 
- Introduce a new option for -incoming to accept FT transaction.
- Removed writev() support to QEMUFile and FdMigrationState for now.  I would
  post this work in a different series.
- Modified virtio-blk save/load handler to send inuse variable to
  correctly replay.
- Removed configure --enable-ft-mode.
- Removed unnecessary check for qemu_realloc().

I hope people like this approach, and I'm looking forward to suggestions/comments.

Thanks,

Yoshi

Yoshiaki Tamura (23):
  Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of
bit-based phys_ram_dirty.
  Introduce cpu_physical_memory_get_dirty_range().
  Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty.
  Use cpu_physical_memory_get_dirty_range() to check multiple dirty
pages.
  Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and
qemu_clear_buffer().
  Introduce read() to FdMigrationState.
  Introduce skip_header parameter to qemu_loadvm_state().
  Introduce some socket util functions.
  Introduce fault tolerant VM transaction QEMUFile and ft_mode.
  Introduce util functions to control ft_transaction from savevm layer.
  Introduce qemu_savevm_state_all().
  Insert event-tap callbacks to net/block layer.
  Introduce event-tap.
  Call init handler of event-tap at main().
  Insert event_tap_ioport() to ioport_write().
  Insert event_tap_mmio() to cpu_physical_memory_rw().
  Skip assert() when event_tap_state weren't EVENT_TAP_OFF.
  Call event_tap_replay() at vm_start().
  Introduce ft_tranx_ready(), and modify migrate_fd_put_ready() when
ft_mode is on.
  Modify tcp_accept_incoming_migration() to handle ft_mode, and add a
hack not to close fd when ft_mode is enabled.
  virtio-blk: Modify save/load handler to handle inuse variable.
  Introduce -k option to enable FT migration mode (Kemari).
  Add a parser to accept FT migration incoming mode.

 Makefile.objs|1 +
 Makefile.target  |1 +
 block.c  |   22 +++
 block.h  |4 +
 cpu-all.h|  134 -
 event-tap.c  |  184 
 event-tap.h  |   32 
 exec.c   |  131 +
 ft_transaction.c |  418 ++
 ft_transaction.h |   54 +++
 hw/hw.h  |7 +
 hw/virtio.c  |8 +-
 ioport.c |2 +
 migration-exec.c |2 +-
 migration-fd.c   |2 +-
 migration-tcp.c  |   52 +++-
 migration-unix.c |2 +-
 migration.c  |  110 ++-
 migration.h  |3 +
 net/queue.c  |   18 +++
 net/queue.h  |3 +
 osdep.c  |   13 ++
 qemu-char.c  |   25 +++-
 qemu-kvm.c   |   23 ++--
 qemu-monitor.hx  |7 +-
 qemu_socket.h|4 +
 savevm.c |  146 +--
 sysemu.h |3 +-
 vl.c |   57 +---
 29 files changed, 1371 insertions(+), 97 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h
 create mode 100644 ft_transaction.c
 create mode 100644 ft_transaction.h

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/24/2010 10:19 PM, Anthony Liguori wrote:

On 05/24/2010 06:03 AM, Avi Kivity wrote:

On 05/24/2010 11:27 AM, Stefan Hajnoczi wrote:

On Sun, May 23, 2010 at 1:01 PM, Avi Kivity  wrote:

On 05/21/2010 12:29 AM, Anthony Liguori wrote:
I'd be more interested in enabling people to build these types of storage
systems without touching qemu.

Both sheepdog and ceph ultimately transmit I/O over a socket to a central
daemon, right?

That incurs an extra copy.

Besides a shared memory approach, I wonder if the splice() family of
syscalls could be used to send/receive data through a storage daemon
without the daemon looking at or copying the data?


Excellent idea.


splice() eventually requires a copy.  You cannot splice() to linux-aio 
so you'd have to splice() to a temporary buffer and then call into 
linux-aio.  With shared memory, you can avoid ever bringing the data 
into memory via O_DIRECT and linux-aio.


If the final destination is a socket, then you end up queuing guest 
memory as an skbuff.  In theory we could do an aio splice to block 
devices but I don't think that's realistic given our experience with aio 
changes.


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/24/2010 10:16 PM, Anthony Liguori wrote:

On 05/24/2010 06:56 AM, Avi Kivity wrote:

On 05/24/2010 02:42 PM, MORITA Kazutaka wrote:



The server would be local and talk over a unix domain socket, perhaps
anonymous.

nbd has other issues though, such as requiring a copy and no support for
metadata operations such as snapshot and file size extension.


Sorry, my explanation was unclear.  I'm not sure how running servers
on localhost can solve the problem.


The local server can convert from the local (nbd) protocol to the 
remote (sheepdog, ceph) protocol.



What I wanted to say was that we cannot specify the image of VM. With
nbd protocol, command line arguments are as follows:

  $ qemu nbd:hostname:port

As this syntax shows, with nbd protocol the client cannot pass the VM
image name to the server.


We would extend it to allow it to connect to a unix domain socket:

  qemu nbd:unix:/path/to/socket


nbd is a no-go because it only supports a single, synchronous I/O 
operation at a time and has no mechanism for extensibility.


If we go this route, I think two options are worth considering.  The 
first would be a purely socket based approach where we just accepted 
the extra copy.


The other potential approach would be shared memory based.  We export 
all guest ram as shared memory along with a small bounce buffer pool.  
We would then use a ring queue (potentially even using virtio-blk) and 
an eventfd for notification.


We can't actually export guest memory unless we allocate it as a shared 
memory object, which has many disadvantages.  The only way to export 
anonymous memory now is vmsplice(), which is fairly limited.





The server at the other end would associate the socket with a 
filename and forward it to the server using the remote protocol.


However, I don't think nbd would be a good protocol.  My preference 
would be for a plugin API, or for a new local protocol that uses 
splice() to avoid copies.


I think a good shared memory implementation would be preferable to 
plugins.  I think it's worth attempting to do a plugin interface for 
the block layer but I strongly suspect it would not be sufficient.


I would not want to see plugins that interacted with BlockDriverState 
directly, for instance.  We change it far too often.  Our main loop 
functions are also not terribly stable so I'm not sure how we would 
handle that (unless we forced all block plugins to be in a separate 
thread).


If we manage to make a good long-term stable plugin API, it would be a 
good candidate for the block layer itself.


Some OSes manage to have a stable block driver ABI, so it should be 
possible, if difficult.


--
error compiling committee.c: too many arguments to function
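The ring-queue-plus-eventfd idea mentioned in the thread can be sketched in a few lines. This is only a single-process illustration under assumptions: DemoRing, DemoReq and the demo_ring_* helpers are hypothetical names, the ring lives in ordinary memory rather than a shared mapping, and a real cross-process version would also need memory barriers around the head/tail updates. eventfd() is Linux-specific.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_RING_SIZE 8u   /* must be a power of two */

typedef struct DemoReq {
    uint64_t sector;
    uint32_t nb_sectors;
} DemoReq;

typedef struct DemoRing {
    DemoReq slots[DEMO_RING_SIZE];
    unsigned head;      /* next slot the producer fills */
    unsigned tail;      /* next slot the consumer reads */
    int kick_fd;        /* eventfd used to notify the consumer */
} DemoRing;

static int demo_ring_init(DemoRing *r)
{
    r->head = r->tail = 0;
    r->kick_fd = eventfd(0, EFD_NONBLOCK);
    return r->kick_fd < 0 ? -1 : 0;
}

/* Signal the consumer by adding 1 to the eventfd counter. */
static int demo_ring_kick(DemoRing *r)
{
    uint64_t one = 1;

    return write(r->kick_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Producer: queue one request and notify. Returns -1 if the ring is full. */
static int demo_ring_push(DemoRing *r, DemoReq req)
{
    if (r->head - r->tail == DEMO_RING_SIZE) {
        return -1;
    }
    r->slots[r->head % DEMO_RING_SIZE] = req;
    r->head++;
    return demo_ring_kick(r);
}

/* Consumer: if a notification is pending, pop up to max requests. */
static int demo_ring_pop(DemoRing *r, DemoReq *out, int max)
{
    uint64_t kicks;
    int n = 0;

    /* Non-blocking read: fails with EAGAIN when nothing was signalled. */
    if (read(r->kick_fd, &kicks, sizeof(kicks)) != (ssize_t)sizeof(kicks)) {
        return 0;
    }
    while (r->tail != r->head && n < max) {
        out[n++] = r->slots[r->tail % DEMO_RING_SIZE];
        r->tail++;
    }
    /* Leftover entries: re-arm the notification so they are not lost. */
    if (r->tail != r->head && demo_ring_kick(r) < 0) {
        return -1;
    }
    return n;
}
```

In a real setup the consumer would poll() or select() on kick_fd instead of calling demo_ring_pop() speculatively; the non-blocking read here just keeps the sketch self-contained.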

[Qemu-devel] Re: [PATCH] Release usb devices on shutdown and usb_del command

2010-05-25 Thread Gerd Hoffmann


On 05/21/10 19:55, Shahar Havivi wrote:

Remove usb_host_device_release and use usb_host_close to handle the usb_del
command.
Gerd, What do you think about the usb_cleanup()?


We need a mechanism to handle this for sure.  I don't like that 
usb-specific approach very much though.


I think we should either do that at the qdev level, walking the whole
device tree at exit and calling cleanup functions (if present), so every
device has the chance to do cleanups when needed.


Or we could have an exit notifier, which can be used for device (and also
other) cleanup work.


I tend to think that an exit notifier will be better.  We probably have
only a few devices which actually have to do some cleanup work (usb
passthrough, maybe pci passthrough too), so building qdev infrastructure
for that feels a bit like overkill.  And exit notifiers are more
generic, i.e. they will also work for non-device stuff.


cheers,
  Gerd
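To make the exit notifier idea a bit more concrete, here is a minimal standalone sketch. It is not an existing QEMU API: DemoExitNotifier, DemoUsbHost and the demo_ functions are hypothetical names used only for illustration. Each interested party embeds a notifier in its own state and registers it; at exit the list is walked once.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoExitNotifier DemoExitNotifier;

struct DemoExitNotifier {
    void (*notify)(DemoExitNotifier *n);
    DemoExitNotifier *next;
};

/* Singly linked list of registered notifiers, most recent first. */
static DemoExitNotifier *demo_exit_notifiers;

static void demo_exit_notifier_add(DemoExitNotifier *n)
{
    n->next = demo_exit_notifiers;
    demo_exit_notifiers = n;
}

/* Run every notifier exactly once, then leave the list empty. */
static void demo_run_exit_notifiers(void)
{
    while (demo_exit_notifiers) {
        DemoExitNotifier *n = demo_exit_notifiers;

        demo_exit_notifiers = n->next;
        n->notify(n);
    }
}

/* Example user: a passthrough device that must release the host device. */
typedef struct DemoUsbHost {
    DemoExitNotifier exit;
    int claimed;
} DemoUsbHost;

static void demo_usb_release(DemoExitNotifier *n)
{
    /* Recover the containing device from the embedded notifier. */
    DemoUsbHost *dev = (DemoUsbHost *)((char *)n - offsetof(DemoUsbHost, exit));

    dev->claimed = 0;
}
```

A program could hook this up with atexit(demo_run_exit_notifiers). Non-device code can register a notifier in exactly the same way, which is the generality argument made above.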

Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET

2010-05-25 Thread Paul Brook

>  for (i = 0; i < 24; i++) {
>  sysbus_connect_irq(sysbus_from_qdev(hpet), i, isa_irq[i]);
>  }
> +rtc_irq = qemu_allocate_feedback_irqs(hpet_handle_rtc_irq,
> +  sysbus_from_qdev(hpet), 1);
>  }

This is wrong. The hpet device should expose this as an IO pin.

Paul

Re: [Qemu-devel] [PATCH 1/5] trace: Add trace-events file for declaring trace events

2010-05-25 Thread Avi Kivity


On 05/25/2010 01:07 AM, Anthony Liguori wrote:


Interesting approach as it lets us defer the tracing backend decision.


Also, it's compatible with the multiplatform nature of qemu.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/24/2010 10:38 PM, Anthony Liguori wrote:



- Building a plugin API seems a bit simpler to me, although I'm not
   sure if I got the idea correctly:
   The block layer already has some kind of API (.bdrv_file_open, .bdrv_read).
   We could simply compile the block drivers as shared objects and create a
   method for loading the necessary modules at runtime.


That approach would be a recipe for disaster.   We would have to 
introduce a new, reduced functionality block API that was supported 
for plugins.  Otherwise, the only way a plugin could keep up with our 
API changes would be if it was in tree which defeats the purpose of 
having plugins.


We could guarantee API/ABI stability in a stable branch but not across 
releases.


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFT][PATCH 05/15] hpet: Convert to qdev

2010-05-25 Thread Paul Brook

> +static SysBusDeviceInfo hpet_device_info = {
> +.qdev.name= "hpet",
> +.qdev.size= sizeof(HPETState),
> +.qdev.no_user = 1,

Why shouldn't the user create HPET devices? I thought you'd removed all the 
global state.

Paul

[Qemu-devel] [RFC PATCH 02/23] Introduce cpu_physical_memory_get_dirty_range().

2010-05-25 Thread Yoshiaki Tamura

It checks the first row and puts the dirty addresses in the array.  If the
first row is empty, it skips to the first non-empty row or the end
address, and puts the skipped length in the first entry of the array.

Signed-off-by: Yoshiaki Tamura 
Signed-off-by: OHMURA Kei 
---
 cpu-all.h |4 +++
 exec.c|   67 +
 2 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 3f8762d..27187d4 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -1007,6 +1007,10 @@ static inline void 
cpu_physical_memory_mask_dirty_range(ram_addr_t start,
 }
 }
 
+int cpu_physical_memory_get_dirty_range(ram_addr_t start, ram_addr_t end, 
+ram_addr_t *dirty_rams, int length,
+int dirty_flags);
+
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
  int dirty_flags);
 void cpu_tlb_update_dirty(CPUState *env);
diff --git a/exec.c b/exec.c
index bf8d703..d5c2a05 100644
--- a/exec.c
+++ b/exec.c
@@ -1962,6 +1962,73 @@ static inline void tlb_reset_dirty_range(CPUTLBEntry 
*tlb_entry,
 }
 }
 
+/* It checks the first row and puts dirty addrs in the array.
+   If the first row is empty, it skips to the first non-dirty row
+   or the end addr, and put the length in the first entry of the array. */
+int cpu_physical_memory_get_dirty_range(ram_addr_t start, ram_addr_t end, 
+ram_addr_t *dirty_rams, int length,
+int dirty_flag)
+{
+unsigned long p = 0, page_number;
+ram_addr_t addr;
+ram_addr_t s_idx = (start >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+ram_addr_t e_idx = (end >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+int i, j, offset, dirty_idx = dirty_flag_to_idx(dirty_flag);
+
+/* mask bits before the start addr */
+offset = (start >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+cpu_physical_memory_sync_master(s_idx);
+p |= phys_ram_dirty[dirty_idx][s_idx] & ~((1UL << offset) - 1);
+
+if (s_idx == e_idx) {
+/* mask bits after the end addr */
+offset = (end >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+p &= (1UL << offset) - 1;
+}
+
+if (p == 0) {
+/* when the row is empty */
+ram_addr_t skip;
+if (s_idx == e_idx) {
+skip = end;
+   } else {
+/* skip empty rows */
+while (s_idx < e_idx) {
+s_idx++;
+cpu_physical_memory_sync_master(s_idx);
+
+if (phys_ram_dirty[dirty_idx][s_idx] != 0) {
+break;
+}
+}
+skip = (s_idx * HOST_LONG_BITS * TARGET_PAGE_SIZE);
+}
+dirty_rams[0] = skip - start;
+i = 0;
+
+} else if (p == ~0UL) {
+/* when the row is fully dirtied */
+addr = start;
+for (i = 0; i < length; i++) {
+dirty_rams[i] = addr;
+addr += TARGET_PAGE_SIZE;
+}
+} else {
+/* when the row is partially dirtied */
+i = 0;
+do {
+j = ffsl(p) - 1;
+p &= ~(1UL << j);
+page_number = s_idx * HOST_LONG_BITS + j;
+addr = page_number * TARGET_PAGE_SIZE;
+dirty_rams[i] = addr;
+i++;
+} while (p != 0 && i < length);
+}
+
+return i;
+}
+
 /* Note: start and end must be within the same ram block.  */
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
  int dirty_flags)
-- 
1.7.0.31.g1df487
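The "partially dirtied" branch above boils down to walking the set bits of one bitmap word and turning each bit position into a page address. A minimal standalone sketch of just that step follows. The demo_ names and the 4 KiB page size are assumptions for illustration, and a portable loop stands in for ffsl().

```c
#include <assert.h>
#include <limits.h>

#define DEMO_PAGE_SIZE 4096UL   /* assumed page size */
#define DEMO_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit; word must be non-zero. */
static int demo_lowest_bit(unsigned long word)
{
    int j = 0;

    assert(word != 0);
    while (!(word & 1UL)) {
        word >>= 1;
        j++;
    }
    return j;
}

/*
 * Store the address of every dirty page in bitmap word number `row`
 * into addrs[], at most `length` entries.  Returns the number stored.
 */
static int demo_collect_dirty(unsigned long word, unsigned long row,
                              unsigned long *addrs, int length)
{
    int n = 0;

    while (word != 0 && n < length) {
        int j = demo_lowest_bit(word);

        word &= word - 1;   /* clear the lowest set bit */
        addrs[n++] = (row * DEMO_LONG_BITS + (unsigned long)j) * DEMO_PAGE_SIZE;
    }
    return n;
}
```

Unlike the patch, this sketch simply returns 0 for an empty word; the patch additionally reports how far the caller may skip ahead in that case.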

[Qemu-devel] Re: [[RfC PATCH]] linux fbdev display driver prototype.

2010-05-25 Thread Stefano Stabellini

On Tue, 25 May 2010, Gerd Hoffmann wrote:
> The actual stretching is done by SDL I think.  For that kind of stuff a 
> rendering library is actually helpful ...
> 

not really, the sdl_zoom* stuff is completely generic

Re: [Qemu-devel] [PATCH 1/2] ioport: add function to check whenever a port is assigned or not

2010-05-25 Thread Gerd Hoffmann


On 05/24/10 14:32, Paul Brook wrote:

+int is_ioport_assigned(pio_addr_t addr)


Shouldn't we move this into register_ioport_{read,write}, and have that fail
if the port has already been assigned?


It already checks and fails with hw_error().  The problem with that is that
it kills qemu if you try to pci hot-plug a vga card.  So I've
added a way to check beforehand, so we can fail gracefully in the few
places where we need it (see second patch of the series).


cheers,
  Gerd
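The check-before-register pattern described here can be sketched as follows. This is a standalone illustration, not the code from the series: the demo_ names, the table layout and the error convention are assumptions. The point is that the caller gets an error code back instead of the process being killed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_IOPORTS 65536u

typedef uint32_t (*DemoIOPortReadFunc)(void *opaque, uint32_t addr);

/* One handler slot per port; NULL means the port is free. */
static DemoIOPortReadFunc demo_read_table[DEMO_MAX_IOPORTS];

static int demo_is_ioport_assigned(uint32_t addr)
{
    return addr < DEMO_MAX_IOPORTS && demo_read_table[addr] != NULL;
}

/*
 * Register fn for [start, start + length).  Returns 0 on success and -1
 * if the range is invalid or any port is already taken; in the failure
 * case nothing is registered, so the caller can back out cleanly.
 */
static int demo_register_ioport_read(uint32_t start, uint32_t length,
                                     DemoIOPortReadFunc fn)
{
    uint32_t i;

    if (start >= DEMO_MAX_IOPORTS || length > DEMO_MAX_IOPORTS - start) {
        return -1;
    }
    for (i = start; i < start + length; i++) {
        if (demo_is_ioport_assigned(i)) {
            return -1;
        }
    }
    for (i = start; i < start + length; i++) {
        demo_read_table[i] = fn;
    }
    return 0;
}

/* Dummy handler standing in for a real device callback. */
static uint32_t demo_vga_read(void *opaque, uint32_t addr)
{
    (void)opaque;
    return addr & 0xff;
}
```

A hot-plug path would call demo_register_ioport_read() (or demo_is_ioport_assigned() up front) and report the failure to the user rather than aborting.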

Re: [Qemu-devel] linux-user mmap bug

2010-05-25 Thread Edgar E. Iglesias

On Mon, May 24, 2010 at 08:45:31AM -0700, Richard Henderson wrote:
> On 05/24/2010 07:57 AM, Edgar E. Iglesias wrote:
> > I took a look at the code again and I don't really understand how the
> > particular case where we get a high address from the kernel while
> > mmap_min_addr is busy is supposed to work :/
> > In fact, for CRIS it never works on my host.
> 
> Indeed, there are many cases for which it doesn't work for the Alpha
> target either.

Yes, what puzzled me was that, if I am not completely senile, CRIS apps
used to emulate on my x86_64 host not so long ago :)


> > I changed it locally to keep scanning after a wrap until we succeed to
> > allocate a chunk or rewrap (SLOW) but at least I can run dynamically
> > linked CRIS programs again.
> 
> Yep.  My hack had been similar, except that I used the PageDesc tree
> to help speed things up.  But PageDesc is hardly an ideal data structure
> in which to search, since it quickly devolves into a linear search of
> the address space.
> 
> Probably the easiest real fix is to re-read /proc/self/maps each time
> the mmap_next_start guess fails and the kernel's returned address is
> out of range.
> 
> Another is using the MAP_32BIT flag on x86-64 host whenever a 31-bit
> address is appropriate for the guest.  E.g. mips32, where architecturally
> the high half of the address space is reserved for kernel mode.


MAP_32BIT sounds good as long as guest_base is not used. When it is used, I
guess we'd need to fall back to something else anyway..

Maybe these issues are something to look at more during the bug day? :)

In the meantime, I've patched the cris git to use MAP_32BIT and
to fall back to a super ugly and slow linear scan..

Thanks again for the help,
Cheers


> See 
>   http://www.mail-archive.com/qemu-devel@nongnu.org/msg28924.html
> for more ideas on the subject.
> 
> 
> 
> r~
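For reference, the MAP_32BIT idea can be sketched like this. It is only an illustration under assumptions: demo_mmap_below() is a hypothetical helper, it ignores guest_base, and its fallback is a plain mmap() followed by a range check rather than the linear scan mentioned above. MAP_32BIT is only defined on x86-64 Linux, hence the #ifdef.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS / MAP_32BIT on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory that end at or below `limit`.
 * Returns NULL if no such mapping could be obtained.
 */
static void *demo_mmap_below(size_t len, uintptr_t limit)
{
    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p = MAP_FAILED;

#ifdef MAP_32BIT
    /* Ask the kernel for a low address directly. */
    p = mmap(NULL, len, prot, flags | MAP_32BIT, -1, 0);
#endif
    if (p == MAP_FAILED) {
        /* Fallback: take whatever the kernel offers and check it below. */
        p = mmap(NULL, len, prot, flags, -1, 0);
    }
    if (p == MAP_FAILED) {
        return NULL;
    }
    if ((uintptr_t)p > limit || len > limit - (uintptr_t)p) {
        /* The mapping does not fit under the limit; give it back. */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

On hosts without MAP_32BIT the helper will usually return NULL for a low limit, which is exactly the case where some other search strategy would have to take over.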

[Qemu-devel] [RFC PATCH 09/23] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2010-05-25 Thread Yoshiaki Tamura

This code implements the VM transaction protocol.  Like buffered_file, it
sits between the savevm and migration layers.  With this architecture, the VM
transaction protocol is implemented mostly independently of other
existing code.

Signed-off-by: Yoshiaki Tamura 
Signed-off-by: OHMURA Kei 
---
 Makefile.objs|1 +
 ft_transaction.c |  418 ++
 ft_transaction.h |   54 +++
 migration.c  |3 +
 4 files changed, 476 insertions(+), 0 deletions(-)
 create mode 100644 ft_transaction.c
 create mode 100644 ft_transaction.h

diff --git a/Makefile.objs b/Makefile.objs
index b73e2cb..4388fb3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -78,6 +78,7 @@ common-obj-y += qemu-char.o savevm.o #aio.o
 common-obj-y += msmouse.o ps2.o
 common-obj-y += qdev.o qdev-properties.o
 common-obj-y += qemu-config.o block-migration.o
+common-obj-y += ft_transaction.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/ft_transaction.c b/ft_transaction.c
new file mode 100644
index 000..92dc681
--- /dev/null
+++ b/ft_transaction.c
@@ -0,0 +1,418 @@
+/*
+ * Fault tolerant VM transaction QEMUFile
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation. 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * This source code is based on buffered_file.c.
+ * Copyright IBM, Corp. 2008
+ * Authors:
+ *  Anthony Liguori
+ */
+
+#include "qemu-common.h"
+#include "hw/hw.h"
+#include "qemu-timer.h"
+#include "sysemu.h"
+#include "qemu-char.h"
+#include "ft_transaction.h"
+
+// #define DEBUG_FT_TRANSACTION
+
+typedef struct QEMUFileFtTranx
+{
+FtTranxPutBufferFunc *put_buffer;
+FtTranxGetBufferFunc *get_buffer;
+FtTranxCloseFunc *close;
+void *opaque;
+QEMUFile *file;
+int has_error;
+int is_sender;
+int buf_max_size;
+enum QEMU_VM_TRANSACTION_STATE tranx_state;
+uint16_t tranx_id;
+uint32_t seq;
+} QEMUFileFtTranx;
+
+#define IO_BUF_SIZE 32768
+
+#ifdef DEBUG_FT_TRANSACTION
+#define dprintf(fmt, ...) \
+do { printf("ft_transaction: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define dprintf(fmt, ...) \
+do { } while (0)
+#endif
+
+static ssize_t ft_tranx_flush_buffer(void *opaque, void *buf, int size)
+{
+QEMUFileFtTranx *s = opaque;
+size_t offset = 0;
+ssize_t len;
+
+while (offset < size) {
+len = s->put_buffer(s->opaque, (uint8_t *)buf + offset, size - offset);
+
+if (len <= 0) {
+fprintf(stderr, "ft transaction flush buffer failed \n");
+s->has_error = 1;
+offset = -EINVAL;
+break;
+}
+
+offset += len;
+}
+
+return offset;
+}
+
+static int ft_tranx_send_header(QEMUFileFtTranx *s)
+{
+int ret = -1;
+
+dprintf("send header %d\n", s->tranx_state);
+
+ret = ft_tranx_flush_buffer(s, &s->tranx_state, sizeof(uint16_t));
+if (ret < 0) {
+goto out;
+}
+ret = ft_tranx_flush_buffer(s, &s->tranx_id, sizeof(uint16_t));
+
+out:
+return ret;
+}
+
+static int ft_tranx_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, 
int size)
+{
+QEMUFileFtTranx *s = opaque;
+ssize_t ret = -1;
+
+if (s->has_error) {
+fprintf(stderr, "flush when error, bailing\n");
+return -EINVAL;
+}
+
+ret = ft_tranx_send_header(s);
+if (ret < 0) {
+goto out;
+}
+
+ret = ft_tranx_flush_buffer(s, &s->seq, sizeof(s->seq));
+if (ret < 0) {
+goto out;
+}
+s->seq++;
+
+ret = ft_tranx_flush_buffer(s, &size, sizeof(uint32_t));
+if (ret < 0) {
+goto out;
+}
+
+ret = ft_tranx_flush_buffer(s, (uint8_t *)buf, size);
+
+out:
+return ret;
+}
+
+#if 0
+static int ft_tranx_put_vector(void *opaque, struct iovec *vector, int64_t 
pos, int count)
+{
+QEMUFileFtTranx *s = opaque;
+ssize_t ret = -1;
+int i;
+uint32_t size = 0;
+
+dprintf("putting %d vectors at %" PRId64 "\n", count, pos);
+
+if (s->has_error) {
+dprintf("put vector when error, bailing\n");
+return -EINVAL;
+}
+
+ret = ft_tranx_send_header(s);
+if (ret < 0) {
+return ret;
+}
+
+ret = ft_tranx_flush_buffer(s, &s->seq, sizeof(s->seq));
+if (ret < 0) {
+return ret;
+}
+s->seq++;
+
+for (i = 0; i < count; i++)
+size += vector[i].iov_len;
+
+ret = ft_tranx_flush_buffer(s, &size, sizeof(uint32_t));
+if (ret < 0) {
+return ret;
+}
+
+while (count > 0) {
+/* 
+ * It will continue calling put_vector even if count > IOV_MAX.
+ */
+ret = s->put_vector(s->opaque, vector,
+((count>IOV_MAX)?IOV_MAX:count));
+
+if (ret <= 0) {
+fprintf(stderr, "ft transaction putting vector\n");
+s->has_error = 1;
+ret
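Reading ft_tranx_put_buffer() above, each chunk appears to go out as a transaction state, a transaction id, a sequence number, a payload size and then the payload. The sketch below packs and unpacks such a frame in a byte buffer. It is an illustration only: the demo_ names are hypothetical, and the fixed big-endian field encoding is an assumption (the patch writes the fields in host byte order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* state (2) + id (2) + seq (4) + size (4) */
#define DEMO_FRAME_HDR_LEN 12u

static void demo_put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void demo_put_be32(uint8_t *p, uint32_t v)
{
    demo_put_be16(p, (uint16_t)(v >> 16));
    demo_put_be16(p + 2, (uint16_t)(v & 0xffff));
}

static uint16_t demo_get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t demo_get_be32(const uint8_t *p)
{
    return ((uint32_t)demo_get_be16(p) << 16) | demo_get_be16(p + 2);
}

/* Returns the total frame length, or 0 if `out` is too small. */
static size_t demo_frame_pack(uint8_t *out, size_t out_len,
                              uint16_t state, uint16_t id, uint32_t seq,
                              const uint8_t *payload, uint32_t size)
{
    if (out_len < DEMO_FRAME_HDR_LEN || size > out_len - DEMO_FRAME_HDR_LEN) {
        return 0;
    }
    demo_put_be16(out, state);
    demo_put_be16(out + 2, id);
    demo_put_be32(out + 4, seq);
    demo_put_be32(out + 8, size);
    memcpy(out + DEMO_FRAME_HDR_LEN, payload, size);
    return DEMO_FRAME_HDR_LEN + size;
}

/* Returns a pointer to the payload, or NULL if the frame is truncated. */
static const uint8_t *demo_frame_unpack(const uint8_t *in, size_t in_len,
                                        uint16_t *state, uint16_t *id,
                                        uint32_t *seq, uint32_t *size)
{
    if (in_len < DEMO_FRAME_HDR_LEN) {
        return NULL;
    }
    *state = demo_get_be16(in);
    *id = demo_get_be16(in + 2);
    *seq = demo_get_be32(in + 4);
    *size = demo_get_be32(in + 8);
    if (*size > in_len - DEMO_FRAME_HDR_LEN) {
        return NULL;
    }
    return in + DEMO_FRAME_HDR_LEN;
}
```

Fixing the byte order on the wire would matter if the two hosts could ever differ in endianness; whether that is a concern for this series is not stated in the patch.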

[Qemu-devel] [Bug 494486] Re: cirrus_vga display is buggy after migration

2010-05-25 Thread Pierre Riteau

Fixed by ae6b2c4ed956c17456e70efefe13ad0ab7db31de

** Changed in: qemu
   Status: New => Fix Committed

-- 
cirrus_vga display is buggy after migration
https://bugs.launchpad.net/bugs/494486
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: Fix Committed

Bug description:
[ Bug also reported on qemu-devel on November 24, 2009 ]

After migrating a VM (running Debian Lenny 32-bit) using text consoles (default 
Debian configuration, no framebuffer I think), the VM is responsive to keyboard 
input but doesn't display new characters: only the cursor moves.
Otherwise the machine seems to run fine: I can log in and see the cursor moving 
at the prompt position for example.
Reverting the hw/cirrus_vga.c part of 7e72abc382b700a72549e8147bdea413534eeedc 
resolves the problem.

Origin host is Debian Lenny 32-bits, destination host is Fedora 12 32-bit.

[Qemu-devel] Re: KVM call agenda for May 18

2010-05-25 Thread Avi Kivity


On 05/19/2010 11:20 AM, Christoph Hellwig wrote:


It's time we get a proper bugzilla.qemu.org for both qemu and qemu-kvm
that can be used sanely.  If you ask nicely you might even get a virtual
instance of bugzilla.kernel.org which works quite nicely.
   


That would be my preference too but there's a limit to how much we can 
juggle the bug database around.


--
error compiling committee.c: too many arguments to function

[Qemu-devel] Re: [PATCH 0/6] Make hpet a compile time option

2010-05-25 Thread Jan Kiszka

Paolo Bonzini wrote:
> On 05/24/2010 07:54 PM, Juan Quintela wrote:
>> But for the other call, what do you propose?
>>
>> My best try was to hide the availability of hpet inside hpet_emul.h
>> with:
>>
>> #ifdef CONFIG_HPET
>> uint32_t hpet_in_legacy_mode(void);
>> else
>> uint32_t hpet_in_legacy_mode(void) { return 0;}
>> #endif
> 
> Change this to a global variable rtc_disable_interrupts in
> hw/mc146818rtc.c?  (You didn't say it would need to be particularly
> pretty...).
> 
> Not tested beyond compilation.
> 
> Paolo
> 

Please don't waste your time:

http://permalink.gmane.org/gmane.comp.emulators.qemu/71377

Jan



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] Re: [PATCH 0/6] Make hpet a compile time option

2010-05-25 Thread Paolo Bonzini


On 05/25/2010 11:05 AM, Jan Kiszka wrote:

Please don't waste your time:

http://permalink.gmane.org/gmane.comp.emulators.qemu/71377


I wasn't going to. :-)  I had seen the series---very nice work!

Paolo

Re: [Qemu-devel] [RFT][PATCH 05/15] hpet: Convert to qdev

2010-05-25 Thread Jan Kiszka

Paul Brook wrote:
>> +static SysBusDeviceInfo hpet_device_info = {
>> +.qdev.name= "hpet",
>> +.qdev.size= sizeof(HPETState),
>> +.qdev.no_user = 1,
> 
> Why shouldn't the user create HPET devices? I thought you'd removed all the 
> global state.

Long-term, there is no reason to deny this.

But the code is not yet ready for this: we statically instantiate it
during PC setup to establish the routings and respect -no-hpet. Also,
the BIOS isn't prepared for > 1 HPET.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET

2010-05-25 Thread Jan Kiszka

Paul Brook wrote:
>>  for (i = 0; i < 24; i++) {
>>  sysbus_connect_irq(sysbus_from_qdev(hpet), i, isa_irq[i]);
>>  }
>> +rtc_irq = qemu_allocate_feedback_irqs(hpet_handle_rtc_irq,
>> +  sysbus_from_qdev(hpet), 1);
>>  }
> 
> This is wrong. The hpet device should expose this as an IO pin.

Will look into this.

BTW, I just realized that the GPIO handling is apparently lacking
support for attaching an output to multiple inputs. Or am I missing
something?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] [PATCH 2/7] trace: Support disabled events in trace-events

2010-05-25 Thread Stefan Hajnoczi

Sometimes it is useful to disable a trace event.  Removing the event
from trace-events is not enough since source code will call the
trace_*() function for the event.

This patch makes it easy to build without specific trace events by
marking them disabled in trace-events:

disable multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"

This builds without the multiwrite_cb trace event.

Signed-off-by: Stefan Hajnoczi 
---
v2:
 * This patch is new in v2

 trace-events |4 +++-
 tracetool|   10 --
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/trace-events b/trace-events
index a37d3cc..5efaa86 100644
--- a/trace-events
+++ b/trace-events
@@ -12,10 +12,12 @@
 #
 # Format of a trace event:
 #
-# <name>(<type1> <arg1>[, <type2> <arg2>] ...) "<format-string>"
+# [disable] <name>(<type1> <arg1>[, <type2> <arg2>] ...) "<format-string>"
 #
 # Example: qemu_malloc(size_t size) "size %zu"
 #
+# The "disable" keyword will build without the trace event.
+#
 # The <name> must be a valid as a C function name.
 #
 # Types should be standard C types.  Use void * for pointers because the trace
diff --git a/tracetool b/tracetool
index 766a9ba..53d3612 100755
--- a/tracetool
+++ b/tracetool
@@ -110,7 +110,7 @@ linetoc_end_nop()
 # Process stdin by calling begin, line, and end functions for the backend
 convert()
 {
-local begin process_line end
+local begin process_line end str disable
 begin="lineto$1_begin_$backend"
 process_line="lineto$1_$backend"
 end="lineto$1_end_$backend"
@@ -123,8 +123,14 @@ convert()
 str=${str%%#*}
 test -z "$str" && continue
 
+# Process the line.  The nop backend handles disabled lines.
+disable=${str%%disable*}
 echo
-"$process_line" "$str"
+if test -z "$disable"; then
+"lineto$1_nop" "${str##disable}"
+else
+"$process_line" "$str"
+fi
 done
 
 echo
-- 
1.7.1
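To make the effect of "disable" concrete: callers keep invoking trace_*() either way, so a disabled event presumably has to turn into a function that does nothing. A minimal sketch of that idea follows. The demo_ event names and the stderr-printing "enabled" variant are assumptions for illustration, not tracetool's actual output.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how many enabled events fired; only used by the sketch. */
static int demo_trace_count;

/* Enabled event: what a very simple backend might generate. */
static inline void trace_demo_request(void *req, int ret)
{
    demo_trace_count++;
    fprintf(stderr, "demo_request req %p ret %d\n", req, ret);
}

/* Disabled event: same signature, empty body, so call sites still build. */
static inline void trace_demo_complete(void *req, int ret)
{
    (void)req;
    (void)ret;
}

/* Call site: unchanged regardless of which events are disabled. */
static int demo_handle(void *req)
{
    trace_demo_request(req, 0);
    trace_demo_complete(req, 0);
    return 0;
}
```

Because the empty function is static inline, a compiler is free to drop the call entirely, which is why leaving disabled events in the source costs nothing at run time.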

[Qemu-devel] [PATCH 6/7] trace: Trace virtio-blk, multiwrite, and paio_submit

2010-05-25 Thread Stefan Hajnoczi

This patch adds trace events that make it possible to observe
virtio-blk.

Signed-off-by: Stefan Hajnoczi 
---
 block.c|7 +++
 hw/virtio-blk.c|7 +++
 posix-aio-compat.c |2 ++
 trace-events   |   14 ++
 4 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 0b0966c..56db112 100644
--- a/block.c
+++ b/block.c
@@ -23,6 +23,7 @@
  */
 #include "config-host.h"
 #include "qemu-common.h"
+#include "trace.h"
 #include "monitor.h"
 #include "block_int.h"
 #include "module.h"
@@ -1922,6 +1923,8 @@ static void multiwrite_cb(void *opaque, int ret)
 {
 MultiwriteCB *mcb = opaque;
 
+trace_multiwrite_cb(mcb, ret);
+
 if (ret < 0 && !mcb->error) {
 mcb->error = ret;
 multiwrite_user_cb(mcb);
@@ -2065,6 +2068,8 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
 // Check for mergable requests
 num_reqs = multiwrite_merge(bs, reqs, num_reqs, mcb);
 
+trace_bdrv_aio_multiwrite(mcb, mcb->num_callbacks, num_reqs);
+
 // Run the aio requests
 for (i = 0; i < num_reqs; i++) {
 acb = bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
@@ -2075,9 +2080,11 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, 
BlockRequest *reqs, int num_reqs)
 // submitted yet. Otherwise we'll wait for the submitted AIOs to
 // complete and report the error in the callback.
 if (mcb->num_requests == 0) {
+trace_bdrv_aio_multiwrite_earlyfail(mcb);
 reqs[i].error = -EIO;
 goto fail;
 } else {
+trace_bdrv_aio_multiwrite_latefail(mcb, i);
 mcb->num_requests++;
 multiwrite_cb(mcb, -EIO);
 break;
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 5d7f1a2..706f109 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include "trace.h"
 #include "virtio-blk.h"
 #include "block_int.h"
 #ifdef __linux__
@@ -50,6 +51,8 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int 
status)
 {
 VirtIOBlock *s = req->dev;
 
+trace_virtio_blk_req_complete(req, status);
+
 req->in->status = status;
 virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
 virtio_notify(&s->vdev, s->vq);
@@ -87,6 +90,8 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 {
 VirtIOBlockReq *req = opaque;
 
+trace_virtio_blk_rw_complete(req, ret);
+
 if (ret) {
 int is_read = !(req->out->type & VIRTIO_BLK_T_OUT);
 if (virtio_blk_handle_rw_error(req, -ret, is_read))
@@ -263,6 +268,8 @@ static void virtio_blk_handle_flush(BlockRequest *blkreq, int *num_writes,
 static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes,
 VirtIOBlockReq *req, BlockDriverState **old_bs)
 {
+trace_virtio_blk_handle_write(req, req->out->sector, req->qiov.size / 512);
+
 if (req->out->sector & req->dev->sector_mask) {
 virtio_blk_rw_complete(req, -EIO);
 return;
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index b43c531..c2200fe 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -25,6 +25,7 @@
 #include "qemu-queue.h"
 #include "osdep.h"
 #include "qemu-common.h"
+#include "trace.h"
 #include "block_int.h"
 
 #include "block/raw-posix-aio.h"
@@ -583,6 +584,7 @@ BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
 acb->next = posix_aio_state->first_aio;
 posix_aio_state->first_aio = acb;
 
+trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
 qemu_paio_submit(acb);
 return &acb->common;
 }
diff --git a/trace-events b/trace-events
index 3fde0c6..48415f8 100644
--- a/trace-events
+++ b/trace-events
@@ -34,3 +34,17 @@ qemu_free(void *ptr) "ptr %p"
 qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p"
 qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
 qemu_vfree(void *ptr) "ptr %p"
+
+# block.c
+multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"
+bdrv_aio_multiwrite(void *mcb, int num_callbacks, int num_reqs) "mcb %p num_callbacks %d num_reqs %d"
+bdrv_aio_multiwrite_earlyfail(void *mcb) "mcb %p"
+bdrv_aio_multiwrite_latefail(void *mcb, int i) "mcb %p i %d"
+
+# hw/virtio-blk.c
+virtio_blk_req_complete(void *req, int status) "req %p status %d"
+virtio_blk_rw_complete(void *req, int ret) "req %p ret %d"
+virtio_blk_handle_write(void *req, unsigned long sector, unsigned long nsectors) "req %p sector %lu nsectors %lu"
+
+# posix-aio-compat.c
+paio_submit(void *acb, void *opaque, unsigned long sector_num, unsigned long nb_sectors, unsigned long type) "acb %p opaque %p sector_num %lu nb_sectors %lu type %lu"
-- 
1.7.1

[Qemu-devel] [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Stefan Hajnoczi

This patch adds trace events for virtqueue operations including
adding/removing buffers, notifying the guest, and receiving a notify
from the guest.

Signed-off-by: Stefan Hajnoczi 
---
v2:
 * This patch is new in v2

 hw/virtio.c  |    8 ++++++++
 trace-events |    8 ++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 4475bb3..a5741ae 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -13,6 +13,7 @@
 
 #include 
 
+#include "trace.h"
 #include "virtio.h"
 #include "sysemu.h"
 
@@ -205,6 +206,8 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 unsigned int offset;
 int i;
 
+trace_virtqueue_fill(vq, elem, len, idx);
+
 offset = 0;
 for (i = 0; i < elem->in_num; i++) {
 size_t size = MIN(len - offset, elem->in_sg[i].iov_len);
@@ -232,6 +235,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
 /* Make sure buffer is written before we update index. */
 wmb();
+trace_virtqueue_flush(vq, count);
 vring_used_idx_increment(vq, count);
 vq->inuse -= count;
 }
@@ -422,6 +426,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
 
 vq->inuse++;
 
+trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 return elem->in_num + elem->out_num;
 }
 
@@ -560,6 +565,7 @@ int virtio_queue_get_num(VirtIODevice *vdev, int n)
 void virtio_queue_notify(VirtIODevice *vdev, int n)
 {
 if (n < VIRTIO_PCI_QUEUE_MAX && vdev->vq[n].vring.desc) {
+trace_virtio_queue_notify(vdev, n, &vdev->vq[n]);
 vdev->vq[n].handle_output(vdev, &vdev->vq[n]);
 }
 }
@@ -597,6 +603,7 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 
 void virtio_irq(VirtQueue *vq)
 {
+trace_virtio_irq(vq);
 vq->vdev->isr |= 0x01;
 virtio_notify_vector(vq->vdev, vq->vector);
 }
@@ -609,6 +616,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
  (vq->inuse || vring_avail_idx(vq) != vq->last_avail_idx)))
 return;
 
+trace_virtio_notify(vdev, vq);
 vdev->isr |= 0x01;
 virtio_notify_vector(vdev, vq->vector);
 }
diff --git a/trace-events b/trace-events
index 48415f8..a533414 100644
--- a/trace-events
+++ b/trace-events
@@ -35,6 +35,14 @@ qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu
 qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
 qemu_vfree(void *ptr) "ptr %p"
 
+# hw/virtio.c
+virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
+virtqueue_flush(void *vq, unsigned int count) "vq %p count %u"
+virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "vq %p elem %p in_num %u out_num %u"
+virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p"
+virtio_irq(void *vq) "vq %p"
+virtio_notify(void *vdev, void *vq) "vdev %p vq %p"
+
 # block.c
 multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"
 bdrv_aio_multiwrite(void *mcb, int num_callbacks, int num_reqs) "mcb %p num_callbacks %d num_reqs %d"
-- 
1.7.1

[Qemu-devel] [PATCH v2 0/7] Tracing backends

2010-05-25 Thread Stefan Hajnoczi

Following the RFC discussion, here are updated patches that I propose for review and merge:

The following patches against qemu.git allow static trace events to be declared
in QEMU.  Trace events use a lightweight syntax and are independent of the
backend tracing system (e.g. LTTng UST).

Supported backends are:
 * my trivial tracer ("simple")
 * LTTng Userspace Tracer ("ust")
 * no tracer ("nop", the default)

The ./configure option to choose a backend is --trace-backend=<backend>.

Main point of this patchset: adding new trace events is easy and we can switch
between backends without modifying the code.

These patches are also available at:
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/tracing

v2:
[PATCH 1/7] trace: Add trace-events file for declaring trace events
 * Use "$source_path/tracetool" in ./configure
 * Include qemu-common.h in trace.h so common types are available

[PATCH 2/7] trace: Support disabled events in trace-events
 * New in v2: makes it easy to build only a subset of trace events

[PATCH 3/7] trace: Add simple built-in tracing backend
 * Make simpletrace.py parse trace-events instead of generating Python

[PATCH 4/7] trace: Add LTTng Userspace Tracer backend

[PATCH 5/7] trace: Trace qemu_malloc() and qemu_vmalloc()
 * Record pointer result from allocation functions

[PATCH 6/7] trace: Trace virtio-blk, multiwrite, and paio_submit

[PATCH 7/7] trace: Trace virtqueue operations
 * New in v2: observe virtqueue buffer add/remove and notifies

[Qemu-devel] [PATCH 1/7] trace: Add trace-events file for declaring trace events

2010-05-25 Thread Stefan Hajnoczi

This patch introduces the trace-events file where trace events can be
declared like so:

qemu_malloc(size_t size) "size %zu"
qemu_free(void *ptr) "ptr %p"

These trace event declarations are processed by a new tool called
tracetool to generate code for the trace events.  Trace event
declarations are independent of the backend tracing system (LTTng User
Space Tracing, ftrace markers, DTrace).

The default "nop" backend generates empty trace event functions.
Therefore trace events are disabled by default.

The trace-events file serves two purposes:

1. Adding trace events is easy.  It is not necessary to understand the
   details of a backend tracing system.  The trace-events file is a
   single location where trace events can be declared without code
   duplication.

2. QEMU is not tightly coupled to one particular backend tracing system.
   In order to support tracing across QEMU host platforms and to
   anticipate new backend tracing systems that are currently maturing,
   it is important to be flexible and not tied to one system.
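
As a rough illustration of what processing a declaration involves, the
Python sketch below parses lines in the trace-events syntax shown above and
emits an empty C function for each one, which is roughly what a "nop"
backend amounts to.  The regular expression has the same shape as the one
in simpletrace.py (patch 3/7).  The helper names parse_declaration and
nop_stub, and the exact form of the emitted C, are assumptions made for
illustration; this is not tracetool's actual output.

```python
import re

# Matches: [disable] name(args) "format"
EVENT_RE = re.compile(r'(disable\s+)?([a-zA-Z0-9_]+)\(([^)]*)\)\s+"([^"]*)"')


def parse_declaration(line):
    """Return (disabled, name, args, fmt), or None for comments/blank lines."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    disable, name, args, fmt = m.groups()
    return (bool(disable), name, args.strip(), fmt)


def nop_stub(decl):
    """Emit an empty inline C function for one declaration (hypothetical shape)."""
    _disabled, name, args, _fmt = decl
    return "static inline void trace_%s(%s)\n{\n}\n" % (name, args or "void")


if __name__ == "__main__":
    for text in ('qemu_malloc(size_t size) "size %zu"',
                 'qemu_free(void *ptr) "ptr %p"'):
        print(nop_stub(parse_declaration(text)))
```

A different backend would only need a different generator function; the
declaration parsing stays the same, which is the decoupling described in
point 2.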

Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Use "$source_path/tracetool" in ./configure
 * Include qemu-common.h in trace.h so common types are available

 .gitignore  |2 +
 Makefile|   17 -
 Makefile.objs   |5 ++
 Makefile.target |1 +
 configure   |   19 ++
 trace-events|   24 
 tracetool   |  165 +++
 7 files changed, 229 insertions(+), 4 deletions(-)
 create mode 100644 trace-events
 create mode 100755 tracetool

diff --git a/.gitignore b/.gitignore
index fdfe2f0..4644557 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,8 @@ config-devices.*
 config-all-devices.*
 config-host.*
 config-target.*
+trace.h
+trace.c
 *-softmmu
 *-darwin-user
 *-linux-user
diff --git a/Makefile b/Makefile
index 7986bf6..a9f79a9 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 # Makefile for QEMU.
 
-GENERATED_HEADERS = config-host.h
+GENERATED_HEADERS = config-host.h trace.h
 
 ifneq ($(wildcard config-host.mak),)
 # Put the all: rule here so that config-host.mak can contain dependencies.
@@ -130,16 +130,24 @@ bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS)
 
 iov.o: iov.c iov.h
 
+trace.h: trace-events
+	$(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -h < $< > $@,"  GEN   $@")
+
+trace.c: trace-events
+	$(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -c < $< > $@,"  GEN   $@")
+
+trace.o: trace.c
+
 ##
 
 qemu-img.o: qemu-img-cmds.h
 qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o: $(GENERATED_HEADERS)
 
-qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(block-obj-y) $(qobject-obj-y)
+qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(trace-obj-y) $(block-obj-y) $(qobject-obj-y)
 
-qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(block-obj-y) $(qobject-obj-y)
+qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(trace-obj-y) $(block-obj-y) $(qobject-obj-y)
 
-qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(block-obj-y) $(qobject-obj-y)
+qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(trace-obj-y) $(block-obj-y) $(qobject-obj-y)
 
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
 	$(call quiet-command,sh $(SRC_PATH)/hxtool -h < $< > $@,"  GEN   $@")
@@ -157,6 +165,7 @@ clean:
 	rm -f *.o *.d *.a $(TOOLS) TAGS cscope.* *.pod *~ */*~
 	rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d
 	rm -f qemu-img-cmds.h
+	rm -f trace.c trace.h
 	$(MAKE) -C tests clean
 	for d in $(ALL_SUBDIRS) libhw32 libhw64 libuser libdis libdis-user; do \
 	if test -d $$d; then $(MAKE) -C $$d $@ || exit 1; fi; \
diff --git a/Makefile.objs b/Makefile.objs
index 1a942e5..20e709e 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -251,6 +251,11 @@ libdis-$(CONFIG_S390_DIS) += s390-dis.o
 libdis-$(CONFIG_SH4_DIS) += sh4-dis.o
 libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o
 
+##
+# trace
+
+trace-obj-y = trace.o
+
 vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
 
 vl.o: QEMU_CFLAGS+=$(SDL_CFLAGS)
diff --git a/Makefile.target b/Makefile.target
index fda5bf3..8f7b564 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -293,6 +293,7 @@ $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y): $(GENERATED_HEADERS)
 
 obj-y += $(addprefix ../, $(common-obj-y))
 obj-y += $(addprefix ../libdis/, $(libdis-y))
+obj-y += $(addprefix ../, $(trace-obj-y))
 obj-y += $(libobj-y)
 obj-y += $(addprefix $(HWDIR)/, $(hw-obj-y))
 
diff --git a/configure b/configure
index 3cd2c5f..e94e113 100755
--- a/configure
+++ b/configure
@@ -299,6 +299,7 @@ pkgversion=""
 check_utests="no"
 user_pie="no"
 zero_malloc=""
+trace_backend="nop"
 
 # OS specific
 if check_define __linux__ ; then
@@ -494,6 +495,8 @@ for opt do
   ;;
   --target-list=*) target_list="$optarg"
   ;;
+  -

[Qemu-devel] [PATCH 5/7] trace: Trace qemu_malloc() and qemu_vmalloc()

2010-05-25 Thread Stefan Hajnoczi

It is often useful to instrument memory management functions in order to
find leaks or performance problems.  This patch adds trace events for
the memory allocation primitives.
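
To illustrate how such events can be used to find leaks, here is a small
Python sketch that replays allocation events and reports pointers that
were never freed.  It assumes the trace has already been decoded into
(name, args) tuples with the arguments in the order given in trace-events;
that representation and the function name find_leaks are hypothetical and
not part of this series.

```python
def find_leaks(events):
    """Return {ptr: size} for allocations that were never freed.

    `events` is an iterable of (name, args) tuples, a hypothetical decoded
    form of the trace; args follow the order declared in trace-events.
    """
    live = {}
    for name, args in events:
        if name == "qemu_malloc":
            size, ptr = args
            live[ptr] = size
        elif name == "qemu_realloc":
            old_ptr, size, new_ptr = args
            live.pop(old_ptr, None)   # the old block is gone (or was NULL)
            live[new_ptr] = size
        elif name == "qemu_free":
            (ptr,) = args
            live.pop(ptr, None)       # freeing an untracked pointer is ignored
    return live


if __name__ == "__main__":
    trace = [
        ("qemu_malloc", (16, 0x1000)),
        ("qemu_malloc", (32, 0x2000)),
        ("qemu_realloc", (0x2000, 64, 0x3000)),
        ("qemu_free", (0x1000,)),
    ]
    for ptr, size in find_leaks(trace).items():
        print("leaked %d bytes at 0x%x" % (size, ptr))
```

Recording the returned pointer (the v2 change below) is what makes this
pairing possible: without it, a free cannot be matched to its allocation.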

Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Record pointer result from allocation functions

 osdep.c       |   24 ++++++++++++++++++------
 qemu-malloc.c |   12 ++++++++++--
 trace-events  |   10 ++++++++++
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/osdep.c b/osdep.c
index abbc8a2..a6b7726 100644
--- a/osdep.c
+++ b/osdep.c
@@ -50,6 +50,7 @@
 #endif
 
 #include "qemu-common.h"
+#include "trace.h"
 #include "sysemu.h"
 #include "qemu_socket.h"
 
@@ -71,25 +72,34 @@ static void *oom_check(void *ptr)
 #if defined(_WIN32)
 void *qemu_memalign(size_t alignment, size_t size)
 {
+void *ptr;
+
 if (!size) {
 abort();
 }
-return oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+ptr = oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+trace_qemu_memalign(alignment, size, ptr);
+return ptr;
 }
 
 void *qemu_vmalloc(size_t size)
 {
+void *ptr;
+
 /* FIXME: this is not exactly optimal solution since VirtualAlloc
has 64Kb granularity, but at least it guarantees us that the
memory is page aligned. */
 if (!size) {
 abort();
 }
-return oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+ptr = oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+trace_qemu_vmalloc(size, ptr);
+return ptr;
 }
 
 void qemu_vfree(void *ptr)
 {
+trace_qemu_vfree(ptr);
 VirtualFree(ptr, 0, MEM_RELEASE);
 }
 
@@ -97,21 +107,22 @@ void qemu_vfree(void *ptr)
 
 void *qemu_memalign(size_t alignment, size_t size)
 {
+void *ptr;
 #if defined(_POSIX_C_SOURCE) && !defined(__sun__)
 int ret;
-void *ptr;
 ret = posix_memalign(&ptr, alignment, size);
 if (ret != 0) {
 fprintf(stderr, "Failed to allocate %zu B: %s\n",
 size, strerror(ret));
 abort();
 }
-return ptr;
 #elif defined(CONFIG_BSD)
-return oom_check(valloc(size));
+ptr = oom_check(valloc(size));
 #else
-return oom_check(memalign(alignment, size));
+ptr = oom_check(memalign(alignment, size));
 #endif
+trace_qemu_memalign(alignment, size, ptr);
+return ptr;
 }
 
 /* alloc shared memory pages */
@@ -122,6 +133,7 @@ void *qemu_vmalloc(size_t size)
 
 void qemu_vfree(void *ptr)
 {
+trace_qemu_vfree(ptr);
 free(ptr);
 }
 
diff --git a/qemu-malloc.c b/qemu-malloc.c
index 6cdc5de..72de60a 100644
--- a/qemu-malloc.c
+++ b/qemu-malloc.c
@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 #include "qemu-common.h"
+#include "trace.h"
 #include 
 
 static void *oom_check(void *ptr)
@@ -39,6 +40,7 @@ void *get_mmap_addr(unsigned long size)
 
 void qemu_free(void *ptr)
 {
+trace_qemu_free(ptr);
 free(ptr);
 }
 
@@ -53,18 +55,24 @@ static int allow_zero_malloc(void)
 
 void *qemu_malloc(size_t size)
 {
+void *ptr;
 if (!size && !allow_zero_malloc()) {
 abort();
 }
-return oom_check(malloc(size ? size : 1));
+ptr = oom_check(malloc(size ? size : 1));
+trace_qemu_malloc(size, ptr);
+return ptr;
 }
 
 void *qemu_realloc(void *ptr, size_t size)
 {
+void *newptr;
 if (!size && !allow_zero_malloc()) {
 abort();
 }
-return oom_check(realloc(ptr, size ? size : 1));
+newptr = oom_check(realloc(ptr, size ? size : 1));
+trace_qemu_realloc(ptr, size, newptr);
+return newptr;
 }
 
 void *qemu_mallocz(size_t size)
diff --git a/trace-events b/trace-events
index 5efaa86..3fde0c6 100644
--- a/trace-events
+++ b/trace-events
@@ -24,3 +24,13 @@
 # system may not have the necessary headers included.
 #
 # The <format-string> should be a sprintf()-compatible format string.
+
+# qemu-malloc.c
+qemu_malloc(size_t size, void *ptr) "size %zu ptr %p"
+qemu_realloc(void *ptr, size_t size, void *newptr) "ptr %p size %zu newptr %p"
+qemu_free(void *ptr) "ptr %p"
+
+# osdep.c
+qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p"
+qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
+qemu_vfree(void *ptr) "ptr %p"
-- 
1.7.1

[Qemu-devel] [PATCH 3/7] trace: Add simple built-in tracing backend

2010-05-25 Thread Stefan Hajnoczi

This patch adds a simple tracer which produces binary trace files and is
built into QEMU.  The main purpose of this patch is to show how new
tracing backends can be added to tracetool.

To try out the simple backend:

./configure --trace-backend=simple
make

After running QEMU you can pretty-print the trace:

./simpletrace.py trace-events /tmp/trace.log
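
For readers who want to see what decoding the binary file involves, the
following Python sketch reads fixed-size records and prints them.  It
assumes each record is six host-native unsigned longs (an event number
followed by five arguments), matching the TraceRecord struct in
simpletrace.c below, so the file has to be decoded on a host with the same
ABI as the one that wrote it.  The output format and the helper names are
assumptions for illustration, not necessarily what simpletrace.py prints.

```python
import io
import struct

# One record = event number plus five arguments, all native unsigned long.
RECORD_FMT = "LLLLLL"
RECORD_LEN = struct.calcsize(RECORD_FMT)


def read_records(fobj):
    """Yield (event, x1, x2, x3, x4, x5) tuples until the file is exhausted."""
    while True:
        data = fobj.read(RECORD_LEN)
        if len(data) != RECORD_LEN:
            return  # end of file or a truncated trailing record
        yield struct.unpack(RECORD_FMT, data)


def format_record(events, rec):
    """Render one record; `events` maps event number -> (name, argnames...)."""
    event = events[rec[0]]
    fields = [event[0]]
    for argname, value in zip(event[1:], rec[1:]):
        fields.append("%s=0x%x" % (argname, value))
    return " ".join(fields)


if __name__ == "__main__":
    events = {0: ("qemu_malloc", "size", "ptr")}
    raw = struct.pack(RECORD_FMT, 0, 64, 0x1000, 0, 0, 0)
    for rec in read_records(io.BytesIO(raw)):
        print(format_record(events, rec))
```

Unused argument slots are written as zero by the trace1() to trace5()
helpers, which is why the formatter only looks at as many values as the
event has named arguments.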

Signed-off-by: Stefan Hajnoczi 
---
I intend for this tracing backend to be replaced by something based on Prerna's
work.  For now it is useful for basic tracing.

v2:
 * Make simpletrace.py parse trace-events instead of generating Python

 .gitignore |1 +
 Makefile.objs  |3 ++
 configure  |2 +-
 simpletrace.c  |   64 ++
 simpletrace.py |   53 ++
 tracetool  |   78 +--
 6 files changed, 197 insertions(+), 4 deletions(-)
 create mode 100644 simpletrace.c
 create mode 100755 simpletrace.py

diff --git a/.gitignore b/.gitignore
index 4644557..5128452 100644
--- a/.gitignore
+++ b/.gitignore
@@ -39,6 +39,7 @@ qemu-monitor.texi
 *.log
 *.pdf
 *.pg
+*.pyc
 *.toc
 *.tp
 *.vr
diff --git a/Makefile.objs b/Makefile.objs
index 20e709e..7cb40ac 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -255,6 +255,9 @@ libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o
 # trace
 
 trace-obj-y = trace.o
+ifeq ($(TRACE_BACKEND),simple)
+trace-obj-y += simpletrace.o
+endif
 
 vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
 
diff --git a/configure b/configure
index e94e113..7d2c69b 100755
--- a/configure
+++ b/configure
@@ -829,7 +829,7 @@ echo "  --enable-docs            enable documentation build"
 echo "  --disable-docs           disable documentation build"
 echo "  --disable-vhost-net      disable vhost-net acceleration support"
 echo "  --enable-vhost-net       enable vhost-net acceleration support"
-echo "  --trace-backend=B        Trace backend nop"
+echo "  --trace-backend=B        Trace backend nop simple"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
diff --git a/simpletrace.c b/simpletrace.c
new file mode 100644
index 000..2fec4d3
--- /dev/null
+++ b/simpletrace.c
@@ -0,0 +1,64 @@
+#include 
+#include 
+#include "trace.h"
+
+typedef struct {
+    unsigned long event;
+    unsigned long x1;
+    unsigned long x2;
+    unsigned long x3;
+    unsigned long x4;
+    unsigned long x5;
+} TraceRecord;
+
+enum {
+    TRACE_BUF_LEN = 64 * 1024 / sizeof(TraceRecord),
+};
+
+static TraceRecord trace_buf[TRACE_BUF_LEN];
+static unsigned int trace_idx;
+static FILE *trace_fp;
+
+static void trace(TraceEvent event, unsigned long x1,
+                  unsigned long x2, unsigned long x3,
+                  unsigned long x4, unsigned long x5) {
+    TraceRecord *rec = &trace_buf[trace_idx];
+    rec->event = event;
+    rec->x1 = x1;
+    rec->x2 = x2;
+    rec->x3 = x3;
+    rec->x4 = x4;
+    rec->x5 = x5;
+
+    if (++trace_idx == TRACE_BUF_LEN) {
+        trace_idx = 0;
+
+        if (!trace_fp) {
+            trace_fp = fopen("/tmp/trace.log", "w");
+        }
+        if (trace_fp) {
+            size_t result = fwrite(trace_buf, sizeof trace_buf, 1, trace_fp);
+            result = result;
+        }
+    }
+}
+
+void trace1(TraceEvent event, unsigned long x1) {
+    trace(event, x1, 0, 0, 0, 0);
+}
+
+void trace2(TraceEvent event, unsigned long x1, unsigned long x2) {
+    trace(event, x1, x2, 0, 0, 0);
+}
+
+void trace3(TraceEvent event, unsigned long x1, unsigned long x2, unsigned long x3) {
+    trace(event, x1, x2, x3, 0, 0);
+}
+
+void trace4(TraceEvent event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4) {
+    trace(event, x1, x2, x3, x4, 0);
+}
+
+void trace5(TraceEvent event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4, unsigned long x5) {
+    trace(event, x1, x2, x3, x4, x5);
+}
diff --git a/simpletrace.py b/simpletrace.py
new file mode 100755
index 000..d6631ba
--- /dev/null
+++ b/simpletrace.py
@@ -0,0 +1,53 @@
+#!/usr/bin/env python
+import sys
+import struct
+import re
+
+trace_fmt = 'LLLLLL'
+trace_len = struct.calcsize(trace_fmt)
+event_re  = re.compile(r'(disable\s+)?([a-zA-Z0-9_]+)\(([^)]*)\)\s+"([^"]*)"')
+
+def parse_events(fobj):
+    def get_argnames(args):
+        return tuple(arg.split()[-1].lstrip('*') for arg in args.split(','))
+
+    events = {}
+    event_num = 0
+    for line in fobj:
+        m = event_re.match(line.strip())
+        if m is None:
+            continue
+
+        disable, name, args, fmt = m.groups()
+        if disable:
+            continue
+
+        events[event_num] = (name,) + get_argnames(args)
+        event_num += 1
+    return events
+
+def read_record(fobj):
+    s = fobj.read(trace_len)
+    if len(s) != trace_len:
+        return None
+    return struct.unpack(trace_fmt, s)
+
+def format_record(events, rec):
+    event = events[rec[0]]
+    fields = [event[0]]
+

[Qemu-devel] [PATCH 4/7] trace: Add LTTng Userspace Tracer backend

2010-05-25 Thread Stefan Hajnoczi

This patch adds LTTng Userspace Tracer (UST) backend support.  The UST
system requires no kernel support but libust and liburcu must be
installed.

$ ./configure --trace-backend=ust
$ make

Start the UST daemon:
$ ustd &

List available tracepoints and enable some:
$ ustctl --list-markers $(pgrep qemu)
[...]
{PID: 5458, channel/marker: ust/paio_submit, state: 0, fmt: "acb %p
opaque %p sector_num %lu nb_sectors %lu type %lu" 0x4b32ba}
$ ustctl --enable-marker "ust/paio_submit" $(pgrep qemu)

Run the trace:
$ ustctl --create-trace $(pgrep qemu)
$ ustctl --start-trace $(pgrep qemu)
[...]
$ ustctl --stop-trace $(pgrep qemu)
$ ustctl --destroy-trace $(pgrep qemu)

Trace results can be viewed using lttv-gui.

More information about UST:
http://lttng.org/ust

Signed-off-by: Stefan Hajnoczi 
---
 configure |5 +++-
 tracetool |   77 +++-
 2 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 7d2c69b..675d0fc 100755
--- a/configure
+++ b/configure
@@ -829,7 +829,7 @@ echo "  --enable-docs            enable documentation build"
 echo "  --disable-docs           disable documentation build"
 echo "  --disable-vhost-net      disable vhost-net acceleration support"
 echo "  --enable-vhost-net       enable vhost-net acceleration support"
-echo "  --trace-backend=B        Trace backend nop simple"
+echo "  --trace-backend=B        Trace backend nop simple ust"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2302,6 +2302,9 @@ bsd)
 esac
 
 echo "TRACE_BACKEND=$trace_backend" >> $config_host_mak
+if test "$trace_backend" = "ust"; then
+  LIBS="-lust $LIBS"
+fi
 
 tools=
 if test `expr "$target_list" : ".*softmmu.*"` != 0 ; then
diff --git a/tracetool b/tracetool
index f094ddc..9ea9c08 100755
--- a/tracetool
+++ b/tracetool
@@ -3,12 +3,13 @@
 usage()
 {
 cat >&2 <"
+}
+
+linetoh_ust()
+{
+local name args argnames
+name=$(get_name "$1")
+args=$(get_args "$1")
+argnames=$(get_argnames "$1")
+
+cat <
+#include "trace.h"
+EOF
+}
+
+linetoc_ust()
+{
+local name args argnames fmt
+name=$(get_name "$1")
+args=$(get_args "$1")
+argnames=$(get_argnames "$1")
+fmt=$(get_fmt "$1")
+
+cat <

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Kevin Wolf

Am 23.05.2010 14:01, schrieb Avi Kivity:
> On 05/21/2010 12:29 AM, Anthony Liguori wrote:
>>
>> I'd be more interested in enabling people to build these types of 
>> storage systems without touching qemu.
>>
>> Both sheepdog and ceph ultimately transmit I/O over a socket to a 
>> central daemon, right? 
> 
> That incurs an extra copy.
> 
>> So could we not standardize a protocol for this that both sheepdog and 
>> ceph could implement?
> 
> The protocol already exists, nbd.  It doesn't support snapshotting etc. 
> but we could extend it.
> 
> But IMO what's needed is a plugin API for the block layer.

What would it buy us, apart from more downstreams and having to maintain
a stable API and ABI? Hiding block drivers somewhere else doesn't make
them stop existing, they just might not be properly integrated, but
rather hacked in to fit that limited stable API.

Kevin

Re: [Qemu-devel] Re: irq problems after live migration with 0.12.4

2010-05-25 Thread Peter Lieven


Michael Tokarev wrote:

23.05.2010 13:55, Peter Lieven wrote:

Hi,

after live migrating ubuntu 9.10 server (2.6.31-14-server) and suse 
linux 10.1 (2.6.16.13-4-smp)
it happens sometimes that the guest runs into irq problems. i mention 
these 2 guest OSes
since i have seen the error there. there are likely others around 
with the same problem.


on the host i run 2.6.33.3 (kernel+mod) and qemu-kvm 0.12.4.

i started a vm with:
/usr/bin/qemu-kvm-0.12.4  -net 
tap,vlan=141,script=no,downscript=no,ifname=tap0 -net 
nic,vlan=141,model=e1000,macaddr=52:54:00:ff:00:72   -drive 
file=/dev/sdb,if=ide,boot=on,cache=none,aio=native  -m 1024 -cpu 
qemu64,model_id='Intel(R) Xeon(R) CPU   E5430  @ 2.66GHz'  
-monitor tcp:0:4001,server,nowait -vnc :1 -name 
'migration-test-9-10'  -boot order=dc,menu=on  -k de  -incoming 
tcp:172.21.55.22:5001  -pidfile /var/run/qemu/vm-155.pid  -mem-path 
/hugepages -mem-prealloc  -rtc base=utc,clock=host -usb -usbdevice 
tablet


for testing i have a clean ubuntu 9.10 server 64-bit install and 
created a small script which fetches a dvd iso from a local server and 
checks its md5sum in an endless loop.


the download performance is approx. 50MB/s on that vm.

to trigger the error i did several migrations of the vm throughout 
the last days. finally I ended up in the following oops in the guest:


[64442.298521] irq 10: nobody cared (try booting with the "irqpoll" 
option)
[64442.299175] Pid: 0, comm: swapper Not tainted 2.6.31-14-server 
#48-Ubuntu

[64442.299179] Call Trace:
[64442.299185]   [] __report_bad_irq+0x26/0xa0
[64442.299227]  [] note_interrupt+0x18c/0x1d0
[64442.299232]  [] handle_fasteoi_irq+0xd5/0x100
[64442.299244]  [] handle_irq+0x1d/0x30
[64442.299246]  [] do_IRQ+0x67/0xe0
[64442.299249]  [] ret_from_intr+0x0/0x11
[64442.299266]  [] ? handle_IRQ_event+0x24/0x160
[64442.299269]  [] ? handle_edge_irq+0xcf/0x170
[64442.299271]  [] ? handle_irq+0x1d/0x30
[64442.299273]  [] ? do_IRQ+0x67/0xe0
[64442.299275]  [] ? ret_from_intr+0x0/0x11
[64442.299290]  [] ? _spin_unlock_irqrestore+0x14/0x20
[64442.299302]  [] ? scsi_dispatch_cmd+0x16c/0x2d0
[64442.299307]  [] ? scsi_request_fn+0x3aa/0x500
[64442.299322]  [] ? __blk_run_queue+0x6c/0x150
[64442.299324]  [] ? blk_run_queue+0x2b/0x50
[64442.299327]  [] ? scsi_run_queue+0xcf/0x2a0
[64442.299336]  [] ? scsi_next_command+0x3d/0x60
[64442.299338]  [] ? scsi_end_request+0xab/0xb0
[64442.299340]  [] ? scsi_io_completion+0x9e/0x4d0
[64442.299348]  [] ? default_spin_lock_flags+0x9/0x10
[64442.299351]  [] ? scsi_finish_command+0xbd/0x130
[64442.299353]  [] ? scsi_softirq_done+0x145/0x170
[64442.299356]  [] ? blk_done_softirq+0x7d/0x90
[64442.299368]  [] ? __do_softirq+0xbd/0x200
[64442.299370]  [] ? call_softirq+0x1c/0x30
[64442.299372]  [] ? do_softirq+0x55/0x90
[64442.299374]  [] ? irq_exit+0x85/0x90
[64442.299376]  [] ? do_IRQ+0x70/0xe0
[64442.299379]  [] ? ret_from_intr+0x0/0x11
[64442.299380]   [] ? native_safe_halt+0x6/0x10
[64442.299390]  [] ? default_idle+0x4c/0xe0
[64442.299395]  [] ? 
atomic_notifier_call_chain+0x15/0x20

[64442.299398]  [] ? cpu_idle+0xb2/0x100
[64442.299406]  [] ? rest_init+0x66/0x70
[64442.299424]  [] ? start_kernel+0x352/0x35b
[64442.299427]  [] ? 
x86_64_start_reservations+0x125/0x129

[64442.299429]  [] ? x86_64_start_kernel+0xfa/0x109
[64442.299433] handlers:
[64442.299840] [] (e1000_intr+0x0/0x190 [e1000])
[64442.300046] Disabling IRQ #10


See also LP bug #584131 (https://bugs.launchpad.net/bugs/584131)
and original Debian bug#580649 (http://bugs.debian.org/580649)

Not sure if they're related...

/mjt

michael, do you have any ideas what i could do to debug what's happening?
looking at launchpad and the debian bug tracker i found other bugs
with a possibly related problem, so this issue might be more widespread...

thanks
peter

Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET

2010-05-25 Thread Paul Brook

> > This is wrong. The hpet device should expose this as an IO pin.
> 
> Will look into this.
> 
> BTW, I just realized that the GPIO handling is apparently lacking
> support for attaching an output to multiple inputs. Or am I missing
> something?

Use an explicit mux.

Incidentally I suspect your handling of the ISA IRQs is broken. You may never 
have more than one source connected to a sink.  Shared IRQ lines must be done 
explicitly.

Paul

Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET

2010-05-25 Thread Jan Kiszka

Paul Brook wrote:
>>> This is wrong. The hpet device should expose this as an IO pin.
>> Will look into this.
>>
>> BTW, I just realized that the GPIO handling is apparently lacking
>> support for attaching an output to multiple inputs. Or am I missing
>> something?
> 
> Use an explicit mux.
> 
> Incidentally I suspect your handling of the ISA IRQs is broken. You may never 
> have more than one source connected to a sink.  Shared IRQ lines must be done 
> explicitly.

No, the other way around: one source (RTC) multiple sinks (HPET, ACPI).
Will probably draft a generic irq/gpio mux.
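
To make the "one source, multiple sinks" idea concrete, here is a toy
Python model of such a fan-out element.  It is only a sketch of the
concept: the class name IRQSplitter and its set_level method are invented
for illustration and do not correspond to QEMU's qemu_irq API or to any
code in this patch series.

```python
class IRQSplitter:
    """Fan one interrupt output out to several sinks (illustrative model only)."""

    def __init__(self, sinks):
        # Each sink is a callable taking the new line level (0 or 1).
        self.sinks = list(sinks)

    def set_level(self, level):
        # Every level change on the single input is forwarded to all sinks.
        for sink in self.sinks:
            sink(level)


if __name__ == "__main__":
    hpet_seen, acpi_seen = [], []
    rtc_out = IRQSplitter([hpet_seen.append, acpi_seen.append])
    rtc_out.set_level(1)
    rtc_out.set_level(0)
    print(hpet_seen, acpi_seen)
```

The source still drives exactly one line; the splitter is the explicit
element that duplicates it, so no output is ever wired to two inputs
directly.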

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET

2010-05-25 Thread Paul Brook

> Paul Brook wrote:
> >>> This is wrong. The hpet device should expose this as an IO pin.
> >> 
> >> Will look into this.
> >> 
> >> BTW, I just realized that the GPIO handling is apparently lacking
> >> support for attaching an output to multiple inputs. Or am I missing
> >> something?
> > 
> > Use an explicit mux.
> > 
> > Incidentally I suspect your handling of the ISA IRQs is broken. You may
> > never have more than one source connected to a sink.  Shared IRQ lines
> > must be done explicitly.
> 
> No, the other way around: one source (RTC) multiple sinks (HPET, ACPI).
> Will probably draft a generic irq/gpio mux.

I realise that. However I'd expect things to break if the guest OS decides to 
share an IRQ line between the HPET and some other device.

Paul

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/25/2010 02:02 PM, Kevin Wolf wrote:
>>> So could we not standardize a protocol for this that both sheepdog and
>>> ceph could implement?
>>
>> The protocol already exists, nbd.  It doesn't support snapshotting etc.
>> but we could extend it.
>>
>> But IMO what's needed is a plugin API for the block layer.
>
> What would it buy us, apart from more downstreams and having to maintain
> a stable API and ABI?

Currently if someone wants to add a new block format, they have to 
upstream it and wait for a new qemu to be released.  With a plugin API, 
they can add a new block format to an existing, supported qemu.

> Hiding block drivers somewhere else doesn't make
> them stop existing, they just might not be properly integrated, but
> rather hacked in to fit that limited stable API.

They would hack it to fit the current API, and hack the API in qemu.git 
to fit their requirements for the next release.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Re: [RFC PATCH] AMD IOMMU emulation

2010-05-25 Thread Eduard - Gabriel Munteanu

On Tue, May 25, 2010 at 10:39:22AM +0200, Joerg Roedel wrote:
> On Mon, May 24, 2010 at 08:10:16PM +, Blue Swirl wrote:
> > On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel  wrote:
> > >> +
> > >> +#define MMIO_SIZE               0x2028
> > >
> > > This size should be a power-of-two value. In this case probably 0x4000.
> > 
> > Not really, the devices can reserve regions of any size. There were
> > some implementation deficiencies in earlier versions of QEMU, where
> > the whole page would be reserved anyway, but this limitation has been
> > removed long time ago.
> 
> The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux
> driver maps the MMIO region with this size. So the emulation should
> reserve this amount of MMIO space too.
> 
>   Joerg

Yeah, I'll change that, since I already reserve 0x4000 bytes in SeaBIOS
for it (I did that to deal with the 16 KiB alignment requirement).
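
As a quick check of the arithmetic: rounding 0x2028 up to the next power
of two gives 0x4000, which is 16 KiB, so the size and the alignment
requirement line up.  A small Python sketch of that calculation (the
helper name round_up_pow2 is just for illustration):

```python
def round_up_pow2(n):
    """Smallest power of two that is >= n (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    size = round_up_pow2(0x2028)
    print(hex(size), size // 1024, "KiB")
```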


Eduard

Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET

2010-05-25 Thread Jan Kiszka

Paul Brook wrote:
>> Paul Brook wrote:
>>>>> This is wrong. The hpet device should expose this as an IO pin.
>>>> Will look into this.
>>>>
>>>> BTW, I just realized that the GPIO handling is apparently lacking
>>>> support for attaching an output to multiple inputs. Or am I missing
>>>> something?
>>> Use an explicit mux.
>>>
>>> Incidentally I suspect your handling of the ISA IRQs is broken. You may
>>> never have more than one source connected to a sink.  Shared IRQ lines
>>> must be done explicitly.
>> No, the other way around: one source (RTC) multiple sinks (HPET, ACPI).
>> Will probably draft a generic irq/gpio mux.
> 
> I realise that. However I'd expect things to break if the guest OS decides to 
> share an IRQ line between the HPET and some other device.

The guest would share IRQ8, not the RTC output. So there would be no
difference to the current situation.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] [PATCH] sparc64: clean up pci bridge map

2010-05-25 Thread Igor V. Kovalenko

From: Igor V. Kovalenko 

- remove unused host state and store pci bus pointer only
- do not map host state access into unused 1fe.1000 range
- reorder pci region registration
- assign pci i/o region to isa_mem_base
- rename default machine (it's Ultrasparc IIi now)

Signed-off-by: Igor V. Kovalenko 
---
 hw/apb_pci.c |   49 ++---
 hw/sun4u.c   |6 +++---
 2 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index 65d8ba6..b53e3c3 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -65,7 +65,7 @@ do { printf("APB: " fmt , ## __VA_ARGS__); } while (0)
 
 typedef struct APBState {
 SysBusDevice busdev;
-PCIHostState host_state;
+PCIBus  *bus;
 ReadWriteHandler pci_config_handler;
 uint32_t iommu[4];
 uint32_t pci_control[16];
@@ -191,7 +191,7 @@ static void apb_pci_config_write(ReadWriteHandler *h, 
pcibus_t addr,
 
 val = qemu_bswap_len(val, size);
 APB_DPRINTF("%s: addr " TARGET_FMT_lx " val %x\n", __func__, addr, val);
-pci_data_write(s->host_state.bus, addr, val, size);
+pci_data_write(s->bus, addr, val, size);
 }
 
 static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr,
@@ -200,7 +200,7 @@ static uint32_t apb_pci_config_read(ReadWriteHandler *h, 
pcibus_t addr,
 uint32_t ret;
 APBState *s = container_of(h, APBState, pci_config_handler);
 
-ret = pci_data_read(s->host_state.bus, addr, size);
+ret = pci_data_read(s->bus, addr, size);
 ret = qemu_bswap_len(ret, size);
 APB_DPRINTF("%s: addr " TARGET_FMT_lx " -> %x\n", __func__, addr, ret);
 return ret;
@@ -331,37 +331,37 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
 s = sysbus_from_qdev(dev);
 /* apb_config */
 sysbus_mmio_map(s, 0, special_base);
+/* PCI configuration space */
+sysbus_mmio_map(s, 1, special_base + 0x100ULL);
 /* pci_ioport */
-sysbus_mmio_map(s, 1, special_base + 0x200ULL);
-/* pci_config */
-sysbus_mmio_map(s, 2, special_base + 0x100ULL);
-/* mem_data */
-sysbus_mmio_map(s, 3, mem_base);
+sysbus_mmio_map(s, 2, special_base + 0x200ULL);
 d = FROM_SYSBUS(APBState, s);
-d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci",
+
+d->bus = pci_register_bus(&d->busdev.qdev, "pci",
  pci_apb_set_irq, pci_pbm_map_irq, d,
  0, 32);
-pci_bus_set_mem_base(d->host_state.bus, mem_base);
+pci_bus_set_mem_base(d->bus, mem_base);
 
 for (i = 0; i < 32; i++) {
 sysbus_connect_irq(s, i, pic[i]);
 }
 
-pci_create_simple(d->host_state.bus, 0, "pbm");
+pci_create_simple(d->bus, 0, "pbm");
+
 /* APB secondary busses */
-*bus2 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 0),
+*bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0),
 PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
 pci_apb_map_irq,
 "Advanced PCI Bus secondary bridge 1");
 apb_pci_bridge_init(*bus2);
 
-*bus3 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 1),
+*bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1),
 PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
 pci_apb_map_irq,
 "Advanced PCI Bus secondary bridge 2");
 apb_pci_bridge_init(*bus3);
 
-return d->host_state.bus;
+return d->bus;
 }
 
 static void pci_pbm_reset(DeviceState *d)
@@ -382,7 +382,7 @@ static void pci_pbm_reset(DeviceState *d)
 static int pci_pbm_init_device(SysBusDevice *dev)
 {
 APBState *s;
-int pci_mem_data, apb_config, pci_ioport, pci_config;
+int pci_config, apb_config, pci_ioport;
 unsigned int i;
 
 s = FROM_SYSBUS(APBState, dev);
@@ -396,20 +396,23 @@ static int pci_pbm_init_device(SysBusDevice *dev)
 /* apb_config */
 apb_config = cpu_register_io_memory(apb_config_read,
 apb_config_write, s);
+/* at region 0 */
 sysbus_init_mmio(dev, 0x1ULL, apb_config);
-/* pci_ioport */
-pci_ioport = cpu_register_io_memory(pci_apb_ioread,
-  pci_apb_iowrite, s);
-sysbus_init_mmio(dev, 0x1ULL, pci_ioport);
-/* pci_config */
+
+/* PCI configuration space */
 s->pci_config_handler.read = apb_pci_config_read;
 s->pci_config_handler.write = apb_pci_config_write;
 pci_config = cpu_register_io_memory_simple(&s->pci_config_handler);
 assert(pci_config >= 0);
+/* at region 1 */
 sysbus_init_mmio(dev, 0x100ULL, pci_config);
-/* mem_data */
-pci_mem_data = pci_host_data_register_mmio(&s->host_state, 1);
-sysbus_init_mmio(dev, 0x1000ULL, pci_mem_data);
+
+/* pci_ioport */
+pci_ioport = cpu_register_io_memory(pci_apb_ioread,
+pci_apb_iowrite, s);
+

Re: [Qemu-devel] [PATCH 3/3] Samples to add a tracepoint.

2010-05-25 Thread Stefan Hajnoczi

@@ -87,6 +91,8 @@ static void virtio_blk_rw_complete(void
 {
 VirtIOBlockReq *req = opaque;

+trace_virtio_blk_rw_complete(req, ret);
+
 if (ret) {
 int is_read = !(req->out->type & VIRTIO_BLK_T_OUT);
 if (virtio_blk_handle_rw_error(req, -ret, is_read))

What happens when CONFIG_QEMU_TRACE is not defined?  Linker error for missing
symbol trace_virtio_blk_rw_complete()?

This is handled by the tracing backends patchset I posted.  When
tracing is disabled the nop backend will make tracepoints empty inline
functions.  The compiler makes them disappear but the tracepoint
invocation is still parsed and type checked by the compiler.  It
shouldn't be hard to add your tracer as a backend.
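
A minimal sketch of what such a nop backend could look like, assuming CONFIG_QEMU_TRACE is the build switch as in the question above. The call site rw_complete() and the argument types are simplified stand-ins, not the code from either patchset:

```c
#include <assert.h>

#ifdef CONFIG_QEMU_TRACE
/* A real backend would provide the definition elsewhere. */
void trace_virtio_blk_rw_complete(void *req, int ret);
#else
/* nop backend: the call is still parsed and type checked, and an
 * optimizing compiler can drop it because the body is empty. */
static inline void trace_virtio_blk_rw_complete(void *req, int ret)
{
    (void)req;
    (void)ret;
}
#endif

/* Simplified call site, loosely modeled on the hunk quoted above. */
static int rw_complete(void *req, int ret)
{
    trace_virtio_blk_rw_complete(req, ret);
    return ret ? -1 : 0;
}
```

Because the empty function is defined in the header-like part, there is no missing symbol at link time when tracing is disabled.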

Stefan

Re: [Qemu-devel] [PATCH 2/3] Tracepoint, buffer & monitor framework

2010-05-25 Thread Stefan Hajnoczi

+#define DEFINE_TRACE(name, tproto, tassign, tprint)\
+   void trace_##name(tproto)   \
+   {   \
+ unsigned int hash;\
+ char tpname[] = __stringify(name);\
+ struct tracepoint *tp;\
+ struct __trace_struct_##name var, *entry; \
+   \
+ hash = tdb_hash(tpname);  \
+ tp = find_tracepoint_by_name(tpname); \
+ if (tp == NULL || !tp->state) \
+   return; \
+   \
+ entry = &var; \
+ tassign   \
+   \
+ write_trace_to_buffer(&qemu_buf, hash, tp->trace_id,  \
+   entry, sizeof(struct __trace_struct_##name));   \
+   }   \

I think this is too much work.  Let each tracepoint have its own global struct
tracepoint so it can directly reference it using tracepoint_##name - no hash
lookup needed.  Add the QLIST_ENTRY directly to struct tracepoint so the
tracepoint register/unregister code can assign ids and look up tracepoints by
name.  No critical path code needs to do name lookups and the hash table can
disappear.
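
Roughly what I have in mind, as a sketch only. The struct layout, the plain next pointer (standing in for QLIST_ENTRY), the hits counter (standing in for the buffer write) and the argument-less trace function are all simplifications invented for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; field names are illustrative only. */
struct tracepoint {
    const char *name;
    int state;                  /* nonzero when enabled */
    unsigned int trace_id;
    struct tracepoint *next;    /* stand-in for QLIST_ENTRY */
};

static struct tracepoint *tracepoint_list;
static unsigned int next_trace_id;
static unsigned int hits;       /* stand-in for write_trace_to_buffer() */

/* Slow path: assign an id and link the tracepoint into the list. */
static void tracepoint_register(struct tracepoint *tp)
{
    tp->trace_id = next_trace_id++;
    tp->next = tracepoint_list;
    tracepoint_list = tp;
}

/* Slow path: name lookup, only needed by monitor commands. */
static struct tracepoint *tracepoint_find(const char *name)
{
    struct tracepoint *tp;

    for (tp = tracepoint_list; tp != NULL; tp = tp->next) {
        if (strcmp(tp->name, name) == 0) {
            return tp;
        }
    }
    return NULL;
}

/* One global object per tracepoint; the fast path touches it directly. */
#define DEFINE_TRACE(tpname)                                          \
    static struct tracepoint tracepoint_##tpname = { #tpname, 0, 0, NULL }; \
    static void trace_##tpname(void)                                  \
    {                                                                 \
        if (!tracepoint_##tpname.state) {                             \
            return;                                                   \
        }                                                             \
        hits++;                                                       \
    }

DEFINE_TRACE(virtio_blk_rw_complete)
```

The trace function costs one load and one branch when the tracepoint is disabled. String comparison only happens in tracepoint_find(), which is off the critical path.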

+#define DECLARE_TRACE(name, tproto, tstruct)   \
+   struct __trace_struct_##name {  \
+   tstruct \
+   };  \

Should this struct be packed so more fields can fit?

+trace_queue->trace_buffer[tmp].metadata.write_complete = 0;

This is not guaranteed to work without memory barriers.  There is no way for
the trace consumer to block until there is more data available.  The
synchronization needs to consider writing traces to a file, which has different
constraints than dumping the current contents of the trace buffer.
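
To illustrate the ordering problem, here is a sketch using C11 atomics. That is an assumption made for the example, since the patch itself uses a plain field. The struct and function names are hypothetical, and the sketch only covers the first publication of a slot; reusing a slot in a ring needs more protocol (for instance a sequence counter), and blocking the consumer is not addressed at all:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical record; write_complete must start out as 0. */
struct trace_record {
    uint64_t payload;
    atomic_int write_complete;
};

/* Producer: fill in the payload, then publish it with release ordering
 * so the payload store cannot be reordered after the flag store. */
static void record_publish(struct trace_record *rec, uint64_t value)
{
    rec->payload = value;
    atomic_store_explicit(&rec->write_complete, 1, memory_order_release);
}

/* Consumer: the acquire load pairs with the release store above, so a
 * reader that sees the flag set also sees the complete payload. */
static int record_try_read(struct trace_record *rec, uint64_t *out)
{
    if (!atomic_load_explicit(&rec->write_complete, memory_order_acquire)) {
        return 0;
    }
    *out = rec->payload;
    return 1;
}
```

Without the release/acquire pair, the compiler or the CPU may make the flag visible before the payload, and the consumer can read a half-written record.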

We're missing a way to trace to a file.  That could be done in binary or text.
It would be easier in text because we already have the format strings and don't
need a unique ID mapping in an external binary parsing tool.

Making data available after crash is also useful.  The easiest way is to dump
the trace buffer from the core dump using gdb.  However, we'd need some way of
making sense of the bytes.  That could be done by reading the tracepoint_lib
structures from the core dump.

(The way I do trace recovery from a core dump in my simple tracer is to binary
dump the trace buffer from the core dump.  Since the trace buffer contents are
normally written out to file unchanged anyway, the simpletrace.py script can
read the dumped trace buffer like a normal trace file.)

Nitpicks:

Some added lines of code use tabs for indentation; 4-space indentation should
be used.

+{
+.name   = "tracepoint",
+.args_type  = "",
+.params = "",
+.help   = "show contents of trace buffer",

Copy-pasted, .help not updated.

@@ -145,6 +147,10 @@ struct Monitor {
 #ifdef CONFIG_DEBUG_MONITOR
 int print_calls_nr;
 #endif
+#ifdef CONFIG_QEMU_TRACE
+struct DebugBuffer *qemu_buf_ptr;
+#endif
+
 QError *error;
 QLIST_HEAD(,mon_fd_t) fds;
 QLIST_ENTRY(Monitor) entry;

Would TraceBuffer be a more appropriate name for DebugBuffer?  qemu_buf_ptr is
vague, perhaps trace_buf is more clear?

I'm not sure I understand the reason for qemu_buf_ptr.  There is already a
global qemu_buf and qemu_buf_ptr is a pointer to that?

+if(!strncmp(tp_state, "on", 3))
[...]
+   if(!strncmp(tp_state, "off", 4))

"on" with 3 and "off" with 4 are equivalent to strcmp().  "on" with 2 and
"off" with 3 would allow for any suffix after the matched string.

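A small sketch of the difference, using only standard strncmp(). The helper names is_on_exact and is_on_prefix are made up for the example:

```c
#include <assert.h>
#include <string.h>

/* n = strlen("on") + 1 includes the terminating NUL in the comparison,
 * so this accepts exactly "on", the same as strcmp(s, "on") == 0. */
static int is_on_exact(const char *s)
{
    return strncmp(s, "on", 3) == 0;
}

/* n = strlen("on") compares only the first two characters,
 * so any string starting with "on" is accepted. */
static int is_on_prefix(const char *s)
{
    return strncmp(s, "on", 2) == 0;
}
```

Here is_on_exact("only") is false while is_on_prefix("only") is true, so the length argument decides whether trailing characters are tolerated.
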
+#else /* CONFIG_QEMU_TRACE */
+static void do_tracepoint_status(Monitor *mon, const QDict *qdict)
+{
+monitor_printf(mon, "Internal tracing not compiled\n");
+}
+#endif

"tracepoint" has this !CONFIG_QEMU_TRACE function but "trace" doesn't.

+#define INCREMENT_INDEX(HEAD,IDX) (HEAD->IDX++) % HEAD->buf_size
[...]
+if ((trace_queue->last + 1) % trace_queue->buf_size
+== trace_queue->first)
+   trace_queue->first = INCREMENT_INDEX(trace_queue, first);
+trace_queue->last  = INCREMENT_INDEX(trace_queue, last);

Slightly safer macro:
#define NEXT_INDEX(HEAD,IDX) (((HEAD)->IDX + 1) % (HEAD)->buf_size)
[...]
if (NEXT_INDEX(trace_queue, last) == trace_queue->first)
trace_queue->first = NEXT_INDEX(trace_queue, first);
trace_queue->last  = NEXT_INDEX(trace_queue, last);
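
For reference, the same logic as a compilable sketch. The struct is cut down to the three fields used here and queue_advance() is a name invented for the example. As far as I can tell the original macro is worse than just unparenthesized: trace_queue->first = (trace_queue->first++) % size modifies the same object twice without an intervening sequence point.

```c
#include <assert.h>

/* Pure expression: no side effects, arguments and result parenthesized. */
#define NEXT_INDEX(HEAD, IDX) (((HEAD)->IDX + 1) % (HEAD)->buf_size)

/* Reduced stand-in for the trace queue in the patch. */
struct trace_queue {
    unsigned int first;
    unsigned int last;
    unsigned int buf_size;
};

/* Advance 'last'; when the ring is full, drop the oldest entry
 * by moving 'first' forward as well. */
static void queue_advance(struct trace_queue *q)
{
    if (NEXT_INDEX(q, last) == q->first) {
        q->first = NEXT_INDEX(q, first);
    }
    q->last = NEXT_INDEX(q, last);
}
```

With buf_size 4 and both indices at 0, the fourth call wraps last back to 0 and pushes first to 1.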

+tmp = trace_queue->last;
Instead of using tmp:
D

Re: [Qemu-devel] [RFC 0/3] Tracing framework for QEMU

2010-05-25 Thread Stefan Hajnoczi

Interesting to see your patches, tracepoint definitions/declarations
look similar to in-kernel tracepoints :).

Please post future patches inline to the email so reviewing and
replying is easy (e.g. use git-send-email to send patches).

Stefan

Re: [Qemu-devel] [PATCH] sparc64: clean up pci bridge map

2010-05-25 Thread Artyom Tarasenko

2010/5/25 Igor V. Kovalenko :
> From: Igor V. Kovalenko 
>
> - remove unused host state and store pci bus pointer only
> - do not map host state access into unused 1fe.1000 range
> - reorder pci region registration
> - assign pci i/o region to isa_mem_base
> - rename default machine (it's Ultrasparc IIi now)

Just rename the machine or use another CPU too? While you are at it
maybe split these two?

>
> Signed-off-by: Igor V. Kovalenko 
> ---
>  hw/apb_pci.c |   49 ++---
>  hw/sun4u.c   |    6 +++---
>  2 files changed, 29 insertions(+), 26 deletions(-)
>
> diff --git a/hw/apb_pci.c b/hw/apb_pci.c
> index 65d8ba6..b53e3c3 100644
> --- a/hw/apb_pci.c
> +++ b/hw/apb_pci.c
> @@ -65,7 +65,7 @@ do { printf("APB: " fmt , ## __VA_ARGS__); } while (0)
>
>  typedef struct APBState {
>     SysBusDevice busdev;
> -    PCIHostState host_state;
> +    PCIBus      *bus;
>     ReadWriteHandler pci_config_handler;
>     uint32_t iommu[4];
>     uint32_t pci_control[16];
> @@ -191,7 +191,7 @@ static void apb_pci_config_write(ReadWriteHandler *h, 
> pcibus_t addr,
>
>     val = qemu_bswap_len(val, size);
>     APB_DPRINTF("%s: addr " TARGET_FMT_lx " val %x\n", __func__, addr, val);
> -    pci_data_write(s->host_state.bus, addr, val, size);
> +    pci_data_write(s->bus, addr, val, size);
>  }
>
>  static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr,
> @@ -200,7 +200,7 @@ static uint32_t apb_pci_config_read(ReadWriteHandler *h, 
> pcibus_t addr,
>     uint32_t ret;
>     APBState *s = container_of(h, APBState, pci_config_handler);
>
> -    ret = pci_data_read(s->host_state.bus, addr, size);
> +    ret = pci_data_read(s->bus, addr, size);
>     ret = qemu_bswap_len(ret, size);
>     APB_DPRINTF("%s: addr " TARGET_FMT_lx " -> %x\n", __func__, addr, ret);
>     return ret;
> @@ -331,37 +331,37 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
>     s = sysbus_from_qdev(dev);
>     /* apb_config */
>     sysbus_mmio_map(s, 0, special_base);
> +    /* PCI configuration space */
> +    sysbus_mmio_map(s, 1, special_base + 0x100ULL);
>     /* pci_ioport */
> -    sysbus_mmio_map(s, 1, special_base + 0x200ULL);
> -    /* pci_config */
> -    sysbus_mmio_map(s, 2, special_base + 0x100ULL);
> -    /* mem_data */
> -    sysbus_mmio_map(s, 3, mem_base);
> +    sysbus_mmio_map(s, 2, special_base + 0x200ULL);
>     d = FROM_SYSBUS(APBState, s);
> -    d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci",
> +
> +    d->bus = pci_register_bus(&d->busdev.qdev, "pci",
>                                          pci_apb_set_irq, pci_pbm_map_irq, d,
>                                          0, 32);
> -    pci_bus_set_mem_base(d->host_state.bus, mem_base);
> +    pci_bus_set_mem_base(d->bus, mem_base);
>
>     for (i = 0; i < 32; i++) {
>         sysbus_connect_irq(s, i, pic[i]);
>     }
>
> -    pci_create_simple(d->host_state.bus, 0, "pbm");
> +    pci_create_simple(d->bus, 0, "pbm");
> +
>     /* APB secondary busses */
> -    *bus2 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 0),
> +    *bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0),
>                             PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
>                             pci_apb_map_irq,
>                             "Advanced PCI Bus secondary bridge 1");
>     apb_pci_bridge_init(*bus2);
>
> -    *bus3 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 1),
> +    *bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1),
>                             PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
>                             pci_apb_map_irq,
>                             "Advanced PCI Bus secondary bridge 2");
>     apb_pci_bridge_init(*bus3);
>
> -    return d->host_state.bus;
> +    return d->bus;
>  }
>
>  static void pci_pbm_reset(DeviceState *d)
> @@ -382,7 +382,7 @@ static void pci_pbm_reset(DeviceState *d)
>  static int pci_pbm_init_device(SysBusDevice *dev)
>  {
>     APBState *s;
> -    int pci_mem_data, apb_config, pci_ioport, pci_config;
> +    int pci_config, apb_config, pci_ioport;
>     unsigned int i;
>
>     s = FROM_SYSBUS(APBState, dev);
> @@ -396,20 +396,23 @@ static int pci_pbm_init_device(SysBusDevice *dev)
>     /* apb_config */
>     apb_config = cpu_register_io_memory(apb_config_read,
>                                         apb_config_write, s);
> +    /* at region 0 */
>     sysbus_init_mmio(dev, 0x1ULL, apb_config);
> -    /* pci_ioport */
> -    pci_ioport = cpu_register_io_memory(pci_apb_ioread,
> -                                          pci_apb_iowrite, s);
> -    sysbus_init_mmio(dev, 0x1ULL, pci_ioport);
> -    /* pci_config */
> +
> +    /* PCI configuration space */
>     s->pci_config_handler.read = apb_pci_config_read;
>     s->pci_config_handler.write = apb_pci_config_write;
>     pci_config = cpu_register_io_memory_simple(&s->pci_config_handler);
>     assert(pci_config >= 0);
> +    /* at region 1 */
>     sysbus_init_mmio(dev, 0x100U

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Christoph Hellwig

On Tue, May 25, 2010 at 02:25:53PM +0300, Avi Kivity wrote:
> Currently if someone wants to add a new block format, they have to  
> upstream it and wait for a new qemu to be released.  With a plugin API,  
> they can add a new block format to an existing, supported qemu.

So?  Unless we want a stable driver ABI which I fundamentally oppose as
it would make block driver development hell they'd have to wait for
a new release of the block layer.  It's really just going to be a lot
of pain for no major gain.  qemu releases are frequent enough, and if
users care enough they can also easily patch qemu.

[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Avi Kivity


On 05/25/2010 01:24 PM, Stefan Hajnoczi wrote:

This patch adds trace events for virtqueue operations including
adding/removing buffers, notifying the guest, and receiving a notify
from the guest.

diff --git a/trace-events b/trace-events
index 48415f8..a533414 100644
--- a/trace-events
+++ b/trace-events
@@ -35,6 +35,14 @@ qemu_memalign(size_t alignment, size_t size, void *ptr) 
"alignment %zu size %zu
  qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
  qemu_vfree(void *ptr) "ptr %p"

+# hw/virtio.c
+virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
+virtqueue_flush(void *vq, unsigned int count) "vq %p count %u"
+virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "vq %p elem %p in_num %u out_num %u"
+virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p"
+virtio_irq(void *vq) "vq %p"
+virtio_notify(void *vdev, void *vq) "vdev %p vq %p"
+
   



Those %ps are more or less useless.  We need better ways of identifying 
them.


Linux uses %pTYPE to pretty print arbitrary types.  We could do 
something similar (not the same since we don't want our own printf 
implementation).


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET

2010-05-25 Thread Paul Brook

> > I realise that. However I'd expect things to break if the guest OS
> > decides to share an IRQ line between the HPET and some other device.
> 
> The guest would share IRQ8, not the RTC output. So there would be no
> difference to the current situation.

The difference is that you've removed the check that prevented overlap between 
the PIC and another device.  You should be using isa_reserve_irq/isa_init_irq 
before you use an ISA IRQ line.  Any uses of isa_bus_irqs (including the 
existing HPET code) are probably broken.

Paul

[Qemu-devel] [PATCH 0/2] sparc64: cleanups

2010-05-25 Thread Igor V. Kovalenko

- rename sun4u cpu to Ultrasparc IIi
- cleanup pci bridge map (requires openbios change)

v0->v1: split out rename of sun4u cpu to separate patch

---

Igor V. Kovalenko (2):
  sparc64: rename sun4u cpu to Ultrasparc IIi
  sparc64: clean up pci bridge map


 hw/apb_pci.c |   49 ++---
 hw/sun4u.c   |6 +++---
 2 files changed, 29 insertions(+), 26 deletions(-)

--

[Qemu-devel] [PATCH 1/2] sparc64: rename sun4u cpu to Ultrasparc IIi

2010-05-25 Thread Igor V. Kovalenko

From: Igor V. Kovalenko 

Signed-off-by: Igor V. Kovalenko 
---
 hw/sun4u.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/sun4u.c b/hw/sun4u.c
index e9a1e23..1e92900 100644
--- a/hw/sun4u.c
+++ b/hw/sun4u.c
@@ -859,7 +859,7 @@ enum {
 static const struct hwdef hwdefs[] = {
 /* Sun4u generic PC-like machine */
 {
-.default_cpu_model = "TI UltraSparc II",
+.default_cpu_model = "TI UltraSparc IIi",
 .machine_id = sun4u_id,
 .prom_addr = 0x1fff000ULL,
 .console_serial_base = 0,

[Qemu-devel] [PATCH 2/2] sparc64: clean up pci bridge map

2010-05-25 Thread Igor V. Kovalenko

From: Igor V. Kovalenko 

- remove unused host state and store pci bus pointer only
- do not map host state access into unused 1fe.1000 range
- reorder pci region registration
- assign pci i/o region to isa_mem_base

Signed-off-by: Igor V. Kovalenko 
---
 hw/apb_pci.c |   49 ++---
 hw/sun4u.c   |4 ++--
 2 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index 65d8ba6..b53e3c3 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -65,7 +65,7 @@ do { printf("APB: " fmt , ## __VA_ARGS__); } while (0)
 
 typedef struct APBState {
 SysBusDevice busdev;
-PCIHostState host_state;
+PCIBus  *bus;
 ReadWriteHandler pci_config_handler;
 uint32_t iommu[4];
 uint32_t pci_control[16];
@@ -191,7 +191,7 @@ static void apb_pci_config_write(ReadWriteHandler *h, 
pcibus_t addr,
 
 val = qemu_bswap_len(val, size);
 APB_DPRINTF("%s: addr " TARGET_FMT_lx " val %x\n", __func__, addr, val);
-pci_data_write(s->host_state.bus, addr, val, size);
+pci_data_write(s->bus, addr, val, size);
 }
 
 static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr,
@@ -200,7 +200,7 @@ static uint32_t apb_pci_config_read(ReadWriteHandler *h, 
pcibus_t addr,
 uint32_t ret;
 APBState *s = container_of(h, APBState, pci_config_handler);
 
-ret = pci_data_read(s->host_state.bus, addr, size);
+ret = pci_data_read(s->bus, addr, size);
 ret = qemu_bswap_len(ret, size);
 APB_DPRINTF("%s: addr " TARGET_FMT_lx " -> %x\n", __func__, addr, ret);
 return ret;
@@ -331,37 +331,37 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
 s = sysbus_from_qdev(dev);
 /* apb_config */
 sysbus_mmio_map(s, 0, special_base);
+/* PCI configuration space */
+sysbus_mmio_map(s, 1, special_base + 0x100ULL);
 /* pci_ioport */
-sysbus_mmio_map(s, 1, special_base + 0x200ULL);
-/* pci_config */
-sysbus_mmio_map(s, 2, special_base + 0x100ULL);
-/* mem_data */
-sysbus_mmio_map(s, 3, mem_base);
+sysbus_mmio_map(s, 2, special_base + 0x200ULL);
 d = FROM_SYSBUS(APBState, s);
-d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci",
+
+d->bus = pci_register_bus(&d->busdev.qdev, "pci",
  pci_apb_set_irq, pci_pbm_map_irq, d,
  0, 32);
-pci_bus_set_mem_base(d->host_state.bus, mem_base);
+pci_bus_set_mem_base(d->bus, mem_base);
 
 for (i = 0; i < 32; i++) {
 sysbus_connect_irq(s, i, pic[i]);
 }
 
-pci_create_simple(d->host_state.bus, 0, "pbm");
+pci_create_simple(d->bus, 0, "pbm");
+
 /* APB secondary busses */
-*bus2 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 0),
+*bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0),
 PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
 pci_apb_map_irq,
 "Advanced PCI Bus secondary bridge 1");
 apb_pci_bridge_init(*bus2);
 
-*bus3 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 1),
+*bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1),
 PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
 pci_apb_map_irq,
 "Advanced PCI Bus secondary bridge 2");
 apb_pci_bridge_init(*bus3);
 
-return d->host_state.bus;
+return d->bus;
 }
 
 static void pci_pbm_reset(DeviceState *d)
@@ -382,7 +382,7 @@ static void pci_pbm_reset(DeviceState *d)
 static int pci_pbm_init_device(SysBusDevice *dev)
 {
 APBState *s;
-int pci_mem_data, apb_config, pci_ioport, pci_config;
+int pci_config, apb_config, pci_ioport;
 unsigned int i;
 
 s = FROM_SYSBUS(APBState, dev);
@@ -396,20 +396,23 @@ static int pci_pbm_init_device(SysBusDevice *dev)
 /* apb_config */
 apb_config = cpu_register_io_memory(apb_config_read,
 apb_config_write, s);
+/* at region 0 */
 sysbus_init_mmio(dev, 0x1ULL, apb_config);
-/* pci_ioport */
-pci_ioport = cpu_register_io_memory(pci_apb_ioread,
-  pci_apb_iowrite, s);
-sysbus_init_mmio(dev, 0x1ULL, pci_ioport);
-/* pci_config */
+
+/* PCI configuration space */
 s->pci_config_handler.read = apb_pci_config_read;
 s->pci_config_handler.write = apb_pci_config_write;
 pci_config = cpu_register_io_memory_simple(&s->pci_config_handler);
 assert(pci_config >= 0);
+/* at region 1 */
 sysbus_init_mmio(dev, 0x100ULL, pci_config);
-/* mem_data */
-pci_mem_data = pci_host_data_register_mmio(&s->host_state, 1);
-sysbus_init_mmio(dev, 0x1000ULL, pci_mem_data);
+
+/* pci_ioport */
+pci_ioport = cpu_register_io_memory(pci_apb_ioread,
+pci_apb_iowrite, s);
+/* at region 2 */
+sysbus_init_mmio(dev, 0x10

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/25/2010 03:03 PM, Christoph Hellwig wrote:

On Tue, May 25, 2010 at 02:25:53PM +0300, Avi Kivity wrote:
   

Currently if someone wants to add a new block format, they have to
upstream it and wait for a new qemu to be released.  With a plugin API,
they can add a new block format to an existing, supported qemu.
 

So?  Unless we want a stable driver ABI which I fundamentally oppose as
it would make block driver development hell


We'd only freeze it for a major release.


they'd have to wait for
a new release of the block layer.  It's really just going to be a lot
of pain for no major gain.  qemu releases are frequent enough, and if
users care enough they can also easily patch qemu.
   


May not be so easy for them, they lose binary updates from their distro 
and have to keep repatching.


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] sparc64: clean up pci bridge map

2010-05-25 Thread Igor Kovalenko

On Tue, May 25, 2010 at 3:56 PM, Artyom Tarasenko
 wrote:
> 2010/5/25 Igor V. Kovalenko :
>> From: Igor V. Kovalenko 
>>
>> - remove unused host state and store pci bus pointer only
>> - do not map host state access into unused 1fe.1000 range
>> - reorder pci region registration
>> - assign pci i/o region to isa_mem_base
>> - rename default machine (it's Ultrasparc IIi now)
>
> Just rename the machine or use another CPU too? While you are at it
> maybe split these two?

Let's rename the cpu only since at the moment the rest of sun4u is
more Ultrasparc IIi than anything else anyway.
I posted updated set with separated rename bit.

-- 
Kind regards,
Igor V. Kovalenko

Re: [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET

2010-05-25 Thread Jan Kiszka

Paul Brook wrote:
>>> I realise that. However I'd expect things to break if the guest OS
>>> decides to share an IRQ line between the HPET and some other device.
>> The guest would share IRQ8, not the RTC output. So there would be no
>> difference to the current situation.
> 
> The difference is that you've removed the check that prevented overlap
> between the PIC and another device.  You should be using
> isa_reserve_irq/isa_init_irq before you use an ISA IRQ line.  Any uses of
> isa_bus_irqs (including the existing HPET code) are probably broken.

...at least fragile. OK, will address this as well.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization

2010-05-25 Thread Anthony Liguori


On 05/25/2010 02:23 AM, Avi Kivity wrote:

On 05/24/2010 11:22 PM, Anthony Liguori wrote:
This converts the entire qdev tree into an undocumented stable 
protocol (the qdev paths were already in this state I believe).  
This really worries me.



N.B. the association with qdev is only in identifying the device.  
The contents of the device's state are not part of qdev but rather 
part of vmstate.  vmstate is something that we already guarantee to 
be stable since that's required for live migration compatibility.


That removes our ability to deprecate older vmstate as time passes.  
Not a blocker but something to consider.


I don't think that qdev device names and paths are something we have 
to worry much about changing over time since they reflect logical bus 
layout.  They should remain static provided the devices remain static.


Modulo mistakes.  We already saw one (lack of pci domains).  To reduce 
the possibility of mistakes, we need reviewable documentation.


pci domains was only a mistake as a nice-to-have.  We can add pci 
domains in a backwards compatible way.


The arguments you're making about the importance of backwards 
compatibility and what's needed to strongly guarantee it are equally 
applicable to the live migration protocol.  We really do need to 
formally document the live migration protocol in such a way that it's 
reviewable if we hope to truly make it compatible across versions.


Regards,

Anthony Liguori


Note sysfs had similar assumptions and problems.

The qdev properties are a different matter entirely.  A command like 
'info qdm' would be potentially difficult to support as part of QMP 
but the proposed command's output is actually already part of a 
backward compatible interface (vmstate).


That's all good.  But documentation is critical for this.  Not only to 
improve quality, but also so that tool authors would have something to 
code against instead of trial and error (which invariably misses some 
corner cases).

Re: [Qemu-devel] [PATCH v2 1/3] add some tests for invalid JSON

2010-05-25 Thread Anthony Liguori


On 05/25/2010 02:28 AM, Paolo Bonzini wrote:

On 05/24/2010 10:17 PM, Anthony Liguori wrote:

On 05/24/2010 02:39 AM, Paolo Bonzini wrote:

Signed-off-by: Paolo Bonzini


I think this series conflicts a bit with Luiz's series which I just
pushed. Could you rebase against the latest?


You didn't apply this one yet, at least I don't see it on qemu.git

commit e546343ee0f3f904529d32c1a9a60f5baa181852
Author: Luiz Capitulino 
Date:   Wed May 19 18:15:32 2010 -0300

json-lexer: Drop 'buf'

QString supports adding a single char, 'buf' is unneeded.

Signed-off-by: Luiz Capitulino 

I based my series on top of Luiz's, so it should apply.


Yeah, I confused myself into thinking that Luiz's series was more 
contentious than it is.  Never mind, your patches are fine on top of his.


Regards,

Anthony Liguori

The above is the only commit that is actually required.  I can ping 
the series once Luiz's patches are applied, so you can disregard it in 
the meanwhile.


Paolo

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori


On 05/25/2010 04:14 AM, Avi Kivity wrote:

On 05/24/2010 10:38 PM, Anthony Liguori wrote:



- Building a plugin API seems a bit simpler to me, although I'm not
   sure if I got the idea correctly:
   The block layer already has some kind of API (.bdrv_file_open,
   .bdrv_read). We could simply compile the block drivers as shared
   objects and create a method for loading the necessary modules at
   runtime.


That approach would be a recipe for disaster.   We would have to 
introduce a new, reduced functionality block API that was supported 
for plugins.  Otherwise, the only way a plugin could keep up with our 
API changes would be if it was in tree which defeats the purpose of 
having plugins.


We could guarantee API/ABI stability in a stable branch but not across 
releases.


We have releases every six months.  There would be tons of block plugins 
that didn't work for random sets of releases.  That creates a lot of 
user confusion and unhappiness.


Regards,

Anthony Liguori

Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization

2010-05-25 Thread Avi Kivity


On 05/25/2010 04:03 PM, Anthony Liguori wrote:


I don't think that qdev device names and paths are something we have 
to worry much about changing over time since they reflect logical 
bus layout.  They should remain static provided the devices remain 
static.


Modulo mistakes.  We already saw one (lack of pci domains).  To 
reduce the possibility of mistakes, we need reviewable documentation.



pci domains was only a mistake as a nice-to-have.  We can add pci 
domains in a backwards compatible way.


It adds a new level to the qdev tree.  Of course we can hide the new 
level for older clients, and newer clients can drop the level for older 
qemus, but it will be oh-so-painful.




The arguments you're making about the importance of backwards 
compatibility and what's needed to strongly guarantee it are equally 
applicable to the live migration protocol.  We really do need to 
formally document the live migration protocol in such a way that it's 
reviewable if we hope to truly make it compatible across versions.


Mostly agreed.  I think live migration has a faster/easier deprecation 
schedule (easier not to support migration from 0.n-k to 0.n than to 
remove qmp support for a feature introduced in 0.n-k when releasing 
0.n).  But that's a minor concern, improving our externally visible 
interface documentation is a good thing and badly needed.


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host

2010-05-25 Thread Andre Przywara

Anthony Liguori wrote:

On 05/21/2010 02:50 AM, Andre Przywara wrote:

-cpu host currently only propagates the CPU's family/model/stepping,
the brand name and the feature bits.
Add a whitelist of safe CPUID leafs to let the guest see the actual
CPU's cache details and other things.

Signed-off-by: Andre Przywara

The problem I can see is that this greatly increases the chances of
problems with live migration since we don't migrate the cpuid state.

I think that should be fixed. Although -cpu host is not a wise choice
for migration, even without these additional leaves the feature bits
probably don't match between source and target.

What's the benefit of exposing this information to the guest?

That is mostly to propagate the cache size and organization parameters 
to the guest:

>> +/* safe CPUID leafs to propagate to guest if -cpu host is specified
>> + * Intel defined leafs:
>> + * Cache descriptors (0x02)
>> + * Deterministic cache parameters (0x04)
>> + * Monitor/MWAIT parameters (0x05)
>> + *
>> + * AMD defined leafs:
>> + * L1 Cache and TLB (0x05)
>> + * L2+L3 TLB (0x06)
>> + * LongMode address size (0x08)
>> + * 1GB page TLB (0x19)
>> + * Performance optimization (0x1A)
>> + */
Since at least L1 and L2 caches are mostly private to vCPUs, I see no 
reason to disguise them.
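
For illustration, a whitelist check along these lines could be sketched as follows. This is not the code from the patch: the function and array names are made up, and it assumes the AMD entries in the comment refer to the extended range (0x8000_00xx), which is where AMD documents those leaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical whitelist mirroring the comment above; assumes the AMD
 * entries live in the extended CPUID range (0x80000000 + n). */
static const uint32_t safe_leaves[] = {
    0x00000002, /* Intel: cache descriptors */
    0x00000004, /* Intel: deterministic cache parameters */
    0x00000005, /* Intel: MONITOR/MWAIT parameters */
    0x80000005, /* AMD: L1 cache and TLB */
    0x80000006, /* AMD: L2+L3 cache and TLB */
    0x80000008, /* AMD: long mode address size */
    0x80000019, /* AMD: 1GB page TLB */
    0x8000001A, /* AMD: performance optimization */
};

/* Returns true if the host's value for this leaf may be passed through. */
bool cpuid_leaf_is_safe(uint32_t leaf)
{
    size_t i;

    for (i = 0; i < sizeof(safe_leaves) / sizeof(safe_leaves[0]); i++) {
        if (safe_leaves[i] == leaf) {
            return true;
        }
    }
    return false;
}
```

Anything not on the list would keep the value qemu already synthesizes, so the default stays conservative.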

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori


On 05/25/2010 06:25 AM, Avi Kivity wrote:

On 05/25/2010 02:02 PM, Kevin Wolf wrote:





So could we not standardize a protocol for this that both sheepdog and
ceph could implement?

The protocol already exists, nbd.  It doesn't support snapshotting etc.
but we could extend it.

But IMO what's needed is a plugin API for the block layer.

What would it buy us, apart from more downstreams and having to maintain
a stable API and ABI?


Currently if someone wants to add a new block format, they have to 
upstream it and wait for a new qemu to be released.  With a plugin 
API, they can add a new block format to an existing, supported qemu.


Whether we have a plugin or protocol based mechanism to implement block 
formats really ends up being just an implementation detail.


In order to implement either, we need to take a subset of block 
functionality that we feel we can support long term and expose that.  
Right now, that's basically just querying characteristics (like size and 
geometry) and asynchronous reads and writes.
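
As a rough sketch only, such a reduced interface might look like the following. None of these names exist in QEMU; BlkPluginOps, the callback type and the toy RAM backend are assumptions made up for illustration, and the backend completes requests immediately instead of being truly asynchronous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced interface: size query plus async read/write. */
typedef void blk_completion_fn(void *opaque, int ret);

typedef struct BlkPluginOps {
    uint64_t (*get_size)(void *state);
    /* Return 0 if the request was accepted; the result goes to cb. */
    int (*aio_read)(void *state, uint64_t off, void *buf, size_t len,
                    blk_completion_fn *cb, void *opaque);
    int (*aio_write)(void *state, uint64_t off, const void *buf, size_t len,
                     blk_completion_fn *cb, void *opaque);
} BlkPluginOps;

/* Toy in-memory backend used only to exercise the interface. */
#define RAMDISK_SIZE 4096
typedef struct RamDisk { uint8_t data[RAMDISK_SIZE]; } RamDisk;

static int ram_in_range(uint64_t off, size_t len)
{
    return off <= RAMDISK_SIZE && len <= RAMDISK_SIZE - off;
}

static uint64_t ram_get_size(void *state)
{
    (void)state;
    return RAMDISK_SIZE;
}

static int ram_aio_read(void *state, uint64_t off, void *buf, size_t len,
                        blk_completion_fn *cb, void *opaque)
{
    RamDisk *d = state;

    assert(cb != NULL);
    if (!ram_in_range(off, len)) {
        cb(opaque, -1);              /* out of range: report failure */
        return 0;
    }
    memcpy(buf, d->data + off, len);
    cb(opaque, 0);
    return 0;
}

static int ram_aio_write(void *state, uint64_t off, const void *buf,
                         size_t len, blk_completion_fn *cb, void *opaque)
{
    RamDisk *d = state;

    assert(cb != NULL);
    if (!ram_in_range(off, len)) {
        cb(opaque, -1);
        return 0;
    }
    memcpy(d->data + off, buf, len);
    cb(opaque, 0);
    return 0;
}

const BlkPluginOps ram_ops = {
    .get_size = ram_get_size,
    .aio_read = ram_aio_read,
    .aio_write = ram_aio_write,
};

/* Completion callback that stores the result in an int. */
void record_result(void *opaque, int ret)
{
    *(int *)opaque = ret;
}
```

The same three operations map equally well onto messages in a protocol, which is the sense in which the choice between the two is an implementation detail.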


A protocol based mechanism has the advantage of being more robust in the 
face of poorly written block backends so if it's possible to make it 
perform as well as a plugin, it's a preferable approach.


Plugins that just expose chunks of QEMU internal state directly (like 
BlockDriver) are a really bad idea IMHO.


Regards,

Anthony Liguori

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/25/2010 04:17 PM, Anthony Liguori wrote:

On 05/25/2010 04:14 AM, Avi Kivity wrote:

On 05/24/2010 10:38 PM, Anthony Liguori wrote:



- Building a plugin API seems a bit simpler to me, although I'm not
  sure if I got the idea correctly:
  The block layer already has some kind of API (.bdrv_file_open,
  .bdrv_read). We could simply compile the block drivers as shared
  objects and create a method for loading the necessary modules at
  runtime.


That approach would be a recipe for disaster.   We would have to 
introduce a new, reduced functionality block API that was supported 
for plugins.  Otherwise, the only way a plugin could keep up with 
our API changes would be if it was in tree which defeats the purpose 
of having plugins.


We could guarantee API/ABI stability in a stable branch but not 
across releases.


We have releases every six months.  There would be tons of block 
plugins that didn't work for random sets of releases.  That creates a 
lot of user confusion and unhappiness.


The current situation is that those block format drivers only exist in 
qemu.git or as patches.  Surely that's even more unhappiness.


Confusion could be mitigated:

  $ qemu -module my-fancy-block-format-driver.so
  my-fancy-block-format-driver.so does not support this version of qemu 
(0.19.2).  Please contact my-fancy-block-format-driver-de...@example.org.
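
A minimal sketch of the check behind such a message, assuming each module carries a descriptor with the range of qemu versions it supports. ModuleInfo, VERSION() and module_check_version() are hypothetical names, not an existing qemu interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical module descriptor exported by each plugin. */
typedef struct ModuleInfo {
    const char *name;
    const char *contact;
    unsigned min_version;   /* oldest supported qemu version, encoded */
    unsigned max_version;   /* newest supported qemu version, encoded */
} ModuleInfo;

/* Encode major.minor.micro into one comparable integer. */
#define VERSION(maj, min, mic) (((maj) << 16) | ((min) << 8) | (mic))

/* Returns 0 if the module may be loaded. Otherwise writes a user-facing
 * message into msg and returns -1. */
int module_check_version(const ModuleInfo *m, unsigned host_version,
                         char *msg, size_t msg_len)
{
    assert(m != NULL && msg != NULL && msg_len > 0);
    if (host_version >= m->min_version && host_version <= m->max_version) {
        msg[0] = '\0';
        return 0;
    }
    snprintf(msg, msg_len,
             "%s does not support this version of qemu (%u.%u.%u).  "
             "Please contact %s.",
             m->name, host_version >> 16, (host_version >> 8) & 0xff,
             host_version & 0xff, m->contact);
    return -1;
}
```

The loader would refuse the module before calling into it, so a mismatch yields a message rather than a crash.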


The question is how many such block format drivers we expect.  We now 
have two in the pipeline (ceph, sheepdog), it's reasonable to assume 
we'll want an lvm2 driver and btrfs driver.  This is an area with a lot 
of activity and a relatively simple interface.


--
error compiling committee.c: too many arguments to function

[Qemu-devel] RFC: ehci -> uhci handoff suggestions

2010-05-25 Thread David S. Ahern


USB 2.0 leverages companion UHCI or OHCI host controllers for full and
low speed devices. I do not see an appropriate means for doing that bus
transition and could use some suggestions.

I've read through the code for the "legacy" path in handling USB devices
(-usbdevice CLI arg and usb_add monitor command), and I am now working
on the new path (now that I know about it).

As I understand the code at this point it is a top down setup: device
added, bus found, device attached.


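   +--------------------+
   |   Qemu USB admin   |   - adding/removing devices
   |     interface      |   - showing device list
   +--------------------+
             |
   +--------------------+
   |   USB controller   |
   +--------------------+
             |
   +--------------------+
   |  USB device model  |   - emulated devices (e.g., hw/usb-serial)
   |    (or driver )    |   - host devices
   +--------------------+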


i.e., the key point is the expectation that the bus to which the device is
assigned is known early in the code path.

For USB devices the bus to attach it to should be determined
automatically when the device is attached. Something along the lines of:


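   +--------------------+
   |   Qemu USB admin   |
   |     interface      |
   +--------------------+
             |
   +--------------------+    +--------------------+
   |   EHCI controller  |--->|    UHCI / OHCI     |
   +--------------------+    +--------------------+
             |                         |
   +--------------------+    +--------------------+
   |  USB device model  |    |  USB device model  |
   |    (or driver )    |    |    (or driver )    |
   +--------------------+    +--------------------+
         high speed             full / low speed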


To know which bus to attach it to the device needs to be queried/probed
for basic information - something the current architecture does not have.
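
To make the idea concrete, here is a small sketch of the bus choice once the speed is known. The types and the function are hypothetical and do not correspond to the existing qemu USB structures; it only assumes the device model can report its speed before it is attached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef enum { SPEED_LOW, SPEED_FULL, SPEED_HIGH } UsbSpeed;

typedef struct UsbBus {
    const char *name;
} UsbBus;

typedef struct UsbPortPair {
    UsbBus *ehci;        /* high speed controller */
    UsbBus *companion;   /* UHCI or OHCI companion, may be NULL */
} UsbPortPair;

/* Pick the bus after the device's speed has been probed.
 * Returns NULL if no suitable controller is present. */
UsbBus *usb_pick_bus(const UsbPortPair *port, UsbSpeed speed)
{
    assert(port != NULL);
    if (speed == SPEED_HIGH) {
        return port->ehci;
    }
    return port->companion;   /* low and full speed are handed off */
}
```

The open question remains where the speed comes from, since today the bus is chosen before the device model has a chance to report anything.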

Suggestions?

David

P.S. I skimmed the USB 3.0 spec and it has the same design: super speed
devices are attached to the new 3.0 controller, high speed to ehci and
low/full to uhci/ohci.

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/25/2010 04:29 PM, Anthony Liguori wrote:
The current situation is that those block format drivers only exist 
in qemu.git or as patches.  Surely that's even more unhappiness.


Confusion could be mitigated:

  $ qemu -module my-fancy-block-format-driver.so
  my-fancy-block-format-driver.so does not support this version of 
qemu (0.19.2).  Please contact 
my-fancy-block-format-driver-de...@example.org.


The question is how many such block format drivers we expect.  We now 
have two in the pipeline (ceph, sheepdog), it's reasonable to assume 
we'll want an lvm2 driver and btrfs driver.  This is an area with a 
lot of activity and a relatively simple interface.



If we expose a simple interface, I'm all for it.  But BlockDriver is 
not simple and things like the snapshotting API need love.


Of course, there's certainly a question of why we're solving this in 
qemu at all.  Wouldn't it be more appropriate to either (1) implement 
a kernel module for ceph/sheepdog if performance matters 


We'd need a kernel-level generic snapshot API for this eventually.

or (2) implement BUSE to complement FUSE and CUSE to enable proper 
userspace block devices.


Likely slow due to lots of copying.  Also needs a snapshot API.

(ABUSE was proposed a while ago by Zach).

If you want to use a block device within qemu, you almost certainly 
want to be able to manipulate it on the host using standard tools 
(like mount and parted) so it stands to reason that addressing this in 
the kernel makes more sense.


qemu-nbd also allows this.

This reasoning also applies to qcow2, btw.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori


On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust in 
the face of poorly written block backends so if it's possible to make 
it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?




Plugins that just expose chunks of QEMU internal state directly (like 
BlockDriver) are a really bad idea IMHO.


Also, we don't want to expose all of the qemu API.  We should default 
the visibility attribute to "hidden" and expose only select functions, 
perhaps under their own interface.  And no inlines.


Yeah, if we did plugins, this would be a key requirement.

Regards,

Anthony Liguori

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust in 
the face of poorly written block backends so if it's possible to 
make it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work with memory hotplug.  What else is there?

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/25/2010 04:25 PM, Anthony Liguori wrote:
Currently if someone wants to add a new block format, they have to 
upstream it and wait for a new qemu to be released.  With a plugin 
API, they can add a new block format to an existing, supported qemu.



Whether we have a plugin or protocol based mechanism to implement 
block formats really ends up being just an implementation detail.


True.

In order to implement either, we need to take a subset of block 
functionality that we feel we can support long term and expose that.  
Right now, that's basically just querying characteristics (like size 
and geometry) and asynchronous reads and writes.


Unfortunately, you're right.

A protocol based mechanism has the advantage of being more robust in 
the face of poorly written block backends so if it's possible to make 
it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.



Plugins that just expose chunks of QEMU internal state directly (like 
BlockDriver) are a really bad idea IMHO.


Also, we don't want to expose all of the qemu API.  We should default 
the visibility attribute to "hidden" and expose only select functions, 
perhaps under their own interface.  And no inlines.
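
A sketch of what that could look like with GCC or Clang, assuming the tree is built with -fvisibility=hidden. PLUGIN_API, PLUGIN_LOCAL and both functions are made-up names for illustration.

```c
#include <assert.h>

/* Assumes GCC or Clang. With -fvisibility=hidden every symbol is hidden
 * unless marked; the macros below are hypothetical names. */
#if defined(__GNUC__)
#define PLUGIN_API   __attribute__((visibility("default")))
#define PLUGIN_LOCAL __attribute__((visibility("hidden")))
#else
#define PLUGIN_API
#define PLUGIN_LOCAL
#endif

/* Internal helper: usable across qemu's own files, invisible to plugins. */
PLUGIN_LOCAL int internal_round_up(int value, int align)
{
    assert(align > 0);
    return (value + align - 1) / align * align;
}

/* Exported entry point. It is an out-of-line function rather than an
 * inline, so its body is not compiled into the plugin. */
PLUGIN_API int plugin_sector_align(int bytes)
{
    return internal_round_up(bytes, 512);
}
```

Keeping exported functions out of line means a plugin built against an older header still picks up whatever the running qemu implements.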


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:26 PM, Anthony Liguori wrote:

On 05/25/2010 08:21 AM, Andre Przywara wrote:

What's the benefit of exposing this information to the guest?

That is mostly to propagate the cache size and organization 
parameters to the guest:

>> +/* safe CPUID leafs to propagate to guest if -cpu host is specified
>> + * Intel defined leafs:
>> + * Cache descriptors (0x02)
>> + * Deterministic cache parameters (0x04)
>> + * Monitor/MWAIT parameters (0x05)
>> + *
>> + * AMD defined leafs:
>> + * L1 Cache and TLB (0x05)
>> + * L2+L3 TLB (0x06)
>> + * LongMode address size (0x08)
>> + * 1GB page TLB (0x19)
>> + * Performance optimization (0x1A)
>> + */
Since at least L1 and L2 caches are mostly private to vCPUs, I see no 
reason to disguise them.

But in practice, what is it useful for? 

See my other mail.

Just because we can expose it doesn't mean we should.

What's the point of -cpu host then?

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Kevin Wolf

Am 25.05.2010 15:25, schrieb Anthony Liguori:
> On 05/25/2010 06:25 AM, Avi Kivity wrote:
>> On 05/25/2010 02:02 PM, Kevin Wolf wrote:
>>>

>>>>> So could we not standardize a protocol for this that both sheepdog and
>>>>> ceph could implement?
>>>> The protocol already exists, nbd.  It doesn't support snapshotting etc.
>>>> but we could extend it.
>>>>
>>>> But IMO what's needed is a plugin API for the block layer.
>>> What would it buy us, apart from more downstreams and having to maintain
>>> a stable API and ABI?
>>
>> Currently if someone wants to add a new block format, they have to 
>> upstream it and wait for a new qemu to be released.  With a plugin 
>> API, they can add a new block format to an existing, supported qemu.
> 
> Whether we have a plugin or protocol based mechanism to implement block 
> formats really ends up being just an implementation detail.
> 
> In order to implement either, we need to take a subset of block 
> functionality that we feel we can support long term and expose that.  
> Right now, that's basically just querying characteristics (like size and 
> geometry) and asynchronous reads and writes.
> 
> A protocol based mechanism has the advantage of being more robust in the 
> face of poorly written block backends so if it's possible to make it 
> perform as well as a plugin, it's a preferable approach.
> 
> Plugins that just expose chunks of QEMU internal state directly (like 
> BlockDriver) are a really bad idea IMHO.

I'm still not convinced that we need either. I share Christoph's concern
that we would make our life harder for almost no gain. It's probably a
very small group of users (if it exists at all) that wants to add new
block drivers themselves, but at the same time can't run upstream qemu.

But if we were to decide that there's no way around it, I agree with you
that directly exposing the internal API isn't going to work.

Kevin

Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization

2010-05-25 Thread Anthony Liguori


On 05/25/2010 08:19 AM, Avi Kivity wrote:

On 05/25/2010 04:03 PM, Anthony Liguori wrote:


I don't think that qdev device names and paths are something we 
have to worry much about changing over time since they reflect 
logical bus layout.  They should remain static provided the devices 
remain static.


Modulo mistakes.  We already saw one (lack of pci domains).  To 
reduce the possibility of mistakes, we need reviewable documentation.



pci domains was only a mistake as a nice-to-have.  We can add pci 
domains in a backwards compatible way.


It adds a new level to the qdev tree.


The tree is not organized like that today.  IOW, the PCI hierarchy is 
not reflected in the qdev hierarchy.  All PCI devices (regardless of 
whether they're a function or a full slot) simply sit below the PCI bus.




The arguments you're making about the importance of backwards 
compatibility and what's needed to strongly guarantee it are equally 
applicable to the live migration protocol.  We really do need to 
formally document the live migration protocol in such a way that it's 
reviewable if we hope to truly make it compatible across versions.


Mostly agreed.  I think live migration has a faster/easier deprecation 
schedule (easier not to support migration from 0.n-k to 0.n than to 
remove qmp support for a feature introduced in 0.n-k when releasing 
0.n).  But that's a minor concern, improving our externally visible 
interface documentation is a good thing and badly needed.




Regards,

Anthony Liguori

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity


On 05/25/2010 04:53 PM, Kevin Wolf wrote:


I'm still not convinced that we need either. I share Christoph's concern
that we would make our life harder for almost no gain. It's probably a
very small group of users (if it exists at all) that wants to add new
block drivers themselves, but at the same time can't run upstream qemu.

   


The first part of your argument may be true, but the second isn't.  No 
user can run upstream qemu.git.  It's not tested or supported, and has 
no backwards compatibility guarantees.


--
error compiling committee.c: too many arguments to function

[Qemu-devel] Re: [PATCH] add support for protocol driver create_options

2010-05-25 Thread Kevin Wolf

Am 24.05.2010 08:34, schrieb MORITA Kazutaka:
> At Fri, 21 May 2010 18:57:36 +0200,
> Kevin Wolf wrote:
>>
>> Am 20.05.2010 07:36, schrieb MORITA Kazutaka:
>>> +
>>> +/*
>>> + * Append an option list (list) to an option list (dest).
>>> + *
>>> + * If dest is NULL, a new copy of list is created.
>>> + *
>>> + * Returns a pointer to the first element of dest (or the newly allocated copy)
>>> + */
>>> +QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest,
>>> +    QEMUOptionParameter *list)
>>> +{
>>> +    size_t num_options, num_dest_options;
>>> +
>>> +    num_options = count_option_parameters(dest);
>>> +    num_dest_options = num_options;
>>> +
>>> +    num_options += count_option_parameters(list);
>>> +
>>> +    dest = qemu_realloc(dest, (num_options + 1) * sizeof(QEMUOptionParameter));
>>> +
>>> +    while (list && list->name) {
>>> +        if (get_option_parameter(dest, list->name) == NULL) {
>>> +            dest[num_dest_options++] = *list;
>>
>> You need to add a dest[num_dest_options].name = NULL; here. Otherwise
>> the next loop iteration works on uninitialized memory and possibly an
>> unterminated list. I got a segfault for that reason.
>>
> 
> I forgot to add it, sorry.
> Fixed version is below.
> 
> Thanks,
> 
> Kazutaka
> 
> ==
> This patch enables protocol drivers to use their create options which
> are not supported by the format.  For example, protocol drivers can use
> a backing_file option with raw format.
> 
> Signed-off-by: MORITA Kazutaka 

$ ./qemu-img create -f qcow2 -o cluster_size=4k /tmp/test.qcow2 4G
Unknown option 'cluster_size'
qemu-img: Invalid options for file format 'qcow2'.

I think you added another num_dest_options++ which shouldn't be there.

Kevin
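
For reference, a self-contained sketch of the append logic with the terminator handled in one place. It uses a simplified stand-in for QEMUOptionParameter and plain realloc() instead of qemu_realloc(), so it is an illustration of the technique and not the patch itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMUOptionParameter; a NULL name ends a list. */
typedef struct OptParam {
    const char *name;
    int value;
} OptParam;

size_t count_params(const OptParam *list)
{
    size_t n = 0;

    while (list && list[n].name) {
        n++;
    }
    return n;
}

const OptParam *find_param(const OptParam *list, const char *name)
{
    assert(name != NULL);
    for (; list && list->name; list++) {
        if (strcmp(list->name, name) == 0) {
            return list;
        }
    }
    return NULL;
}

/* Append the entries of list that dest does not have yet. Returns the
 * (possibly moved) dest, or NULL on allocation failure, in which case
 * dest is left untouched. */
OptParam *append_params(OptParam *dest, const OptParam *list)
{
    size_t n_dest = count_params(dest);
    size_t n_total = n_dest + count_params(list);
    OptParam *out = realloc(dest, (n_total + 1) * sizeof(*out));

    if (!out) {
        return NULL;
    }
    out[n_dest].name = NULL;          /* terminate before the first lookup */
    for (; list && list->name; list++) {
        if (find_param(out, list->name) == NULL) {
            out[n_dest++] = *list;    /* exactly one increment per entry */
            out[n_dest].name = NULL;  /* keep the list terminated */
        }
    }
    return out;
}
```

The index advances once per appended entry and the terminator is rewritten right after it, so every lookup in the loop sees a properly terminated list.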

[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Stefan Hajnoczi

On Tue, May 25, 2010 at 1:04 PM, Avi Kivity  wrote:
> Those %ps are more or less useless.  We need better ways of identifying
> them.

You're right, the vq pointer is useless in isolation.  We don't know
which virtio device or which virtqueue number.

With the full context of a trace it would be possible to correlate the
vq pointer if we had trace events for vdev and vq setup.

Adding custom formatters could be tricky since the format string is
passed only to tracing backends that use it, like UST.  And UST uses
its own sprintf implementation which we don't have direct control
over.

I think we just need to guarantee that any pointer can be correlated
with previous trace entries that give context for that pointer.
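
A sketch of how a trace consumer could do that correlation, assuming a setup event is emitted for each virtqueue. The VqSetupEvent record and vq_lookup() are hypothetical and not part of the tracing patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of a "vq setup" trace event. */
typedef struct VqSetupEvent {
    const void *vq;        /* pointer value as printed in later events */
    const char *device;    /* e.g. "virtio-blk" */
    int queue_index;
} VqSetupEvent;

/* Find the most recent setup event for vq among the first n events.
 * Scanning backwards means a reused address maps to its latest owner. */
const VqSetupEvent *vq_lookup(const VqSetupEvent *events, size_t n,
                              const void *vq)
{
    assert(events != NULL || n == 0);
    while (n-- > 0) {
        if (events[n].vq == vq) {
            return &events[n];
        }
    }
    return NULL;
}
```

Given the setup events seen so far, a bare %p in a later entry can then be resolved to a device name and queue number.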

Stefan
